Benchmarks, while inherently contentious and not always representative of real-world performance, are an important tool in any quantitative evaluation. That’s why nerds are obsessed with them. And not just nerds: companies use third-party benchmark results to make decisions involving millions, sometimes billions, of dollars in investment. So when someone finds evidence of a company putting its figurative thumb on the scale, the ramifications can be big. Such is the case with some recent, and very specific, Intel Xeon CPU benchmarks.
The Standard Performance Evaluation Corporation, better known as SPEC, has invalidated over 2,600 of its own results for Xeon processors submitted in 2022 and 2023 under its popular industry-standard SPEC CPU 2017 test. After investigating, SPEC found that Intel had used compilers that were, quote, “performing a compilation that specifically improves the performance of the 523.xalancbmk_r / 623.xalancbmk_s benchmarks using a priori knowledge of the SPEC code and dataset to perform a transformation that has narrow applicability.”
In layman’s terms, SPEC is accusing Intel of optimizing its compiler specifically for the benchmark, meaning the results didn’t reflect the performance end users could expect to see in the real world. Intel’s custom compiler may have inflated the relevant SPEC results by up to 9 percent. For more technical details (many of which are, frankly, beyond my level of compsci understanding), check out the reports from ServeTheHome and Phoronix, via Tom’s Hardware.
SPEC uncovered these results while looking back over its own benchmark database. While it isn’t deleting them, for the sake of the historical record, it is marking them invalid in its reports. Slightly newer versions of the compilers used with the latest industrial Xeon processors, the 5th-gen Emerald Rapids series, do not include these allegedly benchmark-targeted optimizations.
I’ll point out that both the Xeon processors and the SPEC CPU 2017 test are high-end hardware and tooling meant for “big iron” industrial and educational applications, and aren’t especially relevant to the consumer market we typically cover. But companies giving their chips a little extra oomph for the sake of attention-grabbing benchmarks isn’t exactly novel. In 2020, mobile chip suppliers across the industry (Qualcomm, Samsung, and MediaTek, which together supply chips for almost every non-Apple phone) were accused of effectively faking Android performance results. Accusations of interference in companies’ own self-reported benchmarks, which are often published without specific parameters and are therefore unverifiable, are incredibly common.