AMD Zen 2 in every segment

After releasing Zen 2 for mainstream desktop as Ryzen 3000 and for servers as EPYC 7002, AMD is following up with a 7 nm update to the Threadripper platform. While desktop users mostly care more about performance than power, improvements in energy efficiency usually allow for higher overall performance. The release of second-generation Threadripper processors in 2018 increased the peak core count from 16 to 32, which drove up power consumption. This article looks at how Zen 2’s improved energy efficiency translates to raw performance on the high-end desktop.

Platform changes

  • Unified memory architecture using a 12 nm I/O-die coupled with up to eight 7 nm Zen 2 chiplets
  • Increased fabric bandwidth and doubled L3 cache
  • Support for higher memory speeds and asynchronous memory/fabric clock domains
  • 3960X (24 cores) and 3970X (32 cores) at launch, with up to 64 cores coming later
  • CPU PCI-E links upgraded to Gen 4
  • CPU PCI-E lanes reduced from 60 (2×16, 2×8, 3×4) to 56 (2×16, 2×8, 2×4)
  • New TRX40 chipset (same as X570)
  • CPU-chipset link upgraded from x4 Gen 3 to x8 Gen 4 (bandwidth from 3.94 GB/s to 15.75 GB/s)
  • Four USB 3.2 Gen 2 (previously eight USB 3.1 Gen 1)
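
The link bandwidth and lane totals in the list above can be reproduced with a little arithmetic. The following is a minimal sketch, assuming the standard PCI-E per-lane rates of 8 GT/s (Gen 3) and 16 GT/s (Gen 4) with 128b/130b encoding; the function names are illustrative and the figures are theoretical one-direction maxima before protocol overhead.

```python
# Per-lane raw transfer rates in GT/s. Gen 3 and Gen 4 both use 128b/130b encoding.
TRANSFER_RATE_GT_S = {3: 8.0, 4: 16.0}
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_gb_s(gen, lanes):
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    gbit_per_second = TRANSFER_RATE_GT_S[gen] * ENCODING_EFFICIENCY * lanes
    return gbit_per_second / 8  # Gbit/s -> GB/s


def total_lanes(links):
    """Sum lanes from a {link width: number of links} mapping."""
    return sum(width * count for width, count in links.items())


old_link = link_bandwidth_gb_s(gen=3, lanes=4)
new_link = link_bandwidth_gb_s(gen=4, lanes=8)
print(f"x4 Gen 3: {old_link:.2f} GB/s")
print(f"x8 Gen 4: {new_link:.2f} GB/s")
print(f"Increase: {new_link / old_link:.1f}x")

previous_lanes = total_lanes({16: 2, 8: 2, 4: 3})
current_lanes = total_lanes({16: 2, 8: 2, 4: 2})
print(f"Previous generation: {previous_lanes} lanes, new generation: {current_lanes} lanes")
```

Doubling both the lane count and the per-lane rate is what yields the fourfold increase in chipset link bandwidth.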

While the new AM4 and SP3 processors drop into existing boards with a firmware update (sacrificing PCI-E 4.0 capability), backwards compatibility is broken between TR4 and the new sTRX4 socket, even though the physical socket is the same LGA4094 model used for both EPYC and previous Threadripper parts. The CPU-chipset link is widened from four to eight lanes by re-purposing a PCI-E x4 port as additional lanes to the TRX40 chipset.

Test Setup

  • AMD Threadripper “3970X” ES (32 cores, 4.4 GHz Boost, 280W TDP) / 2990WX (32 cores, 4.2 GHz Boost, 250W TDP)
  • ASUS ROG Zenith II Extreme (BIOS 0042) / Zenith Extreme (BIOS 2001)
  • G.SKILL F4-4000C18Q-32GTZKW (4x8GB)
  • Custom water cooling (Raijintek CWB-TR4 RBW + Bykski B-RD360-TK60)
  • Windows 10 1903, AMD Chipset Driver 1.9.27.1033, Ryzen High Performance power plan

The Threadripper 3000 processor used in this test is an engineering sample with slightly different specifications from a retail 3970X. Notably, the boost and base clocks are 100 MHz lower, so stock performance is lower than that of a final part.

Method

All benchmark results are the average of three runs. In the single-threaded tests, the benchmark’s thread affinity was manually set to the highest-ranked core. HWInfo 6.14 was used to record monitoring data during each run. The average frequency was taken from the “Effective Clock” item. The average power was read directly from the VRM controller and covers only CPU core power or SOC power. Load-Line Calibration was set to Level 5 for the overclocked results.
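
As an illustration of the averaging step, here is a minimal sketch that reduces repeated runs to a mean and also reports the run-to-run spread. The function name and the scores are placeholders for illustration only; they are not measurements from this article.

```python
from statistics import mean


def summarize_runs(scores):
    """Return (mean, relative spread) where spread is (max - min) / mean."""
    if not scores:
        raise ValueError("at least one run is required")
    average = mean(scores)
    spread = (max(scores) - min(scores)) / average
    return average, spread


# Placeholder scores, NOT results from this article.
example_runs = [1000.0, 1010.0, 990.0]
average, spread = summarize_runs(example_runs)
print(f"mean = {average:.1f}, run-to-run spread = {spread:.1%}")
```

Checking the spread alongside the mean is a cheap way to spot a run that was disturbed by background activity.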

Results and Analysis

Memory Performance

The new memory architecture, coupled with increased fabric speed, shows major improvements in memory bandwidth at the same settings. At 3200 MT/s the read bandwidth is 13.4 GB/s higher. Since all cores now have to access memory through the I/O-die, there is a penalty to best-case latency. At these settings, latency increases from 66.4 ns to 84.2 ns.

As with regular desktop Ryzen 3000 processors, higher memory speeds are possible when fabric and memory are clocked independently. In this case, the system was capable of 4400 MT/s with the fabric clock locked to 1800 MHz.
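
To put the memory figures in context, the sketch below derives the memory clock, the theoretical peak bandwidth, and the latency penalty from the numbers in the text. It assumes a quad-channel configuration with 64-bit DDR4 channels; the theoretical peaks are upper bounds, not measured results, and the function names are illustrative.

```python
CHANNELS = 4            # assumption: quad-channel memory configuration
BYTES_PER_TRANSFER = 8  # each DDR4 channel is 64 bits wide


def memory_clock_mhz(data_rate_mt_s):
    """DDR transfers data twice per clock, so the clock is half the data rate."""
    return data_rate_mt_s / 2


def peak_bandwidth_gb_s(data_rate_mt_s, channels=CHANNELS):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return data_rate_mt_s * BYTES_PER_TRANSFER * channels / 1000


FABRIC_CLOCK_MHZ = 1800  # fabric clock used for the 4400 MT/s result

for rate in (3200, 4400):
    print(f"{rate} MT/s: memory clock {memory_clock_mhz(rate):.0f} MHz, "
          f"theoretical peak {peak_bandwidth_gb_s(rate):.1f} GB/s")

ratio = memory_clock_mhz(4400) / FABRIC_CLOCK_MHZ
print(f"Memory-to-fabric clock ratio at 4400 MT/s: {ratio:.2f}")

latency_old_ns, latency_new_ns = 66.4, 84.2  # figures quoted in the text
penalty_ns = latency_new_ns - latency_old_ns
print(f"Latency penalty: {penalty_ns:.1f} ns ({penalty_ns / latency_old_ns:.1%})")
```

At 4400 MT/s the memory clock is 2200 MHz, well above the 1800 MHz fabric clock, which is why the two domains have to run asynchronously at that speed.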

Stock Performance

A comparison of multi-threaded stock performance reveals how much more efficient Zen 2 is than the previous generation. At the same power consumption, the new processor is 39% faster, sustains 14% higher clock speeds, and reports temperatures 14 °C lower. This is a single benchmark; other applications may benefit even more, because the simplified memory architecture depends less on software optimization.

The single-threaded test shows a performance increase of 22% at 7% higher frequency. The reported temperatures are similar across the single- and multi-threaded tests. This is probably because the reported temperature reflects the peak across several sensors, and a single core is allowed to use more power when the others are idle.
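
Dividing the clock speed increase out of the performance increase gives a rough estimate of how much of each gain comes from per-clock improvements rather than frequency. This is a simplified sketch that assumes performance scales linearly with frequency; the function name is illustrative, and the multi-threaded figure also reflects memory and fabric changes, not just the core.

```python
def per_clock_gain(performance_gain, frequency_gain):
    """Fraction of the speedup that remains after dividing out the clock increase."""
    return (1 + performance_gain) / (1 + frequency_gain) - 1


# Gains quoted in the text for the stock comparison.
multi_threaded = per_clock_gain(performance_gain=0.39, frequency_gain=0.14)
single_threaded = per_clock_gain(performance_gain=0.22, frequency_gain=0.07)

print(f"Multi-threaded per-clock gain:  {multi_threaded:.1%}")
print(f"Single-threaded per-clock gain: {single_threaded:.1%}")
```

Under this assumption, roughly 22% of the multi-threaded gain and 14% of the single-threaded gain remain once frequency is accounted for.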

Overclocked Performance

Both processors were overclocked to a point where they could consistently pass the benchmark, which resulted in 4.2 GHz on the 2990WX and 4.4 GHz on the 3970X. In this all-out configuration, the 3970X ES is 31% faster while consuming 30% less power. The reported temperature is also significantly lower.
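
The efficiency difference is easiest to see as performance per watt. The sketch below computes the ratio from the relative figures quoted for the stock comparison (39% faster at the same power) and the overclocked comparison (31% faster at 30% less power); the function name is illustrative.

```python
def perf_per_watt_ratio(performance_gain, power_change):
    """Performance per watt of the new part relative to the old one.

    Both arguments are fractions, e.g. 0.31 for +31% and -0.30 for -30%.
    """
    return (1 + performance_gain) / (1 + power_change)


stock = perf_per_watt_ratio(performance_gain=0.39, power_change=0.0)
overclocked = perf_per_watt_ratio(performance_gain=0.31, power_change=-0.30)

print(f"Stock:       {stock:.2f}x performance per watt")
print(f"Overclocked: {overclocked:.2f}x performance per watt")
```

The overclocked ratio comes out higher than the stock one, which suggests the older part is pushed further past its efficient operating range at 4.2 GHz.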

In heavier tests, the frequencies had to be lowered. The 2990WX could pass at least 10 minutes of Prime95 at 3.9 GHz. The 3970X ES passed at 4.3 GHz without AVX and at 4.2 GHz with AVX enabled. When AVX instructions are allowed, the 2990WX’s power consumption drops because it takes two clock cycles to execute a 256-bit AVX instruction. The 3970X ES can execute them in a single cycle, resulting in higher power consumption even at reduced frequency and voltage.
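
A very simplified model shows why the single-cycle AVX path draws more power: the execution units do more work per unit of time even at similar clocks. The sketch below assumes the 2990WX also runs its AVX workload at 3.9 GHz and ignores the number of execution units, memory limits, and everything else outside the cycle count, so the result is only a rough indication.

```python
def relative_avx256_rate(clock_ghz, cycles_per_instruction):
    """256-bit AVX instructions issued per nanosecond in one pipeline (simplified)."""
    return clock_ghz / cycles_per_instruction


# Clocks and cycle counts from the text; 3.9 GHz for AVX on the 2990WX is an assumption.
rate_2990wx = relative_avx256_rate(clock_ghz=3.9, cycles_per_instruction=2)
rate_3970x = relative_avx256_rate(clock_ghz=4.2, cycles_per_instruction=1)

print(f"2990WX:    {rate_2990wx:.2f} instructions/ns")
print(f"3970X ES:  {rate_3970x:.2f} instructions/ns")
print(f"Ratio:     {rate_3970x / rate_2990wx:.2f}x")
```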

SOC Power Consumption

At the default SOC voltage for 3200 MT/s memory (1.05 V in both cases), the chiplet design brings substantial reductions in SOC power. Idle power consumption drops by 56%, and under MemTest64 it is 49% lower.
