
AMD Ryzen 3000 investigated

This is a brief summary with limited testing to highlight changes to memory and fabric behavior on Zen 2 (Matisse).

As you've probably seen, Ryzen 3000 is capable of much higher memory speeds than previous generations. The main reason for this is added on-chip circuitry that allows the memory clock (MCLK) to run asynchronously from the fabric clock (FCLK). While this enables higher memory frequency, it also increases access latency because data crossing between the two clock domains has to be synchronized.

This post tries to answer the following questions:

  • How high a memory frequency can be expected with Ryzen 3000 on X370/X470/X570?
  • How high a fabric clock can be expected?
  • Is there a benefit to high memory frequency and high fabric clock?
  • How big is the latency penalty when running MCLK != 2*FCLK?

An issue with these tests is that BIOS releases are very unpolished at this time. New AGESA versions and patches are released frequently, and board vendors struggle to keep up. The X370 and X470 versions are especially problematic for memory overclocking. Both the AMD memory initialization method and the vendor DRAM settings seem either not to be working or to lack tuning. At best, these tests can hint at how things will be for end users.

CPUs

  • Ryzen 5 3600X
  • Ryzen 9 3900X

Motherboards

  • ROG Crosshair VI Hero (X370) BIOS 7106 AGESA 1.0.0.2
  • ROG Crosshair VII Hero (X470) BIOS 2406 AGESA 1.0.0.2
  • ROG Crosshair VIII Formula (X570) BIOS 7702 AGESA 1.0.0.3

Memory kits

  • G.Skill F4-3200C14D-16GTR (2x8GB Samsung B-die, single-rank)

Settings

CPU Frequency (RamTest): Default
CPU Frequency (AIDA64): Fixed 4.0 GHz, 1.25V
SOC Voltage: 1.25V
Timings (CL16): 16-16-16-16-45-1T-GD=En
Timings (CL18): 18-18-18-18-45-1T-GD=En
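To put the two timing sets into perspective, CAS latency can be converted from clock cycles to nanoseconds. The sketch below is only arithmetic: DDR memory transfers twice per clock, so one memory clock lasts 2000 / data rate (MT/s) nanoseconds. The function name and the pairing of CL16 and CL18 with specific DRAM frequencies are assumptions for illustration, not the exact test matrix.

```python
def cas_latency_ns(cl_cycles: int, data_rate_mts: float) -> float:
    """Convert CAS latency from memory clock cycles to nanoseconds.

    DDR transfers data twice per clock, so the memory clock in MHz is
    data_rate_mts / 2 and one cycle lasts 2000 / data_rate_mts ns.
    """
    return cl_cycles * 2000.0 / data_rate_mts


# Illustrative pairings only (assumed, not taken from the test results).
for cl, rate in [(16, 3600), (16, 3666), (18, 4266), (18, 4333)]:
    print(f"CL{cl} @ {rate} MT/s: {cas_latency_ns(cl, rate):.2f} ns")
```

This covers only the CAS portion of an access. The latency discussed later in this post also includes the fabric and the memory controller, which is where the asynchronous penalty comes from.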

Issues

C6H-7106: DRAM Voltage at the beginning of POST is always 1.200V, which limits the maximum memory frequency. It's possible to work around this by first booting with a lower DRAM Frequency and higher voltage, then raising DRAM Frequency in steps without letting the board shut down.
C7H-2406: DRAM Vboot is always 1.200V by default, but it can be set manually. However, the setting is lost when standby power is removed from the motherboard.
C6H-7106 / C7H-2406: After a failed memory overclock you get stuck at POST code C5, from which the board never seems to recover. The only way back is to clear CMOS.

Results and discussion

The results show the highest DRAM Frequency that passed 100% in RamTest with 2x8GB Samsung B-die. 1.25V SOC and 1.40V DRAM Voltage were used in all cases. These settings are not considered fully stable and only indicate what's possible without much tuning. I plan to test and update with additional configurations but currently lack the time. Above 4000 MHz with 2x16GB has been demonstrated on C8F.

The tests were completed at 1733 MHz FCLK; anything higher caused stability issues. Up to 1900 MHz FCLK is possible on some chips, depending on chip quality. Adjusting SOC and VDDG voltages may help to stabilize higher speeds.

The C7H got surprisingly close to the C8F; both use a "Daisy Chain" memory topology. The C6H with its "T" topology does much worse, likely because less time has been spent tuning its signaling parameters at this point. These numbers are expected to improve with the next couple of BIOS releases.

3600X / 1xCCD, read bandwidth

Setting MCLK = 2*FCLK (1:1) yields by far the highest read bandwidth at FCLK / MCLK = 1800 / 3600. Increasing DRAM Frequency by one step to 3666 MT/s while keeping the fabric at 1800 MHz reveals the penalty of asynchronous clock domains: read bandwidth immediately drops by 1.7 GB/s. At higher memory speeds and CAS latency, it becomes visible how fabric and memory clocks scale. Increasing DRAM Frequency from 4266 MT/s to 4333 MT/s at the same 1733 MHz fabric clock doesn't increase read bandwidth. With DRAM Frequency fixed at 4333 MT/s, however, increasing FCLK from 1733 MHz to 1800 MHz immediately yields a gain of 1.6 GB/s.
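As a quick way to tell which of these settings are synchronous, here is a minimal sketch that applies the MCLK = 2*FCLK rule, with MCLK taken as the DRAM Frequency in MT/s as used in this post. The function name and the small tolerance (to absorb rounding in nominal labels such as 3733 / 1866) are my own assumptions.

```python
def clock_mode(dram_mts: int, fclk_mhz: int, tolerance: int = 2) -> str:
    """Classify a setting using the post's MCLK = 2*FCLK rule.

    dram_mts is the DRAM Frequency as a DDR data rate in MT/s.
    The tolerance absorbs rounding in nominal labels (e.g. 3733 / 1866).
    """
    if abs(dram_mts - 2 * fclk_mhz) <= tolerance:
        return "synchronous"
    return "asynchronous"


# (DRAM Frequency, FCLK) pairs mentioned in the paragraph above.
for dram, fclk in [(3600, 1800), (3666, 1800), (4266, 1733),
                   (4333, 1733), (4333, 1800)]:
    print(f"{dram} MT/s @ FCLK {fclk} MHz: {clock_mode(dram, fclk)}")
```

Only the 3600 / 1800 pair comes out as synchronous; every other combination discussed above crosses clock domains.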

3900X / 2xCCD, read bandwidth

With 2xCCD, read bandwidth is no longer penalized in asynchronous mode and keeps scaling with both fabric and memory clock.

3600X / 1xCCD, write bandwidth

Due to the Infinity Fabric redesign between CCD and IOD, memory write bandwidth on CPUs with 1xCCD (<=8C) is halved compared to previous generations (see https://hexus.net/tech/reviews/cpu/132374-amd-ryzen-9-3900x-ryzen-7-3700x/). This is clearly reflected in these results, where the bandwidth seems to depend only on the fabric speed.

3900X / 2xCCD, write bandwidth

As expected, write bandwidth is approximately doubled with 2xCCD. As in the read bandwidth scenario, asynchronous mode no longer seems to reduce the write bandwidth.
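The halving with 1xCCD and the doubling with 2xCCD can be put into rough numbers. The sketch below computes theoretical ceilings only, not measured values. It assumes the commonly cited Zen 2 link widths of 32 bytes per FCLK for reads and 16 bytes per FCLK for writes per CCD; those widths, the dual-channel 64-bit DRAM configuration and all names are assumptions, not figures from these tests.

```python
READ_BYTES_PER_FCLK = 32   # assumed CCD <- IOD link width per fabric clock
WRITE_BYTES_PER_FCLK = 16  # assumed CCD -> IOD link width per fabric clock


def dram_peak_gbs(data_rate_mts: float, channels: int = 2,
                  bus_bytes: int = 8) -> float:
    """Theoretical DRAM bandwidth in GB/s (64-bit bus per channel)."""
    return data_rate_mts * bus_bytes * channels / 1000.0


def fabric_peak_gbs(fclk_mhz: float, bytes_per_clk: int,
                    ccds: int = 1) -> float:
    """Theoretical CCD<->IOD link bandwidth in GB/s across all CCDs."""
    return fclk_mhz * bytes_per_clk * ccds / 1000.0


dram = dram_peak_gbs(3600)
for ccds in (1, 2):
    write_link = fabric_peak_gbs(1800, WRITE_BYTES_PER_FCLK, ccds)
    # The lower of the two ceilings bounds the achievable write bandwidth.
    print(f"{ccds}xCCD: DRAM {dram:.1f} GB/s, write link {write_link:.1f} GB/s, "
          f"ceiling {min(dram, write_link):.1f} GB/s")
```

Under these assumptions, the 1xCCD write ceiling is set by the fabric link rather than by the DRAM, which would be consistent with write bandwidth tracking only the fabric clock on the 3600X.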

3600X / 1xCCD, copy bandwidth

Moving away from synchronous memory/fabric clocks doesn't hurt copy bandwidth by much, and it seems to scale nicely both with increased memory and fabric speeds.

3900X / 2xCCD, copy bandwidth

Copy bandwidth gets a nice boost from the increased write bandwidth with 2xCCD.

As mentioned in AMD's own presentation, the main issue with asynchronous clock domains is the latency penalty. Changing settings from 1800 / 3600 synchronous to 1800 / 3666 asynchronous increases latency by over 10 ns. The gap narrows slowly as memory and fabric clocks are increased, but latency never fully recovers.

Summary

  • X370/X470 BIOS is still immature when using Ryzen 3000 (depending on the board)
  • Highest achieved memory frequency was 4333 MT/s on X570, 4200 MT/s on X470 and 3666 MT/s on X370. The results will most likely get closer with BIOS updates.
  • Fabric clock target 1700-1900 MHz
  • Memory latency penalty ~10 ns when moving from synchronous to asynchronous memory/fabric clock, which can be reduced by further increasing memory frequency
  • Highest possible fabric clock with memory synchronized (MCLK = 2*FCLK) yields the highest read bandwidth and lowest latency with 1xCCD
  • Write bandwidth only scales with fabric clock with 1xCCD
  • Copy bandwidth benefits from both memory and fabric clock, and is not hurt much by increased latency
  • 2xCCD read, write and copy bandwidth keep scaling with both memory and fabric clock without being hurt by asynchronous mode. In this case, it comes down to choosing between lower latency and higher bandwidth.

Bonus: PCI-E Gen4 test on X470

With the currently available C7H BIOS 2406, PCI-E Gen 4 works with any PCI-E Gen 4 device connected to the CPU PCI-E lanes without any changes to BIOS settings. Just plug in the device and install the driver. Similarly, a PCI-E Gen 4 NVMe drive plugged into the CPU M.2 slot should work at Gen 4. The conclusion is that, at least on AGESA 1.0.0.2, PCI-E Gen 4 is not actively blocked on non-X570 motherboards. Whether it works on your board or in specific slots comes down to the board layout. The closer the slot is to the CPU (shorter trace length), the higher the chance of success. A higher PCB layer count and the absence of PCI-E switches also increase the chance of success.
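For reference, this is what Gen 4 buys in theory over Gen 3. The sketch below uses the per-lane signaling rates (8 GT/s for Gen 3, 16 GT/s for Gen 4) and the 128b/130b line encoding; protocol overhead is ignored, so real transfers will be lower. The function and constant names are my own.

```python
GT_PER_S = {3: 8.0, 4: 16.0}  # per-lane signaling rate by PCI-E generation
ENCODING = 128 / 130          # 128b/130b line encoding used by Gen 3 and Gen 4


def pcie_peak_gbs(gen: int, lanes: int) -> float:
    """Theoretical one-direction PCI-E bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S[gen] * ENCODING * lanes / 8


# x16 for a graphics card, x4 for an NVMe drive in the CPU M.2 slot.
for gen in (3, 4):
    for lanes in (16, 4):
        print(f"Gen {gen} x{lanes}: {pcie_peak_gbs(gen, lanes):.2f} GB/s")
```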

Proof of concept with Radeon 5700XT:

edit: Added detailed settings
edit2: Added note about 1/2 CCD configurations and write bandwidth
edit3: Added 3900X results and PCI-E Gen 4 test on X470