Anybody planning a Ryzen build?



13 hours ago, Pauven said:

An update on stability:  

  • After 2 back-to-back hangs in unRAID, the system has run over 12 hours in Win10 with no problems whatsoever.  In all my testing with Win10, including overnight torture tests, I've never had a single problem in Windows.  And yes, this was with the 64GB memory kit that showed errors in Memtest.

 

Sooooo, it seems like you are primarily seeing errors in MemTest and unRAID.

 

Not to throw more fuel on the fire, but I noticed your screen shot showed "MemTest86 5.01"?  My screen is showing "PassMark MemTest86 V7.3".

 

I don't remember all the details of the various MemTest versions.  It might be worth looking both up to see which is the latest, and whether the PassMark version is better suited to your configuration, especially if Windows 10 and openSUSE are both happy on this RAM.

 

- Bill

Link to comment
10 hours ago, ufopinball said:

 

Sooooo, it seems like you are primarily seeing errors in MemTest and unRAID.

 

Not to throw more fuel on the fire, but I noticed your screen shot showed "MemTest86 5.01"?  My screen is showing "PassMark MemTest86 V7.3".

 

I don't remember all the details of the various MemTest versions.  It might be worth looking both up to see which is the latest, and whether the PassMark version is better suited to your configuration, especially if Windows 10 and openSUSE are both happy on this RAM.

 

- Bill

 

5.01 is the Memtest86 that comes with the unRAID distro.  I don't think it would make much difference running the Windows flavor.

 

I swapped in one stick of the new QVL'd RAM, which booted by default at 2400, unlike the G.Skills that booted by default at 2133.  It seems there are BIOS optimizations in play.

 

I reset my BIOS back to defaults.

 

I created a new USB unRAID stick.

 

And 5 hours later, crash...

 

I then booted into openSUSE, and 18 hours later, still running great.  I plan to let it get to 24 hours before going to another test.

 

Am I really the only person experiencing crashes with unRAID on Ryzen?  

 

I'm pretty sure I don't have a hardware problem.  I've tried 2 different power supplies, 3 different RAM kits, different BIOS versions including resetting the BIOS to defaults, 2 different USB sticks, moving memory sticks and the GPU around to different slots, different USB ports, and unRAID both in Safe Mode and with plugins.  No matter what I do, Windows and openSUSE work perfectly, and unRAID crashes consistently, typically within an hour, though I have had occasional long stretches of uptime (max is 13 hours).

 

I'm purely guessing here, but I'm thinking there is something about my motherboard that is incompatible with the current version of unRAID.

 

I think my next test is going to be finding a Linux distro running kernel 4.9.10 (the same version as unRAID) to see if it crashes too.  Isn't unRAID based on Slackware?  Perhaps I can find that distro in the right version...

 

I'm trying hard not to point my finger at unRAID, but I'm running outta culprits.

 

-Paul

Link to comment

Hmmm, the latest Slackware is v14.2 (which apparently is what unRAID is based on), but that release is 9 months old and has kernel 4.4.14.  I guess Lime-Tech is doing some significant upgrades under the hood.

 

If anyone would like to recommend a distro running 4.9.10 for me to test, I'm listening.

 

-Paul

Link to comment
11 minutes ago, Pauven said:

Hmmm, the latest Slackware is v14.2 (which apparently is what unRAID is based on), but that release is 9 months old and has kernel 4.4.14.  I guess Lime-Tech is doing some significant upgrades under the hood.

 

If anyone would like to recommend a distro running 4.9.10 for me to test, I'm listening.

 

-Paul

I'm on Ubuntu, so you can install Ubuntu 16.04, for example, and then update it to kernel 4.9. Instructions here: http://ubuntuhandbook.org/index.php/2016/12/install-linux-kernel-4-9-ubuntu-linux-mint/

I have done this in my test environment many times to compare hardware compatibility with unRAID.

 

 

Link to comment
1 hour ago, uldise said:

I'm on Ubuntu, so you can install Ubuntu 16.04, for example, and then update it to kernel 4.9. Instructions here: http://ubuntuhandbook.org/index.php/2016/12/install-linux-kernel-4-9-ubuntu-linux-mint/

I have done this in my test environment many times to compare hardware compatibility with unRAID.

 

 

 

That's extremely helpful, thanks!

 

Using that article, I found the 4.9.10 mainline build, so I can get to the exact same kernel version.

 

-Paul

 

Link to comment
2 hours ago, Pauven said:

 

5.01 is the Memtest86 that comes with the unRAID distro.  I don't think it would make much difference running the Windows flavor.

 

 

Okay, so MemTest86 is "The original industry standard memory diagnostic utility", located here: http://www.memtest86.com/ -- latest version is V7.3, released 27 Feb 2017.  This is the version I used and is not Windows based.  It boots off a thumb drive, I believe in non-UEFI mode.  I unplugged everything else and let the BIOS decide what's best.

 

Memtest86+ is "Based on the well-known original memtest86", located here: http://www.memtest.org/ -- latest version is 5.01, released 27 Sep 2013, and appears to be the edition included with unRAID.

 

I think both are similar in terms of the base code.  It seems that the 7.3 version should be more advanced, but I don't know if it will uncover any issues or point to any culprits.  It runs a stock 4 passes unless you configure it otherwise.  It would be nicer if it would run "forever" and just keep track of how many passes it has completed.

 

There's also an option to save an HTML file with your test results, so mine are attached.  The first two sessions (2x4 = 8 passes) are complete.  For the last test I removed #13 the RowHammer test since I'm less interested in that as a test parameter.  RowHammer also takes forever.  This time I set it for 25 passes, and just ended it partway through the 12th pass since I felt the results were satisfactory.  I also wanted to spend some time measuring power consumption today.

 

- Bill

MemTest86-Report-20170303-234108.html

MemTest86-Report-20170317-202218.html

MemTest86-Report-20170318-161210.html

Link to comment
3 hours ago, Pauven said:
13 hours ago, ufopinball said:

 

Sooooo, it seems like you are primarily seeing errors in MemTest and unRAID.

 

Not to throw more fuel on the fire, but I noticed your screen shot showed "MemTest86 5.01"?  My screen is showing "PassMark MemTest86 V7.3".

 

I don't remember all the details of the various MemTest versions.  It might be worth looking both up to see which is the latest, and whether the PassMark version is better suited to your configuration, especially if Windows 10 and openSUSE are both happy on this RAM.

 

5.01 is the Memtest86 that comes with the unRAID distro.  I don't think it would make much difference running the Windows flavor.

 

Just to clarify, both Memtests are based on the same code, and neither is a Linux or Windows program; they are probably pure assembly (maybe with something added for the PassMark GUI).  If either one detects errors, you can trust there are problems.  The PassMark one is more advanced, supports the latest memory tech, and is probably the one you should use.  For more detail on both, including their history and differences, see my (unfinished) FAQ entry:  How do I test my RAM?

 

The general rule is that if you can see any errors at all, there's a bad RAM stick, don't use it.  But I think it's possible to see RAM errors if it's configured wrong, wrong timings or wrong voltages.  Not an expert opinion though ...

 

Edit:  Hah!  I see ufopinball just beat me!

Edited by RobJ
add note
Link to comment
2 hours ago, Pauven said:

Hmmm, the latest Slackware is v14.2 (which apparently is what unRAID is based on), but that release is 9 months old and has kernel 4.4.14.  I guess Lime-Tech is doing some significant upgrades under the hood.

 

If anyone would like to recommend a distro running 4.9.10 for me to test, I'm listening.

 

-Paul

 

My guess is the unRAID team compiles Slackware from the sources?  That would certainly give them more flexibility when it comes to kernels and components/packages, etc.  Also I think they have to so they can integrate the unRAID code as well.

 

When I was doing IOMMU testing, I ran Ubuntu Desktop version 17.04 downloaded from here:  http://cdimage.ubuntu.com/daily-live/current/

 

I installed the above then followed similar steps to those shown here:  https://www.linuxbabe.com/ubuntu/install-linux-kernel-4-8-ubuntu-16-04-16-10

 

This allowed me to test kernels 4.10, 4.10.1, and 4.11-rc1.  Best to go here first and scroll to the bottom:  http://kernel.ubuntu.com/~kernel-ppa/mainline/

 

Decide which version you want to test, and adjust the parameters for the second link appropriately.
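As a rough sketch of "adjusting the parameters", the snippet below builds the directory URL for one mainline build and checks whether a kernel release string matches the version you meant to test.  The `v<version>/` directory naming is an assumption based on how the mainline listing appears to be laid out, and `mainline_dir_url` and `kernel_matches` are hypothetical helper names, not part of any tool mentioned in this thread.

```python
import platform

# Base listing taken from the post above.
MAINLINE_BASE = "http://kernel.ubuntu.com/~kernel-ppa/mainline/"


def mainline_dir_url(version: str) -> str:
    """Build the directory URL for one mainline build.

    Assumes directories are named 'v<version>/' (for example 'v4.9.10/').
    """
    return f"{MAINLINE_BASE}v{version}/"


def kernel_matches(release: str, target: str) -> bool:
    """True if a release string such as '4.9.10-unRAID' is the target version."""
    base = release.split("-", 1)[0]  # drop any distro-specific suffix
    return base == target


target = "4.9.10"
print(mainline_dir_url(target))
print("running", platform.release(), "matches target:",
      kernel_matches(platform.release(), target))
```

Checking the running kernel after the reboot is a cheap way to confirm the test really ran on the intended version.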

 

Note:  It turns out my IOMMU issue was in the BIOS because ASUS put the setting on a different screen than what's in the manual.

 

- Bill

 

Link to comment

Thanks gentlemen for all the great info.

 

I installed Ubuntu 16.10, and then updated to 4.9.10.  It is running now.

 

My goal, I guess, is to see if there's something incompatible with kernel 4.9.10 (in which case Ubuntu will crash/reboot), or if there is something incompatible with Lime-Tech's unRAID compilation. Based upon my success with 4.10.2 (on openSUSE), I really don't expect Ubuntu 4.9.10 to cause problems.

 

I still don't think it is a Ryzen issue, as several other users seem to be running unRAID on Ryzen with no issues.  I think it might be something about my specific motherboard or BIOS.

 

Planning for the worst, how do I go about collecting the data Lime-Tech will need to troubleshoot?  I see the Diagnostics, but I'm thinking they may also need the log files from when the system goes belly-up.  How do I actually save those?  Seems like the log files go to RAM and are lost on a reboot.

 

-Paul

Edited by Pauven
Link to comment
59 minutes ago, RobJ said:

The general rule is that if you can see any errors at all, there's a bad RAM stick, don't use it.  But I think it's possible to see RAM errors if it's configured wrong, wrong timings or wrong voltages.  Not an expert opinion though ...

 

Agree, memtest errors always mean a hardware issue, usually a bad stick, but definitely a problem, even if crashing has been limited to unRAID so far.

Link to comment
On 3/18/2017 at 11:10 AM, lionceau said:

Digital Foundry reports 39W idle in their Ryzen review.

Specs mentioned: R7 1800X, MSI X370 "Gaming Titanium", 32GB RAM, Titan X Pascal, Windows 10

 

Did some power usage testing on my own rig.

 

Specs:

  • ASUS Prime X370-PRO MB (Bios 0504)
  • AMD Ryzen 7 1800X 8-Core 3.6GHz
  • Crucial CT16G4DFD8213 DDR4 2133 (4 x 16GB = 64GB)
  • Seasonic SS-660XP2 660W 80 PLUS PLATINUM
  • Asus Radeon 6450 1GB (Desktop) Graphics Card
  • 4 x 4TB Hitachi/HGST Deskstar 7K4000
  • Crucial C300 128GB SSD (Cache)
  • SYBA USB 3.0 PCI-e x1 2.0 Card
  • StarTech USB Audio Adapter
  • unRAID 6.3.2  

Standby: 1 watt

 

No drives, only CPU/RAM/GPU:

  • BIOS Setup: 70 watts
  • unRAID @ root login (idle): 44 watts

 4 x 4TB drives (spun up) + SSD + SYBA USB card + USB sound

  • BIOS Setup: 106 watts
  • unRAID @ root login (idle): 78 watts
  • unRAID w/ Windows 10 VM (idle): 86 watts
  • unRAID w/ Windows 10 VM (idle), HGST drives spun down: 68 watts

Note: HGST drives are connected to motherboard SATA ports
 
4 x 4TB drives (spun up) + SSD + SYBA USB card + USB sound + SAS2LP controller + Seagate 2TB external + 32GB thumb drive

  • BIOS Setup: 118 watts
  • unRAID @ root login (idle): 94 watts
  • unRAID w/ Windows 10 VM (idle): 98 watts
  • unRAID w/ Windows 10 VM (idle), HGST drives spun down: 68 watts

Note 1: HGST drives are connected to SAS2LP controller ports
Note 2: Seagate 2TB external & 32GB thumb drives are connected to the SYBA USB 3.0 card and are being passed through to the Windows 10 VM

 

Power consumption was measured using Kill-a-Watt, and tends to fluctuate +/- 5 watts, depending on what the system is doing in the background.  Your results may vary, but the above rough results seem reasonable.  Ultimately, this will become the new "Cortex" unRAID server and will need to support 16 SATA devices ... that's why the SuperMicro AOC-SAS2LP-MV8 controller is included.  I will take more power measurements once I make the full switchover, but due to other commitments, that won't happen for at least a week.
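For anyone who wants to play with these numbers, here is a small sketch that turns the idle readings into deltas and a yearly energy figure.  The wattages are copied from the second configuration above (drives on the motherboard SATA ports); the dictionary keys and the `annual_kwh` helper are just illustrative names, and the 24 hours per day default is an assumption about how the server would be used.

```python
# Idle readings in watts, copied from the measurements above.
readings = {
    "bare_idle": 44,            # CPU/RAM/GPU only, unRAID at root login
    "drives_idle": 78,          # + 4 x 4TB drives, SSD, USB card, USB sound
    "drives_vm_idle": 86,       # same hardware with the Windows 10 VM running
    "drives_vm_spun_down": 68,  # same, with the HGST drives spun down
}


def delta(a: str, b: str) -> int:
    """Difference in watts between two named readings."""
    return readings[a] - readings[b]


def annual_kwh(watts: float, hours_per_day: float = 24.0) -> float:
    """Energy used per year, in kWh, at a constant draw."""
    return watts * hours_per_day * 365 / 1000


hardware_cost = delta("drives_idle", "bare_idle")           # drives + add-ons
vm_cost = delta("drives_vm_idle", "drives_idle")            # idle Windows 10 VM
spin_cost = delta("drives_vm_idle", "drives_vm_spun_down")  # 4 drives spinning

print(f"drives and add-ons: {hardware_cost} W")
print(f"idle VM:            {vm_cost} W")
print(f"spinning drives:    {spin_cost} W total, {spin_cost / 4} W each")
print(f"78 W around the clock: {annual_kwh(78):.1f} kWh per year")
```

Keep in mind the +/- 5 watt fluctuation noted above: a delta as small as the idle VM's is close to the noise of the meter.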

 

For the moment, I'll run a parity check against the SAS2LP and see if there are still problems.  See this thread for details.  In the end, I'll likely stick with the Dell HV52W PERC H310 8-Port controller, since when I last measured it used slightly less power than the SAS2LP.  Mainly this test is out of curiosity about whether a newer motherboard makes a difference with the SAS2LP issue.

 

- Bill

 

Link to comment

Another update on stability.

 

I tested my hardware on Ubuntu 16.10, upgraded to kernel version 4.9.10, and it ran for over 44 hours without issue.  I finally decided this is not a kernel issue and rebooted back into unRAID.

 

In my previous unRAID testing, I had been doing it without a registration key and without creating an array.  Basically, it was just there sitting on the GUI with no array and no work to do.  I noticed some messages in the log file every few minutes, seemingly related to not having a registration key or a super.dat file, and thought perhaps this "unrest" was somehow the source of the problem.

 

So I got a trial key installed, and created a single drive array (no parity), which I started up fine.

 

I also tailed the output of the syslog file to a file on my flash drive, and also kept the logging window open on the GUI, hoping to catch any errors generated right before an unRAID crash. (Side Note:  I tried to configure a /etc/syslog.conf file to write directly to the flash drive, but this isn't working, perhaps a 6.x issue?)

 

Within 2 hours unRAID crashed again and rebooted.  There were no new lines written to the log file from the hour preceding the crash, so if there were errors generated, they didn't get written to the log file before the server rebooted.
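One possible reason nothing new shows up is that the copy on the flash drive is still sitting in a write buffer when the machine resets.  The sketch below follows a log file the way `tail -f` does, but flushes and fsyncs after every line.  It assumes a Python interpreter is available on the server (stock unRAID may not ship one), and the paths in the usage comment are assumptions for a typical unRAID layout.  `follow_to_file` is a hypothetical helper, not an unRAID feature.

```python
import os
import time


def follow_to_file(src_path, dst_path, poll_seconds=1.0,
                   from_start=False, max_idle_polls=None):
    """Copy lines appended to src_path into dst_path, forcing each to disk.

    max_idle_polls=None means run until interrupted.
    """
    idle_polls = 0
    with open(src_path, "r", errors="replace") as src, \
            open(dst_path, "a") as dst:
        if not from_start:
            src.seek(0, os.SEEK_END)  # start at the current end of the log
        while max_idle_polls is None or idle_polls < max_idle_polls:
            line = src.readline()
            if line:
                dst.write(line)
                dst.flush()             # push Python's buffer to the OS
                os.fsync(dst.fileno())  # ask the OS to write it to the device
            else:
                idle_polls += 1
                time.sleep(poll_seconds)


# Example call (paths are assumptions, adjust to your system):
# follow_to_file("/var/log/syslog", "/boot/syslog-live.txt")
```

Even with this, a hard hang can stop the kernel from logging anything at all, so an empty tail does not prove that no error occurred.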

 

So at this point, I feel it is safe to say the problem is unRAID on my hardware.  I can't tell if it is processor or motherboard related, could be either, but since I am the only one having these problems with unRAID, and also the only unRAID user on this motherboard, most likely it is an unRAID issue with my ASRock X370 Fatal1ty Gaming Professional motherboard, and not an issue with the AMD Ryzen processor itself.

 

I'm at a complete loss on where to go from here.

 

Here are the clues:

  • The server only crashes/hangs/reboots in unRAID.  Completely stable in Win10 and other Linux distros.
  • The server crashes in unRAID Safe Mode or in regular plugin mode.
  • The server typically crashes/hangs/reboots within 1-2 hours, but this varies and on one occasion it made it about 13 hours.
  • On one occasion I could see a "Hangcheck: hangcheck value past margin!" error message on the console, and the server was hung.
  • On multiple occasions after a crash/reboot, there were messages that a Machine Check Exception (MCE) had occurred.  I was not able to view the details in unRAID, but when I configured my server to reboot into Windows after a crash, I was able to see the MCE details in Windows, complaining of a Cache Hierarchy Error.
  • Diagnostics are attached.

-Paul

Ryzen-tower-diagnostics-20170321-1117.zip

Link to comment

Dunno if this has affected anyone here, but I guess a big bug is getting fixed in Ryzen via a BIOS upgrade:

 

AMD found the root problem causing its new Ryzen processors to freeze desktops

 

Read more: http://www.digitaltrends.com/computing/ryzen-amd-bios-fix-fma3-crash/

 

Last BIOS release for the ASUS Prime X370-PRO was nearly a month ago.  I hope the new release brings more fixes/features than just this ... whenever it's ready.

 

- Bill

 

Link to comment

I've pulled the trigger on a 1700X + Crosshair VI.

Will do some testing. I know straight away that this board does not support ECC, as stated earlier.

For anyone out there using an EKWB EVO CPU block bought this year: mine came with the AM4 fittings.

"Why the Hero model, Akio?"  Well, I have the ASUS Maximus IX Z270 and both have the water cooling stuff I use. Since there's no news on the Kaby Lake refresh with 6 or more cores, I'm going back to team red!

 

My ideal unRAID rig would have been an X99 with plenty of PCIe gen3 slots, loads of lanes and an 8/10-core CPU, but I can't touch the prices of that.

 

Link to comment
2 hours ago, Akio said:

I've pulled the trigger on a 1700X + Crosshair VI.

Will do some testing. I know straight away that this board does not support ECC, as stated earlier.

For anyone out there using an EKWB EVO CPU block bought this year: mine came with the AM4 fittings.

"Why the Hero model, Akio?"  Well, I have the ASUS Maximus IX Z270 and both have the water cooling stuff I use. Since there's no news on the Kaby Lake refresh with 6 or more cores, I'm going back to team red!

 

My ideal unRAID rig would have been an X99 with plenty of PCIe gen3 slots, loads of lanes and an 8/10-core CPU, but I can't touch the prices of that.

 

 

I just downgraded my x99 64gb 2600v3 10 core because I hardly ever used the full capacity of it. I'm now waiting to build my 1800x and asus prime with 32gb gskill. TBH it's not that much of a downgrade since I'll be saving some watts by not using the Xeon...

Link to comment

There just aren't the PCIe lanes for all my needs, I think?  Can someone confirm this before my eyes glaze over?

Only 3 PCIe slots for my 4 cards. I like having the 10Gb card just for a direct connection to the backup server, but I could make the sacrifice if it meant I could run a Windows VM and have this Ryzen build as my main PC/NAS unRAID beast, water cooled too, in the Lian Li D8000.

gt710 - (OS vga)

gtx 1070 G1

lsi 9201 16i - gen2 x8

intel x540-t1  gen2 x8

m.2 comes from chipset? taking out 2 sata?

 

Any advice would be gratefully received, guys.  I lost my job before Xmas, so making a full switch to unRAID for Windows VM + NAS functionality would be a good way to free up the other hardware and sell it.

Link to comment

I think you are right on the threshold of maxing out the PCIe lanes, but not quite over.

 

On my motherboard, I wanted to understand the impact of running the GPU in the third PCIe x16 slot, which is wired only for PCIe 2.0 x4 and routed through the X370 chipset.

 

I ran the latest 3DMark Time Spy to test my GTX 670 in both the primary PCIe 3.0 x16 slot, and again in the PCIe 2.0 x4 slot.

 

Surprisingly, the score was almost identical, only about 3% lower in the 2.0 x4 slot (2038 in 3.0 x16 vs. 1974 in 2.0 x4).  Even more surprising, when I looked in AIDA64, it reported that the GPU in that 2.0 x4 slot was actually connecting at 1.1 x4.  I'm not sure if that info was correct, and if so why it was gen 1.1 and not gen 2.0 (probably because the bandwidth was being shared with other devices), but it really opened my eyes to how little bandwidth GPUs need on the PCIe bus.  I felt a 3% performance reduction was a small penalty to pay to keep my main PCIe slot at full speed for my 24-drive storage controller.
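For reference, the arithmetic behind that percentage, using the two Time Spy scores quoted above (`percent_drop` is just an illustrative helper):

```python
def percent_drop(baseline: float, other: float) -> float:
    """How much lower `other` is than `baseline`, as a percentage."""
    return (baseline - other) / baseline * 100


drop = percent_drop(2038, 1974)  # 3.0 x16 score vs. 2.0 x4 score
print(f"{drop:.1f}% lower")      # prints "3.1% lower"
```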

 

Hypothetically, you could put your LSI 9201 and Intel x540-t1 in the primary and secondary PCIe 3.0 slots, and they will both have full speed 3.0 x8, more than each device needs.  The two GPUs could then be plugged into the remaining slots, and they would go through the chipset.  I know the GTX 1070 is quite a bit faster than my GTX 670, and it might be more sensitive to the reduced bandwidth, but hopefully you would see acceptable results similar to mine.

 

That just leaves the M.2.  Ryzen has 24 lanes of PCIe 3.0:  16 lanes for the primary and secondary PCIe x16 slots, 4 lanes for the X370 chipset, and the last 4 lanes for storage.  It's up to the motherboard maker to determine what storage those go to.  They could be dedicated to the M.2 slot (so full PCIe 3.0 x4), which is most likely, or they could be used for the many SATA ports, in which case the M.2 could be going through the X370 and sharing the bandwidth.  My motherboard actually has two M.2 slots: one CPU direct connect at PCIe 3.0 x4, and one routed through the X370 chipset, rated at PCIe 2.0 x4 but also sharing its bandwidth with other devices.
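The lane budget described above can be written down as a quick sanity check.  The lane counts come from this post and the card widths from the list Akio gave; the dictionary layout and names are purely illustrative.

```python
# CPU-provided PCIe 3.0 lanes on Ryzen, as described above.
cpu_lanes = {
    "primary + secondary x16 slots": 16,
    "link to X370 chipset": 4,
    "storage (M.2 or SATA)": 4,
}
total_lanes = sum(cpu_lanes.values())

# Cards proposed for the two CPU-attached slots (x8 each when both are used).
cards_on_cpu_slots = {
    "LSI 9201-16i": 8,
    "Intel X540-T1": 8,
}
lanes_needed = sum(cards_on_cpu_slots.values())
fits = lanes_needed <= cpu_lanes["primary + secondary x16 slots"]

print(f"total CPU lanes: {total_lanes}")
print(f"CPU slot lanes needed: {lanes_needed}, fits: {fits}")
```

Anything beyond those two cards, such as the two GPUs, shares the 4-lane chipset link.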

 

The last thing you need to consider is whether or not your motherboard will shut off certain slots/ports when other slots/ports are in use.  For example, on my motherboard I have to choose between using the second M.2 port @ 2.0 x4 or the third PCIe x16 slot @ 2.0 x4; it won't let you use both and disables the one not in use.  Every motherboard is different, but it is easy to download the manual from the website and see what it says.

 

TL;DR - You're probably okay to run the GPUs in the slower slots.

 

-Paul

 

Link to comment

Thanks Paul, I will have a look at the online manual.

The Hero only has 3 full-size PCIe slots.  If it were possible to use a PCIe x1 card for the OS, that would free up a main full-size PCIe slot for a high-end card.

 

From my understanding it isn't possible, but it was a long-shot idea.

Link to comment
5 minutes ago, Akio said:

Thanks Paul, I will have a look at the online manual.

The Hero only has 3 full-size PCIe slots.  If it were possible to use a PCIe x1 card for the OS, that would free up a main full-size PCIe slot for a high-end card.

 

From my understanding it isn't possible, but it was a long-shot idea.

 

I just looked at the specs on their website.  You would have:

  • 2 x PCIe 3.0 x16  sized slots (running as 2 x PCIe 3.0 x8)

plus these following 4 slots that share bandwidth with each other:

  • 1 x PCIe 2.0 x16 sized slot (running as 2.0 x4 unless something is plugged into the PCIe 2.0 x1 slots, in which case it probably runs as 1.1 x4).
  • 3 x PCIe 2.0 x1 sized slots

That means if you use the PCIe x1 GT710 as the OS video card in any one of the three PCIe 2.0 x1 slots, the PCIe 2.0 x4 slot will downgrade itself to share its bandwidth, and will probably run at PCIe 1.1 x4 (half speed), but worst case would run at PCIe 2.0 x1 (quarter speed, but only if you used all four slots).
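To put numbers on "half speed" and "quarter speed", here is a small sketch using the commonly quoted per-lane rates after encoding overhead (about 250 MB/s for PCIe 1.1, 500 MB/s for 2.0 and roughly 985 MB/s for 3.0).  These are theoretical link rates per direction, not measured throughput, and `link_bandwidth` is an illustrative helper.

```python
# Approximate usable bandwidth per lane, per direction, in MB/s.
PER_LANE_MB_S = {
    "1.1": 250.0,  # 2.5 GT/s with 8b/10b encoding
    "2.0": 500.0,  # 5 GT/s with 8b/10b encoding
    "3.0": 984.6,  # 8 GT/s with 128b/130b encoding
}


def link_bandwidth(gen: str, lanes: int) -> float:
    """Theoretical one-way bandwidth of a PCIe link in MB/s."""
    return PER_LANE_MB_S[gen] * lanes


full = link_bandwidth("2.0", 4)     # the slot with nothing sharing it
shared = link_bandwidth("1.1", 4)   # likely case with an x1 card installed
worst = link_bandwidth("2.0", 1)    # worst case mentioned above

print(f"2.0 x4: {full:.0f} MB/s")
print(f"1.1 x4: {shared:.0f} MB/s ({shared / full:.0%} of 2.0 x4)")
print(f"2.0 x1: {worst:.0f} MB/s ({worst / full:.0%} of 2.0 x4)")
print(f"3.0 x16: {link_bandwidth('3.0', 16):.0f} MB/s")
```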

 

Like I said, this is the exact same behavior I found on my motherboard, and when I ran my GTX 670 at PCIe 1.1 x4, it gave me 97% of full performance in the PCIe 3.0 x16 slot.  I couldn't believe how well it worked.

Link to comment

Yes, GPU PCIe bandwidth has surprisingly little effect on gaming performance in my experience.

Many OSD programs show PCIe bus saturation, and the most I see is 5-7% on loading screens where the VRAM is being filled with textures from RAM or SSD. Typical bus utilisation during gaming is around 1-2%, even with remote streaming.

Of course this could be wildly different with other workloads running CUDA.

 

The GPU PCIe bandwidth topic often comes up with people running external GPUs over Thunderbolt. What they don't realise is that it's the significant latency from protocol encapsulation/translation that robs performance, not the raw throughput.

 

Edited by lionceau
Link to comment
On 3/21/2017 at 2:26 PM, Pauven said:

Another update on stability.

 

I tested my hardware on Ubuntu 16.10, upgraded to kernel version 4.9.10, and it ran for over 44 hours without issue.  I finally decided this is not a kernel issue and rebooted back into unRAID.

 

In my previous unRAID testing, I had been doing it without a registration key and without creating an array.  Basically, it was just there sitting on the GUI with no array and no work to do.  I noticed some messages in the log file every few minutes, seemingly related to not having a registration key or a super.dat file, and thought perhaps this "unrest" was somehow the source of the problem.

 

So I got a trial key installed, and created a single drive array (no parity), which I started up fine.

 

I also tailed the output of the syslog file to a file on my flash drive, and also kept the logging window open on the GUI, hoping to catch any errors generated right before an unRAID crash. (Side Note:  I tried to configure a /etc/syslog.conf file to write directly to the flash drive, but this isn't working, perhaps a 6.x issue?)

 

Within 2 hours unRAID crashed again and rebooted.  There were no new lines written to the log file from the hour preceding the crash, so if there were errors generated, they didn't get written to the log file before the server rebooted.

 

So at this point, I feel it is safe to say the problem is unRAID on my hardware.  I can't tell if it is processor or motherboard related, could be either, but since I am the only one having these problems with unRAID, and also the only unRAID user on this motherboard, most likely it is an unRAID issue with my ASRock X370 Fatal1ty Gaming Professional motherboard, and not an issue with the AMD Ryzen processor itself.

 

I'm at a complete loss on where to go from here.

 

Here are the clues:

  • The server only crashes/hangs/reboots in unRAID.  Completely stable in Win10 and other Linux distros.
  • The server crashes in unRAID Safe Mode or in regular plugin mode.
  • The server typically crashes/hangs/reboots within 1-2 hours, but this varies and on one occasion it made it about 13 hours.
  • On one occasion I could see a "Hangcheck: hangcheck value past margin!" error message on the console, and the server was hung.
  • On multiple occasions after a crash/reboot, there were messages that a Machine Check Exception (MCE) had occurred.  I was not able to view the details in unRAID, but when I configured my server to reboot into Windows after a crash, I was able to see the MCE details in Windows, complaining of a Cache Hierarchy Error.
  • Diagnostics are attached.

-Paul

Ryzen-tower-diagnostics-20170321-1117.zip

 

Hi Paul,

 

Earlier you asked if anyone else is having freezes.  I have a Ryzen 1700 and an Asus x370-pro.  Randomly everything comes to a halt and becomes completely unresponsive, requiring a hard reboot.  I am also not getting a proper log report.  I check the logs and the only data I have is from the current boot.

 

I recently updated the BIOS to see if that would help, and the same problem persists.  I was on 0504 and am now on 0511.  Update: I noticed the Digital Trends article, and the latest BIOS from ASUS didn't seem to resolve my freezing issues.  0511 is the latest as of 2017/03/23.

 

From a hardware standpoint I have tried to strip back all components, removing all cache drives etc., trying to determine if there was a specific component that was causing issues.  I have not gone down your path with a clean install with a stock config and single drive.  But running Win 10 and a Linux distro on bare metal was next, so you have saved me some time.  I am at the point where I think there is an incompatibility issue somewhere with unRAID specifically.

 

Model: Custom

M/B: ASUSTeK COMPUTER INC. - PRIME X370-PRO

CPU: AMD Ryzen 7 1700 Eight-Core @ 3000

HVM: Enabled

IOMMU: Enabled

Cache: 768 kB, 4096 kB, 16384 kB

Memory: 16 GB (max. installable capacity 16 GB)

Network: bond0: fault-tolerance (active-backup), mtu 1500 
 eth0: 1000 Mb/s, full duplex, mtu 1500

Kernel: Linux 4.9.10-unRAID x86_64

 

I submitted my logs to Lime-Tech feedback this morning.

 

Chad

 

Edited by chadjj
Link to comment
On 3/21/2017 at 9:21 PM, ufopinball said:

Dunno if this has affected anyone here, but I guess a big bug is getting fixed in Ryzen via a BIOS upgrade:

 

AMD found the root problem causing its new Ryzen processors to freeze desktops

 

Read more: http://www.digitaltrends.com/computing/ryzen-amd-bios-fix-fma3-crash/

 

Last BIOS release for the ASUS Prime X370-PRO was nearly a month ago.  I hope the new release brings more fixes/features than just this ... whenever it's ready.

 

- Bill

 

 

Hi Bill,

 

I am on a recent release (0511) that postdates this article, and it is still locking up.

 

Chad

Link to comment
1 hour ago, chadjj said:

I am on a recent release (0511) that postdates this article, and it is still locking up.

 

I would be surprised if any motherboard maker has released a new BIOS that addresses the lock-up issue; it's only been a few days since AMD announced the issue and forthcoming fix.

 

That said, I don't know if this issue is in any way related to our lock-up issue.  The bug is in the FMA3 module, which performs Fused Multiply-Add operations, something that is typically only seen in certain benchmarks and scientific type applications.  I could be wrong, but I don't see how an idle unRAID server would be doing FMA3 operations.
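For anyone wondering what a fused multiply-add actually is: it computes a*b + c with a single rounding at the end, instead of rounding the product first and then the sum.  The sketch below does not execute the hardware FMA3 instruction; it only imitates the single rounding with exact rational arithmetic from Python's standard library, using values picked so the difference is visible.

```python
from fractions import Fraction

# Values chosen so that a*b is just below 1.0 by less than double precision
# can represent.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# Separate multiply and add: a*b is rounded to 1.0 before c is added.
unfused = a * b + c

# Fused behaviour imitated with exact arithmetic: compute a*b + c exactly,
# then round once when converting back to a float.
fused = float(Fraction(a) * Fraction(b) + Fraction(c))

print("separate multiply then add:", unfused)
print("single rounding (FMA-like):", fused)
```

The two results differ, which is why numeric code that relies on FMA tends to show up in benchmarks and scientific workloads rather than in an idle file server.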

 

1 hour ago, chadjj said:

Hi Paul,

 

Earlier you asked if anyone else is having freezes.  I have a Ryzen 1700 and an Asus x370-pro.  Randomly everything comes to a halt and becomes completely unresponsive, requiring a hard reboot.  I am also not getting a proper log report.  I check the logs and the only data I have is from the current boot.

...

Chad

 

Misery loves company, thanks for joining... :/

 

Sorry to hear you are experiencing issues as well.  Have you heard back from Lime-Tech?  I never did, but maybe I communicated incorrectly somehow.

 

Just to be sure it's not a hardware issue, have you run Memtest86 and also booted into Windows or other Linux distros?  From your post, it reads as if you skipped those steps since I had done them on my hardware.  If you haven't done these tests, I highly encourage them.  

 

I took a look at your motherboard specs to see what is common between your motherboard and mine, since I'm still thinking this is not a CPU issue with several people here successfully running unRAID on Ryzen.

 

  • One thing unique about my MB is the Aquantia AQC108 5Gb LAN chip, and your motherboard doesn't have one.  There are also no drivers in 4.9.10 to support it (comes in 4.11), so technically it is going unused by unRAID.  I'm now thinking it is unlikely to be part of the problem.
  • Both motherboards have the Intel i211-AT gigabit LAN chip.  I would expect an Intel LAN chip would be problem free, but who knows...
  • I also noticed that both of our motherboards are running with a bonded network configuration:  "Network: bond0: fault-tolerance (active-backup), mtu 1500".  I've become increasingly suspicious of this, as I've never seen a bonded configuration before (just personal experience, doesn't mean too much).  In addition to the dual ethernet ports, my motherboard also has Wi-Fi and Bluetooth.  I haven't been able to turn off the Wi-Fi or BT, and I was thinking that one of those was bonding with the ethernet port, something I don't want.  Doubtful that would cause a crash, but again who knows...
  • Your motherboard doesn't appear to have Wi-Fi or Bluetooth.  In fact, you only seem to have a single network port, so I'm not sure how you're running a bonded configuration.  Seems odd.
  • Both motherboards have the Realtek ALC S1220A audio codec chip.  This chip doesn't have driver support in 4.9.10 (comes in 4.11), so again I'm thinking it is just sitting there unused by unRAID and not part of the problem, but who knows...
  • Both motherboards have USB 3.1 support, but I believe this is part of the X370 chipset so it shouldn't be anything special.  I tried disabling them in my BIOS, but flipping the switch did not actually turn them off.
  • Your motherboard has 8 SATA3 ports, which I believe are all from the Ryzen CPU and the X370 chipset - the combo maxes out at 8.  My motherboard has 10, with the extra 2 coming from an ASMedia ASM1061 chip.  Since that is not a shared feature of our motherboards, I don't think this is the problem.
  • Both motherboards have dual M.2 ports.  This is a fairly unusual feature, common on several ASRock boards, but less common elsewhere.  I'm not certain if anyone else here has dual M.2 ports.  I am using one, the primary port, with a Samsung 960 in it.  I installed Win10 on it, and dual boot into it.  In unRAID, the drive is visible but not usable without formatting.  I guess I could remove it to see if it makes a difference, as that is about the only hardware change I haven't made, but I'm doubtful it would help.  I also don't see how having dual M.2 ports would be a problem, especially if the 2nd port is empty, but who knows...
  • One thing I could not determine is whether or not your motherboard has the external bus clock generator (BCLK).  My motherboard has one (called the ASRock Hyper BCLK Engine II) and an external bus generator is required to go over 100MHz on the bus.  Typically only the highest end motherboards have one, and it is for overclocking.  I've wondered if it is related to the problem, since the Hangcheck error can supposedly be triggered by bad bus timings.  Best I can tell, your motherboard doesn't have one.

 

I can't really see anything else that would be relevant.

 

I need to go back through this long thread and reread what Ryzen motherboards are working okay with unRAID.  I'm pretty sure there have been other X370 chipset motherboards, but can't remember for certain.

 

-Paul

Link to comment
4 hours ago, chadjj said:

 

Hi Bill,

 

I am on a more recent release than the one in this article, 0511, and it is still locking up.

 

Chad

 

I was away for a few days, so I left my new Ryzen build running with unRAID loaded.  I also had my Windows 10 VM running, but otherwise both unRAID and Windows had no specific tasks assigned.  So on the older 0504 BIOS, I recorded 5 days, 21 hours and 35 minutes of uptime.
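For anyone else tracking how long a box stays up between lock-ups, here is a minimal sketch (my own illustration, not anything built into unRAID) that turns the seconds counter from /proc/uptime into a days/hours/minutes reading like the one above.  The function names are made up for this example.

```python
def format_uptime(seconds):
    """Format a number of seconds as 'D days, H hours, M minutes'."""
    minutes = int(seconds) // 60
    days, minutes = divmod(minutes, 24 * 60)
    hours, minutes = divmod(minutes, 60)
    return f"{days} days, {hours} hours, {minutes} minutes"


def read_uptime_seconds(path="/proc/uptime"):
    """Read the first field of /proc/uptime (seconds since boot) on Linux."""
    with open(path) as fh:
        return float(fh.read().split()[0])


if __name__ == "__main__":
    try:
        print(format_uptime(read_uptime_seconds()))
    except OSError:
        # /proc/uptime only exists on Linux; skip quietly elsewhere.
        print("uptime not available on this system")
```

Note that this only helps while the system is still responding; after a hard lock-up the counter is lost, so the value would have to be logged periodically to be useful.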

 

From the website:

 

PRIME X370-PRO BIOS 0511
1. Improve system performance.
2. Make CPU temperature more precise.

 

I just took the system down and upgraded to 0511.  I have no idea how to measure system performance, but my CPU temperature reading dropped from 60°C down to as low as 56°C.  Color me unimpressed.  Dunno if you had a more significant delta for the temperature reading.

 

Anyway, today was the day I was going to move the new CPU/MB into my main server.  I kicked off one last MemTest just to be sure, and then I'll try and do the switch-over this afternoon.

 

- Bill

 

Edited by ufopinball
Link to comment
5 hours ago, Pauven said:

 

I would be surprised if any motherboard maker has released a new BIOS that addresses the lock-up issue; it's only been a few days since AMD announced the issue and forthcoming fix.

 

That said, I don't know if this issue is in any way related to our lock-up issue.  The bug is in the FMA3 module, for performing Fused Multiply-Add operations, something that is typically only seen in certain benchmarks and scientific type applications.  I could be wrong, but I don't see how an idle unRAID server would be doing FMA3 operations.
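As a side note, a quick way to see whether a CPU even advertises FMA3 under Linux is to look for the "fma" flag in /proc/cpuinfo.  The sketch below is only an illustration with made-up function names; it shows that the instructions are available, not whether anything on an idle server actually executes them.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supports_fma3(cpuinfo_text):
    """True if the 'fma' flag (FMA3) is listed; 'fma4' is a separate flag."""
    return "fma" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            text = fh.read()
    except OSError:
        text = ""  # not Linux, or /proc is unavailable
    print("FMA3 advertised:", supports_fma3(text))
```

Comparing whole flag names rather than searching for a substring keeps "fma4" from being mistaken for "fma".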

 

 

Misery loves company, thanks for joining... :/

 

Sorry to hear you are experiencing issues as well.  Have you heard back from Lime-Tech?  I never did, but maybe I communicated incorrectly somehow.

 

Just to be sure it's not a hardware issue, have you run Memtest86 and also booted into Windows or other Linux distros?  From your post, it reads as if you skipped those steps since I had done them on my hardware.  If you haven't done these tests, I highly encourage them.  

 

I took a look at your motherboard specs to see what is common between your motherboard and mine, since I'm still thinking this is not a CPU issue with several people here successfully running unRAID on Ryzen.

 

  • One thing unique about my MB is the Aquantia AQC108 5Gb LAN chip, and your motherboard doesn't have one.  There are also no drivers in 4.9.10 to support it (comes in 4.11), so technically it is going unused by unRAID.  I'm now thinking it is unlikely to be part of the problem.
  • Both motherboards have the Intel i211-AT gigabit LAN chip.  I would expect an Intel LAN chip would be problem free, but who knows...
  • I also noticed that both of our motherboards are running with a bonded network configuration:  "Network: bond0: fault-tolerance (active-backup), mtu 1500".  I've become increasingly suspicious of this, as I've never seen a bonded configuration before (just personal experience, doesn't mean too much).  In addition to the dual ethernet ports, my motherboard also has Wi-Fi and Bluetooth.  I haven't been able to turn off the Wi-Fi or BT, and I was thinking that one of those was bonding with the ethernet port, something I don't want.  Doubtful that would cause a crash, but again who knows...
  • Your motherboard doesn't appear to have Wi-Fi or Bluetooth.  In fact, you only seem to have a single network port, so I'm not sure how you're running a bonded configuration.  Seems odd.
  • Both motherboards have the Realtek ALC S1220A audio codec chip.  This chip doesn't have driver support in 4.9.10 (comes in 4.11), so again I'm thinking it is just sitting there unused by unRAID and not part of the problem, but who knows...
  • Both motherboards have USB 3.1 support, but I believe this is part of the X370 chipset so it shouldn't be anything special.  I tried disabling them in my BIOS, but flipping the switch did not actually turn them off.
  • Your motherboard has 8 SATA3 ports, which I believe are all from the Ryzen CPU and the X370 chipset - the combo maxes out at 8.  My motherboard has 10, with the extra 2 coming from an ASMedia ASM1061 chip.  Since that is not a shared feature of our motherboards, I don't think this is the problem.
  • Both motherboards have dual M.2 ports.  This is a fairly unusual feature, common on several ASRock boards, but less common elsewhere.  I'm not certain if anyone else here has dual M.2 ports.  I am using one, the primary port, with a Samsung 960 in it.  I installed Win10 on it, and dual boot into it.  In unRAID, the drive is visible but not usable without formatting.  I guess I could remove it to see if it makes a difference, as that is about the only hardware change I haven't made, but I'm doubtful it would help.  I also don't see how having dual M.2 ports would be a problem, especially if the 2nd port is empty, but who knows...
  • One thing I could not determine is whether or not your motherboard has the external bus clock generator (BCLK).  My motherboard has one (called the ASRock Hyper BCLK Engine II) and an external bus generator is required to go over 100MHz on the bus.  Typically only the highest end motherboards have one, and it is for overclocking.  I've wondered if it is related to the problem, since the Hangcheck error can supposedly be triggered by bad bus timings.  Best I can tell, your motherboard doesn't have one.
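On the bonding question in the list above, the Linux bonding driver reports its members in /proc/net/bonding/bond0, so it should be possible to check which interfaces are actually in the bond.  The sketch below is a rough illustration (the function name and the returned dictionary layout are my own); it assumes the usual "Bonding Mode", "Currently Active Slave" and "Slave Interface" lines are present.

```python
def parse_bond_status(text):
    """Pull mode, active member and member list out of bonding status text."""
    info = {"mode": None, "active": None, "slaves": []}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or no key/value pair
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            info["mode"] = value
        elif key == "Currently Active Slave":
            info["active"] = value
        elif key == "Slave Interface":
            info["slaves"].append(value)
    return info


if __name__ == "__main__":
    try:
        with open("/proc/net/bonding/bond0") as fh:
            print(parse_bond_status(fh.read()))
    except OSError:
        # No bond0 on this machine, or the bonding driver is not loaded.
        print("bond0 status not available")
```

If only one interface shows up in the member list, the bond has a single member, which might explain a bonded configuration on a board with one network port.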

 

I can't really see anything else that would be relevant.

 

I need to go back through this long thread and reread what Ryzen motherboards are working okay with unRAID.  I'm pretty sure there have been other X370 chipset motherboards, but can't remember for certain.

 

-Paul

Hi Paul,

 

Yes, there are a fair number of differences between our motherboards, and a lot in common: chipset, LAN, etc.  I have run Memtest86, and I did hear back from Lime-Tech support: they are having the same issues, thought it may have been a memory problem, and were considering an RMA.  They have G.Skill and I have Corsair, so the assumption was that we had similar RAM with similar issues.  With RAM as a differentiator and my RAM reporting 100% OK, it is unlikely we both have bad RAM.  My kit is brand new for this build as well.

 

This is a great article that compares all AM4 boards by manufacturer.  I settled on ASUS due to the 8 SATA ports without going to a top-of-the-line X370 from ASRock, ASUS, or Gigabyte.

http://wccftech.com/amd-ryzen-am4-motherboard-round-up-msi-gigabyte-asrock-asus-x370/

 

The challenge is that there are multiple variables: the CPU needing an update delivered through the motherboard BIOS, unRAID updating its Linux kernel, and/or the kernel itself being updated with better Ryzen compatibility.

 

In time I plan to test within Windows and Linux; I just need to find the time.  As of right now I have been up for 6 hrs 53 mins.  We will see how long that lasts.

 

Thanks,

Chad

Link to comment
