Rhynri

  1. I sold the original and have a Sage wifi II now. As far as I recall, there are several IOMMU settings in different sections. Just look for anything IOMMU- or virtualization-related. If you have a SAGE II, I can give you my mobo settings.
  2. To Follow Up - the VIDEO_DXGKRNL_FATAL_ERROR was actually because of AquaComputer's Aquasuite - turns out that they have some video and audio services that try to access low-level resources in the system and cause the Graphics subsystem to crash - once these are disabled the problem goes away. I still have the BAR issues and a nasty hard-lock crash detailed here.
  3. Thanks for the reply! To clarify - if I do that with SR-IOV off, one of the GPUs (and probably some other devices, I haven't looked that closely) does not show up in Unraid. If I do it with SR-IOV on, the BAR errors are still present. For the sake of completeness, I've attached new diagnostics with `pci=realloc=off` in the boot string. tower-diagnostics-20230104-1205.zip
  4. Hello! I've been experiencing some wonkiness with my Unraid server, including hard locks that take out the VMs [displays go black], the HTTP interface, and SSH. In the logs I'm seeing many lines of:

     Jan 4 10:49:11 Tower kernel: pci 0000:00:03.1: BAR 15: no space for [mem size 0x00600000 64bit pref]
     Jan 4 10:49:11 Tower kernel: pci 0000:00:03.1: BAR 15: failed to assign [mem size 0x00600000 64bit pref]
     Jan 4 10:49:11 Tower kernel: pci 0000:00:03.1: BAR 13: no space for [io size 0x2000]
     Jan 4 10:49:11 Tower kernel: pci 0000:00:03.1: BAR 13: failed to assign [io size 0x2000]
     Jan 4 10:49:11 Tower kernel: clipped [mem size 0x00000000 64bit pref] to [mem size 0xfffffffffffc0000 64bit pref] for e820 entry [mem 0x000a0000-0x000fffff]
     Jan 4 10:49:11 Tower kernel: clipped [mem size 0x00020000 64bit pref] to [mem size 0xfffffffffffe0000 64bit pref] for e820 entry [mem 0x000a0000-0x000fffff]
     Jan 4 10:49:11 Tower kernel: pci 0000:00:03.1: BAR 15: no space for [mem size 0x00200000 64bit pref]
     Jan 4 10:49:11 Tower kernel: pci 0000:00:03.1: BAR 15: failed to assign [mem size 0x00200000 64bit pref]
     Jan 4 10:49:11 Tower kernel: pci 0000:00:03.1: BAR 13: no space for [io size 0x2000]
     Jan 4 10:49:11 Tower kernel: pci 0000:00:03.1: BAR 13: failed to assign [io size 0x2000]

     It's almost like Unraid is booting in 32-bit mode or something and running out of memory space, although I wouldn't think this is possible. In a previous post here, I detailed some workarounds I tried to get the system to boot, but no combination of `pci=realloc=off`, SR-IOV, and my motherboard's PCIe settings seems to resolve this. Either I get a partial boot (with `pci=realloc=off`, for example, I lose a GPU) or none of the drives/GPUs are visible. The memory test came out perfectly clean, with no errors in SMT or normal mode. Most of the attached hardware was in my previous 1950X Threadripper build, which was solid as a rock. I'd appreciate any help.

     I have a syslog server up on a Raspberry Pi to try and catch one of the crashes directly if possible, but I am thinking that all these PCIe issues can't be helping. tower-diagnostics-20230104-1113.zip
  5. So after much testing and experimentation it appears that the actual install of Windows is broken - the Win11 install is a straight beast on the new hardware and solid as a rock. It could also be that running it off of the M.2 to PCIe slot adapter is broken; I'm not sure which. As an experiment, I'm going to rip the drive to a QEMU disk image and boot from that. I'll keep recording my journey here for posterity in case it helps someone else.
  6. So there is another level of complexity here. Turns out SR-IOV support is the one actually solving this. Without it on, the symptoms return. If you have BME DMA Mitigation on, then when you pass through the cards, they won't return to the system (and can't be reset). However, now I error out of Windows hard when trying to game. At first it games fine (I'm hitting my monitor refresh in Deep Rock Galactic), but then I get a VIDEO_DXGKRNL_FATAL_ERROR. After some tinkering I now get one when shutting down, and the video card will not return to the system when this happens, but I can still boot. Going to load up a blank Windows 11 install and see if it has the same issues.
  7. Having experimented in the BIOS, I was able to boot with full functionality and without the `pci=realloc=off` by going into my PCI Subsystem settings and changing the following settings:

     1) Above 4G Decoding - ENABLED
     2) Re-Size BAR Support - AUTO
     3) SR-IOV Support - ENABLED
     4) BME DMA Mitigation - DISABLED [Edit - See below for explanation]
     5) Hot-Plug Support - DISABLED

     It's also worth noting I'm not booting in EFI - I prefer the way the onboard ASPEED VGA outputs a functional console in legacy boot.
  8. Thanks for the follow-up. I had to sleep a bit, but it looks like @JorgeB kindly marked your solution. I had a look again today and it also restored the USB devices. After looking at this kernel PCI patch thread, I'll experiment with the 4G BIOS settings to see if I can run without this setting.
  9. Yes! That worked! What exactly does that do, and how would one go about figuring out they need it in the absence of wonderful people like you? (i.e. - Is there a list of boot options and their functions?)
  10. I tried disabling IOMMU first (it's a separate setting - or rather a half dozen - on my board) and then SVM. No change. For the record, booting into Windows (I had a disposable install on one of the NVMe drives that's going to be in the cache) with current settings shows all devices as expected. So this is an Unraid-only issue - the hardware itself seems to be working fine.
  11. Hello - and thanks for taking a peek! I'm running into an issue: even though I can see my 6 SATA drives and 3 NVMe drives on the BIOS POST screen, only two of the NVMe drives show up in the OS itself. I'm also noticing that none of the USB devices attached to the system aside from the boot drive are visible, including USB drives plugged in after boot. NVMe drives plugged into a PCIe slot via PCIe -> M.2 adapters did appear. What's even weirder is that the third "missing" NVMe controller does seem to show up, but it doesn't appear as a drive. The motherboard is a Pro WS WRX80E-SAGE SE WIFI with an AMD Threadripper Pro 5965WX, 128GB of system RAM, and no PCIe slot devices installed at the moment (for testing). I've tried making a new Unraid USB - they boot fine but do not resolve the issue. I see a lot of "pci 0000:22:06.0: BAR 14: failed to assign"-type messages in the syslog, so I'm going to fiddle with the RAM a bit and see if I have a bad stick or something, since it's my understanding that bad RAM can cause this. Worth a shot. Diags [Edit: from Safe Mode] attached. tower-diagnostics-20221228-2321.zip
  12. You have no idea how helpful this was to me. You are a lifesaver. I have what I now know is a bad board - I was 60% sure and this post made me 100% sure. It wouldn't boot with any GPUs in (at all) and was doing some really weird stuff otherwise. I've previously tested the GPUs in other systems, and the RAM came from a functioning Unraid server. The BMC won't respond either. I took all the GPUs out, and it'd boot into Unraid, but no SATA devices showed up and one of the NVMe drives mounted on the board didn't show up either. But somehow the SATA controllers did? Your post confirms that my motherboard is just hosed and not somehow incompatible with Unraid, as you have the exact board and processor I have on the same BIOS revision. Thank you, from the bottom of my heart! 🥰
  13. Geforce GPU Passthrough For Windows Virtual Machine (Beta) - I was looking up something unrelated and stumbled upon this Nvidia link. This could potentially mean an end for Code 43 issues, although the text at the bottom makes me wonder if you can pass through a primary card under this regime. Worth a test for a brave soul, as I use my VM as a work machine and can't afford bricking it right this second to test, although I will once I have time if someone else hasn't by then.
  14. Same here! We meet again, @testdasi. It's almost like we use unraid in similar manners.
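
For anyone chasing the same "BAR N: failed to assign" messages quoted in post 4, here is a minimal Python sketch that tallies them per device and BAR, which makes it easier to compare boots with different BIOS or boot-string settings. It is an illustration only, not an Unraid tool: the function name `summarize_bar_failures`, the `SAMPLE` list, and the regular expression are assumptions based solely on the log format shown in that post, and other kernel versions may word these lines differently.

```python
import re
from collections import defaultdict

# Pattern modelled on the kernel lines quoted in post 4 (an assumption:
# other kernels may format these messages differently).
BAR_FAIL = re.compile(
    r"pci (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"BAR (?P<bar>\d+): failed to assign "
    r"\[(?P<kind>mem|io) size (?P<size>0x[0-9a-f]+)"
)


def summarize_bar_failures(lines):
    """Return {(device, bar, kind): [failure_count, largest_size_in_bytes]}."""
    summary = defaultdict(lambda: [0, 0])
    for line in lines:
        match = BAR_FAIL.search(line)
        if match is None:
            continue  # "no space for" and unrelated lines are skipped
        key = (match["dev"], int(match["bar"]), match["kind"])
        entry = summary[key]
        entry[0] += 1
        entry[1] = max(entry[1], int(match["size"], 16))
    return dict(summary)


# Lines copied from the log excerpt in post 4.
SAMPLE = [
    "Jan 4 10:49:11 Tower kernel: pci 0000:00:03.1: BAR 15: no space for [mem size 0x00600000 64bit pref]",
    "Jan 4 10:49:11 Tower kernel: pci 0000:00:03.1: BAR 15: failed to assign [mem size 0x00600000 64bit pref]",
    "Jan 4 10:49:11 Tower kernel: pci 0000:00:03.1: BAR 13: failed to assign [io size 0x2000]",
    "Jan 4 10:49:11 Tower kernel: pci 0000:00:03.1: BAR 15: failed to assign [mem size 0x00200000 64bit pref]",
    "Jan 4 10:49:11 Tower kernel: pci 0000:00:03.1: BAR 13: failed to assign [io size 0x2000]",
]

if __name__ == "__main__":
    for (dev, bar, kind), (count, largest) in sorted(summarize_bar_failures(SAMPLE).items()):
        print(f"{dev} BAR {bar} ({kind}): {count} failure(s), largest request {largest:#x} bytes")
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example the syslog file from an extracted diagnostics zip. A shrinking set of entries between boots would suggest a settings change is helping, though it says nothing about the hard locks themselves.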