Solved - VMs Locking Up Unraid Server


Endy


I've been running a Windows 10 VM as my daily driver for over a year now. I recently made some changes to my network, and my Unraid server was moved to a closet (I've been monitoring it, and heat is not an issue). I now have another PC set up as my daily driver, and the Windows 10 VM is hooked up directly to a TV.

 

This was when I noticed the first problem: sound over HDMI was flaky. After doing research, it appeared that using OVMF instead of SeaBIOS should fix the issue (it did). This is where I started to make actual changes. I created a new Windows 10 VM using OVMF, and after that I started noticing some issues. I noticed that I couldn't change the number of cores a VM was using; if I did, the VM would not work. (I know I am missing details on this, but I didn't think it was going to be a big deal at the time, so I didn't take notes. I may have noticed it before creating the new VM.)

 

After getting the new Windows 10 VM going, I decided to try a Windows 7 VM. I wanted to use the same video card and USB card as the Windows 10 VM (they would not be run at the same time), and because of the issues with HDMI sound I first tried to use OVMF. That was not working. I tried using just one core, starting with VNC instead of the video card, and using Cirrus instead of QXL. Nothing worked, so I switched to SeaBIOS, and I was finally able to get Windows 7 installed.

 

I believe this was when the lockups first started to happen. Windows 7 would install updates and reboot, and then lock up my entire Unraid server. Then I had a lockup when starting the Windows 10 VM just after rebooting Unraid. Sometimes it happens when stopping one VM and starting the other. I haven't played with it much, but I do not think my Windows XP VM, which uses VNC, has had any problems.

 

Sometimes when Unraid comes back up, VMs will be disabled. I also had an issue where the VMs would not start and I would get an error message ("Unable to power on device, stuck in D3"). I ended up deleting libvirt.img at one point (I'm not sure why I did that). After that is when I think I was able to add or remove cores properly again. That may also have been when the "Unable to power on device" message stopped happening.

 

Another problem I noticed was that Windows 7 didn't always seem to see the USB card I was passing through (Windows 10 had no issues with it). I fixed this by changing from pci-stub to vfio-pci in the syslinux config file. The card is a Fresco Logic FL1100. I do see this in the VM log (for both Windows 7 and Windows 10), referring to the USB card: "2017-11-10T14:28:12.423626Z qemu-system-x86_64: -device vfio-pci,host=05:00.0,id=hostdev2,bus=pci.0,addr=0x8: Failed to mmap 0000:05:00.0 BAR 2. Performance may be slow".
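For anyone wanting to make the same change: the pci-stub to vfio-pci swap is just an edit to the kernel line in /boot/syslinux/syslinux.cfg. A rough sketch of what mine ended up looking like (1b73:1100 is the ID the FL1100 usually reports as; confirm yours with `lspci -n`):

```
label Unraid OS
  menu default
  kernel /bzimage
  # was: pci-stub.ids=1b73:1100 -- vfio-pci claims the card the same way
  # at boot, but handles passthrough/reset better than pci-stub
  append vfio-pci.ids=1b73:1100 initrd=/bzroot
```

Reboot after the change and check that `lspci -k` shows vfio-pci as the driver in use for the card.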

 

I am attaching the diagnostics from just after I shut down the Windows 7 VM. I then attempted to launch the Windows 10 VM and Unraid locked up.

 

I am at a loss as to what could be wrong and what to do next.

 

 

turtle-diagnostics-20171110-0958.zip


I tried to tail the syslog and took a picture, but it shows the same thing that was onscreen after I shut down the Windows 7 VM. Nothing new appeared when I started the Windows 10 VM that locked up Unraid. I'll attach it anyway.

 

It seems to be consistent that if I start Windows 7 and then go to Windows 10, everything locks up. I believe I have gone from Windows 10 to Windows 7 without that happening. I can't recall if I have tried just reloading the same VM.

2017-11-10-10.24.png


It seems there have been a few other people that have had this issue, but I have not come across any fixes yet.

 

I did see something mentioned about using the Tips and Tweaks plugin and setting vm.dirty_background_ratio to 1 and vm.dirty_ratio to 2. I did that, booted into the Windows 7 VM and stopped it, then booted into the Windows 10 VM and stopped it. The server did not lock up.
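For reference, the plugin is just applying two kernel sysctls. The equivalent manual commands (run as root, with the values from above) would be roughly:

```
# Shrink the dirty page-cache thresholds so writeback starts earlier and a
# large flush can't stall the host while a VM is starting or stopping.
sysctl -w vm.dirty_background_ratio=1   # start background writeback at 1% of RAM
sysctl -w vm.dirty_ratio=2              # throttle writers once dirty pages hit 2% of RAM
```

These don't survive a reboot on Unraid (the OS loads fresh from flash), which is why the plugin reapplies them at boot.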

 

I also now have "irq 16: nobody cared found on your server" showing up in Fix Common Problems. I saw this once when I plugged in a mouse and keyboard to tail the syslog, but it disappeared after I rebooted. It then showed up again over night and now there's a bunch of them. I'm thinking I should start a new topic in General Support for that.


I finally figured out how to install Windows 7 using OVMF:

1. Use machine type i440fx.
2. Add both VNC/QXL and the passed-through video card.
3. Install Windows 7 using VNC.
4. Install the drivers for the video card and reboot.
5. Shut down the VM and remove VNC/QXL.

(There are a lot of recommendations to use Cirrus for Windows 7, but that only works with SeaBIOS.)
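In the VM's XML (the Edit XML view), the install-time graphics setup ends up looking roughly like the sketch below. The 01:00.0 GPU address is a placeholder for illustration, not my actual card; substitute your own:

```
<!-- Temporary: VNC + QXL so the installer is visible before GPU drivers exist -->
<graphics type='vnc' port='-1' autoport='yes'/>
<video>
  <model type='qxl'/>
</video>

<!-- The passed-through video card (01:00.0 is an example address) -->
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
</hostdev>

<!-- After the GPU drivers are installed, delete the <graphics> and <video>
     sections above so the passed-through card becomes the only display -->
```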

 

I switched back and forth a couple of times between the new Windows 7 VM using OVMF and my Windows 10 VM without any more lockups. I'll report back after giving it some time to see if that does indeed fix my issue.


After getting Windows 7 working with OVMF, I stopped having lockups just from switching between Windows 7 and Windows 10. I did have an occasion or two where everything locked up when starting a VM after rebooting Unraid.

 

I noticed that the USB card I was passing through seemed to be rather flaky. In Windows 7 it would sometimes show up and sometimes not, and devices attached to it also seemed to perform a little irregularly. I wasn't sure if this was the card or my setup.

 

With my Unraid server in a closet not far from my living room, instead of using Steam Link or GameStream to play games over the local network, I figured it would be better to just run HDMI and USB directly. Not a big deal, because I was already up in the attic running Cat 6 throughout the house. I also figured it wasn't much extra work to run HDMI and USB to my bedroom as well, giving me a choice of which room to use for those VMs.

 

HDMI was no problem after I stopped trying to use two ports from the video card and bought an HDMI splitter. (Windows defaults to extending the display instead of mirroring it on a desktop PC, which could be dealt with, but even worse, there was no way I could find to send sound to two different TVs at once.)

 

For USB I had to use a couple of active cables, one to each room, with a USB hub at each location. In my research I found that this setup can run into power problems because of the distance (32' was the shortest active cable I found; I really only need about 20'), especially when also trying to power the hubs. Unplugging one active cable and hub seemed to help, but didn't entirely fix the problem.

 

I ended up buying another USB card and removing the old one. With just one active cable and one hub plugged in, it seems to work great, and there are no more lockups. I added the other USB card back in to see if I could use it for the other active cable and hub, and it partially works. Everything showed up fine and seemed to be working, but it doesn't like it when I switch VMs, despite supposedly being able to be reset. It gives me an error: "internal error: unknown pci header type '127'".
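As I understand it, header type '127' generally means config-space reads came back as all ones, i.e. the card never recovered from the reset between VMs. A couple of diagnostic commands for checking whether the kernel even has a reset method for a card (05:00.0 is my card's slot here; substitute your own address):

```
# Does the kernel expose a reset hook for the device?
ls /sys/bus/pci/devices/0000:05:00.0/reset

# Does the card advertise Function Level Reset? Look for "FLReset+" in DevCap:
lspci -vv -s 05:00.0 | grep -i flreset
```

If neither shows reset support, the card may only work reliably for a single VM session per host boot.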

 

The flaky USB card has the Fresco Logic FL1100 chipset, and I believe other people are using it successfully, so I have a feeling this particular card is just bad. Most likely I will replace it, and then I should have everything working the way I want, with no more lockups.

 

Sorry for being a little long-winded; I just wanted to be thorough with what should be the solution.

