[Solved] Logs filling up with AER/PCIe Bus errors



Hey everyone, I'm new to Unraid and I love it (especially the community; so much good info on here)! I got into Unraid a couple of months ago with 6.1.9 and moved over to 6.2 last weekend. I've learned a lot from studying these forums and reading up on virtualization, but with school and work in full swing I haven't had a chance to do anything besides set up a couple of VMs for my wife and me. With 6.1.9 I had two Windows 10 VMs set up with SeaBIOS, but since 6.2 went live and I had only been using the VMs for a couple of weeks, I decided to start over with 6.2 and reinstall the Windows 10 VMs with OVMF/i440fx-2.5.

 

I hadn't noticed any performance problems, so I didn't think anything was wrong, but as I was reading through the post-6.2 tips I wanted to get notifications set up. After setting up notifications I was looking through the logs to see how notifications get logged and discovered that my logs are full of the same error repeating many, many times. I had looked at the logs before with 6.1.9 and my previous setup, and I don't remember seeing this error:

 

Oct 5 19:23:53 agni kernel: pcieport 0000:00:02.0: AER: Multiple Corrected error received: id=0010
Oct 5 19:23:53 agni kernel: pcieport 0000:00:02.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0010(Receiver ID)
Oct 5 19:23:53 agni kernel: pcieport 0000:00:02.0: device [8086:6f04] error status/mask=00000040/00002000
Oct 5 19:23:53 agni kernel: pcieport 0000:00:02.0: [ 6] Bad TLP

 

From what I can tell, the error only occurs when I have the VMs turned on, and it gives a slightly different device address depending on which VM is currently booted. I'm at a loss as to what's causing it. My motherboard is on the latest BIOS, and I don't know what else could be causing this (the address from the error maps to 00:02.0 PCI bridge [0604]: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 2 [8086:6f04] (rev 01); see the lspci note after the attachments). I've attached my diagnostics zip and my two VMs' XML for reference. Any help or insight anyone could provide would be much appreciated :)

agni-diagnostics-20161005-1904.zip

vm1.txt

vm2.txt
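
In case it helps anyone else reading a similar AER message, that device mapping above came from plain lspci on the Unraid console; the -nn flag prints the numeric [vendor:device] IDs so you can match the 8086:6f04 from the log, and -s selects just that slot:

# look up the device at the address from the AER message (0000:00:02.0)
lspci -nn -s 00:02.0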


Had some time to do some more digging and found out some lovely things about X99 chipset boards. I've included links to my research for anyone who runs into similar issues:

 

An Intel document that mentions TLP timeouts

http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/core-i7-lga2011-3-spec-update.pdf

 

Another Intel document on the X99/C610 chipsets with more information about TLP errors

http://www.intel.nl/content/dam/www/public/us/en/documents/specification-updates/x99-chipset-pch-spec-update.pdf

 

So what's the issue? I'm no expert on PCIe, but it seems like a board implementation issue. Since a BIOS update didn't fix it (I'm already on the latest), the only remaining option was probably to try either

pci=nomsi

 

or

 

pci=nommconf

 

I didn't really want to use pci=nomsi if I could avoid it, but pci=nommconf left my Windows 10 VMs unable to keep a driver (Windows getting a Code 43 on the devices and running at 800x600 resolution). I then found another forum thread tying errors on ASUS X99 boards to bus power management, so I figured: why not try disabling ASPM (Active State Power Management)?

 

Sure enough, adding

pcie_aspm=off

to the boot command has resolved the issue entirely (the errors are no longer occurring), though I still don't know the root cause for certain. Maybe ASUS will fix the problem through a BIOS update eventually.
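
For anyone trying the same fix, two quick sanity checks after rebooting (standard Linux commands, nothing Unraid-specific) to confirm the parameter actually took effect:

# the flag should show up on the running kernel's command line
cat /proc/cmdline

# the kernel logs a line like "PCIe ASPM is disabled" when the flag is honored
dmesg | grep -i aspm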


This needs to be added to the boot command, which you can access by going to the Main tab of the Unraid GUI and clicking the flash device.
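
For reference, the boot command lives on the append line in syslinux.cfg on the flash drive, so after the edit the relevant block should look something like this (a sketch based on a stock config; your labels and other parameters may differ):

label Unraid OS
  menu default
  kernel /bzimage
  append pcie_aspm=off initrd=/bzroot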

 

I have been having this issue for months and have gone through all the fixes mentioned above except the final option. I have just enabled it on my system to see if it fixes the problem.

 

One thing I will say is that I have had a VM playing up for ages: it would try to boot, then turn off, and I would have to tell it to boot again before it started.

 

I have enabled this and the VM started instantly. As an idea of what I have tried: I have updated all BIOSes, reseated cards, changed slots entirely, and flashed device BIOSes including my SAS controller, and none of that helped.

 

Time to see if this works. Huge thanks!

 

Jamie

