Marv Posted February 12, 2017
Hi everyone, I just scanned my system with Fix Common Problems for the first time today since upgrading to v6.3.1. Unfortunately, the plugin reports the following error: "Call Traces found on your server" - "Your server has issued one or more call traces. This could be caused by a Kernel Issue, Bad Memory, etc. You should post your diagnostics and ask for assistance on the unRaid forums." I haven't had any problems with my server so far and don't really know where to start. Here are my diagnostics; hopefully someone can take a look. unmarv-diagnostics-20170212-1355.zip
John_M Posted February 13, 2017
Since nobody has responded, I'll have a go. The two call traces seem to be due to a page allocation failure, possibly related to KVM. The first thing I'd do is select Memtest from the boot menu and run it for a good long time to make sure your RAM is OK, because there's no point in continuing if it's bad. Then I'd try running with no VMs running and the VM service disabled in Settings. Your libvirt log has entries saying that your LibreELEC VM is running tainted code, so it's best to make sure the system works well as a basic NAS before re-enabling the complicated stuff.
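For anyone who wants to check a syslog for these events without reading it line by line, here is a minimal Python sketch. It assumes the kernel marks them with the substrings "Call Trace:" and "page allocation failure"; the sample log lines and the function name `count_kernel_events` are made up for illustration and are not taken from the diagnostics in this thread.

```python
import re

# Substrings assumed to mark the two events discussed above.
PATTERNS = {
    "call_trace": re.compile(r"Call Trace:"),
    "page_alloc_failure": re.compile(r"page allocation failure"),
}


def count_kernel_events(log_text):
    """Count how many log lines match each pattern."""
    counts = {name: 0 for name in PATTERNS}
    for line in log_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Hypothetical sample lines, for illustration only.
SAMPLE_LOG = """\
Feb 12 10:00:01 server kernel: some-process: page allocation failure: order:4
Feb 12 10:00:01 server kernel: Call Trace:
Feb 12 10:05:00 server kernel: an unrelated message
"""

print(count_kernel_events(SAMPLE_LOG))
```

To run it against a real log, read the file into a string first and pass that to `count_kernel_events`. A non-zero count only says the messages are present; it does not say what caused them.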
Squid Posted February 13, 2017
John_M said: Your libvirt log has entries saying that your LibreELEC VM is running tainted code so best to make sure the system works well as a basic NAS before re-enabling the complicated stuff.
Just FYI, since that "tainted" message implies that there is actually something wrong here:
"....is tainted: High-Privileges" means that QEMU is running as root: perfectly normal under unRaid.
"....is tainted: Host-CPU" means that QEMU is passing through the CPU instead of emulating it. Once again, perfectly normal under unRaid.
But everything you suggest is 100% valid.
John_M Posted February 13, 2017
Thanks for that clarification, Squid. I don't really do VMs myself.
Marv Posted February 13, 2017 (Author)
Thanks for your reply. I'll run the Memtest later today and report back.
Marv Posted February 14, 2017 (Author)
So I ran Memtest for 23 hours and it found 0 errors. I also haven't had the reported call traces error again. Should I just keep the server running as normal, check the log from time to time and hope it won't come back, or is there something else I can do now?
John_M Posted February 15, 2017
Were you running without the VM service when the problem didn't show itself? If so, try turning it back on and see if it re-appears. Ideally you want to find what's causing it. Hoping that the problem doesn't come back is not a good troubleshooting method.
Marv Posted February 15, 2017 (Author)
The VM service was turned on and my LibreELEC VM was running as well for most of the time. What I noticed is that the LibreELEC VM seems to use slightly more RAM since upgrading to v6.3.0/1. I assigned 1024 MB to the VM and after some time 90-95% is used. Before upgrading, the peak was around 80% most of the time. Could the error have occurred because the VM was running out of memory?
Marv Posted February 15, 2017 (Author)
So I was just playing some music with Kodi and my LibreELEC VM crashed (the sound did not). After that I again had the 'call traces' error in my log. I had also increased my VM memory to 2048 MB beforehand. I attached the new diagnostics. unmarv-diagnostics-20170215-1659.zip
John_M Posted February 15, 2017
Certainly, if the VM runs out of memory it will be a problem, but I don't know if it will be the same problem as you've been experiencing. However, your problem is one of memory allocation, so they could indeed be connected. There's an easy way to find out: see how it behaves now that you've allocated more memory.
Marv Posted February 15, 2017 (Author)
It had already been increased when it crashed, so unfortunately that's not the reason. Maybe it has something to do with the LibreELEC version itself?
John_M Posted February 15, 2017
You need to find the conditions under which the server will run reliably. I suggest turning off the VM service and seeing if it's stable then.
Marv Posted February 16, 2017 (Author)
Thanks for your support, John. I've been running with the VM service disabled and have had no errors since. But I just remembered that the first time the 'call trace' was issued, I had lag while playing a movie with Kodi. The second time the error came up was also while playing music. So I'll just leave VMs disabled for now and see how it goes. What do you suggest if I re-enable it and the error comes up again while using the LibreELEC VM?
John_M Posted February 16, 2017
If that turns out to be the case, then the problem is likely to be VM related. I would give the other VM - SteamOS - a good testing to see if it's affected too. If not, then it fairly conclusively points to LibreELEC. The important thing at this stage is to find out if the server has any issues when running as a basic NAS.
Marv Posted February 17, 2017 (Author)
The error hasn't come up yet. I also enabled the VM service again, but without using LibreELEC (SteamOS isn't installed atm). I won't have time to play around with LibreELEC before Sunday, so I'll just let the server run like this and see what happens when I use Kodi again.
Marv Posted February 21, 2017 (Author)
So I didn't get the error again, even after using LibreELEC. I also updated to unRAID v6.3.2; maybe that helped solve it? I don't know.
John_M Posted February 21, 2017
unRAID 6.3.2 has a newer kernel, so maybe that fixed it. If you don't have any more problems then that's good.
Marv Posted February 28, 2017 (Author)
I'm back again. The only good thing is that I'm 99% sure it only happens when my LibreELEC VM is running. Anything else I can do here? unmarv-diagnostics-20170228-2226.zip
Squid Posted February 28, 2017
Can you post the output of this command:
cat /proc/interrupts
The call trace (caused by IRQ 16 being disabled) may be benign in your case, but we can't 100% tell unless we see which modules are utilizing the interrupt.
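As a rough illustration of what to look for in that output, the Python sketch below pulls out the driver names registered on one IRQ line. It assumes the usual layout of /proc/interrupts (a header row of CPU columns, then one row per IRQ with per-CPU counts, the interrupt controller, the trigger type and the device names); the function name `devices_on_irq` and the shortened sample text are assumptions made for the example.

```python
def devices_on_irq(interrupts_text, irq):
    """Return the device/driver names registered on the given IRQ line."""
    lines = interrupts_text.strip().splitlines()
    if not lines:
        return []
    # The header row has one "CPUn" column per online CPU.
    cpu_count = len(lines[0].split())
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        if label.strip() != str(irq):
            continue
        fields = rest.split()
        # Skip the per-CPU counters, the controller name and the trigger type.
        names = fields[cpu_count + 2:]
        # Devices sharing an IRQ are listed comma-separated.
        return [name.rstrip(",") for name in names]
    return []


# Shortened sample in the /proc/interrupts layout, for illustration.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 16:         19          5          0         15  IR-IO-APIC   16-fasteoi   ehci_hcd:usb1
 27:       3354       1209       1072        643  IR-PCI-MSI 327680-edge      xhci_hcd
"""

print(devices_on_irq(SAMPLE, 16))
```

On a live server the text could be read with `open("/proc/interrupts").read()` and passed in the same way. If more than one name comes back for an IRQ, those devices share the interrupt line.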
Marv Posted March 1, 2017 (Author, edited)
I guess I need to run the command when the error has occurred, right? I powered down the server after posting my diagnostics, so I'll have to wait for another 'call traces' error. By the way, is it possible that the error has something to do with my Logitech keyboard, which is passed through to my VM via USB 3.0?
Edited March 1, 2017 by Marv
Squid Posted March 1, 2017
The command will still work after the fact.
Marv Posted March 1, 2017 (Author)
Here you go:

           CPU0       CPU1       CPU2       CPU3
   0:        28          0          0          0  IR-IO-APIC  2-edge        timer
   1:         2          0          0          0  IR-IO-APIC  1-edge        i8042
   8:         5          0          1          0  IR-IO-APIC  8-edge        rtc0
   9:         0          0          0          0  IR-IO-APIC  9-fasteoi     acpi
  12:         4          0          0          0  IR-IO-APIC  12-edge       i8042
  16:        19          5          0         15  IR-IO-APIC  16-fasteoi    ehci_hcd:usb1
  18:         0          0          0          0  IR-IO-APIC  18-fasteoi    i801_smbus
  23:        26          5          0         13  IR-IO-APIC  23-fasteoi    ehci_hcd:usb2
  24:         0          0          0          0  DMAR-MSI    0-edge        dmar0
  25:         0          0          0          0  DMAR-MSI    1-edge        dmar1
  27:      3354       1209       1072        643  IR-PCI-MSI  327680-edge   xhci_hcd
  28:     13695       5714       2984       1956  IR-PCI-MSI  512000-edge   ahci[0000:00:1f.2]
  29:       681         98         84         67  IR-PCI-MSI  1572864-edge  ahci[0000:03:00.0]
  30:      5145       1417        769        657  IR-PCI-MSI  409600-edge   eth0
  31:      1437        334         95        142  IR-PCI-MSI  442368-edge   vfio-msi[0](0000:00:1b.0)
  32:      3232        268        158        368  IR-PCI-MSI  524288-edge   vfio-msi[0](0000:01:00.0)
 NMI:         0          0          0         19  Non-maskable interrupts
 LOC:    106321      72012      69739     127602  Local timer interrupts
 SPU:         0          0          0          0  Spurious interrupts
 PMI:         0          0          0         19  Performance monitoring interrupts
 IWI:         0          0          0          0  IRQ work interrupts
 RTR:         3          0          0          0  APIC ICR read retries
 RES:     12242      10457       8108      12691  Rescheduling interrupts
 CAL:      3000       3157       2693       2556  Function call interrupts
 TLB:      2600       2698       2124       2128  TLB shootdowns
 TRM:         0          0          0          0  Thermal event interrupts
 THR:         0          0          0          0  Threshold APIC interrupts
 DFR:         0          0          0          0  Deferred Error APIC interrupts
 MCE:         0          0          0          0  Machine check exceptions
 MCP:         2          2          2          2  Machine check polls
 ERR:         0
 MIS:         0
 PIN:         0          0          0          0  Posted-interrupt notification event
 PIW:         0          0          0          0  Posted-interrupt wakeup event
Squid Posted March 1, 2017
1 hour ago, Marv said: 16: 19 5 0 15 IR-IO-APIC 16-fasteoi ehci_hcd:usb1
I think you'll be safe to ignore this. IRQ 16 is ehci, which should be the USB...