killeriq

Family 6 Model 92 CPU: only decoding architectural errors


killeriq    1

Hello,

 

In "Fix Common Problems" I got this notice:

Machine Check Events detected on your server

Your server has detected hardware errors. You should install mcelog via the NerdPack plugin, post your diagnostics and ask for assistance on the unRaid forums. The output of mcelog (if installed) has been logged

 

 

 

So I ran mcelog, which printed:
mcelog: Family 6 Model 92 CPU: only decoding architectural errors 

 

----

I have an ASRock J4205-ITX mainboard.

What's next? :)

 

thanks

unevent    10

Post your diagnostics. It may be that mcelog does not support that processor, but I need more info from the syslog to verify.

Squid    515
Quote

May  4 21:35:03 unRAIDTower mcelog: Running trigger `unknown-error-trigger'
May  4 21:35:03 unRAIDTower mcelog: CPU 0 on socket 0 received unknown error
May  4 21:35:03 unRAIDTower mcelog: Location: CPU 0 on socket 0

Quote


May  4 21:35:04 unRAIDTower root: Uncorrected error

 

A hardware failure did trigger the machine check event, but mcelog doesn't have it classified:

Quote

The unknown-error-trigger runs on any errors not otherwise categorized.

 

If the mce doesn't re-occur (reset your server to clear out the already existing log), then I'd chalk it up to the stars just weren't aligned properly.

 

But if it does recur, then it's going to wind up being one of the following:

 

- Power Supply

- CPU

- Motherboard (in particular the voltage regulation on the board)

 

Unfortunately, you can only diagnose what is actually causing it if the problem is recurring...  And of course we don't know if anything was actually affected by the uncorrected error or not.
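
To tell a one-off from a recurring problem, it can help to tally the machine check reports per day from the syslog. A minimal Python sketch, assuming the syslog lines look like the ones quoted above; the name `mce_events_per_day` and the sample list are illustrative only:

```python
import re
from collections import Counter

# Matches syslog lines such as:
#   May  4 21:35:03 unRAIDTower mcelog: CPU 0 on socket 0 received unknown error
# Assumption: the mcelog daemon writes one "received ... error" line per event.
EVENT_RE = re.compile(
    r"^(\w{3})\s+(\d{1,2}) [\d:]{8} \S+ mcelog: CPU \d+ on socket \d+ received"
)

def mce_events_per_day(lines):
    """Return a Counter mapping 'Mon D' to the number of MCE reports that day."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.match(line)
        if match:
            counts[f"{match.group(1)} {int(match.group(2))}"] += 1
    return counts

# Sample taken from the syslog excerpt quoted in this post.
sample = [
    "May  4 21:35:03 unRAIDTower mcelog: Running trigger `unknown-error-trigger'",
    "May  4 21:35:03 unRAIDTower mcelog: CPU 0 on socket 0 received unknown error",
    "May  4 21:35:03 unRAIDTower mcelog: Location: CPU 0 on socket 0",
    "May  4 21:35:04 unRAIDTower root: Uncorrected error",
]
print(mce_events_per_day(sample))
```

Feeding it the full syslog (for example `open("/var/log/syslog")`) after a reboot would show whether new events keep arriving or the count stays at zero.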

unevent    10

Not sure if mcelog is running as a daemon with this plugin. You can try from console/telnet/SSH; enter:

mcelog --client > mcelog.txt

and post the txt file if it contains anything, preferably after unRAID has been running for several hours/days.

 

Beyond that I can only suggest rebooting and running the memtest from the boot menu at least overnight to see if you get memory errors.  If none and the server is not locking up then I'd say ignore it.

killeriq    1

This is what I got:

root@unRAIDTower:/mnt/user/Appz# mcelog                                                                                                    
mcelog: Family 6 Model 92 CPU: only decoding architectural errors                                                                          
root@unRAIDTower:/mnt/user/Appz# mcelog --client > mcelog.txt
mcelog: client connect: No such file or directory                                                                                          
mcelog: client command write: Transport endpoint is not connected                                                                          
mcelog: client read: Invalid argument                                                                                                      
mcelog: client connect: No such file or directory                                                                                          
mcelog: client command write: Transport endpoint is not connected                                                                          
mcelog: client read: Invalid argument 
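
The "client connect: No such file or directory" lines suggest that `mcelog --client` could not find the daemon's Unix socket, i.e. that no mcelog daemon is listening. A small Python sketch that probes a socket path in the same way; the path `/var/run/mcelog-client` is assumed to be mcelog's default and may differ on a given build, and `probe_unix_socket` is a hypothetical helper, not part of mcelog:

```python
import errno
import socket

# Assumption: default client socket of the mcelog daemon; check mcelog.conf if unsure.
ASSUMED_SOCKET_PATH = "/var/run/mcelog-client"

def probe_unix_socket(path):
    """Try to connect to a Unix domain socket; return (ok, reason)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        return True, "connected"
    except OSError as exc:
        # Translate the errno into its symbolic name, e.g. ENOENT.
        return False, errno.errorcode.get(exc.errno, str(exc))
    finally:
        sock.close()

print(probe_unix_socket(ASSUMED_SOCKET_PATH))
```

A result of `ENOENT` corresponds to the "No such file or directory" message above and would mean the socket file does not exist, which fits mcelog not running in daemon mode.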

 

unraidtower-diagnostics-20170516-0121.zip

killeriq    1

I ran the memory test for the whole night (about 8 hours) with no issues. The RAM is new, and I also tested it before I built the server.

But now the server was in some failed state. I'm not sure if you can see the error in the diagnostics, so I took a screenshot as well.

Strange, I didn't have those issues before. I changed the USB flash drive before I bought the product, but as far as I understand it is only read once during boot and that's it...

unraidtower-diagnostics-20170519-1405.zip

C360_2017-05-19-13-59-24-255.jpg

unevent    10

I noticed the clocksource doing a dance between tsc and hpet. Perhaps Apollo Lake support in the 4.9 kernel is not quite ready? Someone with more kernel experience will need to chime in or get the attention of Limetech.

killeriq    1

No clue :( I hope someone from Limetech can have a look, or I might try to mail them if there is no reply within a few days.

 

thanks anyway!

unevent    10

tsc was the final clocksource; the system had switched to it from hpet. My search had wrapped around and I didn't catch it, so I assumed it had switched from tsc back to hpet, but it didn't for the duration of the log that was posted. If you keep having issues, post another diagnostics before rebooting, if possible.
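
For anyone wanting to check this without a wrapping text search, here is a rough Python sketch that lists the clocksource switches in log order. It assumes the kernel logs lines containing "Switched to clocksource <name>"; the sample lines are made up for illustration and are not taken from the posted diagnostics:

```python
import re

# Assumption: kernel messages of the form "clocksource: Switched to clocksource tsc".
SWITCH_RE = re.compile(r"Switched to clocksource (\S+)")

def clocksource_history(lines):
    """Return the clocksources the kernel switched to, in log order."""
    history = []
    for line in lines:
        match = SWITCH_RE.search(line)
        if match:
            history.append(match.group(1))
    return history

# Hypothetical sample lines (timestamps and ordering invented for the example).
sample = [
    "May 19 13:01:02 unRAIDTower kernel: clocksource: Switched to clocksource hpet",
    "May 19 13:01:05 unRAIDTower kernel: clocksource: Switched to clocksource tsc",
]
history = clocksource_history(sample)
print(history, "final:", history[-1])
```

On a running system the active clocksource can also be read directly from `/sys/devices/system/clocksource/clocksource0/current_clocksource`.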

 

killeriq    1
May 24 04:40:08 unRAIDTower root: Fix Common Problems: Error: Machine Check Events detected on your server
May 24 04:40:08 unRAIDTower mcelog: Running trigger `unknown-error-trigger'
May 24 04:40:08 unRAIDTower mcelog: CPU 0 on socket 0 received unknown error
May 24 04:40:08 unRAIDTower mcelog: Location: CPU 0 on socket 0
May 24 04:40:08 unRAIDTower root: mcelog: Family 6 Model 92 CPU: only decoding architectural errors
May 24 04:40:08 unRAIDTower root: mcelog: Family 6 Model 92 CPU: only decoding architectural errors
May 24 04:40:08 unRAIDTower root: Hardware event. This is not a software error.
May 24 04:40:08 unRAIDTower root: MCE 0
May 24 04:40:08 unRAIDTower root: CPU 0 BANK 4 
May 24 04:40:08 unRAIDTower root: ADDR fef13b80 
May 24 04:40:08 unRAIDTower root: TIME 1495568891 Tue May 23 21:48:11 2017
May 24 04:40:08 unRAIDTower root: MCG status:
May 24 04:40:08 unRAIDTower root: MCi status:
May 24 04:40:08 unRAIDTower root: Uncorrected error
May 24 04:40:08 unRAIDTower root: MCi_ADDR register valid
May 24 04:40:08 unRAIDTower root: Processor context corrupt
May 24 04:40:08 unRAIDTower root: MCA: Internal unclassified error: 408
May 24 04:40:08 unRAIDTower root: STATUS a600000000020408 MCGSTATUS 0
May 24 04:40:08 unRAIDTower root: MCGCAP c07 APICID 0 SOCKETID 0 
May 24 04:40:08 unRAIDTower root: CPUID Vendor Intel Family 6 Model 92
May 24 04:40:08 unRAIDTower root: <27>May 24 04:40:08 mcelog: CPU 0 on socket 0 received unknown error
May 24 04:40:08 unRAIDTower root: <27>May 24 04:40:08 mcelog: Location: CPU 0 on socket 0
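
The STATUS value in the log can be decoded by hand. The sketch below is a Python illustration using the architectural bit layout of IA32_MCi_STATUS from the Intel SDM (flag bits 63 to 57, MCA error code in bits 15:0, model-specific code in bits 31:16); `decode_mci_status` is a made-up helper, not something mcelog provides:

```python
from datetime import datetime, timezone

# Architectural flag bits of IA32_MCi_STATUS.
FLAGS = {
    63: "VAL (register valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected error)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MCi_MISC register valid)",
    58: "ADDRV (MCi_ADDR register valid)",
    57: "PCC (processor context corrupt)",
}

def decode_mci_status(status):
    """Split a 64-bit MCi_STATUS value into its flags and error codes."""
    return {
        "flags": [name for bit, name in FLAGS.items() if (status >> bit) & 1],
        "mca_error_code": status & 0xFFFF,               # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }

# STATUS a600000000020408 from the log above.
decoded = decode_mci_status(0xA600000000020408)
print(decoded["flags"])
print(hex(decoded["mca_error_code"]), hex(decoded["model_specific_code"]))

# TIME 1495568891 is seconds since the epoch; shown here in UTC.
print(datetime.fromtimestamp(1495568891, tz=timezone.utc))
```

The set flags (VAL, UC, ADDRV, PCC) line up with the "Uncorrected error", "MCi_ADDR register valid" and "Processor context corrupt" lines, and the MCA error code 0x408 is the 408 in "Internal unclassified error: 408". The model-specific code in bits 31:16 is presumably the part mcelog cannot interpret for Model 92. The timestamp comes out as 19:48:11 UTC on May 23, so the 21:48:11 in the log looks like local time at UTC+2, and the event itself happened several hours before the 04:40 syslog entries that report it.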

It's starting to drive me nuts... :(

unraidtower-diagnostics-20170524-2314.zip

killeriq    1

So I got a reply from support suggesting that it might be a HW error, so I went to the shop and got a new mainboard of the same model: same errors.

So it's definitely not HW but an OS issue; probably the Apollo Lake chipset is not supported properly.

The new MB is on BIOS 1.20 and the "old" one is on BIOS 1.30, with the same errors on both.

1. I want to try a different USB drive, but I'm not sure yet how to do that with the license...

2. Also roll back to 6.3 or so.

unevent    10

If you are not experiencing lockups or other problems, just uninstall the plugin so you don't get the mcelog errors, and ignore it. Apollo Lake is still a work in progress in the Linux kernel, and 4.10+ should provide better support based on what I am reading. You are not the only Apollo Lake user having similar issues, and it has to do with Linux, not unRAID.

killeriq    1

This is why I like Windows... you don't need to wait 6-12 months until new HW is supported.

OK, but at least we know the HW is FINE and it's a Linux problem.

---

I do get some reboots, not regularly but about once every 2-3 days, no clue why.

Will check the mce plugin, thx

jonp    71

Just one more update to supplement my latest e-mail: we are approaching the release of 6.4-rc1, which will be on the 4.11 kernel. When it is out, please try it and let us know if the errors persist.

airbillion    0

Any luck with this Mobo and the new 6.4rc2 version of unraid?

 

I wanted to purchase this board, but wanted to make sure it is supported by unraid...

 

Any info would be appreciated!

 

Thanks!

airbillion    0
Quote

im on vacation...so i can try next week Thursday or so

Awesome...I look forward to your results!

Thanks!


Copyright © 2005-2017 Lime Technology, Inc. unRAID® is a registered trademark of Lime Technology, Inc.