nexusmaniac Posted April 17, 2017
Hi All,
I've had my system lock up a couple of times in the last 2 weeks (since 6.3.3). This is the 2nd time, after only 1.5 days of uptime; I believe it was ~7 days the last time. I can't get to the GUI, SSH, any Dockers, or the local console (via IPMI).
I'm out of ideas, but wondered if there's anything I can / should grab before / after rebooting?
Squid Posted April 17, 2017
If you can log in locally at the keyboard / monitor, then run diagnostics and upload the resulting file.
If you can't, then reboot, install the Fix Common Problems plugin, put it into troubleshooting mode via its settings, and wait for the next crash. Then upload the last set of diagnostics and the FCPsyslog_tail.txt file stored in the logs folder on the flash drive.
nexusmaniac (Author) Posted April 17, 2017
28 minutes ago, Squid said:
> If you can log in locally at the keyboard / monitor, then run diagnostics and upload the resulting file.
> If you can't, then reboot, install the Fix Common Problems plugin, put it into troubleshooting mode via its settings, and wait for the next crash. Then upload the last set of diagnostics and the FCPsyslog_tail.txt file stored in the logs folder on the flash drive.
IPMI is as good as local access. Sometimes better, because there's always an output on screen. Anyway, input isn't working, not even CTRL+ALT+DEL; it's locked up.
I'll go ahead and install that. Cheers
nexusmaniac (Author) Posted April 27, 2017
I've just had another crash... Syslog attached! Plenty of call traces.
Attachment: syslog.log
nexusmaniac (Author) Posted April 27, 2017
I didn't have Fix Common Problems installed... and most commands hung when I ran them:
diagnostics - hangs
ip a - hangs
ifconfig - hangs
ping 8.8.8.8 - runs, but every packet is dropped (i.e. no replies show up at all), even though the Internet connection itself is fine
cp to array - hangs
cp to /boot - works (hence my syslog being available)
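Since copying to /boot still worked while diagnostics and anything touching the array hung, one option is to save a timestamped copy of the live syslog to the flash drive before rebooting. The following is only a minimal sketch, assuming the log lives at /var/log/syslog and the flash is mounted at /boot with a logs folder; the function name save_syslog and the file naming scheme are illustrative and not part of unRAID.

```shell
#!/bin/bash
# Copy the current syslog to the flash drive under a timestamped name.
# Assumed defaults: /var/log/syslog as the source, /boot/logs as the target.
save_syslog() {
    local src="${1:-/var/log/syslog}"
    local dest_dir="${2:-/boot/logs}"
    local dest="${dest_dir}/syslog-$(date +%Y%m%d-%H%M%S).txt"

    # Create the target folder if it is missing, then copy the log.
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest" || return 1

    # Print the path of the saved copy so it can be found after the reboot.
    echo "$dest"
}
```

Calling save_syslog with no arguments from the console would use the assumed defaults. A global sync is deliberately left out, since it may itself block while array I/O is stuck.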
limetech Posted April 27, 2017
3 hours ago, nexusmaniac said:
> I've just had another crash... Syslog attached! Plenty of call traces.
> Attachment: syslog.log
A lot going on in there... but for one thing it appears your cache disk/pool is full:
Apr 24 18:03:18 Raptor shfs/user: share cache full
Also, you're better off posting the diagnostics.zip file, since it will anonymize some info in the logs and provide more of the information needed to troubleshoot.
Finally, please don't cross post.
nexusmaniac (Author) Posted April 27, 2017
7 minutes ago, limetech said:
> A lot going on in there... but for one thing it appears your cache disk/pool is full:
> Apr 24 18:03:18 Raptor shfs/user: share cache full
> Also, you're better off posting the diagnostics.zip file, since it will anonymize some info in the logs and provide more of the information needed to troubleshoot.
> Finally, please don't cross post.
Indeed there is! Given that the system has ended up in this state several times now, I'd better install the Fix Common Problems plugin tbh...
Anyway, when in this state, the WebUI is unreachable from anything and networking is just gone, i.e. even my router can't see the server; it's as if it was unplugged (it ofc wasn't: link lights etc.).
I have IPMI so I can get to the local console. The diagnostics command hung for 20 mins before I finally pulled the syslog and did a reboot. Commands like "ip a" or "ifconfig" also hung the terminal. Even CTRL+C didn't help; I had to switch between my ALT+F1, 2, 3, 4 terminals haha. So I'm not sure how I could've retrieved a diag.zip for you.
I did cut a bit out of my syslog; I didn't see anything too scary in there.
Apologies for the cross post. I know it's frowned upon, but it semi-relates to the 6.3.3 release (at least from my POV), as this never happened before the latest release. I'm a massive tech so I get it... but I'm still a normal impatient consumer when it comes to critical kit like my NAS! Haha
My array doesn't look full to me? Not sure what gave you that impression? What's the next step here?
nexusmaniac (Author) Posted April 27, 2017
19 minutes ago, limetech said:
> Apr 24 18:03:18 Raptor shfs/user: share cache full
Aha... I see. That's just the 85% limit, I believe; it normally sends me an email once that threshold is reached (and that message shows up in the syslog). Not full full afaik.
limetech Posted April 27, 2017
2 hours ago, nexusmaniac said:
> Not full full afaik.
No, that message is generated by shfs itself. Probably one of your shares hit its cache usage "floor" setting, or the overall cache "floor" has been reached. This info is in the diagnostics.zip.
If you have console access you can generate the diagnostics.zip file using this command:
diagnostics
It will create the timestamped zip file in the 'logs' directory on the flash.
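The "floor" mentioned above can be read as a minimum amount of free space that has to remain on the cache. Purely as an illustration of that idea (this is not how shfs implements it), the sketch below compares the free space reported by df against a threshold in KiB. The function name below_floor, the /mnt/cache path, and the example value are assumptions.

```shell
#!/bin/bash
# Report whether free space on a mount point has dropped below a floor (in KiB).
# Returns 0 when below the floor, 1 when the floor is not reached, 2 on error.
below_floor() {
    local mount_point="$1"
    local floor_kb="$2"
    local free_kb

    # df -P prints one line per filesystem; field 4 of line 2 is "Available".
    free_kb=$(df -Pk "$mount_point" 2>/dev/null | awk 'NR == 2 { print $4 }')
    [ -n "$free_kb" ] || return 2

    if [ "$free_kb" -lt "$floor_kb" ]; then
        echo "$mount_point: ${free_kb} KiB free, below floor of ${floor_kb} KiB"
        return 0
    fi
    echo "$mount_point: ${free_kb} KiB free, floor of ${floor_kb} KiB not reached"
    return 1
}
```

For example, below_floor /mnt/cache 2000000 would check for roughly 2 GB of headroom. It is best run while the system is healthy, since anything that touches /mnt may hang once the server is in the locked-up state described in this thread.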
nexusmaniac (Author) Posted April 28, 2017
8 hours ago, limetech said:
> No, that message is generated by shfs itself. Probably one of your shares hit its cache usage "floor" setting, or the overall cache "floor" has been reached. This info is in the diagnostics.zip.
> If you have console access you can generate the diagnostics.zip file using this command:
> diagnostics
> It will create the timestamped zip file in the 'logs' directory on the flash.
So, I've just experienced yet another crash. I've got no network access; the WebUI isn't reachable, nor is SSH. I have local access via IPMI.
I run the diagnostics command and all I get is "Starting diagnostics collection... _", where it hangs for a seemingly endless amount of time. I'll leave it on this screen for a while, but whenever I've run it in the past, it's never taken this long!
Surely there's a pretty big problem somewhere in unRAID if a share thinking it's full results in an almost complete system crash... I've basically lost my array and networking. Any attempt to access /mnt/* just hangs. /boot/* is the only directory I can work in and actually save log files to.
nexusmaniac (Author) Posted April 28, 2017
9 hours ago, limetech said:
> No, that message is generated by shfs itself. Probably one of your shares hit its cache usage "floor" setting, or the overall cache "floor" has been reached. This info is in the diagnostics.zip.
> If you have console access you can generate the diagnostics.zip file using this command:
> diagnostics
> It will create the timestamped zip file in the 'logs' directory on the flash.
I left that for an hour and it eventually crashed completely, not even accepting SYSRQ commands.
Here's a diag.zip from the freshly booted system, attached.
Attachment: raptor-diagnostics-20170428-1041.zip