whiteatom Posted January 8, 2017

Hi there, I'm having trouble with my dockers not stopping when I try to stop the array. Everything just hangs and I can't even seem to kill them off with docker kill or docker rm -r. I noticed this yesterday when I tried to stop a docker from the web UI and couldn't. The GUI hung and I couldn't even restart from the command line. A scary hard reset later and everything was fine, but now it's happened again while trying to upgrade a docker. I have no idea where to go from here. Where do I look to determine what's causing this? How do I kill docker off? I'm pre-clearing a disk right now, so I reeeeeealy don't want to restart. Thanks.
Squid Posted January 8, 2017

Since you can telnet in, run diagnostics, and if that hangs:

cp /var/log/syslog.txt /boot/syslog.txt

and post the file when it happens again.
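Squid's fallback can be sketched as a tiny helper: if the diagnostics tool hangs, copy the raw syslog somewhere that survives a hard reset. The default paths below are the usual Unraid locations but are assumptions here; adjust them for your system.

```shell
#!/bin/sh
# Hedged sketch of the fallback described above: copy the live syslog to the
# flash drive so it survives a hard reset. Default paths are assumptions.
save_syslog() {
  src="${1:-/var/log/syslog}"
  dest="${2:-/boot/syslog.txt}"
  cp "$src" "$dest" || return 1
  echo "saved $src -> $dest"
}
# Usage on the server (via telnet/SSH):
#   diagnostics || save_syslog
```

The `diagnostics || save_syslog` pattern only falls back when the diagnostics command itself fails; if it hangs rather than fails, you would run save_syslog by hand from a second session.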
whiteatom Posted January 8, 2017

Diagnostics runs fine. Output attached. I looked through the Docker log; the only thing I see there is the "name conflict for CouchPotato" - that's from when I tried an upgrade of the container that started the problem today. I have tried removing some CA plugins (the auto-upgrade one for starters) to see if that changes things... no improvement.

Thanks for the help.

Update: I tried "docker exec -it CouchPotato /bin/bash" to see if I could get into the container. It does nothing for about 3 seconds and then just dumps me back to the host prompt. I tried the same command on a docker I haven't tried to stop and I get a root@<container-id> prompt immediately. If the terminal is not responding, it really looks like these containers are stuck at the tail end of a shutdown.

One other thing to note: I am running 2 other processes - a pre-clear, as already stated, and I am zeroing an old disk (still in the array) so I can remove it without having to rebuild parity (according to this process). This process always kinda screws with the system because one of the data drives is losing its file system. I have nothing mapped to that disk, only to user shares, and that drive is excluded from all shares, but I thought it was worth mentioning.

Cheers

knox-diagnostics-20170108-2021.zip
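The responsiveness check described above can be wrapped with a timeout so a hung container doesn't stall your shell the way the bare docker exec did. This is a hedged sketch, not an official tool; the container name in the comment is just an example.

```shell
#!/bin/sh
# Sketch of the quick check above: try a trivial command inside the container
# and give up after a few seconds. A healthy container returns immediately;
# one stuck mid-shutdown times out or errors out.
container_responsive() {
  name="$1"
  if timeout 5 docker exec "$name" true 2>/dev/null; then
    echo "$name: responsive"
  else
    echo "$name: stuck or stopped"
  fi
}
# e.g. container_responsive CouchPotato
```

Using `true` instead of an interactive `/bin/bash` avoids needing a TTY, so the check also works from scripts.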
Squid Posted January 9, 2017

Haven't seen that before. Best I can suggest would be to edit docker.cfg in the config folder on the flash drive and disable docker through it. Then from the command prompt do:

powerdown -r

which should be able to restart the server. After it comes back alive, delete the docker.img from the Docker tab and recreate it, then add the apps back in via CA and the Previous Apps section.
whiteatom Posted January 9, 2017

The powerdown command won't work either... the system is TOTALLY locked up by docker. Docker won't quit, the array won't stop, so nothing can be done. I set the docker config to no as you suggested, and I guess I have to go through another hard reboot (the pre-clear just finished). Before rebooting, I ran the diagnostics (attached), but here's the only indication of a problem I can find, from docker.log:

time="2017-01-09T13:44:12.367055840-03:30" level=info msg="Container 2d9e18b61a4dd529ab1924b36aa5312d43324fa5a0665b3984a95668a6f23d63 failed to exit within 10 seconds of SIGTERM - using the force"
time="2017-01-09T13:44:22.367326715-03:30" level=info msg="Container 2d9e18b61a4d failed to exit within 10 seconds of kill - trying direct SIGKILL"

Anyone else have any tips here?

knox-diagnostics-20170109-1411.zip
Squid Posted January 9, 2017 Share Posted January 9, 2017 you can try /etc/rc.d/rc.docker stop umount /var/lib/docker [code] and then see if the powerdown will work.... But I doubt either command will work properly Quote Link to comment
whiteatom Posted January 9, 2017

After I tried the powerdown -r, I looked in "ps aux" and /etc/rc.d/rc.docker stop had been running for 40 minutes before I got bored and hard reset the box. It's back up now without docker, so I'll try removing the img as you suggested.
Squid Posted January 9, 2017 Share Posted January 9, 2017 you can try /etc/rc.d/rc.docker stop umount /var/lib/docker [code] and then see if the powerdown will work.... But I doubt either command will work properly After I tried the powerdown -r, I looked in "ps aux" and the /etc/rc/d/rc.docker stop was running for 40mins before I got bored and hard reset the box. It's back up now without docker, so I'll try removing the img as you suggested. Like I said, I have no clue what went wrong on the update - never seen it before, and my apps update every week... Quote Link to comment
whiteatom Posted January 9, 2017

Ok, still no improvement. I renamed the docker.img to docker.old and started fresh. Added a new docker for Plex and it started fine - tested Plex and then tried to shut it down (clicked stop in the UI). The UI hung for about 20 seconds and then I got a pop-up on the screen reading "Execution error Error code". The docker.log shows the same thing:

time="2017-01-09T14:53:45.536132191-03:30" level=info msg="API listen on /var/run/docker.sock"
time="2017-01-09T15:08:08.694244741-03:30" level=info msg="Container c30d84dab0d26e609175a008cf49db034ea02b20979747192513d4801cfb5477 failed to exit within 10 seconds of SIGTERM - using the force"
time="2017-01-09T15:08:18.694564666-03:30" level=info msg="Container c30d84dab0d2 failed to exit within 10 seconds of kill - trying direct SIGKILL"

And the docker is still running, even though Plex has shut down. Any new ideas?

EDIT: eventually the docker did shut down (nothing listed in docker ps), but the logs don't show anything...
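The two log lines quoted above are dockerd's normal stop escalation: SIGTERM, a 10-second grace period, then SIGKILL. A quick grep pulls exactly these entries out of the daemon log; the log path default is the usual Unraid location and is an assumption here.

```shell
#!/bin/sh
# Pull the daemon's stop-escalation messages out of docker.log. The pattern
# matches the two entries quoted above; the default log path is an assumption.
find_stuck_stops() {
  log="${1:-/var/log/docker.log}"
  grep -E 'failed to exit within 10 seconds of (SIGTERM|kill)' "$log"
}
```

Seeing both the SIGTERM and the SIGKILL line for the same container ID means even SIGKILL had no immediate effect, which points at the kernel (blocked I/O) rather than the container's own shutdown handling.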
Squid Posted January 9, 2017 Share Posted January 9, 2017 Maybe post the whole diagnostics before you reboot. Might be something else going on Sent from my LG-D852 using Tapatalk Quote Link to comment
whiteatom Posted January 10, 2017

OK, moving a bunch of data now, but when it's done I'll run the diagnostics and reboot. Probably a hard reboot again... yikes! Something is screwy, because this has only happened since I went to dual parity. docker.img is not on the array, so I don't really see how it's related, but there's gotta be something holding up the docker processes.
whiteatom Posted January 10, 2017

Ok, an update for you all, because this is not a problem any more. It appears that when you are working the array hard and have a disk locked up, docker won't shut down cleanly - or at least the ones I have won't. Today I was tail -f'ing the docker.log from the earlier troubleshooting, and in the same second that my dd process (I'm zeroing an old disk for removal) ended, the docker log updated with the locked-up containers exiting and the updates I had requested processing. I'm not sure if this is a known behaviour, a limitation, or a bug, but it's pretty easy to work around now that I know. Someone else will probably run into it at some point, though.

whiteatom
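One way to confirm this diagnosis on a live system is to look for processes in uninterruptible sleep (ps state "D"): such tasks are waiting on disk I/O and cannot be killed, even by SIGKILL, which matches containers that refuse to die until a dd run finishes. This is a general Linux technique, not an Unraid-specific tool; the helper name is hypothetical.

```shell
#!/bin/sh
# Filter ps output down to processes in uninterruptible sleep (state D or
# D+ etc.), which are blocked on I/O and immune to signals until it completes.
list_dstate() {
  # Expects "STATE PID COMMAND" lines on stdin, e.g. from:
  #   ps -eo state,pid,comm
  awk '$1 ~ /^D/ { print $2, $3 }'
}
# e.g. ps -eo state,pid,comm | list_dstate
```

If a container's processes show up here while the array is being hammered, the stop timeout in docker.log is a symptom, not the cause.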
Mettbrot Posted October 2, 2017

Hi, I have a similar problem. For the past couple of weeks I haven't been able to stop the docker containers, leading to unRAID hanging on shutdown. I tried to update one of the dockers when I noticed the issue. After the download it said:

stopping docker: error
removing image: error
starting docker: unable to start because a docker with the same name is running

/etc/rc.d/rc.docker stop now runs forever and shutdown does not work. I had to force a shutdown several times, resulting in parity checks with up to 20 bad sectors. I recently removed the docker image file and reset everything, but now I have those errors again. Diagnostics are attached.

server-diagnostics-20171002-1503.zip
Slamer Posted April 26, 2018

Hi there, I'm experiencing the same issue. The docker image Nzbget is freezing and I can't stop it or remove it. This happened after adding a volume mapping:

-v /mnt/usr/Exchange/intermediate:/intermediate

as suggested by the install documentation from linuxserver/nzbget (https://hub.docker.com/r/linuxserver/nzbget/). I've tried using the command line but with no success:

docker stop <container id>

Diagnostics are attached: tower-diagnostics-20180426-1936.zip

Thanks for your help.
trurl Posted April 26, 2018

2 minutes ago, Slamer said: "This happened after adding a volume mapping -v /mnt/usr/Exchange/intermediate:/intermediate as suggested by the install documentation from linuxserver/nzbget"

I hope this is a typo, since /mnt/usr does not correspond to any actual storage and so would be a new folder created in RAM. All of the unRAID user shares are at /mnt/user.

Also looks like corruption in the cache pool:

Apr 26 10:37:14 Tower kernel: BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
Apr 26 10:37:14 Tower kernel: blk_partition_remap: fail for partition 1
Apr 26 10:37:14 Tower kernel: BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 1, rd 1, flush 0, corrupt 0, gen 0
Apr 26 10:37:14 Tower kernel: blk_partition_remap: fail for partition 1
Apr 26 10:37:14 Tower kernel: BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 1, rd 2, flush 0, corrupt 0, gen 0
Apr 26 10:37:14 Tower kernel: blk_partition_remap: fail for partition 1
Apr 26 10:37:14 Tower kernel: BTRFS error (device sdb1): bdev /dev/sdb1 errs: wr 2, rd 2, flush 0, corrupt 0, gen 0
JorgeB Posted April 26, 2018

Those are read/write errors on cache2:

Apr 26 10:37:14 Tower kernel: sd 1:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x01 driverbyte=0x00
Apr 26 10:37:14 Tower kernel: sd 1:0:0:0: [sdb] tag#0 CDB: opcode=0x28 28 00 00 60 08 a0 00 00 20 00
Apr 26 10:37:14 Tower kernel: print_req_error: I/O error, dev sdb, sector 6293664

Check the cables, then run a correcting scrub.
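JorgeB's suggestion as commands: a scrub reads every block and, on a redundant btrfs pool, rewrites bad copies from the good mirror. The `/mnt/cache` mount point is the usual Unraid pool location but is an assumption here, and the helper name is hypothetical.

```shell
#!/bin/sh
# Hedged sketch of a correcting scrub on the cache pool. -B keeps the scrub
# in the foreground so the summary prints when it finishes, and the device
# stats afterwards show whether the error counters are still climbing.
run_scrub() {
  pool="${1:-/mnt/cache}"
  btrfs scrub start -B "$pool" && btrfs dev stats "$pool"
}
```

If the stats counters keep increasing after a scrub and a cable swap, the drive itself is the likely suspect.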
Slamer Posted April 27, 2018

Hi, I finally succeeded in restarting the server. After that I started a parity check and so far have 535048090 corrected sync errors after 18 hours. The estimated end time is in 23 hours! Is it normal to get that many errors from a parity check? Could the issue be coming from the HDDs? What's your recommendation for investigating? For information, I'm using:

2 SSDs: 250GB (eSATA) + 500GB (USB 3.0) for cache
2 HDs: 2x3TB (eSATA) for parity
6 HDs: 2x3TB (eSATA) + 4x2TB (USB 3.0) for data

Thanks a lot.
trurl Posted April 27, 2018

2 minutes ago, Slamer said: "Is it normal to get that many errors from a parity check?"

The only acceptable number of sync errors is exactly zero. After you correct those sync errors, do another parity check to make sure you have zero sync errors.
OFark Posted May 17, 2020

I just had this problem with Plex. I couldn't terminate it, the web page just showed a 503, and console access was instantly disconnected. How I fixed it was to go to the Docker tab, enable advanced mode, and force update it. Voila, up and running again.