Alex R. Berg

Members
  • Posts

    204
  • Joined

  • Last visited

  • Days Won

    1

Alex R. Berg last won the day on October 30 2018

Alex R. Berg had the most liked content!

1 Follower

About Alex R. Berg

  • Birthday 05/09/1977


  • Gender
    Male
  • Location
    Denmark


Alex R. Berg's Achievements

Explorer (4/14)

21 Reputation

  1. I'm running unRaid 6.12.4. This is likely not unRaid-specific, but I don't know.

     TL;DR: I can mount a LUKS drive, but sometimes luksClose won't close it; so far, waiting 15 minutes helps.

     I created a LUKS-encrypted file using this guide: https://blog.tian.it/create-encrypted-luks-sparse-file/

     When I unmount the mounted filesystem and then try to close the LUKS mapping with

     ```
     umount /cryptoarchive/stuff
     cryptsetup luksClose "/dev/mapper/stuff"
     ```

     I get the following error:

     ```
     Device /dev/mapper/stuff.img is still in use.
     ```

     Running `df` I can see the unmount worked, and the `umount` itself didn't fail. `lsof` shows that the loop-device file is (of course) still in use, but no files under the mount point /cryptoarchive/stuff are shown. Once, after waiting an hour or so, I tried remounting and then unmounting immediately afterwards, and that one time luksClose succeeded.

     I have shared the mount point via Samba, but as mentioned, umount works. I have attached my mount and unmount scripts.

     Thanks, Alex

     luks-unmount.sh luks-mount.sh
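     For reference, these are the checks I'd run when luksClose claims the mapping is still in use (a rough sketch using the device and paths from above; standard device-mapper and util-linux tools):

     ```
     dmsetup info stuff            # "Open count" > 0 means something still holds the mapping
     lsof /dev/mapper/stuff        # any process that still has the mapper device open
     fuser -vm /dev/mapper/stuff   # processes using the filesystem on that device
     losetup -a                    # loop devices still attached to the backing file
     udevadm settle                # wait for pending udev events before retrying luksClose
     ```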
  2. Ah, I had forgotten that part of the mover semantics: the mover will leave any file currently in use. That makes my hypothesis less likely, though the mover might start moving a file that only then becomes used. I haven't seen the problem at all since changing my Duplicati + Resilio Sync dockers to use /mnt/disk* and /mnt/cache instead of /mnt/user; before, it happened roughly once a month.

     But now I have reliably reproduced the issue. The reliable way is unfortunately just calling the move script incorrectly, but maybe you can find something in it. This makes it more likely that the actual bug is in the move command, or in how the move command uses shfs (or something like that), since it still manages to crash the shfs process.

     Note that what I reproduced here is not what I did in the previous error reports. Previously it seemed quite reasonable that the actual mover was causing the error, not my custom move script, because the time of day was exactly when the actual mover ran.

     ---

     I configured one of my shares to be cache-only. Currently the share's files are on disk1. This caused the issue immediately every time I ran my custom mover, which uses Limetech's move command (but uses it incorrectly, as I discovered).

     I now know that I can reproduce the issue with all of the following in place:

     Docker stopped
     Parity check not running
     Mover disabled using touch /var/run/mover.pid, and checked that it is not running

     To crash the server I simply run:

     echo /mnt/user0/home/svnbackup/dead.letter | /usr/local/bin/move -e 0xfffffffe

     Other files on the same share give the same result when using /mnt/user0. `0xfffffffe` used to be the magic word for "move to cache" (many years ago). However, I can see that the mover script has changed and no longer uses the 'move' command in this way, so maybe this problem has nothing to do with the other problem, though it seems to crash in a similar way (different code, but also a different unRaid version).

     If I run this (which seems to be what mover does), then I get:

     > echo /mnt/user0/home/svnbackup/dead.letter | /usr/local/bin/move ""
     move_object: /mnt/user0/home/svnbackup/dead.letter File exists

     > find /mnt/*/home/svnbackup -maxdepth 0
     /mnt/disk1/home/svnbackup
     /mnt/user/home/svnbackup
     /mnt/user0/home/svnbackup

     If I run it CORRECTLY, using disk1 instead of user0:

     > echo /mnt/disk1/home/svnbackup/dead.letter | /usr/local/bin/move ""

     then it just moves the file to cache as it should.

     I'm running unRaid 6.12.4 now.

     PS: I'm now following the topic, so that should help me notice late comments.

     tower-diagnostics-20231105-1706-shfs-crash-5-anon.zip
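     Condensed, the reproduction looks like this (my paths; stopping Docker from the shell via rc.d is just one way to do it, the GUI works too, and the -e 0xfffffffe flag is the old "move to cache" magic word that current mover scripts apparently no longer use):

     ```
     # make sure nothing else is touching the array
     /etc/rc.d/rc.docker stop            # or stop Docker in the GUI
     touch /var/run/mover.pid            # keep the scheduled mover from starting
     pgrep -f /usr/local/sbin/mover      # confirm it is not already running

     # incorrect invocation via the user0 share -> crashes shfs on my box
     echo /mnt/user0/home/svnbackup/dead.letter | /usr/local/bin/move -e 0xfffffffe

     # correct invocation via the disk share -> the file is moved to cache as expected
     echo /mnt/disk1/home/svnbackup/dead.letter | /usr/local/bin/move ""
     ```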
  3. I had a leading space in my 'Docker vDisk location:', which caused the docker image to be located on root / in a directory called ' ' (space). Since root is in memory, it is emptied on each reboot. It was quite difficult for me to figure out that this was the file being written.

     It is easy to get into that situation: just stop Docker, add a leading space, then apply and enable Docker. It is difficult to get out of it, because the GUI will not let me remove the space or change it to a valid location, UNLESS I use the GUI folder popup navigation. If I click my way to the correct folder, the GUI updates the location.

     I suspect the issue surfaced after I changed my docker image to be located on an external USB drive mounted with Unassigned Devices. Background: I'm running QNAP hardware with unRaid, I don't have room for an SSD, and I don't want disks spinning up. Hence I plugged in an extra USB drive, used Unassigned Devices (thank you, devs) to auto-mount it, and moved the docker image there.

     ---

     So my suggestion is to do a simple trim of the path on apply, or something like that.
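     A minimal sketch of what I mean by trimming; the actual GUI is PHP, so this bash version is only to illustrate the idea, and the example path is made up:

     ```
     trim_path() {
       local p="$1"
       p="${p#"${p%%[![:space:]]*}"}"   # strip leading whitespace
       p="${p%"${p##*[![:space:]]}"}"   # strip trailing whitespace
       printf '%s\n' "$p"
     }

     trim_path "  /mnt/disks/usb/docker.img "   # -> /mnt/disks/usb/docker.img
     ```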
  4. I have a theory of what is causing the crash, which I posted here: If the theory holds up, it also tells us how to avoid triggering the issue.
  5. Right, that fits with my experience. So it seems that when an unRaid process moves a file away from a disk via a disk share like /mnt/disk1 (such as the mover process) while a docker container accesses (reads?) the same file via the /mnt/user share, it crashes shfs. It's probably the remove operation 'rm /mnt/disk1/someFile' that conflicts with shfs reading that file; possibly it could also be related to that same file then appearing to shfs on another disk. My vague understanding is that shfs is what creates the virtual view /mnt/user as the union of /mnt/disk*.

     So if someone wants to reproduce this issue: create a docker that reads a large, GB-sized file from /mnt/user, and have a mover process running directly on unRaid that moves that file back and forth between two disks. A rough, untested sketch of that idea is below.

     The reason why I was only seeing it at night and not from my custom mover is almost certainly that my custom mover runs every 5 minutes, checks that the disks are idle, and only then moves files:

     */5 * * * * /home/alex/bin/move_if_idle > /dev/null

     See the attached files for the content of my move_if_idle.

     move_if_idle are_disks_idle
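     The sketch (untested; share name and file are placeholders):

     ```
     # reader: hammer the file through the shfs user share from inside a container
     docker run --rm -v /mnt/user/testshare:/data alpine \
         sh -c 'while true; do cat /data/bigfile > /dev/null; done' &

     # writer: on the unRaid host, shuffle the same file between two array disks
     while true; do
         mv /mnt/disk1/testshare/bigfile /mnt/disk2/testshare/ 2>/dev/null
         mv /mnt/disk2/testshare/bigfile /mnt/disk1/testshare/ 2>/dev/null
         sleep 1
     done
     ```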
  6. Thank you, I hadn't noticed that line as being the first line, or I hadn't previously followed up on it. It does (did) keep happening. Now I have changed my dockers that access lots of files (backup + Resilio Sync) so they only access my files via disk shares, and so far it looks good. It's a bit too early to tell, though. I'll report back in a few weeks on whether it has been stable by then. It was happening maybe once or twice a week when my backup was processing all my files, checking checksums through /mnt/user.

     I've been exporting diagnostics when it happened many times back last year, and I just checked the previous logs.

     2022:

     Aug 22 03:40:01 Tower move: error: move, 392: No such file or directory (2): lstat: /mnt/cache/sync/AppBackup/.sync/Archive
     Aug 22 03:40:01 Tower move: error: move, 392: No such file or directory (2): lstat: /mnt/cache/sync/AppBackup/.sync
     Aug 22 03:40:02 Tower kernel: shfs[23214]: segfault at 0 ip 00000000004043a4 sp 0000152d0c670780 error 4 in shfs[402000+c000]
     Aug 22 03:40:02 Tower kernel: Code: 48 8b 45 f0 c9 c3 55 48 89 e5 48 83 ec 20 48 89 7d e8 48 89 75 e0 c7 45 fc 00 00 00 00 8b 45 fc 48 63 d0 48 8b 45 e0 48 01 d0 <0f> b6 00 3c 2f 74 43 8b 05 8f df 00 00 85 c0 78 2f e8 16 e0 ff ff
     Aug 22 03:40:02 Tower move: error: move, 392: No such file or directory (2): lstat: /mnt/cache/sync/AppBackup
     Aug 22 03:40:02 Tower move: error: move, 392: No such file or directory (2): lstat: /mnt/cache/sync
     Aug 29 03:40:01 Tower move: error: move, 392: No such file or directory (2): lstat: /mnt/cache/sync/AppBackup/.sync/Archive
     Aug 29 03:40:01 Tower move: error: move, 392: No such file or directory (2): lstat: /mnt/cache/sync/AppBackup/.sync
     Aug 29 03:40:01 Tower kernel: shfs[1963]: segfault at 0 ip 00000000004043a4 sp 0000147af70f3780 error 4 in shfs[402000+c000]
     Aug 29 03:40:01 Tower kernel: Code: 48 8b 45 f0 c9 c3 55 48 89 e5 48 83 ec 20 48 89 7d e8 48 89 75 e0 c7 45 fc 00 00 00 00 8b 45 fc 48 63 d0 48 8b 45 e0 48 01 d0 <0f> b6 00 3c 2f 74 43 8b 05 8f df 00 00 85 c0 78 2f e8 16 e0 ff ff
     Aug 29 03:40:01 Tower move: error: move, 392: No such file or directory (2): lstat: /mnt/cache/sync/AppBackup
     Aug 29 03:40:01 Tower move: error: move, 392: No such file or directory (2): lstat: /mnt/cache/sync
     Sep 2 03:40:01 Tower move: error: move, 392: No such file or directory (2): lstat: /mnt/cache/sync/AppBackup/.sync/Archive
     Sep 2 03:40:01 Tower move: error: move, 392: No such file or directory (2): lstat: /mnt/cache/sync/AppBackup/.sync
     Sep 2 03:40:01 Tower kernel: shfs[16416]: segfault at 0 ip 00000000004043a4 sp 0000152977cfb780 error 4 in shfs[402000+c000]
     Sep 2 03:40:01 Tower kernel: Code: 48 8b 45 f0 c9 c3 55 48 89 e5 48 83 ec 20 48 89 7d e8 48 89 75 e0 c7 45 fc 00 00 00 00 8b 45 fc 48 63 d0 48 8b 45 e0 48 01 d0 <0f> b6 00 3c 2f 74 43 8b 05 8f df 00 00 85 c0 78 2f e8 16 e0 ff ff
     Sep 16 03:40:02 Tower kernel: shfs[27859]: segfault at 0 ip 00000000004043a4 sp 000014caa2cb2780 error 4 in shfs[402000+c000]
     Sep 16 03:40:02 Tower kernel: Code: 48 8b 45 f0 c9 c3 55 48 89 e5 48 83 ec 20 48 89 7d e8 48 89 75 e0 c7 45 fc 00 00 00 00 8b 45 fc 48 63 d0 48 8b 45 e0 48 01 d0 <0f> b6 00 3c 2f 74 43 8b 05 8f df 00 00 85 c0 78 2f e8 16 e0 ff ff

     2023:

     Jun 18 03:40:02 Tower kernel: shfs[4635]: segfault at 0 ip 000055c0d617359c sp 0000150db53fa810 error 4 in shfs[55c0d6171000+c000]
     Jun 18 03:40:02 Tower kernel: Code: 48 8b 45 f0 c9 c3 55 48 89 e5 48 83 ec 20 48 89 7d e8 48 89 75 e0 c7 45 fc 00 00 00 00 8b 45 fc 48 63 d0 48 8b 45 e0 48 01 d0 <0f> b6 00 3c 2f 74 4d 8b 05 67 dd 00 00 85 c0 78 39 e8 9e da ff ff
     Aug 10 03:40:02 Tower kernel: shfs[8688]: segfault at 0 ip 000055a8e226f59c sp 000014bf269d9810 error 4 in shfs[55a8e226d000+c000]
     Aug 10 03:40:02 Tower kernel: Code: 48 8b 45 f0 c9 c3 55 48 89 e5 48 83 ec 20 48 89 7d e8 48 89 75 e0 c7 45 fc 00 00 00 00 8b 45 fc 48 63 d0 48 8b 45 e0 48 01 d0 <0f> b6 00 3c 2f 74 4d 8b 05 67 dd 00 00 85 c0 78 39 e8 9e da ff ff

     A picture presents itself. I have lots of things running during the day, but I also log whenever run-parts executes something, and in the examples above the previous jobs finished long before. Lines like these:

     Aug 29 03:24:49 Tower /etc/cron.daily/userScripts/run_vps_up_test: Executing
     Aug 29 03:24:49 Tower /etc/cron.daily/userScripts: run-parts completed

     I also checked all my other crontab jobs and userScripts, and I'm fairly certain I have nothing else running at that specific time.

     Hmm, so I can just disable the mover and then my problems will probably be fixed? That sounds very strange, given that I'm using my own custom mover that is based on the mover codebase and runs it regularly throughout the day. I run this at 0:40 every day, so if it were just the mover script, this should also have triggered the issue sometimes (unless another script of mine turned off the docker to make backups, but I don't think I have that any longer):

     #!/usr/bin/bash
     $ALEXBIN/custom_mover /usr/local/sbin/mover > /dev/null
     # Completion check was previously moved to appBackup which turns off docker and services: \\towerAlex\appbackup\backupRsync
     # but now back here as there's no longer an appbackup
     mover_completion_check

     I don't know if the scheduled mover does anything different from running '/usr/local/sbin/mover'.

     ---

     I'll leave my docker changes in place so everything runs on /mnt/disk* for the next month or two. I'll also change the mover schedule to run at another time, so if it crashes again I would expect a new timestamp. If it runs smoothly for a couple of months, I'll try changing my dockers to use /mnt/user again and see if the problem resurfaces. Would it be helpful to anyone else tracking this down if I report back and try to recreate the issue?

     PS: I updated the title to include 'shfs segfault', to make it easier for others to find this issue.
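     For anyone checking their own logs, this is roughly how I pull out the relevant lines to see whether a crash lines up with a mover run:

     ```
     grep -hE 'shfs.*segfault|move: error' /var/log/syslog* | sort | less
     ```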
  7. Thank you, itimpi. Here are my diagnostics. I have further anonymized the syslog: it echoed all my sudo lines and lots of scripts I was running. I removed the lines that seemed private, but kept those at the end, around where the problem arises, in the hope of not removing anything important for the diagnostic process.

     Best, Alex

     tower-diagnostics-20230729-0846-anonymized-extra-anonymized.zip
  8. I have this problem and have had it for probably a couple of years now.

     $ ls /mnt/user
     /bin/ls: cannot access '/mnt/user': Transport endpoint is not connected

     There have been many others reporting the same problem:

     https://forums.unraid.net/topic/102568-transport-endpoint-is-not-connected-with-mntuser/
     https://forums.unraid.net/topic/115300-binls-cannot-access-user-transport-endpoint-is-not-connected/

     The first mentions it could be docker accessing /mnt/user. This could very well fit my use case, because I use Duplicati (running via docker) to back up both my /mnt/disk* and /mnt/user, and let Duplicati deduplicate the double backup. So if a disk plus parity fails I can restore the files from that disk, and if I lose a /mnt/user folder I can restore it across all disks.

     Has anyone found a solution to this problem? Is there a different way of using docker so that it can access /mnt/user without the issue? Has anyone determined that docker IS the cause of this issue (at least in their case)?
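     For what it's worth, when it happens I check whether the shfs FUSE mount is simply gone (generic Linux checks, nothing unRaid-specific):

     ```
     mount | grep /mnt/user                      # is the fuse mount still listed?
     ps aux | grep '[s]hfs'                      # is the shfs process still alive?
     grep -i 'shfs.*segfault' /var/log/syslog    # did it crash?
     ```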
  9. There's a bug in the detection of whether WORK_DIR is on persistent storage: it always says persistent unless WORK_DIR is on 'ramfs' (the first entry in the array). The code below fixes it by adding 'tmpfs' to the array and by not breaking out of the loop on the first mismatch (in the .plg file):

     denyhosts_datacheck() {
       array=( ramfs proc tempfs sysfs tmpfs )
       fs=$( stat -f -c '%T' $WORK_DIR )
       if [ "$fs" = "msdos" ]; then
         echo "<p style=\"color:red\;\"><b>WARNING:</b> Your WORK_DIR is located on your flash drive. This can decrease the life span of your flash device!</p>"
       else
         found=0
         for i in "${array[@]}"
         do
           if [[ "$i" = "$fs" ]]; then
             echo "<p style=\"color:red\;\"><b>WARNING:</b> Your WORK_DIR is not persistent and WILL NOT survive a reboot. The WORK_DIR maintains a running history of past DenyHosts entries and ideally should be maintained across reboots. Please locate your WORK_DIR on persistent storage. eg. cache/array disk</p>"
             found=1
             break
           fi
         done
         if (( ! $found )); then
           echo "<p style=\"color:green\;\">WORK_DIR located on persistent storage. Your data will persist after a reboot :-)</p>"
         fi
       fi
     }

     I'm not really sure whether it's a good idea to put WORK_DIR on /boot, due to spamming writes on the USB flash drive, and I would also prefer it not to depend on /mnt being available. So I suspect the best option would be copying to/from /boot on start/stop or mount, or something like that. What do others do to persist the data, and what is the data? I would be fine with moving the deny lists to /boot, as I expect those are not written frequently, unless of course I'm spammed from an unlimited number of IPv6 addresses... (if that can happen...)
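     A rough, untested sketch of the copy-to/from-/boot idea (the paths here are just examples, not the plugin's actual locations):

     ```
     PERSIST=/boot/config/plugins/denyhosts/data   # example location on flash
     WORK_DIR=/var/lib/denyhosts                   # wherever your WORK_DIR actually is

     case "$1" in
       start)  # restore history from flash before the service starts
         mkdir -p "$WORK_DIR"
         cp -a "$PERSIST/." "$WORK_DIR/" 2>/dev/null
         ;;
       stop)   # save history back to flash once, at stop/shutdown
         mkdir -p "$PERSIST"
         cp -a "$WORK_DIR/." "$PERSIST/"
         ;;
     esac
     ```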
  10. TL;DR: see below for my solution to revert the so-called 'SSH Improvements' without reverting the security improvements. It is simply to 1) revert the HOME=/root change in /etc/profile, 2) fix users via usermod, and 3) possibly change sshd_config.

     Man, that was one seriously annoying and very badly documented change. It effectively and purposefully disabled login for all non-root accounts on the Linux system that unRaid is. At least they could have made it bloody clear in the 6.9 change-set documentation what they changed, what the consequences would be for users who previously logged in with non-root accounts, and how the changes were implemented (like the change in /etc/profile), so we could undo them if needed. It's not rocket science on Linux to use an admin account instead of root.

     Thank you to the people of this thread for presenting the issue and the changes, that was much appreciated. And thank you to @Mihai who found the security issue. I do appreciate fixing the security, but there's zero security risk in having extra non-root users available for login via SSH key, as opposed to just root (or please enlighten me).

     I have found a solution for me, though in time I suppose it's wise to take the hint that Lime-Tech will happily break our multi-user setups again in the future, so I guess docker or a VM. But for now that's too big a hassle for me to migrate all my stuff there. My solution involves reverting the annoying 'export HOME=/root' change in /etc/profile. Reverting to the basic Linux standard can hardly be an issue, security-wise.

     My Solution

     At boot, in the go script, I copy all files under /boot/custom/* to / (root), so /boot/custom/etc/profile goes to /etc/profile, and the original /etc/profile goes to /boot/custom_original/etc/profile, so on unRaid upgrades I can detect changes (if I so desire). A cut-down sketch of that go-script step is at the bottom of this post.

     1) Revert the /etc/profile change: 'export HOME=/root'

     2) Fix user home dir and shell (/etc/passwd): for each non-root user that needs a login shell, I run usermod to set the home dir and shell, for instance:

     usermod -d /mnt/user/home/alex -s /bin/bash alex
     usermod -d /home/admin -s /bin/bash admin

     3) sshd_config: Password security in SSH is not good enough for me, and it shouldn't be good enough for you if your server can be reached from the internet via SSH. For that I use these sshd_config changes. I accepted Limetech's change of specifying which users get `AllowTcpForwarding`, I just allow more users:

     # To disable tunneled clear text passwords, change to no here!
     PasswordAuthentication no
     PermitEmptyPasswords no
     ChallengeResponseAuthentication no

     # Limit access to the following users
     AllowUsers root admin alex

     # limetech - permit only root SSH tunneling
     AllowTcpForwarding no
     Match Group root
         AllowTcpForwarding yes
     Match User alex
         AllowTcpForwarding yes
     Match User admin
         AllowTcpForwarding yes

     3a) I place sshd_config in /boot/custom/etc/ssh instead of /boot/config/ssh, for the simple reason that my go script backs up unRaid's original sshd_config via the custom folder. If you don't use my custom script, just go with /boot/config/ssh. The keys I leave in /boot/config/ssh, but I also copy them to /home/admin/.ssh so I have my admin user available when the array isn't mounted.

     unraid-custom-fix-ssh-improvements.zip
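     The cut-down go-script sketch mentioned above (untested as shown here; my real script does a bit more):

     ```
     # overlay /boot/custom onto the root filesystem, keeping a copy of the stock files
     cd /boot/custom || exit 1
     find . -type f | while read -r f; do
         dest="/${f#./}"
         orig="/boot/custom_original/${f#./}"
         mkdir -p "$(dirname "$dest")" "$(dirname "$orig")"
         [ -f "$dest" ] && [ ! -f "$orig" ] && cp "$dest" "$orig"   # stock copy, taken once
         cp "$f" "$dest"
     done
     ```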
  11. The unRaid > Shares GUI does not correctly present the state of the includes defined for a share when the share's include list references non-existing disks. Any change in the GUI (such as the description) will remove non-existing disks from the share's include definition.

     I shrunk my array from four disks to one. After that operation I could not write to any disk via the user-share mount ("no space available" error), and there was no information about this in the unRaid > Shares GUI. I realised it was caused by /boot/config/shares/foobar.cfg containing shareInclude="disk2,disk3" while I only had disk1. I tested it: writing is possible if shareInclude="disk1,disk2", but then when updating the share it becomes shareInclude="disk1".

     Since the OS updates the share automatically after save, it would make sense for it to fix the shares when mounting the array for the first time with fewer disks, simply removing from include/exclude any disk that doesn't exist. Or it would be nice to have a button on the share overview page that did this.
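     Until something like that exists, a quick (untested) way to spot the problem from the command line is to scan the share configs for disks that are no longer present:

     ```
     for cfg in /boot/config/shares/*.cfg; do
         for disk in $(grep -oE 'disk[0-9]+' "$cfg" | sort -u); do
             [ -d "/mnt/$disk" ] || echo "$cfg references missing $disk"
         done
     done
     ```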
  12. Hi, I'm interested in getting fan control to work based on disk temperatures on QNAP hardware with unRaid. I have bash experience. I've forgotten which packages to install on unRaid to get C compilation working, so I would appreciate a few pointers (I have the DevPack/dev tools plugin, but make fails with the 'libguile-2.2' library missing). Also, your script is gone; if you share a gist or a GitHub repo I can take it from there. Best, Alex
  13. cache-dirs will never be real, proper caching. It's a hack where we try to poke Linux into caching directories; maybe that's what you're encountering. You can try enabling logging. There's some info on the logging somewhere in this thread, I believe; also see /var/log/cachedirs or something like that. It tells you when it scans everything again, which on my system happens daily, especially during large operations like a parity check that put load on the system. Or it could be that it just works really badly for you for unknown reasons. What you tried with cache pressure matches my experience; for safety I use 1, but I also tried 0 briefly.
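     For reference, the cache-pressure setting mentioned above is the standard Linux sysctl, so you can inspect and change it directly (the values are just what I happen to use):

     ```
     sysctl vm.vfs_cache_pressure          # show the current value
     sysctl -w vm.vfs_cache_pressure=1     # keep dentry/inode caches around aggressively
     # 0 means "never reclaim", which can eat memory, so I only tried it briefly
     ```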
  14. I'm not sure it's possible; that has never been tested. But most likely one of these will work. You can double-escape with "\"foo bar\"". You can also escape the space instead: "_CCC\ SafetyNet", which should be equivalent to the above. And maybe you need to escape the backslash as well, so you add '\\', which might turn into "\": "_CCC\\\ SafetyNet". It depends on how many layers of quote removal happen between the GUI and the execution. Bash is terrible when it comes to spaces in names, since bash operates on strings and interprets spaces as new arguments when calling another function. And unfortunately the script is written in bash; it probably grew from something small into something substantial.
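     If you want to see what survives each layer of quoting, a tiny experiment like this in a shell shows how the arguments end up (printf prints each argument in brackets):

     ```
     printf '[%s] ' "_CCC SafetyNet"; echo          # quoted: one argument
     printf '[%s] ' _CCC\ SafetyNet; echo           # escaped space: still one argument
     printf '[%s] ' "\"_CCC SafetyNet\""; echo      # escaped quotes survive as literal characters
     printf '[%s] ' _CCC SafetyNet; echo            # unquoted: splits into two arguments
     ```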
  15. I also like Everything. I've set it to scan daily, but I also have lots of non-Windows shared files, like subversion repositories and backups, that I don't need it to touch. I don't really think cache-dirs is relevant for that usage, as it's only a daily scan for me, but of course it's nice that it's there.

     I don't know why that happens. I do believe unRaid (with some settings) deletes empty folders on array disks after moving, but it doesn't delete empty folders that we, the users, have left there. I had some share settings set to Use Cache: Prefer or Only. I do believe that after moving, unRaid deletes empty folders on the source disk. But I have also seen that unRaid does not move files to reflect changed settings, so maybe you have to delete empty dirs manually.

     find /mnt/disk* /mnt/cache -mindepth 1 -type d -empty -delete

     should delete empty dirs, and only that, and thus clean up your entire array. I advise testing such commands first in a safe location. Maybe add -maxdepth 1 to see what would go at the top level, and maybe drop -mindepth 1 so empty top-level folders are cleaned up too. You can play around without the -delete option to see what it finds; see the example below. I have no idea whether this has anything to do with your problem, I just noticed you mentioned it scanning empty dirs. I'm using unRaid 6.8.3.
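     For example, a safe way to try it is a dry run first, then the real delete:

     ```
     find /mnt/disk* /mnt/cache -mindepth 1 -type d -empty            # dry run: just list them
     find /mnt/disk* /mnt/cache -mindepth 1 -type d -empty -delete    # actually delete them
     ```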