FAQ for unRAID v6



How can I use ddrescue to recover data from a failing disk?

 

Data loss can happen for a variety of reasons: a disk failing while parity is invalid, two disks failing with single parity, or a failing disk with pending sectors and no way to rebuild it using parity. In those cases you can use ddrescue to salvage as much data as possible.

 

To install ddrescue, install the NerdTools plugin, then go to Settings -> NerdTools and install ddrescue.

 

You need an extra disk (same size or larger than the failing disk) to clone the old disk to. Using the console/SSH, type:

 

ddrescue -f /dev/sdX /dev/sdY /boot/ddrescue.log

Neither the source nor the destination disk can be mounted during the clone. Replace X with the source disk and Y with the destination, and always triple-check these: if the wrong disk is used as destination it will be overwritten, deleting all data.
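
If in doubt about which letter belongs to which disk, you can list all devices first (a quick sanity check; the columns shown are standard lsblk options):

lsblk -o NAME,SIZE,MODEL,SERIAL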

 

If this is not the first time you've used ddrescue, make sure you use a different log file (/boot/ddrescue.log above) or delete the existing one, otherwise ddrescue will resume the previous run and possibly not do anything.

 

It's also possible to use an array disk as destination, though only if it's the same size as the original. To maintain parity you can only clone the partition, so the existing array disk must already be a formatted unRAID disk (any filesystem). Also to maintain parity you need to write to the md# device, and the array must be started in maintenance mode, i.e., it won't be accessible during the copy. Use the command:

 

ddrescue -f /dev/sdX1 /dev/md# /boot/ddrescue.log


Replace X with the source disk (note the 1 in the source disk identifier) and # with the destination disk number. Enabling turbo write first is recommended, or the clone will take much longer.
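
For example, to clone the partition of a failing disk currently at sdb onto array disk 3 (hypothetical identifiers, adjust to your system):

ddrescue -f /dev/sdb1 /dev/md3 /boot/ddrescue.log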

 

 

Example output during the 1st pass:

 

GNU ddrescue 1.22
     ipos:  926889 MB, non-trimmed:    1695 kB,  current rate:  95092 kB/s
     opos:  926889 MB, non-scraped:        0 B,  average rate:  79236 kB/s
non-tried:    1074 GB,  bad-sector:        0 B,    error rate:       0 B/s
  rescued:  925804 MB,   bad areas:        0,        run time:  3h 14m 44s
pct rescued:   46.28%, read errors:       54,  remaining time:      3h 18m
                              time since last successful read:          0s
Copying non-tried blocks... Pass 1 (forwards)

After copying all the good blocks ddrescue will retry the bad blocks, forwards and backwards. This last part can take some time, depending on how bad the disk is. Example:

 

GNU ddrescue 1.22
     ipos:   17878 MB, non-trimmed:        0 B,  current rate:       0 B/s
     opos:   17878 MB, non-scraped:   362496 B,  average rate:  74898 kB/s
non-tried:        0 B,  bad-sector:    93696 B,    error rate:     102 B/s
  rescued:    2000 GB,   bad areas:      101,        run time:  7h 25m  8s
pct rescued:   99.99%, read errors:      260,  remaining time:         25m
                              time since last successful read:         10s
Scraping failed blocks... (forwards)

After the clone is complete you can mount the destination disk manually or with, for example, the UD plugin (if the cloned disk is unmountable, run the appropriate filesystem repair tool; it's also a good idea to run a filesystem check even if it mounts OK) and copy the recovered data to the array. Some files will likely be corrupt; if you have checksums or are using btrfs you can easily find out which ones, if not see below.

 

 

 

 

If you don't have checksums for your files (and aren't using btrfs) there's still a way to check which files were affected:

 

Create a temporary text file with a text string not present in your data, e.g.:

printf "unRAID " >~/fill.txt

Then fill the bad blocks on the destination disk with that string:

ddrescue -f --fill-mode=- ~/fill.txt /dev/sdY /boot/ddrescue.log

Replace Y with the cloned disk (not the original) and use the existing ddrescue mapfile.

 

Finally, mount the disk (manually or, for example, using the UD plugin) and search for that string:

 

find /mnt/path/to/disk -type f -exec grep -l "unRAID" '{}' ';'

Replace /path/to/disk with the correct mount point. All files containing the string "unRAID" will be listed; those are your corrupt files. This will take some time, as every file on the disk is scanned, and output is only displayed at the end. If there's no output, the bad sectors were in areas without any files.

 

 

 

 


I'm getting low read speeds from my unRAID server, is there a fix?

 

There's an issue with the Samba version included with unRAID v6.2 and above that, with some hardware configurations, may give slower than normal read speeds for Windows 8/10 (and related server releases) clients. My tests indicate that the HDD brand/model used is one of the main factors. Write speed is not affected, and Windows 7 clients are also not affected.

 

To fix the issue, add this to "Samba extra configuration" on Settings -> SMB: *

max protocol = SMB2_02

Stop and re-start the array for the change to take effect; Windows clients may need to reboot to reconnect.

 

Unrelated to this, 10GbE users should make two more changes for better overall performance (reads and writes):

 

1-Change the NIC MTU to 9000 (on the unRAID server and any other computer with a 10GbE NIC), see the example after this list
2-Go to Settings -> Global Share Settings -> Tunable (enable direct IO): set to Yes **
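
To try jumbo frames from the console before making the change permanent in the network settings (eth0 assumed; this is not persistent across reboots):

ip link set eth0 mtu 9000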

 

* Should not be needed/make a difference in Unraid v6.5.x or newer

** Unlikely to make much of a difference starting with Unraid v6.8.x or newer due to changes in FUSE.

 

This last one may also improve performance for gigabit users in some hardware configurations when reading from user shares.

 


Can you explain the Cache drive option types, and what is the difference between them?

 

Here is a table illustrating the differences:

 


Table of Cache drive usage options and their behaviors

    C=Cache drive

    D=Data drive(s)

 

                     Cache:No   Cache:Yes   Cache:Only   Cache:Prefer

Data should be on:      D          C+D          C            C+D

New files first to:     D           C           C             C

Files overflow to:      -           D           -             D

Mover moves:            No        C to D        No          D to C

Orphaned files:         C           -           D             -

 


Notes:

- Orphaned files are files located where they don't belong (e.g. files on D with Cache:Only); they won't be moved by the Mover

- Files on both C and D are still visible in shares, for all options

- Shares are all root folders on all array data and Cache drives

- New files overflow to the secondary destination when there is not enough space on the preferred destination

- Cache:Prefer is the newest option.  In general, it is now preferred over Cache:Only because it behaves the same but adds overflow protection.  If you fill up the Cache drive, copying to that share will continue to a data drive, and not error out, as it would if marked Cache:Only.  And if the Cache drive drops out, you will still be able to continue, using a data drive for the same share.  Once the Cache drive is restored, then the Mover will move the share back to the Cache drive.


Some typical usage scenarios

  • Cache:Yes - data is written to the Cache drive, then Mover moves it to the data drives
    - This is the typical Cache drive usage for large shares, to speed up writes to the array.  The data will mainly be stored on the parity protected array, but writes will be at full speed to the Cache drive, then later moved at idle times to the array.
  • Cache:No - keeps all data on the data drives
    - This is similar to Cache:Yes, but doesn't use the Cache drive, which is fine if you don't need the speed boost when writing files to the shares.
    - An alternative usage is to keep most of the data on the array drives, but manually place selected data on a fast Cache drive, in the same share folders, for faster access to that data.  It is still visible in the share but won't be moved to the data drives.  For example, commonly accessed metadata might be placed there.  This may help keep the data drives from spinning up.  (This is similar to the alternative usage of Cache:Only)
  • Cache:Only - keeps all data on the Cache drive or pool
    - This is typically used for smaller shares or shares you want faster access to.
    - An alternative usage is to write and keep new data on the Cache drive, but manually move rarely accessed older files to the same share folders on the array data drives.  Both sets of files are visible in the share.  This may help minimize data drive spin up.  See this post.  (This is similar to the alternative usage of Cache:No)
  • Cache:Prefer - keeps data mainly on the Cache drive or pool, but allows overflow to the array
    - This is similar to Cache:Only, typically used for smaller shares or shares you want faster access to.  But it has additional advantages over Cache:Only - data that won't fit on the Cache drive can overflow to the array drives.  Also, if the Cache drive fails, the same share folders on the data drives will still continue working.  It's also useful if you don't yet have a Cache drive, but are planning to get one.  Once it is installed, the Mover will automatically (on its schedule) move all it can to the Cache drive.  And if you need to do maintenance on the Cache drive or pool, you can move all the files to the array, and they will be moved back once you are done 'maintaining'.

I have an unmountable BTRFS filesystem disk or pool, what can I do to recover my data?

 

Unlike most other file systems, btrfs fsck (check --repair) should only be used as a last resort.  While it's much better in the latest kernels/btrfs-tools, it can still make things worse.  So before doing that, these are the steps you should try in this order:

 

Note: if using encryption you need to adjust the path, e.g., instead of /dev/sdX1 it should be /dev/mapper/sdX1

 

1) Mount filesystem read only (safe to use)

 

Create a temporary mount point, e.g.:

mkdir /temp

Now attempt to mount the filesystem read-only.

v6.9.2 and older use:

mount -o usebackuproot,ro /dev/sdX1 /temp

v6.10-rc1 and newer use:

mount -o rescue=all,ro /dev/sdX1 /temp

For a single device: replace X with the actual device; don't forget the 1 at the end, e.g., /dev/sdf1

For a pool: replace X with any of the devices from the pool to mount the whole pool (as long as there are no devices missing); don't forget the 1 at the end, e.g., /dev/sdf1. If the normal read-only recovery mount doesn't work, e.g., because there's a damaged or missing device, use the option below instead.

v6.9.2 and older use:

mount -o degraded,usebackuproot,ro /dev/sdX1 /temp

v6.10-rc1 and newer use:

mount -o degraded,rescue=all,ro /dev/sdX1 /temp

Replace X with any of the remaining pool devices to mount the whole pool; don't forget the 1 at the end, e.g., /dev/sdf1. If all devices are present and it doesn't mount with the first device you tried, use the other(s); the filesystem on one of them may be more damaged than the other(s).

 

Note that if there are more devices missing than the profile permits for redundancy, it may still mount but there will be some data missing, e.g., mounting a 4-device raid1 pool with 2 devices missing will result in missing data.

 

With v6.9.2 and older, these additional options might also help in certain cases (with or without usebackuproot and degraded); with v6.10-rc1 and newer, rescue=all already includes all these options and more.

 

mount -o ro,notreelog,nologreplay /dev/sdX1 /temp

 

If it mounts, copy all the data from /temp to another destination, like an array disk; you can use Midnight Commander (mc on the console/SSH) or your favorite tool. After all data is copied, format the device or pool and restore the data.
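
For example, with rsync (the destination folder here is hypothetical, create it first; -a preserves attributes, -X copies extended attributes):

rsync -avX --progress /temp/ /mnt/disk2/recovered/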

 

2) BTRFS restore (safe to use)

 

If mounting read-only fails, try btrfs restore; it will try to copy all data to another disk. You need to create the destination folder first, e.g., create a folder named restore on disk2 and then:

btrfs restore -v /dev/sdX1 /mnt/disk2/restore

For a single device: replace X with the actual device; don't forget the 1 at the end, e.g., /dev/sdf1

For a pool: replace X with any of the devices from the pool to recover the whole pool; don't forget the 1 at the end, e.g., /dev/sdf1. If it doesn't work with the first device you tried, use the other(s).

If restoring from an unmountable array device use mdX, where X is the disk number, e.g., to restore disk3:

 

btrfs restore -v /dev/md3 /mnt/disk2/restore

 

If the restore aborts due to an error you can try adding -i to the command to skip errors, e.g.:

btrfs restore -vi /dev/sdX1 /mnt/disk2/restore

If it works, check that the restored data is OK, then format the original btrfs device or pool and restore the data.

 

 

 

3) BTRFS check --repair (dangerous to use)

 

If all else fails, ask for help on the btrfs mailing list or #btrfs on libera.chat. If you don't want to do that, as a last resort you can try check --repair:

 

If it's an array disk first start the array in maintenance mode and use mdX, where X is the disk number, e.g., for disk5:

btrfs check --repair /dev/md5

For a cache device (or pool) stop the array and use sdX:

btrfs check --repair /dev/sdX1

Replace X with the actual device (use cache1 for a pool); don't forget the 1 at the end, e.g., /dev/sdf1

 

 

 


Why do I see csrf errors in my syslog?

 

Starting with 6.3.0-rc9, unRaid includes code to prevent CSRF vulnerabilities. (See here.) Some plugins may have needed to be updated in order to work properly with this security measure.

 

There are 3 different errors that you may see logged in your syslog:

 

missing csrf_token - This error happens if you have plugins that either have not been updated to conform to the security system, or the version of the plugin you are running is not up to date. Should you see this error, check for and install updates for your plugins via the Plugins tab. To my knowledge, all available plugins within Community Applications have either been updated to handle csrf_tokens or were not affected in the first place. If updating your plugins does not solve your issue, post in the relevant support thread for the plugin. There will be hints on the log line as to which plugin generated the error.

 

wrong csrf_token - CSRF tokens are randomly generated at every boot of unRaid. You will see this error if you have one browser tab pointed at a page in unRaid and initiate a restart of unRaid from another tab. Note that the browser in question can be on any device on your network, including other computers, tablets, phones, etc. The fix: close the other browser tabs. This error can also be caused by mobile apps such as ControlR checking the status of unRaid after the server has been rebooted since the app was started. Restart the application to fix.

 

unitialized csrf_token - Thus far the community has never seen a report of this being logged. Presumably it is an error generated by unRaid itself during Limetech's debugging (i.e., not plugin related); should you see it, post your diagnostics in the release thread for the version of unRaid you are running. EDIT: There is a possibility that you may see this particular token error if your rootfs is completely full due to misconfiguration of an application.

 


Why can't I delete a file (without permissions from root/nobody/Unix user/999/etc)?

My VM/Docker created some files but I can't access them from Windows?

 

First a primer:

Unix filesystem permissions/ACLs (access control lists) in a nutshell

There are always 3 permission groups (owner, group, other)

  • owner - if you own the file, these permissions apply
  • group - if you are a member of the group, these permissions apply
  • other - if you are not the owner or member of the group, these permissions apply

Permissions are cumulative, there is no "deny" permission, so if one group grants permission, permission is granted.

 

You can easily check the permissions of a file from the shell with

root@Tower:~# ls -l /mnt/user0/slackware/
total 92
-rwxr-xr-x 1 nobody users   410 Aug 10  2016 getall.sh*
-rw-r--r-- 1 nobody users  5336 Oct 29 15:20 mirror-slackware-current.conf
-rwxr-xr-x 1 nobody users 39870 Nov 30  2013 mirror-slackware-current.sh*
-rw-r--r-- 1 nobody users  5397 Oct 29 15:20 mirror-slackware.conf
lrwxrwxrwx 1 root   root     27 Jan 28  2016 mirror-slackware.sh -> mirror-slackware-current.sh*
drwxrws--- 1 root   root     56 Jan 16  2014 multilib/
-rwxr-xr-x 1 nobody users  7165 May 20  2010 rsync_slackware_patches.sh*
drwxrws--- 1 root   root   4096 Jun 11  2015 sbopkgs/
lrwxrwxrwx 1 root   root     16 Jan 28  2016 slackware64 -> slackware64-14.1/
drwxrws--- 1 nobody users  4096 May 28  2016 slackware64-14.1/
drwxr-xr-x 1 root   root   4096 Dec  5 02:00 slackware64-14.2/
drwxrws--- 1 root   root   4096 Aug 11  2016 slackware64-14.2-iso/
drwxr-xr-x 1 nobody users  4096 Dec  5 02:01 slackware64-current/
drwxrws--- 1 nobody users  4096 May  1  2015 slackwarearm-14.1/

The permissions are displayed as the 10-character string at the start of the line

[l][rwx][rwx][rwx]
  • the first character just tells us the type of the file/directory/link we are working with
  • the first triad are the owner permissions, these are the permissions that apply to the owner of the file/directory/etc
  • the 2nd triad are the group permissions, these are the permissions that apply to the members of the group of the file/directory/etc
  • the last triad are the other/else permissions, these are the permissions that apply to users who are not the owner nor members of the group of the file/directory/etc

For files:

To read a file: read permission is needed. r--

To write a file: write permission is needed. -w-

To execute a file (as a script, or binary): execute is needed. --x

 

For directories:

To list the contents of a directory: read and execute are needed (r-x). Weird things happen otherwise

To create/delete files in a directory: write (and execute) permission is needed on the directory. -w-

 

Example:

So for a file /mnt/user/share/a/b

drwxrwxr-x 1 nobody users 2 Mar 15 11:57 a/
-rw-rw-rw- 1 nobody users 2 Mar 15 11:57 a/b

Unless you are root, nobody, or a member of users, the file b is impossible to delete, since write permission on the directory is missing.

The file itself, however, can be overwritten by anybody.

 

Now, Windows accesses the files over SMB.

SMB has two modes of access to the files; samba is the app providing the access.

  • Public/Guest access - (unRAID default) in this mode, all access is allowed. There are no passwords needed. Files and directories are created with the nobody user. Permissions are typically set to rwxrwxrwx which grant anybody read and write access
  • Private/Secure access - in this mode, users need to be defined and passwords assigned. Files and directories are owned by the user who created them. But when a share is created, unRAID assigns it to nobody with full read, write, execute for all (owner, group, and others).(ie drwxrwxrwx)

The problem begins when there is a VM or Docker container creating files. Let's say the VM is using the user backup.

Let's say user alice is trying to delete the old backups from her Windows PC.

Even if the shares are public, she will hit an error about requiring permissions from backup to delete the files. Why?

Because samba will be using the user nobody to delete the files made by the backup user, and typically the file permissions won't allow it.

If the shares are private/secure, it can still fail because the alice user is not the same as the backup user, and thus the permission problem exists again. (There are cases where this is not true, but that's a bit outside the scope of this FAQ)

 

How do we correct the issue?

The easiest way is to run Tools | New Permissions, which paves over all of the shares and disks so that files have rwxrwxrwx permissions and are owned by nobody.

But we may not want that, since our VMs and dockers are, in effect, separate OSes with their own users, which may or may not coincide with the new attributes.

So, we log in to the terminal (over SSH or console)

and from the terminal, we run:

root@Tower:~# chmod 777 -Rv /mnt/user/<share1> /mnt/user/<share2> ...

This will set the permissions of everything in the affected shares to rwxrwxrwx, which should normally fix the issue.
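
If you also want ownership reset to match what New Permissions would do, something like this works (share names are placeholders, adjust to yours):

chown -R nobody:users /mnt/user/<share1> /mnt/user/<share2>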

 

In case you have more complex settings or requirements, feel free to discuss them in the forums, as this requires case-by-case settings that might be applicable to your specific scenario.

On 4/18/2016 at 10:17 PM, RobJ said:

This thread is reserved for Frequently Asked Questions, concerning unRAID as a NAS, its setup, operation, management, and troubleshooting.  Please do not ask for support here, such requests and anything off-topic will be deleted or moved, probably to the FAQ feedback topic.  If you wish to comment on the current FAQ posts, or have suggestions or requests for the FAQ, please put them in the FAQ feedback topic.  Thank you!

 

Index to common questions

  Some are from the wiki FAQ, some from this thread, and some from the LimeTech web site.  There are many more questions with answers on the wiki FAQ.


Getting Started

General Questions

Cache Drive/Pool

Plugins

Maintenance and Troubleshooting


 

unRAID FAQs and Guides -

* Guides and Videos - comprehensive collection of all unRAID guides (please let us know if you find one that's missing)

* FAQ for unRAID v6 on the forums, general NAS questions, not for Dockers or VM's

* FAQ for unRAID v6 on the unRAID wiki - it has a tremendous amount of information, questions and answers about unRAID.  It's being updated for v6, but much is still only for v4 and v5.

* Docker FAQ - concerning all things Docker, their setup, operation, management, and troubleshooting

* FAQ for binhex Docker containers - some of the questions and answers are of general interest, not just for binhex containers

* VM FAQ - a FAQ for VMs and all virtualization issues

 

Know of a question that ought to be here?  Please suggest it in the FAQ feedback topic.

 

-------------------------------------------------------

Suggested format for FAQ entries - clearly shape the issue as a question or as a statement of the problem to solve, then fully answer it below, including any appropriate links to related info or videos.  Optionally, set the subject heading to be appropriate, perhaps the question itself.

 

While a moderator could cut and paste a FAQ entry here, only another moderator could edit it.  It's best therefore if only knowledgeable and experienced users create the FAQ posts, so they can be the ones to edit it later, as needed.  Later, the author may want to add new info to the post, or add links to new and helpful info.  And the post may need to be modified if a new unRAID release changes the behavior being discussed.

 

Moderators:  please feel free to edit this post.

 

Updated the links since the ones in the OP are no longer working... (Yes I had nothing else to do   :S)


How can I stop mover from running?  (Possibly unRaid 6.3.3+ only)

 

Since you can't stop the array while mover is running, to stop mover either from an SSH terminal or from the local keyboard/monitor, enter the following:

mover stop

 


Why is my GUI Slow and/or unresponsive?

 

This problem has been traced to an anti-virus program suite and its settings in several cases. The link below will take you to two posts which provide a rather complete description of the problem and its solution.

While you might not be running Avast, I have no doubt that other antivirus products will have a similar issue in the future. You should definitely investigate this area if you are having any type of problem with a slow, misbehaving or unresponsive GUI.

 

EDIT: Keep reading in the thread, as there is continuing investigation into the issues with Avast.


I'm having trouble with lockups / crashes / etc.  How can I see the syslog following a reboot?

 

All 3 of the methods below will continually write the syslog (as it changes) to the flash drive, up to the moment the lockup / crash / reboot of the server happens.

 

unRaid runs completely from RAM, so there is normally no way to view the syslog from one boot to another. However, there are a few different ways to preserve the syslog across boots.

 

Method Preferred: ENABLE THE SYSLOG SERVER AND MIRROR THE SYSLOG TO FLASH DRIVE (SETTINGS - SYSLOG SERVER)

 

Method 1:  Via the User Scripts Plugin:

Method 2: Via Fix Common Problems Plugin

  • Within Fix Common Problems settings, put it into Troubleshooting mode

Method 3: Via a screen session or at the local keyboard & monitor

tail -f /var/log/syslog > /boot/syslog.txt

 

 

Pros / Cons

 

Method 1 will create a new syslog file on the flash drive at every boot, so you can compare boots and not lose any historical data for reference

 

Method 2 logs a ton of extra information that may (or may not) help with diagnosing any issues. This extra logging, however, may itself contribute to a crash due to the log filling up if troubleshooting mode is enabled for more than a week or so. It also requires you to re-enable it on every boot (by design)

 

Method 3's information is identical to Method 1's, but requires you to re-enter the command every time you want the information. Additionally, if this is not entered at the local command prompt or via a screen session, closing the SSH (Putty) window will stop the logging.

 

BIG NOTE:

 

In the case of lockups, etc., it is highly advised to have a monitor connected to the server and take a picture of whatever is on it prior to rebooting the server. It is impossible for any script to capture errors that may have been output to the local monitor.

 


What are "Page Allocation Stalls?"

 

While not the most technical explanation, this is as far as I can tell pretty close to the actual truth:

 

https://forums.lime-technology.com/topic/59858-trace-error-found-635/?tab=comments#comment-587518

 

(Updating to unRaid 6.4.0 will most likely also solve this problem, as that OS version has better memory management)


Reformat a SAS HDD to a different block size (mainly 512) for use in UNRAID

This took me a few hours to find and work out, but was much needed.

OK, so I have just now done this for myself by installing sg3_utils onto my UNRAID OS,

all using the terminal:

1. download the package into a tmp dir: # wget http://slackware.cs.utah.edu/pub/slackware/slackware64-14.1/slackware64/l/sg3_utils-1.36-x86_64-1.txz

2. from that tmp dir, run this to install sg3_utils: # upgradepkg --install-new sg3_utils-1.36-x86_64-1.txz

3. use this command to list the SAS HDDs: # sg_scan -i

4. use this command to format (obviously /dev/XXX should be the HDD you wish to format, MAKE SURE IT'S THE RIGHT ONE!): # sg_format --format --size=512 -v /dev/XXX
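
5. (optional) once the format completes, you can confirm the new logical block size with sg_readcap, another utility from the same sg3_utils package (reading the capacity is harmless): # sg_readcap /dev/XXX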

this has been allowing me to reformat the block size and use previously unusable drives, saving a ton of money

WARNING: this format will destroy the HDD if interrupted, so if you can, a UPS is recommended

have a great day, I love UNRAID!


How can I monitor a btrfs or zfs pool for errors?

 

As some may have noticed, the GUI errors column for the cache pool is just for show, at least for now, as the error counter remains at zero even when there are errors. I've already asked and hope LT will use the info from btrfs dev stats/zpool status in the near future, but for now anyone using a btrfs or zfs cache or unassigned redundant pool should regularly monitor it for errors, since it's fairly common for a device to drop offline, usually from a cable/connection issue. Because there's redundancy, the user keeps working without noticing, and when the device comes back online on the next reboot it will be out of sync.

 

For btrfs a scrub can usually fix it (though note that any NOCOW shares can't be checked or fixed, and worse, if you bring an out-of-sync device back online btrfs can easily corrupt the data on the remaining good devices, since it can read from the out-of-sync device without knowing it contains invalid data), but it's good for the user to know there's a problem as soon as possible so it can be corrected. For zfs, the missing device will automatically be re-synced when it's back online.

 

BTRFS

Any btrfs device or pool can be checked for read/write errors with the btrfs dev stats command, e.g.:

btrfs dev stats /mnt/cache

It will output something like this:

[/dev/sdd1].write_io_errs    0
[/dev/sdd1].read_io_errs     0
[/dev/sdd1].flush_io_errs    0
[/dev/sdd1].corruption_errs  0
[/dev/sdd1].generation_errs  0
[/dev/sde1].write_io_errs    0
[/dev/sde1].read_io_errs     0
[/dev/sde1].flush_io_errs    0
[/dev/sde1].corruption_errs  0
[/dev/sde1].generation_errs  0

All values should always be zero, and to avoid surprises they can be monitored with a script using Squid's great User Scripts plugin. Just create a script with the contents below, adjust the path and pool name as needed, and I recommend scheduling it to run hourly. If there are any errors you'll get a system notification on the GUI and/or push/email if so configured.

#!/bin/bash
# only check the pool if it's actually mounted, to avoid false alarms
if mountpoint -q /mnt/cache; then
  # -c makes btrfs exit non-zero if any error counter is non-zero
  btrfs dev stats -c /mnt/cache
  if [[ $? -ne 0 ]]; then
    # send a warning through the Unraid GUI notification system
    /usr/local/emhttp/webGui/scripts/notify -i warning -s "ERRORS on cache pool"
  fi
fi

If you get notified, you can then check with the dev stats command which device is having issues and take the appropriate steps to fix them. Most times when there are read/write errors, especially with SSDs, it's a cable issue, so start by replacing the cables. Then, since the stats are for the lifetime of the filesystem, i.e., they don't reset with a reboot, force a reset of the stats with:

btrfs dev stats -z /mnt/cache

Finally run a scrub and make sure there are no uncorrectable errors, then keep working normally; if there are any more issues you'll get a new notification.
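
The scrub can be started from the GUI, or from the CLI (mount point as in the examples above):

btrfs scrub start /mnt/cache
btrfs scrub status /mnt/cache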

 

P.S. you can also monitor a single btrfs device or a non redundant pool, but for those any dropped device is usually quickly apparent.

 

ZFS:

For zfs click on the pool and scroll down to the "Scrub Status" section:

 

[screenshot: pool page "Scrub Status" section]

 

All values should always be zero, and to avoid surprises they can be monitored with a script using Squid's great User Scripts plugin. @Renegade605 created a nice script for that; I recommend scheduling it to run hourly. If there are any errors you'll get a system notification on the GUI and/or push/email if so configured.

 

If you get notified, you can then check in the GUI which device is having issues and take the appropriate steps to fix them. Most times when there are read/write errors, especially with SSDs, it's a cable issue, so start by replacing the cables. zfs stats clear after an array start/stop or reboot, but if that option is available you can also clear them in the GUI by clicking "ZPOOL CLEAR" below the pool stats.
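
The same check and reset can be done from the CLI (the pool name cache is assumed here, adjust to yours):

zpool status cache
zpool clear cache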

 

Then run a scrub and confirm there are no more errors, then keep working normally; if there are any more issues you'll get a new notification.

 

P.S. you can also monitor a single zfs device or a non redundant pool, but for those any dropped device is usually quickly apparent.

 

 

 

Thanks to @golli53 for a script improvement so errors are not reported if the pool is not mounted.

 


Why do I see "Cannot open root device null" and unRaid will not boot?

 


 

See this thread here: https://forums.unraid.net/topic/74419-tried-to-upgrade-from-653-to-66-and-wont-boot-up-after-reboot/

And in particular read from this post onwards: https://forums.unraid.net/topic/74419-tried-to-upgrade-from-653-to-66-and-wont-boot-up-after-reboot/?tab=comments#comment-710968

 


Fix Common Problems is telling me that Write Cache is disabled on a drive.  What do I do?

 

This test has nothing to do with any given unRaid version. For some reason, hard drive manufacturers sometimes disable write cache on their drives (in particular shucked drives) by default. This is not a problem per se, but you will see better performance with write cache enabled on the drive in question.

 

To do this, first make a note of the drive letter which you can get from the Main Tab

 

[screenshot: Main tab showing the device identifier (sdX)]

 

Then, from unRaid's terminal, enter the following (changing the sdX accordingly)

 

hdparm -W 1 /dev/sdm

You should get a response similar to this:


/dev/sdm:
 setting drive write-caching to 1 (on)
 write-caching =  1 (on)

If write caching stays disabled, then either the drive is a SAS drive, in which case you will need to use the sdparm command (google is your friend), or the drive may be connected via USB, in which case you may not be able to do anything about this.

 

99% of the time, this command will permanently set write caching on. In some rare circumstances the change is not permanent, and you will need to add the appropriate command to either the "go" file (/config/go on the flash drive) or execute it via the User Scripts plugin (set to run at first array start only).
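
For example, the line to append to the end of the go file would be something like this (sdX is a placeholder; note that device letters can change between boots, which is why the User Scripts route is often safer):

hdparm -W 1 /dev/sdX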

 

It should be noted that even with write caching disabled this is not a big deal. Only performance will suffer; no other ill effects will happen.

 

NOTE: If this does not work for you, then you will either need to contact the drive manufacturer as to why, or simply ignore the warning from Fix Common Problems.

 


How can I calibrate my UPS, silence alarms, change battery dates using apcupsd?  

 

Unraid has apcupsd built in to manage UPSs, and questions have arisen about calibration, alarm silencing, and battery replacement date changes. @hpka did some extensive research and found that there is an included utility which allows the root user to adjust many parameters. Exactly which ones can be adjusted will depend on the manufacturer and the model of UPS that you are using. Here is a link to @hpka's post:

 

 

    

By the way, the sudo command is not required when using the Unraid terminal session, as shown below:

root@Rose:~# apctest


2019-10-16 11:52:33 apctest 3.14.14 (31 May 2016) slackware
Checking configuration ...
sharenet.type = Network & ShareUPS Disabled
cable.type = USB Cable
mode.type = USB UPS Driver
Setting up the port ...
Doing prep_device() ...

 


How do I use the Syslog Server?

 

Beginning with release 6.7.0, there is syslog server functionality in Unraid. This can be a very powerful diagnostic tool when you are confronted with a situation where the regular tools cannot or do not capture information about a problem because the server has become non-responsive, has rebooted, or has spontaneously powered down. However, getting it set up has been confusing to many, so let's see if we can clarify setting it up for use. Begin by going to Settings >>> Syslog Server

 

This is the basic Syslog Server page:

[screenshot: basic Syslog Server settings page]

 

You can click on the 'Help' icon on the Toolbar to get more information about all three of these options.

 

The first one to consider is Mirror syslog to flash. This is the simplest to set up: select 'Yes' from the dropdown box, click 'Apply', and the syslog will be mirrored to the logs folder/directory of the flash drive. There is one principal disadvantage to this method: if the condition you are trying to troubleshoot takes days to weeks to occur, it can do a lot of writes to the flash drive. Some folks are hesitant to use the flash drive in this manner, as it may shorten its life. This is how the setup screen looks when the Syslog Server is set up to mirror to the flash drive.

[screenshot: Syslog Server set to mirror syslog to flash]

 

The second option is to use an external syslog server. This can be another Unraid server, or virtually any other computer. You can find the necessary software by googling syslog server <operating system>. After you have set up that computer/server, fill in its name or IP address. (I prefer the IP address, as there is never any confusion about what it is.) Then click on 'Apply' and your syslog will be mirrored to the other computer. The principal disadvantage of this setup is that the other computer has to be left on continuously until the problem occurs.

[screenshot: Syslog Server set to send to a remote syslog server]

 

The third option uses a bit of trickery, in that we use the Unraid server with the problem as the Local syslog server. Let's begin by setting up the Local syslog server. After changing the Local syslog server dropdown to 'Enabled', the screen will look like this:

[screenshot: Local syslog server enabled]

 

Note that we have a new menu option, Local syslog folder. This will be a share on your server, but choose it with care. Ideally it will be a 'cache only' or 'cache preferred' share; this will minimize the spinning up of disks due to the continuous writing of new lines to the syslog. A cache SSD would be the ideal choice here. (The folder that you see above is a 'cache preferred' share. The syslog will be in the root of that folder/share.)

 

If you click the 'Apply' button at this point, you will have this server set up to act as a remote syslog server. It can now capture syslogs from several computers if the need should arise.

 

Now, we add the IP address of this server as the Remote syslog server. (Remember the mention of trickery: basically, you send data out of the server and it comes right back in.) This is what it looks like now:

[screenshot: Local syslog server pointing at its own IP as remote syslog server]

 

As soon as you click Apply, your syslog will start logging to a file named (in this case) syslog-192.168.1.242.log in the root of the selected folder (in this case, Folder_Tree). One very neat feature is that new entries are appended onto this file every time a line is added to the syslog. This means that if the server reboots after a week of collecting the syslog, you will have everything from before and after the reboot in one file!
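
To watch it live from a terminal (path from the example above, adjust the share name and IP to yours):

tail -f /mnt/user/Folder_Tree/syslog-192.168.1.242.log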

 

Thanks @bonienl for both writing this utility and the guidance in putting this together.   


What can I do to keep my Ryzen based server from crashing/locking up with Unraid?

 

 

Ryzen on Linux can lock up due to issues with c-states. While this should mostly affect 1st gen Ryzen, there are reports that 2nd and even 3rd gen can be affected in some cases. Make sure the BIOS is up to date, then look for "Power Supply Idle Control" (or similar) and set it to "typical current idle" (or similar).

 

If there's no such setting in the BIOS, try disabling C-States globally instead; also note that there have been reports that with some boards the setting above is not enough and only completely disabling C-States brings stability.

 

 

Also, many of these servers seem to be running overclocked RAM. This is known to cause stability issues and even data corruption on some Ryzen/Threadripper systems, even if no errors are detected during memtest. Servers and overclocking don't go well together; respect the max RAM speed according to the configuration and CPU as listed in the tables below.

 

Note: Ryzen-based APUs don't follow the same generation convention as the regular desktop CPUs and are generally one generation behind, so for example the Ryzen 3200G is a 2nd gen CPU:

 

[table: Ryzen APU generations]

 

 

1st gen Ryzen:

[table: 1st gen Ryzen max RAM speeds]

 

2nd gen Ryzen:

[table: 2nd gen Ryzen max RAM speeds]

 

3rd gen (3xxx) and Zen3 (5xxx) Ryzen :

[table: 3rd gen and Zen3 Ryzen max RAM speeds]

 

Threadripper 1st Gen:

[table: 1st gen Threadripper max RAM speeds]

 

Threadripper 2nd Gen:

[table: 2nd gen Threadripper max RAM speeds]

 

Threadripper 3rd Gen:

[table: 3rd gen Threadripper max RAM speeds]

 


Why are files not being moved by the Mover?

 

These are some common reasons the Mover is not working as expected:

 

  • If using the Mover Tuning plugin, the first thing to do is to check its settings, or remove it and try without it, just to rule it out.
  • the "use cache pool" option for the share(s) is not correctly set: for 6.11.5 or older see here for more details, but basically cache=yes moves data from pool to array, cache=prefer moves data from array to pool, and the cache=only and cache=no options are not touched by the Mover; for v6.12.0 or newer, check that the shares have the pool as primary storage, the array as secondary storage, and the mover action set to move from pool to array
  • files are open, they already exist at the destination, or there's not enough space there; enable Mover logging (Settings -> Scheduler -> Mover Settings) and the syslog will show what the error is.
  • if it's a not-enough-space error, note that split level overrides allocation method; also, minimum free space for the share(s) must be correctly set, the usual recommendation being twice the max file size you expect to copy/move to that share.

 

If none of these help, enable Mover logging, run the Mover, download the diagnostics and attach them to a new or your existing thread in the general support forum; also mention the share name(s) that you want/expect data to be moved from/to.

 

 


Where can I see what folders are taking up my RAM?

 

If you think (or have been told on the forum) that something somewhere is filling up your RAM (rootfs etc.), then this might help in diagnosing exactly where, to help you find out why

 

From the Plugins tab, choose to install a plugin and enter this URL

https://raw.githubusercontent.com/Squidly271/misc-stuff/master/memorystorage.plg

NOTE: this does not actually install anything, but is simply a useful way to run a script

 

You will see where all the memory in your RAM is being consumed. Pay particular attention to the last few lines (where it will detail /mnt). If you have anything listed under /mnt, that most likely means a docker app is directly referencing a disk or pool that doesn't actually exist (any disks and pools that do exist will not be listed). Other common areas for trouble are /tmp and /var/log.
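
A rough manual check from the terminal (not the plugin itself; the -x flag keeps du on the RAM-backed root filesystem):

df -h /
du -xh -d1 / 2>/dev/null | sort -h | tail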

 

This script (while hopefully being useful) can potentially take a number of minutes to run, especially if you have bypassed the OS or Unassigned Devices (e.g. rclone) and are making your own mount points anywhere in the system. Because it's impossible for this script to know that you are making your own mount points manually, outside of system control, it will think that they are in RAM and calculate the space taken accordingly.

 

Also, do not be deceived by some of the entries in this list. Many of the folders listed will consume a couple of hundred meg; it's the folders which take up gigabytes that you should be most concerned about.

 

 

 


My server won't properly wake up after S3 Sleep

 

Note that the OS does not officially support S3 sleep; this is handled by an auxiliary plugin (Dynamix S3 Sleep). But for some users, the following seems to allow wakeup to work. Your mileage may vary.

 

Read from here down 

 


Can I use a cache, log, special, spare and/or dedup vdev with my zfs pool?

 

At this time (Unraid v6.12) these cannot be added to a pool using the GUI, but you can add them manually and have Unraid import the pool. A few notes:

 

  • currently zfs must be on partition #1; for better future compatibility (though not guaranteed) it's recommended to partition the devices first with UD.
  • the main pool should be created with Unraid, then add the extra vdev(s) using the CLI and re-import the pool
  • the available vdev types and what they do are beyond the scope of this entry; you can for example see here for more information.
  • please note that since the GUI doesn't support this, it might give unpredictable results if you then try to replace one of the pool devices, so if you plan to use this, for now it's recommended to do any needed device replacement with the CLI.

 

How to:

  • first create the main pool using Unraid
  • in this example I've created a 4 device raidz pool

[screenshot: 4-device raidz pool created in the GUI]

  • start the array, format the pool if it's a new one, and with the array running partition and then add the extra vdev(s) using the command line
  • to partition the devices with UD you need to format them, but there's no need to use zfs; I usually format with xfs since it's faster. Just format the device and leave it unmounted:

[screenshot: device formatted with xfs in UD and left unmounted]

 

  • to add a vdev to the pool use the CLI (you need -f to overwrite the existing filesystem; always double-check that you are specifying the correct devices, and note the 1 at the end for the partition). A few examples:

  - add a 2-way mirror special vdev:

zpool add tank -f special mirror /dev/sdr1 /dev/sds1

  - add a 2-way mirror log:

zpool add tank -f log mirror /dev/sdt1 /dev/sdu1

  - add a striped cache vdev:

zpool add tank -f cache /dev/sdv1 /dev/sdw1

  - add a 2-way mirror dedup vdev:

zpool add tank -f dedup mirror /dev/sdx1 /dev/sdy1

  - add a couple of spares:

zpool add tank -f spare /dev/sdb1 /dev/sde1
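
Before re-importing, you can confirm the new layout from the CLI (pool name tank as in the examples above):

zpool status tank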

 

  • when all the vdev(s) have been added to the pool, stop the array; now you need to re-import the pool
  • unassign all pool devices
  • start array (check the "Yes I want to do this" box)
  • stop array
  • re-assign all pool devices, including the new vdev(s); assign all devices sequentially in the same order as zpool status shows, and don't leave empty slots in the middle of the assigned devices.
  • start array
  • existing pool will be imported with the new vdev(s):

 

[screenshot: existing pool imported with the new vdev(s)]

