jbiggs77 Posted November 21, 2016
I have 3 disks in my array (v 5.0.6). Disk 1 ran out of space last week even though Disks 2 and 3 have plenty of storage left (see screenshot - I have since deleted a few files to clear some space). I am really confused as to why this is happening. Have I set something up incorrectly?
tdallen Posted November 21, 2016
Hi - When you say that it ran out of space, what do you mean? Were you copying data to a user share? Was mover running? Were you trying to copy a very large file, or a bunch of small ones? Any additional information you can provide would help, including the Allocation Method for your user shares.
itimpi Posted November 21, 2016
It would be useful if you provided the diagnostics (Tools->Diagnostics) so we can see what settings you have and what is happening on your system.
tdallen Posted November 21, 2016
The OP mentioned that they're on v5.0.6, so we have to do it the hard way.
itimpi Posted November 21, 2016
Missed that! I guess screenshots of the global share settings and of the settings for the share that is causing problems would probably provide the required information.
garycase Posted November 21, 2016
UnRAID allocates new data based on your allocation method and split level settings. In addition, a share will only use space on the disks it's set to use -- i.e. if you restrict a share to one disk, then when that disk fills up, you won't be able to copy more data to that share. Unless you have some good reason, it's generally a good idea to let UnRAID use all of the disks for all of the shares.
You should also set a minimum free space for each share that's larger than the largest file you might want to copy to that share. UnRAID has no way to know how large a file is when you start to copy it, and if it allocates it to a disk that doesn't have enough space, the copy will fail. The "minimum free" setting ensures this won't happen.
I suspect the reason you ran out of space on disk #1 has something to do with one or more of these parameters not being set correctly for your usage. If you describe the shares you have, and the settings you've got for each of them, we can likely suggest changes that will eliminate this issue.
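As a rough illustration of the two points in the post above (a share only uses its included disks, and a disk is skipped once its free space is at or below the minimum-free threshold), here is a toy model in Python. It is not UnRAID's actual allocator and it does not implement the high-water method; the function name `pick_disk`, the free-space figures, and the 10 GB threshold are all hypothetical.

```python
from typing import Dict, Optional, Set


def pick_disk(free_gb: Dict[str, float],
              included: Set[str],
              min_free_gb: float) -> Optional[str]:
    """Toy model: return the disk a new file would land on, or None.

    A disk is eligible only if the share includes it AND its free space
    is above the share's minimum-free setting. Among eligible disks this
    sketch simply picks the one with the most free space (an assumption
    made for brevity, not the high-water logic).
    """
    eligible = {disk: free for disk, free in free_gb.items()
                if disk in included and free > min_free_gb}
    if not eligible:
        return None  # nowhere to write: the copy would fail
    return max(eligible, key=eligible.get)


# Hypothetical free space per disk, in GB.
free_gb = {"disk1": 3.0, "disk2": 800.0, "disk3": 650.0}

# Share restricted to disk1 only: no eligible disk even though others have room.
restricted = pick_disk(free_gb, {"disk1"}, min_free_gb=10.0)

# Share allowed on all disks: the nearly full disk1 is skipped.
unrestricted = pick_disk(free_gb, {"disk1", "disk2", "disk3"}, min_free_gb=10.0)

print(restricted, unrestricted)
```

In this model the restricted share fails as soon as disk1 fills, which matches the symptom described at the top of the thread, while the unrestricted share keeps writing to another disk.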
jbiggs77 Posted November 22, 2016
tdallen wrote: "When you say that it ran out of space what do you mean? Were you copying data to a user share? Was mover running? Were you trying to copy a very large file, or a bunch of small ones? Any additional information you can provide would help, including the Allocation Method for your user shares."
I was moving some files within a folder (e.g. from /media/Movies/Unwatched to /media/Movies). They were 4-6GB files. The Allocation Method is high-water.
garycase Posted November 22, 2016
If you were specifying the folder using a disk share reference, then UnRAID doesn't get involved in selecting where to put the files => YOU did that. And apparently you selected a disk that didn't have room for the files.
jbiggs77 Posted November 22, 2016
garycase wrote: "If you were specifying the folder using a disk share reference, then UnRAID doesn't get involved in selecting where to put the files => YOU did that. And apparently you selected a disk that didn't have room for the files."
Sorry, not sure I follow. Can you give an example of how I could move them without using a disk share reference? Thanks!
garycase Posted November 22, 2016
If you were using a share reference, UnRAID would allocate space according to the allocation settings for the share. HOWEVER, do NOT move them within the SAME share or you'll likely lose all of the data you're trying to move. [The "user share copy bug" is discussed in a couple of threads on this forum.]
If you are going to use disk share references, that's fine => but be sure you check the available space on the disk you want to copy to BEFORE you attempt the copy [just look at the stats in the Web GUI].
trurl Posted November 22, 2016
tdallen wrote: "When you say that it ran out of space what do you mean? Were you copying data to a user share? Was mover running? Were you trying to copy a very large file, or a bunch of small ones? Any additional information you can provide would help, including the Allocation Method for your user shares."
jbiggs77 wrote: "I was moving some files within a folder (ex. from /media/Movies/Unwatched to /media/Movies). They were 4-6GB files. Allocation Method is high-water"
If you did the move like you said above, then that was a user share, not a disk share, so I think we can forget about that. We still need to see what itimpi asked for.
itimpi wrote: "Missed that! I guess screen shots of the global share settings and for the share that is causing problems would probably provide the required information."
jbiggs77 Posted February 10, 2017
Here are the requested screenshots. The disk is now totally out of space and has a red dot next to it.
trurl Posted February 10, 2017
Been a few months! Those are not the requested screenshots, but they are useful, since they tell us you have a much more serious problem than just a full disk. Disk1 is disabled. It will have to be rebuilt. Do you have a spare? If not, maybe we can rebuild to the same disk, but we would need to see a SMART report from disk1 first. See the stickies at the top of this subforum. Do you have backups?
garycase Posted February 10, 2017
You still haven't posted the share settings details. And the graphic of "Main" is cut off on the right so we can't see the current status of the disks -- although we CAN see that Disk 1 has a red ball, which means it's been disabled due to a write error.
Post a new copy of the Main page that shows ALL of the information (if you need to do 2 screenshots to capture it all, just post it in two parts) ... and post a picture of the share settings for your Movies share. [On the shares page, click on the name of the share -- e.g. "Movies" -- and post a picture of the next page it displays.]
I suspect you have two problems ...
(1) Your Movies share most likely is only set to use Disk #1, and that disk is most likely full => this is easily fixed.
(2) Disk #1 has been disabled, so you need to replace it NOW before another disk fails, or you'll lose all of its data. Right now you can still "see" the data because it's being emulated using all of the other disks to compute the correct data for it (that's what parity allows you to do).
But to confirm those assumptions, you need to post the two graphics I just mentioned.
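For readers wondering how a disabled disk can still be "seen", here is a minimal sketch of single-parity emulation using XOR, which is the usual way single parity works. The byte strings standing in for disk contents and the helper name `xor_blocks` are made up for illustration; a real array works on whole disks sector by sector.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical contents of three data disks (same length for simplicity).
disk1 = b"movie-data-1"
disk2 = b"movie-data-2"
disk3 = b"photos-12345"

# The parity disk holds the XOR of all data disks.
parity = xor_blocks([disk1, disk2, disk3])

# With disk1 disabled, its contents can be computed from parity plus
# every remaining data disk.
emulated_disk1 = xor_blocks([parity, disk2, disk3])

print(emulated_disk1 == disk1)
```

The same arithmetic shows why a second failure is fatal with one parity disk: reconstructing disk1 needs every other disk, so losing disk2 or disk3 as well leaves nothing to compute from.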
jbiggs77 Posted February 10, 2017
Thank you both for your help (and patience)! I am attaching the requested screenshots of the full Main screen and the Movies share.
Answers to questions:
1) Do I have a spare disk - No, but I can buy one this weekend
2) Do I have backups - No, I don't
3) SMART report from disk - pasted below

smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
Read Device Identity failed: Invalid argument
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
root@server:/boot# smartctl -a -d ata /dev/sdd >/boot/smartd.txt
root@server:/boot# cat smartd.txt
smartctl 6.2 2013-07-26 r3841 [i686-linux-3.9.11p-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green
Device Model:     WDC WD10EADS-00L5B1
Serial Number:    WD-WCAU4C531119
LU WWN Device Id: 5 0014ee 25870a7d0
Firmware Version: 01.01A01
User Capacity:    1,000,203,804,160 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.5, 3.0 Gb/s
Local Time is:    Fri Feb 10 16:42:35 2017 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host. Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (23400) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 268) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x303f) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f 200   200   051    Pre-fail Always  -           0
  3 Spin_Up_Time            0x0027 253   161   021    Pre-fail Always  -           1966
  4 Start_Stop_Count        0x0032 097   097   000    Old_age  Always  -           3916
  5 Reallocated_Sector_Ct   0x0033 200   200   140    Pre-fail Always  -           0
  7 Seek_Error_Rate         0x002e 100   253   000    Old_age  Always  -           0
  9 Power_On_Hours          0x0032 019   019   000    Old_age  Always  -           59314
 10 Spin_Retry_Count        0x0032 100   100   000    Old_age  Always  -           0
 11 Calibration_Retry_Count 0x0032 100   253   000    Old_age  Always  -           0
 12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always  -           79
192 Power-Off_Retract_Count 0x0032 200   200   000    Old_age  Always  -           16
193 Load_Cycle_Count        0x0032 199   199   000    Old_age  Always  -           3908
194 Temperature_Celsius     0x0022 120   108   000    Old_age  Always  -           30
196 Reallocated_Event_Count 0x0032 200   200   000    Old_age  Always  -           0
197 Current_Pending_Sector  0x0032 200   200   000    Old_age  Always  -           0
198 Offline_Uncorrectable   0x0030 200   200   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x0032 200   200   000    Old_age  Always  -           0
200 Multi_Zone_Error_Rate   0x0008 200   200   000    Old_age  Offline -           0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
   1       0       0 Not_testing
   2       0       0 Not_testing
   3       0       0 Not_testing
   4       0       0 Not_testing
   5       0       0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
root@server:/boot#
trurl Posted February 10, 2017
Maybe split level is causing it to keep writing disk1, but we can look at that later.
You really need to make a backup plan. Parity is no substitute for backups. You don't have to back up everything, but you absolutely must have multiple copies of any files that are important and irreplaceable. How much copying do you think it would take to copy your important and irreplaceable files to your PC? That might be the first thing you should do.
SMART for disk1 looks OK, so it might just be a connection problem. You should post a syslog. Looks like it might be OK to rebuild to the same disk after checking connections, but I really prefer to see SMART reports from all drives before rebuilding a disk, especially if it seems as if the user has been neglecting their server. If any other disks have issues it can make the rebuild fail.
garycase Posted February 10, 2017
Also post your global share settings => on the Settings page click on Share settings and post the result. I suspect you may not have all disks included at the global share level. This is very easy to fix, but first I'd get the disabled disk resolved ...
As trurl noted, your SMART data looks okay, but I would NOT try rebuilding the disk onto itself -- SOMETHING happened that caused it to be disabled; and with no backups I would NOT do anything that would change that disk until you have your data safely "tucked away" -- either on a backup or on a new disk. That disk also has nearly 60,000 hours of use ... almost 7 years of 24/7 operation ... so it's a good idea to replace it anyway.
What I'd suggest at this point is:
(a) Buy a new 3TB disk (since your parity is 3TB you may as well replace your old disk with a 3TB unit);
(b) Stop the array; change disk 1 to "unassigned"; Start the array (so it shows a "missing" disk); then Stop the array; go to Settings - Disk Settings and disable auto-start; and then Shut down the system.
(c) Install your new 3TB disk in place of the failed disk.
(d) Boot the system and assign the new 3TB drive as disk #1; then Start the array. It will now rebuild the data from disk 1 onto the new disk, and will expand the file system, so you'll have a lot more space on the drive.
(e) When the rebuild finishes (many hours), do a non-correcting parity check (uncheck the box that says "Correct any Parity-check errors ...") to confirm all went well. [This is the only time I recommend using a non-correcting check.]
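The "almost 7 years" figure can be checked from the Power_On_Hours raw value in the SMART report posted earlier (59314). A small sketch, assuming the attribute row has the ten-column layout smartctl printed for this drive and that the raw value is a plain hour count, which is not true for every drive model:

```python
# Attribute row as it appears in the SMART report from this thread.
line = "  9 Power_On_Hours 0x0032 019 019 000 Old_age Always - 59314"

# The raw value is the last whitespace-separated column on this drive.
raw_hours = int(line.split()[-1])

# Convert hours of operation to years of continuous 24/7 use.
years = raw_hours / (24 * 365)

print(f"{raw_hours} hours is about {years:.1f} years of 24/7 operation")
```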
jbiggs77 Posted February 11, 2017
Global share settings attached. I've ordered a new 3TB WD Red drive. I will get the photos and other irreplaceable files backed up immediately and follow your replacement plan when the new drive arrives.
garycase Posted February 11, 2017
All of your drives should be usable by the share, as they're not excluded by either the global share settings or the settings for the share. I suspect you're attempting to copy files to a folder level at which your split level prohibits a change of disk -- so it's simply running out of space. I'd delete the split level (leave that field blank -- which means split as needed) and confirm that resolves your issue with copying data to the share (it should then use all of your disks as needed).
But the first thing I'd do is get the disabled disk replaced. Except for copying data from disk #1 to another system to back it up, I wouldn't use the server any more than you absolutely must until you have replaced the drive.
garycase Posted February 11, 2017
Note: If you want to, you could test the new drive before installing it -- either using WD's excellent Data Lifeguard utility on a PC or with the pre-clear utility for UnRAID. But given that you've already got a failed disk, I'd be inclined to simply install it and do the rebuild -- followed by a non-correcting parity check as I suggested. As long as the parity check is error-free, you'll know the disk is good and it will be reasonably well tested just by the process of the rebuild/parity-check.
One thing the pre-clear does that wouldn't be done in this case is compare the SMART values "before" and "after" the process. You can, however, do this yourself IF you have UnMenu installed (do you??) by looking at the SMART values for the drive BEFORE you Start the array to do the rebuild [using the MyMain page in UnMenu -- just click on "sm"], and then again after you've done the rebuild and parity check.
Other than that, there's little difference in the effective "testing" your drive will get -- e.g.:
A preclear will (a) read the entire disk to confirm all sectors are readable; (b) write zeroes to the entire disk; and then (c) read the entire disk again to confirm all the zeroes are okay.
Simply installing the disk and doing a disk rebuild, followed by a parity check, will (a) write data to every sector on the disk; and then (b) read every sector on the disk.
So it effectively is as good a check as the pre-clear ... arguably even a bit better since the data written will be random rather than all zeroes.
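The manual "before" and "after" SMART comparison described above could also be done with a short script on saved smartctl output. This is only a sketch: the helper names are invented, the two sample reports are hypothetical (the "after" values are made up to show a change), and the choice of watched attributes is an assumption about which counters matter most for a new drive.

```python
# Counters that ideally stay at zero on a healthy new drive (assumed list).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_raw_values(report):
    """Map attribute name -> integer raw value from smartctl -a style text."""
    values = {}
    for row in report.splitlines():
        fields = row.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            values[fields[1]] = int(fields[9])
    return values


def changed_attributes(before, after):
    """Return {name: (before, after)} for watched attributes that differ."""
    return {name: (before[name], after[name])
            for name in WATCHED
            if name in before and name in after and before[name] != after[name]}


# Hypothetical saved reports (attribute tables only).
before_report = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
"""
after_report = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 8
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
"""

diff = changed_attributes(parse_raw_values(before_report),
                          parse_raw_values(after_report))
print(diff)
```

An empty result after the rebuild and parity check would mean none of the watched counters moved; any non-empty result is a reason to look at the full report before trusting the drive.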
jbiggs77 Posted February 17, 2017
Ok, new drive has arrived. Maybe a stupid question, but how do I ensure I'm removing the correct drive? (I have 2 identical ones)
trurl Posted February 17, 2017
jbiggs77 wrote: "Ok, new drive has arrived. Maybe a stupid question but how do I ensure I'm removing the correct drive? (I have 2 identical ones)"
The serial number is on the label of each drive. Each drive's serial number also appears in the unRAID webUI on the Main page. My WD drives also have a sticker on the end of the drive (opposite end from the connectors) with the last 4 characters from the serial.
jbiggs77 Posted February 19, 2017
New drive installed, array rebuilt and parity check run with 0 issues. Thanks everyone for your help, what a great community! Marking as solved.