Re: preclear_disk.sh - a new utility to burn-in and pre-clear disks for quick add



Here's a really dumb question:

 

If I have a headless unRAID box, I can use Putty to connect to my server and start the preclear script.  But, do I have to leave the Putty session open for 10+ hours in order to view the status?  Is there any way to start preclear, disconnect from my box, connect again and view the preclear status?

 

I bought myself 2 WD Green drives that I am going to preclear.

 

Thanks for any info.

You can install "screen" as a supplemental package.  When you invoke it and then start a command, you can disconnect and later re-connect to the running process.   Otherwise, I know of no other way if you don't have a system console.

 

Both these packages are needed.   Use "installpkg package_name.tgz" to install each in turn as shown below.

 

http://slackware.cs.utah.edu/pub/slackware/slackware-12.2/slackware/ap/screen-4.0.3-i486-1.tgz

and

http://slackware.cs.utah.edu/pub/slackware/slackware-12.2/slackware/a/utempter-1.1.4-i486-1.tgz

 

Most of us have a "packages" directory to hold downloaded packages. Create it by typing

mkdir /boot/packages

 

Download the two files by either typing:

cd /boot/packages

wget http://slackware.cs.utah.edu/pub/slackware/slackware-12.2/slackware/ap/screen-4.0.3-i486-1.tgz

and

cd /boot/packages

wget http://slackware.cs.utah.edu/pub/slackware/slackware-12.2/slackware/a/utempter-1.1.4-i486-1.tgz

 

Or download them to your windows PC by clicking on the links above, and then move them to the packages folder on your flash drive using windows file-explorer. (you will need to create the "packages" folder if it does not exist)

\\tower\flash\packages

 

To install these packages, log onto the unRAID server as root and then type:

cd /boot/packages

installpkg utempter-1.1.4-i486-1.tgz

installpkg screen-4.0.3-i486-1.tgz

cd /boot

 

Then type

screen

Then start up the preclear_disk.sh process.

 

To detach, leaving the preclear_disk.sh process running, type

Control-A d

 

Then, 10 hours later you can re-attach to the running process by logging in and typing

screen -r

 

To create another screen window for a second/third concurrent preclear, type

Control-A c

To switch between the screen windows type:

Control-A p

or

Control-A n

for the previous or next screen window

 

A good article on "screen" can be found here:

http://www.linuxjournal.com/article/6340

 

The manual page for screen is here:

http://ss64.com/bash/screen.html

 

It can do a lot more. You can "name" the screen sessions, and list the windows in the current session with

Control-A "

(Control-A followed by a double-quote character)
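
For the "naming" part, here is a minimal sketch. The -S, -ls and -r options are standard screen options; the "preclear_" prefix and the /dev/sdb device are only illustrative assumptions, not something the script requires.

```shell
# Assumption: one named session per disk, e.g. preclear_sdb for /dev/sdb.
dev=/dev/sdb
name="preclear_${dev##*/}"          # strip the /dev/ part of the path

if command -v screen >/dev/null 2>&1; then
    screen -ls || true              # list sessions; non-zero exit just means none exist
    echo "Start:     screen -S $name    (then run preclear_disk.sh $dev inside it)"
    echo "Re-attach: screen -r $name"
else
    echo "screen is not installed yet; install the two packages first"
fi
```

With a name per disk, "screen -r" no longer has to guess which detached session you mean when several preclears are running.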

 

Edit: updated links to screen packages

 

Joe L.


I ran my first preclear last night.  I connected a console to my unRAID server.

 

It appeared to be running steps 1 & 2 as expected when I retired for the evening.  This morning, the console displays what looks like a lot of double-spaced SMART info. I can't see it all on the screen.  And, I have a "ghost" entry "sdb1" in my unMenu drive listing which wasn't there before preclear completed.  So, I don't know if preclear was successful or not.

 

Is there a log of the preclear output someplace?

 


You should be able to scroll backwards (and forwards) on the console by using Shift-PgUp and Shift-PgDn.

 

Yes, if there are a lot of differences in the "smart" output, it will scroll the rest off the top of the screen.

 

The actual "smart" output files are in /tmp/smart_startNNNN and /tmp/smart_finishNNNN  where NNNN = the process ID of the clearing script.

Type

ls -l /tmp/smart*

to see their names.

 

You can re-create the "diff" with

diff /tmp/smart_startNNNN /tmp/smart_finishNNNN
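
If several preclears have run, the start/finish pairs can be matched up automatically. A rough sketch, assuming only the file naming described above (the smart_diffs name and its optional directory argument are made up for illustration):

```shell
# Pair each smart_startNNNN file with its smart_finishNNNN file and diff them.
# The directory argument is a hypothetical convenience; it defaults to /tmp.
smart_diffs() {
    dir=${1:-/tmp}
    for start in "$dir"/smart_start*; do
        [ -e "$start" ] || continue          # nothing matched the pattern
        pid=${start##*smart_start}           # the NNNN suffix (process ID)
        finish="$dir/smart_finish$pid"
        if [ -e "$finish" ]; then
            echo "== preclear process $pid =="
            # diff exits with status 1 when the files differ, which is expected here
            diff "$start" "$finish" || true
        else
            echo "== preclear process $pid: no finish report yet =="
        fi
    done
}

smart_diffs
```

Lines starting with "<" come from the report taken before the clear, lines starting with ">" from the one taken afterwards.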

 

The actual "SMART" output is also saved in your syslog.  You can look in /var/log/syslog for it.  You can use the "syslog" viewer built into unmenu to see it there.

 

The "Ghost" entry in unMENU is not a ghost, it is an actual partition.  In fact, it was the most difficult part of the pre-clear script to get correct.  It has to be exactly as if unRAID had set up the partition, skipping the first cylinder on the disk, and extending for the entire remainder of the drive.  The pre-clear process creates that partition on the cleared disk.  It does not put a file-system on it, but the partition is there, and it would be /dev/sdb1 (for /dev/sdb)

 

If you are using unMENU you can use the "Smart" view of the myMain plug-in page to see how the drive did as far as SMART goes.  Most important are any re-allocated sectors, and any pending re-allocation.
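
Without unMENU, the same few numbers can be pulled out on the command line. A small sketch, assuming the attribute names shown in the SMART reports in this thread and that the raw value is the last column (true for these attributes, not for every attribute on every drive). The smart_key_attrs name is just a label for this example.

```shell
# Print NAME=RAW_VALUE for the attributes most worth watching.
# It searches each line for the attribute name, so it accepts both plain
# "smartctl -A" output and the copies written to the syslog.
smart_key_attrs() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count)$/)
                print $i "=" $NF      # RAW_VALUE is the last field on the line
    }'
}

# Sample lines in the syslog format used by the preclear script.
smart_key_attrs <<'EOF'
Jan 12 00:54:05 Tower preclear_disk-finish[1004]:   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
Jan 12 00:54:05 Tower preclear_disk-finish[1004]:   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       12
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
EOF
```

On a live system it could be fed with "smartctl -A /dev/sdb | smart_key_attrs" or "grep preclear_disk-finish /var/log/syslog | smart_key_attrs".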

I recently purchased two 1.5TB drives and have been putting them through pre-clear cycles to burn them in.  Below is a screen capture of the myMain "Smart view" for two of my new drives I am burning in.

 

One of them (sdb) initially had a bad cable, so the "reported_uncorrect" errors are not as bad as they might seem.  That same drive re-allocated three sectors the first time I did a pre-clear.  I've been running it again and again, and the number of reallocated sectors has not increased, so the drive is probably stable.  (In any case, it has a 5 Yr warranty, so I'll keep an eye on it)

[Image: 2wftffl.jpg, screen capture of the myMain "Smart view" for the two drives]

 

It sounds like everything went as expected with your preclear.  You can test it, of course, by typing

preclear_disk.sh -t /dev/sdb

 

Joe L.

What I find most interesting is that unless you get SMART reports on the drives you have no idea these errors are happening... That means "some" of the MS-Windows errors we see might be a disk acting up, and not the Microsoft-OS.  Of course, they should give you the tools to monitor the disk's health... but they don't.  <rant> (A crashed disk/computer often leads to a NEW sale of a Microsoft-OS. They really don't have a huge incentive to keep the existing OS working; besides, they give no easy way to replace the disk anyway when it starts to go bad.) </rant> 


Just started this script on a 1TB Seagate drive.  Am going for 3 cycles and will let everyone know how long it takes (expect to hear back sometime tomorrow night most likely).

 

This is a great little script, and it would be even better if it were included in unMenu (which I still need to get working with my BubbaRaid install).

 

Thanks for the work you have done Joe!!


starting to preclear 2 1.5TB Seagates. brand new box, brand new unRAID user.

 

similar to a previous smartctl post on this thread, my drives both show this in the report:

Device is: Not in smartctl database

 

Does this mean i have to configure something differently to take advantage of SMART?

 

tia

 

And, Joe - thanks for this tool, looks like a real timesaver!


Nothing you can do until the drives get added to the next version of smartctl.  It happens with lots of new drives.

I took a look a few hours ago, 5.38 is the most current version of smartctl unless you want to go to their development CVS tree and compile it yourself.

 

Fortunately, most of the SMART parameters are common between the manufacturers and drive models, so the SMART reports will still help to know if the drive is acting up.

 

I'll be curious to learn how quickly the drives clear on your server.  On my array it took about 20 hours to do two concurrent 1.5TB drives while it was also doing a monthly parity check I had scheduled.  All I can say is the PCI bus on my poor server was probably very glad when it was over.

 

Joe L.


Looks like the 1st one I kicked off will complete in about 14:20. The 2nd completed in 12:35. I think I only enabled SMART on the drives after the process started, so I'll rerun both in a bit - doesn't hurt to be sure.


Mine got done in 26 hours and 35 minutes.  That was 3 cycles on a 1TB Seagate drive.

 

The script worked great and it stressed the drive like I wanted.  Once I get done with this I might run it on the old parity drive.  I'm not sure I really want to/need to, as the old parity drive was running fine.


I just kicked off three telnet preclear sessions on three new 1.5TB drives.

 

Is the time display supposed to show the dashes?

Yes. Oops, I see what you are talking about now...

Looks like your time-zone might not be set on your server.

 

What do you get when you type:

date '+%s'

in another telnet window.   I'll bet it is not just a number of "seconds" it returns.

 

It should look like this:

root@Tower:/boot# date '+%s'

1231539768

This was fixed in the most recent 4.4.2 unraid release, and broken in 4.4 and 4.5beta.

 

The pre-clear will still work, but the elapsed time might need to be tracked manually.
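
One way to do that tracking by hand is sketched below. It checks that the value really is a plain number of seconds and then formats the difference between two readings. The helper names (is_epoch, fmt_elapsed) are made up for this example, and the two timestamps are sample values.

```shell
# Succeeds only if the argument consists purely of digits.
is_epoch() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
        *)           return 0 ;;
    esac
}

# Format a number of seconds as H:MM:SS.
fmt_elapsed() {
    printf '%d:%02d:%02d\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )) $(( $1 % 60 ))
}

start=1231539768      # sample reading; in practice: start=$(date '+%s')
now=1231549459        # a later sample reading

if is_epoch "$start" && is_epoch "$now"; then
    fmt_elapsed $(( now - start ))      # prints 2:41:31
else
    echo "date '+%s' did not return a plain number of seconds"
fi
```

Note the start value when the preclear is launched, then rerun the last part with a fresh "date '+%s'" reading whenever you want to know how long it has been going.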

Joe L.


This is a fresh (as in rolled 1/2 hour before use) install of 4.4.2 with the only customizations being the download and install of the smartctl libraries, and download of the new york timezone file. Timezone is set to custom in the configuration. The date command as you specified returned 1231549459 as of 8:05 eastern.


Interesting...  I just loaded 4.4.2 myself the other day, but I don't think I've pre-cleared a disk since then. 

I'll need to give it a try. 

 

What "telnet" client are you using?  Are you using "putty" or the command built into windows?  I'm in the same time-zone as you, so my server should act the same.

 

Joe L.


The first disk didn't complete successfully. What logs and or other info do I need to look at to figure out why?

 

Type:

fdisk -l /dev/sdb

dd if=/dev/sdb count=1 | od -x -A d

 

Post the output of both commands.

 

Should be interesting to see what happened.

 

The "dd" output should look like this for a 1.5TB drive (assuming your geometry is the same as my 1.5TB drive)

root@Tower:/boot# dd if=/dev/sdb count=1 | od -x -A d

1+0 records in

1+0 records out

512 bytes (512 B) copied, 0.00120228 s, 426 kB/s

0000000 0000 0000 0000 0000 0000 0000 0000 0000

*

0000448 0000 0000 0000 003f 0000 7af1 aea8 0000

0000464 0000 0000 0000 0000 0000 0000 0000 0000

*

0000496 0000 0000 0000 0000 0000 0000 0000 aa55

0000512

 

The fdisk output should look something like this:

root@Tower:/boot# fdisk -l /dev/sdb

 

Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes

255 heads, 63 sectors/track, 182401 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Disk identifier: 0x00000000

 

  Device Boot      Start        End      Blocks  Id  System

/dev/sdb1              1      182402  1465138552+  0  Empty

Partition 1 does not end on cylinder boundary.
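
As a cross-check, the two non-zero fields in the "od" dump can be decoded by hand. This is only a sketch, assuming a little-endian machine (so "od -x" prints the low 16-bit word of each 32-bit field first) and the standard MBR layout, in which the first partition entry keeps its starting sector at byte offset 454 and its sector count at offset 458.

```shell
# Words copied from the od line at offset 448: ... 003f 0000 7af1 aea8 ...
start=$(( 0x0000003f ))      # bytes 454-457, "003f 0000": first sector of the partition
count=$(( 0xaea87af1 ))      # bytes 458-461, "7af1 aea8": number of sectors
last=$(( start + count - 1 ))

echo "first sector: $start"
echo "sector count: $count"
echo "last sector:  $last"
echo "1K blocks:    $(( count / 2 )) (odd sector left over: $(( count % 2 )))"
```

That works out to a partition starting at sector 63 and ending at sector 2930277167, the last sector of a 1500301910016-byte drive (2930277168 sectors of 512 bytes). The odd sector left over is why fdisk shows the block count as 1465138552+.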

 

Joe L.


Hi, thanks for a great script!  I used it on a 1 TB WD10EADS yesterday and it seemed to get through it ok.  It came up with one mildly worrisome error:

 

UDMA_CRC_Error_Count : 1

 

Is that something worth worrying about?

 

Jan 12 00:54:05 Tower preclear_disk-finish[1004]: SMART Attributes Data Structure revision number: 16
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: Vendor Specific SMART Attributes with Thresholds:
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
Jan 12 00:54:05 Tower preclear_disk-finish[1004]:   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
Jan 12 00:54:05 Tower preclear_disk-finish[1004]:   3 Spin_Up_Time            0x0027   170   169   021    Pre-fail  Always       -       6483
Jan 12 00:54:05 Tower preclear_disk-finish[1004]:   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       25
Jan 12 00:54:05 Tower preclear_disk-finish[1004]:   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
Jan 12 00:54:05 Tower preclear_disk-finish[1004]:   7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
Jan 12 00:54:05 Tower preclear_disk-finish[1004]:   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       12
Jan 12 00:54:05 Tower preclear_disk-finish[1004]:  10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
Jan 12 00:54:05 Tower preclear_disk-finish[1004]:  11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
Jan 12 00:54:05 Tower preclear_disk-finish[1004]:  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       23
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       20
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       25
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       1
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

 

 

Took about 11.5 hours to get through one cycle.

 

I'm hoping that, since it's just one UDMA CRC error, I should be OK.

 

 

I'm going to do another WD10EADS shortly.

 

Next question: I'd like to stress-test a drive that is already part of my array (2 data drives, no parity drive), but doesn't have any data on it yet.  I thought about using the preclear utility on it, but if I remove it from the array, then I can't restart the array - I get the "Too many wrong/missing disks" error.  Can I just use the restore function?  Is there a better way to stress-test a drive that is already part of the array?  Actually I wouldn't mind testing the drive that DOES have data on it as well.

 

 


The first disk didn't complete successfully. What logs and or other info do I need to look at to figure out why?

 

Type:

fdisk -l /dev/sdb

dd if=/dev/sdb count=1 | od -x -A d

 

Post the output of both commands.

 

Joe L.

 

Tower login: root

Linux 2.6.27.7-unRAID.

root@Tower:~# fdisk -l /dev/sdb

 

Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes

255 heads, 63 sectors/track, 182401 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Disk identifier: 0x00000000

 

  Device Boot      Start         End      Blocks   Id  System

/dev/sdb1               1      182402  1465138552+   0  Empty

Partition 1 does not end on cylinder boundary.

root@Tower:~# dd if=/dev/sdb count=1 | od -x -A d

1+0 records in

1+0 records out

512 bytes (512 B) copied, 0.000298241 s, 1.7 MB/s

0000000 0000 0000 0000 0000 0000 0000 0000 0000

*

0000448 0000 0000 0000 003f 0000 7af1 aea8 0000

0000464 0000 0000 0000 0000 0000 0000 0000 0000

*

0000496 0000 0000 0000 0000 0000 0000 0000 aa55

0000512

root@Tower:~#

 


It sure looks to me as if the geometry is identical, and the "od" output also looks the same as mine for the 1.5TB disk.

 

What do you get if you type:

preclear_disk.sh -t /dev/sdb

 

I'll be shocked if it does not indicate the clearing worked as it was supposed to.

I'm a bit at a loss to figure out what is happening... You also had off numbers with your elapsed time calculation.  It is almost as if the "shell" was having memory problems.

 

Were you doing anything else to the same disk at the time the preclear was running?  Did you reset the time-zone and/or time when pre-clear was in progress?

 

Could you have had a second preclear_disk.sh running on the same disk at the same time?

 

If the output of the preclear_disk.sh -t /dev/sdb indicates it is cleared, then you might want to look at your syslog for any indications of disk read errors.  Something made it think the data was different the last time it looked.

 

Joe L.

 

 


root@Tower:/boot/scripts# preclear_disk.sh -t /dev/sdb
Pre-Clear unRAID Disk
########################################################################
Device Model:     ST31500341AS
Serial Number:    9VS0HE2T
Firmware Version: CC1H
User Capacity:    1,500,301,910,016 bytes

Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders, total 2930277168 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1              63  2930277167  1465138552+   0  Empty
Partition 1 does not end on cylinder boundary.
########################################################################
============================================================================
==
== DISK /dev/sdb IS PRECLEARED
==
============================================================================
root@Tower:/boot/scripts#

All I did was boot the server, install the smartctl libraries, install the preclear script, and kick it off in three different telnet windows on the three different drives. 2 completed successfully, 1 didn't.


I would do a thorough memory test then, and/or replace the cable to the disk with another, as there would be no reason why reading a disk one day would give a different result than reading it the next.   

 

In any case, you will want to run it through another pre-clear cycle, just to make sure it is working well before you add it to the array.  That is one of the major reasons you are burning in the drives... to detect errors that are much harder to deal with once you start using the disks for data.

 

Joe L.
