Re: preclear_disk.sh - a new utility to burn-in and pre-clear disks for quick add

Joe L. · January 5, 2009

Here's a really dumb question:

If I have a headless unRAID box, I can use Putty to connect to my server and start the preclear script. But, do I have to leave the Putty session open for 10+ hours in order to view the status? Is there any way to start preclear, disconnect from my box, connect again and view the preclear status?

I bought myself 2 WD Green drives that I am going to preclear.

Thanks for any info.

You can install "screen" as a supplemental package. When you invoke it and then start a command, you can disconnect and later re-connect to the running process. Otherwise, I know of no other way if you don't have a system console.

Both these packages are needed. Use "installpkg package_name.tgz" to install each in turn as shown below.

http://slackware.cs.utah.edu/pub/slackware/slackware-12.2/slackware/ap/screen-4.0.3-i486-1.tgz

and

http://slackware.cs.utah.edu/pub/slackware/slackware-12.2/slackware/a/utempter-1.1.4-i486-1.tgz

Most of us have a "packages" directory to hold downloaded packages. Create it by typing

mkdir /boot/packages

Download the two files by either typing:

cd /boot/packages

wget http://slackware.cs.utah.edu/pub/slackware/slackware-12.2/slackware/ap/screen-4.0.3-i486-1.tgz

and

cd /boot/packages

wget http://slackware.cs.utah.edu/pub/slackware/slackware-12.2/slackware/a/utempter-1.1.4-i486-1.tgz

Or download them to your Windows PC by clicking on the links above, and then move them to the packages folder on your flash drive using Windows Explorer. (You will need to create the "packages" folder if it does not exist.)

\\tower\flash\packages

To install these packages, log onto the unRAID server as root and then type:

cd /boot/packages

installpkg utempter-1.1.4-i486-1.tgz

installpkg screen-4.0.3-i486-1.tgz

cd /boot
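The download and install steps above can be collected into one small script. This is only a sketch: the helper names pkg_file_from_url and fetch_and_install are made up for illustration, and it assumes wget and installpkg are available as in the commands above.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, not part of unRAID.
PKG_DIR=/boot/packages
MIRROR=http://slackware.cs.utah.edu/pub/slackware/slackware-12.2/slackware

# Print the file-name part of a package URL.
pkg_file_from_url() {
    basename "$1"
}

# Download one package into PKG_DIR (unless it is already there), then install it.
fetch_and_install() {
    local url=$1 file
    file=$(pkg_file_from_url "$url")
    mkdir -p "$PKG_DIR" || return 1
    [ -f "$PKG_DIR/$file" ] || wget -P "$PKG_DIR" "$url" || return 1
    installpkg "$PKG_DIR/$file"
}

# Usage (as root), utempter first, then screen:
#   fetch_and_install "$MIRROR/a/utempter-1.1.4-i486-1.tgz"
#   fetch_and_install "$MIRROR/ap/screen-4.0.3-i486-1.tgz"
```

Nothing runs until you call fetch_and_install yourself, so it is safe to paste the functions into a shell first and try them one package at a time.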

Then type

screen

Then start up the preclear_disk.sh process.

To detach, leaving the preclear_disk.sh process running, type

Control-A d

Then, 10 hours later you can re-attach to the running process by logging in and typing

screen -r

To create another screen window for a second/third concurrent preclear, type

Control-A c

To switch between the screen windows type:

Control-A p

or

Control-A n

for the previous or next screen window (lower-case letters)

A good article on "screen" can be found here:

http://www.linuxjournal.com/article/6340

The manual page for screen is here:

http://ss64.com/bash/screen.html

It can do a lot more. You can "name" the screen windows, and list them by typing

Control-A "

(Control-A followed by a "quote")
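Naming each session after the disk it is clearing makes it easier to find the right one later. The lines below are a rough sketch: session_name_for is a hypothetical helper, the device names are examples, and the plain "preclear_disk.sh /dev/sdb" invocation is assumed. The screen options used (-S to name a session, -ls to list sessions, -r to re-attach) are standard ones.

```shell
#!/bin/bash
# Sketch only: session_name_for is a hypothetical helper.

# Turn a device path such as /dev/sdb into a session name like preclear_sdb.
session_name_for() {
    echo "preclear_$(basename "$1")"
}

# Typical use, typed at the prompt:
#   screen -S "$(session_name_for /dev/sdb)"   # start a named session
#   preclear_disk.sh /dev/sdb                  # run inside it, then Control-A d
#   screen -ls                                 # list sessions after logging back in
#   screen -r preclear_sdb                     # re-attach by name
```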

Edit: updated links to screen packages

Joe L.

abq-pete · January 5, 2009

Joe,

How about an option to output to a text file and copy the text file to the flash root when completed?

Regards, Peter

bill_in_socal · January 5, 2009

Thanks for that golden nugget Joe! I can imagine that someday preclear will be integrated into unMenu. But until then, "screen" looks like a great solution. I am going to give it a shot.

Thanks for your time and I hope you got to do your new server build this weekend.

Thanks again

bill_in_socal · January 5, 2009

I ran my first preclear last night. I connected a console to my unRAID server.

It appeared to run steps 1 & 2 as expected as I retired for the evening. This morning, the console displays what looks like a lot of double-spaced SMART info. I can't see it all on the screen. And, I have a "ghost" entry "sdb1" in my unMenu drive listing which wasn't there before preclear completed. So, I don't know if preclear was successful or not.

Is there a log of the preclear output someplace?

Joe L. · January 5, 2009

I ran my first preclear last night. I connected a console to my unRAID server.

It appeared to run as expected steps 1 & 2 as I retired for the evening. This morning, the console displays what looks like a lot of double spaced SMART info. I can't see it all on the screen. And, I have a "ghost" entry "sdb1" in my unMenu drive listing which wasn't there before preclear completed. So, I don't know if preclear was successful or not.

Is there a log of the preclear output someplace?

You should be able to scroll backwards (and forwards) on the console by using Shift-PgUp and Shift-PgDn.

Yes, if there are a lot of differences in the "smart" output, it will scroll the rest off the top of the screen.

The actual "smart" output files are in /tmp/smart_startNNNN and /tmp/smart_finishNNNN where NNNN = the process ID of the clearing script.

Type

ls -l /tmp/smart*

to see their names.

You can re-create the "diff" with

diff /tmp/smart_startNNNN /tmp/smart_finishNNNN

The actual "SMART" output is also saved in your syslog. You can look in /var/log/syslog for it. You can use the "syslog" viewer built into unmenu to see it there.
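If several preclears have run, matching up the start and finish files by hand gets tedious. Here is a small sketch that pairs them by process ID and re-creates each diff. The function name diff_smart_reports is made up, and it assumes only the /tmp/smart_startNNNN and /tmp/smart_finishNNNN naming described above.

```shell
#!/bin/bash
# Sketch only: diff_smart_reports is a hypothetical helper.

# For every smart_start<PID> file in a directory (default /tmp), diff it
# against the matching smart_finish<PID> file, if that file exists.
diff_smart_reports() {
    local dir=${1:-/tmp} start pid finish
    for start in "$dir"/smart_start*; do
        [ -f "$start" ] || continue
        pid=${start##*/smart_start}
        finish=$dir/smart_finish$pid
        if [ -f "$finish" ]; then
            echo "== process $pid =="
            # Differences are expected, so ignore diff's non-zero exit status.
            diff "$start" "$finish" || :
        else
            echo "== process $pid: no finish report yet =="
        fi
    done
}

# Usage:
#   diff_smart_reports          # looks in /tmp
```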

The "Ghost" entry in unMENU is not a ghost, it is an actual partition. In fact, it was the most difficult part of the pre-clear script to get correct. It has to be exactly as if unRAID had set up the partition, skipping the first cylinder on the disk, and extending for the entire remainder of the drive. The pre-clear process creates that partition on the cleared disk. It does not put a file-system on it, but the partition is there, and it would be /dev/sdb1 (for /dev/sdb).

If you are using unMENU you can use the "Smart" view of the myMain plug-in page to see how the drive did as far as SMART goes. Most important are any re-allocated sectors, and any pending re-allocation.
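Without unMENU, the same two numbers can be pulled from the attribute table on the command line. This is a minimal sketch: smart_raw_value is a hypothetical helper that simply takes the last column of the matching line, which is the raw value for simple counters like these two (some other attributes put extra text in that column, so it is not a general parser). It assumes smartctl is installed; "smartctl -A" prints the attribute table.

```shell
#!/bin/bash
# Sketch only: smart_raw_value is a hypothetical helper.

# Read a SMART attribute table on stdin and print the last column (RAW_VALUE)
# of the named attribute. It works on plain "smartctl -A" output and on the
# copy in the syslog, since the attribute name is matched in any column.
smart_raw_value() {
    awk -v name="$1" '{ for (i = 1; i < NF; i++) if ($i == name) { print $NF; exit } }'
}

# Usage, assuming /dev/sdb is the disk:
#   smartctl -A /dev/sdb | smart_raw_value Reallocated_Sector_Ct
#   smartctl -A /dev/sdb | smart_raw_value Current_Pending_Sector
```

A value that stays at 0, or that stops climbing after the first pass, is the thing to look for.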

I recently purchased two 1.5TB drives and have been putting them through pre-clear cycles to burn them in. Below is a screen capture of the myMain "Smart view" for two of my new drives I am burning in.

One of them (sdb) initially had a bad cable, so the "reported_uncorrect" errors are not as bad as they might seem. That same drive re-allocated three sectors the first time I did a pre-clear. I've been running it again and again, and the number of reallocated sectors has not increased, so the drive is probably stable. (In any case, it has a 5-year warranty, so I'll keep an eye on it.)

It sounds like everything went as expected with your preclear. You can test it, of course, by typing

preclear_disk.sh -t /dev/sdb
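With more than one disk, the test can be wrapped in a loop. This is only a sketch: reports_precleared is a hypothetical helper, the device names are examples, and it assumes the -t output contains an "IS PRECLEARED" banner for a cleared disk (as in the output posted further down this thread).

```shell
#!/bin/bash
# Sketch only: reports_precleared is a hypothetical helper.

# Succeed if the text on stdin contains the banner that
# "preclear_disk.sh -t" prints for a cleared disk.
reports_precleared() {
    grep -q 'IS PRECLEARED'
}

# Usage, for several disks (device names are examples):
#   for dev in /dev/sdb /dev/sdc; do
#       if preclear_disk.sh -t "$dev" | reports_precleared; then
#           echo "$dev: precleared"
#       else
#           echo "$dev: NOT precleared"
#       fi
#   done
```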

Joe L.

What I find most interesting is that unless you get SMART reports on the drives you have no idea these errors are happening... That means "some" of the MS-Windows errors we see might be a disk acting up, and not the Microsoft OS. Of course, they should give you the tools to monitor the disk's health... but they don't. <rant> (A crashed disk/computer often leads to a NEW sale of a Microsoft OS. They really don't have a huge incentive to keep the existing OS working; besides, they give no easy way to replace the disk anyway when it starts to go bad.) </rant>

bill_in_socal · January 5, 2009

Excellent Joe. I didn't even think to look at the myMain SMART page. Looks like my drive is OK. Was also unaware of the console scrolling hotkeys.

Expecting 2 more 1TB WD drives today, so preclear is going to be busy.

I really appreciate all your help. Thanks!

prostuff1 · January 7, 2009

Just started this script on a 1TB Seagate drive. Am going for 3 cycles and will let everyone know how long it takes (expect to hear back sometime tomorrow night most likely).

This is a great little script that would be great if it was included in unMenu (which i still need to get working with my BubbaRaid install).

Thanks for the work you have done Joe!!

JDGJr · January 7, 2009

starting to preclear 2 1.5TB Seagates. brand new box, brand new unRAID user.

similar to a previous smartctl post on this thread, my drives both show this in the report:

Device is: Not in smartctl database

Does this mean i have to configure something differently to take advantage of SMART?

tia

And, Joe - thanks for this tool, looks like a real timesaver!

Joe L. · January 7, 2009

starting to preclear 2 1.5TB Seagates. brand new box, brand new unRAID user.

similar to a previous smartctl post on this thread, my drives both show this in the report:

Device is: Not in smartctl database

Does this mean i have to configure something differently to take advantage of SMART?

tia

And, Joe - thanks for this tool, looks like a real timesaver!

Nothing you can do until the drives get added to the next version of smartctl. It happens with lots of new drives.

I took a look a few hours ago, 5.38 is the most current version of smartctl unless you want to go to their development CVS tree and compile it yourself.

Fortunately, most of the SMART parameters are common between the manufacturers and drive models, so the SMART reports will still help to know if the drive is acting up.

I'll be curious to learn how quickly the drives clear on your server. On my array it took about 20 hours to do two concurrent 1.5TB drives while it was also doing a monthly parity check I had scheduled. All I can say is the PCI bus on my poor server was probably very glad when it was over.

Joe L.

JDGJr · January 7, 2009

I'll be curious to learn how quickly the drives clear on your server. On my array it took about 20 hours to do two concurrent 1.5TB drives while it was also doing a monthly parity check I had scheduled. All I can say is the PCI bus on my poor server was probably very glad when it was over.

looks like the 1st one i kicked off will complete in about 14:20. the 2nd completed in 12:35. I think I only enabled SMART on the drives after the process started, so I'll rerun both in a bit - doesn't hurt to be sure.

prostuff1 · January 9, 2009

Mine got done in 26 hours and 35 minutes. That was 3 cycles on a 1TB Seagate drive.

The script worked great and it stressed the drive like i wanted. Once i get done with this i might run it on the old parity drive. I'm not sure i really want to/need to as the old parity drive was running fine.

JonathanM · January 9, 2009

I just kicked off three telnet preclear sessions on three new 1.5TB drives.

Is the time display supposed to show the dashes?

Joe L. · January 9, 2009

I just kicked off three telnet preclear sessions on three new 1.5TB drives.

Is the time display supposed to show the dashes?

Yes. Oops, I see what you are talking about now...

Looks like your time-zone might not be set on your server.

What do you get when you type:

date '+%s'

in another telnet window. I'll bet it is not just a number of "seconds" it returns.

It should look like this:

root@Tower:/boot# date '+%s'

1231539768

This was fixed in the most recent 4.4.2 unraid release, and broken in 4.4 and 4.5beta.

The pre-clear will still work, but the elapsed time might need to be tracked manually.
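If you do end up tracking the time by hand, the same date '+%s' value is all you need. Below is a small sketch with two made-up helper names: is_epoch_seconds checks that the date output really is a plain number, and format_elapsed turns two such numbers into hours, minutes and seconds.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical.

# Succeed if the argument is a plain non-negative integer, which is what
# date '+%s' should print.
is_epoch_seconds() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Print the time between two epoch values (start, end) as H:MM:SS.
format_elapsed() {
    local secs=$(( $2 - $1 ))
    printf '%d:%02d:%02d\n' $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
}

# Usage:
#   start=$(date '+%s')      # note this when the preclear starts
#   ...
#   format_elapsed "$start" "$(date '+%s')"
```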

Joe L.

JonathanM · January 10, 2009

This is a fresh (as in rolled 1/2 hour before use) install of 4.4.2 with the only customizations being the download and install of the smartctl libraries, and download of the new york timezone file. Timezone is set to custom in the configuration. The date command as you specified returned 1231549459 as of 8:05 eastern.

Joe L. · January 10, 2009

This is a fresh (as in rolled 1/2 hour before use) install of 4.4.2 with the only customizations being the download and install of the smartctl libraries, and download of the new york timezone file. Timezone is set to custom in the configuration. The date command as you specified returned 1231549459 as of 8:05 eastern.

Interesting... I just loaded 4.4.2 myself the other day, but I don't think I've pre-cleared a disk since then.

I'll need to give it a try.

What "telnet" client are you using? Are you using "putty" or the command built into windows? I'm in the same time-zone as you, so my server should act the same.

Joe L.

JonathanM · January 10, 2009

What "telnet" client are you using? Are you using "putty" or the command built into windows? I'm in the same time-zone as you, so my server should act the same.

This was the stock w2k command line telnet. I normally use putty, but this server is not on my home lan.

JonathanM · January 10, 2009

The first disk didn't complete successfully. What logs and or other info do I need to look at to figure out why?

Joe L. · January 10, 2009

The first disk didn't complete successfully. What logs and or other info do I need to look at to figure out why?

Type:

fdisk -l /dev/sdb

dd if=/dev/sdb count=1 | od -x -A d

Post the output of both commands.

Should be interesting to see what happened.

The "dd" output should look like this for a 1.5TB drive (assuming your geometry is the same as my 1.5TB drive)

root@Tower:/boot# dd if=/dev/sdb count=1 | od -x -A d

1+0 records in

1+0 records out

512 bytes (512 B) copied, 0.00120228 s, 426 kB/s

0000000 0000 0000 0000 0000 0000 0000 0000 0000

*

0000448 0000 0000 0000 003f 0000 7af1 aea8 0000

0000464 0000 0000 0000 0000 0000 0000 0000 0000

*

0000496 0000 0000 0000 0000 0000 0000 0000 aa55

0000512

The fdisk output should look something like this:

root@Tower:/boot# fdisk -l /dev/sdb

Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes

255 heads, 63 sectors/track, 182401 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Disk identifier: 0x00000000

Device Boot Start End Blocks Id System

/dev/sdb1 1 182402 1465138552+ 0 Empty

Partition 1 does not end on cylinder boundary.
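For anyone curious how the two outputs relate: in the "od" dump, the words 003f 0000 at bytes 454-457 are the partition's starting sector (0x0000003f = 63), and 7af1 aea8 at bytes 458-461 are its length (0xaea87af1 = 2930277105 sectors). 63 + 2930277105 = 2930277168, which is exactly 1500301910016 bytes / 512, so the partition runs to the very end of the disk. A rough way to read the same two fields directly is sketched below. mbr_u32 is a hypothetical helper, and it assumes a little-endian machine (true for x86), since od prints 4-byte integers in host byte order.

```shell
#!/bin/bash
# Sketch only: mbr_u32 is a hypothetical helper. It assumes a little-endian
# host, because od prints 4-byte integers in host byte order.

# Print the unsigned 32-bit value stored at a byte offset of a file or device.
mbr_u32() {
    od -A n -t u4 -j "$2" -N 4 "$1" | tr -d ' '
}

# The first partition entry starts at byte 446; its starting sector is at
# entry offset 8 (byte 454) and its sector count at entry offset 12 (byte 458).
# Usage:
#   mbr_u32 /dev/sdb 454     # starting sector
#   mbr_u32 /dev/sdb 458     # number of sectors
```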

Joe L.

barbapapa · January 12, 2009

Hi, thanks for a great script! I used it on a 1 TB WD10EADS yesterday and it seemed to get through it ok. It came up with one mildly worrisome error:

UDMA_CRC_Error_Count : 1

Is that something worth worrying about?

Jan 12 00:54:05 Tower preclear_disk-finish[1004]: SMART Attributes Data Structure revision number: 16
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: Vendor Specific SMART Attributes with Thresholds:
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
Jan 12 00:54:05 Tower preclear_disk-finish[1004]:   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
Jan 12 00:54:05 Tower preclear_disk-finish[1004]:   3 Spin_Up_Time            0x0027   170   169   021    Pre-fail  Always       -       6483
Jan 12 00:54:05 Tower preclear_disk-finish[1004]:   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       25
Jan 12 00:54:05 Tower preclear_disk-finish[1004]:   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
Jan 12 00:54:05 Tower preclear_disk-finish[1004]:   7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
Jan 12 00:54:05 Tower preclear_disk-finish[1004]:   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       12
Jan 12 00:54:05 Tower preclear_disk-finish[1004]:  10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
Jan 12 00:54:05 Tower preclear_disk-finish[1004]:  11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
Jan 12 00:54:05 Tower preclear_disk-finish[1004]:  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       23
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       20
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       25
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       1
Jan 12 00:54:05 Tower preclear_disk-finish[1004]: 200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

Took about 11.5 hours to get through one cycle.

I'm hoping that, since it's just one UDMA CRC error, I should be OK.

I'm going to do another WD10EADS shortly.

Next question: I'd like to stress-test a drive that is already part of my array (2 data drives, no parity drive), but doesn't have any data on it yet. I thought about using the preclear utility on it, but if I remove it from the array, then I can't restart the array - I get the "Too many wrong/missing disks" error. Can I just use the restore function? Is there a better way to stress-test a drive that is already part of the array? Actually I wouldn't mind testing the drive that DOES have data on it as well.

RobJ · January 12, 2009

UDMA_CRC_Error_Count : 1

Is that something worth worrying about?

No, unless it continues to increase. If it rises further, then you may want to replace its SATA cable with a better one.

JonathanM · January 12, 2009

The first disk didn't complete successfully. What logs and or other info do I need to look at to figure out why?

Type:

fdisk -l /dev/sdb

dd if=/dev/sdb count=1 | od -x -A d

Post the output of both commands.

Joe L.

Tower login: root

Linux 2.6.27.7-unRAID.

root@Tower:~# fdisk -l /dev/sdb

Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes

255 heads, 63 sectors/track, 182401 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Disk identifier: 0x00000000

Device Boot Start End Blocks Id System

/dev/sdb1 1 182402 1465138552+ 0 Empty

Partition 1 does not end on cylinder boundary.

root@Tower:~# dd if=/dev/sdb count=1 | od -x -A d

1+0 records in

1+0 records out

512 bytes (512 B) copied, 0.000298241 s, 1.7 MB/s

0000000 0000 0000 0000 0000 0000 0000 0000 0000

*

0000448 0000 0000 0000 003f 0000 7af1 aea8 0000

0000464 0000 0000 0000 0000 0000 0000 0000 0000

*

0000496 0000 0000 0000 0000 0000 0000 0000 aa55

0000512

root@Tower:~#

Joe L. · January 12, 2009


It sure looks to me as if the geometry is identical, and the "od" looks the same too as mine for the 1.5TB disk.

What do you get if you type:

preclear_disk.sh -t /dev/sdb

I'll be shocked if it does not indicate the clearing worked as it was supposed to.

I'm a bit at a loss to figure out what is happening... You also had off numbers with your elapsed time calculation. It is almost as if the "shell" was having memory problems.

Were you doing anything else to the same disk while the preclear was running? Did you reset the time-zone and/or time when pre-clear was in progress?

Could you have had a second preclear_disk.sh running on the same disk at the same time?

If the output of the preclear_disk.sh -t /dev/sdb indicates it is cleared, then you might want to look at your syslog for any indications of disk read errors. Something made it think the data was different the last time it looked.

Joe L.

JonathanM · January 12, 2009

What do you get if you type:

preclear_disk.sh -t /dev/sdb

I'll be shocked if it does not indicate the clearing worked as it was supposed to.

I'm a bit at a loss to figure out what is happening... You also had off numbers with your elapsed time calculation. It is almost as if the "shell" was having memory problems.

Were you doing anything else to the same disk while the preclear was running? Did you reset the time-zone and/or time when pre-clear was in progress?

Could you have had a second preclear_disk.sh running on the same disk at the same time?

If the output of the preclear_disk.sh -t /dev/sdb indicates it is cleared, then you might want to look at your syslog for any indications of disk read errors. Something made it think the data was different the last time it looked.

Joe L.

root@Tower:/boot/scripts# preclear_disk.sh -t /dev/sdb
Pre-Clear unRAID Disk
########################################################################
Device Model:     ST31500341AS
Serial Number:    9VS0HE2T
Firmware Version: CC1H
User Capacity:    1,500,301,910,016 bytes

Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders, total 2930277168 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1              63  2930277167  1465138552+   0  Empty
Partition 1 does not end on cylinder boundary.
########################################################################
============================================================================
==
== DISK /dev/sdb IS PRECLEARED
==
============================================================================
root@Tower:/boot/scripts#

All I did was boot the server, install the smartctl libraries, install the preclear script, and kick it off in three different telnet windows on the three different drives. 2 completed successfully, 1 didn't.

Joe L. · January 12, 2009


I would do a thorough memory test then, and/or replace the cable to the disk with another, as there would be no reason why reading a disk one day would give a different result than reading it the next.

In any case, you will want to run it through another pre-clear cycle, just to make sure it is working well before you add it to the array. That is one of the major reasons you are burning in the drives... to detect errors that are much harder to deal with once you start using the disks for data.

Joe L.

JonathanM · January 12, 2009

I'm kicking off another set of 3 preclears on all 3 disks. We'll see in a couple days.
