mcs

Potential Samsung F4 issues.

Sounds nice. Always love to hear about silent corruption.

To turn off write caching:

hdparm -W 0 /dev/sda  (replace sda with your disk)

To look at write caching status:

hdparm -W /dev/sda  (replace sda with your disk)

Question for the experts: will the mover script pick up this corruption?

Joe L.    6

No script will pick up on the problem. No error is returned, but the data "written" to the disk never actually reaches the disk platter.

It is even worse than that. If you do a parity "check", it will read the zeros (or whatever else was in the sectors that should have been written), report a parity error, and then update parity to reflect the bad data on the disk. Ouch.

If you disable the F4 disk (un-assign it), start the array without it, then stop the array and re-assign it, unRAID can use the parity disk in combination with the other disks to potentially write the correct data back to the F4 drive as part of its re-construction. You'll be without parity protection until it completes, but at least your data will be good when it is done.

This will only work if you have not over-written parity with the stock unRAID "Check" button before attempting the drive re-construction.

 

Joe L.

Joe L.    6

So rsync doesn't check it has written the data correctly before deleting the original files?  or do I not understand the actual problem? :)

I did not think of that... I'm not sure if it reads the data back when copying within the same server, or only does the checksum when used across servers.

HOWEVER... since the file was just written, it would still be in the Linux buffer cache. If read back, the read would not even go to the disk itself unless the file was huge and could not all be buffered. The file would be read back from memory, not from the physical disk.
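For anyone who wants to check a copy against the disk rather than against memory, here is a rough sketch. It assumes a Linux box with root access and md5sum available; verify_copy and the DROP_CACHES variable are names made up for this example and are not part of unRAID or rsync. It flushes and drops the kernel's buffer cache before re-reading, so the read has to go back to the drive, although the drive's own cache could in principle still answer it.

```sh
# Hypothetical helper: compare a copy with its source after dropping the
# Linux buffer cache, so the read-back is not served from memory.
# DROP_CACHES is overridable only so the sketch can be dry-run without root.
DROP_CACHES=${DROP_CACHES:-/proc/sys/vm/drop_caches}

verify_copy() {
    # $1 = source file, $2 = destination file
    sync                        # flush dirty pages out of the kernel first
    echo 3 > "$DROP_CACHES"     # drop page cache, dentries and inodes (needs root)
    src_sum=$(md5sum < "$1")
    dst_sum=$(md5sum < "$2")
    [ "$src_sum" = "$dst_sum" ] # exit status 0 only if the checksums agree
}
```

Called as verify_copy /path/to/source /path/to/copy, it returns non-zero when the two files differ. If the cache drop fails (for example, when not run as root), the comparison still runs but may be answered from memory.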


Yeah, I'd pretty much come to that conclusion too, Joe. Seems I have 800 GB of data with a question mark hanging over it... time to get the original media out, methinks :)

Joe L.    6

If you trust the rest of your disks:

Stop the array.

Un-assign the F4 disk.

Start the array. (This will cause unRAID to forget the serial number of the F4 disk so it can be used as its own replacement.)

Stop the array.

Re-assign the F4 disk.

Start the array.

Let it re-construct the F4 disk. It will fix any of the "data blocks" that were never written to the disk.

Yes, you'll be without parity protection until the disk is re-constructed. But since you are really writing back exactly what was already on the disk, you can still recover (somewhat) if another disk were to fail, by forcing parity to be trusted. You are really only going to "potentially" update the blocks that were not correctly written originally.

Joe L.


Actually, thinking about it, under what conditions would this problem occur during normal unRAID operation? Refreshing main.htm while writing to the dodgy disk? Using smartctl or hdparm from the command line while writing, obviously.

At this point I'm thinking I'll just disable the write cache for the drive and suck up any corruption if and when I find it. The parity check ran on the 1st without listing any errors, and I've watched maybe 50% of the films on that drive without noticing anything.

Thanks for the instructions, Joe, but my monthly parity check ran 3 days ago, so I think, if I understand correctly, any bad data is now part of the array :)

pras1011    0

I am building a new server and I have 7 brand new, untouched F4s. I have one Hitachi 7K2000 that will act as parity.

How do I put them into the server to avoid any problems?

johnny121b    0

Oh, this is depressing... especially on December 4th. Thanks for the command line to turn off write caching, Chris.

By turning off the caching as described, will it remain off if the server is restarted? If not, how can I ensure that it does?

I know that I have never issued these commands at the command-line prompt... but are they something that unMENU or even unRAID makes use of?

graywolf    5

I have no clue whether it would turn back on after a server restart/reboot.

But if you wanted, you could put the command in your go script.
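As a rough idea of what that could look like, here is a sketch of lines one might add to the go script (assumed here to live at /boot/config/go). The disable_write_cache function and the HDPARM variable are names invented for this example; the only real command involved is the hdparm -W 0 call already given in this thread.

```sh
# Sketch only: turn off the drive write cache on each listed disk at boot.
# HDPARM is overridable purely so the loop can be dry-run on a test box.
HDPARM=${HDPARM:-hdparm}

disable_write_cache() {
    # Each argument is a block device, e.g. /dev/sdb
    for dev in "$@"; do
        "$HDPARM" -W 0 "$dev"
    done
}

# Example call (device names are placeholders; list your own F4 disks):
# disable_write_cache /dev/sdb /dev/sdc
```

Keep in mind that sdX letters can change between boots, so it is worth checking the result afterwards with hdparm -W on each disk.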

 


I am building a new server and I have 7 brand new untouched F4. I have one Hitachi 7k2000 that will act as parity.

How do I put them into the server to avoid any problems?

Turn off write caching before you write any data to them:

hdparm -W 0 /dev/sda  (replace sda with your disk)

Then wait for new firmware from Samsung.

By turning off the caching as described, will it remain off if the server is restarted?  If not, how can I ensure that it does?

The setting survives a reboot on my system, yes.

pras1011    0

This is absolutely ridiculous!! There doesn't seem to be any HDD that is safe to use. Apart from maybe the Hitachi 7K2000 2TB, but that HDD is noisy, runs extremely hot, and is expensive!


What about the Seagate ST32000542AS? Or is there some problem with that one too?

pras1011    0

Apparently that one has only 50,000 load/unload cycles. There was something about the firmware.


The problem could not be reproduced with the above test if any of the following conditions are met:

* Disk write cache is disabled.
* NCQ is disabled. This may not always be true as the c't lab also reported problems with NCQ disabled.
* A modified test version of smartctl which does not issue IDENTIFY DEVICE commands is used. Then all other SMART and non-SMART commands used by smartctl work without any data loss.

Christian Franke

NCQ is disabled on my system. I run virtual machines and have 5 of these 204UI disks and have never noticed the issue, and you'd think I would, considering a virtual machine would be very sensitive to data corruption.

Also, according to Christian, putting the PC in IDE mode will alleviate the issue.

Since I'm running so many of these disks, I'm going to spend some serious time trying to make this issue show up.
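For anyone wanting to see whether NCQ is active on a given disk, a minimal sketch follows. It relies on the Linux sysfs queue_depth attribute, where a depth of 1 effectively means NCQ is off. The function names and the SYSFS_ROOT variable are made up for this example, and as the quoted note says, disabling NCQ may not be a reliable workaround on its own.

```sh
# Sketch only: inspect or lower the NCQ queue depth of a SATA disk.
# SYSFS_ROOT is a hypothetical override so this can be tried on a scratch dir.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

ncq_depth() {
    # $1 = device name without /dev/, e.g. sdf
    cat "$SYSFS_ROOT/block/$1/device/queue_depth"
}

disable_ncq() {
    # A queue depth of 1 effectively turns NCQ off for that disk (needs root).
    echo 1 > "$SYSFS_ROOT/block/$1/device/queue_depth"
}
```

For example, ncq_depth sdf prints the current depth and disable_ncq sdf sets it to 1. The setting does not survive a reboot, so it would have to be reapplied at startup.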


Well, I can recreate the issue and can confirm that it doesn't occur if you turn off write caching. AVOID THESE DISKS.

Methodology:

Copy a large file onto the Samsung disk from another disk.

Run smartctl -i /dev/sdf a few times in another window while the copy is running.

Run md5sum on the source and destination files. Different checksums are reported.

After issuing hdparm -W 0 /dev/sdf, repeat the above several times; the checksums are always the same.
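The same procedure can be roughly scripted for a scratch disk that is not part of a live array. This is only a sketch of the steps described here: repro_test and the SMARTCTL variable are names invented for the example, and the one-second polling interval is an arbitrary choice. Unless the test file is larger than RAM, the final read of the destination may be answered from the buffer cache rather than the disk.

```sh
# Sketch only: copy a file while poking the drive with smartctl -i,
# then compare checksums of source and destination.
# SMARTCTL is overridable purely so the function can be dry-run.
SMARTCTL=${SMARTCTL:-smartctl}

repro_test() {
    # $1 = source file, $2 = destination path on the suspect disk,
    # $3 = suspect device, e.g. /dev/sdf
    cp "$1" "$2" &
    cp_pid=$!
    # Issue smartctl -i (which sends IDENTIFY DEVICE) while the copy runs
    while kill -0 "$cp_pid" 2>/dev/null; do
        "$SMARTCTL" -i "$3" >/dev/null 2>&1
        sleep 1
    done
    wait "$cp_pid"
    sync
    src_sum=$(md5sum < "$1")
    dst_sum=$(md5sum < "$2")
    if [ "$src_sum" = "$dst_sum" ]; then
        echo "checksums match"
    else
        echo "CHECKSUM MISMATCH"
        return 1
    fi
}
```

It would be called as repro_test /path/to/bigfile /path/on/samsung/bigfile /dev/sdf, once with the write cache on and again after hdparm -W 0 /dev/sdf.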

 

EDIT : Just in case there is any doubt, I do not recommend doing the above on your live array as it will invalidate parity!

bjp999    149

If you run a correcting parity check with one of these disks in the array, it will "correct" parity to fit the corrupted data. Parity will show one or more sync errors as a result.

Therefore, I would recommend only running read-only parity checks. If you only had one F4, you would be able to use unRAID to reverse the corruption using parity and the other disks, but if you have several of them, you wouldn't have enough info.

prostuff1    0

Are there any problems with the Hitachi 7K2000?

Not that I know of. I am running 2 of them in my system and they are working great. They do run hotter than the WD Green and Seagate LP drives, but they are not bad. They are better than the 7200 RPM Seagate I have.

madpoet    0

I am so confused :(  I have 4 of them in my array, and in fact just ran a parity check.  Am I screwed?

 

Dec  4 16:33:59 Tower kernel: md: sync done. time=30247sec rate=64585K/sec

Dec  4 16:33:59 Tower kernel: md: recovery thread sync completion status: 0

 


I think you would see sync errors on parity check if you had encountered any corruption.  Certainly seems to be the case on my test box with one of these as a data disk.  Turn off the write cache and wait for a new firmware.


Copyright © 2005-2017 Lime Technology, Inc. unRAID® is a registered trademark of Lime Technology, Inc.