Potential Samsung F4 issues.



My testing indicates that you WILL see sync errors when checking parity if you have encountered this issue. Running a parity check should therefore either ease your mind or confirm that you have corruption.

If you have one of these drives, I'd suggest a read-only parity check.

 

You can do it from the command line.

/root/mdcmd check NOCORRECT

If it shows no errors you are fine.  It will look exactly like the normal check and the web-interface will still say it is correcting the errors but it is not.

 

If it does show parity errors AND if you ONLY have one of the F4 drives in your array, you can

Stop the array

un-assign the F4 drive

Start the array with it un-assigned  (This will cause the array to forget its serial number so it can be used as its own replacement)

Stop the array once more

Re-Assign the F4 drive

Start the array again... It will re-construct the data onto the F4 drive.  It will have the correct data.  

 

This is assuming the F4 drive is a data drive and not the parity drive.  If the F4 drive is the parity drive just do a normal parity "Check" and it will be corrected based on the actual data on the data disk.

 

You'll be without parity protection for the duration of the re-construction, but since you are actually only writing what is already there for most of the disk, you are no worse off than with the corruption.  (You could force the F4 data disk to be valid if another disk were to fail during the process.)

 

Joe L.
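
For anyone who wants the advice above as a quick lookup, here is a minimal sketch. It is not part of Joe L.'s post, only a restatement of it. The function name, its three arguments and the labels it prints are all made up for illustration; you supply the numbers yourself after the NOCORRECT check has finished.

```sh
#!/bin/sh
# Sketch only: encodes the advice from the post above as a lookup.
# Hypothetical arguments, filled in by hand after the check completes:
#   $1 = number of sync errors reported by the NOCORRECT check
#   $2 = number of F4 drives in the array
#   $3 = role of the F4 drive: "data" or "parity"
f4_next_step() {
    sync_errors=$1; f4_count=$2; f4_role=$3
    if [ "$sync_errors" -eq 0 ]; then
        echo "no-action"            # no errors reported: you are fine
    elif [ "$f4_count" -ne 1 ]; then
        echo "manual-review"        # the post only covers a single F4 drive
    elif [ "$f4_role" = "parity" ]; then
        echo "correcting-check"     # a normal parity check rewrites parity from the data disks
    else
        echo "rebuild-data-disk"    # un-assign, start, stop, re-assign, start
    fi
}
```

For example, `f4_next_step 14 1 data` prints `rebuild-data-disk`, which corresponds to the stop / un-assign / start / stop / re-assign / start sequence described in the post.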

Link to comment

My testing indicates that you WILL see sync errors when checking parity if you have encountered this issue. Running a parity check should therefore either ease your mind or confirm that you have corruption.

If you have one of these drives, I'd suggest a read-only parity check.

 

You can do it from the command line.

/root/mdcmd check READONLY

If it shows no errors you are fine.  It will look exactly like the normal check and the web-interface will still say it is correcting the errors but it is not.

 

If it does show parity errors AND if you ONLY have one of the F4 drives in your array, you can

Stop the array

un-assign the F4 drive

Start the array with it un-assigned  (This will cause the array to forget its serial number so it can be used as its own replacement)

Stop the array once more

Re-Assign the F4 drive

Start the array again... It will re-construct the data onto the F4 drive.  It will have the correct data.  

 

This is assuming the F4 drive is a data drive and not the parity drive.  If the F4 drive is the parity drive just do a normal parity "Check" and it will be corrected based on the actual data on the data disk.

 

You'll be without parity protection for the duration of the re-construction but since you are actually only writing what is already there for most of the disk you are no worse off than with the corruption.  (You could force the F4 data disk to be valid if another were to fail during the process.)

 

Joe L.

 

 

Is that the actual command?  Is readonly supposed to be in all CAPS?  If so I get this:

 

root@Tower:~# /root/mdcmd check READONLY

/root/mdcmd: line 11: echo: write error: Invalid argument

 

 

Link to comment


Yes, it is supposed to be all caps, but I made a mistake in the command argument.

 

It should be

/root/mdcmd check NOCORRECT

 

Sorry. 

 

 

Link to comment


No, if I remember correctly the command is:

 

/root/mdcmd check NOCORRECT

 

 

EDIT: beaten by 6 seconds by Joe L.

Link to comment


<embarrassed> yes </embarrassed>  My photographic memory apparently had the wrong photograph.
Link to comment

Until a firmware fix comes out, wouldn't we want to disable the write cache in the middle of your process, Joe?  Thanks.

It would still require users to download and use the newer version.

 

I suppose it would not hurt, but as mentioned earlier, many shell scripts that monitor the disks use hdparm and smartctl.

 

Joe L.

Link to comment

I didn't realize how labor intensive doing the checksum thing would be. Considering the number of files I would have to check it's just not a feasible option (unless I'm missing something). So, two questions.

 

1. All I've done is basically move a bunch of files off of external harddrives onto the system. unMenu wasn't being used at the time (someone asked about this) or any other addon that I know of. What are the chances I'm going to run into a problem?

 

2. When the firmware is released, will it only solve the issue for newly added files, or will it fix any possible issues that arose before the firmware fix was applied?

 

If I have to I'll just re-preclear the drives. At least that's automated  :-\

 

Edit: Also, if you're using one of these for your parity, is it still ok to turn off write caching?
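
On the checksum question: it does not have to be done file by file. Below is a minimal sketch, assuming the originals still exist on the external drives and that GNU `md5sum` is available. The function name and both directory arguments are placeholders, not anything from this thread.

```sh
#!/bin/sh
# Sketch only: compare every file under a source tree with its copy under
# a destination tree. Both directory arguments are placeholders, e.g. the
# mounted external drive and the disk share the files were copied to.
verify_copy() {
    src=$1; dst=$2
    manifest=$(mktemp) || return 1
    # Record checksums with paths relative to the source root.
    ( cd "$src" && find . -type f -exec md5sum {} + ) > "$manifest"
    # Check the same relative paths under the destination.
    # --quiet suppresses the OK lines, so only mismatches are printed.
    ( cd "$dst" && md5sum --quiet -c "$manifest" )
    status=$?
    rm -f "$manifest"
    return $status
}
```

A call such as `verify_copy /mnt/external /mnt/disk1/Movies` (both paths hypothetical) returns 0 when every copy matches and non-zero, with the failing file names listed, when something differs. It still has to read every file twice, so it is slow, but it runs unattended.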

Link to comment

I didn't realize how labor intensive doing the checksum thing would be. Considering the number of files I would have to check it's just not a feasible option (unless I'm missing something). So, two questions.

 

1. All I've done is basically move a bunch of files off of external harddrives onto the system. unMenu wasn't being used at the time (someone asked about this) or any other addon that I know of. What are the chances I'm going to run into a problem?

Unless you were issuing hdparm or smartctl commands during the time you were copying the files (and you probably were not) you'll not have any problem at all.

2. When the firmware is released, will it only solve the issue for newly added files, or will it fix any possible issues that arose before the firmware fix was applied?

Only for future files written to the drive. It will have no effect on data that failed to be written before the fix was applied; that data will still be missing.

If I have to I'll just re-preclear the drives. At least that's automated  :-\

I don't think you'll need to, from what you've said.

Edit: Also, if you're using one of these for your parity, is it still ok to turn off write caching?

Yes.  You'll want to turn off the write-caching.

 

Joe L.
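
For reference, turning off write-caching is normally done with hdparm's `-W0` option. The sketch below only prints the commands so they can be reviewed first; the function name and the device paths are placeholders, not taken from this thread.

```sh
#!/bin/sh
# Sketch only: print (do not run) the hdparm command that turns off a
# drive's write cache. -W0 is hdparm's "write caching off" setting.
# The device paths passed in are placeholders; substitute your own drives.
wcache_off_cmd() {
    for dev in "$@"; do
        echo "hdparm -W0 $dev"
    done
}

# Example: wcache_off_cmd /dev/sdb /dev/sdc
```

Since hdparm is one of the commands mentioned in this thread as a possible trigger, it seems safest to run the printed lines as root while nothing is writing to the array. On many drives the setting may not survive a power cycle, so treat it as something to re-check after each boot.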

Link to comment

unMenu is installed, although I haven't really used it at all. I've been following the Configuration tutorial found here:

http://lime-technology.com/wiki/index.php?title=Configuration_Tutorial

 

However, the web page was not open during file transfer. I haven't opened it since it was originally called for in that tutorial.

 

You should be just fine then. You'd have to be doing something that was querying the drives (unMENU or command line stuff) while transferring data. You're saying you didn't do anything but boot unRAID and move files.

 

Peter

Link to comment

Hrm... how the heck do you patch your drives in unRAID?  Pull them and patch on another machine, then put them back?

 

I think the easiest way to patch, if you do not want to remove your drive, is to unplug either the SATA or the power cable from all drives but the Samsung one.  Then boot from a separate flash drive with the upgrade on it.

 

Here is a simple tutorial for getting DOS boot files onto a flash drive.

http://thelostbrain.com/post/2008/01/Make-your-flash-drive-bootable!-%28Boot-into-DOS-with-access-to-full-flash-drive-space%29.aspx

Link to comment

Do you think that it is safe to re-enable write caching after the firmware upgrade? ???

Personally, if I had one of those drives, I'd wait to apply the patched firmware until I had read reports of it working.

Are you certain it corrects the problem, and that no others are introduced?  I would wait until I learn the patch process has been tried by other more-anxious users of their disks...  If there are wrinkles in their patch process, I'd want it to happen to somebody else first.

 

 

 

Link to comment

The description on that "patch" seems to indicate the problem occurred if NCQ was enabled and an IDENTIFY command was issued.

For most unRAID users, unless you changed the default settings, this might be good news, as NCQ is disabled by default.

 

According to the bug description linked earlier, it did not matter whether NCQ was enabled; the investigating labs were able to duplicate the bug without NCQ.

I would certainly wait until it's been confirmed by many others, including the c't lab and the bug tracker, that this firmware indeed corrects the issue. I have a feeling it might only partially correct the problem.

 

http://sourceforge.net/apps/trac/smartmontools/wiki/SamsungF4EGBadBlocks

This may not always be true as the c't lab also reported problems with NCQ disabled.

 

 

To make matters even worse, Samsung's patch does not change the firmware reported! How sloppy and irresponsible of them. Now there is no way of knowing if you need the patch or not. For shame!

 

http://sourceforge.net/apps/trac/smartmontools/wiki/SamsungF4EGBadBlocks

The patch did not change the firmware version number reported by IDENTIFY DEVICE:

 

smartctl -i -q noserial /dev/sda

smartctl 5.40 2010-10-16 r3189 [i686-pc-linux-gnu] (local build)

Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

 

=== START OF INFORMATION SECTION ===

Device Model:    SAMSUNG HD204UI

Firmware Version: 1AQ10001

User Capacity:    2,000,398,934,016 bytes
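
Given that the version string does not change, about all a script can do is spot the model. Here is a minimal sketch, assuming the `Device Model:` and `Firmware Version:` lines look like the output pasted above. The function name and the messages it prints are made up for illustration.

```sh
#!/bin/sh
# Sketch only: read `smartctl -i` output on stdin and report whether the
# drive is an HD204UI. Because the patch leaves "Firmware Version"
# unchanged, this identifies the model only, not the patch state.
f4_model_check() {
    awk -F': *' '
        /^Device Model:/     { model = $2 }
        /^Firmware Version:/ { fw = $2 }
        END {
            if (model ~ /HD204UI/) {
                print "F4 drive, firmware " fw " (patch state unknown)"
            } else {
                print "not an HD204UI"
            }
        }'
}

# Example: smartctl -i -q noserial /dev/sda | f4_model_check
```

Keep in mind that smartctl itself queries the drive, so on an unpatched F4 it should only be run when nothing is being written to the disk.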

Link to comment
Personally, if I had one of those drives, I'd wait to apply the patched firmware until I had read reports of it working.

Are you certain it corrects the problem, and that no others are introduced?  I would wait until I learn the patch process has been tried by other more-anxious users of their disks...  If there are wrinkles in their patch process, I'd want it to happen to somebody else first.

 

Always a good plan. I believe Zotac released a BIOS that bricked a whole bunch of motherboards one time in history. But then, that could have been Seagate during their drive issues...

 

You either must be ready for the worst to happen (the drive failing badly in some manner) or just let others be the initial testers.

 

Peter

Link to comment

5187799933

 

And I stupidly updated parity without checking the forums when I saw 14 errors.  The fact that I did it before this thread started doesn't make it suck any less.

 

On the plus side, I'm pretty sure everything I've stored on this drive is from bittorrent so I can just force a re-check and patch up the corrupt files.  Now I have to decide if I want to Go For It and try the firmware update or grab an EARS drive on the way home and put the F4 in the time-out zone until the firmware fix has been tested and unRAID provides AF support.  I think I'll go with the time-out.  At least I know the EARS drive will give its all with the jumper installed and, by the time I finish filling it, it will probably be safe to bring the F4 back.

Link to comment
