tstor

  1. No, the data you write is encoded for two reasons: to guarantee a minimum number of transitions for clock recovery (see RLL codes, https://en.wikipedia.org/wiki/Run-length_limited) and to support modern error-correction algorithms (https://en.wikipedia.org/wiki/Low-density_parity-check_code, https://web.archive.org/web/20161213104211/http://www.marvell.com/storage/assets/Marvell_88i9422_Soleil_pb_FINAL.pdf). In other words, there will always be a lot of flux reversals / polarity changes regardless of what exactly you write.
     Yes, but assume you have a defect in one of the higher address lines (or a fake SSD with only 128 GB instead of the expected 2 TB). You would write a block, e.g. 1 GB, and successfully read it back from there. All your tests would be successful. During your tests you would never notice that, when you think you are writing into the second 128 GB, you are actually overwriting the first. Only when you write the whole disk first and then start reading back would you detect this kind of defect / cheating. Since you are unlikely to have enough RAM to keep a copy of the written random data for the whole disk, you need a pseudo-random sequence as the source of the written data. When you read back, you can then just use the same generator and seed to generate the sequence again for comparison with the read data (see the sketch below).
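     A minimal sketch of that whole-disk write/verify approach, using a deterministic keystream from openssl as a reproducible pseudo-random source; /dev/sdX and the seed are placeholders, and the test destroys all data on the device:
        DEV=/dev/sdX                          # device under test (placeholder)
        SEED=my-test-seed                     # only the seed needs to be remembered between the two passes
        SIZE=$(blockdev --getsize64 "$DEV")   # device size in bytes
        # Write pass: stream a reproducible pseudo-random sequence over the whole device.
        openssl enc -aes-256-ctr -pass pass:"$SEED" -nosalt </dev/zero 2>/dev/null \
          | head -c "$SIZE" | dd of="$DEV" bs=1M iflag=fullblock conv=fsync status=progress
        echo 3 > /proc/sys/vm/drop_caches     # make sure the verify pass reads from the disk, not the page cache
        # Read pass: regenerate the identical sequence and compare it with what the disk returns.
        openssl enc -aes-256-ctr -pass pass:"$SEED" -nosalt </dev/zero 2>/dev/null \
          | head -c "$SIZE" | cmp - "$DEV" && echo "OK: every block came back as written"
     An address-line defect or a fake capacity shows up as a mismatch in the read pass, because aliased blocks get overwritten by later parts of the sequence.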
  2. Your assumptions about magnetic media are wrong. The information on a disk is not coded into magnetised / demagnetised spots; it is coded into transitions between areas magnetised with opposite polarities. The read heads detect magnetic flux changes, not the magnetisation itself. Also, what is written are not the bits that the disk driver hands over to the drive. The data coming from the driver gets re-encoded in a way that optimises several parameters, e.g. the number of transitions for clock recovery, the influence on neighbouring spots, and error correction. How it is done on a given drive is the secret sauce of its manufacturer, but whatever bit patterns you send to the drive, what ends up on the disk is something different, and the spots on the platter will always get magnetised with one polarity or the other.
     For disk surface testing it therefore does not matter much whether you write zeroes or random data. For stressing the mechanical elements and forcing early failures of marginal drives it doesn't matter either. For pre-clearing a disk you obviously need to write zeroes.
     There are some scenarios, e.g. a controller circuit on the drive having issues or a fake SSD with less capacity than advertised, where using a pseudo-random sequence would be far superior to using zeroes. But in that case you would need reproducible pseudo-random sequences, so that you only need to store the seed between write and read, not the sequence itself. Your proposal to create a block of random data and repeatedly write / read that block would not detect issues with addressing. I do however think that those scenarios are beyond the intention of this plugin; there are already tools for that kind of test.
     Pre-clearing was very important when Unraid was not able to clear a disk while keeping the array available. This has changed, but the plugin can still be used to stress-test a drive before adding it to the array.
  3. Excellent, I'll use it with the next swap. And with the current one I have learned a few new things about encrypted drives.
  4. blkid did not show any duplicate UUIDs. I changed the duplicate UUID of the encrypted partition with "cryptsetup luksUUID ...", but I didn't think about the XFS file system becoming visible once the encrypted partition is unlocked. So here is a write-up of what I ultimately did to get access, in case someone else gets into the same situation.
     Goal: Access an encrypted drive taken out of the array in UD on the same system.
     Issue: If this was due to a swap and the drive taken out has been reconstructed, there will be UUID conflicts that prevent mounting, because the reconstructed drive is a true clone including UUIDs. So one needs to change the UUIDs of the drive that is no longer in the array. In my case the UD drive is /dev/sdv.
     First generate two new UUIDs:
        root@Tower:/mnt# uuidgen
        2284b0b6-eead-4d1c-adf0-58efb36085e2
        root@Tower:/mnt# uuidgen
        980b09f3-ced6-4fc4-8875-f94086100f39
     Use the first one with the luksUUID command to change the first conflicting UUID:
        root@Tower:~# cryptsetup luksUUID --uuid=2284b0b6-eead-4d1c-adf0-58efb36085e2 /dev/sdv1
        WARNING!
        ========
        Do you really want to change UUID of device?
        Are you sure? (Type 'yes' in capital letters): YES
     Now the encrypted partition needs to be unlocked in order to get access to the XFS file system UUID, which also needs to be changed, using the second UUID generated above:
        root@Tower:/mnt# /usr/sbin/cryptsetup --verbose luksOpen /dev/sdt1 UAdisklabel
        Enter passphrase for /dev/sdt1:
        Key slot 0 unlocked.
        Command successful.
        root@Tower:/mnt# ls /dev/mapper/
        control  md10@  md11@  md12@  md13@  md14@  md2@  md3@  md4@  md5@  md6@  md7@  md8@  md9@  UAdisklabel@
        root@Tower:/mnt# xfs_admin -U 980b09f3-ced6-4fc4-8875-f94086100f39 /dev/mapper/UAdisklabel
        Clearing log and setting UUID
        writing all SBs
        new UUID = 980b09f3-ced6-4fc4-8875-f94086100f39
     In order to allow mounting via UD, I used the same label ("UAdisklabel") as UD uses for that drive.
     Helpful additional commands:
        blkid
        cryptsetup luksUUID /dev/sdv1
        xfs_admin -lu /dev/mapper/UAdisklabel
     blkid lists the UUIDs visible to the system; initially it does not show the XFS file system UUID because that one is not yet accessible. luksUUID displays the UUID of the encrypted partition. xfs_admin shows the UUID of the unmounted XFS partition after unlocking.
     Now I can mount the previous array drive in UD and access / change its content. I hope it helps, and if someone with deeper knowledge would do a sanity check of what I wrote (and of the condensed version below), it would not hurt.
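     For reference, the same steps condensed into a hedged sketch; the device and mapper name are placeholders, and it assumes an XFS file system inside the LUKS container:
        DEV=/dev/sdX1                 # encrypted partition taken out of the array (placeholder)
        MAPPER=UAdisklabel            # mapper name, matching the label UD uses for the drive (placeholder)
        NEW_LUKS_UUID=$(uuidgen)
        NEW_XFS_UUID=$(uuidgen)
        cryptsetup luksUUID --uuid="$NEW_LUKS_UUID" "$DEV"    # change the LUKS header UUID (asks for confirmation)
        cryptsetup luksOpen "$DEV" "$MAPPER"                  # unlock to reach the file system inside
        xfs_admin -U "$NEW_XFS_UUID" /dev/mapper/"$MAPPER"    # change the XFS file system UUID
        cryptsetup luksClose "$MAPPER"                        # lock again; UD should now be able to mount the drive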
  5. I did; the diagnostics in my previous post, to which you replied, were the ones taken after changing the UUID manually and rebooting: https://forums.unraid.net/topic/92462-unassigned-devices-managing-disk-drives-and-remote-shares-outside-of-the-unraid-array/?do=findComment&comment=935227 But here is a fresh set: tower-diagnostics-20210116-1333.zip
  6. Thanks again, I really appreciate the effort you put into the UD plugins. Now, even though I first changed the UUID via the CLI and then rebooted the server, UD still does not mount the encrypted disk. Any idea?
  7. Here they are. Please note that in the meantime I have changed the conflicting UUID manually (/dev/sdt). However, UD still does not show any disk under "Change Disk UUID".
     By the way, and completely unrelated: when searching this thread for information regarding LUKS and UD, I got a bit confused regarding LUKS and SSDs. In your first post it is stated that "SSD disks formatted with xfs, btrfs, or ext4 will be mounted with 'discard'. This includes encrypted disks." Further down in the same post it is said that "Discard is disabled on an encrypted SSD because of potential security concerns. Fstrim will fail." Finally, a much later post contains this: "Add '--allow-discards' to luks open when an encrypted disk is a SSD so discard and trim will work on the disk." What is the current status regarding SSD / discard / encryption? (A small example of the option in question is below.) tower-diagnostics-20210115-0306.zip
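     For context, a hedged sketch of what the '--allow-discards' option does when an encrypted SSD is opened manually; the device, mapper name and mount point are placeholders:
        cryptsetup luksOpen --allow-discards /dev/sdX1 my_ssd   # pass discard/TRIM requests through the crypt layer
        mount /dev/mapper/my_ssd /mnt/ssd
        fstrim -v /mnt/ssd                                      # fails with "not supported" unless discards are allowed
        cryptsetup status my_ssd                                # shows "flags: discards" when the option is active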
  8. Makes sense and is correct. Partition 1, however, has a different GUID on the rebuilt drive. Is this Unraid's doing when it resizes the partition after the rebuild? For some reason it didn't: the list of available drives for changing the GUID in UD is empty. Any idea? (A command-line GUID check is shown below.)
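     If it helps to compare the two drives, the GPT GUIDs can also be read from the command line; sgdisk is my assumption here and the device names are placeholders:
        sgdisk -p /dev/sdX    # prints the disk GUID of the whole drive
        sgdisk -i 1 /dev/sdX  # prints the unique GUID of partition 1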
  9. I have a question regarding UD and encrypted disks. I replaced one of the encrypted array data drives with a larger one. After the rebuild had successfully finished, I plugged the previous data drive into an empty slot and wanted to mount it. However, it does not mount. It obviously has the same password as the array, and the array is mounted. UD displays "luks" as the file system for both the drive and partition 1. The mount button for the drive is clickable; for partition 1 it is greyed out. If I click on mount, UD spends some seconds doing something and the disk log adds one line of "/usr/sbin/cryptsetup luksOpen /dev/sdv1 TOSHIBA_HDWN180_xxxxxxxxxxxxx", but the partition does not mount. Is there a misunderstanding on my side, or is something wrong that I need to troubleshoot? (One way to narrow this down manually is sketched below.) Unraid 6.8.3, UD 2021.01.09
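     One way to narrow down whether the problem is the LUKS layer or the mount itself would be to check the partition manually; a hedged sketch, with the mapper name and mount point as placeholders:
        cryptsetup luksDump /dev/sdv1                     # confirms a valid LUKS header and shows its UUID
        cryptsetup luksOpen /dev/sdv1 old_array_disk      # should ask for the array passphrase
        blkid /dev/mapper/old_array_disk                  # shows the file system inside and its (possibly duplicate) UUID
        mkdir -p /mnt/test
        mount -o ro /dev/mapper/old_array_disk /mnt/test  # read-only test mount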
  10. Thanks, the array is now rebuilding the missing drive (disk12). I am also watching the head load count because in my opinion it is excessive. For the busy array drives it currently remains stable, but for the idling UA drives (not mounted) it continues to increase (S.M.A.R.T. attributes 192 & 193 increase by about 3 per hour). It is known that WD Green drives aggressively park their heads, but these are HGST data center drives. Looking at the high values in the counters, all drives seem to do this when inactive. Is this normal? (One way to watch these counters is shown below.) tower-diagnostics-20200421-0920.zip
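     One way to watch those counters over time, assuming smartctl is available (the drive letter is a placeholder):
        smartctl -A /dev/sdX | awk '$1==192 || $1==193'   # Power-Off_Retract_Count / Load_Cycle_Count
     Running this e.g. once per hour and comparing the raw values shows how quickly the heads are being parked.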
  11. Thanks, I will. I'd like to do that in maintenance mode so that I can be sure there are no writes to the array during the rebuild. But I would like to have read access. Is there a way to do that? If not, can I mount individual drives with "mount -o ro /dev/sdX1 /x", or does that interfere with the rebuild process?
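     One detail about the read-only mount mentioned above: with XFS, even a read-only mount may replay the journal and therefore write to the device unless log recovery is suppressed. A stricter variant (device and mount point are placeholders) would be:
        mkdir -p /x
        mount -o ro,norecovery /dev/sdX1 /x   # read-only and no journal replay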
  12. There were no errors during the parity sync. I ran the parity check in non-correcting mode and it terminated with zero errors.
  13. I have started to use two 16 TB drives from Seagate, one of each for the parity drives:
     - Seagate Exos X16 ST16000NM001G 16 TB SATA
     - Seagate Exos X16 ST16000NM002G 16 TB SAS
     I chose Seagate simply because they are currently the only manufacturer from which it is easy to get 16 TB drives on the open market. The competitors seem to sell everything above 14 TB almost exclusively to data center operators and storage vendors. Interestingly, the SAS drives at that size are sometimes cheaper than the SATA ones; it seems they sell more of those. I chose one of each, hoping to minimise the possibility that they come from the same manufacturing lot. While they don't provide the best TB/$ ratio if you look only at the drive itself, for me they have become cost efficient once I also take the cost of the disk slot into account (hot-swap bay plus controller port).
     They were recognized, and I could do a full disk write/read with the Preclear plugin and recalculate parity, which obviously took longer than ever, as mentioned by itimpi. The controller is an LSI 9201-16i (now Broadcom). So far I have not had any issues with the new drives. Alas, shortly after the upgrade a 12 TB data drive (from HGST) started to create issues and I shut down the server, so there is not a lot of runtime experience I can share. If somebody with more knowledge regarding "pending sectors", "uncorrected errors" and disk layout could have a look and give me some tips, I would be grateful :-) https://forums.unraid.net/topic/89193-parity-swap-procedure-with-dual-parity/?do=findComment&comment=845421
  14. While I originally wanted to avoid increasing the array size and was therefore interested in the parity swap procedure, I ultimately decided to add another controller and increase the number of drives. Therefore I just put in larger parity drives and recalculated parity. For that I mounted the array in maintenance mode, so that if a drive failed during the recalculation, I would still have the old parity drives for reconstruction. Now two things are worrying me.
     1. Parity does not match between old and new drives. In order to learn more about Unraid, as well as to be sure everything went well, I did a binary compare of some parts of the previous and the replacement parity drive, assuming that the parity calculation should result in the same bytes. This was not the case. Therefore I have the following questions:
     - Is there somewhere a description of the disk layout of the parity drives? I had assumed that the parity drives are just a binary blob, but they seem to be at least partitioned. Correct?
     - Is the assumption correct that, except for the first sectors, the parity bytes should be the same as on the previous drives?
     - If the above is correct, where does the parity data start?
     2. Drive errors. Parity reconstruction took about three days and finished without any warnings. Then I started a parity check requiring another three days; it reported zero errors. Only 35 hours later, however, I got notifications for "offline uncorrectable" and "current pending sector" (see screenshot) from a data drive not physically touched during the parity upgrade. Seven days later there is yet another warning. During all this time I left the array in maintenance mode because I wanted to observe how the situation evolves before writing to it again. Then I downloaded the S.M.A.R.T. info for the drive (tower-smart-20200412-2324-1.zip), started an extended test and did the same again (tower-smart-20200412-2324-2.zip). The extended test came back immediately, while I expected it to take some time. After that I immediately shut the server down, alas without doing diagnostics first. Questions:
     - I did not find any definitive answer as to what "Current pending sector count" and "Offline scan uncorrectable count" precisely mean. Google hits gave conflicting answers, and the documentation from the T13 Technical Committee just says "Number of unstable sectors (waiting for remapping)" for the former and "Number of uncorrected errors" for the latter. There are 48 pending and 6 uncorrectable sectors, but no reallocated ones, which I don't fully understand. So it's not clear to me whether the pending sectors already signify data loss, but the uncorrected errors definitely do (assuming they contained data, of course). Agree?
     - I assume that I can no longer trust this drive and have to restore its data from parity. Agree?
     - Given the timing, can I trust the new parity drives, or could it be that the sector read errors already happened during the parity recalculation and were just reported a few days later? In other words, is it better to re-install the previous parity drives, or can I continue with the new ones for restoring the drive? I did a parity check before taking out the old parity drives; the result was zero corrections, just as with every previous check.
     - Does it make sense to zero every sector of the old drive once it has been replaced and see whether the pending sectors disappear without being reallocated (outside of the array, of course)? The hypothesis here is that the read errors could have been the result of something having gone wrong during writing and that there is nothing physically wrong with the drive. If sectors get reallocated, the drive is damaged; otherwise it was a glitch. (A sketch of this test follows below.)
     - The drive has a very high number of head parking / loading events: 12121 for only 12 start/stop counts. Being an enterprise drive, I wouldn't expect it to aggressively park its heads in order to conserve energy. Is there a feature in Unraid responsible for that, or is it firmware related?
     A lot of questions, I know, and I will be grateful to anyone able to provide answers to some of them. tower-smart-20200412-2324-1.zip tower-smart-20200412-2324-2.zip
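     A sketch of the "overwrite and re-check" test mentioned above; it destroys all data on the drive, the device name is a placeholder, and it must only be run on a drive that is outside the array:
        smartctl -A /dev/sdX | awk '$1==5 || $1==197 || $1==198'   # Reallocated / Current_Pending / Offline_Uncorrectable before
        dd if=/dev/zero of=/dev/sdX bs=1M status=progress          # force every sector to be rewritten
        smartctl -t long /dev/sdX                                  # extended self-test after the overwrite
        smartctl -A /dev/sdX | awk '$1==5 || $1==197 || $1==198'   # pending should drop; watch whether attribute 5 rises
     If the pending sectors clear without attribute 5 (reallocated sectors) increasing, the earlier errors were probably a write-side glitch; if sectors get reallocated, the drive is damaged.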
  15. Hello, I need to upgrade my two parity drives with larger disks in order to be able to use larger data drives in the future. The current parity drives shall then replace the smallest data drives. Is the parity swap procedure described here (https://wiki.unraid.net/The_parity_swap_procedure) still supported with Unraid 6.8.2 and a dual parity array, and if yes, can I do both swaps at the same time? It would obviously be faster to just copy the current parity drives to the new ones and fill the remaining area on the parity disks with the correct bits using the swap procedure than to recreate parity by reading the whole array, but I have some doubts that it works for dual parity arrays. Thanks for any feedback.