Squid, posted July 30, 2017 (edited)

Let me preface this by stating that I am one of the users who has had absolutely zero problems with Marvell controllers. I run VMs with passthrough, I have never had a drive randomly drop offline, and I've never suffered any corruption on the drives connected to those controllers. I have always maintained that anyone who suffers from the above problems either has a specific hardware combination causing it or has sniffed too much glue, while other users maintain that it is a driver issue and that Marvell controllers should be avoided like the plague.

Yesterday, after thinking about the problems a bit more and recalling other problems from a couple of years ago, I managed to recreate at will one of the symptoms (and possibly two) that users with Marvell controllers report, and I have narrowed it down to a specific piece of hardware.

Back History

At the start of the v6 series, a number of users complained quite vocally about significant parity check slowdowns when using a Supermicro SAS2LP. Once again, I did not suffer from this problem (because none of the drives I had that would have caused this issue were installed on the SAS2LP).

TLDR: Users suffering from slowdowns when using the SAS2LP had one or more drives connected to the HBA that report their ATA version as ATA8-ACS. (You can see the ATA version by clicking on the drive from the Main tab and looking at its Identity tab.)

@limetech @eschultz @jonp introduced a tunable into the system (nr_requests) to fix this. However, Tom didn't particularly like that solution, so he tweaked some driver code (https://forums.lime-technology.com/topic/40944-partially-solved-is-there-an-effort-to-solve-the-sas2lp-issue-tom-question/?page=16#comment-414289) to solve the problem without requiring the user to change the tunable. This worked, and solved the parity check slowdowns for affected users.
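If you have a lot of drives, the ATA version check can be scripted rather than clicked through. A minimal Python sketch, assuming you already have the Identity tab text or the output of `smartctl -i` for a drive as a string; the helper names `ata_version` and `is_ata8_acs` are hypothetical and not part of unRAID:

```python
import re
from typing import Optional

# Matches the unRAID Identity tab wording ("ATA version: ...") as well as
# raw smartctl -i output ("ATA Version is: ..."). Lines starting with
# "SATA version" do not match because the pattern is anchored at line start.
_ATA_RE = re.compile(r"^\s*ATA version(?: is)?:[ \t]*(.*?)[ \t]*$",
                     re.IGNORECASE | re.MULTILINE)


def ata_version(identity_text: str) -> Optional[str]:
    """Return the reported ATA version string, or None if it is absent."""
    match = _ATA_RE.search(identity_text)
    return match.group(1) if match else None


def is_ata8_acs(identity_text: str) -> bool:
    """True if the drive reports the older ATA8-ACS command set."""
    version = ata_version(identity_text)
    return version is not None and version.upper().startswith("ATA8-ACS")
```

Fed the line `ATA version: ATA8-ACS T13/1699-D revision 4`, `is_ata8_acs` returns True; fed `ATA Version is: ACS-2, ACS-3 T13/2161-D revision 3b`, it returns False.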
Today's Problems

Coinciding with the fix for the slowdowns being introduced into the system, new problems started to appear with Marvell controllers that no one related back to the original slowdown issues:

- Recurring 5 parity errors being corrected with every correcting parity check
- Drives randomly dropping offline
- Corruption randomly occurring on drives
- When IOMMU / AMD-Vi is enabled, the above problems could get worse

I have managed to replicate the recurring 5 parity errors on my secondary server at will by rearranging some hardware. Under normal circumstances my secondary server has its 3TB hard drives connected to the motherboard (a holdover from when that server used a Br10i controller). But by placing its 3TB drives onto the SAS2LP now installed in it, from a fresh power on (after a clean shutdown), I get this:

Jul 28 12:03:45 Server_B kernel: md: recovery thread: P corrected, sector=1565565768
Jul 28 12:03:45 Server_B kernel: md: recovery thread: P corrected, sector=1565565776
Jul 28 12:03:45 Server_B kernel: md: recovery thread: P corrected, sector=1565565784
Jul 28 12:03:45 Server_B kernel: md: recovery thread: P corrected, sector=1565565792
Jul 28 12:03:45 Server_B kernel: md: recovery thread: P corrected, sector=1565565800

A subsequent parity check turns up zero errors. Perform a clean powerdown, restart the computer, and a new correcting parity check shows this:

Jul 29 18:05:02 Server_B kernel: md: recovery thread: P corrected, sector=1565565768
Jul 29 18:05:02 Server_B kernel: md: recovery thread: P corrected, sector=1565565776
Jul 29 18:05:02 Server_B kernel: md: recovery thread: P corrected, sector=1565565784
Jul 29 18:05:02 Server_B kernel: md: recovery thread: P corrected, sector=1565565792
Jul 29 18:05:02 Server_B kernel: md: recovery thread: P corrected, sector=1565565800

Note that the 5 errors are on the exact same sectors. The recurring 5 parity check errors only happen after restarts.
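With five lines it is easy to see by eye that the same sectors come back, but comparing longer logs is easier with a small script. A hedged Python sketch, assuming you have saved the syslog lines from each correcting check as text; the function names are hypothetical:

```python
import re

# Matches md recovery-thread lines such as
# "Jul 28 12:03:45 Server_B kernel: md: recovery thread: P corrected, sector=1565565768"
_CORRECTED_RE = re.compile(r"recovery thread: (\w+) corrected, sector=(\d+)")


def corrected_sectors(log_text: str) -> set:
    """Return {(parity disk, sector)} for every 'corrected' line in a log excerpt."""
    return {(m.group(1), int(m.group(2))) for m in _CORRECTED_RE.finditer(log_text)}


def recurring_sectors(*checks: str) -> list:
    """Sectors that were corrected in every one of the given parity-check logs."""
    if not checks:
        return []
    common = corrected_sectors(checks[0])
    for log_text in checks[1:]:
        common &= corrected_sectors(log_text)
    return sorted(common)
```

Run on the Jul 28 and Jul 29 excerpts, it returns the same five (P, sector) pairs. The sector numbers also step by 8; assuming the md driver reports 512-byte sectors, that is 8 x 512 bytes = 4 KiB per step, so the five corrections cover one contiguous 20 KiB region.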
If the 5 errors are corrected, then subsequent parity checks are clean so long as the system has not been reset. Once I move the drives back to their original controllers, the 5 parity check errors on clean starts are gone for good.

The drives that I've managed to replicate this on are two ST3000DM001-1CH166s (both installed simultaneously on the SAS2LP). From the Identity tab:

Model family: Seagate Barracuda 7200.14 (AF)
Device model: ST3000DM001-1CH166
Serial number: Z1F1Q0L2
LU WWN device id: 5 000c50 04f033b3a
Firmware version: CC24
User capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector sizes: 512 bytes logical, 4096 bytes physical
Rotation rate: 7200 rpm
Form factor: 3.5 inches
Device: In smartctl database [for details use: -P show]
ATA version: ATA8-ACS T13/1699-D revision 4
SATA version: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local time: Sun Jul 30 10:20:50 2017 EDT
SMART support: Available - device has SMART capability.
SMART support: Enabled
SMART overall-health: Passed

Note that the ATA version is ATA8-ACS. However, the ATA version by itself is not the cause, as I have other drives using that version, although they are all smaller than 3TB.

It should also be noted that Seagate themselves updated the ST3000DM001 to not use ATA8-ACS. I have other ST3000DM001s in my primary server that do not use that interface:

Model family: Seagate Barracuda 7200.14 (AF)
Device model: ST3000DM001-1CH166
Serial number: Z1F33KPN
LU WWN device id: 5 000c50 050a62f10
Firmware version: CC27
User capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector sizes: 512 bytes logical, 4096 bytes physical
Rotation rate: 7200 rpm
Form factor: 3.5 inches
Device: In smartctl database [for details use: -P show]
ATA version: ACS-2, ACS-3 T13/2161-D revision 3b
SATA version: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local time: Sun Jul 30 10:27:19 2017 EDT
SMART support: Available - device has SMART capability.
SMART support: Enabled
SMART overall-health: Passed

Assumptions (and this is an assumption, almost a leap of faith)

While these results aren't exactly scientific, they are a decent starting point for figuring out a fix and why certain people are affected. Based on my results above, and on the fact that my primary server originally ran the SAS2LP with zero problems (with no ATA8-ACS drives connected to it), the initial assumptions would be:

- 3TB+ drives that use ATA8-ACS will, when connected to a Marvell controller, give you the instability problems that some users have.
- Seagate ST3000DM001s that use ATA8-ACS may be able to have their firmware upgraded to remove that interface from the drive (the firmware versions above do differ: CC24 vs CC27).
- If you suffer from problems with Marvell controllers, moving any drives (especially 3TB+) that use ATA8-ACS off the controller and onto the motherboard may solve your problems.
- The problems some users have with Marvell controllers may or may not have been introduced by the code changes Limetech made to solve the parity check slowdown issues.
- If you do not have any ATA8-ACS drives connected to a Marvell controller, you will not have any issues at all.

While this isn't the be-all and end-all diagnosis of the issues (I do have better things to do than run parity check after parity check after parity check), it does at least go some way toward showing that certain hardware combinations (in this case the drives themselves) are causing the issues, and how to possibly work around them without having to invest in an expensive LSI controller card.
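The first and third assumptions amount to a simple rule for which drives to move. A minimal Python sketch of that rule, purely as an illustration of the (unproven) hypothesis; the `Drive` record, its field names and the helper functions are all hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Drive:
    """Hypothetical record for one array drive; field names are illustrative."""
    model: str
    capacity_bytes: int
    ata_version: str
    controller_vendor: str  # e.g. "Marvell", "LSI", "Intel"


THREE_TB = 3_000_000_000_000  # decimal terabytes, as on the drive label


def matches_hypothesis(drive: Drive) -> bool:
    """Flag drives matching the suspected pattern: 3TB or larger,
    reporting ATA8-ACS, and attached to a Marvell controller."""
    return (
        drive.capacity_bytes >= THREE_TB
        and drive.ata_version.upper().startswith("ATA8-ACS")
        and drive.controller_vendor.lower() == "marvell"
    )


def suggest_moves(drives: list) -> list:
    """Models the hypothesis would suggest moving to motherboard ports."""
    return [d.model for d in drives if matches_hypothesis(d)]
```

This only encodes the size/ATA-version/controller part of the assumptions; the firmware and driver-change points are not something a check like this can capture.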
And if anyone is going to use this as a chance to bash Seagate (the best hard drives in the world), that is very premature: I also have Hitachi drives (albeit < 3TB) that use ATA8-ACS, it appears that only early ST3000DM001s used ATA8-ACS, and a firmware update to the drives may also fix the problem (not tested).

Edited July 30, 2017 by Squid
JorgeB, posted July 30, 2017

Interesting findings, though I'm not sure that the repeatable parity errors and disks dropping offline are necessarily related. I just checked a couple of old threads from users with dropped disks, one user with the SASLP and another with the SAS2LP, and in both cases the dropped disks weren't ATA8-ACS. Still worth investigating, but I maintain my recommendation: replace any SASLP/SAS2LP with an LSI controller, because IMO they are a ticking time bomb.
Squid, posted July 30, 2017 (author)

5 minutes ago, johnnie.black said:
Interesting findings, though I'm not sure that the repeatable parity errors and disks dropping offline are necessarily related. I just checked a couple of old threads from users with dropped disks, one user with the SASLP and another with the SAS2LP, and in both cases the dropped disks weren't ATA8-ACS. Still worth investigating, but I maintain my recommendation: replace any SASLP/SAS2LP with an LSI controller, because IMO they are a ticking time bomb.

Understand completely. Hence why I labelled the topic a "starting point for investigation". To my knowledge, I'm the first who has been able to replicate some of the issues and point the finger at a certain piece of hardware. I'm not going to set up one of my production servers in a way that I think might fail just so that I can continue the investigation. But with a possible culprit found, people with more time on their hands can truly begin to find the root cause.
JorgeB, posted July 30, 2017 (edited)

Also worth noting that those repeatable parity errors result in data corruption, according to the tests done by S80_UK, so users who get them should really get rid of those controllers.

Edited July 30, 2017 by johnnie.black
Vr2Io, posted July 30, 2017 (edited)

I have a different take on this issue. My Marvell (9215) add-on card got SATA interface errors. The problem was easy to reproduce during disk writes (it doesn't seem to happen during reads), and only with some WD / Seagate drives (they are all 3TB). The same system has no problem with an LSI / ASMedia controller. But I have a QNAP (also 2x 9215, 8 bays) running unRAID with those problem disks that never had such a problem, and QNAP's products use a lot of Marvell SATA controllers. I tried to get the Marvell firmware update from ASRock, but the link seems broken. Anyway, I'm not using Marvell now.

Edited July 30, 2017 by Benson
Squid, posted July 30, 2017 (author)

4 minutes ago, Benson said:
and just some WD / Seagate happen ( they are all 3TB ).

And that's exactly my point. It would be very interesting for you to post your diagnostics (even though you aren't using a Marvell anymore) to see if those drives use ATA8-ACS. That is why I'm suggesting that it's entirely possible that @limetech's fix for one issue inadvertently created another one.
Vr2Io, posted July 30, 2017

FYR: the Toshiba drives actually had the same problem too, but it never happened under the QNAP with unRAID.

Model family: Western Digital Green
Device model: WDC WD30EZRX-00DC0B0
Serial number: WD-WMC1T3342970
LU WWN device id: 5 0014ee 058df2227
Firmware version: 80.00A80
User capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector sizes: 512 bytes logical, 4096 bytes physical
Device: In smartctl database [for details use: -P show]
ATA version: ACS-2 (minor revision not indicated)
SATA version: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local time: Mon Jul 31 00:37:27 2017 CST
194 Temperature celsius 0x0022 122 102 000 Old age Always Never 28
196 Reallocated event count 0x0032 200 200 000 Old age Always Never 0
197 Current pending sector 0x0032 200 200 000 Old age Always Never 0
198 Offline uncorrectable 0x0030 200 200 000 Old age Offline Never 0
199 UDMA CRC error count 0x0032 200 200 000 Old age Always Never 14
200 Multi zone error rate 0x0008 200 200 000 Old age Offline Never 0

Model family: Seagate Barracuda 7200.14 (AF)
Device model: ST3000DM001-1CH166
Serial number: W1F2VKGA
LU WWN device id: 5 000c50 060948b00
Firmware version: CC24
User capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector sizes: 512 bytes logical, 4096 bytes physical
Rotation rate: 7200 rpm
Form factor: 3.5 inches
Device: In smartctl database [for details use: -P show]
ATA version: ATA8-ACS T13/1699-D revision 4
SATA version: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local time: Mon Jul 31 00:35:15 2017 CST
194 Temperature celsius 0x0022 029 084 000 Old age Always Never 29 (0 11 0 0 0)
197 Current pending sector 0x0012 100 100 000 Old age Always Never 0
198 Offline uncorrectable 0x0010 100 100 000 Old age Offline Never 0
199 UDMA CRC error count 0x003e 200 200 000 Old age Always Never 14
240 Head flying hours 0x0000 100 253 000 Old age Offline Never 3081h+49m+35.636s
241 Total lbas written 0x0000 100 253 000 Old age Offline Never 93801331818
242 Total lbas read 0x0000 100 253 000 Old age Offline Never 260885704609

Model family: Western Digital Green
Device model: WDC WD30EZRX-00DC0B0
Serial number: WD-WMC1T2755139
LU WWN device id: 5 0014ee 603307d59
Firmware version: 80.00A80
User capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector sizes: 512 bytes logical, 4096 bytes physical
Device: In smartctl database [for details use: -P show]
ATA version: ACS-2 (minor revision not indicated)
SATA version: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local time: Mon Jul 31 00:41:24 2017 CST
194 Temperature celsius 0x0022 118 105 000 Old age Always Never 32
196 Reallocated event count 0x0032 200 200 000 Old age Always Never 0
197 Current pending sector 0x0032 200 200 000 Old age Always Never 0
198 Offline uncorrectable 0x0030 200 200 000 Old age Offline Never 0
199 UDMA CRC error count 0x0032 200 200 000 Old age Always Never 4
200 Multi zone error rate 0x0008 200 200 000 Old age Offline Never 0

Model family: Toshiba 3.5" DT01ACA... Desktop HDD
Device model: TOSHIBA DT01ACA300
Serial number: 25EEEPXGS
LU WWN device id: 5 000039 ff4f062ca
Firmware version: MX6OABB0
User capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector sizes: 512 bytes logical, 4096 bytes physical
Rotation rate: 7200 rpm
Form factor: 3.5 inches
Device: In smartctl database [for details use: -P show]
ATA version: ATA8-ACS T13/1699-D revision 4
SATA version: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local time: Mon Jul 31 00:43:56 2017 CST
194 Temperature celsius 0x0002 187 187 000 Old age Always Never 32 (min/max 13/47)
196 Reallocated event count 0x0032 100 100 000 Old age Always Never 0
197 Current pending sector 0x0022 100 100 000 Old age Always Never 0
198 Offline uncorrectable 0x0008 100 100 000 Old age Offline Never 0
199 UDMA CRC error count 0x000a 200 200 000 Old age Always Never 26

---------------------------------------------------------------
Another Toshiba disk, with no attribute 199 errors:

Model family: Toshiba 3.5" DT01ACA... Desktop HDD
Device model: TOSHIBA DT01ACA300
Serial number: 66T1SL8AS
LU WWN device id: 5 000039 fe6c0ccec
Firmware version: MX6OABB0
User capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector sizes: 512 bytes logical, 4096 bytes physical
Rotation rate: 7200 rpm
Form factor: 3.5 inches
Device: In smartctl database [for details use: -P show]
ATA version: ATA8-ACS T13/1699-D revision 4
SATA version: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local time: Mon Jul 31 00:46:37 2017 CST
194 Temperature celsius 0x0002 187 187 000 Old age Always Never 32 (min/max 20/48)
196 Reallocated event count 0x0032 100 100 000 Old age Always Never 0
197 Current pending sector 0x0022 100 100 000 Old age Always Never 0
198 Offline uncorrectable 0x0008 100 100 000 Old age Offline Never 0
199 UDMA CRC error count 0x000a 200 200 000 Old age Always Never 0
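The interesting number in these excerpts is attribute 199, which counts CRC errors on the SATA link and therefore usually points at the cable/backplane/controller path rather than the platters. It is cumulative over the drive's life, so it shows history, not current state. A small, hypothetical Python helper to pull the raw value out of either the abbreviated lines above or raw `smartctl -A` rows:

```python
from typing import Optional


def udma_crc_count(smart_text: str) -> Optional[int]:
    """Return the raw value of attribute 199 (UDMA CRC error count), if present.

    Assumes the raw value is the last whitespace-separated field, which holds
    for both "199 UDMA CRC error count 0x0032 200 200 000 Old age Always Never 14"
    and "199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0".
    """
    for line in smart_text.splitlines():
        fields = line.split()
        if len(fields) > 1 and fields[0] == "199":
            return int(fields[-1])
    return None
```

Applied to the five drives above, it would give 14, 14, 4, 26 and 0 respectively.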
Vr2Io, posted July 30, 2017 (edited)

All PCIe devices should be under a PLX PCIe bridge, as the J1800 only has 4 PCIe lanes in total.

PCI Devices and IOMMU Groups
[8086:0f00] 00:00.0 Host bridge: Intel Corporation Atom Processor Z36xxx/Z37xxx Series SoC Transaction Register (rev 0e)
[8086:0f31] 00:02.0 VGA compatible controller: Intel Corporation Atom Processor Z36xxx/Z37xxx Series Graphics & Display (rev 0e)
[8086:0f15] 00:11.0 SD Host controller: Intel Corporation Atom Processor Z36xxx/Z37xxx Series SDIO Controller (rev 0e)
[8086:0f16] 00:12.0 SD Host controller: Intel Corporation Atom Processor Z36xxx/Z37xxx Series SDIO Controller (rev 0e)
[8086:0f23] 00:13.0 SATA controller: Intel Corporation Atom Processor E3800 Series SATA AHCI Controller (rev 0e)
[8086:0f35] 00:14.0 USB controller: Intel Corporation Atom Processor Z36xxx/Z37xxx, Celeron N2000 Series USB xHCI (rev 0e)
[8086:0f50] 00:17.0 SD Host controller: Intel Corporation Atom Processor E3800 Series eMMC 4.5 Controller (rev 0e)
[8086:0f40] 00:18.0 DMA controller: Intel Corporation Atom Processor Z36xxx/Z37xxx Series LPIO2 DMA Controller (rev 0e)
[8086:0f41] 00:18.1 Serial bus controller [0c80]: Intel Corporation Atom Processor Z36xxx/Z37xxx Series LPIO2 I2C Controller #1 (rev 0e)
[8086:0f42] 00:18.2 Serial bus controller [0c80]: Intel Corporation Atom Processor Z36xxx/Z37xxx Series LPIO2 I2C Controller #2 (rev 0e)
[8086:0f43] 00:18.3 Serial bus controller [0c80]: Intel Corporation Atom Processor Z36xxx/Z37xxx Series LPIO2 I2C Controller #3 (rev 0e)
[8086:0f44] 00:18.4 Serial bus controller [0c80]: Intel Corporation Atom Processor Z36xxx/Z37xxx Series LPIO2 I2C Controller #4 (rev 0e)
[8086:0f45] 00:18.5 Serial bus controller [0c80]: Intel Corporation Atom Processor Z36xxx/Z37xxx Series LPIO2 I2C Controller #5 (rev 0e)
[8086:0f46] 00:18.6 Serial bus controller [0c80]: Intel Corporation Atom Processor Z36xxx/Z37xxx Series LPIO2 I2C Controller #6 (rev 0e)
[8086:0f47] 00:18.7 Serial bus controller [0c80]: Intel Corporation Atom Processor Z36xxx/Z37xxx Series LPIO2 I2C Controller #7 (rev 0e)
[8086:0f18] 00:1a.0 Encryption controller: Intel Corporation Atom Processor Z36xxx/Z37xxx Series Trusted Execution Engine (rev 0e)
[8086:0f04] 00:1b.0 Audio device: Intel Corporation Atom Processor Z36xxx/Z37xxx Series High Definition Audio Controller (rev 0e)
[8086:0f48] 00:1c.0 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI Express Root Port 1 (rev 0e)
[8086:0f4a] 00:1c.1 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI Express Root Port 2 (rev 0e)
[8086:0f4c] 00:1c.2 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI Express Root Port 3 (rev 0e)
[8086:0f4e] 00:1c.3 PCI bridge: Intel Corporation Atom Processor E3800 Series PCI Express Root Port 4 (rev 0e)
[8086:0f06] 00:1e.0 DMA controller: Intel Corporation Atom Processor Z36xxx/Z37xxx Series LPIO1 DMA Controller (rev 0e)
[8086:0f08] 00:1e.1 Serial bus controller [0c80]: Intel Corporation Atom Processor Z36xxx/Z37xxx Series LPIO1 PWM Controller (rev 0e)
[8086:0f09] 00:1e.2 Serial bus controller [0c80]: Intel Corporation Atom Processor Z36xxx/Z37xxx Series LPIO1 PWM Controller (rev 0e)
[8086:0f0a] 00:1e.3 Communication controller: Intel Corporation Atom Processor Z36xxx/Z37xxx Series LPIO1 HSUART Controller #1 (rev 0e)
[8086:0f0c] 00:1e.4 Communication controller: Intel Corporation Atom Processor Z36xxx/Z37xxx Series LPIO1 HSUART Controller #2 (rev 0e)
[8086:0f0e] 00:1e.5 Serial bus controller [0c80]: Intel Corporation Atom Processor Z36xxx/Z37xxx Series LPIO1 SPI Controller (rev 0e)
[8086:0f1c] 00:1f.0 ISA bridge: Intel Corporation Atom Processor Z36xxx/Z37xxx Series Power Control Unit (rev 0e)
[8086:0f12] 00:1f.3 SMBus: Intel Corporation Atom Processor E3800 Series SMBus Controller (rev 0e)
[1b4b:9215] 01:00.0 SATA controller: Marvell Technology Group Ltd. Device 9215 (rev 11)
[1b4b:9215] 02:00.0 SATA controller: Marvell Technology Group Ltd. Device 9215 (rev 11)
[10b5:8603] 03:00.0 PCI bridge: PLX Technology, Inc. PEX 8603 3-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ab)
[10b5:8603] 04:01.0 PCI bridge: PLX Technology, Inc. PEX 8603 3-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ab)
[10b5:8603] 04:02.0 PCI bridge: PLX Technology, Inc. PEX 8603 3-lane, 3-Port PCI Express Gen 2 (5.0 GT/s) Switch (rev ab)
[8086:1533] 05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
[8086:1533] 06:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)

Edited July 30, 2017 by Benson
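To check whether any of your drives sit behind a Marvell chip, you can filter a device listing like the one above by PCI vendor ID. A hedged Python sketch; the function name is hypothetical, and the vendor-ID set reflects my understanding that Marvell parts appear under both 11ab (older parts) and 1b4b (newer parts such as the 9215 above):

```python
import re

# PCI vendor IDs registered to Marvell (assumption: 11ab for older parts,
# 1b4b for newer ones such as the 88SE9215 in the listing above).
MARVELL_VENDOR_IDS = {"11ab", "1b4b"}

_ID_RE = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]", re.IGNORECASE)
_ADDR_RE = re.compile(r"\b([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\b", re.IGNORECASE)


def marvell_devices(listing: str) -> list:
    """Return (bus address, device id) for each Marvell-vendor PCI device.

    Accepts the unRAID "PCI Devices" format ("[1b4b:9215] 01:00.0 ...") as
    well as `lspci -nn` lines ("01:00.0 ... [1b4b:9215] (rev 11)").
    """
    found = []
    for line in listing.splitlines():
        ids = _ID_RE.search(line)     # first [vendor:device] pair on the line
        addr = _ADDR_RE.search(line)  # bus:slot.function address
        if ids and addr and ids.group(1).lower() in MARVELL_VENDOR_IDS:
            found.append((addr.group(1), ids.group(2).lower()))
    return found
```

Run against Benson's listing, it would report the two 9215 controllers at 01:00.0 and 02:00.0. It matches every Marvell-vendor device, not only storage controllers, so check the description text of each hit.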
BobPhoenix, posted July 31, 2017 (edited)

Here is the SMART report for a drive I had problems with on my motherboard's Marvell 9230 controller. I had the Marvell controller passed through to a WHS v1 VM, and it would drop this drive or one of the other 3 identical drives all the time. I had to reboot the server to get the controller back. That is why I first got an LSI 9201-16i controller: that way I could pass one of the other motherboard controllers through to my WHS v1 VM. I see it isn't exactly in the best shape, but I am only recording local news on it now, so it's not terribly important to me, and the last ATA error was at roughly 1/3 of its current age.

smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.9.30-unRAID] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD20EFRX-68AX9N0
Serial Number: WD-WMC300xxxxxxx
LU WWN Device Id: 5 0014ee 6adbc128d
Firmware Version: 80.00A80
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Jul 30 20:33:13 2017 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: (25440) seconds.
Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
Short self-test routine recommended polling time: ( 2) minutes.
Extended self-test routine recommended polling time: ( 257) minutes.
Conveyance self-test routine recommended polling time: ( 5) minutes.
SCT capabilities: (0x70bd) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 398
3 Spin_Up_Time 0x0027 165 163 021 Pre-fail Always - 4741
4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 1441
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 052 052 000 Old_age Always - 35132
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 199
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 135
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 1305
194 Temperature_Celsius 0x0022 121 106 000 Old_age Always - 26
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0

SMART Error Log Version: 1
ATA Error Count: 38788 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 38788 occurred at disk power-on lifetime: 10288 hours (428 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 0c 00 00 00 00 Device Fault; Error: ABRT

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 03 0c 00 00 00 00 00 21:01:08.072 SET FEATURES [Set transfer mode]
e5 00 00 00 00 00 00 00 21:01:08.072 CHECK POWER MODE
ec 00 00 00 00 00 00 00 21:01:08.072 IDENTIFY DEVICE
ef 03 0c 00 00 00 00 00 21:01:07.822 SET FEATURES [Set transfer mode]
e5 00 00 00 00 00 00 00 21:01:07.822 CHECK POWER MODE

Error 38787 occurred at disk power-on lifetime: 10288 hours (428 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 0c 00 00 00 00 Device Fault; Error: ABRT

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 03 0c 00 00 00 00 00 21:01:07.822 SET FEATURES [Set transfer mode]
e5 00 00 00 00 00 00 00 21:01:07.822 CHECK POWER MODE
ec 00 00 00 00 00 00 00 21:01:07.822 IDENTIFY DEVICE
ef 03 0c 00 00 00 00 00 21:01:07.573 SET FEATURES [Set transfer mode]
e5 00 00 00 00 00 00 00 21:01:07.573 CHECK POWER MODE

Error 38786 occurred at disk power-on lifetime: 10288 hours (428 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 0c 00 00 00 00 Device Fault; Error: ABRT

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 03 0c 00 00 00 00 00 21:01:07.573 SET FEATURES [Set transfer mode]
e5 00 00 00 00 00 00 00 21:01:07.573 CHECK POWER MODE
ec 00 00 00 00 00 00 00 21:01:07.573 IDENTIFY DEVICE
ef 03 0c 00 00 00 00 00 21:01:07.323 SET FEATURES [Set transfer mode]
e5 00 00 00 00 00 00 00 21:01:07.323 CHECK POWER MODE

Error 38785 occurred at disk power-on lifetime: 10288 hours (428 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 0c 00 00 00 00 Device Fault; Error: ABRT

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 03 0c 00 00 00 00 00 21:01:07.323 SET FEATURES [Set transfer mode]
e5 00 00 00 00 00 00 00 21:01:07.323 CHECK POWER MODE
ec 00 00 00 00 00 00 00 21:01:07.322 IDENTIFY DEVICE
ef 03 0c 00 00 00 00 00 21:01:07.074 SET FEATURES [Set transfer mode]
e5 00 00 00 00 00 00 00 21:01:07.074 CHECK POWER MODE

Error 38784 occurred at disk power-on lifetime: 10288 hours (428 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
04 61 0c 00 00 00 00 Device Fault; Error: ABRT

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ef 03 0c 00 00 00 00 00 21:01:07.074 SET FEATURES [Set transfer mode]
e5 00 00 00 00 00 00 00 21:01:07.074 CHECK POWER MODE
ec 00 00 00 00 00 00 00 21:01:07.073 IDENTIFY DEVICE
ef 03 0c 00 00 00 00 00 21:01:07.073 SET FEATURES [Set transfer mode]
e5 00 00 00 00 00 00 00 21:01:07.073 CHECK POWER MODE

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Edited July 31, 2017 by BobPhoenix
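For anyone reading an error log like this one: the "ER ST" pair 04 61 is what smartctl summarizes as "Device Fault; Error: ABRT". A minimal Python sketch that decodes those bytes, using the Status and Error register bit names from the ATA command set as I understand them (only a subset; obsolete bits omitted, table and function names hypothetical):

```python
# ATA Status register bits (subset).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
# ATA Error register bits (subset).
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}
# Opcodes that appear in the "Commands leading to the command..." rows.
COMMANDS = {0xEF: "SET FEATURES", 0xE5: "CHECK POWER MODE", 0xEC: "IDENTIFY DEVICE"}

# Powered_Up_Time is a 32-bit millisecond counter, which is why smartctl
# says it wraps after 49.710 days: 2**32 ms expressed in days.
WRAP_DAYS = 2**32 / (1000 * 60 * 60 * 24)


def decode_bits(table: dict, value: int) -> list:
    """Names of the bits set in `value`, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True) if value & bit]
```

`decode_bits(STATUS_BITS, 0x61)` gives `['DRDY', 'DF', 'ERR']` and `decode_bits(ERROR_BITS, 0x04)` gives `['ABRT']`: the drive reported ready, flagged a device fault and aborted the command. In all five entries the same SET FEATURES / CHECK POWER MODE / IDENTIFY DEVICE sequence repeats roughly every 250 ms.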
Squid, posted July 31, 2017 (author)

It's possible that simply having an ATA8-ACS drive installed could cause another drive to drop. We don't know, just as we don't know what the origin of the 5 parity check errors in my tests was.
HellDiverUK, posted July 31, 2017

I've never had any problems with the SAS2LP cards on unRAID. I tend to use WD drives, though. The Seagates I have are 6TB IronWolfs, 8TB Archives and the venerable 4TB Desktop ST4000DM000.
srfnmnk, posted October 8, 2017

Thanks for the analysis, @Squid. I am suffering from this issue too. I wanted to let you know that during a parity check I had the same issue with a drive reporting ACS-2, ACS-3 T13/2161-D revision 3b: the drive dropped off just like the ATA8-ACS one did. I also have a Toshiba parity disk using ATA8-ACS that dropped off as well.
HellDiverUK, posted October 10, 2017

Marvel issues? I blame Iron Man. Marvell issues, on the other hand, are probably down to the binary blobs or bad firmware versions. For example, I have issues with an el-cheapo Marvell card, yet the identical chipset soldered to a Supermicro board has no issues at all.