NVMe GUI integration - Partition format & SMART Info


dAigo


Partition format:

The GUI shows "Partition format:    unknown", while it should be "GPT: 4K-aligned".

 

 

SMART Info

Back in 6.2 Beta 18, unRAID added support for NVMe devices as cache/pool disks.

Beta 22 got the latest version of smartmontools (6.5).

Smartmontools supports NVMe starting from version 6.5.

Please note, that currently NVMe support is considered as experimental.

 

While the CLI output of smartctl has definitely improved, the GUI still shows no SMART info, which is OK for an "experimental feature".

It even prints errors to the console while browsing through the Disk Info of the GUI, which was reported HERE but apparently got no answer.

I have not seen any hint in the 6.3 notes, so I assume nothing will change. Probably low priority.

 

I think the issue is a wrong smartctl command issued by the WebGUI.

In my case, the NVMe disk is the cache disk and the GUI identifies it as "nvme0n1", which is not wrong, but does not work with smartmontools...

 

root@unRAID:~# smartctl -x /dev/nvme0n1

smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.4.30-unRAID] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       INTEL SSDPE2MW400G4
Serial Number:                      CVCQ5130003F400CGN
Firmware Version:                   8EV10171
PCI Vendor/Subsystem ID:            0x8086
IEEE OUI Identifier:                0x5cd2e4
Controller ID:                      0
Number of Namespaces:               1
Namespace 1 Size/Capacity:          400,088,457,216 [400 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Tue Nov 15 20:21:12 2016 CET
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0006):   Format Frmw_DL
Optional NVM Commands (0x0006):     Wr_Unc DS_Mngmt
Maximum Data Transfer Size:         32 Pages

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
0 +    25.00W       -        -    0  0  0  0        0       0

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
0 +     512       0         2
1 -     512       8         2
2 -     512      16         2
3 -    4096       0         0
4 -    4096       8         0
5 -    4096      64         0
6 -    4096     128         0

=== START OF SMART DATA SECTION ===
Read NVMe SMART/Health Information failed: NVMe Status 0x02

 

Don't ask me why, but if you use the "NVMe character device (ex: /dev/nvme0)" instead of the "namespace block device (ex: /dev/nvme0n1)" there is a lot more information.

For example: "SMART overall-health self-assessment test result: PASSED"

 

root@unRAID:~# smartctl -x /dev/nvme0

smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.4.30-unRAID] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       INTEL SSDPE2MW400G4
Serial Number:                      CVCQ5130003F400CGN
Firmware Version:                   8EV10171
PCI Vendor/Subsystem ID:            0x8086
IEEE OUI Identifier:                0x5cd2e4
Controller ID:                      0
Number of Namespaces:               1
Namespace 1 Size/Capacity:          400,088,457,216 [400 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Tue Nov 15 20:37:58 2016 CET
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0006):   Format Frmw_DL
Optional NVM Commands (0x0006):     Wr_Unc DS_Mngmt
Maximum Data Transfer Size:         32 Pages

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
0 +    25.00W       -        -    0  0  0  0        0       0

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
0 +     512       0         2
1 -     512       8         2
2 -     512      16         2
3 -    4096       0         0
4 -    4096       8         0
5 -    4096      64         0
6 -    4096     128         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning:                   0x00
Temperature:                        24 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    1%
Data Units Read:                    18,412,666 [9.42 TB]
Data Units Written:                 18,429,957 [9.43 TB]
Host Read Commands:                 224,748,225
Host Write Commands:                233,072,991
Controller Busy Time:               0
Power Cycles:                       152
Power On Hours:                     8,784
Unsafe Shutdowns:                   1
Media and Data Integrity Errors:    0
Error Information Log Entries:      0

Error Information (NVMe Log 0x01, max 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0          2     1       -  0x400c      -            0     -     -
  1          1     1       -  0x400c      -            0     -     -

 

According to the NVMe spec (pages 92-94), the output does contain all "mandatory" information, so it "should work" regardless of the vendor.

It's probably a PITA, but I guess SATA/IDE and NVMe SMART info have little in common, which means the GUI needs some reworking in that regard.
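As a side note on reading the counters in the output above: the NVMe spec reports "Data Units Read" and "Data Units Written" in thousands of 512-byte units, independent of the formatted LBA size. A minimal Python sketch of the conversion follows; the function name data_units_to_bytes is made up for illustration.

```python
# Per the NVMe SMART/Health log definition, one "data unit" is
# 1000 units of 512 bytes, regardless of the namespace LBA size.
BYTES_PER_DATA_UNIT = 1000 * 512


def data_units_to_bytes(data_units):
    """Convert an NVMe 'Data Units Read/Written' counter to bytes."""
    return data_units * BYTES_PER_DATA_UNIT


def parse_counter(text):
    """Turn a smartctl-style counter such as '18,412,666' into an int."""
    return int(text.replace(",", ""))


if __name__ == "__main__":
    units = parse_counter("18,412,666")
    # Prints the byte count behind the bracketed TB figure smartctl shows.
    print(data_units_to_bytes(units))
```

For the "Data Units Read" value shown above this gives roughly 9.4 * 10^12 bytes, which is consistent with the bracketed figure smartctl prints.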

 

I guess "SMART overall-health self-assessment test result", "Critical Warning", "Media and Data Integrity Errors" and "Error Information Log Entries" would be most useful in terms of health info and things that could raise an alert/notification.

Depending on the amount of work it needs, maybe also monitor "Available Spare" against "Available Spare Threshold" as an indication of a "soon to fail" drive (like reallocated events/sectors).

 

For science:

I changed the "$port" variable in "smartinfo.php" to a hardcoded "nvme0", ignoring any POST values.
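A less hard-coded variant of that hack could derive the controller name from whatever device name the GUI passes in. The sketch below is Python rather than the PHP of smartinfo.php, and controller_device and smartctl_args are hypothetical helper names, not part of unRAID. It assumes the usual Linux naming scheme nvme<ctrl>n<ns>, optionally followed by p<part>.

```python
import re

# Matches namespace block devices (nvme0n1) and their partitions (nvme0n1p1).
_NAMESPACE_RE = re.compile(r"^(nvme\d+)n\d+(?:p\d+)?$")


def controller_device(name):
    """Map an NVMe namespace/partition name to its controller name.

    Names that do not look like an NVMe namespace (sdb, nvme0, ...) are
    returned unchanged, so SATA/IDE devices keep working as before.
    """
    match = _NAMESPACE_RE.match(name)
    return match.group(1) if match else name


def smartctl_args(name):
    """Build the smartctl argument list for a device name from the GUI."""
    return ["smartctl", "-x", "/dev/" + controller_device(name)]


if __name__ == "__main__":
    print(smartctl_args("nvme0n1"))
    print(smartctl_args("sdb"))
```

The argument list could then be handed to something like subprocess.run; that part is left out here because it needs smartctl and a real device.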

See attachments for the result... not that bad for the quickest and most dirty solution I could think of ;)

Attributes_with_nvme0.PNG.12fd6b5c06b0fcda68eea13cd2b253ec.PNG

Identity_with_nvme0.PNG.de1bdd98d9d6d901d4d0ad02a5706a7f.PNG

unraid-smart-20161115-2020.zip

unraid-smart-20161115-2220_with_nvme0.zip


I'm interpreting your results a little differently, and I could be wrong, but it looks like the report may have aborted in both cases, more obviously in the first case.  From what I read, the symbol /dev/nvme0 is the broadcast address, for all devices associated with nvme0, and /dev/nvme0n1 is the specific device.  That results in different report results.

 

More importantly, I don't see the device type listed in the identity info, so I think you need to specify that with the -d option, -d nvme.  Try using the command smartctl -a -d nvme /dev/nvme0n1.  If you prefer, use -x instead of -a.


I'm interpreting your results a little differently, and I could be wrong, but it looks like the report may have aborted in both cases, more obviously in the first case.  From what I read, the symbol /dev/nvme0 is the broadcast address, for all devices associated with nvme0, and /dev/nvme0n1 is the specific device.  That results in different report results.

That's why I said "don't ask me why" it works :) I am under the same impression as you are. Maybe it's a smartmontools bug (experimental after all), but clearly nvme0 gives correct results with regard to the NVMe logs that are specified in the official specs.

Currently the GUI ALWAYS states "Unavailable - disk must be spun up" under "Last SMART Results". But I am writing this on a VM that's running on the cache disk, so it's NOT spun down. That also changed with nvme0, but the buttons for SMART tests were still greyed out.

 

I am not sure that NVMe devices even work the same way as old devices. NVMe was designed for flash memory, so most of the old SMART mechanics are useless, because sectors and read/write operations are handled differently.

For that reason, I am not sure a "Self Test" makes sense on a flash drive... why run a self test if the specs make it mandatory to record every error in a log? As long as there is free "Spare" space on the disk, "bad/worn out" flash cells should be "replaced".

 

More importantly, I don't see the device type listed in the identity info, so I think you need to specify that with the -d option, -d nvme.  Try using the command smartctl -a -d nvme /dev/nvme0n1.  If you prefer, use -x instead of -a.

 

Same result, nothing changed as far as I can see.

root@unRAID:~# smartctl -a -d nvme /dev/nvme0n1
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.4.30-unRAID] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       INTEL SSDPE2MW400G4
Serial Number:                      CVCQ5130003F400CGN
Firmware Version:                   8EV10171
PCI Vendor/Subsystem ID:            0x8086
IEEE OUI Identifier:                0x5cd2e4
Controller ID:                      0
Number of Namespaces:               1
Namespace 1 Size/Capacity:          400,088,457,216 [400 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Wed Nov 16 07:06:07 2016 CET
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0006):   Format Frmw_DL
Optional NVM Commands (0x0006):     Wr_Unc DS_Mngmt
Maximum Data Transfer Size:         32 Pages

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
0 +    25.00W       -        -    0  0  0  0        0       0

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
0 +     512       0         2
1 -     512       8         2
2 -     512      16         2
3 -    4096       0         0
4 -    4096       8         0
5 -    4096      64         0
6 -    4096     128         0

=== START OF SMART DATA SECTION ===
Read NVMe SMART/Health Information failed: NVMe Status 0x02

root@unRAID:~# smartctl -a -d nvme /dev/nvme0
smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.4.30-unRAID] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       INTEL SSDPE2MW400G4
Serial Number:                      CVCQ5130003F400CGN
Firmware Version:                   8EV10171
PCI Vendor/Subsystem ID:            0x8086
IEEE OUI Identifier:                0x5cd2e4
Controller ID:                      0
Number of Namespaces:               1
Namespace 1 Size/Capacity:          400,088,457,216 [400 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Wed Nov 16 07:06:11 2016 CET
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0006):   Format Frmw_DL
Optional NVM Commands (0x0006):     Wr_Unc DS_Mngmt
Maximum Data Transfer Size:         32 Pages

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
0 +    25.00W       -        -    0  0  0  0        0       0

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
0 +     512       0         2
1 -     512       8         2
2 -     512      16         2
3 -    4096       0         0
4 -    4096       8         0
5 -    4096      64         0
6 -    4096     128         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning:                   0x00
Temperature:                        23 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    1%
Data Units Read:                    18,416,807 [9.42 TB]
Data Units Written:                 18,433,008 [9.43 TB]
Host Read Commands:                 224,801,960
Host Write Commands:                233,126,889
Controller Busy Time:               0
Power Cycles:                       152
Power On Hours:                     8,794
Unsafe Shutdowns:                   1
Media and Data Integrity Errors:    0
Error Information Log Entries:      0

Error Information (NVMe Log 0x01, max 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0          2     1       -  0x400c      -            0     -     -
  1          1     1       -  0x400c      -            0     -     -


Thank you for trying that!  :)  Apparently it auto-recognized the nvme type, and didn't need it specified.  Wish some of the other types could do that.

 

I believe we're going to need a newer smartmontools, fully capable of reading all of the SMART info.  Trying to use the vendor-specific log pages is a non-starter I think, because it means custom programming for each vendor.  And if SMART history is any guide, it will mean custom changes for the different models too, even from the same vendor.  The Intel document you linked shows that a true SMART attribute table is available, and that's what all SMART software is designed for.  Hopefully a newer smartctl will handle it correctly.
