Version 6.3.0-rc4 Release Notes

LT should be able to set threads to 1 when AMD CPUs are detected.

 

This code already exists in the current RC, but it doesn't actively scan existing VMs that were created PRIOR to that fix being put in place.  As such, you need to click "Edit" and then "Update" on each of your existing VMs before starting them in the new version.  This will remove any custom XML you've applied, but it will also fix the topology section of the XML to properly use cores, not threads.

 

If anyone has an AMD system, please test this for us and reply back here with confirmation.

 

That's good to know. Didn't see anything about it in the release notes  ;)

 

It's actually been around for a while.  It's not even about detecting if you are using an AMD CPU; it's really about detecting whether the processor advertises hyperthreading support to the OS.
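For anyone curious, you can see what the processor advertises straight from the unRAID console (a minimal sketch; nothing unRAID-specific here):

# Does the CPU advertise hyperthreading ("ht" flag) to the OS?
grep -q ' ht ' /proc/cpuinfo && echo "ht flag advertised" || echo "no ht flag"
# Compare against the topology the kernel actually reports
lscpu | grep -E 'Thread\(s\) per core|Core\(s\) per socket'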


Wondering if you could add a patch to the next release, or the following?

 

Issue: Receiving "Failed to mmap 0000:0e:00.0 BAR 2. Performance may be slow" in my VM log with the use of a Fresco Logic FL1100 USB 3.0 Host Controller.

The problem is reported here, using the same card chipset: http://lists.nongnu.org/archive/html/qemu-discuss/2016-10/msg00009.html

 

Detailed response from Alex Williamson:

    As reported in the link below, user has a PCI device with a 4KB BAR which contains the MSI-X table.  This seems to hit a corner case in the kernel where the region reports being mmap capable, but the sparse mmap information reports a zero sized range.  It's not entirely clear that the kernel is incorrect in doing this, but regardless, we need to handle it.  To do this, fill our mmap array only with non-zero sized sparse mmap entries and add an error return from the function so we can tell the difference between nr_mmaps being zero based on sparse mmap info vs lack of sparse mmap info.

    NB, this doesn't actually change the behavior of the device, it only removes the scary "Failed to mmap ... Performance may be slow" error message.  We cannot currently create an mmap over the MSI-X table.

 

The patch is detailed within a later post (scroll down) here: https://lists.nongnu.org/archive/html/qemu-discuss/2016-10/msg00023.html
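For anyone wanting to confirm their card hits the same corner case, the BAR sizes and the MSI-X table location can be read from lspci (a sketch; the PCI address 0e:00.0 is taken from the error message above):

# Expect BAR 2 to be a 4K region that also holds the MSI-X table
lspci -vvv -s 0e:00.0 | grep -E 'Region|MSI-X|Table'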

 

If I should post this elsewhere, please let me know (it is related to RC4 in that I see the issue while running it, but not directly).

Thanks.

 


If anyone has an AMD system, please test this for us and reply back here with confirmation.

 

Editing a Windows VM via the 'Edit' option and selecting 2 cores (2 and 3) for assignment to the VM shows XML of

 

  <topology sockets='1' cores='1' threads='2'/>

 

Editing a Windows VM via the 'Edit' option and selecting 1 core (3) for assignment to the VM shows XML of

 

  <topology sockets='1' cores='1' threads='1'/>

 

I was going to capture the XML and show the difference when cores are manually edited, but after changing the VM back to 2 cores via the 'Edit' option and starting it, my VM crashed unRAID, and as I was remoting in I no longer have access. I will update when I have physical access to the server.

 

As per my post in the CPU KVM Bug thread, I am positive that this is the cause of my unRAID instability and the weekly crashes of the system, but as I cannot get logs for it I can't investigate further.
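For reference, the topology line can also be inspected and edited directly with virsh from the console, bypassing the form-based 'Edit' page (a sketch; the domain name matches the logs further down):

virsh dumpxml Windows10_S | grep topology   # show the topology line currently in use
virsh edit Windows10_S                      # open the full XML in an editor; validated on save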

 

 

 

 


Hard reboot done.

 

Manual edit of the XML to

 

  <topology sockets='1' cores='2' threads='1'/>

 

results in a log of

 

2016-11-16 14:22:27.648+0000: starting up libvirt version: 1.3.1, qemu version: 2.7.0, hostname: MediaOne

LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ QEMU_AUDIO_DRV=none /usr/local/sbin/qemu -name Windows10_S -S -machine pc-i440fx-2.5,accel=kvm,usb=off,mem-merge=off -cpu host -m 16384 -realtime mlock=on -smp 2,sockets=1,cores=2,threads=1 -uuid 0cc6ef55-6f27-50ee-c5c5-6a9cb077177c -nographic -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-Windows10_S/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device nec-usb-xhci,id=usb,bus=pci.0,addr=0x7 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/mnt/cache/domains/Windows10_S/vdisk1.img,format=raw,if=none,id=drive-virtio-disk2,cache=writeback -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk2,id=virtio-disk2,bootindex=1 -drive file=/mnt/cache/domains/Windows10_S/vdisk2.img,format=raw,if=none,idr=0x8 -device usb-host,hostbus=3,hostaddr=2,id=hostdev2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9 -msg timestamp=on

Domain id=2 is tainted: high-privileges

Domain id=2 is tainted: host-cpu

char device redirected to /dev/pts/1 (label charserial0)

 

Updating the VM using the 'Edit' menu and saving (no changes made) results in the XML being modified to

 

  <topology sockets='1' cores='1' threads='2'/>

 

Click start VM - immediate crash of unRAID - no GUI/telnet - hard reboot required.

 

On RC3 the server would not lock up with the above configuration, but I have not had a need to edit the VM so I cannot confirm when this started to happen.
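Until this is fixed, one way to guard against the 'Edit' form silently rewriting a working configuration is to keep a copy of the XML before touching the form (a sketch; the backup path is arbitrary):

# Save the known-good domain XML so it can be restored after the form rewrites it
virsh dumpxml Windows10_S > /boot/Windows10_S.xml.bak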

 



Can you please run 'lscpu' from a terminal and post the output from that command here?


 

Linux 4.8.7-unRAID.

1 failure since last login.

Last was Wed Nov 16 20:36:51 2016 on /dev/pts/0.

root@MediaOne:~# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             AuthenticAMD
CPU family:            21
Model:                 48
Model name:            AMD A8-7600 Radeon R7, 10 Compute Cores 4C+6G
Stepping:              1
CPU MHz:               1900.000
CPU max MHz:           3100.0000
CPU min MHz:           1400.0000
BogoMIPS:              6386.11
Virtualization:        AMD-V
L1d cache:             16K
L1i cache:             96K
L2 cache:              2048K
NUMA node0 CPU(s):     0-3
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb bpext ptsc cpb hw_pstate vmmcall fsgsbase bmi1 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold overflow_recov
root@MediaOne:~#
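Worth noting: the ht flag does appear in the flags list above, which is presumably what trips the detection on this Bulldozer-family part. A compact per-CPU view of the core/thread pairing is available from lscpu's extended output (a sketch):

lscpu -e=CPU,CORE,SOCKET   # one row per logical CPU, showing which two share a core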


For some time I've been noticing considerably slower (about half what it should be) read speeds from most/all of my servers, and I'm not the only one; it doesn't always happen, but it affects more than 50% of reads.

 

I finally had some time to investigate and tested on more than one server, some of them with nothing in common hardware-wise. This example is from a small server I use at work for temporary client backups, since it's the one where the issue is always reproducible.

 

I completely formatted the flash drive to test with a clean install, no plugins installed and using default settings.

 

The disks used are not very fast but are still capable of ~100MB/s, as you can see in the v6.2.0-beta21 screenshot; starting with beta22, read speed drops to about half what it should be.

 

Write speed is not affected, and whatever the issue is, it's still present in 6.3.0-rc4.

 

Edit: forgot to mention, the same low read speed occurs whether copying from disk shares or user shares.

[Screenshots: 6.2.0-beta21.jpg, 6.2.0-beta22.jpg, 6.3.0-rc4.jpg]
[Attachment: tower8-diagnostics-20161118-0436.zip]



 

Johnnie.black, I also had this issue.  Here is what I found was the problem:

 

  https://lime-technology.com/forum/index.php?topic=53134.0

 

I don't know if you are seeing the same thing, but it's worth a quick check.



 

How are you connecting to the server, UNC or mapped drive letter?  Whichever way you are using, please try the other and see if the issue persists.



 

I was using UNC, although at home I use mapped drives; same result with a mapped drive.

[Screenshot: 6.3.0-rc4_mapped.png]



 

If you have two unRAID servers you can test this independently of Windows.  For example, suppose you have tower1 and tower2.  From the tower1 telnet/console type this:

 

mkdir /x                                    # temporary mount point
mount //tower2/<share> /x -o user=nobody    # SMB mount of the share exported by tower2
time cp /x/path/to/big/file /dev/null       # timed read of the file over the network
umount /x                                   # clean up

 

When this finishes, divide the size of the file by the "real" time reported to come up with the transfer rate.
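As a hypothetical worked example of that arithmetic (names and numbers are illustrative only):

SIZE=$(stat -c%s /x/path/to/big/file)      # file size in bytes
REAL=99                                    # seconds, from the 'real' line printed by time
echo "$(( SIZE / REAL / 1000000 )) MB/s"   # e.g. 10,000,000,000 bytes / 99 s = 101 MB/s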

 

You can log into tower2 and similarly see what file transfer rate you are getting off the storage itself:

 

time cp /mnt/<share>/path/to/big/file /dev/null

 

Caution: don't use the same big/file, since some/all of it may be cached in memory.  On the other hand, knowing that files get cached in RAM, if you have more free RAM than the size of the file you can copy the file once to measure storage read + network, and then copy it a second time to see pure network speed.
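If you do want to retest storage + network with the same file, the page cache can be dropped first (standard Linux; run on the server holding the file):

sync                               # flush dirty pages to disk
echo 3 > /proc/sys/vm/drop_caches  # drop page cache, dentries and inodes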

 

In my testing I see a transfer rate of 101MB/sec (M=1,000,000) in all cases, which is about the max on a 1Gb/s network connection.

 

With a Windows client I don't see this, but my Windows PC also has a crappy, slow, nearly full HDD, so I always chalked it up to that.  But in 6.2-beta21 we used samba-4.4.0, and starting in beta22 we went to samba-4.4.4 (and now we're up to 4.5.1).  There was a series of pretty big changes starting in samba-4.4.1, related to security.  Could be something's not quite right.
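(To confirm which Samba build a given release is actually running, the daemon can report its version from the console:)

smbd --version   # should print the Samba version the server is running, e.g. 4.5.1 on rc4 if the above is right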


If you have two unRAID servers you can test this independently of Windows.

 

I will run some more tests over the weekend and post results. One thing I tried was changing to a different filesystem: reiserfs shows the same slow reads as xfs, but btrfs is considerably faster, almost normal. Does this make any sense?

 

Meanwhile, after checking the diagnostics from a user on v6.2.4 with an unclean shutdown: the unclean shutdown was detected and a parity check started, but with the nocorrect option. Is this intentional? I just checked with rc4 and the behavior is the same; in this circumstance I believe it should be a correcting check.

 

PS. If auto start is disabled, the "write corrections to disk" box is checked, but with auto start enabled it starts a nocorrect check. The diagnostics are from the unclean shutdown.
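For reproducing the distinction by hand, unRAID's md driver takes a check command with an optional no-correct flag (a sketch from memory; verify the exact syntax on your release):

mdcmd check            # correcting parity check
mdcmd check NOCORRECT  # read-only check, apparently what auto start triggers here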

 

[Attachment: tower8-diagnostics-20161119-0208.zip]
[Screenshot: 6.3.0-rc4_btrfs.png]


But in 6.2-beta21 we used samba-4.4.0, and starting in beta22 we went to samba-4.4.4 (and now we're up to 4.5.1).  There was a series of pretty big changes starting in samba-4.4.1, related to security.  Could be something's not quite right.

 

After some more testing I believe the slowdown is related to Samba and SMB3, but this is a strange issue: it doesn't affect all devices the same way, and for some reason some disks are much more affected than others. See the screenshots below, where I compare 3 disks on the same server between 6.2.0-beta21 and 6.3.0-rc4.

 

A few more observations:

- Windows 7 is not affected
- Transfers from one unRAID server directly to another are not affected
- Server hardware seems irrelevant: tested AMD and Intel, onboard SATA and HBA, different NICs, always the same results
- Some devices apparently continue to operate at normal speed, e.g., I can copy from my NVMe cache device at 500MB/s+
- For some reason BTRFS-formatted disks are much less affected than XFS or Reiser ones
- To get decent performance on the worst affected disks post 6.2.0-beta22 I have to disable SMB2 and 3 and go back to SMB1

 

 

Disk1 - Hitachi Deskstar 7K1000.C 250GB - hdparm buffered disk reads: 404 MB in 3.01 seconds = 134.08 MB/sec
Disk2 - WD30EZRX-00MMMB0 3TB - hdparm buffered disk reads: 382 MB in 3.01 seconds = 127.04 MB/sec
Disk3 - Seagate ST3250310CS 250GB - hdparm buffered disk reads: 302 MB in 3.02 seconds = 100.05 MB/sec
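Those figures come from hdparm's buffered read test, e.g. (device name is an example):

hdparm -t /dev/sdb   # 'Timing buffered disk reads': sequential reads through the drive, without prior caching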

 

 

[Screenshots: b21_rc4_xfs.png, rc4_d3.png]



 

Wow, thanks for the details and testing on this.  Would you be up for one more test when you get a chance:

1) Use the worst offender, the Seagate ST3250310CS 250GB, formatted as either XFS or Reiser

2) Assign it as the cache disk

3) Benchmark read transfer using 6.3.0-rc4 with SMB3

 



 

Same thing; this looks more like a Samba issue rather than an unRAID one.

[Screenshot: cache_smb3.png]


One final note: I posted earlier that to get better performance I needed to go back to SMB1. I found that using an earlier SMB2 version also works and has slightly better performance, practically identical to 6.0.0-beta19 with SMB3, so that's what I'll be using for now.

 

I was adding "max protocol = SMB2" to the unRAID Samba extra configuration, but it's possible to use an earlier SMB2 version:

 

max protocol = NT1 (SMB 1.5)
max protocol = SMB2_02 (SMB 2.0.2)
max protocol = SMB2 (SMB 2.1)
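For reference, on unRAID these settings are typically persisted via the Samba extra configuration file, which (as far as I know) gets included into smb.conf's [global] section (a sketch; the path follows unRAID convention):

# /boot/config/smb-extra.conf  -- assumption: included into [global] at Samba start
max protocol = SMB2_02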

[Screenshot: SMB.png]



I've just resolved a number of issues on my machine, and it seems that, similar to johnnie.black, I was having SMB issues; transfers from my NVMe cache drive to a storage drive via a Windows 10 VM were very slow.

 

I tried to launch a game located on a share from my VM and it hung the VM. I did a VM restart and couldn't access my desktop (my desktop maps to a network share on unRAID) for a number of minutes while it tried to connect to unRAID.

 

I have switched to using SMB 2.0.2 as outlined in the last message and my system is running OK again. I have attached my server diagnostics in case they help with anything (and will post if any more issues arise).

 

On a side note, has anyone tried turning on the Windows feature for "SMB Direct"? I'm not sure if unRAID or the drivers would even support it, nor honestly what it does, but it may be worth investigating for someone more network-savvy than myself.

 

Jamie

[Attachment: archangel-diagnostics-20161122-1903.zip]
