**VIDEO GUIDE** Advanced Server Tuning Techniques for VMs, Docker and General Server Use



Hi, guys.
This is a series of 3 videos about tuning an unRAID server.
The guide covers the server as a whole, but it has a lot
of information on VMs, so I thought this forum section was the best place to post it.
Some of the topics are:

CPU governor and enabling turbo boost
About vCPUs and hyperthreading
How VMs and Docker containers affect each other's performance
Pinning cores to Docker containers
Using the same container with different profiles
Allocating resources to Docker containers
Decreasing latency in VMs
Using emulatorpin
Isolating CPU cores
Setting extra profiles in syslinux for isolated cores
Checking whether cores have been correctly isolated
Disabling hyperthreading
Having unRAID manage vCPUs as opposed to vCPU pinning

Hope these videos are interesting :)
    
Part 1

[embedded video]

Part 2

[embedded video]

Part 3

[embedded video]

These videos are very good.  I actually learned some things I didn't know.

 

You did an excellent job with the cpu pinning and assignment.  As you said it is as much art as science.  There is no "one size fits all" solution.  If there was, LT would just do things that way.

 

The only thing I might comment on is the caching on a VM vdisk.  From everything I've read, this is the best setup for performance on a vdisk:

<driver name='qemu' type='raw' cache='directsync' io='native'/>
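(If you want to check what a given VM is currently using, virsh can print the live XML; 'Windows10' below is just an example domain name:)

    # show the vdisk driver line from the VM's live definition
    virsh dumpxml Windows10 | grep "driver name='qemu'"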

I appreciate your compliments on the Tips & Tweaks plugin.  I did it because I thought it appropriate to keep people off the command line where it is too easy to make mistakes, and give users an easy way to make adjustments.  I had no idea the Performance governor and Turbo could make that much difference.

On 17/07/2017 at 9:52 PM, dlandon said:

These videos are very good. I actually learned some things I didn't know. […]

Thanks @dlandon, it means a lot that you think they are good.

I have always used cache='none' as it's meant to be equivalent to the host's disk performance-wise. But after what you said, I think it may be better to use cache='directsync', as it will use both O_DSYNC and O_DIRECT when interacting with the vdisk, whereas 'none' uses the disk write cache.

I have read that io='native' is definitely better than io='threads' and works only with O_DIRECT. Both cache='none' and cache='directsync' use O_DIRECT, so both would benefit from this setting.

Yeah, I really like the Tips and Tweaks plugin. I was very happy when I found it :) I am surprised turbo isn't the default unRAID setting for Intel CPUs, especially with how many people run gaming VMs on unRAID. I have been thinking of making a script that uses virsh to check which VMs are running; if my gaming VM is up, it would enable turbo and the performance governor, and when it isn't, drop back to powersave. A rough sketch of the idea is below.
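Something like this (untested, 'GamingVM' is just a stand-in for the real domain name, and the sysfs paths assume an Intel CPU using the intel_pstate driver):

    #!/bin/bash
    # Switch governor/turbo depending on whether the gaming VM is running.
    if virsh list --name | grep -qx "GamingVM"; then
        gov=performance
        turbo=0   # intel_pstate: no_turbo 0 = turbo enabled
    else
        gov=powersave
        turbo=1   # no_turbo 1 = turbo disabled
    fi

    # apply the governor to every core
    for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
        echo "$gov" > "$g"
    done

    echo "$turbo" > /sys/devices/system/cpu/intel_pstate/no_turbo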

23 minutes ago, gridrunner said:

Thanks @dlandon, it means a lot that you think they are good. […]

You have obviously done more research on the vdisk caching than I have.  It sounds like the 'none' works fine and is equivalent.

 

I believe that Linux does set the Turbo mode on by default.  I don't know why I did it that way, but T&T defaults Turbo to off.  Maybe I need to re-think that.  Probably what I should do is display the current Turbo setting in the status under the 'Governor:'.

10 minutes ago, dlandon said:

You have obviously done more research on the vdisk caching than I have. […] Probably what I should do is display the current Turbo setting in the status under the 'Governor:'.

Ah, OK. I thought turbo was set off by default on unRAID. I just ran cat /sys/devices/system/cpu/intel_pstate/no_turbo on my test server and can see that turbo is on (the file reads 0 when turbo is enabled).

Yes, I think Tips and Tweaks should have turbo set to on by default, leaving the user able to turn it off if they want. In the meantime it's easy to check or flip by hand; see the sketch below.
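(Again this assumes the intel_pstate driver:)

    # check turbo state: 0 = turbo enabled, 1 = turbo disabled
    cat /sys/devices/system/cpu/intel_pstate/no_turbo

    # enable turbo by hand
    echo 0 > /sys/devices/system/cpu/intel_pstate/no_turbo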

  • 3 weeks later...

Thanks for the videos.

 

Quick question re emulator pinning. I have a 14 core CPU and I have 4 VMs running pretty much 24/7 spread across 13 of the cores, with each one's emulator pinned to the first core (emulator pin 0; vm1 cores 1,2; vm2 cores 3,4,5,6; vm3 cores 7,8,9,10; vm4 cores 11,12,13).

 

In your video you suggest pinning all free cores. Do you think I should pin 0,3-13 for vm1, 0-2,7-13 for vm2, etc.?

On 04/08/2017 at 8:30 PM, DZMM said:

Thanks for the videos. […] Do you think I should pin 0,3-13 for vm1, 0-2,7-13 for vm2, etc.?

Sorry @DZMM for this late reply to your question. When using emulator pin, I suggest you pin to a core that is free from use by the VM.

I also have a 14 core CPU. Thread pairing for my 14 core Xeon looks like this, so I guess this is how your VMs are pinned at present:

cpu 0  <===> cpu 14   emulator pin to first core for all VMs
cpu 1  <===> cpu 15   vm1
cpu 2  <===> cpu 16   vm1
cpu 3  <===> cpu 17   vm2
cpu 4  <===> cpu 18   vm2
cpu 5  <===> cpu 19   vm2
cpu 6  <===> cpu 20   vm2
cpu 7  <===> cpu 21   vm3
cpu 8  <===> cpu 22   vm3
cpu 9  <===> cpu 23   vm3
cpu 10 <===> cpu 24   vm3
cpu 11 <===> cpu 25   vm4
cpu 12 <===> cpu 26   vm4
cpu 13 <===> cpu 27   vm4

So yes, you could emulator pin to core 0 (0,14) for all VMs, but I would only use emulator pin if you are having latency issues with a particular VM. You can also experiment with it on a running VM without editing the XML; see the sketch below.
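(virsh can query and change the emulator affinity live; 'vm1' below is a placeholder domain name:)

    virsh emulatorpin vm1              # show current emulator thread affinity
    virsh emulatorpin vm1 0,14 --live  # pin the emulator threads to cpu 0 and its HT pair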

Are you having issues with all of them? What type of VMs are they? Are they Windows, Linux, etc., and what are they used for?

Are you running any Docker containers on the server at the same time as these VMs?

Core 0 is favoured by unRAID, so if things are happening on the server and all 4 VMs' emulator processes are pinned to core 0 (0,14), it is possible you could max out core 0 and thereby adversely affect the VMs.

If you have isolated the VMs' cores from unRAID in syslinux, this would make the problem worse, because unRAID would only be able to use core 0; any Docker containers it spins up would sit on core 0 unless they were manually pinned to other cores.

 

Do you really need 4 cores for 2 of your VMs? Could the 4 core VMs use 3 cores? Then you could have something like this:

 

cpu 0  <===> cpu 14   unRAID                 dockers
cpu 1  <===> cpu 15   emulator pin all VMs   dockers
cpu 2  <===> cpu 16   emulator pin all VMs   dockers
---------vms------------------------
cpu 3  <===> cpu 17   vm1
cpu 4  <===> cpu 18   vm1
cpu 5  <===> cpu 19   vm2
cpu 6  <===> cpu 20   vm2
cpu 7  <===> cpu 21   vm2
cpu 8  <===> cpu 22   vm3
cpu 9  <===> cpu 23   vm3
cpu 10 <===> cpu 24   vm3
cpu 11 <===> cpu 25   vm4
cpu 12 <===> cpu 26   vm4
cpu 13 <===> cpu 27   vm4

 

If you really need the 4 cores for those VMs, how about the other 2 VMs (the 3 core and the 2 core)? Are they using a lot of horsepower, and/or is low latency essential? If not, they could share cores whilst the 4 core VMs have their own, i.e.:

cpu 0  <===> cpu 14   unRAID
cpu 1  <===> cpu 15   emulator pin vm2 and vm3
cpu 2  <===> cpu 16   emulator pin vm2 and vm3
-----low cpu usage vms-------
cpu 3  <===> cpu 17   vm1 & vm4
cpu 4  <===> cpu 18   vm1 & vm4
cpu 5  <===> cpu 19   vm1 & vm4
------important low latency 4 core vms---- (maybe isolate the cores below in syslinux config for exclusive use?)
cpu 6  <===> cpu 20   vm2
cpu 7  <===> cpu 21   vm2
cpu 8  <===> cpu 22   vm2
cpu 9  <===> cpu 23   vm2
cpu 10 <===> cpu 24   vm3
cpu 11 <===> cpu 25   vm3
cpu 12 <===> cpu 26   vm3
cpu 13 <===> cpu 27   vm3

Anyway, I am not sure if I have answered your question or just rambled on, but really, only emulator pin the VMs that need low latency. Don't be worried about sharing cores between VMs that aren't doing lots of number crunching and don't need low latency as a gaming VM does. When all your VMs are running, look at the load on each core in the unRAID dashboard; that way you can see what's going on across the cores that have been pinned. You can get the same per-core view from the command line too, as in the sketch below.
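(mpstat is one option; it's part of the sysstat package, which may need installing first, e.g. via the NerdPack plugin:)

    # utilisation for every core, sampled every 2 seconds, 5 times
    mpstat -P ALL 2 5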


Thanks gridrunner - that helped me decide what to do.

 

My 4 VMs are:

VM1&2: Used by my young kids. They don't do anything heavy-duty, but their gaming is increasing. I could probably get away with less than 3 cores each, but my overall CPU usage rarely goes >50%, so for now I'm indulging them.

VM3: My daily driver.  Giving myself 4 cores because I'm a bloke, even though I probably don't need them

VM4: pfSense. At the moment the CPU usage is also low, but it might pick up as my line speed goes up (hopefully soon, as I'm on 19/1).

 

I've adjusted my cores to:

 

cpu 0  <===> cpu 14   Unraid
cpu 1  <===> cpu 15   emulator pin for VMs
cpu 2  <===> cpu 16   vm1
cpu 3  <===> cpu 17   vm1
cpu 4  <===> cpu 18   vm1
cpu 5  <===> cpu 19   vm2
cpu 6  <===> cpu 20   vm2
cpu 7  <===> cpu 21   vm2
cpu 8  <===> cpu 22   vm3
cpu 9  <===> cpu 23   vm3
cpu 10 <===> cpu 24   vm3
cpu 11 <===> cpu 25   vm3
cpu 12 <===> cpu 26   vm4
cpu 13 <===> cpu 27   vm4

 

I think you're right that I was taxing core 0 too hard, so moving the emulators to another core will hopefully help. I've pinned all the emulators to core 1, as its usage never seems to max out, so I don't think there's a lot of overhead.

 

My Docker usage can get high, thanks mainly to Plex, but even when transcoding, Unraid seems to intelligently spread the usage across a number of cores. It seems to touch the higher cores last, so that's why I've put VMs 3 & 4 on the higher cores. In the future I might pin dockers, but I honestly think Unraid does a good job of spreading the load and I don't want to restrict its options.
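(If I do get round to pinning, I gather an existing container can be restricted in place; 'plex' here is just a stand-in for the real container name:)

    # limit the container to the two non-VM cores and their hyperthread pairs
    docker update --cpuset-cpus="0,1,14,15" plex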

 

I haven't really had a situation where a Docker container has impacted VM usage yet, although I do occasionally get a blip on video playback that could be down to sabnzbd activity. Hopefully having two cores that aren't used by VMs will help unRAID better balance Docker usage.

 

 


I too would like to thank @gridrunner for all of his videos. What I have found is that after watching them 3-5 times, if I want to fully implement what you spell out in a given video, there are, say, 5 tips, each with 5 steps; that's at least 25 pauses, rewinds, zooms, etc., to be sure I have gotten it perfect. I'm a total noob when it comes to Linux; I had some exposure to BSD Unix about 30 years ago, but I was mainly playing Hack, Rogue, Hunt, Advent et al. and reading/posting in the newsgroups. @gridrunner, have you joined Patreon, or do you have a public-facing PayPal? If so, I have a tip for you.

 

@TinkerToyTech

On 17/07/2017 at 9:52 PM, dlandon said:

These videos are very good. […] The only thing I might comment on is the caching on a VM vdisk. From everything I've read, this is the best setup for performance on a vdisk:

<driver name='qemu' type='raw' cache='directsync' io='native'/>

For me personally, I see massive latency spikes if I change it to anything suggested in this thread. Leaving it on cache='writeback' gives me greens the whole time.

  • 5 months later...
  • 1 month later...

Hey - had a thought as I was setting up my new 12 core server.

 

My two main sources of CPU stress are my Plex Docker and my Windows VM, and when one is working hardest, the other is normally pretty quiet. So I prefer not to limit cores, which allows whichever task needs them to have them all available.

 

From prior experience, if I let them share all their cores, I have issues with the Windows VM slowing while heavy transcoding is happening in Plex. To address this, I had excluded the VM's first core from Plex, and that helped a lot, but the VM was still a bit sluggish. (This was on my old 4 core server, where I experimented with this.) Holding back a full core from Plex was sort of a big deal, but it couldn't be avoided because otherwise the VM was unusable.

 

But as I was thinking, it seems both are using cores in the same order. What if I let Plex continue to use cores from the lowest core up, but let the VM use cores from the last core down, as shown below:

 

Default:

Plex:  0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11

VM:   1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11

 

New:

Plex:  0, 1, 2, 3, 4,  5, 6, 7, 8, 9, 10

VM:  11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1

 

They should interfere with each other less. I still have the first core used by each excluded from the other, at least for now.

 

Hoping that will mean less interference in general, but I'm not 100% sure. The nature of Plex may be to transcode using all available cores uniformly, but I don't think this can hurt.

 

I haven't done much transcoding yet, but I think this might be worth trying to allow cores to be shared most effectively.

 

Btw - the core mappings have to be manually edited in the XML, from:

    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='13'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='14'/>
    <vcpupin vcpu='4' cpuset='3'/>
    <vcpupin vcpu='5' cpuset='15'/>

etc.

 

 to:

    <vcpupin vcpu='0' cpuset='11'/>
    <vcpupin vcpu='1' cpuset='23'/>
    <vcpupin vcpu='2' cpuset='10'/>
    <vcpupin vcpu='3' cpuset='22'/>
    <vcpupin vcpu='4' cpuset='9'/>
    <vcpupin vcpu='5' cpuset='21'/>
etc.
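(I believe virsh can also change individual pins on a running VM if you'd rather not edit the XML by hand; 'Windows10' below is just an example domain name:)

    # re-pin vCPU 0 of the running domain to host cpu 11
    virsh vcpupin Windows10 0 11 --live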

 

Apologies if this is covered in the video. I didn't go back and rewatch but don't remember seeing it.

48 minutes ago, SSD said:

Hey - had a thought as I was setting up my new 12 core server. […] What if I let Plex continue to use cores from the lowest core up, but let the VM use cores from the last core down?

I kind of do this with my VMs - my kids' VMs are on the lower cores, and mine and the pfSense VM are on the highest cores. I don't know if it helps, but when I'm bored and watching the cores' activity, the higher cores do seem to get less use.

  • 4 weeks later...

I have had a great time setting up my UnRaid server and going through all your videos, Gridrunner; they have made the setup and tweaking process a joy :x

 

All the tips work great for me except isolcpus, which dramatically reduced performance in my case (by dropping one core, it seems). Has anyone seen this before?

 

I have three cores and six threads pinned to my gaming VM, which gives good benchmarks in AIDA64. When I put isolcpus on, the score plummeted by 30-50% and only two cores were used while running the test (before, it used all three that were pinned).

 

This is a Threadripper system, so it has two memory controllers and other semi-exotic features. Perhaps the memory score is going down because it's losing access to a core (less access to memory), which is borne out by the CPU usage I'm seeing. But why isolcpus loses me a core, I do not know.

 

 

[screenshot: gaming VM CPU pinning]

 

This gives an AIDA64 cache and memory result of:

[screenshot: AIDA64 cache and memory benchmark, no isolcpus]

 

Now, with those CPUs isolated as follows:

label unRAID OS
  menu default
  kernel /bzimage
  append isolcpus=1-3,9-11 initrd=/bzroot
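(To rule out a typo in the append line, I believe reasonably recent kernels will report which cores were actually isolated:)

    # should print the isolated cpu list, e.g. 1-3,9-11
    cat /sys/devices/system/cpu/isolated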


I get this: 
[screenshot: AIDA64 cache and memory result with isolcpus]

 

and a peek at system status while it's running shows that only 2 cores are being used. The same test without isolcpus used all 3 cores and gained a better score.

[screenshot: system status showing only 2 cores in use]

 

Any thoughts?

 

 

On 16/01/2018 at 11:24 PM, NotYetRated said:

Awesome videos, hoping tips here will help me with my 2 VM's experiencing audio/video desync as well as random audio static type issues I have been experiencing.

 

Sorry if I missed this, but how are people checking latency on their VM's?

 

I would also like to know what test others are running here; it's not clear to me whether there are any popular testing tools that this forum generally uses.

  • 1 year later...

This is a very interesting thread. No clue how I could have missed this one earlier.


Not sure whether my question should be in a separate thread, but let me start here. Happy to open a new one if that fits better.

 

I noticed that Intel has implemented several versions of turbo boost: Turbo Boost, Turbo Boost 2.0 and even Turbo Boost Max 3.0. I am planning to upgrade my 7800X to the newly released 10980XE, but I want to make sure that my gaming VM would benefit from the maximum single-core boost of 4.8GHz. I'd need this particularly for emulator games that rely on single-core performance. Any thoughts on whether Turbo Boost Max 3.0 would be supported within a Windows VM?

 

https://blogs.forbes.com/antonyleather/files/2019/10/intel-cascade-lake-x-2.png

Link to comment
  • 2 months later...
On 10/3/2019 at 4:39 PM, steve1977 said:

This is a very interesting thread. […] Any thoughts on whether Turbo Boost Max 3.0 would be supported within a Windows VM?

Any thoughts?

  • 6 months later...

I just built a server with a Threadripper, and I'm letting Unraid, dockers and some VMs that don't use many resources (mail server, Home Assistant, XP, Tvheadend and various others) share my first 8 cores; then I have two others that need all the performance they can get out of their assigned cores.

 

My question is: is it better to pin the emulators of those last two VMs to just a single core of the first eight, or to all eight and let Unraid handle the load balancing?

