  1. #1
    Newbie
    Join Date: Aug 2020
    Posts: 10

    Enabling QoS significantly impacts download speed?

    Hardware:
    Supermicro A1SRi-2558F (Atom C2558), 32GB (ECC) RAM
    Running VMware ESXi 7.0.x
    Untangle 16.0.0 beta is a VM with 3 vCPUs and 8GB of RAM

    Untangle was installed from ISO instead of the OVA (in order to use a smaller virtual disk file).
    VM is configured to use VMXNET3 adapters, although it started out using Intel-emulation and was exhibiting the same behavior then.

    In case it matters - system is configured for dual WAN, but in failover-only setup, and I'm doing all testing with the primary connection active.

    Problem:
    Enabling QoS seems to limit download speed more than I'd expect.
    WAN is WOW cable service rated at 500Mbps x 50 Mbps, and routinely tests higher (more like 525 x 54).
    I have QoS bandwidth configured at 520,000 Kbps download by 52,000 Kbps upload.

    I'm only seeing around 360Mbps download on tests (primarily using a "local" Ookla speedtest server that is within my ISP's network) when QoS is enabled, on an otherwise quiet network. (There might be some ancillary small traffic but not more than 1-2Mbps at very most.)
    If I disable QoS, I am back to expected speeds (tends to spike up above 600 and then level out around 525 or so). Upload is not impacted and seems to obey the set speed limit (as expected).
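
    To put a number on the gap, here is a minimal Python sketch of the arithmetic, using the figures above (520,000 Kbps configured, roughly 360 Mbps measured). The function name `shortfall` is just for illustration, and it assumes the QoS field is in kilobits per second.

```python
def shortfall(limit_kbps: int, measured_mbps: float) -> tuple[float, float]:
    """Return (Mbps missing versus the limit, percent of the limit achieved)."""
    limit_mbps = limit_kbps / 1000  # assumes the QoS field is in kilobits/s
    missing = limit_mbps - measured_mbps
    achieved_pct = 100 * measured_mbps / limit_mbps
    return missing, achieved_pct


# Figures from this post: 520,000 Kbps configured, ~360 Mbps observed.
missing, pct = shortfall(520_000, 360)
print(f"{missing:.0f} Mbps below the limit, {pct:.1f}% of the configured rate")
```

    So the test lands about 160 Mbps short, or roughly 69% of the configured download rate.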

    During a speedtest, top running on the Untangle box shows a massive CPU spike from the java process. It's not at max, but sits around 230% or so (3 vCPUs so "300%" should be fully pegged). I just tried increasing to 4 vCPU and this did not help.

    Am I just running into CPU limits? (This seems unlikely to me since I'm not seeing all 3 cores pegged.) Is QoS known to be demanding / slow on Untangle? I'm coming from pfSense where enabling CoDeL did not seem to impact the box all that much (but I was also running pfSense bare metal). When I disable QoS and test again, the usage by the java process still spikes, but it tops out around 170-200% and speed does not appear to be impacted.

    Note that VMware is showing the CPU spike as coming from the VM itself, not as coming from VMware (so I don't think it's the physical-virtual translation causing the CPU usage). This seems to be borne out by the spike in java within Untangle too...
    Last edited by ZPrime; 09-21-2020 at 10:45 PM.

  2. #2
    Untangle Ninja sky-knight
    Join Date: Apr 2008
    Location: Phoenix, AZ
    Posts: 25,071

    It seems to me that QoS is doing exactly what you've asked it to do.

    And yes, Untangle requires buckets more resources than pfSense; they aren't even close to doing the same thing.

    Also, vCPUs don't work the way you think they do... and you need to give more cores to Untangle.
    Rob Sandling, BS:SWE, MCP
    NexgenAppliances.com
    Phone: 866-794-8879 x201
    Email: support@nexgenappliances.com

  3. #3
    Newbie
    Join Date: Aug 2020
    Posts: 10

    Quote Originally Posted by sky-knight
    Also, vCPUs don't work the way you think they do... and you need to give more cores to Untangle.
    I'm not sure how you know what I think about vCPUs when I haven't made any comments about them beyond how many the VM has? The info about "300% should be fully pegged" was from the perspective of top within the VM itself (which thought it had 3 processors, hence 300% = "full load" to the kernel).

    The host only has 4 cores on one physical socket. I was already giving Untangle (effectively) 75% of the system.
    I just increased the VM to 4 vCPU. Did not improve anything, test is still around 330-360Mbps.

    I would expect QoS to limit a speed test to approximately the speed limit I set (~520Mbps), maybe give or take 5-10Mbps.
    360Mbps is significantly less than 520 in my book.
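
    As a reference point for that expectation, here is a minimal sketch of an idealized token-bucket limiter in Python. It is not Untangle's actual QoS implementation (nothing in this thread says how that works internally); the function name, the 10 Mbit burst size, and the 600 Mbps offered load are all assumptions chosen for illustration. It only shows that a limiter with no CPU constraint should deliver very close to the configured rate.

```python
def simulate_token_bucket(rate_mbps, offered_mbps, burst_mbit, seconds, tick=0.01):
    """Average delivered Mbps for a sender offering a constant rate.

    Idealized model: simulated time, no CPU cost, excess traffic is simply
    not sent during the tick in which the bucket runs dry.
    """
    tokens = burst_mbit                  # bucket starts full
    delivered = 0.0
    steps = round(seconds / tick)
    for _ in range(steps):
        # Refill at the configured rate, capped at the burst size.
        tokens = min(burst_mbit, tokens + rate_mbps * tick)
        want = offered_mbps * tick       # Mbit the sender offers this tick
        sent = min(want, tokens)         # cannot send more than the bucket holds
        tokens -= sent
        delivered += sent
    return delivered / seconds


# Hypothetical numbers: 520 Mbps limit, sender pushing 600 Mbps for 10 s.
avg = simulate_token_bucket(rate_mbps=520, offered_mbps=600, burst_mbit=10, seconds=10)
print(f"average delivered: {avg:.1f} Mbps")
```

    In this toy model the average sits within about 1 Mbps of the 520 Mbps limit, which is the behavior I was expecting from the real thing.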

    If there was some sort of mandatory hold-back in QoS, I could maybe understand this discrepancy, but I would normally expect lower tiers of traffic to "steal" from higher buckets if they are not in use? Plus, I tried changing the default traffic bucket to be "Very High" (which should win over everything else regardless), and again it didn't improve speed.

    Changing the download limit to something way above my rated speed (~800Mbps), saw me hit ~450Mbps download on tests. So it seems like the hardware has more to give, which then leads me to believe maybe something is going on with "reserved" bandwidth that is being held back and not allowing other traffic to steal from that bucket?

    If the QoS process is single-threaded, I'm also cognizant of the fact that an Atom C2558's 2.4GHz isn't as capable as a "real" CPU, and the lackluster single-thread throughput could be to blame here. But again, pfSense on this box did not suffer a several-hundred-Mbps throughput loss from its QoS.

    I understand that Untangle is more sophisticated than pfSense and has far more functionality, but I also have as much of that turned off right now as I can. Most of the "Apps" are not even installed at the moment - Firewall (with zero rules enabled), WAN failover, and the reporting stuff. That's it. (I have Wireguard installed too but have not done any config and have no VPNs active.)

    I'm just trying to figure out if there's anything else I can do to hit close to the rated line speed in this scenario (without having to lie to the system about my download speed).

  4. #4
    Untangle Ninja sky-knight
    Join Date: Apr 2008
    Location: Phoenix, AZ
    Posts: 25,071

    vCPUs inform the hypervisor of how many threads a given VM is allowed to create. At the speeds you're talking about, 3 is going to be a problem. Generally speaking, I recommend assigning as many vCPUs to any guest VM as the host has, so that all guests can use 100% of the CPU. Please note, this number includes virtual cores from hyper-threading. From there, you use priority settings to let the hypervisor know which VMs are more important. This configuration lets you actually use your host's CPU. Untangle needs to be extremely important or everything slows down, so beware the priority settings.

    As for the rest, Untangle's QoS doesn't reserve bandwidth; it simply uses the numbers you feed it. If you're seeing less than what's being fed, you have something else going on that isn't normal. A CPU scheduling conflict in the hypervisor caused by poorly scheduled threading could explain the discrepancy. This is doubly true on CPU-power-limited platforms such as the one you're working with. I think you'll see a substantial performance improvement with that adjustment. I know I did when I first learned about it.
    Last edited by sky-knight; 09-22-2020 at 01:24 AM.

  5. #5
    Untangle Ninja dwasserman
    Join Date: Jun 2008
    Location: Argentina
    Posts: 4,341

    (Atom C2558) 2 MB L3 cache can be the bottleneck
    The world is divided into 10 kinds of people, who know binary and those not

  6. #6
    Newbie
    Join Date: Aug 2020
    Posts: 10

    Quote Originally Posted by dwasserman
    (Atom C2558) 2 MB L3 cache can be the bottleneck
    Again, it isn't with FreeBSD + pf.
    But I understand that this is a different system. If that's my answer, so be it. Was just hoping maybe somebody here had some other suggestions, in case it isn't hardware.

    If it was truly hardware, I'd expect to see full CPU usage during a test as the speed chokes off, but that wasn't happening. That's part of why I thought maybe it wasn't hardware.

  7. #7
    Untangle Ninja sky-knight
    Join Date: Apr 2008
    Location: Phoenix, AZ
    Posts: 25,071

    Quote Originally Posted by ZPrime
    Again, it isn't with FreeBSD + pf.
    But I understand that this is a different system. If that's my answer, so be it. Was just hoping maybe somebody here had some other suggestions, in case it isn't hardware.

    If it was truly hardware, I'd expect to see full CPU usage during a test as the speed chokes off, but that wasn't happening. That's part of why I thought maybe it wasn't hardware.
    It wasn't happening because your vCPU configuration prevents that from happening.

  8. #8
    Newbie
    Join Date: Aug 2020
    Posts: 10

    Quote Originally Posted by sky-knight
    It wasn't happening because your vCPU configuration prevents that from happening.
    During the first post, OK, I only gave it 3 vCPUs. That's on me.

    I made changes after that:
    Quote Originally Posted by ZPrime
    I just increased the VM to 4 vCPU. Did not improve anything, test is still around 330-360Mbps.
    I shut down the VM, changed vCPUs to 4, and tested again (multiple times). Same problem, roughly the same performance. I think I was able to get it closer to 380-400Mbps, but that's still ~120Mbps off of the speed I set in QoS.

    Top (inside the VM) isn't showing full CPU usage, and VMware isn't showing full usage either (although admittedly the resolution on the VMware graph isn't the finest). Again, I'd expect pegged CPU if this was truly cpu-limited, but I'm not seeing that.

  9. #9
    Untangle Ninja sky-knight
    Join Date: Apr 2008
    Location: Phoenix, AZ
    Posts: 25,071

    top inside the VM won't report high CPU utilization if the load is outside userland, which is what happens when you have an Atom platform without appropriate offloading for the NICs. Untangle is computationally much heavier than a layer 2 platform like pfSense, so the first place you feel that loss is in bandwidth if the CPU is the limit.

    VMware's monitors have similar limitations, and if you're using VMXNET3 NICs in Untangle (which you should be doing), you're moving the load from the guest to the host, but in the end the same problem remains. You have software-driven NICs without enough hardware to do gigabit reliably without the CPU backing them up. Then you pair that with a low-power CPU. It's a recipe that ends in inconsistent or poor performance every time.
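
    A rough way to check from inside the guest where the time is going is to diff the per-core counters in /proc/stat across a speed test. The sketch below is only that, a sketch: the function names are made up for illustration, and it assumes the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal). It splits each core's time into userland, kernel (system + irq + softirq), idle, and steal.

```python
import time

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_proc_stat(text):
    """Map 'cpuN' -> dict of jiffy counters from /proc/stat content."""
    cores = {}
    for line in text.splitlines():
        parts = line.split()
        # Per-core lines start with 'cpu0', 'cpu1', ...; skip the aggregate 'cpu' line.
        if parts and parts[0].startswith("cpu") and parts[0] != "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            values = (values + [0] * len(FIELDS))[:len(FIELDS)]  # pad short lines
            cores[parts[0]] = dict(zip(FIELDS, values))
    return cores


def busy_breakdown(before, after):
    """Percent of elapsed time each core spent in user, kernel, idle and steal."""
    result = {}
    for core, end in after.items():
        delta = {k: end[k] - before[core][k] for k in FIELDS}
        total = sum(delta.values()) or 1  # avoid dividing by zero
        result[core] = {
            "user": 100 * (delta["user"] + delta["nice"]) / total,
            "kernel": 100 * (delta["system"] + delta["irq"] + delta["softirq"]) / total,
            "idle": 100 * (delta["idle"] + delta["iowait"]) / total,
            "steal": 100 * delta["steal"] / total,
        }
    return result


def sample_cores(seconds=5.0, path="/proc/stat"):
    """Take two snapshots `seconds` apart; run the speed test in between."""
    with open(path) as f:
        first = parse_proc_stat(f.read())
    time.sleep(seconds)
    with open(path) as f:
        second = parse_proc_stat(f.read())
    return busy_breakdown(first, second)
```

    Calling `sample_cores(30)` while a speed test runs returns one entry per core. If one core shows a high "kernel" share while the others are mostly idle, the limit is probably per-core rather than total CPU; a large "steal" share would instead suggest the guest is waiting on the hypervisor.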

  10. #10
    Newbie
    Join Date: Aug 2020
    Posts: 10

    Intel I354 NICs are considered "software driven?" Or are you saying that just virtualization in general is the problem, and this hardware should perform better without VMware in the way?

    I was going to try to pass-through the NICs directly to the VM, but apparently this board/CPU combo can't do passthrough (at least not with VMware).
