Page 1 of 2
Results 1 to 10 of 13
  1. #1
    Master Untangler
    Join Date
    Jan 2011
    Posts
    103

    Insane spikes in CPU usage

    Over the last month I've been seeing crazy spikes in CPU usage. For example, in the latest one yesterday the load average went from 0.5 to 60 for about 1 minute, with no increase in traffic, bandwidth, web traffic, blocked events, bypassed events, intrusions, session count, memory, swap or disk usage. On the graph below you can see other spikes during the day, again with no associated increase in network throughput on the appliance. The 'normal' load average on the appliance is 0.15 to 0.5 on an Intel Celeron 4 thread/core CPU, so a load average of 60 is insane (very broken).
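To put those numbers in perspective, here is a rough sketch of the arithmetic. It assumes the Celeron exposes 4 hardware threads, and the function name `load_per_thread` is purely illustrative.

```python
def load_per_thread(load_average: float, hardware_threads: int) -> float:
    """Return the load average divided by the number of hardware threads.

    On Linux, a value around 1.0 means roughly one runnable (or
    uninterruptibly waiting) task per thread; values far above 1.0
    mean tasks are queuing.
    """
    if hardware_threads <= 0:
        raise ValueError("hardware_threads must be positive")
    return load_average / hardware_threads


# Figures from the post; the thread count of 4 is an assumption.
normal = load_per_thread(0.5, 4)
spike = load_per_thread(60, 4)
print(f"normal: {normal:.3f} per thread, spike: {spike:.1f} per thread")
```

On these figures the spike works out to about 15 queued tasks per thread, against roughly 0.125 at the top of the normal range.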

    Any idea what this is? (It's got to be a software issue somewhere, surely?)

    Screenshot from 2021-12-30 11-14-13.png
    Last edited by tescophil; 12-30-2021 at 05:28 AM.

  2. #2
    Untangle Ninja sky-knight's Avatar
    Join Date
    Apr 2008
    Location
    Phoenix, AZ
    Posts
    26,491

    This sort of thing is really hard to troubleshoot.

    The systems that gave me the most grief were Azure B1S instances. That's 1 vCPU, 4 GB of RAM. These are exceedingly small instances, and while they run Untangle well, there's a reason Untangle recommends a D4S.

    The only way to know what's causing the spike is to be watching it when it happens. That means SSH into the box, run top, and stare at it during the times it spikes. Then you'll see what process is sucking up the CPU.

    If you see the java process sucking it up, that simply means you've outgrown your hardware. It happens, and it tends to show up with upgrades, but it isn't actually the fault of the upgrade itself.

    If your install is ancient, it might just be suffering from upgrade creep too. Eventually, all systems need a nuke and pave.
    Rob Sandling, BS:SWE, MCP
    NexgenAppliances.com
    Phone: 866-794-8879 x201
    Email: support@nexgenappliances.com

  3. #3
    Untangler
    Join Date
    Nov 2017
    Posts
    45

    I'm seeing this as well on a dual Xeon firewall. It happens at 2 AM sometimes, 3 PM at others. I can never catch it in real time, so I can't pinpoint it. Right now there's nobody in the building, yet I get a spike at a random time every couple of days. I have a ticket open as well. It happened before break during academic testing for our students, and we had complaints of students not being able to advance to the next question easily, so it's having an impact.

  4. #4
    Untangler
    Join Date
    May 2008
    Posts
    522

    Quote Originally Posted by MNTech68 View Post
    I'm seeing this as well on a dual Xeon firewall. It happens at 2 AM sometimes, 3 PM at others. I can never catch it in real time, so I can't pinpoint it. Right now there's nobody in the building, yet I get a spike at a random time every couple of days. I have a ticket open as well. It happened before break during academic testing for our students, and we had complaints of students not being able to advance to the next question easily, so it's having an impact.
    Look at the cron settings and see if something is running at those times.

    A Google search shows many ways to "record" the output of top. I've never tried that, but it could be useful. How big that file might end up I don't know.
    https://www.google.com/search?channe...nux+record+top
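One way to "record" top without the log growing unbounded is to poll the load average and only write a snapshot when it crosses a threshold. The following is a minimal sketch under stated assumptions, not a tested Untangle tool: it assumes Python 3 and the procps `top` are available on the box, and the function names, the threshold value and the log path in the usage note are all hypothetical.

```python
import subprocess
import time
from datetime import datetime
from typing import Tuple


def parse_loadavg(text: str) -> Tuple[float, float, float]:
    """Parse the first three fields of /proc/loadavg (1, 5, 15 minute loads)."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def should_snapshot(load_1min: float, threshold: float) -> bool:
    """Decide whether the 1-minute load is high enough to record a snapshot."""
    return load_1min >= threshold


def top_snapshot() -> str:
    """Run top once in batch mode and return its text output."""
    result = subprocess.run(
        ["top", "-b", "-n", "1"], capture_output=True, text=True, check=False
    )
    return result.stdout


def run_logger(log_path: str, threshold: float, interval_s: float = 2.0) -> None:
    """Poll the load average forever and append a top snapshot on each spike."""
    while True:
        with open("/proc/loadavg") as f:
            load_1min, _, _ = parse_loadavg(f.read())
        if should_snapshot(load_1min, threshold):
            with open(log_path, "a") as log:
                log.write(f"--- {datetime.now().isoformat()} load={load_1min}\n")
                log.write(top_snapshot())
        time.sleep(interval_s)
```

Started in the background with something like `run_logger("/var/log/load-spikes.log", threshold=4.0)`, this only writes while the load is above the threshold, so the file should stay small between spikes. Two caveats: the 1-minute load average is a damped figure, so it lags the start of a short spike, and under a load of 60 the logger itself may be starved and miss samples.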

  5. #5
    Master Untangler
    Join Date
    Jan 2011
    Posts
    103

    So, the idea of logging in when this is happening is a non-starter for a couple of reasons. Firstly, it's a very short spike: 63 seconds to be exact with this particular event. How do I know that? Well, for 63 seconds not a single session was created. Secondly, because the load is SO high, nothing on the command line will function. I probably could not even type the 'top' command in 63 seconds with the load that high.

    Looks like I'm not the only one with this issue. The machine I have is already massively over-specified and runs with a load under 0.25, hitting 1 when maxing out my internet connection. I don't need a faster machine, just better software.

  6. #6
    Untangler
    Join Date
    May 2008
    Posts
    522

    It doesn't seem to be a very common problem. So it will be very hard to duplicate and fix. The way to get better software is to provide as much information as possible. Try disabling apps one at a time.

    I don't know if it might help or not, but Untangle does not include "intel-microcode" or "amd-microcode". If it is CPU related, installing the one for your CPU might help. It probably won't hurt anyway.

    Good luck

  7. #7
    Untangle Ninja sky-knight's Avatar
    Join Date
    Apr 2008
    Location
    Phoenix, AZ
    Posts
    26,491

    As I said on the other thread, a spike in CPU utilization can mean many things. But on a fundamental level it means a burst of network sessions.

    This can be caused by a ton of different factors, all of which are traffic related. Do you have a web server behind Untangle? Because if you do, and you don't have the ingress HTTP/HTTPS traffic going through a very carefully curated policy, well... you're going to see this, because of all the Log4j vulnerability scanning being done right now.

    Running a larger network with a bunch of machines on it? Perhaps you've got a dumb user who installed a bot or malware on their machine that's aiding in this process, or something similar. Malware with a BitTorrent delivery engine can do this too, and always could. There are 1000 different things here that aren't Untangle but can cause Untangle to do this. And yes, it is an indication you've outgrown your hardware.

    Is that what's happening? I have no idea. But Support's explanation is VALID.
    Rob Sandling, BS:SWE, MCP
    NexgenAppliances.com
    Phone: 866-794-8879 x201
    Email: support@nexgenappliances.com

  8. #8
    Untangler
    Join Date
    Nov 2017
    Posts
    45

    I agree this is a possible scenario, but in my case it's VERY unlikely to be the cause. The Untangle is dropping sessions as a result of the load spike, not the other way around. Even with zero inbound rules I see it happen. When I looked at the WAN switching side, there was no corresponding spike in sessions inbound or outbound. If this were a DDoS from either direction, the WAN side should at least see an increase in sessions up until the point where Untangle dies.

  9. #9
    Master Untangler
    Join Date
    Jul 2010
    Location
    Nanaimo B.C
    Posts
    714

    Quote Originally Posted by MNTech68 View Post
    I agree this is a possible scenario, but in my case it's VERY unlikely to be the cause. The Untangle is dropping sessions as a result of the load spike, not the other way around. Even with zero inbound rules I see it happen. When I looked at the WAN switching side, there was no corresponding spike in sessions inbound or outbound. If this were a DDoS from either direction, the WAN side should at least see an increase in sessions up until the point where Untangle dies.
    Is this a paid version of UT or the free version? How many users? I know I'm grasping at straws here, but...
    Started Youtube Channel, Have a question about Untangle Ask me : jason @ jasonslab.ca
    https://www.youtube.com/c/jasonslabvideos << Please like and subscribe, helps me out !!

  10. #10
    Untangler
    Join Date
    Nov 2017
    Posts
    45

    This is paid, 2000+ users (Unlimited License). So far I've traced it back to starting on Sep 8th 2021. That's about 2 weeks after going from v15 to v16. Since that date it's been happening randomly: Sep 8, 17, 29; Oct 14, 15; Nov 9, 11, 19, 22, 30; Dec 3, 7, 9, 21, 29.
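A quick way to check whether those dates hide a schedule (a cron job, a weekly task) is to look at the gaps between them and at the weekdays they fall on. This is only a sketch: the dates are the ones listed above, and the helper names `gaps_in_days` and `weekday_counts` are made up for illustration.

```python
from collections import Counter
from datetime import date

# Spike dates as listed in the post (all 2021).
SPIKE_DATES = [
    date(2021, 9, 8), date(2021, 9, 17), date(2021, 9, 29),
    date(2021, 10, 14), date(2021, 10, 15),
    date(2021, 11, 9), date(2021, 11, 11), date(2021, 11, 19),
    date(2021, 11, 22), date(2021, 11, 30),
    date(2021, 12, 3), date(2021, 12, 7), date(2021, 12, 9),
    date(2021, 12, 21), date(2021, 12, 29),
]


def gaps_in_days(dates):
    """Return the number of days between each pair of consecutive dates."""
    ordered = sorted(dates)
    return [(b - a).days for a, b in zip(ordered, ordered[1:])]


def weekday_counts(dates):
    """Count how many of the dates fall on each weekday name."""
    return Counter(d.strftime("%A") for d in dates)


print("gaps (days):", gaps_in_days(SPIKE_DATES))
print("by weekday:", dict(weekday_counts(SPIKE_DATES)))
```

The gaps range from 1 to 25 days with no repeating interval, which fits "randomly" better than a fixed schedule, though with only 15 events that is a hint rather than proof.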
