  1. #1
    Untangler
    Join Date: Apr 2010
    Posts: 50

    Potential packet loss / DHCP latency

    Hi,

    I doubt this issue is Untangle, but I'm working backwards through everything to isolate it rather than guess.

    I have multiple network segments set up: some are physically separated and others are VLANs. In total, there are about 10 different segments. Untangle functions as the L3 device between all of them.

    Untangle is running on custom hardware with a quad-core Intel i7 CPU (average load is under 1.0) and 8GB of RAM. It has 4 NICs (1 onboard, 1 dual-port card, and 1 single-port card). Two of the NICs are used for the dual ATT/Comcast connections. The other two separate the user networks from the infrastructure networks. (All links are 1Gb.)

    The core switch is a Unifi 48-port non-PoE switch. All servers and access switches connect to the core switch. Wireless APs connect to the access switches. All access switches are 24-port or 8-port Unifi switches.

    I'm noticing a few oddities related to DHCP when a device connects to the network. Sometimes it takes a user device 5-60 seconds to get a new DHCP lease.

    Sometimes when talking to servers and devices on other VLANs, the first request times out and then the next one goes through.

    The web server I'm running in the DMZ VLAN keeps having DNS, HTTP, and other connection timeouts. I'm having trouble tracking these down, so I wanted to ask the community: any tips for seeing whether the packet loss/session issues are in Untangle?

    Thanks in Advance.
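
One way to put numbers on the "first request times out, then the next one goes through" symptom is a repeated TCP connect probe run from a client toward a server on another VLAN. The sketch below is only an illustration: `probe` and `summarize` are hypothetical helper names, and the host, port, and timeout values are assumptions to adjust for the network being tested.

```python
import socket
import time


def probe(host, port, attempts=5, timeout=2.0, pause=0.0):
    """Try a TCP connect `attempts` times; return a list of (ok, seconds)."""
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # create_connection completes the TCP handshake or raises OSError
            # (which also covers timeouts and refused connections).
            with socket.create_connection((host, port), timeout=timeout):
                ok = True
        except OSError:
            ok = False
        results.append((ok, time.monotonic() - start))
        time.sleep(pause)
    return results


def summarize(results):
    """Count failures and flag the case where only the first attempt failed."""
    failures = sum(1 for ok, _ in results if not ok)
    first_only = (
        len(results) > 1
        and not results[0][0]
        and all(ok for ok, _ in results[1:])
    )
    return {
        "attempts": len(results),
        "failures": failures,
        "first_only_failed": first_only,
    }
```

Running `summarize(probe("192.0.2.10", 80))` (a documentation address, not one from this thread) after a period of idle time, and again immediately afterwards, would show whether failures cluster on the first attempt or are spread evenly, which points toward different causes.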

  2. #2
    Untangle Ninja sky-knight
    Join Date: Apr 2008
    Location: Phoenix, AZ
    Posts: 24,958

    How many devices? How many VLANs?

    You've got 1gbit interfaces... you'd be surprised how quickly they can saturate!
    Rob Sandling, BS:SWE, MCP
    NexgenAppliances.com
    Phone: 866-794-8879 x201
    Email: support@nexgenappliances.com

  3. #3
    Untangler
    Join Date: Apr 2010
    Posts: 50

    11 VLANs beyond the LAN (VLAN1) and 36 devices on that switch, including 5 Xen hosts, 2 NAS units, a camera uplink, and 3 access switches. Total network clients across VMs, servers, endpoints, etc. come to ~70.

    Looking in the Unifi Console, Untangle Dashboard, NAS consoles, etc. I'm not seeing maxed out links.

    The 48-port switch has two uplinks to Untangle that split its network traffic to help prevent saturation. With only one uplink I was noticing issues even though the port was not fully saturated, so I moved the infrastructure VLANs and the user VLANs onto separate uplinks.

    CPU loads on everything are well below any thresholds that produce lag or packet loss.

    Anyway, I wanted to check whether the community had any experience with this. If not, it may be time to start upgrading to 10G and move away from Unifi.
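
Dashboards usually show utilization averaged over an interval, so a link can look far from "maxed out" while still filling up in short bursts. The sketch below shows the arithmetic on a series of interface byte-counter samples. How the samples are collected is left open, the function names are hypothetical, and counter wrap-around is not handled.

```python
def utilization(samples, link_bps=1_000_000_000):
    """samples: list of (seconds, cumulative_bytes). Returns per-interval utilization (0..1)."""
    out = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order samples
        out.append(((b1 - b0) * 8) / (dt * link_bps))
    return out


def average_and_peak(samples, link_bps=1_000_000_000):
    """Return (average, peak) utilization over the whole sample window, or None."""
    per_interval = utilization(samples, link_bps)
    if not per_interval:
        return None
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    average = ((b1 - b0) * 8) / ((t1 - t0) * link_bps)
    return average, max(per_interval)
```

With one-second samples, a link that carries a full second of line-rate traffic followed by three idle seconds averages 25% but peaks at 100%; the shorter the sampling interval, the more of these bursts become visible.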

  4. #4
    Untangle Ninja Jim.Alles
    Join Date: Jul 2008
    Location: Central PA
    Posts: 2,447

    I have observed, on a simpler architecture and much less capable (read: obsolete) hardware, that some client devices (smartphones and tablets, primarily iOS if I recall) would time out before obtaining a DHCP lease over Wi-Fi. By default, dnsmasq does a 'belt and suspenders' check: it pings the IP address it is about to hand out, to make sure nothing is already using it, before it answers the DHCP request with the offer. I do not know what the timing parameters are.

    Untangle NGFW uses that default.

    In my opinion, if the subnet is large enough that dnsmasq's hash selection has room to breathe, and the network is otherwise properly configured, this double-check may not be necessary. dnsmasq keeps very good track of what it is doing.

    I am going to assume that Untangle will not support an advanced configuration change like disabling this check, but I have run with it disabled for years, with no perceptible issues.

    This might be enough for you to look into, or DM me if you want details. Maybe there is something in the hacks forum, if you search.
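
For reference, dnsmasq documents a `no-ping` option (`--no-ping` on the command line) that turns off this ICMP check. The sketch below only inspects the text of a dnsmasq configuration to see whether that option is present. Where Untangle keeps its dnsmasq configuration, and whether it preserves manual changes, is not stated in this thread, so the function takes the text as input; `dnsmasq_ping_check_disabled` is a hypothetical name.

```python
def dnsmasq_ping_check_disabled(conf_text):
    """Return True if the config text contains the `no-ping` option."""
    for raw in conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace before comparing.
        line = raw.split("#", 1)[0].strip()
        if line == "no-ping":
            return True
    return False
```

A commented-out `#no-ping` line is treated as absent, which matches how dnsmasq reads its configuration files.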
    Last edited by Jim.Alles; 09-26-2019 at 08:17 AM.

  5. #5
    Untangle Ninja sky-knight
    Join Date: Apr 2008
    Location: Phoenix, AZ
    Posts: 24,958

    Hmm, if you're only having issues on one of the interfaces, then I'd say you've got a configuration problem specific to that interface.

    There is a bucket of potential issues here, and these forums aren't a terribly good place to explore them. For example, what hardware are you running on? The specifications you've listed aren't sufficient. If you've built your own Untangle box out of a scrap workstation, or with any workstation-grade components, you more than likely have a PCI bus in your server that simply cannot keep up with the task at hand. To make matters worse, this design fault is impossible to see with the tools provided in the UI, and there is no SSH command or SNMP metric that shows it either.

    Moving to 10G only makes the problem WORSE, as there is absolutely no such thing as a 10Gb interface on a desktop, or at least not one that works at high capacity. High-performance Untangle servers require special care to build; that's why Untangle and others sell appliances. Those systems have specially designed mainboards with the PCI lanes required, in the correct places, to allow the installed NICs to be saturated.

    You could simply have a bad interface in and of itself, or a bad cable in that interface as well. And we haven't even gotten into the VLAN configuration.

    I suppose the next step would be to use the troubleshooting tools to watch for DHCP traffic, and see how quickly Untangle receives the DHCP request, and responds.
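
Once a capture exists, the timing can be read off by pairing each client message with the server reply that shares its DHCP transaction ID. The sketch below assumes the capture has already been reduced to simple (time, xid, message type) records; that extraction step, the `DhcpEvent` record, and the function name are all assumptions made for illustration, and DHCPNAK replies are ignored.

```python
from collections import namedtuple

DhcpEvent = namedtuple("DhcpEvent", "time xid msg_type")

# Each client message and the server reply that answers it.
REPLY_FOR = {"DISCOVER": "OFFER", "REQUEST": "ACK"}


def response_latencies(events):
    """Pair client messages with server replies by transaction ID (xid)."""
    pending = {}    # (xid, expected reply type) -> (first send time, client message type)
    latencies = []  # (xid, client message type, seconds until the reply)
    for ev in sorted(events, key=lambda e: e.time):
        if ev.msg_type in REPLY_FOR:
            key = (ev.xid, REPLY_FOR[ev.msg_type])
            # Keep the earliest send time so client retries count toward the delay.
            pending.setdefault(key, (ev.time, ev.msg_type))
        else:
            sent = pending.pop((ev.xid, ev.msg_type), None)
            if sent is not None:
                latencies.append((ev.xid, sent[1], ev.time - sent[0]))
    # Anything still pending never received its reply within the capture.
    unanswered = [(xid, msg) for (xid, _), (_, msg) in pending.items()]
    return latencies, unanswered
```

Running this on captures taken at the client and at the Untangle interface would separate "the request arrived late or not at all" from "the request arrived promptly and the reply was slow".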
    Last edited by sky-knight; 09-26-2019 at 08:21 AM.

  6. #6
    Untangle Ninja Jim.Alles
    Join Date: Jul 2008
    Location: Central PA
    Posts: 2,447

    yes, test cables
    If you think I got Grumpy

  7. #7
    Untangler
    Join Date: Apr 2010
    Posts: 50

    Thank you for the responses. Some very good reminders and insights.

    UT Hardware List

    CPU: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
    RAM: 2x4GB DDR3 1333MHz
    MOBO: Intel Classic DH61CR Desktop Motherboard - Intel H61 Express Chipset - Socket H2 LGA-1155

    PCI-E NIC 1: 2 port Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c) (Rosewill 2 port gigabit PCI-E)
    PCI-E NIC 2: 1 port Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)

    I pulled pcaps from the troubleshooting tools and never saw a DHCP request, but I did see 5-6 retransmissions in 120s.

    When I started pulling more traffic across VLANs (with UT as the L3 device), the retransmissions went way up, as did the number of out-of-order TCP packets. (This was while streaming a large number of 1080p 30fps video streams at the same time.)
    Last edited by Archness; 09-27-2019 at 02:17 AM.

  8. #8
    Untangle Ninja dwasserman
    Join Date: Jun 2008
    Location: Argentina
    Posts: 4,335

    By chance, do the switches have Spanning Tree Protocol enabled?
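
The reason spanning tree matters for DHCP: with classic 802.1D defaults, a port that has just come up spends one forward-delay period (15 seconds) in listening and another in learning before it forwards traffic, so a client's first DHCP messages can be dropped. The sketch below works through that arithmetic. Whether the switches in this thread run classic STP or RSTP, and how their ports are configured, is not stated; the retry schedule used in the example is an illustrative assumption loosely following the backoff suggested in RFC 2131.

```python
def stp_time_to_forwarding(forward_delay=15.0, edge_port=False):
    """Seconds a port spends in listening + learning before it forwards traffic."""
    if edge_port:
        return 0.0  # edge ports skip listening and learning
    return 2 * forward_delay


def dhcp_attempts_lost(blocked_seconds, send_times):
    """Count client DHCP transmissions sent while the port is not yet forwarding."""
    return sum(1 for t in send_times if t < blocked_seconds)
```

With sends at 0, 4, 12, 28, and 60 seconds after link-up, a 30-second delay would swallow the first four attempts, while a port configured as an edge port would lose none.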
    The world is divided into 10 kinds of people, who know binary and those not

  9. #9
    Master Untangler
    Join Date: Apr 2010
    Posts: 108

    We are also having issues with Untangle DHCP on wireless. We have a bunch of VLANs, and all of them get IPs from a Server 2016 DHCP server (with Untangle serving as the DHCP relay), except the public VLAN for wireless, which gets everything from a specific Untangle interface. We see a lot of DHCP lease timeouts on wireless clients. I think it's a dnsmasq issue, as it has gotten really bad since v14.1 (I think).

    And by the way, we don't have Unifi for wireless.

  10. #10
    Untangle Ninja sky-knight
    Join Date: Apr 2008
    Location: Phoenix, AZ
    Posts: 24,958

    Originally posted by Archness:
    Thank you for the responses. Some very good reminders and insights.

    UT Hardware List

    CPU: Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz
    RAM: 2x4GB DDR3 1333MHz
    MOBO: Intel Classic DH61CR Desktop Motherboard - Intel H61 Express Chipset - Socket H2 LGA-1155

    PCI-E NIC 1: 2 port Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 0c) (Rosewill 2 port gigabit PCI-E)
    PCI-E NIC 2: 1 port Intel Corporation 82541PI Gigabit Ethernet Controller (rev 05)

    I pulled pcaps from the troubleshooting tools and never saw a DHCP request, but I did see 5-6 retransmissions in 120s.

    When I started pulling more traffic across VLANs (with UT as the L3 device), the retransmissions went way up, as did the number of out-of-order TCP packets. (This was while streaming a large number of 1080p 30fps video streams at the same time.)

    This indicates bus saturation. What is your hardware? The NICs already have me raising an eyebrow, but I need to know what motherboard you've inserted them into.
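
To make the bus-saturation argument concrete, the sketch below compares theoretical peak bus rates with what gigabit ports can demand at line rate. The slot and bus type of each NIC in this build is not confirmed in the thread, so the functions simply cover a conventional 32-bit/33 MHz PCI bus and a single PCIe 1.x or 2.0 lane; the function names are hypothetical, and protocol overhead and other devices sharing the bus are ignored.

```python
def conventional_pci_mbps(bus_bits=32, clock_mhz=33.33):
    """Theoretical peak of a shared conventional PCI bus, in Mbit/s."""
    return bus_bits * clock_mhz


def pcie_lane_mbps(gen):
    """Per-direction rate of one PCIe lane after 8b/10b encoding, in Mbit/s."""
    raw_gtps = {1: 2.5, 2: 5.0}[gen]  # gigatransfers per second per lane
    return raw_gtps * 1000 * 8 / 10


def port_demand_mbps(ports, link_mbps=1000):
    """Per-direction line-rate demand of a NIC with `ports` ports, in Mbit/s."""
    return ports * link_mbps


def fits_conventional_pci(ports, link_mbps=1000):
    # The PCI bus is shared by both directions, so receive and transmit add up.
    return 2 * port_demand_mbps(ports, link_mbps) <= conventional_pci_mbps()


def fits_pcie_x1(ports, gen, link_mbps=1000):
    # PCIe lanes are full duplex, so a single direction is compared.
    return port_demand_mbps(ports, link_mbps) <= pcie_lane_mbps(gen)
```

Under these assumptions, a single full-duplex gigabit port already exceeds what a conventional PCI bus can carry, while a dual-port card on one PCIe 2.0 lane fits on paper.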
