Results 1 to 10 of 19
  1. #1
    Untangle Ninja
    Join Date
    Jan 2011
    Posts
    1,322

    How to fail-back after outage?

    Anyone know how to get sessions to fail-back properly after an outage?

    I have 2 sites that have a high-speed cable or fiber connection for their primary, and a fairly slow DSL as a backup connection.

    At both sites, everything works fine when it fails over to the DSL but there are problems when the primary outage is over:

    OpenVPN Site-to-Site connection (Site 1): the site-to-site connection remains stuck on the slow DSL until I manually force it back to the high speed connection.

    VOIP Phones (Site 2): the phones don't work once the fiber connection comes back. They all have their control sessions stuck on the DSL, but when they try to establish a call the new voice session goes over the fiber, and that split simply doesn't work. I have to unplug the DSL for a while to force the phones to re-establish their control sessions on the fiber.

    Site 1 has only WAN Failover, while Site 2 has WAN Balancer as well, but it's set to route 0% of traffic to the DSL. At both sites, the DSL is meant strictly as a backup.

    Is there any way to automatically trigger an artificial outage of the backup WAN connection when the primary comes back online, in order to force all connections back to the primary?

    (I should mention the two sites are completely unrelated, the VPN site-to-site is not between these two sites)
    Last edited by johnsonx42; 09-30-2020 at 03:38 PM.

  2. #2
    Untangle Ninja sky-knight's Avatar
    Join Date
    Apr 2008
    Location
    Phoenix, AZ
    Posts
    25,251

    This seems strange to me...

    Ingress connections are going to use whatever WAN the client connects to via IP address. WAN Balancer and Failover are only really concerned with EGRESS sessions, and those are going to use whichever WAN is available at the time the connection is established. Even if you have a given WAN set to 0% in Balancer, a session is going to stay on the 0% WAN if the connection never resets itself after the 100% WAN comes back online.

    If you want to force all connections off the 2nd WAN, then yes you're going to have to generate a down on that interface to force sessions to reconnect.
    Rob Sandling, BS:SWE, MCP
    NexgenAppliances.com
    Phone: 866-794-8879 x201
    Email: support@nexgenappliances.com

  3. #3
    Untangler jcoffin's Avatar
    Join Date
    Aug 2008
    Location
    Sunnyvale, CA
    Posts
    9,196

    Sessions are sticky on UT. A session is reset only if the interface it is currently using goes down. It's a case of "damned if you do, damned if you don't". Without sticky sessions, the current session on WAN2 would change to WAN1, but that would break most modern HTTP(S) sessions that use a CDN. sky-knight's answer is correct.
    Attention: Support and help on the Untangle Forums is provided by
    volunteers and community members like yourself.
    If you need Untangle support please call or email support@untangle.com

  4. #4
    Untangle Ninja
    Join Date
    Jan 2011
    Posts
    1,322

    Quote Originally Posted by sky-knight View Post
    If you want to force all connections off the 2nd WAN, then yes you're going to have to generate a down on that interface to force sessions to reconnect.
    exactly... any idea how to do it?

    In the case of the OpenVPN, it technically has nothing to do with the WAN Failover module: the primary goes down, the remote OpenVPN client times out, retries, then tries the second WAN connection, and makes its connection. When the primary comes back, the OpenVPN client has no idea. It's an established ingress session, so neither WAN Failover nor WAN Balancer is going to mess with it. (If it were an end-user client, they'd eventually reconnect naturally, but an always-on site-to-site connection could stay stuck for many days, and did.)

    The phones' sessions are egress sessions, but again, because they're established sessions on the DSL, neither WAN Failover nor WAN Balancer messes with them just because the fiber came back online.
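
To make the sticky-session behaviour described above concrete, here is a toy Python model. It is only a sketch of the behaviour as described in this thread, not Untangle's actual implementation; the `Wan` and `Gateway` classes and all of their methods are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Wan:
    name: str
    weight: int          # balancer share in percent (hypothetical field)
    up: bool = True


@dataclass
class Gateway:
    wans: list
    sessions: dict = field(default_factory=dict)  # session id -> WAN name

    def pick_wan(self):
        """Choose a WAN for a NEW session only."""
        online = [w for w in self.wans if w.up]
        if not online:
            raise RuntimeError("no WAN is online")
        # Prefer online WANs with a non-zero share; otherwise use any online WAN.
        weighted = [w for w in online if w.weight > 0]
        return max(weighted or online, key=lambda w: w.weight)

    def open_session(self, sid):
        # The WAN is decided once, when the session is established.
        self.sessions[sid] = self.pick_wan().name
        return self.sessions[sid]

    def set_state(self, name, up):
        for w in self.wans:
            if w.name == name:
                w.up = up
        if not up:
            # Only sessions on the WAN that went down are reset.
            for sid in [s for s, wan in self.sessions.items() if wan == name]:
                del self.sessions[sid]


def demo():
    gw = Gateway([Wan("fiber", 100), Wan("dsl", 0)])
    gw.set_state("fiber", False)               # primary outage
    control = gw.open_session("phone-control")  # lands on the DSL
    gw.set_state("fiber", True)                # primary returns, nothing moves
    voice = gw.open_session("voice-call")       # new session uses the fiber
    stuck = gw.sessions["phone-control"]        # still pinned to the DSL
    gw.set_state("dsl", False)                 # forced "down" on the backup
    reconnected = gw.open_session("phone-control")
    return control, voice, stuck, reconnected
```

In this model the control session stays on the DSL after the fiber returns while the new voice session goes out over the fiber, which is the split seen with the phones. Only marking the DSL down clears the stuck session.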
    Last edited by johnsonx42; 09-30-2020 at 03:39 PM.

  5. #5
    Untangle Ninja
    Join Date
    Jan 2011
    Posts
    1,322

    Quote Originally Posted by jcoffin View Post
    Sessions are sticky on UT. A session is reset only if the interface it is currently using goes down. It's a case of "damned if you do, damned if you don't". Without sticky sessions, the current session on WAN2 would change to WAN1, but that would break most modern HTTP(S) sessions that use a CDN. sky-knight's answer is correct.
    I'd rather have HTTP sessions break than have the VOIP phones inoperable and site-to-site VPN stuck on a slow connection.

    Is there any way to force all sessions from the same client IP to use the same WAN? maybe something with tags? that would fix the phones - they'd be stuck on the DSL, but at least they'd work.

  6. #6
    Untangle Ninja
    Join Date
    Jan 2011
    Posts
    1,322

    maybe I can do something with Event Triggers?

    There is a WANFailOverEvent classid... maybe when the Primary WAN is CONNECTED, I can put a 2 minute tag on every host (or device? not sure which), and then add a Filter Rule or Access Rule that blocks traffic on the DSL for all tagged hosts. Any host on the primary WAN will be ignored by the rule, and any host on the DSL will be blocked for long enough to establish a new session on the Primary WAN. Then the tag will expire after 2 minutes, so everything will work on the next fail-over.

    I have a feeling though that no tag will be applied to a host with an existing session? I haven't played with these triggers and tags before.

    Any thoughts?

  7. #7
    Untangle Ninja sky-knight's Avatar
    Join Date
    Apr 2008
    Location
    Phoenix, AZ
    Posts
    25,251

    I know of no way to automate this within Untangle itself.

    The only means to force the sessions off the slow WAN is to reset all the sessions in question. The only way to reset those sessions is to have the WAN go into a "down" state. This would require creating a test for that WAN in WAN Failover that's guaranteed to fail, which marks that WAN as down so Untangle forcibly moves things over. But then that test would need to be disabled to allow the WAN to come back online.

    Event Triggers flag events for logging; they don't really enable any sort of actual action. You need some sort of decision-making engine that can receive a "WAN is ONLINE" event notice for the larger WAN and would then perhaps use an IP power switch to kill the DSL router for a time to force everyone back.

    This seems like a task for a Python applet running on a Raspberry Pi somewhere.

    Or as I said you can do it manually with a WAN Failover test guaranteed to fail, thereby marking the WAN as offline and forcing WAN Failover to reset all those sessions. You'd just enable that test when needed, and disable it when done.

    If the DSL is DHCP assigned you can do something similar by setting that interface to disabled, saving, then setting it back to DHCP and saving. But I'd be more comfortable with flipping the test rule, I think... mucking with the interfaces tab risks losing configuration data, not to mention that it resets networking, which flushes even working sessions.
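
A minimal sketch of what such a Python applet could look like, under some loud assumptions: `FailbackWatcher`, `primary_is_up`, `bounce_backup`, and `stable_checks` are hypothetical names, and nothing here calls an Untangle API. How the primary WAN is probed and how the DSL is actually taken down (IP power switch, a failing WAN Failover test, something else) depends on your hardware, so both are passed in as plain callables.

```python
import time
from typing import Callable


class FailbackWatcher:
    """Bounce the backup WAN once after the primary WAN has recovered."""

    def __init__(self,
                 primary_is_up: Callable[[], bool],
                 bounce_backup: Callable[[], None],
                 stable_checks: int = 3):
        self.primary_is_up = primary_is_up    # assumed probe, e.g. a ping via the fiber
        self.bounce_backup = bounce_backup    # assumed action, e.g. power-cycle the DSL modem
        self.stable_checks = stable_checks    # consecutive good checks required before acting
        self.was_down = False
        self.up_streak = 0

    def tick(self) -> bool:
        """Run one check; return True if the backup was bounced on this check."""
        if not self.primary_is_up():
            # Remember the outage and restart the stability count.
            self.was_down = True
            self.up_streak = 0
            return False
        if not self.was_down:
            # Primary has been fine all along; nothing to fail back from.
            return False
        self.up_streak += 1
        if self.up_streak < self.stable_checks:
            # Wait until the primary looks stable, to avoid flapping.
            return False
        self.bounce_backup()
        self.was_down = False
        self.up_streak = 0
        return True


def run_forever(watcher: FailbackWatcher, interval_s: float = 30.0) -> None:
    """Poll indefinitely; intended to run as a small background service."""
    while True:
        watcher.tick()
        time.sleep(interval_s)
```

Requiring several consecutive good checks before bouncing the DSL is a design choice to avoid knocking everyone offline while the primary is still flapping. The bounce itself has to keep the DSL down long enough for the phones and the VPN client to time out and reconnect.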
    Last edited by sky-knight; 09-30-2020 at 01:40 PM.

  8. #8
    Untangle Ninja Jim.Alles's Avatar
    Join Date
    Jul 2008
    Location
    Central PA
    Posts
    2,606

    I am usually pretty good at finding memes and such, but I could not find an appropriate Rube Goldberg Machine.
    This may be better as it will only waste 5 seconds of your life. It might even inspire you!
    https://www.youtube.com/watch?v=Pxi4SjJM2vI
    cheers!

  9. #9
    Untangle Ninja
    Join Date
    Jan 2011
    Posts
    1,322

    Quote Originally Posted by Jim.Alles View Post
    I am usually pretty good at finding memes and such, but I could not find an appropriate Rube Goldberg Machine.
    This may be better as it will only waste 5 seconds of your life. It might even inspire you!
    https://www.youtube.com/watch?v=Pxi4SjJM2vI
    cheers!
    well yes, that's exactly what I did a couple of weeks ago to get the phones working again (after the receptionist tore out her hair for over an hour trying to figure out why the phone kept ringing but every call dropped as soon as she answered).

    I just wish I could figure out a way to do this automatically, or even better if Untangle would add a fail-back feature to WAN Failover.

    For now, unless I can figure out something with triggers and tags, I guess all I can do is carefully watch my email for WAN failure notices and then go check the sites to see if they need manual intervention.

  10. #10
    Untangle Ninja
    Join Date
    Feb 2016
    Posts
    1,121

    Quote Originally Posted by johnsonx42 View Post
    Any thoughts?
    I had a problem with an access point wandering off to sleep periodically and causing unpredictable problems. So I hooked it up to an AC timer and rebooted it daily. I don’t suppose you have enough business downtime to take the DSL modem down routinely in a similar fashion? I know that’s not terribly elegant, so just a desperate thought.
