  1. #1
    Untangle Ninja sky-knight's Avatar
    Join Date
    Apr 2008
    Location
    Phoenix, AZ
    Posts
    26,173

    Untangle 1 : Power failure + Dumb Users 0

    Last night at 11:30pm one of my contracts went dark. All servers showing offline, and my alarms blaring. First thing this morning, as I prepare to take a trip there to see what's up, I get a call... we have no internet.

    10 minutes later I get another call: the server room got really hot and we popped a breaker. Everything seems to be back online now except for Untangle. Untangle is apparently spewing error messages all over the screen. Some of them, in classic Linux fashion, are quite humorous, and the client is laughing and commenting that at least the malfunctioning bit of equipment has some entertainment value.

    However, one of the error messages the client is relaying to me indicates the hard drive may have failed in the Untangle server. No problem, I have a spare, but with 6 interfaces, 5 virtual racks, and customized stuff everywhere I wasn't about to reconfigure that thing by hand. So I call up Tony at Untangle support and get him to e-mail me last night's configuration backup.

    15 minutes later, Tony has identified and fixed a store issue that would have prevented me from reinstalling the box, and has gotten me the backup I need. I'm loaded for bear, got a 6.2 disk ready to go reinstall this thing... then my IM chirps.

    "ok... so how we have Internet...? wth...?" <-- direct quote from client.

    I fire up OpenVPN and sure enough I'm back online. I hit the admin console and log in, and everything is happy. A quick SSH to the unit later reveals the "sda1" error it was whining about was Linux's normal "I've been powered down dirty so I'm going to check this hard disk before I boot" error.

    So I arrive on site for the PR visit and make sure everything is good, and I find out another tidbit... in an act of impatience they reset the Untangle server mid-boot 3 times. So not only did it die from loss of power, its boot was also interrupted 3 times before they finally left it alone for me to get there, and then it came up all on its own.

    Who says this thing isn't enterprise ready...

    Quad Xeon 2.4GHz server built on an Intel S3000 series mainboard, with 2 onboard gigabit interfaces and a PCIe quad-port gigabit card. 4GB of RAM. 400GB SATA hard drive. Unit was installed September of 2008. Still going strong, my oldest installation. Still only using 30GB of the drive.
    Last edited by sky-knight; 04-07-2010 at 11:52 AM.
    Rob Sandling, BS:SWE, MCP
    NexgenAppliances.com
    Phone: 866-794-8879 x201
    Email: support@nexgenappliances.com

  2. #2
    Untangle Junkie dmorris's Avatar
    Join Date
    Nov 2006
    Location
    San Carlos, CA
    Posts
    17,747

    lol - that's hilarious.

    It was probably running fsck or something.
    Attention: Support and help on the Untangle Forums is provided by volunteers and community members like yourself.
    If you need Untangle support please call or email support@untangle.com

  3. #3
    Untangle Ninja dbunyard's Avatar
    Join Date
    Nov 2008
    Location
    Westerville, Ohio, USA
    Posts
    1,059

    LOL, awesome.
    Dan

    You may one day find something interesting here. Today is not that day. Tomorrow isn't looking too good either.

  4. #4
    Untangle Ninja Mathiau's Avatar
    Join Date
    Feb 2008
    Location
    Costa Frickn' Rica
    Posts
    1,636

    Too funny. Amazing how some people just expect something to turn on and work instantly!

    People have no patience these days.

    When I have to reboot our firewall, which is seldom, I tell everyone 2-3 minutes, but I know it is only going to be about 30 seconds before it allows connections again.
    kv-2 | UT 11.0.1 | Dell R610 Server | Intel Xeon 2.8Ghz Quad Cores | 24Gb DDR3 ECC | 1 Intel QPort NIC | Integrated Broadcom QP | Dell Perc 4i | 6 x 73G 2.5 15k SAS raid 10 | 100mb/100mb | 30mb/30Mb

  5. #5
    Untangle Ninja sky-knight's Avatar
    Join Date
    Apr 2008
    Location
    Phoenix, AZ
    Posts
    26,173

    Well, that's just it...

    In this case the customer wasn't completely nuts.

    You see, the Untangle server took longer to boot than the Exchange server! That Exchange 2007 box is underpowered, and barely meets the minimum requirements. It's also a DC and running DNS. It takes 30 minutes to power the thing down, and another 30 minutes to power up.

    So when the customer saw Exchange online and happy, they figured Untangle would be done. What they didn't realize is it takes a while for fsck to get through a 400GB drive.

    All in all I find the situation hilarious. And, they finally got an air conditioner to keep that room cool AND I got the budget to replace the dead cells in the UPSs. Never mind I've been warning about this stuff for 2 years... but hey nothing like a nice visible failure to motivate people.

  6. #6
    Untangle Ninja raditude's Avatar
    Join Date
    Jan 2009
    Location
    Eugene, OR
    Posts
    1,143

    It's always the same situation (that I see, at least): you preach about what will one day happen, and pretty much immediately after your "vision" comes true, they're like, "We have to get this fixed..." Gee, you think? I do have to say I prefer it when a situation resolves this way (where nothing seems to have actually died or failed to come back online) over when things power up and you lose multiple systems.

  7. #7
    Untangle Ninja sky-knight's Avatar
    Join Date
    Apr 2008
    Location
    Phoenix, AZ
    Posts
    26,173

    Yeah but I cut people slack there.

    Think about it from the lay person's perspective.

    You're the network guy, warning about doomsday this, catastrophe that. And you're doing so because it's a real worry, and it's your job to worry.

    But the guy with the purse strings, all he sees is a business that is alive, happy, and running like a clock. He doesn't understand that the body has cancer; all he sees is that it's still alive. You always hope you can teach people to listen, but even as you make progress there... you have to admit:

    1.) Things almost always live longer than you expected them to.
    2.) Even if you fix it, it can still break.

    There is no perfect solution. This particular client is great, and I love working with them. The way they react is always hilarious, and they even laugh at themselves while doing it. The CFO had my original written warning about the temperatures in that server room from 2 years ago in his hand today! He pointed at it and said... you warned me, but we didn't have to spend that money for two years! But today was bad because everything died on the morning of a conference... now the issue is a hot button and the money is spent.

  8. #8
    Untangle Ninja dwasserman's Avatar
    Join Date
    Jun 2008
    Location
    Argentina
    Posts
    4,366

    I haven't done the math, but 3 hours of downtime in 1 1/2 years is better than 99.99%, I believe.
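
    For anyone who wants to check that arithmetic, here is a rough sketch. It takes the 3 hours and 1 1/2 years from the post as given; the 365.25-day year and the function name are assumptions made for illustration only.

```python
# Rough availability check: 3 hours of downtime over about 1.5 years.
HOURS_PER_YEAR = 365.25 * 24  # assumption: average year length


def availability_pct(downtime_hours, period_hours):
    """Percentage of the period during which the system was up."""
    return 100.0 * (1.0 - downtime_hours / period_hours)


period = 1.5 * HOURS_PER_YEAR             # total hours in the period
pct = availability_pct(3.0, period)       # availability with 3 h down
budget_hours = period * (1.0 - 0.9999)    # downtime that 99.99% would allow

print(f"{pct:.3f}% available; 99.99% would allow {budget_hours:.2f} h of downtime")
```

    Under those assumptions the figure lands near 99.98%, a little short of 99.99%, which would leave room for only about 1.3 hours of downtime over the same period.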

  9. #9
    Untangle Ninja sky-knight's Avatar
    Join Date
    Apr 2008
    Location
    Phoenix, AZ
    Posts
    26,173

    My monitors weren't in place when the unit was installed, but I do have almost 200 days of data to query in Nagios. Here's the relevant data for the Exchange server that runs behind the Untangle.

    UP Total 194d 5h 14m 58s 99.599%
    Down Total 0d 18h 45m 2s 0.401%
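
    As a cross-check, the percentages can be recomputed from the two durations. This is only a sketch: the helper name and the regular expression are illustrative assumptions, not anything Nagios provides.

```python
import re


def to_seconds(duration):
    """Convert a duration written like '194d 5h 14m 58s' to seconds."""
    match = re.fullmatch(r"(\d+)d (\d+)h (\d+)m (\d+)s", duration)
    d, h, m, s = map(int, match.groups())
    return ((d * 24 + h) * 60 + m) * 60 + s


up = to_seconds("194d 5h 14m 58s")    # UP Total from the report
down = to_seconds("0d 18h 45m 2s")    # Down Total from the report
total = up + down                     # length of the reporting window

up_pct = 100.0 * up / total
down_pct = 100.0 * down / total
print(f"up {up_pct:.3f}%  down {down_pct:.3f}%  window {total / 86400:.1f} days")
```

    The two durations add up to a 195-day window, and the recomputed shares round to the same 99.599% and 0.401% shown in the report.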

    Mind you, the monitor is at my office, and the server is at their office, and it uses OpenVPN to make the query. So if EITHER of our Internet connections goes out, it shows as a down.

    Just to illustrate how underpowered this poor Exchange server is...

    HTTPS 52.776% (98.786%) 0.000% (0.000%) 0.000% (0.000%) 0.649% (1.214%) 46.575%
    SMTP 0.000% (0.000%) 0.000% (0.000%) 0.000% (0.000%) 53.506% (100.000%) 46.494%

    The monitors think the two services are offline half the time due to timeouts. The services are there... just really really slow.

  10. #10
    Untangle Ninja
    Join Date
    Jan 2009
    Posts
    1,184

    So how long did the UPS last?
