  1. #1
    Newbie
    Join Date: Dec 2020
    Posts: 10

    Stability bug in UVM JVM (Crashes, Orphans OpenVPN Processes) in Untangle Firewall

    Perhaps someone can help. I opened a case, as I do have an Untangle home license. Support asked me to reinstall before they would help, which I did. They then said they could no longer help me because they had changed their policy and home licenses no longer include access to technical support.

    UVM (JVM) crashes daily, sometimes several times per day, orphaning the openvpn processes and stopping all communication through the TunnelVPN service. The only remedy is to run 'pkill openvpn' and then toggle (restart) the TunnelVPN service to relaunch the processes. The openvpn processes are normally owned by the UVM JVM, but when it crashes they are orphaned. UVM then attempts to start new openvpn processes while the orphaned ones are still running, and the new ones fail at startup because the management socket, which UVM uses to manage each instance and read its connection statistics, is still bound by the orphaned process.

    As I showed Untangle, and will demonstrate below, there is an application bug in their Java code. Forgive me, as there is a lot of information here.

    Below is an example of the orphaned openvpn processes. PID 1 now owns them (the third column of the listing is the parent PID); normally the parent is the UVM Java application. Ports 2200, 2202, etc. are the management ports bound by openvpn, and they are the reason the new openvpn instances fail to start.

    Code:
    root      97546      1  0 Dec09 ?        00:09:15 /usr/sbin/openvpn --config /usr/share/untangle/settings/tunnel-vpn/tunnel-202/tunnel.conf --writepid /run/tunnelvpn/tunnel-202.pid --dev tun202 --cd /usr/share/untangle/settings/tunnel-vpn/tunnel-202 --log-append /var/log/uvm/tunnel.log --auth-user-pass auth.txt --script-security 2 --up /usr/share/untangle/bin/tunnel-vpn-up.sh --down /usr/share/untangle/bin/tunnel-vpn-down.sh --management 127.0.0.1 2202 --bind --local [IP ADDRESS] --port 0
    root      97548      1  0 Dec09 ?        00:00:39 /usr/sbin/openvpn --config /usr/share/untangle/settings/tunnel-vpn/tunnel-200/tunnel.conf --writepid /run/tunnelvpn/tunnel-200.pid --dev tun200 --cd /usr/share/untangle/settings/tunnel-vpn/tunnel-200 --log-append /var/log/uvm/tunnel.log --auth-user-pass auth.txt --script-security 2 --up /usr/share/untangle/bin/tunnel-vpn-up.sh --down /usr/share/untangle/bin/tunnel-vpn-down.sh --management 127.0.0.1 2200 --nobind
    .....
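
The listing above can also be checked mechanically. A minimal sketch, assuming 'ps -ef' output in the column layout shown (UID, PID, PPID, ...): it picks out openvpn processes whose parent PID is 1 and extracts the --management port from each command line. The function name find_orphaned_openvpn is illustrative and not part of Untangle.

```python
import re

def find_orphaned_openvpn(ps_output):
    """Return openvpn processes from `ps -ef` text whose parent PID is 1."""
    orphans = []
    for line in ps_output.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit() or not fields[2].isdigit():
            continue  # header line or something that is not a process row
        pid, ppid, cmd = int(fields[1]), int(fields[2]), fields[7]
        if ppid != 1 or not cmd.startswith("/usr/sbin/openvpn"):
            continue  # still owned by a live parent, or not openvpn (e.g. grep)
        # "--management <ip> <port>" follows the command lines quoted in this thread
        m = re.search(r"--management\s+\S+\s+(\d+)", cmd)
        orphans.append({"pid": pid, "mgmt_port": int(m.group(1)) if m else None})
    return orphans
```

Feeding it the output of 'ps -ef' from the listing above would report PIDs 97546 and 97548 with management ports 2202 and 2200.
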
    The following errors show up in tunnel.log when the UVM restarts and tries to spawn new openvpn processes, or when you try to respawn them by toggling the service.

    Code:
    .....
    Thu Dec 10 14:42:27 2020 WARNING: Using --management on a TCP port WITHOUT passwords is STRONGLY discouraged and considered insecure
    Thu Dec 10 14:42:27 2020 OpenVPN 2.4.7 x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Feb 20 2019
    Thu Dec 10 14:42:27 2020 library versions: OpenSSL 1.1.1d  10 Sep 2019, LZO 2.10
    Thu Dec 10 14:42:27 2020 MANAGEMENT: Socket bind failed on local address [AF_INET]127.0.0.1:2205: Address already in use (errno=98)
    Thu Dec 10 14:42:27 2020 Exiting due to fatal error
    As shown above, the new instances fail to start because the management ports are still bound by the orphaned processes. This leaves the TunnelVPN service unusable. I presume this also affects the OpenVPN service, since it likely uses the same mechanism and is also spawned by UVM.
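
The errno=98 failure can be reproduced without openvpn. A minimal sketch, assuming plain TCP sockets on 127.0.0.1: one listener stands in for an orphaned process holding its management port, and a second bind to the same port stands in for the new instance. The helper name try_bind is illustrative.

```python
import errno
import socket

def try_bind(port, host="127.0.0.1"):
    """Try to bind and listen on host:port; return (socket, None) or (None, errno)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
        s.listen(1)
        return s, None
    except OSError as e:
        s.close()
        return None, e.errno

# The first listener plays the orphaned openvpn that still owns the port.
holder, err = try_bind(0)            # port 0: let the OS pick a free port
port = holder.getsockname()[1]

# The second bind plays the new instance that is started afterwards.
second, err2 = try_bind(port)
print(err2 == errno.EADDRINUSE)      # True while the first listener holds the port
holder.close()
```

On Linux, EADDRINUSE is errno 98, the same code reported in tunnel.log.
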

    The console.log.crash file, which is generated when UVM crashes, contains the following. This seems to imply that they are using libnetcap, and my guess is that the process is unable to keep up (a buffer fills) and crashes. I'm not sure what specifically this is tied to or exactly how they are using it, but it seems to be a potential indicator.

    Code:
    .....
    Dec 10 14:35:41 gateway uvmconsole: 12-10 14:35:41.018221| ERROR:./libnetcap/src/netcap_udp.c:187:WARNING: Mailbox Full: Dropping Packet (xxx.xxx.xx.88:16393 -> xxx.xxx.xxx.55:16393)
    Dec 10 14:35:41 gateway uvmconsole: [93004.388s][info][gc] GC(12601) Pause Full (System.gc()) 91M->80M(115M) 278.974ms
    Dec 10 14:35:41 gateway uvmconsole: 12-10 14:35:41.041086| ERROR:./libnetcap/src/netcap_udp.c:187:WARNING: Mailbox Full: Dropping Packet (xxx.xxx.xx.88:16393 -> xxx.xxx.xxx.55:16393)
    Dec 10 14:35:42 gateway uvmconsole: [93005.852s][info][gc] GC(12602) Pause Young (Normal) (G1 Evacuation Pause) 96M->80M(115M) 4.452ms
    Dec 10 14:35:45 gateway uvmconsole: [93009.312s][info][gc] GC(12603) Pause Young (Normal) (G1 Evacuation Pause) 95M->81M(115M) 4.313ms
    .....
    Dec 10 14:37:32 gateway uvmconsole: [93116.254s][info][gc] GC(12646) Pause Young (Normal) (G1 Evacuation Pause) 97M->85M(115M) 4.155ms
    Dec 10 14:37:35 gateway uvmconsole: #
    Dec 10 14:37:35 gateway uvmconsole: # A fatal error has been detected by the Java Runtime Environment:
    Dec 10 14:37:35 gateway uvmconsole: #
    Dec 10 14:37:35 gateway uvmconsole: #  SIGSEGV (0xb) at pc=0x00007efe23516a00, pid=60824, tid=60869
    Dec 10 14:37:35 gateway uvmconsole: #
    Dec 10 14:37:35 gateway uvmconsole: # JRE version: OpenJDK Runtime Environment (11.0.8+10) (build 11.0.8+10-post-Debian-1deb10u1)
    Dec 10 14:37:35 gateway uvmconsole: # Java VM: OpenJDK 64-Bit Server VM (11.0.8+10-post-Debian-1deb10u1, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
    Dec 10 14:37:35 gateway uvmconsole: # Problematic frame:
    Dec 10 14:37:35 gateway uvmconsole: # C  [libpthread.so.0+0xca00]  pthread_rwlock_rdlock+0x0
    Dec 10 14:37:35 gateway uvmconsole: #
    Dec 10 14:37:35 gateway uvmconsole: # Core dump will be written. Default location: /tmp/core
    Dec 10 14:37:35 gateway uvmconsole: #
    Dec 10 14:37:35 gateway uvmconsole: # An error report file with more information is saved as:
    Dec 10 14:37:35 gateway uvmconsole: # /tmp/hs_err_pid60824.log
    Dec 10 14:37:35 gateway uvmconsole: #
    Dec 10 14:37:35 gateway uvmconsole: # If you would like to submit a bug report, please visit:
    Dec 10 14:37:35 gateway uvmconsole: #   https://bugs.debian.org/openjdk-11
    Dec 10 14:37:35 gateway uvmconsole: #

    From the above, you can see it created an hs_err_pid60824.log file. It contains a lot of information, so I'll post only the most relevant part.
    Code:
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00007efe23516a00, pid=60824, tid=60869
    #
    # JRE version: OpenJDK Runtime Environment (11.0.8+10) (build 11.0.8+10-post-Debian-1deb10u1)
    # Java VM: OpenJDK 64-Bit Server VM (11.0.8+10-post-Debian-1deb10u1, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
    # Problematic frame:
    # C  [libpthread.so.0+0xca00]  pthread_rwlock_rdlock+0x0
    #
    # Core dump will be written. Default location: /tmp/core
    #
    # If you would like to submit a bug report, please visit:
    #   https://bugs.debian.org/openjdk-11
    #
    
    ---------------  S U M M A R Y ------------
    
    Command Line: -Xmx4023m -Xms32m -Xss228k -XX:+PerfDisableSharedMem -XX:+UseG1GC -XX:MaxHeapFreeRatio=30 -XX:MinHeapFreeRatio=20 -XX:InitiatingHeapOccupancyPercent=30 -Xlog:gc*,gc+heap=debug:file=/var/log/uvm/gc.log:time,tags -verbose:gc -Dcom.sun.jndi.ldap.object.disableEndpointIdentification=true -Dcom.sun.jndi.ldap.connect.pool.protocol=' plain ssl ' -Dcom.sun.jndi.ldap.connect.pool.timeout=30000 -Dprefix= -Djava.library.path=/usr/lib/uvm -Dnetcap.numthreads=25 -Dreports.max_queue_len=1000000 -Dnetworkaddress.cache.ttl=30 -Dsun.net.inetaddr.ttl=30 -Dnetworkaddress.cache.negative.ttl=10 -Djava.net.preferIPv4Stack=true com.untangle.uvm.Main
    
    Host: Intel(R) Celeron(R) CPU  J1900  @ 1.99GHz, 4 cores, 7G, Debian GNU/Linux 10 (buster)
    Time: Thu Dec 10 14:37:35 2020 CST elapsed time: 93118 seconds (1d 1h 51m 58s)
    
    ---------------  T H R E A D  ---------------
    
    Current thread is native thread
    
    Stack: [0x00007efdf0110000,0x00007efdf014f000],  sp=0x00007efdf0141d88,  free space=199k
    Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
    C  [libpthread.so.0+0xca00]  pthread_rwlock_rdlock+0x0
    
    
    siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000130
    
    Register to memory mapping:
    
    RAX=
    [error occurred during error reporting (printing register info), id 0xb, SIGSEGV (0xb) at pc=0x00007efe22edac08]
    
    Registers:
    RAX=0x0000000000000100, RBX=0x00007efda0071d40, RCX=0x0000000000000000, RDX=0x0000000000000001
    RSP=0x00007efdf0141d88, RBP=0x00007efda0067920, RSI=0x0000000000000000, RDI=0x0000000000000118
    R8 =0x00007efda0067960, R9 =0x0000000000000000, R10=0x0000000000000000, R11=0x0000000000000246
    R12=0x0000000000000001, R13=0x00007efdf0b05951, R14=0x00007efd9409d400, R15=0x0000000000020102
    RIP=0x00007efe23516a00, EFLAGS=0x0000000000010202, CSGSFS=0x002b000000000033, ERR=0x0000000000000004
      TRAPNO=0x000000000000000e
    
    Top of Stack: (sp=0x00007efdf0141d88)
    0x00007efdf0141d88:   00007efdf0ae1174 00007efd940724f0
    0x00007efdf0141d98:   00007efda0071d40 00007efda0067920
    ..........

    As you can see, their JVM process com.untangle.uvm.Main is crashing due to a SIGSEGV. It references libpthread.so.

    [root @ gateway] /var/log/uvm # addr2line -e /usr/lib/x86_64-linux-gnu/libpthread.so -C -f 0xca00
    __GI___pthread_rwlock_rdlock
    ??:?

    The crash occurs inside libpthread.so, which may be due to an access violation or a stack overflow. I suspect it is a stack overflow, but it's not my code.
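
The numbers in the hs_err report allow a rough check of the stack-overflow suspicion. The sketch below uses only values quoted above; the classification heuristic, the 4 KiB page size, and the 16-page window are assumptions made for illustration, not anything taken from the JVM or Untangle. The faulting address 0x130 lies in the first page of memory and is 0x18 bytes past RDI, which under the x86-64 System V calling convention holds the first argument at function entry (here, the rwlock pointer). That pattern looks more like a lock pointer derived from a NULL structure pointer than like stack exhaustion, though only the core dump can confirm it.

```python
# Values copied from the hs_err report quoted above.
STACK_LOW = 0x00007EFDF0110000
STACK_HIGH = 0x00007EFDF014F000
SP = 0x00007EFDF0141D88
SI_ADDR = 0x130   # faulting address (si_code SEGV_MAPERR)
RDI = 0x118       # first pointer argument at function entry on x86-64 System V

PAGE = 4096       # assumption: 4 KiB pages, the usual size on linux-amd64

def classify_fault(si_addr, stack_low, page=PAGE):
    """Rough heuristic only; not a substitute for opening the core dump in gdb."""
    if si_addr < page:
        return "near-null dereference"
    # A stack overflow usually faults just below the low end of the stack.
    # The 16-page window is an arbitrary illustrative margin.
    if stack_low - 16 * page <= si_addr < stack_low:
        return "possible stack overflow"
    return "other invalid access"

free_kib = (SP - STACK_LOW) // 1024
print(free_kib)                             # 199, matching "free space=199k"
print(hex(SI_ADDR - RDI))                   # 0x18
print(classify_fault(SI_ADDR, STACK_LOW))   # near-null dereference
```

With roughly 199k of stack still free and a fault address nowhere near the stack range, the data leans toward an invalid pointer being passed to pthread_rwlock_rdlock from native code.
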

    There are no resource issues on the system. It is idle most of the time and has 8 GiB of RAM, of which about 4.7 GiB is available at any one time.

    Perhaps Untangle will take notice and want to address a pretty clear bug in the software. When their Java code crashes, it does leave a crash dump, but analyzing a foreign program's crash dump isn't my expertise.

    Any help would be sincerely appreciated.

  2. #2
    Untangler jcoffin
    Join Date: Aug 2008
    Location: Lake Tahoe
    Posts: 9,813

    Version? Hardware or VM type?

    The original ticket found modifications made to the system from the CLI, so support requested a fresh install. Any modifications outside of a clean install are YMMV. Is this a fresh install?
    Last edited by jcoffin; 12-10-2020 at 03:05 PM.
    Attention: Support and help on the Untangle Forums is provided by
    volunteers and community members like yourself.
    If you need Untangle support please call or email support@untangle.com

  3. #3
    Untangler
    Join Date: May 2008
    Posts: 464

    Disturbing that you were told there is no support. I hope that is not true. Maybe you should open a Jira account and report it there.
    https://jira.untangle.com/projects/N...=allopenissues

  4. #4
    Untangler jcoffin
    Join Date: Aug 2008
    Location: Lake Tahoe
    Posts: 9,813

    Home licenses do not have support and if your system is modified on the CLI, support will also not help. This policy has been there for a long time.

  5. #5
    Newbie
    Join Date: Dec 2020
    Posts: 10

    Thanks, guys.

    The version is:
    Build: 16.1.1.20201028T105733.d127809143-1buster
    Kernel: 4.19.0-11-untangle-amd64

    It is running effectively the same hardware as their new Untangle appliance. It looks identical and is probably from the same OEM. It's a fanless SBC with a 4-core Intel Celeron J1900, running on bare hardware (not a VM).

    Unfortunately, this appears to be a brand-new policy for home user licenses. They said call volume was too high for them to keep offering support on home licenses. I get the business problem, but it was a little disappointing: I already had a case open with them (which they closed), and when I opened a new case to continue the work, they told me I was no longer entitled to support and had to go to the community forums.

  6. #6
    Newbie
    Join Date: Dec 2020
    Posts: 10

    Quote: Originally Posted by jcoffin
    Home licenses do not have support and if your system is modified on the CLI, support will also not help. This policy has been there for a long time.

    My system is not modified via the CLI. Not only was it unmodified before, but I just reinstalled it from scratch, so it is clean and brand new. Check for yourself. Regardless of whether you were suggesting the problem is due to my own error (it's not), support never told me I wasn't entitled to support on this, and I previously had a case open for it.

  7. #7
    Newbie
    Join Date: Dec 2020
    Posts: 10

    Unfortunately, the UVM is so unstable that it just crashed again after less than two hours. I had to cycle it again.

    Code:
    [root @ gateway] /var/log/uvm # ps -ef | grep openvpn
    root      67020      1  0 14:45 ?        00:00:03 /usr/sbin/openvpn --config /usr/share/untangle/settings/tunnel-vpn/tunnel-202/tunnel.conf --writepid /run/tunnelvpn/tunnel-202.pid --dev tun202 --cd /usr/share/untangle/settings/tunnel-vpn/tunnel-202 --log-append /var/log/uvm/tunnel.log --auth-user-pass auth.txt --script-security 2 --up /usr/share/untangle/bin/tunnel-vpn-up.sh --down /usr/share/untangle/bin/tunnel-vpn-down.sh --management 127.0.0.1 2202 --bind --local xxx.xxx.xxx.6 --port 0
    root      67021      1  0 14:45 ?        00:00:03 /usr/sbin/openvpn --config /usr/share/untangle/settings/tunnel-vpn/tunnel-200/tunnel.conf --writepid /run/tunnelvpn/tunnel-200.pid --dev tun200 --cd /usr/share/untangle/settings/tunnel-vpn/tunnel-200 --log-append /var/log/uvm/tunnel.log --auth-user-pass auth.txt --script-security 2 --up /usr/share/untangle/bin/tunnel-vpn-up.sh --down /usr/share/untangle/bin/tunnel-vpn-down.sh --management 127.0.0.1 2200 --nobind
    root      67023      1  0 14:45 ?        00:00:03 /usr/sbin/openvpn --config /usr/share/untangle/settings/tunnel-vpn/tunnel-204/tunnel.conf --writepid /run/tunnelvpn/tunnel-204.pid --dev tun204 --cd /usr/share/untangle/settings/tunnel-vpn/tunnel-204 --log-append /var/log/uvm/tunnel.log --auth-user-pass auth.txt --script-security 2 --up /usr/share/untangle/bin/tunnel-vpn-up.sh --down /usr/share/untangle/bin/tunnel-vpn-down.sh --management 127.0.0.1 2204 --bind --local xxx.xxx.xxx.6 --port 0
    root      67025      1  0 14:45 ?        00:00:06 /usr/sbin/openvpn --config /usr/share/untangle/settings/tunnel-vpn/tunnel-205/tunnel.conf --writepid /run/tunnelvpn/tunnel-205.pid --dev tun205 --cd /usr/share/untangle/settings/tunnel-vpn/tunnel-205 --log-append /var/log/uvm/tunnel.log --auth-user-pass auth.txt --script-security 2 --up /usr/share/untangle/bin/tunnel-vpn-up.sh --down /usr/share/untangle/bin/tunnel-vpn-down.sh --management 127.0.0.1 2205 --bind --local xxx.xxx.xxx.6 --port 0
    root      92414  64884  0 16:36 pts/4    00:00:00 grep openvpn
    [root @ gateway] /var/log/uvm # pkill openvpn
    [root @ gateway] /var/log/uvm # ps -ef | grep openvpn
    root      92976  91143  0 16:36 pts/5    00:00:00 /usr/sbin/openvpn --config /usr/share/untangle/settings/tunnel-vpn/tunnel-202/tunnel.conf --writepid /run/tunnelvpn/tunnel-202.pid --dev tun202 --cd /usr/share/untangle/settings/tunnel-vpn/tunnel-202 --log-append /var/log/uvm/tunnel.log --auth-user-pass auth.txt --script-security 2 --up /usr/share/untangle/bin/tunnel-vpn-up.sh --down /usr/share/untangle/bin/tunnel-vpn-down.sh --management 127.0.0.1 2202 --bind --local xxx.xxx.xxx.6 --port 0
    root      92977  91143  0 16:36 pts/5    00:00:00 /usr/sbin/openvpn --config /usr/share/untangle/settings/tunnel-vpn/tunnel-200/tunnel.conf --writepid /run/tunnelvpn/tunnel-200.pid --dev tun200 --cd /usr/share/untangle/settings/tunnel-vpn/tunnel-200 --log-append /var/log/uvm/tunnel.log --auth-user-pass auth.txt --script-security 2 --up /usr/share/untangle/bin/tunnel-vpn-up.sh --down /usr/share/untangle/bin/tunnel-vpn-down.sh --management 127.0.0.1 2200 --nobind
    root      92978  91143  0 16:36 pts/5    00:00:00 /usr/sbin/openvpn --config /usr/share/untangle/settings/tunnel-vpn/tunnel-204/tunnel.conf --writepid /run/tunnelvpn/tunnel-204.pid --dev tun204 --cd /usr/share/untangle/settings/tunnel-vpn/tunnel-204 --log-append /var/log/uvm/tunnel.log --auth-user-pass auth.txt --script-security 2 --up /usr/share/untangle/bin/tunnel-vpn-up.sh --down /usr/share/untangle/bin/tunnel-vpn-down.sh --management 127.0.0.1 2204 --bind --local xxx.xxx.xxx.6 --port 0
    root      92979  91143  0 16:36 pts/5    00:00:00 /usr/sbin/openvpn --config /usr/share/untangle/settings/tunnel-vpn/tunnel-205/tunnel.conf --writepid /run/tunnelvpn/tunnel-205.pid --dev tun205 --cd /usr/share/untangle/settings/tunnel-vpn/tunnel-205 --log-append /var/log/uvm/tunnel.log --auth-user-pass auth.txt --script-security 2 --up /usr/share/untangle/bin/tunnel-vpn-up.sh --down /usr/share/untangle/bin/tunnel-vpn-down.sh --management 127.0.0.1 2205 --bind --local xxx.xxx.xxx.6 --port 0
    root      93137  64884  0 16:36 pts/4    00:00:00 grep openvpn
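
Between 'pkill openvpn' and toggling the service, it can help to confirm that the old processes have actually exited, since any that linger still hold their management ports. A minimal sketch, assuming a POSIX system; pid_alive and wait_for_exit are hypothetical helper names, and signal 0 is used only as an existence check (nothing is delivered to the process).

```python
import os
import time

def pid_alive(pid):
    """True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)          # signal 0: permission/existence check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True              # exists, but is owned by another user
    return True

def wait_for_exit(pids, timeout=10.0, interval=0.2):
    """Poll until none of pids exist; return the PIDs still alive at timeout."""
    deadline = time.monotonic() + timeout
    remaining = [p for p in pids if pid_alive(p)]
    while remaining and time.monotonic() < deadline:
        time.sleep(interval)
        remaining = [p for p in remaining if pid_alive(p)]
    return remaining
```

For the first listing above this would be called as wait_for_exit([67020, 67021, 67023, 67025]); an empty list means all four are gone. The kernel can reuse PIDs, so the check is only meaningful shortly after the kill.
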

  8. #8
    Untangle Ninja sky-knight
    Join Date: Apr 2008
    Location: Phoenix, AZ
    Posts: 26,414

    None of the home subscriptions have priority support, but that's not the same as no support.

    No support means go away; no priority support means no help right now and no right to use the phone to get it. You're free to open a ticket via email.

    Now, onto another little thing that gets people. Log into your Untangle UI, click config, then about.

    Look for a line that says History. If it says yes and a number, that means your box was customized via the command line. If you're trying to open a bug report and that's present, Untangle is going to rightfully assume you've done something to your box that caused this. So if you want support, ever... stay out of the terminal.

    One last thing, I've not seen a .crash file on Untangle without there being a memory issue in the hardware in the better part of a decade. So I really suggest you test the hardware, because the UVM is incredibly well proven at this point. It's not perfect obviously... but it's a thing that a ton of systems run without issue for lengths of time measured in kernel lifetimes.
    Rob Sandling, BS:SWE, MCP
    NexgenAppliances.com
    Phone: 866-794-8879 x201
    Email: support@nexgenappliances.com

  9. #9
    Newbie
    Join Date: Dec 2020
    Posts: 10

    I opened the ticket through email/web form, and they turned me away. Apparently a new policy. I called to ask why my new ticket was being auto-closed, and they politely explained that is a new policy they had to institute.

    History: yes (61)

    Yes, I've logged into the system previously to obtain this information. The only way I could get any information on the problem was to log in via the CLI and look. I haven't so much as modified a file, so the existence of a history file does not indicate that someone actually modified the system. I get that they don't want people modifying things in an appliance, but that isn't the case here. Unfortunately, with their new policy of allowing only community support for a paid home license, people really have no choice but to log in to help determine root cause.

    This specific Java error and its library reference don't point to a hardware error. I've done Linux kernel development and I know what a hardware issue looks like. Those nearly always show up in a number of other places and don't reference a specific function in a library the way this does. dmesg is clean and doesn't show an issue.

    Beyond that, there is a core dump every time that they can examine to determine the specific root cause. It's always the same library function referenced after a SIGSEGV in the same process.

    Last edited by swass; 12-10-2020 at 04:30 PM.

  10. #10
    Untangler
    Join Date: May 2008
    Posts: 464

    If support is such a problem that you can't afford to provide it, there is something far more serious going on.
