View Full Version : Untangle Hangs - Out of memory
yavosh
10-25-2007, 01:24 AM
Hello,
After running Untangle for a day or so it hangs. After this the machine is completely unresponsive, both from the network and locally.
Checking /var/log/messages reveals an error (see below). My initial reaction was that this was faulty hardware (RAM), so we replaced the machine with a new one (HP dx2300). The machine has 2GB of RAM. Any ideas what might be causing this?
Error from /var/log/messages, trimmed for readability:
Oct 25 09:27:38 untangle kernel: oom-killer: gfp_mask=0xd0, order=1
Oct 25 09:27:38 untangle kernel: [out_of_memory+151/192] out_of_memory+0x97/0xc0
Oct 25 09:27:38 untangle kernel: [__alloc_pages+552/688] __alloc_pages+0x228/0x2b0
Oct 25 09:27:38 untangle kernel: [kmem_getpages+58/160] kmem_getpages+0x3a/0xa0
Oct 25 09:27:38 untangle kernel: [cache_grow+187/368] cache_grow+0xbb/0x170
Oct 25 09:27:38 untangle kernel: [cache_alloc_refill+354/528] cache_alloc_refill+0x162/0x210
Oct 25 09:27:38 untangle kernel: [kmem_cache_alloc+60/64] kmem_cache_alloc+0x3c/0x40
Oct 25 09:27:38 untangle kernel: [dup_task_struct+59/144] dup_task_struct+0x3b/0x90
Oct 25 09:27:38 untangle kernel: [copy_process+119/3696] copy_process+0x77/0xe70
... Goes on
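For anyone decoding the first line of that trace: order=N means the kernel was asking for 2^N physically contiguous pages, so order=1 is a two-page request, and the frames below it (copy_process, dup_task_struct) show it was made while creating a new process. Below is a minimal Python sketch that pulls those fields out of the log line. The 4 KiB page size is an assumption (the usual value on i686), and parse_oom_header is a name made up for illustration.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual size on i686


def parse_oom_header(line):
    """Extract gfp_mask and order from an 'oom-killer:' kernel log line.

    Returns None if the line is not an oom-killer header.
    """
    match = re.search(r"oom-killer: gfp_mask=(0x[0-9a-fA-F]+), order=(\d+)", line)
    if match is None:
        return None
    order = int(match.group(2))
    pages = 2 ** order  # an order-N request is for 2**N contiguous pages
    return {
        "gfp_mask": int(match.group(1), 16),
        "order": order,
        "pages": pages,
        "bytes": pages * PAGE_SIZE,
    }


line = "Oct 25 09:27:38 untangle kernel: oom-killer: gfp_mask=0xd0, order=1"
info = parse_oom_header(line)
print(info)
```

A request this small failing on a machine with 2GB of RAM suggests the kernel ran short in one particular memory zone rather than in total, which is worth keeping in mind when reading the memory figures later in the thread.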
hescominsoon
10-25-2007, 06:13 AM
How many users? What features do you have turned on? Run top and show us that output as well.
yavosh
10-25-2007, 06:26 AM
Number of users is ~50
I have enabled
Spam Blocker
Phish Blocker
Spyware blocker
Web Filter
Virus Blocker
IPS
Protocol Control
Firewall (no rules)
Router - Configured but not started
Attack blocker
Reports
This is what top looks like 30min after a reboot
top - 15:26:02 up 29 min, 1 user, load average: 0.07, 0.15, 0.20
Tasks: 72 total, 1 running, 71 sleeping, 0 stopped, 0 zombie
Cpu(s): 4.5%us, 0.5%sy, 0.0%ni, 95.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2065184k total, 782656k used, 1282528k free, 31148k buffers
Swap: 2064344k total, 0k used, 2064344k free, 258232k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3306 root 20 0 955m 247m 20m S 9 12.2 4:55.16 java
1 root 0 0 156 84 52 S 0 0.0 0:00.82 init
2 root RT 0 0 0 0 S 0 0.0 0:00.33 migration/0
3 root 0 19 0 0 0 S 0 0.0 0:00.16 ksoftirqd/0
4 root RT 0 0 0 0 S 0 0.0 0:00.08 watchdog/0
5 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/1
6 root 0 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/1
7 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/1
8 root 0 -5 0 0 0 S 0 0.0 0:00.37 events/0
9 root 0 -5 0 0 0 S 0 0.0 0:00.12 events/1
10 root 0 -5 0 0 0 S 0 0.0 0:00.00 khelper
11 root 0 -5 0 0 0 S 0 0.0 0:00.00 kthread
14 root 0 -5 0 0 0 S 0 0.0 0:00.58 kblockd/0
15 root 0 -5 0 0 0 S 0 0.0 0:00.00 kblockd/1
16 root 22 -5 0 0 0 S 0 0.0 0:00.00 kacpid
125 root 10 -5 0 0 0 S 0 0.0 0:00.00 khubd
202 root 20 0 0 0 0 S 0 0.0 0:00.00 pdflush
203 root 0 0 0 0 0 S 0 0.0 0:00.38 pdflush
205 root 15 -5 0 0 0 S 0 0.0 0:00.00 aio/0
206 root 15 -5 0 0 0 S 0 0.0 0:00.00 aio/1
204 root 28 0 0 0 0 S 0 0.0 0:00.00 kswapd0
795 root 15 -5 0 0 0 S 0 0.0 0:00.02 kseriod
858 root 15 -5 0 0 0 S 0 0.0 0:00.00 ata/0
859 root 15 -5 0 0 0 S 0 0.0 0:00.00 ata/1
907 root 15 -5 0 0 0 S 0 0.0 0:00.00 kpsmoused
913 root 0 0 0 0 0 S 0 0.0 0:00.07 kirqd
917 root 20 0 0 0 0 S 0 0.0 0:01.25 kjournald
1315 root 20 0 0 0 0 S 0 0.0 0:00.00 khpsbpkt
1583 root 21 0 0 0 0 S 0 0.0 0:00.00 shpchpd_event
2792 root 0 0 1632 560 460 S 0 0.0 0:00.02 syslogd
2795 root 0 0 2584 1412 348 S 0 0.1 0:00.19 klogd
2820 root 20 0 15584 2284 1316 S 0 0.1 0:00.00 slapd
2828 root 4 -20 2432 2432 1764 S 0 0.1 0:00.23 atop
3054 clamav 0 0 41972 30m 980 S 0 1.5 0:19.11 clamd
3142 clamav 19 0 2736 712 564 S 0 0.0 0:00.05 freshclam
3249 postgres 0 0 90188 3012 2636 S 0 0.1 0:00.06 postmaster
3253 postgres 0 0 7980 1880 512 S 0 0.1 0:00.47 postmaster
3254 postgres 2 0 7144 1060 580 S 0 0.1 0:00.48 postmaster
3266 root 15 0 3548 1036 784 S 0 0.1 0:00.11 sshd
3294 root 0 0 2748 1412 1056 S 0 0.1 0:01.61 uvm.sh
8341 daemon 20 0 1768 464 368 S 0 0.0 0:00.00 atd
8344 root 0 0 1824 740 600 S 0 0.0 0:00.00 cron
8351 root 19 0 1580 488 424 S 0 0.0 0:00.00 getty
8353 root 19 0 1584 492 424 S 0 0.0 0:00.00 getty
8354 root 19 0 1580 488 424 S 0 0.0 0:00.00 getty
8355 root 19 0 1584 492 424 S 0 0.0 0:00.00 getty
8356 root 20 0 1580 488 424 S 0 0.0 0:00.00 getty
8536 root 21 0 2680 1364 1080 S 0 0.1 0:00.22 xsession
8599 root 20 0 2680 600 316 S 0 0.0 0:00.00 xsession
8601 root 0 0 31712 20m 2944 S 0 1.0 0:01.09 XFree86
yavosh
10-29-2007, 02:25 AM
This happens consistently twice a day:
once around 10 AM and once around 13:00-14:00.
Looks like an obvious memory leak triggered by some scheduled process.
I would appreciate any resolution for this as it is a major show stopper.
I have tried it on 2 different machines with fresh installs.
What Untangle version are you using? Did you use the Untangle downloadable ISO or do any modifications to it? Describe your hardware in more detail. I hate to answer questions with questions, but I would think if it was directly Untangle-related, this would be reported on a frequent basis. We would need to figure out what course to take to figure out what the problem really is. Out of curiosity, if it crashes with such consistency, have you monitored the interface at the time to see if you have anything going on externally that Untangle is responding to (successfully or not)?
yavosh
10-29-2007, 11:35 AM
I installed the latest ISO I had available. It was 5.0.2.
One thing that I noticed was that the machine was failing to connect to some hosts on port 2703. I have opened this on our external firewall. One other thing to note is that internet access is generally blocked by our external firewall, with the exception of DNS, HTTP and NTP. One of my thoughts was that maybe Untangle was trying to access something and was crashing as a result. I have tried monitoring some access with tcpdump but I have found nothing so far. If you could advise what traffic other than HTTP Untangle might require, please let me know. I see that updates are downloaded without problems.
Machine is an HP dx2300 with 2GB RAM.
More hardware information follows. I had this running on an older machine with less RAM with similar problems. Given that, I think this must be network related and not hardware.
~ # uname -a [root @ untangle]
Linux untangle.example.com 2.6.16-ck11-untangle-cd-486 #2 SMP PREEMPT Thu Jul 19 15:32:44 PDT 2007 i686 GNU/Linux
~ # cat /etc/debian_version [root @ untangle]
3.1
~ # cat /proc/cpuinfo [root @ untangle]
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Pentium(R) Dual CPU E2160 @ 1.80GHz
stepping : 13
cpu MHz : 1795.762
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl est tm2 cx16 xtpr lahf_lm
bogomips : 3593.99
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Pentium(R) Dual CPU E2160 @ 1.80GHz
stepping : 13
cpu MHz : 1795.762
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl est tm2 cx16 xtpr lahf_lm
bogomips : 3591.30
~ # cat /proc/meminfo [root @ untangle]
MemTotal: 2065184 kB
MemFree: 563484 kB
Buffers: 35484 kB
Cached: 377916 kB
SwapCached: 0 kB
Active: 662776 kB
Inactive: 211236 kB
HighTotal: 1170304 kB
HighFree: 324004 kB
LowTotal: 894880 kB
LowFree: 239480 kB
SwapTotal: 2064344 kB
SwapFree: 2064344 kB
Dirty: 56 kB
Writeback: 0 kB
Mapped: 494988 kB
Slab: 31100 kB
CommitLimit: 3096936 kB
Committed_AS: 681536 kB
PageTables: 1388 kB
VmallocTotal: 114680 kB
VmallocUsed: 21652 kB
VmallocChunk: 92740 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 4096 kB
~ # [root @ untangle]
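Since this is a 32-bit kernel with high memory (note the HighTotal and LowTotal lines above), most kernel-internal allocations have to come from the low zone, so LowFree is worth watching separately from MemFree. Here is a rough Python sketch of that check. The helper names are hypothetical, and SAMPLE is copied from the output above. On a live box you would read /proc/meminfo instead, and the Low*/High* fields may be absent on kernels built without highmem.

```python
def parse_meminfo(text):
    """Turn /proc/meminfo style text into a dict of name -> integer value."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def free_fraction(values, zone):
    """Fraction of a zone ('Low', 'High' or 'Mem') that is currently free."""
    return values[zone + "Free"] / values[zone + "Total"]


# Figures copied from the /proc/meminfo output posted above.
SAMPLE = """MemTotal: 2065184 kB
MemFree: 563484 kB
HighTotal: 1170304 kB
HighFree: 324004 kB
LowTotal: 894880 kB
LowFree: 239480 kB
"""

meminfo = parse_meminfo(SAMPLE)
for zone in ("Mem", "Low", "High"):
    print(zone, "free: %.1f%%" % (100 * free_fraction(meminfo, zone)))
```

Logging these three figures every few minutes and comparing the last entries before a hang would show whether low memory in particular is being exhausted.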
dmorris
10-29-2007, 12:07 PM
That's strange.
If you are familiar with atop you can use it to track down what is using the memory. In the ps aux dump you posted, the untangle-vm is only taking up 256 megs, so something else must be taking up a lot of memory.
If you run 'atop -r /var/log/atop.log' you can scroll forward and backward through time, viewing processes sorted by memory. This should allow you to tell what is taking the memory, and we can make some suggestions.
(Or you can post the atop.log somewhere and I can look at it)
yavosh
10-29-2007, 12:46 PM
I have put atop and messages logs here
http://72.232.176.42/logs.tar.gz
The untangle-vm slowly creeps up to more memory. Currently, after running for 5 hours, I have this:
top - 20:41:00 up 5:00, 2 users, load average: 0.00, 0.00, 0.00
Tasks: 74 total, 1 running, 73 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.5%us, 0.0%sy, 0.0%ni, 99.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2065184k total, 1595776k used, 469408k free, 37428k buffers
Swap: 2064344k total, 0k used, 2064344k free, 391832k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3152 root 15 0 1009m 359m 20m S 1 17.8 10:05.13 java
1 root 0 0 156 84 52 S 0 0.0 0:00.79 init
2 root RT 0 0 0 0 S 0 0.0 0:00.61 migration/0
3 root 0 19 0 0 0 S 0 0.0 0:00.34 ksoftirqd/0
4 root RT 0 0 0 0 S 0 0.0 0:00.08 watchdog/0
5 root RT 0 0 0 0 S 0 0.0 0:00.05 migration/1
6 root 0 19 0 0 0 S 0 0.0 0:00.00 ksoftirqd/1
7 root RT 0 0 0 0 S 0 0.0 0:00.00 watchdog/1
8 root 0 -5 0 0 0 S 0 0.0 0:01.49 events/0
9 root 0 -5 0 0 0 S 0 0.0 0:00.10 events/1
10 root 0 -5 0 0 0 S 0 0.0 0:00.00 khelper
11 root 0 -5 0 0 0 S 0 0.0 0:00.00 kthread
14 root 0 -5 0 0 0 S 0 0.0 0:00.64 kblockd/0
15 root 0 -5 0 0 0 S 0 0.0 0:00.03 kblockd/1
16 root 22 -5 0 0 0 S 0 0.0 0:00.00 kacpid
125 root 10 -5 0 0 0 S 0 0.0 0:00.00 khubd
202 root 20 0 0 0 0 S 0 0.0 0:00.00 pdflush
203 root 0 0 0 0 0 S 0 0.0 0:00.47 pdflush
205 root 15 -5 0 0 0 S 0 0.0 0:00.00 aio/0
206 root 15 -5 0 0 0 S 0 0.0 0:00.00 aio/1
204 root 27 0 0 0 0 S 0 0.0 0:00.00 kswapd0
795 root 15 -5 0 0 0 S 0 0.0 0:00.04 kseriod
858 root 15 -5 0 0 0 S 0 0.0 0:00.00 ata/0
859 root 15 -5 0 0 0 S 0 0.0 0:00.00 ata/1
907 root 15 -5 0 0 0 S 0 0.0 0:00.00 kpsmoused
913 root 0 0 0 0 0 S 0 0.0 0:00.03 kirqd
917 root 20 0 0 0 0 S 0 0.0 0:04.11 kjournald
1313 root 20 0 0 0 0 S 0 0.0 0:00.00 khpsbpkt
1581 root 22 0 0 0 0 S 0 0.0 0:00.00 shpchpd_event
2638 root 0 0 1628 556 460 S 0 0.0 0:00.04 syslogd
2641 root 12 0 2588 1420 348 S 0 0.1 0:00.19 klogd
2649 nobody 0 0 1832 656 552 S 0 0.0 0:00.03 dnsmasq
2666 root 20 0 15584 2276 1312 S 0 0.1 0:00.00 slapd
2671 root 0 0 30400 27m 2300 S 0 1.4 0:02.30 spamd
2674 root 3 -20 2412 2412 1764 S 0 0.1 0:01.47 atop
2900 clamav 19 0 69700 57m 992 S 0 2.9 0:08.84 clamd
2988 clamav 7 0 2792 1140 936 S 0 0.1 0:00.08 freshclam
3096 postgres 0 0 90184 3008 2636 S 0 0.1 0:00.12 postmaster
Let me know if I can provide you with any more info.
yavosh
11-09-2007, 03:36 PM
We managed to fix this by replacing the quad network card with a single network card. Looks like Untangle did not like having free Ethernet ports.
Hopefully this will save someone some headaches.
dmorris
11-09-2007, 05:15 PM
yavosh,
Yeah, something else is taking the memory. 350 megs resident for the untangle-vm is quite low, suggesting that there is some leak causing memory pressure on all the userspace processes and the untangle-vm is getting swapped out.
You could try 'free -m' instead of top/atop.
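The "-/+ buffers/cache" row that free prints is just used memory minus buffers minus page cache, and the same figure can be worked out from the two top headers posted earlier in the thread. A small Python sketch of that arithmetic follows. The function name is made up, and the regexes assume the older top layout shown above, where "cached" appears on the Swap line.

```python
import re


def used_minus_cache_kb(mem_line, swap_line):
    """Used memory excluding buffers and page cache, in kB.

    Assumes the older top header layout, with 'cached' on the Swap line.
    """
    used = int(re.search(r"(\d+)k used", mem_line).group(1))
    buffers = int(re.search(r"(\d+)k buffers", mem_line).group(1))
    cached = int(re.search(r"(\d+)k cached", swap_line).group(1))
    return used - buffers - cached


# Header lines copied from the two top outputs posted in this thread.
after_29_min = used_minus_cache_kb(
    "Mem: 2065184k total, 782656k used, 1282528k free, 31148k buffers",
    "Swap: 2064344k total, 0k used, 2064344k free, 258232k cached",
)
after_5_hours = used_minus_cache_kb(
    "Mem: 2065184k total, 1595776k used, 469408k free, 37428k buffers",
    "Swap: 2064344k total, 0k used, 2064344k free, 391832k cached",
)

print("29 min after boot:", after_29_min, "kB")
print("5 h after boot:   ", after_5_hours, "kB")
print("growth:           ", after_5_hours - after_29_min, "kB")
```

The two samples come from different boots, so the comparison is only approximate, and it does not say which process (or the kernel itself) holds the extra memory. It does show non-cache usage growing by far more than the java process's resident size.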