We've identified two situations that can cause a hard drive to fill up. Both are described below, along with their workarounds.
1.) The disk fills up very quickly (you can watch it fill by running df -h several times in a row from the command line), and /var/log/kern.log contains error after error like:
Code:
Dec 8 12:50:50 untangle kernel: si_wait_scan]
Dec 8 12:50:50 untangle kernel: [ 2097.271177] Pid: 0, comm: swapper Tainted: G W 2.6.26-1-untangle-amd64 #1
Dec 8 12:50:50 untangle kernel: [ 2097.271179]
Dec 8 12:50:50 untangle kernel: [ 2097.271179] Call Trace:
Dec 8 12:50:50 untangle kernel: [ 2097.271180] <IRQ> [<ffffffff8023785f>] warn_on_slowpath+0x58/0x87
Dec 8 12:50:50 untangle kernel: [ 2097.271189] [<ffffffff8024e53a>] ? sched_clock_cpu+0x113/0x121
Dec 8 12:50:50 untangle kernel: [ 2097.271196] [<ffffffff8024eea7>] ? getnstimeofday+0x3a/0x9b
Dec 8 12:50:50 untangle kernel: [ 2097.271199] [<ffffffff8024cd1a>] ? ktime_get_ts+0x49/0x4e
Dec 8 12:50:50 untangle kernel: [ 2097.271203] [<ffffffff8024cd30>] ? ktime_get+0x11/0x42
Dec 8 12:50:50 untangle kernel: [ 2097.271206] [<ffffffff802525a0>] ? tick_dev_program_event+0x2a/0x9d
Dec 8 12:50:50 untangle kernel: [ 2097.271209] [<ffffffff8024eea7>] ? getnstimeofday+0x3a/0x9b
Dec 8 12:50:50 untangle kernel: [ 2097.271214] [<ffffffff8021f5dc>] hpet_legacy_next_event+0x3c/0x59
Dec 8 12:50:50 untangle kernel: [ 2097.271218] [<ffffffff80251666>] clockevents_program_event+0x73/0x7c
Dec 8 12:50:50 untangle kernel: [ 2097.271221] [<ffffffff802525a0>] tick_dev_program_event+0x2a/0x9d
Dec 8 12:50:50 untangle kernel: [ 2097.271225] [<ffffffff80251f3c>] tick_broadcast_set_event+0x15/0x17
Dec 8 12:50:50 untangle kernel: [ 2097.271228] [<ffffffff80252154>] tick_handle_oneshot_broadcast+0xbb/0xd4
Dec 8 12:50:50 untangle kernel: [ 2097.271231] [<ffffffff8024eea7>] ? getnstimeofday+0x3a/0x9b
Dec 8 12:50:50 untangle kernel: [ 2097.271236] [<ffffffff8020f91e>] timer_event_interrupt+0x1a/0x21
Dec 8 12:50:50 untangle kernel: [ 2097.271239] [<ffffffff802712e9>] handle_IRQ_event+0x2e/0x65
Dec 8 12:50:50 untangle kernel: [ 2097.271243] [<ffffffff8027299d>] handle_edge_irq+0xea/0x12b
Dec 8 12:50:50 untangle kernel: [ 2097.271247] [<ffffffff8020f898>] do_IRQ+0x6e/0xda
Dec 8 12:50:50 untangle kernel: [ 2097.271251] [<ffffffff8020c62d>] ret_from_intr+0x0/0x19
Dec 8 12:50:50 untangle kernel: [ 2097.271253] <EOI> [<ffffffff8021fd5d>] ? native_irq_enable+0x6/0x7
Dec 8 12:50:50 untangle kernel: [ 2097.271267] [<ffffffffa000c3ad>] ? :processor:acpi_idle_enter_bm+0x2b8/0x332
Dec 8 12:50:50 untangle kernel: [ 2097.271270] [<ffffffff803ca087>] ? menu_select+0x70/0x99
Dec 8 12:50:50 untangle kernel: [ 2097.271275] [<ffffffff803c9477>] ? cpuidle_idle_call+0x77/0xa8
Dec 8 12:50:50 untangle kernel: [ 2097.271278] [<ffffffff803c9400>] ? cpuidle_idle_call+0x0/0xa8
Dec 8 12:50:50 untangle kernel: [ 2097.271281] [<ffffffff8020ad46>] ? cpu_idle+0x91/0xbb
Dec 8 12:50:50 untangle kernel: [ 2097.271285] [<ffffffff8044882d>] ? start_secondary+0x168/0x16c
Dec 8 12:50:50 untangle kernel: [ 2097.271295]
Dec 8 12:50:50 untangle kernel: [ 2097.271296] ---[ end trace 090cb75b73900f80 ]---
Dec 8 12:50:50 untangle kernel: [ 2097.271535] ------------[ cut here ]------------
Dec 8 12:50:50 untangle kernel: [ 2097.271537] WARNING: at arch/x86/kernel/hpet.c:299 hpet_legacy_next_event+0x3c/0x59()
Dec 8 12:50:50 untangle kernel: [ 2097.271539] Modules linked in: xt_physdev bridge ipt_MASQUERADE ipt_addrtype xt_NFQUEUE xt_NOTRACK iptable_raw xt_tcpudp ipt_REDIRECT xt_multiport xt_conntrack xt_connmark xt_state ipt_REJECT xt_mark iptable_filter iptable_nat xt_CONNMARK xt_MARK iptable_mangle tun nfnetlink_queue dummy iptable_tune ip_tables x_tables nf_nat_tftp nf_nat_sip nf_nat_pptp nf_nat_h323 nf_nat_amanda nf_nat_snmp_basic nf_nat_proto_gre nf_nat_irc nf_nat_ftp nf_conntrack_sip nf_conntrack_pptp nf_conntrack_irc nf_conntrack_ftp nf_conntrack_proto_sctp nf_conntrack_netlink nfnetlink nf_nat nf_conntrack_ipv4 ts_kmp nf_conntrack_amanda nf_conntrack_tftp nf_conntrack_proto_gre nf_conntrack_netbios_ns nf_conntrack_h323 nf_conntrack loop snd_pcm snd_timer snd soundcore snd_page_alloc button pcspkr evdev ext3 jbd mbcache ide_cd_mod cdrom ide_disk jmicron usbhid hid ff_memless ide_pci_generic ide_core ahci ata_generic ipg libata mii scsi_mod r8169 ehci_hcd dock thermal processor fan thermal_sys [last unloaded: sc
If you see this, your motherboard/BIOS and the Linux kernel disagree about how HPET should work. You can either disable HPET in your BIOS, or apply a patch and restart your system.
From the command line:
Code:
curl -s http://www.untangle.com/download/patches/7.1/hpetfix.sh | /bin/dash
Customers with live support can contact Untangle to apply the patch for you. Note: You should *only* apply this patch if you are being affected by the exact log message above.
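Before patching, it's worth confirming that the log really is full of this exact warning. Here is a minimal sketch that counts the matching lines; count_hpet_warnings is a made-up helper name for illustration, not something that ships with Untangle.

```shell
#!/bin/sh
# Hypothetical helper (not part of Untangle): count the HPET warnings
# shown above in a kernel log file.
count_hpet_warnings() {
    # $1: path to the log file; defaults to /var/log/kern.log
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'WARNING: at arch/x86/kernel/hpet.c' "${1:-/var/log/kern.log}"
}
```

Running count_hpet_warnings a few times in a row, next to df -h, should show the count climbing while the free space drops if this is the issue you are hitting.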
2.) (EDIT: this got fixed in 7.1) The second issue is seen on high-volume sites. The autovacuum process in postgres is unable to keep up with the deletes and adds in the database. We are currently researching the issue. Customers with live support are encouraged to contact the support team if they are having this issue.
If this issue is affecting you right now and your system has lots of RAM, you can tune your postgres settings to alleviate the problem.
In particular, the setting max_fsm_pages can be adjusted. It is located in: /etc/postgresql/8.3/main/postgresql.conf.
To determine the correct value, you should run:
Code:
#Note: If you are running this remotely, I'd recommend running 'screen' first.
psql -U postgres uvm
vacuum verbose;
At the end of that output, you will see something like:
Code:
INFO: free space map contains 402212 pages in 475 relations
DETAIL: A total of 365504 page slots are in use (including overhead).
365504 page slots are required to track all free space.
Current limits are: 500000 page slots, 1000 relations, using 2995 kB.
If you don't have enough free space map page slots, the output will tell you how many you need. Take that number, add 10%, and set max_fsm_pages to the result in /etc/postgresql/8.3/main/postgresql.conf:
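The "+ 10%" arithmetic can be scripted. The sketch below reads vacuum verbose output on stdin, picks out the "page slots are required" line, and prints that number plus 10%. The function name suggest_max_fsm_pages is hypothetical, and it assumes the output is worded exactly like the sample above.

```shell
#!/bin/sh
# Hypothetical helper: read "vacuum verbose" output on stdin and print
# the required page slots plus 10% (rounded down).
suggest_max_fsm_pages() {
    awk '/page slots are required to track all free space/ { n = $1 }
         END { if (n) printf "%d\n", n + int(n / 10) }'
}
```

With the sample output above, the required figure is 365504, so this prints 402054. That is already below the current limit of 500000, so no change would be needed in that case. One way to feed it, assuming psql sends the INFO lines to stderr, is: psql -U postgres uvm -c 'vacuum verbose;' 2>&1 | suggest_max_fsm_pages

The setting lives in this part of postgresql.conf: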
Code:
# - Free Space Map -
max_fsm_pages = 500000 # min max_fsm_relations*16, 6 bytes each
#max_fsm_pages = 153600 # min max_fsm_relations*16, 6 bytes each
# (change requires restart)
#max_fsm_relations = 1000 # min 100, ~70 bytes each
# (change requires restart)
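If you'd rather not edit the file by hand, the change can be scripted. This is only a sketch: set_max_fsm_pages is a made-up helper, it keeps a .bak copy next to the file, and it only touches the uncommented max_fsm_pages line. The value you pass in should be the one you worked out from your own vacuum verbose output.

```shell
#!/bin/sh
# Hypothetical helper: set max_fsm_pages in a postgresql.conf-style file.
# $1: path to the config file, $2: new value (digits only).
set_max_fsm_pages() {
    conf="$1"
    value="$2"
    # Refuse anything that is not a plain positive integer.
    case "$value" in
        ''|*[!0-9]*) echo "value must be a number" >&2; return 1 ;;
    esac
    # Keep a backup, then rewrite only the active (uncommented) setting.
    cp "$conf" "$conf.bak" || return 1
    sed "s/^max_fsm_pages[[:space:]]*=[[:space:]]*[0-9][0-9]*/max_fsm_pages = $value/" \
        "$conf.bak" > "$conf"
}
```

As the comments in the file say, the change requires a restart of postgres before it takes effect.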
We have found that running a "full vacuum" on the database will significantly reduce the disk space used. Unfortunately, the full vacuum process is very invasive. It locks the tables while it's processing them and increases disk utilization greatly.
Because of the locking, events are not logged to the database while certain tables are being processed. This will obviously have a negative impact on reporting. With those caveats in mind, you can execute the following commands:
Code:
#Note: If you are running this remotely, I'd recommend running 'screen' first.
psql -U postgres uvm
vacuum full verbose;
[wait a long time (depending on database size), could be up to 8 hours]
It is important to note that although the process runs for several hours, each table is only locked while it is being processed.
Again, customers with live support are encouraged to contact the support team if they are having this issue. At this point we've had little opportunity to test these changes, so we'd like to monitor the effects on the system.