  1. #1
    Newbie
    Join Date
    Jan 2019
    Posts
    8

    Random Hardcrash

    Hi Guys,
    I've been trying for a while to figure out why I keep getting random reboots with this system. I'm not sure whether the RAM or the SSD is causing the issue. I'm using a Samsung mSATA SSD and 16GB of Patriot SO-DIMM RAM in an i5 Qotom box with 6 Intel NICs.

    Last night I experienced a hard crash, and the box didn't reboot itself. I found this in the syslog and am trying to figure out what caused it. From the logs, is this hardware related, or is it a software kernel crash?

    Apr 13 01:26:05 PandaFW kernel: [329409.314876] ------------[ cut here ]------------
    Apr 13 01:26:05 PandaFW kernel: [329409.314892] WARNING: CPU: 3 PID: 48 at /localhome/buildbot/untangle-slaves-builds/amd64-kernel/debian-4.9.0-master_stretch/build/ngfw_kernels/debian-4.9.0/linux-4.9.1$
    Apr 13 01:26:05 PandaFW kernel: [329409.314895] list_del corruption. next->prev should be ffff9e7fd521dba8, but was bfff9e7fd521dba8
    Apr 13 01:26:05 PandaFW kernel: [329409.314897] Modules linked in: dm_mod xt_connbytes xt_NFQUEUE tun nf_conntrack_netlink nfnetlink_queue twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_$
    Apr 13 01:26:05 PandaFW kernel: [329409.314985] nf_nat_masquerade_ipv4 xt_multiport xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG xt_policy xt_mac xt_physdev br_netfilter xt_connmark xt_conntrack xt_addrt$
    Apr 13 01:26:05 PandaFW kernel: [329409.315182] CPU: 3 PID: 48 Comm: kswapd0 Not tainted 4.9.0-7-untangle-amd64 #1 Debian 4.9.110-3+deb9u1+untangle4
    Apr 13 01:26:05 PandaFW kernel: [329409.315185] Hardware name: Default string Default string/Default string, BIOS 5.12 07/01/2018
    Apr 13 01:26:05 PandaFW kernel: [329409.315189] ffffaa98c1d4bae0 ffffffff9ff55a82 ffffaa98c1d4bb30 0000000000000000
    Apr 13 01:26:05 PandaFW kernel: [329409.315198] ffffaa98c1d4bb20 ffffffff9fc7c10b 0000003ec1d4bae8 ffff9e7fd521da88
    Apr 13 01:26:05 PandaFW kernel: [329409.315206] ffff9e7fd521dba8 ffffffffc06c05c0 0000000000000000 000000000000002d
    Apr 13 01:26:05 PandaFW kernel: [329409.315213] Call Trace:
    Apr 13 01:26:05 PandaFW kernel: [329409.315222] [<ffffffff9ff55a82>] dump_stack+0x63/0x81
    Apr 13 01:26:05 PandaFW kernel: [329409.315230] [<ffffffff9fc7c10b>] __warn+0xcb/0xf0
    Apr 13 01:26:05 PandaFW kernel: [329409.315238] [<ffffffff9fc7c18f>] warn_slowpath_fmt+0x5f/0x80
    Apr 13 01:26:05 PandaFW kernel: [329409.315279] [<ffffffffc06912de>] ? ext4_destroy_inode+0x3e/0xb0 [ext4]
    Apr 13 01:26:05 PandaFW kernel: [329409.315286] [<ffffffff9ff7581b>] __list_del_entry+0xbb/0xc0
    Apr 13 01:26:05 PandaFW kernel: [329409.315294] [<ffffffff9fe341d0>] evict+0x80/0x190
    Apr 13 01:26:05 PandaFW kernel: [329409.315300] [<ffffffff9fe3431b>] dispose_list+0x3b/0x60
    Apr 13 01:26:05 PandaFW kernel: [329409.315305] [<ffffffff9fe3545a>] prune_icache_sb+0x5a/0x80
    Apr 13 01:26:05 PandaFW kernel: [329409.315311] [<ffffffff9fe1a3ae>] super_cache_scan+0x14e/0x1a0
    Apr 13 01:26:05 PandaFW kernel: [329409.315317] [<ffffffff9fda18e5>] shrink_slab.part.38+0x1f5/0x420
    Apr 13 01:26:05 PandaFW kernel: [329409.315323] [<ffffffff9fda1b39>] shrink_slab+0x29/0x30
    Apr 13 01:26:05 PandaFW kernel: [329409.315327] [<ffffffff9fda63bf>] shrink_node+0xff/0x300
    Apr 13 01:26:05 PandaFW kernel: [329409.315333] [<ffffffff9fda7338>] kswapd+0x2f8/0x740
    Apr 13 01:26:05 PandaFW kernel: [329409.315340] [<ffffffff9fda7040>] ? mem_cgroup_shrink_node+0x170/0x170
    Apr 13 01:26:05 PandaFW kernel: [329409.315347] [<ffffffff9fc9cd36>] kthread+0xe6/0x100
    Apr 13 01:26:05 PandaFW kernel: [329409.315353] [<ffffffffa025c3b0>] ? __switch_to_asm+0x40/0x70
    Apr 13 01:26:05 PandaFW kernel: [329409.315360] [<ffffffff9fc9cc50>] ? kthread_park+0x60/0x60
    Apr 13 01:26:05 PandaFW kernel: [329409.315366] [<ffffffffa025c437>] ret_from_fork+0x57/0x70
    Apr 13 01:26:05 PandaFW kernel: [329409.315397] ---[ end trace 40ef20e484066551 ]---
    Apr 13 01:26:05 PandaFW kernel: [329409.315443] general protection fault: 0000 [#1] SMP
    Apr 13 01:26:05 PandaFW kernel: [329409.315540] Modules linked in: dm_mod xt_connbytes xt_NFQUEUE tun nf_conntrack_netlink nfnetlink_queue twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_$
    Apr 13 01:26:05 PandaFW kernel: [329409.316926] nf_nat_masquerade_ipv4 xt_multiport xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG xt_policy xt_mac xt_physdev br_netfilter xt_connmark xt_conntrack xt_addrt$
    Apr 13 01:26:05 PandaFW kernel: [329409.319659] CPU: 3 PID: 48 Comm: kswapd0 Tainted: G W 4.9.0-7-untangle-amd64 #1 Debian 4.9.110-3+deb9u1+untangle4
    Apr 13 01:26:05 PandaFW kernel: [329409.319819] Hardware name: Default string Default string/Default string, BIOS 5.12 07/01/2018
    Apr 13 01:26:05 PandaFW kernel: [329409.319940] task: ffff9e82155b40c0 task.stack: ffffaa98c1d48000
    Apr 13 01:26:05 PandaFW kernel: [329409.320022] RIP: 0010:[<ffffffff9ff75789>] [<ffffffff9ff75789>] __list_del_entry+0x29/0xc0
    Apr 13 01:26:05 PandaFW kernel: [329409.320153] RSP: 0018:ffffaa98c1d4bb98 EFLAGS: 00010283
    Apr 13 01:26:05 PandaFW kernel: [329409.324047] RAX: ffff9e7fd521c308 RBX: ffff9e7fd521c618 RCX: dead000000000200
    Apr 13 01:26:05 PandaFW kernel: [329409.326181] RDX: bfff9e7fd521dba8 RSI: ffffffffc0689b50 RDI: ffff9e7fd521c738
    Apr 13 01:26:05 PandaFW kernel: [329409.328176] RBP: ffffaa98c1d4bb98 R08: ffff9e7fd521c728 R09: 0000000000000001
    Apr 13 01:26:05 PandaFW kernel: [329409.329842] R10: ffffaa98c1d4ba08 R11: 00000000000003ac R12: ffff9e7fd521c738
    Apr 13 01:26:05 PandaFW kernel: [329409.331556] R13: ffffffffc06c05c0 R14: 0000000000000000 R15: 000000000000002d
    Apr 13 01:26:05 PandaFW kernel: [329409.333203] FS: 0000000000000000(0000) GS:ffff9e822ed80000(0000) knlGS:0000000000000000
    Apr 13 01:26:05 PandaFW kernel: [329409.334589] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Apr 13 01:26:05 PandaFW kernel: [329409.335927] CR2: 00005645193b8d60 CR3: 0000000005208000 CR4: 0000000000360630
    Apr 13 01:26:05 PandaFW kernel: [329409.337257] Stack:
    Apr 13 01:26:05 PandaFW kernel: [329409.338390] ffffaa98c1d4bbc0 ffffffff9fe341d0 ffff9e7fd521c728 ffffaa98c1d4bbf8
    Apr 13 01:26:05 PandaFW kernel: [329409.339484] ffff9e7fd521c618 ffffaa98c1d4bbe8 ffffffff9fe3431b ffffaa98c1d4bbf8
    Apr 13 01:26:05 PandaFW kernel: [329409.340577] 00000000000003c8 ffffaa98c1d4bd08 ffffaa98c1d4bc20 ffffffff9fe3545a
    Apr 13 01:26:05 PandaFW kernel: [329409.341657] Call Trace:
    Apr 13 01:26:05 PandaFW kernel: [329409.342741] [<ffffffff9fe341d0>] evict+0x80/0x190
    Apr 13 01:26:05 PandaFW kernel: [329409.343674] [<ffffffff9fe3431b>] dispose_list+0x3b/0x60
    Apr 13 01:26:05 PandaFW kernel: [329409.344601] [<ffffffff9fe3545a>] prune_icache_sb+0x5a/0x80
    Apr 13 01:26:05 PandaFW kernel: [329409.345525] [<ffffffff9fe1a3ae>] super_cache_scan+0x14e/0x1a0
    Apr 13 01:26:05 PandaFW kernel: [329409.346453] [<ffffffff9fda18e5>] shrink_slab.part.38+0x1f5/0x420
    Apr 13 01:26:05 PandaFW kernel: [329409.347375] [<ffffffff9fda1b39>] shrink_slab+0x29/0x30
    Apr 13 01:26:05 PandaFW kernel: [329409.348242] [<ffffffff9fda63bf>] shrink_node+0xff/0x300
    Apr 13 01:26:05 PandaFW kernel: [329409.349087] [<ffffffff9fda7338>] kswapd+0x2f8/0x740
    Apr 13 01:26:05 PandaFW kernel: [329409.349919] [<ffffffff9fda7040>] ? mem_cgroup_shrink_node+0x170/0x170
    Apr 13 01:26:05 PandaFW kernel: [329409.350746] [<ffffffff9fc9cd36>] kthread+0xe6/0x100
    Apr 13 01:26:05 PandaFW kernel: [329409.351567] [<ffffffffa025c3b0>] ? __switch_to_asm+0x40/0x70
    Apr 13 01:26:05 PandaFW kernel: [329409.352360] [<ffffffff9fc9cc50>] ? kthread_park+0x60/0x60
    Apr 13 01:26:05 PandaFW kernel: [329409.353134] [<ffffffffa025c437>] ret_from_fork+0x57/0x70
    Apr 13 01:26:05 PandaFW kernel: [329409.353891] Code: 66 90 55 48 8b 07 48 b9 00 01 00 00 00 00 ad de 48 8b 57 08 48 89 e5 48 39 c8 74 29 48 b9 00 02 00 00 00 00 ad de 48 39 ca 74 3a <4c> 8b 02 4c 39 c7$
    Apr 13 01:26:05 PandaFW kernel: [329409.355497] RIP [<ffffffff9ff75789>] __list_del_entry+0x29/0xc0
    Apr 13 01:26:05 PandaFW kernel: [329409.356250] RSP <ffffaa98c1d4bb98>
    Apr 13 01:26:05 PandaFW kernel: [329409.359190] ---[ end trace 40ef20e484066552 ]---

    Someone please help me. I've run out of ideas, and before I start changing hardware, I would like to find the root cause of this issue.

  2. #2
    Untangler jcoffin's Avatar
    Join Date
    Aug 2008
    Location
    Sunnyvale, CA
    Posts
    7,816

    Since third-party apps were installed, I would reinstall a clean Untangle version.

    https://forums.untangle.com/hacks/41...tml#post232388
    Attention: Support and help on the Untangle Forums is provided by
    volunteers and community members like yourself.
    If you need Untangle support please call or email support@untangle.com

  3. #3
    Untangle Ninja sky-knight's Avatar
    Join Date
    Apr 2008
    Location
    Phoenix, AZ
    Posts
    23,346

    Yes, you're in an unsupported state, and your crash may or may not be hardware. I'd still boot that system with MemTest for a pass or two, just in case.
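
    One detail in the posted trace bears on the hardware question. The `list_del corruption` line reports both the pointer the kernel expected and the pointer it found, so the two can be compared bit by bit. Below is a minimal Python sketch that uses only the two values from that log line. The helper names are made up for illustration, and the canonical-address check assumes the usual 48-bit x86-64 virtual address layout.

```python
# Values copied from the "list_del corruption" line in the posted log.
EXPECTED = 0xffff9e7fd521dba8
ACTUAL = 0xbfff9e7fd521dba8


def differing_bits(a: int, b: int) -> list:
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if bits 63 down to (va_bits - 1) are all equal.

    Assumes 48-bit virtual addressing unless va_bits says otherwise.
    """
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


if __name__ == "__main__":
    print("differing bit positions:", differing_bits(EXPECTED, ACTUAL))
    print("expected value canonical:", is_canonical(EXPECTED))
    print("actual value canonical:", is_canonical(ACTUAL))
```

    The two values differ in exactly one bit (bit 62), and the corrupted value is not a canonical address. That fits the general protection fault that follows in `__list_del_entry`, where RDX holds the same corrupted value. A single flipped bit is consistent with a memory fault, but a software bug can corrupt a pointer too, so this is a reason to run MemTest, not a substitute for it.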
    Rob Sandling, BS:SWE, MCP
    NexgenAppliances.com
    Phone: 866-794-8879 x201
    Email: support@nexgenappliances.com

  4. #4
    Untangle Ninja jcoehoorn's Avatar
    Join Date
    Mar 2010
    Location
    York, NE
    Posts
    1,678

    I saw the third-party apps are for temperature monitoring because of known issues there.

    High temps can definitely cause hardware issues, which can lead to crashes. The first thing I'd do is open the box and clean away any dust. Then I'd make sure the fans are all working properly, or install a fan if you've gone for a silent/passively cooled system. I'd also make sure there is adequate airflow to the machine and that it's not sitting right next to another significant heat source. Finally, I'd check the system BIOS and look for any settings there related to fan operation that might not be aggressive enough.
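
    For a quick temperature reading without third-party packages, the Linux kernel exposes thermal sensors under `/sys/class/thermal`, with values in millidegrees Celsius. Below is a minimal sketch, assuming the kernel on this box populates `thermal_zone*` entries (not every board does) and that Python 3 is available on it. The function names are illustrative.

```python
from pathlib import Path


def parse_millidegrees(raw: str) -> float:
    """Convert a sysfs temp value (millidegrees Celsius) to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_thermal_zones(base: str = "/sys/class/thermal") -> dict:
    """Map each thermal zone (name and type) to its temperature in Celsius."""
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            temp = parse_millidegrees((zone / "temp").read_text())
        except (OSError, ValueError):
            continue  # zone missing, unreadable, or reporting a non-numeric value
        readings[f"{zone.name} ({kind})"] = temp
    return readings


if __name__ == "__main__":
    for name, celsius in read_thermal_zones().items():
        print(f"{name}: {celsius:.1f} C")
```

    If this prints nothing, the board is not exposing any thermal zones through that interface, and the BIOS hardware monitor is the fallback.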
    Five time Microsoft ASP.Net MVP managing a Lenovo RD330 / E5-2420 / 16GB with Untangle 14.1.1 to protect 500Mbits for ~400 residential college students and associated staff and faculty

  5. #5
    Newbie
    Join Date
    Jan 2019
    Posts
    8

    Hey guys,
    Thanks for the replies. That's really weird; I haven't installed any hacks or any third-party software on the router.

    Any idea what I might have installed, so that I can reverse it?

  6. #6
    Newbie
    Join Date
    Jan 2019
    Posts
    8

    Actually, I did try to install lm-sensors, but I didn't bother with it when it didn't work.
    So the box is still on its original image, and it still crashes and reboots randomly like before.
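
    To see whether the crashes keep the same signature over time, the syslog can be scanned for the markers that appear in the trace from the first post. Below is a minimal sketch that matches only on strings taken from that trace. The function names are illustrative, and the default path `/var/log/syslog` is an assumption that may differ on this box.

```python
import re
from collections import Counter

# Substrings taken from the kernel trace in the first post.
MARKERS = (
    "list_del corruption",
    "general protection fault",
    "WARNING: CPU:",
)

# Syslog lines in that trace start with a date such as "Apr 13 01:26:05".
DATE_RE = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2})\s")


def tally_markers(lines) -> Counter:
    """Count (date, marker) pairs across an iterable of syslog lines."""
    counts = Counter()
    for line in lines:
        match = DATE_RE.match(line)
        if not match:
            continue  # not a dated syslog line
        for marker in MARKERS:
            if marker in line:
                counts[(match.group(1), marker)] += 1
    return counts


def tally_file(path: str = "/var/log/syslog") -> Counter:
    """Tally markers in a log file; the default path is an assumption."""
    with open(path, errors="replace") as fh:
        return tally_markers(fh)
```

    Calling `tally_file()` before and after a reinstall gives a rough per-day count, which shows whether the same `list_del corruption` and general protection fault pair keeps recurring.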

    This is a fanless QOTOM box with the i5-7200U. But I suspect it could be the RAM, possibly because it's too big?

  7. #7
    Untangle Ninja sky-knight's Avatar
    Join Date
    Apr 2008
    Location
    Phoenix, AZ
    Posts
    23,346

    The errors in that list indicate 3rd party software, so you did install something. Nuke and pave is the only supported way back to a supported state.

    And you cannot have "too much" RAM. Again, run memtest; it's a pretty quick and easy way to see if you've got bad RAM, which will sometimes manifest as a bad HDD too. Pray you don't have a bad SSD; those things are a pain to troubleshoot.
    Last edited by sky-knight; 04-13-2019 at 02:44 PM.

  8. #8
    Newbie
    Join Date
    Jan 2019
    Posts
    8

    Quote: Originally Posted by sky-knight
    The errors in that list indicate 3rd party software, so you did install something. Nuke and pave is the only supported way back to a supported state.

    And you cannot have "too much" RAM. Again, run memtest; it's a pretty quick and easy way to see if you've got bad RAM, which will sometimes manifest as a bad HDD too. Pray you don't have a bad SSD; those things are a pain to troubleshoot.

    It's a brand-new Samsung SSD. It would be sad if it was bad.

    Could I nuke the whole installation and import the settings back straight away without any issues? I'll have to do some reading.

  9. #9
    Newbie
    Join Date
    Nov 2018
    Posts
    12

    Yes, this is possible. I've done that once in the past, and the configuration restored fine without issues.

  10. #10
    Untangle Ninja sky-knight's Avatar
    Join Date
    Apr 2008
    Location
    Phoenix, AZ
    Posts
    23,346

    Quote: Originally Posted by dantok
    It's a brand-new Samsung SSD. It would be sad if it was bad.

    Could I nuke the whole installation and import the settings back straight away without any issues? I'll have to do some reading.

    Yes! Untangle backups contain only settings for Untangle. So that's exactly what I'd do: pull a backup, reinstall the platform, and restore the configuration. Some configuration snags can carry over in this process, but nothing platform-corrupting, and nothing that can cause a hard lock or reboot anyway.

    While you've got the unit down for a reinstall I'd recommend booting it to memtest and letting it run for a pass or two, just to rule out defective RAM.

    As for the SSD, all electronics have a 4% fault rate in the first month. And when SSDs go south, they tend to go south just like RAM does; that is to say, some RAM is just randomly bad. Now, the controller on the SSD will detect a fair bit of this and deal with it, which is why SSD technology is so reliable. But it can all go wrong, and I've seen it all go wrong, and unlike RAM, where you have memtest, there are no such tests for SSDs out there. They are impossibly difficult to isolate and troubleshoot as a result. That being said, if you do have a bad SSD, you should probably try grabbing a lottery ticket, because you have better odds of hitting it big there than of getting an intermittent SSD.

    So let's start with the easy, more likely culprits: bad RAM and 3rd party software.
