  1. #1
    Untangler swmspam's Avatar
    Join Date
    Mar 2008
    Posts
    73

    Disk Check using tune2fs

    I recently powered down my UT box for UPS service, after an uptime of ~270 days. That made me wonder when fsck had last been run. Running fsck on a mounted drive is dangerous, so a boot-time check is probably the best option.

    First, what filesystems are on the UT box?

    Code:
    # nano /etc/fstab
    
    # /etc/fstab: static file system information.
    #
    # <file system> <mount point>   <type>  <options>       <dump>  <pass>
    proc            /proc           proc    defaults        0       0
    /dev/hda1       /               ext3    errors=remount-ro 0       1
    /dev/hda5       /data           ext3    defaults        0       2
    /dev/hda2       none            swap    sw              0       0
    /dev/hdb        /media/cdrom0   udf,iso9660 user,noauto     0       0
    Therefore, /dev/hda1 is the target partition. When was the last file system consistency check, and when is the next due?

    Code:
    tune2fs -l /dev/hda1
    
    ......
    Filesystem created:       Mon Oct 19 22:45:09 2009
    Last mount time:          Sun Mar 13 12:00:22 2011
    Last write time:          Sun Mar 13 12:00:22 2011
    Mount count:              7
    Maximum mount count:      38
    Last checked:             Mon Oct 19 22:45:09 2009
    Check interval:           0 (<none>)
    ......
    Yipes! The last time the file system was examined was when this machine was built.

    This says that the ext3 partition on /dev/hda1 is checked every 38 mounts (in practice, every 38 boots). A check interval of 0 means there is no time-based trigger for fsck. Since I've only rebooted my box twice since inception (plus the half-dozen boots when building the machine), it will be decades before the "Maximum mount count" triggers a file system check.
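
    To make that arithmetic repeatable, here is a minimal sketch that reads `tune2fs -l` output on stdin and prints how many mounts remain before the mount-count check fires. The function name `mounts_until_check` is made up for illustration, and it assumes the two field labels appear exactly as in the listing above. For the values shown (mount count 7, maximum 38) it prints 31.

```shell
#!/bin/sh
# Hypothetical helper: reads `tune2fs -l` output on stdin and prints the
# number of mounts left before the mount-count check triggers.
mounts_until_check() {
    awk -F: '
        /^Mount count:/         { count = $2 + 0 }
        /^Maximum mount count:/ { max = $2 + 0 }
        END {
            # A maximum of 0 or -1 means the mount-count check is disabled.
            if (max <= 0) { print "disabled"; exit }
            left = max - count
            if (left < 0) left = 0
            print left
        }'
}

# Usage (as root): tune2fs -l /dev/hda1 | mounts_until_check
```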

    The Maximum mount count can be changed, along with the Check interval:

    Code:
    tune2fs -c 1 /dev/hda1
    tune2fs -i 1 /dev/hda1
    Now, we can check when a boot-time fsck will run next:

    Code:
    tune2fs -l /dev/hda1
    
    ......
    Filesystem created:       Mon Oct 19 22:45:09 2009
    Last mount time:          Sun Mar 13 12:00:22 2011
    Last write time:          Mon Mar 14 08:44:25 2011
    Mount count:              7
    Maximum mount count:      1
    Last checked:             Mon Oct 19 22:45:09 2009
    Check interval:           86400 (1 day)
    ......
    Now a boot-time fsck runs on every reboot, which will increase the boot time. I may also add a cron job that reboots the machine at a very obscure time, so that fsck is guaranteed to run once or twice a year (for example, midnight on Valentine's Day, because we're all busy elsewhere ...)
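
    A rough sketch of that cron idea, under a few assumptions: the entry goes into root's crontab, `shutdown` lives at /sbin/shutdown, and the helper name `reboot_cron_line` is invented here. It only prints the crontab line (minute, hour, day of month, month, day of week) for a reboot at midnight on a given date; nothing is installed unless you pipe it into `crontab` yourself.

```shell
#!/bin/sh
# Hypothetical helper: print a crontab line that reboots the machine at
# 00:00 on the given month and day.
# Arguments: $1 = month (1-12), $2 = day of month (1-31)
reboot_cron_line() {
    month=$1
    day=$2
    printf '0 0 %d %d * /sbin/shutdown -r now\n' "$day" "$month"
}

# Usage (as root), appending to the existing crontab:
#   (crontab -l 2>/dev/null; reboot_cron_line 2 14) | crontab -
```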

  2. #2
    Untangle Ninja mrunkel's Avatar
    Join Date
    Jul 2008
    Posts
    2,989


    Sigh. It's pointless to run fsck on a disk that doesn't need it.
    m.


    Big Frickin Disclaimer:
    While I'm pretty sure, I can't guarantee that I know what I'm doing. There might be a better way to do this, and this way might actually suck. Make sure you understand the implications of what you're doing before trying to follow these directions.

    It often helps troubleshooting if you have a good network map. Look here if you want my advice on how to draw one.
    Attention: Support and help on the Untangle Forums is provided by volunteers and community members like yourself.
    If you need Untangle support please call or email support@untangle.com

  3. #3
    Untangler swmspam's Avatar
    Join Date
    Mar 2008
    Posts
    73


    Sigh. I've had hard disks crap out, especially ones that are in service for years without maintenance. Debian isn't indestructible. Close, but not quite.

    Code:
    root@untangle#fsck -nvf /dev/hda1
    fsck 1.41.3 (12-Oct-2008)
    e2fsck 1.41.3 (12-Oct-2008)
    Warning!  /dev/hda1 is mounted.
    Warning: skipping journal recovery because doing a read-only filesystem check.
    Pass 1: Checking inodes, blocks, and sizes
    Deleted inode 66244 has zero dtime.  Fix? no
    
    Inodes that were part of a corrupted orphan linked list found.  Fix? no
    
    Inode 1753095 was part of the orphaned inode list.  IGNORED.
    Inode 1753097 was part of the orphaned inode list.  IGNORED.
    Inode 1753098 was part of the orphaned inode list.  IGNORED.
    Inode 1753099 was part of the orphaned inode list.  IGNORED.
    Inode 1753100 was part of the orphaned inode list.  IGNORED.
    Inode 1753101 was part of the orphaned inode list.  IGNORED.
    Inode 1753102 was part of the orphaned inode list.  IGNORED.
    Inode 1753103 was part of the orphaned inode list.  IGNORED.
    Inode 1753104 was part of the orphaned inode list.  IGNORED.
    Inode 1753105 was part of the orphaned inode list.  IGNORED.
    Inode 1753106 was part of the orphaned inode list.  IGNORED.
    Pass 2: Checking directory structure
    Entry 'pgstat.stat' in /var/lib/postgresql/8.3/main/global (58381) has deleted/unused inode 57678.  Clear? no
    
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Unattached inode 57390
    Connect to /lost+found? no
    
    Pass 5: Checking group summary information
    Block bitmap differences:  -43460 -58416 -4804609 +4804610 -7014440 -7014448 -7014456 -7014464 -7014472 -7014480 -7014488 -7014496 -7014504 -7014512 -7196008 +10912268 -10912270 -10913800 +10915878 -10917970 +10942465
    Fix? no
    
    Free blocks count wrong for group #1 (8987, counted=8985).
    Fix? no
    
    Free blocks count wrong for group #219 (22284, counted=22282).
    Fix? no
    
    Free blocks count wrong (18181920, counted=18181916).
    Fix? no
    
    Inode bitmap differences:  -66244 -1753095 -(1753097--1753106)
    Fix? no
    
    /dev/hda1: ********** WARNING: Filesystem still has errors **********
    
       66836 inodes used (1.37%)
        1759 non-contiguous inodes (2.6%)
             # of inodes with ind/dind/tind blocks: 4158/206/0
     1349095 blocks used (6.91%)
           0 bad blocks
           1 large file
    
       56022 regular files
        6398 directories
          63 character device files
          25 block device files
           2 fifos
         404 links
        4300 symbolic links (4152 fast symbolic links)
           5 sockets
    --------
       67219 files
    root@untangle#
    So this disk has some minor filesystem discrepancies, along with very mild fragmentation (2.6% non-contiguous inodes).

    In my opinion, an occasional fsck would improve the reliability of the system.
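
    For scripting a check like the one above, the exit status is easier to act on than the text. The sketch below decodes the bitmask documented in the e2fsck man page (1 = errors corrected, 2 = reboot needed, 4 = errors left uncorrected, 8 = operational error, 16 = usage error, 32 = cancelled, 128 = shared library error). The function name and the short labels are my own, not anything fsck prints.

```shell
#!/bin/sh
# Hypothetical helper: turn an fsck/e2fsck exit status into readable labels.
# The bit values follow the e2fsck man page; the labels are invented here.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no-errors"
        return 0
    fi
    msg=""
    if [ $((status & 1)) -ne 0 ]; then msg="$msg errors-corrected"; fi
    if [ $((status & 2)) -ne 0 ]; then msg="$msg reboot-needed"; fi
    if [ $((status & 4)) -ne 0 ]; then msg="$msg errors-uncorrected"; fi
    if [ $((status & 8)) -ne 0 ]; then msg="$msg operational-error"; fi
    if [ $((status & 16)) -ne 0 ]; then msg="$msg usage-error"; fi
    if [ $((status & 32)) -ne 0 ]; then msg="$msg cancelled"; fi
    if [ $((status & 128)) -ne 0 ]; then msg="$msg shared-library-error"; fi
    # Strip the leading space left by the first appended label.
    echo "${msg# }"
}

# Usage: fsck -nvf /dev/hda1; describe_fsck_status $?
```

    A read-only run (-n) that finds problems cannot fix them, so it would typically report the "errors-uncorrected" bit.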

  4. #4
    Untangle Junkie dmorris's Avatar
    Join Date
    Nov 2006
    Location
    San Mateo, CA
    Posts
    11,689


    I wish we could add more fsck triggers just for safety, but we've had to remove them.
    If something is weird, users reboot boxes. If it's slow to boot, they reboot it again, so they often end up rebooting it over and over during fscks.

    I think if it detects issues it will set the flag for an fsck on the next reboot, which is the next best thing.

  5. #5
    Untangle Ninja sky-knight's Avatar
    Join Date
    Apr 2008
    Location
    Phoenix, AZ
    Posts
    16,908


    Dirk, it does do exactly that. I've had to wait for fsck to do its thing on boot before.

    You guys remember that customer of mine whose son figured out he could "bypass" the filter during a special time when Untangle was booting? By the time his dad called me, that poor box was fsck'ing every boot!

    Of course in his case the sata controller on the mainboard actually gave out... man that was a weird box.

    Anyway, the Debian base does this. It follows the same tactic as Windows: the kernel module responsible for maintaining the filesystem does its job while working on the files requested. If it comes across an issue, it fixes it as it goes; if the issue is severe enough, it sets a long check for the next reboot.
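
    One way to see whether the filesystem has been flagged like that is the "Filesystem state" field that `tune2fs -l` prints alongside the mount counts. A minimal sketch, assuming the state text contains the word "errors" when the error flag is set in the superblock; the function name `fs_error_flagged` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads `tune2fs -l` output on stdin and reports
# whether the superblock state mentions errors.
fs_error_flagged() {
    state=$(sed -n 's/^Filesystem state:[[:space:]]*//p')
    case "$state" in
        *errors*) echo "flagged: $state" ;;
        *)        echo "not flagged: $state" ;;
    esac
}

# Usage (as root): tune2fs -l /dev/hda1 | fs_error_flagged
```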

    The problem with this approach is Linux's "nasty" habit of going months if not years between reboots. Untangle by and large solves this issue with reboots for upgrades several times a year.

    So while I understand the desire to check the file system... I don't worry about it unless the thing is broken. After all, isn't that what the offline backup feature is for?
    Rob Sandling, BS:SWE, MCP
    Intouch Technology
    Phone: 480-272-9889
    rob@intouchtechllc.com

    UntangleAppliances.com
    Phone: 866-794-8879

  6. #6
    Untangler swmspam's Avatar
    Join Date
    Mar 2008
    Posts
    73


    Several of the Linux machines I work with don't reboot unless the UPS batteries are being swapped or the power cables are being rerouted. That usually happens on a timescale of years, not months. The firewall (m0n0wall) was still running an older version with >480 days uptime. I updated it to m0n0 1.32 with a CF card while the machine was down.

    The NAS machines are configured with smartmontools (smartd) and regular xfs_repair tasks in crontab. I've got a drive now reporting on SMART attributes 5, 196, and 198 (reallocations and uncorrectables). I'll probably take the NAS machine down soon and replace the drive.
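
    For anyone who wants to watch the same attributes by hand, here is a minimal sketch that filters `smartctl -A` output for IDs 5, 196 and 198 and prints those with a nonzero raw value. It assumes the usual ten-column attribute table with the raw value in the last column; the function name `smart_warnings` and the device /dev/sda are placeholders, not details of my NAS.

```shell
#!/bin/sh
# Hypothetical helper: reads `smartctl -A` output on stdin and prints the
# ID, name and raw value of attributes 5, 196 and 198 when the raw value
# is greater than zero. Assumes RAW_VALUE is the tenth column.
smart_warnings() {
    awk '$1 == 5 || $1 == 196 || $1 == 198 {
        if ($10 + 0 > 0) print $1, $2, $10
    }'
}

# Usage (as root): smartctl -A /dev/sda | smart_warnings
```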
