Thread: Filesystem internal checksums?

  1. #1
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229

    Filesystem internal checksums?

    Do POSIX filesystems (eg. ext3) keep internal checksums of files? Is there a way to access them from userspace? (without computing them in userspace)

  2. #2
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by cyberfish View Post
    Do POSIX filesystems (eg. ext3) keep internal checksums of files? Is there a way to access them from userspace? (without computing them in userspace)
    Checksumming files would be infeasible, because the checksum would have to be updated every time the file was written to. Imagine a 1 gigabyte file, and you write to the first byte of the file. The entire file now has to be re-checksummed.

    There might be some strange filesystem out there with checksumming, in order to support some ridiculously unreliable media, but nothing you'd ever see in production...
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  3. #3
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    I see.

    Checksums can be incrementally updated, though. For example, using some variations of Zobrist hashing.
    Zobrist hashing - Wikipedia, the free encyclopedia

  4. #4
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by cyberfish View Post
    I see.

    Checksums can be incrementally updated, though. For example, using some variations of Zobrist hashing.
    Zobrist hashing - Wikipedia, the free encyclopedia
    If you consider how Zobrist hashing is applied in a chess engine, the table consists of M sub-tables composed of N entries each, where M is the number of states a board square can take on, and N is the number of board positions. Translating that to the bytes of a file, you would need M*N+N entries, where M is 256 (each possible byte value) and N is the maximum supported file size. The extra term of N is for a table which encodes the actual size of the file. That's an impossibly huge table.

    You could try to fiddle with the algorithm to require a smaller table size, but then you lose the main benefit of Zobrist hashing which is completely random mixing of bits for each possible state transition. The checksum would be weak. For instance, it would collide under certain types of data transpositions.

    You could definitely checksum single blocks though.

  5. #5
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    Yeah, that's what I meant by "some variant". I was under the assumption that some "strength" can be sacrificed for table size. That could at least make the checksum strong enough against natural damage/data corruption, which is probably what a filesystem would care most about (as opposed to having a cryptographically strong hash).

  6. #6
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    What are you actually trying to solve? Checking for disk errors? If so, why not use RAID of some sort (RAID 5, for example)?

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  7. #7
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    Quote Originally Posted by brewbuck View Post
    There might be some strange filesystem out there with checksumming, in order to support some ridiculously unreliable media, but nothing you'd ever see in production...
    Some filesystems do have checksums. The only one I can think of off the top of my head is ZFS; I don't know of any other ones in common use that have a file system level checksum.

  8. #8
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    I am thinking about making a disk caching system for Linux using flash memory.
    ReadyBoost for Linux?

    My current plan is to have a userspace daemon that uses inotify to log file (or inode) accesses and copies the files it thinks should be cached to flash memory. A kernel module would then intercept open system calls on those files and redirect them to the cached copies (actually, I'm not sure this is a good idea anymore, since kernel 2.6 no longer exports sys_call_table). The userspace daemon would also monitor modifications to the cached versions (again via inotify, or with the kernel module notifying it) and write them back to disk periodically.

    Still in brainstorming stage. I'm trying to find a way to make sure the two copies are still in sync when a system boots up. I could use mtime, but I thought a checksum of some sort would be more reliable (probably paranoia).

