Do POSIX filesystems (eg. ext3) keep internal checksums of files? Is there a way to access them from userspace? (without computing them in userspace)
Checksumming whole files would be infeasible, because the checksum would have to be updated every time the file was written to. Imagine a 1-gigabyte file where you write to the first byte: the entire file now has to be re-checksummed.
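A minimal sketch of the problem, using Python's hashlib on a small stand-in file (the file size and helper name here are just for illustration):

```python
import hashlib
import os
import tempfile

def file_sha256(path):
    """Hash the whole file; cost is proportional to file size."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# A small demo file -- imagine it were 1 GB instead of 4 KB.
fd, path = tempfile.mkstemp()
os.write(fd, b"\x00" * 4096)
os.close(fd)

before = file_sha256(path)

# Change just the first byte...
with open(path, "r+b") as f:
    f.write(b"\x01")

# ...and the stored checksum is stale; bringing it up to date
# means re-reading every byte of the file, not just the one that changed.
after = file_sha256(path)
os.remove(path)
```

With a plain cryptographic hash there is no shortcut: `before` and `after` differ, and the only way to get `after` was a full re-read.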
There might be some strange filesystem out there with checksumming, in order to support some ridiculously unreliable media, but nothing you'd ever see in production...
I see.
Checksums can be incrementally updated, though. For example, using some variations of Zobrist hashing.
Zobrist hashing - Wikipedia, the free encyclopedia
If you consider how Zobrist hashing is applied in a chess playing engine, the table consists of M tables composed of N entries each, where M is the number of states a board square can take on, and N is the number of board positions. Translating that to the bytes of a file, you would need M*N+N entries, where M is 256 (each possible byte value) and N is the maximum supported file size. The extra term of N is for a table which encodes the actual size of the file. That's an impossibly huge table.
You could try to fiddle with the algorithm to require a smaller table size, but then you lose the main benefit of Zobrist hashing, which is the completely random mixing of bits for each possible state transition. The checksum would be weak; for instance, it would collide under certain kinds of data transpositions.
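To make the incremental-update idea concrete, here is a toy Zobrist-style checksum over a 16-byte "file" (the table layout and names are illustrative, and the tiny size is exactly what hides the table-size problem described above):

```python
import random

FILE_SIZE = 16        # tiny demo; a real file needs a table column per byte offset
random.seed(42)

# One random 64-bit value per (byte value, offset) pair:
# 256 * FILE_SIZE entries -- this is the table that becomes impossibly
# huge when FILE_SIZE is the maximum supported file size.
table = [[random.getrandbits(64) for _ in range(FILE_SIZE)]
         for _ in range(256)]

def full_hash(data):
    """Checksum computed from scratch by XORing one table entry per byte."""
    h = 0
    for i, b in enumerate(data):
        h ^= table[b][i]
    return h

data = bytearray(b"incremental hash")  # exactly 16 bytes
h = full_hash(data)

# Incremental update: overwriting one byte touches only two table
# entries -- XOR out the old byte's entry, XOR in the new byte's.
offset, new_byte = 0, ord("I")
h ^= table[data[offset]][offset] ^ table[new_byte][offset]
data[offset] = new_byte
```

After the one-byte write, `h` matches a from-scratch recomputation without ever re-reading the rest of the data.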
You could definitely checksum single blocks though.
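A hedged sketch of that per-block approach, using CRC32 from Python's zlib (block size and helper names are assumptions, not anything a real filesystem mandates):

```python
import zlib

BLOCK_SIZE = 4096

def block_checksums(data):
    """One CRC32 per fixed-size block, as a filesystem might store per block."""
    return [zlib.crc32(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]

data = bytearray(b"\x00" * (4 * BLOCK_SIZE))   # a 4-block "file"
sums = block_checksums(data)

# Write one byte: only the containing block's checksum must be redone,
# so the update cost is bounded by BLOCK_SIZE, not by file size.
offset = BLOCK_SIZE + 10                        # falls in block 1
data[offset] = 0xFF
blk = offset // BLOCK_SIZE
sums[blk] = zlib.crc32(data[blk * BLOCK_SIZE:(blk + 1) * BLOCK_SIZE])
```

The updated `sums` list agrees with a full recomputation, which is essentially why real checksumming filesystems (ZFS, for example) checksum blocks rather than whole files.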
Yeah, that's what I meant by "some variant". I was under the assumption that some "strength" can be sacrificed for table size. That could at least make the checksum strong enough against natural damage/data corruption, which is probably what a filesystem would care most about (as opposed to having a cryptographically strong hash).
What are you actually trying to solve? Checking for disk errors? If so, why not use RAID of some sort (RAID5, for example)?
Some filesystems do have checksums. The only one I can think of off the top of my head is ZFS; I don't know of any others in common use that have a filesystem-level checksum.
I am thinking about making a disk caching system for Linux using flash memory.
ReadyBoost for Linux?
My current plan is to have a userspace daemon that uses inotify to log file accesses (or inode accesses) and copy the files it thinks should be cached to flash memory. Then a kernel module would intercept open system calls to those files (actually, I'm not sure this is a good idea anymore, since kernel 2.6 no longer exports sys_call_table) and redirect them to the cached copy. The userspace daemon would also monitor modifications to the cached versions (again using inotify, or via notifications from the kernel module) and write them back to disk periodically.
Still in brainstorming stage. I'm trying to find a way to make sure the two copies are still in sync when a system boots up. I could use mtime, but I thought a checksum of some sort would be more reliable (probably paranoia).
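One way to combine the two ideas is to use mtime as a cheap first-pass check and fall back to a content hash only when it disagrees. A minimal sketch, where `in_sync`, `file_digest`, and the path names are all hypothetical:

```python
import hashlib
import os
import shutil
import tempfile

def file_digest(path):
    """Full-content hash; the expensive fallback check."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.digest()

def in_sync(disk_path, cache_path):
    """Cheap size/mtime checks first; hash contents only when they disagree."""
    d, c = os.stat(disk_path), os.stat(cache_path)
    if d.st_size != c.st_size:
        return False
    if d.st_mtime == c.st_mtime:
        return True          # trusting mtime here is the paranoia trade-off
    return file_digest(disk_path) == file_digest(cache_path)

# Tiny demo with two temporary copies standing in for disk and flash.
tmp = tempfile.mkdtemp()
a, b = os.path.join(tmp, "disk"), os.path.join(tmp, "cache")
for p in (a, b):
    with open(p, "wb") as f:
        f.write(b"same contents")
same = in_sync(a, b)          # identical contents -> in sync

with open(b, "r+b") as f:     # corrupt the cached copy in place
    f.write(b"X")
os.utime(b, (0, 0))           # force an mtime mismatch so the hash path runs
diff = in_sync(a, b)          # contents differ -> out of sync
shutil.rmtree(tmp)
```

Note that mtime alone can lie in both directions (clock skew, tools that preserve timestamps), so the hash fallback is what actually makes the boot-time check trustworthy.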