Thread: C file-manipulation in Windows

  1. #31
    Registered User
    Join Date
    May 2021
    Posts
    19
    Quote Originally Posted by hamster_nz
    If I was implementing this, I would make a hidden 'checksums' file in each directory, with just the local files in that directory, and not inserting the checksums into the file names.
    Yes, that's a design decision I've pondered. On the one hand, writing the checksums into a directory-level file is easy, and thereby attractive. But the eventual program then needs some messy knife-and-fork coding to handle the addition of a new file or the removal of an existing one. Easy but messy.

    On the other hand, if each file can somehow hold its own checksum, then the program can be file-based instead of directory-based, and the addition or removal of files in a directory is trivial to handle.

    So far, I favour the second option...
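
    As a rough illustration of the second option, here is a minimal sketch of a per-file checksum written into the file name. The choice of CRC-32, the "name [XXXXXXXX].ext" naming scheme, and the function names are all assumptions made for the example, not something settled in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, as used by zip and PNG).
   Pass crc = 0 for the first chunk, then feed the result back in for later
   chunks, so a large file can be processed piece by piece. */
uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Checksum a whole file. Returns 0 on success, -1 on I/O failure. */
int crc32_file(const char *path, uint32_t *out)
{
    unsigned char buf[65536];
    FILE *fp = fopen(path, "rb"); /* "rb": no newline translation on Windows */
    if (fp == NULL)
        return -1;
    uint32_t crc = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        crc = crc32_update(crc, buf, n);
    int failed = ferror(fp);
    fclose(fp);
    if (failed)
        return -1;
    *out = crc;
    return 0;
}

/* Build "name [XXXXXXXX].ext" from a bare file name "name.ext"
   (hypothetical naming scheme). Returns 0 on success, -1 if the
   result does not fit in out. */
int tag_file_name(const char *name, uint32_t crc, char *out, size_t out_size)
{
    const char *dot = strrchr(name, '.');
    size_t stem = dot ? (size_t)(dot - name) : strlen(name);
    int n = snprintf(out, out_size, "%.*s [%08lX]%s",
                     (int)stem, name, (unsigned long)crc, dot ? dot : "");
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

    With this scheme, verifying a file means recomputing the CRC with crc32_file and comparing it with the eight hex digits in the name, so no directory-level state is needed. The cost is that every file gets renamed, which may confuse other tools that refer to files by name.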



    Quote Originally Posted by hamster_nz
    Have you done the numbers for the average data rates you need to sustain to meet your time goals?
    I have no hard targets; I just want to be able to verify and then back up in one day, which is harder than it sounds with 100,000+ files totalling 1.75 TBytes. Also, my master music collection has two separate backups, which makes the backup take longer.
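
    To put a number on the data-rate question, here is a small sketch of the arithmetic. It assumes 1.75 TBytes means 1.75e12 bytes, that "one day" means 24 hours, and, as a worst case, that the whole collection is read once to verify and once more for each of the two backups. The function name is hypothetical.

```c
/* Average throughput, in decimal megabytes per second, needed to read
   total_bytes once per pass, for `passes` passes, within `hours` hours. */
double required_mb_per_s(double total_bytes, int passes, double hours)
{
    return total_bytes * passes / (hours * 3600.0) / 1e6;
}
```

    Under those assumptions, required_mb_per_s(1.75e12, 1, 24.0) comes to a little over 20 MB/s for a single verification pass, and three full passes need roughly 61 MB/s sustained. An incremental backup that copies only changed files would need far less than the worst case.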

    For backup, I use SyncBack (Free). It's as good as anything else I've come across.

  2. #32
    Registered User
    Join Date
    May 2021
    Posts
    19
    Quote Originally Posted by Salem
    > to verify my 150,000 music files before I back them up.
    Perhaps you also desire some kind of error correction in addition to error detection.
    Parchive - Wikipedia
    Thanks for that. I missed the link when I first read your post. [Sorry, I'm new to this forum, and not yet absorbing its visual format unconsciously.]

    Do you happen to know what the overhead is? For a 100 MByte file, how big is the ECC file(s) that go with it?

  3. #33
    Salem (and the hat of int overfl)
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,675
    > Do you happen to know what the overhead is? For a 100 MByte file, how big is the ECC file(s) that go with it?
    You can choose your recovery size.
    Code:
    $ par2create -c1 p2c_test1.par2 kubuntu-20.04.2.0-desktop-amd64.iso 
    $ par2create -c100 p2c_test2.par2 kubuntu-20.04.2.0-desktop-amd64.iso 
    $ ls -lh kubuntu-20.04.2.0-desktop-amd64.iso p2c*
    -rw-rw-r-- 1 sc sc 2.5G Apr  3 14:56 kubuntu-20.04.2.0-desktop-amd64.iso
    -rw-rw-r-- 1 sc sc  40K Jun  7 03:59 p2c_test1.par2
    -rw-rw-r-- 1 sc sc 1.3M Jun  7 03:59 p2c_test1.vol0+1.par2
    -rw-rw-r-- 1 sc sc  40K Jun  7 04:02 p2c_test2.par2
    -rw-rw-r-- 1 sc sc 1.3M Jun  7 04:02 p2c_test2.vol000+01.par2
    -rw-rw-r-- 1 sc sc 2.6M Jun  7 04:02 p2c_test2.vol001+02.par2
    -rw-rw-r-- 1 sc sc 5.2M Jun  7 04:02 p2c_test2.vol003+04.par2
    -rw-rw-r-- 1 sc sc  11M Jun  7 04:02 p2c_test2.vol007+08.par2
    -rw-rw-r-- 1 sc sc  21M Jun  7 04:02 p2c_test2.vol015+16.par2
    -rw-rw-r-- 1 sc sc  41M Jun  7 04:02 p2c_test2.vol031+32.par2
    -rw-rw-r-- 1 sc sc  47M Jun  7 04:02 p2c_test2.vol063+37.par2
    The minimal single block is like 0.05% of the data size.
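
    For the 100 MByte case specifically, a rough estimate can be sketched as follows. It assumes the file is split into 2000 source blocks (the count par2create reports in the example elsewhere in this thread) and that the block size is rounded up to a multiple of 4 bytes. That rounding rule is an assumption, although it does reproduce the 85456-byte block size reported for the 170905919-byte file in that example. The function names are hypothetical, and the small packet headers and index file are ignored.

```c
#include <stdint.h>

/* Estimated par2 block size: file size divided by the block count,
   rounded up to a whole byte and then up to a multiple of 4 bytes. */
uint64_t estimate_block_size(uint64_t file_bytes, uint64_t block_count)
{
    uint64_t size = (file_bytes + block_count - 1) / block_count; /* ceiling */
    return (size + 3) / 4 * 4;
}

/* Recovery data as a percentage of the file size, ignoring packet headers. */
double overhead_percent(uint64_t file_bytes, uint64_t block_size,
                        uint64_t recovery_blocks)
{
    return 100.0 * (double)(block_size * recovery_blocks) / (double)file_bytes;
}
```

    For a 100,000,000-byte file this gives a 50,000-byte block, so one recovery block costs about 0.05% and 100 recovery blocks about 5%.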
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  4. #34
    Registered User
    Join Date
    May 2021
    Posts
    19
    Quote Originally Posted by Salem
    > Do you happen to know what the overhead is? For a 100 MByte file, how big is the ECC file(s) that go with it?
    You can choose your recovery size.

    8< [8< = Scissors = text snipped.]

    The minimal single block is like 0.05% of the data size.

    I've tried reading such documentation as there is, but it isn't clear to me. One of my priorities is that I need to deal with disk corruption, which can result in the loss of blocks of data OR just a byte here and there.

    This led me to conclude, some time ago, that the best ECC system can't beat just storing a copy of the data under ECC protection ... provided that copy is not corrupt, and does not become corrupt, of course!

    To me, Parchive resembles the way a RAID array works. Is that likely to offer what I'm after? I'm not sure.

    But thanks for all the info nonetheless! It's all grist for the mill, as they say.
    Last edited by Pattern-chaser; 06-07-2021 at 05:49 AM.

  5. #35
    Salem (and the hat of int overfl)
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,675
    Here's an example.

    1. Create a minimal recovery set for some file.
    Code:
    $ cp 20200903_055944.mp4 dummy.mp4
    $ par2create -c1 p2c_test1.par2 dummy.mp4 
    Block size: 85456
    Source file count: 1
    Source block count: 2000
    Recovery block count: 1
    Recovery file count: 1
    
    Opening: dummy.mp4
    Computing Reed Solomon matrix.
    Constructing: done.
    Wrote 85456 bytes to disk
    Writing recovery packets
    Writing verification packets
    Done
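    2. Some major league bit-rot. Here it overwrites the first 50 KiB, but the damage could be anywhere within the file and need not be contiguous, so long as the number of damaged blocks does not exceed the number of recovery blocks.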
    Code:
    $ dd if=/dev/urandom of=dummy.mp4 count=100 conv=notrunc
    100+0 records in
    100+0 records out
    51200 bytes (51 kB, 50 KiB) copied, 0.00818186 s, 6.3 MB/s
    3. Check it.
    Code:
    $ par2verify p2c_test1.par2
    Loading "p2c_test1.par2".
    Loaded 4 new packets
    Loading "p2c_test1.vol0+1.par2".
    Loaded 1 new packets including 1 recovery blocks
    Loading "p2c_test1.par2".
    No new packets found
    
    There are 1 recoverable files and 0 other files.
    The block size used was 85456 bytes.
    There are a total of 2000 data blocks.
    The total size of the data files is 170905919 bytes.
    
    Verifying source files:
    
    Target: "dummy.mp4" - damaged. Found 1999 of 2000 data blocks.
    85328 bytes of data were skipped whilst scanning.
    If there are not enough blocks found to repair: try again with the -N option.
    
    Scanning extra files:
    
    
    Repair is required.
    1 file(s) exist but are damaged.
    You have 1999 out of 2000 data blocks available.
    You have 1 recovery blocks available.
    Repair is possible.
    1 recovery blocks will be used to repair.
    4. Fix it.
    Code:
    $ par2repair p2c_test1.par2
    Loading "p2c_test1.par2".
    Loaded 4 new packets
    Loading "p2c_test1.vol0+1.par2".
    Loaded 1 new packets including 1 recovery blocks
    Loading "p2c_test1.par2".
    No new packets found
    
    There are 1 recoverable files and 0 other files.
    The block size used was 85456 bytes.
    There are a total of 2000 data blocks.
    The total size of the data files is 170905919 bytes.
    
    Verifying source files:
    
    Target: "dummy.mp4" - damaged. Found 1999 of 2000 data blocks.
    85328 bytes of data were skipped whilst scanning.
    If there are not enough blocks found to repair: try again with the -N option.
    
    Scanning extra files:
    
    
    Repair is required.
    1 file(s) exist but are damaged.
    You have 1999 out of 2000 data blocks available.
    You have 1 recovery blocks available.
    Repair is possible.
    1 recovery blocks will be used to repair.
    
    Computing Reed Solomon matrix.
    Constructing: done.
    Solving: done.
    
    Wrote 170905919 bytes to disk
    
    Verifying repaired files:
    
    Target: "dummy.mp4" - found.
    
    Repair complete.
    $ sha1sum 20200903_055944.mp4 dummy.mp4
    1b97ee7b451f551e663f80bdda8627f87a9d03bc  20200903_055944.mp4
    1b97ee7b451f551e663f80bdda8627f87a9d03bc  dummy.mp4
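
    The repair condition in the walkthrough above can be sketched in a few lines of C: a block counts as damaged if even one of its bytes has changed, and repair needs at least as many recovery blocks as damaged blocks. The function names are hypothetical, and this models only the block counting, not the Reed-Solomon arithmetic that par2 actually performs.

```c
#include <stdint.h>

/* Number of blocks overlapped by the damaged byte range
   [offset, offset + length). */
uint64_t blocks_touched(uint64_t offset, uint64_t length, uint64_t block_size)
{
    if (length == 0)
        return 0;
    uint64_t first = offset / block_size;
    uint64_t last = (offset + length - 1) / block_size;
    return last - first + 1;
}

/* Repair needs at least one recovery block per damaged block. */
int repair_possible(uint64_t damaged_blocks, uint64_t recovery_blocks)
{
    return damaged_blocks <= recovery_blocks;
}
```

    The 51200 bytes overwritten by dd at offset 0 fall inside the first 85456-byte block, so blocks_touched(0, 51200, 85456) is 1 and the single recovery block is enough. The same amount of damage starting at offset 85000 would straddle two blocks and need two recovery blocks, and scattered single-byte errors cost one recovery block for each block they land in.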
    > To me, Parchive resembles the way a RAID array works. Is that likely to offer what I'm after? I'm not sure.
    Each addition raises the bar on failures that eventually result in data loss.
    I suppose it's where you choose to draw the line.

    > [I keep one master collection and two full copies on different discs.]
    Please tell me all 3 copies are not in your house.
    If you're going to this much trouble, one of them has to be off-site at all times.
    It doesn't matter how many copies you have if your house burns down.

    A nearby relative's is probably fine, unless city killer asteroids are in your threat model.
    Society is a mess, but hey, my music collection is safe.

  6. #36
    Registered User
    Join Date
    May 2021
    Posts
    19
    Quote Originally Posted by Pattern-chaser
    [I keep one master collection and two full copies on different discs.]
    Quote Originally Posted by Salem
    Please tell me all 3 copies are not in your house.
    I'm afraid they are, but you are right to point out how silly this is. If my house burns down, I'm going to get blisters grabbing a USB drive on the way out!
    "Who cares, wins"
