Thread: Help with recoding Tar command

  1. #1
    Registered User
    Join Date
    Sep 2007
    Posts
    69

    Help with recoding Tar command

    Hello to all of you!

    I'm in the process of recoding the Unix tar command, and I've run into some problems and questions lately.

    1.) What I do is read a file, call stat() on it, create a header for my tar file, and write both into the tar file. Then I do the same for the rest of the files. Is this more or less correct?

    2.) I'm having trouble with executables. If I open an executable with emacs, I see very strange things. If I copy and paste this into a new file, it does not want to run no matter what encoding I use. Also, when reading the executable, the "read" command doesn't read beyond the first 10 characters no matter what.

    3.) If you open an executable or a tar file with emacs, you see ^@^@^@^@^@^@ and stuff like that all over the place. This is what the tar command (for example) writes to make sure everything is in blocks of 512. How do I imitate this? I can't just write ^@^@, as it doesn't work. (I'm guessing ^@ is some binary code.)

    If you could help with those questions, I would be very grateful. I'm still in the thinking stage (as it is a big project), but answers to those questions would really help me in writing the tar command.

  2. #2
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    1.) What I do is read a file, call stat() on it, create a header for my tar file, and write both into the tar file. Then I do the same for the rest of the files. Is this more or less correct?
    Sure, doesn't sound too far off the mark . . .

    2.) I'm having trouble with executables. If I open an executable with emacs, I see very strange things. If I copy and paste this into a new file, it does not want to run no matter what encoding I use. Also, when reading the executable, the "read" command doesn't read beyond the first 10 characters no matter what.
    Try opening your file in binary mode. You can do this by appending a "b" to whatever fopen() mode you are using.

    3.) If you open an executable or a tar file with emacs, you see ^@^@^@^@^@^@ and stuff like that all over the place. This is what the tar command (for example) writes to make sure everything is in blocks of 512. How do I imitate this? I can't just write ^@^@, as it doesn't work. (I'm guessing ^@ is some binary code.)
    ^@ is a NUL, I believe. Character 0.

    ^A is 1, ^B is 2, ^Z is 26, escape is 27, and ^@ is 0, as far I can remember. (All in decimal.)
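
    Since ^@ is just the byte with value 0, padding a member out to a 512-byte boundary comes down to writing that many zero bytes with fwrite(). Here is a minimal sketch of the idea; the names TAR_BLOCK, tar_padding and write_padding are made up for illustration and are not from any real tar source.

```c
#include <assert.h>
#include <stdio.h>

#define TAR_BLOCK 512   /* tar archives are organized in 512-byte blocks */

/* Number of NUL bytes needed to round `size` up to a whole block. */
static size_t tar_padding(size_t size)
{
    return (TAR_BLOCK - size % TAR_BLOCK) % TAR_BLOCK;
}

/* Write the NUL padding that follows a member of `size` bytes.
 * Returns 0 on success, -1 on a write error. */
static int write_padding(FILE *out, size_t size)
{
    static const char zeros[TAR_BLOCK];   /* static storage starts out all zero */
    size_t pad = tar_padding(size);

    assert(pad < TAR_BLOCK);
    if (pad > 0 && fwrite(zeros, 1, pad, out) != pad)
        return -1;
    return 0;
}
```

    For a 10-byte file, tar_padding(10) is 502, so the data plus padding fills exactly one block. fwrite() is used because it takes an explicit length; string functions such as fputs() stop at the first zero byte.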
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell


    Other boards: DaniWeb, TPS
    Unofficial Wiki FAQ: cpwiki.sf.net

    My website: http://dwks.theprogrammingsite.com/
    Projects: codeform, xuni, atlantis, nort, etc.

  3. #3
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by +Azazel+ View Post
    1.) What I do is read a file, call stat() on it, create a header for my tar file, and write both into the tar file. Then I do the same for the rest of the files. Is this more or less correct?
    Yes. A tar archive has no single global header: each member gets its own 512-byte header block, followed by that member's data.

    2.) I'm having trouble with executables. If I open an executable with emacs, I see very strange things. If I copy and paste this into a new file, it does not want to run no matter what encoding I use. Also, when reading the executable, the "read" command doesn't read beyond the first 10 characters no matter what.
    Not sure why you would want to open an executable file in emacs. But you can make it preserve the bytes exactly by invoking "M-x find-file-literally" to open the file. You can safely edit binary that way. But copying and pasting? From one app to another? Never gonna work.

    3.) If you open an executable or a tar file with emacs, you see ^@^@^@^@^@^@ and stuff like that all over the place. This is what the tar command (for example) writes to make sure everything is in blocks of 512. How do I imitate this? I can't just write ^@^@, as it doesn't work. (I'm guessing ^@ is some binary code.)
    ^@ is emacs' default surrogate for the NUL byte, in other words, 0.

    If you could help with those questions, I would be very grateful. I'm still in the thinking stage (as it is a big project), but answers to those questions would really help me in writing the tar command.
    Why are you so worried about emacs? What does it have to do with anything?

  4. #4
    Registered User
    Join Date
    Sep 2007
    Posts
    69
    The reason I'm so worried is that when I tar something with my program, I will copy and paste (literally) the code into the tar file. Afterwards, if I want to un-tar it, I have to create a new file and paste the code there. So what would I do in this case for executables? Also copy and paste the code, or something else?

    So, I would use fopen with binary mode to open the file, read it, and paste this into my tar file (with fputs? or fprintf? or fwrite?). And later on, my tar file contains both ASCII data and binary data, so what would I open it as?

  5. #5
    Registered User
    Join Date
    Feb 2006
    Posts
    54
    Binary. There is almost no reason to ever open a file in text mode.

  6. #6
    Registered User
    Join Date
    Sep 2007
    Posts
    69
    Would opening in binary mode and reading the file work the same as the usual opening and reading (just that it will preserve the binary characters)?

    For the header: I'm confused by the checksum field.
    "The chksum field is a checksum of all the bytes in the header, assuming that the chksum field itself is all blanks."

    How would I go about calculating that?

    How can I fill a string with "zero" bytes?
    Last edited by +Azazel+; 12-01-2007 at 03:45 AM.

  7. #7
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    To begin with, why don't you write a simple program that copies a file in binary mode? Yes, it should work to do that. You will, of course, have to also set the correct mode of the file, and so on when you create the new file.
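
    A minimal sketch of such a copy program could look like the function below. The name copy_file and the 4096-byte buffer are arbitrary choices for illustration. It uses fread()/fwrite() with explicit byte counts rather than fputs() or fprintf(), since those treat the data as a string and stop at the first zero byte. It copies only the contents; setting the new file's mode is left out.

```c
#include <assert.h>
#include <stdio.h>

/* Copy the file `src` to `dst` byte for byte.
 * Returns 0 on success, -1 on any open, read or write error. */
static int copy_file(const char *src, const char *dst)
{
    FILE *in, *out;
    char buf[4096];
    size_t n;
    int status = 0;

    assert(src != NULL && dst != NULL);

    in = fopen(src, "rb");              /* "b": no text-mode translation */
    if (in == NULL)
        return -1;
    out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    /* fread() reports how many bytes it actually got, zeros included. */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            status = -1;
            break;
        }
    }
    if (ferror(in))
        status = -1;

    fclose(in);
    if (fclose(out) != 0)               /* a failed flush also counts */
        status = -1;
    return status;
}
```

    On Unix the "b" makes no difference, but it does no harm and keeps the code correct on systems where text mode translates line endings.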

    By the way, I admire your courage. tar is not an entirely simple product, and I think it would take me quite some time to get such an application put together and working correctly. Not to mention that there are a great number of flags that make tar do things ever so slightly differently. Be that as it may.

    Setting some data structure to zero is easiest done with
    Code:
    memset(&something, 0, sizeof(something));
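
    Applied to a tar header, that might look like the sketch below. The struct follows the POSIX ustar field layout (all plain char arrays, 512 bytes in total); the names struct tar_header and init_header are only illustrative, and the uid, gid, mtime and checksum fields are left for the caller to fill in.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* POSIX ustar header: 512 bytes, numeric fields stored as octal text. */
struct tar_header {
    char name[100];
    char mode[8];
    char uid[8];
    char gid[8];
    char size[12];
    char mtime[12];
    char chksum[8];
    char typeflag;
    char linkname[100];
    char magic[6];
    char version[2];
    char uname[32];
    char gname[32];
    char devmajor[8];
    char devminor[8];
    char prefix[155];
    char pad[12];
};

/* Zero the whole header, then fill in a few fields.
 * Returns 0 on success, -1 if the name or size does not fit. */
static int init_header(struct tar_header *h, const char *name,
                       unsigned long size, unsigned mode)
{
    assert(sizeof *h == 512);

    memset(h, 0, sizeof *h);            /* every unused byte becomes NUL */
    if (strlen(name) >= sizeof h->name)
        return -1;
    strcpy(h->name, name);
    snprintf(h->mode, sizeof h->mode, "%07o", mode & 07777u);
    if (snprintf(h->size, sizeof h->size, "%011lo", size)
            >= (int)sizeof h->size)
        return -1;                      /* size needs more than 11 digits */
    h->typeflag = '0';                  /* '0' marks a regular file */
    memcpy(h->magic, "ustar", 6);       /* "ustar" plus its NUL */
    memcpy(h->version, "00", 2);
    return 0;
}
```

    Because the memset() runs first, any field that is never written stays filled with zero bytes.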
    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  8. #8
    Registered User
    Join Date
    Sep 2007
    Posts
    69
    I've written a program that copies a file into another file, and it works both for executables and other files. (I did not use binary mode, by the way; I used the example given in the FAQ.)

    Right now I'm in the process of trying to write a function that will create the header. Not too much success so far, but I'm getting closer. The problem is writing the checksum (which I don't know how to get) and filling the various strings with zero bytes. (Is bzero a good function for this too?)

    EDIT: I still have problems with writing the checksum. No idea how to get it.
    Last edited by +Azazel+; 12-01-2007 at 11:07 AM.

  9. #9
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    What have you done so far with regard to the checksum?

    bzero isn't as portable as memset, but yes, you can use bzero if you don't care about being portable. I would recommend memset, as I'm not aware of ANY OS that doesn't have a memset [it's probably one of the most common functions in <memory.h>, possibly after memcpy, so it's a quite common function]. I'm pretty sure that bzero, on systems that support it, is implemented like this:
    Code:
    void bzero(void *ptr, size_t size)
    {
        memset(ptr, 0, size);
    }
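    If you really want to avoid writing the third argument, why not make a macro like this (pass it the array or struct itself, not a pointer to it, or sizeof will give you the size of the pointer):
    Code:
    #define Zero(x) memset(&(x), 0, sizeof(x))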
    --
    Mats
    Last edited by matsp; 12-01-2007 at 11:31 AM.

  10. #10
    Registered User
    Join Date
    Sep 2007
    Posts
    69
    Thanks very much Mats!

    Memset works wonderfully, and I'm almost finished with a header that is exactly like the real tar one. It's only the checksum that troubles me. From what I understand, to get the checksum I have to add up the ASCII values of all the characters in my header and then convert the sum to octal. Is this it?

  11. #11
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    According to wikipedia:
    Quote Originally Posted by Wikipedia tar file format
    The checksum is calculated by taking the sum of the byte values of the header block with the eight checksum bytes taken to be ascii spaces (value 32). It is stored as a six digit octal number with leading zeroes followed by a nul and then a space.
    So while you add up the bytes, the checksum field shouldn't be all zeros; fill all eight of its bytes with spaces first, something like:
    Code:
       memset(checksum, ' ', 8);
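
    Putting the quoted description together, a sketch of the whole calculation might look like this. It works on the raw 512-byte header block and assumes the standard ustar layout, where the checksum field is the 8 bytes starting at offset 148. The function names are made up for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TAR_BLOCK     512
#define CHKSUM_OFFSET 148   /* offset of the checksum field in the header */
#define CHKSUM_LEN    8

/* Sum all header bytes, counting the checksum field as eight spaces. */
static unsigned tar_checksum(const unsigned char *block)
{
    unsigned sum = 0;
    size_t i;

    for (i = 0; i < TAR_BLOCK; i++) {
        if (i >= CHKSUM_OFFSET && i < CHKSUM_OFFSET + CHKSUM_LEN)
            sum += ' ';
        else
            sum += block[i];
    }
    return sum;
}

/* Store the checksum as six octal digits, a NUL, then a space. */
static void store_checksum(unsigned char *block)
{
    unsigned sum = tar_checksum(block);
    char field[CHKSUM_LEN];

    assert(sum <= 0777777);     /* 504*255 + 8*32 always fits in 6 digits */
    snprintf(field, sizeof field, "%06o", sum);   /* digits + NUL */
    field[7] = ' ';
    memcpy(block + CHKSUM_OFFSET, field, CHKSUM_LEN);
}
```

    Because tar_checksum() skips over whatever is in the field, it can also be used to verify a header read back from an existing archive.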
    --
    Mats

  12. #12
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    I would recommend memset, as I'm not aware of ANY OS that doesn't have a memset [it's probably one of the most common functions in <memory.h>, possibly after memcpy, so it's a quite common function].
    Don't you mean <string.h>? . . . memory.h is an extension with the mem*() functions from string.h, as far as I can tell. I do know that string.h is standard, and I'm pretty sure that memory.h isn't.
    dwk


  13. #13
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by dwks View Post
    Don't you mean <string.h>? . . . memory.h is an extension with the mem*() functions from string.h, as far as I can tell. I do know that string.h is standard, and I'm pretty sure that memory.h isn't.
    Yes, I mean string.h of course.

    --
    Mats

  14. #14
    Registered User
    Join Date
    Sep 2007
    Posts
    69
    Hey again!

    All is advancing well. I succeeded in creating the header, and tar files created with my program can be extracted with the tar command.
    I'm looking into being able to tar directories recursively. Would I be able to adapt the walker from the FAQ for this? Or would a lot of it be useless for me?

    Thanks guys!
    Last edited by +Azazel+; 12-02-2007 at 10:44 AM.

  15. #15
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by +Azazel+ View Post
    (bzero is a good function also?)
    bzero() is only used by loser script kiddies who think they know C.
