Thread: Create tar-like archiver

  1. #1
    Registered User
    Join Date
    Oct 2020
    Posts
    69

    Create tar-like archiver

    I want to write a simple version of tar that can archive the files given as command-line arguments (no compression needed). I've only just started learning about binary files, so I'm a bit lost.
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    
    typedef struct{
    	char name[100];
    	char mode[8];
    	char uid[8];
    	char ouid[8];
    	char size[12];
    	char time[12];
    	char checksum[8];
    	char type[1];
    	char linkname[100];
    }HEADER;
    
    
    void readHeader(FILE *input)
    {
    	HEADER file;
    	fread(&file, sizeof(file), 1, input);
    //here I think I should use fwrite to write the header to the tar file
    }
    
    
    int main(int argc, char **argv)
    {
    	FILE *fin = fopen("tarOutput.tar", "wb");
    	if(!fin)
    	{
    		fprintf(stderr, "Error opening the .tar output file.\n");
    		exit(-1);
    	}
    	if(argc < 3)
    	{
    		fprintf(stderr, "Invalid format.\nCorrect format: ./a.out c tarOutput.tar file1 file2 ...\n");
    		exit(1);
    	}
    	if(strcmp(argv[1], "c") != 0)
    	{
    		fprintf(stderr, "Invalid option. You should use 'c' as the second argument.\nCorrect format: ./a.out c tarOutput.tar file1 file2 ...\n");
    		exit(2);
    	}
    	for(int i = 3; i < argc; i++) // do this for each file specified as argument; start at 3 because argv[0,1,2] are program name, c, tar output file; stop before argv[argc], which is NULL
    	{
    		//int fileIndex = 1;
    		FILE *fileInput = fopen(argv[i], "rb");
    		if(!fileInput)
    		{
    			fprintf(stderr, "Failed to open file '%s' for reading\n", argv[i]);
    			exit(-2);
    		}
    		readHeader(fileInput);
    	}
    	fclose(fin);
    	return 0;
    }
    I'm using the Wikipedia page "tar (computing)" as the model for the header. The header is 512 bytes, so I know I should write the header first, followed by the contents of each file. Could you show me the steps for implementing the header correctly? Thank you!

  2. #2
    Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    > FILE *fin = fopen("tarOutput.tar", "wb");
    Naming something like 'in' when it's an output file just makes the rest of the code read badly.

    Code:
    FILE *fout = fopen("tarOutput.tar", "wb");
    for(int i = 3; i < argc; i++) {
        HEADER header = {0};          // start with every byte zeroed
        strcpy(header.name,argv[i]);  // save the filename (must be shorter than 100 chars)
        // now do some other stuff for the rest of your header.member names
        // you can get most of it from https://linux.die.net/man/2/stat
    
        // write the header
        fwrite(&header,sizeof(header),1,fout);
    
        // now write the content of argv[i] to fout
    }
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    May 2012
    Posts
    505
    Quote Originally Posted by rmmstn
    I want to write a simple version of tar, that is able to archive files provided as command line arguments (no compression needed). I've just started learning about binary files so I'm a bit lost.

    I'm using the following wiki page as model for the header tar (computing) - Wikipedia . The header is 512 bytes. So I know I should first write the header followed by the contents of each file. Could you show me some steps how to correctly implement the header? Thank you!
    I don't think that's a good idea. It's best to write your own tar-like format first and worry later about how the real tar does it.

    Tar collapses a directory into one file. A directory is a tree, and a tree can be represented serially by nested brackets. However, your nodes will hold large amounts of arbitrary binary data, as they represent whole files.

    Worry about read permissions and the like later. Initially, just get a version working that takes a small directory and then restores its structure together with the file names and data.
    I'm the author of MiniBasic: How to write a script interpreter and Basic Algorithms
    Visit my website for lots of associated C programming resources.
    https://github.com/MalcolmMcLean


  4. #4
    Registered User
    Join Date
    Oct 2020
    Posts
    69
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    
    #define MAX 3000
    
    
    typedef struct{
    	char name[100];
    	long size, checksum;
    }HEADER;
    
    
    /*long calculateChecksum(HEADER header)
    {
    	size_t checksum = 0;
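        size_t i; // size_t, to match sizeof(HEADER) in the loop condition
        unsigned char* bytes = (unsigned char *)&header; // cast needed: &header is a HEADER *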
        for(i = 0; i < sizeof(HEADER); ++i )
    	{
            checksum += bytes[i];
    	}
    	return checksum;
    }*/
    
    
    int main(int argc, char **argv)
    {
    	FILE *fout = fopen("tarOutput.tar", "wb");
    	if(!fout)
    	{
    		fprintf(stderr, "Error opening the .tar output file.\n");
    		exit(-1);
    	}
    	if(argc < 3)
    	{
    		fprintf(stderr, "Invalid format.\nCorrect format: ./a.out c tarOutput.tar file1 file2 ...\n");
    		exit(1);
    	}
    	if(strcmp(argv[1], "c") != 0)
    	{
    		fprintf(stderr, "Invalid option. You should use 'c' as the second argument.\nCorrect format: ./a.out c tarOutput.tar file1 file2 ...\n");
    		exit(2);
    	}
    
    
    	for(int i = 3; i < argc; i++) // do this for each file specified as argument; start at 3 because argv[0,1,2] are program name, c, tar output file
    	{
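    		HEADER header = {0}; // zero every member so unset ones (e.g. checksum) aren't written as garbage
    		if(strlen(argv[i]) >= sizeof header.name)
    		{
    			fprintf(stderr, "File name '%s' is too long.\n", argv[i]);
    			exit(1);
    		}
    		strcpy(header.name, argv[i]); // saving the name of the file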
    		FILE *fin = fopen(argv[i], "rb");
    		if(fin == NULL)
    		{
    			fprintf(stderr, "Failed to open file '%s' for reading.\n", argv[i]);
    			exit(1);
    		}
    		fseek(fin, 0L, SEEK_END);
    		header.size = ftell(fin);
    //		printf("%s is %ld bytes long\n", header.name, header.size);
    		fseek(fin, 0L, SEEK_SET);
    		//header.checksum = calculateChecksum(header);
    		fwrite(&header, sizeof(header), 1, fout);
    
    
    		//copying the contents of the file into the tar output
    
    
    		char buf[MAX];
    		size_t content;
    		while((content = fread(buf, 1, MAX, fin)) > 0) // fread returns 0 at end of file (or on error)
    		{
    			fwrite(buf, 1, content, fout);
    		}
    		fclose(fin);
    	}
    	fclose(fout);
    	return 0;
    }
    Right now, data is being copied into tarOutput.tar, but when I try to open the archive, it's still empty. For now I've decided to keep only name, size and checksum as the struct members (Wikipedia says the checksum is also important; I'm not sure why yet). Could this be the reason I can't see the files in the archive? Do I need every single member of that long structure? (I'm not asking out of laziness; I just need a simple archiver and don't need information about permissions and the like.)

  5. #5
    Registered User
    Join Date
    Dec 2017
    Posts
    1,632
    A quick and dirty example.
    Code:
    Consider this directory structure, where w, x, y, z are text files containing
    "w data", "x data", etc:
      
      a
      ├── b
      │   ├── w
      │   └── x
      └── c
          ├── d
          │   └── z
          └── y
      
    The program below saves it like this (as displayed by hd):
      
    00000000  02 00 00 00 61 2f 00 00  00 00 00 00 00 00 04 00  |....a/..........|
    00000010  00 00 61 2f 63 2f 00 00  00 00 00 00 00 00 06 00  |..a/c/..........|
    00000020  00 00 61 2f 63 2f 64 2f  00 00 00 00 00 00 00 00  |..a/c/d/........|
    00000030  07 00 00 00 61 2f 63 2f  64 2f 7a 07 00 00 00 00  |....a/c/d/z.....|
    00000040  00 00 00 7a 20 64 61 74  61 0a 05 00 00 00 61 2f  |...z data.....a/|
    00000050  63 2f 79 07 00 00 00 00  00 00 00 79 20 64 61 74  |c/y........y dat|
    00000060  61 0a 04 00 00 00 61 2f  62 2f 00 00 00 00 00 00  |a.....a/b/......|
    00000070  00 00 05 00 00 00 61 2f  62 2f 78 07 00 00 00 00  |......a/b/x.....|
    00000080  00 00 00 78 20 64 61 74  61 0a 05 00 00 00 61 2f  |...x data.....a/|
    00000090  62 2f 77 07 00 00 00 00  00 00 00 77 20 64 61 74  |b/w........w dat|
    000000a0  61 0a                                             |a.|
      
    Every entry has the following structure (no checksum):
      filename_size    uint32_t 
      filename         char[filename_size]
      file_size        uint64_t
      file_data        char[file_size]   // not present if it's a directory
     
    A directory is indicated by its name ending with a slash.
     
    The basic structure of the program:
      main
      ├── create_tarfile
      │   └── store_file
      │       ├── write_dir
      │       └── write_file
      ├── list_tarfile
      └── extract_tarfile
          └── extract_file
    Code:
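    #define _DEFAULT_SOURCE // newer glibc name; _BSD_SOURCE alone is deprecated there
    #define _BSD_SOURCE     // kept for older glibc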
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <dirent.h>
     
    void create_tarfile  (const char *tarfile, char **files);
    void list_tarfile    (const char *tarfile);
    void extract_tarfile (const char *tarfile);
    void store_file      (FILE *fout, const char *name);
    void write_dir       (FILE *fout, const char *name);
    void write_file      (FILE *fout, const char *name);
    void extract_file    (FILE *fin,  const char *name, uint64_t size);
    void die             (const char *msg);
    void usage           (const char *progname);
     
    int main(int argc, char **argv)
    {
        if (argc < 2)
            usage(argv[0]);
        else if (strcmp(argv[1], "c") == 0)
        {
            if (argc < 4) usage(argv[0]);
            create_tarfile(argv[2], argv + 3);
        }
        else if (strcmp(argv[1], "t") == 0)
        {
            if (argc < 3) usage(argv[0]);
            list_tarfile(argv[2]);
        }
        else if (strcmp(argv[1], "x") == 0)
        {
            if (argc < 3) usage(argv[0]);
            extract_tarfile(argv[2]);
        }
        else
            usage(argv[0]);
        return 0;
    }
     
    void create_tarfile(const char *tarfile, char **files)
    {
        FILE *fout = fopen(tarfile, "wb");
        if (!fout) die("create_tarfile");
        for ( ; *files; ++files)
            store_file(fout, *files);
        fclose(fout);
    }
     
    void store_file(FILE *fout, const char *name)
    {
        struct stat sb;
        if (stat(name, &sb) == -1) die("store_file: stat");
        bool is_dir = false;
        switch (sb.st_mode & S_IFMT) {
        case S_IFDIR: is_dir = true; break;
        case S_IFREG:                break;
        default: die("store_file: not dir or regular file");
        }
        uint32_t filename_size = strlen(name) + is_dir;
        fwrite(&filename_size, sizeof filename_size, 1, fout);
        fwrite(name, 1, filename_size - is_dir, fout);
        uint64_t size = sb.st_size;
        if (is_dir)
        {
            fputc('/', fout); // dir names end in /
            size = 0;         // dirs have a "size" of 0
        }
        fwrite(&size, sizeof size, 1, fout);
        if (is_dir)
            write_dir(fout, name);
        else
            write_file(fout, name);
    }
     
    void write_dir(FILE *fout, const char *name)
    {
        char n[FILENAME_MAX];
        DIR *dir = opendir(name);
        if (!dir) die("write_dir: opendir");
        struct dirent *ent;
        while ((ent = readdir(dir)) != NULL)
        {
            if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
                continue;
            // build "name/entry"; unlike strcpy/strcat, snprintf cannot overflow n
            snprintf(n, sizeof n, "%s/%s", name, ent->d_name);
            store_file(fout, n);
        }
        closedir(dir);
    }
     
    void write_file(FILE *fout, const char *name)
    {
        char buf[BUFSIZ];
        FILE *fin = fopen(name, "rb");
        if (!fin) die("write_file");
        for (size_t n; (n = fread(buf, 1, sizeof buf, fin)) > 0; )
            fwrite(buf, 1, n, fout);
        fclose(fin);
    }
     
    void list_tarfile(const char *tarfile)
    {
        FILE *fin = fopen(tarfile, "rb");
        if (!fin) die("list_tarfile");
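        for (uint32_t name_size; fread(&name_size, sizeof name_size, 1, fin) == 1; )
        {
            char name[FILENAME_MAX];
            // the name (plus its '\0') must fit before it is read into the buffer
            if (name_size == 0 || name_size >= sizeof name ||
                fread(name, 1, name_size, fin) != name_size)
            {
                fprintf(stderr, "list_tarfile: corrupt archive\n");
                exit(EXIT_FAILURE);
            }
            name[name_size] = '\0';
            printf("%s\n", name);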
            printf("%s\n", name);
            uint64_t size;
            fread(&size, sizeof size, 1, fin);
            fseek(fin, size, SEEK_CUR);
        }
        fclose(fin);
    }
     
    void extract_tarfile(const char *tarfile)
    {
        FILE *fin = fopen(tarfile, "rb");
        if (!fin) die("extract_tarfile");
        char name[FILENAME_MAX];
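        for (uint32_t name_size; fread(&name_size, sizeof name_size, 1, fin) == 1; )
        {
            // reject names that are empty or would overflow the buffer
            if (name_size == 0 || name_size >= sizeof name ||
                fread(name, 1, name_size, fin) != name_size)
            {
                fprintf(stderr, "extract_tarfile: corrupt archive\n");
                exit(EXIT_FAILURE);
            }
            name[name_size] = '\0';
            uint64_t size; // must match the uint64_t written by store_file (off_t may be 32-bit)
            fread(&size, sizeof size, 1, fin);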
            if (name[strlen(name) - 1] == '/')
                mkdir(name, S_IRWXU | S_IRWXG | S_IROTH | S_IXOTH);
            else
                extract_file(fin, name, size);
        }
        fclose(fin);
    }
     
    void extract_file(FILE *fin, const char *name, uint64_t size)
    {
        FILE *fout = fopen(name, "wb");
        if (!fout) die("extract_tarfile");
        char buf[BUFSIZ];
        const uint64_t bsz = sizeof buf;
        for (uint64_t n;
             (n = fread(buf, 1, size > bsz ? bsz : size, fin)) > 0;
             size -= n)
            fwrite(buf, 1, n, fout);
        fclose(fout);
    }
     
    void die(const char *msg)
    {
        perror(msg);
        exit(EXIT_FAILURE);
    }
     
    void usage(const char *progname)
    {
        fprintf(stderr, "Usage: Create:  %s c TARFILE FILE...\n", progname);
        fprintf(stderr, "       List:    %s t TARFILE\n", progname);
        fprintf(stderr, "       Extract: %s x TARFILE\n", progname);
        exit(EXIT_FAILURE);
    }
    A little inaccuracy saves tons of explanation. - H.H. Munro
