Thread: Working with LZW data compression algorithm

  1. #1
    sparklezilla3
    Join Date
    Mar 2010
    Posts
    41

    Working with LZW data compression algorithm

    I have been working with the lzw data compression algorithm in which i need to modify to do a few things.

    First what I need to do is have in the command line to do a "-r" which resets the dictionary.

    I created the char variable for reset, and set up the if statement, however i think it sends the code into a forever while loop.
    So I think I placed the "if" statement in the wrong spot.

    Below is the code:
    I also need to implement a flag that clears the dictionary completely if the dictionary becomes full. I created the flag as char command
    and initialized it to take the 3rd argument from the command line.

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    #define BITS 12                   /* Setting the number of bits to 12, 13*/
    #define HASHING_SHIFT (BITS-8)    /* or 14 affects several constants.    */
    #define MAX_VALUE (1 << BITS) - 1 /* Note that MS-DOS machines need to   */
    #define MAX_CODE MAX_VALUE - 1    /* compile their code in large model if*/
                                      /* 14 bits are selected.               */
    #if BITS == 16
      #define TABLE_SIZE 99991
    #endif
    #if BITS == 14
      #define TABLE_SIZE 18041        /* The string table size needs to be a */
    #endif                            /* prime number that is somewhat larger*/
    #if BITS == 13                    /* than 2**BITS.                       */
      #define TABLE_SIZE 9029
    #endif
    #if BITS <= 12
      #define TABLE_SIZE 5021
    #endif
    
    void *malloc();
    
    int *code_value;                  /* This is the code value array        */
    unsigned int *prefix_code;        /* This array holds the prefix codes   */
    unsigned char *append_character;  /* This array holds the appended chars */
    unsigned char decode_stack[4000]; /* This array holds the decoded string */
    
    /*
     * Forward declarations
     */
    void compress(FILE *input,FILE *output);
    void expand(FILE *input,FILE *output);
    int find_match(int hash_prefix,unsigned int hash_character);
    void output_code(FILE *output,unsigned int code);
    unsigned int input_code(FILE *input);
    unsigned char *decode_string(unsigned char *buffer,unsigned int code);
    
    /********************************************************************
    **
    ** This program gets a file name from the command line.  It compresses the
    ** file, placing its output in a file named test.lzw.  It then expands
    ** test.lzw into test.out.  Test.out should then be an exact duplicate of
    ** the input file.
    **
    *************************************************************************/
    
    main(int argc, char *argv[])
    {
    FILE *input_file;
    FILE *output_file;
    FILE *lzw_file;
    char input_file_name[81];
    char command;
    
    command=(argv==3);
    
    /*
    **  The three buffers are needed for the compression phase.
    */
      code_value=(int*)malloc(TABLE_SIZE*sizeof(int));
      prefix_code=(unsigned int *)malloc(TABLE_SIZE*sizeof(unsigned int));
      append_character=(unsigned char *)malloc(TABLE_SIZE*sizeof(unsigned char));
      if (code_value==NULL || prefix_code==NULL || append_character==NULL)
      {
        printf("Fatal error allocating table space!\n");
        exit(-1);
      }
    /*
    ** Get the file name, open it up, and open up the lzw output file.
    */
      if (argc>1)
        strcpy(input_file_name,argv[1]);
      else
      {
        printf("Input file name? ");
        scanf("%s",input_file_name);
      }
      input_file=fopen(input_file_name,"rb");
      lzw_file=fopen("test.lzw","wb");
      if (input_file==NULL || lzw_file==NULL)
      {
        printf("Fatal error opening files.\n");
        exit(-1);
      };
    /*
    ** Compress the file.
    */
    if(command=='r')
    {
      compress(input_file,lzw_file);
    }
      fclose(input_file);
      fclose(lzw_file);
      free(c-ode_value);
    /*
    ** Now open the files for the expansion.
    */
      lzw_file=fopen("test.lzw","rb");
      output_file=fopen("test.out","wb");
      if (lzw_file==NULL || output_file==NULL)
      {
        printf("Fatal error opening files.\n");
        exit(-2);
      };
    /*
    ** Expand the file.
    */
      expand(lzw_file,output_file);
      fclose(lzw_file);
      fclose(output_file);
    
      free(prefix_code);
      free(append_character);
    }
    
    /*
    ** This is the compression routine.  The code should be a fairly close
    ** match to the algorithm accompanying the article.
    **
    */
    
    void compress(FILE *input,FILE *output)
    {
    unsigned int next_code;
    unsigned int character;
    unsigned int string_code;
    unsigned int index;
    int i;
    
      next_code=256;              /* Next code is the next available string code*/
      for (i=0;i<TABLE_SIZE;i++)  /* Clear out the string table before starting */
        code_value[i]=-1;
    
      i=0;
      printf("Compressing...\n");
      string_code=getc(input);    /* Get the first code                         */
    /*
    ** This is the main loop where it all happens.  This loop runs util all of
    ** the input has been exhausted.  Note that it stops adding codes to the
    ** table after all of the possible codes have been defined.
    */
      while ((character=getc(input)) != (unsigned)EOF)
      {
        if (++i==1000)                         /* Print a * every 1000    */
        {                                      /* input characters.  This */
          i=0;                                 /* is just a pacifier.     */
          printf("*");
        }
        index=find_match(string_code,character);/* See if the string is in */
        if (code_value[index] != -1)            /* the table.  If it is,   */
          string_code=code_value[index];        /* get the code value.  If */
        else                                    /* the string is not in the*/
        {                                       /* table, try to add it.   */
          if (next_code <= MAX_CODE)
          {
            code_value[index]=next_code++;
            prefix_code[index]=string_code;
            append_character[index]=character;
          }
          output_code(output,string_code);  /* When a string is found  */
          string_code=character;            /* that is not in the table*/
        }                                   /* I output the last string*/
      }                                     /* after adding the new one*/
    /*
    ** End of the main loop.
    */
      output_code(output,string_code); /* Output the last code               */
      output_code(output,MAX_VALUE);   /* Output the end of buffer code      */
      output_code(output,0);           /* This code flushes the output buffer*/
      printf("\n");
    }
    
    /*
    ** This is the hashing routine.  It tries to find a match for the prefix+char
    ** string in the string table.  If it finds it, the index is returned.  If
    ** the string is not found, the first available index in the string table is
    ** returned instead.
    */
    
    int find_match(int hash_prefix,unsigned int hash_character)
    {
    int index;
    int offset;
    
      index = (hash_character << HASHING_SHIFT) ^ hash_prefix;
      if (index == 0)
        offset = 1;
      else
        offset = TABLE_SIZE - index;
      while (1)
      {
        if (code_value[index] == -1)
          return(index);
        if (prefix_code[index] == hash_prefix && 
            append_character[index] == hash_character)
          return(index);
        index -= offset;
        if (index < 0)
          index += TABLE_SIZE;
      }
    }
    
    /*
    **  This is the expansion routine.  It takes an LZW format file, and expands
    **  it to an output file.  The code here should be a fairly close match to
    **  the algorithm in the accompanying article.
    */
    
    void expand(FILE *input,FILE *output)
    {
    unsigned int next_code;
    unsigned int new_code;
    unsigned int old_code;
    int character;
    int counter;
    unsigned char *string;
    
      next_code=256;           /* This is the next available code to define */
      counter=0;               /* Counter is used as a pacifier.            */
      printf("Expanding...\n");
    
      old_code=input_code(input);  /* Read in the first code, initialize the */
      character=old_code;          /* character variable, and send the first */
      putc(old_code,output);       /* code to the output file                */
    /*
    **  This is the main expansion loop.  It reads in characters from the LZW file
    **  until it sees the special code used to inidicate the end of the data.
    */
      while ((new_code=input_code(input)) != (MAX_VALUE))
      {
        if (++counter==1000)   /* This section of code prints out     */
        {                      /* an asterisk every 1000 characters   */
          counter=0;           /* It is just a pacifier.              */
          printf("*");
        }
    /*
    ** This code checks for the special STRING+CHARACTER+STRING+CHARACTER+STRING
    ** case which generates an undefined code.  It handles it by decoding
    ** the last code, and adding a single character to the end of the decode string.
    */
        if (new_code>=next_code)
        {
          *decode_stack=character;
          string=decode_string(decode_stack+1,old_code);
        }
    /*
    ** Otherwise we do a straight decode of the new code.
    */
        else
          string=decode_string(decode_stack,new_code);
    /*
    ** Now we output the decoded string in reverse order.
    */
        character=*string;
        while (string >= decode_stack)
          putc(*string--,output);
    /*
    ** Finally, if possible, add a new code to the string table.
    */
        if (next_code <= MAX_CODE)
        {
          prefix_code[next_code]=old_code;
          append_character[next_code]=character;
          next_code++;
        }
        old_code=new_code;
      }
      printf("\n");
    }
    
    /*
    ** This routine simply decodes a string from the string table, storing
    ** it in a buffer.  The buffer can then be output in reverse order by
    ** the expansion program.
    */
    
    unsigned char *decode_string(unsigned char *buffer,unsigned int code)
    {
    int i;
    
      i=0;
      while (code > 255)
      {
        *buffer++ = append_character/code/;
        code=prefix_code/code/;
        if (i++>=MAX_CODE)
        {
          printf("Fatal error during code expansion.\n");
          exit(-3);
        }
      }
      *buffer=code;
      return(buffer);
    }
    
    /*
    ** The following two routines are used to output variable length
    ** codes.  They are written strictly for clarity, and are not
    ** particularyl efficient.
    */
    
    unsigned int input_code(FILE *input)
    {
    unsigned int return_value;
    static int input_bit_count=0;
    static unsigned long input_bit_buffer=0L;
    
      while (input_bit_count <= 24)
      {
        input_bit_buffer |= 
            (unsigned long) getc(input) << (24-input_bit_count);
        input_bit_count += 8;
      }
      return_value=input_bit_buffer >> (32-BITS);
      input_bit_buffer <<= BITS;
      input_bit_count -= BITS;
      return(return_value);
    }
    
    void output_code(FILE *output,unsigned int code)
    {
    static int output_bit_count=0;
    static unsigned long output_bit_buffer=0L;
    
      output_bit_buffer |= (unsigned long) code << (32-BITS-output_bit_count);
      output_bit_count += BITS;
      while (output_bit_count >= 8)
      {
        putc(output_bit_buffer >> 24,output);
        output_bit_buffer <<= 8;
        output_bit_count -= 8;
      }
    }
    Last edited by husslela2; 06-30-2010 at 02:48 PM.

  2. #2
    Registered User
    Join Date
    Sep 2007
    Posts
    1,012
    Code:
    void *malloc();
    Don't do this. Any non-broken, relatively modern (as in within the last 15-20 years) system will properly prototype malloc() for you.
    Code:
    command=(argv==3);
    ...
    if(command=='r')
    Your compiler should give a diagnostic for the code argv==3. You can't compare a char** to an integer like that. I don't really know what you're trying to do. But if you want to compare strings, the arguments to your program will be argv[0], argv[1], argv[2] (with argv[0] typically being the program name itself).

    Before modifying a program like this, I'd recommend familiarizing yourself with the basics of C.

  3. #3
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    husslela2, did you realise that you haven't actually asked a question?
    Do you want people to just come along and look at that code and go uh huh uh huh, close this tab and then move on, or do you have something you want us to help with?

    You'd best paste code that actually compiles first, if possible.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  4. #4
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    uh huh uh huh
    *closes tab*
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Compiling C in Visual Studio 2005
    By emanresu in forum C Programming
    Replies: 3
    Last Post: 11-16-2009, 04:25 AM
  2. Ziv-Lempel data compression algorithm
    By ataman in forum C++ Programming
    Replies: 1
    Last Post: 05-11-2008, 03:41 PM
  3. Simple compression algorithm?
    By cyberfish in forum Tech Board
    Replies: 8
    Last Post: 05-02-2008, 09:10 AM
  4. Bitmasking Problem
    By mike_g in forum C++ Programming
    Replies: 13
    Last Post: 11-08-2007, 12:24 AM
  5. gcc problem
    By bjdea1 in forum Linux Programming
    Replies: 13
    Last Post: 04-29-2002, 06:51 PM