Thread: Writing array, to file

  1. #1
    Registered User
    Join Date
    Aug 2007
    Posts
    4

    Writing array, to file

    Hi,

    could someone please point me in the right direction here.

    Basically what i want to do it have a very fast key to value (int32=>int32) lookup which can be written to and stored in a binary file. I think similar to writing a hash table to a file.

    I'm not really sure how to explain it, but i would like to do something like this (i know this won't work)

    Code:
      
         int my_array[3];
    
         my_array[1] = 34;
         my_array[284] = 848;
         my_array[any_random_number] = 90;
    
         fwrite(&my_array, sizeof(int)*3, file);
    Code:
     
        /* Now Read Array Back */
    
        int array_read[3];
        fread(&array_read, sizeof(int)*3, file);
        print("array_read[284] has a value of 848);

    Does anyone know how this could be achieved? Thanks
    Last edited by zootreeves; 09-07-2007 at 06:58 AM.

  2. #2
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    First, on an array that has 3 as size, you quite obviously won't be able to use indices beyond three, so my_array[248] - that's going to access some other number.

    If you want to look up 248, then your array must be at least 249 in size [becuase elements are 0..248 -> 249 elements].

    But you can write a section at a time or the whole array to file using fwrite(), and read it back again with fread().

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  3. #3
    Registered User
    Join Date
    Aug 2007
    Posts
    4
    Thanks for the response, yes i'm aware you cannot put keys in larger than the array size, but obviously i don't want to have to make the array 248 length in size just to insert a value of 248 have the rest 247 values empty and wasting memory.

    I wrote my program using JUDY arrays (JUDYL), but I've found that at some point i'm going to have to store the values in them. I know i could do it by writing each Key=Value into a file then when they are needed read them each individually and insert them into a newly constructed Judy array/hash table then perform my lookups, but reading into a new structure is too slow.

    I need to know which data structure/if any binary tree/hash table/judy array/associative array could be modified to do this and if there any libraries that can do this already?
    Last edited by zootreeves; 09-07-2007 at 08:59 AM.

  4. #4
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    I don't suppose there is any chance of you doing this in C++ is there?

    In the first instance, you could use a std::map to do the work. Then later on you could implement your own class with additional internal properties, and overload the [ ] operator to do what you're currently doing.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  5. #5
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by zootreeves View Post
    Thanks for the response, yes i'm aware you cannot put keys in larger than the array size, but obviously i don't want to have to make the array 248 length in size just to insert a value of 248 have the rest 247 values empty and wasting memory.

    I wrote my program using JUDY arrays (JUDYL), but I've found that at some point i'm going to have to store the values in them. I know i could do it by writing each Key=Value into a file then when they are needed read them each individually and insert them into a newly constructed Judy array/hash table then perform my lookups, but reading into a new structure is too slow.

    I need to know which data structure/if any binary tree/hash table/judy array/associative array could be modified to do this and if there any libraries that can do this already?
    Aside from Salem's suggestion of using C++, there is really no way you can use indices that are beyond the limit of an array [sure you can use indices, but the effect of doing so is definitely undefined, and most likely will cause bad things with your application].

    What I would suggest is to use some sort of function to insert and retrieve data from your data-storage.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  6. #6
    Registered User
    Join Date
    Aug 2007
    Posts
    4
    Thanks for your help, C++ isn't really an option since i can't seem to understand it for the life of me. But I've got it working with a hash table anyway. The too main things i found were.

    1. Realloc one pointer only and assign addresses within that space, this way you know it's contiguous and so can be written with fwrite by supplying only one pointer.
    2. The Values in the table should contain a pointer to the linked list which is relative to the address of the first pointer. So when you read it back from the file you can find them just by knowing the first address

    Sounds more complicated than it is i think. Here it is anyway in case someone of there is trying to to a similar thing

    Code:
    #include <stdlib.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include "hashtable.h"
    
    typedef struct H* HashTable;
    HashTable createHashTable(unsigned int capacity);
    void addItem(HashTable table, int item, int value);
    int containsItem(HashTable table, int item);
    void destroyHashTable(HashTable table);
    
    struct Node;
    typedef struct Node* Link;
    
    struct Node {
        int item;
        int value;
        Link next;
    };
    
    
    struct H {
        Link* data;
        unsigned int capacity;
        unsigned int bytes;
    };
    
    /*
     * Allocates a new node to hold the given initial values, and
     * returns a pointer to that node.  Returns NULL if the node
     * could not be allocated.
     */
    Link newNode(int item, int value, Link next, HashTable table) {
        printf("New node Address of table &#37;d, bytes %d\n", table, table->bytes);
        table = realloc(table, table->bytes + sizeof(struct Node));
        int address = table;
        Link result = address + table->bytes; //calloc(1, sizeof(struct Node));
        int rel_address = table->bytes;
        printf("New node Address of link %d, bytes %d\n", result, table->bytes);
        printf("Relative adddress %d of item %d should make an address of %d\n", rel_address, item, address + rel_address);
        table->bytes = table->bytes + sizeof(struct Node);
    
        if (result == NULL) {
    	printf("Realloc failed\n"); 
           return NULL;
        }
    
        result->item = item;
        result->value = value;
    
        printf("Address of next %d\n", next);
        result->next = next;
        return rel_address;
    }
    
    /*
     * Creates a hashtable with the given capacity and hash function.
     * Returns a pointer to the newly created table, or NULL if the
     * table could not be created.
     */
    HashTable createHashTable(unsigned int capacity) {
        HashTable table = (HashTable)calloc(1, sizeof(struct H));
        table->bytes = sizeof(struct H);
        printf("Address of table %d, bytes %d\n", table, table->bytes);
        if (table == NULL) return NULL;
        table = (Link*)realloc(table, table->bytes + (sizeof(void *) * capacity));
        printf("Address of table after realloc %d, bytes %d\n", table, table->bytes);
        int address = table;
        printf("Predicted Address of table->data %d\n", address + table->bytes);
        table->data = address + table->bytes;
        printf("Address of table->data %d\n", table->data);
        table->bytes = table->bytes + (sizeof(void *) * capacity);
        if (table->data == NULL) return NULL;
        table->capacity = capacity;
        return table;
    }
    
    /*
     * Adds the given item to the table, unless the item is already in
     * the table, in which case it does nothing.
     */
    void addItem(HashTable table, int item, int value) {
        printf("Ok\n");
    
        /* Which bucket does the item hash to? */
        int index = item % table->capacity;
    
        /* Search the linked list - if already there, bail! */
        int rel_address = table->data[index];
        int address = table;
        printf("Table Address %d, Rel Address %d\n", address, rel_address);
        Link p = address + rel_address;
    
        int pnexta = rel_address;
        printf("Address of p->next %d\n", pnexta + address);
        while (pnexta != NULL) {
    	printf("Testing Rel address %d\n", pnexta);
            p = pnexta + address;
    	printf("%d == %d\n", p->item, item);
            if (p->item == item) {
    	    printf("Already in index\n");
                return;
            }
    	pnexta = p->next;
        }
    
        /* Didn't find it in the list - add it! */
        printf("Inserting %d into index\n", index);
        table->data[index] = newNode(item, value, table->data[index], table);
    }
    
    /*
     * Returns 1 (true) if item is in table, otherwise returns 0 (false).
     */
    int containsItem(HashTable table, int item) {
    
        /* Which bucket does the item hash to? */
        int index = item % table->capacity;
    
    
        int rel_address = table->data[index];
    
        printf("Relative address of lookup = %d\n", rel_address);
        int address = table;
        Link p = address + rel_address;
        printf("Real address of lookup = %d\n",  p);
    
        /* Search the linked list */
        int pnexta = rel_address;
        printf("Address of p->next %d\n", pnexta + address);
        while (pnexta != NULL) {
    	printf("Testing Rel address %d\n", pnexta);
            p = pnexta + address;
    	printf("%d == %d\n", p->item, item);
            if (p->item == item) {
    	    printf("Returning value %d\n", p->value);
                return p->value;
            }
    	pnexta = p->next;
        }
    
        /* Didn't find it in the list */
        return 0;
    }
    
    /*
     * Cleans up any memory allocated when we created the hashtable.
     */
    void destroyHashTable(HashTable table) {
    
        free(table);
    }
    
    
    int main () {
    
       HashTable table = createHashTable(100);
       addItem(table, 560, 34);
       addItem(table, 660, 12);
      // addItem(table, 570);
      // addItem(table, 550023);
    
     
        int bytes_to_read = table->bytes; //Cheat, should read table first to get the number of 
                                                           // bytes then read the rest, but i can't be bothered at the moment
    
        FILE * stream = NULL;
        if ((stream = fopen("./test", "wb+")) == NULL) {
          printf("Could not open file for writing\n");
          exit(1);
        } else {
          fwrite(table, table->bytes, 1, stream);
          fclose(stream);
        }
    
       destroyHashTable(table);
    
       /* Test reading it back */
    
        stream = NULL;
        if ((stream = fopen("./test", "rb+")) == NULL) {
          printf("Could not open file for reading\n");
          exit(1);
        } else {
          HashTable new_table = malloc(bytes_to_read);
          fread(new_table, bytes_to_read, 1, stream);
         
    	/*test search */
    	   int value = 0;
       	   if ((value = containsItem(new_table, 560)) != NULL) {
    	      printf("%d found\n", value);
               } else {
    	        printf("not found\n");
               }
          fclose(stream);
        }
    
    
    
    }

  7. #7
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Have you considered what happens if realloc() returns NULL ?
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  8. #8
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    Code:
    typedef struct H* HashTable;
    HashTable table;
    
    int address = table;
    Storing an address in an int is a bad idea. I'd use a void pointer instead.

    But why do you need to store the address anyway?
    Code:
    Link newNode(int item, int value, Link next, HashTable table) {
        printf("New node Address of table %d, bytes %d\n", table, table->bytes);
        table = realloc(table, table->bytes + sizeof(struct Node));
        /* int address = table; */ 
        Link result = table + table->bytes; //calloc(1, sizeof(struct Node));
        int rel_address = table->bytes;
        printf("New node Address of link %d, bytes %d\n", result, table->bytes);
        printf("Relative adddress %d of item %d should make an address of %d\n", rel_address, item, table + rel_address);
        table->bytes = table->bytes + sizeof(struct Node);
    
        if (result == NULL) {
    	printf("Realloc failed\n"); 
           return NULL;
        }
    
        result->item = item;
        result->value = value;
    
        printf("Address of next %d\n", next);
        result->next = next;
        return rel_address;
    }
    I'd also make rel_address and the return value of the function and anywhere you use an address size_ts instead of ints.
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell


    Other boards: DaniWeb, TPS
    Unofficial Wiki FAQ: cpwiki.sf.net

    My website: http://dwks.theprogrammingsite.com/
    Projects: codeform, xuni, atlantis, nort, etc.

  9. #9
    Registered User
    Join Date
    Aug 2007
    Posts
    4
    Thanks for the suggestions, the pointer to table->data also needs ot be relative as well. cleaner version:

    Code:
    #include <stdlib.h>
    #include <stdio.h>
    #include <sys/stat.h>
    
    typedef struct H HashTable;
    HashTable * createHashTable(unsigned int capacity);
    void addItem(HashTable * table, int item, int value);
    int containsItem(HashTable * table, int item);
    void destroyHashTable(HashTable * table);
    
    struct Node;
    typedef struct Node Link;
    
    struct Node {
        int item;
        int value;
        unsigned int next;
    };
    
    struct H {
        unsigned int bytes;
        unsigned int capacity;
        unsigned int * data;
    };
    
    /*
     * Allocates a new node to hold the given initial values, and
     * returns a pointer to that node.  Returns NULL if the node
     * could not be allocated.
     */
    unsigned int newNode(int item, int value, unsigned int next, HashTable * table) {
        void * tmp_pointer = realloc(table, table->bytes + sizeof(struct Node));
    	if (tmp_pointer == NULL) {
    	  printf("Realloc failed, out of Contiguous memory, cannot continue\n");
    	  free(table);
    	  return NULL;
            } else {
    	  table = tmp_pointer;
    	}
    
    
        long unsigned int rel_address = table->bytes;
        Link * result = (void *)table + rel_address; //calloc(1, sizeof(struct Node));
        table->bytes = rel_address + sizeof(struct Node);
    
        result->item = item;
        result->value = value;
        result->next = next;
        return rel_address;
    }
    
    /*
     * Creates a hashtable with the given capacity and hash function.
     * Returns a pointer to the newly created table, or NULL if the
     * table could not be created.
     */
    HashTable * createHashTable(unsigned int capacity) {
        HashTable * table = malloc(sizeof(struct H));
    
        if (table == NULL) {
          return NULL;
        }
    
        /* Move Bytes along */
        table->bytes = sizeof(struct H);
    
        int empty_space = sizeof(unsigned int) * capacity;
        void * tmp_pointer = realloc(table, table->bytes + empty_space);
    	if (tmp_pointer == NULL) {
    	  printf("Realloc failed, out of Contiguous memory, cannot continue\n");
    	  free(table);
    	  return NULL;
            } else {
    	  table = tmp_pointer;
    	}
    
    
        table->data = table->bytes;
        bzero((unsigned int)table + (unsigned int)table->data, empty_space); 
        table->bytes = table->bytes + empty_space;
        table->capacity = capacity;
        return table;
    }
    
    /*
     * Adds the given item to the table, unless the item is already in
     * the table, in which case it does nothing.
     */
    void addItem(HashTable * table, int item, int value) {
    
       
        unsigned int * data = (unsigned int)table->data + (unsigned int)table;
    
        /* Which bucket does the item hash to? */
        int index = item &#37; table->capacity;
    
        /* Search the linked list - if already there, bail! */
        long unsigned int rel_address = data[index];
        Link * p = (void *)table + rel_address;
    
        long unsigned int pnexta = rel_address;
        while (pnexta != NULL) {
            p = pnexta + (void *)table;
            if (p->item == item) {
    	    printf("Already in index\n");
                return;
            }
    	pnexta = p->next;
        }
    
    
        /* Didn't find it in the list - add it! */
        data[index] = newNode(item, value, data[index], table);
    }
    
    /*
     * Returns 1 (true) if item is in table, otherwise returns 0 (false).
     */
    int containsItem(HashTable * table, int item) {
        /* Which bucket does the item hash to? */
        unsigned int index = item % table->capacity;
    
        unsigned int * data = (unsigned int)table->data + (unsigned int)table;
        int rel_address = data[index];
        int address = table;
        Link * p = address + rel_address;
    
        /* Search the linked list */
        unsigned int pnexta = rel_address;
        while (pnexta != NULL) {
            p = pnexta + address;
            if (p->item == item) {
                return p->value;
            }
    	pnexta = p->next;
        }
    
        /* Didn't find it in the list */
        return 0;
    }
    
    /*
     * Cleans up any memory allocated when we created the hashtable.
     */
    void destroyHashTable(HashTable * table) {
    
        free(table);
    }
    
    
    int main () {
    
       HashTable * table = createHashTable(10);
       addItem(table, 8493, 60);
       addItem(table, 6876, 12);
    
        FILE * stream = NULL;
        if ((stream = fopen("./test", "wb+")) == NULL) {
          printf("Could not open file for writing\n");
          exit(1);
        } else {
          fwrite(table, table->bytes, 1, stream);
          fclose(stream);
        }
    
       destroyHashTable(table);
    
       /* Test reading it back */
        stream = NULL;
        if ((stream = fopen("./test", "rb+")) == NULL) {
          printf("Could not open file for reading\n");
          exit(1);
        } else {
          HashTable * new_table = malloc(sizeof(struct H));
          fread(new_table, sizeof(struct H), 1, stream);
          
          new_table = realloc(new_table, new_table->bytes);
          fread((void *)new_table + sizeof(struct H), new_table->bytes - sizeof(struct H), 1, stream);
         
    	/*test search */
    	   int value = 0;
       	   if ((value = containsItem(new_table, 8493)) != NULL) {
    	      printf("%d found\n", value);
               } else {
    	      printf("not found\n");
               }
          fclose(stream);
        }
    
    
    
    }

  10. #10
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    Code:
    /*
     * Allocates a new node to hold the given initial values, and
     * returns a pointer to that node.  Returns NULL if the node
     * could not be allocated.
     */
    unsigned int newNode(int item, int value, unsigned int next, HashTable * table) {
        void * tmp_pointer = realloc(table, table->bytes + sizeof(struct Node));
    	if (tmp_pointer == NULL) {
    	  printf("Realloc failed, out of Contiguous memory, cannot continue\n");
    	  free(table);
    	  return NULL;
            } else {
    	  table = tmp_pointer;
    	}
    
    
        long unsigned int rel_address = table->bytes;
        Link * result = (void *)table + rel_address; //calloc(1, sizeof(struct Node));
        table->bytes = rel_address + sizeof(struct Node);
    
        result->item = item;
        result->value = value;
        result->next = next;
        return rel_address;
    }
    rel_address is an unsigned long, whereas the function returns an unsigned int. I'd probably make them the same. And I'd use size_t instead of unsigned long or unsigned int, but that's just because I like size_t.

    Code:
    HashTable * table = malloc(sizeof(struct H));
    It's usually better to use the idiom
    Code:
    type *p = malloc(sizeof(*p));
    than a hard-coded type like int or struct H. They're both compile-time constants, so there's no problem there. But the sizeof(*p) version works for any type of p, and makes changing the type of p easier. In your case the sizeof(*p) version would be
    Code:
    HashTable * table = malloc(sizeof(*table));
    or
    Code:
    HashTable * table = malloc(sizeof *table);
    Ditto for pretty much everywhere you use sizeof().

    Code:
    fread((void *)new_table + sizeof(struct H), new_table->bytes - sizeof(struct H), 1, stream);
    There's no need to cast new_table to a void*. I think you could just leave out the cast. Or you could leave it in if you feel it makes it more readable or something.
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell


    Other boards: DaniWeb, TPS
    Unofficial Wiki FAQ: cpwiki.sf.net

    My website: http://dwks.theprogrammingsite.com/
    Projects: codeform, xuni, atlantis, nort, etc.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Formatting a text file...
    By dagorsul in forum C Programming
    Replies: 12
    Last Post: 05-02-2008, 03:53 AM
  2. Writing an array to a file.
    By Queatrix in forum C++ Programming
    Replies: 8
    Last Post: 04-24-2005, 01:41 PM
  3. writing data to a file from an array of pointers
    By Mingzhi in forum C++ Programming
    Replies: 1
    Last Post: 07-19-2004, 09:07 AM
  4. writing contents of array to file
    By Unregistered in forum C Programming
    Replies: 4
    Last Post: 06-26-2002, 04:06 PM
  5. File I/O problems!!! Help!!!
    By Unregistered in forum C Programming
    Replies: 4
    Last Post: 05-17-2002, 08:09 PM