Thread: Casting a struct to char**, or abandon the struct and use char**?

  1. #1
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485

    Casting a struct to char**, or abandon the struct and use char**?

    Sorry for the cryptic title.

    I have a program that reads csv files and adds each row to a data structure. Right now the data structure is separated from its content, which is handled by its own source file. That file knows about the content, declares a struct, parses a csv line, etc.

    What I want to do now is to find a way to easily change this content-specific file, for example by only creating a new struct that reflects a csv row. However, each function depends on the struct being exactly the way it currently is.

    Is it a viable solution to cast the struct to a char** array (the struct contains char* only) and use pragmas or __attribute__ to make sure it's packed, or is it better to abandon the struct and only use a char** array, and with it lose the named members that a struct has?

  2. #2
    Registered User
    Join Date
    May 2010
    Posts
    4,632
    What I want to do now is to find a way to easily change this content-specific file, for example by only creating a new struct that reflects a csv row. However, each function depends on the struct being exactly the way it currently is.
    What functions are you talking about? Functions in the "content specific file", or some other functions? How are you passing this structure from your "content specific file" to the rest of your program. Depending on how you are using the structure you may be able to place the "new" information at the end of your structure. Then only functions designed to use this "new" information will try to access the "new" data.

    Jim

  3. #3
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    Quote Originally Posted by jimblumberg View Post
    What functions are you talking about? Functions in the "content specific file"
    Yes, although the aim is to have the content-specific file be nothing more than a header that declares a struct. The parsing of csv and so on would be separated from the struct. The problem right now is that the parsing functions, memory allocations (for struct members) and so on depend on the struct. If I can make all these functions take a void*, assign that to a char** inside the function, and get the number of members with sizeof(struct s) / sizeof(char*), that would solve it. I just don't know if it's considered ok, or if there is perhaps a different way to approach it.



    Quote Originally Posted by jimblumberg View Post
    How are you passing this structure from your "content specific file" to the rest of your program. Depending on how you are using the structure you may be able to place the "new" information at the end of your structure. Then only functions designed to use this "new" information will try to access the "new" data.

    Jim
    The struct is #include'd in the files that will be using it.

    Edit:

    Reading back what I just wrote, I realise that it might not make much sense at all. I basically want to see how far it's possible to push decoupling. Together these files would read and parse csv and build a data structure in memory.

    I want to see if I can adapt the functions to any csv file by only changing a header file that declares a struct.
    Last edited by Subsonics; 02-15-2012 at 11:45 AM.

  4. #4
    Registered User
    Join Date
    May 2010
    Posts
    4,632
    I want to see if I can adapt the functions to any csv file by only changing a header file that declares a struct.
    Why can't you just pass the correct structure into the function? This structure could have a different number and type of elements depending on the format of the csv file. You would have two structures: one to hold the data from the csv file, and another that acts as the "header". The "header" structure would describe the record: the number of fields, plus an array giving the type of each field and possibly its size.

    Something like:
    Code:
    // Header structures.
    
    struct field_info
    {
       char field_type[3]; // c,s,i,ui,l, ul,f,d and any other types needed.
       size_t field_size;   // Probably only needed for strings.
    };
    
    #define NUM_FIELDS 10
    struct record
    {
       const size_t num_fields;  // Number of fields per line.
       struct field_info fields[NUM_FIELDS];  // One descriptor per field.
    };
    
    // Your data structure below. Contains NUM_FIELDS fields, of the type described in the field_info structure.
    You would have some kind of "initialization" function that will fill in the values in the record structure. So you would need both a small source file along with the header file. You could even read the required information for the header information from a file if so desired.
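
    The "initialization" function mentioned above could look roughly like the sketch below. It is only an illustration under assumptions: the names record_layout and record_layout_init, the four-column NUM_FIELDS, and the "s,s,ui,ui" spec string format are all hypothetical, and num_fields is left non-const here so that it can be filled in at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_FIELDS 4            /* assumption: a four-column csv file */

struct field_info {
    char field_type[3];         /* "s", "ui", ... as in the post above */
    size_t field_size;          /* probably only needed for strings */
};

struct record_layout {          /* hypothetical name for the "header" struct */
    size_t num_fields;          /* non-const so it can be set at run time */
    struct field_info fields[NUM_FIELDS];
};

/* Fill the layout from a comma-separated type list such as "s,s,ui,ui".
 * Returns 0 on success, -1 if the spec is empty, has an empty or
 * over-long type code, or lists more than NUM_FIELDS fields. */
int record_layout_init(struct record_layout *layout, const char *spec)
{
    assert(layout != NULL && spec != NULL);
    layout->num_fields = 0;

    while (*spec != '\0') {
        size_t len = strcspn(spec, ",");
        if (len == 0 || len >= sizeof layout->fields[0].field_type ||
            layout->num_fields == NUM_FIELDS)
            return -1;

        struct field_info *f = &layout->fields[layout->num_fields++];
        memcpy(f->field_type, spec, len);
        f->field_type[len] = '\0';
        f->field_size = 0;      /* unknown until data has been read */

        spec += len;
        if (*spec == ',') {
            spec++;
            if (*spec == '\0')
                return -1;      /* trailing comma */
        }
    }
    return layout->num_fields > 0 ? 0 : -1;
}
```

    The spec string could just as well be read from a file, as suggested above; the parsing code would then consult layout->fields[i].field_type to decide how to convert column i.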

    Jim

  5. #5
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    Hm, interesting, but wouldn't that make the struct members anonymous, in the same way that an array would (if that makes sense)? I might not completely understand what you meant though.

    Here are two mock-up examples:

    Code:
    // declared in a separate header
    struct record {
       char *name;
       char *address;
    };
    Option 1:

    Code:
    struct record *parse_csv_row(char *line) {
       struct record *r = calloc(1, sizeof(struct record));
    
       // using strsep and strdup to get the string to r->name
    
       // ditto for r->address;
    
       return r;
    }
    Option 2:

    Code:
    #define COLS (sizeof(struct record) / sizeof(char*))
    
    void *parse_csv_row(char *line) {
       char **record = calloc(1, sizeof(struct record));
    
       for(size_t i = 0; i < COLS; i++) {
          // using strsep and strdup to get the string to record[i]
       }
    
       return record;
    }
    The first function and any similar function that operates on the struct would have to be rewritten to reflect changes in the struct.

  6. #6
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    You could make a union of the struct and an actual array of char pointers, but you would have to update the union's type declaration if the members change. I'm not really sure if you'd be happy with that or not. I get the feeling I'm just moving the problem to another location.
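
    A minimal sketch of the union idea, assuming the two-member struct record from the mock-up above. The names record_view, RECORD_FIELDS and record_view_clear are made up for illustration. The _Static_assert lines (C11) make the build fail if the struct does not line up with the array, so no packing pragma is relied on; as far as I know, reading a union member other than the one last written is permitted in C, but treat that as something to verify for your compiler.

```c
#include <assert.h>
#include <stddef.h>

struct record {
    char *name;
    char *address;
};

/* Number of char* slots the struct is expected to occupy. */
#define RECORD_FIELDS (sizeof(struct record) / sizeof(char *))

union record_view {                 /* hypothetical name */
    struct record named;            /* access by member name */
    char *fields[RECORD_FIELDS];    /* access by column index */
};

/* Compile-time layout checks. The second one mentions a member by name,
 * so it has to be updated when the struct changes. */
_Static_assert(sizeof(struct record) == RECORD_FIELDS * sizeof(char *),
               "struct record is not a whole number of char* slots");
_Static_assert(offsetof(struct record, address) == 1 * sizeof(char *),
               "unexpected padding before address");

/* Set every column to NULL through the array view. */
void record_view_clear(union record_view *v)
{
    assert(v != NULL);
    for (size_t i = 0; i < RECORD_FIELDS; i++)
        v->fields[i] = NULL;
}
```

    Generic code would loop over v->fields, while domain code would use v->named.address; both refer to the same storage.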

  7. #7
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    If you want to decouple CSV parsing from your domain models, then the solution is to write a generic CSV parser that parses the CSV file into records that contain fields that are (or contain) strings. Your struct objects are then created by processing these records as per the requirements of the struct. They don't need to know anything about CSV parsing; they just need to know that the nth field of the record corresponds to a particular member of the struct, then convert from string to the member type accordingly.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  8. #8
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    Quote Originally Posted by whiteflags View Post
    You could make a union of the struct and an actual array of char pointers, but you would have to update the union's type declaration if the members change. I'm not really sure if you'd be happy with that or not. I get the feeling I'm just moving the problem to another location.
    That sounds somewhat similar to what Jim proposed. I think it might be easier to just use the array then and forget about named members. I guess it would be possible to define enums that represent the indexes, but it might be too messy.


    Quote Originally Posted by laserlight View Post
    If you want to decouple CSV parsing from your domain models, then the solution is to write a generic CSV parser that parses the CSV file into records that contain fields that are (or contain) strings.
    That is pretty much the direction I have been going in, and quite similar to the 2nd example above, except for the "sizeof(struct record)" part which depends on the struct being packed as opposed to: "sizeof(char*), COLS"

    Quote Originally Posted by laserlight View Post
    Your struct objects are then created by processing these records as per the requirements of the struct. They don't need to know anything about CSV parsing; they just need to know that the nth field of the record corresponds to a particular member of the struct, then convert from string to the member type accordingly.
    That may be a solution to avoid the cast to "struct record" that I currently do. I'm fine with keeping the members as strings, until I need to use them. Perhaps I could do something like a type_validator.h with functions like: is_integer, is_float and so on.

  9. #9
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    I was actually under the impression that all the members would be strings so using the union was really more about packing than anything.

  10. #10
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by Subsonics
    That is pretty much the direction I have been going in, and quite similar to the 2nd example above, except for the "sizeof(struct record)" part which depends on the struct being packed as opposed to: "sizeof(char*), COLS"
    That's because your struct record is not a CSV record: it is a struct in the subject domain. We might even rename it struct addressbook_record.

    Quote Originally Posted by Subsonics
    I'm fine with keeping the members as strings, until I need to use them.
    My recommendation is to deal with objects of your domain struct type. The CSV records and fields are just an intermediate representation.

    Quote Originally Posted by Subsonics
    Perhaps I could do something like a type_validator.h with functions like: is_integer, is_float and so on.
    You could, but that's unnecessary.

  11. #11
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    Quote Originally Posted by laserlight View Post
    That's because your struct record is not a CSV record: it is a struct in the subject domain. We might even rename it struct addressbook_record.
    I'm not sure I follow. "record" as a name is abstract enough to be any record, but unlike an array it has the nice feature of named members. Whenever I actually use this struct, I could call the instance something like addressbook_record.

    Code:
    struct record *addressbook_record;
    
    while( /* read lines from csv file */ ) {
       addressbook_record = parse_csv_line( line );
       insert_in_datastructure( addressbook_record );
    }

  12. #12
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    You basically want to implement an interface. The details in parse_csv_line can not be made to magically work with any old struct definition*, simply by including a different header file (unless that header included code, which is generally a no-no). Each type of object would need to define its own parse_csv_line, and that function would be linked in to the final executable. Even your magic header file method would involve a recompile, so you aren't losing any dynamic features here.

    * I say "can not". Part of me wants to believe it's possible, and to figure this out, but thinking about all the preprocessor trickery and fugly x-macros that would be involved makes me queasy. It would be a lot of work, much harder than writing a makefile that uses the right header and links in the right .o file for the given CSV file format.
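
    For readers who have not met x-macros, here is a rough sketch of what that approach might look like. Everything in it is an assumption for illustration: the RECORD_FIELDS list, the FIELD_ prefix and the record_field accessor are invented names, and the field list is borrowed from the song example later in the thread. The field list is written once and expanded three times, so the struct, the index enum and the name table cannot drift apart.

```c
#include <assert.h>
#include <stddef.h>

/* The only part that would change per CSV format (hypothetical list). */
#define RECORD_FIELDS(X) \
    X(artist)            \
    X(song)              \
    X(duration)          \
    X(year)

/* Expansion 1: a struct with one char* member per field. */
struct record {
#define X(name) char *name;
    RECORD_FIELDS(X)
#undef X
};

/* Expansion 2: an index per field, plus a count. */
enum record_field {
#define X(name) FIELD_##name,
    RECORD_FIELDS(X)
#undef X
    FIELD_COUNT
};

/* Expansion 3: the field names as strings, e.g. for a title row. */
const char *const record_field_names[] = {
#define X(name) #name,
    RECORD_FIELDS(X)
#undef X
};

/* Map an index to the matching member by name, not by memory layout,
 * so no packing or casting is involved. Returns NULL for a bad index. */
char **record_field(struct record *r, enum record_field idx)
{
    assert(r != NULL);
    switch (idx) {
#define X(name) case FIELD_##name: return &r->name;
    RECORD_FIELDS(X)
#undef X
    default: return NULL;
    }
}
```

    A generic parser could then loop from 0 to FIELD_COUNT and store each column through record_field(r, i), while the rest of the program keeps using r->artist and friends.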

  13. #13
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    Quote Originally Posted by anduril462 View Post
    You basically want to implement an interface. The details in parse_csv_line can not be made to magically work with any old struct definition*, simply by including a different header file (unless that header included code, which is generally a no-no). Each type of object would need to define its own parse_csv_line, and that function would be linked in to the final executable. Even your magic header file method would involve a recompile, so you aren't losing any dynamic features here.
    It can be done, and I have already done it. The question is if it's kosher to use the pack pragma or __attribute__ or if there are other solutions to do it that I haven't thought about.

    This is what the current test header looks like in its entirety.

    Code:
    #ifndef RECORD_H
    #define RECORD_H
    
    struct record {
       char *artist;
       char *song;
       char *duration;
       char *year;
    }__attribute__((packed));
    
    #endif
    I also managed to generate this header from a CSV file (provided it has a title row) with awk, and tried to include it in my makefile for a laugh.

    This is what the parse_csv_line function looks like (btw, this source file also contains the following define):

    #define RECORD_SIZE (sizeof(struct record) / sizeof(char*))

    Code:
    void *parse_csv_line(char *line) {
        char *p = NULL;
        char **record;
    
        record = calloc(1, sizeof(struct record));
        if(!record) return NULL;
    
        for(size_t i = 0; line && i < RECORD_SIZE; i++) {
            p = strsep(&line, ",");
            record[i] = strdup(p);
            if(line) {
                line = ffwd_space(line);
            }
        }
    
        return record;
    }

  14. #14
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Ahh, I see. I was thinking the record structure used non char * types. I do believe it will work. It seems like a total hack, relying on the result of packing such a struct being binary compatible with a char **. And packing is implementation defined as well. You could just read into a char ** and use an enum for the fields:
    Code:
    enum fields {
        ARTIST,
        SONG,
        DURATION,
        YEAR
    };
    
    char **record = read_from_csv_file();
    printf("artist is '%s'\n", record[ARTIST]);
    Since each data field you store is just a string, I don't see you really gaining much benefit by having an actual struct versus a char **.
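
    To make the enum approach a bit more concrete, here is a hedged sketch of what could sit behind a function like read_from_csv_file. The helper names split_csv_line and free_row are hypothetical, as is the NUM_FIELDS sentinel added to the enum. The splitting is deliberately naive (commas only, no quoting or escaping) and uses strcspn from standard C rather than strsep.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum fields {
    ARTIST,
    SONG,
    DURATION,
    YEAR,
    NUM_FIELDS      /* sentinel: always equals the number of columns */
};

/* Free a row returned by split_csv_line. NULL is accepted. */
void free_row(char **row)
{
    if (!row)
        return;
    for (size_t i = 0; i < NUM_FIELDS; i++)
        free(row[i]);
    free(row);
}

/* Split one line on commas into NUM_FIELDS heap-allocated strings.
 * Missing trailing fields are left as NULL; extra fields are ignored.
 * Returns NULL if an allocation fails. */
char **split_csv_line(const char *line)
{
    assert(line != NULL);
    char **row = calloc(NUM_FIELDS, sizeof *row);
    if (!row)
        return NULL;

    for (size_t i = 0; i < NUM_FIELDS && line; i++) {
        size_t len = strcspn(line, ",");
        row[i] = malloc(len + 1);
        if (!row[i]) {
            free_row(row);
            return NULL;
        }
        memcpy(row[i], line, len);
        row[i][len] = '\0';
        /* Step past the comma, or stop if this was the last field. */
        line = (line[len] == ',') ? line + len + 1 : NULL;
    }
    return row;
}
```

    Adding a column then means adding one enumerator before NUM_FIELDS; the allocation and the loop bounds follow automatically.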

  15. #15
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by Subsonics
    I'm not sure I follow, "record" as a name is abstract enough to be any record, but unlike arrays have the nice feature of named members. When ever I would actually use this struct I could call the instance something like addressbook_record.
    That doesn't work: what are you going to name the members? I notice that in your examples, you name them stuff like artist, song, duration, year... those have nothing to do with a generic CSV record. If you want to do that, then you might as well couple CSV parsing with your subject domain addressbook_record, because this "abstract enough to be any record" isn't.

    In other words, if you want the coupling, do something like this:
    Code:
    struct song_record {
        char *artist;
        char *song;
        unsigned int duration;
        unsigned int year;
    };
    
    void song_record_read_from_csv(struct song_record *record, FILE *fp)
    {
        /* read a line (i.e., CSV record) from the file
         * parse the line into CSV fields
         * populate artist and song using field[0] and field[1]
         * convert field[2] and field[3] to unsigned int and use the result to
         *   populate duration and year respectively.
         */
    }
    Hence, if you have an addressbook_record, you would define addressbook_record_read_from_csv to parse a CSV record from file into your addressbook_record.

    But if you want to decouple, do something like this:
    Code:
    struct csv_record {
        char **fields;
    };
    
    struct csv_file {
        struct csv_record *records;
        size_t num_records;
        size_t num_fields;
    };
    
    struct song_record {
        char *artist;
        char *song;
        unsigned int duration;
        unsigned int year;
    };
    
    void parse_csv_file(struct csv_file *file, FILE *fp)
    {
        /* ... */
    }
    
    void song_record_read_from_csv(struct song_record *record, struct csv_record *csv)
    {
        /* populate artist and song using csv->fields[0] and csv->fields[1]
         * convert csv->fields[2] and csv->fields[3] to unsigned int and use the
         *   result to populate duration and year respectively.
         */
    }
    Hence, if you have an addressbook_record, you only need to define addressbook_record_read_from_csv to take the fields of the CSV record to populate your addressbook_record. You do not need to rewrite any CSV file parsing since it is decoupled from your subject domain struct.
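
    As a sketch of how the decoupled conversion step might be filled in, the following assumes that the CSV record always has at least four fields and that the strings are borrowed from the CSV record rather than copied. The parse_uint helper is hypothetical, and unlike the outline above the function returns an int so that a malformed number can be reported.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

struct csv_record {
    char **fields;
};

struct song_record {
    char *artist;
    char *song;
    unsigned int duration;
    unsigned int year;
};

/* Hypothetical helper: strict decimal parse of an unsigned int.
 * Rejects empty strings, signs, leading spaces and trailing junk.
 * Returns 0 on success, -1 on failure. */
static int parse_uint(const char *s, unsigned int *out)
{
    char *end;
    unsigned long v;

    if (s == NULL || !isdigit((unsigned char)s[0]))
        return -1;
    errno = 0;
    v = strtoul(s, &end, 10);
    if (*end != '\0' || errno == ERANGE || v > UINT_MAX)
        return -1;
    *out = (unsigned int)v;
    return 0;
}

/* Populate a song_record from the first four CSV fields (assumed present).
 * artist and song point into the CSV record; nothing is copied.
 * Returns 0 on success, -1 if duration or year is not a valid number. */
int song_record_read_from_csv(struct song_record *record,
                              const struct csv_record *csv)
{
    assert(record != NULL && csv != NULL && csv->fields != NULL);
    record->artist = csv->fields[0];
    record->song = csv->fields[1];
    if (parse_uint(csv->fields[2], &record->duration) != 0)
        return -1;
    if (parse_uint(csv->fields[3], &record->year) != 0)
        return -1;
    return 0;
}
```

    Because the conversion itself rejects bad input, a separate is_integer style validator is not needed in this arrangement.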

    EDIT:
    Quote Originally Posted by anduril462
    Since data field you store is just a string, I don't see you really gaining much benefit by having an actual struct versus a char **.
    Indeed. struct record is just a sham when it cannot decide whether it is a CSV record or a song record, and is really just an array of strings.
    Last edited by laserlight; 02-16-2012 at 08:17 PM.
