Casting a struct to char**, or abandon the struct and use char**?

**Subsonics** · 02-15-2012

Sorry for the cryptic title.

I have a program that reads csv files and adds each row to a data structure. Right now the data structure is separated from its content, which is handled by its own source file that knows about the content, declares a struct, parses a csv line etc.

What I want to do now is to find a way to easily change this content-specific file, for example by only creating a new struct that reflects a csv row. However, each function depends on the struct being exactly the way it currently is.

Is it a viable solution to cast the struct to a char** array (the struct contains char* only) and use pragmas or __attribute__ to make sure it's packed, or is it better to abandon the struct and only use a char** array, and with it lose the named members that a struct has?

**jimblumberg** · 02-15-2012

What I want to do now is to find a way to easily change this content-specific file, for example by only creating a new struct that reflects a csv row. However, each function depends on the struct being exactly the way it currently is.

What functions are you talking about? Functions in the "content specific file", or some other functions? How are you passing this structure from your "content specific file" to the rest of your program? Depending on how you are using the structure you may be able to place the "new" information at the end of your structure. Then only functions designed to use this "new" information will try to access the "new" data.

Jim

**Subsonics** · 02-15-2012

Originally Posted by jimblumberg

What functions are you talking about? Functions in the "content specific file"

Yes, although the aim is to have the content-specific file be nothing more than a header that declares a struct. The parsing of csv and so on would be separated from the struct. The problem right now is that the parsing functions, memory allocations (for struct members) and so on depend on the struct. If I can make all these functions take a void* and then assign that to a char** inside the function and get the size by: sizeof(struct s) / sizeof(char*), that would solve it. I just don't know if it's considered ok, or if there is perhaps a different way to approach it.

Originally Posted by jimblumberg

How are you passing this structure from your "content specific file" to the rest of your program? Depending on how you are using the structure you may be able to place the "new" information at the end of your structure. Then only functions designed to use this "new" information will try to access the "new" data.

Jim

The struct is #include'd in the files that will be using it.

Edit:

Reading back what I just wrote, I realise that it might not make much sense at all. I basically want to see how far it's possible to push decoupling. Together these files would read and parse csv and build a data structure in memory.

I want to see if I can adapt the functions to any csv file by only changing a header file that declares a struct.

**jimblumberg** · 02-15-2012

I want to see if I can adapt the functions to any csv file by only changing a header file that declares a struct.

Why can't you just pass the correct structure into the function? This structure could have a different number and type of elements depending on the format of the csv file. You could use two structures: one to hold the data from the csv file, and another to act as the "header". This "header" structure would contain the number of fields and an array describing each field: its type, its size, and possibly its length.

Something like:

Code:

// Header structures.

struct field_info
{
   char field_type[3]; // c,s,i,ui,l, ul,f,d and any other types needed.
   size_t field_size;   // Probably only needed for strings.
};

#define NUM_FIELDS 10
struct record
{
   const size_t num_fields;  // Number of fields per line.
   struct field_info fields[NUM_FIELDS];
};

// Your data structure below. Contains NUM_FIELDS fields, of the type described in the field_info structure.

You would have some kind of "initialization" function that will fill in the values in the record structure. So you would need both a small source file along with the header file. You could even read the required information for the header information from a file if so desired.
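A minimal sketch of such an initialization function, assuming a four-column file with two strings followed by two unsigned ints. The layout table, the 64-byte string size and the name record_init are made up for illustration, and num_fields is left non-const here so that it can be assigned after the struct is created:

```c
#include <stddef.h>
#include <string.h>

struct field_info {
    char field_type[3];   /* "s", "ui", ... as in the header above */
    size_t field_size;
};

#define NUM_FIELDS 4

struct record {
    size_t num_fields;    /* assumption: not const, so it can be filled in */
    struct field_info fields[NUM_FIELDS];
};

/* Fill in the description of one CSV format (hypothetical layout). */
void record_init(struct record *rec)
{
    static const struct field_info layout[NUM_FIELDS] = {
        { "s",  64 },
        { "s",  64 },
        { "ui", sizeof(unsigned int) },
        { "ui", sizeof(unsigned int) },
    };

    rec->num_fields = NUM_FIELDS;
    memcpy(rec->fields, layout, sizeof layout);
}
```

Supporting a different csv format would then mean swapping the layout table, or reading it from a file.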

Jim

**Subsonics** · 02-16-2012

Hm, interesting, but wouldn't that make the struct members anonymous, if that makes sense, in the same way that an array would? I might not completely understand what you meant though.

Here are two mock-up examples:

Code:

// declared in a separate header
struct record {
   char *name;
   char *address;
};

Option 1:

Code:

struct record *parse_csv_row(char *line) {
   struct record *r = calloc(1, sizeof(struct record));
   if (!r) return NULL;

   // using strsep and strdup to get the string to r->name

   // ditto for r->address;

   return r;
}

2:

Code:

#define COLS (sizeof(struct record) / sizeof(char *))

void *parse_csv_row(char *line) {
   char **record = calloc(1, sizeof(struct record));
   if (!record) return NULL;

   for(size_t i = 0; i < COLS; i++) {
      // using strsep and strdup to get the string to record[i]
   }

   return record;
}

The first function and any similar function that operates on the struct would have to be rewritten to reflect changes in the struct.

**whiteflags** · 02-16-2012

You could make a union of the struct and an actual array of char pointers, but you would have to update the union's type declaration if the members change. I'm not really sure if you'd be happy with that or not. I get the feeling I'm just moving the problem to another location.
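A rough sketch of the union idea, assuming every member of the struct is a char *. The names record_view, RECORD_FIELDS and name_via_array are invented for the example. The field count has to be kept in sync by hand, which is the maintenance cost mentioned above, but a static assertion can at least make the build fail if the count or the layout drifts:

```c
#include <assert.h>

struct record {
    char *name;
    char *address;
};

/* Must be updated by hand whenever a member is added or removed. */
enum { RECORD_FIELDS = 2 };

/* The same storage seen either as named members or as an array. */
union record_view {
    struct record named;
    char *fields[RECORD_FIELDS];
};

/* Build fails if the count is stale or the compiler inserted padding. */
static_assert(sizeof(struct record) == RECORD_FIELDS * sizeof(char *),
              "struct record is not laid out like an array of char *");

/* Write through the array view, read back through the named view. */
const char *name_via_array(void)
{
    union record_view r;

    r.fields[0] = "Ada";
    r.fields[1] = "12 Example Street";
    return r.named.name;
}
```

Generic code can loop over fields[] while the rest of the program keeps using named.name and named.address.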

**laserlight** · 02-16-2012

If you want to decouple CSV parsing from your domain models, then the solution is to write a generic CSV parser that parses the CSV file into records that contain fields that are (or contain) strings. Your struct objects are then created by processing these records as per the requirements of the struct. They don't need to know anything about CSV parsing; they just need to know that the nth field of the record corresponds to a particular member of the struct, then convert from string to the member type accordingly.
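As a rough illustration of the generic half, here is a minimal sketch of a line splitter that knows nothing about the domain struct. The names csv_record_parse and csv_record_free are hypothetical, and the parsing is deliberately simplified: it splits on commas only, with no quoting, no escaping, no whitespace trimming, and it expects the line without a trailing newline.

```c
#include <stdlib.h>
#include <string.h>

/* Generic CSV record: just an array of strings and a count. */
struct csv_record {
    char **fields;
    size_t num_fields;
};

/* Release every field and reset the record to empty. */
void csv_record_free(struct csv_record *rec)
{
    for (size_t i = 0; i < rec->num_fields; i++)
        free(rec->fields[i]);
    free(rec->fields);
    rec->fields = NULL;
    rec->num_fields = 0;
}

/* Split one line on commas into newly allocated strings.
 * Returns 0 on success, -1 on allocation failure. */
int csv_record_parse(struct csv_record *rec, const char *line)
{
    size_t count = 1;

    /* One field more than the number of commas. */
    for (const char *p = line; *p != '\0'; p++)
        if (*p == ',')
            count++;

    rec->num_fields = 0;
    rec->fields = malloc(count * sizeof *rec->fields);
    if (!rec->fields)
        return -1;

    for (size_t i = 0; i < count; i++) {
        size_t len = strcspn(line, ",");   /* length up to next comma or end */
        char *field = malloc(len + 1);

        if (!field) {
            csv_record_free(rec);
            return -1;
        }
        memcpy(field, line, len);
        field[len] = '\0';
        rec->fields[rec->num_fields++] = field;

        line += len;
        if (*line == ',')
            line++;                        /* skip the separator */
    }
    return 0;
}
```

A domain struct would then be filled from rec.fields[n] by a separate function that knows which column maps to which member.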

**Subsonics** · 02-16-2012

Originally Posted by whiteflags

You could make a union of the struct and an actual array of char pointers, but you would have to update the union's type declaration if the members change. I'm not really sure if you'd be happy with that or not. I get the feeling I'm just moving the problem to another location.

That sounds somewhat similar to what Jim proposed. I think it might be easier to just use the array then and forget about named members. I guess it would be possible to define enums that represent the indexes, but it might be too messy.

Originally Posted by laserlight

If you want to decouple CSV parsing from your domain models, then the solution is to write a generic CSV parser that parses the CSV file into records that contain fields that are (or contain) strings.

That is pretty much the direction I have been going in, and quite similar to the 2nd example above, except for the "sizeof(struct record)" part which depends on the struct being packed as opposed to: "sizeof(char*), COLS"

Originally Posted by laserlight

Your struct objects are then created by processing these records as per the requirements of the struct. They don't need to know anything about CSV parsing; they just need to know that the nth field of the record corresponds to a particular member of the struct, then convert from string to the member type accordingly.

That may be a solution to avoid the cast to "struct record" that I currently do. I'm fine with keeping the members as strings, until I need to use them. Perhaps I could do something like a type_validator.h with functions like: is_integer, is_float and so on.

**whiteflags** · 02-16-2012

I was actually under the impression that all the members would be strings, so using the union was really more about packing than anything.

**laserlight** · 02-16-2012

Originally Posted by Subsonics

That is pretty much the direction I have been going in, and quite similar to the 2nd example above, except for the "sizeof(struct record)" part which depends on the struct being packed as opposed to: "sizeof(char*), COLS"

That's because your struct record is not a CSV record: it is a struct in the subject domain. We might even rename it struct addressbook_record.

Originally Posted by Subsonics

I'm fine with keeping the members as strings, until I need to use them.

My recommendation is to deal with objects of your domain struct type. The CSV records and fields are just an intermediate representation.

Originally Posted by Subsonics

Perhaps I could do something like a type_validator.h with functions like: is_integer, is_float and so on.

You could, but that's unnecessary.
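One way to read this: the conversion itself can report whether the text was valid, so a separate is_integer pass adds nothing. A minimal sketch using strtoul follows; the helper name parse_uint and its strictness (digits only, no sign, no surrounding whitespace) are assumptions made for the example.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Convert a CSV field to unsigned int.
 * Returns 0 on success, -1 if the text is empty, does not start with a
 * digit, has trailing characters, or does not fit in an unsigned int. */
int parse_uint(const char *text, unsigned int *out)
{
    char *end;
    unsigned long value;

    /* strtoul itself would accept leading whitespace and a sign. */
    if (!isdigit((unsigned char)*text))
        return -1;

    errno = 0;
    value = strtoul(text, &end, 10);
    if (*end != '\0' || errno == ERANGE || value > UINT_MAX)
        return -1;

    *out = (unsigned int)value;
    return 0;
}
```

A caller would use it as: if (parse_uint(field, &record->year) != 0) and treat a non-zero result as a malformed row.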

**Subsonics** · 02-16-2012

Originally Posted by laserlight

That's because your struct record is not a CSV record: it is a struct in the subject domain. We might even rename it struct addressbook_record.

I'm not sure I follow. "record" as a name is abstract enough to be any record, but unlike an array, a struct has the nice feature of named members. Whenever I actually use this struct I could call the instance something like addressbook_record.

Code:

struct record *addressbook_record;

while( /* read lines from csv file */ ) {
   addressbook_record = parse_csv_line( line );
   insert_in_datastructure( addressbook_record );
}

**anduril462** · 02-16-2012

You basically want to implement an interface. The details in parse_csv_line can not be made to magically work with any old struct definition*, simply by including a different header file (unless that header included code, which is generally a no-no). Each type of object would need to define its own parse_csv_line and that function would be linked in to the final executable. Even your magic header file method would involve a recompile, so you aren't losing any dynamic features here.

* I say "can not". Part of me wants to believe it's possible, and to figure this out, but thinking about all the preprocessor trickery and fugly x-macros that would be involved makes me queasy. It would be a lot of work, much harder than writing a makefile that uses the right header and links in the right .o file for the given CSV file format.
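For reference, a minimal sketch of what the x-macro route could look like when every member is a char *. All names here (RECORD_FIELDS, record_offsets, record_set) are made up for the example, and it covers only the all-strings case, not mixed member types:

```c
#include <stddef.h>

/* Hypothetical field list: the single place to edit when columns change. */
#define RECORD_FIELDS(X) \
    X(artist)            \
    X(song)              \
    X(duration)          \
    X(year)

/* Expand the list into named struct members. */
struct record {
#define X(name) char *name;
    RECORD_FIELDS(X)
#undef X
};

/* Expand the list into an index enum with a trailing count. */
enum record_index {
#define X(name) FIELD_##name,
    RECORD_FIELDS(X)
#undef X
    FIELD_COUNT
};

/* Expand the list into a table of member offsets, so generic code can
 * reach each member without assuming anything about padding. */
static const size_t record_offsets[FIELD_COUNT] = {
#define X(name) offsetof(struct record, name),
    RECORD_FIELDS(X)
#undef X
};

/* Store a string pointer in the i-th member of a struct record. */
void record_set(struct record *r, size_t i, char *value)
{
    char **member = (char **)((char *)r + record_offsets[i]);

    *member = value;
}
```

A generic parser can loop from 0 to FIELD_COUNT and call record_set, while other code still writes r->artist. The cost is that the struct definition is no longer plain to read.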

**Subsonics** · 02-16-2012

Originally Posted by anduril462

You basically want to implement an interface. The details in parse_csv_line can not be made to magically work with any old struct definition*, simply by including a different header file (unless that header included code, which is generally a no-no). Each type of object would need to define its own parse_csv_line and that function would be linked in to the final executable. Even your magic header file method would involve a recompile, so you aren't losing any dynamic features here.

It can be done, and I have already done it. The question is if it's kosher to use the pack pragma or __attribute__ or if there are other solutions to do it that I haven't thought about.

This is what the current test header looks like in its entirety.

Code:

#ifndef RECORD_H
#define RECORD_H

struct record {
   char *artist;
   char *song;
   char *duration;
   char *year;
} __attribute__((packed));

#endif

I also managed to generate this header from a CSV file (provided it has a title row) with awk, and tried to include it in my makefile for a laugh.

This is what the parse_csv_line function looks like (btw, this source file also contains the define below):

#define RECORD_SIZE (sizeof(struct record) / sizeof(char *))

Code:

void *parse_csv_line(char *line) {
    char *p = NULL;
    char **record;

    record = calloc(1, sizeof(struct record));
    if(!record) return NULL;

    for(size_t i = 0; line && i < RECORD_SIZE; i++) {
        p = strsep(&line, ",");
        record[i] = strdup(p);
        if(line) {
            line = ffwd_space(line);
        }
    }

    return record;
}

**anduril462** · 02-16-2012

Ahh, I see. I was thinking the record structure used non char * types. I do believe it will work. It seems like a total hack, relying on the result of packing such a struct being binary compatible with a char **. And packing is implementation defined as well. You could just read into a char ** and use an enum for the fields:

Code:

enum fields {
    ARTIST,
    SONG,
    DURATION,
    YEAR
};

char **record = read_from_csv_file();
printf("artist is '%s'\n", record[ARTIST]);

Since each data field you store is just a string, I don't see you really gaining much benefit by having an actual struct versus a char **.
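If you go the enum route, a trailing sentinel gives you the field count for free, so allocation and cleanup never need sizeof(struct record). A small sketch, where NUM_FIELDS, record_new and record_free are names invented for the example:

```c
#include <stdlib.h>

enum fields {
    ARTIST,
    SONG,
    DURATION,
    YEAR,
    NUM_FIELDS   /* sentinel: always equals the number of fields above */
};

/* Allocate a record with every field set to NULL. */
char **record_new(void)
{
    char **record = malloc(NUM_FIELDS * sizeof *record);

    if (record)
        for (size_t i = 0; i < NUM_FIELDS; i++)
            record[i] = NULL;
    return record;
}

/* Free a record whose fields were allocated with malloc or strdup. */
void record_free(char **record)
{
    if (!record)
        return;
    for (size_t i = 0; i < NUM_FIELDS; i++)
        free(record[i]);
    free(record);
}
```

Adding a column then means adding one enumerator before NUM_FIELDS, and nothing depends on struct packing.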

**laserlight** · 02-16-2012

Originally Posted by Subsonics

I'm not sure I follow. "record" as a name is abstract enough to be any record, but unlike an array, a struct has the nice feature of named members. Whenever I actually use this struct I could call the instance something like addressbook_record.

That doesn't work: what are you going to name the members? I notice that in your examples, you name them stuff like artist, song, duration, year... those have nothing to do with a generic CSV record. If you want to do that, then you might as well couple CSV parsing with your subject domain addressbook_record, because this "abstract enough to be any record" isn't.

In other words, if you want the coupling, do something like this:

Code:

struct song_record {
    char *artist;
    char *song;
    unsigned int duration;
    unsigned int year;
};

void song_record_read_from_csv(struct song_record *record, FILE *fp)
{
    /* read a line (i.e., CSV record) from the file
     * parse the line into CSV fields
     * populate artist and song using field[0] and field[1]
     * convert field[2] and field[3] to unsigned int and use the result to
     *   populate duration and year respectively.
     */
}

Hence, if you have an addressbook_record, you would define addressbook_record_read_from_csv to parse a CSV record from file into your addressbook_record.

But if you want to decouple, do something like this:

Code:

struct csv_record {
    char **fields;
};

struct csv_file {
    struct csv_record *records;
    size_t num_records;
    size_t num_fields;
};

struct song_record {
    char *artist;
    char *song;
    unsigned int duration;
    unsigned int year;
};

void parse_csv_file(struct csv_file *file, FILE *fp)
{
    /* ... */
}

void song_record_read_from_csv(struct song_record *record, struct csv_record *csv)
{
    /* populate artist and song using csv->fields[0] and csv->fields[1]
     * convert csv->fields[2] and csv->fields[3] to unsigned int and use the
     *   result to populate duration and year respectively.
     */
}

Hence, if you have an addressbook_record, you only need to define addressbook_record_read_from_csv to take the fields of the CSV record to populate your addressbook_record. You do not need to rewrite any CSV file parsing since it is decoupled from your subject domain struct.
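To make that concrete, here is a minimal sketch of what the address book side could look like. It assumes the record carries its own field count (stored per record here for bounds checking, unlike the csv_file layout above), that column 0 is the name and column 1 the address, and it uses a hypothetical dup_string helper in place of strdup:

```c
#include <stdlib.h>
#include <string.h>

struct csv_record {
    char **fields;
    size_t num_fields;   /* assumption: kept per record for bounds checks */
};

struct addressbook_record {
    char *name;
    char *address;
};

/* Portable stand-in for strdup. */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);

    if (copy)
        memcpy(copy, s, n);
    return copy;
}

/* Populate the domain struct from an already parsed CSV record.
 * Returns 0 on success, -1 if there are too few fields or no memory. */
int addressbook_record_read_from_csv(struct addressbook_record *record,
                                     const struct csv_record *csv)
{
    if (csv->num_fields < 2)
        return -1;

    record->name = dup_string(csv->fields[0]);
    record->address = dup_string(csv->fields[1]);
    if (!record->name || !record->address) {
        free(record->name);
        free(record->address);
        record->name = NULL;
        record->address = NULL;
        return -1;
    }
    return 0;
}
```

Nothing in this function touches commas, files or lines; it only maps field positions to members.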

EDIT:

Originally Posted by anduril462

Since each data field you store is just a string, I don't see you really gaining much benefit by having an actual struct versus a char **.

Indeed. struct record is just a sham when it cannot decide whether it is a CSV record or a song record, and is really just an array of strings.
