Thread: Feedback on name generator

  1. #1
    Registered User catacombs's Avatar
    Join Date
    May 2019
    Location
    /home/
    Posts
    81

    Feedback on name generator

    Hello. I recently finished a name generator that reads, depending on the passed to the script (-f or -m), a txt file with a list of names. It'll then generate a name (first and last) for a fictional male or female character.

    I've included the code below and would love some feedback on where I can improve.

    Things I learned:

    * opening and reading data from a text file and storing the value in arrays.
    I do this constantly in other languages, such as Go, and I was happy with the result. I needed to include a hack or two to eliminate a '\n' character that fgets puts in.

    The code is below. I'd love feedback and how I can improve it.

    If you want to run it, I can provide the text files I used. Otherwise, just make "males.txt", "females.txt" and "surnames.txt" in a directory called "names." In each file, just one name on each line.

    Code:
    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>
    
    #define MAXLINES 100000
    #define NAMESIZE 200
    
    struct Person {
        char* first;
        char* last;
    };
    
    // utility functions
    void print_usage()
    {
        puts("Usage: namege <-hmf>");
    }
    
    void print_error()
    {
        puts("Error: Enter a male (-m) or female (-f) flag");
    }
    
    void print_names(char* names[])
    {
        int i;
        for (i = 0; i < MAXLINES; i++) {
            if (names[i] != NULL) {
                printf("%s", names[i]);
            }
        }
        return;
    }
    
    int get_random_index(char* names[])
    {
        int i = 0, n, min, max;
    
        while (names[++i] != NULL) {
            ;
        }
    
        max = i - 1;
        min = 0;
        n = (rand() % (max + 1 - min)) + min;
    
        return n;
    }
    
    char** get_names(char* filename)
    {
        int count;
        char** names;
        char buffer[BUFSIZ];
    
        FILE* fp;
    
        // allocate N items and multiply by chars
        names = malloc(MAXLINES * sizeof(char*));
    
        fp = fopen(filename, "rb");
    
        if (fp == 0) {
            fprintf(stderr, "Failed to open file");
            exit(1);
        }
    
        count = 0;
        while (fgets(buffer, MAXLINES, fp)) {
    
            // skip blank new lines
            if (buffer[0] == '\n') {
                continue;
            }
    
            /* replace new line with null terminator */
            if (buffer[strlen(buffer) - 1] == '\n') {
                buffer[strlen(buffer) - 1] = '\0';
            }
    
            // allocate space for each item
            names[count] = malloc(10 * sizeof(char*));
    
            // then copy it;
            strcpy(names[count], buffer);
    
            // increment
            count++;
        }
    
        fclose(fp);
    
        return names;
    }
    
    struct Person get_random_name(char* filename, struct Person* p, char* s_filename)
    {
    
        char **names, **surnames;
        int r_first, r_last;
    
        // get list of names
        names = get_names(filename);
        surnames = get_names(s_filename);
    
        // get random first and last random index
        r_first = get_random_index(names);
        r_last = get_random_index(surnames);
    
        // assign first and last to Person
        p->first = names[r_first];
        p->last = surnames[r_last];
    
        return *p;
    }
    
    void show_name(struct Person* p)
    {
        char name[NAMESIZE];
        sprintf(name, "%s %s", p->first, p->last);
        printf("%s\n", name);
    }
    
    int main(int argc, char** argv)
    { 
        // set random seed
        srand(time(NULL));
        
        int opt;
        char* male_name_file = "./names/males.txt";
        char* female_name_file = "./names/females.txt";
        char* surnames_file = "./names/surnames.txt";
    
        struct Person person, *p;
        p = &person;
    
        if (argc == 1 || isalpha(*argv[1]) != 0 || isdigit(*argv[1]) != 0) {
            print_error();
            exit(EXIT_FAILURE);
        }
    
        while ((opt = getopt(argc, argv, "hmf")) != -1) {
    
            switch (opt) {
    
            case 'h':
                print_usage();
                exit(EXIT_SUCCESS);
                break;
    
            case 'm':
                *p = get_random_name(male_name_file, p, surnames_file);
                break;
    
            case 'f':
                *p = get_random_name(female_name_file, p, surnames_file);
                break;
    
            case '?':
                print_error();
                exit(EXIT_FAILURE);
                break;
            }
        }
    
        show_name(p);
    
        return 0;
    }

  2. #2
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Your fgets call has a mistake: you probably wanted to use BUFSIZ or sizeof(buffer) instead of MAXLINES as the second argument.

    Allocating MAXLINES char pointers up front usually isn't a very smart allocation strategy for dynamic arrays. A more space efficient strategy would keep track of both size (number of elements in use) and capacity (number of elements allocated), then realloc when the size will exceed capacity. A simple realloc strategy is to start with some small capacity and double the capacity each time; a more sophisticated one could adapt based on the capacity.

    Your malloc for names[count] has a mistake: you want to allocate some characters, so you should use sizeof(char), not sizeof(char*). Since sizeof(char) is always 1, you can omit it.

    Your strcpy call has a buffer overflow vulnerability: you only allocate 10 characters for the destination string (including null character), but buffer is an array of BUFSIZ char. I suggest storing the result of strlen(buffer) in a size_t variable, then you can use it both to replace the newline with a null character and to determine how many char to malloc.

    get_random_name uses its p parameter as an out parameter. This means that you shouldn't return *p as that just results in unnecessary copying. Rather, change the return type to void, remove the return statement, then in main write:
    Code:
    Person person;
    // ...
    get_random_name(female_name_file, &person, surnames_file);
    That is, you don't need yet another pointer p in main.

    To avoid accidentally modifying what they point to, your pointers male_name_file etc should be const char* instead of just char*. Attempting to modify a string literal results in undefined behaviour, so the const char* gives the compiler a better chance of warning you.

    Granted, it isn't important in a short program like this when the resources will typically be reclaimed by a modern OS, but you should free what you malloc. This means reworking exactly when you call get_names since you cannot free at the end of the get_random_name function since you need the names to persist when control returns to the main function. Alternatively, you can copy the names over to the Person object instead of just assigning pointers, but this means more dynamic memory allocation since the Person object has pointer rather than array members.
    Last edited by laserlight; 05-18-2019 at 07:36 PM.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  3. #3
    Registered User catacombs's Avatar
    Join Date
    May 2019
    Location
    /home/
    Posts
    81
    Thanks, laserlight! Great feedback.

    Quote Originally Posted by laserlight View Post
    A more space efficient strategy would keep track of both size (number of elements in use) and capacity (number of elements allocated), then realloc when the size will exceed capacity. A simple realloc strategy is to start with some small capacity and double the capacity each time; a more sophisticated one could adapt based on the capacity.
    OK. I think I know what you mean. Though, I'm still a little confused about memory allocation.

    What I'm doing below is allocating 10 bytes for the names array. Then, when I run fgetc, I'm allocating each name based on their string length and appending it to the array.

    At the end, increment the loop index and use that to reallocate the names array so it can grow as I continue looping through the list.

    Code:
        int initial_mem = 10;
    
        // set initil memore to
        names = malloc(initial_mem);
    
        fp = fopen(filename, "rb");
    
        if (fp == 0) {
            fprintf(stderr, "Failed to open file");
            exit(1);
        }
    
        count = 0;
        while (fgets(buffer, BUFSIZ, fp)) {
    
            size_t buf_size = strlen(buffer);
    
            // skip blank new lines
            if (buffer[0] == '\n') {
                continue;
            }
    
            /* replace new line with null terminator */
            if (buffer[buf_size - 1] == '\n') {
                buffer[buf_size - 1] = '\0';
            }
    
            // allocate space for each item
            names[count] = malloc(buf_size);
    
            // then copy it;
            strcpy(names[count], buffer);
    
            // increment
            count++;
    
            // reallocate
            names = realloc(names, count * initial_mem);
        }

    Your strcpy call has a buffer overflow vulnerability: you only allocate 10 characters for the destination string (including null character), but buffer is an array of BUFSIZ char. I suggest storing the result of strlen(buffer) in a size_t variable, then you can use it both to replace the newline with a null character and to determine how many char to malloc.
    I've seen the size_t type but haven't been sure when to use it. Could I have set the result of strlen(buffer) to an int? What's the benefit?

    get_random_name uses its p parameter as an out parameter. This means that you shouldn't return *p as that just results in unnecessary copying. Rather, change the return type to void, remove the return statement, then in main write:
    Code:
    Person person;
    // ...
    get_random_name(female_name_file, &person, surnames_file);
    That is, you don't need yet another pointer p in main.
    Was my issue here me submitting *p as a parameter than assigning the value to it, kind of like a snake eating its own tail?

    Granted, it isn't important in a short program like this when the resources will typically be reclaimed by a modern OS, but you should free what you malloc. This means reworking exactly when you call get_names since you cannot free at the end of the get_random_name function since you need the names to persist when control returns to the main function. Alternatively, you can copy the names over to the Person object instead of just assigning pointers, but this means more dynamic memory allocation since the Person object has pointer rather than array members.
    In this case, names is what needs to be freed, correct? I could free that memory when they finished at the end of the get_random_name function like so:

    Code:
    void get_random_name(const char* filename, struct Person* p, const char* s_filename)
    {
    
        char **names, **surnames;
        int r_first, r_last;
    
        // get list of names
        names = get_names(filename);
        surnames = get_names(s_filename);
    
        // get random first and last random index
        r_first = get_random_index(names);
        r_last = get_random_index(surnames);
    
        // assign first and last to Person
        p->first = names[r_first];
        p->last = surnames[r_last];
    
        free(names);
        free(surnames);
    }
    Unless this is not correct?

  4. #4
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    This is wrong as it doesn't account for the size of each element:
    Code:
    names = malloc(initial_mem);
    remember that you can only omit the size of each element if you really want that number of bytes, which is true when you are allocating storage for arrays of char such as those used to store strings, but rarely true for anything else. Hence, you should have written, just as you did in your original code:
    Code:
    names = malloc(initial_mem * sizeof(char*));
    but it would be even better to have the size depend on names itself:
    Code:
    names = malloc(initial_mem * sizeof(names[0]));
    This has an off-by-one error:
    Code:
    names[count] = malloc(buf_size);
    strlen returns the length of the string excluding the null character, but strcpy copies the string including the null character. Therefore, you should have written:
    Code:
    names[count] = malloc(buf_size + 1);
    This works, but it is terribly inefficient to reallocate on each iteration, that's why I talked about reallocation strategies:
    Code:
    names = realloc(names, count * initial_mem);
    Furthermore, you're reallocating a multiple of initial_mem, which means that before the loop, you allocated 10, then on the first iteration, you reallocated 10 again, then on the second iteration, you reallocated 20, then on the third iteration, 30... but at the third iteration you only needed 3, which means that you didn't need to reallocate at all, not even once.

    Quote Originally Posted by catacombs
    I've seen the size_t type but haven't been sure when to use it. Could I have set the result of strlen(buffer) to an int? What's the benefit?
    size_t is an unsigned integer type that is the type of the result of sizeof. It is also the return type of strlen and various other functions that return some kind of size/length. The advantage of using it over int to store the return value of strlen is that it matches the return type of strlen exactly.

    Quote Originally Posted by catacombs
    Was my issue here me submitting *p as a parameter than assigning the value to it, kind of like a snake eating its own tail?
    What you were doing wass modifying the struct Person object in the caller through the pointer. That's good, but then you return it by value... which is okay, but that means that the struct Person object is copied. You do this only to copy back to the struct Person object that has already been modified, which means that you do extra work for nothing.

    Quote Originally Posted by catacombs
    In this case, names is what needs to be freed, correct?
    Yes, and also each element of names because each of them is a pointer for which you dynamically allocated memory for the corresponding name.

    Quote Originally Posted by catacombs
    I could free that memory when they finished at the end of the get_random_name function like so:
    Yes, but you still have a memory leak because names[0] etc was not freed. If you free them, you then have a problem because now names[r_first] and surnames[r_last] would have been freed. You might be able to get away with some careful working around this:
    Code:
    void get_random_name(const char* filename, struct Person* p, const char* s_filename)
    {
        // get list of names
        size_t names_count = 0;
        char **names = get_names(filename, &names_count);
        size_t surnames_count = 0;
        char **surnames = get_names(s_filename, &names_count);
    
        // get random first and last random index
        int r_first = get_random_index(names);
        int r_last = get_random_index(surnames);
    
        // assign first and last to Person
        p->first = names[r_first];
        p->last = surnames[r_last];
    
        // free names except those that were assigned to Person:
        for (size_t i = 0; i < names_count; i++)
        {
            if (i != r_first)
            {
                free(names[i]);
            }
        }
        free(names);
        for (size_t i = 0; i < surnames_count; i++)
        {
            if (i != r_last)
            {
                free(surnames[i]);
            }
        }
        free(surnames);
    }
    I have modified the calls to get_names to assume that there is an out parameter for the name count. The idea is to skip the pointers that were assigned, and now responsibility to call free on those pointers falls on the caller, i.e., the main function.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  5. #5
    Registered User catacombs's Avatar
    Join Date
    May 2019
    Location
    /home/
    Posts
    81
    OK. I'm still a little confused about memory management (sorry). I'm still allocating based on the iterator, but only adding the text size. I don't know how I can allocate without using the iterator. Other ways, realloc complains the next size isn't valid or I get a segfault.

    I refactored the code based on your suggestions, and this is what I have:

    Code:
    char** get_names(const char* filename, size_t* count)
    {
    
        char** names;
        char buffer[BUFSIZ];
    
        FILE* fp;
    
        fp = fopen(filename, "rb");
    
        if (fp == 0) {
            fprintf(stderr, "Failed to open file");
            exit(1);
        }
    
        // allocate memory for names
        // with first item (8 bytes in this case)
        names = malloc(sizeof(names[0]));
    
        while (fgets(buffer, BUFSIZ, fp)) {
    
            // get string size
            size_t buf_size = strlen(buffer);
    
            // skip blank new lines
            if (buffer[0] == '\n') {
                continue;
            }
    
            /* replace new line with null terminator */
            if (buffer[buf_size - 1] == '\n') {
                buffer[buf_size - 1] = '\0';
            }
    
            // allocate buf_size + size of char
            // to index of name array
            names[*count] = malloc(buf_size * sizeof(char*));
    
            // copy string to index
            strcpy(names[*count], buffer);
    
            /* // increment */
            (*count)++;
    
            // reallocate names array based on count and buf size
            names = realloc(names, (*count + buf_size) * sizeof(char*));
    
            if (*count == 5) {
                break;
            }
        }
    
        fclose(fp);
    
        return names;
    }
    
    void get_random_name(const char* filename, struct Person* p, const char* s_filename)
    {
    
        // get list of names
        size_t names_count = 0;
        char** names = get_names(filename, &names_count);
    
        size_t surnames_count = 0;
        char** surnames = get_names(s_filename, &surnames_count);
    
        /* // get random first and last random index */
        size_t r_first = get_random_index(names);
        size_t r_last = get_random_index(surnames);
    
        /* // assign first and last to Person */
        p->first = names[r_first];
        p->last = surnames[r_last];
    
        //free names except those that were assigned to Person:
        for (size_t i = 0; i < names_count; i++) {
            if (i != r_first) {
                free(names[i]);
            }
        }
    
        free(names);
    
        // free surnames except assiend to Person
        for (size_t i = 0; i < surnames_count; i++) {
            if (i != r_last) {
                free(surnames[i]);
            }
        }
    
        free(surnames);
    }
    Yes, but you still have a memory leak because names[0] etc was not freed.
    How can I look out for memory leaks? They seem to be sneaky bastards. When I compile, I often don't get an error message. Is a sign based on long the program takes to run?

    Using time, before the latest refactor, I noticed the program ran within 0.018s. Now, by freeing the allocated items in the array, the program runs at 0.001s.

  6. #6
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    You seem to be mixing up the dynamic allocation of individual strings with the dynamic allocation of the array of pointers to point to these strings.

    As for looking out for memory leaks: tough. I'd suggest switching to C++ Realistically though, it's a matter of taking care to keep track of the allocations, and possibly using tools to check correctness for you. How long a program takes to run is not a good metric to determine if there's a memory leak.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  7. #7
    TEIAM - problem solved
    Join Date
    Apr 2012
    Location
    Melbourne Australia
    Posts
    1,907
    Valgrind: About

    Will help
    Fact - Beethoven wrote his first symphony in C

  8. #8
    Registered User catacombs's Avatar
    Join Date
    May 2019
    Location
    /home/
    Posts
    81
    Quote Originally Posted by laserlight View Post
    You seem to be mixing up the dynamic allocation of individual strings with the dynamic allocation of the array of pointers to point to these strings.
    OK. I think I see the problem. I guess what I'm trying to do is allocate enough memory for a string, let's say 5+1 bytes for MARY, taking that size and growing the list by 6 bytes, doing it for all the names.

    I'd suggest switching to C++
    Hah. I don't think I'm there yet. I do like C for its simplicity.

    Quote Originally Posted by Click_here View Post
    Valgrind: About

    Will help
    Thanks, Click_here. I'll check that out.

  9. #9
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by catacombs
    I guess what I'm trying to do is allocate enough memory for a string, let's say 5+1 bytes for MARY, taking that size and growing the list by 6 bytes, doing it for all the names.
    That can be done, but it has its pros and cons. The pro is that if you can make a good guess as to how much memory you'll need in the end, then you just need a single allocation. The con is that doing it the way you're trying to do it is slow: you're reallocating on each iteration. Reallocations are "expensive". The other thing is that then you need names to be a char* not a char**, and so each name pointer points to particular characters in this huge dynamic array of char (not a string because it has embedded null characters). This is a more difficult way of doing things (it is a common way of "flattening" a 2D array, but those don't have inner arrays of differing lengths), and I wouldn't advise it until you can do the other way with ease.

    Because you declared names to be a char**, I assumed that you were going with a simpler approach, and it looked like that was what you were doing in your original post: dynamically allocate char pointers, and for each char pointer, dynamically allocate just enough space to store a string.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. My random number generator. (Looking for feedback)
    By Yonut in forum C Programming
    Replies: 3
    Last Post: 04-21-2019, 10:35 AM
  2. Feedback on prime number generator source code?
    By sepp in forum C Programming
    Replies: 5
    Last Post: 12-21-2014, 09:20 PM
  3. Just looking for Feedback
    By rTeapot in forum C++ Programming
    Replies: 8
    Last Post: 07-07-2012, 08:26 AM
  4. Looking For Feedback
    By Matty_Alan in forum C++ Programming
    Replies: 2
    Last Post: 05-27-2010, 04:33 PM
  5. Looking for some feedback...
    By dead_cell in forum C++ Programming
    Replies: 7
    Last Post: 08-11-2002, 09:08 AM

Tags for this Thread