Thread: Linear search not working for large data set

  1. #1
    Registered User
    Join Date
    Jan 2021
    Posts
    11

    Linear search not working for large data set

    I have a text file of the form:
    number string

    I want to be able to search for a code using binary search, look up a specific name using linear search and print all names that start with a specific word.
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <stdbool.h>
    
    
    #define MAXLINESIZE 300
    
    
    typedef struct{
        int code;
        char *name;
    }JOB;
    
    
    int countLines(FILE *fp)
    {
        int count = 0;
        for (int c; (c = fgetc(fp)) != EOF; )
            if (c == '\n')
                ++count;
        return count;
    }
    
    
    int binarySearch(JOB *arr, int l, int r, int x)
    {
        if (r >= l) {
            int mid = l + (r - l) / 2;
      
            if (arr[mid].code == x)
                return mid;
      
            if (arr[mid].code > x)
                return binarySearch(arr, l, mid - 1, x);
      
            return binarySearch(arr, mid + 1, r, x);
        }
      
        return -1;
    }
    
    
    int linearSearch(JOB *arr, int n, char *x)
    {
        int i;
        for (i = 0; i < n; i++)
        {
            if (strcmp(arr[i].name, x) == 0)
            {
                return i;
            }
        }
    
    
        return -1;
    }
    
    
    bool StartsWith(const char *a, const char *b)
    {
        if(strncmp(a, b, strlen(b)) == 0)
            return 1;
        return 0;
    }
    
    
    
    
    void linearSearchStart(JOB *arr, int n, char *x)
    {
        int i;
        bool found = false;
        for (i = 0; i < n; i++)
        {
            if (StartsWith(arr[i].name, x) == 1)
            {
                found = true;
                printf("[%02d]  Code: %d\t | Job: %s\n", i+1, arr[i].code, arr[i].name);
            }
        }
        if(!found)
            printf("NOT found\n");
    }
    
    
    
    
    int main(int argc, char **argv)
    {
        FILE *fin = fopen("search_file2.txt", "r");
        
        if(!fin)
        {
            fprintf(stderr, "Error opening the file.\n");
            exit(EXIT_FAILURE);
        }
    
    
        int nEntries = countLines(fin);
        fseek(fin, 0L, SEEK_SET);
        //printf("%d", nEntries);
        JOB *arr = (JOB *)malloc(nEntries * sizeof(JOB));
        int k;
        char buf[MAXLINESIZE];
        while(!feof(fin))
        {
            fscanf(fin, "%d", &arr[k].code);
            fgetc(fin);
            if(fgets(buf, sizeof(buf), fin) != NULL)
            {
                buf[strlen(buf) - 1] = '\0';
                arr[k].name = strdup(buf);
                k++; 
            }
        }
    
    
        printf("%d\n", k);
    
    
        for(k = 0; k < nEntries; k++)
        {
            printf("[%d]   Code : %d\t | Job: %s\n", k+1, arr[k].code, arr[k].name); 
        }
    
    
        int x = 111107;
        char *y = "senator"; 
        char *z = "president";
        int search = binarySearch(arr, 0, nEntries-1, x);
        int lin_search = linearSearch(arr, nEntries, y);
    
    
        if(search != -1)
        {
            printf("Element %d found at line %d\n", x, search + 1);
            printf("[%d]  Code: %d\t | Job: %s\n", search + 1, arr[search].code, arr[search].name);
        }
        else
        {
            printf("Element %d was NOT found in the given set\n", x);
        }
    
    
        if(lin_search != -1)
        {
            printf("Job %s found at line %d\n", y, lin_search + 1);
            printf("[%d]  Code: %d\t | Job: %s\n", lin_search + 1, arr[lin_search].code, arr[lin_search].name);
        }
        else
        {
            printf("Job %s was NOT found in the given set\n", y);
        }
    
    
        linearSearchStart(arr, nEntries, z);
    
    
        fclose(fin);
        return 0;
    }
    For the text file:
    111102 ambasador
    111107 senator
    111108 governor
    111111 president
    111121 prime minister

    Output:
    5
    [1] Code : 111102 | Job: ambasador
    [2] Code : 111107 | Job: senator
    [3] Code : 111108 | Job: governor
    [4] Code : 111111 | Job: president
    [5] Code : 111121 | Job: prime minister
    Element 111107 found at line 2
    [2] Code: 111107 | Job: senator
    Job senator found at line 2
    [2] Code: 111107 | Job: senator
    [04] Code: 111111 | Job: president


    Which is perfect, however, if I add 4000 more entries to this file, linear search will not work anymore, so if I were to search for 'senator' again, it will not find it, even though it is on the exact same line as before. Why is this happening?

  2. #2
    Registered User
    Join Date
    Dec 2017
    Posts
    1,190
    I notice that you forgot to initialize k to 0 before filling your jobs array (which would be better named "jobs" instead of the meaningless "arr"). You are also not testing for end-of-file correctly. Perhaps something like this:
    Code:
        for (int i = 0; fscanf(fin, "%d", &jobs[i].code) == 1; ++i)
        {
            char buf[MAXLINESIZE];
            fgetc(fin);
            if (fgets(buf, sizeof buf, fin) != NULL)
            {
                buf[strlen(buf) - 1] = '\0';
                jobs[i].name = strdup(buf);
            }
        }
    And although it is not strictly necessary in this case, it is always best to free the memory you malloc at some point.
    We live as it were by chance, and by chance we are governed. - Seneca

  3. #3
    Registered User
    Join Date
    Jan 2021
    Posts
    11
    Thank you! I know !while(feof) is considered a bad practice, so your solution is way more elegant. But I still have the same problem, if I search for a specific job using linear search in a text file with 20 lines it finds it, but it doesn't if the file has 4000 lines. I don't understand why. I have a file with 4238 lines, I checked the values for k and nEntries and they are both 4238, so I have no idea why this is happening

  4. #4
    Registered User
    Join Date
    Dec 2017
    Posts
    1,190
    In the 4238-line file, make sure that there is not an extra space before or after the word senator.
    We live as it were by chance, and by chance we are governed. - Seneca

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Linear search on a file with data
    By telmo_d in forum C Programming
    Replies: 3
    Last Post: 04-21-2015, 06:41 AM
  2. Help with Linear Search
    By SDH in forum C Programming
    Replies: 29
    Last Post: 01-14-2013, 04:25 PM
  3. Linear Search
    By tsmith94 in forum C Programming
    Replies: 5
    Last Post: 10-16-2011, 09:33 PM
  4. Difference Between A Linear Search And Binary Search
    By ImBack92 in forum C Programming
    Replies: 4
    Last Post: 05-12-2011, 08:47 AM
  5. binary search or linear search
    By vajira in forum C Programming
    Replies: 0
    Last Post: 06-05-2003, 12:42 PM

Tags for this Thread