Thread: Linear search not working for large data set

  1. #1
    Registered User
    Join Date
    Jan 2021
    Posts
    11

    Linear search not working for large data set

    I have a text file of the form:
    number string

    I want to be able to search for a code using binary search, look up a specific name using linear search and print all names that start with a specific word.
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <stdbool.h>
    
    
    #define MAXLINESIZE 300
    
    
    typedef struct{
        int code;
        char *name;
    }JOB;
    
    
    int countLines(FILE *fp)
    {
        int count = 0;
        for (int c; (c = fgetc(fp)) != EOF; )
            if (c == '\n')
                ++count;
        return count;
    }
    
    
    int binarySearch(JOB *arr, int l, int r, int x)
    {
        if (r >= l) {
            int mid = l + (r - l) / 2;
      
            if (arr[mid].code == x)
                return mid;
      
            if (arr[mid].code > x)
                return binarySearch(arr, l, mid - 1, x);
      
            return binarySearch(arr, mid + 1, r, x);
        }
      
        return -1;
    }
    
    
    int linearSearch(JOB *arr, int n, char *x)
    {
        int i;
        for (i = 0; i < n; i++)
        {
            if (strcmp(arr[i].name, x) == 0)
            {
                return i;
            }
        }
    
    
        return -1;
    }
    
    
    bool StartsWith(const char *a, const char *b)
    {
        if(strncmp(a, b, strlen(b)) == 0)
            return 1;
        return 0;
    }
    
    
    
    
    void linearSearchStart(JOB *arr, int n, char *x)
    {
        int i;
        bool found = false;
        for (i = 0; i < n; i++)
        {
            if (StartsWith(arr[i].name, x) == 1)
            {
                found = true;
                printf("[%02d]  Code: %d\t | Job: %s\n", i+1, arr[i].code, arr[i].name);
            }
        }
        if(!found)
            printf("NOT found\n");
    }
    
    
    
    
    int main(int argc, char **argv)
    {
        FILE *fin = fopen("search_file2.txt", "r");
        
        if(!fin)
        {
            fprintf(stderr, "Error opening the file.\n");
            exit(EXIT_FAILURE);
        }
    
    
        int nEntries = countLines(fin);
        fseek(fin, 0L, SEEK_SET);
        //printf("%d", nEntries);
        JOB *arr = (JOB *)malloc(nEntries * sizeof(JOB));
        int k;
        char buf[MAXLINESIZE];
        while(!feof(fin))
        {
            fscanf(fin, "%d", &arr[k].code);
            fgetc(fin);
            if(fgets(buf, sizeof(buf), fin) != NULL)
            {
                buf[strlen(buf) - 1] = '\0';
                arr[k].name = strdup(buf);
                k++; 
            }
        }
    
    
        printf("%d\n", k);
    
    
        for(k = 0; k < nEntries; k++)
        {
            printf("[%d]   Code : %d\t | Job: %s\n", k+1, arr[k].code, arr[k].name); 
        }
    
    
        int x = 111107;
        char *y = "senator"; 
        char *z = "president";
        int search = binarySearch(arr, 0, nEntries-1, x);
        int lin_search = linearSearch(arr, nEntries, y);
    
    
        if(search != -1)
        {
            printf("Element %d found at line %d\n", x, search + 1);
            printf("[%d]  Code: %d\t | Job: %s\n", search + 1, arr[search].code, arr[search].name);
        }
        else
        {
            printf("Element %d was NOT found in the given set\n", x);
        }
    
    
        if(lin_search != -1)
        {
            printf("Job %s found at line %d\n", y, lin_search + 1);
            printf("[%d]  Code: %d\t | Job: %s\n", lin_search + 1, arr[lin_search].code, arr[lin_search].name);
        }
        else
        {
            printf("Job %s was NOT found in the given set\n", y);
        }
    
    
        linearSearchStart(arr, nEntries, z);
    
    
        fclose(fin);
        return 0;
    }
    For the text file:
    111102 ambasador
    111107 senator
    111108 governor
    111111 president
    111121 prime minister

    Output:
    5
    [1] Code : 111102 | Job: ambasador
    [2] Code : 111107 | Job: senator
    [3] Code : 111108 | Job: governor
    [4] Code : 111111 | Job: president
    [5] Code : 111121 | Job: prime minister
    Element 111107 found at line 2
    [2] Code: 111107 | Job: senator
    Job senator found at line 2
    [2] Code: 111107 | Job: senator
    [04] Code: 111111 | Job: president


    Which is perfect, however, if I add 4000 more entries to this file, linear search will not work anymore, so if I were to search for 'senator' again, it will not find it, even though it is on the exact same line as before. Why is this happening?

  2. #2
    Registered User
    Join Date
    Dec 2017
    Posts
    1,652
    I notice that you forgot to initialize k to 0 before filling your jobs array (which would be better named "jobs" instead of the meaningless "arr"). You are also not testing for end-of-file correctly. Perhaps something like this:
    Code:
        for (int i = 0; fscanf(fin, "%d", &jobs[i].code) == 1; ++i)
        {
            char buf[MAXLINESIZE];
            fgetc(fin);
            if (fgets(buf, sizeof buf, fin) != NULL)
            {
                buf[strlen(buf) - 1] = '\0';
                jobs[i].name = strdup(buf);
            }
        }
    And although it is not strictly necessary in this case, it is always best to free the memory you malloc at some point.
    And that's the world in a nutshell, an appropriate receptacle.

  3. #3
    Registered User
    Join Date
    Jan 2021
    Posts
    11
    Thank you! I know !while(feof) is considered a bad practice, so your solution is way more elegant. But I still have the same problem, if I search for a specific job using linear search in a text file with 20 lines it finds it, but it doesn't if the file has 4000 lines. I don't understand why. I have a file with 4238 lines, I checked the values for k and nEntries and they are both 4238, so I have no idea why this is happening

  4. #4
    Registered User
    Join Date
    Dec 2017
    Posts
    1,652
    In the 4238-line file, make sure that there is not an extra space before or after the word senator.
    And that's the world in a nutshell, an appropriate receptacle.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Linear search on a file with data
    By telmo_d in forum C Programming
    Replies: 3
    Last Post: 04-21-2015, 06:41 AM
  2. Help with Linear Search
    By SDH in forum C Programming
    Replies: 29
    Last Post: 01-14-2013, 04:25 PM
  3. Linear Search
    By tsmith94 in forum C Programming
    Replies: 5
    Last Post: 10-16-2011, 09:33 PM
  4. Difference Between A Linear Search And Binary Search
    By ImBack92 in forum C Programming
    Replies: 4
    Last Post: 05-12-2011, 08:47 AM
  5. binary search or linear search
    By vajira in forum C Programming
    Replies: 0
    Last Post: 06-05-2003, 12:42 PM

Tags for this Thread