Thread: Search File for String

  1. #1
    Registered User
    Join Date
    Aug 2014
    Posts
    3

    Search File for String

    Hello everyone!

    First, I should say that I am new to C programming. I have dabbled with Python in the past, but nothing formal. I am taking a C programming class, but it is an introductory course. As such, the instructor is moving extremely slowly through the material (which I understand), but I would like to start using what I learn to help me in other areas.

    With this program, I am trying to search a given text file line by line for user-submitted data. I then want to store the number of the line on which that data was found. The program runs, but it always prints "Sorry, couldn't find a match." even when a match exists.


    Code:
    #include <stdio.h> 
    #include <conio.h> 
    #include <string.h> 
    #include <stdlib.h>
    int main(void) 
    {
        FILE *fp;
       char *str;
        int line_num = 1;
        int find_result = 0;
        char temp[512];
       
    printf("Search> "); /* Enter Search Request */
    scanf("%c", str);
    
    fp =fopen("ChemicalSymbol.txt", "r"); // Open text file
    /* Search text file line by line for the entered data */
        while(fgets(temp, 512, fp) != NULL) 
          {
            if((strstr(temp, str)) != NULL) 
             {
                printf("A match found on line: %d\n", line_num);
                printf("\n%s\n", temp);
                find_result++;
               }
            line_num++;
           }
    
        if(find_result == 0) 
          {
            printf("\nSorry, couldn't find a match.\n");
           }
        
        //Close the file if still open.
        if(fp) 
          {
            fclose(fp);
           }
    return(0);
    }
    I should probably explain my overall goal with this program.
    I am a chemistry major. Ultimately, I would like to write a program that will calculate the molecular weight of a compound. I expect the user-entered molecular compound to be something like this: CH3COOH (an equivalent entry could be C2H4O2).
    Also, it needs to be case sensitive because the chemical formulas are case sensitive.

    The output will be "The molecular weight of CH3COOH is xxx.xxxxx grams"


    I have two text files as well. One contains a list of all chemical symbols, and the second is a list of all atomic weights in the same order as the first.

    I want to place the user inputted data into an array, then search for each element of that array in the chemical symbol text file. It will show which line it was found on, then I can use that line to grab the atomic weight from the second text file.

    I know this is a lot of information, and I am probably making a ton of mistakes. But for now, I am focusing on the search function of the program, and would love any advice or recommendations.

    Thank you very much.
    Last edited by Dustin Spencer; 08-22-2014 at 10:20 PM.

  2. #2
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    > scanf("%c", str);
    If you change this to %s, and make str an array (not a pointer), then you might get somewhere.
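    A minimal sketch of that change, assuming search terms are single words of at most 63 characters. The helper names read_term and search_stream are made up for illustration; they are not from the original program.

```c
#include <stdio.h>
#include <string.h>

#define TERM_MAX 64

/* Hypothetical helper: reads one whitespace-delimited word into term.
   term is real storage (an array), and the width in %63s keeps
   fscanf from writing past the end of it. Returns 1 on success. */
int read_term(FILE *in, char term[TERM_MAX])
{
    return fscanf(in, "%63s", term) == 1;
}

/* Hypothetical helper: prints each line of fp that contains needle
   (case-sensitive substring match) and returns how many lines matched. */
int search_stream(FILE *fp, const char *needle)
{
    char line[512];
    int line_num = 1;
    int matches = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        if (strstr(line, needle) != NULL) {
            printf("A match found on line: %d\n%s", line_num, line);
            matches++;
        }
        line_num++;
    }
    return matches;
}
```

    In main you would declare char str[TERM_MAX], call read_term(stdin, str), check that fopen did not return NULL, and only then pass the file to search_stream. Note that strstr matches substrings, so searching for H also matches the line holding He; an exact symbol lookup would need the trailing newline stripped and a strcmp instead.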
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    C++ Witch laserlight
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by Dustin Spencer
    I have two text files as well. One contains a list of all chemical symbols, and the second is a list of all atomic weights in the same order as the first.

    I want to place the user inputted data into an array, then search for each element of that array in the chemical symbol text file. It will show which line it was found on, then I can use that line to grab the atomic weight from the second text file.
    If I understand this correctly, the chemical symbols are those of the chemical elements. This is a sufficiently small list that you could even hardcode it into your program instead of reading from a file (or two). I suggest that you pick one of two approaches.

    Approach #1: define a struct to model a chemical element, e.g.,
    Code:
    typedef struct
    {
        char symbol[4];
        double atomic_weight;
    } ChemicalElement;
    Now, create an array of ChemicalElement objects (you might want to use a named constant instead of the magic number 118):
    Code:
    ChemicalElement chemical_elements[118];
    and populate it either by hardcoding it in your program or by reading from a file shortly after the program starts. You then sort the array (e.g., using qsort) by symbol, upon which you can quickly find the element's atomic_weight by using binary search (or by using bsearch, which typically implements binary search).
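    As a rough sketch of the sort-then-search idea, assuming the struct above: the helper names compare_by_symbol, sort_elements and find_element are hypothetical, while qsort and bsearch are the standard functions from stdlib.h.

```c
#include <stdlib.h>
#include <string.h>

typedef struct
{
    char symbol[4];
    double atomic_weight;
} ChemicalElement;

/* Comparison function shared by qsort and bsearch: orders by symbol. */
static int compare_by_symbol(const void *lhs, const void *rhs)
{
    const ChemicalElement *a = lhs;
    const ChemicalElement *b = rhs;
    return strcmp(a->symbol, b->symbol);
}

/* Sorts the array in place so that bsearch can be used on it. */
void sort_elements(ChemicalElement *elements, size_t count)
{
    qsort(elements, count, sizeof elements[0], compare_by_symbol);
}

/* Returns a pointer to the matching element, or NULL if the symbol
   is too long to fit in the struct or is not in the array. */
const ChemicalElement *find_element(const ChemicalElement *elements,
                                    size_t count, const char *symbol)
{
    ChemicalElement key;

    if (strlen(symbol) >= sizeof key.symbol)
        return NULL;
    strcpy(key.symbol, symbol);
    key.atomic_weight = 0.0;

    return bsearch(&key, elements, count, sizeof elements[0],
                   compare_by_symbol);
}
```

    The array must be sorted with the same comparison function before find_element is called; otherwise bsearch may miss entries that are present.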

    Approach #2: create two arrays. One array will be an array of chemical symbols; the other array will be an array of atomic weights. Likewise, you can populate the arrays by hardcoding or reading from a file. Sorting will be a bit more difficult though, but it may suffice to just start with linear search first.
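    For illustration, a sketch of the two-array approach with linear search might look like this. Only four elements are listed, the array names are placeholders, and find_symbol_index is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/* Parallel arrays: atomic_weights[i] belongs to symbols[i]. */
static const char *const symbols[] = {"H", "He", "Li", "Be"};
static const double atomic_weights[] = {1.00794, 4.002602, 6.941, 9.012};

/* Linear search: returns the index of symbol, or -1 if it is absent.
   The comparison is case-sensitive, so "he" does not match "He". */
int find_symbol_index(const char *symbol)
{
    size_t count = sizeof symbols / sizeof symbols[0];
    size_t i;

    for (i = 0; i < count; ++i) {
        if (strcmp(symbol, symbols[i]) == 0)
            return (int)i;
    }
    return -1;
}
```

    The returned index plays the role of the "line number" in the original plan: if it is not -1, atomic_weights[index] is the weight for that symbol. The drawback is that the two arrays have to be kept in the same order by hand.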
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  4. #4
    Registered User Alpo
    Join Date
    Apr 2014
    Posts
    877
    First, I wanted to say that what you're doing with your education, being proactive, is a really good thing. If you stick with it, you'll find that the largest source of your learning will probably come from teaching yourself.

    Along with what Salem said, I also suggest looking at the ctype.h functions. If you want to make the program case insensitive, it has good functions like tolower() and a few other ones which are convenient.
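    For example, a case-insensitive comparison built on tolower() could be sketched like this (equal_ignore_case is a made-up name, not a standard function):

```c
#include <ctype.h>

/* Hypothetical helper: returns 1 if a and b are equal when letter
   case is ignored, 0 otherwise. The casts to unsigned char are needed
   because tolower() is only defined for unsigned char values and EOF. */
int equal_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        ++a;
        ++b;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}
```

    Keep in mind that for chemical formulas the case carries meaning (Co is cobalt, CO is carbon monoxide), so for this particular program the other ctype.h functions, such as isupper(), islower() and isdigit(), are probably the more useful ones.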

    Also, it is a good idea to diagnose as many of the program's problems as you can yourself. You can do this with a debugger, with printf() statements, or sometimes just with pencil and paper, whichever helps you the most. Getting good at diagnosing problems works synergistically with your existing coding knowledge, and will allow you to stretch and write programs beyond your current scope.

    Anyways, what I'm trying to say is, take the building of your program slowly. Work out the logic piece by piece, while using diagnostic tools along the way, and you may be surprised at what you can accomplish. Good luck!

  5. #5
    Registered User
    Join Date
    Aug 2014
    Posts
    3
    Thank you all very much for your advice. I took a look at the program again, and decided to change how I completed the task. I coded the molecular weights into the program, and set up a cycle to ask the user for the chemical symbol and the quantity of that atom present. The program correctly displays the atomic weight of each atom, but the total molecular weight isn't being calculated correctly. "Total" is being assigned the value of the last chemical symbol checked instead of the sum of each symbol. I thought the variables would keep the values assigned to them when they were last searched because that part of the code would be skipped over during the next cycle. Obviously, that isn't what is happening. Is there a way to update the variables without losing the data during the next run?


    Code:
    #include <stdio.h> 
    #include <conio.h> 
    #include <string.h> 
    #include <stdlib.h>
    int main(void) 
    {
    /* Declare Variables */
    char InpSym[4];
    double a, b, c;
    char H[4], He[4], Li[4], Be[4];
    int reset, ElemNum, Quantity, repeat;
    double ChemSym, Total;
    
    /* Reset all elements to zero */
    for (reset = 0; reset < 4; reset++)
          {
          InpSym[reset] = 0;
          H[reset] = 0;
          He[reset] = 0;
          Li[reset] = 0;
          Be[reset] = 0;
          }
    /* Set variables to zero */
    a = 0;
    b = 0;
    c = 0;
    
    /* Fill arrays with chemical symbols */     
    H[0] = 'H';
    He[0] = 'H';
    Li[0] = 'L';
    Be[0] = 'B';
    He[1] = 'e';
    Li[1] = 'i';
    Be[1] = 'e';
    
    /* Ask user for number of elements in the chemical formula */
    printf("How many different elements?> ");
    scanf("%d", &ElemNum);
    
    /* Begin scan */ 
    for (repeat = 0; repeat < ElemNum; repeat++)            //Repeats the program for each different element.
       {
       printf("\nPlease enter chemical symbol> ");          //Asks to enter the first symbol.
       scanf("%s", InpSym);
       if (strcmp(InpSym, H) == 0)                          //Compares the entered data against the chemical symbol for hydrogen.
          {
          a = 1.00794;                                      //The atomic mass of hydrogen.
          printf("\nQuantity?> ");                          //Asks user for number of hydrogen atoms in the compound.
          scanf("%d", &Quantity);
          a = a * Quantity;                                 //Multiplies the atomic mass by the number of atoms present.
          printf("%lf", a);                                 //Prints total mass of first element. (For testing use only)
          }
          else if (strcmp(InpSym, He) == 0)                 //Same for helium.
             {
             b = 4.002602;
             printf("\nQuantity?> ");
             scanf("%d", &Quantity);
             b = b * Quantity;
             printf("%lf", b);
             }
             else if (strcmp(InpSym, Li) == 0)               //Same for lithium.
                {
                c = 6.941;
                printf("\nQuantity?> ");
                scanf("%d", &Quantity);
                c = c * Quantity;
                printf("%lf", c);
                }
                else                                         //Tells the user that they did not enter a correct chemical symbol.
                    {
                    printf("That chemical does not exist.");
                    repeat--;
                    }
    
    Total = a + b + c                                        //Totals the molecular weight and
    printf("\nMolecular weight is: %lf", &Total);            //Tells it to the user.
    return(0);
    }

  6. #6
    C++ Witch laserlight
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Is there any particular reason why you did not take my advice in post #3?

    What you are doing now is unwieldy and error-prone: you are creating an array of char for each chemical element, and then you are hardcoding the atomic weights within the if-else chain. You then assign the names of the chemical elements in a way that is extremely tedious and difficult to proofread in the source. Compare to:
    Code:
    const ChemicalElement chemical_elements[] = {
        {"H", 1.00794},
        {"He", 4.002602},
        {"Li", 6.941},
        {"Be", 9.012},
        /* ... */
    };
    const size_t num_chemical_elements = sizeof(chemical_elements) / sizeof(chemical_elements[0]);
    
    /* ... */
    
    double molecular_weight = 0.0;
    
    /* Begin scan */
    for (repeat = 0; repeat < ElemNum; repeat++)
    {
        printf("\nPlease enter chemical symbol> ");
        if (scanf("%3s", InpSym) == 1)
        {
            size_t i;
            for (i = 0; i < num_chemical_elements; ++i)
            {
                if (strcmp(InpSym, chemical_elements[i].symbol) == 0)
                {
                    int quantity;
                    printf("\nQuantity?> ");
                    if (scanf("%d", &quantity) == 1 && quantity > 0)
                    {
                        molecular_weight += chemical_elements[i].atomic_weight * quantity;
                    }
                    else
                    {
                        /* Handle invalid quantity input */
                    }
    
                    break;
                }
            }
    
            if (i == num_chemical_elements)
            {
                printf("That chemical does not exist.");
                repeat--;
            }
        }
        else
        {
            break;
        }
    }
    
    printf("\nMolecular weight is: %lf\n", molecular_weight);
    Notice that the chemical symbols and atomic weights only appear at the start, where we define the array named chemical_elements. Because they are listed in one place, it should be easy to copy or verify them by consulting the periodic table.

    In the logic for computing the molecular weight, these values do not appear directly; instead, the code looks them up by searching the chemical_elements array for the matching ChemicalElement object. Notice also that I use descriptive names, not names like a, b and c. Where I do use the name i, it is because i is a conventional name for a loop/array index. Take note of my consistent indentation, and also look carefully at the last line, where I print molecular_weight, not &molecular_weight (also, for printing, %f would do; %lf is only needed for reading with scanf and the like).
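    To show the running total in isolation, here is a self-contained sketch of the same lookup-and-accumulate logic without the user input. The function names lookup_atomic_weight and compute_molecular_weight are hypothetical, and the table holds only the four elements listed above.

```c
#include <stddef.h>
#include <string.h>

typedef struct
{
    char symbol[4];
    double atomic_weight;
} ChemicalElement;

static const ChemicalElement chemical_elements[] = {
    {"H", 1.00794},
    {"He", 4.002602},
    {"Li", 6.941},
    {"Be", 9.012},
};
static const size_t num_chemical_elements =
    sizeof chemical_elements / sizeof chemical_elements[0];

/* Returns the atomic weight for symbol, or -1.0 if it is not in the table. */
double lookup_atomic_weight(const char *symbol)
{
    size_t i;

    for (i = 0; i < num_chemical_elements; ++i) {
        if (strcmp(symbol, chemical_elements[i].symbol) == 0)
            return chemical_elements[i].atomic_weight;
    }
    return -1.0;
}

/* Adds weight * quantity for every entry to a single running total.
   Returns -1.0 as soon as an unknown symbol is encountered. */
double compute_molecular_weight(const char *const symbols[],
                                const int quantities[], size_t count)
{
    double total = 0.0;
    size_t i;

    for (i = 0; i < count; ++i) {
        double weight = lookup_atomic_weight(symbols[i]);
        if (weight < 0.0)
            return -1.0;
        total += weight * quantities[i];
    }
    return total;
}
```

    The point is that total is initialised once before the loop and only ever added to with +=, so nothing depends on separate per-element variables keeping their values between iterations.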
    Last edited by laserlight; 08-24-2014 at 06:07 AM.
