Formatting a String - Removing the newline character

  1. #1
    Registered User
    Join Date
    Nov 2011
    Posts
    46

    So I'm reading in words from a text file. I start off by reading each line into a buffer string and then use a for loop to look for a space or newline character (either of which marks the end of a word).

    Then I compute the word's length with strlen, allocate memory for a dynamic string in a WORD struct that I have defined, and copy the word over.

    My only problem now is getting rid of that newline character for the words that end with it.

    Code:
    void getData ()
    {
        FILE *fp1;
        char buf[BUFSIZ], *pFirst, *pLast, *pTemp;
        int length, i, temp_length;
        WORD *tempw;
        //Open file
        fp1 = fopen(FILE1, "r");
        if(!fp1){
            printf("Error opening file!\n");
            system("pause");
            exit(101);
        }
        //Read text from file
        while(fgets(buf, sizeof(buf), fp1)){//Read full line into buffer string
            length = strlen(buf);
            //printf("(%d)%s", length, buf);
            if(length > 1){//Check for lines with text
                pFirst = &buf[0];
                for(i = 0; i < length; i++)
                {
                    pLast = &buf[i];
                    if(*pLast == ' '){//Check for spaces
                        //printf("%d ", i);
                        tempw = createWord();
                        temp_length = strlen(pFirst) - strlen(pLast);
                        tempw->word = (char*) calloc(temp_length + 1, sizeof(char));
                        strncpy(tempw->word, pFirst, temp_length);
                        printf("%s (%d)\n", tempw->word, (int)strlen(tempw->word));
                        pFirst = pLast + 1;//Reset initial pointer
                    }
                    if(*pLast == '\n'){//Check for end of line
                        //printf("%d ", i);
                        tempw = createWord();
                        temp_length = strlen(pFirst) - strlen(pLast);
                        pTemp = strchr(pLast, '\n');
                        if(pTemp)
                            *pTemp = '\0';
                        tempw->word = (char*) calloc(temp_length + 1, sizeof(char));
                        strncpy(tempw->word, pFirst, temp_length);
                        printf("%s (%d)\n", tempw->word, (int)strlen(tempw->word));
                    }
                }
            }
        }
        
        fclose(fp1);
        return;
    }
    I noticed that some of the lines in the text file have one or two spaces before the newline. For some reason they go undetected, and the branch for the terminating newline character is executed.

    Code:
    This is a sample.
    
    This line contains a space before the newline. 
    This one doesn't.
    This line contains two such characters before the newline.  
    End
    Any and all feedback is appreciated.
    JeanErmand
    Last edited by jeanermand; 02-24-2012 at 08:49 PM.

  2. #2
    laserlight (C++ Witch)
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,310
    Quote Originally Posted by jeanermand
    My only problem now is getting rid of that newline character for the words that end with it.
    Since you are using fgets, the only place where a newline can be is at the end of the string. Since you are using strlen, buf[length - 1] is the last character of the string (assuming that length > 0). So, you should check if this character is the newline character. If so, set it to '\0'. Then, you can simplify your logic since you don't need to check for the end of line.
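A minimal sketch of the check described above, written as a standalone helper (the name `strip_newline` is hypothetical, not from the thread). It only inspects the last character, because `fgets` cannot place a newline anywhere else in the buffer:

```c
#include <string.h>

/* Remove a single trailing '\n' left by fgets, if present.
   Returns the new length of the string. */
size_t strip_newline(char *s)
{
    size_t len = strlen(s);

    if (len > 0 && s[len - 1] == '\n') {
        s[len - 1] = '\0';   /* overwrite the newline with the terminator */
        len--;
    }
    return len;
}
```

Calling `length = strip_newline(buf);` right after `fgets` would let the tokenizing loop test for spaces alone; the last word on each line then has no delimiter after it, so the loop also has to treat the end of the string as the end of a word.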
    C + C++ Compiler: MinGW port of GCC
    Version Control System: Bazaar

    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  3. #3
    Registered User
    Join Date
    Nov 2011
    Posts
    46
    I thought of that, but I figured that keeping the newline character would simplify the condition for detecting the end of a word.
    Anyway, I found my problem. It turns out those extra spaces were being treated as separate strings, albeit empty strings with a length of zero, which in turn messed everything up.

    I have since debugged it, and this now works just fine:
    Code:
    void getData ()
    {
        FILE *fp1;
        char buf[BUFSIZ], *pFirst, *pLast, *pTemp;
        int length, temp_length;
        WORD *temp;
        //Open file
        fp1 = fopen(FILE1, "r");
        if(!fp1){
            printf("Error opening file!\n");
            system("pause");
            exit(101);
        }
        //Read text from file
        while(fgets(buf, sizeof(buf), fp1)){//Read full line into buffer string
            length = strlen(buf);
            if(length > 1){//Check for lines with text
                pFirst = &buf[0];
                for(pLast = buf; pLast < buf + length; pLast++)
                {
                    if(*pLast == ' ' || *pLast == '\n'){//Check for spaces or newline
                        temp_length = strlen(pFirst) - strlen(pLast);
                        if(temp_length > 0)
                        {
                            temp = createWord();
                            temp->word = (char*) calloc(temp_length + 1, sizeof(char));
                            strncpy(temp->word, pFirst, temp_length);
                            printf("%s(%d)\n", temp->word, (int)strlen(temp->word));
                            pFirst = pLast + 1;//Reset initial pointer position
                        }
                        else
                            pFirst = pLast + 1;//Condition for empty strings
                    }
                }
            }
        }
        
        fclose(fp1);
        return;
    }
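The fix above amounts to skipping zero-length spans between consecutive delimiters. A self-contained sketch of the same idea, assuming the words are simply collected into a caller-supplied array (`split_words` and its parameters are hypothetical). It measures each word with pointer subtraction instead of `strlen(pFirst) - strlen(pLast)`, and it also treats the end of the string as a delimiter, so the last word of a file that does not end in a newline (such as `End` in the sample) is not lost:

```c
#include <stdlib.h>
#include <string.h>

/* Split `line` on spaces and newlines, copying each non-empty word into a
   newly allocated string stored in `out`. Runs of delimiters (for example,
   trailing spaces before '\n') produce zero-length spans, which are skipped.
   Returns the number of words stored (at most `max_out`); the caller frees
   each out[i]. */
size_t split_words(const char *line, char **out, size_t max_out)
{
    size_t count = 0;
    const char *first = line;   /* start of the current word */
    const char *p;

    for (p = line; ; p++) {
        if (*p == ' ' || *p == '\n' || *p == '\0') {
            size_t len = (size_t)(p - first);

            if (len > 0 && count < max_out) {
                char *w = malloc(len + 1);
                if (w == NULL)
                    return count;   /* out of memory: stop early */
                memcpy(w, first, len);
                w[len] = '\0';
                out[count++] = w;
            }
            if (*p == '\0')
                break;
            first = p + 1;          /* next word starts after the delimiter */
        }
    }
    return count;
}
```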

  4. #4
    laserlight (C++ Witch)
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,310
    Good to hear. Now, allow me to suggest an alternative: strtok. Your code can then be simplified to something like:
    Code:
    while (fgets(buf, sizeof(buf), fp1)) {
        char *token = strtok(buf, " \n");
        while (token) {
            size_t token_length = strlen(token);
            if (token_length > 0) {
                temp = createWord();
                temp->word = malloc(token_length + 1);
                strcpy(temp->word, token);
                printf("%s(%d)\n", temp->word, (int)token_length);
            }
            token = strtok(NULL, " \n");
        }
    }
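A self-contained sketch of the `strtok` approach, assuming each token is copied into a caller-supplied array rather than a WORD struct (`tokenize_line` is a hypothetical name). Because `strtok` skips runs of delimiters and never returns an empty token, the zero-length case from the earlier version disappears. Keep in mind that `strtok` modifies the buffer and keeps hidden state between calls, so it cannot tokenize two strings at the same time:

```c
#include <stdlib.h>
#include <string.h>

/* Tokenize one line read by fgets, treating both ' ' and '\n' as
   delimiters. strtok writes '\0' into `buf` in place. Each token is
   copied into `out` (the caller frees them); returns the number copied. */
size_t tokenize_line(char *buf, char **out, size_t max_out)
{
    size_t count = 0;
    char *token = strtok(buf, " \n");

    while (token != NULL && count < max_out) {
        size_t len = strlen(token);
        char *copy = malloc(len + 1);
        if (copy == NULL)
            break;
        memcpy(copy, token, len + 1);   /* includes the terminator */
        out[count++] = copy;
        token = strtok(NULL, " \n");    /* continue in the same buffer */
    }
    return count;
}
```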
    EDIT:
    Actually, I just realised that you overwrite temp for every word without storing or freeing the previous one, which leads to a memory leak.
    Last edited by laserlight; 02-24-2012 at 10:29 PM.
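One way to avoid that leak is to keep every allocation reachable until it is no longer needed. A sketch using a hypothetical singly linked list (`WordNode`, `push_word` and `free_words` are illustrative names standing in for the thread's WORD struct and `createWord`):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical list node standing in for the WORD struct. */
typedef struct WordNode {
    char *word;
    struct WordNode *next;
} WordNode;

/* Prepend a copy of `text` to the list at *head. On failure the list is
   left unchanged and 0 is returned; on success 1 is returned. */
int push_word(WordNode **head, const char *text)
{
    size_t len = strlen(text);
    WordNode *node = malloc(sizeof *node);

    if (node == NULL)
        return 0;
    node->word = malloc(len + 1);
    if (node->word == NULL) {
        free(node);
        return 0;
    }
    memcpy(node->word, text, len + 1);
    node->next = *head;   /* the previous words stay reachable */
    *head = node;
    return 1;
}

/* Release every node and its string. */
void free_words(WordNode *head)
{
    while (head != NULL) {
        WordNode *next = head->next;
        free(head->word);
        free(head);
        head = next;
    }
}
```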

  5. #5
    Registered User
    Join Date
    Nov 2011
    Posts
    46
    Yeah, I've since edited it again. Each WORD struct pointer is supposed to be placed in a BST; my earlier code was simply for making sure the strings were handled properly before placing them in the WORD struct:
    Code:
    void getData (BST_TREE *wordBST)
    {
        FILE *fp1;
        char buf[BUFSIZ], *pFirst, *pLast, *temp;
        int length, temp_length;
    
        //Open file
        fp1 = fopen(FILE1, "r");
        if(!fp1){
            printf("Error opening file!\n");
            system("pause");
            exit(101);
        }
        //Read text from file
        while(fgets(buf, sizeof(buf), fp1)){//Read full line into buffer string
            length = strlen(buf);
            if(length > 1){//Check for lines with text
                pFirst = &buf[0];
                for(pLast = buf + 1; pLast < buf + length; pLast++)
                {
                    if(*pLast == ' ' || *pLast == '\n'){//Check for spaces or newline
                        temp_length = strlen(pFirst) - strlen(pLast);
                        if(temp_length > 0)
                        {
                            temp = formatString(pFirst, &temp_length);
                            printf("%s(%d)\n", temp, (int)strlen(temp));
                            addWord(wordBST, temp, temp_length);//Insert to BST
                            free(temp);
                            pFirst = pLast + 1;//Reset initial pointer position
                        }
                        else
                            pFirst = pLast + 1;//Condition for empty strings
                    }
                }
            }
        }
        
        fclose(fp1);
        return;
    }
    Last edited by jeanermand; 02-25-2012 at 08:37 PM.

  6. #6
    Registered User
    Join Date
    Sep 2011
    Posts
    111
    One thing to keep in mind is that, depending on which OS was used to create your file, there could be extra characters in the string before the newline. So the trick of using

    Code:
    //proper headers
    
    end = strlen(string);
    
    if(end > 0)
        string[end-1] = '\0'; //overwrite the last character (expected to be the newline)
    may not always work. If you run into problems, just do

    Code:
    end = strlen(string);
    
    printf("(%d)\n", end);
    printf("(%s)\n", string); //the ( ) allows easy visual inspection of the string
    That way you can compare the values.
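When a file is read in text mode on the platform that wrote it, each line ending arrives as a single '\n'. If the file came from another platform, though (for example, Windows `\r\n` endings read on a system whose text mode performs no translation), a '\r' can remain just before the '\n'. A common idiom that handles both cases is `strcspn`; a minimal sketch (the helper name `trim_line_ending` is hypothetical):

```c
#include <string.h>

/* Cut the string at the first '\r' or '\n'. strcspn returns the length of
   the initial segment containing neither character, which equals strlen(s)
   when neither is present, so writing '\0' there is always in bounds.
   Returns the new length. */
size_t trim_line_ending(char *s)
{
    size_t n = strcspn(s, "\r\n");
    s[n] = '\0';
    return n;
}
```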

  7. #7
    laserlight (C++ Witch)
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,310
    Quote Originally Posted by Strahd
    One thing to keep in mind is that, depending on which OS was used to create your file, there could be extra characters in the string before the newline.
    No, because the file was opened in text mode, not binary mode, so the platform's newline sequence will be converted to a single newline character.

  8. #8
    Registered User
    Join Date
    Nov 2011
    Posts
    46
    Boy do I feel dumb... I just realized that I could simplify my code significantly by using fscanf:
    Code:
    void getData (BST_TREE *wordBST)
    {
        FILE *fp1;
        char buf[BUFSIZ], *pBuf;
        int length;
        //Open file
        fp1 = fopen(FILE1, "r");
        if(!fp1){
            printf("Error opening file!\n");
            system("pause");
            exit(101);
        }
        //Read text from file
        while(fscanf(fp1, "%s", buf) == 1){
            length = strlen(buf) + 1;
            if(!isalpha((unsigned char)buf[length - 2])){//Get rid of punctuation
                buf[length - 2] = '\0';
                length--;
            }
            for(pBuf = buf; pBuf < buf + length - 1; pBuf++)//Get all lowercase
                *pBuf = (char)tolower((unsigned char)*pBuf);
            addWord(wordBST, buf, length);
        }
        fclose(fp1);
        return;
    }
    Thank you all for the responses and feedback.
    Yet I still have not gotten my insertion function for the BST to work properly.
    *sigh*
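The BST insertion routine is not shown in the thread, so the following is only a sketch under assumptions: a hypothetical `Node` type in place of `BST_TREE`, keyed by the word, with a count of occurrences. One detail matters for the code above: `buf` is overwritten by every `fscanf` call (and the previous version freed `temp` right after `addWord`), so the tree has to store its own copy of each word rather than the pointer it was given:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout; BST_TREE and addWord are not shown in the thread. */
typedef struct Node {
    char *word;
    int count;                  /* how many times the word was seen */
    struct Node *left, *right;
} Node;

/* Insert `word` into the tree rooted at *root, or bump its count if it is
   already present. Walking a pointer-to-pointer handles the empty tree and
   the leaf position with the same code. Returns 0 on allocation failure. */
int bst_insert(Node **root, const char *word)
{
    while (*root != NULL) {
        int cmp = strcmp(word, (*root)->word);
        if (cmp == 0) {
            (*root)->count++;
            return 1;
        }
        root = (cmp < 0) ? &(*root)->left : &(*root)->right;
    }

    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    size_t len = strlen(word);
    n->word = malloc(len + 1);
    if (n->word == NULL) {
        free(n);
        return 0;
    }
    memcpy(n->word, word, len + 1);   /* own a copy: the caller reuses its buffer */
    n->count = 1;
    n->left = n->right = NULL;
    *root = n;
    return 1;
}

/* Free the whole tree (post-order). */
void bst_free(Node *root)
{
    if (root == NULL)
        return;
    bst_free(root->left);
    bst_free(root->right);
    free(root->word);
    free(root);
}
```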

  9. #9
    laserlight (C++ Witch)
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,310
    Quote Originally Posted by jeanermand
    I just realized that I could simplify my code significantly by using fscanf:
    If the individual lines don't matter, then yes, except that you need to specify the field width in the format specifier otherwise your code will be susceptible to buffer overflow.
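A sketch of the same reading loop with a field width, as suggested, assuming a fixed word-length limit (`WORD_MAX` and `read_word` are hypothetical names). The width is built from the macro so the two cannot drift apart, and characters are cast to `unsigned char` before `isalpha`/`tolower`, since passing a negative `char` value to those functions is undefined:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define WORD_MAX 63                 /* hypothetical limit; buffer holds WORD_MAX + 1 chars */
#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x)  /* expands WORD_MAX before stringizing */

/* Read the next whitespace-delimited word from fp into buf, strip one
   trailing non-letter, and lowercase it. The "%63s" width keeps fscanf
   from writing past the buffer. Returns 1 on success, 0 at end of input. */
int read_word(FILE *fp, char buf[WORD_MAX + 1])
{
    if (fscanf(fp, "%" STRINGIFY(WORD_MAX) "s", buf) != 1)
        return 0;

    size_t len = strlen(buf);
    if (len > 0 && !isalpha((unsigned char)buf[len - 1]))
        buf[--len] = '\0';          /* drop trailing punctuation */

    for (size_t i = 0; i < len; i++)
        buf[i] = (char)tolower((unsigned char)buf[i]);
    return 1;
}
```

A word longer than `WORD_MAX` characters is split across calls instead of overflowing `buf`, and a token consisting of a single punctuation character comes back as an empty string, which the caller may want to skip before inserting it into the tree.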
