Thread: problem collecting data from input file

  1. #1
    Registered User
    Join Date
    Feb 2008
    Posts
    77

    problem collecting data from input file

    Hello to all.

    I am unable to collect data from my input files. The format of the input files is standardized.

    The format is something like this:

    >blah blah blah
    SDJKDJDJKFJDSKLJGDKLJFDKJDFJDSKJ

    I need to take the text after the ">" and store it in a structure member called
    loadedSequences.name, and then take the characters on the following lines and store
    them in a member named loadedSequences.data.

    I am able to open the input files and get the first part, but I am unable to collect the second part.

    Here is the relevant code:

    Code:
    if( (strcmp ( command, "read" )) == 0 )
                        {
                            
                            filename = strtok(NULL, " \t\n" ) ; // collect filename                       
                            if( filename ) 
                                { 
                                    input = fopen(filename, "r") ;
                                    if ( !input )
                                        {
                                            perror(filename);
                                            continue ;
                                        }                                
                                }                            
                           
                            fgets( header_data, 1000, input) ; 
                            
                            seqName = &header_data[1] ;   // After the > in FASTA format  
                            
                            loadedSequences[nSequences].name = 
                                (char *) malloc( ( strlen(seqName) + 1 ) * sizeof( char ) ) ;
                                
                            strcpy( loadedSequences[nSequences].name , seqName ) ;
                            
                            // Collect the sequence from FASTA file
                            
                            loadedSequences[nSequences].data[n++] = 0 ;
                            
                             while (c = getc(input) != EOF)
                                        {
                                            if(c >= 'A' && c <= 'Z')
                                                {
                                                    loadedSequences[nSequences].data[n++] = c ;
                                                }
                                        }  
                                           
                            
                            loadedSequences[nSequences].length = strlen((loadedSequences[nSequences].data)) ;
                            
                            fclose(input) ;
                            
                            ++nSequences ;
                   
                        }
    Any help would be great.

  2. #2
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    loadedSequences[nSequences].data will always be an empty string if you put a 0 in its first element, because string functions such as strlen stop at the first 0. That assumes n is initialized at all, which it doesn't appear to be in the code you posted. Either way, you don't ever ever ever want your strings to start with a 0.
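
    To illustrate the point about a leading 0, here is a minimal sketch. The two function names are made up for this example and are not part of the posted program; the small local buffer stands in for loadedSequences[nSequences].data.

```c
#include <string.h>

/* Hypothetical helper: writes a 0 first, then the letters, as in the posted code. */
size_t length_with_leading_zero(void)
{
    char data[16];
    int n = 0;

    data[n++] = 0;        /* the string now ends at index 0 */
    data[n++] = 'A';
    data[n++] = 'C';
    data[n] = '\0';

    return strlen(data);  /* 0: strlen stops at the first zero byte */
}

/* Hypothetical helper: same letters, but the terminator goes at the end. */
size_t length_with_trailing_zero(void)
{
    char data[16];
    int n = 0;

    data[n++] = 'A';
    data[n++] = 'C';
    data[n] = '\0';

    return strlen(data);  /* 2 */
}
```

    The letters are in memory in both cases, but only the second version gives strlen anything to count.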

  3. #3
    Registered User
    Join Date
    Feb 2008
    Posts
    77
    Still not able to get data.


    Here is my code again with the declarations:

    Code:
    #include <stdio.h>
    #include <string.h>
    #include <ctype.h>
    #include <stdlib.h>
    
    typedef struct { char *name ; char data[1000000] ; int length ; int type ; } sequence ;
    
    sequence loadedSequences[1000] ;
    
    int nSequences = 0 ;
        
    int main()
        {
            char std_input[1000] ;  // prev. buffer
            char header_data[1000] ;
            char seq_data[100000] ;
            char buffer[100000] ;
            
            FILE *input ;
            
            char *seqName ;        
            char *command ;
            char *filename, *list_num ;
            
            int i = 0 ;
            int c ;
            int n = 0 ;
            
            for(;;)
                {
                    printf("SeqTool> ") ;
                    
                    fgets(std_input, 1000, stdin) ;
                    fpurge(stdin) ;
                    
                    // Eliminating the newline character
                    char *p = strchr(std_input, '\n') ;
                    if (p) 
                        {
                            *p = '\0' ; 
                        } 
                                              
                    command = strtok( std_input, " \t\n" ) ; 
                    
    ////////////////// Load sequence in FASTA format //////////////////////////
                    
                    if( (strcmp ( command, "read" )) == 0 )
                        {
                            
                            filename = strtok(NULL, " \t\n" ) ; // collect filename                       
                            if( filename ) 
                                { 
                                    input = fopen(filename, "r") ;
                                    if ( !input )
                                        {
                                            perror(filename);
                                            continue ;
                                        }                                
                                }                            
                           
                            fgets( header_data, 1000, input) ; 
                            
                            seqName = &header_data[1] ;   // After the > in FASTA format  
                            
                            loadedSequences[nSequences].name = 
                                (char *) malloc( ( strlen(seqName) + 1 ) * sizeof( char ) ) ;
                                
                            strcpy( loadedSequences[nSequences].name , seqName ) ;
                            
                            // Collect the sequence from FASTA file
                            
                            
                             while (c = getc(input) != EOF)
                                        {
                                            if(c >= 'A' && c <= 'Z')
                                                {
                                                    loadedSequences[nSequences].data[n++] = c ;
                                                }
                                        }  
                                           
                            
                            loadedSequences[nSequences].length = strlen((loadedSequences[nSequences].data)) ;
                            
                            fclose(input) ;
                            
                            ++nSequences ;
                   
                        }

  4. #4
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    You still need to re-initialize n to 0 each time through.

  5. #5
    Registered User
    Join Date
    Feb 2008
    Posts
    77
    Still not able to collect data with code.


    Code:
    if( (strcmp ( command, "read" )) == 0 )
                        {
                            
                            filename = strtok(NULL, " \t\n" ) ; // collect filename                       
                            if( filename ) 
                                { 
                                    input = fopen(filename, "r") ;
                                    if ( !input )
                                        {
                                            perror(filename);
                                            continue ;
                                        }                                
                                }                            
                           
                            fgets( header_data, 1000, input) ; 
                            
                            seqName = &header_data[1] ;   // After the > in FASTA format  
                            
                            loadedSequences[nSequences].name = 
                                (char *) malloc( ( strlen(seqName) + 1 ) * sizeof( char ) ) ;
                                
                            strcpy( loadedSequences[nSequences].name , seqName ) ;
                            
                            // Collect the sequence from FASTA file
                            
                            
                             while (c = getc(input) != EOF)
                                        {
                                            n = 0 ;
                                            if(c >= 'A' && c <= 'Z')
                                                {
                                                    loadedSequences[nSequences].data[n++] = c ;
                                                 
                                                }
                                        }  
                                           
                            
                            loadedSequences[nSequences].length = strlen((loadedSequences[nSequences].data)) ;
                            
                            fclose(input) ;
                            
                            ++nSequences ;
                   
                        }

  6. #6
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Now you're overwriting loadedSequences[nSequences].data[0] each time, because n goes back to 0 for every character read. Reset n in the big loop, but not inside that while loop.

  7. #7
    Registered User
    Join Date
    Feb 2008
    Posts
    77
    It is still not getting the data

    Code:
    if( (strcmp ( command, "read" )) == 0 )
                        {
                            
                            filename = strtok(NULL, " \t\n" ) ; // collect filename                       
                            if( filename ) 
                                { 
                                    input = fopen(filename, "r") ;
                                    if ( !input )
                                        {
                                            perror(filename);
                                            continue ;
                                        }                                
                                }                            
                           
                            fgets( header_data, 1000, input) ; 
                            
                            seqName = &header_data[1] ;   // After the > in FASTA format  
                            
                            loadedSequences[nSequences].name = 
                                (char *) malloc( ( strlen(seqName) + 1 ) * sizeof( char ) ) ;
                                
                            strcpy( loadedSequences[nSequences].name , seqName ) ;
                            
                            // Collect the sequence from FASTA file                                             
                            
                            n = 0 ;
                            
                            while(fgets(seq_data, 1000, input))
                                {                     
                                     while (c = (getc(input) != EOF))
                                                {
                                                    if(c >= 'A' && c <= 'Z')
                                                        {
                                                            loadedSequences[nSequences].data[n++] = c ;
                                                         
                                                        }
                                                }  
                                }          
                            
                            loadedSequences[nSequences].length = strlen((loadedSequences[nSequences].data)) ;
                            
                            fclose(input) ;
                            
                            ++nSequences ;
                   
                        }

  8. #8
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Originally posted by gkoenig:
    It is still not getting the data

    Code:
    if( (strcmp ( command, "read" )) == 0 )
                        {
                            
                            filename = strtok(NULL, " \t\n" ) ; // collect filename                       
                            if( filename ) 
                                { 
                                    input = fopen(filename, "r") ;
                                    if ( !input )
                                        {
                                            perror(filename);
                                            continue ;
                                        }                                
                                }                            
                           
                            fgets( header_data, 1000, input) ; 
                            
                            seqName = &header_data[1] ;   // After the > in FASTA format  
                            
                            loadedSequences[nSequences].name = 
                                (char *) malloc( ( strlen(seqName) + 1 ) * sizeof( char ) ) ;
                                
                            strcpy( loadedSequences[nSequences].name , seqName ) ;
                            
                            // Collect the sequence from FASTA file                                             
                            
                            n = 0 ;
                            
                            while(fgets(seq_data, 1000, input)) /*WTF?*/
                                {                     
                                     while (c = (getc(input) != EOF))
                                                {
                                                    if(c >= 'A' && c <= 'Z')
                                                        {
                                                            loadedSequences[nSequences].data[n++] = c ;
                                                         
                                                        }
                                                }  
                                }          
                            
                            loadedSequences[nSequences].length = strlen((loadedSequences[nSequences].data)) ;
                            
                            fclose(input) ;
                            
                            ++nSequences ;
                   
                        }
    So you added a line there since last time, which reads one line (up to 999 characters) from your input file and throws it away. Why?
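
    A small sketch of what that extra fgets does, assuming a file with one header line and one sequence line. The function name is hypothetical, and tmpfile() is used only so the example needs no file on disk.

```c
#include <stdio.h>

/* Hypothetical helper: counts the upper-case letters left in a small
   FASTA-style file after the header, optionally calling fgets once more first. */
int letters_after_header(int extra_fgets)
{
    char line[1000];
    int c, count = 0;
    FILE *fp = tmpfile();              /* temporary file, removed automatically */

    if (!fp)
        return -1;

    fputs(">blah blah blah\nSDJKDJDJ\n", fp);
    rewind(fp);

    /* Reading the header already leaves the stream at the second line. */
    if (!fgets(line, sizeof line, fp)) {
        fclose(fp);
        return -1;
    }

    /* The extra call swallows the sequence line before getc ever sees it. */
    if (extra_fgets && !fgets(line, sizeof line, fp)) {
        fclose(fp);
        return -1;
    }

    while ((c = getc(fp)) != EOF)
        if (c >= 'A' && c <= 'Z')
            ++count;

    fclose(fp);
    return count;
}
```

    With this sample file, letters_after_header(0) sees all eight letters, while letters_after_header(1) sees none, because the only sequence line has already been consumed.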

  9. #9
    Registered User
    Join Date
    Feb 2008
    Posts
    77
    I am trying to set myself at the second line of the input file so that I can read data from there until the end of the file.

    Code:
    n = 0 ;
                                                
                             while (c = (getc(input) != EOF))
                                        {
                                            if(c >= 'A' && c <= 'Z')
                                                {
                                                    loadedSequences[nSequences].data[n++] = c ;
                                                 
                                                }
                                        }

  10. #10
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Well, you were already at the second line of the file, since you read the first one with the bit about name. Also, I think you've got some parentheses sideways:
    Code:
    while ((c = getc(input)) != EOF)
    otherwise != binds tighter than =, so c is assigned the result of the comparison (0 or 1) rather than the character that was read.
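
    The difference can be seen in a short sketch. The helper names are invented for this example, and tmpfile() just stands in for the real input file.

```c
#include <stdio.h>

/* Hypothetical helper: opens a temporary file holding the given text. */
static FILE *open_text(const char *text)
{
    FILE *fp = tmpfile();
    if (fp) {
        fputs(text, fp);
        rewind(fp);
    }
    return fp;
}

/* Counts upper-case letters with the comparison grouped as in the posted loop. */
int count_as_posted(const char *text)
{
    int c, n = 0;
    FILE *fp = open_text(text);

    if (!fp)
        return -1;

    /* Same grouping as "c = getc(fp) != EOF": c is 1 for every character read. */
    while ((c = (getc(fp) != EOF)))
        if (c >= 'A' && c <= 'Z')   /* 1 is never between 'A' and 'Z' */
            ++n;

    fclose(fp);
    return n;
}

/* Counts upper-case letters with the assignment parenthesized first. */
int count_fixed(const char *text)
{
    int c, n = 0;
    FILE *fp = open_text(text);

    if (!fp)
        return -1;

    while ((c = getc(fp)) != EOF)   /* c holds the character itself */
        if (c >= 'A' && c <= 'Z')
            ++n;

    fclose(fp);
    return n;
}
```

    Both loops run to the end of the file, which is why the posted version does not hang or crash; it simply never stores anything.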

  11. #11
    Registered User
    Join Date
    Feb 2008
    Posts
    77
    Still unable to collect sequence data with code:

    Code:
     if( (strcmp ( command, "read" )) == 0 )
                        {
                            
                            filename = strtok(NULL, " \t\n" ) ; // collect filename                       
                            if( filename ) 
                                { 
                                    input = fopen(filename, "r") ;
                                    if ( !input )
                                        {
                                            perror(filename);
                                            continue ;
                                        }                                
                                }                            
                           
                            fgets( header_data, 1000, input) ; 
                            
                            seqName = &header_data[1] ;   // After the > in FASTA format  
                            
                            loadedSequences[nSequences].name = 
                                (char *) malloc( ( strlen(seqName) + 1 ) * sizeof( char ) ) ;
                                
                            strcpy( loadedSequences[nSequences].name , seqName ) ;
                            
                            // Collect the sequence from FASTA file                                             
                            
                            n = 0 ;
                                                
                             while (c = getc(input) != EOF)
                                        {
                                            if(c >= 'A' && c <= 'Z')
                                                {
                                                    loadedSequences[nSequences].data[n++] = c ;
                                                 
                                                }
                                        }  
                                      
                            
                            loadedSequences[nSequences].length = strlen((loadedSequences[nSequences].data)) ;
                            
                            fclose(input) ;
                            
                            ++nSequences ;
                   
                        }
    Any suggestions would be great

  12. #12
    Hurry Slowly vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    (c = getc(input)) != EOF
    All problems in computer science can be solved by another level of indirection,
    except for the problem of too many layers of indirection.
    – David J. Wheeler

  13. #13
    Registered User
    Join Date
    Feb 2008
    Posts
    77
    Thank you for the help. Everything seems to be going fine. Sorry for the newbie mistake.
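
    For anyone finding this thread later, here is a minimal sketch of the read step with the fixes from this thread applied: n starts at 0 for each file, the assignment is parenthesized, and the terminator is written at the end. The function read_fasta and its parameters are hypothetical, not the final code of the program above. It assumes one record per file and a header line shorter than 1000 characters.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads a single-record FASTA file.
   Returns the sequence length, or -1 on error. */
long read_fasta(const char *filename, char *name, size_t name_size,
                char *data, size_t data_size)
{
    char header[1000];
    size_t n = 0;                  /* starts at 0 for every file */
    int c;
    FILE *input;

    if (name_size == 0 || data_size == 0)
        return -1;

    input = fopen(filename, "r");
    if (!input) {
        perror(filename);
        return -1;
    }

    /* The first line must be the header: '>' followed by the name. */
    if (!fgets(header, sizeof header, input) || header[0] != '>') {
        fclose(input);
        return -1;
    }
    header[strcspn(header, "\r\n")] = '\0';   /* drop the line ending */
    strncpy(name, header + 1, name_size - 1);
    name[name_size - 1] = '\0';

    /* fgets left the stream at line 2; keep only upper-case letters
       and never write past the end of the buffer. */
    while ((c = getc(input)) != EOF) {
        if (c >= 'A' && c <= 'Z' && n + 1 < data_size)
            data[n++] = (char)c;
    }
    data[n] = '\0';                /* terminate at the end, not the start */

    fclose(input);
    return (long)n;
}
```

    In the program above it could be called with loadedSequences[nSequences].data and sizeof loadedSequences[nSequences].data as the last two arguments, and the return value stored in .length instead of calling strlen afterwards.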
