strings and matrix

  1. #1
    ipe
    Registered User
    Join Date
    Jan 2003
    Posts
    52

    strings, matrix and strcmp()

    Hi guys!
    I'm making this program to read a huge text file (1.6 MB) and load the lines into memory.
    But I'd like to keep only the text after the TAB in each line. Like this:

    bla-bla...bla [TAB] text_I_want

    Code:
    #include <stdio.h>
    #include <stdlib.h>   /* system() */
    #include <string.h>   /* strcpy(), strchr() */
    
    FILE *arq;
    char hst[59000][81];
    
    void open()
    {
    	int cor;   /* int, not char: getc() returns an int so EOF can be told apart from real characters */
    	int a, c;
    	a = c= 0;
    	
    	cor = getc(arq);
    	while (cor!=EOF)
    	{
    		if (cor==10) a++;
    		cor = getc(arq);
    	}
    	a++;
    	printf("Total lines: %d",a);
    	
    	fseek(arq, 0, SEEK_SET);
    	
    	for (c; c < a; c++)
    	{
    		fgets(hst[c], sizeof(hst[c]), arq );
    		strcpy(hst[c], strchr(hst[c],9+1));
    		
    	}
    		
    }
    
    
    void main()
    {
    	system("cls");
    	arq = fopen("test.txt","r");
    	open();
    	fclose(arq);
    }
    So the line strcpy(hst[c], strchr(hst[c],9+1)); fails at runtime. I checked, and the cause is the +1.
    When I change it to strcpy(hst[c], strchr(hst[c],9)); it runs OK, but the [TAB] stays in the string. How do I get just the text after the TAB?

    I have another question:
    The file size and the number of lines can vary. The text file currently has exactly 58,735 lines, so I'd like to create the hst matrix with the exact size of each line, to save memory:

    char hst[max_line_numbers][current_line_length];
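
    One way to size each row to its line, sketched under assumptions: read every line into a fixed temporary buffer first, then copy it into a block from malloc() that is exactly as long as the text. The helper name copy_exact is made up for illustration, and hst would have to become a char ** (an array of pointers) instead of a fixed matrix.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies s into a freshly allocated buffer that is
   exactly strlen(s) + 1 bytes long. Returns NULL if malloc() fails.
   The caller owns the result and must free() it. */
char *copy_exact(const char *s)
{
    size_t n;
    char *p;

    assert(s != NULL);
    n = strlen(s) + 1;          /* +1 for the terminating '\0' */
    p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}
```

    With this, each stored line costs strlen(line) + 1 bytes plus one pointer (and whatever bookkeeping malloc() adds), rather than a fixed 81 bytes.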
    Last edited by ipe; 01-02-2003 at 03:26 PM.

  2. #2
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Code:
    strchr(hst[c],9+1)
    The second argument to strchr is the character to look for. 9 is tab and 10 is newline, so strchr(hst[c],9+1) searches the string for 10 (newline), not for 9 (tab), and returns that location.

    In short, you're using it incorrectly, or rather, unsafely. You should be checking the return value of strchr to make sure it isn't NULL, which is what it returns when the character isn't found.
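
    A minimal sketch of that check, assuming the goal is to keep only the text after the first TAB in the same buffer. The function name keep_text_after_tab is hypothetical. It uses memmove() rather than strcpy() because the source and destination overlap, and strcpy() has undefined behaviour on overlapping strings.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: shifts the text after the first TAB to the start
   of line, in place. Returns 1 if a TAB was found, 0 otherwise (in which
   case line is left untouched). */
int keep_text_after_tab(char *line)
{
    char *tab;

    assert(line != NULL);
    tab = strchr(line, '\t');          /* '\t' is 9 in ASCII */
    if (tab == NULL)
        return 0;                      /* no TAB: nothing to strip */

    /* Copy the tail, including its terminating '\0', over the head. */
    memmove(line, tab + 1, strlen(tab + 1) + 1);
    return 1;
}
```

    A line without a TAB then simply returns 0 instead of crashing.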

    Quzah.
    Last edited by quzah; 01-02-2003 at 09:24 PM.
    Hope is the first step on the road to disappointment.

  3. #3
    Green Member Cshot's Avatar
    Join Date
    Jun 2002
    Posts
    892
    Is this what you want?

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    #define MAX_LINE_LENGTH 81
    
    void open(FILE *arq, char **hst)
    {
       int i;
       int numLines;
       char currLine[MAX_LINE_LENGTH];
       
       // count number of lines
       numLines = 0;
       while(fgets(currLine, MAX_LINE_LENGTH, arq) != NULL)
          ++numLines;
       // reset file pointer
       fseek(arq, 0, SEEK_SET);
       
       // allocate memory and store lines from the file
       hst = malloc(numLines * sizeof(char *));
       if(hst == NULL)
       {
          printf("malloc error\n");
          return;
       }
       for(i = 0; i < numLines; i++)
       {
          hst[i] = malloc(MAX_LINE_LENGTH * sizeof(char));
          if(hst[i] == NULL)
          {
             printf("malloc error\n");
             return;
          }
          fgets(currLine, MAX_LINE_LENGTH, arq);
          strcpy(hst[i], strchr(currLine, '\t')+1);
       }
       
       // free memory
       for(i = 0; i < numLines; i++)
          free(hst[i]);
       free(hst);
       
       return;
    }
    
    int main()
    {
       FILE *fp;
       char **buffer = NULL;
       
       system("cls");
       fp = fopen("test.txt","r");
       if(fp == NULL)
          printf("Error opening file\n");
       else
       {
          open(fp, buffer);
          fclose(fp);
       }
       return 0;
    }
    Try not.
    Do or do not.
    There is no try.

    - Master Yoda

  4. #4
    ipe
    Registered User
    Join Date
    Jan 2003
    Posts
    52
    Thank you guys!

    quzah:
    Actually, the correct line of code is
    Code:
    strcpy(hst[c], strchr(hst[c],9)+1);
    When I transferred the code here I adapted it a bit and changed that line by accident. Sorry.

    Cshot:
    Sorry to you too. The line you adapted
    Code:
    strcpy(hst[i], strchr(currLine, '\t')+1);
    is the same as mine:
    Code:
    strcpy(hst[c], strchr(hst[c], 9)+1);
    The error is in this line.
    Anyway, your code is better than mine. You also implemented malloc().

    Well, I implemented a function to sweep the lines stored in memory and show me the repeated entries:

    Code:
    void repeated()
    {
    	int a, c, repetidos = 0;
    	for (a = 0; a < lines; a++) 
    	{
    		for (c = a + 1; c < lines; c++)
    		{
    			if (strcmp(m[a],m[c]))
    			{
    				repetidos++;
    				printf("\nrepeted: %d : %d",a + 1,c + 1);
    			}
    		}
    	}
    	printf("\nrepeated: %d", repetidos);
    }
    Believe me: this function took 6 minutes to run on a 1 GHz Pentium III. As I said, the file has about 58,735 lines (longest line: 80 characters).
    Note that I ran this function on the complete line:

    bla-bla...bla [TAB] text_I_want

    I have a VB3 program that does the same work on the same file (1.6 MB) in less than 30 seconds. How do I improve the time of my code?

    And about that:
    Code:
    strcpy(hst[c], strchr(hst[c], 9)+1);
    Thank you guys

  5. #5
    ipe
    Registered User
    Join Date
    Jan 2003
    Posts
    52
    This worked fine:

    Code:
    if ( strchr(hst[b],9) )
    {		
             strcpy(hst[b], strchr(hst[b],9)+1);
             printf("%s",hst[b]);
    }
    Now I need a solution for the function repeated(). Could someone here help me improve it?
    Thanks a lot!

  6. #6
    End Of Line Hammer's Avatar
    Join Date
    Apr 2002
    Posts
    6,231
    In the repeated() function, this is wrong (I presume):
    >>if (strcmp(m[a],m[c]))
    Did you mean to find matching lines? If so, you need it like this:
    >>if (strcmp(m[a],m[c]) == 0)

    And the algorithm is off too. Maybe you should try a binary tree.
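
    A rough sketch of the binary tree idea, under assumptions: each line is inserted into an unbalanced binary search tree keyed by strcmp(), and an insert that finds an equal key reports a repeat. The names node, tree_insert and tree_free are made up for illustration. The tree is not balanced, so input that is already sorted degrades it to a linked list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct node {
    const char *line;              /* not copied: points at the caller's string */
    struct node *left, *right;
};

/* Inserts line into the tree rooted at *root.
   Returns 1 if an equal line was already present (a repeat),
   0 if the line was inserted, -1 if malloc() failed. */
int tree_insert(struct node **root, const char *line)
{
    assert(root != NULL && line != NULL);

    while (*root != NULL) {
        int cmp = strcmp(line, (*root)->line);
        if (cmp == 0)
            return 1;
        root = (cmp < 0) ? &(*root)->left : &(*root)->right;
    }

    *root = malloc(sizeof **root);
    if (*root == NULL)
        return -1;
    (*root)->line = line;
    (*root)->left = (*root)->right = NULL;
    return 0;
}

/* Frees every node (not the strings, which the tree never owned). */
void tree_free(struct node *n)
{
    if (n == NULL)
        return;
    tree_free(n->left);
    tree_free(n->right);
    free(n);
}
```

    Calling tree_insert() once per line replaces the inner loop: each line is compared only against the nodes on one path down the tree instead of against every later line.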
    When all else fails, read the instructions.
    If you're posting code, use code tags: [code] /* insert code here */ [/code]

  7. #7
    ipe
    Registered User
    Join Date
    Jan 2003
    Posts
    52
    Code:
    if (strcmp(m[a],m[c]) == 0)
    Yeah, you're right! I just corrected this.
    But it is still taking too much time.
    I tried something like this:

    Code:
    int j, k;
    ...
    j = m[a] /* convert the string to a integer*/
    k = m[b] /* convert the string to a integer*/
    if (j == k)
    {
           ....
    }
    This way is faster, but it didn't work because it gives different values for two equal lines.

    I also thought about making a numeric hash of each line and then comparing the hashes (numbers). But I don't know how to hash a string.
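
    For reference, a minimal sketch of hashing a string, using the well-known djb2 scheme (start at 5381, multiply by 33, add each byte). This is an assumption about what would fit here, not something from the thread, and the names hash_line and same_line are hypothetical. Two different lines can share a hash, so an equal hash still has to be confirmed with strcmp().

```c
#include <assert.h>
#include <string.h>

/* djb2 string hash: h = h * 33 + c for every byte of s. */
unsigned long hash_line(const char *s)
{
    unsigned long h = 5381;

    assert(s != NULL);
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Returns 1 only when the lines are really equal. The cheap integer
   comparison rejects most non-matching pairs before strcmp() runs. */
int same_line(const char *a, unsigned long ha,
              const char *b, unsigned long hb)
{
    return ha == hb && strcmp(a, b) == 0;
}
```

    The hashes would be computed once per line and kept in an array next to the lines, much like storing the lengths up front.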

    >>Maybe you should try a binary tree.
    How so? Could you help me, please?

  8. #8
    ipe
    Registered User
    Join Date
    Jan 2003
    Posts
    52
    Well, I improved the code and now it takes just 45 seconds!!!
    Now it only compares strings of the same length:
    Code:
    void repeat()
    {
    	int a, b, c; 
    	unsigned char tam[59000];
    	a = b = c = 0;
    	
    	for (a; a<qt_hst;a++)
    		tam[a]=strlen(hst[a]);
    
    	for (a=0; a < qt_hst; a++)
    	{
    		for (b = a+1 ; b < qt_hst; b++)
    		{
    			if (tam[a] == tam[b])
    			{
    				if ( strnicmp(hst[a], hst[b], strlen(hst[a])) == 0)
    				{	
    					printf("%d : %d\n",a+1,b+1);
    					printf("%s%s\n",hst[a],hst[b]);
    					c++;
    				}
    			}	
    		}
    	}		
    	printf("\n\n%d\n",c);
    }
    But the VB3 program is still better at 35 seconds. I'd like to get mine down to around 20 seconds.
    Please guys, help me!
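
    One alternative to comparing every pair, sketched under assumptions: sort an array of pointers to the lines with qsort(), after which equal lines sit next to each other and a single pass finds them. The names cmp_lines and count_repeats are hypothetical, and the sketch assumes the lines are reachable through a const char ** array rather than the fixed hst matrix. Sorting reorders the pointers, so the original line numbers would have to be kept separately if they are needed for the report.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: each element is a pointer to a string. */
static int cmp_lines(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;
    return strcmp(a, b);
}

/* Sorts the n pointers in lines (the strings themselves are not moved)
   and returns how many entries are equal to the entry just before them. */
size_t count_repeats(const char **lines, size_t n)
{
    size_t i, repeats = 0;

    assert(lines != NULL || n == 0);
    if (n < 2)
        return 0;

    qsort(lines, n, sizeof *lines, cmp_lines);
    for (i = 1; i < n; i++)
        if (strcmp(lines[i - 1], lines[i]) == 0)
            repeats++;
    return repeats;
}
```

    Note that this counts differently from the nested loops: a line that appears three times adds 2 here, while the pairwise version counts 3 matching pairs.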

  9. #9
    ipe
    Registered User
    Join Date
    Jan 2003
    Posts
    52
    Code:
    if ( memcmp(hst[a], hst[b], strlen(hst[a])) == 0)
    it took 1 min

    Code:
    if ( strcmp(hst[a], hst[b]) == 0)
    it took 25 seconds!!!


    When did strcmp() become stricmp()?
    http://www.mkssoftware.com/docs/man3/strcoll.3.asp strcoll()
    http://www.mkssoftware.com/docs/man3/strcmp.3.asp strcmp()
    http://www.qnx.com/developer/docs/qn...s/stricmp.html stricmp()

    The stricmp() function compares, with case insensitivity
    How do I convert a string to lower case? tolower() would take too much time, wouldn't it?

  10. #10
    ipe
    Registered User
    Join Date
    Jan 2003
    Posts
    52
    Code:
    if ( memcmp(hst[a], hst[bb], tam[a]) == 0)
    is 1 second faster than
    Code:
    if (strcmp(hst[a], hst[bb]) == 0)
    I'll certainly keep memcmp.

    Thank you guys!
