strings and matrix

**ipe** · 01-02-2003

Hi guys!
I'm writing a program that reads a huge text file (1.6 MB) and loads its lines into memory.
But I'd like to keep only the text after the TAB in each line. Like this:

bla-bla...bla [TAB] text_I_want

Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

FILE *arq;
char hst[59000][81];

void open()
{
	char cor;
	int a, c;
	a = c= 0;
	
	cor = getc(arq);
	while (cor!=EOF)
	{
		if (cor==10) a++;
		cor = getc(arq);
	}
	a++;
	printf("Total lines: %d",a);
	
	fseek(arq, 0, SEEK_SET);
	
	for (c; c < a; c++)
	{
		fgets(hst[c], sizeof(hst[c]), arq );
		strcpy(hst[c], strchr(hst[c],9+1));
		
	}
		
}


void main()
{
	system("cls");
	arq = fopen("test.txt","r");
	open();
	fclose(arq);
}

So the line strcpy(hst[c], strchr(hst[c],9+1)); fails at runtime. I checked, and the cause is the +1.
When I change it to strcpy(hst[c], strchr(hst[c],9)); it runs OK but leaves the [TAB] in the string. How do I get just the text after the TAB?

I have another question:
The file size and the number of lines can vary. Right now the text file has exactly 58,735 lines, so I'd like to create the hst matrix with exactly the size each line needs, to save memory:

char hst[max_line_numbers][current_line_length];

**quzah** · 01-02-2003

Code:

strchr(hst[c],9+1)

The second argument to strchr is supposed to be the character to look for. You're passing it the decimal value of 10 (or 9). Thus, it searches the string for the decimal value of 10 (or 9) and returns that location. 9 is tab. 10 is newline.

In short, you're using it incorrectly, or rather, unsafely. You should be checking the return value of strchr to make sure it isn't null.
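A minimal sketch of that check, assuming the goal is the text after the first TAB in each line. The helper names after_tab and keep_after_tab are made up for illustration and are not from the original program. The in-place version uses memmove rather than strcpy because the source and destination overlap, which strcpy does not allow.

```c
#include <string.h>

/* Returns a pointer to the text that follows the first TAB in line,
   or NULL when the line contains no TAB. The result points into line. */
const char *after_tab(const char *line)
{
    const char *tab = strchr(line, '\t');   /* '\t' is character code 9 */
    return tab ? tab + 1 : NULL;
}

/* Moves the text after the first TAB to the start of line, in place.
   Returns 1 if a TAB was found, 0 otherwise (line is left unchanged). */
int keep_after_tab(char *line)
{
    char *tab = strchr(line, '\t');
    if (tab == NULL)
        return 0;
    /* memmove, not strcpy: the two regions overlap */
    memmove(line, tab + 1, strlen(tab + 1) + 1);
    return 1;
}
```

In the reading loop this would be called as keep_after_tab(hst[c]) right after fgets, with lines that have no TAB left untouched.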

Quzah.

**Cshot** · 01-02-2003

Is this what you want?

Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE_LENGTH 81

void open(FILE *arq, char **hst)
{
   int i;
   int numLines;
   char currLine[MAX_LINE_LENGTH];
   
   // count number of lines
   numLines = 0;
   while(fgets(currLine, MAX_LINE_LENGTH, arq) != NULL)
      ++numLines;
   // reset file pointer
   fseek(arq, 0, SEEK_SET);
   
   // allocate memory and store the lines in memory
   hst = malloc(numLines * sizeof(char *));
   if(hst == NULL)
   {
      printf("malloc error\n");
      return;
   }
   for(i = 0; i < numLines; i++)
   {
      hst[i] = malloc(MAX_LINE_LENGTH * sizeof(char));
      if(hst[i] == NULL)
      {
         printf("malloc error\n");
         return;
      }
      fgets(currLine, MAX_LINE_LENGTH, arq);
      strcpy(hst[i], strchr(currLine, '\t')+1);
   }
   
   // free memory
   for(i = 0; i < numLines; i++)
      free(hst[i]);
   free(hst);
   
   return;
}

int main()
{
   FILE *fp;
   char **buffer = NULL;
   
   system("cls");
   fp = fopen("test.txt","r");
   if(fp == NULL)
      printf("Error opening file\n");
   else
   {
      open(fp, buffer);
      fclose(fp);
   }
   return 0;
}
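For the second question (a block sized to each line instead of a fixed 81 bytes), here is a rough sketch under a few assumptions: no line is longer than MAX_LINE_LENGTH - 1 characters (a longer line would be split by fgets), lines without a TAB are kept whole, and the line ending is dropped. The names read_fields and free_fields are hypothetical. Unlike the function above, the array is returned to the caller, who frees it when done.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE_LENGTH 256   /* assumed upper bound on one line */

/* Reads every line of fp, keeps the text after the first TAB (or the whole
   line when there is no TAB), and stores each one in a block sized to fit.
   Returns the array and writes the number of lines to *count.
   Returns NULL with *count == 0 on allocation failure or an empty file. */
char **read_fields(FILE *fp, size_t *count)
{
    char line[MAX_LINE_LENGTH];
    char **rows = NULL;
    size_t used = 0, cap = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        char *tab, *text, *copy;

        line[strcspn(line, "\r\n")] = '\0';      /* drop the line ending */
        tab = strchr(line, '\t');
        text = tab ? tab + 1 : line;

        if (used == cap) {                       /* grow the pointer array */
            size_t new_cap = cap ? cap * 2 : 1024;
            char **tmp = realloc(rows, new_cap * sizeof *tmp);
            if (tmp == NULL)
                goto fail;
            rows = tmp;
            cap = new_cap;
        }
        copy = malloc(strlen(text) + 1);         /* exact size for this line */
        if (copy == NULL)
            goto fail;
        strcpy(copy, text);
        rows[used++] = copy;
    }
    *count = used;
    return rows;

fail:
    while (used > 0)
        free(rows[--used]);
    free(rows);
    *count = 0;
    return NULL;
}

/* Releases everything read_fields allocated. */
void free_fields(char **rows, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++)
        free(rows[i]);
    free(rows);
}
```

Because the pointer array grows as lines arrive, the file is read only once and there is no separate counting pass.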

**ipe** · 01-02-2003

Thank you guys!

quzah:
Actually, the correct line is

Code:

strcpy(hst[c], strchr(hst[c],9)+1);

When I copied the code here I made some changes and altered that line by accident. Sorry.

Cshot:
Sorry to you too. The line you adapted

Code:

strcpy(hst[i], strchr(currLine, '\t')+1);

is the same as mine:

Code:

strcpy(hst[c], strchr(hst[c], 9)+1);

The error is in this line.
Anyway, your code is better than mine. You also used malloc().

Well, I wrote a function to sweep the lines stored in memory and show me the repeated entries:

Code:

void repeated()
{
	int a, c, repetidos = 0;
	for (a = 0; a < lines; a++) 
	{
		for (c = a + 1; c < lines; c++)
		{
			if (strcmp(m[a],m[c]))
			{
				repetidos++;
				printf("\nrepeted: %d : %d",a + 1,c + 1);
			}
		}
	}
	printf("\nrepeated: %d", repetidos);
}

Believe me: this function took 6 minutes to run on a 1 GHz Pentium III. As I said, the file has about 58,735 lines (longest line: 80 characters).
Note that I ran this function on the complete lines:

bla-bla...bla [TAB] text_I_want

I have a VB3 program that does the same work on the same file (1.6 MB) in less than 30 seconds. How do I improve the time of my code?

And about that:

Code:

strcpy(hst[c], strchr(hst[c], 9)+1);

Thank you guys

**ipe** · 01-02-2003

worked fine:

Code:

if ( strchr(hst[b],9) )
{		
         strcpy(hst[b], strchr(hst[b],9)+1);
         printf("%s",hst[b]);
}

Now I need a solution for the repeated() function. Could someone here help me improve it?
Thanks a lot!

**Hammer** · 01-02-2003

In the repeated() function, this is wrong (I presume):
>>if (strcmp(m[a],m[c]))
Did you mean to find matching lines? If so, you need it like this:
>>if (strcmp(m[a],m[c]) == 0)

And the algorithm is off too. Maybe you should try a binary tree.

**ipe** · 01-03-2003

Code:

if (strcmp(m[a],m[c]) == 0)

Yeah, you're right! I just corrected this.
But it is still taking too much time.
I tried something like this:

Code:

int j, k;
...
j = m[a] /* convert the string to a integer*/
k = m[b] /* convert the string to a integer*/
if (j == k)
{
       ....
}

This way is faster, but it didn't work because it gives different values for two equal lines.

I also thought about making a numeric hash of each line and then comparing the hashes (numbers). But I don't know how to hash a string.
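For reference, assigning m[a] to an int stores the address of the string, not its contents, which is why two equal lines gave different numbers. A small sketch of the hashing idea follows, using the widely known djb2 function (attributed to Dan Bernstein) as one possible choice; the names hash_string and same_line are made up for illustration. Two different lines can share a hash, so a match still has to be confirmed with strcmp.

```c
#include <string.h>

/* djb2 string hash: h = h * 33 + c for every character of s. */
unsigned long hash_string(const char *s)
{
    unsigned long h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Cheap test first; the exact comparison runs only when the hashes agree. */
int same_line(const char *a, unsigned long ha,
              const char *b, unsigned long hb)
{
    return ha == hb && strcmp(a, b) == 0;
}
```

The hashes would be computed once per line and kept in an array next to the lines. Comparing them pair by pair still takes a number of steps proportional to the square of the line count, so this only makes each comparison cheaper.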

>>Maybe you should try a binary tree.

How would that work? Could you help me, please?
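One way the binary tree idea could look, as a hedged sketch rather than anything posted in this thread: each line is inserted into an unbalanced binary search tree ordered by strcmp, and a line that is already in the tree is reported as a repeat. The struct and function names are hypothetical. The tree stores pointers to the existing strings, so the lines must stay in memory while the tree is in use.

```c
#include <stdlib.h>
#include <string.h>

struct node {
    const char *text;            /* points at the caller's string, not a copy */
    int line;                    /* 1-based number of the first occurrence */
    struct node *left, *right;
};

/* Inserts text into the tree rooted at *root.
   Returns 0 if text was new, the line number of the earlier identical
   entry if it is a repeat, or -1 if malloc failed. */
int tree_insert(struct node **root, const char *text, int line)
{
    while (*root != NULL) {
        int cmp = strcmp(text, (*root)->text);
        if (cmp == 0)
            return (*root)->line;
        root = cmp < 0 ? &(*root)->left : &(*root)->right;
    }
    *root = malloc(sizeof **root);
    if (*root == NULL)
        return -1;
    (*root)->text = text;
    (*root)->line = line;
    (*root)->left = (*root)->right = NULL;
    return 0;
}

/* Frees every node; the strings themselves belong to the caller. */
void tree_free(struct node *n)
{
    if (n == NULL)
        return;
    tree_free(n->left);
    tree_free(n->right);
    free(n);
}
```

With lines in roughly random order each insert costs about log2(n) comparisons instead of n. If the file is already sorted, this simple tree degrades into a list, and a balanced tree or a sort would be the safer choice.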

**ipe** · 01-03-2003

Well, I improved the code and now it takes just 45 seconds!!!
Now it only compares strings of the same length:

Code:

void repeat()
{
	int a, b, c; 
	unsigned char tam[59000];
	a = b = c = 0;
	
	for (a; a<qt_hst;a++)
		tam[a]=strlen(hst[a]);

	for (a=0; a < qt_hst; a++)
	{
		for (b = a+1 ; b < qt_hst; b++)
		{
			if (tam[a] == tam[b])
			{
				if ( strnicmp(hst[a], hst[b], strlen(hst[a])) == 0)
				{	
					printf("%d : %d\n",a+1,b+1);
					printf("%s%s\n",hst[a],hst[b]);
					c++;
				}
			}	
		}
	}		
	printf("\n\n%d\n",c);
}

But the VB3 program is still better at 35 seconds. I'm hoping to get this down to around 20 seconds.
Please guys, help me!
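A different technique from the nested loops, offered as a sketch only: sort the lines with the standard qsort and then look at neighbours, since equal lines end up next to each other after sorting. The struct entry and the function names are assumptions for illustration. Note that the count differs from the pairwise loops: a line that appears three times counts as 2 repeats here, not 3 pairs.

```c
#include <stdlib.h>
#include <string.h>

struct entry {
    const char *text;    /* the stored line */
    int line;            /* 1-based original line number */
};

/* Orders by text, then by line number so equal lines keep file order. */
static int compare_entries(const void *pa, const void *pb)
{
    const struct entry *a = pa;
    const struct entry *b = pb;
    int cmp = strcmp(a->text, b->text);
    if (cmp != 0)
        return cmp;
    return (a->line > b->line) - (a->line < b->line);
}

/* Sorts entries and returns how many of them repeat an earlier line. */
size_t count_repeats(struct entry *entries, size_t n)
{
    size_t i, repeats = 0;

    qsort(entries, n, sizeof *entries, compare_entries);
    for (i = 1; i < n; i++)
        if (strcmp(entries[i - 1].text, entries[i].text) == 0)
            repeats++;
    return repeats;
}
```

The entries array would be filled with one element per stored line, for example text = hst[i] and line = i + 1, before calling count_repeats. Sorting needs on the order of n log n comparisons, compared with about n * n / 2 for the nested loops.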

**ipe** · 01-03-2003

Code:

if ( memcmp(hst[a], hst[b], strlen(hst[a])) == 0)

it took 1 min

Code:

if ( strcmp(hst[a], hst[b]) == 0)

it took 25 seconds!!!

When did strcmp() become stricmp()?

http://www.mkssoftware.com/docs/man3/strcoll.3.asp strcoll()
http://www.mkssoftware.com/docs/man3/strcmp.3.asp strcmp()
http://www.qnx.com/developer/docs/qn...s/stricmp.html stricmp()

The stricmp() function compares, with case insensitivity
How do I convert a string to lower case? tolower() would take too much time, wouldn't it?
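A minimal sketch of lowercasing a line in place with the standard tolower from <ctype.h>; the function name lower_in_place is made up for illustration. The idea is to convert each line once, right after reading it, so the comparison loops can use plain strcmp or memcmp. That is one pass over the file's characters, which should be small next to the pairwise comparisons.

```c
#include <ctype.h>

/* Converts s to lower case in place and returns s. */
char *lower_in_place(char *s)
{
    char *p;
    for (p = s; *p != '\0'; p++)
        *p = (char)tolower((unsigned char)*p);  /* cast avoids negative char values */
    return s;
}
```

This changes the stored text, so a copy would be needed if the original case has to be printed later.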

**ipe** · 01-03-2003

Code:

if ( memcmp(hst[a], hst[bb], tam[a]) == 0)

is 1 second faster than

Code:

if (strcmp(hst[a], hst[bb]) == 0)

I'll certainly keep memcmp.

Thank you guys!
