C Board  

Go Back   C Board > General Programming Boards > C Programming

Reply
 
LinkBack Thread Tools Display Modes
Old 02-18-2009, 05:10 PM   #1
Registered User
 
Join Date: Feb 2009
Posts: 3
pointer problems

Hello,

I'm pretty new to C programming, and am working on an assignment that uses the quicksort algorithm adapted to sort strings. I got it working OK except for the fact that it only seems to be looking at the first character in each string. I'm at a loss trying to figure out where the problem is. Anyone have any clues?

I'm testing it by piping it the contents of a file named "test" that contains the following:

Code:
bbb
ccc
aaa
ddd
I'm trying to get the output to be:
Code:
aaa
bbb
ccc
ddd
Instead, I'm getting:
Code:
a
b
c
d
Here's how I'm executing it:
Code:
$ cat test | ./a.out
Anyway, here's the c code. Sorry for the huge post.
Code:
//-----------------------------COMPILER DIRECTIVES------------------------------
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>

//-----------------------------FUNCTION PROTOTYPES------------------------------
void quicksort( char *[], int );
void swap( char *[], int, int );

//----------------------------------CONSTANTS-----------------------------------
const long LINE_CAP = 1000000;     // Max number of strings (lines)
const short STR_CAP = 82;          // Max characters per string +2 for '/0'

//--------------------------------MAIN FUNCTION---------------------------------
int main() {

// Declare Variables
   long lngLinesRead = 0;          // Keeps track of how many data lines exist
   char *pcLine[LINE_CAP];         // Array of character arrays
   char cLine[STR_CAP];            // Buffer for the array of characters
   int i;                          // Iterator for loops

// Read the char array into the buffer while EOF is not reached
   while( fgets( cLine, STR_CAP, stdin ) != NULL ) {

// Allocate space at the nth pointer in pcLine array
      pcLine[lngLinesRead] = (char *) malloc( sizeof( cLine ) );

// Set the value in the pointer to the value of the character array
      *(pcLine[lngLinesRead]) = *cLine;

// Increment to next line
      lngLinesRead++;
   }

// Sort the lines
   quicksort( pcLine, lngLinesRead );

// Print to standard output
   for( i = 0; i < lngLinesRead; i++ ) {
      printf( "%s\n", pcLine[i] );
      free( pcLine[i] );
   }

// Return error free
   return 0;
}

//-----------------------------FUNCTION DEFINITIONS-----------------------------

// swap:  interchange v[i] and v[j].
void swap( char *v[], int i, int j )
{
	char *temp;

	temp = v[i];
	v[i] = v[j];
	v[j] = temp;
}

// quicksort: sort v[0]..v[n-1] into increasing order.
void quicksort( char *v [], int n )
{
	int i, last;
	if( n <= 1 )                             // nothing to do
		return;

	swap( v, 0, rand() % n );                // move pivot element to v[0]
	last = 0;

	for ( i = 1; i < n; i++ )                // partition
		if ( strcmp( v[i], v[0] ) < 0 )
			swap( v, ++last, i );

	assert(( last >= 0 ) && ( last < n ));
	swap( v, 0, last );                      //restore pivot

	quicksort( v, last );                    // recursively sort each part
	quicksort( v+last+1, n-last-1 );
}
Thanks for the help, I really appreciate it.
BradC78 is offline   Reply With Quote
Old 02-18-2009, 05:22 PM   #2
Kernel hacker
 
Join Date: Jul 2007
Location: Farncombe, Surrey, England
Posts: 15,686
Code:
      *(pcLine[lngLinesRead]) = *cLine;
shoudl be a strcpy.

--
Mats
__________________
Compilers can produce warnings - make the compiler programmers happy: Use them!
Please don't PM me for help - and no, I don't do help over instant messengers.
matsp is offline   Reply With Quote
Old 02-18-2009, 05:27 PM   #3
Complete Beginner
 
Join Date: Feb 2009
Posts: 312
I didn't test my proposal, so this is just guessing.

Code:
// Allocate space at the nth pointer in pcLine array
      pcLine[lngLinesRead] = (char *) malloc( sizeof( cLine ) );

// Set the value in the pointer to the value of the character array
      *(pcLine[lngLinesRead]) = *cLine;
First of all, use malloc(strlen(cLine)). You don't need to allocate more space than you need.

Second, the next line of code sets pcLine[IngLinesRead][0] to the first character of cLine, which is not what you want. In order to assign string values, you need to use strcpy():

Code:
strcpy(pcLine[IngLinesRead], cLine);
Try that and tell us what happened.

Greets,
Philip
__________________
All things begin as source code.
Source code begins with an empty file.
-- Tao Te Chip
Snafuist is offline   Reply With Quote
Old 02-18-2009, 05:37 PM   #4
Kernel hacker
 
Join Date: Jul 2007
Location: Farncombe, Surrey, England
Posts: 15,686
@snafuist & BradC78:
Code:
malloc(strlen(cLine)+1)
Would be a lot better!

@BradC78
Code:
   char *pcLine[LINE_CAP];         // Array of character arrays
This uses up 4MB of stack (assuming 32-bit pointers) - that is quite a lot to put on the stack. Either use a global (not nice, but you probably only need the one!) or use dynamic allocation to create the array in the first place [or don't use 1 million as LINE_CAP!].

You also do no check IF the file is longer than a million lines - you probably should!

--
Mats
__________________
Compilers can produce warnings - make the compiler programmers happy: Use them!
Please don't PM me for help - and no, I don't do help over instant messengers.
matsp is offline   Reply With Quote
Old 02-18-2009, 05:40 PM   #5
Complete Beginner
 
Join Date: Feb 2009
Posts: 312
Quote:
Originally Posted by matsp View Post
@snafuist & BradC78:
Code:
malloc(strlen(cLine)+1)
Would be a lot better!
/* no comment */

Arghl,
Philip
__________________
All things begin as source code.
Source code begins with an empty file.
-- Tao Te Chip
Snafuist is offline   Reply With Quote
Old 02-18-2009, 05:48 PM   #6
Registered User
 
Join Date: Feb 2009
Posts: 3
strcpy did the trick! Thanks everyone. I'm also going to incorporate the other suggestions. Thanks for the help, it's much appreciated!
BradC78 is offline   Reply With Quote
Old 02-18-2009, 06:56 PM   #7
Complete Beginner
 
Join Date: Feb 2009
Posts: 312
Your code is ok. Here are some more remarks:

1)
You're using comments, that's good. But you are using way too many comments: try to narrow down your comments to why you did it, not how. The code itself already explains the latter to full detail. If your code isn't obvious, make it so. On a general note, if you have more comments than code (and your name is not Donald E. Knuth), there's something wrong.

2)
"lngLinesRead" is a funky name, but you will only have to remember it for the next 10 lines of code. "size" will do just fine and is not the least more difficult to understand. The same applies to cLine: what's wrong with simply calling it "tmp"? I'm not interested in its type and/or purpose if it gets used only on two lines. Generally speaking, the length of a variable name and the size of its scope are directly proportional.

3)
Similar arguments hold for type encoding in the variable name (the so called Hungarian Notation). This is bad for at least two reasons: first, a good function is no longer than 25 lines (all inclusive) and doesn't have more than maybe 4-5 local variables. Every programmer can easily handle this all by himself without having to look up the type every time he encounters another variable, especially given that one variable is called "i" and another "tmp".
Second, the type of a variable may change in the future. Now consider a global variable in a project with several million lines of code. Do you want to adjust every occurrence of that variable to reflect its new type? But it doesn't need to be so big. Having to inspect 25 lines of code because a variable changed its type is already annoying enough.

4)
For array subscripts, use variables of type size_t. Although int is guaranteed to hold values which are large enough to express all meaningful array indices, size_t is better because it makes the purpose of the variable more clear to the reader, and furthermore makes the compiler issue a warning if you try to assign negative values to it.

5)
You don't need to cast the return value of malloc(). void* is guaranteed to cast implicitly to the desired type.

6)
A function should do one thing well. Your main() function does three things, only one of which has its own function. Do functions for reading and output as well.

7)
Check the return value of malloc(). If there's not enough free memory available, you're in trouble.


If you follow these proposals, your main() will look like this:

Code:
/* read lines from stdin */
size_t read_s(char **lines)
{
        size_t size = 0;
        char tmp[STR_CAP];

        while(fgets(tmp, STR_CAP, stdin) != NULL) {
                lines[size] = malloc(strlen(tmp)+1);   // check for NULL
                strcpy(lines[size++], tmp);
        }

        return size;
}


/* print array of strings */
void print_s(char **lines, size_t size)
{
        size_t i;

        for(i = 0; i < size; i++)
                puts(lines[i]);
}


/* free array of strings */
void free_s(char **lines, size_t size)
{
        size_t i;

        for(i = 0; i < size; i++)
                free(lines[i]);
}


int main()
{
        size_t sz = 0;
        char *lines[LINE_CAP];
  
        sz = read_s(lines);

        quicksort(lines, sz);

        print_s(lines, sz);
        free_s(lines, sz);

        return 0;
}
Now it's clean and simple, there's no need for any further comments, and even my mother could understand it. On the other hand, it may not work because I didn't try to compile it.

And yes, I'm having too much free time.

Greets,
Philip
__________________
All things begin as source code.
Source code begins with an empty file.
-- Tao Te Chip

Last edited by Snafuist; 02-18-2009 at 07:01 PM.
Snafuist is offline   Reply With Quote
Old 02-18-2009, 09:52 PM   #8
Registered User
 
Join Date: Feb 2009
Posts: 278
A couple comments on Philips list here...

1. I think for beginning programmers, there can be no such thing as too many comments (provided they are relative of course!) Especially considering that a good programmer fleshes out the algorithm in words before coding. When he goes to do his next program, or one later in the year or next year, he might find valuable insight into his methodology by reading through his many comments.

2. Using meaningful variable names should never be discouraged. Nothing worse, in my opinion, than variables named line, temp, tmp, size, x, y, count, etc. size of what exactly? Yes, it should be possible to figure out what the meaning of the vague name is by its context, but why make that necessary. Like my comment for #1, he might appreciate his thoroughness in the future. That being said, I, personally, DO use variables such as x, y, etc, but only as incrementers in for loops as long as I'm not using the variable in any other way (such as an array index). I also use temp, but I usually do temp_int, iTemp, temp_mystruct etc, just for clarity sake.

3. I use so-called Hungarian Notation in my programs. It makes a lot of things easier to understand (especially if you use such vague variable names as temp, tmp, etc.) without having to scroll up and down. Secondly, if you have a global variable that you need to change the type of, there's a handy feature of most editors called the replace all. Replacing every iTemp with fTemp a very simple feat. On top of that is the fact that global variables should be avoided in the first place.

3b. Again, in my opinion, there is nothing worse than a bunch of two and three line functions in a program. In the code you provided, the three little functions might be handy to put in a seperate file because you might use them in multiple programs. However there are lots of examples of where splitting up a 75 line function would make it harder to understand rather than easier, not to mention a pain in the ass to keep scrolling up and down when you need to refer to one. I think saying that a good function is limited to 25 lines is too strict, especially for main. I see all the time where people put just one function call inside their main. What's the point of that? I do agree that a function should do one thing MOST times, but some things go together well enough (and always seem to follow one another) and can (and should) be put in the same function.

4. I don't do this myself, but it is a good suggestion.

5. It is generally good practice to do the casting, even though it is not required. I think making it explicit is good for two reasons. First, it makes it certain to the reader exactly what you did, and second, and more importantly, it's a double check for you as you code so you are getting the right kind of pointer. If you cast a malloc to a char and you assign it to a variable declared as a pointer to a structure you get an error. if you don't explicitly do this you don't get an error and then you will get at best junk, at worst, seg faults.

6. See 3b.

7. Agreed. However I don't see in the code you posted where you did that.

Overall, I know what you mean with most of these things, but I don't think they are things that should be discouraged.
Bladactania is offline   Reply With Quote
Old 02-18-2009, 09:58 PM   #9
Registered User
 
Join Date: Feb 2009
Posts: 278
I just re-read my post. I wasn't trying to sound hostile
Bladactania is offline   Reply With Quote
Old 02-19-2009, 12:14 AM   #10
C++ Witch
 
laserlight's Avatar
 
Join Date: Oct 2003
Location: Singapore
Posts: 11,319
Quote:
Originally Posted by Bladactania
I think for beginning programmers, there can be no such thing as too many comments (provided they are relative of course!)
I do not see why there should be a double standard, one for beginners and another for more experienced programmers. If the beginners want to stop being beginners, they have to adopt better practices that might not be typical of beginners who do not know better.

Quote:
Originally Posted by Bladactania
Especially considering that a good programmer fleshes out the algorithm in words before coding.
Restating what the code obviously does is redundant and not the same as describing an algorithm.

Quote:
Using meaningful variable names should never be discouraged. Nothing worse, in my opinion, than variables named line, temp, tmp, size, x, y, count, etc. size of what exactly? Yes, it should be possible to figure out what the meaning of the vague name is by its context, but why make that necessary.
I generally prefer descriptive variable names, but it depends on context, and if you write functions that do one thing and do it well, the context becomes more obvious. As such, the names that you cited as the worst possible certainly are not.

On the other hand, I agree with your objection to Snafuist's renaming (with the caveat that I dislike Hungarian notation, so agree with Snafuist on that point). Although Snafuist's own choice of variable names are not bad, I do not see any reason not to use line instead of tmp (though some object to variable names that differ by plurality) and line_count instead of size and sz.

Quote:
Originally Posted by Bladactania
Again, in my opinion, there is nothing worse than a bunch of two and three line functions in a program.
A 900+ line function is worse, in my opinion. Apparently some people here have seen co-workers write even longer functions, but that is the longest that I have seen written by people I know. I certainly would expect some one liners in the implementation of a library interface, e.g., for a getter function.

Quote:
Originally Posted by Bladactania
However there are lots of examples of where splitting up a 75 line function would make it harder to understand rather than easier, not to mention a pain in the ass to keep scrolling up and down when you need to refer to one.
I do not recall ever seeing a case where appropriately splitting up a function of such length into smaller, well named, functions that do one thing and do it well made it harder to understand the code. There are certainly more examples where splitting up a 75 line function would make it easier to understand the code.

The scrolling objection does not make sense to me: you do not need to scroll very often within the same function because everything fits into the screen, and when you do scroll to refer to another function, you can easily see which function's implementation you are looking at.

Quote:
Originally Posted by Bladactania
It is generally good practice to do the casting, even though it is not required.
There are pros and cons. The main disadvantage that is commonly cited is that in C, functions that are not declared are assumed by the compiler as having an int return type. Casting the return type of malloc() would then hide the fact that <stdlib.h> was not included, and thus result in an error that occurs at run time.

On the other hand, if you want your C code to be compilable by a C++ compiler, the cast becomes necessary since it is necessary in C++.

Quote:
Originally Posted by Bladactania
First, it makes it certain to the reader exactly what you did, and second, and more importantly, it's a double check for you as you code so you are getting the right kind of pointer.
If you write the code correctly, that will not be a problem, e.g.
Code:
p = malloc(sizeof(*p) * n);
No matter what the type of the pointer p, malloc() will allocate the correct number of elements n, or return a null pointer.
__________________
C + C++ Compiler: MinGW port of GCC
Build + Version Control System: SCons + Bazaar

Look up a C/C++ Reference and learn How To Ask Questions The Smart Way
laserlight is offline   Reply With Quote
Reply

Thread Tools
Display Modes

Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
Problems with passing an array of structures by pointer raptor1770 C Programming 9 11-29-2008 11:01 AM
Dynamic array of structures containing yet another dynamic array of structures innqubus C Programming 2 07-11-2008 07:39 AM
Following CTools EstateMatt C Programming 5 06-26-2008 10:10 AM
Request for comments Prelude A Brief History of Cprogramming.com 15 01-02-2004 10:33 AM
base class pointer problems ... C++ Programming 3 11-16-2003 11:27 PM


All times are GMT -6. The time now is 06:35 PM.


Powered by vBulletin® Version 3.8.1
Copyright ©2000 - 2010, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.3.2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22