fgets only returns 3 characters

**Jesdisciple** · 08-26-2008

Originally Posted by vart

if the last read operation failed - you maybe interested to know - it is due to the end of file - or some read error - only in this case it is suitable to use feof

How do I know about a failure? And how can feof help me recover?

Originally Posted by vart

yes

On both counts, I assume. Why is linking necessary if includes still work?

Originally Posted by vart

have you checked the return value of fgets? Probably you are using the value left from the previos call

Yes, the garbage is present directly after the call to fgets... (A statement is executed after I proceed to the next stement with next.)

Code:

Breakpoint 2, readLine (length=32, stream=0x804b008) at main.c:9
9	    char *result = malloc(length * sizeof(char));
(gdb) next
10	    fgets(result, length, stream);
(gdb) print result
$14 = 0x804b358 ""
(gdb) next
11	    char *last = &result[strlen(result) - 1];
(gdb) print result
$15 = 0x804b358 "\201\004\b\024\201\004\b\023"
(gdb) next
12	    if(*last == '\n'){
(gdb) next
15	    return result;
(gdb) next
16	}

Originally Posted by vart

all functions you have placed in the header files

Are you only referring to checks for NULL?

**arpsmack** · 08-27-2008

I think you need to get the hang of incremental development. Start very simple. For example, just make a basic linked list, and write a very simple test program that uses it. Before taking any more steps forward, TEST IT AND MAKE SURE IT WORKS.

Then add another feature. Test it. Add another... etc... Plowing through a bunch of code, and then trying to fix all the mistakes you've made is not a very good development strategy.

I also fail to see the point of much of the stuff you are trying to write. For example, it seems you're trying to create a list in which each list item's value can be of a different type. This is a terribly bad idea IMO! This is why most modern languages use generics to provide type-safe lists that won't allow this (with the exception of inherited types, but C doesn't have those).

Your "charr" struct seems mostly pointless. About the only thing that it provides that I could call useful is the ability to create a dynamically allocated version of a string. But a simple strdup() function would do the trick here....

The object struct seems equally as pointless, especially given that it only seems to be used for having a list containing elements of all sorts of different types (as I mentioned, I think this is a bad idea).

I noticed you are putting a lot of effort into removing duplicates from your list. If that is indeed your intention, then you're using the wrong data structure to begin with. A set is a collection of unique items, and sets (usually) aren't implemented as linked lists. It would be much better to use something like a hash table or a tree.

The first element in argv[] is reserved for the name of your application. The real commandline arguments start at index 1. Consequently, argc is actually 1 greater than the number of actual arguments.

Sorry this post is just a mishmash of thoughts. I provided a simple example of a program that uses a linked list to read in every line in a file (specified on the commandline), and then prints them back out to stdout (freeing each stored string after it prints). It also demonstrates the concept of separating your type definitions and function prototypes into a header file, and your implementation into a .c source file.

You of course must specify all your source files on the commandline when compiling, so use something like this:

gcc -Wall main.c list.c

**Jesdisciple** · 08-27-2008

EDIT: I just realized this slipped through the cracks; I thought I had replied to it:

strings_t is not a reserved keyword, and the only standard library for strings is string.h, which operates on char arrays. So, why aren't you using the standard library, yourself?

I didn't know about the standard library, and I still don't know enough to use it. How can I read GCC's libraries?

Originally Posted by arpsmack

I think you need to get the hang of incremental development. Start very simple. For example, just make a basic linked list, and write a very simple test program that uses it. Before taking any more steps forward, TEST IT AND MAKE SURE IT WORKS.

Then add another feature. Test it. Add another... etc... Plowing through a bunch of code, and then trying to fix all the mistakes you've made is not a very good development strategy.

Yeah, that's a bad habit I've picked up from programming as a hobby.

Originally Posted by arpsmack

I also fail to see the point of much of the stuff you are trying to write. For example, it seems you're trying to create a list in which each list item's value can be of a different type. This is a terribly bad idea IMO! This is why most modern languages use generics to provide type-safe lists that won't allow this (with the exception of inherited types, but C doesn't have those).

It's not that I want a list that can hold mixed values but that I want a list implementation that can handle any type. Because C doesn't have generics, they end up being the same thing.

Originally Posted by arpsmack

Your "charr" struct seems mostly pointless. About the only thing that it provides that I could call useful is the ability to create a dynamically allocated version of a string. But a simple strdup() function would do the trick here....

It stemmed from 1) this thread at an earlier stage and 2) my own misconception. Now its use for me boils down to more accurate error reporting because strlen is called directly after the string is created rather than halfway down the containing function.

Originally Posted by arpsmack

I noticed you are putting a lot of effort into removing duplicates from your list. If that is indeed your intention, then you're using the wrong data structure to begin with. A set is a collection of unique items, and sets (usually) aren't implemented as linked lists. It would be much better to use something like a hash table or a tree.

Sets and maps don't maintain insertion order, which was the original purpose of the list. The uniqueness was an afterthought that arose from my specific assignment (which I picked up on a forum and assigned to myself), and I use a Boolean flag for it so the implementation isn't inherently unique.

Originally Posted by arpsmack

The first element in argv[] is reserved for the name of your application. The real commandline arguments start at index 1. Consequently, argc is actually 1 greater than the number of actual arguments.

That explains why I was getting garbage... I was reading my binary application.

Originally Posted by arpsmack

Sorry this post is just a mishmash of thoughts. I provided a simple example of a program that uses a linked list to read in every line in a file (specified on the commandline), and then prints them back out to stdout (freeing each stored string after it prints).

By the way, what's this line for?

Code:

    while (node != NULL) {
        puts((char*)node->value);
        free(node->value);
        node = node->next;
    }

Originally Posted by arpsmack

It also demonstrates the concept of separating your type definitions and function prototypes into a header file, and your implementation into a .c source file.

That's something I've been trying to figure out... What use is a declaration without a definition? I understand that it allows two files which are due for linking to have the same expectation of every function at compile time, but my own system of putting definitions in header files accomplishes that without the linking.

Originally Posted by arpsmack

You of course must specify all your source files on the commandline when compiling, so use something like this:

gcc -Wall main.c list.c

I've been reading a discussion of that, so there's my first contribution to this forum.

**tabstop** · 08-27-2008

I'm not even going to try to find all the questions to quote them.

1. You don't want \n in your strings, because it's not part of your data. If you're storing someone's name (for instance), their name doesn't have a new-line character in it. So if you now need to print out a form with name and age on the same line ....

2. Your system of putting definitions in header files accomplishes your task, provided you never have more than one file. Once you have two #includes of the same file (say main.c includes aux.h and other_code.c includes aux.h), you've now defined the functions twice, which Doesn't Work. There's code that I'm possibly inheriting soon at work that comes in eleven source code files. And it's not really that complex, as actual programs go. (And, of course, each of the source code files need to include the headers, because that's where the structs are defined and so on.)

3. puts puts a string on the screen.

4. All the reading things, like scanf, return values. So you might do something like this:

Code:

check = fscanf(infile, "&#37;d %s: %d", &int1, str1, &int2);

If check == 3 after this statement, then everything's good (since *scanf return the number of things that were read in). If check != 3, then the read failed.

Edit: You don't want to read gcc's libraries, because that's really well-optimized (read: makes no sense) code. You want to know what's in the libraries -- and (most) good C books will have a list of what the libraries are and what's in them. (I have a worksheet I made for students that has the highlights in it; I can send it to you if you want.)

**Jesdisciple** · 08-27-2008

2. But I wouldn't be linking in the first place; I would include other_code.h from main.c.

Yes, I would appreciate the highlights very much.

Code:

typedef struct Node Node;

struct Node {
   Node *prev, *next;
   void *value;
};

typedef struct List {
   Node *first, *last;
   size_t length;
} List;

What's the logic behind the red sections? I see no practical difference between typedef versus no typedef, or declaring global instances versus not. EDIT: Oh, is that what makes

Code:

List *list_create(void);

possible as opposed to

Code:

struct List *list_create(void);

?

**matsp** · 08-27-2008

Originally Posted by tabstop

Edit: You don't want to read gcc's libraries, because that's really well-optimized (read: makes no sense) code.

Not only is it well-optimized, but it's also written to be highly portable, which means that almost everything has extra levels of indirection and heavy use of macros to allow alternative implementations/optional solutions. Both of these (along with the optimization) makes it very hard to follow the code if you are not skilled at reading complex C code [it's hard then too, a bit like "legal english" is hard to read even if you are used to it, if you compare it with for example childrens books - but if you have no training in reading legal english, it's almost incomprehensible - likewise with glibc (or any other larger C library)]

**arpsmack** · 08-27-2008

Separating your interface (.h files) from your implementation (.c files) is important for a couple key reasons

Let's say you turn this program of yours into a pretty sweet library of data structures. Now some people want to buy it off you, but you don't want them to know how you implemented it. Since you separated your headers from your implementation code, you can compile the project into a shared library. If you sell them the compiled binary, they won't be able to (easily) look at your implementation, and you can still provide them the header files that contain the type definitions and function prototypes so they'll be able to use it.

Since your project is now split up into several .c files, you can take advantage of the power of Makefiles. If you make a change to "list.c", then only that one file will need to be re-compiled, and then the linker can link all the pieces together. If you put all of your implementation in header files, and include them all in one implementation file "main.c", the WHOLE project will need to be recompiled every time you make a change.

The header files won't be mucked up by all the implementation code, so they'll look a lot cleaner. This can provide people a much better overview of your code and how to use it.

I'm sure there are other reasons, but I can't think of any more off the top of my head.

I use the typedef's for precisely the reason you mentioned. I think writing "struct list *" and "struct node *" all over the place looks ugly. Also note that I capitalize the name of typedef'd structs. This is so its obvious to me, when I see a type that begins with a capital letter, I know its a struct.

Also, I forward declared the Node typedef so that the compiler wouldn't complain when I used Node inside the typedef. Without the forward declaration I would have had to write it like this:

Code:

typedef struct Node {
   struct Node *prev, *next;
   void *value;
} Node;

Its not really a big deal either way, but I chose to be more consistent with how I declared Node variables.

[edit]
A couple more things:

Originally Posted by Jesdisciple

It's not that I want a list that can hold mixed values but that I want a list implementation that can handle any type.

I guess my point here is that you don't need the object struct to achieve that. Just use a void*. The only thing the object struct added that a void* wouldn't be able to provide is: a string representing the type of the object, and the size of the stored object.

Neither of those things would be needed on a per-node basis unless you intended to store objects of different types and sizes...

Originally Posted by Jesdisciple

Sets and maps don't maintain insertion order, which was the original purpose of the list.

Sets and Maps can be modified slightly to keep track of insertion order, while still maintaining the desirable properties like (possibly) having the items sorted, and having fast insertions/removals. In fact as proof, I believe the Java standard library provides a hash table that allows you to iterate through the items in the order they were inserted.

All you have to do is maintain a set of links in addition to the tree or table structure.
[/edit]

**Jesdisciple** · 08-28-2008

Originally Posted by arpsmack

Let's say you turn this program of yours into a pretty sweet library of data structures. Now some people want to buy it off you, but you don't want them to know how you implemented it. Since you separated your headers from your implementation code, you can compile the project into a shared library. If you sell them the compiled binary, they won't be able to (easily) look at your implementation, and you can still provide them the header files that contain the type definitions and function prototypes so they'll be able to use it.

OK, but I can still separate declarations into one header file and definitions into another, i.e. I still don't need to link. Although this is a pretty far-fetched idea regardless.

Originally Posted by arpsmack

Since your project is now split up into several .c files, you can take advantage of the power of Makefiles. If you make a change to "list.c", then only that one file will need to be re-compiled, and then the linker can link all the pieces together. If you put all of your implementation in header files, and include them all in one implementation file "main.c", the WHOLE project will need to be recompiled every time you make a change.

This is a valid reason, but I don't understand why it matters... Do large files really take that long to compile?

Originally Posted by arpsmack

The header files won't be mucked up by all the implementation code, so they'll look a lot cleaner. This can provide people a much better overview of your code and how to use it.

See my response to your first reason.

Originally Posted by arpsmack

I guess my point here is that you don't need the object struct to achieve that. Just use a void*. The only thing the object struct added that a void* wouldn't be able to provide is: a string representing the type of the object, and the size of the stored object.

But then if I happen to insert the wrong type into the list, I get a very confusing segfault later on.

Originally Posted by arpsmack

Sets and Maps can be modified slightly to keep track of insertion order, while still maintaining the desirable properties like (possibly) having the items sorted, and having fast insertions/removals. In fact as proof, I believe the Java standard library provides a hash table that allows you to iterate through the items in the order they were inserted.

All you have to do is maintain a set of links in addition to the tree or table structure.

Good point.

**tabstop** · 08-28-2008

Originally Posted by Jesdisciple

This is a valid reason, but I don't understand why it matters... Do large files really take that long to compile?

The GMP took me an hour-and-a-half or so to compile (from scratch) on a rather new windows box. Call it two hours, with figuring out where some of the files went (since the unix scripts weren't quite right for my machine).

**laserlight** · 08-28-2008

OK, but I can still separate declarations into one header file and definitions into another, i.e. I still don't need to link.

If you do that, you would have to provide the source code to your customers who want to use your library.

This is a valid reason, but I don't understand why it matters... Do large files really take that long to compile?

If you have enough code, a full build can take hours, or perhaps even days.

**tabstop** · 08-28-2008

Originally Posted by Jesdisciple

Yes, I would appreciate the highlights very much.

Alright, I managed to get them posted on the web. (Feel free to flame away -- the idea is that the students would know/could look up more details about the functions, so I didn't get into too many specifics there, which maybe I should.)

**arpsmack** · 08-28-2008

For a small toy project like yours, you're probably not going to see the advantage of separating your implementation from your headers. Doing development on a large, robust codebase, this can make all the difference in the world.

Also, if you use open source software and like to tinker with things, it may help a lot. This obviously doesn't apply much in the Windows world since Windows users typically want packaged binaries and setup applications. However, in the Linux world there are a lot of people who maintain systems that they compiled from scratch (Gentoo!). Here's a small list of programs that could take all day to compile, even if you have a fairly modern machine:

OpenOffice
The Linux kernel
X
KDE / Gnome
Firefox
The C runtime library (glibc)
GCC
etc...

Originally Posted by Jesdisciple

But then if I happen to insert the wrong type into the list, I get a very confusing segfault later on.

Well, in the C world, the burden is usually on the programmer to make sure they don't do stupid things like this. So my response to you would be: don't insert the wrong type into the list!

If you're seriously looking for robust type-safety, bounds checking, exception handling, memory management etc... then C is not your language.

Now, that being said, I understand that this is more of a learning exercise. So here are two alternatives to what you were doing:

1) Implement the list by using void*'s as I suggested. Then, in the application, include wrapper functions around the insertion functions that accept the type you want to add to the list. That way you'll get compile time type checking, and you won't need to clutter your list with all that extra garbage. It doesn't even require that much code. Lets say we wanted to maintain a queue of Car structs in the program:

Code:

void enqueue_car(List *list, Car *car) {
   list_add_first(list, car);
}
Car *dequeue_car(List *list) {
   return list_remove_last(list);
}

2) If you insist on storing a string representing the element type of the list (which doesn't sound like a very elegant solution to me), at least store it at the list level, rather than per node. The size information really isn't needed unless your list is going to be allocating memory for its elements or accessing them for some reason.

Which brings me to another point that has already been mentioned: your list doesn't "own" its items. This is (usually) a bad design. When you store something in a list, you expect it to remain there until you free the list. The way you implemented it, the list just stores pointers to objects that some other code "owns". If that other code free's one of the objects, you now have a bad pointer stored somewhere in your list.

A solution to this, which again has already been posted, is to maintain a set of function pointers at the list level. When the user creates a list, they provide a set of function pointers that can manipulate the elements. Using the Car example again:

Code:

typedef struct List {
   Node *first, *last;
   size_t length;
   void *(*copy_func)(void *element);
   void (*free_func)(void *element);
};

...

Car my_car;
List *my_list = list_create(car_copy, car_free);
list_add_first(&my_car);
list_free(my_list);

Now when you go to add an item to the list, the add function can make a copy of the Car by using the car_copy() function in the List struct, and when you call list_free() it can free all of the Car copies it has by calling car_free(). And best of all, since the list is the ONLY thing that has pointers to these, no other code can free them.

**matsp** · 08-29-2008

Can I just point out that building the Linux kernel hasn't taken a whole day for about 10 years (on a machine that you could buy in that year, that is) - last time I built a new kernel, including downloading the source from kernel.org, I think the whole process took about 45 minutes or so - that's on a dual core Athlon64 @ 2.6GHz with 2GB of RAM. GCC on a similar machine would take perhaps an hour or two. OpenOffice takes much longer, mostly because it's so many SMALL components. [It's like a gazillion little .so files that all combine together].

But of course, you don't want to spend 45 minutes or a couple of hours compiling a project every time you fix one small thing... Just rebuilding what you've changed should be enough, and that should hopefully only take a few seconds or a couple of minutes if you changed some "included many places" header file.

--
Mats

**Jesdisciple** · 08-29-2008

Originally Posted by laserlight

If you do that, you would have to provide the source code to your customers who want to use your library.

No, I would only distribute the declaring header files; they would never see the implementation.

Originally Posted by tabstop

Alright, I managed to get them posted on the web. (Feel free to flame away -- the idea is that the students would know/could look up more details about the functions, so I didn't get into too many specifics there, which maybe I should.)

Thanks! (You leave me wondering why I would flame you...)

Originally Posted by arpsmack

Also, if you use open source software and like to tinker with things, it may help a lot. This obviously doesn't apply much in the Windows world since Windows users typically want packaged binaries and setup applications. However, in the Linux world there are a lot of people who maintain systems that they compiled from scratch (Gentoo!). Here's a small list of programs that could take all day to compile, even if you have a fairly modern machine:

I use FOSS, but I don't normally compile it. (I'm on Ubuntu.)

Originally Posted by arpsmack

1) Implement the list by using void*'s as I suggested. Then, in the application, include wrapper functions around the insertion functions that accept the type you want to add to the list. That way you'll get compile time type checking, and you won't need to clutter your list with all that extra garbage. It doesn't even require that much code. Lets say we wanted to maintain a queue of Car structs in the program:

That's a good idea I hadn't thought of.

Originally Posted by arpsmack

2) If you insist on storing a string representing the element type of the list (which doesn't sound like a very elegant solution to me), at least store it at the list level, rather than per node. The size information really isn't needed unless your list is going to be allocating memory for its elements or accessing them for some reason.

I was actually considering this, but I wanted an element and its type info to be available to any list that might handle it.

Originally Posted by arpsmack

Which brings me to another point that has already been mentioned: your list doesn't "own" its items. This is (usually) a bad design. When you store something in a list, you expect it to remain there until you free the list. The way you implemented it, the list just stores pointers to objects that some other code "owns". If that other code free's one of the objects, you now have a bad pointer stored somewhere in your list.

I figured the other code would remove the pointer from the list at that time.

**laserlight** · 08-29-2008

No, I would only distribute the declaring header files; they would never see the implementation.

Ah, but with no source files, all the implementation is in the header files, so how can your clients both not see the implementation and yet be able to use it unless you end up treating your other header file as a source file (and thus compile and link its respective object file), upon which you might as well admit that it is a source file and use a conventional file extension for it?