Thread: Question: Stream I/O (working with strings)

  1. #1
    Registered User
    Join Date
    May 2008
    Posts
    16

    Question: Stream I/O (working with strings)

    Hi, I'm new to the Cprogramming board, and I'm hoping to get some help on a little project I am working on. I'm fairly new to programming as well, and I've been going through the K&R text along with the GNU C programming tutorial and some others.

    While certain things have been, at times, frustrating and slow, I really feel like things are picking up, and I'm getting a better grasp on them. I would, however, humbly request some help with a program I'm writing that handles string input/output on some files I'm working with.

    I'm a reporter/editor for an online news site, and I want to be able to take an unformatted text file (specifically an article) and apply certain formatting changes (to our site's specifications) to save our editorial/production staff some time and a whole lot of tedium. This includes formatting the date, author, source, etc. lines.

    I have a rough idea of how I'll set up certain things so far, but I'm getting tripped up on some of the essentials, and it would help me a ton to get some feedback on it.

    Below is what I have so far. I have to manually interrupt the program with Ctrl-C, and it only copies 80% of the text; I'm not sure why at this point:

    I'm sure I'm doing something embarrassingly wrong here, and I would very much appreciate it if someone could point that out. I would also love some feedback on the most effective way to deal with the input strings (e.g., using control loops) so that I can recognize what I need and output the formatted text.

    Thanks so much!

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    
    int main() {
    
      int c, i;
      int bytes_read;
      int nbytes = 100;
      char *my_string;
      FILE *stream_f1;
      FILE *stream_f2;
    
      stream_f1 = fopen ("formatted_sub.txt", "w");
      stream_f2 = fopen ("sub.txt", "r");
    
      while (*my_string != EOF) {
        my_string = (char *) malloc (nbytes + 1);
        bytes_read = getline (&my_string, &nbytes, stream_f2);
    
        fprintf (stream_f1, my_string);
    
      }
    
      fclose (stream_f1);
      fclose (stream_f2);
    
      return 0;
    }

  2. #2
    Dr Dipshi++ mike_g's Avatar
    Join Date
    Oct 2006
    Location
    On me hyperplane
    Posts
    1,218
    Is there any particular reason you chose C for this? If speed is not an issue, then you would probably find this sort of stuff much easier to do in a higher-level language.

    At the moment you are doing some dodgy stuff. Here:
    Code:
        my_string = (char *) malloc (nbytes + 1);
    You are allocating 101 bytes each time your loop runs, but you never free it, which results in a memory leak. To be honest, I don't see any reason to use malloc here, considering you only want an array of 101 bytes. Instead, I would just declare my_string as:
    Code:
    char my_string[nbytes+1];
    Personally, when I read from files I generally do it char by char. The first example here:
    http://faq.cprogramming.com/cgi-bin/...&id=1043284392
    shows how to do that.

  3. #3
    Registered User
    Join Date
    May 2008
    Posts
    16
    Mike, thanks very much for your reply. The reason I'm starting with C is because a good friend of mine who programs in several languages recommended to me that I start with C, since it's a solid base language, and would help me adopt other languages and concepts more easily later on.

    I'll check out the link you sent, and def. rework the call to malloc(). Yeah, I have to admit, I was not familiar with it, but I saw it as part of the call to getline.

    The reason I didn't read in by char is because I thought it'd be easier to work by line with what I'm trying to do... for example:
    If I want to reformat an op-ed article by Bob Herbert from the NYTimes, the unformatted strings might look like:
    Bob Herbert | NY Times columnist...
    and say I want to change it to:
    Author: Bob Herbert
    Source: The New York Times

    I figured that it would be easier to recognize and manipulate it by line, by setting up some switch or condition statements.

    How would you go about identifying certain strings or substrings after the fact when reading by character?

  4. #4
    Registered User
    Join Date
    Mar 2005
    Location
    Mountaintop, Pa
    Posts
    1,058
    Can you post a small example of your ASCII text input file? It'd make it a lot easier to determine how to tokenize each line.

  5. #5
    Registered User
    Join Date
    May 2008
    Posts
    16
    Certainly. Below would be an example of a few headers (the main parts of the text file that I'll be dealing with), and below that is an example of how the formatted header should be.

    Formatted Example:

    Headline: Texas Communities Sue to Stop Construction of Border Fence

    URL: http://www.mcclatchydc.com/homepage/story/37324.html

    Author: Dave Montgomery

    Source: McClatchy Newspapers

    Date: Friday 16 May 2008
    ---------------------------------

    Unformatted Version:

    http://www.mcclatchydc.com/homepage/story/37324.html
    McClatchy Washington Bureau
    Posted on Fri, May. 16, 2008
    Texas communities sue to stop construction of border fence
    Dave Montgomery | McClatchy Newspapers


    =====================================

    The point is to save all the time of doing these little, mundane, formatting edits.

    So, for example, I figured I would set up a really long switch statement to pick out whichever source appears (in the above example, McClatchy) and write the result to the new "formatted_sub.txt" with all the necessary changes.

    So I have a general idea of the construct, but I'm having trouble figuring out how to actually manipulate the strings to do what I want. The input/output from/to the streams is where I get caught up.

    Thanks so much for the time and consideration.

  6. #6
    Registered User
    Join Date
    Mar 2005
    Location
    Mountaintop, Pa
    Posts
    1,058
    One option would be to write a SubStr function that would extract author and newspaper from one text line. For example, this function would read the ASCII text line:

    Dave Montgomery | McClatchy Newspapers
    from index 0 to index 14 to capture Dave Montgomery as the author, and then read index 18 to index 37 to capture the name of the newspaper, McClatchy Newspapers, as the source. The bar can be used as a reference point for extracting this info. Refer to the strchr function to locate the bar in the ASCII text line. You can also use the SubStr function to extract the date. Headline and URL are really "no brainers".

  7. #7
    Registered User
    Join Date
    May 2008
    Posts
    16
    Thanks Bob; yes, the URL is a no brainer...

    The headline, though, isn't necessarily so easy, because it might not appear first, and I obviously don't know exactly what to look for. (What I was thinking was that I could isolate the other elements [date, URL, author, etc.], since those would be easier to find, and then get the headline by process of elimination.)

    What's challenging is the fact that the format varies from publication to publication and the order/syntax of the unformatted article isn't always the same. So I'm def. still doing work to find the best way to do that.

    What is vexing to me, however, is the simple logistics of it. I've been browsing through a bunch of tutorials, but I'm still not sure the best way to identify a certain string/substring.

    So if I want to, for example, recognize "http" and automatically assign that line (defined as all text until '\n') to "URL: ", what's the best way of searching for http and then isolating that string to assign to some variable later on?

    Thanks again to everyone who's taken the time to respond to this. I know some of this may seem super elementary, but after going through several different tutorials/howtos, I'm still having problems with some of the basics here.

  8. #8
    C++まいる!Cをこわせ! (roughly, "Here comes C++! Break C!")
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Code:
      while (*my_string != EOF) {
    Does this work? You are trying to dereference my_string, which is uninitialized, which would (probably) mean an instant crash. You need to allocate memory and copy in EOF, and then you can check without problems.

    Code:
        my_string = (char *) malloc (nbytes + 1);
    In C, you should never cast the return of malloc.

    Code:
        bytes_read = getline (&my_string, &nbytes, stream_f2);
    I have never seen getline for C, but I'm pretty sure it takes a char* buffer instead of char**, which would mean the first part is horribly wrong.

    Code:
        fprintf (stream_f1, my_string);
    Never do this either. You have no idea what's stored inside my_string, so it can contain format specifiers that cause all sorts of weirdness, including crashes. To write a string, use fputs.

    And aside from that, as they say, you never free the memory.

    Quote Originally Posted by ckuttruff
    Mike, thanks very much for your reply. The reason I'm starting with C is because a good friend of mine who programs in several languages recommended to me that I start with C, since it's a solid base language, and would help me adopt other languages and concepts more easily later on.
    I don't know about that. I'd say you'll get better experience with C++. C is the lowest of the low, so to speak, and no other language that I know comes close to C. C++ is a little higher level and will probably give you a little more to go on. Well, that's what I think, anyway.
    C++ is always preferable to C. In embedded work, you might need C++/C, though.

    Quote Originally Posted by ckuttruff
    I'll check out the link you sent, and def. rework the call to malloc(). Yeah, I have to admit, I was not familiar with it, but I saw it as part of the call to getline.
    It looks like you have some programming studying to do. C isn't a language you can just mess around with without understanding it...
    Quote Originally Posted by Adak
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  9. #9
    Banned
    Join Date
    Nov 2007
    Posts
    678
    Quote Originally Posted by ckuttruff
    I'm a reporter/editor for an online news site, and I want to be able to take an unformatted text file (specifically an article) and apply certain formatting changes (to our site's specifications) to be able to save our editorial/production staff some time and from a whole lot of tedium. This includes formatting the date, author, source, etc. lines.
    Just as a casual aside, is it a specific requirement to do this in C?
    I mean, it would be much simpler and faster to do this kind of text formatting in a scripting language, say Python. And since you are a reporter/editor, you need not do it the hardcore programmer way.

    Just a little aside; ignore it if I bothered you.

  10. #10
    Registered User
    Join Date
    May 2008
    Posts
    16
    Elysia,

    Thanks very much for your response on this. Yes, I'm now going through the cprogramming.com tutorials on strings and pointers. I've been jumping from text to text, trying to solidify concepts.

    Yeah, I guess C probably wasn't the best choice to start out with; but I'm not the quitting type ; ) So back to the tutorials and howto sections.

    Really though, thanks for taking the time to give some feedback; your comments are very helpful.

  11. #11
    Registered User
    Join Date
    May 2008
    Posts
    16
    And Manav,

    Not at all an unreasonable suggestion; I think I will def. check out Python (esp. since I use GNU/Linux, and Python would be very useful to be familiar with). I certainly want to learn C, but for my current project, your suggestion is a very good one.

    Thanks

  12. #12
    C++まいる!Cをこわせ! (roughly, "Here comes C++! Break C!")
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    So, what are you targeting? GUI programming? Embedded programming?
    GUI programming is something C++ is best suited for.
    C++/C is better for lower-level work, such as embedded devices.
    Note that C++ can also do C, which I like to call C++/C. So by going with C++, you're not really missing out on a whole lot. In fact, I'd say you gain a lot.

  13. #13
    Registered User
    Join Date
    May 2008
    Posts
    16
    Eventually I want to do some GUI stuff. But honestly, when I first started using Slackware 12 (my first Linux distro), I wanted a deeper understanding of computing in general (which is, in part, why I was attracted to C). It's refreshing to look at certain system files and have at least a vague understanding of what's going on.

    But yeah, eventually I would want to be able to work on some open source projects. Programming is just satisfying. It's challenging, extremely useful, and fulfilling. I mostly went off a friend's advice to start with C and the K&R text, but if you really think C++ would be more advantageous, I would certainly take that into consideration.

    It's just hard to know where to start (I feel like I'm coming so late in the game). It's overwhelming, since there are so many options and differing points of view.

  14. #14
    C++まいる!Cをこわせ! (roughly, "Here comes C++! Break C!")
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    If you want to know what's going on behind the scenes and still be able to create applications without too much trouble, then I'd definitely say C++ is for you. C++ is not a RAD tool like VB or C#, so it's still fairly low level, but it's higher level than C and will make life easier when writing programs that don't need to go that low.
    C++ also comes with the STL, which can do a lot for you as a programmer, although its interface might not be the best. Plus, a lot of frameworks are written with C++ in mind.
    If you later decide you'd like more low-level understanding, that's no problem: C++ is largely an extension of C, so you can absolutely do C in C++ too, and even mix the two if you feel like it.

  15. #15
    Registered User
    Join Date
    May 2008
    Posts
    16
    Thanks for all your help Elysia.

    I think I'll take your advice. At this point the combination of trying to understand the fundamental syntactical/algorithmic elements of programming and dealing with the lower-level memory management stuff is a bit overwhelming.

    I can def. see what you're saying, and it makes sense to come back to C after gaining a more solid foundation; then I'll actually be able to appreciate the efficiency of the lower-level operations without feeling confused on too many fronts at once.

    Thanks again; I'll def. be back. I haven't had much experience with forums (esp. where programming is concerned), and you've been very helpful and not at all condescending.


Similar Threads

  1. Question on strings
    By papagaio in forum C Programming
    Replies: 3
    Last Post: 09-19-2008, 03:15 AM
  2. question about reading in strings from a file :>
    By bball887 in forum C Programming
    Replies: 8
    Last Post: 04-13-2004, 06:24 PM
  3. Replies: 2
    Last Post: 05-12-2003, 04:40 PM
  4. c++ i/o question
    By ArtVandalay in forum C++ Programming
    Replies: 1
    Last Post: 06-12-2002, 09:34 PM
  5. File I/O and Strings
    By Unregistered in forum C++ Programming
    Replies: 2
    Last Post: 05-25-2002, 10:02 PM