Thread: identifying first letter of word

  1. #1
    Registered User
    Join Date
    Mar 2006
    Posts
    2

    identifying first letter of word

    Hi,
    I'd like to know how to get a C program to identify whether a particular character is the first letter of a word (so that I can then execute an instruction to alter it in some way).
    For example, capitalizing the first letter of a word if it starts with the letter 'b' (which would be done with printf, and not with any instruction for capitalizing).
    Thanks in advance,
    jeff.

  2. #2
    Registered User
    Join Date
    Mar 2006
    Posts
    9
    Code:
    if (word[0] == 'b')
    {
        //do stuff
    }
    Gives you the first character.
    Is that what you mean to do?
    Greetings, Wezel

  3. #3
    Registered User
    Join Date
    Feb 2006
    Location
    Sydney, Australia
    Posts
    40
    Quote Originally Posted by Wezel
    Code:
    if (word[0] == 'b')
    {
        //do stuff
    }
    Gives you the first character.
    Is that what you mean to do?
    Greetings, Wezel

    Ooh, sorry to hi-jack the thread but this touches on something that I am just learning at the moment and I'd like to ask a couple of related questions, if I may.

    Please bear in mind that I am pretty much still a newbie so I apologise if this seems an obvious question. It's something that I wasn't taught very well in class and I am by no means certain that I fully understand it.

    Here's what I know (or, more accurately, think I know)..

    1) Strings are just a collection (is array the right word here?) of characters in memory with a null (\0 or ASCII zero) at the end to signify the end of the string stored in that particular variable.

    2) The length of the string (or array - assuming that I am using that term correctly), is defined by the number in square brackets when the variable is declared. For example...

    char my_word[20];

    would be an array of 20 characters (actually only 19 if you don't include the null).

    Am I correct so far?

    Assuming I am...

    If you wanted to reference, say, the 4th character in that array, could you do so by doing something like what Wezel has done above? i.e. something like my_word[4] (I am not sure if that is the right syntax or not but you get the picture, I hope).

    To clarify what I am trying to get at, here's some code by way of an example. My apologies if I get the syntax wrong. If anyone feels the need to correct me, I'd appreciate the instruction...

    char my_word[27] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    printf("The 4th character of the variable my_word is %c", my_word[4]);

    Would the output be the letter E? (I assume it starts with A on zero, not 1, so it wouldn't be D, right?). Can you reference items in an array as easily as this? If you can, is this good practice or is it yet another one of those things that will probably work but is poor practice...

    Another different-but-related issue that has confused me a bit is what would happen if the string entered into the my_word[20] variable, ended up being longer than the 20 characters defined when the variable was declared? What would happen to the overflowed characters?

    As I understand it (and I'm almost certain that I misunderstand it), the extra characters would get loaded into memory but would be outside the bounds of the defined variable (just after it, in fact). That seems logical enough to me and this was the theory that I started out with when I set out to test it.

    I wrote a small program that intentionally overran the size of a variable. I created a variable 10 characters long (my_word[10]) and entered a 12 character word into it ("ABCDEFGHIJKL"). I then printed it out to the screen (In case it makes a difference, I used scanf() to read the 12 character word from a user input and printf() to print it to the screen).

    Now, I expected the last 3 characters to just go missing or, at the very least, just be garbage but to my surprise, they all came back nicely and printed.

    This resulted in some very colourful language on my part because I thought that I had understood how it all worked until this point and was now just as confused as ever.

    So can anyone suggest a plausible reason for why it worked? Surely the variable doesn't allow you to enter more characters than it is defined for. Frankly, I'm stumped.

    Thanks,
    TV

  4. #4
    ex-DECcie
    Join Date
    Dec 2005
    Posts
    125
    Quote Originally Posted by tvsinesperanto

    1) Strings are just a collection (is array the right word here?) of characters in memory with a null (\0 or ASCII zero) at the end to signify the end of the string stored in that particular variable.
    Essentially, yes. Strings in C are null-terminated.

    2) The length of the string (or array - assuming that I am using that term correctly), is defined by the number in square brackets when the variable is declared. For example...

    char my_word[20];

    would be an array of 20 characters (actually only 19 if you don't include the null).

    Am I correct so far?
    The size of my_word is 20 (i.e. sizeof(my_word)).
    Assuming the string is 19 characters, and terminated with a null, the strlen(my_word) would be 19.


    If you wanted to reference, say, the 4th character in that array, could you do so by doing something like what Wezel has done above? i.e. something like my_word[4] (I am not sure if that is the right syntax or not but you get the picture, I hope).
    The 4th element would be my_word[3].

    To clarify what I am trying to get at, here's some code by way of an example. My apologies if I get the syntax wrong. If anyone feels the need to correct me, I'd appreciate the instruction...

    char my_word[27] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    printf("The 4th character of the variable my_word is %c", my_word[4]);

    Would the output be the letter E? (I assume it starts with A on zero, not 1, so it wouldn't be D, right?). Can you reference items in an array as easily as this? If you can, is this good practice or is it yet another one of those things that will probably work but is poor practice...
    4th character would be my_word[3] or 'D'

    You seem to be mixing up your indexes with your ordinals. Fourth character is 'D' (A,B,C,D), but the index number of the 4th character is 3 (0,1,2,3).


    Another different-but-related issue that has confused me a bit is what would happen if the string entered into the my_word[20] variable, ended up being longer than the 20 characters defined when the variable was declared? What would happen to the overflowed characters?

    As I understand it (and I'm almost certain that I misunderstand it), the extra characters would get loaded into memory but would be outside the bounds of the defined variable (just after it, in fact). That seems logical enough to me and this was the theory that I started out with when I set out to test it.

    I wrote a small program that intentionally overran the size of a variable. I created a variable 10 characters long (my_word[10]) and entered a 12 character word into it ("ABCDEFGHIJKL"). I then printed it out to the screen (In case it makes a difference, I used scanf() to read the 12 character word from a user input and printf() to print it to the screen).

    Now, I expected the last 3 characters to just go missing or, at the very least, just be garbage but to my surprise, they all came back nicely and printed.

    This resulted in some very colourful language on my part because I thought that I had understood how it all worked until this point and was now just as confused as ever.

    So can anyone suggest a plausible reason for why it worked? Surely the variable doesn't allow you to enter more characters than it is defined for. Frankly, I'm stumped.

    Thanks,
    TV
    Without seeing the code, I would hazard a guess that the program was small enough that your program wrote the extra characters into an area of memory that did not affect the overall running of the program. In a word, you got lucky.

    In a larger program, with many variables, larger structures, and whatnot, a buffer overflow like that, if it does not crash the program outright (another lucky occurrence) will leave you tearing your hair out trying to find out why the program is behaving strangely.......


    Hope this all helped.
    Mr. Blonde: You ever listen to K-Billy's "Super Sounds of the Seventies" weekend? It's my personal favorite.

  5. #5
    ex-DECcie
    Join Date
    Dec 2005
    Posts
    125
    In response to the original post, the previous reply of word[0] will tell you the first character in the string.

    I suspect however that what you are wanting to do is to parse a longer string into word tokens. If that is the case, then you need to decide what your delimiters for a word are. Usually that would be whitespace (a space, a tab, etc.).

    You can traverse the string, character by character, and when you hit a delimiter, you know that the next character you hit is the start of a word.

    Of course, this is an oversimplification, but it might give you a starting point.

  6. #6
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    > Am I correct so far?
    Yes.

    > char my_word[27] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    It's usually more convenient to just omit the size in this case, and let the compiler do the counting
    char my_word[ ] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    > What would happen to the overflowed characters?
    They would overwrite some memory which you didn't own.
    char arrays are often padded to the next multiple of 4 or 8 bytes, to preserve memory alignment for other variables declared in the same scope, so adding a couple of chars to a 10-character array may only just overwrite padding space anyway, but that doesn't make it any safer.
    It's only when you make your arrays a bit larger, or you type in a few more excess characters that you would start to notice odd things happening.

    > So can anyone suggest a plausable reason for why it worked?
    Small overwrites seldom produce a noticeable effect when you test the result immediately - it's only when the owner of the memory you trashed steps in that you notice all the weirdness.
    E.g., depending on how the compiler orders these things in memory:
    Code:
    int number = 10;
    char string[10];
    // now read in 12 characters
    
    // number may or may not be still 10
    
    // set number to something
    number = 20;
    
    // string may or may not contain the tail of the characters you typed in.
    > Surely the variable doesn't allow you to enter more characters than it is defined for.
    You should always use fgets() for reading input into a string, since that is passed a size parameter.
    Simple things like gets() and scanf("%s") offer no protection at all for buffer overflow.
    scanf can be made safe, but the syntax is cumbersome to say the least.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  7. #7
    Registered User
    Join Date
    Feb 2006
    Location
    Sydney, Australia
    Posts
    40
    Quote Originally Posted by Salem
    It's usually more convenient to just omit the size in this case, and let the compiler do the counting
    char my_word[ ] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    Ahhh, cool! I didn't know that you could do that. I'm guessing that the compiler(?) just makes the array big enough to accept the data no matter how many characters it is? Correct?

    If so, that is so cool. I will be using that a lot. Thank you, that is a great tip.

    Quote Originally Posted by fgw_three
    You seem to be mixing up your indexes with your ordinals. Fourth character is 'D' (A,B,C,D), but the index number of the 4th character is 3 (0,1,2,3).
    Yeah, I think I am. I do understand it though, I just messed up the way I described it in my post. The 4th character is number 3 because the first character is number 0. Got it. Thanks for the clarification.

    Quote Originally Posted by fgw_three
    I would hazard a guess that the program was small enough that your program wrote the extra characters into an area of memory that did not affect the overall running of the program. In a word, you got lucky.
    Ahhhh, yes, this could well be the case. It was a very small program, you might even call it trivial. About 5 lines. OK, so I got lucky this time but I can't rely on it. That is good because it means that I did understand the concept, it was just a weird situation. I'm glad!

    Quote Originally Posted by Salem
    They would overwrite some memory which you didn't own.
    char arrays are often padded to the next multiple of 4 or 8 bytes, to preserve memory alignment for other variables declared in the same scope, so adding a couple of chars to a 10-character array may only just overwrite padding space anyway, but that doesn't make it any safer.
    It's only when you make your arrays a bit larger, or you type in a few more excess characters that you would start to notice odd things happening.
    Yes, this could also be the case (assuming I understand what you are saying correctly). What I think you mean is that, because variables of different types (e.g. int, float, double, etc.) generally come in powers of 2 (2, 4, 8, etc.), if a char array is declared as, say, 10 bytes, it would get rounded up to 16(?) bytes. Is that correct? So even though the last 6 bytes are not actually used by the variable, the memory for them is allocated to that variable anyway. Is this to allow for some overflow for the string or is it for some memory management reason? Something else altogether?


    Quote Originally Posted by fgw_three
    The size of my_word is 20 (i.e. sizeof(my_word)).
    Assuming the string is 19 characters, and terminated with a null, the strlen(my_word) would be 19.
    In class, we have not yet learnt what sizeof() and strlen() do but I can take a pretty good guess from the context here. From what you have said, sizeof(my_word) will return 20 in this case and strlen(my_word) will return 19. Can you tell me why they return different values? It's only an educated guess but is it that sizeof() returns the size of the array declared and strlen() only returns the size of the string that is stored in that array? That would seem to make sense to me but, obviously, I'm not sure if that is actually what is happening. If you are able to clarify this for me, I'd appreciate it.

    Thanks once again for all your help, guys. I really appreciate that so many of you are willing to take the time to explain these things (repeatedly in some cases). I am finding that your collective advice and assistance is really helping me to get my head around some of the more abstruse aspects of C that aren't really covered all that well in text books or in class (especially with my teacher) and that you would otherwise only learn after months, or perhaps even years, of first hand coding. Thank you.

    Cheers,
    TV

  8. #8
    Just Lurking Dave_Sinkula
    Join Date
    Oct 2002
    Posts
    5,005
    Quote Originally Posted by tvsinesperanto
    In class, we have not yet learnt what sizeof() and strlen() do but I can take a pretty good guess from the context here. From what you have said, sizeof(my_word) will return 20 in this case and strlen(my_word) will return 19. Can you tell me why they return different values? It's only an educated guess but is it that sizeof() returns the size of the array declared and strlen() only returns the size of the string that is stored in that array? That would seem to make sense to me but, obviously, I'm not sure if that is actually what is happening. If you are able to clarify this for me, I'd appreciate it.
    You've pretty much got it.

    The operator sizeof "yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand." The function strlen "returns the number of characters that precede the terminating null character."
    Code:
    #include <stdio.h>
    #include <string.h>
    
    int main()
    {
       char text[100] = "hello world";
       printf("sizeof text   = %lu\n", (long unsigned)sizeof text);
       printf("strlen(text)  = %lu\n", (long unsigned)strlen(text));
       return 0;
    }
    
    /* my output
    sizeof text   = 100
    strlen(text)  = 11
    */
    7. It is easier to write an incorrect program than understand a correct one.
    40. There are two ways to write error-free programs; only the third one works.*

  9. #9
    ATH0 quzah
    Join Date
    Oct 2001
    Posts
    14,826
    Then of course there's the whole sizeof scenario where you use it on the name of an array passed to a function, and get the size of the pointer it's degraded into.


    Quzah.
    Hope is the first step on the road to disappointment.

  10. #10
    Registered User
    Join Date
    Mar 2006
    Posts
    2
    Thanks for your replies.
    Is there any way of doing it with basic if commands, using && and ||? I was thinking along the lines that it must have a space or punctuation before the letter, but with this method, if the first word I write starts with b, it won't be capitalized.

    Also, with the word[0]=='b' technique, would this search through a whole sentence and apply the alteration, or would it just do it to the first word in the sentence (assuming in this case that all input is considered one sentence)? Sorry for asking this, but I'm in a situation where I can't check this right now.
    thanks,
    jeff.

  11. #11
    Guest Sebastiani
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    You can use a single boolean variable to set up a simple 'state machine'. If you encounter a letter and the current state is 'waiting for a letter', then you've reached the first letter of the next word, so set the state to 'not waiting for a letter'. When you encounter anything that is not a letter, set the state back to 'waiting for a letter'.
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  12. #12
    Registered User
    Join Date
    Feb 2006
    Location
    Sydney, Australia
    Posts
    40
    Quote Originally Posted by Dave_Sinkula
    You've pretty much got it.

    The operator sizeof "yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand." The function strlen "returns the number of characters that precede the terminating null character."
    Great! I think I have a handle on it now. Thanks for the clarification.

    Quote Originally Posted by quzah
    Then of course there's the whole sizeof scenario where you use it on the name of an array passed to a function, and get the size of the pointer it's degraded into.
    I choose to ignore this information at this time since it is obviously well over my head and is only going to loosen my already tenuous grip on the knowledge that I have only just managed to shoehorn into my brain.

    Damn your eyes for being such a smarty pants! *shakes fist*
