Thread: are strings a hack in c?

  1. #1
    Registered User
    Join Date
    Mar 2008
    Posts
    82

    are strings a hack in c?

    isn't the addition of a '\0' to an array of characters really a sort of a hack to obtain a string?

    it's true everything in life can be seen as a hack, in the end .. what gets me is that C can be so elegant at times, yet its idea of a string is a bit hackish and out-of-character.

    also, does anybody have any background into the appearance of the function "gets" and its subsequent discrediting?

    many thanks in advance for any answers and I apologise for the trollishness.

  2. #2
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    There are MANY ways to implement strings [or any other "data with variable length"] - the two main contenders are:
    1. Store the length.
    2. Store a marker at the end.

    You may say that one is better than the other, but realistically, both have their strong points and weak points.

    If the string length is to allow more than 255 bytes, you need a length field that is bigger than the single zero byte at the end of a string. Iterating over a string of arbitrary length is also quite easy when the end is marked within the string - otherwise you have to compare your position in the string against the length as well as fetching each character. To make strings really functional, you would actually need two extra fields:
    1. The current content's length.
    2. The maximum length.
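
    As a rough illustration of the "store the length" approach with both fields, here is a minimal sketch. The struct and function names (cstr, cstr_wrap, cstr_append) are made up for this example and are not part of any standard library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string type; all names here are illustrative. */
struct cstr {
    size_t len;   /* the current content's length */
    size_t cap;   /* the maximum length the buffer can hold */
    char  *data;  /* caller-supplied storage of cap bytes; no '\0' required */
};

/* Wrap a caller-supplied buffer as an empty counted string. */
static struct cstr cstr_wrap(char *buf, size_t cap)
{
    struct cstr s = { 0, cap, buf };
    return s;
}

/* Append n bytes from src. Returns 0 on success, -1 if they would not fit. */
static int cstr_append(struct cstr *s, const char *src, size_t n)
{
    if (n > s->cap - s->len)
        return -1;                    /* refuse instead of overflowing */
    memcpy(s->data + s->len, src, n);
    s->len += n;
    return 0;
}
```

    Because the maximum length travels with the string, an append can refuse to overflow, which a bare zero-terminated char array cannot do on its own.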

    Other software, not written in C, has used the same or a similar approach (e.g. the print-string system call in MS-DOS uses a $ to mark the end of a string) to identify the end of a string, so C did not invent this method. It may well be called a hack in those systems too, but it's at the very least a common hack.

    Some other languages of the past, such as Pascal, didn't even have a string type in the standard language - just arrays of char - and if you wanted to know how much of that content was actually used, you'd have to work that out in a similar way to C, by storing an "end marker". You also couldn't use standard functions such as write on those arrays, since it would just write the entire array. Strings implemented in variants of Pascal were non-standard, and used various methods to hold the length. Turbo Pascal used a one-byte length, so strings were limited to 255 characters. In VAX Pascal, the length was (if I remember correctly) a 16-bit integer, so strings could be up to 64K long.


    The failure of gets is a different matter - it assumes that the programmer AND THE USER agree on, and understand, how much data is supposed to be input, since gets itself doesn't know the size of the buffer it is storing into. This doesn't work well in generally released software, because there is always someone who will, accidentally or on purpose, write past the end of the buffer and cause unwanted effects.
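
    For comparison, fgets takes the buffer size as an argument, so over-long input is truncated rather than written past the end of the array (gets was eventually removed from the language in C11). A minimal sketch follows; read_line is an illustrative helper name, not a standard function.

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf, which holds size bytes.
   Returns buf, or NULL on end-of-file or error.
   Unlike gets, fgets stores at most size - 1 characters plus the '\0'. */
static char *read_line(char *buf, size_t size, FILE *in)
{
    if (fgets(buf, (int)size, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline, if any */
    return buf;
}
```

    With an 8-byte buffer, a longer line comes back as its first 7 characters, and the rest stays in the stream for the next read.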

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  3. #3
    Registered User
    Join Date
    Mar 2008
    Posts
    82
    thanks Mats .. you are top class .. it's a great explanation which I really appreciate. Lucky those that work with you.

  4. #4
    Registered User C_ntua's Avatar
    Join Date
    Jun 2008
    Posts
    1,853
    Quote Originally Posted by stabu View Post
    isn't the addition of a '\0' to an array of characters really a sort of a hack to obtain a string?

    it's true everything in life can be seen as a hack, in the end .. what gets me is that C can be so elegant at times, yet its idea of a string is a bit hackish and out-of-character.

    also, does anybody have any background into the appearance of the function "gets" and its subsequent discrediting?

    many thanks in advance for any answers and I apologise for the trollishness.
    In your opinion, what would be the best method to use for a string? As matsp said, there are really two ways to make a string, and C chooses the one that puts an end-of-string character at the end. I don't think other languages have a really different approach. Even if they have their own type, like String, the information required for the string is just hidden within the type. In C you could make your own string datatype, or just use C++, which is an evolution of C.

  5. #5
    Woof, woof! zacs7's Avatar
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
    > In C you could make your own datatype string
    Yes, but as matsp said, you'll miss out on the existing C standard string functions (e.g. strcmp(), strlen(), strcpy() etc.).

  6. #6
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    Using '\0' to end a string always seemed perfectly logical to me. It needs some way to know where the end is. Just like in English (and many other languages) we use a '.' to end a sentence.
    Maybe all this useless text messaging kids are doing these days is eroding some basic language skills like spelling & grammar?

  7. #7
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by matsp View Post
    There are MANY ways to implement strings [or any other "data with variable length"] - the two main contenders are:
    1. Store the length.
    2. Store a marker at the end.
    Interestingly, command line arguments use both methods: argc (or whatever the first parameter of main() is named) gives the count of the number of command line arguments supplied, but one could also iterate over argv (or whatever the second parameter of main() is named) until the null pointer is reached.
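
    A small sketch of the "marker at the end" side of this: the C standard guarantees that argv[argc] is a null pointer, so the count can be recovered by walking the array. count_args is an illustrative name.

```c
#include <stddef.h>

/* Count arguments by walking argv until the terminating null pointer.
   For the argv passed to main(), the result equals argc. */
static int count_args(char *const argv[])
{
    int n = 0;
    while (argv[n] != NULL)
        n++;
    return n;
}
```

    Calling count_args(argv) inside main() should give the same value as argc, which is the "stored length" side of the same data.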

  8. #8
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by laserlight View Post
    Interestingly, command line arguments use both methods: argc (or whatever the first parameter of main() is named) gives the count of the number of command line arguments supplied, but one could also iterate over argv (or whatever the second parameter of main() is named) until the null pointer is reached.
    Yes, there's of course nothing preventing a redundant/hybrid solution of the two options outlined in my original post in this thread. And of course, it is sometimes easier to loop and check if current argv is NULL, and sometimes handy to know that there are at least 3 arguments without having to write some code to count them.

  9. #9
    Registered User
    Join Date
    Jan 2007
    Posts
    330
    Quote Originally Posted by zacs7 View Post
    > In C you could make your own datatype string
    Yes, but as matsp said, you'll miss out on the existing C standard string functions (e.g. strcmp(), strlen(), strcpy() etc.).
    the internals of your string library could still use the standard C routines.
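
    For instance, a home-made sized string could lean on strlen and memcmp internally. This is only a sketch; the sized_str type and its functions are hypothetical names invented for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical "pointer plus length" view of some characters. */
struct sized_str {
    const char *data;
    size_t      len;
};

/* Build a view from an ordinary C string; strlen does the counting once. */
static struct sized_str sized_from_cstr(const char *s)
{
    struct sized_str v = { s, strlen(s) };
    return v;
}

/* Equality test: compare lengths first, then let memcmp compare the bytes. */
static int sized_equal(struct sized_str a, struct sized_str b)
{
    return a.len == b.len && memcmp(a.data, b.data, a.len) == 0;
}
```

    The mem* functions take an explicit length, so they fit a counted representation directly; the str* functions still work whenever the buffer also happens to be zero-terminated.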

  10. #10
    Registered User
    Join Date
    Oct 2001
    Posts
    2,129
    Quote Originally Posted by matsp View Post
    Other software, not written in C, has used the same or a similar approach (e.g. the print-string system call in MS-DOS uses a $ to mark the end of a string) to identify the end of a string, so C did not invent this method.
    Considering that DOS' predecessor, CP/M, was predated by C about 2 years, I don't think that's quite how it worked.

  11. #11
    Registered User C_ntua's Avatar
    Join Date
    Jun 2008
    Posts
    1,853
    Quote Originally Posted by KIBO View Post
    the internals of your string library could still use the standard C routines.
    Exactly. If you want to write a serious, big program, you can afford to spend a couple of hours (it won't take more) implementing your own string library. Then you can design it as you like, with the pros and cons that you choose.

    Personally, for array types I prefer to know the size. Your for-loops would then probably be more efficient, since you know the size of the array beforehand and don't have to search for an ending character. That also saves you from bugs caused by having two '\0' in a string.
    But if the standard libraries used the size instead of an end-of-string character, then all the functions would need one more input. That would be troublesome, having one variable for the pointer and one for the size. Making a struct for strings could be a solution, but for other reasons they chose to keep it simpler.
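
    To make the "two '\0' in a string" point concrete, here is a minimal sketch of a loop driven by a known size, with the extra size parameter mentioned above. count_byte is an illustrative name.

```c
#include <stddef.h>
#include <string.h>

/* Count occurrences of c in the first n bytes of buf.
   The loop is bounded by n, so an embedded '\0' is treated as ordinary data. */
static size_t count_byte(const char *buf, size_t n, char c)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (buf[i] == c)
            hits++;
    return hits;
}
```

    Given const char data[] = "ab\0ab", strlen(data) stops at the first zero and reports 2, while sizeof data - 1 is 5; passing the real size lets count_byte see both 'a' characters.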

  12. #12
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by robwhit View Post
    Considering that DOS' predecessor, CP/M, was predated by C about 2 years, I don't think that's quite how it worked.
    Neither CP/M nor MS-DOS is WRITTEN in C, though. They may have been inspired by C, but they were certainly not written in it.

    Another place where strings were implemented using zero-termination would be in PDP-11's RT11 OS, which was introduced in 1970:
    http://en.wikipedia.org/wiki/RT-11
    Code here:
    http://en.wikibooks.org/wiki/Compute...11.2C_MACRO-11
    The fact that it uses "ASCIZ", which is there to define a string with zero termination, indicates that those strings were common - there's little point in having "ASCIZ" if you don't need it to define ASCII strings with a terminating zero.

  13. #13
    Registered User ssharish2005's Avatar
    Join Date
    Sep 2005
    Location
    Cambridge, UK
    Posts
    1,732
    What about having a delimiter value for every array-like data structure, such as int, float and double arrays? It would have made some applications much easier for the programmer to implement.

    Does anyone agree? I would prefer to have a delimiter symbol or value for int, float and so on, so that we could have had standard libraries for them, similar to the string library functions.

    The reason I thought of this is that it would have made it much simpler to implement an RPCgen stub.

    ssharish
    Last edited by ssharish2005; 08-06-2008 at 06:07 PM.

  14. #14
    Registered User
    Join Date
    Oct 2001
    Posts
    2,129
    Quote Originally Posted by matsp View Post
    Neither CP/M nor MS-DOS is WRITTEN in C, though. They may have been inspired by C, but they were certainly not written in it.
    I think I might be misunderstanding something. Why would DOS have to have been written in C?

  15. #15
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by robwhit View Post
    I think I might be misunderstanding something. Why would DOS have to have been written in C?
    We don't know. matsp said it wasn't, and you took exception to that statement.
