Thread: better c string functions

  1. #1
    Registered User
    Join Date
    Oct 2002
    Posts
    27

    better c string functions

    I was looking for a different string library to use than the standard C one, or maybe just some simple wrappers for some of the C string functions. I did some searches and came up with a few possible ones.

    Firestring http://freshmeat.net/projects/firestring/?topic_id=809
    Better String Library http://bstring.sourceforge.net/

    I was wondering if anyone uses these or anything similar. What would you recommend? Which one do you think is best/easiest to use?

    Thanks,

  2. #2
    Obsessed with C chrismiceli's Avatar
    Join Date
    Jan 2003
    Posts
    501
    what is wrong with the standard c string library?

  3. #3
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >what is wrong with the standard c string library?
    A lot, actually. C doesn't support very good string handling functionality.

    >What would you recommend?
    Perl
    My best code is written with the delete key.

  4. #4
    Registered User
    Join Date
    Oct 2002
    Posts
    27
    I'm not looking for a different language to use; if I were, I would just use Python. The thing is, I've gotten so used to the simple string operations in Python that now that I have to write some C code, I'm just looking for an easy way out.

  5. #5
    Been here, done that.
    Join Date
    May 2003
    Posts
    1,164
    If you don't like something, you really need to explain what you want. "Different string library" does not explain what you are trying to accomplish that the standard library can't handle. Mentioning Python does not explain what you want any better.

    Since C is not an interpreted language like Python, *you* have control over what it does when it comes to strings. It's not a high-level language like Basic or Fortran; it exists in the no-man's land between those and assembler. Python would be above even those mentioned.

    Let me know what functions you want C to handle and I can build a string manipulation library for you that does exactly what you want. You'll need to decide whether it would be worth it to you.
    Definition: Politics -- Latin, from
    poly meaning many and
    tics meaning blood sucking parasites
    -- Tom Smothers

  6. #6
    Registered User
    Join Date
    Nov 2003
    Posts
    7
    Hi folks!

    I am the author of the better string library (Bstrlib). As to the question of "Which [string library] do you think is best/easiest to use?" I want the answer to be the better string library. If you think it's not, remember that it is an open source project, and if you have some criticism, don't hesitate to send it my way.

    One thing about it is that although I believe Bstrlib to be a very easy library to use (even easier than the C standard library, I would claim), the fact that I designed it probably gives me a ridiculous advantage in understanding how it works. So if you have questions about how to do something, I can add explanations for them to the documentation.

    I am also reasonably familiar with the Python programming language. Realizing how simple and powerful Python is, is part of what motivated me to write the better string library. Simple functionality like automatic memory management, extracting substrings, splitting/joining, buffer overflow and alias safety are built in.

    However, there is also functionality in Bstrlib that goes beyond the core of Python's string functions and more naturally maps to the C language. Improvements for most C standard library string functions are available to make sure that Bstrlib can meet any need that the C library can meet. Read-only attributes and static strings have been added to allow for robust and safe intermingling of stack based and heap based strings. One can make a purely reference-based substring of another string with very low overhead, for example. Abstract stream consumption functions have been added to mate bstrings to file I/O (or other kinds of I/O) in a way that isn't arbitrary and ad hoc like the C standard library. One can also access a '\0' terminated char * buffer version of the string for complete compatibility with ordinary C strings.
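
    To make the general idea concrete, here is a minimal sketch of a length-tracked, self-growing string that stays '\0' terminated so it can still be handed to char * functions. The type `dstr` and every function name below are hypothetical and invented for this illustration; this is not Bstrlib's API or implementation, and the sketch ignores size_t overflow for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type for illustration only; NOT Bstrlib's API. */
typedef struct {
    char  *data; /* always '\0' terminated, usable as a plain C string */
    size_t len;  /* bytes in use, not counting the terminator */
    size_t cap;  /* bytes allocated */
} dstr;

/* Start with an empty string. Returns 0 on success, -1 if malloc fails. */
int dstr_init(dstr *s) {
    assert(s != NULL);
    s->len = 0;
    s->cap = 16;
    s->data = malloc(s->cap);
    if (s->data == NULL) { s->cap = 0; return -1; }
    s->data[0] = '\0';
    return 0;
}

void dstr_free(dstr *s) {
    assert(s != NULL);
    free(s->data);
    s->data = NULL;
    s->len = s->cap = 0;
}

/* Ensure room for `extra` more bytes plus the terminator. */
static int dstr_reserve(dstr *s, size_t extra) {
    size_t need = s->len + extra + 1;
    if (need <= s->cap) return 0;
    size_t cap = s->cap ? s->cap : 16;
    while (cap < need) cap *= 2;
    char *p = realloc(s->data, cap);
    if (p == NULL) return -1;      /* the old buffer is still valid */
    s->data = p;
    s->cap = cap;
    return 0;
}

/* Append n bytes from src; src must not point into s itself. */
int dstr_append(dstr *s, const char *src, size_t n) {
    assert(s != NULL && src != NULL);
    if (dstr_reserve(s, n) != 0) return -1;
    memcpy(s->data + s->len, src, n);
    s->len += n;
    s->data[s->len] = '\0';
    return 0;
}

/* Append src to dst; safe even when dst and src are the same object. */
int dstr_concat(dstr *dst, const dstr *src) {
    assert(dst != NULL && src != NULL);
    size_t n = src->len;           /* read before dst is modified */
    if (dstr_reserve(dst, n) != 0) return -1;
    /* src->data is read only after the possible realloc, so an aliased
       src (dst == src) sees the new buffer rather than a stale pointer. */
    memcpy(dst->data + dst->len, src->data, n);
    dst->len += n;
    dst->data[dst->len] = '\0';
    return 0;
}
```

    The caller never computes a buffer size, which is where most strcpy/strcat overflows come from, and `dstr_concat(&s, &s)` doubles a string without reading freed memory.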

    And just for icing on the cake -- many of the Bstrlib functions are also asymptotically much faster than their analogues in the standard C library's string functions (often by a massive margin.)
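
    The post does not say where the speedups come from. One plausible source, stated here as an assumption, is that a library which stores the length never has to rescan for the terminator. The sketch below contrasts the two approaches with plain C; both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Writes `count` copies of `piece` into `out` using strcat.
   `out` must have room for count * strlen(piece) + 1 bytes.
   Each strcat walks `out` from the start to find its end, so the
   total work grows quadratically with the length of the result. */
void repeat_with_strcat(char *out, const char *piece, size_t count) {
    assert(out != NULL && piece != NULL);
    out[0] = '\0';
    for (size_t i = 0; i < count; i++)
        strcat(out, piece);
}

/* Same result, but the current length is tracked explicitly, so each
   append copies only the new bytes and the total work is linear. */
void repeat_with_length(char *out, const char *piece, size_t count) {
    assert(out != NULL && piece != NULL);
    size_t piece_len = strlen(piece);
    size_t len = 0;
    for (size_t i = 0; i < count; i++) {
        memcpy(out + len, piece, piece_len);
        len += piece_len;
    }
    out[len] = '\0';
}
```

    Both functions produce identical output; only the amount of rescanning differs.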

    As to the comment of "Let me know what functions you want C to handle and I can build a string manipulation library for you ...", this has been done over and over by no end of other people who are only too willing to reinvent the wheel (I am one of them!). However, delivering the total functionality of Bstrlib, and providing a transparent regression test is not usually high on the list of priorities for other string libraries. By covering a superset of C standard library functionality in the area of strings, it is also going to be a much better starting point for an application specific string manipulation library.

    It's easy to decide to make your own string library, and it's no problem to find ways of improving over the joke that is the standard C library. But what happens when you need to ask "Does this library have performance anomalies?", "Is this library interoperable with ordinary char * functionality to support backward compatibility with other libraries?", "Does this library help reduce buffer overflow problems?", "Is this library portable to other platforms?", "What is the learning curve for this library?", "Is the resulting code going to be maintainable?", "Is the library thread safe?". I think Bstrlib does very well on these questions.

    But of course, since I am the author, you did not just receive an unbiased opinion.

  7. #7
    Yes, my avatar is stolen anonytmouse's Avatar
    Join Date
    Dec 2002
    Posts
    2,544
    The bstring library looks excellent. Its only downside is the lack of Unicode support (this seems to be shared by most other string libraries). Is the bstring library relatively stable? I was thinking of attempting to port it to Unicode at some point instead of developing my own solution.

    Do you think there are any major hurdles in changing it to unicode? As far as I can tell, its internal storage needs to be changed to wchar_t and all the functions that take 'char *'s need a corresponding function that takes 'wchar_t *'s. The 'wchar_t *' versions would be the native functions while the 'char *' versions would convert their argument to wide char and call the wide character functions. Any thoughts?
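
    The wrapper pattern described above could look roughly like this. It is only a sketch: `wide_find` and `narrow_find` are hypothetical names, not functions from Bstrlib, and the conversion relies on the standard mbstowcs() and btowc() under whatever locale is currently set.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* "Native" wide version: index of the first c in s, or -1 if absent. */
long wide_find(const wchar_t *s, wchar_t c) {
    assert(s != NULL);
    const wchar_t *p = wcschr(s, c);
    return p ? (long)(p - s) : -1;
}

/* char * version: converts its arguments to wide characters and then
   calls the wide function. Returns -2 if allocation or conversion fails.
   The index is counted in wide characters, not bytes. */
long narrow_find(const char *s, char c) {
    assert(s != NULL);
    /* A multibyte string never yields more wide chars than it has bytes. */
    size_t max = strlen(s) + 1;
    wchar_t *w = malloc(max * sizeof *w);
    if (w == NULL) return -2;

    long result = -2;
    wint_t wc = btowc((unsigned char)c);
    if (wc != WEOF && mbstowcs(w, s, max) != (size_t)-1)
        result = wide_find(w, (wchar_t)wc);
    free(w);
    return result;
}
```

    Note the cost: every call through the char * wrapper allocates and converts the whole string, which is one of the things a real port would have to weigh.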

  8. #8
    Registered User Frobozz's Avatar
    Join Date
    Dec 2002
    Posts
    546
    I would think it would be more than just a few simple changes. You'd have to look up what the standards are and see if you can make it comply. In other words... good luck.

  9. #9
    Registered User
    Join Date
    Nov 2003
    Posts
    7
    As to the question of stability:

    Bstrlib is *very* usable. Prior to implementing the regression test, Bstrlib was very reliable with no known problems under "normal" usage. In writing the main test for bstring, I made a "best effort" attempt to hit each function on every corner, factoring in write-protected vs. static vs. NULL vs. empty bstrings as well as other more typical scenarios. The regression test pointed out a number of errors that could arise from seriously "on the fringe" usage. I fixed them, and recompiled all my projects without encountering a single problem. So I would say that Bstrlib is as bug-free as I can make it.

    That said, an examination of the CVS tree would show you that I've been checking in non-trivial changes at a rate of about 1 per month for the past year. However, this has not come from API-bloating of the core functions. It's mostly been bug fixes, efficiency improvements, slight refactorings, documentation updates, etc. I.e., the code has always been *converging* towards a target.

    I would say that I am more than happy with the current implementation of the core functions in bstrlib.c. If I were to add anything it might be a split function that acts on a stream (certain applications like reading large CSV database files would benefit from this), but otherwise, I would consider the core API closed.

    Now that doesn't speak for the contents of bstraux.c. That module was supplied intentionally for non-core utility "bonus" functions. Besides bug fixes, I have been periodically adding a function or two here or there into bstraux.c. Basically any function I don't think necessarily belongs in the core, because it's not general enough or is of somewhat marginal utility, gets stuck in there.

    As for the C++ stuff ... well, you have the advantage there that I am not much of a C++ person, and have barely touched that module.

    So depending on how conservative you want to be ... you might like to wait a month to see if I do anything major to it before you decide if it's stable or not ... or you could trust me when I tell you "it recently hit a major milestone and is very stable now".

    As to the question of wchar_t / UNICODE:

    Of course, Bstrlib does not have any internationalization support. One of the things Bstrlib is really useful for is manipulating blocks of binary non-string data (i.e., stuff that can include non-ASCII bytes and '\0' in its contents.) If you were to simply re-implement bstring on top of wchar_t then such a property would be lost. The abstracted stream-based functions would also all of a sudden become a lot less useful.
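
    A small illustration of why an explicit length matters for binary data: strlen() stops at the first '\0', while a (pointer, length) pair sees every byte. The `blob` type and its functions are hypothetical names made up for this sketch, not part of Bstrlib.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying view of a byte block (illustration only). */
typedef struct {
    const unsigned char *data;
    size_t len;              /* '\0' bytes inside data are ordinary content */
} blob;

/* Count how many bytes in b equal `value`, including '\0' if asked. */
size_t blob_count(blob b, unsigned char value) {
    assert(b.data != NULL || b.len == 0);
    size_t n = 0;
    for (size_t i = 0; i < b.len; i++)
        if (b.data[i] == value) n++;
    return n;
}

/* Compare two blobs byte for byte; embedded '\0' does not end the compare. */
int blob_equal(blob a, blob b) {
    if (a.len != b.len) return 0;
    return a.len == 0 || memcmp(a.data, b.data, a.len) == 0;
}
```

    For the five bytes 'a', '\0', 'b', '\0', 'c', strlen() reports a length of 1, whereas a blob of length 5 can still count both zero bytes and compare the full contents.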

    That said, the lack of international character set support is a definite weakness of Bstrlib. It was my intention to write a separate "Better Universal String Library" on the type "bustring" which would be a UCS-4 (what wchar_t is supposed to be) implementation minus the streaming functions, but then adding in conversion functions from the various UTF formats.
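
    As a rough idea of what one such conversion function involves, here is a sketch that decodes a single UTF-8 sequence into a 32-bit code point. The name `utf8_decode` and its return convention are assumptions made for this example; nothing like it is claimed to exist in Bstrlib or in the planned "bustring".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence starting at s (n bytes available) into *out.
   Returns the number of bytes consumed, or 0 if the input is truncated
   or malformed. Hypothetical helper, for illustration only. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *out) {
    assert(s != NULL && out != NULL);
    if (n == 0) return 0;

    uint32_t cp;
    size_t need;                        /* continuation bytes expected */
    if (s[0] < 0x80)                { cp = s[0];        need = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; need = 1; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; need = 2; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; need = 3; }
    else return 0;                      /* stray continuation or bad lead */

    if (n < need + 1) return 0;         /* sequence is cut short */
    for (size_t i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlong encodings, surrogates, and values past U+10FFFF. */
    static const uint32_t min_cp[4] = { 0x0, 0x80, 0x800, 0x10000 };
    if (cp < min_cp[need] || cp > 0x10FFFF) return 0;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;

    *out = cp;
    return need + 1;
}
```

    This only maps bytes to code points. It does nothing about collation or combining sequences.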

    But a reading of some of the UNICODE documentation suggests that this would barely scratch the surface of what one would want from UNICODE string manipulation. String collating, and even something as simple as determining a character position (as opposed to a code point position) is non-trivial. The complexities boggle the mind (or at least it boggled mine) and were discouraging to say the least.

    So what about relying on the underlying C compiler's library for support? Unfortunately, not all compilers support wchar_t and functions like wcscoll(). And WATCOM C/C++ (my compiler of choice) has pulled the seriously nasty trick of defining wchar_t as 16 bits for C and 32 bits for C++. WATCOM C/C++ also seems to think that strcoll and strcmp are the same thing.

    For me, portability is not something I want to negotiate on, and I don't want it to be a mess of conditional compilation. So that would leave me with the task of actually implementing a string collation function for UNICODE myself.

    But maybe you see the problem more clearly than I do, and maybe you don't care about portability to older compilers as much. If you would like to take on the task yourself, then I would certainly like to see what you come up with. Since it's under the BSD license, you are under no obligation to share whatever you do, but it would be nice if you did.
