Thread: char (string) array functions

  1. #1
    Registered User
    Join Date
    Mar 2013
    Posts
    63

    char (string) array functions

    I'm an old-fart BASIC programmer trying to upgrade my programming knowledge in C and C++.
    I have recently forked a long-standing BASIC -> C translator (BCX).
    I decided to start investigating the runtime library, beginning with the BASIC char string functions.
    First I want to find out whether the talented programmers here consider them viable.
    If not, do you see a better way to implement them?
    I would then like to take the same functions, using std::string, to the C++ Programming board.

    Thank you for your time.
    James


    First up is LEFT$
    ==============================================================================
    Purpose: LEFT$ returns the leftmost substring of a string

    Syntax:
    SubStr$ = LEFT$(MainStr$, Length%)
    Parameters:
    MainStr$ String from which left substring is to be copied.
    Length% Leftmost substring length.
    Return Value:
    SubStr$, the return value, is a substring of MainStr$.
    ==============================================================================
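    For reference, the LEFT$ rules above (a length below 1 gives an empty string, a length past the end gives the whole string) can be sketched in C without any shared temporary storage. The name left_buf() and the caller-supplied buffer are assumptions of this sketch, not how BCX does it:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies LEFT$(src, n) into dst, which has room
 * for dstsize chars including the terminating '\0'.
 * n < 1 gives "", n larger than strlen(src) gives the whole string,
 * and the result is also truncated so it always fits in dst.
 * Returns dst. */
char *left_buf(char *dst, size_t dstsize, const char *src, long n)
{
    size_t len = strlen(src);

    if (dstsize == 0)
        return dst;                 /* no room even for the '\0' */
    if (n < 1)
        len = 0;                    /* LEFT$(s, 0) is "" */
    else if ((size_t)n < len)
        len = (size_t)n;            /* keep only the first n chars */
    if (len > dstsize - 1)
        len = dstsize - 1;          /* never overflow dst */

    memcpy(dst, src, len);
    dst[len] = '\0';
    return dst;
}
```

    The trade-off is that the caller owns the storage, so nothing can be recycled behind its back, but every call site needs a buffer.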


    Code:
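    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #ifndef BCXTmpStrSize
    #define BCXTmpStrSize  2048   /* must be a power of two (see the & mask below) */
    #endif

    /* Returns a zero-filled temporary buffer of Bites+128 chars taken from a
     * ring of BCXTmpStrSize slots; the slot's previous buffer is freed first.
     * If iAlloc is 0, the slot is only freed and NULL is returned.
     * (iPad is currently unused.) */
    char *BCX_TmpStr (size_t Bites,size_t  iPad,int iAlloc)
    {
      static int   StrCnt;
      static char *StrFunc[BCXTmpStrSize];
      StrCnt=(StrCnt + 1) & (BCXTmpStrSize-1);
      if(StrFunc[StrCnt]) {free (StrFunc[StrCnt]); StrFunc[StrCnt] = NULL;}
    #if defined BCX_MAX_VAR_SIZE
      if(Bites*sizeof(char)>BCX_MAX_VAR_SIZE)
      {
      printf("Buffer Overflow caught in BCX_TmpStr - requested space of %d EXCEEDS %d\n",(int)(Bites*sizeof(char)),BCX_MAX_VAR_SIZE);
      abort();
      }
    #endif
      if(iAlloc) StrFunc[StrCnt]=(char*)calloc(Bites+128,sizeof(char));
      return StrFunc[StrCnt];
    }

    /* LEFT$: returns the first 'length' chars of S as a temporary string.
     * calloc() zero-fills the buffer, so the copy is already '\0'-terminated. */
    char *left (const char *S, int length)
    {
      register int tmplen = strlen(S);
      if(length<1) return BCX_TmpStr(1,128,1);   /* LEFT$(s, 0) is "" */
      if(length<tmplen) tmplen=length;
      char *strtmp = BCX_TmpStr(tmplen,128,1);
      return (char*)memcpy(strtmp,S,tmplen);
    }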
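    One detail worth checking in BCX_TmpStr(): the index update (StrCnt + 1) & (BCXTmpStrSize-1) is only equivalent to % BCXTmpStrSize when BCXTmpStrSize is a power of two. The default 2048 is, but the #ifndef lets users override it. This small checker (a hypothetical helper, not part of BCX) replays the update to count how many slots are actually used:

```c
#include <stddef.h>

/* Replays BCX_TmpStr()'s index update
 *     StrCnt = (StrCnt + 1) & (BCXTmpStrSize - 1);
 * starting from 0 until the index returns to 0, and counts the steps,
 * i.e. the number of distinct slots in the cycle. Requires size >= 1. */
size_t slots_used_with_mask(size_t size)
{
    size_t cnt = 0, steps = 0;
    do {
        cnt = (cnt + 1) & (size - 1);
        steps++;
    } while (cnt != 0);
    return steps;
}

/* The same replay using '%', which cycles through all slots for any size. */
size_t slots_used_with_modulo(size_t size)
{
    size_t cnt = 0, steps = 0;
    do {
        cnt = (cnt + 1) % size;
        steps++;
    } while (cnt != 0);
    return steps;
}
```

    With the default of 2048 every slot is used. With 1000, the masked index falls back to 0 after only 8 calls: 999 is 1111100111 in binary, so bit 3 is clear and 7 + 1 = 8 masks to 0. With %, all 1000 slots would be used.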

  2. #2
    SAMARAS std10093's Avatar
    Join Date
    Jan 2011
    Location
    Nice, France
    Posts
    2,694
    The first thing I noticed is that you use the keyword static. I want to demonstrate that a static local variable exists as a single copy, shared by every call of the function.

    Code:
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>   /* for strcpy() */
    
    char* james(char*);
    
    int main(void)
    {
        char* str1 = NULL;
        char* str2 = NULL;
        
        str1 = james("input1");
        
        /* Passing NULL to %s is undefined behavior, so print "(null)" explicitly. */
        printf("str1 = %s\nstr2 = %s\n", str1, str2 ? str2 : "(null)");
        
        str2 = james("input2");
        
        printf("str1 = %s\nstr2 = %s\n", str1, str2);
        
        return 0;
    }
    
    char* james(char* input)
    {
        static char word[10];   /* one buffer, shared by every call */
        strcpy(word, input);
        return word;
    }
    gives the output
    Code:
    str1 = input1
    str2 = (null)
    str1 = input2
    str2 = input2
    whereas with malloc and the same main
    Code:
    char* james(char* input)
    {
        char* word;
        
        word = malloc(strlen(input) + 1);   /* +1 for the '\0' */
        if (word == NULL)
            return NULL;                    /* allocation failed */
        
        strcpy(word, input);                /* a fresh buffer on every call */
        return word;
    }
    we get this output
    Code:
    str1 = input1
    str2 = (null)
    str1 = input1
    str2 = input2
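    The flip side of the malloc version is ownership: each call hands back a buffer that the caller must eventually free(), which the static version never required. A minimal sketch of that pattern; copies_are_independent() is a hypothetical helper added for illustration:

```c
#include <stdlib.h>
#include <string.h>

/* malloc-based james(): every call returns a new, independent copy
 * that the caller owns and must free(). Returns NULL on failure. */
char *james(const char *input)
{
    size_t n = strlen(input) + 1;   /* include the '\0' */
    char *word = malloc(n);
    if (word == NULL)
        return NULL;
    memcpy(word, input, n);
    return word;
}

/* Two calls give two separate buffers, so changing one copy leaves the
 * other intact. Both are freed before returning.
 * Returns 1 if the copies are independent, 0 otherwise (or on failure). */
int copies_are_independent(void)
{
    char *str1 = james("input1");
    char *str2 = james("input2");
    int ok = 0;

    if (str1 != NULL && str2 != NULL) {
        str2[0] = 'X';              /* modify only the second copy */
        ok = strcmp(str1, "input1") == 0 && strcmp(str2, "Xnput2") == 0;
    }
    free(str1);                     /* free(NULL) is a no-op */
    free(str2);
    return ok;
}
```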


    It’s 2014 and I still use printf() for debugging.


    "Programs must be written for people to read, and only incidentally for machines to execute. " —Harold Abelson

  3. #3
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    In C, I would personally create a new string type, which would make implementing those operations much easier. There are many working approaches, but here's the one I'd most likely use:
    Code:
    #include <stdio.h>   /* fprintf(), fflush() in out_of_memory() */
    #include <stdlib.h>
    #include <string.h>
    
    typedef struct {
        size_t  size;  /* Allocated for the string, including the '\0' at end */
        size_t  used;  /* Used by the string, not including the '\0' at end */
        char  *data; /* Dynamically allocated data */
    } string_t;
    
    #define STRING_INIT { 0, 0, NULL }
    The structure describes a dynamically allocated string. The string itself is mutable (modifiable) and can be grown and shrunk as necessary. Although the data may itself contain NUL bytes ('\0'), a '\0' is always appended after the last used byte, so you can use data as a C string, too.
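    The reason to store used explicitly, rather than relying on strlen(), is that strlen() stops at the first '\0'. A small illustration (both function names are hypothetical):

```c
#include <string.h>

/* strlen() stops at the first '\0', so it under-reports the length
 * of data that contains embedded NUL bytes. */
size_t strlen_of_embedded(void)
{
    static const char data[] = "ab\0cd";    /* 5 chars plus the final '\0' */
    return strlen(data);                     /* stops at the embedded '\0' */
}

/* A length recorded separately (like string_t's 'used') sees all of it. */
size_t stored_length_of_embedded(void)
{
    static const char data[] = "ab\0cd";
    return sizeof data - 1;                  /* counts all 5 chars */
}
```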

    Here's the function to create a new string (along with a couple of helper functions it uses). It copies length characters from source to the new string, so you can use it for both string_t strings and normal C strings (via str = string_new(c_string, strlen(c_string));).
    Code:
    static void out_of_memory(void)
    {
        fflush(stdout);
        fprintf(stderr, "Out of memory error.\n");
        fflush(stderr);
        exit(1);
    }
    
    static size_t size_for_length(const size_t length)
    {
        /* If the strings often grow, change this logic to
         * initially allocate larger area for each string.
         * The return value must be larger than length,
         * so that the '\0' can be appended.
        */
        return length + 1;
    }
    
    string_t string_new(const char *const source, const size_t length)
    {
        string_t str;
    
        str.used = length;
        str.size = size_for_length(length);
        str.data = malloc(str.size);
        if (!str.data)
            out_of_memory();
    
        if (length > 0)
            memcpy(str.data, source, length);
    
        str.data[length] = '\0';
    
        return str;
    }
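    The comment in size_for_length() suggests allocating more when strings often grow, which would matter if you later add functions that append in place. One possible policy, offered as an assumption rather than part of the code above, is to round up to a fixed chunk size (the 64-byte chunk and the function name are hypothetical):

```c
#include <stddef.h>

/* Hypothetical alternative to size_for_length(): rounds length + 1 up
 * to the next multiple of 64 bytes, so small appends usually fit in the
 * existing allocation. The result is always greater than length, so
 * there is always room for the '\0', as string_new() requires. */
size_t size_for_length_rounded(const size_t length)
{
    const size_t chunk = 64;                /* assumed allocation granularity */
    return (length / chunk + 1) * chunk;
}
```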
    When you replace an existing string, destroy the old one first, or you'll leak the dynamically allocated memory it uses. (On typical hosted systems all dynamically allocated memory is released when the program exits, so there's no need to worry about that; only about assigning new values over existing ones.)

    I like to do that thoroughly, "poisoning" the structure, so that any accidental reuse of a destroyed string is easy to detect:
    Code:
    void string_destroy(string_t *const str)
    {
        if (str) {
            free(str->data);
            str->data = NULL;
            str->size = 0;
            str->used = 0;
        }
    }
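    Following the destroy-before-replace rule, an assignment helper could look like this. string_assign() is hypothetical, not part of the code above. It allocates the new buffer before freeing the old one, so it also works when the source points into the string being replaced. The string_t definition and string_destroy() are repeated so the sketch compiles on its own:

```c
#include <stdlib.h>
#include <string.h>

/* Minimal copies of the string_t pieces above. */
typedef struct {
    size_t  size;
    size_t  used;
    char   *data;
} string_t;

void string_destroy(string_t *const str)
{
    if (str) {
        free(str->data);
        str->data = NULL;
        str->size = 0;
        str->used = 0;
    }
}

/* Replace the contents of *dst with a copy of source[0..length-1].
 * Returns 0 on success, or -1 if out of memory (*dst is then untouched). */
int string_assign(string_t *const dst, const char *const source, const size_t length)
{
    char *data = malloc(length + 1);        /* +1 for the '\0' */
    if (!data)
        return -1;

    if (length > 0)
        memcpy(data, source, length);       /* source may point into dst->data */
    data[length] = '\0';

    string_destroy(dst);                    /* release the old buffer only now */
    dst->data = data;
    dst->size = length + 1;
    dst->used = length;
    return 0;
}
```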
    Here are some string manipulation functions. Each of them returns a new, independent structure containing a copy of the desired part of the source string.
    Code:
    /* Return a copy of the initial part (if length > 0),
     * or tail part (if length < 0), of the source string.
    */
    string_t string_part(const string_t source, const long length)
    {
        if (length < 0L) {
            const size_t len = (size_t)(-length);
            if (source.used < len)
                return string_new(source.data, source.used);
            else
                return string_new(source.data + source.used - len, len);
        } else
        if (length > 0L) {
            const size_t len = (size_t)length;
            if (source.used < len)
                return string_new(source.data, source.used);
            else
                return string_new(source.data, len);
        } else
            return string_new(NULL, 0);
    }
    
    /* Return a middle part of the source string.
     * Length must be positive, but offset can be
     * positive (from start) or negative (from end).
    */
    string_t string_sub(const string_t source, const long offset, const long length)
    {
        size_t off, len;
    
        if (length < 1L)
            return string_new(NULL, 0);
    
        if (offset < 0L) {
            if ((size_t)(-offset) > source.used)
                off = 0;
            else
                off = source.used - (size_t)(-offset);
        } else
        if (offset > 0L) {
            if ((size_t)offset > source.used)
                off = source.used;
            else
                off = (size_t)offset;
        } else
            off = 0;
    
        len = (size_t)length;
        if (off + len > source.used)
            len = source.used - off;
    
        return string_new(source.data + off, len);
    }
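    To see how string_sub() clamps its arguments, the same arithmetic can be pulled out into a standalone function. sub_range() is a hypothetical helper written for illustration; it works on a length only, so it can be checked without allocating anything:

```c
#include <stddef.h>

/* Mirrors the offset/length clamping in string_sub(): writes the start
 * offset and the number of chars to copy from a string of 'used' chars.
 * length < 1 yields an empty range. */
void sub_range(const size_t used, const long offset, const long length,
               size_t *const off_out, size_t *const len_out)
{
    size_t off, len;

    if (length < 1L) {
        *off_out = 0;
        *len_out = 0;
        return;
    }

    if (offset < 0L)            /* count from the end, clamp at the start */
        off = ((size_t)(-offset) > used) ? 0 : used - (size_t)(-offset);
    else                        /* count from the start, clamp at the end */
        off = ((size_t)offset > used) ? used : (size_t)offset;

    len = (size_t)length;
    if (off + len > used)       /* never run past the end */
        len = used - off;

    *off_out = off;
    *len_out = len;
}
```

    For "Some string" (11 bytes), sub_range(11, 6, 3, ...) gives offset 6 and length 3 ("tri"), and sub_range(11, -6, 3, ...) gives offset 5 and length 3 ("str").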
    Finally, here is a short example program that illustrates the use of the above. (It uses stdio.h, as does out_of_memory(), so make sure #include <stdio.h> is at the top of the program.)
    Code:
    int main(int argc, char *argv[])
    {
        string_t src = STRING_INIT;
        string_t dst = STRING_INIT;
        long offset, length;
        char dummy;
    
        if (argc < 3 || argc > 4 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
            fprintf(stderr, "\n");
            fprintf(stderr, "Usage: %s [ -h | --help]\n", argv[0]);
            fprintf(stderr, "       %s STRING LENGTH\n", argv[0]);
            fprintf(stderr, "       %s STRING OFFSET LENGTH\n", argv[0]);
            fprintf(stderr, "\n");
            return 1;
        }
    
        /* Create a string_t out of first command-line parameter. */
        src = string_new(argv[1], strlen(argv[1]));
    
        if (argc < 4) {
            /* STRING LENGTH */
            if (sscanf(argv[2], " %ld %c", &length, &dummy) != 1) {
                fprintf(stderr, "%s: Invalid length.\n", argv[2]);
                return 1;
            }
    
            dst = string_part(src, length);
    
        } else {
            /* STRING OFFSET LENGTH */
            if (sscanf(argv[2], " %ld %c", &offset, &dummy) != 1) {
                fprintf(stderr, "%s: Invalid offset.\n", argv[2]);
                return 1;
            }
            if (sscanf(argv[3], " %ld %c", &length, &dummy) != 1) {
                fprintf(stderr, "%s: Invalid length.\n", argv[3]);
                return 1;
            }
    
            dst = string_sub(src, offset, length);
        }
    
        printf("Input: '%s' (%lu bytes)\n", src.data, (unsigned long)src.used);
        printf("Result: '%s' (%lu bytes)\n", dst.data, (unsigned long)dst.used);
    
        /* Since the strings are no longer needed, destroy them. */
        string_destroy(&src);
        string_destroy(&dst);
    
        /* Note: you could now assign new strings to src and dst,
         * without leaking memory. Assuming you also remember
         * to destroy those too afterwards.
         * Note: When the program exits, all dynamically allocated
         * memory will be released automatically.
         * So, you don't need to string_destroy() all strings before
         * exiting; only before reusing them. */
        return 0;
    }
    Examples:
    Code:
    ./example "Some string" 50
    Input: 'Some string' (11 bytes)
    Result: 'Some string' (11 bytes)
    
    ./example "Some string" 6
    Input: 'Some string' (11 bytes)
    Result: 'Some s' (6 bytes)
    
    ./example "Some string" -6
    Input: 'Some string' (11 bytes)
    Result: 'string' (6 bytes)
    
    ./example "Some string" 6 3
    Input: 'Some string' (11 bytes)
    Result: 'tri' (3 bytes)
    
    ./example "Some string" -6 3
    Input: 'Some string' (11 bytes)
    Result: 'str' (3 bytes)
    Feel free to use the above code in whatever ways you wish. I consider it to be public domain. (It may not contain enough creative input to be copyrightable anyway, since many C programmers with similar objectives would write functionally the same code.)

  4. #4
    Registered User
    Join Date
    Mar 2013
    Posts
    63
    Nominal Animal,
    I appreciate the lengthy post, but this is not a viable option: the translator is 43k lines of code and uses the string functions heavily.
    I do thank you for the information and may use the code in my own C work.

    James

  5. #5
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote (originally posted by jcfuller):
    Nominal Animal, [..] this is not a viable option as the translator [..] uses the string functions heavily.
    Ah. Well, BCX_TmpStr() is still ... not too good, because it starts freeing and reusing slots after BCXTmpStrSize calls. If a long-lived string is still in use after BCXTmpStrSize more temporary strings have been requested, its pointer refers to freed memory: it reads garbage (random other data) and may even crash the program. And even when the strings are short-lived, it frees each slot's buffer and allocates a new one instead of reusing the buffer it already has.

    Assuming all BCX_TmpStr() strings are short-lived, you could do better with
    Code:
    #include <stdio.h>
    #include <stdlib.h>

    static size_t BCX_TmpStr_Next = 0;
    static size_t BCX_TmpStr_Size[BCXTmpStrSize] = { 0 };
    static char  *BCX_TmpStr_Data[BCXTmpStrSize] = { 0 };
    
    char *BCX_TmpStr(const size_t length, const size_t padding, int unused)
    {
        char *retval;
    
        (void)unused;   /* kept only for compatibility with existing calls */
    
        /* Replace this slot's buffer only if it is too small; otherwise reuse it. */
        if (BCX_TmpStr_Size[BCX_TmpStr_Next] < length + padding) {
            free(BCX_TmpStr_Data[BCX_TmpStr_Next]);
            BCX_TmpStr_Size[BCX_TmpStr_Next] = length + padding;
            BCX_TmpStr_Data[BCX_TmpStr_Next] = malloc(length + padding);
            if (!BCX_TmpStr_Data[BCX_TmpStr_Next]) {
                fflush(stdout);
                fprintf(stderr, "Out of memory.\n");
                exit(1);
            }
        }
    
        retval = BCX_TmpStr_Data[BCX_TmpStr_Next];
        BCX_TmpStr_Next = (BCX_TmpStr_Next + 1) % BCXTmpStrSize;
    
        return retval;
    }
    This one still reuses strings after BCXTmpStrSize calls, but because it uses %, BCXTmpStrSize no longer needs to be a power of two. Note: in C, sizeof (char) == 1 by definition, free(NULL) is safe, and { 0 } initializes an entire array to zero (null pointers, for an array of pointers). This only works that way for 0: with any other value, only the first element gets that value, and the rest of the array is still zero-initialized.
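    The initializer rule is easy to check directly. In this sketch (brace_init_demo() is just a hypothetical test function), only the listed element takes the given value and the rest are zeroed:

```c
#include <string.h>

/* Copies two locally initialized arrays out so the caller can inspect them. */
void brace_init_demo(int out_zeros[4], int out_seven[4])
{
    int zeros[4] = { 0 };       /* every element is 0 */
    int seven[4] = { 7 };       /* { 7, 0, 0, 0 }, NOT four 7s */

    memcpy(out_zeros, zeros, sizeof zeros);
    memcpy(out_seven, seven, sizeof seven);
}
```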

    If the slot's existing buffer already has room for at least length+padding chars, it is reused as-is. Otherwise, the old buffer is freed and a new one allocated. Unlike the original, the buffer is not zero-filled, so the caller has to add the terminating '\0' itself.
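    The reuse rule can be exercised with a shrunken copy of the same logic. RING_SLOTS, ring_get() and the 4-slot size are assumptions made for this sketch; the point is that a request that fits in a slot's existing buffer gets that same buffer back:

```c
#include <stdlib.h>

#define RING_SLOTS 4    /* small, hypothetical ring size for illustration */

static size_t ring_size[RING_SLOTS];
static char  *ring_data[RING_SLOTS];
static size_t ring_next;

/* Same policy as the improved BCX_TmpStr(): reuse the slot's buffer when
 * it already holds at least 'need' chars, otherwise replace it with a
 * larger one. Returns NULL if malloc() fails. */
char *ring_get(const size_t need)
{
    char *retval;

    if (ring_size[ring_next] < need) {
        free(ring_data[ring_next]);                 /* free(NULL) is safe */
        ring_data[ring_next] = malloc(need);
        ring_size[ring_next] = ring_data[ring_next] ? need : 0;
    }
    retval = ring_data[ring_next];
    ring_next = (ring_next + 1) % RING_SLOTS;       /* works for any slot count */
    return retval;
}
```

    After ring_get(16) and three more calls, the fifth call lands on slot 0 again, and a request for 8 chars returns the same pointer as the first call.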

    Here is a string creator using the above. It takes an existing string (which is only read from) and the desired length, and returns a new, '\0'-terminated temporary copy of that many chars.
    Code:
    char *BCX_TempStrCopy(const char *const source, const size_t length)
    {
        char *p = BCX_TmpStr(length, 1, 0);   /* 1 extra char for the '\0' */
        if (length > 0)
            memcpy(p, source, length);
        p[length] = '\0';
        return p;
    }
    Using the above, you can easily create a substring function. It copies length chars (or the rest of the string, if length < 0) starting at offset, which counts from the start of the string (offset >= 0) or from the end (offset < 0). With this you should be able to slice strings (using character indexes) any way you want. It also clamps the offset and length to the string, so it never tries to copy more data than there is:
    Code:
    char *BCX_SubStr(const char *const source, const long offset, const long length)
    {
        const size_t sourcelen = (source) ? strlen(source) : 0;
        size_t off, len;
    
        if (offset < 0L) {
            if ((size_t)(-offset) < sourcelen)
                off = sourcelen - (size_t)(-offset);
            else
                off = 0;
        } else
        if ((size_t)offset < sourcelen)
            off = (size_t)offset;
        else
            off = sourcelen;
    
        if (length < 0L)
            len = sourcelen - off;
        else {
            len = (size_t)length;
            if (off + len > sourcelen)
                len = sourcelen - off;
        }
    
        /* Avoid pointer arithmetic on NULL when source is NULL (len is 0 then). */
        return BCX_TempStrCopy(source ? source + off : "", len);
    }
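    If you are wondering, I'm using long because on most current 64-bit Unix-like systems (the LP64 model), int is only 32 bits, but long, size_t and pointers are 64 bits. In other words, there these functions support strings longer than 2 GB. (64-bit Windows is an exception: long stays 32 bits there.)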
