Thread: Help with Substrings/String manipulation using C-strings and the C Standard library

  1. #1
    Registered User
    Join Date
    Apr 2005
    Location
    USA
    Posts
    2

    Question Help with Substrings/String manipulation using C-strings and the C Standard library

    Disclaimer: I'm relatively new to C/C++ so please let me know if there is anything incorrect or missing in my post. Also, please elaborate as much as possible in your response.

    Essentially, what I need to do is the following:
    Given a C-style (const char *) string with a value of "<foo>||<bar>", I want to get <foo> in a separate C-string and <bar> in yet another C-string. "||" is a unique, fixed-length delimiter between <foo> and <bar>. The lengths of <foo> and <bar> would be reasonably small, but unknown till runtime.

    What is the best way to achieve this?

    Well, actually I've put together some code that does provide this functionality. But I've got two concerns, given that I'm new to C/C++ and I've patched this code together using the example found on the strstr documentation page on MSDN and a 'substr' function found with a web search:

    • I get a warning at compile time (below the code listing). What is the meaning of this? Is it safe to ignore? If not, how do I fix it?
    • Overall, is this good "final" quality code? Is there more that can be done for robustness/error-checking etc?


    Code:
    #include <stdio.h>
    #include <string.h>
    #include <malloc.h>
     
    char *substr(const char* fullstr, int start, int end)
    {
    	char *theSubstring;
    	int i, j;
     
    	theSubstring = (char*) malloc(end - start + 1);
    	for (i = start, j = 0; i < end; ++i, ++j)
    	{
    		theSubstring[j] = fullstr[i];
    	}
    	theSubstring[j] = 0;
    	return theSubstring;
    }
     
    void main( void )
    {
    	char *strstrResult;
    	int delimiterLocation;
    	char theString[] = "foofoofoo||foobarisnotbarfoo";
    	char delimiter[] = "||";
    	char *theFoo;
    	char *theBar;
     
    	printf( "String to be searched:[%s]\n", theString);
    	strstrResult = strstr( theString, delimiter );
    	delimiterLocation = strstrResult - theString + 1;
    	if( strstrResult != NULL )
    	{
    		printf( "[%s] found at position [%d]\n", delimiter, delimiterLocation);
    	}
    	else
    	{
    		printf( "[%s] not found\n", delimiter );
    	}
    	theFoo = substr(theString, 0, delimiterLocation-1);
    	theBar = substr(theString, delimiterLocation+1, (int)strlen(theString));
    	printf( "theFoo=[%s] theBar=[%s]", theFoo, theBar);
    	free(theFoo);
    	free(theBar);
    }
    I get the following compiler warning:
    Code:
    ~\substring_v2.c(30) : warning C4244: '=' : conversion from '__w64 int' to 'int', possible loss of data
    at this line in the code above:
    Code:
    delimiterLocation = strstrResult - theString + 1;
    Is this something to be concerned about? How do I get rid of this warning?



    Thanks,
    - ak

  2. #2
    Registered User kryptkat's Avatar
    Join Date
    Dec 2002
    Posts
    638
    delimiterLocation is def as int delimiterLocation;
    and strstrResult - theString + 1; are char
    unless you are trying to convert char to int.


    edit
    and you can not return strings

    C:\borland\bcc55\bin>bcc32 footest2b014.c
    Borland C++ 5.5.1 for Win32 Copyright (c) 1993, 2000 Borland
    footest2b014.c:
    Warning W8070 footest2b014.c 25: Function should return a value in function subs
    tr
    Turbo Incremental Link 5.00 Copyright (c) 1997, 2000 Borland

    C:\borland\bcc55\bin> footest2b014
    String to be searched:[foofoofoo||foobarisnotbarfoo]
    [||] found at position [10]

    Code:
    /* footest2.c */
    
    
    
    #include <stdio.h>
    #include <string.h>
    #include <malloc.h>
    
    
    char *theSubstring;
     
    char *substr(const char* fullstr, int start, int end)
    
    {
    	char *theSubstring;
    	int i, j;
     
    	/* theSubstring = (char*) malloc(end - start + 1); */
    	for (i = start, j = 0; i < end; ++i, ++j)
    	{
    		theSubstring[j] = fullstr[i];
    	}
    	theSubstring[j] = 0;
           /*   theSubstring; */
    }
     
    void main( void )
    {
    	char *strstrResult;
    	int delimiterLocation;
    	char theString[] = "foofoofoo||foobarisnotbarfoo";
    	char delimiter[] = "||";
    	char *theFoo;
    	char *theBar;
    
            int start;
            int end;
     
    	printf( "String to be searched:[%s]\n", theString);
    	strstrResult = strstr( theString, delimiter );
    	delimiterLocation = strstrResult - theString + 1;
    	if( strstrResult != NULL )
    	{
    		printf( "[%s] found at position [%d]\n", delimiter, delimiterLocation);
    	}
    	else
    	{
    		printf( "[%s] not found\n", delimiter );
    	}
    	theFoo = substr(theString, 0, delimiterLocation-1);
    	theBar = substr(theString, delimiterLocation+1, (int)strlen(theString));
    	printf( "theFoo=[%s] theBar=[%s]", theFoo, theBar);
    	free(theFoo);
    	free(theBar);
    }
    Last edited by kryptkat; 04-29-2005 at 05:36 PM.

  3. #3
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    0. Welcome to the board

    > I'm relatively new to C/C++ so please let me know if there is anything incorrect or missing in my post
    Step 1 is to decide which language you're programming in. For the moment at least, you're programming in C.
    C/C++ is some horrible mix which is only understood by the programmer and their current compiler.

    > What is the best way to achieve this?
    All sorts of ways, this way seems as good as any other.

    > using the example found on the strstr documentation page on MSDN
    Always treat examples on MSDN as being written in C/C++. They're mostly OK for simple programs written using M$ compilers, but as robust portable code, they usually suck.

    > What is the meaning of this? Is it safe to ignore?
    It means part of the expression returned a "__w64 int", which is bigger than the usual int, and so in converting from a wider type to a narrower type, the compiler warned about possible loss of data.
    Ideally, programs should compile with zero warnings. If you have too many warnings (especially in a large project), it's all too easy to miss the appearance of a new warning you should be paying attention to.
    Most warnings are safe to ignore, most of the time (and therein lies the problem of having too many).
    Writing large programs which compile with 0 warnings on a number of platforms can be a real challenge at times. Different compilers pick up on different things.

    > Overall, is this good "final" quality code?
    Well the error checking could be better.

    Now some comments on the code
    Code:
    delimiterLocation = strstrResult - theString + 1;
    With the red part of the expression, you're calculating the difference between two pointers (perfectly legal to do). However, the type of that part of the expression is ptrdiff_t (which on your machine is __w64 int). Since the rest of your expression is an int expression, you get a warning.

    Code:
    #include <stdio.h>
    #include <string.h>
    /*#include <malloc.h>*/
    #include <stdlib.h>  /* malloc lives here */
    
    char *substr(const char* fullstr, int start, int end)
    {
        char *theSubstring;
        int i, j;
    
        /* don't cast the result of malloc */
        /* you might be hiding an important warning you need to pay */
        /* attention to */
        theSubstring = malloc(end - start + 1);
        
        /* always assume malloc may fail */
        if ( theSubstring != NULL )
        {
            for (i = start, j = 0; i < end; ++i, ++j)
            {
                theSubstring[j] = fullstr[i];
            }
            theSubstring[j] = 0;
        }
        
        /* since this may now be a NULL pointer, the callers */
        /* need to check for NULL as well */
        return theSubstring;
    }
    
    /* main always returns an int */
    int main( void )
    {
        char *strstrResult;
        int delimiterLocation;
        char theString[] = "foofoofoo||foobarisnotbarfoo";
        char delimiter[] = "||";
        char *theFoo;
        char *theBar;
    
        printf( "String to be searched:[%s]\n", theString);
        strstrResult = strstr( theString, delimiter );
    
        /* this may be NULL, so I've moved your calculation inside the if() */
        if( strstrResult != NULL )
        {
            delimiterLocation = (int)(strstrResult - theString) + 1;
            printf( "[%s] found at position [%d]\n", delimiter, delimiterLocation);
    
            theFoo = substr(theString, 0, delimiterLocation-1);
            theBar = substr(theString, delimiterLocation+1, (int)strlen(theString));
            if ( theFoo != NULL && theBar != NULL )
            {
                /* both allocated successfully */
                printf( "theFoo=[%s] theBar=[%s]", theFoo, theBar);
            }
            else
            {
                printf( "Some memory problems\n" );
            }
            
            free(theFoo);
            free(theBar);
        }
        else
        {
            printf( "[%s] not found\n", delimiter );
        }
        return 0;
    }
    You might be able to get rid of the remaining two casts, if you made all the int variables in your code to be 'size_t' variables.


    > and you can not return strings
    There's simply no need for that kryptkat
    Their code was fine, the memory was allocated and correctly returned. There's no need to maul the code by using global variables.

    > Function should return a value in function substr
    Man, that code is SO crippled compared to the original version, it just trashes memory via an uninitialised pointer.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

Popular pages Recent additions subscribe to a feed