Thread: Segmentation fault via iconv

  1. #1
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733

    Segmentation fault via iconv

    Struggling to fix this problem myself, hoping for a solution to be posted while I'm at work.
    The section of code where iconv() is used:
    Code:
    if ( encoding != std_encoding_utf ) {
    		fprintf( stderr, "N = %p, C = %p, NS = %zu, CS = %zu\n",
    			N, C, NS, CS );
    		while ( NS && CS ) {
    			bytes = iconv( utf_to_iconv[encoding], &N, &NS, &C, &CS );
    			if ( !bytes || bytes == (size_t)-1 ) break;
    			if ( errno ) {
    				ret = errno;
    				FAIL( stderr, ret, "");
    				break;
    			}
    		}
    		if ( NS ) {
    			ret = ENOMEM;
    			FAIL( stderr, ret, "" );
    		}
    	}
    My output via Geany's "Compiler" tab:
    Code:
    make char.run
    cc -fPIC -Wall -Wno-multichar  -shared -o ./libnext.so -c next.c
    cc -fPIC -Wall -Wno-multichar  -shared -o ./libtsc.so -c tsc.c
    cc -fPIC -Wall -Wno-multichar  -shared -o ./libbase62.so -c base62.c
    cc -fPIC -Wall -Wno-multichar  -D OUT=char.elf -o ./char.elf char.c ./libnext.so ./libtsc.so ./libbase62.so
    ./char.elf
    N = 0x560e06a2b210, C = 0x560e06a2d220, NS = 3, CS = 8192
    make: *** [makefile:36: char.run] Segmentation fault (core dumped)
    rm libtsc.so libbase62.so char.elf libnext.so
    Compilation failed.
    Edit: Uploaded the files here:
    multi_literal_test.tar.gz - Google Drive
    off to work now
    Last edited by awsdert; 10-10-2019 at 05:25 AM.

  2. #2
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Okay just got home and ran a debugging compile, here's my output (minus the gdb preamble):
    Code:
    make char.dbg
    cc -ggdb -fPIC -Wall -Wno-multichar  -shared -o ./libdnext.so -c next.c
    cc -ggdb -fPIC -Wall -Wno-multichar  -shared -o ./libdtsc.so -c tsc.c
    cc -ggdb -fPIC -Wall -Wno-multichar  -shared -o ./libdbase62.so -c base62.c
    cc -ggdb -fPIC -Wall -Wno-multichar  -D OUT=dchar.elf -o ./dchar.elf char.c ./libdnext.so ./libdtsc.so ./libdbase62.so
    gdb ./dchar.elf
    GNU gdb (GDB) 8.3
    ...
    This GDB was configured as "x86_64-pc-linux-gnu".
    ...
    Reading symbols from ./dchar.elf...
    (gdb) run
    Starting program: /run/media/zxuiji/ZXUIJI_1TB/github/mc/dchar.elf 
    std_encoding = 'UTF-8'
    
    Compiler Tests:
    
    
    'abcd': 0x61626364, 'd', dcba
    
    'a⺢': 0x61E2BAA2, '�', ���a
    
    '⺢a': 0xE2BAA261, 'a', a���
    
    Unit tests:
    
    010 = 8
    0b101 = 5
    99999 = 99999
    0x5f5e0ff = 99999999
    N = 0x5555555a2610, C = 0x5555555a4620, NS = 3, CS = 8192
    
    Program received signal SIGSEGV, Segmentation fault.
    0x00007ffff7dfabc1 in __gconv () from /usr/lib/libc.so.6
    (gdb)
    Any ideas for what I should look for? This is not a function I'm familiar with. By the way, the actual size of N is the same as C; the 3 is the result of calling a function I wrote called utflen() on it, which is similar to strlen() but avoids accidentally terminating the length too early:
    Code:
    size_t utflen( char8_t const *txt ) {
    	size_t i;
    	if ( !txt ) return 0;
    	for ( i = 0; txt[i]; i += (utfclen( txt[i] ) + 1) );
    	return i;
    }
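
    For readers following along, here is a minimal sketch of a length function in the same spirit. It is not the code from the uploaded archive: utf8_trail_count() is a hypothetical stand-in for utfclen() (assumed to return the number of continuation bytes after a lead byte), and unsigned char stands in for char8_t. The difference from the loop above is that it only steps over bytes that really are continuation bytes, so a truncated sequence cannot carry the index past the terminating nul.

```c
#include <stddef.h>

/* Hypothetical stand-in for the thread's utfclen(): number of continuation
   bytes expected after lead byte c (0 for ASCII or an unexpected byte). */
size_t utf8_trail_count(unsigned char c)
{
    if ((c & 0xE0) == 0xC0) return 1;   /* 110xxxxx: 2-byte sequence */
    if ((c & 0xF0) == 0xE0) return 2;   /* 1110xxxx: 3-byte sequence */
    if ((c & 0xF8) == 0xF0) return 3;   /* 11110xxx: 4-byte sequence */
    return 0;
}

/* Byte length of a nul-terminated UTF-8 string, excluding the terminator.
   Never advances past a nul, even if a sequence is cut short. */
size_t utf8_safe_len(const unsigned char *txt)
{
    size_t i = 0;
    if (!txt) return 0;
    while (txt[i]) {
        size_t trail = utf8_trail_count(txt[i]);
        ++i;
        /* Step over continuation bytes (10xxxxxx) only; a nul stops the loop. */
        for (; trail > 0 && (txt[i] & 0xC0) == 0x80; --trail)
            ++i;
    }
    return i;
}
```

    For a well-formed string this gives the same count as strlen(); the extra check only matters when the input ends in the middle of a multibyte character.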

  3. #3
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Figured that while I'm waiting and short of ideas for what to do, I would go ahead and decide on some better names for those variables. Since they only get used in this one instance, I decided to give them exactly those names:
    Code:
    	iconvSrcBuff = U8.buff;
    	iconvSrcSize = (utflen(u8) + 1) * sizeof(char8_t);
    	iconvDstBuff = CB.buff;
    	iconvDstSize = CB.size;
    	if ( encoding != std_encoding_utf ) {
    		FAIL( stderr, EXIT_SUCCESS, "Line marker" );
    		fprintf( stderr,
    			"iconvSrcBuff = %p, iconvDstBuff = %p, "
    			"iconvSrcSize = %zu, iconvDstSize = %zu\n",
    			iconvSrcBuff, iconvDstBuff, iconvSrcSize, iconvDstSize );
    		while ( iconvSrcSize && iconvDstSize ) {
    			bytes = iconv( utf_to_iconv[encoding],
    				&iconvSrcBuff, &iconvSrcSize,
    				&iconvDstBuff, &iconvDstSize );
    			if ( !bytes || bytes == (size_t)-1 ) break;
    			if ( errno ) {
    				ret = errno;
    				FAIL( stderr, ret, "");
    				break;
    			}
    		}
    		if ( iconvSrcSize ) {
    			ret = ENOMEM;
    			FAIL( stderr, ret, "" );
    		}
    	}
    	else (void)memcpy( cb, u8, iconvSrcSize );
    U8 & CB are managed via a dedicated function that makes it easier for me to catch allocation errors before this point can be reached, so it definitely is not a lack of memory. I'm guessing I'm using iconv() wrong but I don't know how. This is what I've been referencing:
    iconv(3) - Linux manual page
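
    As a point of comparison, here is a minimal sketch of one way to call iconv() following that manual page. It is an illustration, not the fix used in this thread, and convert_buffer() is a hypothetical name. Two points from the manual page are reflected in it: iconv_open() reports failure by returning (iconv_t)-1, so the descriptor is checked before use, and errno is only meaningful when iconv() itself returns (size_t)-1.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Convert srclen bytes of src into dst (capacity dstcap) using an already
   opened descriptor. Returns 0 on success or an errno value on failure.
   If written is non-NULL it receives the number of bytes stored in dst. */
int convert_buffer(iconv_t cd, const char *src, size_t srclen,
                   char *dst, size_t dstcap, size_t *written)
{
    char *in = (char *)src;   /* iconv() takes char **, the input bytes are not modified */
    char *out = dst;
    size_t inleft = srclen;
    size_t outleft = dstcap;
    int err = 0;

    if (cd == (iconv_t)-1)
        return EBADF;         /* descriptor was never opened successfully */

    if (iconv(cd, &in, &inleft, &out, &outleft) == (size_t)-1) {
        err = errno;          /* E2BIG, EILSEQ or EINVAL */
        iconv(cd, NULL, NULL, NULL, NULL);   /* reset the conversion state */
    } else if (iconv(cd, NULL, NULL, &out, &outleft) == (size_t)-1) {
        err = errno;          /* no room to flush a pending shift sequence */
    }

    if (written)
        *written = dstcap - outleft;
    return err;
}
```

    A single call converts the whole input unless it fails, so a loop is only needed if the caller wants to grow the output buffer on E2BIG and try again.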

    Edit 1: Grammar mistake
    Edit 2: While renaming my variables I slipped up and filled the wrong variable with the destination size, fixed it in the above code as well
    Last edited by awsdert; 10-10-2019 at 04:37 PM.

  4. #4
    misoturbutc Hodor's Avatar
    Join Date
    Nov 2013
    Posts
    1,791
    I suspect that the function size_t utfcpy( char8_t *D, size_t DLEN, char8_t const *S, size_t SLEN ) is not nul terminating the string. After modifying that function to ensure the copy is nul terminated the segfault goes away

    Edit: I'm not going to look at it too much, but:

    Code:
    size_t utfcpy( char8_t *D, size_t DLEN, char8_t const *S, size_t SLEN ) {
        size_t i;
        if ( DLEN > SLEN ) DLEN = SLEN;
        for ( i = 0; *D && i < DLEN; ++i )
            D[i] = S[i];
        D[i] = '\0';
        return i;
    }
    This stops the invalid reads (which are causing the segfault). The question now, though, is whether my modified code does what you intend it to do or not. That's for you to answer. You may have to change the macro NEXTC_C_LENG (currently 7) to 8. I don't have time to look at it too hard at the moment, but the function does seem to be problematic.
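
    For anyone reading later, here is a minimal sketch of a bounded copy that always nul-terminates. It is not the utfcpy() from the archive: bounded_copy() is a hypothetical name, unsigned char stands in for char8_t, and it assumes the destination size counts the terminator. Under that assumption the terminator is written inside the buffer rather than one element past it, which is the kind of off-by-one the NEXTC_C_LENG remark above is concerned with.

```c
#include <stddef.h>

/* Copy up to slen bytes of src into dst, stopping early at a nul in src.
   dcap is the full capacity of dst including room for the terminator.
   dst is always nul-terminated when dcap > 0. Returns the bytes copied. */
size_t bounded_copy(unsigned char *dst, size_t dcap,
                    const unsigned char *src, size_t slen)
{
    size_t i = 0;
    if (!dst || dcap == 0)
        return 0;
    if (src) {
        /* i + 1 < dcap keeps one slot free for the terminator */
        for (; i + 1 < dcap && i < slen && src[i]; ++i)
            dst[i] = src[i];
    }
    dst[i] = '\0';
    return i;
}
```

    Note that a plain byte copy like this can still cut a multibyte UTF-8 character in half when the destination is too small, so the caller may want to compare the return value against the source length.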
    Last edited by Hodor; 10-10-2019 at 08:04 PM.

  5. #5
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Quote Originally Posted by Hodor View Post
    I suspect that the function size_t utfcpy( char8_t *D, size_t DLEN, char8_t const *S, size_t SLEN ) is not nul terminating the string. After modifying that function to ensure the copy is nul terminated the segfault goes away

    Edit: I'm not going to look at it too much, but:

    Code:
    size_t utfcpy( char8_t *D, size_t DLEN, char8_t const *S, size_t SLEN ) {
        size_t i;
        if ( DLEN > SLEN ) DLEN = SLEN;
        for ( i = 0; *D && i < DLEN; ++i )
            D[i] = S[i];
        D[i] = '\0';
        return i;
    }
    This stops the invalid reads (which are causing the segfault). The question now, though, is whether my modified code does what you intend it to do or not. That's for you to answer. You may have to change the macro NEXTC_C_LENG (currently 7) to 8. I don't have time to look at it too hard at the moment, but the function does seem to be problematic.
    I didn't notice that, thanks. But no, that didn't fix the segfault. It turned out I had been misusing a global variable, thinking it was for something else. I have added another global with a similar name that ends with node instead, so I won't make the same mistake as long as I see that in the dropdown list. The misuse caused the program to try to access an index that didn't exist in my utf_to_iconv array. The new global is now used in its place, and encoding now receives a valid index.
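
    A cheap guard against this class of bug is to validate the index before it reaches iconv(). The sketch below is only an illustration: lookup_descriptor() is a hypothetical helper and the table and count parameters are assumptions, since the real utf_to_iconv declaration is not shown in the thread.

```c
#include <iconv.h>
#include <stddef.h>

/* Return the descriptor stored at index, or (iconv_t)-1 when the table is
   missing or the index is out of range, so callers can test the result
   the same way they would test the result of iconv_open(). */
iconv_t lookup_descriptor(const iconv_t *table, size_t count, size_t index)
{
    if (!table || index >= count)
        return (iconv_t)-1;
    return table[index];
}
```

    With a check like this, a bad encoding value shows up as a failed lookup in the caller instead of a crash inside __gconv.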

  6. #6
    misoturbutc Hodor's Avatar
    Join Date
    Nov 2013
    Posts
    1,791
    Well, it fixes (somewhat) the segfault for me. I'm 99.9% sure the problem is in that function in any case. Other parts of the code probably need to be modified to accommodate. Maybe get rid of the global variables.

  7. #7
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    They have to stay, unfortunately. Without them I have no quick means of converting to and from the encoding the system is using; the alternative is a long .......... function that does a lot of comparisons to determine whether calling iconv() is a wise idea. They are also pre-opened conversion descriptors (cds) for iconv(), to save time (which is needed for compiling).
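
    To illustrate the pre-opened descriptor idea for other readers, here is a rough sketch. The encoding names, the from_utf8 table and both function names are made up for the example; the real globals are in the uploaded archive and are not reproduced here.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical list of target encodings, chosen only for illustration. */
static const char *const target_names[] = { "ASCII", "UTF-16LE", "UTF-32LE" };
#define TARGET_COUNT (sizeof target_names / sizeof target_names[0])

static iconv_t from_utf8[TARGET_COUNT];
static int descriptors_open;   /* nonzero once the table has been filled */

/* Close every descriptor that was opened; safe to call more than once. */
void close_descriptors(void)
{
    if (!descriptors_open)
        return;
    for (size_t i = 0; i < TARGET_COUNT; ++i) {
        if (from_utf8[i] != (iconv_t)-1) {
            iconv_close(from_utf8[i]);
            from_utf8[i] = (iconv_t)-1;
        }
    }
    descriptors_open = 0;
}

/* Open all descriptors once at startup. Entries that fail to open are left
   as (iconv_t)-1. Returns the number of descriptors opened successfully. */
size_t open_descriptors(void)
{
    size_t ok = 0;
    close_descriptors();       /* start from a clean table */
    for (size_t i = 0; i < TARGET_COUNT; ++i) {
        from_utf8[i] = iconv_open(target_names[i], "UTF-8");
        if (from_utf8[i] != (iconv_t)-1)
            ++ok;
    }
    descriptors_open = 1;
    return ok;
}
```

    The trade-off is the one described above: the descriptors live in globals, but each conversion skips the cost of iconv_open() and iconv_close().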

  8. #8
    misoturbutc Hodor's Avatar
    Join Date
    Nov 2013
    Posts
    1,791
    Quote Originally Posted by awsdert View Post
    They have to stay, unfortunately. Without them I have no quick means of converting to and from the encoding the system is using; the alternative is a long .......... function that does a lot of comparisons to determine whether calling iconv() is a wise idea. They are also pre-opened conversion descriptors (cds) for iconv(), to save time (which is needed for compiling).
    Oh, those globals. Yeah, they seemed reasonably reasonable (i.e. justifiable) when I saw them.
