Thread: Manually reading a character literal

  1. #16
    Registered User
    Join Date
    May 2009
    Posts
    4,183
    My rule on "goto" is that a newbie programmer should never use it. You are not a newbie.
    Someone stated on this site that "goto" should only be used to go forward.
    Someone else stated that "goto" should only be used to handle exceptional conditions in code (as that should not be the normal path).

    I think your code seems to follow all the rules I have read.

    Tim S.
    "...a computer is a stupid machine with the ability to do incredibly smart things, while computer programmers are smart people with the ability to do incredibly stupid things. They are,in short, a perfect match.." Bill Bryson

  2. #17
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Fair enough, although it seems unlikely to me that a newbie would ever learn how to use it if they never use it. Still, that's better than "never use it".

  3. #18
    misoturbutc Hodor's Avatar
    Join Date
    Nov 2013
    Posts
    1,791
    Hmm. Well, your answer in the other thread didn't answer my question. You have (in this example)

    Code:
    int rdEscChr( int_c_voidp_t getchr, void *source, int c, char32_t *c32 )
    instead of the clearer

    Code:
    int rdEscChr( int getchr(void *), void *source, int c, char32_t *c32 )
    and you answered by saying that by using a typedef it's more "abstracted" when that is clearly not the case. Your version is no more abstract than the clearer version. My original question was not about void pointers but about typedef'ing a function pointer but you clearly misread.

    Anyway, back to the current question and -1. On line 33 of your amended code you're assigning a signed integer (m) to an integer object that is unsigned (num). It is quite possible (and likely) that 'm' is -1. Is that safe?
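    On the -1 question: converting a negative int to an unsigned type is well defined in C (the value wraps modulo 2^N), so the assignment itself does not crash, but an error sentinel such as -1 silently turns into the largest representable value. A minimal sketch of the difference follows; the helper names store_nonneg and store_unchecked are hypothetical and not taken from the posted code.

```c
#include <stdint.h>

/* Hypothetical helper: copy a possibly-negative int into an unsigned
 * 64-bit object only when it is non-negative. Returns 0 on success,
 * -1 if m was negative (the destination is left untouched). */
static int store_nonneg(int m, uint_least64_t *num)
{
    if (m < 0)
        return -1;
    *num = (uint_least64_t)m;
    return 0;
}

/* What the unchecked assignment does: a negative value wraps, so -1
 * becomes the maximum value of the unsigned type. */
static uint_least64_t store_unchecked(int m)
{
    return (uint_least64_t)m;
}
```

    Checking the sign before the assignment keeps the sentinel from being mistaken for a digit value later on.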

  4. #19
    misoturbutc Hodor's Avatar
    Join Date
    Nov 2013
    Posts
    1,791
    Quote Originally Posted by ordak View Post
    Can't you use ungetc, like here?
    Or, better still (IMO), don't read the source character-by-character, because that just makes things harder than they need to be.

  5. #20
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Quote Originally Posted by Hodor View Post
    Hmm. Well, your answer in the other thread didn't answer my question. You have (in this example)

    Code:
    int rdEscChr( int_c_voidp_t getchr, void *source, int c, char32_t *c32 )
    instead of the clearer

    Code:
    int rdEscChr( int getchr(void *), void *source, int c, char32_t *c32 )
    and you answered by saying that by using a typedef it's more "abstracted" when that is clearly not the case. Your version is no more abstract than the clearer version. My original question was not about void pointers but about typedef'ing a function pointer but you clearly misread.

    Anyway, back to the current question and -1. On line 33 of your amended code you're assigning a signed integer (m) to an integer object that is unsigned (num). It is quite possible (and likely) that 'm' is -1. Is that safe?
    Ah, that's what you were confused by; now I get it. Well, that was just easier than writing
    Code:
    (int (*)(void*))name
    any time I want to call that function or its similarly reliant functions. As for the -1, I will have to take a look tomorrow after work or gospel (since I'll probably forget before gospel anyway); gotta go to work now.

    Also, Hodor, I used a function call because it allows me to avoid allocating memory without good cause, and since I plan to put this in my own handwritten compiler I need to consider situations where I can't just load the whole file into a string or otherwise. I can just parse the expressions directly for literals and leave just a single literal in its place (i.e. 8 * 2 would become just 16, or var >> (8 * 3) would become just var >> 24) during the preparse stage.
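    For reference, a function can be called through a pointer without any cast, whichever way the parameter is spelled: a parameter written with a function type is adjusted to a pointer to that function. A minimal sketch, reusing the int_c_voidp_t typedef from the thread; string_getchr and the two first_via_* functions are hypothetical names made up for this illustration.

```c
typedef int (*int_c_voidp_t)(void *source);

/* Parameter declared through the typedef. */
static int first_via_typedef(int_c_voidp_t getchr, void *source)
{
    return getchr(source);   /* no cast needed at the call */
}

/* Parameter declared with a function type; C adjusts it to a pointer,
 * so this parameter has the same type as the one above. */
static int first_via_decl(int getchr(void *), void *source)
{
    return getchr(source);
}

/* Hypothetical character source: reads from a NUL-terminated string
 * through a cursor and returns -1 once the end is reached. */
static int string_getchr(void *source)
{
    const char **cursor = source;
    if (**cursor == '\0')
        return -1;
    return (unsigned char)*(*cursor)++;
}
```

    Either spelling accepts string_getchr directly; the typedef mainly saves typing when the same pointer type is also stored in a struct member.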

  6. #21
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,412
    Quote Originally Posted by awsdert
    I used a function call because it allows me to avoid allocating memory without good cause and since I plan to put this in my own handwritten compiler I need to consider situations where I can't just load the whole file into a string or otherwise
    It is common for lexical analysis to proceed character by character to form lexemes, so you're in good company.

    Quote Originally Posted by awsdert
    I can just parse the expressions directly for literals and leave just a single literal in its place (i.e. 8 * 2 would become just 16, or var >> (8 * 3) would become just var >> 24) during the preparse stage
    That's beyond parsing: it is the constant folding optimisation, not a "preparse stage". Of course, if you see fit to do this while parsing, that's your prerogative as the author, but it isn't actually related to the notion of doing character-by-character lexical analysis. Either way, constant folding could be mixed in during parsing, though it seems to me that this may just over-complicate the parsing compared with doing it at a later stage, e.g., optimisation after generating an intermediate representation, where you might apply constant folding after constant propagation.
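    To make the distinction concrete, here is a minimal sketch of constant folding done as a separate pass over an already-parsed expression tree. The Expr type and fold function are hypothetical, and the values are unsigned so that the arithmetic wraps instead of overflowing; a real compiler would fold according to the language's own type rules.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical expression node: a constant, a named variable, or a
 * binary operation on two sub-expressions. */
typedef enum { EXPR_CONST, EXPR_VAR, EXPR_BINOP } ExprKind;

typedef struct Expr {
    ExprKind kind;
    unsigned long value;     /* used by EXPR_CONST */
    const char *name;        /* used by EXPR_VAR */
    char op;                 /* EXPR_BINOP: '*', '+', or '>' meaning >> */
    struct Expr *lhs, *rhs;  /* used by EXPR_BINOP */
} Expr;

/* Fold in place, bottom-up: a binary node whose operands are both
 * constants is rewritten as a single constant node. */
static void fold(Expr *e)
{
    if (e == NULL || e->kind != EXPR_BINOP)
        return;
    fold(e->lhs);
    fold(e->rhs);
    if (e->lhs->kind != EXPR_CONST || e->rhs->kind != EXPR_CONST)
        return;
    unsigned long a = e->lhs->value, b = e->rhs->value;
    switch (e->op) {
    case '*': e->value = a * b; break;
    case '+': e->value = a + b; break;
    case '>':
        if (b >= sizeof(unsigned long) * CHAR_BIT)
            return;          /* shift count too large: leave the node alone */
        e->value = a >> b;
        break;
    default:
        return;              /* unknown operator: leave the node alone */
    }
    e->kind = EXPR_CONST;
    e->lhs = e->rhs = NULL;
}
```

    Applied to var >> (8 * 3), the inner multiplication becomes the constant 24 while the shift stays a binary node, because its left operand is not a constant.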
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  7. #22
    Registered User
    Join Date
    May 2009
    Posts
    4,183
    Quote Originally Posted by awsdert View Post
    Fair enough, although it seems unlikely to me that a newbie would ever learn how to use it if they never use it. Still, that's better than "never use it".
    Until a newbie knows loops completely, they should not think of using "goto".

    Tim S.

  8. #23
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Okay, I've looked around my code and made a ton of changes (most were actually made a couple of days ago, but I didn't get round to posting them). I have now tested the reading-in section enough to feel confident it will not use what it doesn't understand; the u8, u & U prefixes that come before a literal can be "understood" but are not yet actually used. Here's what I have now:
    Code:
    #include <limits.h>
    #include <inttypes.h>
    #include <errno.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <uchar.h>
    #include <ctype.h>
    
    typedef signed char schar;
    typedef unsigned char uchar, char8_t;
    typedef signed long long sllong;
    typedef unsigned long long ullong;
    typedef int (*int_c_voidp_t)( void *source );
    typedef struct readchar {
    	int p;
    	int c;
    	char const *func;
    	void *source;
    	int_c_voidp_t getchr;
    	int_c_voidp_t endofi;
    } readchar_t;
    int nextchar( readchar_t *readchar ) {
    	if ( !readchar ) return EINVAL;
    	if ( readchar->endofi(readchar->source) ) return -1;
    	readchar->p = readchar->c;
    	readchar->c = readchar->getchr(readchar->source);
    	return 0;
    }
    
    typedef struct str {
    	long pos;
    	long cap;
    	long len;
    	char * txt;
    } str_t;
    int sgetc( str_t *str ) {
    	return str ? (((str->pos) >= 0) ?
    		(((str->pos) < (str->len)) ? str->txt[(str->pos)++] : 0) : -ERANGE)
    		: -EADDRNOTAVAIL;
    }
    
    
    #define bitsof(T) (sizeof(T) * CHAR_BIT)
    #define _2STR( DATA ) #DATA
    #define TOSTR( DATA ) _2STR( DATA )
    
    #define BASE_NUM "0123456789"
    #define BASE_a2z "abcdefghijklmnopqrstuvwxyz"
    #define BASE_A2Z "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    
    char upper_base62_text[] = BASE_NUM BASE_A2Z BASE_a2z;
    char lower_base62_text[] = BASE_NUM BASE_a2z BASE_A2Z;
    
    int rdU64_base62
    (
    	readchar_t *readchar, long minlen, long maxlen,
    	uint base, _Bool lowercase1st, uint_least64_t *num
    )
    {
    	uint i = 0;
    	uint_least64_t n = 0;
    	long len = 0;
    	char *base_text =
    			lowercase1st ? lower_base62_text : upper_base62_text;
    	if ( !readchar ) return EADDRNOTAVAIL;
    	readchar->func = __func__;
    	if ( minlen < 0 ) minlen = 0;
    	if ( base < 2 || base > 62 || (maxlen > 0 && maxlen < minlen) ) {
    		if ( num ) *num = 0;
    		return ERANGE;
    	}
    	else if (base < 37) {
    		do {
    			for ( i = 0; i < base &&
    				readchar->c != upper_base62_text[i] &&
    				readchar->c != lower_base62_text[i]; ++i );
    			if ( i == base ) break;
    			n *= base;
    			n += i;
    			++len;
    			if (nextchar(readchar) != 0) break;
    			if (maxlen > 0 && len == maxlen) break;
    		} while ( readchar->c );
    	}
    	else {
    		do {
    			for ( i = 0; i < base && readchar->c != base_text[i]; ++i );
    			if ( i == base ) break;
    			n *= base;
    			n += i;
    			++len;
    			if (nextchar(readchar) != 0) break;
    			if (maxlen > 0 && len == maxlen) break;
    		} while ( readchar->c );
    	}
    	if ( num ) *num = n;
    	if ( len < minlen ) return EINVAL;
    	return 0;
    }
    
    int ishex( int c ) {
    	return ((c >= '0' && c <='9')
    				|| (c >= 'A' && c <= 'F')
    				|| (c >= 'a' && c <= 'f'));
    }
    
    int rdEscChr( readchar_t *readchar, char32_t *c32 ) {
    	uint_least64_t num = 0;
    	char const *var = getenv("ESCAPE_CHAR"), def[] = "\\";
    	if ( !var ) var = def;
    	if ( !readchar || readchar->c != var[0] ) {
    		if ( c32 ) *c32 = 0;
    		if ( !readchar ) return EINVAL;
    		if ( readchar->c == 0 ) return 0;
    		else if ( readchar->c < 0 ) return 1;
    		else return EINVAL;
    	}
    	readchar->func = __func__;
    	nextchar(readchar);
    	switch ( readchar->c ) {
    	case 'a': num = 0x07; break;
    	case 'b': num = 0x08; break;
    	case 'e': num = 0x1B; break;
    	case 'f': num = 0x0C; break;
    	case 'n': num = 0x0A; break;
    	case 'r': num = 0x0D; break;
    	case 't': num = 0x09; break;
    	case 'u':
    		nextchar(readchar);
    		rdU64_base62( readchar, 4, 4, 16, 0, &num );
    		readchar->func = __func__;
    		goto rdEscChr_done;
    	case 'U':
    		nextchar(readchar);
    		rdU64_base62( readchar, 8, 8, 16, 0, &num );
    		readchar->func = __func__;
    		goto rdEscChr_done;
    	case 'v': num = 0x0B; break;
    	case 'x':
    		nextchar(readchar);
    		rdU64_base62( readchar, 1, 0, 16, 0, &num );
    		readchar->func = __func__;
    		goto rdEscChr_done;
    	// Escape characters \ ' " and ^ do not need to be explicitly defined here
    	default:
    		if ( readchar->c < 0 ) {
    			if ( c32 ) *c32 = num;
    			return 1;
    		}
    		if ( isdigit(readchar->c) ) {
    			rdU64_base62( readchar, 1, 3, 8, 0, &num );
    			readchar->func = __func__;
    			goto rdEscChr_done;
    		}
    		else num = readchar->c;
    	}
    	nextchar(readchar);
    	rdEscChr_done:
    	if ( c32 ) *c32 = num;
    	return 0;
    }
    char32_t rtl2ltr( char32_t c ) {
    #define by1 (bitsof(char32_t) / 4)
    #define by2 (bitsof(char32_t) / 2)
    #define by3 (by2 + by1)
    #define pt1 (c >> by2)
    #define pt0 (c << by2)
    	return (pt0 << by1) | ((pt0 >> by3) << by2) |
    		((pt1 << by3) >> by2) | (pt1 >> by1);
    #undef by1
    #undef by2
    #undef by3
    #undef pt1
    #undef pt0
    }
    int istrcmp( char const * str, char const * val ) {
    	int s, v;
    	if ( !str ) return val ? -1 : 0;
    	if ( !val ) return 1;
    	while ( *str && *val ) {
    		if ( *str == 0 ) return val ? -1 : 0;
    		if ( *val == 0 ) return 1;
    		if ( *str != *val ) {
    			s = tolower(*str);
    			v = tolower(*val);
    			if ( s != v ) return s - v;
    		}
    		++str;
    		++val;
    	}
    	s = *str;
    	v = *val;
    	return s - v;
    }
    int rdChar( readchar_t *readchar, char32_t *c32 ) {
    	char32_t num = 0, now = 0;
    	int ret = 0, i;
    	char const *var;
    	if ( !readchar ) return EADDRNOTAVAIL;
    	readchar->func = __func__;
    	if ( readchar->c != '\'' ) {
    		if ( c32 ) *c32 = 0;
    		return EINVAL;
    	}
    	for ( i = 0; i < 4; ++i ) {
    		nextchar(readchar);
    		if ( readchar->c == '\'' || readchar->c <= 0 ||
    			readchar->c == '\n' || readchar->c == '\r' )
    			break;
    		if ( readchar->c == '\\' ) {
    			now = 0;
    			if ( (ret = rdEscChr( readchar, &now )) != 0 )
    				break;
    			readchar->func = __func__;
    		}
    		else if ( readchar->c >= 0 ) now = readchar->c;
    		else now = 0;
    		num <<= (bitsof(char32_t) / 4);
    		num |= now;
    	}
    	var = getenv("MB_IS_LTR");
    	if ( var && var[0] != '0' && istrcmp(var,"false") != 0 )
    		num = rtl2ltr( num );
    	if ( c32 ) *c32 = num;
    	if ( readchar->c != '\'' && readchar->c > 0 &&
    		readchar->c != '\n' && readchar->c != '\r' ) {
    		readchar->func = __func__;
    		nextchar(readchar);
    		if ( readchar->c != '\'' )
    			return EINVAL;
    		nextchar(readchar);
    	}
    	return ret;
    }
    
    int main() {
    	int i, c, dstEncoding = 0, srcEncoding = 0;
    	char32_t c32 = 0;
    	readchar_t readchar = {0};
    	readchar.getchr = (int_c_voidp_t)fgetc;
    	readchar.endofi = (int_c_voidp_t)feof;
    	readchar.source = stdin;
    	readchar.func = __func__;
    	// TODO: Get srcEncoding, default dstEncoding as same
    	for ( i = 0; i < 10; ++i ) {
    		(void)puts("Please enter an escape character:");
    		if ( nextchar(&readchar) != 0 ) break;
    		if ( readchar.c == 'u' || readchar.c == 'U' ) {
    			// TODO: Override dstEncoding
    			if ( nextchar(&readchar) != 0 ) break;
    			if ( readchar.c == '8' ) {
    				// TODO: Override dstEncoding again
    				if ( nextchar(&readchar) != 0 ) break;
    			}
    		}
    		switch ( readchar.c ) {
    		case '\\': c = rdEscChr( &readchar, &c32 ); break;
    		case '\'': c = rdChar( &readchar, &c32 ); break;
    		default:
    			puts("No more input found, ending program");
    			return 0;
    		}
    		if ( c != 0 ) {
    			switch ( c ) {
    			case ERANGE: puts("Error: ERANGE"); break;
    			case EINVAL: puts("Error: EINVAL"); break;
    			case EADDRNOTAVAIL: puts("Error: EADDRNOTAVAIL"); break;
    			case 1: puts("Error: Unknown"); break;
    			}
    			return 1;
    		}
    		(void)printf( "Resulting character: '%c'\n", (int)c32 );
    	}
    	return 0;
    }
    I think I've filtered out that negative-value scenario as much as I can; right now I need suggestions for handling encoding.
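    One note on the (int_c_voidp_t)fgetc and (int_c_voidp_t)feof casts in main(): fgetc and feof take a FILE *, and the C standard leaves a call through a function pointer of an incompatible type undefined, even though it happens to work on common platforms. A minimal sketch of an alternative, using thin wrappers that really have the int(void *) signature; file_getchr and file_endofi are hypothetical names.

```c
#include <stdio.h>

typedef int (*int_c_voidp_t)(void *source);

/* Thin adapters with exactly the int(void *) signature, so they can be
 * stored in an int_c_voidp_t without any function-pointer cast. */
static int file_getchr(void *source)
{
    return fgetc((FILE *)source);
}

static int file_endofi(void *source)
{
    return feof((FILE *)source);
}
```

    With these, the setup becomes readchar.getchr = file_getchr; and readchar.endofi = file_endofi; with no cast at all.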

  9. #24
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Just did a test with 'abcd' as the input (with some modified code to see the full output), and it seems my multibyte character at least ends up the same under the hood as what the compiler that cc redirects to (presumably gcc) produces, before any code conversions.
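    The value of a multi-character constant such as 'abcd' is implementation-defined in C. GCC documents that it builds the value one character at a time, shifting the running value left by the width of a char and OR-ing in the next character. A minimal sketch of that packing, assuming 8-bit chars and an ASCII execution character set; pack_multichar is a hypothetical name.

```c
#include <stdint.h>

/* Pack up to four bytes the way GCC documents for multi-character
 * constants: shift the running value left by 8 bits, then OR in the
 * next byte. */
static uint32_t pack_multichar(const char *bytes, int count)
{
    uint32_t value = 0;
    for (int i = 0; i < count && i < 4; ++i)
        value = (value << 8) | (unsigned char)bytes[i];
    return value;
}
```

    For "abcd" this yields 0x61626364, which is the same shift-and-OR scheme rdChar() uses when it accumulates num.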

  10. #25
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    I finally found something of use when looking for information on iconv. I had come across it before but could not find any header information, because the search results were all about the command-line tool of the same name (who actually uses that as a command?). Anyway, this is what my main() function looks like now; please tell me if I'm passing the correct information to iconv_open():
    Code:
    int main() {
    	int i, ret = 0, c = 'abcd';
    	char32_t c32 = 0;
    	char *cb = (char*)&c, *var = getenv("LANG");
    	iconv_t l2l,l2u,l2U,l2u8;
    	readchar_t readchar = {0};
    	readchar.getchr = (int_c_voidp_t)fgetc;
    	readchar.endofi = (int_c_voidp_t)feof;
    	readchar.source = stdin;
    	readchar.func = __func__;
    	/* iconv_open() reports failure by returning (iconv_t)-1 */
    	if ( (l2l = iconv_open(var,var)) == (iconv_t)-1 ) {
    		ret = fail( 1, "Fatal", "iconv_open( NULL, NULL ) failed");
    		goto done;
    	}
    	if ( (l2u8 = iconv_open("UTF-8",var)) == (iconv_t)-1 ) {
    		ret = fail( 1, "Fatal", "iconv_open( \"UTF-8\", NULL ) failed");
    		goto main_close_l2l;
    	}
    	if ( (l2u = iconv_open("UTF-16",var)) == (iconv_t)-1 ) {
    		ret = fail( 1, "Fatal", "iconv_open( \"UTF-16\", NULL ) failed");
    		goto main_close_l2u8;
    	}
    	if ( (l2U = iconv_open("UTF-32",var)) == (iconv_t)-1 ) {
    		ret = fail( 1, "Fatal", "iconv_open( \"UTF-32\", NULL ) failed");
    		goto main_close_l2u;
    	}
    	readchar.ic = l2l;
    	printf("'abcd' = '%c', %c:%c:%c:%c\n", c, cb[0], cb[1], cb[2], cb[3] );
    	for ( i = 0; i < 10; ++i ) {
    		(void)puts("Please enter a character:");
    		if ( nextchar(&readchar) != 0 ) break;
    		if ( readchar.c == 'U' ) {
    			readchar.ic = l2U;
    			if ( nextchar(&readchar) != 0 ) break;
    		}
    		else if ( readchar.c == 'u' ) {
    			if ( nextchar(&readchar) != 0 ) break;
    			if ( readchar.c == '8' ) {
    				readchar.ic = l2u8;
    				if ( nextchar(&readchar) != 0 ) break;
    			}
    			else readchar.ic = l2u;
    		}
    		switch ( readchar.c ) {
    		case '\\': c = rdEscChr( &readchar, &c32 ); break;
    		case '\'': c = rdChar( &readchar, &c32 ); break;
    		default:
    			(void)puts("No more input found, ending program");
    			goto main_close_l2U;
    		}
    		if ( c != 0 ) {
    			ret = fail(1, "Fatal", "Unexpected character");
    			goto main_close_l2U;
    		}
    		(void)printf( "Resulting character: '%c'\n", (int)c32 );
    		if ( c32 > 0xFF ) {
    			*((uint*)&c) = c32;
    			printf("%c:%c:%c:%c\n", cb[0], cb[1], cb[2], cb[3] );
    		}
    	}
    	main_close_l2U:
    	iconv_close( l2U );
    	main_close_l2u:
    	iconv_close( l2u );
    	main_close_l2u8:
    	iconv_close( l2u8 );
    	main_close_l2l:
    	iconv_close( l2l );
    	done:
    	return ret;
    }
    Edit: Just noticed I forgot to close the iconv_t handles; updated with the fix. I still don't know if I'm passing the right information to iconv_open().
    Last edited by awsdert; 09-10-2019 at 05:05 AM.
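    On the iconv_open() question: the call takes the target encoding first and the source encoding second, and both must be encoding names. LANG usually holds a locale name such as en_GB.UTF-8 rather than a bare encoding name, and getenv() may return NULL, so one common alternative on POSIX systems is to ask the locale for its codeset with nl_langinfo(CODESET). A minimal sketch under that assumption; locale_codeset and open_from_locale are hypothetical helper names.

```c
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>

/* Name of the current locale's character encoding, e.g. "UTF-8".
 * Call setlocale(LC_ALL, "") beforehand so the environment is used. */
static const char *locale_codeset(void)
{
    return nl_langinfo(CODESET);
}

/* Open a converter from the locale's encoding to the encoding `to`.
 * Returns (iconv_t)-1 on failure, exactly as iconv_open() does. */
static iconv_t open_from_locale(const char *to)
{
    return iconv_open(to, locale_codeset());
}
```

    A caller would run setlocale(LC_ALL, "") once at start-up, then compare each returned handle against (iconv_t)-1 before using it.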

  11. #26
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    On a related subject, I've added a call to iconv() in my character-reading function and am struggling to feed it my temporary buffers correctly. That poxy __restrict keyword was the dumbest thing I've seen added to C; a simple const in all uses of that word is fine, and doing otherwise is poor C in my honest opinion. Anyway, this is what I've got:
    Code:
    int nextchar( readchar_t *readchar ) {
    	/* More than enough to get it right the
    	 * 1st time even for a 4-byte char */
    	int ret = 0;
    	size_t mbs = 20, cbs = 5, did;
    	char mb[20] = {0}, cb[5] = {0};
    	char32_t c32;
    	if ( !readchar ) return EINVAL;
    	if ( readchar->endofi(readchar->source) ) return -1;
    	readchar->p = readchar->c;
    	readchar->c = readchar->getchr(readchar->source);
    	if ( readchar->c < 0 ) return EILSEQ;
    	if ( readchar->ic != (iconv_t)-1 ) {
    		c32 = readchar->c;
    		cb[0] = c32 & 0xFF;
    		c32 >>= 8;
    		cb[1] = c32 & 0xFF;
    		c32 >>= 8;
    		cb[2] = c32 & 0xFF;
    		c32 >>= 8;
    		cb[3] = c32 & 0xFF;
    		did = iconv( readchar->ic, &((char*)cb), &cbs, &((char*)mb), &mbs );
    		(void)iconv( readchar->ic, NULL, NULL, NULL, NULL );
    		if ( did == (size_t)-1 ) {
    			ret = errno;
    			errno = 0;
    		}
    	}
    	return ret;
    }
    And this is what my compiler is spitting out:
    Code:
    make char (in directory: /media/lee/ZXUIJI_1TB/github/mc)
    cc -Wall  -D OUT=main.elf -o char.elf char.c
    char.c: In function ‘nextchar’:
    char.c:45:30: error: lvalue required as unary ‘&’ operand
       did = iconv( readchar->ic, &((char*)cb), &cbs, &((char*)mb), &mbs );
                                  ^
    char.c:45:50: error: lvalue required as unary ‘&’ operand
       did = iconv( readchar->ic, &((char*)cb), &cbs, &((char*)mb), &mbs );
                                                      ^
    char.c: In function ‘main’:
    char.c:290:22: warning: multi-character character constant [-Wmultichar]
      int i, ret = 0, c = 'abcd';
                          ^~~~~~
    makefile:24: recipe for target 'char.elf' failed
    make: *** [char.elf] Error 1
    Compilation failed.
    Since I'm not understanding the fault, would someone mind cluing me in?

    Edit: Fixed it by using convoluted code:
    Code:
    char mb[20] = {0}, cb[5] = {0}, *mbp = mb, *cbp = cb;
    ...
    did = iconv( readchar->ic, &cbp, &cbs, &mbp, &mbs );
    Last edited by awsdert; 09-10-2019 at 05:58 AM. Reason: Missed a couple of words in a statement
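    The error comes from the cast rather than from restrict: (char*)cb produces a temporary value, not an object, so there is nothing for & to take the address of. iconv() wants a char ** because it advances the caller's pointers as it consumes input and produces output, which means separate pointer variables such as cbp and mbp are the expected usage rather than a workaround. A minimal sketch of that calling convention, using a hypothetical stand-in named copy_advance so it does not depend on iconv itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for iconv()'s calling convention: copies bytes
 * and advances both cursors, which is why it takes char ** arguments. */
static size_t copy_advance(char **in, size_t *inleft,
                           char **out, size_t *outleft)
{
    size_t n = *inleft < *outleft ? *inleft : *outleft;
    memcpy(*out, *in, n);
    *in += n;
    *inleft -= n;
    *out += n;
    *outleft -= n;
    return n;
}
```

    After the call the caller's pointers sit just past the bytes that were processed, while the arrays themselves are still reachable through their original names.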

  12. #27
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Okay, so after settling on how I want to pass characters from the terminal, files, strings etc. on to my functions, I went back to fix any errors that cropped up as a result of switching objects/functions in the middle. After that I moved on to testing my escape character function, which is possibly faulty, but I'll get to that one after hearing opinions on my other function, which I'm sure is faulty since I only just wrote it for an automated test.
    The initialisation code:
    Code:
    	// Automatic tests
    	_nextc.nextchr = (func_nextchr)snextchr;
    	_nextc.src = &utf;
    	utf.txt = (char8_t*)u8"\\u2ea2";
    	for ( utf.len = 0; utf.txt[utf.len]; utf.len++ )
    		putchar( utf.txt[utf.len] );
    	utf.cap = utf.len + 1;
    	(void)printf(" = ");
    	utf.pos = 0;
    	if ( !nextc(&_nextc) ) return EXIT_FAILURE;
    	ret = rdEscChr( &_nextc, now, NEXTC_C_LENG );
    	if ( ret != 0 ) return EXIT_FAILURE;
    	for ( utf.pos = 0; utf.pos < NEXTC_C_LENG; utf.pos++ )
    		putchar( now[utf.pos] );
    	putchar( '\n' );
    	// Manual tests
    The snextchr() function I had just written:
    Code:
    typedef struct utf {
    	size_t pos;
    	size_t cap;
    	size_t len;
    	char8_t * txt;
    } UTF;
    int snextchr( UTF *utf, char8_t *c, size_t leng ) {
    	size_t i;
    	for ( i = 0; i < leng; ++i ) {
    		if ( utf->pos >= utf->len ) break;
    		c[i] = utf->txt[utf->pos];
    		while ( c[i] & 0x80 ) {
    			if ( ++i == leng || utf->pos == utf->len ) break;
    			c[i] = utf->txt[utf->pos++];
    		}
    	}
    	c[leng] = 0;
    	return 0;
    }
    Gotta get some sleep now, so don't be surprised by a late response.

  13. #28
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Since I had already suspected my detection of UTF-8 bytes was faulty, I went and checked my wiki bookmark and changed my snextchr function to look like this:
    Code:
    int snextchr( UTF *utf, char8_t *c, size_t leng ) {
    	int ret = 0;
    	size_t i, j, max;
    	if ( !utf || !c ) {
    		if ( c && leng )
    			(void)memset( c, 0, (leng+1) * sizeof(char8_t) );
    		ret = EDESTADDRREQ;
    		FAIL( stderr, ret, "utf and/or c was NULL!");
    		return ret;
    	}
    	for ( i = 0; i < leng; ++i ) {
    		if ( utf->pos >= utf->len ) break;
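    		/* consume the lead byte so the next read moves forward */
    		c[i] = utf->txt[utf->pos++];
    		if ( c[i] & 0x80 ) {
    			if ( c[i] & 0x40 ) {
    				/* 110xxxxx, 1110xxxx, 1111xxxx: 1, 2 or 3 bytes follow */
    				if ( c[i] & 0x20 )
    					max = ( c[i] & 0x10 ) ? 3 : 2;
    				else max = 1;
    			}
    			else {
    				/* 10xxxxxx is a continuation byte, not a lead byte */
    				ret = EILSEQ;
    				FAIL( stderr, ret, "Corrupt UTF-8 character" );
    				c[i] = 0;
    				break;
    			}
    		}
    		else max = 0;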
    		for ( j = 0; j < max; ++j ) {
    			if ( ++i == leng || utf->pos == utf->len ) break;
    			c[i] = utf->txt[utf->pos++];
    		}
    	}
    	c[leng] = 0;
    	return ret;
    }
    After fixing the function that reads from the terminal to check UTF-8 the same way, I tried running it, only to hit a few segfaults. I fixed those, and now I find that I'm somehow not getting any character from my function:
    nextc():
    Code:
    bool nextc( NEXTC *_nextc ) {
    	int ret = nextc_validate( _nextc );
    	size_t i;
    	switch ( ret ) {
    	case 0: break;
    	case ENODATA: return 0;
    	default:
    		_nextc->err = ret;
    		return 0;
    	}
    	for ( i = 0; i < NEXTC_C_SIZE; ++i ) {
    		_nextc->p[i] = _nextc->c[i];
    		_nextc->c[i] = 0;
    	}
    	_nextc->err = _nextc->nextchr( _nextc->src, _nextc->c, NEXTC_C_SIZE );
    	return ( _nextc->err == 0 ) ? 1 : 0;
    }
    Code:
    int rdEscChr( NEXTC *_nextc, char8_t *c, size_t leng ) {
    	int ret = 0;
    	uint_least64_t num = 0;
    	size_t size = (leng + 1) * sizeof(char8_t);
    	char const *esc, def[] = "\\";
    	if ( !_nextc || !c ) {
    		if ( c && size ) (void)memset( c, 0, size );
    		ret = EDESTADDRREQ;
    		FAIL( stderr, ret, "Invalid parameter/s" );
    		return ret;
    	}
    	memset( c, 0, size );
    	esc = getenv("ESCAPE_CHAR");
    	if ( !esc ) esc = def;
    	if ( _nextc->c[0] != esc[0] ) {
    		ret = EILSEQ;
    		FAIL( stderr, ret, "Invalid escape character" );
    		(void)printf( "Expected '%s'\n", esc );
    		return ret;
    	}
    	if ( utf2std == iconv_null &&
    		(ret = get_std_encoding()) != EXIT_SUCCESS )
    		return ret;
    	if ( !nextc(_nextc) )
    		return _nextc->err;
    	switch ( _nextc->c[0] ) {
    	case 'a': num = 0x07; break;
    	case 'b': num = 0x08; break;
    	case 'e': num = 0x1B; break;
    	case 'f': num = 0x0C; break;
    	case 'n': num = 0x0A; break;
    	case 'r': num = 0x0D; break;
    	case 't': num = 0x09; break;
    	case 'u':
    		if ( !nextc(_nextc) ) {
    			ret = _nextc->err;
    			break;
    		}
    		ret = rdU64_base62( _nextc, 4, 4, 16, 0, &num );
    		if ( ret != EXIT_SUCCESS ) return ret;
    		ret = type2utf( c, leng, (char*)&num, sizeof(char32_t), 'u' );
    		return ret;
    	case 'U':
    		if ( !nextc(_nextc) ) {
    			ret = _nextc->err;
    			break;
    		}
    		ret = rdU64_base62( _nextc, 8, 8, 16, 0, &num );
    		if ( ret != EXIT_SUCCESS ) return ret;
    		ret = type2utf( c, leng, (char*)&num, sizeof(char32_t), 'U' );
    		return ret;
    	case 'v': num = 0x0B; break;
    	case 'x':
    		if ( !nextc(_nextc) ) {
    			ret = _nextc->err;
    			break;
    		}
    		ret = rdU64_base62( _nextc, 1, 2, 16, 0, &num );
    		break;
    	default:
    		if ( _nextc->c[0] >= U'0' && _nextc->c[0] <= U'9' ) {
    			ret = rdU64_base62( _nextc, 1, 3, 8, 0, &num );
    			break;
    		}
    		num = _nextc->c[0];
    	}
    	if ( ret != 0 )
    		return ret;
    	c[0] = (char8_t)num;
    	ret = nextc(_nextc) ? EXIT_SUCCESS : _nextc->err;
    	return ret;
    }
    Can anyone spot any possible causes?
    Edit:
    My output right now:
    Code:
    make char.run (in directory: /media/lee/ZXUIJI_1TB/github/mc)
    cc -fPIC -Wall -Wno-multichar  -shared -o ./libnext.so -c next.c
    cc -fPIC -Wall -Wno-multichar  -D OUT=char.elf -o ./char.elf char.c ./libnext.so
    ./char.elf
    char.c:497:fnextchr_utf(): Error: 0x00000022, 34, Numerical result out of range, Info: Imcomplete UTF-8 character
    ⺢, 'abcd': 0x00002EA2, 0x61626364
    'abcd' = 'd', d:c:b:a
    \u2ea2 = ''
    std_encoding = 'UTF-8'
    Please enter a character:
    rm char.elf libnext.so
    Compilation finished successfully.
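    For the \u2ea2 test it helps to know which bytes to expect: U+2EA2 encodes to the three UTF-8 bytes E2 BA A2. The type2utf() function is not shown in the thread, so the following is only a minimal sketch of the standard encoding rule under a hypothetical name, utf8_encode, not a reconstruction of that function.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into out (at least 4 bytes).
 * Returns the number of bytes written, or 0 for values that are not
 * valid scalar values (surrogates, or anything above U+10FFFF). */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* UTF-16 surrogate range */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

    Comparing the bytes left in c against E2 BA A2 after the automated test gives a quick check that does not depend on how the terminal renders the result.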

  14. #29
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Having noticed the error line after posting, here's my fnextchr_utf() function:
    Code:
    int fnextchr_utf( FILE *file, char8_t *c, size_t leng ) {
    	int ret = ERANGE;
    	size_t i = 0, j, max = 0, size = (leng + 1) * sizeof(char8_t);
    	if ( !file || !c ) {
    		if ( c && size ) (void)memset( c, 0, size );
    		ret = EDESTADDRREQ;
    		FAIL( stderr, ret, "file and/or c was NULL!" );
    		return ret;
    	}
    	if ( leng < 4 ) {
    		if ( leng ) (void)memset( c, 0, size );
    		FAIL( stderr, ret, "leng was less than 4!" );
    		return ret;
    	}
    	(void)memset( c, 0, size );
    	if ( feof(file) ) return 0;
    	c[i] = fgetc( file );
    	if ( c[i] & 0x80 ) {
    		if ( c[i] & 0x40 ) {
    			if (c[i] & 0x20)
    				max = (c[i] & 0x10) ? 3 : 2;
    			else max = 1;
    		}
    		else {
    			ret = EILSEQ;
    			FAIL( stderr, ret, "Corrupt UTF-8 character" );
    			return ret;
    		}
    	}
    	for ( j = 0; j < max; ++j ) {
    		if ( feof(file) ) {
    			FAIL( stderr, ret, "Imcomplete UTF-8 character" );
    			return ret;
    		}
    		c[++i] = fgetc(file);
    	}
    	return 0;
    }
    Output of a manual run just now:
    Code:
    make char.run
    cc -fPIC -Wall -Wno-multichar  -shared -o ./libnext.so -c next.c
    cc -fPIC -Wall -Wno-multichar  -D OUT=char.elf -o ./char.elf char.c ./libnext.so
    ./char.elf
    ⺢, 'abcd': 0x00002EA2, 0x61626364
    'abcd' = 'd', d:c:b:a
    \u2ea2 = ''
    std_encoding = 'UTF-8'
    Please enter a character:
    \u2ea2       
    No conversion
    Resulting character: '�'
    �:�:�:
    Please enter a character:
    
    No conversion
    No more input found, ending program
    *** stack smashing detected ***: <unknown> terminated
    makefile:34: recipe for target 'char.run' failed
    make: *** [char.run] Aborted (core dumped)
    rm char.elf libnext.so
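    One possible cause of the earlier "Numerical result out of range" line (0x22 is ERANGE on Linux, the value ret starts with): feof() only becomes true after a read has already failed, so fgetc() can return EOF on the first read. Stored in a char8_t that becomes 0xFF, which passes the lead-byte tests as if three more bytes followed, and the following feof() check then reports an incomplete character. A minimal sketch that keeps the int result of fgetc() so EOF stays visible; utf8_trail_count and read_utf8 are hypothetical names, and overlong forms are not rejected here.

```c
#include <errno.h>
#include <stdio.h>

/* Number of continuation bytes announced by a UTF-8 lead byte,
 * or -1 if the byte cannot start a sequence. */
static int utf8_trail_count(unsigned char lead)
{
    if (lead < 0x80) return 0;             /* 0xxxxxxx */
    if ((lead & 0xE0) == 0xC0) return 1;   /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 2;   /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 3;   /* 11110xxx */
    return -1;                             /* 10xxxxxx or 11111xxx */
}

/* Read one UTF-8 sequence into c (room for 5 bytes, NUL-terminated).
 * Returns 0 on success, -1 at a clean end of input, EILSEQ otherwise. */
static int read_utf8(FILE *file, unsigned char c[5])
{
    int ch = fgetc(file);                  /* keep the int so EOF is visible */
    int trail;
    c[0] = c[1] = c[2] = c[3] = c[4] = 0;
    if (ch == EOF)
        return -1;
    c[0] = (unsigned char)ch;
    trail = utf8_trail_count(c[0]);
    if (trail < 0)
        return EILSEQ;
    for (int i = 1; i <= trail; ++i) {
        ch = fgetc(file);
        if (ch == EOF)
            return EILSEQ;                 /* sequence cut short */
        if ((ch & 0xC0) != 0x80) {
            ungetc(ch, file);              /* not a continuation: put it back */
            return EILSEQ;
        }
        c[i] = (unsigned char)ch;
    }
    return 0;
}
```

    Because the buffer size is fixed at five bytes and only indices 0 to 4 are written, there is no write past the end of the buffer either.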

  15. #30
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Just finished fixing the final known issue with my code. I'd like everyone who's willing to run their own tests on it and give me feedback on what they see as missing, since I just draw a blank when I try to think of even one example to test. Here is a link to the tar.gz I made of the relevant files:
    char_literal_test.tar.gz - Google Drive
    As long as you're able to run make & cc, typing "make char.run" in a terminal window opened in the same directory as these files should be enough to get the ball rolling.
