Thread: Manually reading a character literal

  1. #1
    Registered User awsdert
    Join Date
    Jan 2015
    Posts
    1,733

    Manually reading a character literal

    So I've started with the easiest annoyance to deal with, escape characters. This is the function I've got so far:
    Code:
    int rdEscChr( int_c_voidp_t getchr, void *source, int c, char32_t *c32 ) {
    	uint_least64_t num = 0;
    	char text[5] = {0};
    	str_t str = {0};
    	if ( c != '\\' ) {
    		c = -1;
    		goto rdEscChr_done;
    	}
    	str.txt = text;
    	str.cap = 5;
    	switch ( (c = getchr(source)) ) {
    	case '0':
    		c = getchr(source);
    		if ( !isdigit(c) ) /* plain \0: c already holds the next character */
    			goto rdEscChr_done;
    		c = rdU64_base62( getchr, source, c, 8, 0, &num );
    		goto rdEscChr_done;
    	case 'b': num = '\b'; break;
    	case 'f': num = '\f'; break;
    	case 'n': num = '\n'; break;
    	case 'r': num = '\r'; break;
    	case 't': num = '\t'; break;
    	case 'u':
    		text[0] = getchr(source);
    		text[1] = getchr(source);
    		text[2] = getchr(source);
    		text[3] = getchr(source);
    		str.len = 4;
    		(void)rdU64_base62( (int_c_voidp_t)sgetc, &str, sgetc(&str), 16, 0, &num );
    		break;
    	case 'v': num = '\v'; break;
    	case 'x':
    		text[0] = getchr(source);
    		text[1] = getchr(source);
    		str.len = 3;
    		(void)rdU64_base62( (int_c_voidp_t)sgetc, &str, sgetc(&str), 16, 0, &num );
    		break;
    	// Escape characters \ ' " and ^ do not need to be explicitly defined here
    	default: num = c;
    	}
    	c = getchr(source);
    	rdEscChr_done:
    	if ( c32 ) *c32 = (char32_t)num;
    	return c;
    }
    If you want the details of rdU64_base62, str_t and sgetc, refer back to a previous thread of mine: Manually reading an integer literal
    I'll worry about making those C89 compatible later if I decide to put them in a library (which I am considering at the moment).

    Anyway, does anyone see any potential problems or missing stuff?

    Edit: Within a minute or two of posting this I noticed a bunch of problems myself and fixed them; I've now replaced the code above with the fixed version.
    Last edited by awsdert; 09-05-2019 at 04:12 AM.

  2. #2
    Programming Wraith GReaper
    Join Date
    Apr 2009
    Location
    Greece
    Posts
    2,739
    You may want to add '\e'; it's not exactly standard, but it's widely used for ANSI escape sequences.
    Devoted my life to programming...
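Since '\e' came up: it is a GCC extension rather than standard C, and it stands for the ESC control character (27, or 0x1B, in ASCII), which is the byte that introduces ANSI escape sequences. A minimal sketch of the portable spellings, assuming an ASCII execution character set; `ansi_bold` is only an illustrative helper:

```c
#include <stdio.h>

/* '\e' is a GCC extension, not standard C. '\033' (octal) and '\x1b'
   (hex) are standard spellings of the same ESC character; the value 27
   assumes an ASCII-compatible execution character set. */
#define ESC_OCTAL '\033'
#define ESC_HEX   '\x1b'

/* Hypothetical helper: writes text wrapped in the ANSI "bold on"
   (ESC [ 1 m) and "reset" (ESC [ 0 m) sequences into buf and returns
   whatever snprintf returns (the untruncated length). */
int ansi_bold(char *buf, size_t size, const char *text) {
    return snprintf(buf, size, "\033[1m%s\033[0m", text);
}
```

So a decoder that wants to accept `\e` can map it to `'\033'` without relying on the compiler supporting `'\e'` itself.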

  3. #3
    C++ Witch laserlight
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by awsdert
    Within a minute or 2 of posting this I noticed a bunch of problems myself and fixed them, I've now replaced the above code with that.
    Write unit tests?
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  4. #4
    Registered User awsdert
    Quote Originally Posted by laserlight
    Write unit tests?
    I'll be doing that after I've made sure nothing is missing, like the below
    Quote Originally Posted by GReaper
    You may want to add '\e'; it's not exactly standard, but it's widely used for ANSI escape sequences.
    What code would that translate to? Does it expect anything more? This is the first source I found for character literals: Escape character - Wikipedia
    Is there a better one you know of? I also need to know what to look for before and after the character, such as u, U, u8 and L; or is that all of them?
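On the prefix question: in C11 a character constant may carry an L, u or U prefix, while an unprefixed constant such as 'a' has type int; u8 is only a string-literal prefix in C11, and u8 character constants arrived in C23. A small sketch that records the type each prefix yields, assuming a C11 compiler that provides <uchar.h> (the function names are illustrative):

```c
#include <stddef.h>  /* size_t */
#include <wchar.h>   /* wchar_t */
#include <uchar.h>   /* char16_t, char32_t (C11) */

/* C11 character-constant prefixes and the types they produce:
     'a'  -> int         L'a' -> wchar_t
     u'a' -> char16_t    U'a' -> char32_t
   u8 is a string-literal prefix in C11 (u8"a"); u8'a' needs C23. */
size_t size_plain(void) { return sizeof('a'); }
size_t size_wide(void)  { return sizeof(L'a'); }
size_t size_utf16(void) { return sizeof(u'a'); }
size_t size_utf32(void) { return sizeof(U'a'); }
```

The prefix only changes the type (and encoding) of the constant; the escape sequences inside the quotes are the same for all of them.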

  5. #5
    C++ Witch laserlight
    Quote Originally Posted by awsdert
    I'll be doing that after I've made sure nothing is missing, like the below
    You should write the tests before making further changes, to help you think through what might be missing, and detect any regressions due to your changes.
    Last edited by laserlight; 09-05-2019 at 05:20 AM.
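Following laserlight's suggestion, a test can be as small as a table of inputs and expected outputs that is re-run after every change. A minimal sketch; `decode_simple_escape` and `run_escape_tests` are hypothetical names, and the decoder only covers C's single-character escapes, not rdEscChr's octal, hex or Unicode paths:

```c
#include <stddef.h>

/* Hypothetical helper: maps the character after a backslash to its
   value, or returns -1 if it is not one of C's simple escapes. */
int decode_simple_escape(int letter) {
    switch (letter) {
    case 'a': return '\a';
    case 'b': return '\b';
    case 'f': return '\f';
    case 'n': return '\n';
    case 'r': return '\r';
    case 't': return '\t';
    case 'v': return '\v';
    case '\\': case '\'': case '"': case '?': return letter;
    default: return -1;
    }
}

/* Table-driven test: each row is an input and its expected result.
   Returns the number of failing rows, so 0 means every case passed. */
int run_escape_tests(void) {
    static const struct { int in; int want; } cases[] = {
        { 'n', '\n' }, { 't', '\t' }, { 'a', '\a' },
        { '\\', '\\' }, { '\'', '\'' }, { '"', '"' }, { '?', '?' },
        { 'q', -1 },   /* not a C escape: must be rejected */
    };
    int failures = 0;
    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; ++i)
        if (decode_simple_escape(cases[i].in) != cases[i].want)
            ++failures;
    return failures;
}
```

Note that `^` is not among C's simple escapes (those are \' \" \? \\ \a \b \f \n \r \t \v), so a table like this is also a convenient place to pin down what the default case should do with it.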

  6. #6
    Registered User awsdert
    Well, I don't expect regressions on a first test, nor later on, given that this is a self-contained function that should never need to be modified in the C environment, except perhaps if new escape standards crop up; by that time it would already have been uploaded to GitHub once completed. Anyway, since I still don't know what \e is supposed to result in, I tried adding it to the cases like the rest. It resulted in a '' character, which I don't think is what is supposed to go there; I guess cc/gcc doesn't support it, so I can't use that method of verifying.

    Incidentally, the main reason I haven't written an automated tester is that the file I'm testing in will not be directly included in the final project. Instead I will copy and paste what I need, adapt it, and test manually again before writing the appropriate unit tests. Then again, I might just move it all into a library, link it to this file, and use this file for the unit tests; we'll see when I feel I'm done with this set of functions.

  7. #7
    C++ Witch laserlight
    Quote Originally Posted by awsdert
    Well, I don't expect regressions on a first test, nor later on, given that this is a self-contained function that should never need to be modified in the C environment
    That implies that you're saying your code is perfect, so you're not going to change it any more... but then you're asking people to suggest improvements. I mean, whatever. If you don't want to do something that is beneficial to you, that's your own problem, and so you will continue to find yourself in situations like this: "Anyway, does anyone see any potential problems or missing stuff?

    Edit: Within a minute or two of posting this I noticed a bunch of problems myself"

  8. #8
    Registered User awsdert
    Quote Originally Posted by laserlight
    That implies that you're saying your code is perfect so you're not going to change it any more
    You've misinterpreted. When I said "further on", I meant the code would get better: most of the logic paths are fine for this function, and at most it needs a tweak here and there, nothing major that would completely change the logic paths used. As for no regressions on the first test, it's simple: you can't have a regression in unused code (well, it's theoretically possible, but nobody but God would ever know, since only he would know whether it was in a working state before being changed and before being tested). Once I'm done making sure this function works, I will move on to the container function that looks for the modifiers and converts the character. By no means am I implying my code is perfect (however hard I try to make it so).

  9. #9
    misoturbutc Hodor
    Join Date
    Nov 2013
    Posts
    1,791
    Is it safe assigning -1 to 'num' (or c32 for that matter) considering num is unsigned? Other than that I guess the function is perfect. Except, why is 'source' a void*?
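On Hodor's question: converting -1 to an unsigned type is well defined in C; the value is reduced modulo 2^N, so it becomes the type's maximum value. It doesn't trap, but it quietly produces a value no real character has, so a caller reading *c32 can only tell it apart from real data by checking. A sketch of both conversions plus one such check (the helper names are illustrative, not from the thread):

```c
#include <stdint.h>
#include <uchar.h>

/* Converting a negative value to an unsigned type is well defined:
   the result is reduced modulo 2^N, so -1 becomes the type's maximum. */
uint_least64_t as_u64(int v) { return (uint_least64_t)v; }
char32_t as_c32(int v) { return (char32_t)v; }

/* Returns 1 if value is a Unicode scalar value (at most U+10FFFF and
   not a surrogate); a wrapped -1 fails this check. */
int is_unicode_scalar(uint_least64_t value) {
    return value <= 0x10FFFF && !(value >= 0xD800 && value <= 0xDFFF);
}
```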

  10. #10
    Registered User awsdert
    Quote Originally Posted by Hodor
    Is it safe assigning -1 to 'num' (or c32 for that matter) considering num is unsigned? Other than that I guess the function is perfect. Except, why is 'source' a void*?
    Again, Hodor, getchr() and source are abstracted to show that the function doesn't care how the characters are provided, only that they are provided. As for the -1, I'll have a look at what you mean first; in the meantime, here's my updated function:
    Code:
    int ishex( int c ) {
    	return ((c >= '0' && c <='9')
    				|| (c >= 'A' && c <= 'F')
    				|| (c >= 'a' && c <= 'f'));
    }
    
    int rdEscChr( int_c_voidp_t getchr, void *source, int c, char32_t *c32 ) {
    	uint_least64_t num = 0;
    	char *mbc = (char*)c32, *var;
    	int i, m;
    	if ( c != '\\' ) {
    		c = -1;
    		goto rdEscChr_done;
    	}
    	switch ( (m = getchr(source)) ) {
    	case 'a': num = '\a'; break;
    	case 'b': num = '\b'; break;
    	case 'e': num = '\033'; break; /* ESC; '\e' itself is a GCC extension */
    	case 'f': num = '\f'; break;
    	case 'n': num = '\n'; break;
    	case 'r': num = '\r'; break;
    	case 't': num = '\t'; break;
    	case 'u': c = rdU64_base62( getchr, source, getchr(source), 4, 16, 0, &num ); goto rdEscChr_done;
    	case 'U': c = rdU64_base62( getchr, source, getchr(source), 8, 16, 0, &num ); goto rdEscChr_done;
    	case 'v': num = '\v'; break;
    	case 'x': c = rdU64_base62( getchr, source, getchr(source), 4, 16, 0, &num ); goto rdEscChr_done;
    	// Escape characters \ ' " and ^ do not need to be explicitly defined here
    	default:
    		if ( m >= '0' && m <= '7' ) { /* octal escapes use only the digits 0-7 */
    			c = rdU64_base62( getchr, source, m, 3, 8, 0, &num );
    			goto rdEscChr_done;
    		}
    		else num = m;
    	}
    	c = getchr(source);
    	rdEscChr_done:
    	if ( c32 ) {
    		var = getenv("MBCHAR_DIR");
    		if ( var && var[0] == '>') {
    			for ( i = 0; i < 4; ++i ) {
    				if ( mbc[i] == 0 ) {
    					break;
    				}
    			}
    			if ( i == 4 ) {
    				*c32 = num;
    				return -1;
    			}
    			*c32 |= (num << (i * 8));
    		}
    		else *c32 = (((*c32) << 8) | (char32_t)num);
    	}
    	return c;
    }
    I'm in the middle of manually testing things; most of the escaped characters work fine. I think I will add a new parameter to rdU64_base62, though, to limit the number of characters read (no limit if the parameter is 0).

    Edit: made some small changes and updated the code above to reflect them.
    Last edited by awsdert; 09-06-2019 at 03:52 AM.
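A sketch of the planned digit-count limit combined with the getchr/source abstraction, assuming int_c_voidp_t is a pointer to a function that takes a void* and returns an int (the thread doesn't show its definition). `read_digits`, `str_src` and `str_getchr` are hypothetical stand-ins, not rdU64_base62 or sgetc. Like rdEscChr, it returns the first character it did not consume:

```c
#include <ctype.h>
#include <stdio.h>   /* EOF */
#include <string.h>  /* strchr */

/* Assumed shape of int_c_voidp_t: next character from source, or EOF. */
typedef int (*getchr_fn)(void *source);

/* One possible source: a cursor over a NUL-terminated string. */
struct str_src { const char *p; };

int str_getchr(void *source) {
    struct str_src *s = source;
    return *s->p ? (unsigned char)*s->p++ : EOF;
}

/* Value of c as a digit in base (2..36), or -1. Searching a string
   avoids assuming that letters are contiguous in the character set. */
static int digit_value(int c, int base) {
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    const char *hit;
    if (c == EOF || c == 0)
        return -1;
    hit = strchr(digits, tolower(c));
    if (hit == NULL || hit - digits >= base)
        return -1;
    return (int)(hit - digits);
}

/* Reads at most max_digits digits (0 means no limit), starting with c,
   the character the caller already fetched. Stores the value in *out
   (no overflow check) and returns the first character NOT consumed. */
int read_digits(getchr_fn next, void *src, int c, int max_digits,
                int base, unsigned long long *out) {
    unsigned long long num = 0;
    int count = 0, d;
    while ((max_digits == 0 || count < max_digits)
           && (d = digit_value(c, base)) >= 0) {
        num = num * base + d;
        ++count;
        c = next(src);
    }
    *out = num;
    return c;
}
```

An octal escape would then be `read_digits(str_getchr, &src, str_getchr(&src), 3, 8, &value)`. For \x, C itself takes as many hex digits as follow, so a limit of 0 would match the standard behaviour.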

  11. #11
    Registered User
    Join Date
    Apr 2017
    Location
    Iran
    Posts
    138
    Quote Originally Posted by awsdert
    Again, Hodor, getchr() and source are abstracted to show that the function doesn't care how the characters are provided, only that they are provided. As for the -1, I'll have a look at what you mean first; in the meantime, here's my updated function:
    Code:
    int ishex( int c ) {
        return ((c >= '0' && c <='9')
                    || (c >= 'A' && c <= 'F')
                    || (c >= 'a' && c <= 'f'));
    }
    [...]

    AFAIR, according to the C standard the digits '0' to '9' must be consecutive, but the letters ('a' to 'f') need not be.
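That matches the standard: only '0' to '9' are guaranteed to be contiguous and increasing, so range checks on letters assume something like ASCII. A portable sketch: `isxdigit` from <ctype.h> already performs the test, and a string search gives the digit value (`hexval` is an illustrative helper):

```c
#include <ctype.h>
#include <limits.h>  /* UCHAR_MAX */
#include <string.h>  /* strchr */

/* Portable hex-digit test. isxdigit() expects EOF or a value
   representable as unsigned char, which getc-style readers return. */
int ishex(int c) {
    return isxdigit(c) != 0;
}

/* Value of a hex digit, or -1. Searching strings avoids assuming that
   'a'..'f' and 'A'..'F' are contiguous; only '0'..'9' are guaranteed. */
int hexval(int c) {
    static const char lower[] = "0123456789abcdef";
    static const char upper[] = "0123456789ABCDEF";
    const char *p;
    if (c <= 0 || c > UCHAR_MAX)
        return -1;
    if ((p = strchr(lower, c)) != NULL)
        return (int)(p - lower);
    if ((p = strchr(upper, c)) != NULL)
        return (int)(p - upper);
    return -1;
}
```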

    By the way, why do you use goto in the code?

  12. #12
    Registered User awsdert
    Quote Originally Posted by ordak
    [...]

    AFAIR, according to the C standard the digits '0' to '9' must be consecutive, but the letters ('a' to 'f') need not be.

    By the way, why do you use goto in the code?
    goto isn't bad. In this case it helps bypass an unwanted call to getchr(), which would throw the caller off track when it attempts to read beyond the escape character. Consider '\x30\x31': you'd normally expect it to become '01', wouldn't you? But if I did not bypass that call, the output would instead be '0x31', which would no doubt confuse you into thinking you had somehow entered the '0' wrong, when in fact the function reading the hex would be at fault.
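To make the '\x30\x31' example concrete: a hex escape can only be seen to end by reading one character past its last digit, so that lookahead has to go back to the caller. A minimal sketch (hypothetical `decode`, handling only \x escapes and plain characters) in which `continue` plays the role of rdEscChr's goto; setting `discard_lookahead` reproduces the extra read described above:

```c
#include <stddef.h>

/* Hypothetical string reader: returns the next char, or -1 at the end. */
struct cursor { const char *p; };
static int next_char(struct cursor *cur) {
    return *cur->p ? (unsigned char)*cur->p++ : -1;
}

static int hex_value(int c) {
    const char *lower = "0123456789abcdef", *upper = "0123456789ABCDEF";
    for (int i = 0; i < 16; ++i)
        if (c == lower[i] || c == upper[i])
            return i;
    return -1;
}

/* Decodes plain characters and \x escapes from src into out (at most
   cap - 1 chars plus a terminator) and returns the decoded length. */
size_t decode(const char *src, char *out, size_t cap, int discard_lookahead) {
    struct cursor cur = { src };
    size_t n = 0;
    int c = next_char(&cur);
    while (c != -1 && n + 1 < cap) {
        if (c == '\\') {
            c = next_char(&cur);
            if (c == 'x') {
                int value = 0, d;   /* no overflow check in this sketch */
                c = next_char(&cur);
                while ((d = hex_value(c)) >= 0) {
                    value = value * 16 + d;
                    c = next_char(&cur);  /* after the loop, c is the lookahead */
                }
                out[n++] = (char)value;
                if (discard_lookahead)
                    c = next_char(&cur);  /* the bug: the lookahead is lost */
                continue;                 /* reuse c instead of reading again */
            }
            if (c == -1)
                break;
            /* Other escapes: this sketch simply keeps the escaped char. */
        }
        out[n++] = (char)c;
        c = next_char(&cur);
    }
    out[n] = '\0';
    return n;
}
```

With `discard_lookahead` set, the second backslash is swallowed and the output becomes "0x31", exactly the symptom described in the post.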

  13. #13
    Registered User awsdert
    Found a better article on escape characters: Escape sequences in C - Wikipedia
    It actually mentions \e, so I have some information to work from.

  14. #14
    Registered User
    Quote Originally Posted by awsdert
    goto isn't bad. In this case it helps bypass an unwanted call to getchr(), which would throw the caller off track when it attempts to read beyond the escape character. Consider '\x30\x31': you'd normally expect it to become '01', wouldn't you? But if I did not bypass that call, the output would instead be '0x31', which would no doubt confuse you into thinking you had somehow entered the '0' wrong, when in fact the function reading the hex would be at fault.
    Can't you use ungetc, like here?
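For comparison, the ungetc approach applied to a FILE* source: the reader pushes its lookahead back into the stream instead of returning it. A sketch using only standard C (`read_hex_ungetc` and `hexdig` are illustrative names). It costs the extra ungetc call objected to below, and it does not carry over directly to rdEscChr's getchr/source abstraction, which would need its own pushback mechanism:

```c
#include <ctype.h>   /* tolower */
#include <stdio.h>
#include <string.h>  /* strchr */

/* Value of a hex digit, or -1 (no assumption about letter ordering). */
static int hexdig(int c) {
    static const char digits[] = "0123456789abcdef";
    const char *p;
    if (c == EOF || c == 0)
        return -1;
    p = strchr(digits, tolower(c));
    return p ? (int)(p - digits) : -1;
}

/* Reads hex digits from fp and returns their value (no overflow check).
   The first non-hex character is pushed back with ungetc(), so the
   caller's next getc() sees it. C guarantees only one character of
   pushback, and this only works when the source is a FILE*. */
unsigned long read_hex_ungetc(FILE *fp) {
    unsigned long value = 0;
    int c, d;
    while ((c = getc(fp)) != EOF && (d = hexdig(c)) >= 0)
        value = value * 16 + (unsigned long)d;
    if (c != EOF)
        ungetc(c, fp);   /* give the lookahead back to the stream */
    return value;
}
```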

  15. #15
    Registered User awsdert
    How are two calls better than simply skipping the call to begin with? Are you one of those "goto is bad no matter what" people? That's a bad attitude to have in programming; the attitude to have is "the right thing in the right place".
