Thread: int CHARACTER/S to char/*

  1. #1
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733

    int CHARACTER/S to char/*

    I get how to check whether the integer is greater than UCHAR_MAX for easy conversion, and I know of the wctomb()/wcstombs() family of functions, but I cannot find the int-based equivalents (something like ictomb()/icstombs()) that I'd expect to exist for functions that do such conversions (e.g. printf).

    Edit: Btw this is what I currently have:
    Code:
    case 'c':
    	if ( !dst ) return 1;
    	if (imax <= UCHAR_MAX && size) *((udac_t*)dst) = (udac_t)imax;
    	else if (imax <= WCHAR_MAX && imax >= WCHAR_MIN && size >= sizeof(wchar_t))
    		wctomb( dst, (wchar_t)imax );
    	else if (imax <= INT_MAX && imax >= INT_MIN && size >= sizeof(int))
    		/* Is this the right way to do it? */
    		*((int*)dst) = (int)imax;
    	else return 1;
    	break;
    Last edited by awsdert; 01-25-2019 at 05:39 AM.

  2. #2
    Registered User
    Join Date
    Dec 2017
    Posts
    1,633
    I have no idea what you are talking about.
    A little inaccuracy saves tons of explanation. - H.H. Munro

  3. #3
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    The printf/fprintf/sprintf definition says that it takes an int for "%c". That implies that somewhere in the library there is a function (or functions) that does for an int to char* what wctomb() does for a wchar_t to char*, since these functions only print to char* buffers and it is a fool's folly to repeat the same code in similar functions when a common function can take that particular load itself. I'm not looking for itoa(), since that covers the %i/%d formats (and I already use a more reliable method than a sometimes-available function for that purpose).

    Edit: It would be like converting UTF-32 characters to UTF-8 characters, although in this case it's not restricted to that, and I'd prefer a standard function to cover that load.
    Last edited by awsdert; 01-25-2019 at 06:26 PM.

  4. #4
    Registered User
    Join Date
    Dec 2017
    Posts
    1,633
    Quote Originally Posted by awsdert
    that implies that somewhere in the library there is a function or functions that do the same for an int to char*
    No it doesn't. In C, a character constant, like 'c', is an int. Also, when a single char is passed to a function with a variable number of parameters (like printf), it is passed as an int. There is no "conversion" to print an int as a char. All that's needed is to send the low byte to the output. So 'c' is represented as 0x00000063 and printf (or putchar) simply sends the low byte, 0x63, to the output.
    Last edited by john.c; 01-25-2019 at 08:44 PM.
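    To make the point about the low byte concrete, here is a small sketch. The helper name low_byte_via_printf is made up for illustration, and 8-bit bytes are assumed. The C standard specifies that %c converts its int argument to unsigned char before writing it, so only the low byte survives:

```c
#include <stdio.h>

/* Hypothetical helper: format one int with "%c" and report the byte
   that was actually written. Assumes CHAR_BIT == 8. */
unsigned char low_byte_via_printf(int value) {
    char buf[2] = {0};
    /* %c converts the int argument to unsigned char, then writes it */
    snprintf(buf, sizeof buf, "%c", value);
    return (unsigned char)buf[0];
}
```

    With this, low_byte_via_printf('c') and low_byte_via_printf(0x163) both give 0x63, and low_byte_via_printf(0x10000) gives 0, so a code like 0x10000 cannot get through %c intact.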

  5. #5
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Did you not catch my mention of %c and snprintf? Also, there are code formats that go as large as 0x10000 or higher (such as UTF-32), and since the encoding used will be system dependent, converting said codes to a system multibyte character must also be supported. Neither of those is suited to hardcoding the way I can with an alternative itoa().
    Even if there is no need for something like icstombs(), there would still be a need for something like ictomb() to simplify the code of snprintf() & strset(), which both support int to char* conversion. Unless they return an error for values outside the wchar_t range, I cannot see how they would convert beyond 0xFFFF (the limit set by M$ and, from what I've read, by recent builds of Linux variants) without introducing error-prone code. The simplest way is to always have an ictomb() or lctomb() variant of wctomb() and map it onto wctomb() on systems where wchar_t maps to int/long (depending on whether int is big enough, since there are 16-bit variants on systems using __LP32__ or similar macros); otherwise it would just wrap around wctomb() when wchar_t maps to a short.

    Edit:
    Here's a quote of the kind of multibyte character I need to make on systems where wchar_t maps to a short:
    The GNU C Library - Extended Characters
    The basic sequences consist of

    single bytes with values in the range 0 through 0177.
    two-byte sequences in which the first byte is in the range from 0200 through 0207, and the second byte is in the range from 0240 through 0377.
    three-byte sequences in which the first byte is in the range from 0210 through 0217, and the other bytes are in the range from 0240 through 0377.
    four-byte sequences in which the first byte is in the range from 0220 through 0227, and the other bytes are in the range from 0240 through 0377.
    Last edited by awsdert; 01-26-2019 at 02:55 AM.
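    The quoted ranges can be checked mechanically. A rough sketch, assuming 8-bit bytes; ext_seq_len and ext_is_trail are hypothetical names used only for illustration, not glibc functions:

```c
#include <stddef.h>

/* Length (1-4) of a sequence in the quoted scheme, judged from its first
   byte, or 0 if the byte cannot start a sequence. Assumes 8-bit bytes. */
size_t ext_seq_len(unsigned char first) {
    if (first <= 0177) return 1;                    /* single byte */
    if (first >= 0200 && first <= 0207) return 2;   /* two-byte lead */
    if (first >= 0210 && first <= 0217) return 3;   /* three-byte lead */
    if (first >= 0220 && first <= 0227) return 4;   /* four-byte lead */
    return 0;
}

/* Non-lead bytes must fall in 0240 through 0377. */
int ext_is_trail(unsigned char b) {
    return b >= 0240;
}
```

    Note that a raw byte of the integer, copied as is, only qualifies as a trailing byte if it happens to be 0240 or higher.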

  6. #6
    Registered User
    Join Date
    Dec 2017
    Posts
    1,633
    Can you read? Then please read my posts carefully next time.

    Quote Originally Posted by awsdert
    Did you not catch my mention of %c and snprintf?
    Yes, I know you said "%c" (I didn't think you were talking about the character 'c').
    No, you never mentioned snprintf. You said "printf" in your first post and you said "printf/fprintf/sprintf" in your second. So I used printf in my post. I used 'c' as an example character. It could be any character.

    Quote Originally Posted by awsdert
    there are code formats that go as large as 0x10000 or higher (such as UTF32)
    printf and friends cannot deal with a character code of 0x10000 as a single character since that would need to be a wchar_t. Instead, they will deal with such a character as a series of bytes, hence the term "multi-byte". They don't need any special way to handle the multi-byte characters since all they need to do is send the individual bytes on to the output device which will deal with the multi-byte sequence (if it can).

    Maybe this program will teach you something.
    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <locale.h>
    #include <wchar.h>
     
    int main() {
        setlocale(LC_ALL, "en_US.utf8");
     
        const wchar_t *wcs = L"x£ΣאᏇ𝄞";  // 6 wide characters (plus a wide null char)
        char mb[MB_CUR_MAX + 1];
     
        for (const wchar_t *wc = wcs; *wc; ++wc) {
            int len = wctomb(mb, *wc);
            if (len < 0)    // not representable in the current locale
                continue;
            mb[len] = '\0';
     
            // Print the character (as a multi-byte character)
            // Print the wchar_t (as an integer, in hex)
            printf("%s  %08X  ", mb, (unsigned)*wc);
     
            // Print the separate bytes of the multi-byte character representation
            for (int i = 0; i < len; ++i)
                printf(" %02X", (unsigned char)mb[i]);
     
            putchar('\n');
        }
    }
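    Related to the program above: within printf itself, the conversion that takes a wide character and emits multibyte output is %lc (with a wint_t argument), not %c. A minimal sketch, where wide_to_mb is a hypothetical helper name and the bytes produced depend on the current locale:

```c
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: let snprintf's %lc do the wide-to-multibyte step.
   Returns the number of bytes the character needs, or a negative value
   if it cannot be converted in the current locale. */
int wide_to_mb(char *dst, size_t size, wchar_t wc) {
    return snprintf(dst, size, "%lc", (wint_t)wc);
}
```

    This still goes through wchar_t, so it does not help where wchar_t is too narrow for the code in question.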

  7. #7
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    No, what I mean is something like this:
    Code:
    char mb[sizeof(int)+1] = {0};
    sprintf( mb, "%c", 0x10000 ); /* i.e. U+10000; '\U+10000' is not valid C */
    This case assumes UTF-32 handling, but surely with this you finally understand what I'm getting at, right?
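    A hedged sketch of what a standard call of this shape could look like: C11's <uchar.h> declares c32rtomb(), which converts one char32_t to the current locale's multibyte encoding, much as wctomb() does for wchar_t. The wrapper name c32_to_mb is hypothetical, and both the treatment of char32_t values as UTF-32 (__STDC_UTF_32__) and the availability of <uchar.h> on a given toolchain are assumptions here:

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical wrapper around C11 c32rtomb().
   dst must have room for MB_LEN_MAX bytes.
   Returns the number of bytes written, or 0 if c has no representation
   in the current locale's multibyte encoding. */
size_t c32_to_mb(char *dst, char32_t c) {
    mbstate_t st;
    size_t n;
    memset(&st, 0, sizeof st);      /* start from the initial shift state */
    n = c32rtomb(dst, c, &st);
    return n == (size_t)-1 ? 0 : n;
}
```

    Whether a code such as 0x10000 converts depends on the locale set with setlocale(); in a locale that cannot represent it, the wrapper returns 0.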

  8. #8
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    I ended up writing my own function:
    Code:
    int dalctomb( char *dst, size_t size, long c, size_t *bytes ) {
    static long endian = 0x01234567L;
    static char *ec = (char*)&endian;
    	size_t cb = 1;
    	int i, j, add;
    	void *vc = &c;
    	char *pc = (char*)vc;
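    	if ( !dst ) goto dalctomb_param;
    	if ( c < 0 ) goto dalctomb_range;
    	if ( !size ) goto dalctomb_size;
    	if ( c <= UCHAR_MAX ) {
    		/* Use the value, not the first byte in memory (wrong on big-endian) */
    		*dst = (char)c;
    		goto dalctomb_done;
    	}
    	cb = sizeof(wchar_t) + 2;
    	if ( size < sizeof(wchar_t) + 2 ) goto dalctomb_size;
    	if ( c <= WCHAR_MAX ) {
    		/* Convert by value; wctomb() returns the byte count or -1 */
    		i = wctomb( dst, (wchar_t)c );
    		if ( i < 0 ) goto dalctomb_range;
    		cb = (size_t)i;
    		goto dalctomb_done;
    	}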
    #if WCHAR_MAX == SHRT_MAX
    	cb = sizeof(long) + 2;
    	if ( size < 6 ) goto dalctomb_size;
    	if ( c <= 0x10FFFF ) {
    		switch ( *ec ) {
    		case 0x01: i = 3; j = 1; add = -1; break;
    		case 0x23: i = 2; j = 0; add = 1; break;
    		case 0x45: i = 1; j = 3; add = -1; break;
    		case 0x67: i = 0; j = 2; add = 1; break;
    		/* Just in case */
    		default: goto dalctomb_fail;
    		}
    		dst[3] = pc[i];
    		dst[2] = pc[i + add];
    		dst[1] = pc[j];
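    		*dst = 0220;
    		goto dalctomb_done;
    	}
    #endif
    	/* Anything left over was not encoded, so do not report success */
    	goto dalctomb_range;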
    	switch ( 0 ) {
    	case ESIZE: dalctomb_size: i = ESIZE; break;
    	case ERANGE: dalctomb_range: i = ERANGE; break;
    	case EPARAM: dalctomb_param: i = EPARAM; break;
    	case 1: dalctomb_fail: i = 1; break;
    	default: dalctomb_done: i = 0;
    	}
    	if ( bytes ) *bytes = (i == 0) ? cb : 0;
    	return i;
    }
    ...
    case 'c':
    	if ( imax < LONG_MIN || imax > LONG_MAX ) return ERANGE;
    	result = dalctomb( dst, size, (long)imax, done );
    	break;

  9. #9
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Now that I understand that some key functions I relied on are unavailable in kernel code, I've had to write a full-fledged function. Luckily I found a decent resource explaining UTF-8 multi-byte sequences, which is sufficient for now; I'll consider OEM crap at a later date.
    Code:
    const size_t CDAM_MAX_BIT = IDAC_BIT - 2;
    /* Account for '\0' also */
    const size_t CDAM_MAX_LEN = (32/CDAM_MAX_BIT) + (IDAC_BIT>=8 ? 2 : 3);
    int dalctoc( cdac_t *str, size_t size, unsigned long c32 ) {
    	size_t i = 0, bits = 0, bytes = 0;
    	unsigned long bit = IDAL_MIN, rec;
    	if ( !str || size < CDAM_MAX_LEN ) return 1;
    	if ( c32 <= IDAC_MAX ) {
    		*str = (cdac_t)c32;
    		return 0;
    	}
    	memset( str, 0, CDAM_MAX_LEN );
    	for ( ; !(bit & c32); bit >>= 1 );
    	rec = bit;
    	for ( ; bit; bit >>= 1, ++bits );
    	bytes = (bits / CDAM_MAX_BIT) + ((bits % CDAM_MAX_BIT) ? 1 : 0);
    	bit = IDAC_MIN;
    	for ( ; bytes; --bytes ) {
    		bit >>= 1;
    		str[i] |= bit;
    		if ( (udac_t)(str[i]) == UDAC_MAX ) {
    			++i;
    			bit = IDAC_MIN;
    		}
    	}
    	for ( ; i < CDAM_MAX_LEN && c32; ++i ) {
    		for ( ; c32 && bit; c32 <<= 1 ) {
    			if ( c32 & rec ) {
    				str[i] |= (cdac_t)bit;
    			}
    		}
    		bit = IDAL_MIN;
    		bit >>= 2;
    	}
    	return 0;
    }
    The typedefs are just wrappers between M$ CHAR etc. and normal char etc.; I didn't wanna play type games with M$ crap, so I just created typedefs for it. I wrote this late at night after a long day at work and am heading to bed now, so let me know if you see problems with it. Here's the link to that reference I mentioned

  10. #10
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Was gonna edit the above but can't do that now. I modified the code a bit to account for the scenario where long > 32 bits. The < 32 bits case has yet to be handled, but it's not urgent for what I'm doing yet.
    Code:
    const size_t CDAM_MAX_BIT = IDAC_BIT - 2;
    /* Account for '\0' also */
    const size_t CDAM_MAX_LEN = (32/(IDAC_BIT - 2)) + (IDAC_BIT>=8 ? 2 : 3);
    int dalctoc( cdac_t *dst, size_t upto, unsigned long c32, size_t *done ) {
    	size_t i = 0, bits = 0, bytes = 0;
    	unsigned long bit = IDAL_MIN, rec;
    	if ( !dst || upto < CDAM_MAX_LEN
    #if UDAL_MAX >= 0xFFFFFFFFUL
    	|| c32 > 0xFFFFFFFFUL
    #endif
    	) {
    		if ( done ) *done = 0;
    		return 1;
    	}
    	memset( dst, 0, CDAM_MAX_LEN );
    	if ( c32 <= IDAC_MAX ) {
    		*dst = (cdac_t)c32;
    		if ( done ) *done = 1;
    		return 0;
    	}
    	for ( ; !(bit & c32); bit >>= 1 );
    	rec = bit;
    	for ( ; bit; bit >>= 1, ++bits );
    	bytes = (bits / CDAM_MAX_BIT) + ((bits % CDAM_MAX_BIT) ? 1 : 0);
    	bit = IDAC_MIN;
    	for ( ; bytes; --bytes ) {
    		bit >>= 1;
    		dst[i] |= bit;
    		if ( (udac_t)(dst[i]) == UDAC_MAX ) {
    			++i;
    			bit = IDAC_MIN;
    		}
    	}
    	for ( ; i < CDAM_MAX_LEN && c32; ++i ) {
    		for ( ; c32 && bit; c32 <<= 1 ) {
    			if ( c32 & rec ) {
    				dst[i] |= (cdac_t)bit;
    			}
    		}
    		bit = IDAL_MIN;
    		bit >>= 2;
    	}
    	if ( done ) *done = i;
    	return 0;
    }
    For anyone curious about why I needed this: GitHub - awsdert/da: Open source Driver API
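    For comparison, a common way to produce the UTF-8 bytes for a code point is to branch on its value range rather than scan bit by bit. This is a sketch, not the code from the posts above: utf8_encode is a hypothetical name, it uses plain unsigned char instead of the cdac_t typedefs, and it assumes 8-bit bytes. It makes no library calls, so it should not depend on libc being available:

```c
#include <stddef.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Returns the number of bytes written (1-4), or 0 if dst is NULL, the
   buffer is too small, or cp is not an encodable code point.
   No terminating '\0' is written. */
size_t utf8_encode(unsigned char *dst, size_t size, unsigned long cp) {
    if (!dst) return 0;
    if (cp <= 0x7FUL) {                       /* 0xxxxxxx */
        if (size < 1) return 0;
        dst[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FFUL) {                      /* 110xxxxx 10xxxxxx */
        if (size < 2) return 0;
        dst[0] = (unsigned char)(0xC0 | (cp >> 6));
        dst[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFFUL) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800UL && cp <= 0xDFFFUL) return 0;  /* surrogates */
        if (size < 3) return 0;
        dst[0] = (unsigned char)(0xE0 | (cp >> 12));
        dst[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        dst[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFFUL) {                   /* 11110xxx + three 10xxxxxx */
        if (size < 4) return 0;
        dst[0] = (unsigned char)(0xF0 | (cp >> 18));
        dst[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        dst[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        dst[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                                 /* beyond U+10FFFF */
}
```

    For example, U+10000 encodes as F0 90 80 80 and U+00A3 (£) as C2 A3.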
