int CHARACTER/S to char/*

**awsdert** · 01-25-2019

I know how to check whether the integer is greater than UCHAR_MAX for the easy conversion, and I also know of the wctomb/wcstombs functions, but I cannot find the int equivalents (something like ictomb/icstombs) that should be available to functions that do such conversions (e.g. printf).

Edit: Btw this is what I currently have:

Code:

case 'c':
	if ( !dst ) return 1;
	if (imax >= 0 && imax <= UCHAR_MAX && size) *((udac_t*)dst) = (udac_t)imax;
	else if (imax <= WCHAR_MAX && imax >= WCHAR_MIN && size >= (size_t)MB_CUR_MAX) {
		/* wctomb() may write up to MB_CUR_MAX bytes and returns -1 on failure */
		if ( wctomb( dst, (wchar_t)imax ) < 0 ) return 1;
	}
	else if (imax <= INT_MAX && imax >= INT_MIN && size >= sizeof(int))
		/* Is this the right way to do it? */
		*((int*)dst) = (int)imax;
	else return 1;
	break;

**john.c** · 01-25-2019

I have no idea what you are talking about.

**awsdert** · 01-25-2019

The printf/fprintf/sprintf definition says that "%c" takes an int. That implies that somewhere in the library there is a function (or functions) that does for an int to char* what wctomb() does for a wchar_t to char*, since these functions only print to char* buffers, and it is fool's folly to repeat the same code in similar functions when a common function can carry that load itself. I'm not looking for itoa(), since that covers the %i and %d formats (and I already use a more reliable method than a sometimes-available function for that purpose).

Edit: It would be like converting UTF32 characters to UTF8 characters, although in this case it's not restricted to that and I'd prefer a standard function to cover that load

**john.c** · 01-25-2019

Originally Posted by awsdert

that implies that somewhere in the library there is a function or functions that do the same for an int to char*

No it doesn't. In C, a character constant, like 'c', is an int. Also, when a single char is passed to a function with a variable number of parameters (like printf), it is passed as an int. There is no "conversion" to print an int as a char. All that's needed is to send the low byte to the output. So 'c' is represented as 0x00000063 and printf (or putchar) simply sends the low byte, 0x63, to the output.

**awsdert** · 01-26-2019

Did you not catch my mention of %c and snprintf? Also, since there are code formats that go as large as 0x10000 or higher (such as UTF-32), and since the encoding in use will be system dependent, converting those codes to the system's multibyte characters must also be supported; neither of these is suited to hardcoding the way an alternative itoa() is.
Even if there is no need for something like icstombs(), there would still be a need for something like ictomb() to simplify the code of snprintf() and strset(), which both support int to char* conversion. Unless they return an error for values outside the wchar_t range, I cannot see how they would convert anything beyond 0xFFFF (the limit set by M$ and, from what I've read, recent builds of Linux variants) without introducing error-prone code. The simplest way is to always have an ictomb() or lctomb() variant of wctomb(): map wctomb() to it on systems where wchar_t is an int or long (depending on whether int is big enough, since there are 16-bit ints on systems using __LP32__ or similar macros), and have wctomb() wrap around it when wchar_t maps to a short.

Edit:
Here's a quote of the kind of multibyte character I need to make on systems where wchar_t maps to a short:
The GNU C Library - Extended Characters

The basic sequences consist of

single bytes with values in the range 0 through 0177.
two-byte sequences in which the first byte is in the range from 0200 through 0207, and the second byte is in the range from 0240 through 0377.
three-byte sequences in which the first byte is in the range from 0210 through 0217, and the other bytes are in the range from 0240 through 0377.
four-byte sequences in which the first byte is in the range from 0220 through 0227, and the other bytes are in the range from 0240 through 0377.

**john.c** · 01-26-2019

Can you read? Then please read my posts carefully next time.

Originally Posted by awsdert

Did you not catch my mention of %c and snprintf?

Yes, I know you said "%c" (I didn't think you were talking about the character 'c').
No, you never mentioned snprintf. You said "printf" in your first post and you said "printf/fprintf/sprintf" in your second. So I used printf in my post. I used 'c' as an example character. It could be any character.

Originally Posted by awsdert

there are code formats that go as large as 0x10000 or higher (such as UTF32)

printf and friends cannot deal with a character code of 0x10000 as a single character since that would need to be a wchar_t. Instead, they will deal with such a character as a series of bytes, hence the term "multi-byte". They don't need any special way to handle the multi-byte characters since all they need to do is send the individual bytes on to the output device which will deal with the multi-byte sequence (if it can).

Maybe this program will teach you something.

Code:

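#include <limits.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

int main(void) {
    if (!setlocale(LC_ALL, "en_US.utf8")) {
        fputs("en_US.utf8 locale not available\n", stderr);
        return 1;
    }

    // 6 wide characters (plus a wide null char), assuming a 32-bit wchar_t;
    // with a 16-bit wchar_t (Windows) the last one is a surrogate pair
    const wchar_t *wcs = L"x£ΣאᏇ𝄞";
    char mb[MB_LEN_MAX + 1];   // MB_LEN_MAX is a constant; MB_CUR_MAX is not

    for (const wchar_t *wc = wcs; *wc; ++wc) {
        int len = wctomb(mb, *wc);
        if (len < 0) {         // not representable in this locale
            printf("?  %08X  (no multibyte form)\n", (unsigned)*wc);
            continue;
        }
        mb[len] = '\0';

        // Print the character (as a multi-byte character)
        // Print the wchar_t (as an integer, in hex)
        printf("%s  %08X  ", mb, (unsigned)*wc);

        // Print the separate bytes of the multi-byte character representation
        for (int i = 0; i < len; ++i)
            printf(" %02X", (unsigned char)mb[i]);

        putchar('\n');
    }
    return 0;
}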

**awsdert** · 01-26-2019

No, what I mean is something like this:

Code:

char mb[sizeof(int)+1] = {0};
/* Wanted: mb receives the multibyte form of U+10000 */
sprintf( mb, "%c", '\U00010000' );

This case assumes UTF-32 handling, but surely with this you finally understand what I'm getting at, right?

**awsdert** · 01-28-2019

I ended up writing my own function:

Code:

/* ESIZE and EPARAM are project-defined error codes; ERANGE is from <errno.h>.
   Needs <errno.h>, <limits.h>, <stdlib.h> and <wchar.h>. */
int dalctomb( char *dst, size_t size, long c, size_t *bytes ) {
	int len;
	if ( bytes ) *bytes = 0;
	if ( !dst ) return EPARAM;
	if ( c < 0 ) return ERANGE;
	if ( c <= UCHAR_MAX ) {
		if ( size < 1 ) return ESIZE;
		/* Use the value, not the first byte of c in memory
		   (which is the high byte on big-endian hosts) */
		*dst = (char)c;
		if ( bytes ) *bytes = 1;
		return 0;
	}
	if ( c <= WCHAR_MAX ) {
		/* wctomb() may write up to MB_CUR_MAX bytes */
		if ( size < (size_t)MB_CUR_MAX ) return ESIZE;
		len = wctomb( dst, (wchar_t)c );
		if ( len < 0 ) return ERANGE;
		if ( bytes ) *bytes = (size_t)len;
		return 0;
	}
/* 16-bit wchar_t; Windows' wchar_t is unsigned, so SHRT_MAX would not match */
#if WCHAR_MAX <= 0xFFFF
	if ( c <= 0x10FFFF ) {
		if ( size < 4 ) return ESIZE;
		/* Lead byte 0220, then the low 24 bits, extracted by shifting so no
		   endian probe is needed. Note the trail bytes are not forced into
		   0240..0377 as the quoted scheme requires. */
		dst[0] = (char)0220;
		dst[1] = (char)((c >> 16) & 0xFF);
		dst[2] = (char)((c >> 8) & 0xFF);
		dst[3] = (char)(c & 0xFF);
		if ( bytes ) *bytes = 4;
		return 0;
	}
#endif
	return ERANGE;
}
...
case 'c':
	if ( imax < LONG_MIN || imax > LONG_MAX ) return ERANGE;
	result = dalctomb( dst, size, (long)imax, done );
	break;

**awsdert** · 02-20-2019

Now that I've realised some key functions I relied on there are unavailable in kernel code, I've had to write a full-fledged function. Luckily I found a decent resource explaining UTF-8 multi-byte sequences, which is sufficient for now; I'll consider OEM crap at a later date.

Code:

/* UTF-8 needs 8-bit code units: assumes IDAC_BIT == 8 (needs <string.h>) */
const size_t CDAM_MAX_BIT = IDAC_BIT - 2;
/* Up to 6 bytes for a 31-bit value, plus '\0'. In C a const object is not
   a constant expression, so the initialiser spells out IDAC_BIT - 2. */
const size_t CDAM_MAX_LEN = (32/(IDAC_BIT - 2)) + (IDAC_BIT>=8 ? 2 : 3);
int dalctoc( cdac_t *str, size_t size, unsigned long c32 ) {
	size_t len, i;
	/* Original 6-byte UTF-8 covers 31 bits; RFC 3629 stops at 0x10FFFF */
	if ( !str || size < CDAM_MAX_LEN || c32 > 0x7FFFFFFFUL ) return 1;
	memset( str, 0, CDAM_MAX_LEN );
	if ( c32 <= 0x7F ) {
		*str = (cdac_t)c32;
		return 0;
	}
	/* A len-byte sequence carries 5*len + 1 payload bits */
	for ( len = 2; len < 6 && c32 >= (1UL << (5 * len + 1)); ++len );
	/* Continuation bytes 10xxxxxx, filled from the end */
	for ( i = len - 1; i > 0; --i ) {
		str[i] = (cdac_t)(0x80 | (c32 & 0x3F));
		c32 >>= 6;
	}
	/* Lead byte: len one-bits, a zero, then the remaining high bits */
	str[0] = (cdac_t)(((0xFF00u >> len) & 0xFFu) | c32);
	return 0;
}

The typedefs are just wrappers between M$ CHAR etc. and normal char etc.; I didn't want to play type games with M$ crap, so I just created typedefs for it. I wrote this late at night after a long day at work and I'm heading to bed now, so let me know if you see problems with it. Here's the link to that reference I mentioned

**awsdert** · 02-21-2019

I was going to edit the above but can't do that now. I modified the code a bit to account for the scenario where long is wider than 32 bits. Narrower than 32 bits has yet to be handled, but that isn't urgent for what I'm doing yet.

Code:

/* UTF-8 needs 8-bit code units: assumes IDAC_BIT == 8 (needs <string.h>) */
const size_t CDAM_MAX_BIT = IDAC_BIT - 2;
/* Account for '\0' also */
const size_t CDAM_MAX_LEN = (32/(IDAC_BIT - 2)) + (IDAC_BIT>=8 ? 2 : 3);
/* Writes the UTF-8 form of c32 to dst (NUL-terminated) and stores the
   number of bytes written, excluding the NUL, in *done.
   Returns 0 on success, 1 on bad arguments or an out-of-range c32. */
int dalctoc( cdac_t *dst, size_t upto, unsigned long c32, size_t *done ) {
	size_t len, i;
	if ( done ) *done = 0;
	/* Rejecting everything above 31 bits also covers long > 32 bits;
	   original 6-byte UTF-8 stops at 0x7FFFFFFF, RFC 3629 at 0x10FFFF */
	if ( !dst || upto < CDAM_MAX_LEN || c32 > 0x7FFFFFFFUL ) return 1;
	memset( dst, 0, CDAM_MAX_LEN );
	if ( c32 <= 0x7F ) {
		*dst = (cdac_t)c32;
		if ( done ) *done = 1;
		return 0;
	}
	/* A len-byte sequence carries 5*len + 1 payload bits */
	for ( len = 2; len < 6 && c32 >= (1UL << (5 * len + 1)); ++len );
	/* Continuation bytes 10xxxxxx, filled from the end */
	for ( i = len - 1; i > 0; --i ) {
		dst[i] = (cdac_t)(0x80 | (c32 & 0x3F));
		c32 >>= 6;
	}
	/* Lead byte: len one-bits, a zero, then the remaining high bits */
	dst[0] = (cdac_t)(((0xFF00u >> len) & 0xFFu) | c32);
	if ( done ) *done = len;
	return 0;
}

For anyone curious about why I needed this: GitHub - awsdert/da: Open source Driver API
