Corrupted text

**awsdert** · 10-21-2019

Some of you will remember that I said, some days ago, that I would move on to producing FPNs from text. In aid of that I occasionally needed to backtrack the position in the source data with the method I had originally chosen, but on stdin I ended up with errors. They may have been the result of faulty code elsewhere, but to make sure they weren't coming from my library code I decided it was high time I cleaned it up and made it more suitable for sharing. I got most of that done, but then hit a glitch somewhere in the backend function I'm working on now. The glitch resulted in output like this:

Code:

make word.run
cc -fPIC -Wall -Wno-multichar  -shared -o ./libmcc_mem.so -c mcc_mem.c
cc -fPIC -Wall -Wno-multichar  -shared -o ./libmcc_get.so -c mcc_get.c
cc -fPIC -Wall -Wno-multichar  -shared -o ./libmcc_tsc.so -c mcc_tsc.c
cc -fPIC -Wall -Wno-multichar  -shared -o ./libmcc_base62.so -c mcc_base62.c
cc -fPIC -Wall -Wno-multichar  -D OUT=word.elf -o ./word.elf word.c ./libmcc_mem.so ./libmcc_get.so ./libmcc_tsc.so ./libmcc_base62.so
./word.elf
'Text': 'Testing,' ' testing' ', 123'
'Text': 'Te' 'st' 'in' 'g,' ' t' 'es' 'ti' 'ng' ', ' '12' '3'
'Word/s': '������'
'Word/s': 'T'
rm libmcc_get.so word.elf libmcc_tsc.so libmcc_mem.so libmcc_base62.so
Compilation finished successfully.

Using this code:

Code:

int mcc___getword( void *src, MCC_CH8 *dst,
	mcc_char_info_t mcc_char_info,
	func_mcc__read _read, func_mcc__last _last )
{
	int ret;
	intmax_t c = 0;
	char *cb, *more;
	mcc_ch8_t ch8;
	MCC_VEC *dstv;
	MCC_MEM *dstm;
	MCC_ICONV_TOK tok;
	MCC_ICONV_MEM nil = {0};
	size_t read, size;
	long leng;
	if ( !dst ) return EDESTADDRREQ;
	dstv = &(dst->vec);
	dstm = &(dstv->mem);
	if ( !_read || !_last ) return EDESTADDRREQ;
	if ( !src || _last(src) )
		return ENODATA;
	if ( (ret = mcc_char_info_test(mcc_char_info)) != EXIT_SUCCESS )
		return ret;
	ch8 = 0;
	cb = (char*)&c;
	more = &cb[mcc_char_info.size];
	tok = mcc_iconv_new_tok( mcc_char_info.ch8dst );
	/* Permits appending to existing text */
	tok.dst.done = dstv->use * sizeof(mcc_ch8_t);
	tok.dst = mcc_iconv_tok_mem( tok.dst, dstm->addr, dstm->size );
	while ( ch8 != U' ' ) {
		/* Avoid the need to reset position by allocating memory 1st */
		if ( tok.dst.left <= sizeof(mcc_utf_t) ) {
			ret = mcc_vecsize( dstv, dstm->size + MCC_BUFSIZ, sizeof(mcc_ch8_t) );
			if ( ret != EXIT_SUCCESS ) break;
			tok.dst = mcc_iconv_tok_mem( tok.dst, dstm->addr, dstm->size );
		}
		/* _read() is not expected to overwrite all bytes */
		c = 0;
		read = _read( cb, mcc_char_info.size, 1, src );
		/* Check if any more characters are expected */
		leng = mcc_char_info.cleng(&c);
		if ( leng > 1 ) {
			size = (mcc_char_info.size * leng);
			if ( size )
				read += _read( more, mcc_char_info.size, leng - 1, src );
		}
		else {
			leng = 1;
			size = mcc_char_info.size;
		}
		if ( mcc_char_info.enc == mcc_encoding_ch8 ) {
			/* Quicken the pace */
			(void)memcpy( tok.dst.addr, cb, size );
			tok.dst.done += size;
			tok.dst.left -= size;
			tok.dst.addr += size;
		}
		else {
			/* Reset positon info like this */
			tok.src = mcc_iconv_tok_mem( nil, cb, sizeof(intmax_t) );
			ret = mcc_iconv( &tok );
		}
		ch8 = read ? *((mcc_ch8_t*)(tok.dst.addr)) : U' ';
		/* Faster than calling another function */
		switch ( ch8 ) {
		case U'\f': case U'\v':
		case U'\n': case U'\r':
		case U'\t': ch8 = U' ';
		}
		if ( read != size ) break;
		if ( ret != EXIT_SUCCESS ) break;
	}
	dstv->use = tok.dst.done / sizeof(mcc_ch8_t);
	return ret;
}

Both use the same functions and the same source string, so I know the corruption originates somewhere in this code. I figure fresh eyes might spot what I haven't, so for now I'll take a break and come back to this after some light entertainment. Any help is welcome.

Edit: I noticed I was still using an enum value to check for a space in the while condition. I corrected that, but the text was still corrupted, which is no surprise since that was only an exit condition.

**awsdert** · 10-21-2019

Turns out memcpy does not do what it says on the tin. I replaced it with this:

Code:

txt = (char*)(tok.dst.addr);
for ( byte = 0; byte < size; ++byte, ++txt ) txt[byte] = cb[byte];

and hey presto, I have correct text. Remind me never to expect common sense in the standard library.

**christop** · 10-23-2019

Originally Posted by awsdert

Turns out memcpy does not do what it says on the tin. I replaced it with this:

Code:

txt = (char*)(tok.dst.addr);
for ( byte = 0; byte < size; ++byte, ++txt ) txt[byte] = cb[byte];

and hey presto, I have correct text. Remind me never to expect common sense in the standard library.

That's likely because you used memcpy in an undefined way (e.g., source and destination overlap). I haven't looked too deeply at your code to see if that's the case, but there's an exceedingly low chance (like one in a million) that a standard library function is at fault.

**awsdert** · 10-24-2019

Originally Posted by christop

That's likely because you used memcpy in an undefined way (e.g., source and destination overlap). I haven't looked too deeply at your code to see if that's the case, but there's an exceedingly low chance (like one in a million) that a standard library function is at fault.

Nope, that scenario is already ruled out by prior code. Besides, if that had been the issue, the loop I wrote would have hit the exact same problem.

**christop** · 10-24-2019

I wouldn't say it's ruled out. Undefined behavior can show up in any of the following ways:

The program works as you expected.
The program crashes.
The program gives weird results.
Demons fly out of your nose.
Anything else.

Changing anything about the program or compiler (different compiler, different version, different options) can change the outcome.

**Hodor** · 10-24-2019

What happens if you use memmove instead of memcpy?

**awsdert** · 10-25-2019

Wouldn't make a difference, look:

Code:

int mcc___getword( void *src, MCC_CH8 *dst,
	mcc_char_info_t mcc_char_info,
	func_mcc__read _read, func_mcc__last _last )
{
	int ret;
	intmax_t c = 0;
	char *cb, *more;

cb is just a pointer to c, which is filled right before the loop it is used in, so overlap is impossible. Strangely, however, when I tried both memmove and memcpy just now they gave the expected result. I don't know how that worked out, but whatever; I'm still using my own custom function with the loop I mentioned earlier.

**Hodor** · 10-25-2019

Originally Posted by awsdert

cb is just a pointer to c, which is filled right before the loop it is used in, so overlap is impossible. Strangely, however, when I tried both memmove and memcpy just now they gave the expected result. I don't know how that worked out, but whatever; I'm still using my own custom function with the loop I mentioned earlier.

Personally I'd find that kind of worrying and it'd play tricks with my mind. Didn't work before, but works now, but don't know why? Yeah, I couldn't handle that.

I think it will be nice once this project is on github (or wherever) so that we can test things. I saw your other recent post (about number of characters) and refrained from commenting because, from the information given, it seemed next to impossible to make a rational assessment: not enough detail and no way to reproduce the error, because a) the code given was not something that could be compiled, and b) I felt the code that was given was not enough to see what was going on. It's similar to the code in this post. I don't mean to be negative, but it's next to impossible (for me anyway) to understand snippets of code that are missing details. Edit: I mean, in this example, if the crash was being caused by UB due to memcpy (overlapping memory, maybe), then using memmove would have fixed it... but now both methods work? That doesn't make sense. Something was wrong elsewhere, and in my opinion your loop is not what fixed things.

**awsdert** · 10-25-2019

Originally Posted by Hodor

Personally I'd find that kind of worrying and it'd play tricks with my mind. Didn't work before, but works now, but don't know why? Yeah, I couldn't handle that.

I found the trick to avoiding that stress is to just not concern myself with it until I find a clue as to what could've caused it. There's no point jumping down random rabbit holes, or digging new ones, without understanding where to dig them, if at all.

Originally Posted by Hodor

I think it will be nice once this project is on github (or wherever) so that we can test things.

Will probably do that Monday, so look forward to it.

(It will be marked beta, since I want to add a base62 function like before but with support for FPNs, now that I've found a YouTube video that makes it clearer how I should translate the number to bits.)

Originally Posted by Hodor

I saw your other recent post (about number of characters) and refrained from commenting because, from the information given, it seemed next to impossible to make a rational assessment: not enough detail and no way to reproduce the error, because a) the code given was not something that could be compiled, and b) I felt the code that was given was not enough to see what was going on. It's similar to the code in this post. I don't mean to be negative, but it's next to impossible (for me anyway) to understand snippets of code that are missing details.

That's fine. What I wanted was to see whether anyone spotted any problems with that code, since that was the only location I could see as problematic. The full code is a bit much to put on the board directly, so I intended to look through that myself.

Originally Posted by Hodor

Edit: I mean, in this example, if the crash was being caused by UB due to memcpy (overlapping memory, maybe), then using memmove would have fixed it... but now both methods work? That doesn't make sense. Something was wrong elsewhere, and in my opinion your loop is not what fixed things.

Well, all I know is that until I used that loop it wasn't playing ball. Perhaps I fixed the issue later, after getting it to play ball in the first place. Either way, I needed to see whether the text was being copied properly before I could experiment elsewhere to catch as-yet-uncaught scenarios. Scenarios beyond what I can predict before uploading to GitHub will just have to wait until someone spots them and either fixes them or posts the issue on GitHub.

**awsdert** · 10-25-2019

Well, you can scrap Monday, because I apparently had the motivation to upload everything tonight despite the fact I should be in bed now. Go figure.

GitHub - awsdert/mitsy: MIT licensed C compiler

**stahta01** · 10-25-2019

I suggest building a BusyBox-like thing before trying a full OS kernel compile.
BusyBox

Tim S.

**awsdert** · 10-26-2019

Originally Posted by stahta01

I suggest building a BusyBox-like thing before trying a full OS kernel compile.
BusyBox

Tim S.

That would fall under the 11th target, "Compile various projects relying on GCC extensions", I would expect. I had no expectation of doing that before the end of next year anyway (I'm a lazy programmer). Thanks for the link anyway; I can focus on that one as the first project I attempt to compile once I'm done with the pre-compiler end, since it should give me a great wealth of errors to fix.

**Hodor** · 10-27-2019

make word.run doesn't work for me (3 errors in mcc_get.c)

Code:

$ make word.run
cc -fPIC -Wall -Wno-multichar  -shared -o ./libmcc_mem.so -c mcc_mem.c
cc -fPIC -Wall -Wno-multichar  -shared -o ./libmcc_get.so -c mcc_get.c
In file included from mcc_get.c:1:
mcc_get.h:243:1: warning: no semicolon at end of struct or union
  243 | } MCC_NUM;
      | ^
mcc_get.c: In function ‘mcc_getnum’:
mcc_get.c:979:25: error: ‘MCC_CH8’ {aka ‘struct mcc_pos’} has no member named ‘use’
  979 |  if ( !src || !src->text.use ) return ENODATA;
      |                         ^
mcc_get.c:1013:6: error: ‘dts’ undeclared (first use in this function); did you mean ‘dst’?
 1013 |      dts->base = 8;
      |      ^~~
      |      dst
mcc_get.c:1013:6: note: each undeclared identifier is reported only once for each function it appears in
mcc_get.c:1058:9: error: ‘num’ undeclared (first use in this function)
 1058 |  *dst = num;
      |         ^~~
mcc_get.c:1057:2: warning: label ‘mcc_getnum_done’ defined but not used [-Wunused-label]
 1057 |  mcc_getnum_done:
      |  ^~~~~~~~~~~~~~~
mcc_get.c:975:42: warning: unused variable ‘type’ [-Wunused-variable]
  975 |  int ret = EXIT_SUCCESS, l = 10, h = 10, type, c;
      |                                          ^~~~
make: *** [makefile:93: libmcc_get.so] Error 1
rm libmcc_mem.so

**awsdert** · 10-28-2019

Ah, thanks. It seems I just made a typo and forgot to remove a statement after changing midway how I wanted to fill dst. That's what I get for programming late at night after a long day and an early morning. I was considering working on it now anyway, so that makes a good starting point.

**Hodor** · 10-28-2019

Well, I got it to compile by checking out a previous git revision. I can't say I really understand what's going on, though; I find it pretty complicated. I do know where the problem is, though. But... hmm, yeah. Complicated.

(But, yes, it's reading and writing out of bounds)
