Thread: Corrupted text

  1. #1
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733

    Corrupted text

    Some of you will remember that I said some days ago I would move on to producing FPNs from text. For that I occasionally needed to backtrack the position of the source data with the method I had originally chosen, but on stdin I ended up with errors. They may have been the result of faulty code elsewhere, but to make sure they didn't come from my library code I decided it was high time I cleaned it up and made it more suitable for sharing. I got most of that done, but then hit a glitch somewhere in the backend function I'm working on now. The glitch resulted in output like this:
    Code:
    make word.run
    cc -fPIC -Wall -Wno-multichar  -shared -o ./libmcc_mem.so -c mcc_mem.c
    cc -fPIC -Wall -Wno-multichar  -shared -o ./libmcc_get.so -c mcc_get.c
    cc -fPIC -Wall -Wno-multichar  -shared -o ./libmcc_tsc.so -c mcc_tsc.c
    cc -fPIC -Wall -Wno-multichar  -shared -o ./libmcc_base62.so -c mcc_base62.c
    cc -fPIC -Wall -Wno-multichar  -D OUT=word.elf -o ./word.elf word.c ./libmcc_mem.so ./libmcc_get.so ./libmcc_tsc.so ./libmcc_base62.so
    ./word.elf
    'Text': 'Testing,' ' testing' ', 123'
    'Text': 'Te' 'st' 'in' 'g,' ' t' 'es' 'ti' 'ng' ', ' '12' '3'
    'Word/s': '������'
    'Word/s': 'T'
    rm libmcc_get.so word.elf libmcc_tsc.so libmcc_mem.so libmcc_base62.so
    Compilation finished successfully.
    Using this code:
    Code:
    int mcc___getword( void *src, MCC_CH8 *dst,
    	mcc_char_info_t mcc_char_info,
    	func_mcc__read _read, func_mcc__last _last )
    {
    	int ret;
    	intmax_t c = 0;
    	char *cb, *more;
    	mcc_ch8_t ch8;
    	MCC_VEC *dstv;
    	MCC_MEM *dstm;
    	MCC_ICONV_TOK tok;
    	MCC_ICONV_MEM nil = {0};
    	size_t read, size;
    	long leng;
    	if ( !dst ) return EDESTADDRREQ;
    	dstv = &(dst->vec);
    	dstm = &(dstv->mem);
    	if ( !_read || !_last ) return EDESTADDRREQ;
    	if ( !src || _last(src) )
    		return ENODATA;
    	if ( (ret = mcc_char_info_test(mcc_char_info)) != EXIT_SUCCESS )
    		return ret;
    	ch8 = 0;
    	cb = (char*)&c;
    	more = &cb[mcc_char_info.size];
    	tok = mcc_iconv_new_tok( mcc_char_info.ch8dst );
    	/* Permits appending to existing text */
    	tok.dst.done = dstv->use * sizeof(mcc_ch8_t);
    	tok.dst = mcc_iconv_tok_mem( tok.dst, dstm->addr, dstm->size );
    	while ( ch8 != U' ' ) {
    		/* Avoid the need to reset position by allocating memory 1st */
    		if ( tok.dst.left <= sizeof(mcc_utf_t) ) {
    			ret = mcc_vecsize( dstv, dstm->size + MCC_BUFSIZ, sizeof(mcc_ch8_t) );
    			if ( ret != EXIT_SUCCESS ) break;
    			tok.dst = mcc_iconv_tok_mem( tok.dst, dstm->addr, dstm->size );
    		}
    		/* _read() is not expected to overwrite all bytes */
    		c = 0;
    		read = _read( cb, mcc_char_info.size, 1, src );
    		/* Check if any more characters are expected */
    		leng = mcc_char_info.cleng(&c);
    		if ( leng > 1 ) {
    			size = (mcc_char_info.size * leng);
    			if ( size )
    				read += _read( more, mcc_char_info.size, leng - 1, src );
    		}
    		else {
    			leng = 1;
    			size = mcc_char_info.size;
    		}
    		if ( mcc_char_info.enc == mcc_encoding_ch8 ) {
    			/* Quicken the pace */
    			(void)memcpy( tok.dst.addr, cb, size );
    			tok.dst.done += size;
    			tok.dst.left -= size;
    			tok.dst.addr += size;
    		}
    		else {
    			/* Reset positon info like this */
    			tok.src = mcc_iconv_tok_mem( nil, cb, sizeof(intmax_t) );
    			ret = mcc_iconv( &tok );
    		}
    		ch8 = read ? *((mcc_ch8_t*)(tok.dst.addr)) : U' ';
    		/* Faster than calling another function */
    		switch ( ch8 ) {
    		case U'\f': case U'\v':
    		case U'\n': case U'\r':
    		case U'\t': ch8 = U' ';
    		}
    		if ( read != size ) break;
    		if ( ret != EXIT_SUCCESS ) break;
    	}
    	dstv->use = tok.dst.done / sizeof(mcc_ch8_t);
    	return ret;
    }
    Both calls use the same functions and source string, so I know the source of the corruption lies somewhere in this code. I figure fresh eyes might spot what I haven't, so for now I'll take a break and come back to this after some light entertainment. Any help will be welcome.

    Edit: Noticed I was still using an enum value to check for space in the while condition. Corrected that, but the text was still corrupted, which is no surprise as that was only an exit condition.
    Last edited by awsdert; 10-21-2019 at 12:59 PM.

  2. #2
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Turns out memcpy does not do what it says on the tin, replaced it with this:
    Code:
    txt = (char*)(tok.dst.addr);
    for ( byte = 0; byte < size; ++byte, ++txt ) txt[byte] = cb[byte];
    and hey presto I have correct text, remind me never to expect common sense in the standard library
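A note on the replacement loop as posted: it advances both the index `byte` and the pointer `txt` on every iteration, so byte i of the source is stored at offset 2*i from the destination rather than at offset i. The sketch below contrasts that shape with a plain indexed copy. `copy_bytes` and `copy_bytes_strided` are hypothetical names used only for illustration; they are not functions from the mcc library.

```c
#include <stddef.h>

/* Hypothetical helper: copy `size` bytes one at a time.
 * Only the index advances, so src[i] lands on dst[i]. */
static void copy_bytes(char *dst, const char *src, size_t size)
{
	size_t i;
	for (i = 0; i < size; ++i)
		dst[i] = src[i];
}

/* Same loop shape as the one in the post: the index AND the
 * destination pointer both advance, so src[i] lands on dst[2*i].
 * The caller must provide at least 2*size - 1 bytes at dst. */
static void copy_bytes_strided(char *dst, const char *src, size_t size)
{
	size_t byte;
	char *txt = dst;
	for (byte = 0; byte < size; ++byte, ++txt)
		txt[byte] = src[byte];
}
```

With a 3-byte source the strided version writes offsets 0, 2 and 4 and leaves the bytes in between untouched, which is worth keeping in mind when judging whether the loop really behaves like memcpy.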

  3. #3
    Registered User
    Join Date
    May 2012
    Location
    Arizona, USA
    Posts
    945
    Quote Originally Posted by awsdert View Post
    Turns out memcpy does not do what it says on the tin, replaced it with this:
    Code:
    txt = (char*)(tok.dst.addr);
    for ( byte = 0; byte < size; ++byte, ++txt ) txt[byte] = cb[byte];
    and hey presto I have correct text, remind me never to expect common sense in the standard library
    That's likely because you used memcpy in an undefined way (e.g., source and destination overlap). I haven't looked too deeply at your code to see if that's the case, but there's an exceedingly low chance (like one in a million) that a standard library function is at fault.
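For readers unfamiliar with the overlap rule mentioned here: the C standard leaves memcpy undefined when the source and destination regions overlap, while memmove is specified to handle that case. A minimal sketch, using a hypothetical helper `shift_right_one` that is not taken from the thread's code:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: shift the first n bytes of buf one position
 * to the right. The source [buf, buf+n) and the destination
 * [buf+1, buf+1+n) overlap, so memmove is required here; calling
 * memcpy with these arguments would be undefined behavior.
 * The caller must make sure buf holds at least n + 1 bytes. */
static void shift_right_one(char *buf, size_t n)
{
	memmove(buf + 1, buf, n);
}
```

Whether the code in this thread actually overlaps is a separate question; the sketch only shows the situation the rule is about.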

  4. #4
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Quote Originally Posted by christop View Post
    That's likely because you used memcpy in an undefined way (e.g., source and destination overlap). I haven't looked too deeply at your code to see if that's the case, but there's an exceedingly low chance (like one in a million) that a standard library function is at fault.
    Nope, that scenario is already ruled out by prior code. Besides, if that had been the issue then the loop I made would have run into the exact same problem.

  5. #5
    Registered User
    Join Date
    May 2012
    Location
    Arizona, USA
    Posts
    945
    I wouldn't say it's ruled out. Undefined behavior can show up in any of the following ways:

    • The program works as you expected.
    • The program crashes.
    • The program gives weird results.
    • Demons fly out of your nose.
    • Anything else.

    Changing anything about the program or compiler (different compiler, different version, different options) can change the outcome.

  6. #6
    misoturbutc Hodor's Avatar
    Join Date
    Nov 2013
    Posts
    1,791
    What happens if you use memmove instead of memcpy?

  7. #7
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Wouldn't make a difference, look:
    Code:
    int mcc___getword( void *src, MCC_CH8 *dst,
    	mcc_char_info_t mcc_char_info,
    	func_mcc__read _read, func_mcc__last _last )
    {
    	int ret;
    	intmax_t c = 0;
    	char *cb, *more;
    cb is just a pointer to c, and c is filled right before the copy inside the loop, so it is impossible for the two to overlap. Strangely, however, when I tried both memmove and memcpy just now they gave the expected result. Dunno how that worked out, but whatever, I'm still using my own custom function with the loop I mentioned earlier.

  8. #8
    misoturbutc Hodor's Avatar
    Join Date
    Nov 2013
    Posts
    1,791
    Quote Originally Posted by awsdert View Post
    cb is just a pointer to c, and c is filled right before the copy inside the loop, so it is impossible for the two to overlap. Strangely, however, when I tried both memmove and memcpy just now they gave the expected result. Dunno how that worked out, but whatever, I'm still using my own custom function with the loop I mentioned earlier.
    Personally I'd find that kind of worrying and it'd play tricks with my mind. Didn't work before, but works now, but don't know why? Yeah, I couldn't handle that.

    I think it will be nice once this project is on github (or wherever) so that we can test things. I saw your other recent post (about number of characters) and I refrained from commenting because from the information given it seemed next to impossible to make a rational assessment -- not enough detail and no way to reproduce the error because: a) the code given was not something that could be compiled; and b) I felt that the code that was given was not enough to see what was going on. It's similar to the code in this post. I don't mean to be negative, but it's next to impossible (for me anyway) to understand snippets of code that are missing details. Edit: I mean, in this example, if the crash was being caused by UB due to memcpy (overlapping memory maybe) then using memmove would have fixed it... but now both methods work? That doesn't make sense; there was something wrong elsewhere and your loop is, in my opinion, not what fixed things.
    Last edited by Hodor; 10-25-2019 at 03:18 AM.

  9. #9
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Quote Originally Posted by Hodor View Post
    Personally I'd find that kind of worrying and it'd play tricks with my mind. Didn't work before, but works now, but don't know why? Yeah, I couldn't handle that.
    I found the trick to avoiding that stress is to just not concern myself with it until I find a clue as to what could've caused it. No point jumping down random rabbit holes and digging new ones without understanding where to dig, if at all.

    Quote Originally Posted by Hodor View Post
    I think it will be nice once this project is on github (or wherever) so that we can test things.
    Will probably do that Monday, so look forward to it (it will be marked beta since I want to add a base62 function like before, but with support for FPNs now that I've found a YouTube video that makes it clearer how I should translate the number to bits)
    Quote Originally Posted by Hodor View Post
    I saw your other recent post (about number of characters) and I refrained from commenting because from the information given it seemed next to impossible to make a rational assessment -- not enough detail and no way to reproduce the error because: a) the code given was not something that could be compiled; and b) I felt that the code that was given was not enough to see what was going on. It's similar to the code in this post. I don't mean to be negative, but it's next to impossible (for me anyway) to understand snippets of code that are missing details.
    That's fine. I wanted to see if anyone saw any problems with that code, since that was the only location I could see as problematic. The full code is a bit much to be putting on the board directly, so I intended to look through that myself.
    Quote Originally Posted by Hodor View Post
    Edit: I mean, in this example, if the crash was being caused by UB due to memcpy (overlapping memory maybe) then using memmove would have fixed it... but now both methods work? That doesn't make sense; there was something wrong elsewhere and your loop is, in my opinion, not what fixed things.
    Well, all I know is that until I used that loop it wasn't playing ball. Perhaps I fixed the real issue later, after getting it to play ball in the 1st place. Either way, I needed to see that the text was being copied properly before I could experiment elsewhere to catch as yet uncaught scenarios. Scenarios beyond what I can predict before uploading to github will just have to wait until someone spots them and either fixes them or posts the issue on github.

  10. #10
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Well, you can scrap Monday 'cause I apparently had the motivation to upload everything tonight despite the fact I should be in bed now, go figure
    GitHub - awsdert/mitsy: MIT licensed C compiler

  11. #11
    Registered User
    Join Date
    May 2009
    Posts
    4,183
    I suggest building a busybox like thing before trying a full OS kernel compile.
    BusyBox

    Tim S.
    "...a computer is a stupid machine with the ability to do incredibly smart things, while computer programmers are smart people with the ability to do incredibly stupid things. They are, in short, a perfect match.." Bill Bryson

  12. #12
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Quote Originally Posted by stahta01 View Post
    I suggest building a busybox like thing before trying a full OS kernel compile.
    BusyBox

    Tim S.
    That would fall under the 11th target, "Compile various projects relying on GCC extensions", I would expect. I had no expectation of doing that before next year's end anyway (I'm a lazy programmer). Thanks for the link though; I can make that the 1st project I attempt to compile once I'm done with the pre-compiler end, since that should give me a great wealth of errors to fix.

  13. #13
    misoturbutc Hodor's Avatar
    Join Date
    Nov 2013
    Posts
    1,791
    make word.run doesn't work for me (3 errors in mcc_get.c)

    Code:
    $ make word.run
    cc -fPIC -Wall -Wno-multichar  -shared -o ./libmcc_mem.so -c mcc_mem.c
    cc -fPIC -Wall -Wno-multichar  -shared -o ./libmcc_get.so -c mcc_get.c
    In file included from mcc_get.c:1:
    mcc_get.h:243:1: warning: no semicolon at end of struct or union
      243 | } MCC_NUM;
          | ^
    mcc_get.c: In function ‘mcc_getnum’:
    mcc_get.c:979:25: error: ‘MCC_CH8’ {aka ‘struct mcc_pos’} has no member named ‘use’
      979 |  if ( !src || !src->text.use ) return ENODATA;
          |                         ^
    mcc_get.c:1013:6: error: ‘dts’ undeclared (first use in this function); did you mean ‘dst’?
     1013 |      dts->base = 8;
          |      ^~~
          |      dst
    mcc_get.c:1013:6: note: each undeclared identifier is reported only once for each function it appears in
    mcc_get.c:1058:9: error: ‘num’ undeclared (first use in this function)
     1058 |  *dst = num;
          |         ^~~
    mcc_get.c:1057:2: warning: label ‘mcc_getnum_done’ defined but not used [-Wunused-label]
     1057 |  mcc_getnum_done:
          |  ^~~~~~~~~~~~~~~
    mcc_get.c:975:42: warning: unused variable ‘type’ [-Wunused-variable]
      975 |  int ret = EXIT_SUCCESS, l = 10, h = 10, type, c;
          |                                          ^~~~
    make: *** [makefile:93: libmcc_get.so] Error 1
    rm libmcc_mem.so

  14. #14
    Registered User awsdert's Avatar
    Join Date
    Jan 2015
    Posts
    1,733
    Ah thanks, seems I just made a typo and forgot to remove a statement after changing midway how I wanted to fill dst. It's what I get for programming late at night after a long day and an early morning. I was considering working on it now anyway, so this makes a good starting point.

  15. #15
    misoturbutc Hodor's Avatar
    Join Date
    Nov 2013
    Posts
    1,791
    Well, I got it to compile by checking out a previous git revision. I can't say I really understand what's going on though; I find it pretty complicated. I do know where the problem is though. But... hmm, yeah. Complicated. (But, yes, it's reading and writing out of bounds.)
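The post does not say which line is at fault. One candidate that is visible in the snippet from post #1, assuming it matches the repository code: in the mcc_encoding_ch8 branch, tok.dst.addr is advanced by size right after the copy, and the next statement reads ch8 from *tok.dst.addr, which is the slot after the bytes just copied and has not been written yet. The sketch below shows the usual way to avoid that pattern by remembering where the bytes were written before moving the cursor. `struct cursor` and `append_and_peek` are hypothetical stand-ins, not mcc types or functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a write cursor such as tok.dst. */
struct cursor {
	char  *addr; /* next byte to write */
	size_t done; /* bytes written so far */
	size_t left; /* bytes still free */
};

/* Append `size` bytes and return the first byte that was just
 * written, or -1 if size is zero or there is not enough room.
 * The write position is saved BEFORE the cursor advances, so the
 * peek never touches the unwritten slot that follows the data. */
static int append_and_peek(struct cursor *cur, const char *src, size_t size)
{
	char *written = cur->addr;
	if (size == 0 || size > cur->left)
		return -1;
	memcpy(written, src, size);
	cur->addr += size;
	cur->done += size;
	cur->left -= size;
	return (unsigned char)written[0];
}
```

Reading through the saved pointer instead of the advanced one also keeps the bounds check in a single place, which makes out-of-bounds reads easier to rule out.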
