Thread: entab---spacing problems

  1. #1
    dgoodmaniii

    entab---spacing problems

    K&R, Exercise 1-21. I've looked up a lot of solutions to this problem to study (such as here), and they only seem to substitute 8 consecutive space characters with tabs. However, I don't really think that's what this question is asking; the text is "[w]rite a program entab that replaces strings of blanks by the minimum number of tabs and blanks to achieve the same spacing." That's not just replacing 8 blanks with a tab; that's replacing, say, four blanks with a tab if there are only four blanks to the next tab stop.

    So I've been struggling mightily to do that, and it's just not working for me. I'm even occasionally getting a segfault in apparently unreproducible circumstances. I have no idea what's wrong here; anyone have any pointers? Here's the code; gcc 4.3.2 on Debian Lenny.
    Code:
    #include<stdio.h>
    #include<string.h>
    
    #define MAXLINE 10000
    #define TABSTOP 8
    
    int getline(char s[], int lim);
    
    int main(void)
    {
    	int i,j,k;
    	char string[MAXLINE];
    	int lastlet;
    
    	while(getline(string, MAXLINE) > 0) {
    		for(i=0,j=0; string[i] != '\0'; ++i) { 
    			if (string[i] == '_') {
    				if (string[i-1] != '_' && string[i-1] != '\t')
    					lastlet = i;
    				if ((i % TABSTOP) == 0) {
    					string[lastlet] = '\t';
    					for (j=lastlet+1; j<=strlen(string); ++j)
    						string[j] = string[i++];
    					i = lastlet;
    				}
    			}
    		}
    		printf("%s",string);
    	}
    	return 0;
    }
    
    int getline(char s[], int lim)
    {
    	int c, i;
    
    	for (i=0; i<lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
    		s[i] = c;
    	if (c == '\n') 
    		s[i++] = c;
    	s[i] = '\0';
    	return i;
    }

  2. #2
    quzah
    Count up the number of spaces. Divide by SPACES_PER_TAB. That's how many tabs you need. Mod by SPACES_PER_TAB. That's how many spaces you need.
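A minimal sketch of quzah's divide-and-mod rule in C. It gives the right spacing only when the run of blanks starts on a tab stop (column 0, 8, 16, ... with 8-column stops); the function name `split_at_stop` and its parameters are illustrative, not taken from the thread.

```c
#include <assert.h>

/* Hypothetical helper: split a run of n blanks that STARTS on a tab
 * stop into whole tabs plus leftover blanks (divide, then mod).
 * Only valid when the run begins at a multiple of tabstop; otherwise
 * the first tab covers fewer than tabstop columns. */
void split_at_stop(int n, int tabstop, int *tabs, int *blanks)
{
    assert(n >= 0 && tabstop > 0);
    *tabs = n / tabstop;    /* each tab covers one whole field */
    *blanks = n % tabstop;  /* the remainder stays as blanks */
}
```

For ten blanks starting at column 0 with 8-column stops, `split_at_stop(10, 8, &t, &b)` gives one tab and two blanks.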


    Quzah.
    Hope is the first step on the road to disappointment.

  3. #3
    dgoodmaniii
    Quote Originally Posted by quzah View Post
    Count up the number of spaces. Divide by SPACES_PER_TAB. That's how many tabs you need. Mod by SPACES_PER_TAB. That's how many spaces you need.


    Quzah.
    Right; that part's easy. But I don't think that's really what the exercise is asking for, is it? Assuming an eight-space tab, that solution fills up ten spaces with a tab and two spaces, which is fine. But the exercise is asking for something different, it seems to me. Picture this:
    Code:
            |       |       |       |       |
    Now_________is_______________the______time
    So between the "Now" and the "is" there are nine spaces; but to achieve the same spacing, I don't need one tab and one space, but rather one tab (which brings me to the first tab stop, at column 8 counting from 0) and four spaces (which bring me to "is", at column 12). That's what I think the exercise is asking for.
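A sketch of the column-aware version, assuming 0-based columns and a stop every `tabstop` columns; the helper name `split_run` is hypothetical. A run of `n` blanks that starts at column `col` must end at column `col + n`. Each tab jumps to the next stop as long as that stop does not pass the end of the run, and blanks fill whatever is left.

```c
#include <assert.h>

/* Hypothetical helper: how many tabs and blanks reproduce a run of n
 * blanks that starts at 0-based column col. Emits a tab for every tab
 * stop reached within the run, then blanks for the remainder. When a
 * single blank would reach a stop, this version prefers the tab. */
void split_run(int col, int n, int tabstop, int *tabs, int *blanks)
{
    int end = col + n;  /* column of the next non-blank character */

    assert(col >= 0 && n >= 0 && tabstop > 0);
    *tabs = 0;
    for (;;) {
        int next = col - col % tabstop + tabstop;  /* next stop after col */
        if (next > end)
            break;          /* a tab here would overshoot the run */
        ++*tabs;
        col = next;
    }
    *blanks = end - col;
}
```

The blanks after "Now" start at column 3, so `split_run(3, 9, 8, &t, &b)` yields one tab and four blanks. K&R's exercise explicitly asks the reader to decide whether a tab or a single blank should be preferred when either would reach a stop; this sketch picks the tab.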

    I've beaten myself over the head for hours trying to make that work; the code I posted above is the best I could do. But it only works for the first tab, then breaks down. Any ideas?

  4. #4
    itCbitC
    One way to go about this is to look at each input line as made up of distinct fields, each 8 characters wide. Within each field, only the trailing spaces get converted to a tab, not leading or embedded ones; otherwise the output gets botched.
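A minimal sketch of the fixed-field idea, processing one line at a time. It assumes the line contains no tabs of its own (so an index into it is also its display column) and takes a caller-chosen `blank` character so the thread's '_' convention can be tested directly. `entab_fields` is a hypothetical name. Only a complete field ends on a tab stop, so a short final field is copied unchanged.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fixed-field entab. The line is cut into fields of
 * `tabstop` characters; in every complete field, the trailing blanks
 * (if any) are replaced by one '\t'. Assumes `in` holds no tabs, so an
 * index into it is also a display column, and that `out` is a separate
 * buffer of at least strlen(in) + 1 bytes (the output never grows). */
void entab_fields(const char *in, char *out, char blank, int tabstop)
{
    size_t len = strlen(in);
    size_t pos = 0;   /* first column of the current field */
    size_t o = 0;     /* next free slot in out[] */

    assert(tabstop > 0);
    while (pos < len) {
        size_t width = len - pos;
        size_t keep;

        if (width > (size_t)tabstop)
            width = (size_t)tabstop;
        keep = width;
        if (width == (size_t)tabstop) {     /* only full fields end on a stop */
            while (keep > 0 && in[pos + keep - 1] == blank)
                --keep;                     /* drop the trailing blanks */
        }
        memcpy(out + o, in + pos, keep);
        o += keep;
        if (keep < width)                   /* blanks were dropped */
            out[o++] = '\t';
        pos += width;
    }
    out[o] = '\0';
}
```

On the sample line posted later in the thread, `Now_________is______the___time_____________for`, this produces `Now\t____is\t____the\t__time\t\t___for` (with `\t` standing for the tabs).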

  5. #5
    dgoodmaniii
    Quote Originally Posted by itCbitC View Post
    One way to go about this is to look at each input line as made up of distinct fields, each 8 characters wide. Within each field, only the trailing spaces get converted to a tab, not leading or embedded ones; otherwise the output gets botched.
    I'm sure that'll do it. Thanks for the conceptual leap.

    Instead, I wrote exercise 1-22, which, despite being ostensibly more complicated, I was able to do rather quickly. C is awesome.

  6. #6
    dgoodmaniii
    Quote Originally Posted by itCbitC View Post
    One way to go about this is to look at each input line as made up of distinct fields, each 8 characters wide. Within each field, only the trailing spaces get converted to a tab, not leading or embedded ones; otherwise the output gets botched.
    For what it's worth, here's the code I came up with based on the "8 character field" concept. For some reason it still seems to be a space off, and I can't figure that out. But the concept, I'm confident, is correct. Any ideas why I might still be off?

    Code:
    #include<stdio.h>
    #include<string.h>
    
    #define MAXLINE 10000
    #define TABSTOP 8
    
    int getline(char s[], int lim);
    int tabreplace(char s[], int tabspot);
    
    int main(void)
    {
    	int i,j,k;
    	char string[MAXLINE];
    
    	while(getline(string, MAXLINE) > 0) {
    		for(i=0; i<=strlen(string); i+=TABSTOP) {
    			i -= tabreplace(string,i);
    		}
    		printf("%s",string);
    	}
    	return 0;
    }
    
    int tabreplace(char s[], int tabspot)
    {
    	int i;
    	int skipback;
    
    	if (s[tabspot] == '_') {
    		for (i=tabspot; s[i]=='_' && i>(tabspot-TABSTOP); --i);
    		s[i+1] = '\t';
    		skipback = tabspot - (i+2);
    		for (i=i+2; i<strlen(s); ++i)
    			s[i] = s[i+skipback];
    		printf("returning %d\n",skipback);
    		return skipback+1;
    	}
    	return 0;
    }
    
    int getline(char s[], int lim)
    {
    	int c, i;
    
    	for (i=0; i<lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
    		s[i] = c;
    	if (c == '\n') 
    		s[i++] = c;
    	s[i] = '\0';
    	return i;
    }

  7. #7
    itCbitC
    Can you add some explanation of your code, through inline comments etc.? It is hard to understand what is going on.
    With getline() you're restricted to line lengths of MAXLINE; a simple getchar() loop that runs until EOF works much better.
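A sketch of the character-at-a-time approach, assuming the input has no tabs of its own. It calls `getc` on a `FILE *` rather than `getchar()` directly so the same function can read stdin or a file; `entab_stream` and its parameters are hypothetical. Only a column number and a count of held-back blanks are stored, so no line-length limit applies. It also avoids the name `getline`: POSIX.1-2008 added a `getline()` to `<stdio.h>`, and current glibc declares it by default, so K&R's `int getline(char s[], int lim)` can fail to compile there with a "conflicting types" error.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical streaming entab. `col` is the display column of the
 * next character and `pending` counts blanks read but not yet written.
 * A run of blanks that reaches a tab stop becomes one '\t' (even a
 * single blank); a run that ends short of a stop is written as blanks.
 * Assumes the input contains no tabs of its own. */
void entab_stream(FILE *in, FILE *out, int blank, int tabstop)
{
    int c;
    int col = 0;
    int pending = 0;

    assert(tabstop > 0);
    while ((c = getc(in)) != EOF) {
        if (c == blank) {
            ++pending;
            if (++col % tabstop == 0) {   /* run reached a tab stop */
                putc('\t', out);
                pending = 0;
            }
            continue;
        }
        for (; pending > 0; --pending)    /* run ended before a stop */
            putc(blank, out);
        putc(c, out);
        col = (c == '\n') ? 0 : col + 1;  /* a newline restarts the columns */
    }
    for (; pending > 0; --pending)        /* blanks just before EOF */
        putc(blank, out);
}
```

`entab_stream(stdin, stdout, ' ', 8)` would filter real blanks; passing `'_'` instead reproduces the thread's visible-blank convention.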

  8. #8
    Dino
    And please provide a sample input, the current output (in error) and the output as you expect it to be.
    Mainframe assembler programmer by trade. C coder when I can.

  9. #9
    quzah
    Shouldn't the tab spot be where x % TABSTOP == 0? If you start counting from 0 in an array, 0 ... TS-1 is the first tab block, and TS ... TS*2-1 is the second block.
    Code:
    abcdabcdabcdabcd
    ^   ^   ^   ^   ^
    Let's assume 4, because I don't feel like typing out 8.
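quzah's indexing, sketched in C with his 4-column example. With 0-based columns, block k covers columns k*TS through (k+1)*TS - 1, so a column where `col % TS == 0` is the first column of a block, and the last column before that stop is `col - 1`. Both helper names are hypothetical.

```c
#include <assert.h>

/* Which tab block (0-based) a 0-based column falls in. */
int block_of(int col, int tabstop)
{
    assert(col >= 0 && tabstop > 0);
    return col / tabstop;
}

/* Nonzero if col is a tab stop, i.e. the FIRST column of a block. */
int is_tab_stop(int col, int tabstop)
{
    assert(col >= 0 && tabstop > 0);
    return col % tabstop == 0;
}
```

With a width of 4, columns 0 to 3 are block 0 and column 4 starts block 1, matching the carets under `abcdabcdabcdabcd`.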


    Quzah.

  10. #10
    dgoodmaniii
    Quote Originally Posted by itCbitC View Post
    Can you add some explanation of your code, through inline comments etc.? It is hard to understand what is going on.
    With getline() you're restricted to line lengths of MAXLINE; a simple getchar() loop that runs until EOF works much better.
    Forgive me, but you've got to have some bounds on the string. I suppose I could read it in and print it a character at a time, but since I'm reading it into a string, whether I do it by line or by getchar() terminating on EOF, I've got to keep a limit on it in some way, don't I?

    I try to follow the Linus Torvalds coding rules, but my comments are still too sparse. Here's a play-by-play of what I'm going for:
    Code:
    #include<stdio.h>
    #include<string.h>
    
    #define MAXLINE 10000
    #define TABSTOP 8
    
    int getline(char s[], int lim);
    int tabreplace(char s[], int tabspot);
    
    int main(void)
    {
    	int i,j,k;
    	char string[MAXLINE];
    
    	while(getline(string, MAXLINE) > 0) {
    		for(i=0; i<=strlen(string); i+=TABSTOP) {
    			i -= tabreplace(string,i);
    		}
    		printf("%s",string);
    	}
    	return 0;
    }
    This, of course, is pretty simple. While there's input from getline, execute the for loop. The loop starts at zero and increments by TABSTOP every cycle, until the index is greater than the length of the string it's processing. For each multiple of TABSTOP, call the function tabreplace(); tabreplace() returns the number of spaces replaced by the tab, so while we're at it we set the loop index back that number of spaces. When we're done, we print the string, then wait for a new line.

    Code:
    int tabreplace(char s[], int tabspot)
    {
    	int i;
    	int skipback;
    
    	if (s[tabspot] == '_') {
    Clear enough; if the character at TABSTOP, or a multiple thereof, is '_', execute the following code.
    Code:
    		for (i=tabspot; s[i]=='_' && i>(tabspot-TABSTOP); --i);
    Determines the number of spaces before tabspot that need to be replaced. Only trailing spaces need to be replaced, not leading ones, so it stops when it hits either a non-'_' or the previous tab stop. This means the index ends up one position before where I need to insert the tab, so:
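A sketch of the same backward scan with the field boundaries written out, assuming 0-based columns as in quzah's post: the field that ends at stop `stop` is `[stop - tabstop, stop)`, so the scan starts at `stop - 1`. The loop above starts at `s[tabspot]`, which under that counting is the first column of the next field. `trailing_blanks` is a hypothetical name.

```c
#include <assert.h>

/* Hypothetical helper: count the blanks immediately before tab stop
 * `stop`, looking only inside the field [stop - tabstop, stop).
 * s[stop] belongs to the NEXT field and is not examined. Assumes
 * stop is a positive multiple of tabstop and s has at least `stop`
 * characters. */
int trailing_blanks(const char *s, int stop, int tabstop, char blank)
{
    int first = stop - tabstop;   /* first column of the field */
    int i = stop - 1;             /* last column of the field */

    assert(tabstop > 0 && stop >= tabstop && stop % tabstop == 0);
    while (i >= first && s[i] == blank)
        --i;
    return stop - 1 - i;          /* between 0 and tabstop */
}
```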
    Code:
    		s[i+1] = '\t';
    I insert the tab.
    Code:
    		skipback = tabspot - (i+2);
    		for (i=i+2; i<strlen(s); ++i)
    			s[i] = s[i+skipback];
    		printf("returning %d\n",skipback);
    Then I set skipback equal to the number of spaces I replaced with the '\t'. It looks worse than it is: i is the last non-space character and i+1 now holds the '\t', so skipback is the distance from the slot just after the tab (i+2) to tabspot.

    I then run a loop to move the entire string back by skipback, one character at a time, to ensure that not only is a tab inserted but that the spaces are actually removed.
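For comparison, a sketch that does the "replace the run with a tab and shift the rest left" step with a single `memmove`, which is defined for overlapping source and destination and also carries the terminating '\0' along. `collapse_to_tab` is a hypothetical helper, not the thread's code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: replace the `len` characters starting at
 * s[start] with one '\t' and shift the rest of the string (including
 * its '\0') left in one memmove call. Assumes len >= 1 and
 * start + len <= strlen(s). Returns how many characters were removed. */
size_t collapse_to_tab(char *s, size_t start, size_t len)
{
    size_t total = strlen(s);

    assert(len >= 1 && start + len <= total);
    s[start] = '\t';
    /* move s[start + len .. total] (the tail plus '\0') to s[start + 1] */
    memmove(s + start + 1, s + start + len, total - (start + len) + 1);
    return len - 1;
}
```

Because the return value is exactly how much the string shrank, a caller walking the string can subtract it from its index, which is the bookkeeping the thread's `skipback` is trying to do.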

    The printf was a debugging statement to make sure that I was returning the correct value.
    Code:
    		return skipback+1;
    	}
    Then I return the number of spaces I replaced (plus one for the '\t' I inserted).

    I'm using the underscore for spaces so that I can visually verify what's been replaced and what hasn't.

    I've skipped getline(), as its function seems obvious. It gets a line, returns the length of said line.

    So do you all think I'm at least on the right track?

  11. #11
    dgoodmaniii
    Quote Originally Posted by Dino View Post
    And please provide a sample input, the current output (in error) and the output as you expect it to be.
    Sample input:
    Code:
    	|	|	|	|	|
    Now_________is______the___time_____________for
    Expected output:
    Code:
    Now	____is	____the	__time		___for
    Actual output:
    Code:
    Now	____is______the	_time_	________for

  12. #12
    dgoodmaniii
    I had a sudden epiphany while banging away at this; if I'd stopped to think longer about it before trying to write it, I wouldn't have had this problem. Here's the deal: inserting the tab into the string changed the length of the string, because I was removing a variable number of spaces and then inserting a single character, '\t'. This meant that the index wasn't finding the right tab spots, because the same index, after tabreplace() was run, pointed at a different spot in the string.

    I needed two different variables, one to keep track of where the tab stops were and one to be a simple index for the string. Once I conceptually separated those two functions, all was well.
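A sketch of the two-variable idea: `i` walks the input array while `col` tracks the display column that decides where tab stops fall, and a `pending` count holds blanks until it is clear whether they reach a stop. It writes into a separate output array instead of shifting the input in place, which sidesteps the length change described above. It assumes the input has no tabs of its own; `entab_line` and its parameters are hypothetical, and `'_'` can be passed as `blank` to mimic the thread's test convention.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical string entab with separate input index and column. */
void entab_line(const char *in, char *out, size_t outsize,
                char blank, int tabstop)
{
    size_t i;           /* index into in[] */
    size_t o = 0;       /* index into out[] */
    int col = 0;        /* display column: decides where tab stops fall */
    int pending = 0;    /* blanks read but not yet written */

    /* each '\t' replaces at least one blank, so the output is never
     * longer than the input */
    assert(tabstop > 0 && outsize > strlen(in));
    for (i = 0; in[i] != '\0'; ++i) {
        if (in[i] == blank) {
            ++pending;
            if (++col % tabstop == 0) {   /* the run reached a tab stop */
                out[o++] = '\t';
                pending = 0;
            }
            continue;
        }
        for (; pending > 0; --pending)    /* run ended short of a stop */
            out[o++] = blank;
        out[o++] = in[i];
        col = (in[i] == '\n') ? 0 : col + 1;
    }
    for (; pending > 0; --pending)        /* blanks at the very end */
        out[o++] = blank;
    out[o] = '\0';
}
```

An output buffer of `strlen(in) + 1` bytes is therefore enough. Under this sketch, the `hello___world` input raised later in the thread becomes `hello\tworld`.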

    It's my third complete rewrite (excepting getline(), which I cribbed wholesale anyway), but I finally got it. This has taught me an extremely valuable lesson in C programming: it's not like Perl, where you can just hack away at things until they work. It's a spartan language, but a lovely one, and it requires elegance and foresight to produce working and sensible code.

    Here's what I wound up with, that produces the expected output in all circumstances I've tried (except that I haven't put in bounds checking, as this is just an exercise, not a production program):
    Code:
    #include<stdio.h>
    #include<string.h>
    
    #define MAXLINE 10000
    #define TABSTOP 8
    
    int getline(char s[], int lim);
    int tabreplace(char s[], int tabspot, int index);
    
    int main(void)
    {
    	int i; /* keep track of string index */
    	int j; /* keep track of tab stops */
    	char string[MAXLINE];
    
    	while(getline(string, MAXLINE) > 0) {
    		for(i=0,j=0; string[i] != '\0'; ++i,++j)
    			i -= tabreplace(string,j,i);
    		printf("%s",string);
    	}
    	return 0;
    }
    
    int tabreplace(char s[], int tabspot, int index)
    {
    	int i,j;
    	int numspaces;
    
    	if (((tabspot % TABSTOP) == 0) && (s[index] == '_')) {
    		for (i = index; (s[i] == '_') && (i > (index-TABSTOP)); --i);
    		numspaces = (i>(index-TABSTOP)) ? index-i-1 : index-i;
    		if (numspaces > 0)
    			s[index-numspaces] = '\t';
    		for(j=index-numspaces+1; s[j] != '\0'; ++j)
    			s[j] = s[j+numspaces-1];
    		return numspaces-1;
    	} 
    	return 0; 
    }
    I daresay it's a better-looking program, too, as well as a working one. Thanks for all your inspiration; you've been an enormous help to me.

  13. #13
    itCbitC
    Quote Originally Posted by dgoodmaniii View Post
    Forgive me, but you've got to have some bounds on the string. I suppose I could read it in and print it a character at a time, but since I'm reading it into a string, whether I do it by line or by getchar() terminating on EOF, I've got to keep a limit on it in some way, don't I?
    Nope! You don't need to have "bounds on the string". Using getchar() eliminates getline(), as the input is processed a field at a time, i.e. "divide and conquer".
    Quote Originally Posted by dgoodmaniii View Post
    .
    .
    .
    Then I return the number of spaces I replaced (plus one for the '\t' I inserted).

    I'm using the underscore for spaces so that I can visually verify what's been replaced and what hasn't.
    Gotcha! All along I couldn't figure out why the underscores; it makes sense now.
    Quote Originally Posted by dgoodmaniii View Post
    I've skipped getline(), as its function seems obvious. It gets a line, returns the length of said line.

    So do you all think I'm at least on the right track?
    Yep!

  14. #14
    itCbitC
    Quote Originally Posted by dgoodmaniii View Post
    I had a sudden epiphany while banging away at this; if I'd stopped to think longer about it before trying to write it, I wouldn't have had this problem. Here's the deal: inserting the tab into the string changed the length of the string, because I was removing a variable number of spaces and then inserting a single character, '\t'. This meant that the index wasn't finding the right tab spots, because the same index, after tabreplace() was run, pointed at a different spot in the string.
    Yep! A tab and a space each count as a single character, but for display purposes a tab takes up all the room to the next tab stop, which can be as much as 8 spaces.
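A small sketch of that distinction: array index and display column advance together for ordinary characters but diverge at a tab, which moves the column to the next multiple of the tab width. `display_width` is a hypothetical helper; it stops at '\n' or '\0'.

```c
#include <assert.h>

/* Hypothetical helper: number of display columns a line occupies.
 * Every character advances the index by one, but a '\t' moves the
 * column to the next multiple of tabstop. */
int display_width(const char *s, int tabstop)
{
    int col = 0;

    assert(tabstop > 0);
    for (; *s != '\0' && *s != '\n'; ++s)
        col = (*s == '\t') ? col - col % tabstop + tabstop : col + 1;
    return col;
}
```

For `Now\t____is`, `strlen` reports 10 characters, but with 8-column stops the text spans 14 columns, the same as the 14-character blank-padded `Now_________is`.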
    Quote Originally Posted by dgoodmaniii View Post
    I needed two different variables, one to keep track of where the tab stops were and one to be a simple index for the string. Once I conceptually separated those two functions, all was well.
    Yep! You need two variables: one to keep track of the number of spaces seen so far and the other for the current column number within the field.
    Quote Originally Posted by dgoodmaniii View Post
    It's my third complete rewrite (excepting getline(), which I cribbed wholesale anyway), but I finally got it. This has taught me an extremely valuable lesson in C programming: it's not like Perl, where you can just hack away at things until they work. It's a spartan language, but a lovely one, and it requires elegance and foresight to produce working and sensible code.
    The more you dive into the C, the more programming pearls you'll gather.
    Quote Originally Posted by dgoodmaniii View Post
    Here's what I wound up with, that produces the expected output in all circumstances I've tried (except that I haven't put in bounds checking, as this is just an exercise, not a production program):
    .
    .
    .
    I daresay it's a better-looking program, too, as well as a working one. Thanks for all your inspiration; you've been an enormous help to me.
    Not to be a killjoy, but your program doesn't work with this input:
    Code:
    hello___world

  15. #15
    dgoodmaniii
    Quote Originally Posted by itCbitC View Post
    Nope! You don't need to have "bounds on the string". Using getchar() eliminates getline(), as the input is processed a field at a time, i.e. "divide and conquer".
    Well, you could use getchar(), but I'm still reading the results into a string. If I do that, the string needs to have bounds, yes?

    I could read it in by getchar() and then output by putchar(), one at a time, but I want to have the new string with tabs in a character array.
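A fixed-size array does need a bound, but a heap buffer that grows with `realloc` does not. This sketch, which is not from the thread, reads one line of any length; `read_line_alloc` is a hypothetical name, and the caller must `free()` the result. Where POSIX.1-2008 is available, its `getline()` does the same job.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one line of any length into a heap buffer
 * that doubles as needed, so no MAXLINE is required. The newline, if
 * present, is kept, as in K&R's getline. Returns NULL at EOF with
 * nothing read or if memory runs out; the caller must free() it. */
char *read_line_alloc(FILE *fp)
{
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;
    while ((c = getc(fp)) != EOF) {
        if (len + 2 > cap) {              /* keep room for c and '\0' */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        buf[len++] = (char)c;
        if (c == '\n')
            break;
    }
    if (len == 0 && c == EOF) {           /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

The resulting string can then be passed to any of the entab approaches above, with an output array of the same length plus one.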

