1. ## entab---spacing problems

K&R, Exercise 1-21. I've looked up a lot of solutions to this problem to study (such as here), and they only seem to substitute 8 consecutive space characters with tabs. However, I don't really think that's what this question is asking; the text is "[w]rite a program entab that replaces strings of blanks by the minimum number of tabs and blanks to achieve the same spacing." That's not just replacing 8 blanks with a tab; that's replacing, say, four blanks with a tab, if there are only four blanks to the next tab stop.

So I've been struggling mightily to do that, and it's just not working for me. I'm even occasionally getting a segfault in apparently unreproducible circumstances. I have no idea what's wrong here; anyone have any pointers? Here's the code; gcc 4.3.2 on Debian Lenny.
Code:
```
#include<stdio.h>
#include<string.h>

#define MAXLINE 10000
#define TABSTOP 8

int getline(char s[], int lim);

int main(void)
{
    int i,j,k;
    char string[MAXLINE];
    int lastlet;

    while(getline(string, MAXLINE) > 0) {
        for(i=0,j=0; string[i] != '\0'; ++i) {
            if (string[i] == '_') {
                if (string[i-1] != '_' && string[i-1] != '\t')
                    lastlet = i;
                if ((i % TABSTOP) == 0) {
                    string[lastlet] = '\t';
                    for (j=lastlet+1; j<=strlen(string); ++j)
                        string[j] = string[i++];
                    i = lastlet;
                }
            }
        }
        printf("%s",string);
    }
    return 0;
}

int getline(char s[], int lim)
{
    int c, i;

    for (i=0; i<lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
        s[i] = c;
    if (c == '\n')
        s[i++] = c;
    s[i] = '\0';
    return i;
}
```

2. Count up the number of spaces. Divide by SPACES_PER_TAB. That's how many tabs you need. Mod by SPACES_PER_TAB. That's how many spaces you need.

Quzah.

3. Originally Posted by quzah
Count up the number of spaces. Divide by SPACES_PER_TAB. That's how many tabs you need. Mod by SPACES_PER_TAB. That's how many spaces you need.

Quzah.
Right; that part's easy. But I don't think that's really what the exercise is asking for, is it? Assuming an eight-space tab, that solution fills up ten spaces with a tab and two spaces, which is fine. But the exercise is asking for something different, it seems to me. Picture this:
Code:
```        |        |        |        |        |
Now_________is_______________the______time```
So between the "Now" and the "is" there are nine spaces; but to achieve the same spacing, I don't need one tab and one space, but rather one tab (which will bring me to the first tabstop, column 8) and four spaces (which will bring me to "is"). That's what I think the exercise is asking for.

I've beat myself over the head for hours trying to make that work; the code I posted above is the best I could do. But it only works for the first tab, then breaks down. Any ideas?
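To make the column arithmetic concrete, here is a minimal sketch of the calculation being described. `min_fill` is a made-up helper name (nothing like it appears in the posted program), and it assumes 0-based columns with a tab stop every TABSTOP columns.

```c
#define TABSTOP 8   /* tab width, as in the thread */

/*
 * For a run of `run` blanks that begins at 0-based column `col`,
 * compute the minimum number of tabs and blanks that end at the
 * same column. Hypothetical helper, not from the posted program.
 */
void min_fill(int col, int run, int *tabs, int *blanks)
{
    int end = col + run;    /* column where the next word starts */

    *tabs = 0;
    /* Take a tab as long as the next tab stop does not overshoot `end`. */
    while ((col / TABSTOP + 1) * TABSTOP <= end) {
        col = (col / TABSTOP + 1) * TABSTOP;
        ++*tabs;
    }
    *blanks = end - col;    /* the remainder has to stay as blanks */
}
```

For a run that starts at column 0, e.g. `min_fill(0, 10, &t, &b)`, this gives one tab and two blanks, the same as dividing by the tab width. The two methods only disagree when the run starts partway through a field.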

4. One way to go about this is to look at each input line as made up of distinct fields, each 8 characters wide. Within each field, only trailing spaces get converted to tabs, not leading or embedded ones; otherwise the output gets botched.
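One possible reading of the field idea, as a rough sketch rather than anyone's actual solution: walk the line once, keep the display column separate from the output position, and hold back blanks until it is known whether they reach the end of the field. The name `entab_fields`, the `BLANK` macro (an underscore, following the thread's habit of making blanks visible) and the -1 error return are all assumptions made for illustration.

```c
#include <stddef.h>

#define TABSTOP 8     /* field width, as in the thread */
#define BLANK   '_'   /* visible stand-in for ' '; change to ' ' for real use */

/*
 * Copy `in` to `out`, replacing the trailing blanks of each
 * TABSTOP-wide field with a single tab. Returns the number of
 * characters written (not counting the '\0'), or -1 if `out`
 * is too small. Hypothetical helper for illustration only.
 */
int entab_fields(const char *in, char *out, size_t outsize)
{
    size_t o = 0;       /* write index into out */
    int col = 0;        /* display column of the next input character */
    int pending = 0;    /* blanks seen in this field but not yet written */

    for (; *in != '\0'; ++in) {
        if (*in == BLANK) {
            ++pending;
            if (++col % TABSTOP != 0)
                continue;           /* still inside the field: keep counting */
            pending = 0;            /* reached a tab stop: one tab covers them */
            if (o + 1 >= outsize)
                return -1;
            out[o++] = '\t';
            continue;
        }
        if (*in == '\t') {
            pending = 0;            /* an input tab absorbs the blanks before it */
            col += TABSTOP - col % TABSTOP;
        } else {
            col = (*in == '\n') ? 0 : col + 1;
        }
        if (o + pending + 1 >= outsize)
            return -1;
        while (pending > 0) {       /* blanks that did not reach a tab stop */
            out[o++] = BLANK;
            --pending;
        }
        out[o++] = *in;
    }
    if (o + pending >= outsize)
        return -1;
    while (pending-- > 0)
        out[o++] = BLANK;
    out[o] = '\0';
    return (int)o;
}
```

Given `hello___world`, the three blanks end exactly at column 8, so the sketch writes `hello`, one tab, then `world`. Writing into a second buffer sidesteps the shifting that an in-place replacement needs, at the cost of extra memory.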

5. Originally Posted by itCbitC
One way to go about this is to look at each input line as made up of distinct fields, each 8 characters wide. Within each field, only trailing spaces get converted to tabs, not leading or embedded ones; otherwise the output gets botched.
I'm sure that'll do it. Thanks for the conceptual leap.

Instead I wrote exercise 1-22, which despite being ostensibly more complicated I was able to do rather quickly. C is awesome.

6. Originally Posted by itCbitC
One way to go about this is to look at each input line as made up of distinct fields, each 8 characters wide. Within each field, only trailing spaces get converted to tabs, not leading or embedded ones; otherwise the output gets botched.
For what it's worth, here's the code I came up with based on the "8 character field" concept. For some reason it still seems to be a space off, and I can't figure that out. But the concept, I'm confident, is correct. Any ideas why I might still be off?

Code:
```
#include<stdio.h>
#include<string.h>

#define MAXLINE 10000
#define TABSTOP 8

int getline(char s[], int lim);
int tabreplace(char s[], int tabspot);

int main(void)
{
    int i,j,k;
    char string[MAXLINE];

    while(getline(string, MAXLINE) > 0) {
        for(i=0; i<=strlen(string); i+=TABSTOP) {
            i -= tabreplace(string,i);
        }
        printf("%s",string);
    }
    return 0;
}

int tabreplace(char s[], int tabspot)
{
    int i;
    int skipback;

    if (s[tabspot] == '_') {
        for (i=tabspot; s[i]=='_' && i>(tabspot-TABSTOP); --i);
        s[i+1] = '\t';
        skipback = tabspot - (i+2);
        for (i=i+2; i<strlen(s); ++i)
            s[i] = s[i+skipback];
        printf("returning %d\n",skipback);
        return skipback+1;
    }
    return 0;
}

int getline(char s[], int lim)
{
    int c, i;

    for (i=0; i<lim-1 && (c=getchar())!=EOF && c!='\n'; ++i)
        s[i] = c;
    if (c == '\n')
        s[i++] = c;
    s[i] = '\0';
    return i;
}
```

7. Can you provide some explanation of your code through inline comments etc. as it is hard to understand what is goin' on.
With getline() you're restricted to line lengths of MAXLINE; a simple getchar() that terminates on EOF works much better.

8. And please provide a sample input, the current output (in error) and the output as you expect it to be.

9. Shouldn't the tab spot be x % TABSTOP == 0? If you start counting in an array, 0 ... TS-1 is the first tab block, and TS ... TS*2-1 is the second block.
Code:
```abcdabcdabcdabcd
^   ^   ^   ^   ^```
Let's assume 4, because I don't feel like typing out 8.

Quzah.
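For what it's worth, the modulus test described here can be sketched in a couple of lines. `is_tab_stop` and `next_tab_stop` are made-up names, and TABSTOP is set to 4 only to match the diagram above.

```c
#define TABSTOP 4   /* 4 to match the diagram; the thread's programs use 8 */

/* Nonzero if 0-based column `col` is the first column of a tab block. */
int is_tab_stop(int col)
{
    return col % TABSTOP == 0;
}

/* First tab stop strictly to the right of 0-based column `col`. */
int next_tab_stop(int col)
{
    return col + (TABSTOP - col % TABSTOP);
}
```

With a width of 4 the stops fall at columns 0, 4, 8, 12 and 16, which is where the carets sit in the diagram.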

10. Originally Posted by itCbitC
Can you provide some explanation of your code through inline comments etc. as it is hard to understand what is goin' on.
With getline() you're restricted to line lengths of MAXLINE; a simple getchar() that terminates on EOF works much better.
Forgive me, but you've got to have some bounds on the string. I suppose I could read it in and print it a character at a time, but since I'm reading it into a string, whether I do it by line or by getchar() terminating on EOF, I've got to keep a limit on it in some way, don't I?

I try to follow the Linus Torvalds coding rules, but my comments are still too sparse. Here's a play-by-play of what I'm going for:
Code:
```
#include<stdio.h>
#include<string.h>

#define MAXLINE 10000
#define TABSTOP 8

int getline(char s[], int lim);
int tabreplace(char s[], int tabspot);

int main(void)
{
    int i,j,k;
    char string[MAXLINE];

    while(getline(string, MAXLINE) > 0) {
        for(i=0; i<=strlen(string); i+=TABSTOP) {
            i -= tabreplace(string,i);
        }
        printf("%s",string);
    }
    return 0;
}
```
This, of course, is pretty simple. While there's input from getline, execute the for loop. The loop starts at zero and increments by TABSTOP every cycle, until the index is greater than the length of the string it's processing. For each multiple of TABSTOP, call the function tabreplace(); tabreplace() returns the number of spaces replaced by the tab, so while we're at it we set the loop index back that number of spaces. When we're done, we print the string, then wait for a new line.

Code:
```
int tabreplace(char s[], int tabspot)
{
    int i;
    int skipback;

    if (s[tabspot] == '_') {
```
Clear enough; if the character at TABSTOP, or a multiple thereof, is '_', execute the following code.
Code:
`		for (i=tabspot; s[i]=='_' && i>(tabspot-TABSTOP); --i);`
Determines the number of spaces prior to tabspot that need to be replaced. Only trailing spaces need to be replaced, not leading ones, so it stops when it hits either a non-'_' or the prior tab. This means that the index is one behind where I need to insert the tab, so:
Code:
`		s[i+1] = '\t';`
I insert the tab.
Code:
```
        skipback = tabspot - (i+2);
        for (i=i+2; i<strlen(s); ++i)
            s[i] = s[i+skipback];
        printf("returning %d\n",skipback);
```
Then I set skipback equal to the number of spaces I replaced with the '\t'. It looks worse than it is. It's the tabspot, minus the number of spaces to the first non-space character, plus that first non-space character and the '\t' itself.

I then run a loop to move the entire string back by skipback, one character at a time, to ensure that not only is a tab inserted but that the spaces are actually removed.

The printf was a debugging statement to make sure that I was returning the correct value.
Code:
```		return skipback+1;
}```
Then I return the number of spaces I replaced (plus one for the '\t' I inserted).

I'm using the underscore for spaces so that I can visually verify what's been replaced and what hasn't.

I've skipped getline(), as its function seems obvious. It gets a line, returns the length of said line.

So do you all think I'm at least on the right track?

11. Originally Posted by Dino
And please provide a sample input, the current output (in error) and the output as you expect it to be.
Sample input:
Code:
```	|	|	|	|	|
Now_________is______the___time_____________for```
Expected output:
Code:
`Now	____is	____the	__time		___for`
Actual output:
Code:
`Now	____is______the	_time_	________for`

12. I had a sudden epiphany while banging away at this; if I'd stopped to think longer about it before trying to write it, I wouldn't have had this problem. Here's the deal: inserting the tab into the string changed the length of the string, because I was removing a variable number of spaces and then inserting a single character, '\t'. This meant that the index wasn't finding the right tab spots, because the same index, after tabreplace() was run, pointed at a different spot in the string.

I needed two different variables, one to keep track of where the tab stops were and one to be a simple index for the string. Once I conceptually separated those two functions, all was well.
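A small sketch of why the string index and the display column drift apart once a tab is in the string. `column_at` is a hypothetical helper, not part of the posted program; it assumes 0-based columns and a tab stop every TABSTOP columns.

```c
#define TABSTOP 8   /* tab width, as in the thread */

/*
 * Display column at which s[index] is drawn, counting from 0.
 * Ordinary characters advance the column by one; a tab jumps
 * to the next multiple of TABSTOP.
 */
int column_at(const char *s, int index)
{
    int i;
    int col = 0;

    for (i = 0; i < index && s[i] != '\0'; ++i) {
        if (s[i] == '\t')
            col += TABSTOP - col % TABSTOP;
        else
            ++col;
    }
    return col;
}
```

In `"Now_____is"` the `i` sits at index 8 and column 8, but in `"Now\tis"` it sits at index 4 and still column 8, so a loop that tests the index against the tab width stops lining up after the first replacement.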

It's my third complete rewrite (excepting getline(), which I cribbed wholesale anyway), but I finally got it. This has taught me an extremely valuable lesson in C programming: it's not like Perl, where you can just hack away at things until they work. It's a spartan language, but a lovely one, and it requires elegance and foresight to produce working and sensible code.

Here's what I wound up with, that produces the expected output in all circumstances I've tried (except that I haven't put in bounds checking, as this is just an exercise, not a production program):
Code:
```
#include<stdio.h>
#include<string.h>

#define MAXLINE 10000
#define TABSTOP 8

int getline(char s[], int lim);
int tabreplace(char s[], int tabspot, int index);

int main(void)
{
    int i; /* keep track of string index */
    int j; /* keep track of tab stops */
    char string[MAXLINE];

    while(getline(string, MAXLINE) > 0) {
        for(i=0,j=0; string[i] != '\0'; ++i,++j)
            i -= tabreplace(string,j,i);
        printf("%s",string);
    }
    return 0;
}

int tabreplace(char s[], int tabspot, int index)
{
    int i,j;
    int numspaces;

    if (((tabspot % TABSTOP) == 0) && (s[index] == '_')) {
        for (i = index; (s[i] == '_') && (i > (index-TABSTOP)); --i);
        numspaces = (i>(index-TABSTOP)) ? index-i-1 : index-i;
        if (numspaces > 0)
            s[index-numspaces] = '\t';
        for(j=index-numspaces+1; s[j] != '\0'; ++j)
            s[j] = s[j+numspaces-1];
        return numspaces-1;
    }
    return 0;
}
```
I daresay it's a better-looking program, too, as well as a working one. Thanks for all your inspiration; you've been an enormous help to me.

13. Originally Posted by dgoodmaniii
Forgive me, but you've got to have some bounds on the string. I suppose I could read it in and print it a character at a time, but since I'm reading it into a string, whether I do it by line or by getchar() terminating on EOF, I've got to keep a limit on it in some way, don't I?
Nope! You don't need to have "bounds on the string". Using getchar() eliminates getline(), as the input will be processed a field at a time, i.e. "divide and conquer".
Originally Posted by dgoodmaniii
.
.
.
Then I return the number of spaces I replaced (plus one for the '\t' I inserted).

I'm using the underscore for spaces so that I can visually verify what's been replaced and what hasn't.
Gotcha! all along I couldn't figure out why the underscore; makes sense now.
Originally Posted by dgoodmaniii
I've skipped getline(), as its function seems obvious. It gets a line, returns the length of said line.

So do you all think I'm at least on the right track?
Yep!

14. Originally Posted by dgoodmaniii
I had a sudden epiphany while banging away at this; if I'd stopped to think longer about it before trying to write it, I wouldn't have had this problem. Here's the deal: inserting the tab into the string changed the length of the string, because I was removing a variable number of spaces and then inserting a single character, '\t'. This meant that the index wasn't finding the right tab spots, because the same index, after tabreplace() was run, pointed at a different spot in the string.
Yep! A tab and a space each count as a single character, but for display purposes a single tab can take up as much room as 8 spaces.
Originally Posted by dgoodmaniii
I needed two different variables, one to keep track of where the tab stops were and one to be a simple index for the string. Once I conceptually separated those two functions, all was well.
Yep! you need 2 variables - one to keep track of the number of spaces seen so far and the other for the current column number within the field
Originally Posted by dgoodmaniii
It's my third complete rewrite (excepting getline(), which I cribbed wholesale anyway), but I finally got it. This has taught me an extremely valuable lesson in C programming: it's not like Perl, where you can just hack away at things until they work. It's a spartan language, but a lovely one, and it requires elegance and foresight to produce working and sensible code.
The more you dive into C, the more programming pearls you'll gather.
Originally Posted by dgoodmaniii
Here's what I wound up with, that produces the expected output in all circumstances I've tried (except that I haven't put in bounds checking, as this is just an exercise, not a production program):
.
.
.
I daresay it's a better-looking program, too, as well as a working one. Thanks for all your inspiration; you've been an enormous help to me.
Not to be a killjoy, but your program doesn't work with this input:
Code:
`hello___world`

15. Originally Posted by itCbitC
Nope! You don't need to have "bounds on the string". Using getchar() eliminates getline(), as the input will be processed a field at a time, i.e. "divide and conquer".
Well, you could use getchar(), but I'm still reading the results into a string. If I do that, the string needs to have bounds, yes?

I could read it in by getchar() and then output by putchar(), one at a time, but I want to have the new string with tabs in a character array.
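For comparison, the character-at-a-time alternative mentioned above might look roughly like this. It is only a sketch: `entab_stream` is a made-up name, it takes `FILE *` arguments so it can be tried on something other than stdin and stdout, and `BLANK` is an underscore to match the visible blanks used earlier in the thread. Because it only remembers the current column and a count of held-back blanks, it needs no line buffer and so no MAXLINE.

```c
#include <stdio.h>

#define TABSTOP 8     /* tab width, as in the thread */
#define BLANK   '_'   /* visible stand-in for ' '; change to ' ' for real use */

/*
 * Copy `in` to `out`, replacing blanks that run up to a tab stop
 * with a single tab. Only two counters of state are kept, so line
 * length is unlimited. Hypothetical helper for illustration only.
 */
void entab_stream(FILE *in, FILE *out)
{
    int c;
    int col = 0;        /* display column of the next character */
    int pending = 0;    /* blanks read but not yet written */

    while ((c = getc(in)) != EOF) {
        if (c == BLANK) {
            ++pending;
            if (++col % TABSTOP == 0) {     /* blanks reached a tab stop */
                putc('\t', out);
                pending = 0;
            }
            continue;
        }
        if (c == '\t') {
            pending = 0;                    /* the tab absorbs blanks before it */
            col += TABSTOP - col % TABSTOP;
        } else {
            for (; pending > 0; --pending)  /* blanks short of a tab stop */
                putc(BLANK, out);
            col = (c == '\n') ? 0 : col + 1;
        }
        putc(c, out);
    }
    for (; pending > 0; --pending)          /* blanks at end of input */
        putc(BLANK, out);
}
```

Calling `entab_stream(stdin, stdout)` from `main` would make it a filter. If the result is wanted in a character array instead, the array does need a size limit, so the trade-off is between unbounded input and having the converted text in memory.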