Thread: Is there any way for a program to find out how many spaces...?

  1. #1
    Programming Ninja In-T...
    Join Date
    May 2009
    Posts
    827

    Is there any way for a program to find out how many spaces a particular tab '\t' character represents in a file?
    I am writing a program to fix a fairly long file, i.e. to line up the data in the third column properly. I found it necessary to replace every tab character in the file with the number of spaces it represents. I initially assumed every tab would represent the same number of spaces, but I just found out that they do not, so I cannot make a "one length fits all" assumption and expect my program to do the job properly. For example, at least one tab character I found in the file represents 3 spaces, and another one I looked at represents 5 spaces. Don't ask me why this is: I just copied and pasted the data from an online source, and I wasn't the one who wrote it to begin with.

    So the bottom line is, I'm hoping there's some way to find out programmatically how many spaces each tab represents, so that I can replace the tabs with spaces.

    Thanks in advance.
    I'm an alien from another world. Planet Earth is only my vacation home, and I'm not liking it.

  2. #2
    Registered User
    Join Date
    Jul 2008
    Posts
    38
    I believe tab size depends on the editor with which you are reading the file. There is no system-wide tab size.

  3. #3
    Make Fortran great again
    Join Date
    Sep 2009
    Posts
    1,413
    Well a tab *typically* represents up to 8 spaces (in notepad on Windows anyway), i.e. the tabs snap to multiples of 8. So, say you've read in 3 characters and then read in a tab. That tab is going to be 5 spaces so that the total space used up so far snaps up to the nearest multiple of 8, i.e. 8. Get it?
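    The arithmetic in this post can be sketched in a few lines of Python. This is only an illustration: the function name `spaces_for_tab` and the 0-based column convention are assumptions made for the example, and the tab size of 8 is the Notepad-style default mentioned above.

```python
def spaces_for_tab(column, tab_size=8):
    """Return how many spaces a tab at 0-based `column` expands to."""
    return tab_size - (column % tab_size)


print(spaces_for_tab(3))  # 3 characters already on the line
print(spaces_for_tab(0))  # tab at the very start of a line
print(spaces_for_tab(7))  # tab one column before a tab stop
```

    With 3 characters already read, the call returns 5, which matches the "3 characters plus a 5-space tab reaches column 8" example.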

  4. #4
    Programming Ninja In-T...
    Join Date
    May 2009
    Posts
    827
    Quote Originally Posted by Florian
    I believe tab size depends on the editor with which you are reading the file. There is no system-wide tab size.
    Well, I thought the same thing, but then I found out that the editor I'm viewing the file in (gedit) uses 8 spaces for a tab typed in the editor, while the tabs that I copied/pasted into the file (along with the rest of the content) and saved represent a different number of spaces. So it seems that whatever editor or program writes the tab characters somehow affects how many spaces they will represent in every editor that views the same file. Otherwise, gedit would be showing all tabs as 8 spaces, wouldn't it?

  5. #5
    Programming Ninja In-T...
    Join Date
    May 2009
    Posts
    827
    Quote Originally Posted by Epy
    Well a tab *typically* represents up to 8 spaces (in notepad on Windows anyway), i.e. the tabs snap to multiples of 8. So, say you've read in 3 characters and then read in a tab. That tab is going to be 5 spaces so that the total space used up so far snaps up to the nearest multiple of 8, i.e. 8. Get it?
    No, I'm not so sure I get it. You're saying that the tabs are affected by the surrounding content, and that their space is dependent on how many characters are read into a program BEFORE the tab? That doesn't make much sense to me.

  6. #6
    Hurry Slowly vart's Avatar
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    Quote Originally Posted by Programmer_P
    So it seems that whatever editor or program writes the tab characters somehow affects how many spaces they will represent in every editor that views the same file. Otherwise, gedit would be showing all tabs as 8 spaces, wouldn't it?
    Are you sure you have copied the tab character?
    Or did the "source" editor replace the tab character with several spaces before putting the block into the copy buffer?
    All problems in computer science can be solved by another level of indirection,
    except for the problem of too many layers of indirection.
    – David J. Wheeler

  7. #7
    Programming Ninja In-T...
    Join Date
    May 2009
    Posts
    827
    Quote Originally Posted by vart
    Are you sure you have copied the tab character?
    Or did the "source" editor replace the tab character with several spaces before putting the block into the copy buffer?
    Yes, I am sure that it is the tab character. When I put the cursor on a line containing tab characters and use the right arrow key to move across the text, the cursor *jumps* across several columns when it reaches a tab, instead of moving one column at a time as it does for space characters. In addition, my program's output file, which is a copy of the input file with the tabs replaced by spaces, doesn't show that jump on the same lines, since the tab characters there were replaced with several spaces.

  8. #8
    - - - - - - - - oogabooga's Avatar
    Join Date
    Jan 2008
    Posts
    2,808
    Quote Originally Posted by Programmer_P
    No, I'm not so sure I get it. You're saying that the tabs are affected by the surrounding content, and that their space is dependent on how many characters are read into a program BEFORE the tab? That doesn't make much sense to me.
    Take a look at this.
    Code:
    |-------|-------|-------|-------|-------|
    a       bcde    fghijkl mnopqrst        u
     T          T          T        T
    Here we are assuming a tab size of 8. The first line is a ruler with a | every 8 characters. The second line is a line with tabs in it. On the third line I've put a capital T just under the tab characters in the second line. Note that wherever the tab is placed, it jumps to the next "tab stop" and that can take anywhere from 1 to 8 spaces.
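    The diagram above can be reproduced in code. The following is a minimal Python sketch, assuming a fixed tab size of 8 and a single line with no newline characters; `expand_tabs` is a name made up for this example. It tracks the current column and pads each tab out to the next tab stop.

```python
def expand_tabs(line, tab_size=8):
    """Replace each tab with the spaces needed to reach the next tab stop."""
    out = []
    column = 0  # 0-based column where the next character will land
    for ch in line:
        if ch == "\t":
            width = tab_size - (column % tab_size)  # always 1..tab_size
            out.append(" " * width)
            column += width
        else:
            out.append(ch)
            column += 1
    return "".join(out)


line = "a\tbcde\tfghijkl\tmnopqrst\tu"
expanded = expand_tabs(line)
print("|-------" * 5 + "|")
print(expanded)

# Python's built-in str.expandtabs applies the same rule.
assert expanded == line.expandtabs(8)
```

    The four tabs in this line expand to 7, 4, 1 and 8 spaces respectively, even though the tab size never changes.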

  9. #9
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by Programmer_P
    No, I'm not so sure I get it. You're saying that the tabs are affected by the surrounding content, and that their space is dependent on how many characters are read into a program BEFORE the tab? That doesn't make much sense to me.
    It not only makes perfect sense to me, but it's blindingly obvious.
    Tabs move the cursor to the next multiple of N spaces, where N is commonly 4 or 8. Ergo, the formula for the number of spaces that produces the equivalent alignment for a given tab size is straightforward.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  10. #10
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    I have seen text editors where the spacing introduced by single or multiple tabs was configurable.

    I remember one editor with a "Fortran mode" which had three tabbed spacings that were fixed as (relative to the start of a line) 7, 72, and 80. (Any Fortran programmer familiar with fixed layout will understand the significance of those numbers.) For code with nested loops, an equal tab spacing (with default value 3) was used for tabs between 7 and 72. So, consecutive tabs from 7 were (by default) 7, 10, 13, 16, 19, ..., 67, 70, 72, 80.

    The formula for converting tabs to space characters is simple, but usually requires some assumptions (namely the spacing between two consecutive tabs). And, if different people (or programs) make different assumptions, some beautifully formatted source code containing tab characters can look horrible when opened with a different editor.
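    When the tab stops are not evenly spaced, as in the Fortran mode described above, the same idea still works with an explicit list of stops. The sketch below is hypothetical: `expand_with_stops` is an invented name, the stops are treated as 0-based column offsets (the post does not say which convention that editor used), and a tab past the last stop falls back to a single space, which is purely an assumption.

```python
import bisect


def expand_with_stops(line, stops):
    """Expand tabs using an explicit, sorted list of tab-stop columns."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            # Find the first stop strictly to the right of the current column.
            i = bisect.bisect_right(stops, column)
            # Assumption: beyond the last stop, a tab becomes one space.
            target = stops[i] if i < len(stops) else column + 1
            out.append(" " * (target - column))
            column = target
        else:
            out.append(ch)
            column += 1
    return "".join(out)


# 7, 10, 13, ..., 67, 70, then 72 and 80, as listed in the post.
stops = list(range(7, 71, 3)) + [72, 80]
print(repr(expand_with_stops("\tDO\tX", stops)))
```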
    Right 98% of the time, and don't care about the other 3%.

    If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

  11. #11
    Programming Ninja In-T...
    Join Date
    May 2009
    Posts
    827
    Quote Originally Posted by oogabooga
    Take a look at this.
    Code:
    |-------|-------|-------|-------|-------|
    a       bcde    fghijkl mnopqrst        u
     T          T          T        T
    Here we are assuming a tab size of 8. The first line is a ruler with a | every 8 characters. The second line is a line with tabs in it. On the third line I've put a capital T just under the tab characters in the second line. Note that wherever the tab is placed, it jumps to the next "tab stop" and that can take anywhere from 1 to 8 spaces.
    Well, that's not the same thing he was saying. He was saying that given that every tab read in is 5 spaces in length, and say you read in 3 characters before the tab, those 3 characters plus the 5 spaces equals 8 character spaces. But I was talking about space represented by the tab character itself, which is why his post didn't make much sense to me. And keep in mind that I'm faced with a situation where the tab characters do not all have the same tab size. That is why I cannot make any assumption about a tab size (like 8, 4, etc.), and expect my program to work on all tabs in the file.
    Last edited by Programmer_P; 01-16-2012 at 08:19 AM.

  12. #12
    Programming Ninja In-T...
    Join Date
    May 2009
    Posts
    827
    Quote Originally Posted by iMalc
    It not only makes perfect sense to me, but it's blindingly obvious.
    Tabs move the cursor to the next multiple of N spaces, where N is commonly 4 or 8. Ergo, the formula for the number of spaces that produces the equivalent alignment for a given tab size is straightforward.
    I realize that. However, as I do believe I said in the first post, the file that I'm reading does not use a consistent number of spaces for every tab character in the file. It differs. There's some with 5 spaces, some with 4, some with 3, etc.
    That is why I need a non-straightforward formula for solving the problem.

  13. #13
    Programming Ninja In-T...
    Join Date
    May 2009
    Posts
    827
    Quote Originally Posted by grumpy
    I have seen text editors where the spacing introduced by single or multiple tabs was configurable.

    I remember one editor with a "Fortran mode" which had three tabbed spacings that were fixed as (relative to the start of a line) 7, 72, and 80. (Any Fortran programmer familiar with fixed layout will understand the significance of those numbers.) For code with nested loops, an equal tab spacing (with default value 3) was used for tabs between 7 and 72. So, consecutive tabs from 7 were (by default) 7, 10, 13, 16, 19, ..., 67, 70, 72, 80.

    The formula for converting tabs to space characters is simple, but usually requires some assumptions (namely the spacing between two consecutive tabs). And, if different people (or programs) make different assumptions, some beautifully formatted source code containing tab characters can look horrible when opened with a different editor.
    If that is so, I would be interested in reading what it is.
    I've been doing some research on my own while waiting for replies here, and found this page. According to the author, in some editors pressing the Tab key does not actually write a tab character to the file; it merely indents by a set number of spaces. And of course, there are many different interpretations of the tab character itself. So I'm going to find out which interpretation gedit uses for tab characters and the Tab key, and perhaps that will clue me in to a procedure that will work.

  14. #14
    Programming Ninja In-T...
    Join Date
    May 2009
    Posts
    827
    Alright. I just found out the default setting of gedit is to have the tab width be 8 spaces, and to insert the tab character when the tab key is pressed. However, this setting can be changed to have a tab width of your own choosing, and you can also set it to insert spaces instead of tabs to indent. But I still don't know how it handles tabs which are copied/pasted into the editor from another source. So that information doesn't help me much with this particular problem.
    However, I have an idea. Instead of replacing the tabs with spaces, I will remove them from the file altogether and then line up the columns by adding the number of spaces I need. For each row, I subtract the position of that row's column entry from the position of the column's header; the difference is the number of spaces to insert in front of the entry to shift it right far enough to line up with the header.
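    One way this kind of column alignment could look is sketched below in Python. It is a variation on the idea rather than the exact procedure described: it assumes each tab separates two fields, so the tabs are used to split the line before being discarded, and it pads every field to the width of the widest entry in its column. The function name `align_columns`, the `gap` parameter and the sample data are all made up for illustration.

```python
def align_columns(lines, gap=2):
    """Split each line on tabs and pad every field to its column's width."""
    rows = [line.split("\t") for line in lines]
    ncols = max(len(row) for row in rows)

    # Width of each column = length of its longest field (header included).
    widths = [0] * ncols
    for row in rows:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))

    aligned = []
    for row in rows:
        padded = [cell.ljust(widths[i] + gap) for i, cell in enumerate(row)]
        aligned.append("".join(padded).rstrip())  # drop trailing padding
    return aligned


# Hypothetical sample data, not taken from the file discussed in the thread.
sample = ["Name\tQty\tPrice", "apple\t3\t0.50", "watermelon\t12\t4.25"]
for line in align_columns(sample):
    print(line)
```

    Because every row is padded against the same column widths, the result no longer depends on how any editor chooses to display a tab.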

  15. #15
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    You're still way overthinking things here. The example oogabooga posted is still perfectly correct for your situation.

    If text is copied and pasted from another editor then the tabs within that text will take on the tab positions defined within the editor it is going into, exactly as though you had just typed it in this program.

    There is no differing behaviour per line or per tab. They all fill up to the next multiple of N.
    To replace tabs with spaces in a fixed character width editor, and have it look the same, the only thing you need to know is the multiple of character positions to which a tab advances.
    I don't understand why this is not sinking in for you.
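    If that multiple is not known, it can often be worked out from a couple of measurements. The Python sketch below is an illustration under stated assumptions: `consistent_tab_sizes` is an invented helper, columns are 0-based, and the two observations are hypothetical numbers chosen to resemble the "5 spaces" and "3 spaces" tabs reported earlier in the thread. It keeps every candidate N for which a tab at each observed column would have exactly the observed width.

```python
def consistent_tab_sizes(observations, candidates=range(1, 17)):
    """Return the tab sizes that explain every (column, width) observation."""
    return [
        n
        for n in candidates
        if all(n - (col % n) == width for col, width in observations)
    ]


# Hypothetical measurements: a tab starting at column 3 looked 5 wide,
# and a tab starting at column 13 looked 3 wide.
observed = [(3, 5), (13, 3)]
print(consistent_tab_sizes(observed))
```

    For these two measurements only N = 8 fits, so tabs of apparently different widths are what a single fixed tab size produces.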
