Thread: Data model for text editor

  1. #1
    Registered User
    Join Date
    Aug 2009
    Posts
    198

    Data model for text editor

    I am trying to write a text editor and I am not really sure how to structure the text data. Right now the best I have figured out is to hold the text in one big array that reallocates in 1KB steps and is stored as one contiguous string, newlines, tabs and all. It just requires a lot of iterating through the string and counting newlines and such. It still seems like a good idea, but I was wondering if you know of something better or how to improve this?
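
    For reference, the layout described above could be sketched roughly as follows. This is only a minimal sketch: the names TextBuf, textbuf_insert and textbuf_line_start are made up for illustration, and the only detail taken from the post is the 1 KB growth step.

```c
#include <stdlib.h>
#include <string.h>

#define GROW_STEP 1024  /* the 1 KB step mentioned in the post */

/* Hypothetical type: one contiguous buffer holding the whole text. */
typedef struct {
    char  *data;  /* text bytes, newlines included, not NUL-terminated */
    size_t len;   /* bytes in use */
    size_t cap;   /* bytes allocated */
} TextBuf;

/* Insert n bytes from s at offset pos. Returns 0 on success, -1 on failure. */
int textbuf_insert(TextBuf *tb, size_t pos, const char *s, size_t n)
{
    if (pos > tb->len) return -1;
    if (n == 0) return 0;
    if (tb->len + n > tb->cap) {
        /* round the required size up to the next multiple of GROW_STEP */
        size_t need = tb->len + n;
        size_t cap  = (need + GROW_STEP - 1) / GROW_STEP * GROW_STEP;
        char *p = realloc(tb->data, cap);
        if (!p) return -1;
        tb->data = p;
        tb->cap  = cap;
    }
    /* open a gap at pos, then copy the new bytes into it */
    memmove(tb->data + pos + n, tb->data + pos, tb->len - pos);
    memcpy(tb->data + pos, s, n);
    tb->len += n;
    return 0;
}

/* Offset of the first byte of 0-based line `line`, or tb->len if the text
   has fewer lines. This is the linear newline scan the post complains about. */
size_t textbuf_line_start(const TextBuf *tb, size_t line)
{
    size_t i = 0;
    while (line > 0 && i < tb->len) {
        if (tb->data[i++] == '\n') line--;
    }
    return i;
}
```

    A zero-initialised TextBuf (TextBuf tb = {0};) is a valid empty buffer, since realloc() accepts a null pointer.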

  2. #2
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by MTK View Post
    I am trying to write a text editor and I am not really sure how to structure the text data. Right now the best I have figured out is to hold the text in one big array that reallocates in 1KB steps and is stored as one contiguous string, newlines, tabs and all. It just requires a lot of iterating through the string and counting newlines and such. It still seems like a good idea, but I was wondering if you know of something better or how to improve this?
    What are you using for the user interface -- ie, where is this text going to appear to the user?

  3. #3
    Registered User
    Join Date
    Aug 2009
    Posts
    198
    In a Linux command line using ncurses

  4. #4
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Having never written a text editor, I would be inclined to keep each text line (a pointer to it) in a linked list, along with control info, such as "has it been changed", undo level perhaps, current line number, pointer to next line and previous line (aka, a linked list), etc.
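
    A rough sketch of that kind of line node might look like the following. The struct layout and function names are assumptions for illustration only; the thread mentions the "changed" flag and the next/previous pointers, while undo level and line number are left out to keep it short.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node: one text line plus control info. */
typedef struct Line {
    char        *text;     /* heap copy of the line, NUL-terminated, no '\n' */
    int          changed;  /* "has it been changed" flag */
    struct Line *prev;
    struct Line *next;
} Line;

/* Allocate a node holding a copy of s and link it after `after`
   (pass NULL to create the first line). Returns NULL on failure. */
Line *line_insert_after(Line *after, const char *s)
{
    Line *ln = malloc(sizeof *ln);
    if (!ln) return NULL;
    size_t n = strlen(s);
    ln->text = malloc(n + 1);
    if (!ln->text) { free(ln); return NULL; }
    memcpy(ln->text, s, n + 1);
    ln->changed = 0;
    ln->prev = after;
    ln->next = after ? after->next : NULL;
    if (ln->next) ln->next->prev = ln;
    if (after) after->next = ln;
    return ln;
}

/* Unlink and free one line. Returns the following line (may be NULL). */
Line *line_delete(Line *ln)
{
    Line *next = ln->next;
    if (ln->prev) ln->prev->next = ln->next;
    if (ln->next) ln->next->prev = ln->prev;
    free(ln->text);
    free(ln);
    return next;
}
```

    Inserting or deleting a line only touches its neighbours, but jumping to line N means walking N nodes from a known position.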
    Mainframe assembler programmer by trade. C coder when I can.

  5. #5
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    Here is how I did it, and it worked rather well.

    Store the text in a large array. Then have a separate list (or array) of pointers to new lines in the text array. This allows you to scroll up and down very easily because you already have pointers to the start of all the lines. The only time you need to recalculate the pointer array is when the window's width is resized.
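
    A minimal sketch of building such a line table is below. One deliberate change from the description: it stores offsets into the text array instead of raw pointers, on the assumption that the text array may be moved by realloc(), which would invalidate pointers. The function name build_line_index is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd array with the offset at which
   each line of text[0..len) starts, and stores the line count in *count.
   Returns NULL on allocation failure. */
size_t *build_line_index(const char *text, size_t len, size_t *count)
{
    size_t n = 1;                        /* line 0 always starts at offset 0 */
    for (size_t i = 0; i < len; i++)
        if (text[i] == '\n') n++;

    size_t *starts = malloc(n * sizeof *starts);
    if (!starts) return NULL;

    size_t k = 0;
    starts[k++] = 0;
    for (size_t i = 0; i < len; i++)
        if (text[i] == '\n') starts[k++] = i + 1;  /* next line begins after '\n' */

    *count = n;
    return starts;
}
```

    Scrolling to line N is then just starts[N]. Note that text ending in a newline yields a final empty line whose start equals len.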
    bit∙hub [bit-huhb] n. A source and destination for information.

  6. #6
    Registered User
    Join Date
    Aug 2009
    Posts
    198
    Why do you have to recalculate the pointer array when the window is resized?

    Don't you have to recalculate them if you type a character?

  7. #7
    Registered User
    Join Date
    Aug 2006
    Posts
    100
    Here is a pretty good guide to various data structures and methods for use in generic text editors:

    Data Structures for Text Sequences

    For a tutorial on a Windows editor, see:

    Part 1 - Overview | www.catch22.net

  8. #8
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    Quote Originally Posted by MTK View Post
    Why do you have to recalculate the pointer array when the window is resized?

    Don't you have to recalculate them if you type a character?
    Because if the window is resized, then the newline locations will change due to word wrap. This is not true if you are not supporting word wrap in your text editor though.

    When a character is typed, you do not need to recalculate the entire pointer array. You just need to check if the new character is a newline character, or if that character forces a word wrap. If so, then add a new entry to the end of your pointer array.

  9. #9
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    I'm guessing if you were typing in the middle of the text, you might need to recalculate the start of the lines, unless you went the vi route and kept the old breaks until they were done and typed the magical "recalculate line breaks" command.
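
    To make the mid-text case concrete, here is a hedged sketch of an incremental update. It assumes the line table holds offsets (not pointers) and only handles real newline characters, not word wrap; LineIndex and index_after_insert are made-up names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical index: sorted offsets of line starts, starts[0] == 0. */
typedef struct {
    size_t *starts;
    size_t  count;
    size_t  cap;
} LineIndex;

/* Update the index after one character ch was inserted at offset pos
   in the text. Returns 0 on success, -1 if growing the array fails
   (in which case the index is left unchanged). */
int index_after_insert(LineIndex *ix, size_t pos, char ch)
{
    /* make room first so a failed realloc cannot leave a half-updated index */
    if (ch == '\n' && ix->count == ix->cap) {
        size_t cap = ix->cap ? ix->cap * 2 : 16;
        size_t *p = realloc(ix->starts, cap * sizeof *p);
        if (!p) return -1;
        ix->starts = p;
        ix->cap = cap;
    }

    /* k = first line that starts strictly after pos; a line starting
       exactly at pos keeps its start because the new char joins it */
    size_t k = 0;
    while (k < ix->count && ix->starts[k] <= pos) k++;

    /* every later line moves one byte to the right */
    for (size_t i = k; i < ix->count; i++)
        ix->starts[i]++;

    if (ch == '\n') {
        /* the rest of the split line becomes a new line starting at pos + 1 */
        memmove(&ix->starts[k + 1], &ix->starts[k],
                (ix->count - k) * sizeof *ix->starts);
        ix->starts[k] = pos + 1;
        ix->count++;
    }
    return 0;
}
```

    So a keystroke in the middle does touch every entry after the cursor, but it is a simple increment rather than a rescan of the text.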

    I know we had a challenge or such here on the boards to write "ed" -- Matsp actually posted his, so you can get an idea of what a very basic line editor would look like.

  10. #10
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Quote Originally Posted by bithub View Post
    Because if the window is resized, then the newline locations will change due to word wrap. This is not true if you are not supporting word wrap in your text editor though.

    When a character is typed, you do not need to recalculate the entire pointer array. You just need to check if the new character is a newline character, or if that character forces a word wrap. If so, then add a new entry to the end of your pointer array.
    I would take a different approach, and separate the "data" from the "display". Whether a character might fit on a line has everything to do with display and nothing to do with data.

  11. #11
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    Quote Originally Posted by Dino View Post
    I would take a different approach, and separate the "data" from the "display". Whether a character might fit on a line has everything to do with display and nothing to do with data.
    What do you mean? The entire point of my design is to separate the data and the display. That's why there are 2 different data structures. One has the unformatted data, and the other contains how the data is displayed. The display structure will need to be more complex if you start getting into things like different fonts and colors though.
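
    As an illustration of keeping the display structure separate, the sketch below derives screen-row starts from the untouched text for a given window width. It is a simplification and not necessarily what was actually implemented: it wraps at the column limit instead of at word boundaries, and it counts every byte as one column (no tabs, no multibyte characters). The name build_row_index is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd array with the text offset at
   which each screen row starts when text[0..len) is shown `width` columns
   wide, and stores the row count in *rows. Returns NULL on failure. */
size_t *build_row_index(const char *text, size_t len, size_t width, size_t *rows)
{
    if (width == 0) return NULL;
    size_t cap = 16, n = 0, col = 0;
    size_t *starts = malloc(cap * sizeof *starts);
    if (!starts) return NULL;
    starts[n++] = 0;

    for (size_t i = 0; i < len; i++) {
        size_t start;
        if (text[i] == '\n') {        /* hard break stored in the data */
            start = i + 1;
            col = 0;
        } else if (col == width) {    /* row is full: soft break before i */
            start = i;
            col = 1;
        } else {
            col++;
            continue;
        }
        if (n == cap) {
            size_t *p = realloc(starts, cap * 2 * sizeof *p);
            if (!p) { free(starts); return NULL; }
            starts = p;
            cap *= 2;
        }
        starts[n++] = start;
    }
    *rows = n;
    return starts;
}
```

    On a resize only this table is thrown away and rebuilt with the new width; the text array itself never changes.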

  12. #12
    Registered User
    Join Date
    Aug 2009
    Posts
    198
    Quote Originally Posted by rdrast View Post
    Here is a pretty good guide to various data structures and methods for use in generic text editors:

    Data Structures for Text Sequences

    For a tutorial on a Windows editor, see:

    Part 1 - Overview | www.catch22.net
    The above link might have had some interesting ideas, but they were not really explained well and the images were broken, so I did not really understand.
    Last edited by MTK; 09-01-2009 at 06:37 PM.

  13. #13
    Registered User
    Join Date
    Aug 2009
    Posts
    198
    I was thinking that a good idea would be to use an array of pointers to individual lines, so lines and columns are more separate. The advantage would be that cursor movement and undo would be easier, and when you insert a character you only need to shift the characters in that one line, not in the whole document. The disadvantage is that it would need to realloc() on every change, because having unused memory on each line (as caused by reallocating in large steps) would build up to massive amounts in a large text. I wonder, is there a kind of workaround for that?
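
    A minimal sketch of that per-line layout, with the exact-fit realloc() on every change that the post worries about. Doc and doc_insert_char are hypothetical names, and each line is assumed to be a separately allocated NUL-terminated string without its trailing newline.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical document: lines[i] is a heap string holding line i. */
typedef struct {
    char  **lines;
    size_t  count;
} Doc;

/* Insert ch at column col of line row, resizing that line to fit exactly.
   Returns 0 on success, -1 on a bad position or allocation failure. */
int doc_insert_char(Doc *doc, size_t row, size_t col, char ch)
{
    if (row >= doc->count) return -1;
    size_t len = strlen(doc->lines[row]);
    if (col > len) return -1;

    char *p = realloc(doc->lines[row], len + 2);    /* +1 char, +1 NUL */
    if (!p) return -1;

    memmove(p + col + 1, p + col, len - col + 1);   /* shifts the NUL too */
    p[col] = ch;
    doc->lines[row] = p;
    return 0;
}
```

    Only the edited line is moved; every other line pointer stays exactly where it was.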

  14. #14
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    For the "currently being edited line" you could keep a separate line buffer that is plenty long, and then when the user leaves that line, realloc() the in-memory copy. As the user cursors up or down, or clicks to set the current line, it would be very fast to change the pointer into the array for the currently-edited line. I'm sure there are probably other ways to do it too.
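
    One way this could look, as a sketch under stated assumptions: EDIT_MAX is an arbitrary "plenty long" size picked for the example, and EditLine, edit_begin, edit_insert and edit_commit are made-up names. Stored lines are assumed to be separately allocated NUL-terminated strings.

```c
#include <stdlib.h>
#include <string.h>

#define EDIT_MAX 4096  /* assumed "plenty long" size; not from the thread */

/* Hypothetical scratch buffer for the line under the cursor. */
typedef struct {
    char   buf[EDIT_MAX];
    size_t len;
} EditLine;

/* Copy a stored line into the scratch buffer. Returns -1 if it does not fit. */
int edit_begin(EditLine *e, const char *stored)
{
    size_t n = strlen(stored);
    if (n >= EDIT_MAX) return -1;
    memcpy(e->buf, stored, n + 1);
    e->len = n;
    return 0;
}

/* Insert one character; no allocation happens while typing. */
int edit_insert(EditLine *e, size_t col, char ch)
{
    if (col > e->len || e->len + 1 >= EDIT_MAX) return -1;
    memmove(e->buf + col + 1, e->buf + col, e->len - col + 1);
    e->buf[col] = ch;
    e->len++;
    return 0;
}

/* On leaving the line: resize the stored copy to the exact length.
   *stored is replaced on success and left untouched on failure. */
int edit_commit(const EditLine *e, char **stored)
{
    char *p = realloc(*stored, e->len + 1);
    if (!p) return -1;
    memcpy(p, e->buf, e->len + 1);
    *stored = p;
    return 0;
}
```

    In this sketch, leaving a line costs one realloc() in edit_commit, and entering the next line costs one copy in edit_begin.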

  15. #15
    Registered User
    Join Date
    Aug 2009
    Posts
    198
    So with this, only the currently edited line has extra memory space, and I only have to do 2 realloc() calls when I move the cursor to another line?
