Thread: Data model for text editor

  1. #16
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Moving the cursor to another column shouldn't activate any need for realloc() - you're just updating the cursor (caret) position of the currently-edited-line.

    When the user moves the caret to another line, that's when you realloc() (once) the just-edited-line and fix the linked list (obviously, you do this only if it was changed). Then, just clear the currently-edited-line and assign it the data from whatever row the caret is now on. So, just one realloc().

    The only time you would need to realloc() the currently-edited-line buffer is if it grew larger than you had initially allocated. If you figure a line might typically be 80 chars, and you allocate 256 (for example), you won't be reallocing it very much. No big deal if you do, and completely transparent to the user.
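To make the "realloc only when it outgrows the buffer" idea concrete, here is a minimal sketch. The struct and function names (EditLine, editline_init, editline_insert) are made up for illustration, and doubling the capacity is just one reasonable growth policy, not something prescribed above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical edit buffer for the line under the caret. */
struct EditLine {
    char  *text;  /* NUL-terminated contents */
    size_t len;   /* characters in use, excluding the NUL */
    size_t cap;   /* bytes allocated */
};

/* Start with room to spare (256 bytes, as in the example figure above). */
int editline_init(struct EditLine *e)
{
    e->cap = 256;
    e->len = 0;
    e->text = malloc(e->cap);
    if (e->text == NULL)
        return -1;
    e->text[0] = '\0';
    return 0;
}

/* Insert ch at column col; realloc() happens only when the buffer is full. */
int editline_insert(struct EditLine *e, size_t col, char ch)
{
    if (col > e->len)
        return -1;
    if (e->len + 2 > e->cap) {          /* need room for new char + NUL */
        size_t ncap = e->cap * 2;       /* assumed growth policy */
        char *p = realloc(e->text, ncap);
        if (p == NULL)
            return -1;
        e->text = p;
        e->cap = ncap;
    }
    /* Shift the tail (including the NUL) one place to the right. */
    memmove(e->text + col + 1, e->text + col, e->len - col + 1);
    e->text[col] = ch;
    e->len++;
    return 0;
}
```

Moving the caret never touches this code at all; only an insert that would overflow the buffer reaches the realloc() branch.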
    Mainframe assembler programmer by trade. C coder when I can.

  2. #17
    Registered User
    Join Date
    Aug 2009
    Posts
    198
    This part doesn't really make sense to me:

    Quote Originally Posted by Dino
    When the user moves the caret to another line, that's when you realloc() (once) the just-edited-line and fix the linked list (obviously, you do this only if it was changed). Then, just clear the currently-edited-line and assign it the data from whatever row the caret is now on. So, just one realloc().
    I see that you are suggesting a linked list of lines instead of an array of pointers to lines. Not sure which would be better.

    And I really don't get the part "Then, just clear the currently-edited-line and assign it the data from whatever row the caret is now on.". Why do I have to clear the line? What data on the row the caret is on?

    Maybe you are not thinking the same thing as I am. The way I initially understood this: imagine we have a text file with 2 lines. The first line is being edited and has extra memory allocated to make room for extra characters. Now the user moves from line 1 to line 2. Line 1 is reallocated to just enough size to hold it. Line 2 now reallocates more memory to prepare for editing. (I am now thinking that it is probably better to reallocate the second line once the user starts editing, not right away.)

    Anyway, that was what I was thinking about and maybe it will clear up some confusion.

  3. #18
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by MTK
    Maybe you are not thinking the same thing as I am. The way I initially understood this: imagine we have a text file with 2 lines. The first line is being edited and has extra memory allocated to make room for extra characters. Now the user moves from line 1 to line 2. Line 1 is reallocated to just enough size to hold it.
    It will be silly to allocate memory in this way and to conceptualize the text on the screen as needing to be mirrored constantly in memory, requiring the operations that you seem to imply. You do not have to change the data every time the user inserts a character on screen.

    Also, just keep the whole thing in one string, not an array. AFAIK GUI APIs generally return text area contents this way -- as a single string, and this is certainly the easiest and most efficient method. The terminal itself will wrap lines, and you can get the wrap length* with getmaxyx() etc. As for memory, allocate that in large chunks when you read the screen contents back in -- which can either happen at intervals, say every few seconds, or when the user performs certain actions.

    Have you looked at the CDK extensions for ncurses? It looks like there is a "multi-line entry" widget there which might make your life much easier depending on how it can be controlled...

    * the challenge would be having a "no-wrap" option.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  4. #19
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    I am thinking more in a variable length record-oriented fashion as opposed to a single (entire file) string manner, where the CRLF's break the records into "lines" (aka rows).

    So, let's say you have 3 lines in your file:

    1: ABC
    2: DEF
    3: GHI

    (without the number and colon).

    Read this into a linked list. It can still be an array, but the reason I chose a linked list was because the user might insert a new line between 1 & 2, or between 2 & 3, or even prior to line 1. The linked-list aspect will allow you to not have to realloc the whole array just to keep things in physical sequential order.

    My idea for the struct would look something like

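    struct Contents {
        char *rowdata;              /* text of this row, NUL-terminated */
        struct Contents *previous;  /* NULL for the first row */
        struct Contents *next;      /* NULL for the last row */
    };

    And, keep a pointer to the first linked list entry

    struct Contents *mylist;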


    So, now you also have another variable, called cel[256] (currently-edited-line). When the file is initially displayed, you copy line 1 into cel and put the caret prior to the first character.

    When the cursor is moved anywhere on that line, you just keep an updated cursor position - no big deal. When the user inserts a new character, you insert it into the proper place in cel - no changes to linked-list line 1 yet.

    When the user picks a new row, if there were any changes, then you take the contents of cel, stick it in a newly allocated string (malloc() ), free() the line 1 *rowdata, and then insert the new pointer into Contents, essentially overwriting line 1. (If the row got shorter, you don't even have to malloc a new chunk, you could just copy the new cel data into the old rowdata area and truncate via the null term character). Finally, you "blank out" cel (I said "clear the currently-edited-line" before, so I can see where that was confusing) and then copy the newly picked line (row) into cel, position the caret, and now you are back at the top of your editing loop.
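Roughly, the "commit cel, then load the next row" step could look like the sketch below. The helper names commit_row and load_row are invented here for illustration; the struct matches the one described earlier in this post, and the "reuse the old block if the row got shorter" shortcut is the one mentioned in the paragraph above.

```c
#include <stdlib.h>
#include <string.h>

struct Contents {
    char *rowdata;
    struct Contents *previous;
    struct Contents *next;
};

/* Store the edited text from cel back into row. Returns 0 on success. */
int commit_row(struct Contents *row, const char *cel)
{
    size_t newlen = strlen(cel);

    if (row->rowdata != NULL && newlen <= strlen(row->rowdata)) {
        /* Same length or shorter: reuse the old block, truncate via the NUL. */
        memcpy(row->rowdata, cel, newlen + 1);
        return 0;
    }

    /* Longer: allocate a fresh copy, then drop the old text. */
    char *copy = malloc(newlen + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, cel, newlen + 1);
    free(row->rowdata);
    row->rowdata = copy;
    return 0;
}

/* Copy the newly selected row into cel, truncating if it does not fit.
 * Assumes celsize > 0 and row->rowdata is not NULL. */
void load_row(const struct Contents *row, char *cel, size_t celsize)
{
    size_t n = strlen(row->rowdata);
    if (n >= celsize)
        n = celsize - 1;
    memcpy(cel, row->rowdata, n);
    cel[n] = '\0';
}
```

The editing loop would call commit_row() only if the line was actually changed, then load_row() for whichever row the caret landed on.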

    Does that make sense? Like I said before, I've never written an editor, but I've thought about it a lot.

  5. #20
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by Dino
    Read this into a linked list. It can still be an array, but the reason I chose a linked list was because the user might insert a new line between 1 & 2, or between 2 & 3, or even prior to line 1. The linked-list aspect will allow you to not have to realloc the whole array just to keep things in physical sequential order.
    Again, Dino, IMO that is totally crazy. You have a string. You can determine the current number of characters on screen AFTER a change has been made but BEFORE you read them back into your data store (the string). You realloc the string in chunks of, say, 1 KB, which is a lot of text -- in other words, if the number of characters to be read in is greater than the allocated length of the string, grow the allocation to the next multiple of 1 KB. So if it is three bytes/characters over because of an addition (anywhere), you add a KB and you probably won't have to do that very often.
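For what it's worth, the chunked growth described here might be sketched like this. TextBuf, round_up_chunk and textbuf_reserve are placeholder names, and the 1024-byte chunk is just the figure from the post.

```c
#include <stdlib.h>

#define CHUNK 1024u  /* grow in 1 KB steps, per the suggestion above */

/* Hypothetical whole-document buffer. */
struct TextBuf {
    char  *data;
    size_t cap;   /* bytes currently allocated */
};

/* Round n up to the next multiple of CHUNK. */
size_t round_up_chunk(size_t n)
{
    return (n + CHUNK - 1) / CHUNK * CHUNK;
}

/* Make sure the buffer can hold `needed` bytes; realloc() only if it cannot. */
int textbuf_reserve(struct TextBuf *b, size_t needed)
{
    if (needed <= b->cap)
        return 0;
    size_t ncap = round_up_chunk(needed);
    char *p = realloc(b->data, ncap);
    if (p == NULL)
        return -1;
    b->data = p;
    b->cap = ncap;
    return 0;
}
```

With a 1024-byte buffer, reading back 1027 characters bumps the allocation to 2048, and nothing is reallocated again until the text passes that mark.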

    Some key points at which the content must be read back in would be when the screen is to scroll up or down, since once gone you won't be able to get it. To compensate for off-screen data, you keep a pointer set to the place in the string where the current screen begins.

    It is also silly to use '\n' (or "\r\n" or whatever) as a line break, since the terminal *will* wrap a long line and you do not have to arrange anything that way. It will also automatically render newlines as entered by the user, obviously. You cannot (or should not) use variably placed linebreaks to structure the screen content, this will just create a headache and a mess.

    In other words, structuring the data based on a line break, and placing it in some kind of array or list, will be counter-intuitive to the purpose. A document is not an array, it is a single file (a string) and best left that way. You could do it based on screen width, but this will become ridiculous to try and shift around insertion and deletion. Use a single string. Honestly.
    Last edited by MK27; 09-03-2009 at 08:48 AM.

  6. #21
    Registered User
    Join Date
    Aug 2009
    Posts
    198
    I already half-did an editor that stores the file as one big string and I was starting to think that was insanely complicated having to constantly iterate through the whole file counting newlines, so I kind of started changing it to use separate lines, but I couldn't figure out how to reallocate it well.

    I really like Dino's last idea and I am almost certain I will base my editor on it, but I still just wonder why do you have to copy the current line to cel[] and then back when you can just edit the line itself.

    Quote Originally Posted by MK27
    It will also automatically render newlines as entered by the user, obviously.
    Actually, I am using ncurses in noecho() mode so that characters are not echoed, because I think it is easier and more controlled to simply write out the visible portion of the document.

    Quote Originally Posted by MK27
    It is also silly to use '\n' (or "\r\n" or whatever) as a line break, since the terminal *will* wrap a long line and you do not have to arrange anything that way.
    I was originally thinking that it is best to let the terminal handle line-wrapping of lines larger than the window.

  7. #22
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    I see I'm in good company.

    Man flying was a totally crazy idea. Rosa Parks sitting at the front of the bus was totally crazy.

    There are tons of ways to skin this cat. Some crazy, some not. MTK asked for ideas - I'm giving mine.

    If the user interface chosen provides for handling of long strings by wrapping and rolling lines, and one wants to leverage that behavior, then managing the file as a full string would probably be fine.

  8. #23
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    MTK - cel could be used or not. You could certainly manage changing the line itself.

  9. #24
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by Dino
    Man flying was a totally crazy idea. Rosa Parks sitting at the front of the bus was totally crazy.

    There are tons of ways to skin this cat. Some crazy, some not. MTK asked for ideas - I'm giving mine.
    LOL! Good for you, but god help the person who has to try and operate this vehicle!

    Quote Originally Posted by Dino
    If the user interface chosen provides for handling of long strings by wrapping and rolling lines, and one wants to leverage that behavior, then managing the file as a full string would probably be fine.
    The user interface is the terminal via ncurses, and of course it wraps lines...when was the last time you saw the cursor disappear off to the right? The complex part would be creating a buffer that DOESN'T wrap lines until a '\n'.

    @MTK: you are aware that doing this as a full blown GUI will be much easier?

  10. #25
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Never used ncurses - but I got the book. Since my data is separate from my display... the UI could be anything.

  11. #26
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    MK27 - to give you my perspective on my design - I work on mainframes all day, every day. Every file format on the platform I've worked on every day for the last 29 years is record oriented.

  12. #27
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by Dino
    Never used ncurses - but I got the book. Since my data is separate from my display... the UI could be anything.
    I've used it a (very) little bit because it has a "main loop" that is nifty for IPC stuff (since you can monitor a connection inside of it, nice and easy) and is somewhat quicker and simpler than gtk, BUT there are a lot of things, like maintaining a "text area", that gtk is already set up for, whereas ncurses is not, at least AFAICT. That being the case, I could write a simple, fairly functional text editor using gtk+ in like an hour, say a few hundred lines, but I have a funny feeling it could be quite a task with ncurses. However, I haven't made use of windows or panels in ncurses; your knowledge of the API will be a major factor here. If I were the OP, I would focus on fiddling with and getting to know ncurses BEFORE I focused on the nitty-gritty details of an editor, if it's ncurses you must use.

    As for working with record-based files on mainframes: no doubt that's a real skill, but maybe it's like trying to kill mosquitoes with a shotgun in this case? As in, it could be made to work real well but you are asking for trouble IMO, by treating a text file as anything but a string.

    Still think MTK may have become unnecessarily concerned about line breaks, a line break is just another character. And if you were "starting to think that was insanely complicated having to constantly iterate through the whole file counting newlines", well:
    1. You are unlikely to have so much text that parsing it as a string will take much more than a few nanoseconds.
    2. You are hardly reducing the insanity by making an array of lines ending in a newline; you are just changing the way you must deal with it. In the end, I think it will be increased...
    Last edited by MK27; 09-03-2009 at 10:34 AM.

  13. #28
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    ps. You could set an array of pointers into the string, based on the screen width (eg, 80 bytes apart), and moving those pointers around will be way easier than actually shifting text around in an array of strings.
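A rough sketch of that kind of index follows. It is a variation rather than exactly what is described above: it stores offsets instead of raw pointers (so the index survives a realloc() of the string) and it also starts a new row after each '\n', which is an assumption about how the screen would be drawn. index_rows is a made-up name.

```c
#include <stddef.h>

/* Fill starts[] with the offset in buf where each screen row begins.
 * A new row starts after a '\n' or after `width` characters (a wrap).
 * A line of exactly `width` characters followed by '\n' does not produce
 * an extra empty row (an assumption; terminals differ on this).
 * Returns the number of rows recorded, at most max. */
size_t index_rows(const char *buf, size_t len, size_t width,
                  size_t *starts, size_t max)
{
    size_t n = 0, col = 0;

    if (width == 0 || max == 0)
        return 0;
    starts[n++] = 0;
    for (size_t i = 0; i < len && n < max; i++) {
        if (buf[i] == '\n') {
            starts[n++] = i + 1;     /* row after an explicit line break */
            col = 0;
        } else if (++col == width && i + 1 < len && buf[i + 1] != '\n') {
            starts[n++] = i + 1;     /* row created by wrapping */
            col = 0;
        }
    }
    return n;
}
```

After an insertion or deletion only the entries at or after the edit point need recomputing, which is cheaper than shifting text between separately allocated rows.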

  14. #29
    Registered User
    Join Date
    Apr 2004
    Posts
    210
    I'd use two arrays of char: one for the original text, one for insert-changes (reasonably large) as well as an array (or a compact/preallocated linked list) of pointers into either.
    Individual lines of text within the textblock would be separated by \n\0 [or alternatively store the length of each line along with the pointers].

    Initially, each pointer indicates the beginning of a new row of text within the char-array (text+0, first '\n\0' +1, second '\n\0' +1 ...).

    * If text is removed from a row, only the textdata between the row-pointer-target and the next \0 will be updated. Nothing else changes. No realloc (and don't move the '\0').
    * If text is added to a row (you can use strlen on the rowpointer to get the row's original [=maximum] length), its contents will be copied to the insert buffer and the rowpointer for this line is updated to point into that buffer.
    * Any additional row will be stored within the insertbuffer.
    * You can commit data from the insertbuffer to the textbuffer when the user saves the file. Make sure you skip the garbage between each \n and \0 that might have accumulated within the buffers.
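The update rules in the list above can be sketched as follows. This is only one reading of the scheme: the Doc struct, row_len and doc_set_row are hypothetical names, the insert buffer is assumed to be a fixed-size block supplied by the caller, and each row is assumed to be stored as "text\n\0".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical document state: rows[i] points at a row stored as
 * "text\n\0", either in the original text block or in insbuf. */
struct Doc {
    char  **rows;
    size_t  nrows;
    char   *insbuf;    /* insert buffer, fixed size (an assumption) */
    size_t  insused;
    size_t  inscap;
};

/* Visible length of a row: the characters before the first '\n'. */
size_t row_len(const char *row)
{
    return strcspn(row, "\n");
}

/* Replace row i with newtext (which must not contain '\n').
 * Returns 0 on success, -1 if i is out of range or insbuf is full. */
int doc_set_row(struct Doc *d, size_t i, const char *newtext)
{
    size_t newlen = strlen(newtext);

    if (i >= d->nrows)
        return -1;

    /* The '\0' never moves, so strlen() still gives the row's capacity. */
    if (newlen + 1 <= strlen(d->rows[i])) {
        memcpy(d->rows[i], newtext, newlen);
        d->rows[i][newlen] = '\n';  /* bytes up to the old '\0' become garbage */
        return 0;
    }

    /* Too long to fit in place: append to insbuf and repoint the row. */
    if (d->inscap - d->insused < newlen + 2)
        return -1;
    char *dst = d->insbuf + d->insused;
    memcpy(dst, newtext, newlen);
    dst[newlen] = '\n';
    dst[newlen + 1] = '\0';
    d->insused += newlen + 2;
    d->rows[i] = dst;
    return 0;
}
```

Saving would walk rows[] in order and write row_len() characters plus a '\n' for each, which skips the garbage left between a '\n' and its '\0'.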

    This worked quite well even when editing very large files. Individual mallocs for each row or even reallocs on the entire textblock really slow things down considerably when the files get larger than a couple hundred megabytes - which isn't uncommon for logfiles.
    main() { int O[!0<<~-!0]; (!0<<!0)[O]+= ~0 +~(!0|!0<<!0); printf("a function calling "); }

  15. #30
    Registered User
    Join Date
    Aug 2009
    Posts
    198
    I was thinking about the way the cursor moves. It moves separately along the y and x axes, so organizing the text that way in memory would make it easier to compute cursor movement (just jump to the next line in the list). If it were stored as one big array, you would have to iterate backwards, counting characters, until the previous \n, then iterate forward until the next \n, and then iterate until you get to the right column, checking whether there is a \n in the way.
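For comparison, this is roughly what "cursor down" looks like on the single big array, following the steps just described. line_start and cursor_down are illustrative names, and positions are byte offsets into the buffer.

```c
#include <stddef.h>

/* Offset of the first character of the line containing pos
 * (scan backwards to just after the previous '\n'). */
size_t line_start(const char *buf, size_t pos)
{
    while (pos > 0 && buf[pos - 1] != '\n')
        pos--;
    return pos;
}

/* Move the cursor at offset pos down one line, keeping the column when
 * the next line is long enough. Returns pos unchanged on the last line. */
size_t cursor_down(const char *buf, size_t len, size_t pos)
{
    size_t col = pos - line_start(buf, pos);
    size_t p = pos;

    while (p < len && buf[p] != '\n')   /* forward to the end of this line */
        p++;
    if (p == len)
        return pos;                     /* there is no next line */
    p++;                                /* first character of the next line */

    /* Walk to the same column, stopping early if a '\n' is in the way. */
    while (col > 0 && p < len && buf[p] != '\n') {
        p++;
        col--;
    }
    return p;
}
```

With a list of lines the same move is just a step to the next node plus clamping the column to that line's length.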

    It just seems that text is meant to be organized in separate rows because that is the way it is meant to be read, and the \n character is just to make it easy for computers to store.

    Last Post: 07-04-2002, 07:11 PM