Thread: Handling database files under Git workflow

  1. #1
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446

    Handling database files under Git workflow

    This application is served by an SQLite database file. All project files, including the database, are under version control. I'm using the Git workflow based on develop, release, and feature branches.

    A feature branch may need to make sizable changes to the database, adding new tables to serve the feature being coded, for instance. But it is conceivable that the development branch may also, once in a while, make changes to the database, to fine-tune certain tables after a feature has been merged, for instance. Similarly, the release branch or a hotfix may need to change the database.

    Because it is a binary file with a strict internal structure, it's rarely possible to resolve conflicts in it. So every change to the database needs to be merged immediately into any live branches.

    So, how do I fit this particular file into the version control scheme?

    I'm thinking the best option is for this file to be served by its own branch. Any change to the sqlite file must be made under this branch, and it is the responsibility of other branch maintainers to merge in any new commits to the 'database' branch.

    Do I have a better option?
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  2. #2
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    I'm not a DB guy and this is just off the top of my head. What if, instead of having a binary DB file that accompanies each feature/hotfix branch, you had a master "UpdateSchemaToLatest.sql" script that can take any DB from version X or higher to the latest schema? Then if you have multiple schema changes in multiple branches, merging will be text based.
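    A minimal sketch of the idea, using Python's stdlib sqlite3 and the PRAGMA user_version counter to record which migrations have already run (the table names and statements here are made up for illustration):
    Code:
    ```python
    import sqlite3

    # Hypothetical ordered migrations; applying the first N of them
    # produces schema version N.
    MIGRATIONS = [
        "CREATE TABLE members(id INTEGER PRIMARY KEY, name TEXT);",
        "ALTER TABLE members ADD COLUMN post_count INTEGER DEFAULT 0;",
    ]

    def update_schema_to_latest(conn):
        """Take a database at any version X up to the latest schema."""
        (version,) = conn.execute("PRAGMA user_version").fetchone()
        for statement in MIGRATIONS[version:]:
            conn.execute(statement)
        conn.execute("PRAGMA user_version = %d" % len(MIGRATIONS))
        conn.commit()

    conn = sqlite3.connect(":memory:")
    update_schema_to_latest(conn)  # fresh database: both migrations run
    update_schema_to_latest(conn)  # already current: no-op
    ```
    Each branch then appends its own migration as a new list entry (or a new numbered .sql file), so merging branches is a plain text merge.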

    gg

  3. #3
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    Could you use the .dump command to output a text file version of the database?

    Eg
    Sqlite database backup: the .dump command

    Being just text, it would be more amenable to merging.
    Extra rows would be dead easy, extra columns less so.
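    For what it's worth, Python's stdlib sqlite3 module exposes the same thing programmatically through iterdump(), which yields the SQL text the shell's .dump would print; the throwaway database below is just for demonstration:
    Code:
    ```python
    import sqlite3

    # Build a tiny throwaway database to dump.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE members(id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO members VALUES (1, 'Mario F.')")
    conn.commit()

    # iterdump() yields the same SQL text the sqlite3 shell's .dump emits;
    # this text is what would go under version control.
    dump = "\n".join(conn.iterdump())
    print(dump)
    ```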
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  4. #4
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    Quote Originally Posted by Salem View Post
    Could you use the .dump command to output a text file version of the database?
    O_o

    I do exactly that in some of my projects.

    You would probably need a "UpdateSchemaToLatest.sql" mechanism for release revisions of any significant application using a database, but I find writing such update scripts for development revisions misplaces time and effort.

    If you add a few filters and scripts to markup context over the `.dump' files, you can get pretty nice lists of schema changes without doing any real work.

    Soma
    “Salem Was Wrong!” -- Pedant Necromancer
    “Four isn't random!” -- Gibbering Mouther

  5. #5
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Excellent solution. I'll do it exactly like that. This way I can avoid versioning the binary file.

    Quote Originally Posted by phantomotap View Post
    If you add a few filters and scripts to markup context over the `.dump' files, you can get pretty nice lists of schema changes without doing any real work.
    I need to explore git hooks to see if I can automate the process, since I don't feel like manually generating the dump file. But I think you are talking about something else. Can you elaborate?
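    Something like this minimal pre-commit sketch is what I have in mind (assuming the file names cboard.db and cboard.sql; it regenerates the text dump and stages it whenever the binary database is present):
    Code:
    ```python
    #!/usr/bin/env python3
    # Sketch of a .git/hooks/pre-commit script: regenerate the text dump
    # from the binary database and stage it with the commit.
    import os
    import sqlite3
    import subprocess

    DB_PATH = "cboard.db"     # assumed name of the binary database
    DUMP_PATH = "cboard.sql"  # mergeable text copy kept under version control

    def regenerate_dump(db_path, dump_path):
        conn = sqlite3.connect(db_path)
        with open(dump_path, "w") as out:
            for line in conn.iterdump():
                out.write(line + "\n")
        conn.close()

    if __name__ == "__main__" and os.path.exists(DB_PATH):
        regenerate_dump(DB_PATH, DUMP_PATH)
        # Stage the refreshed dump so it rides along with this commit.
        subprocess.run(["git", "add", DUMP_PATH], check=True)
    ```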

  6. #6
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    Quote Originally Posted by Mario F. View Post
    I need to explore git hooks to see if I can automate the process, since I don't feel like manually generating the dump file. But I think you are talking about something else. Can you elaborate?
    O_o

    I use Mercurial, so I can't help with the hooks, but I am indeed talking about generating text files from the normal binary database on commit, for the repository to diff.

    The approach isn't much of a stretch; I'm just using some simple information along with the binary database to generate different files which, thanks to the repository mechanisms, track different tables/schema separately without much hoop-jumpery nonsense.

    I don't have a huge amount of time available right now, but you may feel free to ask for more information if you don't follow anything from my strategy.

    I have a directory in each project, called "database" or similar, that holds the database header (".json") files and any build instruction set (".sql.txt"*) files, if any exist.

    The header file contains options used by a script that runs on commit and controls what information is included in, or excluded from, the build instruction set.

    Code:
    (layout)
    / cboard
        / stable
            / cboard.db
            / database
                001_members.json
                002_posts.json
        / development
            / cboard.db
            / database
                001_members.json
                002_posts.json
        / features
            / reputation
                / cboard.db
                / database
                    001_members.json
                    002_posts.json
                    003_reputation.json
    Code:
    (stable -> 001_members.json)
    // the members table stores members which i know is a dumb comment but whatever this is just an example
    {
      "table": "members",
      "collate": "none",
      "command": "CREATE TABLE members(id INTEGER PRIMARY KEY, name TEXT);"
    }
    Code:
    (development -> 001_members.json)
    // the members table stores members which i know is a dumb comment but whatever this is just an example
    // revision: added posts counter
    {
      "table": "members",
      "collate": "none"
    }
    The header itself is part of the repository, so notes and revisions can be included in the markup and/or commit messages for the relevant files. The bits and bobs have defaults, but the example is illustrative.

    The "table" is used so that the files and names can follow different naming standards. The "collate" is used to optionally sort "INSERT" lines, to focus the visibility of changes between versions; the point is to make small changes in large sets easy to see in the difference files without a lot of overhead. The "command" is used as a cheap firewall to prevent accidental changes between stable versions: if the "command" doesn't exist, the value is ignored, allowing free changes to the table without warning during development.

    The script executed on each commit generates the build instruction set corresponding to the database ("cboard.db") for each header.

    Code:
    (layout)
    / cboard
        / stable
            / database
                001_members.sql.txt
    Code:
    (stable -> generated 001_members.sql.txt)
    CREATE TABLE members(id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO "members" VALUES(1,'Mario F.');
    INSERT INTO "members" VALUES(2,'phantomotap');
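    A rough Python sketch of what the commit-time script does (simplified; the real filtering and collation are fussier, and the "sorted" collate value below is just illustrative):
    Code:
    ```python
    import json
    import sqlite3

    def build_instruction_set(conn, header):
        """Emit only the dump lines belonging to the table the header names,
        optionally sorting the INSERT lines when "collate" asks for it."""
        table = header["table"]
        lines = [line for line in conn.iterdump()
                 if "TABLE %s" % table in line or 'INTO "%s"' % table in line]
        if header.get("collate") == "sorted":
            ddl = [l for l in lines if not l.startswith("INSERT")]
            inserts = sorted(l for l in lines if l.startswith("INSERT"))
            lines = ddl + inserts
        return "\n".join(lines)

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE members(id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO members VALUES(1,'Mario F.');
        INSERT INTO members VALUES(2,'phantomotap');
        CREATE TABLE posts(id INTEGER PRIMARY KEY, body TEXT);
    """)
    header = json.loads('{"table": "members", "collate": "none"}')
    print(build_instruction_set(conn, header))
    ```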
    The script also glues the various build instruction sets together according to those same header files.

    Code:
    sql3revision -b ./database | sqlite3 "cboard.db"
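    The glue step itself is nothing special; stripped to its essentials it amounts to executing the numbered instruction sets in filename order, something like this throwaway Python stand-in for the pipeline above:
    Code:
    ```python
    import glob
    import os
    import sqlite3
    import tempfile

    def rebuild_database(db_path, instruction_dir):
        """Execute every numbered build instruction set, in filename order,
        against the given database (a stand-in for `sql3revision | sqlite3`)."""
        conn = sqlite3.connect(db_path)
        for path in sorted(glob.glob(os.path.join(instruction_dir, "*.sql.txt"))):
            with open(path) as f:
                conn.executescript(f.read())
        conn.commit()
        return conn

    # Tiny demo: one instruction set in a throwaway "database" directory.
    workdir = tempfile.mkdtemp()
    with open(os.path.join(workdir, "001_members.sql.txt"), "w") as f:
        f.write("CREATE TABLE members(id INTEGER PRIMARY KEY, name TEXT);\n"
                "INSERT INTO \"members\" VALUES(1,'Mario F.');\n")
    rebuilt = rebuild_database(":memory:", workdir)
    print(rebuilt.execute("SELECT name FROM members").fetchone()[0])
    ```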
    Soma

    *) I'm used to the ".sql" extension meaning binary instruction files.

  7. #7
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    I can see exactly what you are doing. I particularly like the approach of generating only what is required for that particular commit and nothing more. Most excellent. Going through the commit history to see when something was changed is a breeze.

    And this integrates well with any RDBMS where the database is even harder to serialize, like PostgreSQL, MySQL, SQL Server, etc.

  8. #8
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by Mario F. View Post
    This application is served by an SQLite database file. All project files, including the database, are under version control. I'm using the Git workflow based on develop, release, and feature branches.

    A feature branch may need to make sizable changes to the database, adding new tables to serve the feature being coded, for instance. But it is conceivable that the development branch may also, once in a while, make changes to the database, to fine-tune certain tables after a feature has been merged, for instance. Similarly, the release branch or a hotfix may need to change the database.

    Because it is a binary file with a strict internal structure, it's rarely possible to resolve conflicts in it. So every change to the database needs to be merged immediately into any live branches.

    So, how do I fit this particular file into the version control scheme?

    I'm thinking the best option is for this file to be served by its own branch. Any change to the sqlite file must be made under this branch, and it is the responsibility of other branch maintainers to merge in any new commits to the 'database' branch.

    Do I have a better option?
    I would keep the database file in a completely separate repository, and bring it into the dependent projects either manually or using the git submodule mechanism.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  9. #9
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Quote Originally Posted by brewbuck View Post
    I would keep the database file in a completely separate repository, and bring it into the dependent projects either manually or using the git submodule mechanism.
    My idea initially was to indeed separate it, but by branching. The database file exists only in the context of this application, so the idea of having it on a separate repository wasn't so appealing.

    But I've been reevaluating the whole thing. If every change I make to the database needs to be merged into the other branches, that means one hell of a lot of merge entries in the repo. And I tend to dislike those immensely. I could set a rule for the team to rebase instead of merge to avoid the clutter, but that's not a good option either, because if you work on the feature-foo team there's no way to know whether your workmates have already rebased onto the latest update without asking.

    Meanwhile, I'm fearing the conflicts that can emerge even when working from dump files that wouldn't require me to merge so often. The SQL dialect isn't very friendly to conflicts on the DDL front. And a whole lot of changes can break existing code too easily, since sqlite doesn't offer options to move the business logic into the database.

    So your idea makes a whole lot of sense, actually. I'll just stop tracking it from within the project and give the database its own repo. I won't even use submodules; I'll just have the sqlite binary file in .gitignore.

  10. #10
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Sounds like it could be the right pattern for this, but to be honest I haven't been in that exact situation myself.

    The code I deal with at work is split into 7 repos, and the home project (ok, dreaming of going commercial with it) is split across 5, so I'm kind of used to working like that. Hopefully, I don't send you down the wrong path.

