Thread: Handling database files under Git workflow

  1. #1
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446

    Handling database files under Git workflow

    This application is served by an SQLite database file. All project files, including the database, are under version control. I'm using the Git workflow based on develop, release, and feature branches.

    A feature branch may need to make sizable changes to the database, adding new tables to serve the feature being coded, for instance. But it is conceivable that the development branch may also, once in a while, make changes to the database, to fine-tune certain tables after a feature has been merged, for instance. Similarly, the release branch or a hotfix may need to change the database.

    Because it is a binary file with a strict internal structure, it's rarely possible to resolve conflicts in it. So every change to the database needs to be merged immediately into any live branches.

    So, how do I fit this particular file into the version control scheme?

    I'm thinking the best option is for this file to be served by its own branch. Any change to the sqlite file must be made under this branch, and it is the responsibility of other branch maintainers to merge in any new commits to the 'database' branch.

    Do I have a better option?
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  2. #2
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    I'm not a DB guy and this is just off the top of my head. What if, instead of having a binary DB file that accompanies each feature/hotfix branch, you had a master "UpdateSchemaToLatest.sql" script that can take any DB from version X or higher to the latest schema? Then if you have multiple schema changes in multiple branches, merging will be text based.
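    A minimal sketch of the idea, using Python's stdlib sqlite3 and the PRAGMA user_version counter to record which migrations have already run (the table names and statements here are made up for illustration):
    Code:
    ```python
    import sqlite3

    # Hypothetical ordered migrations; applying the first N of them
    # produces schema version N.
    MIGRATIONS = [
        "CREATE TABLE members(id INTEGER PRIMARY KEY, name TEXT);",
        "ALTER TABLE members ADD COLUMN post_count INTEGER DEFAULT 0;",
    ]

    def update_schema_to_latest(conn):
        """Take a database at any version X up to the latest schema."""
        (version,) = conn.execute("PRAGMA user_version").fetchone()
        for statement in MIGRATIONS[version:]:
            conn.execute(statement)
        conn.execute("PRAGMA user_version = %d" % len(MIGRATIONS))
        conn.commit()

    conn = sqlite3.connect(":memory:")
    update_schema_to_latest(conn)  # fresh database: both migrations run
    update_schema_to_latest(conn)  # already current: no-op
    ```
    Each branch then appends its own migration as a new list entry (or a new numbered .sql file), so merging branches is a plain text merge.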

    gg

  3. #3
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    Could you use the .dump command to output a text file version of the database?

    Eg
    Sqlite database backup: the .dump command

    Being just text, it would be more amenable to merging.
    Extra rows would be dead easy, extra columns less so.
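    For what it's worth, Python's stdlib sqlite3 module exposes the same thing programmatically through iterdump(), which yields the SQL text the shell's .dump would print; the throwaway database below is just for demonstration:
    Code:
    ```python
    import sqlite3

    # Build a tiny throwaway database to dump.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE members(id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO members VALUES (1, 'Mario F.')")
    conn.commit()

    # iterdump() yields the same SQL text the sqlite3 shell's .dump emits;
    # this text is what would go under version control.
    dump = "\n".join(conn.iterdump())
    print(dump)
    ```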
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  4. #4
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    Quote Originally Posted by Salem View Post
    Could you use the .dump command to output a text file version of the database?
    O_o

    I do exactly that in some of my projects.

    You would probably need a "UpdateSchemaToLatest.sql" mechanism for release revisions of any significant application using a database, but I find writing such update scripts for development revisions misplaces time and effort.

    If you add a few filters and scripts to markup context over the `.dump' files, you can get pretty nice lists of schema changes without doing any real work.

    Soma
    “Salem Was Wrong!” -- Pedant Necromancer
    “Four isn't random!” -- Gibbering Mouther

  5. #5
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Excellent solution. I'll do it exactly like that. This way I can avoid versioning the binary file.

    Quote Originally Posted by phantomotap View Post
    If you add a few filters and scripts to markup context over the `.dump' files, you can get pretty nice lists of schema changes without doing any real work.
    I need to explore git hooks to see if I can automate the process, since I don't feel like manually generating the dump file. But I think you are talking about something else. Can you elaborate?
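    Something like this minimal pre-commit sketch is what I have in mind (assuming the file names cboard.db and cboard.sql; it regenerates the text dump and stages it whenever the binary database is present):
    Code:
    ```python
    #!/usr/bin/env python3
    # Sketch of a .git/hooks/pre-commit script: regenerate the text dump
    # from the binary database and stage it with the commit.
    import os
    import sqlite3
    import subprocess

    DB_PATH = "cboard.db"     # assumed name of the binary database
    DUMP_PATH = "cboard.sql"  # mergeable text copy kept under version control

    def regenerate_dump(db_path, dump_path):
        conn = sqlite3.connect(db_path)
        with open(dump_path, "w") as out:
            for line in conn.iterdump():
                out.write(line + "\n")
        conn.close()

    if __name__ == "__main__" and os.path.exists(DB_PATH):
        regenerate_dump(DB_PATH, DUMP_PATH)
        # Stage the refreshed dump so it rides along with this commit.
        subprocess.run(["git", "add", DUMP_PATH], check=True)
    ```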

  6. #6
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    Quote Originally Posted by Mario F. View Post
    I need to explore git hooks to see if I can automate the process, since I don't feel like manually generating the dump file. But I think you are talking about something else. Can you elaborate?
    O_o

    I use Mercurial, so I can't help with the hooks, but I am indeed talking about generating text files from the normal binary database on commit, for the repository to diff.

    The approach isn't much of a stretch; I'm just using some simple information along with the binary database to generate different files which, thanks to the repository mechanisms, track different tables/schema separately without much hoop-jumpery nonsense.

    I don't have a huge amount of time available right now, but you may feel free to ask for more information if you don't follow anything from my strategy.

    I have a directory in each project, called "database" or similar, that holds the database header (".json") files and any build instruction set (".sql.txt"*) files, if any exist.

    The header file contains options used by a script that runs on commit and controls what information is included in, or excluded from, the build instruction set.

    Code:
    (layout)
    / cboard
        / stable
            / cboard.db
            / database
                001_members.json
                002_posts.json
        / development
            / cboard.db
            / database
                001_members.json
                002_posts.json
        / features
            / reputation
                / cboard.db
                / database
                    001_members.json
                    002_posts.json
                    003_reputation.json
    Code:
    (stable -> 001_members.json)
    // the members table stores members which i know is a dumb comment but whatever this is just an example
    {
      "table": "members",
      "collate": "none",
      "command": "CREATE TABLE members(id INTEGER PRIMARY KEY, name TEXT);"
    }
    Code:
    (development -> 001_members.json)
    // the members table stores members which i know is a dumb comment but whatever this is just an example
    // revision: added posts counter
    {
      "table": "members",
      "collate": "none"
    }
    The header itself is part of the repository, so notes and revisions can be included in the markup and/or commit messages for the relevant files. The bits and bobs have defaults, but the example is illustrative.

    The "table" is used so that the files and names can follow different naming standards. The "collate" is used to optionally sort "INSERT" lines, to focus the visibility of changes between versions; the point is to make small changes in large sets easy to see in the difference files without a lot of overhead. The "command" is used as a cheap firewall to prevent accidental changes between stable versions: if the "command" doesn't exist, the value is ignored, allowing free changes to the table without warning during development.

    The script executed on each commit generates the build instruction set corresponding to the database ("cboard.db") for each header.

    Code:
    (layout)
    / cboard
        / stable
            / database
                001_members.sql.txt
    Code:
    (stable -> generated 001_members.sql.txt)
    CREATE TABLE members(id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO "members" VALUES(1,'Mario F.');
    INSERT INTO "members" VALUES(2,'phantomotap');
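    A rough Python sketch of what the commit-time script does (simplified; the real filtering and collation are fussier, and the "sorted" collate value below is just illustrative):
    Code:
    ```python
    import json
    import sqlite3

    def build_instruction_set(conn, header):
        """Emit only the dump lines belonging to the table the header names,
        optionally sorting the INSERT lines when "collate" asks for it."""
        table = header["table"]
        lines = [line for line in conn.iterdump()
                 if "TABLE %s" % table in line or 'INTO "%s"' % table in line]
        if header.get("collate") == "sorted":
            ddl = [l for l in lines if not l.startswith("INSERT")]
            inserts = sorted(l for l in lines if l.startswith("INSERT"))
            lines = ddl + inserts
        return "\n".join(lines)

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE members(id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO members VALUES(1,'Mario F.');
        INSERT INTO members VALUES(2,'phantomotap');
        CREATE TABLE posts(id INTEGER PRIMARY KEY, body TEXT);
    """)
    header = json.loads('{"table": "members", "collate": "none"}')
    print(build_instruction_set(conn, header))
    ```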
    The script also glues the various build instruction sets together according to those same header files.

    Code:
    sql3revision -b ./database | sqlite3 "cboard.db"
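    The glue step itself is nothing special; stripped to its essentials it amounts to executing the numbered instruction sets in filename order, something like this throwaway Python stand-in for the pipeline above:
    Code:
    ```python
    import glob
    import os
    import sqlite3
    import tempfile

    def rebuild_database(db_path, instruction_dir):
        """Execute every numbered build instruction set, in filename order,
        against the given database (a stand-in for `sql3revision | sqlite3`)."""
        conn = sqlite3.connect(db_path)
        for path in sorted(glob.glob(os.path.join(instruction_dir, "*.sql.txt"))):
            with open(path) as f:
                conn.executescript(f.read())
        conn.commit()
        return conn

    # Tiny demo: one instruction set in a throwaway "database" directory.
    workdir = tempfile.mkdtemp()
    with open(os.path.join(workdir, "001_members.sql.txt"), "w") as f:
        f.write("CREATE TABLE members(id INTEGER PRIMARY KEY, name TEXT);\n"
                "INSERT INTO \"members\" VALUES(1,'Mario F.');\n")
    rebuilt = rebuild_database(":memory:", workdir)
    print(rebuilt.execute("SELECT name FROM members").fetchone()[0])
    ```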
    Soma

    *) I'm used to the ".sql" extension meaning binary instruction files.

  7. #7
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    I can see exactly what you are doing. I particularly like the approach of generating only what is required for that particular commit and nothing more. Most excellent. Going through the commit history to see when something was changed is a breeze.

    And this integrates well with any RDBMS where the database is even harder to serialize, like PostgreSQL, MySQL, SQL Server, etc.

  8. #8
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by Mario F. View Post
    This application is served by an SQLite database file. All project files, including the database, are under version control. I'm using the Git workflow based on develop, release, and feature branches.

    A feature branch may need to make sizable changes to the database, adding new tables to serve the feature being coded, for instance. But it is conceivable that the development branch may also, once in a while, make changes to the database, to fine-tune certain tables after a feature has been merged, for instance. Similarly, the release branch or a hotfix may need to change the database.

    Because it is a binary file with a strict internal structure, it's rarely possible to resolve conflicts in it. So every change to the database needs to be merged immediately into any live branches.

    So, how do I fit this particular file into the version control scheme?

    I'm thinking the best option is for this file to be served by its own branch. Any change to the sqlite file must be made under this branch, and it is the responsibility of other branch maintainers to merge in any new commits to the 'database' branch.

    Do I have a better option?
    I would keep the database file in a completely separate repository, and bring it into the dependent projects either manually or using the git submodule mechanism.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  9. #9
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Quote Originally Posted by brewbuck View Post
    I would keep the database file in a completely separate repository, and bring it into the dependent projects either manually or using the git submodule mechanism.
    My idea initially was to indeed separate it, but by branching. The database file exists only in the context of this application, so the idea of having it on a separate repository wasn't so appealing.

    But I've been reevaluating the whole thing. If every change I make to the database needs to be merged into the other branches, that means one hell of a lot of merge entries in the repo. And I tend to dislike those immensely. I could set a rule for the team to rebase instead of merge to avoid the clutter, but that's not a good option either, because if you work on the feature-foo team there's no way to know whether your workmates have already rebased onto the latest update without asking.

    Meanwhile, I'm fearing the conflicts that can emerge even when working from dump files that wouldn't require me to merge so often. The SQL dialect isn't very friendly to conflicts on the DDL front. And a whole lot of changes can break existing code too easily, since sqlite doesn't offer options to move the business logic into the database.

    So your idea makes a whole lot of sense, actually. I'll just stop tracking it from within the project and give the database its own repo. I won't even use submodules; I'll just have the sqlite binary file in .gitignore.

  10. #10
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Sounds like it could be the right pattern for this, but to be honest I haven't been in that exact situation myself.

    The code I deal with at work is split into 7 repos, and the home project (ok, dreaming of going commercial with it) is split across 5, so I'm kind of used to working like that. Hopefully, I don't send you down the wrong path.

