what is the issue with database changes?

Matt Mackall mpm at selenic.com
Tue Mar 17 15:40:48 EDT 2015


On Tue, 2015-03-17 at 20:15 +0100, Jan Heylen wrote:
> On Tue, Mar 3, 2015 at 7:55 PM, Matt Mackall <mpm at selenic.com> wrote:
> > On Tue, 2015-03-03 at 15:24 +0100, Thomas De Schampheleire wrote:
> >> Hi,
> >>
> >> Regularly I hear that we don't want to change the model yet to stay
> >> backwards compatible.
> >>
> >> However, I do see several 'dbmigrate' scripts in the source base,
> >> which hint at it being possible to migrate across database changes.
> >>
> >> Can someone explain in more detail why we do not want such database
> >> changes (yet)?
> >
> > On-disk format migrations are an anti-pattern of software development.
> > They're fragile and one-way and present a large barrier to user
> > acceptance of new versions. And there's no excuse for them when you have
> > a database rather than a file format: you can always add new tables
> > without changing the structure of the existing tables.
> 
> So, given this anti-pattern, say you want to introduce a new type of
> changeset-comment, I see 3 options:

...

> 3. add a new class (so database table) e.g. ChangesetOtherCommentType,
> and only use it for 'the other type' and keep using ChangesetComment
> for comments and use the other table for the new feature.
> 
> No issue with downgrading. The new table is just not seen by the old model.

That's the ticket. Append to the schema, but don't change the semantics
of any existing elements of the schema.

> For all three, I also have the question: Is there a migrate (script)
> needed?

For #3, you can detect that the table doesn't exist on read (and ignore
it) and create it on write, all at run-time. Or simply check at start-up
that each table in a list L exists and create any that are missing. Then
the user need never even be aware that things are changing.
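
A minimal sketch of both variants, using Python's standard sqlite3 module
purely for illustration. The table name changeset_other_comment, its
columns, and all helper names are hypothetical and not taken from the
Kallithea code base:

```python
import sqlite3

# Hypothetical new table for the new feature; existing tables stay untouched.
NEW_TABLES = {
    "changeset_other_comment": """
        CREATE TABLE IF NOT EXISTS changeset_other_comment (
            id INTEGER PRIMARY KEY,
            revision TEXT NOT NULL,
            text TEXT NOT NULL
        )
    """,
}


def ensure_tables(conn):
    """Start-up variant: create any missing new tables, leave the rest alone."""
    for ddl in NEW_TABLES.values():
        conn.execute(ddl)
    conn.commit()


def table_exists(conn, name):
    """Ask the database itself whether a table is present."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def read_other_comments(conn, revision):
    """Run-time variant, read side: a missing table just means 'no data yet'."""
    if not table_exists(conn, "changeset_other_comment"):
        return []
    rows = conn.execute(
        "SELECT text FROM changeset_other_comment"
        " WHERE revision = ? ORDER BY id",
        (revision,),
    ).fetchall()
    return [r[0] for r in rows]


def add_other_comment(conn, revision, text):
    """Run-time variant, write side: create the table on first write."""
    ensure_tables(conn)
    conn.execute(
        "INSERT INTO changeset_other_comment (revision, text) VALUES (?, ?)",
        (revision, text),
    )
    conn.commit()
```

Either way no separate migration step is involved: an old database gains
the table the first time the new code needs it, and old code that never
queries the table is unaffected by its presence.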

This is a bit of a headache for developers to save a significant
headache for users. If you have more users than developers, it's a huge
win.

> And how does that work? As far as I can see, the 'old'
> model db.py is copied to another class, and the new db.py gets an
> increased version number.

Version numbers are what you do when you can't do the above. Given that
databases are basically the ultimate free-form extensible file format,
there's really no good reason to have version numbers when you can
literally say to the storage "hey, do you know about feature X? ok, now
you do."
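
As a rough illustration of asking the storage about features instead of
comparing version numbers, here is a sketch with Python's sqlite3 module.
It assumes, hypothetically, that each feature is backed by exactly one
table of the same name; the feature names and helpers are made up for the
example:

```python
import sqlite3

# Hypothetical registry: feature name -> additive DDL for its own table.
FEATURES = {
    "comments":
        "CREATE TABLE IF NOT EXISTS comments"
        " (id INTEGER PRIMARY KEY, text TEXT)",
    "other_comments":
        "CREATE TABLE IF NOT EXISTS other_comments"
        " (id INTEGER PRIMARY KEY, text TEXT)",
}


def known_features(conn, features):
    """'Hey, do you know about feature X?' for every X this code knows."""
    existing = {
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    }
    return {name for name in features if name in existing}


def teach(conn, features):
    """'Ok, now you do': create whatever is missing, touch nothing else."""
    for ddl in features.values():
        conn.execute(ddl)
    conn.commit()
```

Older code that only knows a subset of the registry can open the same
database after newer code has run: it sees the tables it knows about and
never looks at the rest, so there is no version to compare and nothing to
downgrade.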

This is mostly about "there's something missing in the schema" rather
than "the schema is fundamentally broken" but I would say that even in
that case, it's much better for your installed base for you to try to
work around your mistakes rather than reaching for an incompatible
change first.

-- 
Mathematics is the supreme nostalgia of our time.




More information about the kallithea-general mailing list