what is the issue with database changes?

Wed Mar 18 02:07:35 EDT 2015

On Tue, Mar 17, 2015 at 8:40 PM, Matt Mackall <mpm at selenic.com> wrote:
> On Tue, 2015-03-17 at 20:15 +0100, Jan Heylen wrote:
>> On Tue, Mar 3, 2015 at 7:55 PM, Matt Mackall <mpm at selenic.com> wrote:
>> > On Tue, 2015-03-03 at 15:24 +0100, Thomas De Schampheleire wrote:
>> >> Hi,
>> >>
>> >> Regularly I hear that we don't want to change the model yet to stay
>> >> backwards compatible.
>> >>
>> >> However, I do see several 'dbmigrate' scripts in the source base,
>> >> which hint at it being possible to migrate across database changes.
>> >>
>> >> Can someone explain in more detail why we do not want such database
>> >> changes (yet)?
>> >
>> > On-disk format migrations are an anti-pattern of software development.
>> > They're fragile and one-way and present a large barrier to user
>> > acceptance of new versions. And there's no excuse for them when you have
>> > a database rather than a file format: you can always add new tables
>> > without changing the structure of the existing tables.
>>
>> So, given this anti-pattern, say you want to introduce a new type of
>> changeset-comment, I see 3 options:
>
> ...
>
>> 3. add a new class (so database table) e.g. ChangesetOtherCommentType,
>> and only use it for 'the other type' and keep using ChangesetComment
>> for comments and use the other table for the new feature.
>>
>> No issue with downgrading. The new table is just not seen by the old model.
>
> That's the ticket. Append to the schema, but don't change the semantics
> of any existing elements of the schema.

Yes, but the first and the second option also "append" to the
schema. The question is whether downgrading should be supported without
a database wipe or not.
It makes a huge difference in implementation effort whether we support
only the upgrade path (keeping the data in the database) or the
downgrade path as well...

>
>> For all three, I also have the question: Is there a migrate (script)
>> needed?
>
> For #3, you can detect/ignore that the table doesn't exist on read and
> create it on write.. at run-time. Or simply assert that the list of
> tables L exists or create them at start-up. Then the user need never
> even be aware that things are changing.

How does this work in Pylons? Has this been done before in the history
of Kallithea? A reference implementation for this would be useful.
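
To make the "create them at start-up" idea concrete, here is a minimal
sketch. It is not Kallithea code: it uses the stdlib sqlite3 module so it
runs on its own, and the table name and columns are made up for
illustration. With SQLAlchemy, MetaData.create_all() should give the same
effect, since by default (checkfirst=True) it only creates tables that do
not exist yet.

```python
import sqlite3

# Hypothetical appended table for the new comment type; the name and
# columns are illustrative only and are not taken from Kallithea's model.
APPENDED_TABLES = {
    "changeset_other_comments": """
        CREATE TABLE IF NOT EXISTS changeset_other_comments (
            id INTEGER PRIMARY KEY,
            changeset_rev TEXT NOT NULL,
            body TEXT NOT NULL
        )
    """,
}


def ensure_appended_tables(conn):
    """Create any appended table that is missing; never alter existing ones."""
    for ddl in APPENDED_TABLES.values():
        conn.execute(ddl)
    conn.commit()


def table_names(conn):
    """Return the set of table names currently present in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return {name for (name,) in rows}


# Simulate a database that was created by an older version.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE changeset_comments (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO changeset_comments (body) VALUES ('old comment')")

ensure_appended_tables(conn)  # the new version starts up
ensure_appended_tables(conn)  # harmless to repeat on every start-up
```

An older version pointed at the same database simply never queries the
extra table, so no downgrade step is needed.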

>
> This is a bit of a headache for developers to save a significant
> headache for users. If you have more users than developers, it's a huge
> win.
>
>>  And how does that work, for as far as I can see, the 'old'
>> model db.py is copied to another class, and the new db.py get an
>> increased version number.
>
> Version numbers are what you do when you can't do the above. Given that
> databases are basically the ultimate free-form extensible file format,
> there's really no good reason to have version numbers when you can
> literally say to the storage "hey, do you know about feature X? ok, now
> you do."

True.
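
The run-time variant (ignore a missing table on read, create it on write)
could look roughly like the sketch below. Again this is only an
illustration under assumptions: stdlib sqlite3 instead of Kallithea's
actual database layer, and a hypothetical table name.

```python
import sqlite3

TABLE = "changeset_other_comments"  # hypothetical table name


def has_table(conn, name):
    """Ask the storage whether it already knows about this table."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def read_other_comments(conn):
    """Treat a missing table as 'no data yet' instead of an error."""
    if not has_table(conn, TABLE):
        return []
    rows = conn.execute("SELECT body FROM changeset_other_comments ORDER BY id")
    return [body for (body,) in rows]


def add_other_comment(conn, body):
    """Create the table on first write, then insert the comment."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS changeset_other_comments ("
        "id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO changeset_other_comments (body) VALUES (?)", (body,))
    conn.commit()


conn = sqlite3.connect(":memory:")  # stands in for an old database
before = read_other_comments(conn)  # table does not exist yet
add_other_comment(conn, "first")    # table is created here
after = read_other_comments(conn)
```

No version number is involved: the presence of the table itself is the
feature check.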

>
> This is mostly about "there's something missing in the schema" rather
> than "the schema is fundamentally broken" but I would say that even in
> that case, it's much better for your installed base for you to try to
> work around your mistakes rather than reaching for an incompatible
> change first.

At least we should somehow start creating "the wanted db schema", or
document it somehow, because with this kind of workaround (creating
new tables instead of just a new column) the db and its model don't
get cleaner...

br,

Jan
