what is the issue with database changes?

Tue Mar 17 16:23:23 EDT 2015

On Tue, 2015-03-17 at 20:58 +0100, Thomas De Schampheleire wrote:
> Hi,
> 
> On Tue, Mar 17, 2015 at 8:40 PM, Matt Mackall <mpm at selenic.com> wrote:
> > On Tue, 2015-03-17 at 20:15 +0100, Jan Heylen wrote:
> >> On Tue, Mar 3, 2015 at 7:55 PM, Matt Mackall <mpm at selenic.com> wrote:
> >> > On Tue, 2015-03-03 at 15:24 +0100, Thomas De Schampheleire wrote:
> >> >> Hi,
> >> >>
> >> >> Regularly I hear that we don't want to change the model yet to stay
> >> >> backwards compatible.
> >> >>
> >> >> However, I do see several 'dbmigrate' scripts in the source base,
> >> >> which hint at it being possible to migrate across database changes.
> >> >>
> >> >> Can someone explain in more detail why we do not want such database
> >> >> changes (yet)?
> >> >
> >> > On-disk format migrations are an anti-pattern of software development.
> >> > They're fragile and one-way and present a large barrier to user
> >> > acceptance of new versions. And there's no excuse for them when you have
> >> > a database rather than a file format: you can always add new tables
> >> > without changing the structure of the existing tables.
> >>
> >> So, given this anti-pattern, say you want to introduce a new type of
> >> changeset-comment, I see 3 options:
> >
> > ...
> >
> >> 3. add a new class (so database table) e.g. ChangesetOtherCommentType,
> >> and only use it for 'the other type' and keep using ChangesetComment
> >> for comments and use the other table for the new feature.
> >>
> >> No issue with downgrading. The new table is just not seen by the old model.
> >
> > That's the ticket. Append to the schema, but don't change the semantics
> > of any existing elements of the schema.
> >
> >> For all three, I also have the question: Is there a migrate (script)
> >> needed?
> >
> > For #3, you can detect/ignore that the table doesn't exist on read and
> > create it on write, at run-time. Or simply assert that the list of
> > tables L exists or create them at start-up. Then the user need never
> > even be aware that things are changing.
> >
> > This is a bit of a headache for developers to save a significant
> > headache for users. If you have more users than developers, it's a huge
> > win.
> >
> >> And how does that work? As far as I can see, the 'old'
> >> model db.py is copied to another class, and the new db.py gets an
> >> increased version number.
> >
> > Version numbers are what you do when you can't do the above. Given that
> > databases are basically the ultimate free-form extensible file format,
> > there's really no good reason to have version numbers when you can
> > literally say to the storage "hey, do you know about feature X? ok, now
> > you do."
> >
> > This is mostly about "there's something missing in the schema" rather
> > than "the schema is fundamentally broken" but I would say that even in
> > that case, it's much better for your installed base for you to try to
> > work around your mistakes rather than reaching for an incompatible
> > change first.
> >
> 
> Feedback I have seen passing by from Mads on several occasions is that
> the Kallithea database schema is fundamentally broken. Given that there
> is not yet a huge install base, doesn't it make sense to make a big
> incompatible change now and use the incremental approach described
> above after that initial cleanup?

That's certainly a possibility. But you need to ask yourself: will we
get to a huge install base if we make upgrading a hassle for early
adopters? I can't tell you how many pieces of software I've abandoned
because I felt abused by their upgrade process.
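
To make the "ignore on read, create on write" idea for option #3 concrete,
here is a minimal sketch. It is not Kallithea code: it uses Python's
standard sqlite3 module rather than the project's actual database layer,
and the table name, columns, and function names are all hypothetical,
chosen only for illustration.

```python
import sqlite3

# Hypothetical table and columns, for illustration only.
NEW_TABLE = "changeset_other_comments"

CREATE_SQL = f"""
CREATE TABLE IF NOT EXISTS {NEW_TABLE} (
    id INTEGER PRIMARY KEY,
    changeset_id TEXT NOT NULL,
    body TEXT NOT NULL
)
"""


def table_exists(conn, name):
    """Ask the storage whether it already knows about the new table."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def read_other_comments(conn, changeset_id):
    """Read path: a database written by an older version has no such
    table, so treat that as 'no data' instead of failing."""
    if not table_exists(conn, NEW_TABLE):
        return []
    rows = conn.execute(
        f"SELECT body FROM {NEW_TABLE} WHERE changeset_id = ? ORDER BY id",
        (changeset_id,),
    ).fetchall()
    return [r[0] for r in rows]


def add_other_comment(conn, changeset_id, body):
    """Write path: create the table on first use, then insert.
    Existing tables are never altered."""
    conn.execute(CREATE_SQL)
    conn.execute(
        f"INSERT INTO {NEW_TABLE} (changeset_id, body) VALUES (?, ?)",
        (changeset_id, body),
    )
    conn.commit()
```

With something along these lines, an old version opening the same database
simply never looks at the extra table, and a new version opening an old
database sees an empty list until the first write. No version number or
migration script is involved in either direction.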

-- 
Mathematics is the supreme nostalgia of our time.