repository caching

Mads Kiilerich mads at kiilerich.com
Wed Sep 2 00:02:02 UTC 2015


On 09/02/2015 01:40 AM, Andrew Bartlett wrote:
> On Wed, 2015-09-02 at 01:32 +0200, Mads Kiilerich wrote:
>> On 09/02/2015 12:16 AM, Andrew Bartlett wrote:
>>> On Tue, 2015-09-01 at 19:55 +0200, Andrew Shadura wrote:
>>>> Hello everyone,
>>>>
>>>> I was reading kallithea/model/db.py and trying to understand what
>>>> Repository.update_changeset_cache does. It seems it gets the last
>>>> changeset from the repository, and if it's the same as the cache
>>>> thinks it is, it does nothing; otherwise it updates the database.
>>>> Is getting the last changeset really such an expensive operation
>>>> that we want to do it rarely? It seems to me that this sort of
>>>> caching is doing more harm than good. We should probably call
>>>> update_changeset_cache more often or, maybe, call it every time we
>>>> access the changeset cache, so any inconsistencies are detected
>>>> immediately.
>>>>
>>>> What do you think about it?
>>> I've been bitten by it, when attempting to allow direct operations on
>>> the git repo, like before I found the SSH patch set.
>>>
>>> Similarly I would like to be able to force update of repos using cron,
>>> or automated sync of github issues etc.
>> That also works just fine if you call these two paster commands after
>> updating the repo - possibly from a hook.
>>
>> Why is that not good enough? What more is needed and can you outline how
>> it perhaps could be implemented?
> Because for the user who is just trying out the software, it breaks for
> no clear reason, and no clue as to how to even start investigating.  (The
> failure mode was, from memory, that some details were updated while
> others were not, and clicking on things in the web UI just caused
> strange crashes).
>
> Indeed, there are hooks in the repo, but if they fail, they just
> succeed!
>
> At the very least, when the web UI notices things are not quite right,
> it should try updating the cache.

Thanks for clarifying.

The issue you saw must have been caused by short term caching in memory
and can be avoided by telling all processes to refresh their repo
caches. That is similar to, but different from, the long term cache of
repo data in the db (which is used to avoid needing repos at all when
showing the overview pages).
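
For illustration, the compare-then-update behaviour Andrew describes for
the long term cache could be sketched roughly like this. This is a
made-up stand-in, not the actual code in kallithea/model/db.py: FakeRepo,
ChangesetCache and all their attributes are hypothetical names, and the
"db" is just an in-memory object.

```python
class FakeRepo:
    """Hypothetical stand-in for a VCS repo; only exposes its tip id."""

    def __init__(self, tip):
        self.tip = tip


class ChangesetCache:
    """Hypothetical long term cache record, standing in for a db row."""

    def __init__(self):
        self.cached_tip = None
        self.writes = 0  # counts how often the "db" was actually written

    def update(self, repo):
        """Write to the 'db' only if the repo tip differs from the cache."""
        if repo.tip == self.cached_tip:
            return False  # cache already matches, nothing to do
        self.cached_tip = repo.tip
        self.writes += 1
        return True


repo = FakeRepo("abc123")
cache = ChangesetCache()

first = cache.update(repo)    # cache was empty, so this writes
second = cache.update(repo)   # tip unchanged, so this is a no-op

repo.tip = "def456"           # e.g. a direct push that bypassed the app
stale = cache.cached_tip != repo.tip  # cache is out of date until updated
third = cache.update(repo)    # the next update picks up the new tip
```

The point of the sketch is only that the cache stays stale between the
direct change and the next update call, which is why something (a hook,
a cron job, or the app itself) has to trigger that call.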

I don't see a good way to avoid either of them or make them work by 
magic. The repo cache (and other caches) could be much smarter. And if 
you frequently see the same kind of failure, we could probably add some 
special hack for that.

But ... thinking of it ... I would assume that git repos get hooks 
installed (see today's PR) that would update the DB? Mercurial repos 
could perhaps get the same.

/Mads


More information about the kallithea-general mailing list