[PATCH] repo-scan: add option to scan nested repositories

Angel Ezquerra angel.ezquerra at gmail.com
Tue Feb 16 07:56:55 UTC 2016


On Mon, Feb 15, 2016 at 1:02 AM, Mads Kiilerich <mads at kiilerich.com> wrote:
> On 02/13/2016 09:40 PM, Angel Ezquerra wrote:
>>
>> # HG changeset patch
>> # User angel.ezquerra at gmail.com
>
>
> Thank you for the contribution!

Thanks for the review!

> I suggest you configure your Mercurial to use full 'Angel Ezquerra
> <angel.ezquerra at gmail.com>'.

Sure

> Also, the patch was mangled and didn't apply cleanly. It was not sent with
> patchbomb ... or like how you usually contribute to Mercurial?

I've had some problems when trying to send emails from TortoiseHg
using gmail :-/
That is why I just copyied and pasted it. I'll try to fix the problem
or send the patch as an attachment next time.

>> # Date 1455366858 -3600
>> #      Sat Feb 13 13:34:18 2016 +0100
>> # Node ID 71ec71b81cea0ca1aaefa1e1c5b3063219c96756
>> # Parent  2f14b4db03626c6eb2fd9b6f541d74c96de0178a
>> repo-scan: add option to scan nested repositories
>>
>> "nested repositories" are those that are found within other repositories.
>> An
>> example of these nested repositories are mercurial repositories. This
>> change
>> adds an option to the admin settings page to look for nested repositories
>> when
>> performing a repository scan.
>
>
> Is this enough to fully support nested repositories? Can you navigate to
> nested repositories? Can the top level both be managed as repository and as
> "repository group"? How to add a new such nested repository? Can the nested
> repositories and repository groups also be managed and deleted?

This patch simply adds the capability for Kallithea to find existing
nested repositories, and show them as regular repositories on the
repository list. It does not add complete support for them, but it is
a step in the right direction, IMHO.

> Assuming it adds something that is generally useful, it should probably be
> mentioned in docs/usage/vcs_support.rst .
>
> (It still seems odd to me that you in your setup first jumps through loops
> to put these repositories in the filesystem with "symlinks" from their real
> location, and then jump through other loops to make them appear as real
> separate repositories. I doubt this goes towards a good / the best solution
> to your problem.)

This patch is unrelated to our particular setup. We could be using
completely independent subrepos, or even having no repos that shared a
subrepo, and we would still need this to make it easy to access those
subrepos in Kallithea.

Regarding our particular setup, we arrived to it after quite a bit of
trial and error. We tried several things along the way, such as using
shared repos, mercurial level mappints, web server level mappints,
NTFS junctions and even completely separate subrepo copies, with
manual and even automatic pushes between them to keep them in sync. We
found that the NTFS junction solution was the simplest and the one
that worked better for our needs.

One particular feature of the NTFS junction solution is that all the
repos can be found on the file system, and thus you can always do
maintenance on them by accessing the server. This does not work with
mercurial or web server level mappings, where there is only one copy
of the subrepo, and you must know where to find it if you need to do
something on it (e.g. removing a revision containing your nuclear
launch codes that was pushed by mistake).

>> diff -r 2f14b4db0362 -r 71ec71b81cea
>> kallithea/controllers/admin/settings.py
>> --- a/kallithea/controllers/admin/settings.py    Tue Feb 09 17:46:36 2016
>> +0100
>> +++ b/kallithea/controllers/admin/settings.py    Sat Feb 13 13:34:18 2016
>> +0100
>> @@ -167,8 +167,9 @@
>>           c.active = 'mapping'
>>           if request.POST:
>>               rm_obsolete = request.POST.get('destroy', False)
>> +            find_nested = request.POST.get('find_nested', False)
>>               install_git_hooks = request.POST.get('hooks', False)
>> -            overwrite_git_hooks = request.POST.get('hooks_overwrite',
>> False);
>> +            overwrite_git_hooks = request.POST.get('hooks_overwrite',
>> False)
>>               invalidate_cache = request.POST.get('invalidate', False)
>>               log.debug('rescanning repo location with destroy
>> obsolete=%s, '
>>                         'install git hooks=%s and '
>> @@ -179,7 +180,7 @@
>>                   for repo in Repository.get_all():
>>                       ScmModel().mark_for_invalidation(repo.repo_name)
>>
>> -            filesystem_repos = ScmModel().repo_scan()
>> +            filesystem_repos =
>> ScmModel().repo_scan(find_nested=find_nested)
>>               added, removed = repo2db_mapper(filesystem_repos,
>> rm_obsolete,
>>
>> install_git_hooks=install_git_hooks,
>>                                               user=c.authuser.username,
>> diff -r 2f14b4db0362 -r 71ec71b81cea kallithea/lib/utils.py
>> --- a/kallithea/lib/utils.py    Tue Feb 09 17:46:36 2016 +0100
>> +++ b/kallithea/lib/utils.py    Sat Feb 13 13:34:18 2016 +0100
>> @@ -206,12 +206,15 @@
>>           sa.commit()
>>
>>
>> -def get_filesystem_repos(path, recursive=False, skip_removed_repos=True):
>> +def get_filesystem_repos(path, recursive=False, skip_removed_repos=True,
>> +                         find_nested=False):
>>       """
>>       Scans given path for repos and return (name,(type,path)) tuple
>>
>>       :param path: path to scan for repositories
>>       :param recursive: recursive search and return names with subdirs in
>> front
>> +    :param find_nested: look for 'nested repositories' (e.g. subrepos).
>> +                        find_nested is ignored if recursive is false.
>>       """
>>
>>       # remove ending slash for better results
>> @@ -240,14 +243,17 @@
>>               try:
>>                   scm_info = get_scm(cur_path)
>>                   yield scm_info[1].split(path, 1)[-1].lstrip(os.sep),
>> scm_info
>> +                if not find_nested:
>> +                    continue
>>               except VCSError:
>> -                if not recursive:
>> -                    continue
>> -                #check if this dir containts other repos for recursive
>> scan
>> -                rec_path = os.path.join(p, dirpath)
>> -                if not os.path.islink(rec_path) and
>> os.path.isdir(rec_path):
>> -                    for inner_scm in _get_repos(rec_path):
>> -                        yield inner_scm
>
>
> The existing code is cryptic and seems fragile. Have you investigated what
> the logic behind catching VCSError is? Could you add a comment? Should we do
> extra checks for what kind of VCSErrors we are catching?
>
> It seems extra cryptic when the "default" flow changes from fall-through to
> be caught by the 'continue' in the 'try' section and the exception
> apparently just gets ignored.

I agree that the code is perhaps not the best. Since I know basically
nothing about how Kallithea works I tried to do the smallest possible
change that worked for our needs.

That being said, it seems to be that what the code is trying to do is:

- try to access a folder as a repository
    - If that works, we found a repo! return it to the list of found
repos by yielding it
    - If that does not work, it will throw an exception. This means
that the folder is a regular folder.
    - In that case, check if we must do a recursive search
        - If that is the case, recurse into that folder and start the
process again
        - If not, go to the next folder in the current hierarchy level

>> +                pass
>> +            if not recursive:
>> +                continue
>> +            #check if this dir containts other repos for recursive scan
>> +            rec_path = os.path.join(p, dirpath)
>> +            if not os.path.islink(rec_path) and os.path.isdir(rec_path):
>> +                for inner_scm in _get_repos(rec_path):
>> +                    yield inner_scm
>>
>>       return _get_repos(path)
>>
>> diff -r 2f14b4db0362 -r 71ec71b81cea kallithea/model/scm.py
>> --- a/kallithea/model/scm.py    Tue Feb 09 17:46:36 2016 +0100
>> +++ b/kallithea/model/scm.py    Sat Feb 13 13:34:18 2016 +0100
>> @@ -260,12 +260,13 @@
>>
>>           return q.ui_value
>>
>> -    def repo_scan(self, repos_path=None):
>> +    def repo_scan(self, repos_path=None, find_nested=False):
>>           """
>>           Listing of repositories in given path. This path should not be a
>>           repository itself. Return a dictionary of repository objects
>>
>>           :param repos_path: path to directory containing repositories
>> +        :param find_nested: look for 'nested repositories' (e.g.
>> subrepos).
>>           """
>>
>>           if repos_path is None:
>> @@ -276,7 +277,8 @@
>>           baseui = make_ui('db')
>>           repos = {}
>>
>> -        for name, path in get_filesystem_repos(repos_path,
>> recursive=True):
>> +        for name, path in get_filesystem_repos(repos_path,
>> recursive=True,
>> +                                               find_nested=find_nested):
>>               # name need to be decomposed and put back together using the
>> /
>>               # since this is internal storage separator for kallithea
>>               name = Repository.normalize_repo_name(name)
>> diff -r 2f14b4db0362 -r 71ec71b81cea
>> kallithea/templates/admin/settings/settings_mapping.html
>> --- a/kallithea/templates/admin/settings/settings_mapping.html    Tue
>> Feb 09 17:46:36 2016 +0100
>> +++ b/kallithea/templates/admin/settings/settings_mapping.html    Sat
>> Feb 13 13:34:18 2016 +0100
>> @@ -19,6 +19,12 @@
>>                       <span class="help-block">${_('Check this to
>> reload data and clear cache keys for all repositories.')}</span>
>>
>>                       <div class="checkbox">
>> +                        ${h.checkbox('find_nested',False)}
>> +                        <label for="find_nested">${_('Look for nested
>> repositories (e.g. mercurial subrepos)')}</label>
>> +                    </div>
>> +                    <span class="help-block">${_('Check this option
>> to look for mercurial subrepos within mercurial
>> repositories.')}</span>
>
>
> Please use proper casing of Mercurial. And I guess it also can find Git
> repositories?

OK

> I think "nested repositories" will need some more hints in the description.

OK.

> Subrepos is one example of what it can be used for, but it might be more
> relevant what this change means. Perhaps something like "A repository can
> also be used as repository group". It should perhaps also be clarified that
> the subrepo example is for when the subrepo paths are simple and they thus
> live in the checkout / working directory - it is not 1:1 with subrepos.

OK. I think however that users using subrepos are likely to already
know some of their limitations such as this one, which is pretty
basic.
It is unfortunate that mercurial (still) uses its working directory as
a "subrepo cache", i.e. the location where subrepos are stored.
Hopefuly one day we'll be able to fix it.

Cheers,

Angel


More information about the kallithea-general mailing list