[PATCH 3 of 3 v2] hooks: fix potentially invalid interpreter in git hooks (Issue #333)

Sat Apr 6 19:22:01 UTC 2019

El jue., 4 abr. 2019 a las 0:41, Mads Kiilerich (<mads at kiilerich.com>)
escribió:

> On 4/3/19 10:28 PM, Thomas De Schampheleire wrote:
> > # HG changeset patch
> > # User Thomas De Schampheleire <thomas.de_schampheleire at nokia.com>
> > # Date 1554323208 -7200
> > #      Wed Apr 03 22:26:48 2019 +0200
> > # Branch stable
> > # Node ID 0d91f710606b72027456a1183794ee38f26fe741
> > # Parent  06b511848fb1a3ead56de583e19df5f317bdf7e0
> > hooks: fix potentially invalid interpreter in git hooks (Issue #333)
>
>
> Thanks for iterating on this.
>
>
> > Commit 5e501b6ee639 introduced the use of 'sys.executable' as interpreter
> > for git hooks instead of 'python2' with the following argument:
> >
> >      "Windows doesn't necessarily have "python2" available in $PATH, but
> we
> >      still want to make sure we don't end up invoking a python3. Using
> the
> >      absolute path seems more safe."
> >
> > But, sys.executable does not necessarily point to Python. When Kallithea
> is
> > started under uWSGI, sys.executable points to the uwsgi executable. As a
> > result, the interpreter encoded in the git hooks on the server
> repositories
> > would be:
> >
> >      #!/usr/bin/env /path/to/uwsgi
> >
> > And pushing to such repo would result in following client errors:
> >
> >      $ git push
> >      Password for 'http://user@localhost:5050':
> >      Enumerating objects: 3, done.
> >      Counting objects: 100% (3/3), done.
> >      Writing objects: 100% (3/3), 241 bytes | 241.00 KiB/s, done.
> >      Total 3 (delta 0), reused 0 (delta 0)
> >      remote: unable to load configuration from hooks/pre-receive
> >      To http://localhost:5050/gitrepo-new
> >       ! [remote rejected] master -> master (pre-receive hook declined)
> >      error: failed to push some refs to 'http://user@localhost
> :5050/gitrepo-new'
>
>
> Can we somehow make it show a more useful error message in this case? I
> guess not - none of our code is executed ...
>

Actually, there is some of our code executed: SimpleGit. But, at that point
you don't know that the git hook call is going to fail.

>
>
> > The approach taken by this commit is:
>
>
> FWIW, this:
>
> > - Introduce a new configuration setting: hook_python_path, and use its
> value
> >    as the Python interpreter to use in git hooks.
> > - If the value is absent or empty, fall back to the current logic
> >    'sys.executable' or if that is also invalid, 'python2'.
>
>
> and this:
>
>
> > - Let `kallithea-cli config-create` fill in its own interpreter as
> default
> >    value, which should be valid as long as the virtual environment is not
> >    moved/replaced.
>
>
> *could* be separate "milestones" and separate changesets. But no big
> deal. Split or keep together as you like.
>

Yes, I considered it, but got a bit stuck with the comments referring to
code that does not yet exist.
I'll think about it a second time.

>
>
> > The main downside of this approach is its fragility in case a new
> virtualenv
> > is used. Previously, the configuration file was the same regardless of
> > virtualenv, but this is no longer true after this patch.
>
>
> But also, it is crucial that the *right* virtualenv is used, also if Git
> should be invoked directly, outside any virtual env. It is thus inherent
> that the new virtualenv *has* to be written to the hooks somehow. The
> previous use of same 'python2' was wrong and fragile. I thus don't think
> the new way is more fragile. Explicit expectations and failing early is
> *less* fragile ;-)
>

In fact, my understanding was not fully complete before: when users push to
a git repo, they actually communicate with our Git middleware, which itself
uses dulwich. All this should normally happen inside the virtualenv (if
used). Only when the hooks get called, there is a decoupling into another
process. (I am theorizing here, not explicitly checked)

So if that is correct, then I wonder if we could not avoid the decoupling.
We are already handling pull hooks in
kallithea.lib.middleware.simplegit._handle_githooks .
Can we not handle push hooks pre-receive/post-receive in the same place? In
this case, we are definitely in the right environment, we are and stay
within Python. I guess it will also simplify the situation for Windows.

Am I overlooking something?

>
> > diff --git a/development.ini b/development.ini
> > --- a/development.ini
> > +++ b/development.ini
> > @@ -117,6 +117,16 @@ use_htsts = false
> >   ## number of commits stats will parse on each iteration
> >   commit_parse_limit = 25
> >
> > +## Path to Python executable to be used in git hooks.
> > +## When empty, the value of 'sys.executable' at the time of installation
> > +## of the git hooks is used, which is correct in many cases but not when
> > +## using uwsgi.
> > +## When creating a config file via `kallithea-cli config-create`, the
> > +## Python executable used for that invocation is filled in below.
> > +## If you change this setting, you should reinstall the git hooks via
> > +## Admin > Settings > Remap and Rescan.
> > +hook_python_path =
> > +
> >   ## path to git executable
> >   git_path = git
> >
> > diff --git a/kallithea/lib/inifile.py b/kallithea/lib/inifile.py
> > --- a/kallithea/lib/inifile.py
> > +++ b/kallithea/lib/inifile.py
> > @@ -23,6 +23,7 @@ other custom values.
> >   import logging
> >   import re
> >   import os
> > +import sys
> >
> >   import mako.template
> >
> > @@ -40,6 +41,7 @@ default_variables = {
> >       'host': '127.0.0.1',
> >       'port': '5000',
> >       'uuid': lambda: 'VERY-SECRET',
> > +    'python_executable': sys.executable or '',
> >   }
> >
> >
> > diff --git a/kallithea/lib/paster_commands/template.ini.mako
> b/kallithea/lib/paster_commands/template.ini.mako
> > --- a/kallithea/lib/paster_commands/template.ini.mako
> > +++ b/kallithea/lib/paster_commands/template.ini.mako
> > @@ -211,6 +211,16 @@ use_htsts = false
> >   <%text>## number of commits stats will parse on each iteration</%text>
> >   commit_parse_limit = 25
> >
> > +<%text>## Path to Python executable to be used in git hooks.</%text>
> > +<%text>## When empty, the value of 'sys.executable' at the time of
> installation</%text>
> > +<%text>## of the git hooks is used, which is correct in many cases but
> not when</%text>
> > +<%text>## using uwsgi.</%text>
>
>
> Perhaps clarify that "used" means "written into the git hook script files".
>
>
> > +<%text>## When creating a config file via `kallithea-cli
> config-create`, the</%text>
> > +<%text>## Python executable used for that invocation is filled in
> below.</%text>
> > +<%text>## If you change this setting, you should reinstall the git
> hooks via</%text>
> > +<%text>## Admin > Settings > Remap and Rescan.</%text>
> > +hook_python_path = ${python_executable}
>
>
> Perhaps call the variable `python_executable` just like the internal
> name. Consistency makes it simpler to work with. If one name is better
> than the other, just use it everywhere. And while the current use of
> python_executable is to be written into hooks, that "policy" doesn't
> have to be "hardcoded" into the name of the setting - mentioning in the
> comments should be enough.
>
> And how about using template conditionals (and examples in comments, like)
>
> """
> # python_executable = /srv/kallithea/venv/bin/python2
> # python_executable = (((a windows example)))
> %if python_executable:
> python_executable = ${python_executable}
> %endif
> """
>
> Then we don't need the previous change or the " or ''" ... and we avoid
> putting trailing space in the .ini ;-)
>
>
> >   <%text>## path to git executable</%text>
> >   git_path = git
> >
> > diff --git a/kallithea/model/scm.py b/kallithea/model/scm.py
> > --- a/kallithea/model/scm.py
> > +++ b/kallithea/model/scm.py
> > @@ -721,8 +721,11 @@ class ScmModel(object):
> >           return choices, hist_l
> >
> >       def _get_python_executable(self):
> > -        """Return a Python executable for use in hooks."""
> > -        return sys.executable or 'python2'
> > +        """Return a Python executable for use in hooks.
> > +
> > +        This is not simply sys.executable because under uwsgi it will
> be the
> > +        uwsgi program itself."""
>
>
> I think it also would be very helpful to summarize how virtualenv should
> (and will) be handled.
>
> Perhaps, if you know mod_wsgi has similar problems, mention that too.
>
> Windows vs Posix might also deserve mentioning.
>
> (But also, I don't know how much we want to duplicate information here,
> in .ini, in commit message, (and perhaps in documentation if we can't
> automate it fully but have to explain).)
>
>
> > +        return kallithea.CONFIG.get('hook_python_path', sys.executable)
> or 'python2'
>
>
> We want sys.executable, also if the .ini file sets it to the empty
> string. I guess it thus should be something like
>
> kallithea.CONFIG.get('python_executable') or sys.executable or 'python2'
>
>
> >       def install_git_hooks(self, repo, force_create=False):
> >           """
> > diff --git a/scripts/generate-ini.py b/scripts/generate-ini.py
> > --- a/scripts/generate-ini.py
> > +++ b/scripts/generate-ini.py
> > @@ -48,6 +48,7 @@ ini_files = [
> >               },
> >           },
> >           {
> > +            'python_executable': '',
> >           },
>
>
> (With the template proposed above, we don't need this.)
>

Thanks for these comments, I generally agree.

But before going further, I'd like to hear your opinion on the idea of
calling the hooks directly from our own code (see above).

Thanks,
Thomas
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.sfconservancy.org/pipermail/kallithea-general/attachments/20190406/632bf235/attachment-0001.html>