[PATCH] ChangesetCommentsModel: refactor duplicate code

Wed Apr 29 18:39:07 EDT 2015

On 04/29/2015 10:09 PM, Jan Heylen wrote:
> # HG changeset patch
> # User Jan Heylen <heyleke at gmail.com>
> # Date 1430323563 -7200
> #      Wed Apr 29 18:06:03 2015 +0200
> # Node ID 8ead3310a073d6453b926b891f1a59a0fcf6c6b5
> # Parent  71140b7c9abfcb34db70792c39e09a4e4796efb1
> ChangesetCommentsModel: refactor duplicate code
>
> KALLITHEA_WHOOSH_TEST_DISABLE=1 KALLITHEA_NO_TMP_PATH=1 nosetests kallithea/tests/functional/test_changeset_comments.py
> still passes all 4 tests
>
> diff -r 71140b7c9abf -r 8ead3310a073 kallithea/model/comment.py
> --- a/kallithea/model/comment.py	Wed Apr 29 04:25:53 2015 +0200
> +++ b/kallithea/model/comment.py	Wed Apr 29 18:06:03 2015 +0200
> @@ -257,27 +257,36 @@
>           :param revision:
>           :param pull_request:
>           """
> -
> -        q = ChangesetComment.query()\
> -                .filter(ChangesetComment.repo_id == repo_id)\
> -                .filter(ChangesetComment.line_no == None)\
> -                .filter(ChangesetComment.f_path == None)
> -        if revision:
> -            q = q.filter(ChangesetComment.revision == revision)
> -        elif pull_request:
> -            pull_request = self.__get_pull_request(pull_request)
> -            q = q.filter(ChangesetComment.pull_request == pull_request)
> -        else:
> -            raise Exception('Please specify revision or pull_request')
> -        q = q.order_by(ChangesetComment.created_on)
> -        return q.all()
> +        return self._get_comments(repo_id, revision=revision, pull_request=pull_request, inline=False)
>   
>       def get_inline_comments(self, repo_id, revision=None, pull_request=None):
> +        """
> +        Gets inline comments based on revision or pull_request_id
> +
> +        :param repo_id:
> +        :param revision:
> +        :param pull_request:

I haven't been trained to embrace this kind of doxygen-ish docstrings. I 
can see how they can be very valuable if it not is obvious from the 
general description of the function and its name and the name of the 
parameters what they do ... but I think it in most cases would be more 
important and perhaps sufficient to do some renamings and make it self 
explanatory. I can also see how such comments can be nice _if_ we can 
get and maintain full and correct coverage of meaningful documentation 
... but I don't think that is realistic.

So: in this case I would prefer to not have these stubs (which always 
can be auto generated if they should be needed, but right now only adds 
an maintenance cost without adding value). In this case I think it would 
be enough (and preferred) if the docstring just pointed out that it 
returned comments for _either_ a revision or a pull_request.

Also, it seems like it is non-trivial what this function returns 
(especially compared to get_comments). Python duck typing might make it 
hard to say much about the types of input parameters but there is often 
something relevant to say about the returned value.

I realize most of the docstring I am commenting on here came from 
copying get_comments. My comments here should thus instead be directed 
to the previous author. Still, I would prefer if we could reduce the 
amount of such technical debt.

> +        """
> +        return self._get_comments(repo_id, revision=revision, pull_request=pull_request, inline=True)
> +
> +    def _get_comments(self, repo_id, revision=None, pull_request=None, inline=False):
> +        """
> +        Gets comments based on revision or pull_request_id
> +
> +        :param repo_id:
> +        :param revision:
> +        :param pull_request:
> +        :param inline: boolean, get normal comments or inline comments

IMO, it is obvious that inline is a boolean considering the default 
value of False. It is also obvious that the default is "not-so-much 
inline" and setting inline=True means "yes to inline". All we need to 
know is if this function is able to return both types at once or if it 
returns either one or the other. The implementation hints that it 
returns either one or the other. IMO, _that_ would be relevant to have 
in the general description, leaving no need for description of the 
individual parameters.

> +        """
>           q = Session().query(ChangesetComment)\
>               .filter(ChangesetComment.repo_id == repo_id)\
> -            .filter(ChangesetComment.line_no != None)\
> -            .filter(ChangesetComment.f_path != None)\
> -            .order_by(ChangesetComment.comment_id.asc())\
> +
> +        if inline:
> +            q = q.filter(ChangesetComment.line_no != None)\
> +                .filter(ChangesetComment.f_path != None)
> +        else:
> +            q = q.filter(ChangesetComment.line_no == None)\
> +                .filter(ChangesetComment.f_path == None)
>   
>           if revision:
>               q = q.filter(ChangesetComment.revision == revision)
> @@ -287,7 +296,11 @@
>           else:
>               raise Exception('Please specify revision or pull_request_id')
>   
> +        q = q.order_by(ChangesetComment.created_on)
> +
>           comments = q.all()
> +        if not inline:
> +            return comments
>   
>           paths = defaultdict(lambda: defaultdict(list))

It seems like this function returns completely different types of data, 
depending on the input parameter. That can confuse any type checker - 
including my mental type checker. I don't like that.

How about letting _get_comments always return an "iterable" of comments 
and leave it to the new get_inline_comments to do the conversion to what 
seems to be a weird randomly ordered list of tuples of filenames and 
linenumber-to-list-of-comments. (Without looking at the context, I guess 
it would be cleaner if it returned the 'paths' dict and left it to the 
caller to convert to list or use as dict.

/Mads