How to fix searching

Dominik Ruf dominikruf at gmail.com
Mon Sep 26 16:55:39 UTC 2016


Hi,

there are basically two different kinds of searches in Kallithea.

1. filtering revisions
   Mads mentioned 2 years ago that he plans to add some support for this:
   https://bitbucket.org/conservancy/kallithea/issues/18/search-needs-to-be-improved
2. searching in multiple repositories (incl. full-text searching in the
   files)

I think the first point is pretty much straightforward. Git and Mercurial
both support filtering revisions. It basically 'only' needs to be
implemented. :-)
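
As a rough illustration of letting the VCS do the filtering, here is a
minimal Python sketch. It assumes the `hg` and `git` command line tools are
available and simply shells out to them; the function names are hypothetical
and this is not how Kallithea accesses repositories internally.

```python
import subprocess


def revision_filter_command(vcs, expression, limit=20):
    """Build a command line that asks the VCS itself to filter revisions.

    Hypothetical helper: for Mercurial, `expression` is a revset such as
    "keyword(search)"; for Git it is a pattern matched against commit
    messages.
    """
    if vcs == "hg":
        # `hg log -r REVSET` evaluates a Mercurial revset expression.
        return ["hg", "log", "-l", str(limit), "-r", expression,
                "--template", "{node|short} {desc|firstline}\n"]
    if vcs == "git":
        # `git log --grep=PATTERN` filters by commit message.
        return ["git", "log", "-n", str(limit), "--grep=" + expression,
                "--format=%h %s"]
    raise ValueError("unsupported vcs: %r" % (vcs,))


def filter_revisions(repo_path, vcs, expression, limit=20):
    """Run the filter inside `repo_path` and return one line per revision."""
    cmd = revision_filter_command(vcs, expression, limit)
    out = subprocess.check_output(cmd, cwd=repo_path)
    return out.decode("utf-8", "replace").splitlines()
```

For example, `filter_revisions("/path/to/repo", "hg", "keyword(search)")`
would list changesets matching the revset, provided the path points to a
Mercurial repository.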

But the second one is more complicated.
There are multiple problems with the current implementation.

1. For starters, since 9c5f794df7cd the make-index command is broken. But
that can be easily fixed.
2. What is not so easy to fix is the fact that indexing is currently
incredibly slow.
3. The indexing is done periodically. It only indexes the tip revision at
indexing time, while the search results refer to the tip at search time.
Therefore:
  a) you may get hits that are no longer valid
  b) you may get no hits even though the string is present now
  c) you can't search for things that have been removed
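
The three effects in point 3 can be reproduced with a toy model. The
following Python sketch uses a plain dictionary as an inverted index instead
of Whoosh, and an invented two-revision history; all names and data are
hypothetical and only meant to show the mismatch between indexing time and
search time.

```python
def build_index(files):
    """Map each word to the set of paths containing it (toy inverted index)."""
    index = {}
    for path, text in files.items():
        for word in text.split():
            index.setdefault(word, set()).add(path)
    return index


def search(index, word, current_files):
    """Return (path, still_valid) pairs, checking hits against the current tip."""
    return [(path, word in current_files.get(path, "").split())
            for path in sorted(index.get(word, ()))]


# Invented history: revision -> {path: content}
history = {
    "r1": {"a.txt": "alpha beta", "b.txt": "gamma"},
    "r2": {"a.txt": "alpha", "b.txt": "gamma delta"},
}

# The periodic job ran while r1 was the tip; users now search against r2.
stale_index = build_index(history["r1"])
tip = history["r2"]

# a) a hit that is no longer valid: "beta" was removed from a.txt in r2
stale_hit = search(stale_index, "beta", tip)    # [("a.txt", False)]

# b) no hit even though the string is present now: "delta" was added in r2
missed = search(stale_index, "delta", tip)      # []

# c) even a fresh tip-only index cannot find removed content
fresh_index = build_index(tip)
removed = search(fresh_index, "beta", tip)      # []
```

Case c) is independent of how often the index is refreshed: as long as only
the tip is indexed, content that existed in earlier revisions is never
searchable.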

I believe all this is solvable. I looked into the code and found a few
places where the indexing can definitely be improved.
But I don't have much experience with Whoosh. So I'm not sure if it is even
worth fixing the current implementation, or if I should start over with
Solr or Elasticsearch.

My questions to you guys are:

1. Do you have experience with Whoosh? Does it scale to gigabytes of data?
2. Would you even pull an implementation that requires installing Solr?
Note: I believe installation and setup of Solr can be automated.
3. Or maybe you think the full-text search should be dropped altogether.

BTW: I'd use the Linux kernel as a benchmark. I think if we could handle
more than half a million revisions, with more than a gigabyte of files, we
would be fine.