<div dir="ltr">Hi,<div><br></div><div>there are basically two different kinds of searches in Kallithea.</div><div><br></div><div>1. Filtering revisions</div><div>Mads mentioned two years ago that he plans to add some support for this:</div><div><a href="https://bitbucket.org/conservancy/kallithea/issues/18/search-needs-to-be-improved">https://bitbucket.org/conservancy/kallithea/issues/18/search-needs-to-be-improved</a><br></div><div>2. Searching in multiple repositories (incl. full-text searching in the files)</div><div><br></div><div>I think the first point is pretty straightforward. Git and Mercurial both support filtering revisions; it basically 'only' needs to be implemented. :-)</div><div><br></div><div>The second one is more complicated.</div><div>There are multiple problems with the current implementation:</div><div><br></div><div>1. For starters, the make-index command has been broken since 9c5f794df7cd. But that can easily be fixed.</div><div>2. What is not so easy to fix is that indexing is currently incredibly slow.</div><div>3. Indexing is done periodically: it indexes only the tip revision at indexing time, but the search results refer to the tip at search time. Therefore</div><div>  a) you may get hits that are no longer valid,</div><div>  b) you may get no hits even though the string is present now, and</div><div>  c) you can't search for things that have been removed.</div><div><br></div><div>I believe all of this is solvable. I looked into the code and found a few places where the indexing can definitely be improved.</div><div>But I don't have much experience with Whoosh, so I'm not sure whether it is even worth fixing the current implementation or whether I should start over with Solr or Elasticsearch.</div><div><br></div><div>My questions to you guys are:</div><div><br></div><div>1. Do you have experience with Whoosh? Does it scale to gigabytes of data?</div><div>2. Would you even pull an implementation that requires installing Solr? 
Note: I believe installing and setting up Solr can be automated.</div><div>3. Or maybe you think the full-text search should be dropped altogether?</div><div><br></div><div>BTW: I'd use the Linux kernel as a benchmark. I think if we could handle more than half a million revisions with more than a gigabyte of files, we would be fine.</div></div>
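<div>A toy model may make the three failure modes under point 3 concrete. This is a hypothetical sketch in Python, not Kallithea's indexing code and not the Whoosh API: a plain dict-based word index is built from the tip at indexing time (called "r1" here) and then checked against a newer tip ("r2") at search time. The file names and contents are made up for illustration.</div>

```python
# Toy model of point 3 (hypothetical; not Kallithea's or Whoosh's code):
# an index built from one snapshot of a repository, queried later.


def build_index(files):
    """Map each word to the set of file paths that contain it."""
    index = {}
    for path, text in files.items():
        for word in set(text.split()):
            index.setdefault(word, set()).add(path)
    return index


def search(index, word):
    """Return the sorted paths the index claims contain `word`."""
    return sorted(index.get(word, set()))


# Tip at indexing time (r1) and tip at search time (r2); made-up data.
tip_r1 = {"setup.py": "import whoosh", "README": "search is slow"}
tip_r2 = {"setup.py": "import elasticsearch", "README": "search is fast"}

index = build_index(tip_r1)  # the periodic job only saw r1

# a) Stale hit: the index reports setup.py for "whoosh", but the result
#    links to the current tip (r2), where the word is no longer present.
stale = [p for p in search(index, "whoosh") if "whoosh" not in tip_r2[p].split()]

# b) Missed hit: "fast" is in the current tip but not in the index yet.
missed = search(index, "fast")

# c) Removed content: once r2 is indexed, "whoosh" cannot be found at all.
removed = search(build_index(tip_r2), "whoosh")

print(stale, missed, removed)  # ['setup.py'] [] []
```

<div>All three cases come from the same mismatch: the index describes one revision, while the results are resolved against another.</div>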