[KinoSearch] State of multisearcher/sorting in svn

Marvin Humphrey marvin at rectangular.com
Tue Jun 10 10:15:48 PDT 2008



On Jun 10, 2008, at 1:23 AM, Henry wrote:

> I've been diligently reading (in some cases glassily, so I may have
> missed something important:) the subversion commits and noticed:

> Log:
> Port the rest of SortSpec to C.

There weren't any meaningful functional changes in that commit.  It  
was just another step in the process of porting the modules, so that  
KS can run from C and be bound to other languages.

> can you provide a description of the current
> status of multisearch/sorting (as of latest svn commit)?  I vaguely
> recall that the two (multisearch/sort) were on your todo list at some
> point.

There's a working implementation, but it's disabled by default and  
requires an undocumented call to enable it.

   KinoSearch::Search::MultiSearcher->set_enable_sorting(1);

It's that way because I basically want only people who are subscribed  
to this list to be able to use that feature.

Sorting at the single-machine level works pretty well.  The "sort  
cache", which is maintained for each sortable field, is actually an  
array of 32-bit integers, one per document, each indicating that  
document's rank in a list sorted on that field.  When a sorted search  
is requested, these rank numbers are compared rather than the  
original field values.  It's very fast, and the memory footprint to  
maintain the cache, while substantial, is smaller than it would be if  
we kept the original strings, because we only need one 32-bit integer  
per document.
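
To make the rank idea concrete, here is a minimal sketch in Python.  It
is not KinoSearch's actual code: the names build_sort_cache and
sorted_hits are hypothetical, and it assumes that documents with equal
field values share a rank, which the description above does not specify.

```python
from array import array


def build_sort_cache(field_values):
    """Return one rank per document for a single sortable field.

    field_values[doc_id] is the original string for that document.
    Assumption: equal values share the same rank.
    """
    unique_sorted = sorted(set(field_values))
    rank_of = {value: rank for rank, value in enumerate(unique_sorted)}
    # Typecode "I" is a C unsigned int, which is 32 bits on most platforms.
    return array("I", (rank_of[value] for value in field_values))


def sorted_hits(matching_doc_ids, sort_cache):
    """Order matching documents by comparing cached ranks, not strings."""
    return sorted(matching_doc_ids, key=lambda doc_id: sort_cache[doc_id])


# Four documents, indexed 0..3, with their values for one field.
titles = ["pear", "apple", "quince", "banana"]
cache = build_sort_cache(titles)

# Suppose a query matched documents 0, 2 and 3.
hits = sorted_hits([0, 2, 3], cache)
```

Once the cache is built, the sort never touches the strings again; it
only compares small integers.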

Unfortunately, that model breaks down at the multi-machine level  
because the rank numbers are no longer comparable: each node's ranks  
are relative to its own documents only.  That means that once we have  
the top hits for each node, we have to retrieve the original string  
values, send them across the network, and sort at the master node.
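
A rough Python sketch of that merge step follows.  Again, this is an
illustration rather than the real implementation: node_top_hits and
merge_at_master are hypothetical names, there is no actual networking,
and each node's local sort stands in for its rank comparison.

```python
import heapq
from itertools import islice


def node_top_hits(node_name, field_values, wanted):
    """Return a node's top hits as (original_value, node_name, doc_id).

    The local ordering stands in for the node's rank-based sort.  The
    original string is attached because local ranks mean nothing to
    other nodes.
    """
    order = sorted(range(len(field_values)), key=lambda d: field_values[d])
    return [(field_values[d], node_name, d) for d in order[:wanted]]


def merge_at_master(per_node_hits, wanted):
    """Merge already-sorted per-node lists by comparing original values."""
    return list(islice(heapq.merge(*per_node_hits), wanted))


# "apple" and "banana" would both have local rank 0 on their own node,
# so the ranks alone cannot say which of the two sorts first.
node_a = node_top_hits("a", ["mango", "apple"], wanted=2)
node_b = node_top_hits("b", ["banana", "zebra"], wanted=2)

top = merge_at_master([node_a, node_b], wanted=3)
top_values = [value for value, _node, _doc_id in top]
```

The cost is that the strings for every candidate hit have to travel to
the master, which is exactly what the rank cache avoids on one machine.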

The infrastructure required to pull that trick off is quite  
elaborate.  It took a long time to write, and I'm concerned that, by  
dint of its sheer size, there are bugs lurking.  In particular, I  
don't like the implementation of MultiLexicon.  I wish there were a  
better way.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/

