[KinoSearch] KinoSearch Death

Marvin Humphrey marvin at rectangular.com
Fri Oct 20 14:52:10 PDT 2006




On Oct 20, 2006, at 1:14 PM, Chris Nandor wrote:

> Another would be to copy all the index files to each httpd box, instead of
> using NFS.  Pain.

Well, most of the time the index doesn't change very much, so you  
wouldn't have to copy the whole thing every 5 minutes if you went  
that route.  Segments stick around as long as they can.  The  
Fibonacci-based merge trigger is designed to minimize churn.
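
As a rough illustration of copying only what changed, here is a minimal
Python sketch.  It is not part of KinoSearch; the function name sync_index
and the directory layout are assumptions made for the example, and it
assumes that an index file whose size and mtime are unchanged does not need
to be copied again.

```python
import filecmp
import shutil
from pathlib import Path


def sync_index(src_dir, dst_dir):
    """Mirror src_dir into dst_dir, copying only new or changed files.

    Returns (copied, removed) as two sorted lists of file names.
    Hypothetical helper for illustration; not a KinoSearch API.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)

    src_names = {p.name for p in src.iterdir() if p.is_file()}
    copied = []
    for name in sorted(src_names):
        s, d = src / name, dst / name
        # shallow=True treats files with the same size and mtime as equal,
        # so untouched segment files are skipped without reading them.
        if not d.exists() or not filecmp.cmp(s, d, shallow=True):
            tmp = dst / (name + ".tmp")
            shutil.copy2(s, tmp)   # copy2 preserves the mtime
            tmp.replace(d)         # rename into place once fully copied
            copied.append(name)

    # Drop files that no longer exist in the source (merged-away segments).
    # Assumption: no searcher on this box still needs them.
    removed = []
    for p in list(dst.iterdir()):
        if p.is_file() and p.name not in src_names:
            p.unlink()
            removed.append(p.name)
    return copied, sorted(removed)
```

Run from cron on each httpd box, a second pass over an unchanged index
copies nothing, and a pass after a small update copies only the new segment
files.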

Check out how Doug set up Lucene for Technorati.

http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg12709.html

I'm also kind of curious about how many servers you can point at the  
same NFS volume before you end up I/O-bound.

> Also, I wonder ... could search() fail and *not die* like that?  Maybe only
> part of the file is gone?  Or is this all-or-nothing?  I *think* from what
> I understand of the problem, it will fail entirely or work entirely, but I
> lack full confidence in that assessment.

The InStream class throws that error when it tries to read something  
that ought to be there and fails.  KS checks the return value for  
every read call.  There are very, very few opportunities for a read  
failure to produce incorrect data.

An InStream that gets out of sync has the potential to produce  
invalid output for a little bit.  But it usually dies almost  
immediately -- typically when it tries to read a string header vint  
and the decoded vint tells it that the string is waaaaaayyyy longer  
than it actually is.  The InStream tries to read that many bytes,  
slams into an EOF, and throws an error.
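
To make that failure mode concrete, here is a small Python sketch.  It is
not the actual InStream code: ReadError, read_exact, read_vint and
read_string are names invented for the example, and it assumes a
Lucene-style vint (7 data bits per byte, low-order group first, high bit
set on every byte except the last) followed by that many bytes of string
data.

```python
import io


class ReadError(Exception):
    """Raised when a read returns fewer bytes than requested."""


def read_exact(stream, n):
    # Check the result of every read: a short read is an error,
    # never silently accepted as data.
    data = stream.read(n)
    if len(data) != n:
        raise ReadError(f"wanted {n} bytes, got {len(data)}")
    return data


def read_vint(stream):
    # Decode a variable-length integer, 7 bits at a time.
    value, shift = 0, 0
    while True:
        byte = read_exact(stream, 1)[0]
        value |= (byte & 0x7F) << shift
        if not (byte & 0x80):   # high bit clear: last byte of the vint
            return value
        shift += 7


def read_string(stream):
    # String header vint gives the length, then the bytes follow.
    length = read_vint(stream)
    return read_exact(stream, length).decode("utf-8")


stream = io.BytesIO(b"\x05hello\x03foo")
print(read_string(stream), read_string(stream))

# Knock the stream out of sync by one byte: "h" (0x68) is now decoded as
# a string length of 104, far more than the 8 bytes that remain.
stream.seek(1)
try:
    read_string(stream)
except ReadError as err:
    print("died:", err)
```

In this toy case the out-of-sync read never returns a bogus string at all;
the oversized length runs past the end of the data and the checked read
raises.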

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


