[KinoSearch] Search in a clustered environment

Marvin Humphrey marvin at rectangular.com
Thu Jan 18 17:13:02 PST 2007




On Jan 18, 2007, at 3:48 PM, Miles Crawford wrote:

>> I'm open to suggestions about how the docs can be improved so that  
>> less reading is required.  By disposition I favor minimal APIs  
>> with minimal documentation and enjoy throttling feeping  
>> creatures.  However, it's hard for me to understand how you would  
>> have made that assumption after even a passing glance over the  
>> docs for SearchServer.  :(
>
> The docs are fine - the first glance I gave them told me I was  
> mistaken, I heard about the client/server for the first time from  
> your email.

One thing I don't like about the way the docs show up on  
search.cpan.org right now is that all the private classes are  
exposed, despite the fact that all their POD is "invisible" -- i.e.  
within =for/=begin blocks.

Next CPAN release, all the "=head1 NAME" directives will switch to  
"=head1 PRIVATE CLASS".  After that change the private classes should  
get grayed out, which will make things more navigable.

>> I recommend maintaining two copies of the index on the NFS volume  
>> as described in KinoSearch::Docs::NFS.  Use rsync or equivalent  
>> rather than copying because not all files change.
>
> Granted about rsync, yeah.  But as for the scheme described in  
> KinoSearch::Docs::NFS, I'm less certain.  I can't think of a  
> particularly clean way of causing all the machines in my cluster to  
> alternate the location of the index on a potentially very frequent  
> cue.  I'd have to be flipping some flag in a database or managing  
> an indication file on the NFS mount or something else relatively  
> orthogonal to searching. At that point I have a process that seems  
> overly complex and error-prone.

Warming up the caches for a Searcher operating against a large index  
may cause a noticeable delay on the first search.  That may factor in  
to how often you want to refresh them.  Also bear in mind that each  
Searcher presents a snapshot of the index, and does not update to  
include recent content -- you have to destroy and recreate if you  
want to see changes.
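
For what it's worth, the "indication file" approach can be fairly small.  Here is a rough sketch in Python rather than Perl, purely for illustration.  It assumes a pointer file on the shared volume that names the live copy of the index.  The names publish, SearcherHolder and open_searcher are all made up for this example and are not part of KinoSearch.  The indexer flips the pointer after rsync finishes, and each search process rebuilds its searcher only when the pointer changes, so the cache warm-up cost is paid once per flip.

```python
import os
import tempfile


def publish(pointer_path, index_dir):
    """Point readers at index_dir by replacing the pointer file in one step."""
    dir_name = os.path.dirname(pointer_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, "w") as handle:
        handle.write(index_dir)
    # Rename over the old pointer so readers never see a half-written file.
    os.replace(tmp_path, pointer_path)


class SearcherHolder:
    """Keeps one searcher alive and recreates it when the pointer changes.

    open_searcher is a hypothetical callable taking an index directory and
    returning whatever searcher object the application uses.
    """

    def __init__(self, pointer_path, open_searcher):
        self.pointer_path = pointer_path
        self.open_searcher = open_searcher
        self.current_dir = None
        self.searcher = None

    def get(self):
        with open(self.pointer_path) as handle:
            index_dir = handle.read().strip()
        if index_dir != self.current_dir:
            # A searcher is a snapshot, so drop the old one and build anew.
            self.searcher = self.open_searcher(index_dir)
            self.current_dir = index_dir
        return self.searcher
```

Call get() before each search, or on a timer if re-reading the pointer per request is too chatty.  NFS attribute caching may delay how quickly other machines notice the flip, so treat the switch as eventually visible rather than instantaneous.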

> Sounds like you're telling me that rsyncing the completed index over  
> the old one is error-prone as well, huh?

If there are active Searchers, they'll crash on NFS because NFS  
blithely deletes files out from underneath active apps.
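
To make the contrast concrete: on a local POSIX file system, a file that is unlinked while a process still has it open stays readable until the last handle is closed, which is why replacing files underneath a running reader is normally harmless there.  NFS gives no such guarantee when the delete comes from another machine, and the reader can end up with a stale file handle.  The Python sketch below shows only the local behavior.  The function name is invented for the example, and it assumes a Unix-like system and a local temp directory.

```python
import os
import tempfile


def read_after_unlink(data):
    """Write data to a temp file, unlink it while open, then read it back."""
    fd, path = tempfile.mkstemp()
    os.write(fd, data)
    os.close(fd)

    reader = open(path, "rb")
    os.unlink(path)  # the name is gone, but the open handle keeps the data
    try:
        return reader.read(), os.path.exists(path)
    finally:
        reader.close()
```

On a local disk the read succeeds even though the path no longer exists.  Run the same sequence with the delete issued from a second NFS client and the read is liable to fail instead.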

There was lots of action on the java-dev at lucene.apache.org list  
regarding NFS locking strategies today.  Here's the conclusion I've  
arrived at after much discussion.

> On Jan 18, 2007, at 2:59 PM, Doron Cohen wrote:
>> To my understanding the only remaining issue with NFS is: a reader
>> might get an IO exception in case writer removed an old file that
>> the reader is using.
>>
>> It is not a possible corruption that we try to solve, right?
>>
>> For that I think it is not worth to add that stuff again.
>
> I agree, Doron.
>
> I'd rather leave NFS as a problem case.
>
> Now, how about having Readers establish advisory read locks when  
> the operating system supports them?  That seems to me to be still  
> in the spirit of having Readers be read-only.
>
> Then our problem set is reduced even further: only NFS systems  
> using protocols prior to version 4.
>
> It's probably not even worth it to perform the lock test I proposed  
> earlier.  We just use file systems the way they're supposed to  
> behave and eventually NFS catches up.
>
> Provisionally, this is what I will implement for KS, unless  
> something better emerges from ongoing discussion.
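
As a rough illustration of the advisory read lock idea, here is a sketch in Python using flock.  This is not what KS implements, only the general shape: a reader holds a shared lock on a lock file for as long as it is open, and the writer deletes an obsolete file only if it can briefly take an exclusive lock.  The function names and the single lock file are assumptions made for the example, and it requires a Unix-like system.

```python
import fcntl
import os


def acquire_read_lock(lock_path):
    """Take a shared advisory lock; returns the descriptor holding it."""
    fd = os.open(lock_path, os.O_RDONLY | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise
    return fd


def release_read_lock(fd):
    """Drop the shared lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)


def try_delete(file_path, lock_path):
    """Delete file_path only if no reader currently holds a shared lock."""
    fd = os.open(lock_path, os.O_RDONLY | os.O_CREAT, 0o644)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # a reader is active; leave the file for later
        os.remove(file_path)
        return True
    finally:
        os.close(fd)  # closing the descriptor also releases the lock
```

The reader itself never writes to the index, it only holds a lock, and a writer that fails to get the exclusive lock simply retries the cleanup on a later pass.  Whether the lock is honored across machines depends on the file system and protocol version, which is exactly the NFS caveat above.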

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


