[KinoSearch] Search in a clustered environment

Marvin Humphrey marvin at rectangular.com
Wed Jan 17 15:16:10 PST 2007




I wrote...

>> Just by releasing a version of KS which reverts to the old  
>> behavior of throwing an exception rather than deleting the lock  
>> file, the contention issue is mitigated -- you still have to deal  
>> with the exceptions, but no more potential index corruption.

I take this back.

Until 0.20 puts the write.lock file in the index dir itself, multiple  
machines will never be able to write safely to a shared index unless  
they actively override the (non-public) $LOCK_DIR location and put it  
on the NFS mount.  By default, each box will be looking in its own  
temp directory, so it won't find lockfiles within the temp dirs of  
other boxen.
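
To make the failure mode concrete, here is a minimal Python sketch (not KinoSearch code). Only the write.lock name comes from the discussion above. The try_lock() and demo() helpers and the three temporary directories standing in for "box A's temp dir", "box B's temp dir" and "the shared index dir" are hypothetical. It assumes the lock is taken by exclusive file creation, and it leaves aside whether exclusive create is reliable on any particular NFS setup.

```python
import os
import tempfile

LOCK_NAME = "write.lock"  # name taken from the post; everything else is illustrative


def try_lock(lock_dir):
    """Try to create the lock file exclusively; return True if we got it."""
    path = os.path.join(lock_dir, LOCK_NAME)
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def demo():
    """Compare per-machine lock dirs against one shared lock dir."""
    with tempfile.TemporaryDirectory() as box_a_tmp, \
         tempfile.TemporaryDirectory() as box_b_tmp, \
         tempfile.TemporaryDirectory() as shared_index_dir:
        # Each "box" looks only in its own temp dir, so both succeed
        # and both would go on to write to the same index.
        private = (try_lock(box_a_tmp), try_lock(box_b_tmp))
        # With the lock file in one shared location, the second
        # attempt sees the first writer's lock and is refused.
        shared = (try_lock(shared_index_dir), try_lock(shared_index_dir))
    return private, shared


if __name__ == "__main__":
    print(demo())
```

With private lock directories both writers believe they hold the lock. Only when both look in the same place does the second one get turned away.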

There are other ways of indexing using multiple machines (using  
InvIndexer's add_invindexes() method), but the particular config  
described earlier is problematic.  It has never been safe and can't  
be made safe pre-0.20 without a public API and active intervention.

So, change of plans.  Barring further developments, I'll be leaving  
the current behavior alone.  If a new version of KS 0.1x goes up, it  
will simply add a note to InvIndexer's docs indicating that it is not  
safe to write from multiple machines to a shared volume.

Miles Crawford replied...

> Perhaps if deleting the lock file was exposed via the API, the  
> calling application could make the decision to nuke it if needed?   
> That might be exposing too much of KS's internals though.

For now, it's going to stay private.  There's a huge amount of  
pressure on me to get 0.20 out the door, and I'm not adding a public  
API for the locking mechanism to my short-term todo list.

>> SearchServer and SearchClient are there to diffuse the cost of  
>> searching a large corpus over several machines.  They know nothing  
>> about how the indexes were created, but the MultiSearcher with  
>> which you would aggregate several SearchClients assumes that each  
>> sub-searcher is responsible for unique content.
>
> Hmmm, I was making the (unwarranted) assumption that SearchServer  
> would accept additions to the index as well as queries, and that I  
> could then use it to safely broker all the activity on my cluster.

I'm open to suggestions about how the docs can be improved so that  
less reading is required.  By disposition I favor minimal APIs with  
minimal documentation and enjoy throttling feeping creatures.   
However, it's hard for me to understand how you would have made that  
assumption after even a passing glance over the docs for  
SearchServer.  :(

> I find myself deciding to have one indexer process updating an  
> index periodically, and then copy that index to the NFS mount.  All  
> the searchers would be pointed at that spot on the NFS mount.   
> Obviously, it would be possible to interrupt a search in an  
> unfortunate fashion.   Perhaps I can catch this, either via return  
> codes or with an eval?

I recommend maintaining two copies of the index on the NFS volume as  
described in KinoSearch::Docs::NFS.  Use rsync or equivalent rather  
than copying because not all files change.
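
As a rough illustration of why syncing beats a blind copy, here is a small Python stand-in for "rsync or equivalent". The sync_changed() helper is hypothetical and is not part of KinoSearch or rsync. It assumes a flat directory of index files and compares file contents, whereas rsync's default quick check compares size and modification time. It says nothing about how the two copies described in KinoSearch::Docs::NFS are rotated.

```python
import filecmp
import os
import shutil


def sync_changed(src_dir, dst_dir):
    """Make dst_dir mirror the files in src_dir; return names actually copied."""
    os.makedirs(dst_dir, exist_ok=True)
    src_names = {
        name for name in os.listdir(src_dir)
        if os.path.isfile(os.path.join(src_dir, name))
    }

    # Drop files that have disappeared from the source.
    for name in os.listdir(dst_dir):
        path = os.path.join(dst_dir, name)
        if name not in src_names and os.path.isfile(path):
            os.remove(path)

    copied = []
    for name in sorted(src_names):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        # Copy only when the destination is missing or differs byte for byte.
        if not os.path.exists(dst) or not filecmp.cmp(src, dst, shallow=False):
            shutil.copy2(src, dst)
            copied.append(name)
    return copied
```

On a second run with nothing changed the returned list is empty, which is the point: files that did not change are never rewritten on the NFS volume.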

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch



