[KinoSearch] Search in a clustered environment
marvin at rectangular.com
Wed Jan 17 15:16:10 PST 2007
>> Just by releasing a version of KS which reverts to the old
>> behavior of throwing an exception rather than deleting the lock
>> file, the contention issue is mitigated -- you still have to deal
>> with the exceptions, but no more potential index corruption.
I take this back.
Until 0.20 puts the write.lock file in the index dir itself, multiple
machines will never be able to write safely to a shared index unless
they actively override the (non-public) $LOCK_DIR location and put it
on the NFS mount. By default, each box will be looking in its own
temp directory, so it won't find lockfiles within the temp dirs of
other boxes.
There are other ways of indexing using multiple machines (using
InvIndexer's add_invindexes() method), but the particular config
described earlier is problematic. It has never been safe and can't
be made safe pre-0.20 without a public API and active intervention.
So, change of plans. Barring further developments, I'll be leaving
the current behavior alone. If a new version of KS 0.1x goes up, it
will simply add a note to InvIndexer's docs indicating that it is not
safe to write from multiple machines to a shared volume.
Miles Crawford replied...
> Perhaps if deleting the lock file was exposed via the API, the
> calling application could make the decision to nuke it if needed?
> That might be exposing too much of KS's internals though.
For now, it's going to stay private. There's a huge amount of
pressure on me to get 0.20 out the door, and I'm not adding a public
API for the locking mechanism to my short-term todo list.
>> SearchServer and SearchClient are there to diffuse the cost of
>> searching a large corpus over several machines. They know nothing
>> about how the indexes were created, but the MultiSearcher with
>> which you would aggregate several SearchClients assumes that each
>> sub-searcher is responsible for unique content.
> Hmmm, I was making the (unwarranted) assumption that SearchServer
> would accept additions to the index as well as queries, and that I
> could then use it to safely broker all the activity on my cluster.
I'm open to suggestions about how the docs can be improved so that
less reading is required. By disposition I favor minimal APIs with
minimal documentation and enjoy throttling feeping creatures.
However, it's hard for me to understand how you would have made that
assumption after even a passing glance over the docs for
> I find myself deciding to have one indexer process updating an
> index periodically, and then copy that index to the NFS mount. All
> the searchers would be pointed at that spot on the NFS mount.
> Obviously, it would be possible to interrupt a search in an
> unfortunate fashion. Perhaps I can catch this, either via return
> codes or with an eval?
I recommend maintaining two copies of the index on the NFS volume as
described in KinoSearch::Docs::NFS. Use rsync or equivalent rather
than copying because not all files change.
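As a rough illustration of the two-copy idea, here is a minimal shell sketch. It is only one possible reading of the approach, not the procedure from KinoSearch::Docs::NFS, which remains the authority. The function name publish_index, the copy names "a" and "b", and the "current" symlink are all hypothetical, and the symlink swap assumes GNU coreutils (mv -T).

```shell
#!/bin/sh
# Sketch: keep two copies of an index ("a" and "b") under one base directory
# and flip a "current" symlink after each sync. All names are hypothetical.

publish_index() {
    src=$1      # freshly built index on the indexing machine
    base=$2     # directory on the shared (e.g. NFS) volume

    mkdir -p "$base/a" "$base/b" || return 1

    # Pick whichever copy the searchers are NOT currently pointed at.
    case "$(readlink "$base/current" 2>/dev/null)" in
        a) next=b ;;
        *) next=a ;;
    esac

    # rsync only transfers files that changed; --delete removes files
    # that no longer exist in the source index.
    rsync -a --delete "$src/" "$base/$next/" || return 1

    # Replace the symlink via rename so readers never see a missing link
    # (mv -T is a GNU coreutils option).
    ln -s "$next" "$base/current.tmp.$$" || return 1
    mv -T "$base/current.tmp.$$" "$base/current"
}
```

Called as, say, publish_index /path/to/fresh/index /path/on/nfs, with searchers opening the "current" path. A searcher that already has the older copy open keeps reading it until the next run overwrites that copy, so the interval between runs should comfortably exceed the longest search.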
KinoSearch mailing list
KinoSearch at rectangular.com