[KinoSearch] Index size
Marvin Humphrey
marvin at rectangular.com
Wed Oct 18 16:43:05 PDT 2006
On Oct 18, 2006, at 3:52 PM, Chris Nandor wrote:
> So, I am running my indexer every five minutes. Been running that way
> for a few weeks, and I have 6000+ files in my invindex directory.
>
> Is this ... bad? Should I "manually" optimize more often?
Something is wrong. Either segments are not being merged away often
enough, or old, unused segment files are not being deleted. Segment
growth is limited by the Fibonacci series. If you really have
several thousand segments in your index, get away from your box,
because a wormhole is about to open up.
What extensions are represented, and how many of each do you have?
Is this an NFS disk?
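A quick way to answer the extension question might be a short script along these lines. This is only a sketch: the count_extensions helper and the path in the usage comment are illustrative and not part of KinoSearch.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of KinoSearch): tally the plain files in a
# directory by extension. Files without an extension are counted under
# '(none)'.
sub count_extensions {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "Can't opendir '$dir': $!";
    my %count;
    for my $entry ( readdir $dh ) {
        next unless -f "$dir/$entry";    # skip '.', '..', and subdirectories
        my ($ext) = $entry =~ /\.([^.]+)\z/;
        $count{ defined $ext ? $ext : '(none)' }++;
    }
    closedir $dh;
    return \%count;
}

# Usage: perl count_ext.pl /path/to/invindex
if (@ARGV) {
    my $count = count_extensions( $ARGV[0] );
    printf "%6d  %s\n", $count->{$_}, $_ for sort keys %$count;
}
```

If nearly all of the 6000+ files share one extension, that narrows down which kind of file is piling up.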
Please try this code:
    use KinoSearch::Searcher;

    # $analyzer and $invindex must already be defined.
    my $searcher = KinoSearch::Searcher->new(
        analyzer => $analyzer,
        invindex => $invindex,
    );

    # List the segments the reader actually knows about.
    my $seg_infos = $searcher->{reader}{seg_infos};
    my @seg_names = sort keys %{ $seg_infos->{infos} };
    print "NUM DOCS: " . $searcher->{reader}->num_docs . "\n";
    print "SEG NAMES: @seg_names\n";
If there is a one-to-one correspondence between the segment names and
the _XX.cfs files in your invindex, then all those .cfs files are in
use. If you haven't noticed a significant slowdown, that's
probably not the case.
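To check that correspondence mechanically, something like the following might help. It is a sketch under one assumption: that each segment name (say, _1) matches the basename of its compound file (_1.cfs). The unused_cfs_files helper is hypothetical and not part of KinoSearch.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of KinoSearch): given a directory and a
# reference to an array of live segment names, return the .cfs files whose
# basenames are not among those names.
# ASSUMPTION: segment "_1" is stored on disk as "_1.cfs".
sub unused_cfs_files {
    my ( $dir, $seg_names ) = @_;
    my %live = map { $_ => 1 } @$seg_names;
    opendir( my $dh, $dir ) or die "Can't opendir '$dir': $!";
    my @unused = grep { /\A(.+)\.cfs\z/ && !$live{$1} } readdir $dh;
    closedir $dh;
    @unused = sort @unused;
    return @unused;
}

# Possible usage, with @seg_names taken from the snippet above and
# $invindex_path holding the invindex directory as a plain string:
#   print "$_\n" for unused_cfs_files( $invindex_path, \@seg_names );
```

An empty result would mean every .cfs file belongs to a live segment; a long list would point to old files that are not being deleted.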
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/