[KinoSearch] KinoSearch 0.163 - Couldn't open file : File exists
Marvin Humphrey
marvin at rectangular.com
Thu Feb 26 12:58:22 PST 2009
On Thu, Feb 26, 2009 at 11:42:10AM -0500, Clifton Kussmaul wrote:
> I tried 4217, and it still gets stuck, unfortunately.
>
> Couldn't open file '<...>/index/_1.srt": File exists
> at <...>/KinoSearch/Store/FSInvIndex.pm
Yeah, that little cockroach had escaped. Please try 4221.
> Also, I think I (finally) found the error which breaks the index:
> Out of memory during "large" request for 16781312 bytes,
> total sbrk() is 376035328 bytes at <...>/KinoSearch/Index/SegWriter.pm line 74.
> (That's a 16MB request and the total sbrk() is 376MB.)
> I guess that's the request that sbrk()'s the Kino's back :-)
>
> I am indexing files >10MB, so maybe more RAM will fix this.
For KS 0.163 on a 32-bit machine, each Token takes up 28 bytes in addition to
the space required by the text itself. That's before inversion...
    struct Token {
        char   *text;
        STRLEN  len;
        I32     start_offset;
        I32     end_offset;
        I32     pos_inc;
        Token  *next;
        Token  *prev;
    };
So, yes, indexing huge documents takes a lot of memory, and more RAM will
probably prevent that crash. KS uses external sorting so that it can handle a
lot of docs, but a single huge doc can cause problems on a memory-limited
machine.
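
For a rough sense of scale, the 28-byte figure can be checked and turned into a back-of-the-envelope estimate. The sketch below is not KinoSearch code: Token32 and estimate_token_bytes() are hypothetical names, every pointer and STRLEN is modeled as a 4-byte integer to mimic a 32-bit build, and malloc overhead and anything allocated during inversion are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit model of the Token struct: each pointer and the
 * STRLEN field is replaced by a 4-byte integer, so the size comes out
 * the same no matter what machine this is compiled on. */
typedef struct Token32 {
    uint32_t text;          /* stands in for char  *text */
    uint32_t len;           /* stands in for STRLEN len  */
    int32_t  start_offset;
    int32_t  end_offset;
    int32_t  pos_inc;
    uint32_t next;          /* stands in for Token *next */
    uint32_t prev;          /* stands in for Token *prev */
} Token32;

/* Seven 4-byte fields, no padding needed: 28 bytes. */
static_assert(sizeof(Token32) == 28, "Token32 should be 28 bytes");

/* Rough pre-inversion estimate: per-token struct overhead plus the
 * bytes of the token text itself. Allocator overhead is not counted,
 * so the real footprint would be higher. */
uint64_t
estimate_token_bytes(uint64_t num_tokens, uint64_t avg_token_len)
{
    return num_tokens * (sizeof(Token32) + avg_token_len);
}
```

With made-up inputs of 2,000,000 tokens averaging 5 bytes each, estimate_token_bytes(2000000, 5) returns 66,000,000 bytes, i.e. several times the size of the raw text before inversion even starts.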
Marvin Humphrey