[KinoSearch] KinoSearch 0.163 - Couldn't open file : File exists

Marvin Humphrey marvin at rectangular.com
Thu Feb 26 12:58:22 PST 2009


On Thu, Feb 26, 2009 at 11:42:10AM -0500, Clifton Kussmaul wrote:

> I tried 4217, and it still gets stuck, unfortunately.
> 
> Couldn't open file '<...>/index/_1.srt": File exists
>        at <...>/KinoSearch/Store/FSInvIndex.pm

Yeah, that little cockroach had escaped.  Please try 4221.

> Also, I think I (finally) found the error which breaks the index:
> Out of memory during "large" request for 16781312 bytes, 
> total sbrk() is 376035328 bytes at <...>/KinoSearch/Index/SegWriter.pm line
> 74.
> (That's a 16MB request and the total sbrk() is 376MB.)
> I guess that's the request that sbrk()'s the Kino's back :-)
> 
> I am indexing files >10MB, so maybe more RAM will fix this.

For KS 0.163 on a 32-bit machine, each Token takes up 28 bytes in addition to
the space required by the text itself.  And that's before inversion even starts.

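    typedef struct Token Token;    /* lets the struct refer to itself */

    struct Token {
        char   *text;          /* 4 bytes: pointer to the token text */
        STRLEN  len;           /* 4 bytes: text length (Perl's STRLEN) */
        I32     start_offset;  /* 4 bytes */
        I32     end_offset;    /* 4 bytes */
        I32     pos_inc;       /* 4 bytes: position increment */
        Token  *next;          /* 4 bytes: next token in the list */
        Token  *prev;          /* 4 bytes: previous token in the list */
    };                         /* 28 bytes total on a 32-bit machine */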

So, yes, indexing huge documents takes a lot of memory, and more RAM will
probably prevent that crash.  KS uses external sorting so that it can handle a
lot of docs, but a single huge doc can cause problems on a memory-limited
machine.

Marvin Humphrey
