[KinoSearch] schema.json / index files growing to unwieldy size over time

Ashley Pond V ashley.pond.v at gmail.com
Mon Dec 14 11:32:47 PST 2009


This is sort of a follow-up to this short thread:
http://rectangular.com/pipermail/kinosearch/2009-November/007152.html

I'm having a problem with my index files slowly ballooning to a size
the machine can't handle: operations on them use too much memory and
CPU, or even fail with out-of-memory errors. You can see below that
the file "schema_m.json" is up to 578,841,148 bytes.

-rw-r--r--  1 apache apache 578841148 Nov 29 13:36 schema_m.json
drwxr-xr-x  2 apache apache      4096 Nov 27 16:42 seg_1/
drwxr-xr-x  2 apache apache      4096 Nov 29 13:36 seg_m/
-rw-r--r--  1 apache apache      6093 Nov 29 13:36 snapshot_m.json

The same index's schema looks like this after an optimization:

-rw-r--r--  1 apache apache 8944 Dec 14 09:37 schema_1ju.json

I suppose I can run optimize on every index operation/commit, but that
seems kind of expensive. The problem with running it infrequently is
that by the time the schema file gets into the 500MB+ range, it
causes "Out of Memory" errors on the machine and leaves the index
with a "write.lock" file.
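
A middle ground might be to check the schema file size before each
commit and only optimize once it passes some threshold. Below is a
rough sketch of that check in Python, not the actual indexing code. The
50MB threshold and the names largest_schema_size and needs_optimize are
hypothetical, and the optimize call itself is left out because it
depends on the indexer.

```python
import glob
import os

# Hypothetical threshold: far below the ~500MB range where errors start.
THRESHOLD_BYTES = 50 * 1024 * 1024


def largest_schema_size(index_dir):
    """Return the size in bytes of the largest schema_*.json file, or 0."""
    paths = glob.glob(os.path.join(index_dir, "schema_*.json"))
    return max((os.path.getsize(p) for p in paths), default=0)


def needs_optimize(index_dir, threshold=THRESHOLD_BYTES):
    """True when the biggest schema file has grown past the threshold."""
    return largest_schema_size(index_dir) > threshold
```

The idea would be to call needs_optimize() on the index directory
before opening the indexer and to request an optimize only when it
returns true, so the expensive pass runs occasionally instead of on
every commit.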

Marvin suggested last time that sortable fields might be an issue and
indeed I had many of them. Several were unused so I trimmed them. I
still have 6 sortable fields in a schema of 27 fields. Many of these
are unused for search, and I could probably collapse half of them into
one JSON-encoded field or something if that would improve
memory/segmentation.
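
To make the collapsing idea concrete, here is a minimal sketch in
Python of packing several stored-only fields into one JSON string
before indexing and unpacking them after retrieval. The field names
and the "extra_json" target are made up for illustration, and whether
this helps with the schema growth is exactly what I don't know.

```python
import json

# Hypothetical names of fields that are stored but never searched or sorted on.
UNSEARCHED = ("thumbnail_url", "byline", "source_id")


def collapse_fields(doc, keys=UNSEARCHED, target="extra_json"):
    """Return a copy of doc with the given keys packed into one JSON string."""
    packed = {k: doc[k] for k in keys if k in doc}
    slim = {k: v for k, v in doc.items() if k not in keys}
    slim[target] = json.dumps(packed, sort_keys=True)
    return slim


def expand_fields(doc, target="extra_json"):
    """Reverse collapse_fields on a document fetched from the index."""
    restored = {k: v for k, v in doc.items() if k != target}
    restored.update(json.loads(doc.get(target, "{}")))
    return restored
```

With three fields folded into one, the schema would carry two fewer
field definitions, at the cost of decoding the JSON on every hit.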

*Any* additional advice, experience, or pointers would be helpful. I
suspect I'm doing something wrong, but it's a ton of (proprietary) code
so I don't want to (and probably can't for work reasons) post it.

Thanks for looking!
-Ashley


