[KinoSearch] schema.json / index files growing to unwieldy size over time
Ashley Pond V
ashley.pond.v at gmail.com
Mon Dec 14 11:32:47 PST 2009
This is a follow-up to this short thread:
http://rectangular.com/pipermail/kinosearch/2009-November/007152.html
I'm having a problem with my index files slowly ballooning to a size
the machine can't handle: working with them uses too much memory and
CPU, or even causes out-of-memory errors. As you can see below, the
file "schema_m.json" has grown to 578,841,148 bytes.
-rw-r--r-- 1 apache apache 578841148 Nov 29 13:36 schema_m.json
drwxr-xr-x 2 apache apache 4096 Nov 27 16:42 seg_1/
drwxr-xr-x 2 apache apache 4096 Nov 29 13:36 seg_m/
-rw-r--r-- 1 apache apache 6093 Nov 29 13:36 snapshot_m.json
The same index's schema looks like this after an optimization:
-rw-r--r-- 1 apache apache 8944 Dec 14 09:37 schema_1ju.json
I suppose I could run optimize on every index operation/commit, but
that seems expensive. The problem with running it infrequently is that
by the time the schema file reaches the 500MB+ range, it starts to
cause "Out of Memory" errors on the machine and leaves a "write.lock"
behind in the index.
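
One possible middle ground between optimizing on every commit and
optimizing rarely is to check the schema file's size first and only
optimize once it crosses a threshold. The sketch below is Python, not
KinoSearch's Perl, and only does the size check. The "schema_*.json"
pattern is taken from the listing above; the function names and the
100MB default threshold are assumptions for illustration, and the
optimize call itself is left to the indexing code.

```python
import glob
import os


def largest_schema_size(index_dir):
    """Return the size in bytes of the largest schema_*.json file, or 0 if none exist."""
    paths = glob.glob(os.path.join(index_dir, "schema_*.json"))
    return max((os.path.getsize(p) for p in paths), default=0)


def needs_optimize(index_dir, threshold_bytes=100 * 1024 * 1024):
    """True when the largest schema file exceeds the (assumed) threshold."""
    return largest_schema_size(index_dir) > threshold_bytes
```

A cron job or the commit path could call needs_optimize() and trigger
the optimize pass only when it returns True.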
Marvin suggested last time that sortable fields might be an issue, and
indeed I had many of them. Several were unused, so I trimmed them. I
still have 6 sortable fields in a schema of 27 fields. Many of these
are unused for search, and I could probably collapse half of them into
a single JSON-encoded field or something similar if that would improve
memory/segmentation.
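
To make the "collapse into one JSON-encoded field" idea concrete, here
is a minimal sketch in Python (not KinoSearch's Perl). It packs every
field that is not needed for search into one JSON string before the
document is handed to the indexer. The function name, the "extra_json"
field name, and the sample field names are all hypothetical, and
whether this actually reduces schema or memory growth in KinoSearch is
not confirmed here.

```python
import json


def collapse_fields(doc, keep, packed_name="extra_json"):
    """Keep the fields named in `keep`; pack all others into one JSON string field."""
    kept = {k: v for k, v in doc.items() if k in keep}
    rest = {k: v for k, v in doc.items() if k not in keep}
    # sort_keys gives a stable encoding for identical input
    kept[packed_name] = json.dumps(rest, sort_keys=True)
    return kept
```

For example, collapse_fields({"title": "a", "color": "red"}, {"title"})
keeps "title" as its own field and stores "color" inside "extra_json",
so the packed values can still be decoded at display time but are no
longer separate schema fields.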
*Any* additional advice, experience, or pointers would be helpful. I
suspect I'm doing something wrong, but it's a ton of (proprietary) code,
so I don't want to (and, for work reasons, probably can't) post it.
Thanks for looking!
-Ashley