[KinoSearch] Invalid UTF-8

Marvin Humphrey marvin at rectangular.com
Wed Jan 27 16:41:45 PST 2010


On Tue, Jan 26, 2010 at 07:15:16PM -0800, Marvin Humphrey wrote:

> Yup, I've now duplicated the problem on my system using 60,000 docs.  

Fixed by r5764.

> I bet I can get that way down by fiddling with the flush threshold.

Ultimately, I was able to isolate the trigger to a single document with two
fields, by bringing the threshold at which PostingListWriter flushes all of
its PostingPools way, way down:

-#define DEFAULT_MEM_THRESH 0x1000000
+/* #define DEFAULT_MEM_THRESH 0x1000000 */
+#define DEFAULT_MEM_THRESH 0x10

When that variable lived in Perl, the KinoSearch::Test module used to set it
to a much smaller number at load time.  This had the effect of simulating
large indexes as far as PostingListWriter was concerned, by forcing runs to be
flushed many, many times.  However, it turns out that we have been doing
without that important simulation for a long time -- the entire KS test suite
was not triggering a PostingPool flush even once.  I'm a little surprised that
after all the refactoring I did on this code recently, there was only a single
glitch that needed to be fixed.  
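
To illustrate why a tiny threshold simulates a large index, here is a toy
sketch.  It is not KinoSearch code: ToyWriter, toy_writer_add, and
count_flushes are hypothetical names, and the "flush once buffered bytes reach
the threshold" rule is an assumption about the general technique rather than a
description of what PostingListWriter actually does.

```c
#include <stddef.h>

/* Hypothetical stand-in for a writer that buffers postings in memory and
 * flushes a run whenever the buffered size reaches a threshold. */
typedef struct {
    size_t   mem_thresh;   /* flush once buffered bytes reach this value */
    size_t   mem_used;     /* bytes buffered since the last flush */
    unsigned flush_count;  /* number of flushes triggered so far */
} ToyWriter;

static void
toy_writer_init(ToyWriter *writer, size_t mem_thresh)
{
    writer->mem_thresh  = mem_thresh;
    writer->mem_used    = 0;
    writer->flush_count = 0;
}

/* Buffer one posting; flush (here: just count and reset) at the threshold.
 * Assumption: the comparison is ">=", chosen only for this sketch. */
static void
toy_writer_add(ToyWriter *writer, size_t posting_bytes)
{
    writer->mem_used += posting_bytes;
    if (writer->mem_used >= writer->mem_thresh) {
        writer->flush_count++;
        writer->mem_used = 0;
    }
}

/* Feed num_postings postings of equal size and report how many flushes
 * the given threshold produced. */
static unsigned
count_flushes(size_t mem_thresh, unsigned num_postings, size_t posting_bytes)
{
    ToyWriter writer;
    unsigned  i;

    toy_writer_init(&writer, mem_thresh);
    for (i = 0; i < num_postings; i++) {
        toy_writer_add(&writer, posting_bytes);
    }
    return writer.flush_count;
}
```

With 1000 postings of 24 bytes each (made-up numbers), a threshold of
0x1000000 never flushes in this model, while 0x10 flushes on every posting.
A small test corpus with a tiny threshold therefore exercises the flush path
the way a very large corpus would with the default.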

Now even if I set the threshold to 0x100, the whole test suite passes.

Marvin Humphrey



