[KinoSearch] Exception when indexing an empty field

Marvin Humphrey marvin at rectangular.com
Mon Oct 18 04:10:56 PDT 2010


On Mon, Oct 18, 2010 at 10:19:35AM +0200, Thomas Klausner wrote:
> After upgrading from 0.30101 to 0.30122 I get the following error when 
> indexing a document with an empty field:
> 
> Invalid UTF-8, aborting: ''
> Invalid UTF-8., S_die_invalid_utf8 at core/KinoSearch/Object/CharBuf.c 
> line 161
> 
> i.e. when I call
> $indexer->add_doc({
> 	foo=>'foo',
>   	bar=>'',
> });

Hmm, there's a test for this specific case in t/204-doc_reader.t, which
presumably passed when 0.30_122 was installed:
    
    $indexer->add_doc(
        {   text     => $val,
            bin      => $bin_val,
            unstored => $val,
            empty    => '',
            float64  => 2.0,
        }
    );

So, I'm not sure I'll be able to reproduce the problem here easily without some
help.  Nevertheless, I have a couple things we can try.

First, what do you get if you run this on your system?

    perl -MKinoSearch -le 'warn KinoSearch::Util::StringHelper::utf8_valid("")'

Next, please try applying this patch against the file
xs/KinoSearch/Util/StringHelper.c:

Index: ../perl/xs/KinoSearch/Util/StringHelper.c
===================================================================
--- ../perl/xs/KinoSearch/Util/StringHelper.c   (revision 6383)
+++ ../perl/xs/KinoSearch/Util/StringHelper.c   (working copy)
@@ -6,7 +6,7 @@
 kino_StrHelp_utf8_valid(const char *ptr, size_t size)
 {
     const U8 *uptr = (const U8*)ptr;
-    return is_utf8_string(uptr, size);
+    return size == 0 ? true : !!is_utf8_string(uptr, size);
 }


Does that make a difference?
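
One plausible reason the `size == 0` guard matters is Perl's documentation for `is_utf8_string()`. It says that a `len` of 0 means "compute the length with strlen()". For an empty field whose bytes aren't followed by a NUL, the check may therefore scan whatever happens to sit in memory after the pointer. The sketch below is a standalone illustration, not KinoSearch's code, and `sketch_utf8_valid` is a hypothetical name. It checks strict RFC 3629 UTF-8 and never reads past `ptr + size`, so an empty string is valid by construction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical validator (not KinoSearch's API): returns true if the first
 * `size` bytes at `ptr` are strict RFC 3629 UTF-8. It only ever looks at
 * bytes in [ptr, ptr + size), so no NUL terminator is required. */
static bool sketch_utf8_valid(const char *ptr, size_t size)
{
    assert(size == 0 || ptr != NULL);
    if (size == 0) {
        return true; /* same guard as the patch: empty is always valid */
    }

    const unsigned char *p   = (const unsigned char *)ptr;
    const unsigned char *end = p + size;

    while (p < end) {
        unsigned char c  = *p;
        size_t        len;
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range for 2nd byte */

        if (c < 0x80)                    { p++; continue; } /* ASCII */
        else if (c >= 0xC2 && c <= 0xDF) { len = 2; }
        else if (c == 0xE0)              { len = 3; lo = 0xA0; } /* no overlongs */
        else if (c == 0xED)              { len = 3; hi = 0x9F; } /* no surrogates */
        else if (c >= 0xE1 && c <= 0xEF) { len = 3; }
        else if (c == 0xF0)              { len = 4; lo = 0x90; } /* no overlongs */
        else if (c >= 0xF1 && c <= 0xF3) { len = 4; }
        else if (c == 0xF4)              { len = 4; hi = 0x8F; } /* max U+10FFFF */
        else                             { return false; }

        if ((size_t)(end - p) < len) {
            return false; /* sequence truncated by the end of the buffer */
        }
        if (p[1] < lo || p[1] > hi) {
            return false;
        }
        for (size_t i = 2; i < len; i++) {
            if (p[i] < 0x80 || p[i] > 0xBF) {
                return false; /* bad continuation byte */
            }
        }
        p += len;
    }
    return true;
}
```

For example, `sketch_utf8_valid(buf, 0)` returns true even when `buf` is followed by arbitrary non-UTF-8 bytes, while `sketch_utf8_valid("\xC3", 1)` returns false because the two-byte sequence is cut off. This check is stricter than Perl's own, more permissive notion of UTF-8, so it illustrates the bounds handling rather than serving as a drop-in replacement for `is_utf8_string()`.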

> I think I also got a similar error when I left out a field completely (or 
> set it to undef).

If you set a field to undef, you'll get a stringification warning.  Right now,
KS can't round-trip undefs cleanly through the index, so it warns before
stringifying undef to "" and indexing/storing.

Just to verify something: is the actual line that's triggering the exception in
your app (line 161 in the error message above) the add_doc() call?
 
> Any hints? Am I missing something?

You're not missing anything AFAICT.  What you describe ought to work.

Marvin Humphrey
