[KinoSearch] Exception when indexing an empty field
Marvin Humphrey
marvin at rectangular.com
Mon Oct 18 04:10:56 PDT 2010
On Mon, Oct 18, 2010 at 10:19:35AM +0200, Thomas Klausner wrote:
> After upgrading from 0.30101 to 0.30122 I get the following error when
> indexing a document with an empty field:
>
> Invalid UTF-8, aborting: ''
> Invalid UTF-8., S_die_invalid_utf8 at core/KinoSearch/Object/CharBuf.c
> line 161
>
> i.e. when I call
>     $indexer->add_doc({
>         foo => 'foo',
>         bar => '',
>     });
Hmm, there's a test for this specific case in t/204-doc_reader.t, which
presumably passed when 0.30_122 was installed:
    $indexer->add_doc(
        {   text     => $val,
            bin      => $bin_val,
            unstored => $val,
            empty    => '',
            float64  => 2.0,
        }
    );
So, I'm not sure I'll be able to reproduce the problem here easily without some
help. Nevertheless, there are a couple of things we can try.
First, what do you get if you run this on your system?
    perl -MKinoSearch -le 'warn KinoSearch::Util::StringHelper::utf8_valid("")'
Next, please try applying this patch against the file
xs/KinoSearch/Util/StringHelper.c:
Index: ../perl/xs/KinoSearch/Util/StringHelper.c
===================================================================
--- ../perl/xs/KinoSearch/Util/StringHelper.c (revision 6383)
+++ ../perl/xs/KinoSearch/Util/StringHelper.c (working copy)
@@ -6,7 +6,7 @@
 kino_StrHelp_utf8_valid(const char *ptr, size_t size)
 {
     const U8 *uptr = (const U8*)ptr;
-    return is_utf8_string(uptr, size);
+    return size == 0 ? true : !!is_utf8_string(uptr, size);
 }
Does that make a difference?
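If it helps to see the intent of that change outside the Perl build tree, here
is a rough standalone sketch in plain C. Treat it as an illustration only:
sketch_utf8_valid() is a hypothetical name, it does not call Perl's
is_utf8_string(), and its strict validation rules are my assumption and may not
match what Perl accepts. The part that mirrors the patch is the early return: a
zero-length buffer is declared valid without looking at any bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for kino_StrHelp_utf8_valid().  Not the KinoSearch
 * implementation: it validates strictly and never calls into Perl. */
bool
sketch_utf8_valid(const char *ptr, size_t size)
{
    const uint8_t *s = (const uint8_t*)ptr;
    size_t i = 0;

    /* A NULL pointer is only acceptable for an empty buffer. */
    assert(ptr != NULL || size == 0);

    /* Same idea as the patch: the empty string is valid by definition. */
    if (size == 0) { return true; }

    while (i < size) {
        const uint8_t lead = s[i];
        size_t   trailing;    /* continuation bytes expected after lead */
        uint32_t code_point;  /* value decoded so far */
        uint32_t min_value;   /* smallest value allowed for this length */

        if (lead < 0x80) { i++; continue; }  /* ASCII */
        else if ((lead & 0xE0) == 0xC0) {
            trailing = 1; code_point = lead & 0x1F; min_value = 0x80;
        }
        else if ((lead & 0xF0) == 0xE0) {
            trailing = 2; code_point = lead & 0x0F; min_value = 0x800;
        }
        else if ((lead & 0xF8) == 0xF0) {
            trailing = 3; code_point = lead & 0x07; min_value = 0x10000;
        }
        else { return false; }  /* stray continuation or invalid lead byte */

        /* Reject sequences truncated by the end of the buffer. */
        if (size - i <= trailing) { return false; }

        for (size_t k = 1; k <= trailing; k++) {
            const uint8_t cont = s[i + k];
            if ((cont & 0xC0) != 0x80) { return false; }
            code_point = (code_point << 6) | (uint32_t)(cont & 0x3F);
        }

        /* Reject overlong forms, surrogates, and values past U+10FFFF. */
        if (code_point < min_value) { return false; }
        if (code_point > 0x10FFFF) { return false; }
        if (code_point >= 0xD800 && code_point <= 0xDFFF) { return false; }

        i += trailing + 1;
    }
    return true;
}
```

With this sketch, sketch_utf8_valid("", 0) returns true, which is the behavior
the patch is meant to guarantee for an empty field value.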
> I think I also got a similar error when I left out a field completely (or
> set it to undef).
If you set a field to undef, you'll get a stringification warning. Right now,
KS can't round-trip undefs cleanly through the index, so it warns before
stringifying undef to "" and indexing/storing.
Just to verify something, is the actual line that's triggering the exception in
your app (line 161 in the error message above) the add_doc() call?
> Any hints? Am I missing something?
You're not missing anything AFAICT. What you describe ought to work.
Marvin Humphrey