[KinoSearch] Dynamic schemas - How?

Marvin Humphrey marvin at rectangular.com
Tue Feb 27 01:03:31 PST 2007




On Feb 26, 2007, at 10:30 PM, Marc Elser wrote:

> I just took a look at KinoSearch 0.20_01 because I've been longing  
> for this new release.
>
> To my great surprise, I saw that the index structure now is based on
> subclassing KinoSearch::Schema::FieldSpec. Well, that is a big
> problem for users like me who dynamically create indexes based on
> columns in our SQL tables which can be flagged to be indexed. Of
> course, statically defined subclasses of KinoSearch::Schema are not
> possible with this setup.

Maybe not, but you can simulate them, because Perl is dynamic.

> How can I define dynamic Schemas in KS 0.20?

At index time, it's possible, though kludgy.

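     # Assumes MySchema is already declared as a subclass of
     # KinoSearch::Schema, and that every field name is a valid
     # Perl identifier (it is interpolated into a package name).
     for my $field_name (@field_names) {
         # Create MySchema::$field_name as a FieldSpec subclass.
         eval qq|
             package MySchema::$field_name;
             use base qw( KinoSearch::Schema::FieldSpec );
         |;
         die $@ if $@;
     }
     # Register the generated field classes with the Schema.
     MySchema->init_fields(@field_names);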

That's essentially what I'm doing in my provisional implementation of  
KinoSearch::Simple.

The bigger problem in your case is what to do at search time.  KS no
longer stores information about which fields are indexed, analyzed,
stored, or anything else; all of that is communicated via the
Schema.  The only field-definition data stored in the index is a
per-segment mapping from field name to field number.

To kludge up a search-time Schema, you could write a file listing
the field names into the index directory at index time, then read
that file and generate your Schema subclass on the fly at search
time, too.  Not the most elegant solution, but it should be usable, no?
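Here is a rough sketch of that kludge in plain Perl.  The file name
`schema_fields.txt` and the three helper subs are hypothetical, and it
assumes field names are valid Perl identifiers.  Instead of eval'ing a
string, it sets each generated class's @ISA directly, which produces
the same per-field subclasses without compiling code built from data.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical name for the file that records the field list.
my $FIELDS_FILE = 'schema_fields.txt';

# Index time: write one field name per line into the index directory.
sub save_field_names {
    my ( $index_dir, @field_names ) = @_;
    my $path = File::Spec->catfile( $index_dir, $FIELDS_FILE );
    open my $fh, '>', $path or die "Can't write $path: $!";
    print {$fh} "$_\n" for @field_names;
    close $fh or die "Can't close $path: $!";
}

# Search time: read the field names back, in the same order.
sub load_field_names {
    my $index_dir = shift;
    my $path = File::Spec->catfile( $index_dir, $FIELDS_FILE );
    open my $fh, '<', $path or die "Can't read $path: $!";
    chomp( my @field_names = <$fh> );
    close $fh;
    return @field_names;
}

# Create one subclass per field (e.g. MySchema::title) inheriting
# from $spec_class.  Returns the generated class names.
sub generate_field_classes {
    my ( $schema_class, $spec_class, @field_names ) = @_;
    my @classes;
    for my $field_name (@field_names) {
        die "Invalid field name '$field_name'\n"
            unless $field_name =~ /\A[A-Za-z_]\w*\z/;
        my $class = "${schema_class}::$field_name";
        no strict 'refs';
        @{"${class}::ISA"} = ($spec_class);
        push @classes, $class;
    }
    return @classes;
}
```

At search time you would call something like
generate_field_classes( 'MySchema', 'KinoSearch::Schema::FieldSpec',
load_field_names($index_dir) ) and then MySchema->init_fields(...),
just as at index time.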

The eventual plan is to improve the situation over what exists in KS  
0.15.  Right now I have to dedicate most of my devel time to certain  
large-scale performance optimizations, but here's some of what I have  
in mind...

[ ... ]

OK, the rationale behind Schema got too long so I offloaded it to a  
separate email.

[ ... ]

The next feature I'd planned to add to KinoSearch's Schema API is
something called DeepFieldSpec.  It would allow KS to fake one-to-many
relationships by applying a common FieldSpec to class names
which share a common prefix.
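To make the prefix idea concrete, here is a sketch of how a field
name might be resolved to a shared spec by prefix.  DeepFieldSpec
doesn't exist yet, so the prefix table, the spec class names, and
resolve_spec() are all hypothetical; this is not its API.

```perl
use strict;
use warnings;

# Hypothetical table: field-name prefix => FieldSpec subclass.
my %spec_for_prefix = (
    'author_' => 'MySchema::AuthorSpec',
    'tag_'    => 'MySchema::TagSpec',
);

# Return the spec class for $field_name, preferring the longest
# matching prefix; returns undef if no prefix matches.
sub resolve_spec {
    my ( $field_name, $table ) = @_;
    my ($best) = sort { length $b <=> length $a }
                 grep { index( $field_name, $_ ) == 0 } keys %$table;
    return defined $best ? $table->{$best} : undef;
}
```

With that table, resolve_spec( 'author_1', \%spec_for_prefix ) would
return 'MySchema::AuthorSpec', so author_1, author_2, etc. all share
one spec.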

Maybe we can bend that concept into something that fits your needs.

You don't know the field names in advance at index time, but you must
know exactly how you're going to define the fields -- otherwise, you  
couldn't make this work with KS 0.1x.  So we have a field spec.  We  
just need to associate it with field names.

Are there multiple specs?

Do they ever change?

Do you ever need to add fields in the middle of an indexing session  
or do you know them all up front?

What we probably need is a new KinoSearch::Schema class method, akin  
to init_fields() but with one more layer of indirection.  Instead of  
telling your Schema about a field, you tell it about a FieldSpec  
subclass and one or more field names.  Are you with me?  Could that
work for you?
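Roughly, that extra layer of indirection might look like the sketch
below.  add_fields_with_spec(), spec_for(), field_names(), and the
spec class names are hypothetical, not existing KinoSearch methods;
it is standalone here, but in practice these subs would live in your
KinoSearch::Schema subclass.  It also shows how multiple specs could
coexist, each bound to its own set of field names.

```perl
package MySchema;
use strict;
use warnings;

# Hypothetical registry: field name => FieldSpec subclass.
my %spec_for_field;

# Associate one FieldSpec subclass with one or more field names.
# Dies (before changing anything) if a field is already bound to a
# different spec.
sub add_fields_with_spec {
    my ( $class, $spec_class, @field_names ) = @_;
    for my $name (@field_names) {
        my $existing = $spec_for_field{$name};
        die "Field '$name' already uses $existing\n"
            if defined $existing && $existing ne $spec_class;
    }
    $spec_for_field{$_} = $spec_class for @field_names;
    return;
}

# Look up the spec class for a field name (undef if unknown).
sub spec_for {
    my ( $class, $name ) = @_;
    return $spec_for_field{$name};
}

# All registered field names, sorted.
sub field_names {
    return sort keys %spec_for_field;
}
```

Fields could then be added incrementally, e.g.
MySchema->add_fields_with_spec( 'MyTextSpec', qw( title body ) )
followed later by MySchema->add_fields_with_spec( 'MyIdSpec', 'sku' ).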

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch