[KinoSearch] Stemming and scoring

Marvin Humphrey marvin at rectangular.com
Thu Feb 15 09:48:54 PST 2007




On Feb 15, 2007, at 8:35 AM, Eamon Daly wrote:

> I see that in the spec_field I
> can specify an analyzer, but I don't see an equivalent for
> the Searcher.

Mmm, yes, you're right.  That's not in 0.15.

It is in svn trunk, though.

> I suspect I have to go the long way 'round and build a
> QueryParser of my own. Correct?

Yes.  Bummer.  That's not optimal because things get tricky with  
+term/-term and multiple fields when you use multiple QueryParsers -  
the bug that Henry uncovered and I fixed in 0.13.

In 0.20's new API, analyzers are specified via an external "Schema"  
module:

     # MySchema.pm
     use KinoSearch::Analysis::PolyAnalyzer;
     use KinoSearch::Analysis::LCNormalizer;
     use KinoSearch::Analysis::Tokenizer;

     # No analyzer() override here, so this field gets MySchema's analyzer.
     package MySchema::stemmed;
     use base qw( KinoSearch::Schema::Field );

     # Lowercases and tokenizes only; there is no stemmer in the chain.
     package MySchema::unstemmed;
     use base qw( KinoSearch::Schema::Field );
     sub analyzer {
         my $lc_normalizer = KinoSearch::Analysis::LCNormalizer->new;
         my $tokenizer = KinoSearch::Analysis::Tokenizer->new;
         return KinoSearch::Analysis::PolyAnalyzer->new(
             analyzers => [ $lc_normalizer, $tokenizer ],
         );
     }

     package MySchema;
     use base qw( KinoSearch::Schema );
     sub analyzer {
         return KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
     }
     __PACKAGE__->load_fields(qw( stemmed unstemmed ));

     1;

Both indexer and searcher scripts use the Schema module.  No more  
search-time/index-time analyzer mismatch issues!  :D
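
To illustrate why a mismatch hurts, here is a toy sketch in plain Perl.  
It does not use KinoSearch at all: the two analyzers, the hash-based  
index, and the search() routine are hypothetical stand-ins, and the  
"stemmer" is just a crude suffix stripper.

```perl
use strict;
use warnings;

# Hypothetical "unstemmed" analyzer: lowercase, split on non-word chars.
sub analyze_unstemmed {
    my ($text) = @_;
    return grep { length } split /\W+/, lc $text;
}

# Hypothetical "stemmed" analyzer: same, then strip a trailing "ing" or "s".
sub analyze_stemmed {
    my ($text) = @_;
    my @terms = map { my $t = $_; $t =~ s/(?:ing|s)$//; $t }
        analyze_unstemmed($text);
    return grep { length } @terms;
}

# Build a tiny inverted index: term => { doc_id => 1 }.
sub build_index {
    my ( $analyzer, $docs ) = @_;
    my %index;
    for my $doc_id ( sort keys %$docs ) {
        $index{$_}{$doc_id} = 1 for $analyzer->( $docs->{$doc_id} );
    }
    return \%index;
}

# Return the ids of docs that contain every analyzed query term.
sub search {
    my ( $index, $analyzer, $query ) = @_;
    my @terms = $analyzer->($query);
    return () unless @terms;
    my %count;
    for my $term (@terms) {
        $count{$_}++ for keys %{ $index->{$term} || {} };
    }
    my @hits = grep { $count{$_} == @terms } keys %count;
    return sort @hits;
}

my %docs = (
    doc1 => 'Searching large indexes',
    doc2 => 'Stemming and scoring',
);

# One shared analyzer definition, used for both indexing and searching.
my $analyzer = \&analyze_stemmed;
my $index    = build_index( $analyzer, \%docs );

# Same analyzer at search time: the query term is stemmed like the index.
my @matched = search( $index, $analyzer, 'searching' );

# Different analyzer at search time: the raw term is not in the index.
my @mismatched = search( $index, \&analyze_unstemmed, 'searching' );

print "matched: @matched\n";
print "mismatched: @mismatched\n";
```

The query finds doc1 only when the search side analyzes text the same  
way the index side did.  Keeping the analyzer choice in one shared  
module makes it hard to get the two out of sync.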

     # invindexer.plx
     use MySchema;
     use KinoSearch::InvIndexer;

     my $invindexer = KinoSearch::InvIndexer->new(
         invindex => MySchema->clobber('/path/to/invindex'),
     );

     ...

     # searcher.cgi
     use MySchema;
     use KinoSearch::Searcher;

     my $searcher = KinoSearch::Searcher->new(
         invindex => MySchema->open('/path/to/invindex'),
     );

     ...

See the doco for KinoSearch::Schema for further details.

     http://www.rectangular.com/kinosearch/docs/devel/KinoSearch/Schema.html

I still have to go write the doco for KinoSearch::Schema::Field...

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


