[KinoSearch] Stemming and scoring
Marvin Humphrey
marvin at rectangular.com
Thu Feb 15 09:48:54 PST 2007
On Feb 15, 2007, at 8:35 AM, Eamon Daly wrote:
> I see that in the spec_field I
> can specify an analyzer, but I don't see an equivalent for
> the Searcher.
Mmm, yes, you're right. That's not in 0.15.
It is in svn trunk, though.
> I suspect I have to go the long way 'round and build a
> QueryParser of my own. Correct?
Yes. Bummer. That's not optimal because things get tricky with
+term/-term and multiple fields when you use multiple QueryParsers;
that was the bug that Henry uncovered and I fixed in 0.13.
In 0.20's new API, analyzers are specified via an external "Schema"
module:
# MySchema.pm
use KinoSearch::Analysis::PolyAnalyzer;
use KinoSearch::Analysis::LCNormalizer;
use KinoSearch::Analysis::Tokenizer;

# No analyzer() override: intended to pick up MySchema's default analyzer.
package MySchema::stemmed;
use base qw( KinoSearch::Schema::Field );

# Lowercase and tokenize only, so this field is not stemmed.
package MySchema::unstemmed;
use base qw( KinoSearch::Schema::Field );

sub analyzer {
    my $lc_normalizer = KinoSearch::Analysis::LCNormalizer->new;
    my $tokenizer     = KinoSearch::Analysis::Tokenizer->new;
    return KinoSearch::Analysis::PolyAnalyzer->new(
        analyzers => [ $lc_normalizer, $tokenizer ],
    );
}

package MySchema;
use base qw( KinoSearch::Schema );

# Default analyzer for the schema: the English PolyAnalyzer.
sub analyzer {
    return KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
}

__PACKAGE__->load_fields(qw( stemmed unstemmed ));

1;
Both indexer and searcher scripts use the Schema module. No more
search-time/index-time analyzer mismatch issues! :D
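To see why a mismatch hurts, here is a toy sketch in plain Perl. It is not
KinoSearch code: the two analyzers, build_index() and matches() are made-up
names for illustration, and the "stemmer" only strips a trailing "s". The
index is built with the stemming analyzer, then queried once with the same
analyzer and once with the non-stemming one.

```perl
use strict;
use warnings;

# Hypothetical toy analyzers, for illustration only.
# "Stemming" here just strips a trailing "s"; real stemmers do far more.
sub analyze_stemmed {
    my ($text) = @_;
    return map { my $t = $_; $t =~ s/s\z//; $t } split /\W+/, lc $text;
}

sub analyze_unstemmed {
    my ($text) = @_;
    return split /\W+/, lc $text;
}

# Build a tiny term => 1 lookup, standing in for an inverted index.
sub build_index {
    my ( $analyzer, $text ) = @_;
    my %index = map { $_ => 1 } $analyzer->($text);
    return \%index;
}

# A query matches if every analyzed query term is present in the index.
sub matches {
    my ( $index, $analyzer, $query ) = @_;
    for my $term ( $analyzer->($query) ) {
        return 0 unless $index->{$term};
    }
    return 1;
}

my $index = build_index( \&analyze_stemmed, 'Stemming and scoring horses' );

# Same analyzer on both sides: "horses" becomes "horse", which is indexed.
print matches( $index, \&analyze_stemmed, 'horses' ) ? "hit\n" : "miss\n";

# Mismatched analyzer: "horses" stays "horses", but only "horse" is indexed.
print matches( $index, \&analyze_unstemmed, 'horses' ) ? "hit\n" : "miss\n";
```

Keeping the analyzer definition in one shared module, as the Schema does,
means the two sides cannot drift apart like this.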
# invindexer.plx
use MySchema;
use KinoSearch::InvIndexer;
my $invindexer = KinoSearch::InvIndexer->new(
    invindex => MySchema->clobber('/path/to/invindex'),
);
...
# searcher.cgi
use MySchema;
use KinoSearch::Searcher;
my $searcher = KinoSearch::Searcher->new(
    invindex => MySchema->open('/path/to/invindex'),
);
...
See the doco for KinoSearch::Schema for further details.
http://www.rectangular.com/kinosearch/docs/devel/KinoSearch/Schema.html
The doco for KinoSearch::Schema::Field, I have to go write...
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/