[KinoSearch] opening up the scorers
marvin at rectangular.com
Mon Apr 21 20:05:06 PDT 2008
On Apr 19, 2008, at 12:10 PM, Nathan Kurz wrote:
>> There is actually quite a lot that happens in between a Query and a
>> Scorer. That's where the "Weight" classes come in - they encapsulate
>> the process of compiling a Query to a Scorer.
> Any chance you could write up what actually happens here?
I've now rewritten the documentation for Query and Weight, added
official public APIs for Scorer and Tally, and tweaked
Cookbook::WildCardQuery. The new docs reflect something of a shift in
perspective, thanks to our ongoing conversations -- instead of
thinking of Weight's primary purpose as enabling reuse of Query
objects (the original Lucene rationale), we think of Weight as
something which compiles a Query to a Scorer.
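
To make the "Weight compiles a Query to a Scorer" idea concrete, here is a minimal Python sketch. It is not KinoSearch code: ToyIndex, the method names make_weight(), make_scorer() and hits(), and the simplified idf formula are all assumptions made for illustration. The point is only the division of labor: the Query knows nothing about any index, the Weight holds statistics drawn from one particular index, and the Scorer does the per-document work.

```python
import math


class ToyIndex:
    """Hypothetical in-memory index: term -> {doc_id: term frequency}."""

    def __init__(self, postings, doc_count):
        self.postings = postings
        self.doc_count = doc_count

    def doc_freq(self, term):
        return len(self.postings.get(term, {}))


class TermQuery:
    """Index-independent description of what to match."""

    def __init__(self, term, boost=1.0):
        self.term = term
        self.boost = boost

    def make_weight(self, index):
        # Compile step 1: bind the abstract query to one index.
        return TermWeight(self, index)


class TermWeight:
    """A query compiled against one index; caches index-specific stats."""

    def __init__(self, query, index):
        self.query = query
        self.index = index
        df = index.doc_freq(query.term)
        # Simplified idf, chosen for illustration only.
        idf = 1.0 + math.log(index.doc_count / (df + 1))
        self.value = idf * query.boost

    def make_scorer(self):
        # Compile step 2: produce the object that actually scores docs.
        postings = self.index.postings.get(self.query.term, {})
        return TermScorer(postings, self.value)


class TermScorer:
    """Scores the documents in one posting list."""

    def __init__(self, postings, weight_value):
        self.postings = postings
        self.weight_value = weight_value

    def hits(self):
        scored = [(doc, math.sqrt(freq) * self.weight_value)
                  for doc, freq in self.postings.items()]
        # Highest score first.
        return sorted(scored, key=lambda pair: -pair[1])


index = ToyIndex({"foo": {0: 1, 2: 4}}, doc_count=3)
scorer = TermQuery("foo").make_weight(index).make_scorer()
hits = scorer.hits()
```

Because the TermQuery object is never modified, the same query could be compiled against a second index without interference, which is the original Lucene rationale seen from the other side.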
I also did some of the work refactoring Weight that we discussed a
Please have a look at the new POD. It gets dynamically generated, so
you need to run Build.PL:
> And then
> perhaps feeling too embarrassed to publish the as-builts, rework this
> part of the architecture to make it simple, streamlined, and 3x5
> cardable? ;)
Things should be improved. I doubt you'll be satisfied, though,
because the TF/IDF stuff is still in there. :\
I thought about hiding Sum_Of_Squared_Weights(),
Apply_Norm_Factor() and Normalize() away in a TFIDFWeight subclass.
Unfortunately, if I do that, then subclasses of Weight which aren't
also subclasses of TFIDFWeight wouldn't be able to participate in
recursive normalizing. For instance, say you had an ANDWeight whose
children were a TermWeight and a WildCardWeight -- you could only
Normalize() the TermWeight, since the WildCardWeight wouldn't have
that method. I don't think we'd be better off.
Instead, I changed those from abstract methods to real methods with
sensible defaults, and made it clear early on in the documentation
that they could be ignored under many circumstances.
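
The trade-off above can be sketched in Python. This is an illustration under stated assumptions, not the KinoSearch implementation: the snake_case method names mirror the three methods mentioned above, but the particular defaults (a raw weight of 1.0, a no-op apply_norm_factor(), and a 1/sqrt(sum) norm factor) are guesses at what "sensible defaults" might look like. What it shows is that once the methods live on the base class, an ANDWeight can recurse over all of its children, including a WildCardWeight that ignores TF/IDF entirely.

```python
import math


class Weight:
    """Base class: normalization hooks with defaults (assumed values)."""

    def sum_of_squared_weights(self):
        # Assumed default: contribute a neutral raw weight.
        return 1.0

    def apply_norm_factor(self, factor):
        # Assumed default: nothing to scale, so ignore the factor.
        pass

    def normalize(self):
        total = self.sum_of_squared_weights()
        factor = 1.0 / math.sqrt(total) if total > 0 else 1.0
        self.apply_norm_factor(factor)


class TermWeight(Weight):
    """TF/IDF-style weight that takes part in normalization."""

    def __init__(self, raw_value):
        self.value = raw_value

    def sum_of_squared_weights(self):
        return self.value ** 2

    def apply_norm_factor(self, factor):
        self.value *= factor


class WildCardWeight(Weight):
    """Overrides nothing and simply inherits the defaults."""


class ANDWeight(Weight):
    """Combining weight: recurses into every child, whatever its type."""

    def __init__(self, children):
        self.children = children

    def sum_of_squared_weights(self):
        return sum(c.sum_of_squared_weights() for c in self.children)

    def apply_norm_factor(self, factor):
        for child in self.children:
            child.apply_norm_factor(factor)


term = TermWeight(3.0)
wild = WildCardWeight()
ANDWeight([term, wild]).normalize()  # works for both children
```

Had the three methods lived only on a TFIDFWeight subclass, the loops in ANDWeight would have needed a type check for each child, and the WildCardWeight would have been skipped.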