On Apr 17, 2008, at 10:15 PM, Nathan Kurz wrote:
> So the tree of Queries is used to build a tree (typically) of Scorers,
> and each Query class has a one-to-one relationship with a Scorer
> class?
It's close to a one-to-one relationship, but not quite. Some
optimizations are possible when compiling the Scorers.
For instance, if someone has created a PhraseQuery that only has one
term in it, you know you can compile that down to a TermScorer instead
of a PhraseScorer. Or, even better, say you have a simple TermQuery,
and you find out that the term isn't in the index (because
$searchable->doc_freq returns 0). Then you can just return undef
(indicating a null result set) instead of a Scorer.
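
To make those two shortcuts concrete, here is a minimal sketch. Everything in the Sketch:: namespace and the compile_scorer() function are hypothetical stand-ins invented for illustration, not the real KinoSearch classes or API; only the doc_freq() idea comes from the discussion above.

```perl
use strict;
use warnings;

# Hypothetical stand-ins; these are NOT the real KinoSearch classes.
package Sketch::TermScorer;
sub new { my ( $class, %args ) = @_; return bless { %args }, $class }

package Sketch::PhraseScorer;
sub new { my ( $class, %args ) = @_; return bless { %args }, $class }

package Sketch::Searchable;

sub new {
    my ( $class, %doc_freqs ) = @_;
    return bless { doc_freqs => { %doc_freqs } }, $class;
}

# Number of documents containing the term; 0 if the term is absent.
sub doc_freq {
    my ( $self, $term ) = @_;
    return $self->{doc_freqs}{$term} || 0;
}

package main;

# Compile a list of terms into a scorer, or undef for a null result set.
# A one-term list stands in for a TermQuery or a single-term PhraseQuery.
sub compile_scorer {
    my ( $searchable, @terms ) = @_;
    return undef unless @terms;

    # A term that is missing from the index can never match.
    for my $term (@terms) {
        return undef unless $searchable->doc_freq($term);
    }

    # A one-term "phrase" compiles down to the cheaper TermScorer.
    return Sketch::TermScorer->new( term => $terms[0] ) if @terms == 1;
    return Sketch::PhraseScorer->new( terms => [@terms] );
}

my $searchable = Sketch::Searchable->new( foo => 3, bar => 1 );
my $single  = compile_scorer( $searchable, 'foo' );
my $phrase  = compile_scorer( $searchable, 'foo', 'bar' );
my $missing = compile_scorer( $searchable, 'brobniquitz' );
```

In this toy, $single is a Sketch::TermScorer, $phrase is a Sketch::PhraseScorer, and $missing is undef.
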
> Is there any 'query' specific code in the query beyond the
> name of the Scorer class?
There is actually quite a lot that happens in between a Query and a
Scorer. That's where the "Weight" classes come in - they encapsulate
the process of compiling a Query to a Scorer.
Query classes are indeed very simple. There's not much to them
except for a make_weight() factory method (and an extract_terms()
method I'd really like to kill off).
> My desire for simplicity makes me wonder if
> one could just have a single 'QueryNode' class that instantiates a
> customizeable Scorer.
I don't quite follow.
>> People could potentially publish KSx subclasses that compile down to
>> scorers that behave differently from those in core.
>
> For a custom OrScorer that I'm interested in (short-circuit OR,
> returns the score of the first match of the ordered children) what
> would I subclass and how would I call it?
Your ORQuery subclass would probably look like this:
    package FirstMatchORQuery;
    use base qw( KinoSearch::Search::ORQuery );

    # Compile this query via a custom Weight instead of the default one.
    sub make_weight {
        my $self = shift;
        return FirstMatchORWeight->new( @_, parent => $self );
    }

    package FirstMatchORWeight;
    ...
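
As for what the scorer at the bottom of that chain would compute, here is a rough sketch of the short-circuit rule as I understand your description. The first_match_score() function and the hash-per-child data layout are made up for illustration; a real Scorer would iterate over postings rather than look things up in hashes.

```perl
use strict;
use warnings;

# Return the score from the first child (in priority order) that matches
# $doc, or undef if no child matches. Each child is a hash ref mapping
# doc number => score; this layout is hypothetical.
sub first_match_score {
    my ( $doc, @children ) = @_;
    for my $child (@children) {
        return $child->{$doc} if exists $child->{$doc};
    }
    return undef;
}

my @children = (
    { 1 => 10, 3 => 30 },    # highest-priority child
    { 1 => 99, 2 => 20 },    # consulted only when the first child misses
);

my $score_doc1 = first_match_score( 1, @children );    # first child wins
my $score_doc2 = first_match_score( 2, @children );    # falls through
my $score_doc4 = first_match_score( 4, @children );    # no child matches
```
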
> My instinct is it would be
> simplest just to build the Scorer tree myself and stick my
> FirstMatchScorer in at the appropriate places. But what would the
> right way be?
You mean how would you persuade QueryParser to use your ORQuery
variant rather than the default? Probably we'd need to give
QueryParser some sort of make_orquery() factory method you could
override.
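
Roughly, the override could look like the sketch below. Nothing like make_orquery() exists yet, and every class in the Sketch:: namespace is a toy invented for illustration (the "parser" only splits on OR), so treat this as the shape of the idea rather than an API.

```perl
use strict;
use warnings;

# Toy stand-ins; none of these are real KinoSearch classes.
package Sketch::ORQuery;
sub new { my ( $class, %args ) = @_; return bless { %args }, $class }

package Sketch::FirstMatchORQuery;
our @ISA = ('Sketch::ORQuery');

package Sketch::QueryParser;
sub new { my $class = shift; return bless {}, $class }

# Factory hook: subclasses override this to change which OR class is built.
sub make_orquery {
    my ( $self, @children ) = @_;
    return Sketch::ORQuery->new( children => [@children] );
}

# Toy parser: understands nothing except 'x OR y OR z'.
sub parse {
    my ( $self, $query_string ) = @_;
    my @terms = split /\s+OR\s+/, $query_string;
    return $self->make_orquery(@terms);
}

package Sketch::FirstMatchQueryParser;
our @ISA = ('Sketch::QueryParser');

# The only change from the parent: build the FirstMatch variant.
sub make_orquery {
    my ( $self, @children ) = @_;
    return Sketch::FirstMatchORQuery->new( children => [@children] );
}

package main;

my $default_query = Sketch::QueryParser->new->parse('a OR b');
my $custom_query  = Sketch::FirstMatchQueryParser->new->parse('a OR b');
```

The parsing logic stays in the parent class; the subclass only decides which Query class gets instantiated.
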
I'm not sure I want that to happen right away in core, though.
QueryParser-type classes are sadly prone to death by Featuritis. This
is the kind of thing I'd rather see refined via KSx.
>> ANDQuery - Search for 'a AND b'.
>> ORQuery - Search for 'a OR b'.
>> ANDNOTQuery - Search for 'a AND NOT b'.
>
> Why not just have a NotQuery?
Good question, and, I think, a good suggestion.
When we swap out ANDNOTQuery for NOTQuery, all of a sudden we get a
coherent suite:
ANDQuery
ORQuery
NOTQuery
ReqOptQuery
Background:
NOTQuery hasn't been needed up till now. QueryParser doesn't parse
'NOT brobniquitz' down to a NOTQuery because it's standard behavior
for search engines to parse that kind of thing as a void query with no
result set rather than return the universe.
> It seems like it would be more general,
> and one could always build the 'a AND NOT b' using an AND and a NOT.
I think this is probably a good plan. I played back a couple of
scenarios in my mind to see whether the combination of an ANDScorer
and a NOTScorer would needlessly iterate over more results than an
ANDNOTScorer would, but with Scorer_Skip_To, I couldn't come up with a
case where that would happen.
There's going to be a marginal increase in CPU overhead from wrapping
a positive scorer with a NOTScorer, but I doubt it will matter.
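
Here is a toy version of the scenario I was playing back. All three Sketch:: classes are hypothetical, and this skip_to() simply returns the first doc number at or after the target (or undef when exhausted), which is not necessarily how the real Scorer_Skip_To behaves. The point is that the AND drives iteration from the positive scorer, so the NOT side only ever gets asked about docs near a candidate.

```perl
use strict;
use warnings;

# Iterator over a sorted list of doc numbers (stand-in for a term scorer).
package Sketch::ListScorer;

sub new {
    my ( $class, @docs ) = @_;
    return bless { docs => [@docs], pos => 0 }, $class;
}

# Advance to the first doc >= $target; return it, or undef if exhausted.
sub skip_to {
    my ( $self, $target ) = @_;
    my $docs = $self->{docs};
    $self->{pos}++
        while $self->{pos} < @$docs && $docs->[ $self->{pos} ] < $target;
    return $self->{pos} < @$docs ? $docs->[ $self->{pos} ] : undef;
}

# Matches every doc below max_doc that the wrapped scorer does not match.
package Sketch::NOTScorer;

sub new {
    my ( $class, %args ) = @_;
    return bless { negated => $args{negated}, max_doc => $args{max_doc} },
        $class;
}

sub skip_to {
    my ( $self, $target ) = @_;
    for ( my $doc = $target; $doc < $self->{max_doc}; $doc++ ) {
        my $negated_doc = $self->{negated}->skip_to($doc);
        return $doc if !defined $negated_doc || $negated_doc != $doc;
    }
    return undef;
}

# Two-child AND that leapfrogs its children using skip_to.
package Sketch::ANDScorer;

sub new {
    my ( $class, $first, $second ) = @_;
    return bless { first => $first, second => $second }, $class;
}

sub skip_to {
    my ( $self, $target ) = @_;
    while (1) {
        my $first_doc = $self->{first}->skip_to($target);
        return undef unless defined $first_doc;
        my $second_doc = $self->{second}->skip_to($first_doc);
        return undef unless defined $second_doc;
        return $first_doc if $first_doc == $second_doc;
        $target = $second_doc;
    }
}

package main;

# 'a AND NOT b' built from an AND and a NOT.
my $scorer = Sketch::ANDScorer->new(
    Sketch::ListScorer->new( 1, 3, 5, 8 ),
    Sketch::NOTScorer->new(
        negated => Sketch::ListScorer->new( 3, 4, 8 ),
        max_doc => 1000,
    ),
);

my @hits;
for ( my $doc = $scorer->skip_to(0); defined $doc; $doc = $scorer->skip_to( $doc + 1 ) ) {
    push @hits, $doc;
}
```

Once the positive scorer runs out after doc 8, the AND stops, and the NOTScorer is never asked to walk the remaining docs up to max_doc.
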
>> ANDORQuery is the odd one out, because it doesn't really mean 'a
>> AND/OR b'.
>> What it does is combine one optional clause and one required clause.
>
> Ditto. Why not just layer an AND and an OR?
I don't think that's quite the same thing.
> Or an AND with a
> hypothetical 'OptionalTermScorer' that returns some non-zero score if
> the term is not found?
If I follow what you're saying, I think that would sort of work, but
it's no clearer conceptually than a ReqOptQuery combining one required
clause and one optional clause.
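
A small sketch of why a plain AND or a plain OR isn't quite the same thing as one required clause plus one optional clause. The data and function names are made up, and summing the two scores is an assumption borrowed from the 'Sum' naming in Lucene rather than a statement about how KinoSearch scores.

```perl
use strict;
use warnings;

# Made-up postings: doc number => score for each clause.
my %required = ( 1 => 5, 2 => 7, 4 => 2 );
my %optional = ( 2 => 3, 3 => 9 );

# ReqOpt: only the required clause decides what matches; the optional
# clause just adds to the score when it also matches.
sub req_opt_scores {
    my ( $req, $opt ) = @_;
    return map { ( $_ => $req->{$_} + ( $opt->{$_} || 0 ) ) } keys %$req;
}

# AND: both clauses must match.
sub and_scores {
    my ( $req, $opt ) = @_;
    return map { ( $_ => $req->{$_} + $opt->{$_} ) }
        grep { exists $opt->{$_} } keys %$req;
}

# OR: either clause may match.
sub or_scores {
    my ( $req, $opt ) = @_;
    my %either = ( %$req, %$opt );
    return map { ( $_ => ( $req->{$_} || 0 ) + ( $opt->{$_} || 0 ) ) }
        keys %either;
}

my %req_opt = req_opt_scores( \%required, \%optional );
my %and     = and_scores( \%required, \%optional );
my %or      = or_scores( \%required, \%optional );

my $req_opt_docs = join ',', sort { $a <=> $b } keys %req_opt;
my $and_docs     = join ',', sort { $a <=> $b } keys %and;
my $or_docs      = join ',', sort { $a <=> $b } keys %or;
```

With this data, ReqOpt matches docs 1, 2 and 4; AND drops docs 1 and 4 because they lack the optional clause; OR lets in doc 3, which lacks the required clause.
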
> I do like that the Lucene names mention
> that they are 'Sum' scorers, though, as it seems useful to distinguish
> how the actual scoring is done.
Right. FYI, there's also a DisjunctionMaxScorer, which is mated with
a DisjunctionMaxQuery.
> ps. The ice cream goes pretty well: http://screamsorbet.com/
Beet Lemon Sorbet! Awesome.
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/