[KinoSearch] passing positions
Nathan Kurz
nate at verse.com
Fri Sep 7 20:24:58 PDT 2007
On 9/7/07, Marvin Humphrey <marvin at rectangular.com> wrote:
> > Something like allowing:
> > PhraseScorer -> AndScorer -> [TermScorer TermScorer TermScorer]
>
> Good plan. Can we do that now, in isolation from the rest of the
> changes?
With your greater familiarity with the current code you may be able
to do so, but I haven't found a comfortable way to do it.
> > I'm going to propose two main subclasses for Scorer, MultiScorer and
> > MatchScorer. MultiScorers contain a public VArray of other Scorers,
> > while MatchScorers contain a public Match struct.
>
> Interesting. Do you end up with more subscorers than before?
I'm not done yet, but I think it will end up the same or fewer than
before. The parent classes are new, but ANDOR and ANDNOT go away,
replaced by simple combinations of And, Or, and Not. Not is new (and
not yet done), but BooleanScorer goes. More importantly, I think
things like PhraseScorer and its unborn ilk will be simpler as they
won't have to duplicate the low-level work.
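
As a rough illustration of "simple combinations of And, Or, and Not",
here is a minimal sketch of how ANDNOT could fall out of composition
as And(A, Not(B)). It is not the actual KinoSearch code: the Scorer
struct, the ScorerKind enum, and scorer_matches() are all hypothetical
names, a plain C array stands in for the VArray of child Scorers, and
a per-document membership test stands in for real iteration and scoring.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds; names are for illustration only. */
typedef enum { SCORER_TERM, SCORER_AND, SCORER_OR, SCORER_NOT } ScorerKind;

typedef struct Scorer {
    ScorerKind kind;
    /* SCORER_TERM only: ascending doc numbers containing the term. */
    const int *docs;
    size_t num_docs;
    /* SCORER_AND / SCORER_OR / SCORER_NOT: child scorers, standing in
     * for the public array of Scorers a MultiScorer would hold. */
    const struct Scorer *const *kids;
    size_t num_kids;
} Scorer;

/* Return 1 if the scorer tree rooted at s matches doc, else 0. */
int scorer_matches(const Scorer *s, int doc) {
    size_t i;
    switch (s->kind) {
    case SCORER_TERM:
        for (i = 0; i < s->num_docs; i++) {
            if (s->docs[i] == doc) return 1;
        }
        return 0;
    case SCORER_AND:
        /* Every child must match. */
        for (i = 0; i < s->num_kids; i++) {
            if (!scorer_matches(s->kids[i], doc)) return 0;
        }
        return 1;
    case SCORER_OR:
        /* Any child may match. */
        for (i = 0; i < s->num_kids; i++) {
            if (scorer_matches(s->kids[i], doc)) return 1;
        }
        return 0;
    case SCORER_NOT:
        /* Not wraps exactly one child and inverts it. */
        assert(s->num_kids == 1);
        return !scorer_matches(s->kids[0], doc);
    }
    return 0;
}
```

With this shape, "A ANDNOT B" needs no dedicated class: it is an And
node whose children are A and a Not node wrapping B.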
> > It's removed from Match in my new incarnation, but would
> > you prefer it to be called: 'index_field', 'field_num'?
>
> field_num.
Agreed to be better, and changed.
> "field", when it's used at all, means "field name". It used to mean
> a Field object -- before I killed that class -- and that's still the
> place it holds in concept-space.
At one point I asked my grandfather about some directions he gave me
based on "the road where the bridge is out", and was surprised to learn
he'd never actually seen the bridge in the half-century he'd been in
that area, and that it must have washed out before he was born.
That was just how people referred to that road. :)
> I'm having trouble visualizing this. I wish there was a way to
> divide and conquer this problem more effectively.
I haven't found it yet. I think it might be possible, though, at
least in pieces. I'm hoping to get it working independent from the
existing code, and then work with you to integrate it. Hopefully once
I have a working model (i.e., once I have the ocean at a rolling boil),
the ways it can be incrementally incorporated will become clearer.
> I think you need collation for the PhraseScorer. Say you're
> iterating over positions in several subscorers. You have position 35
> and 36; now you need 37. If you haven't kept track of where each
> subscorer is at, you'll have to start from scratch with each one.
> If you don't, and the subscorer has multiple subscorers itself, you
> might miss something.
We might just be defining 'collation' differently. I agree that one
needs to keep track of the current position within each Scorer, but I
think this can be done with a pointer rather than a copy. I'll fire
off another message with my current version of PhraseScorer so you can
see what I'm doing. Likely we mean the same thing but are just
describing it differently.
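
To make the "pointer rather than a copy" idea concrete, here is a
minimal sketch, assuming each subscorer exposes an ascending list of
term positions for the current document plus a cursor into it. It is
not the PhraseScorer mentioned above: PosList, poslist_skip_to(), and
phrase_count() are hypothetical names. The phrase logic holds only
pointers to the subscorers' position lists and advances their cursors
in place, so after finding 35 and 36 it resumes each list where it
left off when looking for 37.

```c
#include <stddef.h>

/* Hypothetical per-term position list for one document. */
typedef struct {
    const int *positions; /* ascending term positions */
    size_t count;
    size_t cur;           /* cursor; only ever moves forward */
} PosList;

/* Advance pl's cursor to the first position >= target.
 * Returns 1 if such a position exists, 0 if the list is exhausted. */
static int poslist_skip_to(PosList *pl, int target) {
    while (pl->cur < pl->count && pl->positions[pl->cur] < target) {
        pl->cur++;
    }
    return pl->cur < pl->count;
}

/* Count occurrences of the phrase terms[0] terms[1] ... in one
 * document: terms[i] must occur at position start + i. The lists are
 * shared by pointer, never copied, and no cursor is ever rewound. */
size_t phrase_count(PosList *const *terms, size_t num_terms) {
    size_t hits = 0;
    if (num_terms == 0) return 0;

    PosList *first = terms[0];
    while (first->cur < first->count) {
        int start = first->positions[first->cur];
        int matched = 1;
        for (size_t i = 1; i < num_terms; i++) {
            PosList *pl = terms[i];
            int want = start + (int)i;
            if (!poslist_skip_to(pl, want)) {
                return hits; /* a later term ran out; no more phrases */
            }
            if (pl->positions[pl->cur] != want) {
                matched = 0; /* gap in the phrase; try the next start */
                break;
            }
        }
        if (matched) hits++;
        first->cur++;
    }
    return hits;
}
```

Because candidate start positions only increase, every target passed
to poslist_skip_to() is larger than the last one for that list, which
is why the cursors never need to be reset or copied.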
Nathan Kurz
nate at verse.com