[KinoSearch] Boosting doc scores
Dermot
paikkos at googlemail.com
Mon Apr 12 03:20:57 PDT 2010
On 11 April 2010 04:27, Peter Karman <peter at peknet.com> wrote:
> Dermot wrote on 4/8/10 5:02 AM:
>
>> Here's Marvin's example:
>>
>> my $doc = KinoSearch::Doc->new(
>> fields => \%fields,
>> boost => calc_boost( $fields{sales} )
>> );
>>
>> My question is, would this completely skew the results? If the
>> search term is 'parrot', I'm worried that the hits would start
>> returning lots of 'kittens' because more kittens are sold than
>> parrots. There looks to be a danger of losing the term
>> relevance.
>>
>> Is boost between 0 and 10? Before I begin experimenting with values
>> for boost, it would be nice to know what I can expect. Ideally I'd
>> like the boost/score only to be relevant once the Matcher has picked
>> out those docs which match based on the term. There is, after all,
>> the second option of using Docs/Cookbook/CustomQuery.html, although
>> that is considerably more complicated than boosting at index time.
>
> Presumably you are using KinoSearch1 (or KinoSearch 0.1x), since I don't even
> see 'boost' as an option anymore in the 0.3x branch.

I'm using $VERSION = '0.30_071' and boost is an attribute of the
add_doc method in Indexer
(~/KinoSearch-0.30_10/lib/KinoSearch/Index/Indexer.pod#add_doc(...))

> If you want the boost to be conditional on the Matcher, it sounds like you'd
> need to do that at search time. I don't think there is a straightforward answer
> to your questions other than "try it and see" -- the math involved is always
> going to be affected by the specifics of your particular document collection and
> KS implementation.

You're absolutely right, I will have to TIAS. I was hoping to get some
idea beforehand of whether boost would override the relevance. I
suspect it might. It may be a question of how much influence the boost
has, and then experimenting with values. One piece of information that
might be useful, but that I can't find in the docs, is the maximum
boost value for a doc. That is, if I am to set the boost for a
particular document (the default is 1), what is the upper limit? It
looks like it is 10.

If I can't get satisfactory results by boosting at index time, I'll
have to attempt the far trickier business of boosting at search time.
Option one would be preferable :)
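
For the experimenting, here is a minimal sketch of how calc_boost()
might dampen the sales figure so that popular items get a nudge rather
than swamping term relevance. The body is my own guess, not Marvin's
code: the log10 scaling, the optional $max_boost argument and the
default cap of 10 are all assumptions (the 10 is only the limit I
suspect above, not a documented one).

```perl
use strict;
use warnings;

# Hypothetical helper: map a raw sales count to a gentle boost.
# Assumption: a cap of 10, taken from the guess in this thread,
# not from any documented KinoSearch limit.
sub calc_boost {
    my ( $sales, $max_boost ) = @_;
    $max_boost = 10 unless defined $max_boost;
    $sales = 0 if !defined $sales || $sales < 0;

    # Log dampening: zero sales gives the neutral boost of 1, and
    # each roughly tenfold increase in sales adds about 1.
    my $boost = 1 + log( 1 + $sales ) / log(10);

    return $boost > $max_boost ? $max_boost : $boost;
}

# Print the boost for a few sample sales figures.
printf "%7d sales -> boost %.2f\n", $_, calc_boost($_)
    for ( 0, 9, 500, 100_000 );
```

With something like this, a hundredfold difference in sales only moves
the boost by about 2, which may be small enough for 'parrot' matches to
stay ahead of 'kittens'. Whether it is small enough is still a TIAS
question for the actual index.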
Dp.