[KinoSearch] Boosting doc scores

Peter Karman peter at peknet.com
Mon Apr 12 06:45:58 PDT 2010


Dermot wrote on 04/12/2010 05:20 AM:

> I'm using $VERSION = '0.30_071' and boost is an attribute of the
> add_doc method in Indexer
> (~/KinoSearch-0.30_10/lib/KinoSearch/Index/Indexer.pod#add_doc(...))
> 

ah, thank you. I didn't look there.

>> If you want the boost to be conditional on the Matcher, it sounds like you'd
>> need to do that at search time. I don't think there is a straightforward answer
>> to your questions other than "try it and see" -- the math involved is always
>> going to be affected by the specifics of your particular document collection and
>> KS implementation.
> 
> 
> You're absolutely right, I will have to TIAS. I was hoping to get some
> idea before-hand if boost would over-ride the relevance. I suspect it
> might. It may be a question of the extent of influence the boost will
> have and then experimenting with values. One piece of information that
> might be useful but I can't find in the docs is the max value for a
> doc. That is, if I am to set the boost for a particular document, (the
> default is 1), what is the maximum limit? It looks like it is 10.

In my reading of the code, the doc_boost is a float and can be as big as
a float can be. It gets passed through to the underlying Posting::*
class, which uses it like this:

 float field_boost = doc_boost * FType_Get_Boost(type) * length_norm;

so it ends up being applied to all fields in the doc. Will it over-ride
the relevance? Depends on how big it is, I guess. The whole point is to
skew the raw IDF/TF score in one direction or another. How much it is
skewed will depend on a host of factors. If it were me, I would start
small (e.g. 2.0 or twice the normal) and see how it affects the
rankings. You're looking for a sweet spot where it affects them just
enough to privilege what you're after and not so much that it drowns out
reasonable rankings. Like a salad dressing.
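
To make the multiplication concrete, here is a minimal C sketch of the
arithmetic in the line quoted above. It is not KinoSearch code: the
function and parameter names are made up for illustration, and how the
resulting factor feeds into the final score is outside the sketch. All it
shows is that the doc boost is a plain multiplier, so doubling it doubles
the combined factor for every field in that doc.

```c
#include <stdio.h>

/* Hypothetical stand-in for the quoted line: the three factors are simply
 * multiplied together. field_type_boost and length_norm are illustrative
 * parameters, not KinoSearch API. */
float
field_boost(float doc_boost, float field_type_boost, float length_norm)
{
    return doc_boost * field_type_boost * length_norm;
}

/* Print the combined factor for a few doc_boost values so the relative
 * effect is easy to eyeball. The values tried here are arbitrary. */
void
print_boost_table(float field_type_boost, float length_norm)
{
    const float doc_boosts[] = { 1.0f, 2.0f, 5.0f, 10.0f };
    size_t i;

    for (i = 0; i < sizeof(doc_boosts) / sizeof(doc_boosts[0]); i++) {
        printf("doc_boost=%5.1f  field_boost=%6.3f\n", doc_boosts[i],
               field_boost(doc_boosts[i], field_type_boost, length_norm));
    }
}
```

Calling print_boost_table(1.0f, 0.5f) would list the factor for each of
the four doc boosts. Whether a given factor is enough to reorder your
results still depends on the raw scores in your own collection.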


> 
> If I can't get satisfactory results by boosting at index time, I'll
> have to attempt the far trickier business of boosting at search time.
> Option one would be preferable :)

Search-time would give you much more control since you could alter
rankings based on the actual query and/or resultset, rather than a
one-size-fits-all approach at indexing time. But like you point out,
it's more work.
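
As a rough illustration of the search-time idea, the C sketch below
re-ranks a list of hits after the search has run. Everything in it is
hypothetical: the Hit struct, the "preferred" flag and rerank_hits() are
invented for the example and are not part of KinoSearch. The assumption is
that you already have the raw scores in hand and some per-query rule for
deciding which docs to favor.

```c
#include <stdlib.h>

/* Hypothetical hit record; not a KinoSearch type. */
typedef struct {
    int   doc_id;
    float score;      /* raw relevance score returned by the search */
    int   preferred;  /* nonzero if this doc should be favored for this query */
} Hit;

/* qsort comparator: highest score first. */
static int
cmp_score_desc(const void *a, const void *b)
{
    const Hit *ha = (const Hit *)a;
    const Hit *hb = (const Hit *)b;

    if (ha->score < hb->score) return 1;
    if (ha->score > hb->score) return -1;
    return 0;
}

/* Multiply the score of each preferred hit by `boost`, then re-sort the
 * array in place so the adjusted scores determine the order. */
void
rerank_hits(Hit *hits, size_t n, float boost)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (hits[i].preferred) {
            hits[i].score *= boost;
        }
    }
    qsort(hits, n, sizeof(Hit), cmp_score_desc);
}
```

Because the multiplier is applied per query, it can differ from one search
to the next, which is the extra control an index-time boost can't give you.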


-- 
Peter Karman  .  http://peknet.com/  .  peter at peknet.com