[KinoSearch] Getting unique hits

Eamon Daly edaly at nextwavemedia.com
Fri Dec 15 08:47:27 PST 2006



Hi, all. We have a set of documents that often returns
duplicate excerpts, and we'd like to filter those out.
The problem, obviously, is that I can't predict how many
hits I need to retrieve to end up with 20 unique results.
I've tried two (brain-dead) methods of deduping, but both
were prohibitively slow. One, I ran $hits->seek($i, 1)
repeatedly, stopping after 20 unique results; and two, I
ran a single seek from 0 to total_hits, deduped in a while
loop, and bailed out after 20.
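
For concreteness, here is a rough sketch of the dedupe loop
that both attempts share. It is written in Python rather than
Perl, and none of it is KinoSearch API: `fetch(offset, count)`
is a hypothetical stand-in for a seek followed by reading the
hits, the hit layout is made up, and the excerpt text is
assumed to be the dedupe key. A batch_size of 1 corresponds to
the first attempt, and a batch as large as total_hits to the
second.

```python
from typing import Callable, Dict, List

Hit = Dict[str, str]  # hypothetical hit layout: {"doc": ..., "excerpt": ...}


def unique_hits(fetch: Callable[[int, int], List[Hit]],
                wanted: int = 20,
                batch_size: int = 50) -> List[Hit]:
    """Collect up to `wanted` hits with distinct excerpts, reading in batches."""
    seen = set()      # excerpts already returned
    unique = []       # hits kept, in ranking order
    offset = 0
    while len(unique) < wanted:
        # Assumption: fetch returns an empty list once the hits run out.
        batch = fetch(offset, batch_size)
        if not batch:
            break
        for hit in batch:
            key = hit["excerpt"]
            if key in seen:
                continue          # duplicate excerpt, skip it
            seen.add(key)
            unique.append(hit)
            if len(unique) == wanted:
                break
        offset += len(batch)
    return unique


# Stand-in data source: 100 ranked hits but only 7 distinct excerpts.
ALL_HITS = [{"doc": str(i), "excerpt": f"excerpt {i % 7}"} for i in range(100)]


def fake_fetch(offset: int, count: int) -> List[Hit]:
    return ALL_HITS[offset:offset + count]


top = unique_hits(fake_fetch, wanted=5, batch_size=3)
everything = unique_hits(fake_fetch, wanted=20, batch_size=50)
```

With the stand-in data, asking for 5 unique hits stops after
the second batch, while asking for 20 reads every hit and
still comes back with only 7. That is the unpredictability
described above: the number of hits to read depends on how
many duplicates happen to be in the way.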

I've been playing with KS::Search::HitQueueCollector, but
again, I'm running into the issue where I need to know
exactly how many unique results there are ahead of time. I
suspect that the answer might lie somewhere in
KS::Search::FilteredCollector, but what further complicates
matters is that this query already has a filter on it.

Any suggestions?

____________________________________________________________
Eamon Daly