[KinoSearch] Getting unique hits
edaly at nextwavemedia.com
Fri Dec 15 08:47:27 PST 2006
Hi, all. We have a set of documents that often returns
duplicate excerpts, and we'd like to filter those out.
The problem, obviously, is that I can't predict how many
results I need to retrieve. I've tried two (brane-dead)
methods of deduping, but both were prohibitively slow. One,
I called $hits->seek($i, 1) repeatedly, incrementing $i and
stopping after 20 unique results; two, I ran a single seek
from 0 to total_hits, deduped in a while loop, and bailed
out after 20 unique results.
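
To make the shape of the problem concrete, here is a rough sketch of a
batched middle ground between those two attempts: pull hits in chunks,
dedupe as they arrive, and stop once enough unique excerpts are in hand.
It is only an illustration. $fetch_batch is a hypothetical stand-in for a
seek/fetch pair (it is not a KinoSearch call), the sample data is made up,
and the batch size is an arbitrary assumption.

```perl
use strict;
use warnings;

# Collect up to $wanted unique excerpts, pulling hits in batches.
# $fetch_batch is a hypothetical callback: given ($offset, $size) it
# returns a list of excerpt strings, or an empty list when exhausted.
sub unique_excerpts {
    my ( $fetch_batch, $wanted, $batch_size ) = @_;
    my ( %seen, @unique );
    my $offset = 0;
    while ( @unique < $wanted ) {
        my @batch = $fetch_batch->( $offset, $batch_size );
        last unless @batch;    # no more hits to look at
        for my $excerpt (@batch) {
            next if $seen{$excerpt}++;    # skip duplicates
            push @unique, $excerpt;
            last if @unique == $wanted;
        }
        $offset += @batch;     # advance by the number of hits consumed
    }
    return @unique;
}

# Made-up stand-in data: 100 hits that share only 7 distinct excerpts.
my @all = map { "excerpt " . ( $_ % 7 ) } 0 .. 99;
my $fetch = sub {
    my ( $offset, $size ) = @_;
    return () if $offset >= @all;
    my $end = $offset + $size - 1;
    $end = $#all if $end > $#all;
    return @all[ $offset .. $end ];
};

my @top = unique_excerpts( $fetch, 5, 10 );
print scalar(@top), " unique excerpts\n";
```

This only limits how many hits get fetched and compared; if duplicates are
dense, it still ends up walking most of the result set.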

I've been playing with KS::Search::HitQueueCollector,
but again, I'm running into the issue where I need to know
exactly how many unique results there are ahead of time. I
suspect that the answer might lie somewhere in
KS::Search::FilteredCollector, but further complicating
matters is that this query already has a filter on it.