[KinoSearch] BitVectors
Dermot
paikkos at googlemail.com
Mon Feb 22 15:26:09 PST 2010
On 22 February 2010 22:58, Marvin Humphrey <marvin at rectangular.com> wrote:
> On Mon, Feb 22, 2010 at 09:44:13PM +0000, Dermot wrote:
>> Let's see if I understand you correctly.
>>
>> If the above document is restricted in USA, DEU and ITA, do you think
>>
>> $indexer->add_doc({keywords => $polyanalyzed, country => 'USA DEU ITA'});
>
> Ah, so there can be more than one country. Then you need a FullTextType
> field.
>
> # Index time:
> my $whitespace_tokenizer = KinoSearch::Analyzer::Tokenizer->new(
>     pattern => '\\S+',
> );
> my $type = KinoSearch::FieldType::FullTextType->new(
>     analyzer => $whitespace_tokenizer,
> );
> $schema->spec_field( name => 'censored', type => $type );
> ...
> $indexer->add_doc({
>     keywords => $keywords,
>     censored => 'USA DEU ITA',
> });
>
> # Search time:
> my $qparser = KinoSearch::QueryParser->new(
>     schema => $searcher->get_schema,
> );
> my $user_query = $qparser->parse($query_string);
> my $ita = KinoSearch::Index::TermQuery->new(
>     field => 'censored',
>     term  => 'ITA',
> );
> my $not_ita = KinoSearch::Search::NOTQuery->new(
>     negated_query => $ita,
> );
> my $and_query = KinoSearch::Search::ANDQuery->new(
>     children => [ $user_query, $not_ita ],
> );
> my $hits = $searcher->hits( query => $and_query );
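Because the thread is about BitVectors, here is a pure-Perl sketch (no KinoSearch needed) of what the search-time query computes conceptually. It splits each 'censored' value on whitespace the way a '\S+' tokenizer would, treats each set of matching docs as a bit vector, and computes "user query AND NOT censored:ITA". The toy corpus, the user hits, and the helper bits_for are hypothetical, and this is not a claim about how KinoSearch implements ANDQuery or NOTQuery internally.

```perl
use strict;
use warnings;

# Hypothetical toy corpus: array index = doc id, value = 'censored' field.
my @censored = (
    'USA DEU ITA',    # doc 0
    'FRA',            # doc 1
    '',               # doc 2: not restricted anywhere
    'ITA GBR',        # doc 3
);

# Hypothetical doc ids matched by the user's keyword query.
my @user_hits = (0, 1, 3);

my $n_docs  = scalar @censored;
my $n_bytes = int(($n_docs + 7) / 8);

# Build a bit vector (one bit per doc id) as a fixed-length byte string,
# so every vector has the same length for Perl's string bitwise ops.
sub bits_for {
    my @ids  = @_;
    my $bits = "\0" x $n_bytes;
    vec($bits, $_, 1) = 1 for @ids;
    return $bits;
}

# Split each field on whitespace, like a tokenizer with pattern '\S+',
# and keep the docs whose field contains the term 'ITA'.
my @ita_docs = grep {
    my $doc = $_;
    scalar( grep { $_ eq 'ITA' } ( $censored[$doc] =~ /\S+/g ) );
} 0 .. $n_docs - 1;

# user_query AND NOT censored:ITA, computed on the bit vectors.
my $result_bits = bits_for(@user_hits) & ~bits_for(@ita_docs);
my @result = grep { vec($result_bits, $_, 1) } 0 .. $n_docs - 1;

print "@result\n";    # 1
```

Padding every vector to the same byte length matters: Perl's string `&` truncates to the shorter operand, so an empty ITA vector would otherwise wipe out all of the user's hits.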

Yes, there can be more than one country. In fact, about 30% of all the
documents to be indexed will be restricted in ALL countries except
one, so this 'censored' field may have to accommodate a large amount
of text.
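For the documents restricted everywhere except one country, the field value could be generated from the single allowed country rather than typed out. A minimal sketch, assuming a hypothetical helper censored_all_but and a placeholder (truncated) list of three-letter codes. With n codes the resulting string has 3(n - 1) + (n - 2) = 4n - 5 characters, so it grows linearly with the number of countries.

```perl
use strict;
use warnings;

# Hypothetical helper: build the 'censored' value for a document that is
# restricted in every country in @all_codes except $allowed.
sub censored_all_but {
    my ($allowed, @all_codes) = @_;
    return join ' ', grep { $_ ne $allowed } @all_codes;
}

my @codes    = qw(USA DEU ITA FRA GBR);    # placeholder, not a full list
my $censored = censored_all_but('FRA', @codes);

print "$censored\n";             # USA DEU ITA GBR
print length($censored), "\n";   # 15
```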
I also note that NOTQuery takes the query to exclude via negated_query, here a KS::Index::TermQuery.
Thank you for showing me how to set up the indexed field. That's really useful.
Dp,