[KinoSearch] BitVectors

Dermot paikkos at googlemail.com
Mon Feb 22 15:26:09 PST 2010


On 22 February 2010 22:58, Marvin Humphrey <marvin at rectangular.com> wrote:
> On Mon, Feb 22, 2010 at 09:44:13PM +0000, Dermot wrote:
>> Let's see if I understand you correctly.
>>
>> If the above document is restricted in USA, DEU and ITA, do you think
>>
>> $indexer->add_doc({keywords => $polyanalyzed, country => 'USA DEU ITA'});
>
> Ah, so there can be more than one country.  Then you need a FullTextType
> field.
>
>  # Index time:
>  my $whitespace_tokenizer = KinoSearch::Analysis::Tokenizer->new(
>    pattern => '\\S+',
>  );
>  my $type = KinoSearch::FieldType::FullTextType->new(
>    analyzer => $whitespace_tokenizer,
>  );
>  $schema->spec_field( name => 'censored', type => $type );
>  ...
>  $indexer->add_doc({
>    keywords => $keywords,
>    censored => 'USA DEU ITA',
>  });
>
>  # Search time:
>  my $qparser = KinoSearch::QueryParser->new(
>    schema => $searcher->get_schema,
>  );
>  my $user_query = $qparser->parse($query_string);
>  my $ita = KinoSearch::Search::TermQuery->new(
>    field => 'censored',
>    term  => 'ITA',
>  );
>  my $not_ita = KinoSearch::Search::NOTQuery->new(
>    negated_query => $ita,
>  );
>  my $and_query = KinoSearch::Search::ANDQuery->new(
>    children => [ $user_query, $not_ita ],
>  );
>  my $hits = $searcher->hits( query => $and_query );


Yes, there is more than one country. In fact, about 30% of all the
documents to be indexed will be restricted in ALL countries except
one, so this 'censored' field may have to accommodate a large amount
of text.
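
For the "all countries except one" case, the field value could be built
from a master list rather than typed out by hand. A minimal sketch,
assuming a hypothetical @all_countries list and a hypothetical helper
censored_for_all_except(); neither is part of KinoSearch, and the five
codes shown are only placeholders:

```perl
use strict;
use warnings;

# Hypothetical list of every country code the application knows about.
my @all_countries = qw( USA DEU ITA FRA GBR );

# Build the whitespace-separated value for the 'censored' field when a
# document is restricted everywhere except in the given countries.
sub censored_for_all_except {
    my %allowed = map { $_ => 1 } @_;
    return join ' ', grep { !$allowed{$_} } @all_countries;
}

my $censored = censored_for_all_except('GBR');
print "$censored\n";    # prints "USA DEU ITA FRA"
```

The resulting string would go straight into the 'censored' field at
index time, where the whitespace tokenizer splits it back into one term
per country.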

I also note that NOTQuery takes a negated_query argument, which here is a TermQuery.

Thank you for showing me how to set up the indexed field. That's really useful.
Dp,