[KinoSearch] question on querying by url

Filippo A. Salustri salustri at ryerson.ca
Mon Apr 17 10:31:49 PDT 2006



No luck yet.

I rebuilt the database to make sure that the 'url' field had
analyzed=>0 for all records.

But when I do
	$ks->search("url:http://blah.....")
I still get no hits.

Will keep futzing around with it.  But further ideas would be welcome.

Cheers.
Fil

Marvin Humphrey wrote:
> 
> On Apr 17, 2006, at 5:40 AM, Filippo A. Salustri wrote:
> 
>>>   $ki->spec_field ( name => 'url', boost => 1, indexed => 1,
>>>                     analyzed => 1, stored => 1, compressed => 0 );
> 
> Unless you want to search for individual chunks within a URL (which 
> would be a pretty unusual technique), the field should not be analyzed.
> 
>     $ki->spec_field (
>         name       => 'url',
>         boost      => 1,
>         indexed    => 1,
>         analyzed   => 0, # !!
>         stored     => 1,
>         compressed => 0,
>     );
> 
>> Say the url I want to search for is given by
>>     $q = "http://deseng.ryerson.ca/~fil";
>> I then do:
>>>   my $ks = KinoSearch::Searcher->new
>>>     ( invindex => "$serfcgi/db",
>>>       analyzer => KinoSearch::Analysis::PolyAnalyzer->new(language => 'en'),
>>>       );
>>>   return $ks->search($q);
>>
>> I get an error saying that "http" is not a valid field name.  That's 
>> cool - I understand why it would do that.
> 
> Thanks for illustrating exactly why this parser behavior must be 
> documented.  :\
> 
> I think you've illustrated a second problem as well: KinoSearch is dying 
> when presented with an invalid field name, but it should just return an 
> empty result set instead.
> 
>> So I do
>>     $q = "url:http://deseng.ryerson.ca/~fil";
>>
>> Now the search returns 0 hits.
>>
>> Any ideas on what I'm doing wrong?
> 
> The Term that the QueryParser is creating looks like this:
> 
>     KinoSearch::Index::Term->new( 'url', 'http://deseng.ryerson.ca/~fil' );
> 
> ... but because the url field was analyzed using the English 
> PolyAnalyzer at index-time, you're only going to get results if you 
> search for "http", "deseng", "ryerson", "ca", or "fil".  Try this as an 
> experiment:
> 
>     $q = "url:fil";
> 
> I bet you will get some hits.
> 
> Marvin Humphrey
> Rectangular Research
> http://www.rectangular.com/
> 
> 
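
To make the point above about analyzed vs. unanalyzed fields concrete,
here is a rough sketch in plain Perl.  It does not use KinoSearch at all:
toy_analyze() is a made-up stand-in for an analyzer (lowercase, then split
on non-word characters), and the two hashes are hypothetical stand-ins for
the terms that would end up in the index.  The real PolyAnalyzer does more
than this, so treat it only as an illustration of why a whole-URL term
can't match a field that was broken into tokens.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an analyzer: lowercase the text and split it
# into runs of word characters.  This is NOT KinoSearch's PolyAnalyzer.
sub toy_analyze {
    my ($text) = @_;
    return grep { length } split /\W+/, lc $text;
}

my $url = 'http://deseng.ryerson.ca/~fil';

# analyzed => 1: each token becomes its own term.
my %analyzed_terms = map { $_ => 1 } toy_analyze($url);

# analyzed => 0: the whole field value is kept as a single term.
my %unanalyzed_terms = ( $url => 1 );

# Look up the full URL and one fragment against both sets of terms.
for my $query ( $url, 'fil' ) {
    printf "%-32s analyzed: %-3s  unanalyzed: %s\n",
        $query,
        ( $analyzed_terms{$query}   ? 'hit' : 'no' ),
        ( $unanalyzed_terms{$query} ? 'hit' : 'no' );
}
```

In this toy model the full URL only matches when the value was kept whole,
and 'fil' only matches when it was tokenized, which is the same pattern
described in the quoted explanation.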

-- 
Prof. Filippo A. Salustri, Ph.D., P.Eng.
Department of Mechanical and Industrial Engineering
Ryerson University                         Tel: 416/979-5000 x7749
350 Victoria St.                           Fax: 416/979-5265
Toronto, ON                                email: salustri at ryerson.ca
M5B 2K3  Canada                            http://deseng.ryerson.ca/~fil/




