[KinoSearch] Indexing results

Pasquale Stirparo pstirparo at gmail.com
Fri May 15 16:48:29 PDT 2009


The application is really simple: I just need to compute some
"language statistics". I'll describe a hypothetical scenario so
that you can give me some advice.
I have 3 documents: doc1, doc2 and doc3. I want to know which word is
the most frequent across all 3 documents, as well as in each one of
them (so, how many times the word "philosophy" occurs in total, and
how many times in doc1, doc2 and doc3).
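
For a corpus this small, the counts described above can be sketched without a search library at all. The following is a minimal Python sketch and does not use KinoSearch; the sample texts and the `tokenize` and `word_counts` helpers are hypothetical names introduced only for illustration, and the tokenizer is deliberately naive.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(docs):
    """Return (per_doc, total): one Counter per document and one for all of them."""
    per_doc = {name: Counter(tokenize(text)) for name, text in docs.items()}
    total = Counter()
    for counts in per_doc.values():
        total.update(counts)
    return per_doc, total


# Hypothetical sample texts standing in for doc1, doc2 and doc3.
docs = {
    "doc1": "Philosophy begins in wonder. Philosophy asks questions.",
    "doc2": "Some questions have no answers.",
    "doc3": "The philosophy of language studies meaning.",
}

per_doc, total = word_counts(docs)

# Occurrences of one word in total and in each document.
print("total", total["philosophy"])
for name in sorted(per_doc):
    print(name, per_doc[name]["philosophy"])

# Most frequent word across all documents.
print(total.most_common(1))
```

Calling `per_doc["doc1"].most_common(10)` would give the kind of per-document frequency chart asked about in the quoted message below.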

Do you think this is possible with Kino?
Any suggestions would be very welcome.

Thanks

Pasquale


2009/5/15 Marvin Humphrey <marvin at rectangular.com>:
> On Thu, May 14, 2009 at 11:35:07AM +0200, Pasquale Stirparo wrote:
>
>> I'm new to kino and I'm finding it very useful and easy to use.
>
> :)
>
>> I built my inverted index customizing the example at
>> http://rectangular.com/kinosearch/docs/stable/KinoSearch/InvIndexer.html.
>> Now, I would like to know how many times the word "target" is present
>> in a document. Or it would be nice to have a kind of chart of the
>> occurrence of every word, for example:
>> "target1" 25
>> "target2" 23
>> "target3" 14
>> ... and so on
>
> There is not a public interface for accessing that information in the stable
> branch.  SVN trunk has one way
> (KinoSearch::Search::Compiler::highlight_spans), but it's going to be
> undergoing revision soon and it's a little involved.
>
>> So far, using the searcher, I have only managed to find out which
>> documents contain a word, or how many documents contain it.
>
> Right, finding matching documents is a search engine library's first task.
>
> What's the application?
>
> Marvin Humphrey


