[KinoSearch] compound word splitting

Marc Elser melser at gmx.ch
Thu Aug 24 00:37:51 PDT 2006



Hi Marvin,

I was googling some more about compound word splitting, and there may 
be a solution that could work for KinoSearch too.

There's a program called TSearch V2 
(http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/), a 
PostgreSQL extension that adds inverted full-text search indexes and 
some new functions to PostgreSQL. One of those functions is 'lexize': 
you pass it the encoding and a word, and it returns the words that 
make up the compound, provided you have a dictionary which is tagged 
for compound words. There are dictionaries for Swedish, German and 
other languages, although I don't know if the other dictionaries are 
tagged too.

The extension is written in C and the code is not too big. I wonder if 
you could take a look at it and decide whether it would be possible to 
create a new Analyzer for KinoSearch, maybe Analyzer::CompoundSplitter. 
That would be really great, and it would work with every available 
dictionary that has compound tagging. There's also a helper script which 
can create ispell dictionaries out of MySpell dictionaries (the ones 
from OpenOffice), and another helper script which can tag dictionaries 
for compound words.
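
To make the idea concrete, here is a rough Python sketch of 
dictionary-based splitting. It is only an illustration and not TSearch 
V2's code: split_compound, its min_part parameter and the tiny word set 
are made up for the example, and it ignores compound tagging and 
linking letters such as the German 's'.

```python
def split_compound(word, dictionary, min_part=3):
    """Split `word` into known dictionary words.

    Hypothetical helper: returns a list of parts, or [word] unchanged
    when no complete split into dictionary words exists.
    """
    word = word.lower()

    def helper(start):
        # Reached the end of the word: everything before it was matched.
        if start == len(word):
            return []
        # Try the longest candidate first, then back off to shorter ones.
        for end in range(len(word), start + min_part - 1, -1):
            part = word[start:end]
            if part in dictionary:
                rest = helper(end)
                if rest is not None:
                    return [part] + rest
        # No dictionary word starts here that leads to a full split.
        return None

    parts = helper(0)
    return parts if parts else [word]


if __name__ == "__main__":
    words = {"fuss", "ball", "spieler"}  # toy dictionary, assumption
    print(split_compound("Fussballspieler", words))
```

An analyzer built this way would index the parts next to the original 
token, so a search for one part can still find the compound.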

Cheers,

Marc


_______________________________________________
KinoSearch mailing list
KinoSearch at rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch



