[KinoSearch] utf8 (unicode) any progress on TokenBatch?
david at kineticode.com
Mon Aug 14 15:05:29 PDT 2006
On Aug 14, 2006, at 14:39, Marvin Humphrey wrote:
> Encode is a bit of a beast, though... Maybe provide
Encode is not difficult to use if you know what encoding you're
use Encode 'decode';
# $encoding names the source encoding; $string holds the raw bytes.
my $utf8 = decode($encoding, $string);
That's it. As long as you know the encoding, just use decode() to
convert your bytes into Perl's internal UTF-8 character string, which
also turns on the utf8 flag.
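To make the effect of decode() concrete, here is a minimal sketch. It assumes the input is the ISO-8859-1 byte string for "café"; the sample data and variable names are illustrative only and are not taken from KinoSearch or Bricolage.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: the bytes are known to be ISO-8859-1 ("caf" plus e-acute as 0xE9).
my $encoding = 'iso-8859-1';
my $bytes    = "caf\xE9";

# decode() turns the raw bytes into a Perl character string.
my $text = decode($encoding, $bytes);

# encode() goes the other way: characters back to UTF-8 encoded bytes.
my $utf8_bytes = encode('UTF-8', $text);

printf "characters: %d\n", length($text);
printf "utf8 flag:  %s\n", utf8::is_utf8($text) ? 'on' : 'off';
printf "UTF-8 bytes: %d\n", length($utf8_bytes);
```

The character count and the UTF-8 byte count differ because the accented letter is one character but takes two bytes in UTF-8.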
> Does Bricolage throw a fatal error if it encounters a non-UTF-8
> scalar and you haven't told it an encoding?
Yes. If you don't tell it an encoding, it assumes UTF-8, so you'll
get errors if your text is neither UTF-8 nor plain ASCII.
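A strict decode of this kind can be sketched with Encode's FB_CROAK check. This is only an illustration of the general technique, not Bricolage's actual code; decode_or_die is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: decode $bytes, assuming UTF-8 when no encoding is given.
sub decode_or_die {
    my ($bytes, $encoding) = @_;
    $encoding = 'UTF-8' unless defined $encoding;
    # FB_CROAK makes decode() die on malformed input;
    # LEAVE_SRC keeps decode() from modifying $bytes.
    return decode($encoding, $bytes, FB_CROAK | LEAVE_SRC);
}

# Valid UTF-8 decodes without complaint.
my $ok = decode_or_die("caf\xC3\xA9");

# Latin-1 bytes are not valid UTF-8, so the default assumption fails...
my $latin1 = "na\xEFve";
my $failed = !eval { decode_or_die($latin1); 1 };

# ...until the real encoding is supplied.
my $fixed = decode_or_die($latin1, 'iso-8859-1');

print "default UTF-8 decode of Latin-1 bytes failed\n" if $failed;
```

Wrapping the call in eval lets the caller decide whether a bad encoding is fatal or merely reported.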
> Maybe it would suffice to create KinoSearch::Docs::FAQ and make
> "What are those strange characters?" an item, with an Encode-based
> recipe for solving the problem.
It'd be a pretty simple recipe, I think.