[KinoSearch] Serialized Schema
peter at peknet.com
Fri Oct 5 06:17:50 PDT 2007
On 10/04/2007 08:46 PM, Marvin Humphrey wrote:
> On Oct 4, 2007, at 5:26 PM, Peter Karman wrote:
>> Sounds like what you want isn't an official subset of the language,
>> but rather something like an SGML document type definition (called
>> (overload overload) a schema in XML parlance). Just an official
>> declaration of what constitutes a legal KS header.
> I started messing around with defining what aspects of YAML are key.
> Is there an XSD schema for the Swish format? I haven't written one
> before, but being able to follow a schema for writing schemas (overload
> overload overload) appeals to me.
No, Swish-e 2.x doesn't use XML to store the header, and Swish3 doesn't have an
official schema yet. It probably will, though, once I can get back to that project.
> A switch to XML for KS metadata file serialization might be in order.
> It was kind of a toss-up between the two contenders. But when I
> brought this up on the Lucene list a while ago, people were like "YAML,
> what's that?". And Swish uses XML. Might be time to go with the flow.
> (Switching wouldn't even be disruptive, since we'd just look for the
> segments_XXX.whatever file and parse it according to the extension.)
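Marvin's parenthetical (find the segments_XXX.whatever file, then pick a parser by its extension) can be sketched in Python roughly as below. This is illustrative only: the file-name pattern, the base-36 generation number (borrowed from Lucene's segments_N convention, not confirmed for KinoSearch), and the names newest_segments_file, parse_metadata, load_segments and PARSERS are all hypothetical, and the two parsers are placeholders rather than real KinoSearch readers.

```python
import os
import re

# Hypothetical naming scheme: "segments_" + generation + "." + extension,
# with the generation written in base 36 (as Lucene does for segments_N).
SEGMENTS_RE = re.compile(r"segments_([0-9a-z]+)\.([a-z]+)")


def parse_yaml(text):
    # Placeholder for a reader of the restricted YAML subset.
    return {"format": "yaml", "raw": text}


def parse_xml(text):
    # Placeholder for a reader of the restricted XML subset.
    return {"format": "xml", "raw": text}


# Map each file extension to the parser that understands it.
PARSERS = {"yaml": parse_yaml, "xml": parse_xml}


def newest_segments_file(names):
    """Return the segments_* name with the highest generation, or None."""
    best, best_gen = None, -1
    for name in names:
        m = SEGMENTS_RE.fullmatch(name)
        if m:
            gen = int(m.group(1), 36)
            if gen > best_gen:
                best, best_gen = name, gen
    return best


def parse_metadata(name, text):
    """Choose a parser from the file extension and apply it to text."""
    m = SEGMENTS_RE.fullmatch(name)
    if m is None:
        raise ValueError("not a segments file: " + name)
    parser = PARSERS.get(m.group(2))
    if parser is None:
        raise ValueError("unsupported metadata format: " + m.group(2))
    return parser(text)


def load_segments(index_dir):
    """Find the newest segments file in index_dir and parse it."""
    name = newest_segments_file(os.listdir(index_dir))
    if name is None:
        raise FileNotFoundError("no segments_* file in " + index_dir)
    with open(os.path.join(index_dir, name), encoding="utf-8") as fh:
        return parse_metadata(name, fh.read())
```

With this layout, supporting another serialization is one more entry in PARSERS, and an index written in the older format keeps loading as long as its parser stays registered.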
Well, Java folks are notoriously XML-centric, so that doesn't surprise me.
XML vs. YAML has been discussed here before. I'm not convinced you need XML; it's
probably a little harder to read than YAML, but XML does have wider adoption at
this point in history. I guess it depends in part on (1) how hard it is to write
your own parser for either, and (2) whether you have any philosophical agenda to
> I'd kind of like to stick with using a minimal custom parser rather than
> adding a full-on XML parser as a dependency. That means placing
> restrictions on the XML akin to those I laid out for YAML. Do you know
> whether spec'ing that sort of restriction is something XSD is set up
> to handle?
You can definitely restrict the kind of XML allowed with a Schema. See
http://www.w3schools.com/schema/schema_elements_ref.asp for example.
You might simplify it to the point where you allow no attributes, only limited
nesting of elements, only all-lowercase element names, and so on. That would
make parsing much simpler. The hardest thing I have found in rolling my own XML
parser is tracking the nesting. With a SAX approach this is less important, but
with a DOM approach it's harder to do.
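A hand-rolled reader for the kind of restricted XML described above might look like the following Python sketch. The rules it enforces are assumptions for illustration (no attributes, lowercase element names, at most four levels of nesting, no text mixed with child elements, and no comments, CDATA or self-closing tags), and parse_restricted_xml is a hypothetical name. It tracks nesting with an explicit stack of open elements and builds a small tree, i.e. the DOM-style bookkeeping mentioned above.

```python
import re
from xml.sax.saxutils import unescape

# A token is either a tag "<...>" or a run of character data.
TOKEN_RE = re.compile(r"<([^<>]*)>|([^<]+)")
# Hypothetical rule: element names are lowercase, no attributes allowed.
NAME_RE = re.compile(r"[a-z][a-z0-9_]*\Z")
EXTRA_ENTITIES = {"&quot;": '"', "&apos;": "'"}


def _finish(frame):
    """Turn a closed element's frame into its value."""
    name, children, parts = frame
    raw = "".join(parts)
    if children:
        if raw.strip():
            raise ValueError("mixed content in <%s>" % name)
        return children  # list of (name, value) pairs
    return unescape(raw, EXTRA_ENTITIES)  # leaf: decoded text


def parse_restricted_xml(text, max_depth=4):
    """Parse the restricted subset into nested (name, value) pairs."""
    text = text.lstrip()
    if text.startswith("<?xml"):  # skip an optional XML declaration
        end = text.find("?>")
        if end < 0:
            raise ValueError("unterminated XML declaration")
        text = text[end + 2:]
    root = ["#document", [], []]  # frame: [name, children, text parts]
    stack = [root]  # open elements, innermost last
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError("stray '<' at offset %d" % pos)
        pos = m.end()
        tag, chars = m.group(1), m.group(2)
        if chars is not None:
            stack[-1][2].append(chars)
        elif tag.startswith("/"):
            name = tag[1:]
            if len(stack) == 1 or stack[-1][0] != name:
                raise ValueError("mismatched closing tag </%s>" % name)
            frame = stack.pop()
            stack[-1][1].append((name, _finish(frame)))
        else:
            if not NAME_RE.match(tag):
                raise ValueError("disallowed tag <%s>" % tag)
            if len(stack) > max_depth:
                raise ValueError("nesting deeper than %d" % max_depth)
            stack.append([tag, [], []])
    if len(stack) != 1:
        raise ValueError("unclosed element <%s>" % stack[-1][0])
    if len(root[1]) != 1 or "".join(root[2]).strip():
        raise ValueError("document must have exactly one root element")
    return root[1][0]
```

Each element comes back as a (name, value) pair whose value is either the decoded text of a leaf or a list of child pairs, so repeated elements (several field entries, say) keep their order. Anything outside the subset raises ValueError instead of being half-understood.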
Then again, libxml2 is pretty widely available on all *nix systems and there
are Win32 ports available too...
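For comparison, the off-the-shelf route is to parse with a full XML parser and then check the same restrictions on the resulting tree. Python's standard library has no libxml2 binding (xml.etree.ElementTree sits on expat; lxml wraps libxml2 but is a separate install), so this sketch swaps in ElementTree, and the hand-written check stands in for XSD validation, which the standard library does not provide. The rules and the names check_restrictions and load_header are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule: lowercase names only (namespaced "{uri}x" tags fail too).
NAME_RE = re.compile(r"[a-z][a-z0-9_]*\Z")


def check_restrictions(elem, max_depth=4, depth=1):
    """Raise ValueError if elem or its descendants break the header rules."""
    if not NAME_RE.match(elem.tag):
        raise ValueError("disallowed element name: %r" % elem.tag)
    if elem.attrib:
        raise ValueError("attributes not allowed on <%s>" % elem.tag)
    if depth > max_depth:
        raise ValueError("nesting deeper than %d" % max_depth)
    children = list(elem)
    # Text before the first child, or after any child, is mixed content.
    if children and (elem.text or "").strip():
        raise ValueError("mixed content in <%s>" % elem.tag)
    for child in children:
        if (child.tail or "").strip():
            raise ValueError("mixed content in <%s>" % elem.tag)
        check_restrictions(child, max_depth, depth + 1)


def load_header(text):
    """Parse with the full parser, then enforce the restricted subset."""
    root = ET.fromstring(text)
    check_restrictions(root)
    return root
```

One difference from a hand-rolled reader: ElementTree's default parser discards comments, so they pass here silently rather than being rejected.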
Peter Karman . peter at peknet.com . http://peknet.com/
KinoSearch mailing list
KinoSearch at rectangular.com