
FAQ
2.2.2
Hey! REXML is untainting my strings!
Yes, it is, but not intentionally. REXML relies on String.unpack() and
Array.pack() to do encoding conversions, and in this process, tainting is
lost. If you have a really good reason why REXML should preserve
this attribute, at the cost of some speed, let me know and I'll
consider your argument.
Why is Element.elements indexed off of '1' instead of
'0'?
Because of XPath. The XPath specification states that the index of the
first child node is '1'. Although it may be counter-intuitive to
base elements on 1, it is more undesirable to have element.elements[0] ==
element.elements[ 'node()[1]' ]. Since I can't change the
XPath specification, the result is that Element.elements[1] is the first
child element.
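As a rough illustration of the 1-based indexing, here is a minimal sketch. The sample document and the variable names are made up for this example, and '*[1]' is used as the XPath string because it selects the first child element.

```ruby
require 'rexml/document'

# Hypothetical three-child document, used only for illustration.
doc = REXML::Document.new('<root><a/><b/><c/></root>')
root = doc.root

first_by_index = root.elements[1]       # integer index, counted from 1
first_by_xpath = root.elements['*[1]']  # XPath string for the first child element

puts first_by_index.name
puts first_by_xpath.name
```

Both lookups should select the same <a/> element, which is the consistency with XPath that the 1-based index is meant to preserve.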
Why isn't REXML a validating parser?
Because validating parsers must include code that parses and interprets
DTDs. I hate DTDs. REXML supports the barest minimum of DTD parsing, and
even that isn't complete. There is DTD parsing code in the works, but
I only work on it when I'm really, really bored. Rumor has it that a
contributor is working on a DTD parser for REXML; rest assured that any
such contribution will be included with REXML as soon as it is available.
I'm trying to create an ISO-8859-1 document, but when I add text to
the document it isn't being properly encoded.
Regardless of what the encoding of your document is, when you add text
programmatically to a REXML document you must ensure that you are
only adding UTF-8 to the tree. In particular, you can't add ISO-8859-1
encoded text that contains bytes of 0x80 or above to REXML trees -- you
must convert it to UTF-8 before doing so. Luckily, this is easy:
text.unpack('C*').pack('U*') will do the
trick. 7-bit ASCII is encoded identically in UTF-8, so if your text is
plain ASCII you won't need to worry about this.
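The conversion can be sketched as follows. This is a minimal example, not the only way to do it: the sample string, the variable names and the <doc/> document are assumptions made for illustration. The trick works because every ISO-8859-1 byte value is also the Unicode code point of the same character.

```ruby
require 'rexml/document'

# "cafe" with an e-acute, as ISO-8859-1 bytes (\xE9 is e-acute there).
# String#b marks the string as raw bytes.
latin1 = "caf\xE9".b

# The approach from the answer above: read each byte as a code point,
# then re-emit the code points as UTF-8.
utf8 = latin1.unpack('C*').pack('U*')

# On Ruby 1.9 and later, String#encode should give the same result.
also_utf8 = latin1.encode('UTF-8', 'ISO-8859-1')

# Only the converted UTF-8 text goes into the tree.
doc = REXML::Document.new('<doc/>')
doc.root.add_text(utf8)
puts doc.root.text
```

The single byte 0xE9 becomes the two-byte UTF-8 sequence 0xC3 0xA9, while the ASCII letters are left unchanged.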