Scala's built-in XML support is perhaps flawed, but it still offers very convenient syntax for simple XML manipulation. Even ignoring performance concerns and concurrency issues, there are still weird gotchas that the average user may need to deal with.
CDATA magically escaped
scala> val xml = <xml><test><![CDATA[a < b]]></test></xml>
xml: scala.xml.Elem = <xml><test>a &lt; b</test></xml> <-- WTF?
Same when loading from a String:
scala> import scala.xml.XML
scala> val xml = XML.loadString("<xml><test><![CDATA[a < b]]></test></xml>")
xml: scala.xml.Elem = <xml><test>a &lt; b</test></xml>
This is not what you want. The contents of a CDATA section are meant to be left alone. Instead, the CDATA wrapper is eaten and its contents are magically escaped. This causes lots of grief if the CDATA contains JavaScript, for example.
One workaround is to use the built-in ConstructingParser to load XML.
scala> import scala.io.Source
scala> import scala.xml.parsing.ConstructingParser
scala> val xml2 = ConstructingParser.fromSource(Source.fromString("<xml><test><![CDATA[a < b]]></test></xml>"), preserveWS = true).document().docElem
xml2: scala.xml.Node = <xml><test><![CDATA[a < b]]></test></xml>
Looks good.
You can also use <xml:unparsed>. Check out this Scala XML faq for more.
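When building XML in code rather than parsing it, the same effect can be had by wrapping the text in scala.xml.Unparsed or scala.xml.PCData. The snippet below is a minimal sketch meant for the REPL, assuming the scala-xml module is on the classpath (it ships separately from the standard library since Scala 2.11). The value names are made up for illustration.

```scala
import scala.xml.{PCData, Unparsed}

// The text we want to keep untouched, e.g. a fragment of JavaScript.
val raw = "a < b"

// A plain string becomes a Text node and is escaped on output.
val escaped = <test>{raw}</test>

// Unparsed is written out verbatim, with no escaping at all.
val viaUnparsed = <test>{Unparsed(raw)}</test>

// PCData is written out wrapped in a CDATA section.
val viaPCData = <test>{PCData(raw)}</test>

println(escaped)     // <test>a &lt; b</test>
println(viaUnparsed) // <test>a < b</test>
println(viaPCData)   // <test><![CDATA[a < b]]></test>
```

Unparsed does no checking, so the result is only well-formed XML if the raw text happens to be. PCData is usually the safer choice when the output must be parsed again.")
assert(viaUnparsed.toString == "<test>a < b</test>")
assert(viaPCData.toString == "<test><![CDATA[a < b]]></test>")
</test>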
XML Comments eaten
When loading XML from a string, XML comments disappear. Example:
scala> val looksGood = <xml><test><!-- comment --></test></xml>
looksGood: scala.xml.Elem = <xml><test><!-- comment --></test></xml>
scala> val wtf = XML.loadString("<xml><test><!-- comment --></test></xml>")
wtf: scala.xml.Elem = <xml><test></test></xml>
Again, ConstructingParser can fix this:
scala> val correct = ConstructingParser.fromSource(Source.fromString("<xml><test><!-- comment --></test></xml>"), preserveWS = true).document().docElem
correct: scala.xml.Node = <xml><test><!-- comment --></test></xml>
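Since the ConstructingParser incantation is long, it may be worth wrapping it once. The helper below is a sketch for the REPL; the name loadPreserving is hypothetical and not part of scala.xml, and it assumes the scala-xml module is on the classpath.

```scala
import scala.io.Source
import scala.xml.Node
import scala.xml.parsing.ConstructingParser

// Hypothetical helper: parse a string with ConstructingParser instead of
// XML.loadString, so CDATA sections and comments are kept as nodes.
def loadPreserving(input: String): Node =
  ConstructingParser
    .fromSource(Source.fromString(input), preserveWS = true)
    .document()
    .docElem

val doc = loadPreserving("<xml><test><![CDATA[a < b]]><!-- comment --></test></xml>")

// Going by the transcripts above, both the CDATA section and the comment
// should still be present when the node is printed.
println(doc)
```

Unlike XML.loadString, which delegates to a SAX parser, ConstructingParser is scala.xml's own parser and does not validate, so treat it as a workaround for trusted input rather than a drop-in replacement.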
There are some alternatives if you run into these issues.
- As described above, use scala.xml.parsing.ConstructingParser to load XML
- Use the Lift web framework's PCDataMarkupParser (extends Scala's built-in MarkupParser with various improvements)
- Daniel Spiewak's Anti-XML project looks promising
- Use any of the million Java XML parsers that are out there (but give up the convenient scala.xml syntax)
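To illustrate the last option, here is a rough sketch using the DOM parser that ships with the JDK, which is only one of many possible choices. It relies on the factory defaults (non-coalescing, comments not ignored), and the value names are invented for the example. Paste it into the REPL or run it as a script.

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Node
import org.xml.sax.InputSource

val input = "<xml><test><![CDATA[a < b]]><!-- comment --></test></xml>"

// With default settings the JDK DOM parser keeps CDATA sections and
// comments as separate nodes instead of folding them into text.
val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
val document = builder.parse(new InputSource(new StringReader(input)))

// Children of the <test> element: first the CDATA section, then the comment.
val children = document.getElementsByTagName("test").item(0).getChildNodes
val cdata = children.item(0)
val comment = children.item(1)

println(cdata.getNodeType == Node.CDATA_SECTION_NODE) // true
println(cdata.getNodeValue)                           // a < b
println(comment.getNodeType == Node.COMMENT_NODE)     // true
```

The price is the one noted in the list: you get org.w3c.dom nodes back, so XML literals, pattern matching and the \ and \\ selectors are no longer available.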