Python XML: write " instead of &quot;




I am using Python's xml.dom.minidom, and everything works well except that in text content it writes the escape sequence &quot; instead of ". That of course makes sense when a quote appears inside an attribute value, but it bugs me in the text. How do I change this?







Do you mean you want " in text, and " in tags?
– Andrew Lee
Aug 11 '11 at 18:12







No, other way around: <?xml version="1.0" ?> <article title="boy says &quot; bla&quot;"> A little boy woke up and said "bla" </article> Or is this not possible?
– foges
Aug 11 '11 at 18:42








XML parsers will not distinguish between the two. It will be "read" as " in the XML InfoSet. Why do you care which way the quotes are encoded?
– Mads Hansen
Aug 11 '11 at 19:20






1 Answer



Looking at the source (Python 3.2, if it matters), this is hardcoded in the _write_data() function. You would need to modify the writexml() method of the text node class (Text in xml.dom.minidom), either by subclassing it or by simply editing it, so that it doesn't call that function but instead escapes only &, < and >.



If you create a subclass outside of the package (instead of copying and hacking the package to make your own custom minidom), then it looks like, with a little care, you could make things work. You would create your own subclass of the text node class, modified as above, and then, to add text to the DOM, you would add an instance of your new class (or replace existing text nodes with instances of that class). You would need to set the ownerDocument attribute. Perhaps the simplest approach would be to also subclass Document and override its createTextNode() method.
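The subclassing approach described above could be sketched roughly as follows. This is only an illustration, not tested against Python 3.2 specifically: it assumes the text node class is `xml.dom.minidom.Text`, the names `PlainQuoteText`, `PlainQuoteDocument` and `build_example` are made up for this example, and the `createTextNode()` override simply mirrors what the stock method appears to do.

```python
from xml.dom import minidom


class PlainQuoteText(minidom.Text):
    """Hypothetical text node that leaves double quotes unescaped."""

    def writexml(self, writer, indent="", addindent="", newl=""):
        data = "%s%s%s" % (indent, self.data, newl)
        # Replace '&' first so the entities added afterwards are not
        # escaped a second time.
        data = (data.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;"))
        writer.write(data)


class PlainQuoteDocument(minidom.Document):
    """Hypothetical document whose createTextNode() builds PlainQuoteText."""

    def createTextNode(self, data):
        if not isinstance(data, str):
            raise TypeError("node contents must be a string")
        node = PlainQuoteText()
        node.data = data
        # The node has to know which document it belongs to.
        node.ownerDocument = self
        return node


def build_example():
    """Serialize a small document that has quotes in an attribute and in text."""
    doc = PlainQuoteDocument()
    article = doc.createElement("article")
    article.setAttribute("title", 'boy says "bla"')
    article.appendChild(
        doc.createTextNode('A little boy woke up and said "bla" & <left>'))
    doc.appendChild(article)
    return doc.toxml()


if __name__ == "__main__":
    print(build_example())
```

Attribute values still go through minidom's own escaping, so quotes inside the title attribute stay encoded as &quot; while quotes in the text are written literally. Text nodes created by the parser (for example via parseString()) would not use the subclass and would have to be replaced by hand.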



But I don't see a simpler way of doing what you want. It might be best to use a better DOM implementation.



PS: I have no idea whether this behaviour is required by the XML spec or not. Update: a quick scan of http://www.w3.org/TR/2008/REC-xml-20081126/#syntax suggests that, in text content, only < and & must be encoded.
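As a quick sanity check of that reading of the spec, the sketch below parses the same text once with &quot; and once with a literal ", and compares what the parser hands back. The helper name `text_of` and the two sample strings are made up for this example.

```python
from xml.dom import minidom

# The same content, once with escaped and once with literal double quotes.
ESCAPED = '<article>said &quot;bla&quot;</article>'
LITERAL = '<article>said "bla"</article>'


def text_of(xml_source):
    """Return the concatenated text content of the root element."""
    doc = minidom.parseString(xml_source)
    return "".join(node.data for node in doc.documentElement.childNodes)


if __name__ == "__main__":
    # Both spellings are well-formed and yield identical character data.
    print(text_of(ESCAPED) == text_of(LITERAL))
```

Both documents parse without error and produce the same string, so a consumer reading the file cannot tell which encoding was used for the quotes in the text.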





Thanks for the reply. Seems like a lot of work for something that probably isn't worth it. I followed your advice and found a different XML library; lxml works very well :)
– foges
Aug 12 '11 at 13:24





