Efficient way to encode CDATA elements
OK, I'm reading data from a stream using a StreamReader. The data inside the stream is not XML; it could be anything.
Based on the input StreamReader, I'm writing to an output stream using an XmlTextWriter. Basically, when all is said and done, the output stream contains the data from the input stream wrapped in an element, which is itself contained in a parent element.
My problem is twofold. Data gets read from the input stream in chunks, and the StreamReader class returns char[]. If data in the input stream contains a "]]>" it needs to be split across two CDATA elements. First, how do I search for "]]>" in a char array? And second, because I'm reading in chunks, the "]]>" substring could be split across two chunks, so how do I account for this?
I could probably convert the char[] to a string, and do a search replace on it. That would solve my first problem. On each read, I could also check to see if the last character was a "]", so that on the next read, if the first two characters are "]>" I would start a new CDATA section.
This hardly seems efficient, because it involves converting the char array to a string, which means spending time copying the data and using twice the memory. Is there a more efficient way, in terms of both speed and memory?
According to HOWTO Avoid Being Called a Bozo When Producing XML:
So long as the small set of special characters are encoded/escaped it should just work.
Whether you have to handle the escaping yourself is a different matter, but certainly a much more straightforward-to-solve problem.
Then just append the whole lot as a child text node to the relevant XML element.
I know of exactly two real use cases for CDATA:
One is in an XHTML document containing script:
The other is in hand-authored XML documents where the text contains embedded markup, e.g.:
In all other cases, just letting the DOM (or the XmlWriter, or whatever tool you're using to create the XML) escape the text nodes works just fine.
Indeed, you would have to keep back the last two characters in a queue instead of spitting them out immediately. Then when new input comes in, append it to the queue and again take all but the last two characters, search-and-replace over them, and output.
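A minimal sketch of that hold-back idea in C#, under a few assumptions: `CDataSplitter` is a hypothetical helper (not a .NET class), it writes the CDATA markup itself to a `TextWriter` rather than going through `XmlTextWriter`, and it converts each chunk to a string for clarity instead of scanning the `char[]` in place. One deliberate variation: it holds back only a trailing "]" or "]]" instead of always the last two characters, and searches everything before that, so a "]]>" can never end up half emitted and half held back.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper (not part of .NET): streams text into CDATA markup,
// splitting the section wherever "]]>" occurs, even across chunk boundaries.
public sealed class CDataSplitter
{
    private const string Open = "<![CDATA[";
    private const string Close = "]]>";

    private readonly TextWriter _output;
    private readonly StringBuilder _pending = new StringBuilder();

    public CDataSplitter(TextWriter output)
    {
        _output = output;
        _output.Write(Open);
    }

    // Feed one chunk, e.g. the result of StreamReader.Read(buffer, 0, buffer.Length).
    public void Write(char[] buffer, int count)
    {
        _pending.Append(buffer, 0, count);

        // Hold back a trailing "]" or "]]": it may be the start of a "]]>"
        // that is only completed by the next chunk.
        int hold = 0;
        while (hold < 2 && hold < _pending.Length
               && _pending[_pending.Length - 1 - hold] == ']')
        {
            hold++;
        }

        int ready = _pending.Length - hold;
        string text = _pending.ToString(0, ready);
        _pending.Remove(0, ready);

        // End the current section after "]]" and reopen it before ">".
        _output.Write(text.Replace("]]>", "]]" + Close + Open + ">"));
    }

    // Call once after the last chunk: flushes held-back characters and closes the section.
    public void Finish()
    {
        _output.Write(_pending.ToString());
        _pending.Clear();
        _output.Write(Close);
    }
}
```

Feeding it "a]" and then "]>b" followed by `Finish()` produces `<![CDATA[a]]]]><![CDATA[>b]]>`, which a parser reads back as the text "a]]>b".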
Better: don't bother with a CDATA section at all. They're only there for the convenience of hand-authoring. If you're already doing search-and-replace, there's no reason you shouldn't just search-and-replace ‘<’, ‘>’ and ‘&’ with their predefined entities, and include those in a normal Text node. Since those are simple single-character replacements, you don't need to worry about buffering.
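As a rough illustration of the single-character replacement, here is a sketch that works directly on the `char[]` chunk, so no string conversion or cross-chunk buffering is involved. `TextEscaper` and `WriteEscaped` are made-up names for this example, and the sketch assumes the input contains no characters that are illegal in XML (such as most control characters), which it does not check for.

```csharp
using System.IO;

// Hypothetical helper (not part of .NET).
public static class TextEscaper
{
    // Writes buffer[0..count) to output, replacing the three markup
    // characters with their predefined entities.
    public static void WriteEscaped(TextWriter output, char[] buffer, int count)
    {
        for (int i = 0; i < count; i++)
        {
            switch (buffer[i])
            {
                case '<': output.Write("&lt;"); break;
                case '>': output.Write("&gt;"); break;
                case '&': output.Write("&amp;"); break;
                default: output.Write(buffer[i]); break;
            }
        }
    }
}
```

Because each replacement depends on one character only, the result is the same no matter where the chunk boundaries fall.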
But: if you're using an XmlTextWriter as you say, it's as simple as calling WriteString() on it for each chunk of incoming text.
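For completeness, a sketch of what that loop might look like. The element names `parent` and `data`, the 4096-character buffer, and the `StreamWrapper.Wrap` helper are all placeholders invented for this example; only `XmlTextWriter`, `WriteString()` and the reader's `Read()` come from the discussion above.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper (not part of .NET).
public static class StreamWrapper
{
    // Copies all text from input into <parent><data>...</data></parent>,
    // letting the writer escape markup characters in each chunk.
    public static void Wrap(TextReader input, TextWriter output)
    {
        var writer = new XmlTextWriter(output);
        writer.WriteStartElement("parent");   // placeholder element names
        writer.WriteStartElement("data");

        var buffer = new char[4096];
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            // WriteString takes a string, so the chunk is copied once here.
            writer.WriteString(new string(buffer, 0, read));
        }

        writer.WriteEndElement();
        writer.WriteEndElement();
        writer.Flush();
    }
}
```

If the per-chunk string copy is a concern, `XmlWriter` also has a `WriteChars(char[], int, int)` overload that takes the buffer directly; whether it behaves identically for every input is something to verify for your case.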