Are CDATA sections really unnecessary?

Posted 2024-10-05 05:03:28

This question is prompted by the rather militant refusal of developer Michael Rys to include the parsing of CDATA sections into FOR XML PATH because "There is no semantic difference in the data that you store."

I have stored nuggets of HTML in CDATA nodes, along with other content that requires the use of special or awkward characters. However, I don't feel qualified to challenge Rys's controversial assertion because, I suppose, technically he is correct in the scenarios where I've employed CDATA for convenience.

What's really baking my noodle is that, as developers take to the internet begging for advice on how to render CDATA segments using FOR XML PATH, respondents continually direct them to use FOR XML EXPLICIT instead, the XML rendering method Rys cited as being the "query from hell".

If we can really do without CDATA in every use case that anyone can suggest, I guess we should stop moaning and reject CDATA usage henceforth. But if there are clearly defined cases where CDATA is essential, Rys has already undertaken, in the topmost link in this question, to bake it into FOR XML PATH going forward.

So which is it to be? Are CDATA sections really relics of the past? Or should Rys pull his finger out and allow for CDATA generation in FOR XML PATH? And while we're at it, in the meantime, are there any hacks for getting FOR XML PATH to return CDATA sections?

Comments (4)

稀香 2024-10-12 05:03:28

CDATA sections are unnecessary. They're not a "relic of the past" because they've always been unnecessary.

This does not mean they aren't useful. Look at just about any programming language or library and you can find a large number of things you could do without because they are semantically equivalent to something else, but which are useful if there's a human being sitting there having to write the stuff.

For that matter, even with programmatic production it's also handy that one could take the opposite approach and use CDATA sections for every single piece of c-data (bloaty, but it could have efficiency gains elsewhere).

FOR XML PATH does not involve a human being sitting there having to write the stuff. It's a means of producing valid XML from the results of an SQL query. (It's also not a matter of parsing CDATA sections, but of producing them, which is a different matter.)

And you can't really complain about FOR XML EXPLICIT being the alternative when you want really fine control - the reason FOR XML EXPLICIT is so nasty to use sometimes is precisely because it gives you really fine control. Indeed, consider if they first added support for CDATA sections and then added support for every other tweak and configuration option that seemed just as vital to someone else out there. How long would it take before FOR XML EXPLICIT was the automatic choice due to it being more straightforward than FOR XML PATH‽

There are four cases where CDATA sections are useful:

  1. You're sitting at a keyboard typing this stuff in yourself.
  2. You are dealing with a mix of different technologies, with different standards designed at different times, which will be interpreted by different parsers in different ways (e.g. JavaScript embedded into XHTML; though CDATA is not 100% necessary here, it's a nightmare to do otherwise).
  3. You're trying to parse the XML with something that doesn't understand XML.
  4. You're trying to use something built on a parser that allows low-level access that distinguishes between CDATA sections and other character data and using that low-level access inappropriately.

Funnily enough, these four cases are also the four cases where a ban on accepting CDATA sections can make sense.

Case 1 doesn't apply here; it isn't human-generated code.
Case 2 could apply here if you are doing something really crazy. Frankly, the lack of CDATA sections is the least of your worries here; switch to producing simpler XML in the query and transforming it elsewhere.
Case 3 could apply here, but it's not fair to complain to the SQL people if it does, when you should complain to the author of the broken XML parser that doesn't treat &lt;example&gt; the same as <![CDATA[<example>]]>.
Case 4 could apply here, but again complain to the person who wrote the buggy code, not the SQL people.
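
The equivalence behind cases 3 and 4 can be checked with any conforming parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the `<note>` element and its content are made up for illustration and are not taken from the question.

```python
import xml.etree.ElementTree as ET

# Two spellings of the same character data (hypothetical sample documents).
escaped = "<note>&lt;example&gt; &amp; more</note>"
cdata = "<note><![CDATA[<example> & more]]></note>"

# A conforming parser hands back identical text for both forms.
text_from_escaped = ET.fromstring(escaped).text
text_from_cdata = ET.fromstring(cdata).text
same_text = text_from_escaped == text_from_cdata

# Re-serializing the CDATA version yields the entity-escaped spelling,
# because the parsed tree no longer records which form was used.
round_tripped = ET.tostring(ET.fromstring(cdata), encoding="unicode")

print(same_text)
print(round_tripped)
```

Code that only looks at the parsed text cannot tell which form the producer emitted, which is the sense in which the two carry the same data.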

末蓝 2024-10-12 05:03:28

CDATA sections are useful if you don't care about the semantics of the data in them (i.e. you do not need to parse it - it is simply a run of characters), and you don't wish to escape any of the XML within them.

The definition, according to the W3C:

CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup.

From Wikipedia:

New authors of XML documents often misunderstand the purpose of a CDATA section, mistakenly believing that its purpose is to "protect" data from being treated as ordinary character data during processing. Some APIs for working with XML documents do offer options for independent access to CDATA sections, but such options exist above and beyond the normal requirements of XML processing systems, and still do not change the implicit meaning of the data. Character data is character data, regardless of whether it is expressed via a CDATA section or ordinary markup.

CDATA sections are useful for writing XML code as text data within an XML document. For example, if one wishes to typeset a book with XSL explaining the use of an XML application, the XML markup to appear in the book itself will be written in the source file in a CDATA section. However, a CDATA section cannot contain the string "]]>" and therefore it is not possible for a CDATA section to contain nested CDATA sections. The preferred approach to using CDATA sections for encoding text that contains the triad "]]>" is to use multiple CDATA sections by splitting each occurrence of the triad just before the ">". For example, to encode "]]>" one would write: <![CDATA[]]]]><![CDATA[>]]>
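
The splitting rule quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `to_cdata` and `read_back` are hypothetical helper names, and the `<r>` wrapper element exists only so the fragment can be parsed.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting every "]]>" just before the ">"."""
    # Each "]]>" becomes "]]" + end of section + start of new section + ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def read_back(fragment: str) -> str:
    """Parse the fragment inside a throwaway <r> element and return its text."""
    return ET.fromstring("<r>" + fragment + "</r>").text or ""


sample = "a ]]> b <x> & ]]>"
encoded = to_cdata(sample)
print(encoded)
print(read_back(encoded) == sample)
```

The parser concatenates adjacent CDATA sections into one run of character data, so the original string comes back unchanged.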

梦里泪两行 2024-10-12 05:03:28

It is interesting to see how someone can just throw away a very valuable piece of the Standard with such a whimsical approach. Not everyone is using XML for a few hundred characters of HTML or a list of items for a drop-down.

Some of us are actually using XML to exchange data, very complex data like a CCD, CDA, or CDR. These are all standard document formats in the healthcare arena and are becoming more and more prominent with ObamaCare. Part of the structure of these documents consists of attachments, things like DICOM images, PDFs and other binary data that should not be read by the parser, which is the reason the CDATA definition exists.

Why should I pay the overhead of the parser reading a 3 megabyte DICOM image embedded in a CCD document? Why should I be forced to separate the document when it came in the original data and is part of the XML Standard? And I want to be able to locate and recover the document and its contents with XML.

It bewilders me why you all would support the parsing of data that is intended not to be parsed by the engine. If the engine sees CDATA, it should ignore it; it is very simple. And the continued argument that some do not need it is irrelevant. It is part of the standard and the standard should be maintained. If they would like to add a "Feature", as it has been called, then they should support the default behavior with an option.

Please stop parsing CDATA and ignore it.

〆凄凉。 2024-10-12 05:03:28

You are absolutely right, CDATA sections are essential in many scenarios; they're part of the XML standard and should be supported by every XML manipulation tool/method. But the thing is that MS usually doesn't care... you know, the "640kB should be enough for everyone" kind of approach.

Edit: About FOR XML EXPLICIT: this is THE best method for generating precisely formatted XML data. Yes, the syntax is kinda painful to look at and confusing, but once you use it a few times, you'll admire its beauty and power.
