Can I enforce the order of XML attributes using a schema?

Posted 2024-08-09 15:45:40

Our C++ application reads configuration data from XML files that look something like this:

<data>
 <value id="FOO1" name="foo1" size="10" description="the foo" ... />
 <value id="FOO2" name="foo2" size="10" description="the other foo" ... />
 ...
 <value id="FOO300" name="foo300" size="10" description="the last foo" ... />
</data>

The complete application configuration consists of ~2500 of these XML files (which translates into more than 1.5 million key/value attribute pairs). The XML files come from many different sources/teams and are validated against a schema. However, sometimes the <value/> nodes look like this:

<value name="bar1" id="BAR1" description="the bar" size="20" ... />

or this:

<value id="BAT1" description="the bat" name="bat1"  size="25" ... />

To make this process fast, we are using Expat to parse the XML documents. Expat exposes the attributes as an array - like this:

void ExpatParser::StartElement(const XML_Char* name, const XML_Char** atts)
{
 // Expat passes the attributes as a NULL-terminated array of XML_Char*:
 //  atts[i] (i even) is an attribute name (the 'key')
 //  atts[i + 1] is that attribute's value
 //  the entry after the last value is NULL
 for (int i = 0; atts[i]; i += 2) 
 {
  std::string key = atts[i];
  std::string value = atts[i + 1];
  ProcessAttribute (key, value);
 }
}

This puts all the responsibility onto our ProcessAttribute() function to read the 'key' and decide what to do with the value. Profiling the app has shown that ~40% of the total XML Parsing time is dealing with these attributes by name/string.
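For comparison, here is one way to do name-based dispatch without relying on attribute order. This is a minimal sketch, not the asker's ProcessAttribute() (whose internals aren't shown): it classifies each name once through a hash table keyed by std::string_view, so no std::string is built for the key, then routes the value with a switch. AttrId, ValueAttrs, ClassifyAttribute and ReadValueAttributes are hypothetical names, and it assumes the default Expat build where XML_Char is plain char. Whether it actually beats the original loop would have to be measured.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <unordered_map>

// Stand-in for Expat's XML_Char. Assumes the default (non-XML_UNICODE)
// build, where XML_Char is plain char; real code would include <expat.h>.
using XML_Char = char;

// Hypothetical attribute set for <value .../>; extend as needed.
enum class AttrId { Id, Name, Size, Description, Unknown };

// Hash the name once and map it to an enum. std::string_view avoids
// constructing a std::string for the key.
AttrId ClassifyAttribute(std::string_view key) {
    static const std::unordered_map<std::string_view, AttrId> table = {
        {"id", AttrId::Id},
        {"name", AttrId::Name},
        {"size", AttrId::Size},
        {"description", AttrId::Description},
    };
    auto it = table.find(key);
    return it == table.end() ? AttrId::Unknown : it->second;
}

// Hypothetical destination for one element's attributes.
struct ValueAttrs {
    std::string id, name, size, description;
};

// Walk Expat's NULL-terminated name/value array. Attribute order does
// not matter: each name is classified and routed with a switch.
ValueAttrs ReadValueAttributes(const XML_Char** atts) {
    assert(atts != nullptr);
    ValueAttrs out;
    for (int i = 0; atts[i]; i += 2) {
        const XML_Char* value = atts[i + 1];
        switch (ClassifyAttribute(atts[i])) {
            case AttrId::Id:          out.id = value; break;
            case AttrId::Name:        out.name = value; break;
            case AttrId::Size:        out.size = value; break;
            case AttrId::Description: out.description = value; break;
            case AttrId::Unknown:     break;  // ignore attributes we don't use
        }
    }
    return out;
}
```

Called from the start-element handler as `ReadValueAttributes(atts)`, it handles `<value name="bar1" id="BAR1" .../>` and `<value id="FOO1" name="foo1" .../>` identically.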

The overall process could be sped up dramatically if I could guarantee/enforce the order of the attributes (for starters, there would be no string comparisons in ProcessAttribute()). For example, if the 'id' attribute were always the first attribute, we could deal with it directly:

void ExpatParser::StartElement(const XML_Char* name, const XML_Char** atts)
{
 // Expat passes the attributes as a NULL-terminated array of XML_Char*:
 //  atts[i] (i even) is an attribute name (the 'key')
 //  atts[i + 1] is that attribute's value
 //  the entry after the last value is NULL
 ProcessID (atts[1]);
 ProcessName (atts[3]);
 //etc.
}

According to the W3C schema specs, I can use <xs:sequence> in an XML schema to enforce the order of elements - but it doesn't seem to work for attributes - or perhaps I'm using it incorrectly:

<xs:element name="data">
 <xs:complexType>
  <xs:sequence>
   <xs:element name="value" type="value_type" minOccurs="1" maxOccurs="unbounded" />
  </xs:sequence>
 </xs:complexType>
</xs:element>

<xs:complexType name="value_type">
 <!-- This doesn't work -->
 <xs:sequence>
  <xs:attribute name="id" type="xs:string" />
  <xs:attribute name="name" type="xs:string" />
  <xs:attribute name="description" type="xs:string" />
 </xs:sequence>
</xs:complexType>

Is there a way to enforce attribute order in an XML document? If the answer is "no", could anyone perhaps suggest an alternative that wouldn't carry a huge runtime performance penalty?


你的背包 2024-08-16 15:45:40

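According to the XML spec,

"the order of attribute specifications in a start-tag or empty-element tag is not significant."

You can find this in section 3.1.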


百思不得你姐 2024-08-16 15:45:40

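XML attributes have no order, so there is no order to enforce.

If you want something ordered, you need XML elements, or something other than XML. For example, JSON, YAML and bEncode all have both maps (unordered) and sequences (ordered).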


寒冷纷飞旳雪 2024-08-16 15:45:40

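As others have pointed out, no, you can't rely on attribute ordering.

If I had any process involving 2,500 XML files and 1.5 million key/value pairs, I would get that data out of XML and into a more usable form as quickly as possible: a database, a binary serialization format, whatever. You gain no advantage from using XML (beyond schema validation). I would update my store every time a new XML file arrived, and take the parsing of 1.5 million XML elements out of the main flow of the process.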
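A minimal sketch of the "binary serialization format" option, assuming each <value/> element maps to a fixed record and that size has already been converted from text to an integer when the XML is imported. ValueRecord, WriteRecords and ReadRecords are hypothetical names, and the format (32-bit length-prefixed strings in host byte order) is an illustrative choice, not a standard one; files written this way are only portable between machines with the same endianness.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>   // in-memory streams, convenient for testing the format
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for one <value .../> element.
struct ValueRecord {
    std::string id, name, description;
    std::uint32_t size = 0;
};

// Integers are written raw, in host byte order (a simplifying assumption).
void WriteU32(std::ostream& out, std::uint32_t v) {
    out.write(reinterpret_cast<const char*>(&v), sizeof v);
}

bool ReadU32(std::istream& in, std::uint32_t& v) {
    return static_cast<bool>(in.read(reinterpret_cast<char*>(&v), sizeof v));
}

// Strings are stored as a 32-bit length followed by the raw bytes.
void WriteString(std::ostream& out, const std::string& s) {
    assert(s.size() <= std::numeric_limits<std::uint32_t>::max());
    WriteU32(out, static_cast<std::uint32_t>(s.size()));
    out.write(s.data(), static_cast<std::streamsize>(s.size()));
}

bool ReadString(std::istream& in, std::string& s) {
    std::uint32_t len = 0;
    if (!ReadU32(in, len)) return false;
    s.resize(len);
    return len == 0 || static_cast<bool>(in.read(&s[0], len));
}

// Run once whenever an XML file changes...
void WriteRecords(std::ostream& out, const std::vector<ValueRecord>& recs) {
    WriteU32(out, static_cast<std::uint32_t>(recs.size()));
    for (const ValueRecord& r : recs) {
        WriteString(out, r.id);
        WriteString(out, r.name);
        WriteString(out, r.description);
        WriteU32(out, r.size);
    }
}

// ...and load the compact form at startup, with no attribute-name lookups.
std::vector<ValueRecord> ReadRecords(std::istream& in) {
    std::vector<ValueRecord> recs;
    std::uint32_t count = 0;
    if (!ReadU32(in, count)) return recs;
    for (std::uint32_t i = 0; i < count; ++i) {
        ValueRecord r;
        if (!ReadString(in, r.id) || !ReadString(in, r.name) ||
            !ReadString(in, r.description) || !ReadU32(in, r.size)) {
            break;  // truncated or corrupt input: keep what was read so far
        }
        recs.push_back(std::move(r));
    }
    return recs;
}
```

The field order is fixed by the format itself, which gives exactly the positional access the question wanted, while the XML stays the human-edited, schema-validated source.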


盗心人 2024-08-16 15:45:40

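Alas, the answer is no. I'm shocked by your 40% figure. I find it hard to believe that turning "foo" into ProcessFoo takes that long. Are you sure the 40% doesn't include the time spent executing ProcessFoo?

Is it possible to access the attributes by name with this Expat thing? That's the more conventional way to access attributes. I'm not saying it would be faster, but it might be worth a try.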
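As far as I know, Expat's start-element callback receives only the flat name/value array shown in the question and offers no by-name accessor, so "access by name" comes down to a scan like the hypothetical FindAttribute below (assuming the default build where XML_Char is char). It still performs one strcmp per attribute it passes, so it is unlikely to be faster in itself; it mainly avoids building std::string objects.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for Expat's XML_Char; assumes the default (non-XML_UNICODE)
// build. Real code would include <expat.h> instead.
using XML_Char = char;

// Hypothetical helper: linear search of Expat's NULL-terminated
// name/value array. Returns the value, or nullptr if the attribute is absent.
const XML_Char* FindAttribute(const XML_Char** atts, const XML_Char* wanted) {
    assert(atts != nullptr && wanted != nullptr);
    for (int i = 0; atts[i]; i += 2) {
        if (std::strcmp(atts[i], wanted) == 0) return atts[i + 1];
    }
    return nullptr;
}
```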


罪#恶を代价 2024-08-16 15:45:40

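I don't think XML Schema supports this. Attributes are only defined and constrained by name (e.g. they must match a particular name), but I don't see any way to define the order of those attributes in an XSD.

I don't know of any other way to ensure that the attributes on an XML node appear in a particular order, and I'm not sure whether any of the other XML schema mechanisms, such as Schematron or Relax NG, support it...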


放血 2024-08-16 15:45:40

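I'm pretty sure there's no way to enforce attribute order in an XML document. I assume you could insist on it through a business process or other human factors, such as a contract or some other document.

What if you just assume the first attribute is 'id' and test the name to make sure? If it is, use the value; if it isn't, you can try to get the attribute by name or throw out the document.

While not as efficient as accessing the attribute by ordinal, some nonzero fraction of the time you will correctly guess that your data provider delivered the XML to spec. The rest of the time, you can take some other action.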
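A minimal sketch of that guess-then-verify idea, assuming the default Expat build where XML_Char is char. GetAttribute and FindAttribute are hypothetical names; pos is the attribute's expected index (0 for the first attribute). When the data arrives in the expected order, each lookup costs a single strcmp; otherwise it falls back to a linear scan by name.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for Expat's XML_Char; real code would include <expat.h>.
using XML_Char = char;

// Fallback: linear search of the NULL-terminated name/value array.
const XML_Char* FindAttribute(const XML_Char** atts, const XML_Char* wanted) {
    for (int i = 0; atts[i]; i += 2) {
        if (std::strcmp(atts[i], wanted) == 0) return atts[i + 1];
    }
    return nullptr;
}

// Fast path: look at the expected slot first and verify its name.
const XML_Char* GetAttribute(const XML_Char** atts, int pos, const XML_Char* wanted) {
    assert(atts != nullptr && wanted != nullptr && pos >= 0);
    // Make sure the array actually reaches slot 'pos' before indexing it.
    for (int i = 0; i < pos; ++i) {
        if (!atts[2 * i]) return FindAttribute(atts, wanted);
    }
    const XML_Char* key = atts[2 * pos];
    if (key && std::strcmp(key, wanted) == 0) return atts[2 * pos + 1];
    return FindAttribute(atts, wanted);  // out of order or missing
}
```

In the start-element handler this would read `GetAttribute(atts, 0, "id")`, `GetAttribute(atts, 1, "name")`, and so on. Instead of falling back, a stricter variant could reject the document, as the answer suggests.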


悲凉≈ 2024-08-16 15:45:40

Just a guess, but can you try adding use="required" to each of your attribute specifications?

<xs:complexType name="value_type">
 <!-- xs:attribute declarations go directly under xs:complexType, not inside xs:sequence -->
 <xs:attribute name="id" type="xs:string" use="required" />
 <xs:attribute name="name" type="xs:string" use="required" />
 <xs:attribute name="description" type="xs:string" use="required" />
</xs:complexType>

I'm wondering if the parser is being slowed down by allowing optional attributes, when it appears your attributes will always be there.

Again, just a guess.

EDIT: XML 1.0 spec says that attribute order is not significant. http://www.w3.org/TR/REC-xml/#sec-starttags

Therefore, XSD won't enforce any order. But that doesn't mean that parsers can't be fooled into working quickly, so I'm keeping the above answer published in case it actually works.
