Can I enforce the order of XML attributes using a schema?
Our C++ application reads configuration data from XML files that look something like this:
<data>
<value id="FOO1" name="foo1" size="10" description="the foo" ... />
<value id="FOO2" name="foo2" size="10" description="the other foo" ... />
...
<value id="FOO300" name="foo300" size="10" description="the last foo" ... />
</data>
The complete application configuration consists of ~2500 of these XML files (which translates into more than 1.5 million key/value attribute pairs). The XML files come from many different sources/teams and are validated against a schema. However, sometimes the <value/> nodes look like this:
<value name="bar1" id="BAR1" description="the bar" size="20" ... />
or this:
<value id="BAT1" description="the bat" name="bat1" size="25" ... />
To make this process fast, we are using Expat to parse the XML documents. Expat exposes the attributes as an array - like this:
void ExpatParser::StartElement(const XML_Char* name, const XML_Char** atts)
{
    // The attributes are stored in an array of XML_Char* where:
    //  the nth element is the 'key'
    //  the n+1 element is the value
    //  the final element is NULL
    for (int i = 0; atts[i]; i += 2)
    {
        std::string key = atts[i];
        std::string value = atts[i + 1];
        ProcessAttribute (key, value);
    }
}
This puts all the responsibility onto our ProcessAttribute() function to read the 'key' and decide what to do with the value. Profiling the app has shown that ~40% of the total XML parsing time is spent dealing with these attributes by name/string.
The overall process could be sped up dramatically if I could guarantee/enforce the order of the attributes (for starters, no string comparisons in ProcessAttribute()). For example, if the 'id' attribute was always the 1st attribute, we could deal with it directly:
void ExpatParser::StartElement(const XML_Char* name, const XML_Char** atts)
{
    // The attributes are stored in an array of XML_Char* where:
    //  the nth element is the 'key'
    //  the n+1 element is the value
    //  the final element is NULL
    ProcessID (atts[1]);
    ProcessName (atts[3]);
    //etc.
}
According to the W3C schema specs, I can use <xs:sequence> in an XML schema to enforce the order of elements - but it doesn't seem to work for attributes - or perhaps I'm using it incorrectly:
<xs:element name="data">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="value" type="value_type" minOccurs="1" maxOccurs="unbounded" />
        </xs:sequence>
    </xs:complexType>
</xs:element>
<xs:complexType name="value_type">
    <!-- This doesn't work -->
    <xs:sequence>
        <xs:attribute name="id" type="xs:string" />
        <xs:attribute name="name" type="xs:string" />
        <xs:attribute name="description" type="xs:string" />
    </xs:sequence>
</xs:complexType>
Is there a way to enforce attribute order in an XML document? If the answer is "no" - could anyone perhaps suggest an alternative that wouldn't carry a huge runtime performance penalty?
Comments (8)
According to the XML specification, the order of attribute specifications in a start-tag or empty-element tag is not significant.
You can check it in section 3.1.
XML attributes don't have an order, therefore there is no order to enforce.
If you want something ordered, you need XML elements. Or something different from XML. JSON, YAML and Bencode, for example, have both maps (which are unordered) and sequences (which are ordered).
As others have pointed out, no, you can't rely on attribute ordering.
If I had any process at all involving 2,500 XML files and 1.5 million key/value pairs, I would get that data out of XML and into a more usable form as soon as I possibly could. A database, a binary serialization format, whatever. You're not getting any advantage out of using XML (other than schema validation). I'd update my store every time I got a new XML file, and take parsing 1.5 million XML elements out of the main flow of my process.
The answer is no, alas. I'm shocked by your 40% figure. I find it hard to believe that turning "foo" into ProcessFoo takes that long. Are you sure the 40% doesn't include the time taken to execute ProcessFoo?
Is it possible to access the attributes by name using this Expat thing? That's the more traditional way to access attributes. I'm not saying it's going to be faster, but it might be worth a try.
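For reference, by-name access over the array that Expat hands to the start-element handler could look roughly like the sketch below. `FindAttribute` is a hypothetical helper, not an Expat function, and the `XML_Char` alias only stands in for the typedef from `<expat.h>` so the snippet compiles on its own. It is still a linear scan with string comparisons, so it is unlikely to beat the loop in the question by itself.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for the typedef in <expat.h> (assumption: a char-based Expat build).
using XML_Char = char;

// Hypothetical helper: returns the value of the attribute called `wanted`,
// or nullptr if the element does not carry it.
// `atts` is laid out as name, value, name, value, ..., nullptr.
const XML_Char* FindAttribute(const XML_Char** atts, const XML_Char* wanted)
{
    assert(atts != nullptr && wanted != nullptr);
    for (int i = 0; atts[i] != nullptr; i += 2)
    {
        if (std::strcmp(atts[i], wanted) == 0)
            return atts[i + 1];
    }
    return nullptr;
}
```

Inside `StartElement` this would be called as `FindAttribute(atts, "id")`, regardless of where the attribute appears in the tag.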
I don't think XML Schema supports that - attributes are just defined and restricted by name, e.g. they have to match a particular name - but I don't see how you could define an order for those attributes in XSD.
I don't know of any other way to make sure attributes on an XML node come in a particular order - not sure if any of the other XML schema mechanisms like Schematron or Relax NG would support that....
I'm pretty sure there's no way to enforce attribute order in an XML document. I'm going to assume that you can insist on it via a business process or other human factors, such as a contract or other document.
What if you just assumed that the first attribute was "id", and tested the name to be sure? If yes, use the value, if not, then you can try to get the attribute by name or throw out the document.
While not as efficient as calling out the attribute by its ordinal, some non-zero number of times you'll be able to guess that your data providers have delivered XML to spec. The rest of the time, you can take other action.
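A minimal sketch of that "guess the position, verify, fall back" idea follows. `CountEntries` and `GetAttribute` are hypothetical helper names, the `XML_Char` alias stands in for the typedef from `<expat.h>`, and the slot numbers assume the order used in the question's first example (`id` in slot 0, `name` in slot 1, and so on). The length is computed first so the guessed slot is never read past the terminating null pointer.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for the typedef in <expat.h> (assumption: a char-based Expat build).
using XML_Char = char;

// Number of array entries (names plus values) before the null terminator.
int CountEntries(const XML_Char** atts)
{
    assert(atts != nullptr);
    int n = 0;
    while (atts[n] != nullptr)
        n += 2;
    return n;
}

// Looks for `name` in the slot where it is expected; if the document did not
// follow the agreed order, falls back to scanning every attribute.
const XML_Char* GetAttribute(const XML_Char** atts, int entries,
                             int slot, const XML_Char* name)
{
    const int i = 2 * slot;
    if (i < entries && std::strcmp(atts[i], name) == 0)
        return atts[i + 1];               // fast path: one comparison
    for (int j = 0; j < entries; j += 2)  // slow path: any order
    {
        if (std::strcmp(atts[j], name) == 0)
            return atts[j + 1];
    }
    return nullptr;                       // attribute not present
}
```

When the providers stick to the agreed order, each attribute costs a single comparison; a document that does not is still read correctly, only more slowly. Rejecting such documents instead would mean returning an error where the slow path starts.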
Just a guess, but can you try adding use="required" to each of your attribute specifications? I'm wondering if the parser is being slowed down by allowing optional attributes, when it appears your attributes will always be there.
Again, just a guess.
EDIT: XML 1.0 spec says that attribute order is not significant. http://www.w3.org/TR/REC-xml/#sec-starttags
Therefore, XSD won't enforce any order. But that doesn't mean that parsers can't be fooled into working quickly, so I'm keeping the above answer published in case it actually works.
From what I recall, Expat is a non-validating parser and better for it.. so you can probably scrap that XSD idea. Depending on order isn't a good idea in many XML approaches either (XSD got criticised on element order a heck of a lot back in the day, for example, by pro- or anti- sellers of XML Web Services at MSFT).
Do your custom encoding and simply extend either your logic for more efficient lookup or dig into the parser source. It is trivial to write the tooling around encoding an efficient replacement whilst shielding the software agents and users from it.. you want to do this so it is easily migrated while preserving backward compatibility and reversibility. Also, go for fixed-size constraints/attribute-name-translation.
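One possible reading of "more efficient lookup", sketched under assumptions: classify each key straight from the raw `XML_Char*` instead of building a `std::string` per attribute, using the first character to pick a single candidate name. The `Attr` enum and `Classify` are hypothetical, the `XML_Char` alias stands in for the typedef from `<expat.h>`, and only the four attribute names visible in the question are covered (the real set is elided there).

```cpp
#include <cassert>
#include <cstring>

// Stand-in for the typedef in <expat.h> (assumption: a char-based Expat build).
using XML_Char = char;

// Hypothetical tags for the attributes shown in the question.
enum class Attr { Id, Name, Size, Description, Unknown };

// Picks a candidate from the first character, then confirms it with one
// strcmp. No heap allocation and at most one full comparison per key.
Attr Classify(const XML_Char* key)
{
    assert(key != nullptr);
    switch (key[0])
    {
    case 'i': return std::strcmp(key, "id") == 0 ? Attr::Id : Attr::Unknown;
    case 'n': return std::strcmp(key, "name") == 0 ? Attr::Name : Attr::Unknown;
    case 's': return std::strcmp(key, "size") == 0 ? Attr::Size : Attr::Unknown;
    case 'd': return std::strcmp(key, "description") == 0 ? Attr::Description
                                                          : Attr::Unknown;
    default:  return Attr::Unknown;
    }
}
```

The handler loop can then `switch` on the returned tag and pass `atts[i + 1]` to the matching routine. If several attribute names share a first character, that case needs more than one comparison, or a generated perfect hash would be the next step.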
[ Consider yourself lucky with Expat :) and its raw speed. Imagine how CLR devs love XML scaling facilities, they routinely send 200MB on the wire in process of 'just querying the database' .. ]