Cleaning/sanitizing XPath attributes
I need to dynamically construct an XPath query for an element attribute, where the attribute value is provided by the user. I'm unsure how to go about cleaning or sanitizing this value to prevent the XPath equivalent of a SQL injection attack. For example (in PHP):
<?php
function xPathQuery($attr) {
    $xml = simplexml_load_file('example.xml');
    return $xml->xpath("//myElement[@content='{$attr}']");
}

xPathQuery('This should work fine');
# //myElement[@content='This should work fine']

xPathQuery('As should "this"');
# //myElement[@content='As should "this"']

xPathQuery('This\'ll cause problems');
# //myElement[@content='This'll cause problems']

xPathQuery('\']/../privateElement[@content=\'private data');
# //myElement[@content='']/../privateElement[@content='private data']
The last one in particular is reminiscent of the SQL injection attacks of yore.
Now, I know for a fact there will be attributes containing single quotes and attributes containing double quotes. Since these are provided as an argument to a function, what would be the ideal way to sanitize the input for these?
Comments (3)
XPath does actually include a method of doing this safely, in that it permits variable references of the form $varname in expressions. The library on which PHP's SimpleXML is based provides an interface to supply variables; however, this is not exposed by the xpath function in your example. As a demonstration of how simple this can really be:
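A minimal sketch of the variable-binding approach with lxml. The sample document, the helper name `find_by_content`, and the variable name `attr` are assumptions made for illustration; the element and attribute names follow the question.

```python
from lxml import etree

# Hypothetical sample document; element/attribute names follow the question.
root = etree.fromstring(
    "<root>"
    "<myElement content=\"This'll cause problems\"/>"
    "<myElement content='As should \"this\"'/>"
    "<privateElement content='private data'/>"
    "</root>"
)


def find_by_content(value):
    # $attr is an XPath variable reference. lxml binds it from the keyword
    # argument, so the value is never parsed as part of the expression.
    return root.xpath("//myElement[@content=$attr]", attr=value)


print(len(find_by_content("This'll cause problems")))  # 1
print(len(find_by_content('As should "this"')))        # 1
# The injection attempt is compared as a plain string and matches nothing.
print(len(find_by_content("']/../privateElement[@content='private data")))  # 0
```

Because the user's value is passed as data rather than spliced into the expression text, no quoting or escaping step is needed.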
That's using lxml, a Python wrapper for the same underlying library as SimpleXML, with a similar xpath function. Booleans, numbers, and node-sets can also be passed directly.
If switching to a more capable XPath interface is not an option, a workaround when given an external string would be something along the following lines (feel free to adapt to PHP):
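One way such a helper might look, as a sketch rather than a definitive implementation. The function name `xpath_string_literal` is hypothetical. It relies on the fact that XPath 1.0 string literals have no escape syntax, so a value containing both quote types has to be assembled with `concat()`.

```python
def xpath_string_literal(value: str) -> str:
    """Return `value` as an XPath 1.0 string literal (hypothetical helper)."""
    if "'" not in value:
        # No single quotes: a single-quoted literal is safe.
        return "'" + value + "'"
    if '"' not in value:
        # Single quotes only: a double-quoted literal is safe.
        return '"' + value + '"'
    # Both quote types present: split on single quotes and rejoin the pieces
    # with a literal "'" between them inside concat().
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join("'" + part + "'" for part in parts) + ")"


print(xpath_string_literal("basic"))
print(xpath_string_literal('He said "I\'m here"'))
```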
The return value can be directly inserted in your expression string. As that's not actually very readable, here is how it behaves:
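The behaviour can be illustrated with the inputs from the question. This is a sketch under assumptions: `xpath_string_literal` and `build_query` are hypothetical names, and the helper is included so the block runs on its own.

```python
def xpath_string_literal(value: str) -> str:
    """Return `value` as an XPath 1.0 string literal (hypothetical helper)."""
    if "'" not in value:
        return "'" + value + "'"
    if '"' not in value:
        return '"' + value + '"'
    parts = value.split("'")
    return "concat(" + ", \"'\", ".join("'" + part + "'" for part in parts) + ")"


def build_query(value: str) -> str:
    # The literal is inserted directly into the expression string.
    return "//myElement[@content=" + xpath_string_literal(value) + "]"


samples = [
    "This should work fine",
    'As should "this"',
    "This'll cause problems",
    "']/../privateElement[@content='private data",
    'He said "I\'m here"',
]
for sample in samples:
    print(build_query(sample))

# Prints:
# //myElement[@content='This should work fine']
# //myElement[@content='As should "this"']
# //myElement[@content="This'll cause problems"]
# //myElement[@content="']/../privateElement[@content='private data"]
# //myElement[@content=concat('He said "I', "'", 'm here"')]
```

In the fourth case the injection attempt stays inside a single double-quoted literal, so it is compared as a plain string instead of altering the query.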
Note that you can't use escaping of the form &apos; outside of an XML document, nor are generic XML serialisation routines applicable. However, the XPath concat function can be used to create a string containing both types of quotes in any context.
PHP variant:
OK, what does it do?
It encodes all occurrences of & and " as &amp; and &quot; in the string, which should give you a safe selector for that particular use. Note that I also replaced the inner ' in the XPath with ". EDIT: It has since been pointed out that ' can be escaped as &apos;, so you could use whichever string quoting method you prefer.
I'd create a single-element XML document using a DOM, use the DOM to set the element's text to the provided value, and then grab the text out of the DOM's string representation of the XML. This will guarantee that all of the character escaping is done properly, and not just the character escaping that I happen to think of offhand.
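The steps described above might be sketched as follows with Python's standard `xml.dom.minidom`. The function name `xml_escape_via_dom` and the wrapper element name `v` are assumptions for illustration. The sketch only shows XML character escaping of element text; how quote characters are treated depends on the serialiser.

```python
from xml.dom.minidom import Document


def xml_escape_via_dom(value: str) -> str:
    """Serialise `value` as the text of a throwaway element and return
    the escaped text (hypothetical helper)."""
    doc = Document()
    element = doc.createElement("v")  # hypothetical wrapper element
    element.appendChild(doc.createTextNode(value))
    serialised = element.toxml()      # e.g. '<v>...</v>'
    # Strip the wrapper tags to keep only the escaped text.
    return serialised[len("<v>"):-len("</v>")]


print(xml_escape_via_dom("fish & chips < 5"))  # fish &amp; chips &lt; 5
```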
Edit: The reason I would use the DOM in situations like this is that the people who wrote the DOM have read the XML recommendation and I haven't (at least, not with the level of care they have). To pick a trivial example, the DOM will report a parse error if the text contains a character that XML doesn't allow (like #x8), because the DOM's authors have implemented section 2.2 of the XML recommendation.
Now, I might say, "Well, I'll just get the list of invalid characters from the XML recommendation, and strip them out of the input." Sure. Let's just look at the XML recommendation and... um, what the heck are the Unicode surrogate blocks? What kind of code do I have to write to get rid of them? Can they even get into my text in the first place?
Let's suppose I figure that out. Are there other aspects of how the XML recommendation specifies character representations that I don't know about? Probably. Will these have an impact on what I'm trying to implement? Maybe.
If I let the DOM do the character encoding for me, I don't have to worry about any of that stuff.