Security of Python's eval for list deserialization

Posted on 2024-07-27 03:51:32

Are there any security exploits that could occur in this scenario:

eval(repr(unsanitized_user_input), {"__builtins__": None}, {"True":True, "False":False})

where unsanitized_user_input is a str object. The string is user-generated and could be nasty. Assuming our web framework hasn't failed us, it's a real honest-to-god str instance from the Python builtins.

If this is dangerous, can we do anything to the input to make it safe?

We definitely don't want to execute anything contained in the string.

The larger context which is (I believe) not essential to the question is that we have thousands of these:

repr([unsanitized_user_input_1,
      unsanitized_user_input_2,
      unsanitized_user_input_3,
      unsanitized_user_input_4,
      ...])

in some cases nested:

repr([[unsanitized_user_input_1,
       unsanitized_user_input_2],
      [unsanitized_user_input_3,
       unsanitized_user_input_4],
       ...])

which are themselves converted to strings with repr(), put in persistent storage, and eventually read back into memory with eval.

eval deserializes the strings from persistent storage much faster than pickle or simplejson. The interpreter is Python 2.5, so json and ast aren't available. No C modules are allowed, so cPickle is not an option either.
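
For concreteness, the storage scheme described above might look roughly like the sketch below. The helper names dump_rows and load_rows and the sample data are hypothetical; the eval call uses the same restricted globals and locals as the one-liner at the top of the question.

```python
# Hypothetical helpers illustrating the repr()/eval() round trip.
SAFE_GLOBALS = {"__builtins__": None}
SAFE_LOCALS = {"True": True, "False": False}


def dump_rows(rows):
    """Serialize a (possibly nested) list of strings for persistent storage."""
    return repr(rows)


def load_rows(text):
    """Read a stored value back. Note that this evaluates whatever text was stored."""
    return eval(text, SAFE_GLOBALS, SAFE_LOCALS)


# Sample data standing in for unsanitized user input.
rows = [["it's", 'say "hi"'], ["__import__('os')", "line\nbreak"]]
stored = dump_rows(rows)
restored = load_rows(stored)
```

The round trip only ever sees text that repr() produced, which is the assumption the whole question rests on.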

Comments (5)

゛时过境迁 2024-08-03 03:51:32

It is indeed dangerous, and the safest alternative is ast.literal_eval (see the ast module in the standard library). You can of course build and alter an AST to provide, e.g., evaluation of variables and the like before you eval the resulting AST (when it's down to literals).
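
A minimal sketch of the ast.literal_eval approach, assuming an interpreter where the ast module is available (Python 2.6 or later, so not the 2.5 interpreter from the question). The sample data and the helper name is_literal are hypothetical.

```python
import ast

# Sample data standing in for unsanitized user input.
rows = [["it's", 'say "hi"'], ["__import__('os').getcwd()", "line\nbreak"]]

stored = repr(rows)                   # what would go into persistent storage
restored = ast.literal_eval(stored)   # parses literals only, never calls anything


def is_literal(text):
    """Return True if text consists purely of Python literals."""
    try:
        ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return False
    return True
```

Unlike eval, literal_eval rejects attribute access and function calls, so is_literal("().__class__") is False even though the text is a valid Python expression.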

The possible exploit of eval starts with any object it can get its hands on (say True here) and goes via .__class__ to its type object, and so on up to object, then gets its subclasses... basically it can get to ANY object type and wreak havoc. I can be more specific but I'd rather not do it in a public forum (the exploit is well known, but considering how many people still ignore it, revealing it to wannabe script kiddies could make things worse... just avoid eval on unsanitized user input and live happily ever after!-).
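
The first steps of that walk are harmless to show. The sketch below only demonstrates that setting "__builtins__" to None removes names such as open but does not stop attribute access; it deliberately stops at listing the reachable classes. The variable names are hypothetical.

```python
RESTRICTED_GLOBALS = {"__builtins__": None}
RESTRICTED_LOCALS = {"True": True, "False": False}

# Built-in names are no longer resolvable inside the restricted eval...
try:
    eval("open", RESTRICTED_GLOBALS, RESTRICTED_LOCALS)
    builtin_reachable = True
except Exception:
    builtin_reachable = False

# ...but attribute access still works: True -> bool -> object -> every subclass.
expr = "True.__class__.__mro__[-1].__subclasses__()"
reachable = eval(expr, RESTRICTED_GLOBALS, RESTRICTED_LOCALS)
```

Here reachable is a list of every class currently derived from object, which is why the restricted namespace alone is not a sandbox for text an attacker controls.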

何其悲哀 2024-08-03 03:51:32

If you can prove beyond doubt that unsanitized_user_input is a str instance from the Python built-ins with nothing tampered with, then this is always safe. In fact, it'll be safe even without all those extra arguments, since eval(repr(astr)) == astr for all such string objects. You put in a string, you get back out a string. All you did was escape and unescape it.

This all leads me to think that eval(repr(x)) isn't what you want--no code will ever be executed unless someone gives you an unsanitized_user_input object that looks like a string but isn't, but that's a different question--unless you're trying to copy a string instance in the slowest way possible :D.
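
A quick way to convince yourself of the round-trip claim is to try it on a few hostile-looking strings. This is a sketch with hand-picked samples, not a proof; the helper name round_trips is hypothetical.

```python
# Strings that would be dangerous if evaluated directly.
samples = [
    "__import__('os').system('echo pwned')",
    "'; import os; '",
    '"""',
    "\\",
    "line\nbreak\x00",
    "both ' and \" quotes",
]


def round_trips(s):
    """True if eval(repr(s)) gives back exactly the original string."""
    return eval(repr(s), {"__builtins__": None}, {}) == s


all_ok = all(round_trips(s) for s in samples)
```

Because repr() always emits a single quoted string literal, eval only ever sees a literal and never the code-like text inside it.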

月亮坠入山谷 2024-08-03 03:51:32

With everything as you describe, it is technically safe to eval repr'd strings. However, I'd avoid doing it anyway, as it's asking for trouble:

  • There could be some weird corner case where your assumption that only repr'd strings are stored breaks down (e.g. a bug, or a different pathway into the storage that doesn't repr, instantly becomes a code injection exploit where it might otherwise be unexploitable).

  • Even if everything is OK now, assumptions might change at some point, and unsanitised data may get stored in that field by someone unaware of the eval code.

  • Your code may get reused (or worse, copy+pasted) into a situation you didn't consider.

As Alex Martelli pointed out, in Python 2.6 and higher there is ast.literal_eval, which will safely handle both strings and other simple datatypes like tuples. This is probably the safest and most complete solution.

Another possibility however is to use the string-escape codec. This is much faster than eval (about 10 times according to timeit), available in earlier versions than literal_eval, and should do what you want:

>>> s = 'he\nllo\' wo"rld\0\x03\r\n\tabc'
>>> repr(s)[1:-1].decode('string-escape') == s
True

(The [1:-1] is to strip the outer quotes repr adds.)
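
The string-escape codec and str.decode exist only in Python 2. On Python 3, a rough analogue, offered as an untimed sketch, is to escape with ascii() and decode with the unicode_escape codec. The helper names escape_text and unescape_text are hypothetical.

```python
def escape_text(s):
    """Escape a str to pure ASCII; ascii() is repr() with non-ASCII forced into escapes."""
    return ascii(s)[1:-1]  # strip the outer quotes


def unescape_text(text):
    """Reverse escape_text without evaluating anything as code."""
    return text.encode("ascii").decode("unicode_escape")


s = 'he\nllo\' wo"rld\0\x03\r\n\tabc'
round_trip_ok = unescape_text(escape_text(s)) == s
```

ascii() is used instead of repr() here because Python 3's repr() leaves printable non-ASCII characters unescaped, which the unicode_escape codec would then misread.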

终陌 2024-08-03 03:51:32

Generally, you should never allow anyone to post code.

So-called "paid professional programmers" have a hard enough time writing code that actually works.

Accepting code from the anonymous public -- without benefit of formal QA -- is the worst of all possible scenarios.

Professional programmers -- without good, solid formal QA -- will make a hash of almost any web site. Indeed, I'm reverse engineering some unbelievably bad code from paid professionals.

The idea of allowing a non-professional -- unencumbered by QA -- to post code is truly terrifying.

阳光的暖冬 2024-08-03 03:51:32
repr([unsanitized_user_input_1,
      unsanitized_user_input_2,
      ...

... unsanitized_user_input is a str object

You shouldn't have to serialise strings to store them in a database.

If these are all strings, as you mentioned - why can't you just store the strings in a db.StringListProperty?

The nested entries might be a bit more complicated, but why is this the case? When you have to resort to eval to get data from the database, you're probably doing something wrong.

Couldn't you store each unsanitized_user_input_x as its own db.StringProperty row, and group them by a reference field?

Either of those may not be applicable, since I've no idea what you're trying to achieve, but my point is: can you not structure the data in a way where you don't have to rely on eval (and also rely on it not being a security issue)?
