Sandbox for running potentially unfriendly Python code
Let's say there is a server on the internet to which anyone can send a piece of code for evaluation. At some point the server takes all the code that has been submitted and starts running and evaluating it. Sooner or later it will bump into "os.system('rm -rf *')" sent by some evil programmer. Apart from "rm -rf", you can expect people to try using the server to send spam or DoS someone, or to fool around with "while True: pass" kinds of things.
Is there a way to cope with such unfriendly/untrusted code? In particular I'm interested in a solution for Python. However, if you have info for any other language, please share.
If you are not tied to the CPython implementation, consider looking at PyPy [wiki] for this purpose: that Python implementation allows transparent code sandboxing.
Otherwise, you can provide fake __builtin__ and __builtins__ in the corresponding globals/locals arguments to exec or eval. Moreover, you can provide a dictionary-like object instead of a real dictionary and trace what the untrusted code does with its namespace.
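A minimal sketch of that idea for Python 3, where the reduced built-ins are passed under the __builtins__ key of the globals dict and the dictionary-like object is passed as locals. TrackingNamespace, SAFE_BUILTINS and run_restricted are hypothetical names made up for this illustration, and the whitelist is an arbitrary assumption. On its own this is not a security boundary (object introspection can still reach the real built-ins), so treat it as one layer at most.

```python
from collections.abc import MutableMapping


class TrackingNamespace(MutableMapping):
    """Dictionary-like namespace that records every lookup, store and delete."""

    def __init__(self):
        self._data = {}
        self.log = []  # list of (operation, name) tuples

    def __getitem__(self, key):
        self.log.append(("get", key))
        return self._data[key]  # KeyError makes exec fall back to globals/builtins

    def __setitem__(self, key, value):
        self.log.append(("set", key))
        self._data[key] = value

    def __delitem__(self, key):
        self.log.append(("del", key))
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


# Assumption: only these built-ins are exposed. There is no open and no
# __import__, so file access and import statements fail inside exec.
SAFE_BUILTINS = {"len": len, "range": range, "sum": sum}


def run_restricted(source):
    """Execute source with reduced built-ins and a tracked local namespace."""
    namespace = TrackingNamespace()
    restricted_globals = {"__builtins__": SAFE_BUILTINS}
    exec(source, restricted_globals, namespace)
    return namespace
```

With this sketch, a submission that calls open() fails with NameError and one that runs "import os" fails with ImportError, while namespace.log shows which names the code touched.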
Moreover, you can actually trace that code (by calling sys.settrace() inside the restricted environment before any other code is executed) so that you can break execution if something goes wrong. If none of these solutions is acceptable, use OS-level sandboxing such as chroot or unionfs, together with the standard multiprocessing Python module, to spawn a code worker in a separate secured process.
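As a rough illustration of the tracing idea, the sketch below installs a trace function with sys.settrace() and aborts once the code has executed more than a fixed number of line events. BudgetExceeded, run_with_line_budget and the default budget are hypothetical choices for this example. Code that catches the exception itself can keep running, so this is a guard against accidental runaway loops rather than a defense against a determined attacker.

```python
import sys


class BudgetExceeded(Exception):
    """Raised by the tracer when the traced code runs too many lines."""


def run_with_line_budget(source, max_lines=10_000):
    """Execute source, aborting once more than max_lines line events fire."""
    executed = 0

    def tracer(frame, event, arg):
        nonlocal executed
        if event == "line":
            executed += 1
            if executed > max_lines:
                raise BudgetExceeded(f"more than {max_lines} lines executed")
        return tracer  # keep tracing inside this frame and nested calls

    code = compile(source, "<untrusted>", "exec")
    namespace = {}
    previous = sys.gettrace()
    sys.settrace(tracer)  # install the tracer before the untrusted code starts
    try:
        exec(code, namespace)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active
    return namespace
```

A submission such as "while True: pass" then ends with BudgetExceeded instead of spinning forever, while short scripts run to completion.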
You can check pysandbox which does just that, though the VM route is probably safer if you can afford it.
It's impossible to provide an absolute solution for this because the definition of 'bad' is pretty hard to nail down.
Is opening and writing to a file bad or good? What if that file is /dev/ram?
You can profile signatures of behavior, or you can try to block anything that might be bad, but you'll never win. JavaScript is a pretty good example of this: people run arbitrary JavaScript code on their computers all the time. It's supposed to be sandboxed, but all sorts of security problems and edge conditions crop up.
I'm not saying don't try; you'll learn a lot from the process.
Many companies have spent millions (Intel just spent billions on McAfee) trying to understand how to detect 'bad code', and every day machines running McAfee anti-virus get infected with viruses. Python code isn't any less dangerous than C. You can run system calls, bind to C libraries, etc.
I would seriously consider virtualizing the environment to run this stuff, so that exploits in whatever mechanism you implement can be firewalled one more time by the configuration of the virtual machine.
The number of users and the kind of code you expect to test/run would have considerable influence on the choices, btw. If submissions aren't expected to link to files or databases or to run computationally intensive tasks, and the load is very low, you could be almost fine by just preventing file access entirely and imposing a time limit on the process, after which it gets killed and the submission is flagged as too expensive or malicious.
If the code you're supposed to test might be any arbitrary Django extension or page, then you're probably in for a lot of work.
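The time-limit part of that low-pressure setup can be sketched with the standard subprocess module: each submission runs in its own interpreter process, which is killed and flagged when it exceeds the limit. run_submission, the result dictionary and the 2-second default are assumptions for illustration. This sketch does not block file access by itself; that would have to come from the OS-level or VM configuration around the worker.

```python
import subprocess
import sys


def run_submission(source, time_limit=2.0):
    """Run source in a separate interpreter; flag it if it exceeds time_limit."""
    try:
        completed = subprocess.run(
            [sys.executable, "-I", "-c", source],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=time_limit,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return {"status": "flagged", "reason": "time limit exceeded", "stdout": ""}

    status = "ok" if completed.returncode == 0 else "error"
    return {"status": status, "reason": completed.stderr, "stdout": completed.stdout}
```

Because the submission lives in its own process, a crash or an endless loop in it cannot take the evaluating server down with it.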
You can try a generic sandbox such as Sydbox or Gentoo's sandbox. They are not Python-specific.
Both can be configured to restrict read/write to some directories. Sydbox can even sandbox sockets.
I think a fix like this is going to be really hard and it reminds me of a lecture I attended about the benefits of programming in a virtual environment.
If you're doing it virtually, it's cool if they bugger it. It won't solve a while True: pass, but rm -rf / won't matter.
Unless I'm mistaken (and I very well might be), this is much of the reason behind the way Google changed Python for the App Engine. You run Python code on their server, but they've removed the ability to write to files. All data is saved in the "nosql" database.
It's not a direct answer to your question, but an example of how this problem has been dealt with in some circumstances.