How do sites like codepad.org and ideone.com sandbox your program?
I need to compile and run user-submitted scripts on my site, similar to what codepad and ideone do. How can I sandbox these programs so that malicious users don't take down my server?
Specifically, I want to lock them inside an empty directory and prevent them from reading or writing anywhere outside of that, from consuming too much memory or CPU, or from doing anything else malicious.
I will need to communicate with these programs via pipes (over stdin/stdout) from outside the sandbox.
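For reference, the stdin/stdout plumbing on its own is the easy part. The sketch below assumes Python and uses only the standard subprocess and tempfile modules; run_with_pipes is a hypothetical name. It starts the program in a fresh empty directory, feeds it stdin, collects stdout and applies a wall-clock timeout. It is not a sandbox: cwd only sets the starting directory and does nothing to stop the program from opening files elsewhere.

```python
import subprocess
import sys
import tempfile


def run_with_pipes(argv, stdin_text, timeout_s=5.0):
    """Run argv in a fresh empty directory, talking to it over stdin/stdout.

    NOT a sandbox: cwd only sets the starting directory; the program can
    still open any path its user is allowed to read or write.
    """
    with tempfile.TemporaryDirectory(prefix="sandbox-") as workdir:
        result = subprocess.run(
            argv,
            input=stdin_text,      # written to the child's stdin pipe, then closed
            capture_output=True,   # child's stdout/stderr are read back over pipes
            text=True,             # str in, str out
            cwd=workdir,           # start in the empty scratch directory
            timeout=timeout_s,     # wall-clock limit: child killed, TimeoutExpired raised
        )
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    code, out, err = run_with_pipes(
        [sys.executable, "-c", "print(input().upper())"], "hello\n")
    print(code, repr(out))  # prints: 0 'HELLO\n'
```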
codepad.org has something based on geordi, which runs everything in a chroot (i.e. restricted to a subtree of the filesystem) with resource restrictions, and uses the ptrace API to restrict the untrusted program's use of system calls. See http://codepad.org/about .
I've previously used Systrace, another utility for restricting system calls.
If the policy is set up properly, the untrusted program would be prevented from breaking anything in the sandbox or accessing anything it shouldn't, so there might be no need to put programs in separate chroots and create and delete them for each run. That said, separate chroots would provide another layer of protection, which probably wouldn't hurt.
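To make "policy" concrete, here is a toy model of the decision a ptrace- or Systrace-style supervisor makes for each intercepted system call: default-deny, plus a path check on file opens. It is a Python sketch under stated assumptions: the allow-list, the /sandbox root and the allow() helper are hypothetical, it only models the policy and enforces nothing by itself, and a real supervisor would also have to handle symlinks and openat() paths that are relative to a directory descriptor.

```python
import os

# Hypothetical default-deny policy: only these system calls are let through.
ALLOWED_SYSCALLS = {
    "read", "write", "close", "fstat", "brk", "mmap", "munmap",
    "exit_group", "open", "openat",
}
# System calls whose path argument must stay inside the sandbox root.
PATH_CHECKED = {"open", "openat"}


def allow(syscall, path=None, root="/sandbox"):
    """Model of a tracer's per-call decision; it enforces nothing by itself."""
    if syscall not in ALLOWED_SYSCALLS:
        return False  # default deny: socket, fork, execve, ...
    if syscall in PATH_CHECKED:
        if path is None:
            return False
        # Collapse "." and ".." lexically; symlinks are NOT resolved here.
        full = os.path.normpath(os.path.join(root, path))
        return os.path.commonpath([full, root]) == root
    return True


if __name__ == "__main__":
    print(allow("read"))                   # True
    print(allow("socket"))                 # False
    print(allow("open", "out.txt"))        # True
    print(allow("open", "../etc/passwd"))  # False
```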
Some time ago I was searching for a sandbox solution to use in an automated assignment evaluation system for CS students. Much like everything else, there is a trade-off between the various properties:
I eventually decided on a multi-tiered architecture, based on Linux:
Level 0 - Virtualization:
By using one or more virtual machine snapshots for all assignments within a specific time range, it was possible to gain several advantages:
Clear separation of sensitive from non-sensitive data.
At the end of the period (e.g. once per day or after each session) the VM is shut down and restarted from the snapshot, thus removing any remnants of malicious or rogue code.
A first level of computer resource isolation: each VM has limited disk, CPU and memory resources and the host machine is not directly accessible.
Straightforward network filtering: By having the VM on an internal interface, the firewall on the host can selectively filter the network connections.
For example, a VM intended for testing students of an introductory programming course could have all incoming and outgoing connections blocked, since students at that level would not have network programming assignments. At higher levels the corresponding VMs could e.g. have all outgoing connections blocked and allow incoming connections only from within the faculty.
It would also make sense to have a separate VM for the Web-based submission system - one that could upload files to the evaluation VMs, but do little else.
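A sketch of the per-VM filtering described above, assuming the VMs sit behind a host-internal bridge so that their traffic is routed through the host's iptables FORWARD chain. The interface name virbr1 and the faculty subnet 10.10.0.0/16 are hypothetical, and the Python helper only builds the iptables command lines rather than running them. Traffic between a VM and the host itself goes through INPUT/OUTPUT rather than FORWARD, and if other tooling already keeps rules in FORWARD, ordering matters (you may need -I rather than -A).

```python
def vm_firewall_rules(iface, allow_incoming_from=None):
    """Build iptables argv lists for a VM attached to host interface `iface`.

    allow_incoming_from=None -> drop everything to and from the VM
                                (e.g. introductory courses)
    allow_incoming_from=CIDR -> drop VM-initiated traffic, but accept
                                connections to the VM from CIDR
    """
    rules = []
    if allow_incoming_from is not None:
        # Let the VM answer connections that were accepted by the next rule.
        rules.append(["iptables", "-A", "FORWARD", "-i", iface,
                      "-m", "conntrack", "--ctstate", "ESTABLISHED,RELATED",
                      "-j", "ACCEPT"])
        # Accept traffic towards the VM only from the allowed subnet.
        rules.append(["iptables", "-A", "FORWARD", "-o", iface,
                      "-s", allow_incoming_from, "-j", "ACCEPT"])
    # Anything else leaving (-i) or entering (-o) the VM is dropped.
    rules.append(["iptables", "-A", "FORWARD", "-i", iface, "-j", "DROP"])
    rules.append(["iptables", "-A", "FORWARD", "-o", iface, "-j", "DROP"])
    return rules


if __name__ == "__main__":
    # Hypothetical bridge name and faculty subnet.
    for argv in vm_firewall_rules("virbr1", allow_incoming_from="10.10.0.0/16"):
        print(" ".join(argv))  # or subprocess.run(argv, check=True) as root
```

Because the conntrack rule only accepts ESTABLISHED/RELATED packets from the VM, any connection the VM tries to open itself starts with a NEW packet and falls through to the DROP rule.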
Level 1 - Basic operating-system constraints:
On a Unix OS that would contain the traditional access and resource control mechanisms:
Each sandboxed program could be executed as a separate user, perhaps in a separate chroot jail.
Strict user permissions, possibly with ACLs.
ulimit resource limits on processor time and memory usage.
Execution under nice to reduce its priority relative to more critical processes. On Linux you could also use ionice and cpulimit; I am not sure what equivalents exist on other systems.
Disk quotas.
Per-user connection filtering.
You would probably want to run the compiler as a slightly more privileged user: more memory and CPU time, access to compiler tools and header files, etc.
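The ulimit and nice items correspond to setrlimit() and nice() calls made in the child between fork() and exec(). A minimal Python sketch using the standard resource module and subprocess's preexec_fn hook; the limit values and the run_limited name are assumptions, not recommendations. Switching to a dedicated user only works if the parent runs as root (chroot is left out for the same reason), and ionice, cpulimit, disk quotas and connection filtering are not covered.

```python
import os
import resource
import subprocess
import sys


def _child_limits(cpu_seconds, mem_bytes, file_bytes, niceness, uid, gid):
    """Return a preexec_fn: it runs in the child after fork(), before exec()."""
    def apply():
        # CPU time: SIGXCPU at the soft limit, SIGKILL at the hard limit.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds + 1))
        # Virtual address space: big allocations fail instead of eating RAM.
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
        # Largest file the program may create or grow (pipes are not affected).
        resource.setrlimit(resource.RLIMIT_FSIZE, (file_bytes, file_bytes))
        # No core dumps.
        resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
        # Same effect as starting the program under `nice`.
        os.nice(niceness)
        # Optionally switch to a dedicated unprivileged group/user (needs root).
        if gid is not None:
            os.setgroups([])
            os.setgid(gid)
        if uid is not None:
            os.setuid(uid)
    return apply


def run_limited(argv, stdin_text="", cpu_seconds=2, mem_bytes=512 * 2**20,
                file_bytes=2**20, niceness=10, uid=None, gid=None,
                wall_timeout=10):
    """Run argv with rlimits and lower priority; returns CompletedProcess."""
    return subprocess.run(
        argv, input=stdin_text, capture_output=True, text=True,
        timeout=wall_timeout,  # catches programs that sleep or block
        preexec_fn=_child_limits(cpu_seconds, mem_bytes, file_bytes,
                                 niceness, uid, gid),
    )


if __name__ == "__main__":
    r = run_limited([sys.executable, "-c", "while True: pass"], cpu_seconds=1)
    print(r.returncode)  # negative: the child was killed by a signal
```

RLIMIT_CPU only counts CPU time, so a program that just sleeps or blocks on input is caught by the wall-clock timeout instead. RLIMIT_AS caps virtual address space, so runtimes that reserve large regions up front (e.g. the JVM) may need a higher value. Python's documentation also warns that preexec_fn is not safe to use when the parent process has threads.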
Level 2 - Advanced operating-system constraints:
On Linux I consider that to be the use of a Linux Security Module, such as AppArmor or SELinux to limit access to specific files and/or system calls. Some Linux distributions offer some sandboxing security profiles, but it can still be a long and painful process to get something like this working correctly.
Level 3 - User-space sandboxing solutions:
I have successfully used Systrace in a small scale, as mentioned in this older answer of mine. There are several other sandboxing solutions for Linux, such as libsandbox. Such solutions may provide more fine-grained control over the system calls that may be used than LSM-based alternatives, but can have a measurable impact on performance.
Level 4 - Preemptive strikes:
Since you will be compiling the code yourself, rather than executing existing binaries, you have a few additional tools in your hands:
Restrictions based on code metrics; e.g. a simple "Hello World" program should never be larger than 20-30 lines of code.
Selective access to system libraries and header files; if you don't want your users to call connect(), you might just restrict access to socket.h.
Static code analysis; disallow assembly code, "weird" string literals (e.g. shell-code) and the use of restricted system functions.
A competent programmer might be able to get around such measures, but as the cost-to-benefit ratio increases they would be far less likely to persist.
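A rough Python sketch of such pre-compilation checks on submitted C source. The line cap, the header and function deny-lists, and the "eight or more \xNN escapes in a row" shell-code heuristic are all hypothetical choices. Regular expressions over raw text will produce both false positives (a banned name inside a comment) and false negatives, so this only raises the cost of an attack, in line with the remark above.

```python
import re

MAX_LINES = 30  # hypothetical cap for a "Hello World"-sized exercise

# Hypothetical deny-lists; adjust per assignment.
BANNED_HEADERS = {"sys/socket.h", "netinet/in.h", "sys/ptrace.h"}
BANNED_CALLS = {"socket", "connect", "fork", "execve", "system", "syscall"}

INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)
ASM_RE = re.compile(r'\b(?:__asm__|asm)\b')
CALL_RE = re.compile(r'\b([A-Za-z_]\w*)\s*\(')
# Eight or more \xNN escapes in a row: looks like embedded machine code.
SHELLCODE_RE = re.compile(r'(?:\\x[0-9A-Fa-f]{2}){8,}')


def check_c_source(src):
    """Return a list of problems found in C source text (empty list = passed)."""
    problems = []
    lines = [line for line in src.splitlines() if line.strip()]
    if len(lines) > MAX_LINES:
        problems.append(f"too long: {len(lines)} non-blank lines")
    for header in INCLUDE_RE.findall(src):
        if header in BANNED_HEADERS:
            problems.append(f"banned header: {header}")
    if ASM_RE.search(src):
        problems.append("inline assembly")
    if SHELLCODE_RE.search(src):
        problems.append("suspicious hex-escaped string literal")
    for name in sorted(set(CALL_RE.findall(src)) & BANNED_CALLS):
        problems.append(f"banned call: {name}()")
    return problems


if __name__ == "__main__":
    src = '#include <sys/socket.h>\nint main(void) { return connect(0, 0, 0); }\n'
    print(check_c_source(src))
    # ['banned header: sys/socket.h', 'banned call: connect()']
```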
Level 0-5 - Monitoring and logging:
You should be monitoring the performance of your system and logging all failed attempts. Not only would you be more likely to interrupt an in-progress attack at a system level, but you might be able to make use of administrative means to protect your system, such as:
calling whatever security officials are in charge of such issues.
finding that persistent little hacker of yours and offering them a job.
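One simple way to feed such a log is to classify how each sandboxed run ended and record anything suspicious. A minimal sketch with Python's standard logging and signal modules; the logger name and the log_outcome helper are hypothetical, and treating every non-zero exit or kill signal as worth a warning is an assumption. Denied system calls reported by a tracer could be logged the same way.

```python
import logging
import signal

log = logging.getLogger("sandbox")  # hypothetical logger name


def log_outcome(job_id, returncode):
    """Log how a sandboxed run ended; return a short verdict string.

    Uses subprocess conventions: a negative returncode means the child
    was killed by that signal number.
    """
    if returncode == 0:
        log.info("job %s: ok", job_id)
        return "ok"
    if returncode < 0:
        try:
            verdict = f"killed by {signal.Signals(-returncode).name}"
        except ValueError:  # signal number unknown to this platform
            verdict = f"killed by signal {-returncode}"
    else:
        verdict = f"exited with status {returncode}"
    # Anything other than a clean exit is kept for later review.
    log.warning("job %s: %s", job_id, verdict)
    return verdict


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log_outcome("job-1", -signal.SIGXCPU)  # WARNING:sandbox:job job-1: killed by SIGXCPU
```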
The degree of protection that you need and the resources that you are willing to expend to set it up are up to you.
I am the developer of libsandbox mentioned by @thkala, and I do recommend it for use in your project.
Some additional comments on @thkala's answer:
Restricting access to header files such as socket.h cannot stop system functions like connect() from being called. This is because user code can (1) declare function prototypes by itself without including system headers, or (2) invoke the underlying kernel-land system calls directly, without touching the wrapper functions in libc.