How do sites like codepad.org and ideone.com sandbox your program?

Posted 2024-09-19 00:27:54, 304 words, 11 views, 0 comments

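I need to compile and run user-submitted scripts on my site, similar to what codepad and ideone do. How can I sandbox these programs so that malicious users cannot take down my server?

Specifically, I want to lock them inside an empty directory and prevent them from reading or writing anywhere outside it, from consuming too much memory or CPU, and from doing anything else malicious.

I will need to communicate with these programs via pipes (over stdin/stdout) from outside the sandbox.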

一桥轻雨一伞开 2024-09-26 00:27:54

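codepad.org has something based on geordi, which runs everything in a chroot (i.e. restricted to a subtree of the filesystem) with resource restrictions, and uses the ptrace API to restrict the untrusted program's use of system calls. See http://codepad.org/about .

I have previously used Systrace, another utility for restricting system calls.

If the policy is set up properly, the untrusted program is prevented from breaking anything in the sandbox or accessing anything it shouldn't, so there may be no need to put programs in separate chroots and to create and delete them for each run. That said, doing so would add another layer of protection, which probably wouldn't hurt.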

滥情空心 2024-09-26 00:27:54

Some time ago I was searching for a sandbox solution to use in an automated assignment evaluation system for CS students. Much like everything else, there is a trade-off between the various properties:

Isolation and access control granularity
Performance and ease of installation/configuration

I eventually decided on a multi-tiered architecture, based on Linux:

Level 0 - Virtualization:
By using one or more virtual machine snapshots for all assignments within a specific time range, it was possible to gain several advantages:
- Clear separation of sensitive from non-sensitive data.
- At the end of the period (e.g. once per day or after each session) the VM is shut down and restarted from the snapshot, thus removing any remnants of malicious or rogue code.
- A first level of computer resource isolation: each VM has limited disk, CPU and memory resources and the host machine is not directly accessible.
- Straightforward network filtering: By having the VM on an internal interface, the firewall on the host can selectively filter the network connections.
  For example, a VM intended for testing students of an introductory programming course could have all incoming and outgoing connections blocked, since students at that level would not have network programming assignments. At higher levels the corresponding VMs could e.g. have all outgoing connections blocked and allow incoming connections only from within the faculty.
It would also make sense to have a separate VM for the Web-based submission system - one that could upload files to the evaluation VMs, but do little else.
Level 1 - Basic operating-system constraints:
On a Unix OS that would contain the traditional access and resource control mechanisms:
- Each sandboxed program could be executed as a separate user, perhaps in a separate chroot jail.
- Strict user permissions, possibly with ACLs.
- ulimit resource limits on processor time and memory usage.
- Execution under nice to lower the program's priority relative to more critical processes. On Linux you could also use ionice and cpulimit - I am not sure what equivalents exist on other systems.
- Disk quotas.
- Per-user connection filtering.
You would probably want to run the compiler as a slightly more privileged user: more memory and CPU time, access to compiler tools and header files, etc.
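The ulimit, nice and pipe-based communication ideas from Level 1 can be sketched in Python as follows. This is only a minimal illustration, assuming a Linux host; the function names `limit_resources` and `run_limited` and the default limits are hypothetical, and it does not cover the separate user, chroot, quota or network filtering items.

```python
import os
import resource
import subprocess
import sys


def limit_resources(cpu_seconds, memory_bytes):
    """Build a hook that applies limits in the child just before exec."""
    def apply():
        # Cap CPU time; the kernel signals the process once it is exceeded.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        # Cap the address space, which bounds how much memory can be allocated.
        resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))
        # Lower the scheduling priority, like the `nice` command does.
        os.nice(10)
    return apply


def run_limited(argv, stdin_data, cpu_seconds=2,
                memory_bytes=512 * 1024 * 1024, wall_seconds=10):
    """Run argv under limits, feed stdin_data through a pipe.

    Returns (exit_code, stdout_bytes). A negative exit code means the
    child was killed by a signal. Raises subprocess.TimeoutExpired if the
    wall-clock deadline passes first.
    """
    proc = subprocess.run(
        argv,
        input=stdin_data,              # sent to the child's stdin via a pipe
        stdout=subprocess.PIPE,        # captured from the child's stdout
        stderr=subprocess.DEVNULL,
        preexec_fn=limit_resources(cpu_seconds, memory_bytes),
        timeout=wall_seconds,
    )
    return proc.returncode, proc.stdout
```

For example, `run_limited([sys.executable, "-c", "print(input().upper())"], b"hello\n")` feeds one line to a child interpreter and reads its reply back. A real deployment would also drop to a dedicated unprivileged user before exec, which requires root and is left out here.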
Level 2 - Advanced operating-system constraints:
On Linux I consider that to be the use of a Linux Security Module, such as AppArmor or SELinux to limit access to specific files and/or system calls. Some Linux distributions offer some sandboxing security profiles, but it can still be a long and painful process to get something like this working correctly.
Level 3 - User-space sandboxing solutions:
I have successfully used Systrace on a small scale, as mentioned in this older answer of mine. There are several other sandboxing solutions for Linux, such as libsandbox. Such solutions may provide more fine-grained control over the system calls that may be used than LSM-based alternatives, but can have a measurable impact on performance.
Level 4 - Preemptive strikes:
Since you will be compiling the code yourself, rather than executing existing binaries, you have a few additional tools in your hands:
- Restrictions based on code metrics; e.g. a simple "Hello World" program should never be larger than 20-30 lines of code.
- Selective access to system libraries and header files; if you don't want your users to call connect() you might just restrict access to socket.h.
- Static code analysis; disallow assembly code, "weird" string literals (e.g. shellcode) and the use of restricted system functions.
A competent programmer might be able to get around such measures, but as the cost-to-benefit ratio increases they would be far less likely to persist.
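A rough sketch of such a pre-compilation check is shown below. Everything in it is an assumption made for illustration: the 30-line budget, the list of banned calls, the regular expressions and the `precheck` name. It is a heuristic filter that raises the cost of an attack and is easy to evade, so it complements rather than replaces the runtime levels.

```python
import re

MAX_LINES = 30  # assumed budget for a "Hello World"-sized assignment
BANNED_CALLS = ("system", "fork", "execve", "socket", "connect")  # illustrative

# Inline assembly keywords used by common C/C++ compilers.
ASM_PATTERN = re.compile(r"\b(asm|__asm__|__asm)\b")
# A banned name followed by an opening parenthesis, i.e. a call or declaration.
CALL_PATTERN = re.compile(r"\b(" + "|".join(BANNED_CALLS) + r")\s*\(")
# Long runs of \xNN escapes often indicate embedded machine code.
HEX_RUN_PATTERN = re.compile(r"(\\x[0-9a-fA-F]{2}){8,}")


def precheck(source):
    """Return a list of human-readable problems found in the C source text."""
    problems = []
    code_lines = [line for line in source.splitlines() if line.strip()]
    if len(code_lines) > MAX_LINES:
        problems.append("too many lines: %d" % len(code_lines))
    if ASM_PATTERN.search(source):
        problems.append("inline assembly")
    for match in CALL_PATTERN.finditer(source):
        problems.append("banned call: " + match.group(1))
    if HEX_RUN_PATTERN.search(source):
        problems.append("suspicious string literal")
    return problems
```

An empty list means the submission passed this filter. Any finding would typically reject the submission before the compiler is even started.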
Levels 0-4 - Monitoring and logging:
You should be monitoring the performance of your system and logging all failed attempts. Not only would you be more likely to interrupt an in-progress attack at a system level, but you might be able to make use of administrative means to protect your system, such as:
- calling whatever security officials are in charge of such issues.
- finding that persistent little hacker of yours and offering them a job.

The degree of protection that you need and the resources that you are willing to expend to set it up are up to you.

德意的啸 2024-09-26 00:27:54

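I am the developer of libsandbox mentioned by @thkala, and I do recommend it for use in your project.

Some additional comments on @thkala's answer:

It is fair to classify libsandbox as a user-space tool, but libsandbox does integrate standard OS-level security mechanisms (i.e. chroot, setuid, and resource quotas);
Restricting access to C/C++ headers, or static analysis of user code, does NOT prevent system functions such as connect() from being called. This is because user code can (1) declare the function prototypes itself without including system headers, or (2) invoke the underlying kernel-level system calls directly, without touching the wrapper functions in libc;
Compile-time protection also deserves attention, because malicious C/C++ code can exhaust your CPU through infinite template recursion or preprocessor macro expansion.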
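One way to bound the compile step is to give the compiler a wall-clock deadline and kill its whole process group when the deadline passes. The sketch below assumes a POSIX host; `run_with_deadline` is a hypothetical helper name, and this is not how libsandbox itself works.

```python
import os
import signal
import subprocess
import sys


def run_with_deadline(argv, wall_seconds):
    """Run argv in its own process group and enforce a wall-clock deadline.

    Returns the exit code, or None if the deadline was exceeded and the
    process group was killed.
    """
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # new session, so the child leads its own group
    )
    try:
        return proc.wait(timeout=wall_seconds)
    except subprocess.TimeoutExpired:
        # Compiler drivers spawn helper processes; kill the whole group.
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()  # reap the child so it does not linger as a zombie
        return None
```

A caller might pass a compiler command such as `["g++", "-ftemplate-depth=64", "submission.cpp"]` (the file name is a placeholder), which also caps template instantiation depth in GCC and Clang. Combining the deadline with CPU and memory limits on the compiler process gives a second line of defence.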