Fine-grained sandboxing

Posted on 2024-08-07 10:08:27

Scenario: a program running in a bytecode virtual machine such as Java or Python wants to evaluate (by compiling on the fly to bytecode and then running) a function whose code was automatically generated or supplied from outside.

The tricky bit is that the function's code is not trusted: it may have been generated by a stochastic method such as genetic programming, or even supplied by an adversary. It must therefore be forced to behave as a pure function: it can return a value, but it must not have any side effects, i.e. it must not alter any of the program's existing data in any way.

Another tricky bit is that the function may have a legitimate need to call some of the program's existing functions. Some of these functions may have side effects, but those effects should be prevented from lasting whenever the functions are called by the suspect function.
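One way to picture this requirement is an undo journal: the program's state is kept behind a wrapper that records every overwritten value, and everything is rolled back once the suspect function returns. The following is only a minimal sketch under several assumptions. The names `JournaledMap`, `run_pure`, `bump_counter` and `suspect` are made up for illustration, the state is assumed to be a flat mapping, and in-place mutation of values stored in the mapping (for example appending to a list) is not tracked.

```python
from collections.abc import MutableMapping

_MISSING = object()  # sentinel: the key did not exist before the write


class JournaledMap(MutableMapping):
    """Hypothetical mapping that logs old values so writes can be undone."""

    def __init__(self, initial=None):
        self._data = dict(initial or {})
        self._journal = []  # list of (key, previous value or _MISSING)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __setitem__(self, key, value):
        self._journal.append((key, self._data.get(key, _MISSING)))
        self._data[key] = value

    def __delitem__(self, key):
        old = self._data[key]  # raises KeyError before anything is journaled
        self._journal.append((key, old))
        del self._data[key]

    def mark(self):
        """Return a position in the journal to roll back to later."""
        return len(self._journal)

    def rollback(self, mark):
        """Undo every write made since `mark`, newest first."""
        while len(self._journal) > mark:
            key, old = self._journal.pop()
            if old is _MISSING:
                del self._data[key]
            else:
                self._data[key] = old


def run_pure(func, state):
    """Call func(state), then discard any writes it made to state."""
    mark = state.mark()
    try:
        return func(state)
    finally:
        state.rollback(mark)


def bump_counter(state):
    """Stand-in for an existing program function with a side effect."""
    state["counter"] = state["counter"] + 1
    return state["counter"]


def suspect(state):
    """Stand-in for the untrusted function; it calls into the program."""
    bump_counter(state)
    state["scratch"] = [1, 2, 3]
    return bump_counter(state) * 10


state = JournaledMap({"counter": 0})
result = run_pure(suspect, state)
```

After the call, `result` is 20 while `state` again holds only `counter` equal to 0. The cost is proportional to the number of writes rather than to the size of the state, which matters if this runs billions of times. Note that this only contains writes that go through the wrapper; it does nothing to stop hostile Python code from reaching other objects, so it illustrates the rollback idea rather than a security boundary.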

Also, it is preferable that no constraint be placed on the coding style of the suspect function. For example, it is free to perform destructive updates on any data structures that it creates itself; only its overall effect is required to be purely functional.

Furthermore, it is preferable that the solution have reasonably low overhead, because this may need to be done billions of times; it would be better, for example, to avoid having to fork a whole new virtual machine for each such function.

This doesn't necessarily have to be doable in an existing virtual machine like Java or Python; if it is necessary to design a virtual machine around this use case, then so be it.

Are there any already known solutions (or non-solutions, i.e. things that are known not to work) for this problem?

Comments (3)

仅一夜美梦 2024-08-14 10:08:27

I, and many others, have previously constructed languages for genetic programming purposes. If constructing a new language is an option, then such solutions already exist. Since there are techniques for automatic function generation, it should be trivial to provide function libraries. Such a system will, in effect, constitute a sandboxed environment: any side effects of these functions will be limited to the space accessible to the generated program.
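To make the idea of a purpose-built language concrete, here is a minimal sketch of a tree-walking evaluator of the kind often used in genetic programming. It is an illustration, not the language this answer refers to. The nested-tuple program format, the `PRIMITIVES` table and the `evaluate` function are all assumptions. The point is that a program can only name operations the host has put in the table, so there is simply no syntax for touching anything else.

```python
import operator

# Whitelisted function library supplied by the host program (assumed).
PRIMITIVES = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "max": max,
}


def evaluate(expr, env):
    """Evaluate a nested-tuple expression against read-only variables.

    expr is a number, a variable name (str), or a tuple (op, arg, ...).
    """
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]  # variable lookup; programs cannot assign
    op, *args = expr
    if op == "if":
        cond, then_branch, else_branch = args
        chosen = then_branch if evaluate(cond, env) else else_branch
        return evaluate(chosen, env)
    func = PRIMITIVES[op]  # KeyError for anything not whitelisted
    return func(*(evaluate(arg, env) for arg in args))


# x*x + max(y, 3)
program = ("add", ("mul", "x", "x"), ("max", "y", 3))
value = evaluate(program, {"x": 4, "y": 1})
```

With `x = 4` and `y = 1`, `value` is 19. Because programs are finite trees with no loops or assignment, every evaluation terminates and is pure by construction, though a very deep tree could still exhaust Python's recursion limit. The trade-off is the one raised in the question: the suspect code loses the freedom to use an arbitrary coding style.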

走走停停 2024-08-14 10:08:27

I'd imagine sandboxing is your only option. Trying to analyze the program and determine whether it is safe is equivalent to the halting problem. The CLR has built-in security that allows restrictions like this, and I imagine Java has something similar. I don't think Python does.

溺孤伤于心 2024-08-14 10:08:27

Well, the general problem seems to be unsolvable: there isn't a terminating strategy for sorting inherently stateful computations from the possibly state-free. Unless the bytecode is specially constructed to provide the kind of strong type constraints required to overcome this, you'll be lost. Curien has written much about what kinds of things can and can't be inferred from black-box observations.

But if you are willing to demand more from your function provider, the question is begging for proof-carrying code (PCC) as an answer. I'm guessing you are aware of the work of Necula, who was particularly concerned with ensuring that assembly code respected constraints on memory usage, such as not tampering with out-of-scope state; you might not be aware of the work done on automatic inference of proofs in fairly common cases: it may be the case that PCC is an easier option than you think.
