Embedding a low-performance scripting language in Python

Posted on 2024-10-18 15:13:34

Comments (8)

紧拥背影 2024-10-25 15:13:34

Here is my take on this problem. Requiring that the user scripts run inside vanilla CPython means you either need to write an interpreter for your mini language, or compile it to Python bytecode (or use Python as your source language) and then "sanitize" the bytecode before executing it.

I've gone for a quick example based on the assumption that users can write
their scripts in Python, and that the source and bytecode can be sufficiently
sanitized through some combination of filtering unsafe syntax from the parse
tree and/or removing unsafe opcodes from the bytecode.

The second part of the solution requires that the user script bytecode be
periodically interrupted by a watchdog task which will ensure that the user
script does not exceed some opcode limit, and for all of this to run on vanilla CPython.

Summary of my attempt, which mostly focuses on the 2nd part of the problem.

  • User scripts are written in Python.
  • Use byteplay to filter and modify the bytecode.
  • Instrument the user's bytecode to insert an opcode counter and calls to a function which context switches to the watchdog task.
  • Use greenlet to execute the user's bytecode, with yields switching
    between the user's script and the watchdog coroutine.
  • The watchdog enforces a preset limit on the number of opcodes which can be
    executed before raising an error.
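
For readers who want to try the watchdog idea without byteplay or greenlet, here is a minimal sketch of a swapped-in technique: it uses the standard `sys.settrace` hook on Python 3 to count executed source lines and abort past a budget. It is not the approach of this answer, only similar in spirit. The names `run_with_line_budget` and `BudgetExceeded` are made up for illustration, and a script that catches the raised exception could evade the limit, so treat it as a demonstration rather than a sandbox.

```python
import sys


class BudgetExceeded(RuntimeError):
    """Raised when the traced script executes more lines than allowed."""


def run_with_line_budget(source, limit, env=None):
    """Execute `source`, aborting once more than `limit` lines have run.

    Returns the number of line events that were counted.
    """
    code = compile(source, "<user>", "exec")
    executed = 0

    def tracer(frame, event, arg):
        nonlocal executed
        # Only police frames that were compiled from the user's script.
        if frame.f_code.co_filename != "<user>":
            return None
        if event == "line":
            executed += 1
            if executed > limit:
                raise BudgetExceeded("script used too many resources")
        # Returning the tracer keeps line events flowing for this frame.
        return tracer

    namespace = {} if env is None else env
    sys.settrace(tracer)
    try:
        exec(code, namespace)
    finally:
        # Always remove the hook, even if the script failed.
        sys.settrace(None)
    return executed


if __name__ == "__main__":
    try:
        run_with_line_budget("n = 0\nwhile True:\n    n += 1\n", limit=500)
    except BudgetExceeded as exc:
        print("stopped:", exc)
```

Counting line events is coarser than counting opcodes, but it needs no bytecode rewriting.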

Hopefully this at least goes in the right direction. I'm interested to hear
more about your solution when you arrive at it.

Source code for lowperf.py:

# std
import ast
import dis
import sys
from pprint import pprint

# vendor
import byteplay
import greenlet

# bytecode snippet to increment our global opcode counter
INCREMENT = [
    (byteplay.LOAD_GLOBAL, '__op_counter'),
    (byteplay.LOAD_CONST, 1),
    (byteplay.INPLACE_ADD, None),
    (byteplay.STORE_GLOBAL, '__op_counter')
    ]

# bytecode snippet to perform a yield to our watchdog tasklet.
YIELD = [
    (byteplay.LOAD_GLOBAL, '__yield'),
    (byteplay.LOAD_GLOBAL, '__op_counter'),
    (byteplay.CALL_FUNCTION, 1),
    (byteplay.POP_TOP, None)
    ]

def instrument(orig):
    """
    Instrument bytecode.  We place a call to our yield function before
    jumps and returns.  You could choose alternate places depending on 
    your use case.
    """
    line_count = 0
    res = []
    for op, arg in orig.code:
        line_count += 1

        # NOTE: you could put an advanced bytecode filter here.

        # whenever a code block is loaded we must instrument it
        if op == byteplay.LOAD_CONST and isinstance(arg, byteplay.Code):
            code = instrument(arg)
            res.append((op, code))
            continue

        # 'setlineno' opcode is a safe place to increment our global 
        # opcode counter.
        if op == byteplay.SetLineno:
            res += INCREMENT
            line_count += 1

        # append the opcode and its argument
        res.append((op, arg))

        # if we're at a jump or return, or we've processed 10 lines of
        # source code, insert a call to our yield function.  you could 
        # choose other places to yield more appropriate for your app.
        if op in (byteplay.JUMP_ABSOLUTE, byteplay.RETURN_VALUE) \
                or line_count > 10:
            res += YIELD
            line_count = 0

    # finally, build and return new code object
    return byteplay.Code(res, orig.freevars, orig.args, orig.varargs,
        orig.varkwargs, orig.newlocals, orig.name, orig.filename,
        orig.firstlineno, orig.docstring)

def transform(path):
    """
    Transform the Python source into a form safe to execute and return
    the bytecode.
    """
    # NOTE: you could call ast.parse(data, path) here to get an
    # abstract syntax tree, then filter that tree down before compiling
    # it into bytecode.  i've skipped that step as it is pretty verbose.
    data = open(path, 'rb').read()
    suite = compile(data, path, 'exec')
    orig = byteplay.Code.from_code(suite)
    return instrument(orig)

def execute(path, limit = 40):
    """
    This transforms the user's source code into bytecode, instrumenting
    it, then kicks off the watchdog and user script tasklets.
    """
    code = transform(path)
    target = greenlet.greenlet(run_task)

    def watcher_task(op_count):
        """
        Task which is yielded to by the user script, making sure it doesn't
        use too many resources.
        """
        while 1:
            if op_count > limit:
                raise RuntimeError("script used too many resources")
            op_count = target.switch()

    watcher = greenlet.greenlet(watcher_task)
    target.switch(code, watcher.switch)

def run_task(code, yield_func):
    "This is the greenlet task which runs our user's script."
    globals_ = {'__yield': yield_func, '__op_counter': 0}
    eval(code.to_code(), globals_, globals_)

execute(sys.argv[1])

Here is a sample user script user.py:

def otherfunc(b):
    return b * 7

def myfunc(a):
    for i in range(0, 20):
        print i, otherfunc(i + a + 3)

myfunc(2)

Here is a sample run:

% python lowperf.py user.py

0 35
1 42
2 49
3 56
4 63
5 70
6 77
7 84
8 91
9 98
10 105
11 112
Traceback (most recent call last):
  File "lowperf.py", line 114, in <module>
    execute(sys.argv[1])
  File "lowperf.py", line 105, in execute
    target.switch(code, watcher.switch)
  File "lowperf.py", line 101, in watcher_task
    raise RuntimeError("script used too many resources")
RuntimeError: script used too many resources

寂寞美少年 2024-10-25 15:13:34

Jispy is the perfect fit!

  • It is a JavaScript interpreter in Python, built primarily for embedding JS in Python.

  • Notably, it provides checks and caps on recursion and looping. Just as is needed.

  • It easily allows you to make Python functions available to JavaScript code.

  • By default, it doesn't expose the host's file system or any other sensitive element.

Full Disclosure:

  • Jispy is my project. I am obviously biased toward it.
  • Nonetheless, here, it really does seem to be the perfect fit.

PS:

  • This answer is being written ~3 years after this question was asked.
  • The motivation behind such a late answer is simple:
    Given how closely Jispy fits the question at hand, future readers with similar requirements should be able to benefit from it.

风铃鹿 2024-10-25 15:13:34

Try Lua. The syntax you mentioned is almost identical to Lua's. See "How could I embed Lua into Python 3.x?"

病毒体 2024-10-25 15:13:34

I don't know of anything that really solves this problem yet.

I think the absolute simplest thing you could do would be to write your own version of the Python virtual machine in Python.

I've often thought of doing that in something like Cython so you could just import it as a module, and you could lean on the existing runtime for most of the hard bits.

You may already be able to generate a python-in-python interpreter with PyPy, but PyPy's output is a runtime that does EVERYTHING, including implementing the equivalent of the underlying PyObjects for built-in types and all that, and I think that's overkill for this kind of thing.

All you really need is something that works like a Frame in the execution stack, and then a method for each opcode. I don't think you even have to implement it yourself. You could just write a module that exposed the existing frame objects to the runtime.

Anyway, then you just maintain your own stack of frame objects and handle the bytecodes, and you can throttle it with bytecodes per second or whatever.
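
To make the "frame plus one method per opcode" idea concrete, here is a toy sketch. It does not interpret CPython bytecode: the five-instruction set, the `Frame` and `ToyVM` classes, and the `max_steps` budget are all invented for illustration, and it runs a single frame rather than a stack of them. The throttle here is a total step count rather than bytecodes per second.

```python
class StepLimitExceeded(RuntimeError):
    """Raised when a program runs more instructions than allowed."""


class Frame:
    """State of one running instruction sequence."""

    def __init__(self, program):
        self.program = program  # list of (opname, argument) tuples
        self.pc = 0             # index of the next instruction
        self.stack = []         # operand stack
        self.variables = {}     # named values


class ToyVM:
    """Dispatches each instruction to an op_<NAME> method and counts steps."""

    def __init__(self, max_steps):
        self.max_steps = max_steps
        self.steps = 0

    def run(self, program):
        frame = Frame(program)
        while frame.pc < len(frame.program):
            self.steps += 1
            if self.steps > self.max_steps:
                raise StepLimitExceeded("script used too many resources")
            opname, arg = frame.program[frame.pc]
            frame.pc += 1
            getattr(self, "op_" + opname)(frame, arg)
        return frame.variables

    def op_PUSH(self, frame, arg):
        frame.stack.append(arg)

    def op_LOAD(self, frame, arg):
        frame.stack.append(frame.variables[arg])

    def op_STORE(self, frame, arg):
        frame.variables[arg] = frame.stack.pop()

    def op_ADD(self, frame, arg):
        right = frame.stack.pop()
        left = frame.stack.pop()
        frame.stack.append(left + right)

    def op_JUMP(self, frame, arg):
        frame.pc = arg


if __name__ == "__main__":
    program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("STORE", "x")]
    print(ToyVM(max_steps=100).run(program))
```

Because the interpreter loop owns the program counter, the limit cannot be bypassed by anything the program itself does.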

黎夕旧梦 2024-10-25 15:13:34

I've used Python as a "mini config language" for an earlier project. My approach was to take the code, parse it using the parser module, then walk the AST of the parsed code and reject "unallowed" operations (e.g. defining classes, calling __ methods, etc.).

After doing this, I created a synthetic environment containing only the modules and variables that were "allowed", and evaluated the code within that to get something I could run.

It worked nicely for me. I don't know if it's bullet proof especially if you want to give your users more power than I did for a config language.
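
The parse, walk, and evaluate steps described above can be sketched as follows. This is an illustration under assumptions, not the original project's code: it uses the `ast` module (the `parser` module was removed in Python 3.10), and the deny-list, `UnsafeCode`, `check`, and `run_config` are hypothetical names. A filter this small is not a security boundary.

```python
import ast

# Node types rejected outright; a real filter would likely be an allow-list.
FORBIDDEN_NODES = (ast.Import, ast.ImportFrom, ast.ClassDef)


class UnsafeCode(ValueError):
    """Raised when the source uses a construct that is not allowed."""


def check(tree):
    """Walk the tree and reject forbidden nodes and double-underscore names."""
    for node in ast.walk(tree):
        if isinstance(node, FORBIDDEN_NODES):
            raise UnsafeCode("%s is not allowed" % type(node).__name__)
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            raise UnsafeCode("attribute %r is not allowed" % node.attr)
        if isinstance(node, ast.Name) and node.id.startswith("__"):
            raise UnsafeCode("name %r is not allowed" % node.id)


def run_config(source, allowed):
    """Run `source` with only the names in `allowed`; return the resulting names."""
    tree = ast.parse(source, "<config>", "exec")
    check(tree)
    # An empty __builtins__ hides open(), __import__() and the rest.
    env = {"__builtins__": {}}
    env.update(allowed)
    exec(compile(tree, "<config>", "exec"), env)
    env.pop("__builtins__", None)
    return env


if __name__ == "__main__":
    print(run_config("width = scale * 2", {"scale": 21})["width"])
```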

As for the time limit, you could run your program in a separate thread or process and terminate it after a fixed amount of time.
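
A minimal sketch of the separate-process variant, assuming Python 3 and the standard `subprocess` module. The helper name `run_with_timeout` is made up for illustration; `subprocess.run` kills the child and raises `TimeoutExpired` when the timeout elapses.

```python
import subprocess
import sys


def run_with_timeout(source, seconds):
    """Run `source` in a child interpreter.

    Returns the child's standard output, or None if it was killed for
    exceeding `seconds`.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=seconds,
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout


if __name__ == "__main__":
    print(run_with_timeout("print(6 * 7)", 30))
    print(run_with_timeout("while True: pass", 1))
```

A process can be killed reliably, whereas CPython offers no supported way to forcibly stop a thread, which is why the sketch uses a process.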

画离情绘悲伤 2024-10-25 15:13:34

Why not run Python code in pysandbox? http://pypi.python.org/pypi/pysandbox/1.0.3

千纸鹤 2024-10-25 15:13:34

Take a look at LimPy. It stands for Limited Python and was built for exactly this purpose.

There was an environment where users needed to write basic logic to control a user experience. I don't know how it'll interact with runtime limits, but I imagine you can do it if you're willing to write a little code.

ゃ懵逼小萝莉 2024-10-25 15:13:34

The simplest way to make a real DSL is ANTLR; it has syntax templates for some popular languages.
