How to detect I/O in Python source code (standard-library I/O)

Posted 2024-10-17 13:02:57

I'm building an optimizing compiler for a small subset of Python code for my final-year project. The first thing I'm doing is testing whether a variable is involved in, or leads to, I/O. If I were to statically trace a function call down the proverbial rabbit hole, how exactly would I know that it involves I/O? Would there be a call to a built-in Python function such as print or input, or to the built-in file object's read and write methods?

I don't have a lot of time for this project (only 6 months), so I'm completely ignoring people writing the I/O in C, wrapping it in some sort of Python object, and calling it from Python.

Is the generated bytecode indicative of whether there's I/O? Or is it as unhelpful as the AST?

No biggie if it can't be done; I'll just restrict the I/O subset for my project to print, input, read, and write. That, or do liveness analysis.
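A rough sketch of that restricted fallback, assuming the subset is exactly the print/input/open builtins plus any .read()/.write() method call (the names IO_BUILTINS, IO_METHODS and io_call_lines are made up for illustration). It works on the AST, matches call names purely syntactically, and deliberately over-approximates: a write() on an in-memory io.StringIO gets flagged too.

```python
import ast

# Hypothetical I/O subset for the fallback plan: three builtins plus
# any method called read or write, whatever object it belongs to.
IO_BUILTINS = {"print", "input", "open"}
IO_METHODS = {"read", "write"}

def io_call_lines(source):
    """Return the sorted line numbers of calls that fall in the I/O subset.

    Purely syntactic: it matches call names and never checks what a
    name is bound to at run time.
    """
    lines = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in IO_BUILTINS:
            lines.add(node.lineno)
        elif isinstance(func, ast.Attribute) and func.attr in IO_METHODS:
            lines.add(node.lineno)
    return sorted(lines)

example = """
def write_to_a_file(s):
    f = open('foo.txt', 'w')
    f.write(s)
    f.close()
"""

print(io_call_lines(example))  # [3, 4]
```

Because it only looks at names, it is conservative in one direction (it flags anything that looks like I/O) but blind to I/O reached through any other name.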

Thanks.

Reply by 手长情犹, 2024-10-24 13:02:57

It's not as simple as just looking at the bytecode, because a call is just a symbol lookup followed by a generic call instruction:

>>> def write_to_a_file(s):
    f = open('foo.txt', 'w')
    f.write(s)
    f.close()


>>> import dis
>>> dis.dis(write_to_a_file)
  2           0 LOAD_GLOBAL              0 (open)
              3 LOAD_CONST               1 ('foo.txt')
              6 LOAD_CONST               2 ('w')
              9 CALL_FUNCTION            2
             12 STORE_FAST               1 (f)

  3          15 LOAD_FAST                1 (f)
             18 LOAD_ATTR                1 (write)
             21 LOAD_FAST                0 (s)
             24 CALL_FUNCTION            1
             27 POP_TOP             

  4          28 LOAD_FAST                1 (f)
             31 LOAD_ATTR                2 (close)
             34 CALL_FUNCTION            0
             37 POP_TOP             
             38 LOAD_CONST               0 (None)
             41 RETURN_VALUE      

The bytecodes themselves are just loading things, calling things, and storing things. If you're operating at the bytecode level, you'll actually have to look at the operands, i.e. which names get loaded and called.
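The listings above are Python 2 output. On Python 3.4+ you can get at those operands programmatically with dis.get_instructions. This sketch (the helper name referenced_names is made up) collects the names that global and attribute loads refer to, which is the payload you would have to match against; local-variable loads (LOAD_FAST) are skipped. Opcode names vary between CPython versions, hence the small set of load opcodes.

```python
import dis

def write_to_a_file(s):
    f = open('foo.txt', 'w')
    f.write(s)
    f.close()

# Opcodes whose operand is a global or attribute name. LOAD_METHOD exists
# only in CPython 3.7-3.11; newer versions fold it into LOAD_ATTR.
NAME_OPS = {"LOAD_GLOBAL", "LOAD_NAME", "LOAD_ATTR", "LOAD_METHOD"}

def referenced_names(func):
    """Collect the names that global/attribute loads in func's bytecode refer to."""
    return {
        ins.argval
        for ins in dis.get_instructions(func)
        if ins.opname in NAME_OPS
    }

print(sorted(referenced_names(write_to_a_file)))  # ['close', 'open', 'write']
```

Note that "open" shows up only as a string operand; nothing about the instruction itself says it performs I/O.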

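Check out the current list of Python bytecodes (documented in the reference for the dis module) and you'll see that, apart from Python 2's PRINT_* opcodes for the print statement, there's really nothing there that distinguishes I/O calls.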
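To see that the opcodes carry no I/O information, compare two functions with the same shape, one that opens a file and one that doesn't (both made up for this sketch). Their opcode sequences come out identical; only the LOAD_GLOBAL operand differs.

```python
import dis

def opens_a_file(s):
    return open(s)   # real I/O

def measures_a_string(s):
    return len(s)    # no I/O at all

def opnames(func):
    """The opcode names in func's bytecode, with operands stripped away."""
    return [ins.opname for ins in dis.get_instructions(func)]

print(opnames(opens_a_file) == opnames(measures_a_string))  # True
print([ins.argval for ins in dis.get_instructions(opens_a_file)
       if ins.opname == "LOAD_GLOBAL"])                      # ['open']
```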

Even if you were to inspect every LOAD_GLOBAL or LOAD_FAST instruction and check the names against a whitelist of known I/O functions, that wouldn't necessarily work, because some modules provide I/O and the bytecode doesn't really help you there either:

>>> def uses_a_module_for_io(s):
    import shutil
    shutil.copy(s, 'foo.txt')


>>> dis.dis(uses_a_module_for_io)
  2           0 LOAD_CONST               1 (-1)
              3 LOAD_CONST               0 (None)
              6 IMPORT_NAME              0 (shutil)
              9 STORE_FAST               1 (shutil)

  3          12 LOAD_FAST                1 (shutil)
             15 LOAD_ATTR                1 (copy)
             18 LOAD_FAST                0 (s)
             21 LOAD_CONST               2 ('foo.txt')
             24 CALL_FUNCTION            2
             27 POP_TOP             
             28 LOAD_CONST               0 (None)
             31 RETURN_VALUE  

>>> def doesnt_use_shutil_really(s):
    shutil = object()
    shutil.copy = lambda x,y: None
    shutil.copy(s, 'foo.txt')


>>> dis.dis(doesnt_use_shutil_really)
  2           0 LOAD_GLOBAL              0 (object)
              3 CALL_FUNCTION            0
              6 STORE_FAST               1 (shutil)

  3           9 LOAD_CONST               1 (<code object <lambda> at 011D8AD0, file "<pyshell#29>", line 3>)
             12 MAKE_FUNCTION            0
             15 LOAD_FAST                1 (shutil)
             18 STORE_ATTR               1 (copy)

  4          21 LOAD_FAST                1 (shutil)
             24 LOAD_ATTR                1 (copy)
             27 LOAD_FAST                0 (s)
             30 LOAD_CONST               2 ('foo.txt')
             33 CALL_FUNCTION            2
             36 POP_TOP             
             37 LOAD_CONST               0 (None)
             40 RETURN_VALUE        

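Note that the shutil loaded by LOAD_FAST can be anything the user makes up. Here I just bound it to a generic object (strictly speaking, a bare object() instance rejects new attributes, so actually calling this function would raise AttributeError; dis never executes it, though), but the user could just as well have a different shutil on their path.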
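Telling those two functions apart means tracking how each name was bound. A minimal sketch on the AST, assuming a hypothetical KNOWN_IO table of dotted names: attribute calls are resolved only when the receiver was bound by an import statement, so the made-up shutil stays unresolved. It ignores scoping and later rebinding, and bare names such as open are trusted as-is.

```python
import ast

# Hypothetical table of I/O entry points, keyed by dotted name.
KNOWN_IO = {"open", "print", "input", "shutil.copy"}

def imported_names(tree):
    """Map each local name bound by an `import` statement to its module."""
    names = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname:                  # import os.path as p -> p: os.path
                    names[alias.asname] = alias.name
                else:                             # import os.path binds only "os"
                    top = alias.name.split(".")[0]
                    names[top] = top
    return names

def io_calls(source):
    """Return the dotted names of calls that resolve to an entry in KNOWN_IO.

    Attribute calls are resolved only when the receiver was bound by an
    import; anything else (e.g. `shutil = object()`) stays unresolved.
    """
    tree = ast.parse(source)
    imported = imported_names(tree)
    found = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name):
            dotted = func.id
        elif (isinstance(func, ast.Attribute)
              and isinstance(func.value, ast.Name)
              and func.value.id in imported):
            dotted = imported[func.value.id] + "." + func.attr
        else:
            continue
        if dotted in KNOWN_IO:
            found.append(dotted)
    return found

uses = """
def uses_a_module_for_io(s):
    import shutil
    shutil.copy(s, 'foo.txt')
"""

fake = """
def doesnt_use_shutil_really(s):
    shutil = object()
    shutil.copy = lambda x, y: None
    shutil.copy(s, 'foo.txt')
"""

print(io_calls(uses))   # ['shutil.copy']
print(io_calls(fake))   # []
```

This separates the two functions, but it still can't tell whether the shutil that actually gets imported is the standard-library one, which is exactly the "different shutil on their path" problem.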
