How to detect I/O in Python source code (standard-library ways of doing I/O)
I'm building an optimizing compiler for a small subset of Python for my final-year project. The first thing I'm doing is testing whether a variable is involved in, or leads to, I/O. If I were to statically trace a function call down the proverbial rabbit hole, how exactly would I know that it involves I/O? Would there be a call to a built-in Python function such as print or input, or to the built-in 'file' object's read and write methods?
I don't have a lot of time for this project (only 6 months), so I'm completely ignoring the case where people write the I/O in C, wrap it in some sort of Python object, and call it from Python.
Does the generated bytecode indicate whether there's I/O? Or is it as unhelpful as the AST?
No biggie if it's not doable; I'll just limit the I/O subset for my project to print, input, read, and write. That, or do liveness analysis.
Thanks.
Comments (1)
It's not as simple as just looking at the bytecode, because calls to things are just symbol lookups:
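As a rough illustration (a minimal sketch using the standard `dis` module; `writes_out` and `measures` are made-up functions, not anything from the question), a call that performs I/O and a call that doesn't compile to the same sequence of opcodes:

```python
import dis


def writes_out(x):
    # Hypothetical function that performs I/O.
    return print(x)


def measures(x):
    # Hypothetical function that performs no I/O.
    return len(x)


def opnames(func):
    """Return the opcode names of func, ignoring their arguments."""
    return [ins.opname for ins in dis.get_instructions(func)]


# Both functions look up a global name, load a local, call, and return,
# so the opcode sequences are compared without their arguments.
same_shape = opnames(writes_out) == opnames(measures)
print(same_shape)
```

The only difference between the two functions is the name being looked up, and that name is not resolved to an actual object until run time.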
The bytecodes themselves are just loading things, calling things, and storing things. If you're operating at the bytecode level, you'll actually have to look at the payload (the argument of each instruction, such as the name being loaded).
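A minimal sketch of what looking at the payload could mean in practice, assuming you only care about names loaded with `LOAD_GLOBAL` (`report` is a hypothetical function used for illustration):

```python
import dis


def report(path):
    # Hypothetical function mixing an I/O call with a pure one.
    print("size:", len(path))


# The opcode is LOAD_GLOBAL for both `print` and `len`; only the
# instruction's argument (the name being looked up) differs.
global_names = [
    ins.argval
    for ins in dis.get_instructions(report)
    if ins.opname == "LOAD_GLOBAL"
]
print(global_names)
```

Even then, the payload is only a name. Whether `print` still refers to the built-in when the code runs is not something the bytecode records.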
Check out the current list of Python bytecodes and you can see that there's really nothing there that distinguishes I/O calls.
Even if you were to inspect all `LOAD_GLOBAL` calls or `LOAD_FAST` calls and apply a whitelist, that wouldn't necessarily work, because there are modules that provide I/O, and the bytecode doesn't really help you there either.

Note that the `LOAD_FAST` for `shutil` can be something the user just makes up. In my case I just made it a generic object, but the user can have a different `shutil` on their path as well.
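A sketch of the whitelist idea and where it falls short, under stated assumptions: `IO_BUILTINS`, `flagged_names`, `says_hello`, and `copies` are all made up for illustration, and only `LOAD_GLOBAL` instructions are checked.

```python
import dis

# Hypothetical whitelist of built-in names treated as I/O.
IO_BUILTINS = {"print", "input", "open"}


def flagged_names(func):
    """Names loaded via LOAD_GLOBAL that appear in the whitelist."""
    return {
        ins.argval
        for ins in dis.get_instructions(func)
        if ins.opname == "LOAD_GLOBAL" and ins.argval in IO_BUILTINS
    }


def says_hello():
    print("hello")


def copies(shutil, src, dst):
    # `shutil` is only a parameter here: it may be the real module,
    # a user-written lookalike, or any object with a `copy` attribute.
    shutil.copy(src, dst)


print(flagged_names(says_hello))  # the whitelist catches `print`
print(flagged_names(copies))      # nothing is flagged for `copies`
```

The whitelist flags `says_hello` but reports nothing for `copies`, even though passing the real `shutil` module would copy a file. Extending the whitelist with module names would not settle it either, since a name like `shutil` says nothing about which object it is bound to.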