Python: Analyzing complex statements during execution
I am wondering if there is any way to get some meta information about the interpretation of a python statement during execution.
Let's assume this is a complex statement of some single statements joined with or (A, B, ... are boolean functions)
if A or B and ((C or D and E) or F) or G and H:
and I want to know which part of the statement is causing the statement to evaluate to True so I can do something with this knowledge. In the example, there would be 3 possible candidates:
A
B and ((C or D and E) or F)
G and H
And in the second case, I would like to know if it was (C or D and E) or F that evaluated to True, and so on...
Is there any way without parsing the statement? Can I hook up to the interpreter in some way or utilize the inspect module in a way that I haven't found yet? I do not want to debug, it's really about knowing which part of this or-chain triggered the statement at runtime.
Edit - further information: The type of application that I want to use this in is a categorizing algorithm that inputs an object and outputs a certain category for this object, based on its attributes. I need to know which attributes were decisive for the category.
As you might guess, the complex statement from above comes from the categorization algorithm. The code for this algorithm is generated from a formal pseudo-code and contains about 3,000 nested if-elif-statements that determine the category in a hierarchical way like
if obj.attr1 < 23 and (is_something(obj.attr10) or eats_spam_for_breakfast(obj)):
    return 'Category1'
elif obj.attr3 == 'Welcome Home' or count_something(obj) >= 2:
    return 'Category2a'
elif ...
So aside from the category itself, I need to flag the attributes that were decisive for that category, so that if I deleted all other attributes, the object would still be assigned to the same category (due to the "or" operators within the statements). The statements can be really long, up to 1,000 chars, and deeply nested. Every object can have up to 200 attributes.
Thanks a lot for your help!
Edit 2: Haven't found time in the last two weeks. Thanks for providing this solution, it works!
Comments (5)
Could you recode your original code:
as, say:
...? If so, there's hope!-). The Evaluator class, upon __call__, would compile its string argument, then eval the result with (an empty real dict for globals, and) a pseudo-dict for locals that actually delegates the value lookups to the locals and globals of its caller (just takes a little black magic, but, not too bad;-) and also takes note of what names it's looked up. Given the short-circuiting behavior of Python's and and or, you can infer from the actual set of names that were actually looked up, which one determined the truth value of the expression (or each subexpression) -- in an X or Y or Z, the first true value (if any) will be the last one looked up, and in an X and Y and Z, the first false one will.
Would this help? If yes, and if you need help with the coding, I'll be happy to expand on this, but first I'd like some confirmation that getting the code for Evaluator would indeed be solving whatever problem it is that you're trying to address!-)
Edit: so here's coding implementing Evaluator and exemplifying its use:
and here's output from a sample run:
You can see that often (about 50% of the time) A is true, which short-circuits everything. When A is false, B evaluates -- when B is also false, then G is next, when B is true, then C.
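For readers on Python 3, here is a minimal sketch of how an Evaluator along the lines described above could look. It is an illustration of the idea, not necessarily the code this answer originally showed: the names TracingNamespace and demo are hypothetical, fixed values stand in for the random ones of the sample run, and sys._getframe is a CPython-specific way of reaching the caller's frame.

```python
import sys


class TracingNamespace:
    """Pseudo-dict for eval's locals: resolves names in the caller's
    scopes and records every successful lookup in order."""

    def __init__(self, local_vars, global_vars):
        self.local_vars = local_vars
        self.global_vars = global_vars
        self.seen = []  # (name, value) pairs, in lookup order

    def __getitem__(self, name):
        if name in self.local_vars:
            value = self.local_vars[name]
        else:
            # A KeyError here lets eval fall back to its globals/builtins.
            value = self.global_vars[name]
        self.seen.append((name, value))
        return value


class Evaluator:
    def __call__(self, expr):
        caller = sys._getframe(1)  # frame of whoever called the Evaluator
        self.namespace = TracingNamespace(caller.f_locals, caller.f_globals)
        # Empty real dict for globals, tracing pseudo-dict for locals.
        return eval(compile(expr, "<expr>", "eval"), {}, self.namespace)


def demo():
    # Hypothetical fixed truth values, chosen so that the middle branch decides.
    A, B, C, D, E, F, G, H = False, True, False, True, True, False, True, True
    e = Evaluator()
    result = e("A or B and ((C or D and E) or F) or G and H")
    return result, [name for name, _ in e.namespace.seen]
```

Because of short-circuiting, names that never influenced the result (F, G and H with these values) are never looked up, so the recorded list ends at the operand that settled the expression.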
As far as I remember, Python's and and or do not return True or False per se; they return one of their operands:
The Python Standard Library - Truth Value Testing
Therefore, the following is valid:
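As a small illustration of that point (a sketch with made-up values, where the label strings are purely hypothetical), `or` hands back the first truthy operand, so each branch can carry its own label:

```python
# `or` returns the first truthy operand (or the last operand if none is truthy).
a, b, c = 0, "", "spam"
winner = a or b or c

# `and` returns the first falsy operand, or the last operand if all are truthy,
# so "X and 'label'" yields the label exactly when X is truthy.
A, B, G, H = False, True, True, True
decisive = (A and "A") or (B and "B") or (G and H and "G and H")
```

Here `winner` ends up as the string "spam" and `decisive` as "B", because B is the first branch that evaluates to a truthy value.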
The Python interpreter doesn't give you a way to introspect the evaluation of an expression at runtime. The sys.settrace() function lets you register a callback that is invoked for every line of source code, but that's too coarse-grained for what you want to do.
That said, I've experimented with a crazy hack to have the function invoked for every bytecode executed: Python bytecode tracing.
But even then, I don't know how to find the execution state, for example, the values on the interpreter stack.
I think the only way to get at what you want is to modify the code algorithmically. You could either transform your source (though you said you didn't want to parse the code), or you could transform the compiled bytecode. Neither is a simple undertaking, and I'm sure there are a dozen difficult hurdles to overcome if you try it.
Sorry to be discouraging...
BTW: What application do you have for this sort of technology?
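To give a feel for the source-transformation route mentioned above, here is a rough sketch using the standard ast module. It assumes Python 3.9+ (for ast.unparse), and the names RecordOperands, traced_eval and _rec are hypothetical helpers invented for this example. Every operand of an and/or is wrapped in a call that logs its source text and truth value before passing the value through unchanged.

```python
import ast


class RecordOperands(ast.NodeTransformer):
    """Wrap each operand of `and`/`or` in _rec(value, "source text")."""

    def visit_BoolOp(self, node):
        # Take the labels before nested operands get instrumented.
        labels = [ast.unparse(value) for value in node.values]
        self.generic_visit(node)  # instrument nested and/or expressions
        node.values = [
            ast.Call(
                func=ast.Name(id="_rec", ctx=ast.Load()),
                args=[value, ast.Constant(label)],
                keywords=[],
            )
            for value, label in zip(node.values, labels)
        ]
        return node


def traced_eval(expr, namespace):
    """Evaluate expr and return (result, log of evaluated operands)."""
    tree = RecordOperands().visit(ast.parse(expr, mode="eval"))
    ast.fix_missing_locations(tree)
    log = []

    def _rec(value, label):
        log.append((label, bool(value)))
        return value  # pass the operand through, so semantics are unchanged

    code = compile(tree, "<traced>", "eval")
    return eval(code, {**namespace, "_rec": _rec}), log


values = dict(A=False, B=True, C=False, D=True, E=True, F=False, G=True, H=True)
result, log = traced_eval("A or B and ((C or D and E) or F) or G and H", values)
```

Operands skipped by short-circuiting never reach `_rec`, so they are absent from `log`; the last true entry at each nesting level is the one that decided that sub-expression.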
I would just put something like this before the big statement (assuming the statement is in a class):
"""I do not want to debug, it's really about knowing which part of this or-chain triggered the statement at runtime.""": you might need to explain what the difference is between "debug" and "knowing which part".
Do you mean that you the observer need to be told at runtime what is going on (why??) so that you can do something different, or do you mean that the code needs to "know" so that it can do something different?
In any case, assuming that your A, B, C etc don't have side effects, why can't you simply split up your or-chain and test the components:
??
Update:
"""The difference between debug and 'knowing which part' is that I need to assign a flag for the variables that were used in the statement that first evaluated to True (at runtime)"""
So you are saying that given the condition "A or B", if A is True and B is True, A gets all the glory (or all the blame)? I'm finding it very hard to believe that categorisation software such as you describe is based on "or" having a short-circuit evaluation. Are you sure that there's an intent behind the code being "A or B" and not "B or A"? Could the order be random, or influenced by the order in which the variables were originally input?
In any case, generating Python code automatically and then reverse-engineering it appears to be a long way around the problem. Why not just generate code of the "part1 = yadda; part2 = blah; etc" nature?
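A sketch of what generated code of that nature might look like for the example condition from the question. The function name decisive_part is made up for illustration, and, as assumed above, A through H are taken to be side-effect-free values, since every part is evaluated eagerly here.

```python
def decisive_part(A, B, C, D, E, F, G, H):
    """Return the source text of the first top-level `or` branch that is true."""
    # Each top-level alternative of the or-chain gets its own name.
    part1 = A
    part2 = B and ((C or D and E) or F)
    part3 = G and H

    parts = (
        ("A", part1),
        ("B and ((C or D and E) or F)", part2),
        ("G and H", part3),
    )
    for label, value in parts:
        if value:
            return label
    return None  # the whole condition is false
```

For example, `decisive_part(False, True, False, True, True, False, True, True)` returns the label of the second branch. The same splitting can be applied one level down to tell (C or D and E) apart from F.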