Python: analyzing complex statements during execution

Posted 2024-08-12 11:25:29

I am wondering if there is any way to get some meta information about how a Python statement is evaluated during execution.

Let's assume this is a complex statement made up of simpler expressions joined with and/or (A, B, ... are boolean functions):

if A or B and ((C or D and E) or F) or G and H:

and I want to know which part of the statement is causing the statement to evaluate to True so I can do something with this knowledge. In the example, there would be 3 possible candidates:

A
B and ((C or D and E) or F)
G and H

And in the second case, I would like to know if it was (C or D and E) or F that evaluated to True and so on...

Is there any way to do this without parsing the statement? Can I hook into the interpreter in some way, or use the inspect module in a way that I haven't found yet? I do not want to debug; it's really about knowing which part of this or-chain triggered the statement at runtime.

Edit - further information: The type of application that I want to use this in is a categorizing algorithm that inputs an object and outputs a certain category for this object, based on its attributes. I need to know which attributes were decisive for the category.
As you might guess, the complex statement from above comes from the categorization algorithm. The code for this algorithm is generated from a formal pseudo-code and contains about 3,000 nested if-elif statements that determine the category in a hierarchical way, like:

if obj.attr1 < 23 and (is_something(obj.attr10) or eats_spam_for_breakfast(obj)):
    return 'Category1'
elif obj.attr3 == 'Welcome Home' or count_something(obj) >= 2:
    return 'Category2a'
elif ...

So aside from the category itself, I need to flag the attributes that were decisive for that category, so that if I deleted all other attributes, the object would still be assigned to the same category (due to the ors within the statements). The statements can be really long, up to 1,000 characters, and deeply nested. Every object can have up to 200 attributes.

Thanks a lot for your help!

Edit 2: Haven't found time in the last two weeks. Thanks for providing this solution, it works!

む无字情书 2024-08-19 11:25:29

Could you recode your original code:

if A or B and ((C or D and E) or F) or G and H:

as, say:

e = Evaluator()
if e('A or B and ((C or D and E) or F) or G and H'):

...? If so, there's hope! :-) The Evaluator class, upon __call__, would compile its string argument, then eval the result with an empty real dict for globals and a pseudo-dict for locals. That pseudo-dict delegates value lookups to the locals and globals of its caller (this takes a little black magic, but not too much ;-) and also records which names it looked up. Given the short-circuiting behavior of Python's and and or, you can infer from the names that were actually looked up which one determined the truth value of the expression (or of each subexpression): in X or Y or Z, the first true value (if any) will be the last one looked up, and in X and Y and Z, the first false one will be.

Would this help? If yes, and if you need help with the coding, I'll be happy to expand on this, but first I'd like some confirmation that getting the code for Evaluator would indeed be solving whatever problem it is that you're trying to address!-)

Edit: so here's code implementing Evaluator and illustrating its use:

import inspect
import random

class TracingDict(object):

  def __init__(self, loc, glob):
    self.loc = loc
    self.glob = glob
    self.vars = []

  def __getitem__(self, name):
    try: v = self.loc[name]
    except KeyError: v = self.glob[name]
    self.vars.append((name, v))
    return v


class Evaluator(object):

  def __init__(self):
    f = inspect.currentframe()
    f = inspect.getouterframes(f)[1][0]
    self.d = TracingDict(f.f_locals, f.f_globals)

  def __call__(self, expr):
    return eval(expr, {}, self.d)


def f(A, B, C, D, E):
  e = Evaluator()
  res = e('A or B and ((C or D and E) or F) or G and H')
  print('R=%r from %s' % (res, e.d.vars))

for x in range(20):
  A, B, C, D, E, F, G, H = [random.randrange(2) for x in range(8)]
  f(A, B, C, D, E)

and here's output from a sample run:

R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 0), ('B', 1), ('C', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 0), ('B', 0), ('G', 1), ('H', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 0), ('B', 1), ('C', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 0), ('B', 1), ('C', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=0 from [('A', 0), ('B', 0), ('G', 0)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=0 from [('A', 0), ('B', 0), ('G', 0)]
R=1 from [('A', 0), ('B', 1), ('C', 1)]

You can see that often (about 50% of the time) A is true, which short-circuits everything. When A is false, B is evaluated next; when B is also false, G comes next, and when B is true, C comes next.
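The TracingDict above records name lookups, whereas the question is about attributes of an object. A minimal sketch of the same idea applied to attribute reads follows, written for Python 3. The AttrTracer class and the sample attribute values are hypothetical and not part of the answer's code. Note that the record lists every attribute that was examined, including ones that were read and turned out false, so it is a superset of the decisive attributes.

```python
from types import SimpleNamespace


class AttrTracer:
    """Hypothetical wrapper: forwards attribute reads and records them in order."""

    def __init__(self, target):
        self._target = target
        self.accessed = []

    def __getattr__(self, name):
        # Only called for names not found on the wrapper itself,
        # i.e. for the attributes of the wrapped object.
        value = getattr(self._target, name)
        self.accessed.append((name, value))
        return value


# Sample object with made-up attribute values.
obj = SimpleNamespace(attr1=10, attr3="Welcome Home", attr10=0)
traced = AttrTracer(obj)

category = None
if traced.attr1 < 23 and (traced.attr10 or traced.attr3 == "Welcome Home"):
    category = "Category1"

# Short-circuiting means only the attributes that were actually read show up.
print(category, traced.accessed)
```

Because and/or short-circuit, attributes in branches that were never reached do not appear in traced.accessed at all.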


黒涩兲箜 2024-08-19 11:25:29

As far as I remember, Python's or and and do not return True or False per se:

Important exception: the Boolean operations or and and always return one of their operands.

(The Python Standard Library - Truth Value Testing)

Therefore, the following is valid:

A = 1
B = 0
result = B or A # result == 1
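To make the quoted rule concrete, here is a small Python 3 sketch of both points: or/and return one of their operands, and operands after the deciding one are never evaluated. The check helper and the calls list are made up for illustration.

```python
A = 1
B = 0
C = "spam"

# `or` returns the first truthy operand, or the last operand if none is truthy.
assert (B or A) == 1
assert (B or [] or C) == "spam"

# `and` returns the first falsy operand, or the last operand if all are truthy.
assert (A and C) == "spam"
assert (B and C) == 0

calls = []


def check(name, value):
    """Hypothetical helper that records that it was called."""
    calls.append(name)
    return value


# Evaluation stops at the first truthy operand, so "C" is never checked.
result = check("A", 0) or check("B", 1) or check("C", 1)
print(result, calls)
```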


彩扇题诗 2024-08-19 11:25:29

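The Python interpreter doesn't give you a way to introspect the evaluation of an expression at runtime. The sys.settrace() function lets you register a callback that is invoked for every line of source code, but that is too coarse-grained for what you want to do.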
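A rough Python 3 sketch of why line-level tracing is too coarse: the trace callback sees the same line events for a one-line boolean expression no matter which operand decided it. The decide and count_line_events functions are hypothetical names used only for this illustration.

```python
import sys


def decide(a, b, c):
    result = a or b and c  # the whole expression sits on one source line
    return result


def count_line_events(func, *args):
    """Count the 'line' trace events raised while func(*args) runs."""
    events = []

    def tracer(frame, event, arg):
        if frame.f_code is func.__code__ and event == "line":
            events.append(frame.f_lineno)
        return tracer  # keep tracing inside the new frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    return len(events)


# Decided by `a` in the first call and by `b and c` in the second,
# yet the tracer cannot tell the two apart.
print(count_line_events(decide, 1, 0, 0), count_line_events(decide, 0, 1, 1))
```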

That said, I've experimented with a crazy hack to have the function invoked for every bytecode executed: Python bytecode tracing.

But even then, I don't know how to get at the execution state, for example the values on the interpreter stack.

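I think the only way to get what you want is to modify the code algorithmically. You could either transform the source (though you said you didn't want to parse the code), or you could transform the compiled bytecode. Neither is a simple undertaking, and I'm sure there are a dozen difficult hurdles to overcome if you try it.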
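As an illustration of the source-level route, here is a minimal Python 3 sketch that parses a boolean expression with the standard ast module and evaluates it operand by operand, returning the leaf expressions that decided the result. The explain and _walk functions are hypothetical, ast.unparse requires Python 3.9 or later, and anything that is not an and/or node (including not) is simply treated as an opaque leaf. It is a sketch under those assumptions, not a complete solution to the hurdles mentioned above.

```python
import ast


def explain(expr, namespace):
    """Evaluate a boolean expression; return (value, decisive leaf sources)."""
    tree = ast.parse(expr, mode="eval")
    return _walk(tree.body, namespace)


def _walk(node, namespace):
    if isinstance(node, ast.BoolOp):
        is_or = isinstance(node.op, ast.Or)
        collected = []
        value = None
        for operand in node.values:
            value, leaves = _walk(operand, namespace)
            if bool(value) == is_or:
                # Short-circuit: this operand alone decides the result.
                return value, leaves
            collected.extend(leaves)
        # No short-circuit: every operand was needed for the result.
        return value, collected
    # Anything else (name, comparison, call, ...) is treated as a leaf.
    code = compile(ast.Expression(body=node), "<leaf>", "eval")
    return eval(code, {}, namespace), [ast.unparse(node)]


values = dict(A=0, B=1, C=0, D=1, E=1, F=0, G=0, H=1)
value, decisive = explain("A or B and ((C or D and E) or F) or G and H", values)
print(value, decisive)  # 1 ['B', 'D', 'E']
```

With these sample values, B, D and E are reported: A and C were evaluated but false, and F, G and H were never reached.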

Sorry to be discouraging...

BTW: what application do you have for this sort of technique?


偷得浮生 2024-08-19 11:25:29

I would just put something like this before the big statement (assuming the statement is in a class):

for i in ("A","B","C","D","E","F","G","H"):
    print(i, self.__dict__[i])


Bonjour°[大白 2024-08-19 11:25:29

"""I do not want to debug, it's really about knowing which part of this or-chain triggered the statement at runtime.""": you might need to explain the difference between "debug" and "knowing which part".

Do you mean that you, the observer, need to be told at runtime what is going on (why??) so that you can do something different, or do you mean that the code needs to "know" so that it can do something different?

In any case, assuming that your A, B, C etc don't have side effects, why can't you simply split up your or-chain and test the components:

part1 = A
part2 = B and ((C or D and E) or F)
part3 = G and H
whodunit = "1" if part1 else "2" if part2 else "3" if part3 else "nobody"
print("Perp is", whodunit)
if part1 or part2 or part3:
    do_something()

Update:

"""The difference between debug and 'knowing which part' is that I need to assign a flag for the variables that were used in the statement that first evaluated to True (at runtime)"""

So you are saying that, given the condition "A or B", if A is True and B is True, A gets all the glory (or all the blame)? I'm finding it very hard to believe that categorisation software such as you describe is based on "or" having short-circuit evaluation. Are you sure that there's an intent behind the code being "A or B" and not "B or A"? Could the order be random, or influenced by the order in which the variables were originally input?

In any case, generating Python code automatically and then reverse-engineering it appears to be a long way around the problem. Why not just generate code of the part1 = yadda; part2 = blah; etc. kind?
