Python equivalent of inline functions or macros

Posted on 2024-11-16 10:43:54

I just realized that doing

x.real*x.real+x.imag*x.imag

is three times faster than doing

abs(x)**2

where x is a numpy array of complex numbers. For code readability, I could define a function like

def abs2(x):
    return x.real*x.real+x.imag*x.imag

which is still far faster than abs(x)**2, but it comes at the cost of a function call. Is it possible to inline such a function, as I would do in C with a macro or the inline keyword?
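
As a quick sanity check of the premise, the sketch below confirms that the hand-written form gives the same values as abs(x)**2. The sample array is made up for illustration. A plausible reason for the speed difference is that abs has to compute a square root that the squaring then undoes, but that is an assumption and is not measured here.

```python
import numpy as np

def abs2(x):
    # Squared magnitude without taking a square root first.
    return x.real * x.real + x.imag * x.imag

# Small, made-up sample of complex values.
x = np.array([3 + 4j, 1 - 1j, -2j, 0.5 + 0j])

same = np.allclose(abs2(x), abs(x) ** 2)
print(abs2(x))
print("matches abs(x)**2:", same)
```

The result is a real-valued array, so abs2 can stand in for abs(x)**2 wherever only the squared magnitude is needed.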

Comments (7)

千と千尋 2024-11-23 10:43:54

Is it possible to inline such a function, as I would do in C with a macro or the inline keyword?

No. Until it reaches this specific instruction, a Python interpreter doesn't even know whether such a function exists, much less what it does.

As noted in the comments, PyPy inlines automatically (the above still holds: it "simply" generates an optimized version at runtime, benefits from it, and breaks out of it when it is invalidated). In this specific case that doesn't help, though, because implementing NumPy on PyPy started only a short while ago and isn't even beta level to this day. But the bottom line is: don't worry about optimizations at this level in Python. Either the implementation optimizes it itself or it doesn't; it's not your responsibility.
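
A small illustration of why the interpreter cannot inline ahead of time: the name abs2 is looked up each time the call executes, so rebinding it changes what an already-defined caller does. The function names here are made up for the example.

```python
import dis

def abs2(z):
    return z.real * z.real + z.imag * z.imag

def caller(z):
    # The name abs2 is resolved when this line runs, not when caller is defined.
    return abs2(z)

first = caller(3 + 4j)    # uses the original abs2

def abs2(z):              # rebind the same name to a different function
    return -1.0

second = caller(3 + 4j)   # the unchanged caller now reaches the new abs2

print(first, second)

# The disassembly shows a name lookup followed by a call,
# rather than the body of abs2 copied into caller.
dis.dis(caller)
```

Because any name can be rebound like this at any time, a compiler that pasted the body of abs2 into caller up front would change the program's behavior.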

复古式 2024-11-23 10:43:54

Not exactly what the OP has asked for, but close:

Inliner inlines Python function calls. Proof of concept for this blog post.

from inliner import inline

@inline
def add_stuff(x, y):
    return x + y

def add_lots_of_numbers():
    results = []
    for i in xrange(10):
        results.append(add_stuff(i, i+1))

In the above code the add_lots_of_numbers function is converted into this:

def add_lots_of_numbers():
    results = []
    for i in xrange(10):
        results.append(i + i + 1)
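
To give a rough idea of what such a transformation involves, here is a minimal hand-rolled sketch using the standard ast module. It is not how the Inliner package is implemented; the InlineAddStuff class is hypothetical and only handles the single case of a two-argument call to add_stuff. It uses range and adds a return statement so the result can be inspected.

```python
import ast

SOURCE = """
def add_lots_of_numbers():
    results = []
    for i in range(10):
        results.append(add_stuff(i, i + 1))
    return results
"""

class InlineAddStuff(ast.NodeTransformer):
    """Rewrite add_stuff(a, b) into the expression a + b."""

    def visit_Call(self, node):
        self.generic_visit(node)  # transform nested calls first
        is_target = (
            isinstance(node.func, ast.Name)
            and node.func.id == "add_stuff"
            and len(node.args) == 2
            and not node.keywords
        )
        if is_target:
            return ast.BinOp(left=node.args[0], op=ast.Add(), right=node.args[1])
        return node

tree = InlineAddStuff().visit(ast.parse(SOURCE))
tree = ast.fix_missing_locations(tree)

# add_stuff is never defined in this namespace, so the call below only
# succeeds because every call to it has been replaced by an addition.
namespace = {}
exec(compile(tree, "<inlined>", "exec"), namespace)
result = namespace["add_lots_of_numbers"]()
print(result)
```

A general-purpose inliner would also have to deal with argument evaluation order, keyword arguments, multi-statement bodies and name clashes, which is where most of the complexity lies.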

Also, anyone interested in this question and in the complications involved in implementing such an optimizer in CPython might also want to have a look at:

jJeQQOZ5 2024-11-23 10:43:54

I'll agree with everyone else that such optimizations will just cause you pain on CPython, and that if you care about performance you should consider PyPy (though our NumPy may be too incomplete to be useful). However, I'll disagree on one point: you can care about such optimizations on PyPy. Not this one specifically, since as has been said PyPy does it automatically, but if you know PyPy well you really can tune your code to make PyPy emit the assembly you want. Not that you will almost ever need to.

夏夜暖风 2024-11-23 10:43:54

No.

The closest you can get to C macros is a script (awk or other) that you include in a makefile and that substitutes a certain pattern, like abs(x)**2, in your Python scripts with the long form.
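
Such a substitution step could be sketched in Python itself rather than awk. The expand_abs2 helper below is hypothetical and deliberately naive: it only recognizes abs applied to a plain variable name, not to arbitrary expressions.

```python
import re

# Matches abs(<name>)**2 where <name> is a simple identifier.
PATTERN = re.compile(r"\babs\((\w+)\)\s*\*\*\s*2")

def expand_abs2(source):
    """Replace abs(name)**2 with the explicit real/imag form."""
    return PATTERN.sub(r"(\1.real*\1.real + \1.imag*\1.imag)", source)

line = "y = abs(x)**2 + offset"
expanded = expand_abs2(line)
print(expanded)
```

The replacement is wrapped in parentheses so that the expanded text keeps the same meaning inside a larger expression.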

£烟消云散 2024-11-23 10:43:54

Actually, it might be even faster to calculate it like this:

x.real**2 + x.imag**2

Thus, the extra cost of the function call will likely diminish. Let's see:

In []: from numpy.random import rand, randn
In []: n = 10000  # an integer; array dimensions cannot be floats such as 1e4
In []: x = randn(n, 1) + 1j * rand(n, 1)
In []: %timeit x.real * x.real + x.imag * x.imag
10000 loops, best of 3: 100 us per loop
In []: %timeit x.real**2 + x.imag**2
10000 loops, best of 3: 77.9 us per loop

And encapsulating the calculation in a function:

In []: def abs2(x):
   ..:     return x.real**2 + x.imag**2
   ..: 
In []: %timeit abs2(x)
10000 loops, best of 3: 80.1 us per loop

Anyway (as others have pointed out), this kind of micro-optimization (in order to avoid a function call) is not really a productive way to write Python code.
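
Outside IPython, the same comparison can be reproduced with the standard timeit module. This is only a sketch: the array size, seed and repeat counts are arbitrary choices, and the absolute numbers will differ from machine to machine.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)  # arbitrary fixed seed
n = 10_000
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

def abs2(x):
    return x.real**2 + x.imag**2

def best_time(stmt, number=200, repeat=5):
    """Best time per evaluation, in seconds, over several repeats."""
    timer = timeit.Timer(stmt, globals={"x": x, "abs2": abs2})
    return min(timer.repeat(repeat=repeat, number=number)) / number

timings = {
    "abs(x)**2": best_time("abs(x)**2"),
    "x.real*x.real + x.imag*x.imag": best_time("x.real*x.real + x.imag*x.imag"),
    "x.real**2 + x.imag**2": best_time("x.real**2 + x.imag**2"),
    "abs2(x)": best_time("abs2(x)"),
}

for label, seconds in timings.items():
    print(f"{label:32s} {seconds * 1e6:8.1f} us per evaluation")
```

Comparing the last two rows shows how much of the total is the function call itself; for arrays of this size it is expected to be small next to the array arithmetic.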

彩扇题诗 2024-11-23 10:43:54

You can try using a lambda:

abs2 = lambda x : x.real*x.real+x.imag*x.imag

then call it with:

y = abs2(x)
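
Note that a lambda creates an ordinary function object, so calling it is still a function call. The short check below, with made-up names, shows that the lambda and the def version are the same kind of object and return the same value.

```python
def abs2_def(z):
    return z.real * z.real + z.imag * z.imag

abs2_lambda = lambda z: z.real * z.real + z.imag * z.imag

# Both names refer to plain function objects.
same_type = type(abs2_lambda) is type(abs2_def)

value_def = abs2_def(3 + 4j)
value_lambda = abs2_lambda(3 + 4j)
print(same_type, value_def, value_lambda)
```

The lambda form is therefore a matter of notation; it does not by itself remove the call overhead the question asks about.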

各空 2024-11-23 10:43:54

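Python is a dynamic programming language. Luckily, Python compiles to bytecode before execution, so you can inline code yourself. For a simple solution that doesn't require heavy external packages, you can use Python's own standard library:

from inspect import getsource

abs2 = lambda z: z.real * z.real + z.imag * z.imag

def loop(zz, zs):
    for z in zs:
        zz += abs2(z)
    return zz  # return the accumulated sum so the loop has a visible result

print(f"loop code:\n{getsource(loop)}")

# Textually replace the call with the lambda body (the text after the first colon).
body = getsource(abs2).split(":", 1)[1].strip()
inlined = getsource(loop).replace("abs2(z)", f"({body})")

print(f"inlined loop code:\n{inlined}")

def loop2(zz, zs):
    for z in zs:
        zz += z.real * z.real + z.imag * z.imag
    return zz

def body_bytecode(source, name):
    # Execute the definition, then return the bytecode of the function body
    # (not the bytecode of the module-level def statement).
    namespace = {}
    exec(compile(source, '<string>', 'exec'), namespace)
    return namespace[name].__code__.co_code

compiled = body_bytecode(inlined, 'loop')
compiled2 = body_bytecode(getsource(loop2), 'loop2')

# If the textual inlining worked, the two byte strings should be identical.
print(f"compiled loop  code: {compiled}")
print(f"compiled loop2 code: {compiled2}")

Note: this only supports one-line lambdas whose parameters have the same names as the variables passed at the call site, and getsource needs the code to live in a source file. It is a simple and very hackish solution, but Python is an interpreted language, so it would be odd for it not to support editing code at run time.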
