Why does this Lua optimization hack improve performance?
I'm looking over a document that describes various techniques to improve the performance of Lua script code, and I'm shocked that such tricks would be required. (Although I'm quoting Lua, I've seen similar hacks in JavaScript.)
Why would this optimization be required:
For instance, the code
for i = 1, 1000000 do
  local x = math.sin(i)
end
runs 30% slower than this one:
local sin = math.sin
for i = 1, 1000000 do
  local x = sin(i)
end
They're re-declaring the sin function locally.
Why would this be helpful? It's the job of the compiler to do that anyway. Why is the programmer having to do the compiler's job?
I've seen similar things in JavaScript, so obviously there must be a very good reason why the interpreting compiler isn't doing its job. What is it?
I see it repeatedly in the Lua environment I'm fiddling in; people redeclaring variables as local:
local strfind = strfind
local strlen = strlen
local gsub = gsub
local pairs = pairs
local ipairs = ipairs
local type = type
local tinsert = tinsert
local tremove = tremove
local unpack = unpack
local max = max
local min = min
local floor = floor
local ceil = ceil
local loadstring = loadstring
local tostring = tostring
local setmetatable = setmetatable
local getmetatable = getmetatable
local format = format
local sin = math.sin
What is going on here that people have to do the work of the compiler? Is the compiler confused by how to find format? Why is this an issue that a programmer has to deal with? Why would this not have been taken care of in 1993?
I also seem to have hit a logical paradox:
- Optimization should not be done without profiling
- Lua has no ability to be profiled
- Lua should not be optimized
Comments (6)
Lua is a dynamic language. Compilers can do a lot of reasoning in static languages, like pulling constant expressions out of a loop. In dynamic languages, the situation is a bit different.
Lua's main (and also only) data structure is the table. math is also just a table, even though it is used as a namespace here. Nobody can stop you from modifying the math.sin function somewhere in the loop (even though that would be an unwise thing to do), and the compiler cannot know that when compiling the code. Therefore the compiler does exactly what you instruct it to do: in every iteration of the loop, look up the sin function in the math table and call it.
Now, if YOU know that you are not going to modify math.sin (i.e. you are going to call the same function), you can save it in a local variable outside the loop. Because there are no table lookups inside the loop, the resulting code is faster.
The situation is a bit different with LuaJIT: it uses tracing and some advanced magic to see what your code is doing at runtime, so it can actually optimize the loop by moving the expression outside of it. Together with other optimizations, and with actually compiling the code to machine code, this makes it crazy fast.
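To see the effect described above on your own machine, a rough timing sketch along these lines can help. The helper names (time_it, with_table_lookup, with_local) are made up for this illustration, and the measured difference will vary with the Lua version, the build, and the hardware; under LuaJIT the gap may shrink or vanish.

```lua
-- Rough timing sketch; absolute numbers depend on the Lua version and machine.
local N = 1000000

-- Hypothetical helper: returns the CPU time spent running fn.
local function time_it(fn)
  local start = os.clock()
  fn()
  return os.clock() - start
end

-- Looks up the global `math`, then its field `sin`, on every iteration.
local function with_table_lookup()
  local x
  for i = 1, N do
    x = math.sin(i)
  end
  return x
end

-- Resolves math.sin once; the loop body only touches a local.
local function with_local()
  local sin = math.sin
  local x
  for i = 1, N do
    x = sin(i)
  end
  return x
end

print(string.format("math.sin(i): %.3f s", time_it(with_table_lookup)))
print(string.format("local sin:   %.3f s", time_it(with_local)))
```

Both loops compute the same values; only the way the function is found differs.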
Regarding the 'redeclaring variables as local': many times when defining a module, you want to work with the original function. When accessing pairs, max, or anything else through its global variable, nobody can assure you that it will be the same function on every call. For example, stdlib redefines a lot of global functions.
By creating a local variable with the same name as the global, you essentially store the function in a local variable, and because local variables (which are lexically scoped, meaning they are visible in the current scope and any nested scopes too) take precedence over globals, you make sure to always call the same function. Should someone modify the global later, it will not affect your module. Not to mention it is also faster, because globals are looked up in a global table (_G).
Update: I just read Lua Performance Tips by Roberto Ierusalimschy, one of Lua's authors, and it pretty much explains everything that you need to know about Lua, performance and optimization. IMO the most important rules are:
I don't know why it's not done by default. It's faster, however, because locals live in registers, while a global has to be looked up in a table (_G), which is known to be somewhat slower.
As for the visibility (like with the format function): a local obscures the global. So if you declare a local function with the same name as a global, the local will be used instead as long as it is in scope. If you want to use the global function instead, use _G.function.
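A small sketch of that shadowing behaviour, using tostring rather than format (a global format is specific to some embedding environments and is not part of standard Lua). The replacement function here is purely hypothetical, standing in for "some other code changed the global".

```lua
-- Capture the current global in a local of the same name.
local tostring = tostring

-- Hypothetical: some other code later replaces the global.
local original = _G.tostring
_G.tostring = function() return "replaced" end

local via_local = tostring(42)      -- the local shadows the global: still the original function
local via_global = _G.tostring(42)  -- explicit global-table access sees the replacement

_G.tostring = original              -- restore the global for any later code
print(via_local, via_global)
```

The local keeps pointing at whatever function the global held at the moment of the assignment, which is why modules that localize their dependencies are unaffected by later changes to the globals.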
If you really want fast Lua, you could try LuaJIT
Doing that by default is plain wrong.
It is arguably useful to use local references instead of table accesses when a function is used over and over again, like inside your example loop.
However, outside loops, the overhead of adding a table access is completely negligible.
Because the two code samples you made above don't mean exactly the same thing.
Lua is a very dynamic language, and you can't make the same assumptions as in other, more restrictive languages like C. The function can change while your loop is running. Given the dynamic nature of the language, the compiler cannot assume that the function will not change, or at least not without a complex analysis of your code and its ramifications.
The trick is that, even if your two pieces of code look equivalent, in Lua they are not. In the first one you are explicitly telling it to "get the sin function inside the math table on every iteration". In the second one you are using a single reference to the same function again and again.
Consider this:
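As an illustration of why the two forms are not interchangeable, here is a minimal sketch (written for this page, not taken from the answer) in which math.sin is reassigned halfway through the loop. A compiler that silently hoisted the lookup out of the loop would change the results.

```lua
local original_sin = math.sin
local results = {}

for i = 1, 10 do
  -- math.sin is looked up again on every iteration...
  results[i] = math.sin(i)
  if i == 5 then
    -- ...so from the next iteration on, this loop calls math.cos instead.
    math.sin = math.cos
  end
end

math.sin = original_sin  -- undo the change so later code is unaffected

print(results[5] == original_sin(5))  -- the first half used the real sine
print(results[6] == math.cos(6))      -- the second half used cosine
```

Had the loop used a local sin captured before the loop, every iteration would have kept calling the original function.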
Storing functions in local variables removes the table indexing needed to look up the function key on each iteration of the loop. The math ones are obvious, as each call needs a hash lookup in the math table. The others are less obvious, but they are indexed into _G (the global table), which is now _ENV (the environment table) as of 5.2.
Also, one should be able to profile Lua using its debug hooks API, or by using one of the Lua debuggers lying around.
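A minimal sketch of what hook-based profiling could look like, assuming the standard debug library is available (some embedded or sandboxed environments remove it). It only counts calls to Lua functions; the names call_counts, on_call and work are invented for this example, and a real profiler would also measure time.

```lua
local call_counts = {}  -- function value -> number of calls seen
local names = {}        -- function value -> best-effort printable name

-- Call hook: level 2 is the function that was just called
-- (level 0 is getinfo itself, level 1 is this hook).
local function on_call()
  local info = debug.getinfo(2, "fnS")
  if info and info.what == "Lua" then
    call_counts[info.func] = (call_counts[info.func] or 0) + 1
    names[info.func] = (info.name or "?") .. " (" .. info.short_src ..
      ":" .. info.linedefined .. ")"
  end
end

-- Hypothetical workload to profile.
local function work(n)
  local total = 0
  for i = 1, n do total = total + i end
  return total
end

debug.sethook(on_call, "c")  -- fire the hook on every function call
for _ = 1, 100 do work(10) end
debug.sethook()              -- remove the hook

for func, count in pairs(call_counts) do
  print(names[func], count)
end
```

Hooks add noticeable overhead of their own, so the counts are more trustworthy than any timings taken while a hook is installed.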
My assumption is that in the optimized version, because the reference to the function is stored in a local variable, a tree traversal doesn't have to be done on every iteration of the for loop (for the lookup of math.sin).
I'm not sure about the local references set to function names, but I'd assume that there's some sort of a global namespace lookup required if a local one isn't found.
Then again, I could be WAY off base ;)
Edit: I also assume that the Lua compiler is dumb (which is a general assumption for me about compilers anyway ;))
This isn't just a bug/feature of Lua; many languages, including Java and C, will perform faster if you access local values instead of values out of scope, such as from a class or array.
In C++, for example, it's faster to access a local variable than it would be to access member variables of some class.
This would count to 10,000 faster:
than:
The reason Lua holds global values inside a table is that it allows the programmer to quickly save and change the global environment simply by changing the table that _G references. I agree that it would be nice to have some 'syntactic sugar' that treated the global table _G as a special case, rewriting them all as local variables in the file scope (or something similar). Of course, there is nothing stopping us from doing this ourselves; maybe a function optGlobalEnv(...) that 'localizes' the _G table and its members/values to the 'file scope' using unpack() or something.