Lua challenge: Can you improve the mandelbrot implementation's performance?

Posted on 2024-07-14 03:17:26

Status: So far, the best answer's program runs in 33% of the original program's time! But there are probably still other ways to optimize it.


Lua is currently the fastest scripting language out there; however, Lua scores really badly in a few benchmarks against C/C++.

One of those is the mandelbrot test (generate a Mandelbrot set portable bitmap file, N=16,000), where it scores a horrible 1:109 (multi-core) or 1:28 (single-core).

Since the delta in speed is quite large, this is a good candidate for optimization. Also, I'm sure some of those who know who Mike Pall is might believe it's not possible to optimize this any further, but that's blatantly wrong. Anyone who has done optimization knows it is always possible to do better. Besides, I did manage to get some extra performance with a few tweaks, so I know it's possible :)

-- The Computer Language Shootout
-- http://shootout.alioth.debian.org/
-- contributed by Mike Pall

local width = tonumber(arg and arg[1]) or 100
local height, wscale = width, 2/width
local m, limit2 = 50, 4.0
local write, char = io.write, string.char

write("P4\n", width, " ", height, "\n")

for y=0,height-1 do
  local Ci = 2*y / height - 1
  for xb=0,width-1,8 do
    local bits = 0
    local xbb = xb+7
    for x=xb,xbb < width and xbb or width-1 do
      bits = bits + bits
      local Zr, Zi, Zrq, Ziq = 0.0, 0.0, 0.0, 0.0
      local Cr = x * wscale - 1.5
      for i=1,m do
        local Zri = Zr*Zi
        Zr = Zrq - Ziq + Cr
        Zi = Zri + Zri + Ci
        Zrq = Zr*Zr
        Ziq = Zi*Zi
        if Zrq + Ziq > limit2 then
          bits = bits + 1
          break
        end
      end
    end
    if xbb >= width then
      for x=width,xbb do bits = bits + bits + 1 end
    end
    write(char(255-bits))
  end
end

So how could this be optimized? (Of course, as with any optimization, you have to measure your implementation to be sure it's faster.) You aren't allowed to alter the C core of Lua for this or use LuaJIT; it's about finding ways to optimize one of Lua's weak points.

Edit: Putting a bounty on this to make the challenge more fun.
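
Since every variant in this thread has to be measured before it counts, here is a minimal timing sketch. The helper name `time_it` and the choice to keep the best of several runs are assumptions for illustration, not something from the question; `os.clock` reports CPU time used by the program, so it ignores time spent waiting on the terminal or disk.

```lua
-- Hypothetical helper: run fn several times and return the best CPU time in seconds.
local function time_it(fn, repeats)
  local best = math.huge
  for _ = 1, repeats do
    local t0 = os.clock()          -- CPU time before the run
    fn()
    local dt = os.clock() - t0     -- CPU time consumed by this run
    if dt < best then best = dt end
  end
  return best
end
```

To compare two variants, wrap each one in a function, time both with the same `repeats`, and look at the ratio rather than the absolute numbers.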

Comments (9)

夏夜暖风 2024-07-21 03:17:27

I don't know Lua well enough to produce working code, but you should be able to greatly increase Mandelbrot performance by using some math tricks. There was a suggestion about using symmetry to speed it up; another big improvement could be made using this optimization:

Use a recursive function that takes the rectangle coordinates of a portion of the Mandelbrot image. It calculates the values along the rectangle's border lines and along the two lines that split it through the middle. This yields 4 sub-rectangles. If all the border pixels of a sub-rectangle have the same color, you can simply fill it with that color; if not, you call the function recursively on that sub-rectangle.

I searched for another explanation of this algorithm and found one here - along with a nice visualization. The old DOS program FRACTINT calls this optimization "Tesseral mode".
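
A rough Lua sketch of that subdivision idea, under some assumptions: the names `make_pixel` and `tesseral` are made up here, pixels are stored as 0/1 in a flat table keyed by `y * width + x`, and the per-pixel iteration is copied from the program in the question. Filling from a uniform border is an approximation on a sampled grid (thin filaments can slip between border pixels), so the result should be compared with the brute-force image before trusting it.

```lua
-- Escape test for one pixel, same iteration as the original program.
local function make_pixel(width, height, m)
  local wscale = 2 / width
  return function(x, y)
    local Cr, Ci = x * wscale - 1.5, 2 * y / height - 1
    local Zr, Zi, Zrq, Ziq = 0.0, 0.0, 0.0, 0.0
    for _ = 1, m do
      local Zri = Zr * Zi
      Zr = Zrq - Ziq + Cr
      Zi = Zri + Zri + Ci
      Zrq = Zr * Zr
      Ziq = Zi * Zi
      if Zrq + Ziq > 4.0 then return 1 end   -- escaped
    end
    return 0                                 -- did not escape within m steps
  end
end

-- Returns the filled grid and the number of pixels actually iterated.
local function tesseral(width, height, m)
  local pixel = make_pixel(width, height, m)
  local grid, evals = {}, 0

  local function get(x, y)                   -- memoized pixel lookup
    local k = y * width + x
    local v = grid[k]
    if v == nil then v = pixel(x, y); grid[k] = v; evals = evals + 1 end
    return v
  end

  local function fill(x0, y0, x1, y1)
    -- Evaluate the whole border and note whether it is a single colour.
    local first, uniform = get(x0, y0), true
    for x = x0, x1 do
      if get(x, y0) ~= first or get(x, y1) ~= first then uniform = false end
    end
    for y = y0, y1 do
      if get(x0, y) ~= first or get(x1, y) ~= first then uniform = false end
    end
    if x1 - x0 < 2 or y1 - y0 < 2 then return end   -- no interior left
    if uniform then
      for y = y0 + 1, y1 - 1 do
        for x = x0 + 1, x1 - 1 do grid[y * width + x] = first end
      end
      return
    end
    -- Mixed border: split into four sub-rectangles that share their edges.
    local xm, ym = math.floor((x0 + x1) / 2), math.floor((y0 + y1) / 2)
    fill(x0, y0, xm, ym); fill(xm, y0, x1, ym)
    fill(x0, ym, xm, y1); fill(xm, ym, x1, y1)
  end

  fill(0, 0, width - 1, height - 1)
  return grid, evals
end
```

Comparing the returned `evals` with `width * height` shows how many per-pixel iterations were skipped; the grid would still have to be packed into bytes as in the original program.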

小…红帽 2024-07-21 03:17:27

Why use the local variable Zri? It is possible to avoid it by reordering the next two statements:

Zi = 2*Zr*Zi + Ci
Zr = Zrq - Ziq + Cr

It is also possible to use simple periodicity checking, but the speedup depends on m. The larger m is, the greater the speedup gained from periodicity checking.
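
One way periodicity checking could look, as a sketch rather than a tuned implementation: remember an earlier point of the orbit and stop as soon as the orbit returns to it exactly, since it can then never escape. The function name `escapes`, the doubling save interval starting at 8, and the exact floating-point comparison are all assumptions made for this example. It also uses the reordered update from above, so no `Zri` is needed.

```lua
-- Returns (escaped, iterations_used) for the point Cr + Ci*i.
local function escapes(Cr, Ci, m)
  local Zr, Zi, Zrq, Ziq = 0.0, 0.0, 0.0, 0.0
  local saved_r, saved_i = 0.0, 0.0   -- reference point on the orbit
  local next_save = 8                 -- refresh the reference at 8, 16, 32, ...
  for i = 1, m do
    Zi = 2 * Zr * Zi + Ci             -- uses the old Zr, so it must come first
    Zr = Zrq - Ziq + Cr               -- Zrq/Ziq still hold the old squares
    Zrq = Zr * Zr
    Ziq = Zi * Zi
    if Zrq + Ziq > 4.0 then return true, i end
    if Zr == saved_r and Zi == saved_i then
      return false, i                 -- orbit repeated: it will never escape
    end
    if i == next_save then
      saved_r, saved_i = Zr, Zi
      next_save = next_save * 2
    end
  end
  return false, m
end
```

For example, `escapes(-1, 0, 50)` stops after 2 iterations because the orbit alternates between -1 and 0. With m = 50 most interior points will not settle onto an exact cycle in time, which fits the remark that the gain grows with m.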

赠我空喜 2024-07-21 03:17:27

Robert Gould > One of those is the mandelbrot test (Generate Mandelbrot set portable bitmap file N=16,000), where it scores a horrible 1:109

When you quote numbers from the benchmarks game please show where those numbers come from so readers have some context.

In this case you seem to have taken numbers measured on the quad-core machine, where the fastest programs have been rewritten to exploit multiple cores. Instead of looking at elapsed time, sort by CPU time and you'll see the ratio drop to 1:28.

Or look at the median and quartiles to get a better impression of how the set of C++ measurements compares to the set of Lua measurements.

Or there's a whole set of measurements where programs are forced to use just one core - Lua compared with C++ - and if you take a look at those Lua pi-digits programs you'll see that they use the C language GNU GMP library.

最佳男配角 2024-07-21 03:17:27

The next step I took was to cache the values that were calculated over and over, and to replace bits+bits with bits*2. These simple optimizations are quite powerful...

local width = tonumber(arg and arg[1]) or 100
local height, wscale = width, 2/width
local m, limit2 = 50, 4.0
local write, char = io.write, string.char
local results={}
write("P4\n", width, " ", height, "\n")
local height_minus_one = height - 1
local width_minus_one = width -1

for y=0,height_minus_one do
  local Ci = 2*y / height - 1  -- parses as (2*y/height) - 1; dividing by height-1 would change the image
  for xb=0,width_minus_one,8 do
    local bits = 0
    local xbb = xb+7
    for x=xb,xbb < width and xbb or width_minus_one do
      bits = bits *2
      local Zr, Zi, Zrq, Ziq = 0.0, 0.0, 0.0, 0.0
      local Cr = x * wscale - 1.5
      for i=1,m do
        local Zri = Zr*Zi
        Zr = Zrq - Ziq + Cr
        Zi = Zri + Zri + Ci
        Zrq = Zr*Zr
        Ziq = Zi*Zi
        if Zrq + Ziq > limit2 then
          bits = bits + 1
          break
        end
      end
    end
    if xbb >= width then
      for x=width,xbb do bits = bits *2 + 1 end
    end
    table.insert(results,(char(255-bits)))
  end
end
write(table.concat(results))

This optimization makes the program run in 34% of the original's time, but Markus Q's optimization still beats mine ;)

假情假意假温柔 2024-07-21 03:17:27

This was another attempt, but it turned out to be slower than accessing local variables. (I imagined a clean environment would make variable lookup faster, but that wasn't the case; locals live in virtual registers, which are slightly faster.) This brought the runtime down to 41%.

local env={}
env.width = tonumber(arg and arg[1]) or 100
env.height = env.width
env.wscale = 2/env.width
env.m = 50
env.limit2 = 4.0
env.write = io.write
env.char = string.char
env.results={}
env.height_minus_one = env.height - 1
env.width_minus_one = env.width -1
env.insert = table.insert

setfenv(function()
    write("P4\n", env.width, " ", env.height, "\n")
    for y=0,height_minus_one do
      local Ci = 2*y / height - 1  -- parses as (2*y/height) - 1; height is looked up in env
      for xb=0,width_minus_one,8 do
        local bits = 0
        local xbb = xb+7
        for x=xb,xbb < width and xbb or width_minus_one do
          bits = bits *2
          local Zr, Zi, Zrq, Ziq = 0.0, 0.0, 0.0, 0.0
          local Cr = x * wscale - 1.5
          for i=1,m do
            local Zri = Zr*Zi
            Zr = Zrq - Ziq + Cr
            Zi = Zri + Zri + Ci
            Zrq = Zr*Zr
            Ziq = Zi*Zi
            if Zrq + Ziq > limit2 then
              bits = bits + 1
              break
            end
          end
        end
        if xbb >= width then
          for x=width,xbb do bits = bits *2 + 1 end
        end
        insert(results,(char(255-bits)))
      end
    end
end,env)()

io.write(table.concat(env.results))

柳若烟 2024-07-21 03:17:26

Pass 2, about 30% better (on my machine) than my previous attempt. The main saving came from unrolling the inner loop to amortize the overhead.

Also included (commented out) is an aborted attempt to save time by exiting early (and setting the pixel black) when you are stuck in the central cardioid. It works, but it's slower no matter how I jiggered it.
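
For comparison, the well-known closed-form membership tests for the main cardioid and the period-2 bulb need no square roots. This is not the commented-out code from the listing below, just a sketch of the standard formulas; the function name `in_main_body` is made up, and whether the extra test per pixel pays off at m = 50 would have to be measured.

```lua
-- True if Cr + Ci*i lies in the main cardioid or the period-2 bulb,
-- i.e. the point is certainly inside the set and needs no iteration.
local function in_main_body(Cr, Ci)
  local Ci2 = Ci * Ci
  local xr = Cr - 0.25
  local q = xr * xr + Ci2
  if q * (q + xr) <= 0.25 * Ci2 then return true end   -- main cardioid
  local xb = Cr + 1
  return xb * xb + Ci2 <= 0.0625                        -- disc of radius 1/4 around -1
end
```

A caller would check `in_main_body(Cr, Ci)` before entering the iteration loop and leave the pixel black when it returns true.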

I've got to run, but I'll leave a parting suggestion. There may be some optimization possible by run-length encoding the results (so instead of saving a bunch of bit-twiddled chars you'd save a list (number of white dots, number of black dots, number of white dots, etc.)). This would:

  1. Reduce the storage/GC overhead
  2. Allow some optimizations on the output generation (when the numbers were >> 8)
  3. Permit some orbit detection.

No idea if it could be coded tight enough to fly, but that is where I would try next if I had more time.
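
To make the run-length idea a bit more concrete, here is a small sketch under stated assumptions: a row is a Lua array of PBM bit values (1 = black, 0 = white), the first run always counts 0-pixels, and the names `rle_encode` and `runs_to_bytes` are invented for this example. The fast path in `runs_to_bytes` is the output optimization from point 2: a long run that starts on a byte boundary is emitted as whole bytes at once.

```lua
-- Encode a row of 0/1 values as alternating run lengths.
-- The first run counts 0-pixels (it is 0 if the row starts with a 1).
local function rle_encode(row)
  local runs, current, count = {}, 0, 0
  for i = 1, #row do
    if row[i] == current then
      count = count + 1
    else
      runs[#runs + 1] = count
      current, count = row[i], 1
    end
  end
  runs[#runs + 1] = count
  return runs
end

-- Turn run lengths back into packed bytes, most significant bit first.
local function runs_to_bytes(runs)
  local out, byte, nbits, value = {}, 0, 0, 0
  for _, len in ipairs(runs) do
    local left = len
    while left > 0 do
      if nbits == 0 and left >= 8 then
        -- Fast path: whole bytes of a single colour.
        local n = math.floor(left / 8)
        out[#out + 1] = string.rep(string.char(value * 255), n)
        left = left - n * 8
      else
        byte = byte * 2 + value
        nbits = nbits + 1
        left = left - 1
        if nbits == 8 then
          out[#out + 1] = string.char(byte)
          byte, nbits = 0, 0
        end
      end
    end
    value = 1 - value                 -- runs alternate between 0 and 1
  end
  if nbits > 0 then                   -- pad the last byte with 0 bits
    for _ = nbits + 1, 8 do byte = byte * 2 end
    out[#out + 1] = string.char(byte)
  end
  return table.concat(out)
end
```

For instance, the row `{1, 1, 0}` encodes to the runs `{0, 2, 1}` and packs into the single byte 192 (binary 11000000).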

-- The Computer Language Shootout
-- http://shootout.alioth.debian.org/
-- contributed by Mike Pall
-- with optimizations by Markus J. Q. (MarkusQ) Roberts

local width = tonumber(arg and arg[1]) or 100
local height, wscale = width, 2/width
local m, limit2 = 50, 4.0
local write, char = io.write, string.char

local h2 = math.floor(height/2)
local hm = height - h2*2
local top_half = {}

for y=0,h2+hm do
    local Ci = 2*y / height - 1
    local line = {""}
    for xb=0,width-1,8 do
        local bits = 0
        local xbb = xb+7
        for x=xb,xbb < width and xbb or width-1 do
            bits = bits + bits
            local Zr, Zi, Zrq, Ziq = 0.0, 0.0, 0.0, 0.0
            local Cr = x * wscale - 1.5
            local Zri = Zr*Zi
            for i=1,m/5 do
                Zr = Zrq - Ziq + Cr
                Zi = Zri + Zri + Ci
                Zri = Zr*Zi

                Zr = Zr*Zr - Zi*Zi + Cr
                Zi = 2*Zri +         Ci
                Zri = Zr*Zi

                Zr = Zr*Zr - Zi*Zi + Cr
                Zi = 2*Zri +         Ci
                Zri = Zr*Zi

                Zr = Zr*Zr - Zi*Zi + Cr
                Zi = 2*Zri +         Ci
                Zri = Zr*Zi

                Zr = Zr*Zr - Zi*Zi + Cr
                Zi = 2*Zri +         Ci
                Zri = Zr*Zi

                Zrq = Zr*Zr
                Ziq = Zi*Zi
                Zri = Zr*Zi
                if Zrq + Ziq > limit2 then
                    bits = bits + 1
                    break
                    end
                -- if i == 1 then
                --    local ar,ai       = 1-4*Zr,-4*Zi
                --    local a_r         = math.sqrt(ar*ar+ai*ai)
                --    local k           = math.sqrt(2)/2
                --    local br,bi2      = math.sqrt(a_r+ar)*k,(a_r-ar)/2
                --    if (br+1)*(br+1) + bi2 < 1 then
                --        break
                --        end
                --    end
                end
            end
        for x=width,xbb do 
            bits = bits + bits + 1 
            end
        table.insert(line,char(255-bits))
        end
    line = table.concat(line) 
    table.insert(top_half,line)
    end

write("P4\n", width, " ", height, "\n")
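for y=1,h2+1 do              -- rows 0..h2 (Ci <= 0), computed directly
    write(top_half[y])
    end
for y=height-h2,2,-1 do      -- remaining rows mirror earlier ones; row 0 (Ci = -1) has no mirror
    write(top_half[y])
    end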

长亭外,古道边 2024-07-21 03:17:26

So here's ~40% for a start:

-- The Computer Language Shootout
-- http://shootout.alioth.debian.org/
-- contributed by Mike Pall

local width = tonumber(arg and arg[1]) or 100
local height, wscale = width, 2/width
local m, limit2 = 50, 4.0
local write, char = io.write, string.char

function addChar (line, c)
    table.insert(line, c)
    for i=table.getn(line)-1, 1, -1 do
        if string.len(line[i]) > string.len(line[i+1]) then
            break
            end
        line[i] = line[i] .. table.remove(line)
        end
    end

local h2 = math.floor(height/2)
local hm = height - h2*2
local top_half = {}
for y=0,h2+hm do
    local Ci = 2*y / height - 1
    local line = {""}
    for xb=0,width-1,8 do
        local bits = 0
        local xbb = xb+7
        for x=xb,xbb < width and xbb or width-1 do
            bits = bits + bits
            local Zr, Zi, Zrq, Ziq = 0.0, 0.0, 0.0, 0.0
            local Cr = x * wscale - 1.5
            for i=1,m do
                local Zri = Zr*Zi
                Zr = Zrq - Ziq + Cr
                Zi = Zri + Zri + Ci
                Zrq = Zr*Zr
                Ziq = Zi*Zi
                if Zrq + Ziq > limit2 then
                    bits = bits + 1
                    break
                    end
                end
            end
        for x=width,xbb do 
            bits = bits + bits + 1 
            end
        addChar(line,char(255-bits))
        end
    line = table.concat(line) 
    table.insert(top_half,line)
    end

write("P4\n", width, " ", height, "\n")
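for y=1,h2+1 do              -- rows 0..h2 (Ci <= 0), computed directly
    write(top_half[y])
    end
for y=height-h2,2,-1 do      -- remaining rows mirror earlier ones; row 0 (Ci = -1) has no mirror
    write(top_half[y])
    end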

-- MarkusQ

枕梦 2024-07-21 03:17:26

Now that there is at least one answer faster than my solution, I'll post my answer.

-- The Computer Language Shootout
-- http://shootout.alioth.debian.org/
-- contributed by Mike Pall

local width = tonumber(arg and arg[1]) or 100
local height, wscale = width, 2/width
local m, limit2 = 50, 4.0
local write, char = io.write, string.char
local insert = table.insert
local results={}
write("P4\n", width, " ", height, "\n")

for y=0,height-1 do
  local Ci = 2*y / height - 1
  for xb=0,width-1,8 do
    local bits = 0
    local xbb = xb+7
    for x=xb,xbb < width and xbb or width-1 do
      bits = bits + bits
      local Zr, Zi, Zrq, Ziq = 0.0, 0.0, 0.0, 0.0
      local Cr = x * wscale - 1.5
      for i=1,m do
        local Zri = Zr*Zi
        Zr = Zrq - Ziq + Cr
        Zi = Zri + Zri + Ci
        Zrq = Zr*Zr
        Ziq = Zi*Zi
        if Zrq + Ziq > limit2 then
          bits = bits + 1
          break
        end
      end
    end
    if xbb >= width then
      for x=width,xbb do bits = bits + bits + 1 end
    end
    insert(results,(char(255-bits)))
  end
end
write(table.concat(results))

The trick is to store the values in a table before sending them to the output.
Something as simple as this reduced the run time to 58%.
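
For large N, holding every byte of the image in one table may cost a lot of memory. A possible middle ground, sketched below, is a small buffer that flushes in chunks; the name `make_buffer`, the `sink` callback and the `limit` parameter are assumptions for illustration and were not measured against the version above.

```lua
-- Collect strings and hand them to sink in chunks of `limit` pieces.
local function make_buffer(sink, limit)
  local parts, n = {}, 0
  local buf = {}
  function buf.flush()
    if n > 0 then
      sink(table.concat(parts, "", 1, n))   -- only the first n slots are current
      n = 0
    end
  end
  function buf.add(s)
    n = n + 1
    parts[n] = s                            -- direct indexing avoids a table.insert call
    if n >= limit then buf.flush() end
  end
  return buf
end
```

In the program above this would mean creating `make_buffer(io.write, 4096)` once, calling `buf.add(char(255-bits))` in place of `insert(results, ...)`, and calling `buf.flush()` after the outer loop.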
