优化 Haskell 中的数值数组性能
我正在为类似《我的世界》的世界开发地形生成算法。目前,我正在使用基于论文 'Simplex Noise 中的实现的单纯形噪声揭秘'[PDF],因为单纯形噪声应该比 Perlin 噪声更快并且具有更少的伪影。这看起来相当不错(见图),但到目前为止它也相当慢。
运行噪声函数 10 次(我需要不同波长的噪声,例如地形高度、温度、树木位置等),块(16x16x128 块)中的每个块有 3 个八度的噪声,或者总共约 100 万次对噪声函数的调用,大约需要 700-800 毫秒。尽管算法中没有明显昂贵的操作(至少对我来说),但对于以任何合适的速度生成地形来说,这至少慢了一个数量级。只是取整、取模、一些数组查找和基本算术。下面列出了该算法(用 Haskell 编写)。 SCC 评论用于分析。我省略了 2D 噪声函数,因为它们的工作方式相同。
g3 :: (Floating a, RealFrac a) => a
g3 = 1/6
{-# INLINE int #-}
int :: (Integral a, Num b) => a -> b
int = fromIntegral
grad3 :: (Floating a, RealFrac a) => V.Vector (a,a,a)
grad3 = V.fromList $ [(1,1,0),(-1, 1,0),(1,-1, 0),(-1,-1, 0),
(1,0,1),(-1, 0,1),(1, 0,-1),(-1, 0,-1),
(0,1,1),( 0,-1,1),(0, 1,-1),( 0,-1,-1)]
{-# INLINE dot3 #-}
dot3 :: Num a => (a, a, a) -> a -> a -> a -> a
dot3 (a,b,c) x y z = a * x + b * y + c * z
{-# INLINE fastFloor #-}
fastFloor :: RealFrac a => a -> Int
fastFloor x = truncate (if x > 0 then x else x - 1)
--Generate a random permutation for use in the noise functions
perm :: Int -> Permutation
perm seed = V.fromList . concat . replicate 2 . shuffle' [0..255] 256 $ mkStdGen seed
--Generate 3D noise between -0.5 and 0.5
simplex3D :: (Floating a, RealFrac a) => Permutation -> a -> a -> a -> a
simplex3D p x y z = {-# SCC "out" #-} 16 * (n gi0 (x0,y0,z0) + n gi1 xyz1 + n gi2 xyz2 + n gi3 xyz3) where
(i,j,k) = {-# SCC "ijk" #-} (s x, s y, s z) where s a = fastFloor (a + (x + y + z) / 3)
(x0,y0,z0) = {-# SCC "x0-z0" #-} (x - int i + t, y - int j + t, z - int k + t) where t = int (i + j + k) * g3
(i1,j1,k1,i2,j2,k2) = {-# SCC "i1-k2" #-} if x0 >= y0
then if y0 >= z0 then (1,0,0,1,1,0) else
if x0 >= z0 then (1,0,0,1,0,1) else (0,0,1,1,0,1)
else if y0 < z0 then (0,0,1,0,1,1) else
if x0 < z0 then (0,1,0,0,1,1) else (0,1,0,1,1,0)
xyz1 = {-# SCC "xyz1" #-} (x0 - int i1 + g3, y0 - int j1 + g3, z0 - int k1 + g3)
xyz2 = {-# SCC "xyz2" #-} (x0 - int i2 + 2*g3, y0 - int j2 + 2*g3, z0 - int k2 + 2*g3)
xyz3 = {-# SCC "xyz3" #-} (x0 - 1 + 3*g3, y0 - 1 + 3*g3, z0 - 1 + 3*g3)
(ii,jj,kk) = {-# SCC "iijjkk" #-} (i .&. 255, j .&. 255, k .&. 255)
gi0 = {-# SCC "gi0" #-} mod (p V.! (ii + p V.! (jj + p V.! kk ))) 12
gi1 = {-# SCC "gi1" #-} mod (p V.! (ii + i1 + p V.! (jj + j1 + p V.! (kk + k1)))) 12
gi2 = {-# SCC "gi2" #-} mod (p V.! (ii + i2 + p V.! (jj + j2 + p V.! (kk + k2)))) 12
gi3 = {-# SCC "gi3" #-} mod (p V.! (ii + 1 + p V.! (jj + 1 + p V.! (kk + 1 )))) 12
{-# INLINE n #-}
n gi (x',y',z') = {-# SCC "n" #-} (\a -> if a < 0 then 0 else
a*a*a*a*dot3 (grad3 V.! gi) x' y' z') $ 0.6 - x'*x' - y'*y' - z'*z'
harmonic :: (Num a, Fractional a) => Int -> (a -> a) -> a
harmonic octaves noise = f octaves / (2 - 1 / int (2 ^ (octaves - 1))) where
f 0 = 0
f o = let r = int $ 2 ^ (o - 1) in noise r / r + f (o - 1)
--Generate harmonic 3D noise between -0.5 and 0.5
harmonicNoise3D :: (RealFrac a, Floating a) => Permutation -> Int -> a -> a -> a -> a -> a
harmonicNoise3D p octaves l x y z = harmonic octaves
(\f -> simplex3D p (x * f / l) (y * f / l) (z * f / l))
对于分析,我使用了以下代码,
q _ = let p = perm 0 in
sum [harmonicNoise3D p 3 l x y z :: Float | l <- [1..10], y <- [0..127], x <- [0..15], z <- [0..15]]
main = do start <- getCurrentTime
print $ q ()
end <- getCurrentTime
print $ diffUTCTime end start
它生成以下信息:
COST CENTRE MODULE %time %alloc
simplex3D Main 18.8 21.0
n Main 18.0 19.6
out Main 10.1 9.2
harmonicNoise3D Main 9.8 4.5
harmonic Main 6.4 5.8
int Main 4.0 2.9
gi3 Main 4.0 3.0
xyz2 Main 3.5 5.9
gi1 Main 3.4 3.4
gi0 Main 3.4 2.7
fastFloor Main 3.2 0.6
xyz1 Main 2.9 5.9
ijk Main 2.7 3.5
gi2 Main 2.7 3.3
xyz3 Main 2.6 4.1
iijjkk Main 1.6 2.5
dot3 Main 1.6 0.7
为了进行比较,我还将算法移植到 C#。性能大约快了 3 到 4 倍,所以我想我一定做错了什么。但即便如此,它也没有我想要的那么快。所以我的问题是:任何人都可以告诉我是否有任何方法可以加快我的实现和/或一般算法的速度,或者是否有人知道具有更好性能特征但外观相似的不同噪声算法?
更新:
遵循下面提供的一些建议后,代码现在如下所示:
module Noise ( Permutation, perm
, noise3D, simplex3D
) where
import Data.Bits
import qualified Data.Vector.Unboxed as UV
import System.Random
import System.Random.Shuffle
type Permutation = UV.Vector Int
g3 :: Double
g3 = 1/6
{-# INLINE int #-}
int :: Int -> Double
int = fromIntegral
grad3 :: UV.Vector (Double, Double, Double)
grad3 = UV.fromList $ [(1,1,0),(-1, 1,0),(1,-1, 0),(-1,-1, 0),
(1,0,1),(-1, 0,1),(1, 0,-1),(-1, 0,-1),
(0,1,1),( 0,-1,1),(0, 1,-1),( 0,-1,-1)]
{-# INLINE dot3 #-}
dot3 :: (Double, Double, Double) -> Double -> Double -> Double -> Double
dot3 (a,b,c) x y z = a * x + b * y + c * z
{-# INLINE fastFloor #-}
fastFloor :: Double -> Int
fastFloor x = truncate (if x > 0 then x else x - 1)
--Generate a random permutation for use in the noise functions
perm :: Int -> Permutation
perm seed = UV.fromList . concat . replicate 2 . shuffle' [0..255] 256 $ mkStdGen seed
--Generate 3D noise between -0.5 and 0.5
noise3D :: Permutation -> Double -> Double -> Double -> Double
noise3D p x y z = 16 * (n gi0 (x0,y0,z0) + n gi1 xyz1 + n gi2 xyz2 + n gi3 xyz3) where
(i,j,k) = (s x, s y, s z) where s a = fastFloor (a + (x + y + z) / 3)
(x0,y0,z0) = (x - int i + t, y - int j + t, z - int k + t) where t = int (i + j + k) * g3
(i1,j1,k1,i2,j2,k2) = if x0 >= y0
then if y0 >= z0 then (1,0,0,1,1,0) else
if x0 >= z0 then (1,0,0,1,0,1) else (0,0,1,1,0,1)
else if y0 < z0 then (0,0,1,0,1,1) else
if x0 < z0 then (0,1,0,0,1,1) else (0,1,0,1,1,0)
xyz1 = (x0 - int i1 + g3, y0 - int j1 + g3, z0 - int k1 + g3)
xyz2 = (x0 - int i2 + 2*g3, y0 - int j2 + 2*g3, z0 - int k2 + 2*g3)
xyz3 = (x0 - 1 + 3*g3, y0 - 1 + 3*g3, z0 - 1 + 3*g3)
(ii,jj,kk) = (i .&. 255, j .&. 255, k .&. 255)
gi0 = rem (UV.unsafeIndex p (ii + UV.unsafeIndex p (jj + UV.unsafeIndex p kk ))) 12
gi1 = rem (UV.unsafeIndex p (ii + i1 + UV.unsafeIndex p (jj + j1 + UV.unsafeIndex p (kk + k1)))) 12
gi2 = rem (UV.unsafeIndex p (ii + i2 + UV.unsafeIndex p (jj + j2 + UV.unsafeIndex p (kk + k2)))) 12
gi3 = rem (UV.unsafeIndex p (ii + 1 + UV.unsafeIndex p (jj + 1 + UV.unsafeIndex p (kk + 1 )))) 12
{-# INLINE n #-}
n gi (x',y',z') = (\a -> if a < 0 then 0 else
a*a*a*a*dot3 (UV.unsafeIndex grad3 gi) x' y' z') $ 0.6 - x'*x' - y'*y' - z'*z'
harmonic :: Int -> (Double -> Double) -> Double
harmonic octaves noise = f octaves / (2 - 1 / int (2 ^ (octaves - 1))) where
f 0 = 0
f o = let r = 2 ^^ (o - 1) in noise r / r + f (o - 1)
--3D simplex noise
--syntax: simplex3D permutation number_of_octaves wavelength x y z
simplex3D :: Permutation -> Int -> Double -> Double -> Double -> Double -> Double
simplex3D p octaves l x y z = harmonic octaves
(\f -> noise3D p (x * f / l) (y * f / l) (z * f / l))
连同将块大小减小到 8x8x128,生成新地形块的速度现在约为 10-20 fps,这意味着移动现在周围的问题不再像以前那么严重了。当然,任何其他性能改进仍然值得欢迎。
I'm working on a terrain generation algorithm for a MineCraft-like world. Currently, I'm using simplex noise based on the implementation in the paper 'Simplex Noise Demystified' [PDF], since simplex noise is supposed to be faster and to have fewer artifacts than Perlin noise. This looks fairly decent (see image), but so far it's also pretty slow.
Running the noise function 10 times (I need noise with different wavelengths for things like terrain height, temperature, tree location, etc.) with 3 octaves of noise for each block in a chunk (16x16x128 blocks), or about 1 million calls to the noise function in total, takes about 700-800 ms. This is at least an order of magnitude too slow for the purposes of generating terrain with any decent kind of speed, despite the fact that there are no obvious expensive operations in the algorithm (at least to me). Just floor, modulo, some array lookups and basic arithmetic. The algorithm (written in Haskell) is listed below. The SCC comments are for profiling. I've omitted the 2D noise functions, since they work the same way.
g3 :: (Floating a, RealFrac a) => a
g3 = 1/6
{-# INLINE int #-}
int :: (Integral a, Num b) => a -> b
int = fromIntegral
grad3 :: (Floating a, RealFrac a) => V.Vector (a,a,a)
grad3 = V.fromList $ [(1,1,0),(-1, 1,0),(1,-1, 0),(-1,-1, 0),
(1,0,1),(-1, 0,1),(1, 0,-1),(-1, 0,-1),
(0,1,1),( 0,-1,1),(0, 1,-1),( 0,-1,-1)]
{-# INLINE dot3 #-}
dot3 :: Num a => (a, a, a) -> a -> a -> a -> a
dot3 (a,b,c) x y z = a * x + b * y + c * z
{-# INLINE fastFloor #-}
fastFloor :: RealFrac a => a -> Int
fastFloor x = truncate (if x > 0 then x else x - 1)
--Generate a random permutation for use in the noise functions
perm :: Int -> Permutation
perm seed = V.fromList . concat . replicate 2 . shuffle' [0..255] 256 $ mkStdGen seed
--Generate 3D noise between -0.5 and 0.5
simplex3D :: (Floating a, RealFrac a) => Permutation -> a -> a -> a -> a
simplex3D p x y z = {-# SCC "out" #-} 16 * (n gi0 (x0,y0,z0) + n gi1 xyz1 + n gi2 xyz2 + n gi3 xyz3) where
(i,j,k) = {-# SCC "ijk" #-} (s x, s y, s z) where s a = fastFloor (a + (x + y + z) / 3)
(x0,y0,z0) = {-# SCC "x0-z0" #-} (x - int i + t, y - int j + t, z - int k + t) where t = int (i + j + k) * g3
(i1,j1,k1,i2,j2,k2) = {-# SCC "i1-k2" #-} if x0 >= y0
then if y0 >= z0 then (1,0,0,1,1,0) else
if x0 >= z0 then (1,0,0,1,0,1) else (0,0,1,1,0,1)
else if y0 < z0 then (0,0,1,0,1,1) else
if x0 < z0 then (0,1,0,0,1,1) else (0,1,0,1,1,0)
xyz1 = {-# SCC "xyz1" #-} (x0 - int i1 + g3, y0 - int j1 + g3, z0 - int k1 + g3)
xyz2 = {-# SCC "xyz2" #-} (x0 - int i2 + 2*g3, y0 - int j2 + 2*g3, z0 - int k2 + 2*g3)
xyz3 = {-# SCC "xyz3" #-} (x0 - 1 + 3*g3, y0 - 1 + 3*g3, z0 - 1 + 3*g3)
(ii,jj,kk) = {-# SCC "iijjkk" #-} (i .&. 255, j .&. 255, k .&. 255)
gi0 = {-# SCC "gi0" #-} mod (p V.! (ii + p V.! (jj + p V.! kk ))) 12
gi1 = {-# SCC "gi1" #-} mod (p V.! (ii + i1 + p V.! (jj + j1 + p V.! (kk + k1)))) 12
gi2 = {-# SCC "gi2" #-} mod (p V.! (ii + i2 + p V.! (jj + j2 + p V.! (kk + k2)))) 12
gi3 = {-# SCC "gi3" #-} mod (p V.! (ii + 1 + p V.! (jj + 1 + p V.! (kk + 1 )))) 12
{-# INLINE n #-}
n gi (x',y',z') = {-# SCC "n" #-} (\a -> if a < 0 then 0 else
a*a*a*a*dot3 (grad3 V.! gi) x' y' z') $ 0.6 - x'*x' - y'*y' - z'*z'
harmonic :: (Num a, Fractional a) => Int -> (a -> a) -> a
harmonic octaves noise = f octaves / (2 - 1 / int (2 ^ (octaves - 1))) where
f 0 = 0
f o = let r = int $ 2 ^ (o - 1) in noise r / r + f (o - 1)
--Generate harmonic 3D noise between -0.5 and 0.5
harmonicNoise3D :: (RealFrac a, Floating a) => Permutation -> Int -> a -> a -> a -> a -> a
harmonicNoise3D p octaves l x y z = harmonic octaves
(\f -> simplex3D p (x * f / l) (y * f / l) (z * f / l))
For profiling, I used the following code,
q _ = let p = perm 0 in
sum [harmonicNoise3D p 3 l x y z :: Float | l <- [1..10], y <- [0..127], x <- [0..15], z <- [0..15]]
main = do start <- getCurrentTime
print $ q ()
end <- getCurrentTime
print $ diffUTCTime end start
which produces the following information:
COST CENTRE MODULE %time %alloc
simplex3D Main 18.8 21.0
n Main 18.0 19.6
out Main 10.1 9.2
harmonicNoise3D Main 9.8 4.5
harmonic Main 6.4 5.8
int Main 4.0 2.9
gi3 Main 4.0 3.0
xyz2 Main 3.5 5.9
gi1 Main 3.4 3.4
gi0 Main 3.4 2.7
fastFloor Main 3.2 0.6
xyz1 Main 2.9 5.9
ijk Main 2.7 3.5
gi2 Main 2.7 3.3
xyz3 Main 2.6 4.1
iijjkk Main 1.6 2.5
dot3 Main 1.6 0.7
To compare, I also ported the algorithm to C#. Performance there was about 3 to 4 times faster, so I imagine I must be doing something wrong. But even then it's not nearly as fast as I would like. So my question is this: can anyone tell me if there are any ways to speed up my implementation and/or the algorithm in general or does anyone know of a different noise algorithm that has better performance characteristics but a similar look?
Update:
After following some of the suggestions offered below, the code now looks as follows:
module Noise ( Permutation, perm
, noise3D, simplex3D
) where
import Data.Bits
import qualified Data.Vector.Unboxed as UV
import System.Random
import System.Random.Shuffle
type Permutation = UV.Vector Int
g3 :: Double
g3 = 1/6
{-# INLINE int #-}
int :: Int -> Double
int = fromIntegral
grad3 :: UV.Vector (Double, Double, Double)
grad3 = UV.fromList $ [(1,1,0),(-1, 1,0),(1,-1, 0),(-1,-1, 0),
(1,0,1),(-1, 0,1),(1, 0,-1),(-1, 0,-1),
(0,1,1),( 0,-1,1),(0, 1,-1),( 0,-1,-1)]
{-# INLINE dot3 #-}
dot3 :: (Double, Double, Double) -> Double -> Double -> Double -> Double
dot3 (a,b,c) x y z = a * x + b * y + c * z
{-# INLINE fastFloor #-}
fastFloor :: Double -> Int
fastFloor x = truncate (if x > 0 then x else x - 1)
--Generate a random permutation for use in the noise functions
perm :: Int -> Permutation
perm seed = UV.fromList . concat . replicate 2 . shuffle' [0..255] 256 $ mkStdGen seed
--Generate 3D noise between -0.5 and 0.5
noise3D :: Permutation -> Double -> Double -> Double -> Double
noise3D p x y z = 16 * (n gi0 (x0,y0,z0) + n gi1 xyz1 + n gi2 xyz2 + n gi3 xyz3) where
(i,j,k) = (s x, s y, s z) where s a = fastFloor (a + (x + y + z) / 3)
(x0,y0,z0) = (x - int i + t, y - int j + t, z - int k + t) where t = int (i + j + k) * g3
(i1,j1,k1,i2,j2,k2) = if x0 >= y0
then if y0 >= z0 then (1,0,0,1,1,0) else
if x0 >= z0 then (1,0,0,1,0,1) else (0,0,1,1,0,1)
else if y0 < z0 then (0,0,1,0,1,1) else
if x0 < z0 then (0,1,0,0,1,1) else (0,1,0,1,1,0)
xyz1 = (x0 - int i1 + g3, y0 - int j1 + g3, z0 - int k1 + g3)
xyz2 = (x0 - int i2 + 2*g3, y0 - int j2 + 2*g3, z0 - int k2 + 2*g3)
xyz3 = (x0 - 1 + 3*g3, y0 - 1 + 3*g3, z0 - 1 + 3*g3)
(ii,jj,kk) = (i .&. 255, j .&. 255, k .&. 255)
gi0 = rem (UV.unsafeIndex p (ii + UV.unsafeIndex p (jj + UV.unsafeIndex p kk ))) 12
gi1 = rem (UV.unsafeIndex p (ii + i1 + UV.unsafeIndex p (jj + j1 + UV.unsafeIndex p (kk + k1)))) 12
gi2 = rem (UV.unsafeIndex p (ii + i2 + UV.unsafeIndex p (jj + j2 + UV.unsafeIndex p (kk + k2)))) 12
gi3 = rem (UV.unsafeIndex p (ii + 1 + UV.unsafeIndex p (jj + 1 + UV.unsafeIndex p (kk + 1 )))) 12
{-# INLINE n #-}
n gi (x',y',z') = (\a -> if a < 0 then 0 else
a*a*a*a*dot3 (UV.unsafeIndex grad3 gi) x' y' z') $ 0.6 - x'*x' - y'*y' - z'*z'
harmonic :: Int -> (Double -> Double) -> Double
harmonic octaves noise = f octaves / (2 - 1 / int (2 ^ (octaves - 1))) where
f 0 = 0
f o = let r = 2 ^^ (o - 1) in noise r / r + f (o - 1)
--3D simplex noise
--syntax: simplex3D permutation number_of_octaves wavelength x y z
simplex3D :: Permutation -> Int -> Double -> Double -> Double -> Double -> Double
simplex3D p octaves l x y z = harmonic octaves
(\f -> noise3D p (x * f / l) (y * f / l) (z * f / l))
Together with reducing my chunk size to 8x8x128, generating new terrain chunks now occurs at about 10-20 fps, which means moving around is now not nearly as problematic as before. Of course, any other performance improvements are still welcome.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
最初引人注目的是您的代码是高度多态的。您应该将浮点类型统一专门化为 Double,以便 GHC(和 LLVM)有机会应用更积极的优化。
请注意,对于那些尝试重现的人,此代码导入:
好的。您可以尝试做很多事情来改进这段代码。
改进
数据表示
(a,a,a)
T !Double !Double !Double
Data.Array
切换到Data.Array.Unboxed
forPermutations
repa
包中的多维未装箱数组组成的三元组装箱数组编译器标志
-O2 -fvia-C -optc-O3 -fexcess- precision -optc-march 进行编译=native
(或与-fllvm
等效)-fspec-constr-count=16
更高效的库函数
mod
替换为rem
V.!
索引替换为未经检查的索引VU.unsafeIndex
(移动到Data.Vector.Unboxed
后运行时设置
-A20M
或-H
另外,请检查您的算法是否与 C# 算法相同,并且您使用的是相同的数据结构。
The thing that stands out initially is that your code is highly polymorphic. You should specialize your floating point type uniformly to
Double
, so GHC (and LLVM) have a chance of applying more aggressive optimizations.Note, for those trying to reproduce, this code imports:
Ok. There's lots of things you can try to improve this code.
Improvements
Data representation
(a,a,a)
with unboxed tripleT !Double !Double !Double
Data.Array
toData.Array.Unboxed
forPermutations
repa
packageCompiler flags
-O2 -fvia-C -optc-O3 -fexcess-precision -optc-march=native
(or equivalent with-fllvm
)-fspec-constr-count=16
More efficient library functions
mod
withrem
V.!
indexing with unchecked indexingVU.unsafeIndex
(after moving toData.Vector.Unboxed
Runtime settings
-A20M
or-H
Also, check your algorithm is identical to the C# one, and you're using the same data structures.