What is "vectorization"?
Several times now, I've encountered this term in MATLAB, Fortran ... and some others ... but I've never found an explanation of what it means or what it does. So I'm asking here: what is vectorization, and what does it mean, for example, that "a loop is vectorized"?
Many CPUs have "vector" or "SIMD" instruction sets which apply the same operation simultaneously to two, four, or more pieces of data. Modern x86 chips have the SSE instructions, many PPC chips have the "Altivec" instructions, and even some ARM chips have a vector instruction set, called NEON.
"Vectorization" (simplified) is the process of rewriting a loop so that instead of processing a single element of an array N times, it processes (say) 4 elements of the array simultaneously N/4 times.
I chose 4 because it's what modern hardware is most likely to directly support for 32-bit floats or ints.
The difference between vectorization and loop unrolling:
Consider the following very simple loop that adds the elements of two arrays and stores the results to a third array.
Unrolling this loop would transform it into something like this:
Vectorizing it, on the other hand, produces something like this:
Where "addFourThingsAtOnceAndStoreResult" is a placeholder for whatever intrinsic(s) your compiler uses to specify vector instructions.
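As a rough sketch of the three variants just described, here is one way they might look in C. The function names, the use of float arrays, and the choice of the SSE intrinsics _mm_loadu_ps, _mm_add_ps and _mm_storeu_ps standing in for "addFourThingsAtOnceAndStoreResult" are assumptions for illustration; on targets without SSE the vectorized function simply falls back to the scalar loop.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* Plain scalar loop: one addition per iteration. */
void add_scalar(float *c, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

/* Unrolled by 4: fewer loop iterations, but still four scalar additions each. */
void add_unrolled(float *c, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        c[i]     = a[i]     + b[i];
        c[i + 1] = a[i + 1] + b[i + 1];
        c[i + 2] = a[i + 2] + b[i + 2];
        c[i + 3] = a[i + 3] + b[i + 3];
    }
    for (; i < n; ++i)          /* leftovers when n is not a multiple of 4 */
        c[i] = a[i] + b[i];
}

/* Vectorized by 4: a single SIMD addition handles four floats per iteration. */
void add_vectorized(float *c, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);            /* load 4 floats from a */
        __m128 vb = _mm_loadu_ps(&b[i]);            /* load 4 floats from b */
        _mm_storeu_ps(&c[i], _mm_add_ps(va, vb));   /* add and store 4 results */
    }
#endif
    for (; i < n; ++i)          /* scalar tail, or full fallback without SSE */
        c[i] = a[i] + b[i];
}
```

All three functions compute the same result; the difference is only in how many elements each instruction touches. The unrolled version saves loop overhead, while the vectorized version also reduces the number of arithmetic instructions.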
Terminology:
Note that most modern ahead-of-time compilers are able to auto-vectorize very simple loops like this, which can often be enabled via a compile option (on by default with full optimization in modern C and C++ compilers, like gcc -O3 -march=native). OpenMP's #pragma omp simd is sometimes helpful to hint the compiler, especially for "reduction" loops like summing an FP array, where vectorization requires pretending that FP math is associative.
More complex algorithms still require help from the programmer to generate good vector code; we call this manual vectorization, often with intrinsics like x86 _mm_add_ps that map to a single machine instruction, as in "SIMD prefix sum on Intel cpu" or "How to count character occurrences using SIMD". Or even use SIMD for short non-looping problems like "Most insanely fastest way to convert 9 char digits into an int or unsigned int" or "How to convert a binary integer number to a hex string?"
The term "vectorization" is also used to describe a higher-level software transformation where you might just abstract away the loop altogether and just describe operating on arrays instead of the elements that comprise them, e.g. writing C = A + B in some language that allows that when those are arrays or matrices, unlike C or C++. In lower-level languages like that, you could describe calling BLAS or Eigen library functions instead of manually writing loops as a vectorized programming style. Some other answers on this question focus on that meaning of vectorization, and higher-level languages.
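To make the "reduction" case mentioned in this answer concrete, here is a minimal sketch of summing a float array with the OpenMP SIMD hint. The function name sum_array is made up for illustration, and the pragma only takes effect when the compiler is invoked with OpenMP support (for example -fopenmp or -fopenmp-simd with GCC or Clang); otherwise it is ignored and the loop runs as ordinary scalar code.

```c
#include <stddef.h>

/* Sum a float array.
   The reduction clause tells an OpenMP-aware compiler that it may keep
   several partial sums (one per SIMD lane) and combine them at the end.
   That reorders the additions, which is exactly the "pretend FP math is
   associative" step, so the result can differ slightly from a strictly
   sequential sum when rounding is involved. */
float sum_array(const float *x, size_t n)
{
    float sum = 0.0f;
    #pragma omp simd reduction(+:sum)
    for (size_t i = 0; i < n; ++i)
        sum += x[i];
    return sum;
}
```

The pragma is a permission, not a command: the compiler is allowed to vectorize the reduction, and whether it actually does depends on the compiler, flags, and target.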
Vectorization is the term for converting a scalar program to a vector program. Vectorized programs can run multiple operations from a single instruction, whereas scalar programs can only operate on one pair of operands at a time.
From Wikipedia:
Scalar approach:
Vectorized approach:
Vectorization is used heavily in scientific computing, where huge chunks of data need to be processed efficiently.
In real programming applications, I know it's used in NumPy (not sure about others).
NumPy (a package for scientific computing in Python) uses vectorization for fast manipulation of n-dimensional arrays, which is generally slower if done with Python's built-in options for handling arrays.
Although tons of explanations are out there, here is how vectorization is defined on the NumPy documentation page:
Vectorization describes the absence of any explicit looping, indexing, etc., in the code - these things are taking place, of course, just “behind the scenes” in optimized, pre-compiled C code. Vectorized code has many advantages, among which are:
vectorized code is more concise and easier to read
fewer lines of code generally means fewer bugs
the code more closely resembles standard mathematical notation (making it easier, typically, to correctly code mathematical constructs)
vectorization results in more “Pythonic” code. Without vectorization, our code would be littered with inefficient and difficult to read for loops.
It refers to the ability to do a single mathematical operation on a list, or "vector", of numbers in a single step. You see it often with Fortran because that's associated with scientific computing, which is associated with supercomputing, where vectorized arithmetic first appeared. Nowadays almost all desktop CPUs offer some form of vectorized arithmetic, through technologies like Intel's SSE. GPUs also offer a form of vectorized arithmetic.
Vectorization, in simple words, means optimizing an algorithm so that it can use the SIMD instructions of the processor.
AVX, AVX2 and AVX-512 are instruction sets (from Intel) that perform the same operation on multiple data elements in one instruction. For example, AVX-512 lets you operate on 16 integer values (4 bytes each) at a time. That means that if you have a vector of 16 integers and you want to double each value and then add 10 to it, you can either load the values into general-purpose registers one at a time and perform the same operation 16 times, or you can load all 16 values into a single SIMD register (a 512-bit zmm register; the narrower xmm and ymm registers hold 4 and 8 such values) and perform the operation once. This speeds up computation on vector data.
In vectorization we use this to our advantage by remodelling our data so that we can perform SIMD operations on it and speed up the program.
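Here is a small sketch of the "double each value and add 10" example. It uses 128-bit SSE2 intrinsics (4 integers per instruction) rather than AVX-512 (16 per instruction), purely because SSE2 is available on every x86-64 CPU; the idea is the same with wider registers. The function name double_plus_ten is an assumption for illustration, and on non-x86 targets the code falls back to the scalar loop.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 integer intrinsics */
#endif

/* Compute v[i] = 2 * v[i] + 10 in place. */
void double_plus_ten(int32_t *v, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i ten = _mm_set1_epi32(10);      /* {10, 10, 10, 10} */
    for (; i + 4 <= n; i += 4) {
        __m128i x = _mm_loadu_si128((const __m128i *)&v[i]);  /* load 4 ints */
        x = _mm_add_epi32(x, x);                 /* double four values at once */
        x = _mm_add_epi32(x, ten);               /* add 10 to four values at once */
        _mm_storeu_si128((__m128i *)&v[i], x);   /* store 4 results */
    }
#endif
    for (; i < n; ++i)                           /* scalar tail or fallback */
        v[i] = 2 * v[i] + 10;
}
```

With 16 input values the SIMD loop runs 4 times instead of 16; an AVX-512 version of the same idea would need only one iteration.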
The only problem with vectorization is handling conditions, because conditions branch the flow of execution. This can be handled by masking, that is, by modelling the condition as an arithmetic operation. For example, if we want to add 10 to a value when it is greater than 100, we can either test each value with a conditional branch,
or we can model the condition as an arithmetic operation, creating a condition vector c.
This is a very trivial example, though. Here c is our masking vector, which we use to perform a binary operation based on its value. This avoids branching of the execution flow and enables vectorization.
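The two alternatives described above can be sketched in plain C as follows. The function names are made up for illustration, and the mask is built per element here; a SIMD compare instruction would produce the same kind of all-ones or all-zeros mask for several elements at once.

```c
#include <stddef.h>
#include <stdint.h>

/* Branching version: the flow of execution depends on each value. */
void add_ten_branch(int32_t *v, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (v[i] > 100)
            v[i] += 10;
    }
}

/* Masked version: the condition becomes data instead of control flow. */
void add_ten_masked(int32_t *v, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        /* (v[i] > 100) is 1 or 0; negating it gives all-ones (-1) or 0. */
        int32_t c = -(int32_t)(v[i] > 100);
        v[i] += 10 & c;   /* adds 10 where the mask is set, 0 elsewhere */
    }
}
```

Both functions produce the same result. The masked loop body has no data-dependent branch, which is the shape that compilers and hand-written SIMD code need in order to process several elements per instruction.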
Vectorization is as important as parallelization, so we should make use of it as much as possible. All modern processors have SIMD instructions for heavy compute workloads. We can optimize our code to use these SIMD instructions through vectorization; this is similar to parallelizing our code to run on the multiple cores available on modern processors.
I would like to close by mentioning OpenMP, which lets you vectorize code using pragmas. I consider it a good starting point. The same can be said for OpenACC.
I think this article by Intel people is easy to grasp.
Link: https://software.intel.com/en-us/articles/vectorization-a-key-tool-to-improve-performance-on-modern-cpus
In Java there is an option for this to be included in JDK 15 in 2020, or later in JDK 16 in 2021. See this official issue.
Hope you are well!
Vectorization refers to all the techniques that convert a scalar implementation, in which a single operation processes a single entity at a time, into a vector implementation, in which a single operation processes multiple entities at the same time.
Vectorization is a technique that lets us optimize code to work efficiently with huge chunks of data. It is seen in scientific libraries like NumPy and pandas, and you can also use this technique when working with MATLAB, image processing, NLP, and much more. Overall it optimizes the runtime and memory allocation of the program.
Hope you get your answer!
Thank you.
I would define vectorisation as a feature of a given language whereby the responsibility for how to iterate over the elements of a certain collection can be delegated from the programmer (e.g. an explicit loop over the elements) to some method provided by the language (e.g. an implicit loop).
Now, why would we ever want to do that?
Note that for points 3 and 4, some languages (Julia notably) allow these hardware parallelizations to be exploited also with programmer-defined order processing (e.g. for loops), but this happens automatically and under the hood when using the vectorisation method provided by the language.
Now, while vectorisation has many advantages, sometimes an algorithm is more intuitively expressed using an explicit loop than with vectorisation (where perhaps we need to resort to complex linear algebra operations, identity and diagonal matrices... all to retain our "vectorised" approach), and if using an explicit ordering form has no computational disadvantages, that form should be preferred.
See the two answers above. I just wanted to add that the reason for wanting to do vectorization is that these operations can easily be performed in parallel by supercomputers and multiprocessors, yielding a big performance gain. On a single processor without vector (SIMD) hardware there will be no performance gain.