Where can I find a good read about bicubic interpolation and Lanczos resampling?
I want to implement the two above mentioned image resampling algorithms (bicubic and Lanczos) in C++. I know that there are dozens of existing implementations out there, but I still want to make my own. I want to make it partly because I want to understand how they work, and partly because I want to give them some capabilities not found in mainstream implementations (like configurable multi-CPU support and progress reporting).
I tried reading Wikipedia, but the stuff is a bit too dry for me. Perhaps there are some nicer explanations of these algorithms? I couldn't find anything either on SO or Google.
Added: Seems like nobody can give me a good link about these topics. Can anyone at least try to explain them here?
3 Answers
The basic operating principle of both algorithms is pretty simple. They're both convolution filters. For each output value, a convolution filter moves the convolution function's point of origin to be centered on that output, multiplies every input value by the value of the convolution function at its location, and adds the results together.
One property of convolution is that the integral of the output is the product of the integrals of the two input functions. If you consider the input and output images, the integral corresponds to average brightness, so if you want the brightness to remain the same, the integral of the convolution function needs to be one.
One way to understand them is to think of the convolution function as describing how much an input pixel influences the output pixel, depending on their distance.
Convolution functions are usually defined to be zero when the distance is larger than some value, so that you don't have to consider every input value for every output value.
For Lanczos interpolation the convolution function is based on the sinc function, sinc(x) = sin(pi*x)/(pi*x), but only the first few lobes are taken. Usually 3.
This function is called the filter kernel.
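In C++ the kernel might be sketched roughly like this (a minimal sketch; the function names and the default of a = 3 lobes are just choices for illustration):

```cpp
#include <cmath>

const double kPi = 3.14159265358979323846;

// Normalized sinc: sin(pi*x) / (pi*x), with the removable singularity at 0.
double sinc(double x) {
    if (x == 0.0) return 1.0;
    return std::sin(kPi * x) / (kPi * x);
}

// Lanczos kernel with 'a' lobes (a = 3 is the common choice).
// It is zero for |x| >= a, so only nearby pixels ever contribute.
double lanczos(double x, int a = 3) {
    if (std::abs(x) >= a) return 0.0;
    return sinc(x) * sinc(x / a);
}
```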
To resample with Lanczos, imagine you overlay the output and input grids on top of each other, with points marking the pixel locations. For each output pixel location you take a box of ±3 output pixels around that point. For every input pixel that lies in that box, calculate the value of the Lanczos function at that location, using the distance from the output location (in output pixel coordinates) as the parameter. Then normalize the calculated values by scaling them so that they add up to 1. After that, multiply each input pixel value by its corresponding scaled weight and add the results together to get the value of the output pixel.
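A sketch of that per-pixel procedure, using the `lanczos()` kernel from the sketch above (single-channel image; for simplicity it measures distances in input pixels and skips the kernel widening that proper downscaling needs):

```cpp
#include <cmath>
#include <vector>

// Sketch: one output pixel of a Lanczos resample. srcX/srcY are the output
// pixel's position mapped into input coordinates.
double resamplePixel(const std::vector<double>& src, int srcWidth, int srcHeight,
                     double srcX, double srcY, int a = 3)
{
    double sum = 0.0;
    double weightSum = 0.0;

    // The (2a) x (2a) box of input pixels around the mapped position.
    const int x0 = static_cast<int>(std::floor(srcX)) - a + 1;
    const int y0 = static_cast<int>(std::floor(srcY)) - a + 1;
    for (int y = y0; y < y0 + 2 * a; ++y) {
        for (int x = x0; x < x0 + 2 * a; ++x) {
            if (x < 0 || x >= srcWidth || y < 0 || y >= srcHeight) continue;
            const double w = lanczos(srcX - x, a) * lanczos(srcY - y, a);
            sum += w * src[y * srcWidth + x];
            weightSum += w;
        }
    }
    // Normalize so the weights effectively add up to 1 (preserves brightness).
    return weightSum != 0.0 ? sum / weightSum : 0.0;
}
```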
Because the Lanczos function is separable and, if you are resizing, the grid is regular, you can optimize this by doing the convolution horizontally and vertically separately, precalculating the vertical filter for each output row and the horizontal filter for each output column.
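The precomputation for the horizontal pass could look something like this (again a sketch using `lanczos()` from above; the `Contribution` struct is made up for this example, border clamping is omitted, and downscaling would additionally stretch the kernel by the scale factor):

```cpp
#include <cmath>
#include <vector>

// For each output column: the first contributing input column and the
// normalized weights of the 2a contributing columns. Every row of the image
// reuses this same table, which is where the big saving comes from.
struct Contribution {
    int firstInput;
    std::vector<double> weights;
};

std::vector<Contribution> buildHorizontalFilter(int srcWidth, int dstWidth, int a = 3)
{
    std::vector<Contribution> filter(dstWidth);
    const double scale = static_cast<double>(srcWidth) / dstWidth;
    for (int outX = 0; outX < dstWidth; ++outX) {
        const double srcX = (outX + 0.5) * scale - 0.5;  // output center in input coords
        Contribution c;
        c.firstInput = static_cast<int>(std::floor(srcX)) - a + 1;
        double sum = 0.0;
        for (int i = 0; i < 2 * a; ++i) {
            const double w = lanczos(srcX - (c.firstInput + i), a);
            c.weights.push_back(w);
            sum += w;
        }
        for (double& w : c.weights) w /= sum;            // make them add up to 1
        filter[outX] = c;
    }
    return filter;
}
```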
Bicubic convolution is basically the same, with a different filter kernel function.
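For reference, one widely used bicubic kernel is the Keys cubic convolution kernel; a sketch with the common parameter choice a = -0.5 (which gives Catmull-Rom-like behavior):

```cpp
#include <cmath>

// Keys cubic convolution kernel; support is |x| < 2, i.e. a 4x4 neighborhood
// in 2-D.
double bicubicKernel(double x, double a = -0.5) {
    x = std::abs(x);
    if (x < 1.0)
        return (a + 2.0) * x * x * x - (a + 3.0) * x * x + 1.0;
    if (x < 2.0)
        return a * x * x * x - 5.0 * a * x * x + 8.0 * a * x - 4.0 * a;
    return 0.0;
}
```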
To get more detail, there's a pretty good and thorough explanation in the book Digital Image Processing, section 16.3.
Also, image_operations.cc and convolver.cc in Skia contain a well-commented implementation of Lanczos interpolation.
While what Ants Aasma says roughly describes the difference, I don't think it is particularly informative as to why you might do such a thing.
As far as links go, you are asking a very basic question in image processing, and any decent introductory textbook on the subject will describe this. If I remember correctly, Gonzalez and Woods is decent on it, but I'm away from my books and can't check.
Now on to the particulars. It should help to think about what you are doing fundamentally: you have a square lattice of measurements that you want to interpolate new values into. In the simple case of upsampling, let's imagine you want a new measurement in between every one that you already have (e.g. doubling the resolution).
Now you won't get the "correct" value, because in general you don't have that information. So you have to estimate it. How? A very simple way is to interpolate linearly. Everyone knows how to do this with two points: you just draw a line between them and read the new value off the line (in this case, at the halfway point).
Now an image is two dimensional, so you really want to do this in both the left-right and up-down directions. Use the result for your estimate and voila you have "bilinear" interpolation.
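As a sketch (the image layout and helper names here are just assumptions for the example), the whole bilinear step is two linear interpolations in x followed by one in y:

```cpp
#include <cmath>
#include <vector>

// Linear interpolation between two samples; t in [0, 1].
double lerp(double a, double b, double t) { return a + (b - a) * t; }

// Bilinear sample of a single-channel image at fractional coordinates (x, y).
// Edge handling is omitted: (x, y) is assumed to be at least one pixel away
// from the right/bottom border.
double bilinear(const std::vector<double>& img, int width, double x, double y)
{
    const int x0 = static_cast<int>(std::floor(x));
    const int y0 = static_cast<int>(std::floor(y));
    const double tx = x - x0, ty = y - y0;

    const double top    = lerp(img[y0 * width + x0],       img[y0 * width + x0 + 1],       tx);
    const double bottom = lerp(img[(y0 + 1) * width + x0], img[(y0 + 1) * width + x0 + 1], tx);
    return lerp(top, bottom, ty);  // interpolate the two row results in y
}
```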
The main problem with this is that it isn't very accurate, although it's better (and slower) than the "nearest neighbor" approach which is also very local and fast.
To address the first problem, you want something better than a linear fit of two points: you want to fit something to more data points (pixels), and something that can be nonlinear. A good trade-off between accuracy and computational cost is a cubic spline. This gives you a smooth fitted curve, and again you approximate your new "measurement" by the value the curve takes in the middle. Do this in both directions and you've got "bicubic" interpolation.
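A sketch of the one-dimensional cubic step, here in Catmull-Rom form, which passes through the four neighboring samples (t in [0, 1] is the fractional position between p1 and p2):

```cpp
// Catmull-Rom cubic through four neighboring samples p0..p3, evaluated at the
// fractional position t between p1 and p2.
double cubic(double p0, double p1, double p2, double p3, double t)
{
    return p1 + 0.5 * t * (p2 - p0 +
                 t * (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3 +
                 t * (3.0 * (p1 - p2) + p3 - p0)));
}

// Bicubic: run cubic() in x on each of the four nearest rows, then run
// cubic() in y on those four results.
```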
So that's more accurate, but still heavy. One way to address the speed issue is to use a convolution, which has the nice property that in the Fourier domain it is just a multiplication, so it can be implemented quite quickly. You don't need to worry about the implementation, though, to understand that the convolution result at any point is one function (your image) integrated against another function with much smaller support (the part that is non-zero), called the kernel, after that kernel has been centered on that particular point. In the discrete world, these are just sums of products.
It turns out that you can design a convolution kernel whose properties are quite like those of the cubic spline, and use that to get a fast "bicubic" interpolation.
Lanczos resampling is a similar thing, with slightly different properties in the kernel, which primarily means the two will have artifacts with different characteristics. You can look up the details of these kernel functions easily enough (I'm sure Wikipedia has them, or any intro text). The implementations used in graphics programs tend to be highly optimized and sometimes make specialized assumptions which make them more efficient but less general.
I would suggest the following article for a basic understanding of different image interpolation methods: image interpolation via convolution. If you want to try more interpolation methods, imageresampler is a nice open-source project to start with.
In my opinion, image interpolation can be understood from two angles: the function-fitting perspective and the convolution perspective. For example, the spline interpolation explained in image interpolation via convolution is well explained from the function-fitting perspective in Cubic interpolation.
Additionally, image interpolation is always tied to a specific application, for example image zooming, image rotation, and so on. In fact, for a specific application, image interpolation can be implemented in a smart way. For example, image rotation can be implemented via a three-shearing method, and during each shearing operation a different one-dimensional interpolation algorithm can be used.
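As a hint at how the three-shear rotation works (Paeth-style rotation by shearing; just a sketch of the decomposition, not any particular library's API): a rotation by an angle theta splits into an x-shear, a y-shear, and another x-shear, and each shear only moves pixels along one axis, so a one-dimensional kernel suffices per pass.

```cpp
#include <cmath>

// Rotation by 'theta' (radians) decomposed into three shears:
//   x by -tan(theta/2), then y by sin(theta), then x by -tan(theta/2) again.
// Each shear shifts whole rows (or columns) by fractional amounts, so each
// pass only needs a 1-D interpolation kernel (linear, cubic, Lanczos, ...).
struct ShearFactors { double xShear1, yShear, xShear2; };

ShearFactors rotationAsShears(double theta) {
    const double t = -std::tan(theta / 2.0);
    return { t, std::sin(theta), t };
}
```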