Explain the Hough Transform
I am just being adventurous and taking my first baby step toward computer vision. I tried to implement the Hough transform on my own, but I just don't get the whole picture. I read the Wikipedia entry, and even the original "Use of the Hough Transformation to Detect Lines and Curves in Pictures" by Richard Duda and Peter Hart, but it didn't help.
Can someone help explain it to me in friendlier language?
Here's a very basic, visual explanation of how a Hough Transform works for detecting lines in an image:
It's more common to think of a line in rectangular (Cartesian) coordinates, i.e. y = mx + b. As the Wikipedia article states, a line can also be expressed in polar (normal) form, r = x cos(theta) + y sin(theta), where theta is the direction of the line's normal and r is its distance from the origin. The Hough transform exploits this change of representation (for lines, anyway; the discussion can also be applied to circles, ellipses, etc.).
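As a rough illustration of that change of representation, here is a minimal Python sketch that converts a slope-intercept line into the (theta, r) form. The function name `slope_intercept_to_normal` is made up for this example, and the origin is assumed to be at (0, 0).

```python
import math

def slope_intercept_to_normal(m, b):
    """Convert y = m*x + b into (theta, r) with x*cos(theta) + y*sin(theta) = r.

    Hypothetical helper for illustration; theta is the direction of the
    line's normal and lies in (0, pi), r is a signed distance from the origin.
    """
    norm = math.hypot(m, 1.0)       # length of the normal vector (-m, 1)
    theta = math.atan2(1.0, -m)     # angle of that normal vector
    r = b / norm                    # signed distance from the origin to the line
    return theta, r

# The line y = x passes through the origin, so r should be 0.
theta, r = slope_intercept_to_normal(1.0, 0.0)

# Any point on y = x, e.g. (2, 2), should satisfy the normal form.
x, y = 2.0, 2.0
residual = x * math.cos(theta) + y * math.sin(theta) - r
```

One reason the (theta, r) form is convenient: a vertical line has no finite slope m, but it is simply theta = 0 in this representation.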
The first step in the Hough transform is to reduce the image to a set of edges. The Canny edge-detector is a frequent choice. The resulting edge image serves as the input to the Hough process.
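To make the "edge image" idea concrete, here is a deliberately crude stand-in, assuming NumPy is available. It is not the Canny detector, just a gradient-magnitude threshold, and the name `crude_edge_map` and the threshold value are invented for this sketch.

```python
import numpy as np

def crude_edge_map(image, threshold):
    """Return a boolean image that is True where the gradient magnitude is large.

    Illustrative stand-in only: a real pipeline would more likely use a
    proper edge detector such as Canny.
    """
    gy, gx = np.gradient(image.astype(float))   # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    return magnitude > threshold

# Synthetic image: dark left half, bright right half.
image = np.zeros((8, 8))
image[:, 4:] = 1.0

edges = crude_edge_map(image, threshold=0.25)
# Only the pixels next to the brightness step are "lit".
```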
To summarize, each pixel "lit" in the edge image is mapped into the polar parameter space: for every candidate direction theta, the distance r = x cos(theta) + y sin(theta) of the line through that pixel is computed, so the pixel is described by (theta, r) pairs instead of x and y. (The center of the image is commonly used as the reference point for this change of coordinates.)
The Hough transform is essentially a histogram. Edge pixels mapping to the same theta and r are assumed to lie on the same line in the image. To compute the frequency of occurrence, theta and r are discretized (partitioned into a number of bins), and each pixel adds one count to every bin it maps to. Once all edge pixels have been processed, the bins are analyzed to determine the lines in the original image.
It is common to look for the N most frequent (theta, r) bins, or to threshold the counts so that bins with fewer than some n votes are ignored.
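The histogram idea can be sketched in a few lines of Python, assuming NumPy. This is a minimal illustration rather than a tuned implementation: the function name `hough_lines`, the one-degree theta bins, the one-pixel r bins, and the choice of the top-left pixel as the origin (instead of the image center) are all assumptions made for the example.

```python
import numpy as np

def hough_lines(edges, n_theta=180):
    """Accumulate (r, theta) votes for every lit pixel of a boolean edge image.

    Assumptions of this sketch: origin at pixel (0, 0), theta sampled over
    [0, pi) in n_theta steps, r rounded to the nearest whole pixel.
    """
    height, width = edges.shape
    thetas = np.deg2rad(np.arange(n_theta) * 180.0 / n_theta)
    max_r = int(np.ceil(np.hypot(height, width)))        # largest possible |r|
    accumulator = np.zeros((2 * max_r + 1, n_theta), dtype=np.int64)
    theta_idx = np.arange(n_theta)

    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        # One r per candidate theta: the pixel votes for every line through it.
        r = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        accumulator[r + max_r, theta_idx] += 1            # shift so negative r fits
    return accumulator, thetas, max_r

# Synthetic edge image containing one horizontal line, y = 20.
edges = np.zeros((50, 50), dtype=bool)
edges[20, :] = True

accumulator, thetas, max_r = hough_lines(edges)

# The fullest bin gives the detected line's parameters.
r_idx, t_idx = np.unravel_index(np.argmax(accumulator), accumulator.shape)
best_r = r_idx - max_r
best_theta_deg = np.rad2deg(thetas[t_idx])

# Alternatively, keep every bin whose count reaches a threshold.
strong_bins = np.argwhere(accumulator >= 40)
```

For this input the peak bin corresponds to theta = 90 degrees and r = 20, i.e. the normal points straight along the y axis and the line sits 20 pixels from the origin.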
I'm not sure this answer is any better than the sources you originally presented - is there a particular point that you are stuck on?
The Hough transform is a way of finding the most likely values which represent a line (or a circle, or many other things).
You give the Hough transform a picture of a line as input. This picture will contain two types of pixels: ones which are part of the line, and ones which are part of the background.
For each pixel that is part of the line, all possible combinations of parameters are calculated. For example, if the pixel at co-ordinate (1, 100) is part of the line, then that could be part of a line where the gradient (m) = 0 and y-intercept (c) = 100. It could also be part of m = 1, c = 99; or m = 2, c = 98; or m = 3, c = 97; and so on. You can solve the line equation y = mx + c to find all possible combinations.
Each pixel gives one vote to each combination of parameters (m and c) that could explain it. So you can imagine, if your line has 1000 pixels in it, then the correct combination of m and c will have 1000 votes.
The combination of m and c which has the most votes is what is returned as the parameters for the line.
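The voting described above can be tried out with a small Python sketch. The function name `vote_slope_intercept` and the restriction to a handful of integer slopes are assumptions made to keep the example short; a real implementation would bin continuous parameter values.

```python
from collections import Counter

def vote_slope_intercept(pixels, slopes):
    """Let every pixel vote for each (m, c) pair that could explain it."""
    votes = Counter()
    for x, y in pixels:
        for m in slopes:
            c = y - m * x            # the only intercept that fits this pixel and slope
            votes[(m, c)] += 1
    return votes

# Ten pixels on the line y = 2x + 98, which includes the pixel (1, 100).
pixels = [(x, 2 * x + 98) for x in range(1, 11)]

votes = vote_slope_intercept(pixels, slopes=range(-5, 6))
(best_m, best_c), count = votes.most_common(1)[0]
```

Here the pixel (1, 100) votes for (0, 100), (1, 99), (2, 98), (3, 97) and so on, but only (2, 98) collects a vote from every pixel, so it wins with 10 votes.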
Here's another perspective (one used in the pilot episode of the TV show Numbers): Imagine a fountain-like lawn sprinkler was somewhere on a lawn earlier, casting out water droplets around itself. Now the sprinkler is gone, but the drops remain. Imagine turning each drop into its own sprinkler, itself casting out droplets around itself, in all directions because the drop doesn't know what direction it came from. This will scatter a lot of water thinly around on the ground, except there will be a spot where a whole lot of water hits from all drops at once. That spot is where the original sprinkler was.
The application to (e.g.) line detection is similar. Each point in the image is one of the original droplets; when it acts as a sprinkler it sends out its own droplets, marking all of the lines that could be passing through that point. Places where a whole lot of secondary droplets land represent the parameters of a line that passes through a whole lot of image points. Voilà! Line detected!