Dice face value recognition
I’m trying to build a simple application that will recognize the values of two 6-sided dice. I’m looking for some general pointers, or maybe even an open source project.
The two dice will be black and white, with white and black pips respectively. Their distance to the camera will always be the same, but their position and orientation on the playing surface will be random.
Dice http://www.freeimagehosting.net/uploads/9160bdd073.jpg
(not the best example, the surface will be a more distinct color and the shadows will be gone)
I have no prior experience with developing this kind of recognition software, but I assume the trick is to first isolate the faces by searching for a square profile with a dominant white or black color (the rest of the image, i.e. the table/playing surface, will be in a distinctly different color), and then isolate the pips for counting. Shadows will be eliminated by top-down lighting.
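The threshold-then-count approach described above can be sketched without any vision library. Assuming the die face has already been thresholded to a binary grid (1 = pip-colored pixel), counting pips reduces to counting connected components with a flood fill. The function name `count_blobs` and the toy grid are purely illustrative:

```python
from collections import deque

def count_blobs(grid):
    """Count 4-connected regions of 1-cells in a 2D binary grid.
    Each region corresponds to one pip after thresholding a die face."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] and not seen[r][c]:
                blobs += 1
                # Flood-fill this blob so it is only counted once.
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return blobs

# A toy 7x7 "die face" with three pips on the diagonal, i.e. a roll of 3.
face = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
]
# count_blobs(face) returns 3
```

In a real pipeline the grid would come from thresholding a cropped die face; the counting step itself stays this simple.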
I’m hoping the described scenario is so simple (read: common) it may even be used as an “introductory exercise” for developers working on OCR technologies or similar computer vision challenges.
Update:
I did some further googling and came across this video, which strangely enough is exactly what I'm looking for. It also seems the OpenCV project is my best bet so far; I'll try to use it through one of these wrapper projects: OpenCVDotNet or Emgu CV.
Update:
Still struggling, can't get Emgu CV to work.
Ideas, pointers, thoughts, etc are still very much welcome!
Image recognition is non-trivial. You're going to have to constrain the input data in some way, and it looks like you've given this some thought.
Your question reminded me of a blog post by the author of SudokuGrab, which is an iPhone app that allows you to take photos of a Sudoku puzzle in a newspaper, and have it solve the puzzle for you. In the post, he discusses several of the issues that you will face in solving your problem, and how he overcame them.
This is a similar question to Object Recognition from Templates, to which I provided an answer that I believe might be of use.
While different kinds of classifiers will probably work well, I would probably try the method I outlined first. Classifiers are often tricky to implement and especially to train properly.
Also, when things don't work it is very hard to know where the problem is: is it in your implementation of the classifier, did you choose the wrong method, are the parameters wrong, did you not train it properly, or were you just unlucky?
So, stay away from classifiers, template matching and neural networks if the problem can (easily) be solved using simple image processing methods and some math.
Another possibility is to first use a more generic image manipulation/recognition algorithm to pin down the dice positions, then rotate and scale the image to some standard form (such as 512x512-pixel grayscale images of dice that have been rotated to be straight). Then attempt to train six different neural nets, one per face value, to recognize the dice on screen. AForge.Net is a good, solid artificial intelligence library (including neural nets) and should get you a fair bit of the way there.
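The "rotate to a standard orientation" step above can be done without a classifier: the standard image-moments trick estimates a blob's orientation from its second-order central moments, and the image can then be rotated by the negative of that angle. A minimal sketch, with the function name `orientation_angle` and the test data invented for illustration:

```python
import math

def orientation_angle(points):
    """Estimate the orientation (radians) of a pixel blob from its
    second-order central moments: theta = 0.5 * atan2(2*mu11, mu20 - mu02).
    `points` is a list of (x, y) coordinates belonging to the die's blob."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mu20 = sum((x - cx) ** 2 for x, _ in points)
    mu02 = sum((y - cy) ** 2 for _, y in points)
    mu11 = sum((x - cx) * (y - cy) for x, y in points)
    return 0.5 * math.atan2(2 * mu11, mu20 - mu02)

# A thin bar of pixels along the line y = x is tilted 45 degrees.
bar = [(i, i) for i in range(20)]
angle = math.degrees(orientation_angle(bar))  # ~45.0
```

Once the angle is known, any image library's rotation routine can normalize the die before it is handed to the per-face recognizers.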
In this video you can see pretty much the behaviour you want, I think. The author is using multiple white dice, but he provides the code (python/opencv), and maybe you can build your project on that.
While image training is "non-trivial", as @Brian said, this will actually be a fairly easy program to write. What you need to do is develop Haar classifiers for the dice; you will need six classifiers in total, one per face value. The classifiers are the key to good image recognition, and Haar classifiers are among the best there are right now, though they take a long time to build. Here are some good links to get you familiar with Haar cascades:
http://www.computer-vision-software.com/blog/2009/11/faq-opencv-haartraining/
http://www.cognotics.com/opencv/docs/1.0/haartraining.htm
http://note.sonots.com/SciSoftware/haartraining.html
Check out this guy's YouTube video, then download his source from the link he provides in the video to see how he applies the cascade files in EmguCV. It will give you something to build on.
http://www.youtube.com/watch?v=07QAhRJmcKQ
This site posts a link to the source for a nice little tool that adds some automation for cropping the images and creating the index files needed to build the Haar cascades. I used it a few months back; I couldn't get it to work right at first, but I modified it and it worked great for Haar training (not HMM). If you want the version I modified, post back and I will get it to you.
http://sandarenu.blogspot.com/2009/03/opencv-haar-training-resources.html
While I have little technical assistance to offer you, the maker of the Dice-O-Matic mark II may be able to help.
Alright,
Algorithms for carrying out image recognition with a high level of abstraction (the kind of abstraction necessary to produce reliable handwriting recognition or face recognition software) remain among the most difficult problems in computer science today. However, pattern recognition for well-constrained applications, like the one you described, is a solvable and very fun algorithmic problem.
I would suggest two possible strategies for carrying out your task:
The first strategy involves using some third-party software that can preprocess your image and return data about low-level image components. I have some experience using a tool called Pixcavator, which has an SDK here. Pixcavator will mine through your image and study the discrepancies between the color values of neighboring pixels to return the borders of the various components in the image. A tool like Pixcavator should be able to easily define the boundaries of the components in your picture, and most importantly of each of the pips. Your job will then be to mine through the data the third-party software returns and look for components that fit the description of small circular partitions that are either white or black. You'll be able to count how many of these image components were partitioned off and use that to return the number of pips in your image.
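Filtering the returned components for "small circular partitions" can use the standard circularity measure 4&pi;A/P&sup2;, which is 1.0 for a perfect circle and about 0.785 for a square, so pips separate cleanly from square die faces. A sketch under the assumption that the tool reports each component's area and perimeter (the helper name and tolerance are illustrative):

```python
import math

def is_circular(area, perimeter, tol=0.15):
    """Return True if a component is close to circular.
    Circularity 4*pi*A/P**2 is 1.0 for a perfect circle and drops
    for elongated or square shapes (~0.785 for a square)."""
    if perimeter == 0:
        return False
    circularity = 4 * math.pi * area / perimeter ** 2
    return abs(circularity - 1.0) <= tol

# Component stats as a border-tracing tool might report them:
pip  = (math.pi * 5 ** 2, 2 * math.pi * 5)  # circle of radius 5
face = (50 * 50, 4 * 50)                    # 50x50 square die face
# is_circular(*pip) is True, is_circular(*face) is False
```

Note that perimeters measured on digitized borders come out somewhat inflated, so the tolerance will need the same guess-and-check tuning as the rest of the pipeline.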
If you're ambitious enough to work on this problem without third-party software, it is still solvable. Essentially, you'll want to define a circular scanner: a set of pixels in a circular formation that scans across your image looking for a pip (just as an eye might scan over a picture looking for something hidden in it). As your algorithmic "eye" scans the image, it takes sets of pixels from the image (call them test sets) and compares them with predefined sets of pixels (what we'll call your training sets), checking whether the test set matches one of the training sets within a predefined tolerance for error. The easiest way to run such a test is to compare the color data of each pixel in the test set with the corresponding pixel in the training set, producing a third set of values called your discrepancy set. If the values in the discrepancy set are sufficiently small (meaning the test set is sufficiently similar to a training set), you define that area of the image as a pip and move on to scan other parts of the image.
It will take a little guessing and checking to find the right error tolerance, so that you catch every pip without testing positive on things that aren't pips.
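The test-set/training-set comparison described above amounts to a sum-of-absolute-differences check against a tolerance. A minimal sketch with flattened grayscale patches; the function name, patch values, and tolerance are all made up for illustration:

```python
def matches_template(test_patch, training_patch, tolerance):
    """Compare two equal-sized pixel patches (flat lists of grayscale
    values 0-255). The 'discrepancy set' is the per-pixel absolute
    difference; the patch matches if the mean discrepancy is small."""
    assert len(test_patch) == len(training_patch)
    discrepancy = [abs(a - b) for a, b in zip(test_patch, training_patch)]
    return sum(discrepancy) / len(discrepancy) <= tolerance

# A dark pip template and two candidate patches from the scan:
pip_template = [20, 20, 20, 20]
dark_patch   = [25, 18, 22, 19]       # slightly noisy pip  -> matches
table_patch  = [200, 210, 205, 198]   # bright table surface -> no match
# matches_template(dark_patch, pip_template, tolerance=10) is True
```

The `tolerance` parameter is exactly the error tolerance discussed above: too tight and noisy pips are missed, too loose and shadows or table texture test positive.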