How does image recognition work in Google Shopper?

Posted 2024-09-19 21:35:31


I am amazed at how well (and fast) this software works. I hovered my phone's camera over a small area of a book cover in dim light and it only took a couple of seconds for Google Shopper to identify it. It's almost magical. Does anyone know how it works?


Comments (3)

猫七 2024-09-26 21:35:31


I have no idea how Google Shopper actually works. But it could work like this:

  • Take your image and convert to edges (using an edge filter, preserving color information).
  • Find points where edges intersect and make a list of them (including colors and perhaps angles of intersecting edges).
  • Convert to a rotation-independent metric by selecting pairs of high-contrast points and measuring distance between them. Now the book cover is represented as a bunch of numbers: (edgecolor1a,edgecolor1b,edgecolor2a,edgecolor2b,distance).
  • Pick pairs of the most notable distance values and ratio the distances.
  • Send this data as a query string to Google, where it finds the most similar vector (possibly with direct nearest-neighbor computation, or perhaps with an appropriately trained classifier--probably a support vector machine).
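The edge-and-distance steps above can be sketched in a few lines. This is a toy illustration only, with made-up thresholds and a simple Sobel filter standing in for whatever edge detector Shopper actually uses; real pipelines rely on far more robust keypoint detectors.

```python
# Toy sketch of the edge-based, rotation-independent descriptor described
# above. All names and thresholds are hypothetical.
import numpy as np

def sobel_edges(img):
    """Gradient magnitude via Sobel filters (a simple edge filter)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    gx, gy = np.zeros((h, w)), np.zeros((h, w))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = img[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy)

def pairwise_distance_ratios(points, k=5):
    """Distances between high-contrast points, normalized to the largest
    one: ratios are invariant to both rotation and uniform scaling."""
    pts = np.asarray(points, dtype=float)
    d = [np.linalg.norm(pts[i] - pts[j])
         for i in range(len(pts)) for j in range(i + 1, len(pts))]
    d = np.sort(d)[::-1][:k]
    return d / d[0]

# Usage: threshold the edge map, keep the strongest points, describe them.
img = np.zeros((16, 16)); img[4:12, 4:12] = 1.0   # a bright square
edges = sobel_edges(img)
ys, xs = np.where(edges > edges.max() * 0.9)
descriptor = pairwise_distance_ratios(list(zip(ys, xs)))
```

The resulting descriptor is a short vector of distance ratios, which is what would be sent to the server as the query.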

Google Shopper could also send the entire picture, at which point Google could use considerably more powerful processors to crunch on the image processing data, which means it could use more sophisticated preprocessing (I've chosen the steps above to be so easy as to be doable on smartphones).

Anyway, the general steps are very likely to be (1) extract scale and rotation-invariant features, (2) match that feature vector to a library of pre-computed features.
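Step (2) can be sketched as a plain nearest-neighbor lookup. The product names and feature vectors below are invented for illustration; a real system would search millions of precomputed vectors with an approximate-nearest-neighbor index rather than a linear scan.

```python
# Minimal version of step (2): match a query feature vector against a
# precomputed library by nearest-neighbor search. Entries are made up.
import numpy as np

library = {
    "book-cover-A": np.array([0.9, 0.1, 0.3, 0.7]),
    "book-cover-B": np.array([0.2, 0.8, 0.5, 0.1]),
    "dvd-case-C":   np.array([0.4, 0.4, 0.9, 0.6]),
}

def nearest_neighbor(query, library):
    """Return the library entry whose feature vector is closest (L2)."""
    return min(library, key=lambda name: np.linalg.norm(library[name] - query))

# A noisy photo of book A should still land on book A's stored vector.
query = np.array([0.85, 0.15, 0.35, 0.65])
match = nearest_neighbor(query, library)   # -> "book-cover-A"
```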

流年已逝 2024-09-26 21:35:31


In any case, pattern recognition / machine learning methods are often based on:

  1. Extract features from the image that can be described as numbers: for instance, edges (as Rex Kerr explained above), color, texture, etc. A set of numbers that describes or represents an image is called a "feature vector", or sometimes a "descriptor". After extracting the feature vector of an image, it is possible to compare images using a distance or (dis)similarity function.
  2. Extract text from the image. There are several methods to do this, often based on OCR (optical character recognition).
  3. Perform a search on a database using the features and the text in order to find the closest related product.
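Point 1 above hinges on having a (dis)similarity function between feature vectors. Two common choices, sketched here with invented descriptors, are Euclidean distance (a dissimilarity: smaller is more alike) and cosine similarity (larger is more alike).

```python
# Comparing images via their feature vectors. The vectors are made up;
# real descriptors would be much longer.
import numpy as np

def euclidean_distance(a, b):
    """Smaller means more similar (a dissimilarity)."""
    return float(np.linalg.norm(a - b))

def cosine_similarity(a, b):
    """1.0 means identical direction (a similarity)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cover_photo   = np.array([0.9, 0.1, 0.4])   # hypothetical descriptor
same_cover    = np.array([0.8, 0.2, 0.5])   # another photo of the same book
other_product = np.array([0.1, 0.9, 0.2])

# The matching pair comes out closer / more similar than the mismatch.
```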

    It is also likely that the image is cut into subimages, since the algorithm often needs to find a specific logo within the image.

    In my opinion, the image features are sent to different pattern classifiers (algorithms able to predict a "class" given a feature vector as input) in order to recognize logos and, afterwards, the product itself.

    Using this approach, the system can be local, remote, or mixed. If local, all processing is carried out on the device, and just the "feature vector" and "text" are sent to the server where the database is. If remote, the whole image goes to the server. If mixed (I think this is the most probable), it is partially executed locally and partially on the server.

    Another interesting piece of software is Google Goggles, which uses CBIR (content-based image retrieval) to search for other images related to the picture taken with the smartphone. It addresses a problem closely related to Shopper's.
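A "pattern classifier" in the sense used above is simply an algorithm that maps a feature vector to a class label. As a hedged sketch, here is a toy nearest-centroid classifier standing in for whatever (unknown) classifier Google actually uses; the logo names and training vectors are hypothetical.

```python
# Toy pattern classifier: predict a class label from a feature vector by
# assigning it to the nearest class centroid. All data is invented.
import numpy as np

def fit_centroids(samples):
    """samples: {class_name: list of feature vectors} -> per-class means."""
    return {cls: np.mean(vecs, axis=0) for cls, vecs in samples.items()}

def predict(centroids, x):
    """Assign x to the class with the nearest centroid (Euclidean)."""
    return min(centroids, key=lambda cls: np.linalg.norm(centroids[cls] - x))

# Hypothetical training data: two logo classes described by 3-D features.
training = {
    "logo-acme":   [np.array([1.0, 0.1, 0.0]), np.array([0.9, 0.2, 0.1])],
    "logo-globex": [np.array([0.1, 0.9, 0.8]), np.array([0.0, 1.0, 0.9])],
}
centroids = fit_centroids(training)
label = predict(centroids, np.array([0.95, 0.1, 0.05]))   # -> "logo-acme"
```

A support vector machine (as suggested in the first answer) would draw a more flexible decision boundary, but the input/output contract is the same: feature vector in, class label out.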

陌若浮生 2024-09-26 21:35:31


Pattern Recognition.
