Finding the pixels that make an image unique within a list: can you improve on brute force?
Suppose I have a list of strings where each string is
- exactly 4 characters long and
- unique within the list.
For each of these strings I want to identify the position of the characters within the string that make the string unique.
So for a list of three strings
abcd
abcc
bbcb
For the first string I want to identify the character in the 4th position, d, since d does not appear in the 4th position of any other string.
For the second string I want to identify the character in the 4th position, c.
For the third string I want to identify the character in the 1st position, b, AND the character in the 4th position, also b.
This could be concisely represented as
abcd -> ...d
abcc -> ...c
bbcb -> b..b
If you consider the same problem but with a list of binary numbers
0101
0011
1111
Then the result I want would be
0101 -> ..0.
0011 -> .0..
1111 -> 1...
Staying with the binary theme, I can use XOR to identify which bits differ between two binary numbers, since
0101 ^ 0011 = 0110
which I can interpret as meaning that in this case the 2nd and 3rd bits (reading left to right) differ between these two binary numbers. This technique might be a red herring unless it can somehow be extended to the larger list.
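One way the XOR idea might extend to a longer list, offered as a sketch and not as a known solution: a bit is unique to a number exactly when it differs from the same bit in every other number, so AND-ing together the pairwise XOR results leaves only the unique bits. The names `unique_bit_masks` and `render` are made up for this illustration, and the approach still compares every pair, so it does not beat brute force asymptotically.

```python
def unique_bit_masks(numbers):
    """Map each number to a mask of the bits that differ from every other number."""
    masks = {}
    for i, x in enumerate(numbers):
        mask = ~0  # start with all bits set; each comparison narrows it
        for j, y in enumerate(numbers):
            if i != j:
                mask &= x ^ y  # keep only the bits where x differs from y
        masks[x] = mask
    return masks


def render(x, mask, width=4):
    """Show the unique bits of x and replace the rest with dots."""
    bits = format(x, f"0{width}b")
    flags = format(mask & ((1 << width) - 1), f"0{width}b")
    return "".join(b if f == "1" else "." for b, f in zip(bits, flags))


numbers = [0b0101, 0b0011, 0b1111]
for x, mask in unique_bit_masks(numbers).items():
    print(format(x, "04b"), "->", render(x, mask))
```

For the three binary numbers above this prints the same `..0.`, `.0..`, `1...` patterns as the desired result.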
A brute-force approach would be to look at each string in turn, and for each string to iterate through vertical slices of the remainder of the strings in the list.
So for the list
abcd
abcc
bbcb
I would start with
abcd
and iterate through vertical slices of
abcc
bbcb
where these vertical slices would be
a | b | c | c
b | b | c | b
or in list form, "ab", "bb", "cc", "cb".
This would result in four comparisons
a : ab -> . (a is not unique)
b : bb -> . (b is not unique)
c : cc -> . (c is not unique)
d : cb -> d (d is unique)
or concisely
abcd -> ...d
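The brute-force walk-through above can be sketched in Python roughly as follows. The function name `brute_force_signatures` is only illustrative, and the sketch assumes all strings have the same length.

```python
def brute_force_signatures(strings):
    """For each string, keep the characters that no other string has in the same position."""
    signatures = []
    for i, s in enumerate(strings):
        others = strings[:i] + strings[i + 1:]
        marks = []
        for pos, ch in enumerate(s):
            # Vertical slice: the characters at this position in all other strings.
            column = "".join(o[pos] for o in others)
            marks.append(ch if ch not in column else ".")
        signatures.append("".join(marks))
    return signatures


strings = ["abcd", "abcc", "bbcb"]
for s, sig in zip(strings, brute_force_signatures(strings)):
    print(s, "->", sig)
```

Each string is compared against every other string at every position, so the work grows with the square of the number of strings.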
Maybe it's wishful thinking, but I have a feeling that there should be an elegant and general solution that would apply to an arbitrarily large list of strings (or binary numbers). But if there is, I haven't yet been able to see it.
I hope to use this algorithm to derive minimal signatures from a collection of unique images (bitmaps) in order to efficiently identify those images at a future time. If future efficiency weren't a concern I would use a simple hash of each image.
Can you improve on brute force?
Edit
The approach I'm warming to is building a map of pixels to images
sprawl[Tuple<x=10, y=33, color=f1fefd>] => {
    image17,
    image23,
    ...
}
sprawl[Tuple<x=10, y=34, color=f1fef0>] => {
    image11
    ...
}
and then using that map to identify the minimal set of signature pixels for each image.
If a pixel (identified by x, y, color) references just one image then I have found a perfect (minimal) signature for that image.
It's more complicated if an image has no unique pixels, but since I know all images are unique within the list I should be able to combine two or more pixel references (but as few as possible) to deduce the image.
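A minimal sketch of that map, assuming each image is represented as a dict from `(x, y)` to a color string. That representation, the tiny sample images, and the names `build_sprawl` and `greedy_signature` are all assumptions made for illustration. The combination step here is a greedy heuristic: it picks whichever pixel narrows the candidate set most, which may not always give the smallest possible signature.

```python
from collections import defaultdict


def build_sprawl(images):
    """Map each (x, y, color) pixel to the set of image names that contain it."""
    sprawl = defaultdict(set)
    for name, pixels in images.items():
        for (x, y), color in pixels.items():
            sprawl[(x, y, color)].add(name)
    return sprawl


def greedy_signature(name, images, sprawl):
    """Pick pixels of one image until no other image matches all of them."""
    candidates = set(images)  # images still consistent with the chosen pixels
    own_pixels = [(x, y, c) for (x, y), c in images[name].items()]
    chosen = []
    while len(candidates) > 1:
        best = min(own_pixels, key=lambda p: len(candidates & sprawl[p]))
        narrowed = candidates & sprawl[best]
        if len(narrowed) == len(candidates):
            break  # no pixel narrows the set further
        chosen.append(best)
        candidates = narrowed
    return chosen


# Hypothetical 2x1 images, purely for demonstration.
images = {
    "image11": {(0, 0): "f1fefd", (1, 0): "000000"},
    "image17": {(0, 0): "f1fefd", (1, 0): "ffffff"},
    "image23": {(0, 0): "aaaaaa", (1, 0): "ffffff"},
}
sprawl = build_sprawl(images)
for name in images:
    print(name, greedy_signature(name, images, sprawl))
```

In this sample, image11 and image23 each have a single unique pixel, while image17 shares both of its pixels with another image and needs the two combined.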
Update
I've been working on an algorithm for this. My problem is very similar to this one, and I've written up my algorithm as an answer to that question. This update is to flag the attention of anyone still following (I see five bookmarks). I'm working on this in isolation so any and all feedback is welcome, even if just to observe that I haven't made myself clear!
Comments (3)
This problem can be solved with a trie, or prefix tree.
See Trie - Wikipedia, the free encyclopedia
The 3 strings in your example:
abcd
abcc
bbcb
will be turned into a trie (where ^ denotes the root of the tree):
    ^
   / \
  a   b
  |   |
  b   b
  |   |
  c   c
 / \  |
d   c b
The path to the node where a string branches off is the common prefix. The node after the last branch point is what makes a particular string unique. In this case, they are d, c, b.
I assume the order of the strings is not important to you, and that you compare all strings to find the uniqueness, not just neighboring strings.
The complexity should be O(n x m). But this will probably be affected by the domain of the characters in your strings.
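A rough sketch of the trie idea, using nested dicts as nodes. The helper names `build_trie` and `node_after_last_branch` are made up for this example, and the sketch assumes non-empty strings.

```python
def build_trie(strings):
    """Build a trie as nested dicts: each key is a character, each value a child node."""
    root = {}
    for s in strings:
        node = root
        for ch in s:
            node = node.setdefault(ch, {})
    return root


def node_after_last_branch(root, s):
    """Return (position, character) of the node right after the last branch point on s's path."""
    node = root
    result = (0, s[0])
    for pos, ch in enumerate(s):
        if len(node) > 1:  # more than one child means the paths branch here
            result = (pos, ch)
        node = node[ch]
    return result


strings = ["abcd", "abcc", "bbcb"]
trie = build_trie(strings)
for s in strings:
    print(s, "->", node_after_last_branch(trie, s))
```

This reports one position per string (d, c, and the first b for the example). It identifies where a string's prefix becomes distinct, so for bbcb it does not report the 4th-position b that the question also asks for.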
You can generate a two-dimensional array which will contain the number of times each character appears in each position (0-3). For example, arr[1,3] will contain the number of times the digit/character 1 appears in the last position. Then for each string s, go over all characters in the string. The ones which appear only once in that position according to the array are the unique characters for that string. In other words, if arr[s[i], i]==1 then string s is unique in position i. This will give you the solution in linear time, while the algorithm you gave will take quadratic time.
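The counting approach could look something like this in Python. A list of `collections.Counter` objects (one per position) stands in for the two-dimensional array `arr`, and the function names are only illustrative.

```python
from collections import Counter


def position_counts(strings):
    """One Counter per position, counting how often each character appears there."""
    width = len(strings[0])
    return [Counter(s[i] for s in strings) for i in range(width)]


def unique_positions(strings):
    """Keep characters whose count at their position is exactly 1; dot out the rest."""
    counts = position_counts(strings)
    return {
        s: "".join(ch if counts[i][ch] == 1 else "." for i, ch in enumerate(s))
        for s in strings
    }


for s, sig in unique_positions(["abcd", "abcc", "bbcb"]).items():
    print(s, "->", sig)
```

Both passes touch each character once, so the total work is proportional to the number of strings times their length.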
If your goal is to identify images later, you could create a very fast hash of the image by picking predefined points to serve as identity pixels.
For example, you could have a structure (class, struct, doesn't matter what language) as follows:
This sort of "incomplete hash" will allow you to identify possible identities, and then you can do the expensive, full comparison sparingly as required.
Expand the incomplete hash as necessary.
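As a hypothetical illustration of the "incomplete hash" idea (not the structure from the original answer), here is a Python sketch. It assumes images are dicts from `(x, y)` to a color string; `SAMPLE_POINTS`, `incomplete_hash`, `build_index`, and `identify` are invented names.

```python
from collections import defaultdict

# Hypothetical predefined identity pixels; extend this list to expand the hash.
SAMPLE_POINTS = [(0, 0), (1, 0), (0, 1)]


def incomplete_hash(pixels):
    """Hash an image by the colors at the predefined sample points only."""
    return tuple(pixels[p] for p in SAMPLE_POINTS)


def build_index(images):
    """Group image names by their incomplete hash."""
    index = defaultdict(list)
    for name, pixels in images.items():
        index[incomplete_hash(pixels)].append(name)
    return index


def identify(pixels, images, index):
    """Use the cheap hash to find candidates, then do the full comparison on those only."""
    candidates = index.get(incomplete_hash(pixels), [])
    return [name for name in candidates if images[name] == pixels]


# Tiny made-up 2x2 images for demonstration.
images = {
    "image1": {(0, 0): "ff", (1, 0): "00", (0, 1): "00", (1, 1): "aa"},
    "image2": {(0, 0): "ff", (1, 0): "00", (0, 1): "00", (1, 1): "bb"},
    "image3": {(0, 0): "11", (1, 0): "00", (0, 1): "00", (1, 1): "aa"},
}
index = build_index(images)
print(identify(images["image2"], images, index))
```

Here image1 and image2 collide on the incomplete hash, so the full comparison runs only between those two; image3 is ruled out by the hash alone.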