Finding the pixels that make an image unique within a list: can you improve on brute force?

Posted 2024-09-01 15:53:57

Suppose I have a list of strings where each string is exactly 4 characters long and unique within the list.

For each of these strings I want to identify the position of the characters within the string that make the string unique.

So for a list of three strings

abcd
abcc
bbcb

For the first string I want to identify the character in 4th position, d, since d does not appear in the 4th position of any other string.

For the second string I want to identify the character in 4th position c.

For the third string I want to identify the character in 1st position, b, AND the character in 4th position, also b.

This could be concisely represented as

abcd -> ...d
abcc -> ...c
bbcb -> b..b

If you consider the same problem but with a list of binary numbers

0101
0011
1111

Then the result I want would be

0101 -> ..0.
0011 -> .0..
1111 -> 1...

Staying with the binary theme, I can use XOR to identify which bits differ between two binary numbers, since

0101 ^ 0011 = 0110

which I can interpret as meaning that in this case the 2nd and 3rd bits (reading left to right) differ between these two binary numbers. This technique might be a red herring unless it can somehow be extended to the larger list.
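One way the XOR idea might extend to a larger list, offered only as a sketch: for each number, AND together its XOR with every other number. A bit survives only if the number differs from all the others at that position. The function names `unique_bit_masks` and `render` are illustrative, not part of the question, and this is still quadratic in the number of entries.

```python
def unique_bit_masks(numbers, width):
    """For each number, return a mask of the bits that differ from every other number."""
    full = (1 << width) - 1
    masks = []
    for i, x in enumerate(numbers):
        mask = full
        for j, y in enumerate(numbers):
            if i != j:
                mask &= x ^ y  # keep only the bits where x differs from y
        masks.append(mask)
    return masks


def render(number, mask, width):
    """Show the bits of `number` selected by `mask`, and '.' elsewhere."""
    bits = format(number, f"0{width}b")
    keep = format(mask, f"0{width}b")
    return "".join(b if k == "1" else "." for b, k in zip(bits, keep))


numbers = [0b0101, 0b0011, 0b1111]
for n, m in zip(numbers, unique_bit_masks(numbers, 4)):
    print(format(n, "04b"), "->", render(n, m, 4))
```

For the three numbers above this prints the same `..0.`, `.0..` and `1...` patterns listed earlier.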

A brute-force approach would be to look at each string in turn, and for each string to iterate through vertical slices of the remainder of the strings in the list.

So for the list

abcd
abcc
bbcb

I would start with

abcd

and iterate through vertical slices of

abcc
bbcb

where these vertical slices would be

a | b | c | c
b | b | c | b

or in list form, "ab", "bb", "cc", "cb".

This would result in four comparisons

a : ab -> . (a is not unique)
b : bb -> . (b is not unique)
c : cc -> . (c is not unique)
d : cb -> d (d is unique)

or concisely

abcd -> ...d
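The brute-force walk-through above could be sketched roughly as follows. The function name `brute_force_signature` is made up for illustration; the sketch assumes all strings have the same length.

```python
def brute_force_signature(strings):
    """Compare each string against the vertical slices of all the other strings."""
    result = {}
    for i, s in enumerate(strings):
        others = strings[:i] + strings[i + 1:]
        # zip(*others) yields one vertical slice per character position
        slices = ["".join(col) for col in zip(*others)] if others else [""] * len(s)
        result[s] = "".join(
            ch if ch not in column else "." for ch, column in zip(s, slices)
        )
    return result


for string, signature in brute_force_signature(["abcd", "abcc", "bbcb"]).items():
    print(string, "->", signature)
```

Each string is compared against every other string at every position, so the work grows with the square of the list size, which is the cost the question hopes to avoid.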

Maybe it's wishful thinking, but I have a feeling that there should be an elegant and general solution that would apply to an arbitrarily large list of strings (or binary numbers). If there is one, I haven't been able to see it yet.

I hope to use this algorithm to derive minimal signatures from a collection of unique images (bitmaps) in order to efficiently identify those images at a future time. If future efficiency weren't a concern I would use a simple hash of each image.

Can you improve on brute force?

Edit
The approach I'm warming to is building a map of pixels to images

sprawl[Tuple<x=10, y=33,color=f1fefd>] => {
     image17,
     image23,
     ...
}

sprawl[Tuple<x=10, y=34,color=f1fef0>] => {
     image11
     ...
}

and then using that map to identify the minimal set of signature pixels for each image.

If a pixel (identified by x, y, color) references just one image then I have found a perfect (minimal) signature for that image.

It's more complicated if an image has no unique pixels, but since I know all images are unique within the list I should be able to combine two or more pixel references (but as few as possible) to deduce the image.
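A rough sketch of the map idea, under several assumptions: short strings stand in for images, a (position, character) pair stands in for an (x, y, color) tuple, and the names `build_sprawl` and `greedy_signature` are invented here. The combination step is greedy, so it always finds a distinguishing set of pixels for a unique image, but it is not guaranteed to find the smallest one.

```python
from collections import defaultdict


def build_sprawl(images):
    """Map each (position, color) pixel to the set of images containing it."""
    sprawl = defaultdict(set)
    for name, pixels in images.items():
        for pos, color in enumerate(pixels):
            sprawl[(pos, color)].add(name)
    return sprawl


def greedy_signature(name, images, sprawl):
    """Pick pixels of `name` until no other image matches all of them."""
    candidates = set(images)
    own_pixels = list(enumerate(images[name]))
    signature = []
    while len(candidates) > 1:
        # Choose the pixel that leaves the fewest images still matching.
        best = min(own_pixels, key=lambda p: len(candidates & sprawl[p]))
        narrowed = candidates & sprawl[best]
        if narrowed == candidates:
            raise ValueError(f"{name} is identical to another image")
        signature.append(best)
        candidates = narrowed
    return signature


images = {"p": "aa", "q": "ab", "r": "ba"}
sprawl = build_sprawl(images)
for name in images:
    print(name, greedy_signature(name, images, sprawl))
```

Here `q` and `r` each have a unique pixel and get a one-pixel signature, while `p` shares both of its pixels with another image and needs the two combined.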

Update

I've been working on an algorithm for this. My problem is very similar to this one, and I've written up my algorithm as an answer to that question. This update is to flag the attention of anyone still following (I see five bookmarks). I'm working on this in isolation so any and all feedback is welcome, even if just to observe that I haven't made myself clear!

撩发小公举 2024-09-08 15:53:58

This problem can be solved with a trie, or prefix tree.

See Trie - Wikipedia, the free encyclopedia

For the 3 strings in your example:

abcd
abcc
bbcb

they will be turned into a trie (where ^ denotes the root of the tree):

^--a-b-c-d
 \      \
  \      c
   \
    b-b-c-b

The path to the node where a string branches off is the common prefix. The node after the last branch point is what makes a particular string unique. In this case, they are d, c, b.

I assume the order of the strings is not important for you, and that you compare all strings to find the uniqueness, not just the neighboring strings.

The complexity should be O(n x m), but this will probably be affected by the domain of the characters in your strings.
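A minimal sketch of the trie approach, using nested dictionaries with a pass-through count on each node. The names `build_trie` and `distinguishing_index` are illustrative only. Note that this reports the first position where a string's path stops being shared, which is not the same as the per-position uniqueness in the question: for bbcb it finds the 1st-position b but not the 4th-position b.

```python
def build_trie(strings):
    """Build a trie where each node counts how many strings pass through it."""
    root = {"count": 0, "children": {}}
    for s in strings:
        node = root
        node["count"] += 1
        for ch in s:
            node = node["children"].setdefault(ch, {"count": 0, "children": {}})
            node["count"] += 1
    return root


def distinguishing_index(root, s):
    """Return the index of the first character whose node only `s` passes through."""
    node = root
    for i, ch in enumerate(s):
        node = node["children"][ch]
        if node["count"] == 1:
            return i
    return None  # s is duplicated in the trie


strings = ["abcd", "abcc", "bbcb"]
root = build_trie(strings)
for s in strings:
    i = distinguishing_index(root, s)
    print(s, "->", s[i], "at index", i)
```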

终止放荡 2024-09-08 15:53:57

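You can generate a two-dimensional array containing the number of times each character appears in each position (0-3). For example, arr[1,3] would contain the number of times the digit/character 1 appears in the last position.

Then, for each string s, go through all the characters in the string. The characters that appear only once in their position, according to the array, are the unique characters of that string. In other words, if arr[s[i], i]==1 then string s is unique in position i.

This gives you a solution in linear time, whereas the algorithm you gave takes quadratic time.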
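The counting idea can be sketched like this, using a `Counter` keyed by (position, character) in place of the two-dimensional array. The function name `unique_positions` is an illustrative choice.

```python
from collections import Counter


def unique_positions(strings):
    """Mark the characters that occur exactly once at their position across the list."""
    # First pass: count every (position, character) pair.
    counts = Counter((i, ch) for s in strings for i, ch in enumerate(s))
    # Second pass: keep a character only if its count at that position is 1.
    return {
        s: "".join(ch if counts[(i, ch)] == 1 else "." for i, ch in enumerate(s))
        for s in strings
    }


for string, signature in unique_positions(["abcd", "abcc", "bbcb"]).items():
    print(string, "->", signature)
```

Both passes touch each character once, so the running time is proportional to the total number of characters.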

薄荷港 2024-09-08 15:53:57

If your goal is to identify images later, you could create a very fast hash of the image by picking predefined points to serve as identity pixels.

For example, you could have a structure (class, struct, doesn't matter what language) as follows:

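// Image and its getPixelRed/Green/Blue accessors are assumed to come from
// your imaging library; coordinates are assumed to be 0-indexed.
struct ImageHash {
    int x_pixels = 0, y_pixels = 0;
    unsigned long hash = 0;

    void createHash(const Image& img) {
        x_pixels = img.x_pixels;
        y_pixels = img.y_pixels;
        hash = 0;  // reset so repeated calls give the same result
        // Sample a fixed 4 x 4 grid of predefined points.
        for (int i = 1; i < 5; i++) {
            int x = (x_pixels - 1) / i;  // subtract 1 so i == 1 stays in bounds
            for (int j = 1; j < 5; j++) {
                int y = (y_pixels - 1) / j;
                int r = img.getPixelRed(x, y);
                int g = img.getPixelGreen(x, y);
                int b = img.getPixelBlue(x, y);
                hash = (hash * 31) ^ (r ^ g ^ b);
            }
        }
    }
};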

This sort of "incomplete hash" will allow you to identify possible matches, and then you can do the expensive, full comparison sparingly as required.

Expand the incomplete hash as necessary.
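As a sketch of how such an incomplete hash might be used for lookup: images are grouped into buckets by the sampled hash, and the full comparison runs only within the matching bucket. Everything here is an assumption made for illustration: images are lists of rows of (r, g, b) tuples, the sample points are kept inside the image bounds, and the names `sample_hash`, `build_index` and `identify` are invented.

```python
from collections import defaultdict

GRID = 4  # number of sample points per axis (illustrative choice)


def sample_hash(image):
    """Hash a fixed grid of sample pixels instead of the whole image."""
    height, width = len(image), len(image[0])
    h = 0
    for i in range(1, GRID + 1):
        x = (width - 1) // i
        for j in range(1, GRID + 1):
            y = (height - 1) // j
            r, g, b = image[y][x]
            h = ((h * 31) ^ (r ^ g ^ b)) & 0xFFFFFFFFFFFFFFFF  # keep to 64 bits
    return h


def build_index(images):
    """Group image names into buckets by their incomplete hash."""
    index = defaultdict(list)
    for name, image in images.items():
        index[sample_hash(image)].append(name)
    return index


def identify(image, images, index):
    """Run the expensive full comparison only on candidates in the same bucket."""
    for name in index.get(sample_hash(image), []):
        if images[name] == image:
            return name
    return None


def blank(size=8):
    return [[(0, 0, 0)] * size for _ in range(size)]


a, b = blank(), blank()
b[0][0] = (255, 0, 0)  # differs only at a pixel the hash never samples
images = {"a": a, "b": b}
index = build_index(images)
print(sample_hash(a) == sample_hash(b), identify(b, images, index))
```

The two test images collide on the incomplete hash, yet the full comparison inside the bucket still tells them apart.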
