I have a function that is looking at a number of elements. Each element is of the form of an 8x1 column vector. Each entry in the vector is an integer less than 1000. Every time I see such a vector, I'd like to add it to a list of "already seen" vectors, after checking to see that the vector is not already on this list. The function will examine on the order of ~100,000 such vectors.
Originally I tried using ismember(v', M, 'rows'), but found this to be very slow. Next I tried:
found = containers.Map('KeyType', 'double', 'ValueType', 'any');
Then each time I examine a new vector v, compute:
key = dot(v, [1000000000000000000000 1000000000000000000 1000000000000000 ...
1000000000000 1000000000 1000000 1000 1]);
Then check isKey(found, key). If the key is not in the container, then found(key) = 1.
This seems like a pretty lousy solution, even though it does run considerably faster than ismember. Any help/suggestions would be greatly appreciated.
EDIT: Perhaps it would be better to use mat2str to generate the key, rather than this silly dot product?
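One concrete reason to distrust the dot-product key: a double represents every integer exactly only up to 2^53 (about 9e15), while the weights above go up to 1e21, so two different vectors can round to the same key. The sketch below shows one such pair and the mat2str alternative from the EDIT. The sample vectors v1 and v2 are made up for illustration.

```matlab
% Weights from the question, written in exponent form (largest is 1000^7 = 1e21).
w = [1e21; 1e18; 1e15; 1e12; 1e9; 1e6; 1e3; 1];

% Two hypothetical vectors that differ only in the last entry.
v1 = [1; 0; 0; 0; 0; 0; 0; 0];
v2 = [1; 0; 0; 0; 0; 0; 0; 1];

key1 = dot(v1, w);
key2 = dot(v2, w);

% The +1 contributed by v2 is far below the spacing of doubles near 1e21,
% so it is rounded away and both vectors get the same numeric key.
sameKey = (key1 == key2);

% A text key does not round: mat2str keeps every entry.
% It would need a map created with 'KeyType', 'char'.
skey1 = mat2str(v1.');
skey2 = mat2str(v2.');
differentTextKeys = ~strcmp(skey1, skey2);
```

With a numeric key that collides like this, the second vector would wrongly be reported as already seen.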
Comments (3)
The easiest way to generate a key/hash in your case would be to just convert the vector of integer values to a character array using char. Since your integer values never go above 1000, and char can accept numeric values from 0 to 65535 (corresponding to Unicode characters), this will give you a unique 8-character key for every unique 8-by-1 vector. Here's an example:
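The example code did not survive in this copy of the answer. What follows is a minimal sketch of the idea as described, not the answerer's original code. The sample data in vectors and the isNew bookkeeping are hypothetical, and the map is assumed to be created with a char key type.

```matlab
% Map keyed by 8-character strings (assumption: 'char' key type).
found = containers.Map('KeyType', 'char', 'ValueType', 'any');

% Hypothetical input: three 8x1 vectors stored as columns; the second repeats the first.
vectors = [1 2 3 4 5 6 7 999; ...
           1 2 3 4 5 6 7 999; ...
           999 7 6 5 4 3 2 1].';

isNew = false(1, size(vectors, 2));
for k = 1:size(vectors, 2)
    v = vectors(:, k);
    key = char(v.');          % 1-by-8 char row, one character per entry
    if ~isKey(found, key)
        found(key) = 1;       % remember this vector
        isNew(k) = true;
    end
end
```

The key is exact because each entry maps to its own character code, so nothing is rounded or merged.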
Your idea is good, but you need to find a better hash function. Use some standard hash function.
There is an implementation of the 'sha' algorithms you might like to see:
http://www.se.mathworks.com/matlabcentral/fileexchange/31795-sha-algorithms-160224256384-512
If you find the sha algorithm slow, then you can probably resort to some tricks. One that I can think of now is the following:
This should probably work, but you'll have to check.
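For readers who want to try a standard hash without installing the linked package, here is a hedged sketch that swaps in a different route: java.security.MessageDigest from the Java runtime that MATLAB can call. This is not the File Exchange implementation and not the trick the answer refers to; the sample vector is hypothetical, and the approach assumes Java is available in your MATLAB session.

```matlab
% Hypothetical sample vector with entries below 1000.
v = [1; 2; 3; 4; 5; 6; 7; 999];

% Pack the eight entries into 16 raw bytes (uint16 holds values up to 65535).
bytes = typecast(uint16(v), 'uint8');

% Standard SHA-1 digest via the Java class library (assumes Java is enabled).
md = java.security.MessageDigest.getInstance('SHA-1');
md.update(bytes);
digest = typecast(md.digest(), 'uint8');   % digest() returns signed bytes

% Hex string usable as a key in a map created with 'KeyType', 'char'.
key = sprintf('%02x', digest);
```

Whether this is faster than a simpler text key is something you would have to measure.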
I'm not really into hashing, but I still believe I have found the simplest way to solve your problem.
This runs about 10x faster than ismember.
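The code for this answer is missing from this copy, so the method it timed is unknown. Purely as an illustration of a non-hashing check, and not a reconstruction of the answer, one option is to compare the new vector against every stored row directly. The matrix M and vector v below are hypothetical, and no speed claim is made for this sketch.

```matlab
% Hypothetical list of already seen vectors, one per row.
M = [1 2 3 4 5 6 7 8; ...
     8 7 6 5 4 3 2 1];

% Hypothetical new vector to test (8x1 column).
v = [8; 7; 6; 5; 4; 3; 2; 1];

% Compare v against each row; a row matches only if all 8 entries are equal.
seen = any(all(bsxfun(@eq, M, v.'), 2));

% Append the vector only if it has not been seen before.
if ~seen
    M(end+1, :) = v.';
end
```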