查找最多 2 个不同位置的字符串邻居
给定一个种子字符串,我想找到其邻居最多有 2 个位置不同。 生成字符串涉及的所有数字只有四位(即0、1、2、3)。 这就是我的意思的例子:
# In this example, 'first' column
# are neighbors with only 1 position differ.
# The rest of the columns are 2 positions differ
Seed = 000
100 110 120 130 101 102 103
200 210 220 230 201 202 203
300 310 320 330 301 302 303
010 011 012 013
020 021 022 023
030 031 032 033
001
002
003
Seed = 001
101 111 121 131 100 102 103
201 211 221 231 200 202 203
301 311 321 331 300 302 303
011 010 012 013
021 020 022 023
031 030 032 033
000
003
002
Hence given a tag of length L
we will have 3*L + 9L(L-1)/2 neighbors
但是为什么我的这段代码无法正确生成它? 特别是当种子字符串不是“000”时。
其他方法也受到欢迎,特别是在速度改进方面。 自从 我们将处理数百万个长度为 34 到 36 的种子标签。
#include <iostream>
#include <vector>
#include <fstream>
#include <sstream>
using namespace std;
string ConvertInt2String(int IntVal) {
std::string S;
std::stringstream out;
out << IntVal;
S = out.str();
return S;
}
string Vec2Str (vector <int> NTg) {
string StTg = "";
for (unsigned i = 0; i < NTg.size(); i++) {
StTg += ConvertInt2String(NTg[i]);
}
return StTg;
}
template <typename T> void prn_vec(const std::vector < T >&arg, string sep="")
{
for (unsigned n = 0; n < arg.size(); n++) {
cout << arg[n] << sep;
}
return;
}
vector <int> neighbors(vector<int>& arg, int posNo, int baseNo) {
// pass base position and return neighbors
vector <int> transfVec;
transfVec = arg;
//modified according to strager's first post
transfVec[posNo % arg.size()] = baseNo;
return transfVec;
}
int main () {
vector <int> numTag;
numTag.push_back(0);
numTag.push_back(0);
numTag.push_back(1); // If "000" this code works, but not 001 or others
// Note that in actual practice numTag can be greater than 3
int TagLen = static_cast<int>(numTag.size());
for ( int p=0; p< TagLen ; p++ ) {
// First loop is to generate tags 1 position differ
for ( int b=1; b<=3 ; b++ ) {
int bval = b;
if (numTag[p] == b) {
bval = 0;
}
vector <int> nbnumTag = neighbors(numTag, p, bval);
string SnbnumTag = Vec2Str(nbnumTag);
cout << SnbnumTag;
cout << "\n";
// Second loop for tags in 2 position differ
for (int l=p+1; l < TagLen; l++) {
for (int c=1; c<=3; c++) {
int cval = c;
if (nbnumTag[l] == c) {
cval = c;
}
vector <int> nbnumTag2 = neighbors(nbnumTag, l, cval);
string SnbnumTag2 = Vec2Str(nbnumTag2);
cout << "\t" << SnbnumTag2;
cout << "\n";
}
}
}
}
return 0;
}
Given a seed string, I want to find its neighbors with at most differ in 2 positions. All the digits involve in generating string are only four (i.e. 0,1,2,3). This is the example for what I mean:
# In this example, 'first' column
# are neighbors with only 1 position differ.
# The rest of the columns are 2 positions differ
Seed = 000
100 110 120 130 101 102 103
200 210 220 230 201 202 203
300 310 320 330 301 302 303
010 011 012 013
020 021 022 023
030 031 032 033
001
002
003
Seed = 001
101 111 121 131 100 102 103
201 211 221 231 200 202 203
301 311 321 331 300 302 303
011 010 012 013
021 020 022 023
031 030 032 033
000
003
002
Hence given a tag of length L
we will have 3*L + 9L(L-1)/2 neighbors
But why this code of mine fails to generate it correctly? Especially when the seed string is other than "000".
Other approaches are also welcomed, escpecially with speed improvement. Since
we will be processing millions of seed tags of length 34 to 36.
#include <iostream>
#include <vector>
#include <fstream>
#include <sstream>
using namespace std;
string ConvertInt2String(int IntVal) {
std::string S;
std::stringstream out;
out << IntVal;
S = out.str();
return S;
}
string Vec2Str (vector <int> NTg) {
string StTg = "";
for (unsigned i = 0; i < NTg.size(); i++) {
StTg += ConvertInt2String(NTg[i]);
}
return StTg;
}
template <typename T> void prn_vec(const std::vector < T >&arg, string sep="")
{
for (unsigned n = 0; n < arg.size(); n++) {
cout << arg[n] << sep;
}
return;
}
vector <int> neighbors(vector<int>& arg, int posNo, int baseNo) {
// pass base position and return neighbors
vector <int> transfVec;
transfVec = arg;
//modified according to strager's first post
transfVec[posNo % arg.size()] = baseNo;
return transfVec;
}
int main () {
vector <int> numTag;
numTag.push_back(0);
numTag.push_back(0);
numTag.push_back(1); // If "000" this code works, but not 001 or others
// Note that in actual practice numTag can be greater than 3
int TagLen = static_cast<int>(numTag.size());
for ( int p=0; p< TagLen ; p++ ) {
// First loop is to generate tags 1 position differ
for ( int b=1; b<=3 ; b++ ) {
int bval = b;
if (numTag[p] == b) {
bval = 0;
}
vector <int> nbnumTag = neighbors(numTag, p, bval);
string SnbnumTag = Vec2Str(nbnumTag);
cout << SnbnumTag;
cout << "\n";
// Second loop for tags in 2 position differ
for (int l=p+1; l < TagLen; l++) {
for (int c=1; c<=3; c++) {
int cval = c;
if (nbnumTag[l] == c) {
cval = c;
}
vector <int> nbnumTag2 = neighbors(nbnumTag, l, cval);
string SnbnumTag2 = Vec2Str(nbnumTag2);
cout << "\t" << SnbnumTag2;
cout << "\n";
}
}
}
}
return 0;
}
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(8)
这能行吗? 它枚举可能的字符串树,修剪所有与原始字符串的差异大于 2 的字符串。
Would this do it? It enumerates the tree of possible strings, pruning all with >2 differences from the original.
这是一种适用于任意数量的字符和字符串长度的方法:
Here's one way to do it that should work for any number of characters and length of string:
这应该相当于在 4 个符号字母表上生成汉明距离 2 内的所有字符串。 我已经看过它的算法,但我现在找不到它们。 也许这可以作为正确方向的指针。
This should be equivalent to generating all the strings within a hamming distance of 2, over a 4-symbol alphabet. I've seen algorithms for it, but I'm at a loss to find them right now. Perhaps this can serve as a pointer in the right direction.
您的问题[编辑:原始问题(请参阅问题的先前修订版)]是在您的内部循环中,您仅分配“下一个”元素。 一个快速修复方法是将写入包装在
neighbors
中:此修复仅在数组中有两个或三个项目时有效。 如果您想要更多,您需要重写您的算法,因为它根本不处理长度大于三的情况。 (甚至不需要。您使用的算法限制太多。)
Your problem [EDIT: the original one (see previous revisions of question)] is that in your inner loop, you're only assigning the 'next' element. A quick fix is to wrap the write in
neighbors
:This fix only works when you have two or three items in your array. If you want more, you need to rewrite your algorithm as it doesn't handle cases where the length is greater than three at all. (It shouldn't need to, even. The algorithm you use is just too restrictive.)
这两个 if:
应该有
continue
主体。这两个循环应该从 0 开始:
These two if's:
Should instead have bodies of
continue
.These two loops should start at 0:
看起来 strager 已经确定了主要问题:循环条件。 你的字母表是 0,1,2,3,所以你应该循环整个范围。 0 不是一个特殊情况,因为您的代码会尝试处理它。 特殊情况是当字母值等于键中的值时跳过迭代,这就是 strager 建议的 continue 所完成的任务。
以下是我的算法版本。 它对循环结构有一些替代想法,并且通过就地修改密钥来避免复制密钥。 请注意,您还可以通过更改
MIN_VALUE
和MAX_VALUE
常量来更改字母表的大小。这是“001”情况的输出:
这是代码:
It looks like strager has identified the main problem: the loop conditions. Your alphabet is 0,1,2,3, so you should loop over that whole range. 0 is not a special case, as your code tries to treat it. The special case is to skip the iteration when the alphabet value equals the value in your key, which is what the continue suggested by strager accomplishes.
Below is my version of your algorithm. It has some alternative ideas for loop structures, and it avoids copying the key by modifying it in place. Note that you can also change the size of the alphabet by changing the
MIN_VALUE
andMAX_VALUE
constants.Here's the output for the "001" case:
And here's the code:
我试图纠正你的算法,尽可能接近原始算法:
问题是你没有迭代所有 4 个可能的值 (0,1,2,3),但由于某种原因你跳过了 0 。 我这样做的方法是迭代所有这些并忽略(通过使用继续)与种子相同的向量或在第 1 阶段计算的 1 点不同标签。
话虽如此,我相信更好提出了比您的算法更好的算法,最好考虑其中之一。
I've tried to correct your algorithm, staying as close as possible to the original one:
The problem is that you don't iterate over all 4 possible values (0,1,2,3), but you skip 0 for some reason. The way I am doing it is to iterate over all of them and ignore (by using a continue) the vector that is the same with the seed or the 1-point different tag computed at phase 1.
Having said that, I believe that better algorithms than yours are proposed and it would be better to consider one of them.
这是我丑陋的、hacky 的解决方案:
Here's my ugly, hacky solution: