Finding common characters in an array of strings, in the right order
I spent days working on a function to get common chars in an array of strings, in the right order, to create a wildcard.
Here is an example to explain my problem. I made about 3 functions, but I always have a bug when the absolute position of each letter is different.
Let's assume "+" is the "wildcard char":
$array = array(
0 => '48ca135e0$5',
1 => 'b8ca136a0$5',
2 => 'c48ca13730$5',
3 => '48ca137a0$5');
Should return:
$wildcard='+8ca13+0$5';
In this example, the tricky thing is that $array[2] has 1 char more than the others.
Another example:
$array = array(
0 => "case1b25.occHH&FmM",
1 => "case11b25.occHH&FmM",
2 => "case12b25.occHH&FmM",
3 => "case20b25.occHH&FmM1");
Should return:
$wildcard='case+b25.occHH&FmM+';
In this example, the tricky parts are:
- Repeating chars, such as 1 -> 11 in the "to delete" part, and c -> cc in the common part
- The "2" char in $array[2] and $array[3], in the "to delete" part, is not in the same position
- The "1" char at the end of the last string
I really need help because I can't find a solution to this function and it is a main part of my application.
Thanks in advance, don't hesitate to ask questions, I will answer as fast as possible.
Mykeul
Comments (2)
It seems you want to create something like a regular expression out of a set of example strings.
This might be quite tricky in general. I found this link, though I'm not sure it's relevant:
http://scholar.google.com/scholar?hl=en&rlz=1B3GGGL_enEE351EE351&q=%22regular%20expression%20by%20example%22&oq=&um=1&ie=UTF-8&sa=N&tab=ws
On the other hand, if you need only one specific wildcard meaning "0 or more characters", then it should be much easier. The Levenshtein distance algorithm computes the similarity between 2 strings. Normally only the resulting distance is needed, but in your case the places where the strings differ are what matter. You also need to adapt it to N strings.
So I recommend studying this algorithm; hopefully you'll get some ideas on how to solve your problem (at least you'll get some practice with text algorithms and dynamic programming).
Here's the algorithm in PHP:
http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#PHP
You might also want to search for PHP implementations of "diff".
http://paulbutler.org/archives/a-simple-diff-algorithm-in-php/
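To make the "diff" idea concrete, here is a minimal sketch that swaps in a longest-common-subsequence (LCS) alignment, the dynamic-programming technique behind most diff tools, instead of Levenshtein distance itself. The function names mergeToWildcard and commonWildcard are hypothetical, and the sketch assumes the input strings never contain the wildcard character and are single-byte (it uses strlen and byte indexing). It merges the strings pairwise, so it may not always find the best pattern for N strings.

```php
<?php
// Hypothetical helper: align two strings on their longest common
// subsequence and put one wildcard wherever either string has extra chars.
// Assumption: neither input contains $wild as a literal character.
function mergeToWildcard(string $a, string $b, string $wild = '+'): string
{
    $n = strlen($a);
    $m = strlen($b);
    // $lcs[$i][$j] = LCS length of substr($a, $i) and substr($b, $j).
    $lcs = array_fill(0, $n + 1, array_fill(0, $m + 1, 0));
    for ($i = $n - 1; $i >= 0; $i--) {
        for ($j = $m - 1; $j >= 0; $j--) {
            $lcs[$i][$j] = ($a[$i] === $b[$j])
                ? $lcs[$i + 1][$j + 1] + 1
                : max($lcs[$i + 1][$j], $lcs[$i][$j + 1]);
        }
    }
    // Walk the table from the start, recording where the strings diverge.
    $out = '';
    $gap = false;
    $i = 0;
    $j = 0;
    while ($i < $n && $j < $m) {
        if ($a[$i] === $b[$j]) {
            if ($gap) {
                $out .= $wild; // one wildcard per run of skipped chars
                $gap = false;
            }
            $out .= $a[$i];
            $i++;
            $j++;
        } elseif ($lcs[$i + 1][$j] >= $lcs[$i][$j + 1]) {
            $i++; // skip a char of $a
            $gap = true;
        } else {
            $j++; // skip a char of $b
            $gap = true;
        }
    }
    // Leftover chars at the end of either string also need a wildcard.
    if ($gap || $i < $n || $j < $m) {
        $out .= $wild;
    }
    return $out;
}

// Hypothetical driver: fold the pairwise merge over all N strings.
function commonWildcard(array $strings, string $wild = '+'): string
{
    if (count($strings) === 0) {
        return '';
    }
    $pattern = array_shift($strings);
    foreach ($strings as $s) {
        $pattern = mergeToWildcard($pattern, $s, $wild);
    }
    return $pattern;
}

echo commonWildcard(['48ca135e0$5', 'b8ca136a0$5', 'c48ca13730$5', '48ca137a0$5']), "\n";
// prints: +8ca13+0$5
echo commonWildcard(['case1b25.occHH&FmM', 'case11b25.occHH&FmM',
                     'case12b25.occHH&FmM', 'case20b25.occHH&FmM1']), "\n";
// prints: case+b25.occHH&FmM+
```

Because the running pattern keeps its wildcards and they never match a real character, each later merge treats them as gaps again, which is how the pairwise comparison extends to N strings.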
Main code:
Step 1: Sort strings by length, shortest to longest, into array[]
Step 2: Compare string in array[0] and array[1] to get $temp_wildcard
Step 3: Compare string in array[2] with $temp_wildcard to create new $temp_wildcard
Step 4: Continue comparing each remaining string with $temp_wildcard - the last $temp_wildcard is your $wildcard
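The four steps above could be sketched roughly like this. The name buildWildcard and its $comparePair callback are hypothetical. The $samePosition closure is only a placeholder comparison for equal-length strings, so the block runs on its own. Collapsing runs of wildcards at the end is an added assumption, made so the result matches the single "+" in the question's expected output.

```php
<?php
// Sketch of the "main code" steps. $comparePair is a hypothetical callback
// taking (pattern, string, wildcard) and returning the new pattern.
function buildWildcard(array $strings, callable $comparePair, string $wild = '+'): string
{
    if (count($strings) === 0) {
        return '';
    }
    // Step 1: sort by length, shortest to longest.
    usort($strings, function (string $x, string $y): int {
        return strlen($x) <=> strlen($y);
    });
    // Steps 2-4: compare each string with the running pattern.
    $pattern = array_shift($strings);
    foreach ($strings as $s) {
        $pattern = $comparePair($pattern, $s, $wild);
    }
    // Assumption: a run of wildcards is collapsed into a single one.
    return preg_replace('/' . preg_quote($wild, '/') . '+/', $wild, $pattern);
}

// Placeholder comparison, for illustration only: position by position,
// meaningful only when both strings have the same length.
$samePosition = function (string $a, string $b, string $wild): string {
    $out = '';
    for ($x = 0; $x < strlen($a); $x++) {
        $out .= (isset($b[$x]) && $a[$x] === $b[$x]) ? $a[$x] : $wild;
    }
    return $out;
};

echo buildWildcard(['48ca135e0$5', 'b8ca136a0$5', '48ca137a0$5'], $samePosition), "\n";
// prints: +8ca13+0$5
```

Passing the comparison in as a callback keeps the loop independent of how two strings are actually compared.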
OK, so now we're down to the problem of how to compare two strings to return your wildcard string.
Subroutine code:
Compare strings character-by-character, substituting wildcards into your return value when the comparison doesn't match.
To handle the problem of different lengths, run this comparison once for each possible offset, from 0 up to the number of characters by which the second string is longer. (Compare string1[x] to string2[x+offset].) For each returned string, count the number of wildcard characters. The subroutine should return the answer with the fewest wildcard characters.
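A minimal sketch of that subroutine, under some assumptions: the name compareWithOffset is hypothetical, characters of the longer string that the shorter one does not cover are padded with wildcards (the description does not say what to do with them), and ties go to the smallest offset. It returns one wildcard per mismatched position, so later comparisons stay aligned; runs can be collapsed once at the very end.

```php
<?php
// Hypothetical subroutine: try every offset of the shorter string inside
// the longer one and keep the candidate with the fewest wildcards.
function compareWithOffset(string $a, string $b, string $wild = '+'): string
{
    // Make $short the shorter of the two inputs.
    [$short, $long] = strlen($a) <= strlen($b) ? [$a, $b] : [$b, $a];
    $extra = strlen($long) - strlen($short);
    $best = '';
    $bestCount = PHP_INT_MAX;
    for ($offset = 0; $offset <= $extra; $offset++) {
        // Assumption: uncovered chars of $long become wildcards.
        $candidate = str_repeat($wild, $offset);
        for ($x = 0; $x < strlen($short); $x++) {
            // Compare string1[x] to string2[x + offset].
            $candidate .= ($short[$x] === $long[$x + $offset]) ? $short[$x] : $wild;
        }
        $candidate .= str_repeat($wild, $extra - $offset);
        $count = substr_count($candidate, $wild);
        if ($count < $bestCount) { // strict: the first best offset wins ties
            $best = $candidate;
            $bestCount = $count;
        }
    }
    return $best;
}

echo compareWithOffset('48ca135e0$5', 'b8ca136a0$5'), "\n";
// prints: +8ca13++0$5
echo compareWithOffset('+8ca13++0$5', 'c48ca13730$5'), "\n";
// prints: ++8ca13++0$5
```

A single offset can only shift one whole string against the other. When the extra characters sit in the middle, as in the second example of the question ("case1b25..." against "case11b25..."), it cannot keep both the common prefix and the common suffix, so a diff-style alignment is likely needed there.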
Good luck!