String matching technique via conversion to numbers?
I have strings of various lengths made up of Base64 characters. They are actually audio recognition data that differs from song to song.
To compare parts of those strings easily, I divide them into 16-character substrings (each covers about 1 second of a song). But in some cases I can't simply compare them head to head; I need to measure how close they are.
For example, comparing 'hellohellohelloo' with 'hallohellohelloo' should give a closer value than comparing 'hellohellohelloo' with 'herehellohelloo'.
Is there any algorithm or theoretical approach for this?
Edit: Sorry, I am new here :) and I couldn't make myself clear. Here are some comments that should clarify what I mean and propose an idea.
Comment 1:
Actually, I know about Levenshtein distance, but the problem is that every time I compare two strings I have to build the comparison matrix, and that makes the search slow. If I could convert, for example, hello to 4444 and hallo to 4443, I could find out how close my records are to 'hello' just by indexing the numerical values.
Comment 2:
Maybe I should pick one or more constant-length base strings and store each string's distances from them as its index values. It's just an idea.
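The idea in Comment 2 resembles what is usually called pivot-based indexing. Below is a minimal Python sketch of it, not a tested solution for the audio data. It relies on Levenshtein distance being a metric: the triangle inequality gives |d(q, p) - d(s, p)| <= d(q, s), so a stored string s can be skipped without a full comparison whenever its distance to some base string p differs from the query's distance to p by more than the search radius. The class name `PivotIndex`, the choice of base strings, and the radius values are all assumptions made for illustration.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance, keeping only two rows of the comparison matrix."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


class PivotIndex:
    """Hypothetical index: each string is stored with its distances to fixed base strings."""

    def __init__(self, pivots: List[str]) -> None:
        self.pivots = pivots
        self.entries: List[Tuple[str, Tuple[int, ...]]] = []

    def _key(self, s: str) -> Tuple[int, ...]:
        # The numeric "index values" from Comment 2: one distance per base string.
        return tuple(levenshtein(s, p) for p in self.pivots)

    def add(self, s: str) -> None:
        self.entries.append((s, self._key(s)))

    def search(self, query: str, radius: int) -> List[Tuple[str, int]]:
        q = self._key(query)
        hits = []
        for s, key in self.entries:
            # Triangle inequality: if any pivot distance differs by more than
            # the radius, s cannot be within the radius of the query.
            if any(abs(dq - ds) > radius for dq, ds in zip(q, key)):
                continue
            d = levenshtein(query, s)  # full comparison only for survivors
            if d <= radius:
                hits.append((s, d))
        return sorted(hits, key=lambda hit: hit[1])


index = PivotIndex(pivots=["aaaaaaaaaaaaaaaa", "hellohellohelloo"])
for chunk in ["hallohellohelloo", "herehellohelloo", "zzzzzzzzzzzzzzzz"]:
    index.add(chunk)

print(index.search("hellohellohelloo", radius=3))
```

This sketch still walks every stored key, but each rejected entry costs only a few integer comparisons. The numeric keys are a filter rather than an exact ranking: two strings with similar keys are not guaranteed to be close, so survivors still need a real distance computation.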
Comments (2)
Levenshtein distance will probably help you: http://en.wikipedia.org/wiki/Levenshtein_distance
It's usually pretty fast, and there are implementations in most modern languages too.
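For reference, here is a minimal Python sketch of the standard dynamic-programming computation, applied to the strings from the question. It keeps only two rows of the comparison matrix at a time, which is a common way to reduce the memory cost; the function name is just illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the Levenshtein (edit) distance between a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


print(levenshtein("hellohellohelloo", "hallohellohelloo"))  # 1
print(levenshtein("hellohellohelloo", "herehellohelloo"))   # 3
```

The first pair differs by a single substitution, while the second needs two substitutions and one deletion, so the first pair does come out as the closer one.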
The Levenshtein distance might work for you. Also see the Wikipedia overview of edit distance.