Best way to reverse-search a list of similar strings

Posted 2024-09-15 06:38:13

I have a list of data that includes command strings as well as the letters of the alphabet, upper and lower case, totaling 512+ strings (including those in sub-lists). I want to parse input data, but the only approach I can think of is to start from the largest possible command size and shrink the candidate until it matches a command, then output that command's location, and that takes forever. Any other approach I can think of causes overlapping matches. I'm doing this in Python.

Say:

L = ['a', 'b', ['aa', 'bb', 'cc'], 'c']

For 'bb' the output would be '0201', and for 'c' it would be '03'.

So how should I do this?

丢了幸福的猪 2024-09-22 06:38:13

It sounds like you're searching through the list for every substring. How about building a dict to look up the keys? Of course, you still have to try the longest candidate key first.

L = ['a', 'b', ['aa', 'bb', 'cc'], 'c']

def lookups(L):
    """Yield (item, code) tuples; the code is the zero-padded position."""
    for i, item in enumerate(L):
        if isinstance(item, list):
            # Nested item: outer index followed by inner index.
            for j, sub in enumerate(item):
                yield sub, "%02d%02d" % (i, j)
        else:
            yield item, "%02d" % i

You could then look up substrings with:

lookupdict = dict(lookups(L))
print(lookupdict['bb'])  # but you have to try 'bb' before trying 'b' ...

But if the keys are not just 1 or 2 characters long, it might also make sense to group the items into separate dicts, one per key length.
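
The "longest key first" idea combined with one dict per key length could be sketched roughly as below. This is only an illustration under assumptions: the names `build_lookup` and `encode` are made up here, the input is assumed to be a single string with no separators between commands, and the sketch raises an error when nothing matches, which the question does not specify.

```python
def build_lookup(commands):
    """Flatten the nested command list into {string: position code}."""
    table = {}
    for i, item in enumerate(commands):
        if isinstance(item, list):
            for j, sub in enumerate(item):
                table[sub] = "%02d%02d" % (i, j)
        else:
            table[item] = "%02d" % i
    return table


def encode(text, table):
    """Greedy longest-match: at each position, try the longest keys first."""
    # Group keys into one dict per key length (hypothetical layout).
    by_length = {}
    for key, code in table.items():
        if key:  # skip empty strings, they would never advance the position
            by_length.setdefault(len(key), {})[key] = code
    lengths = sorted(by_length, reverse=True)

    codes = []
    pos = 0
    while pos < len(text):
        for n in lengths:
            chunk = text[pos:pos + n]
            if chunk in by_length[n]:
                codes.append(by_length[n][chunk])
                pos += n
                break
        else:
            # No key of any length matched at this position.
            raise ValueError("no command matches at position %d" % pos)
    return codes


L = ['a', 'b', ['aa', 'bb', 'cc'], 'c']
table = build_lookup(L)
print(encode("bbc", table))   # ['0201', '03']
print(encode("abba", table))  # ['00', '0201', '00']
```

Each position costs at most one dict lookup per distinct key length, instead of a scan over the whole list. Note that `"%02d"` only pads to two digits, so an index of 100 or more would produce a longer code and break the fixed-width format.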

柏林苍穹下 2024-09-22 06:38:13

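If you must use this data structure:

from collections.abc import MutableSequence  # the ABCs live in collections.abc in Python 3

def scanList(command, theList):
    """Return (i, None) for a top-level match, (i, j) for a match in a sub-list."""
    for i, elt in enumerate(theList):
        if elt == command:
            return (i, None)
        if isinstance(elt, MutableSequence):
            for j, elt2 in enumerate(elt):
                if elt2 == command:
                    return (i, j)
    return None  # command not found

L = ['a', 'b', ['aa', 'bb', 'cc'], 'c']
print(scanList("bb", L))
# (2, 1)
print(scanList("c", L))
# (3, None)

But this is a terrible data structure. Can you get this data in a better form?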
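
The answer does not say what a better form would look like. One possible reading, offered only as an assumption, is a reverse index built once from the nested list so that each lookup is a single dict access rather than a scan. The name `build_index` is hypothetical.

```python
def build_index(commands):
    """Map each command string to its (outer, inner) position.

    inner is None for top-level items. setdefault keeps the first
    occurrence, mirroring a scan that returns the first match.
    """
    index = {}
    for i, item in enumerate(commands):
        if isinstance(item, list):
            for j, sub in enumerate(item):
                index.setdefault(sub, (i, j))
        else:
            index.setdefault(item, (i, None))
    return index


L = ['a', 'b', ['aa', 'bb', 'cc'], 'c']
index = build_index(L)
print(index["bb"])      # (2, 1)
print(index["c"])       # (3, None)
print(index.get("zz"))  # None
```

The index has to be rebuilt if the list changes, so this only pays off when the command list is fixed and lookups are frequent.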
