用通配符匹配 2 个列表的算法

发布于 2024-12-26 11:51:14 字数 264 浏览 0 评论 0原文

我正在寻找一种有效的方法来匹配两个列表,一个包含完整信息,另一个包含通配符。我已经能够使用固定长度的通配符来做到这一点,但现在我尝试使用可变长度的通配符来做到这一点。

因此:

match( ['A', 'B', '*', 'D'], ['A', 'B', 'C', 'C', 'C', 'D'] )

只要所有元素在两个列表中的顺序相同,就会返回 True。

我正在使用对象列表,但为了简单起见,使用了上面的字符串。

I'm looking for an efficient way to match 2 lists, one wich contains complete information, and one which contains wildcards. I've been able to do this with wildcards of fixed lengths, but am now trying to do it with wildcards of variable lengths.

Thus:

match( ['A', 'B', '*', 'D'], ['A', 'B', 'C', 'C', 'C', 'D'] )

would return True as long as all the elements are in the same order in both lists.

I'm working with lists of objects, but used strings above for simplicity.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

不打扰别人 2025-01-02 11:51:14

[编辑以证明在比较对象的OP评论之后没有RE]

看来您没有使用字符串,而是比较对象。因此,我给出了一个显式算法——正则表达式为字符串提供了一个很好的解决方案,不要误会我的意思,但是从你对问题的评论来看,似乎一个显式的、简单的算法可能会让你的事情变得更容易。

事实证明,这可以用比之前的答案更简单的算法来解决:

def matcher (l1, l2):
    if (l1 == []):
        return (l2 == [] or l2 == ['*'])
    if (l2 == [] or l2[0] == '*'):
        return matcher(l2, l1)
    if (l1[0] == '*'):
        return (matcher(l1, l2[1:]) or matcher(l1[1:], l2))
    if (l1[0] == l2[0]):
        return matcher(l1[1:], l2[1:])
    else:
        return False

关键的想法是,当你遇到通配符,您可以探索两个选项:

  • 要么在包含通配符的列表中前进(并考虑通配符与迄今为止存在的任何内容相匹配)
  • ,要么在不包含通配符的列表中前进(并考虑在列表的头部有由通配符匹配)。

[edited to justify no RE after OP comment on comparing objects]

It appears you are not using strings, but rather comparing objects. I am therefore giving an explicit algorithm — regular expressions provide a good solution tailored for strings, don't get me wrong, but from what you say as a comment to your questions, it seems an explicit, simple algorithm may make things easier for you.

It turns out that this can be solved with a much simpler algorithm than this previous answer:

def matcher (l1, l2):
    if (l1 == []):
        return (l2 == [] or l2 == ['*'])
    if (l2 == [] or l2[0] == '*'):
        return matcher(l2, l1)
    if (l1[0] == '*'):
        return (matcher(l1, l2[1:]) or matcher(l1[1:], l2))
    if (l1[0] == l2[0]):
        return matcher(l1[1:], l2[1:])
    else:
        return False

The key idea is that when you encounter a wildcard, you can explore two options :

  • either advance in the list that contains the wildcard (and consider the wildcard matched whatever there was until now)
  • or advance in the list that doesn't contain the wildcard (and consider that whatever is at the head of the list has to be matched by the wildcard).
愚人国度 2025-01-02 11:51:14

下面怎么样:

import re

def match(pat, lst):
  regex = ''.join(term if term != '*' else '.*' for term in pat) + '

它使用正则表达式。通配符 (*) 更改为 .*,所有其他搜索词保持原样。

需要注意的是,如果您的搜索词可能包含在正则表达式语言中具有特殊含义的内容,则需要正确转义这些内容。在 match 函数中处理这个问题非常容易,我只是不确定这是否是您所需要的。

s = ''.join(lst) return re.match(regex, s) is not None print match( ['A', 'B', '*', 'D'], ['A', 'B', 'C', 'C', 'C', 'D'] )

它使用正则表达式。通配符 (*) 更改为 .*,所有其他搜索词保持原样。

需要注意的是,如果您的搜索词可能包含在正则表达式语言中具有特殊含义的内容,则需要正确转义这些内容。在 match 函数中处理这个问题非常容易,我只是不确定这是否是您所需要的。

How about the following:

import re

def match(pat, lst):
  regex = ''.join(term if term != '*' else '.*' for term in pat) + '

It uses regular expressions. Wildcards (*) are changed to .* and all other search terms are kept as-is.

One caveat is that if your search terms could contain things that have special meaning in the regex language, those would need to be properly escaped. It's pretty easy to handle this in the match function, I just wasn't sure if this was something you required.

s = ''.join(lst) return re.match(regex, s) is not None print match( ['A', 'B', '*', 'D'], ['A', 'B', 'C', 'C', 'C', 'D'] )

It uses regular expressions. Wildcards (*) are changed to .* and all other search terms are kept as-is.

One caveat is that if your search terms could contain things that have special meaning in the regex language, those would need to be properly escaped. It's pretty easy to handle this in the match function, I just wasn't sure if this was something you required.

机场等船 2025-01-02 11:51:14

我建议将 ['A', 'B', '*', 'D'] 转换为 '^AB.*D$', [ 'A', 'B', 'C', 'C', 'C', 'D']'ABCCCD',然后使用 re 模块(正则表达式)进行匹配。

如果列表的元素每个只有一个字符,并且它们是字符串,则这将是有效的。

例如:

import(re)
def myMatch( patternList, stringList ):
    # convert pattern to flat string with wildcards
    # convert AB*D to valid regex ^AB.*D$
    pattern = ''.join(patternList) 
    regexPattern = '^' + pattern.replace('*','.*') + '

如果列表包含数字或单词,请选择您不希望出现的字符,例如 #。然后['Aa','Bs','Ce','Cc','CC','Dd']可以转换为Aa#Bs#Ce#Cc#CC# Dd,通配符模式['Aa','Bs','*','Dd']可以转换为^Aa#Bs#.*#Dd$,并执行匹配。

实际上,这仅意味着 myMatch 中的所有 ''.join(...) 都变为 '#'.join(...)

# perform matching against = ''.join(stringList) # convert ['A','B','C','C','D'] to ABCCCD # return whether there is a match return (re.match(regexPattern,against) is not None)

如果列表包含数字或单词,请选择您不希望出现的字符,例如 #。然后['Aa','Bs','Ce','Cc','CC','Dd']可以转换为Aa#Bs#Ce#Cc#CC# Dd,通配符模式['Aa','Bs','*','Dd']可以转换为^Aa#Bs#.*#Dd$,并执行匹配。

实际上,这仅意味着 myMatch 中的所有 ''.join(...) 都变为 '#'.join(...)

I'd recommend converting ['A', 'B', '*', 'D'] to '^AB.*D$', ['A', 'B', 'C', 'C', 'C', 'D'] to 'ABCCCD', and then using the re module (regular expressions) to do the match.

This will be valid if the elements of your lists are only one character each, and if they're strings.

something like:

import(re)
def myMatch( patternList, stringList ):
    # convert pattern to flat string with wildcards
    # convert AB*D to valid regex ^AB.*D$
    pattern = ''.join(patternList) 
    regexPattern = '^' + pattern.replace('*','.*') + '

If the lists contain numbers, or words, choose a character that you wouldn't expect to be in either, for example #. Then ['Aa','Bs','Ce','Cc','CC','Dd'] can be converted to Aa#Bs#Ce#Cc#CC#Dd, the wildcard pattern ['Aa','Bs','*','Dd'] could be converted to ^Aa#Bs#.*#Dd$, and the match performed.

Practically speaking this just means all the ''.join(...) becomes '#'.join(...) in myMatch.

# perform matching against = ''.join(stringList) # convert ['A','B','C','C','D'] to ABCCCD # return whether there is a match return (re.match(regexPattern,against) is not None)

If the lists contain numbers, or words, choose a character that you wouldn't expect to be in either, for example #. Then ['Aa','Bs','Ce','Cc','CC','Dd'] can be converted to Aa#Bs#Ce#Cc#CC#Dd, the wildcard pattern ['Aa','Bs','*','Dd'] could be converted to ^Aa#Bs#.*#Dd$, and the match performed.

Practically speaking this just means all the ''.join(...) becomes '#'.join(...) in myMatch.

黎夕旧梦 2025-01-02 11:51:14

我同意关于这可以用正则表达式来完成的评论。例如:

import re

lst = ['A', 'B', 'C', 'C', 'C', 'D']
pattern = ['A', 'B', 'C+', 'D']

print re.match(''.join(pattern), ''.join(lst)) # Will successfully match

编辑:正如评论所指出的,可能提前知道必须匹配某个字符,但不知道是哪个字符。在这种情况下,正则表达式仍然有用:

import re

lst = ['A', 'B', 'C', 'C', 'C', 'D']
pattern = r'AB(\w)\1*D'

print re.match(pattern, ''.join(lst)).groups()

I agree with the comment regarding this could be done with regular expressions. For example:

import re

lst = ['A', 'B', 'C', 'C', 'C', 'D']
pattern = ['A', 'B', 'C+', 'D']

print re.match(''.join(pattern), ''.join(lst)) # Will successfully match

Edit: As pointed out by a comment, it might be known in advance just that some character has to be matched, but not which one. In that case, regular expressions are useful still:

import re

lst = ['A', 'B', 'C', 'C', 'C', 'D']
pattern = r'AB(\w)\1*D'

print re.match(pattern, ''.join(lst)).groups()
往日情怀 2025-01-02 11:51:14

我同意,正则表达式通常是处理此类事情的方法。这个算法是有效的,但对我来说它看起来很复杂。不过写起来很有趣。

def match(listx, listy):
    listx, listy = map(iter, (listx, listy))
    while 1:
        try:
            x = next(listx)
        except StopIteration:
            # This means there are values left in listx that are not in listy.
            try:
                y = next(listy)
            except StopIteration:
                # This means there are no more values to be compared in either
                # listx or listy; since no exception was raied elsewhere, the
                # lists match.
                return True
            else:
                # This means that there are values in listy that are not in
                # listx.
                return False
        else:
            try:
                y = next(listy)
            except StopIteration:
                # Similarly, there are values in listy that aren't in listx.
                return False
        if x == y:
            pass
        elif x == '*':
            try:
                # Get the value in listx after '*'.
                x = next(listx)
            except StopIteration:
                # This means that listx terminates with '*'. If there are any
                # remaining values of listy, they will, by definition, match.
                return True
            while 1:
                if x == y:
                    # I didn't shift to the next value in listy because I
                    # assume that a '*' matches the empty string and well as
                    # any other.
                    break
                else:
                    try:
                        y = next(listy)
                    except StopIteration:
                        # This means there is at least one remaining value in
                        # listx that is not in listy, because listy has no
                        # more values.
                        return False
                    else:
                        pass
        # Same algorithm as above, given there is a '*' in listy.
        elif y == '*':
            try:
                y = next(listy)
            except StopIteration:
                return True
            while 1:
                if x == y:
                    break
                else:
                    try:
                        x = next(listx)
                    except StopIteration:
                        return False
                    else:
                        pass

I agree, regular expressions are usually the way to go with this sort of thing. This algorithm works, but it just looks convoluted to me. It was fun to write though.

def match(listx, listy):
    listx, listy = map(iter, (listx, listy))
    while 1:
        try:
            x = next(listx)
        except StopIteration:
            # This means there are values left in listx that are not in listy.
            try:
                y = next(listy)
            except StopIteration:
                # This means there are no more values to be compared in either
                # listx or listy; since no exception was raied elsewhere, the
                # lists match.
                return True
            else:
                # This means that there are values in listy that are not in
                # listx.
                return False
        else:
            try:
                y = next(listy)
            except StopIteration:
                # Similarly, there are values in listy that aren't in listx.
                return False
        if x == y:
            pass
        elif x == '*':
            try:
                # Get the value in listx after '*'.
                x = next(listx)
            except StopIteration:
                # This means that listx terminates with '*'. If there are any
                # remaining values of listy, they will, by definition, match.
                return True
            while 1:
                if x == y:
                    # I didn't shift to the next value in listy because I
                    # assume that a '*' matches the empty string and well as
                    # any other.
                    break
                else:
                    try:
                        y = next(listy)
                    except StopIteration:
                        # This means there is at least one remaining value in
                        # listx that is not in listy, because listy has no
                        # more values.
                        return False
                    else:
                        pass
        # Same algorithm as above, given there is a '*' in listy.
        elif y == '*':
            try:
                y = next(listy)
            except StopIteration:
                return True
            while 1:
                if x == y:
                    break
                else:
                    try:
                        x = next(listx)
                    except StopIteration:
                        return False
                    else:
                        pass
心凉怎暖 2025-01-02 11:51:14

我有这段 C++ 代码,它似乎正在做您想做的事情(输入是字符串而不是字符数组,但无论如何您都必须调整内容)。

bool Utils::stringMatchWithWildcards (const std::string str, const std::string strWithWildcards)
    PRINT("Starting in stringMatchWithWildcards('" << str << "','" << strWithWildcards << "')");
    const std::string wildcard="*";

    const bool startWithWildcard=(strWithWildcards.find(wildcard)==0);
    int pos=strWithWildcards.rfind(wildcard);
    const bool endWithWildcard = (pos!=std::string::npos) && (pos+wildcard.size()==strWithWildcards.size());

    // Basically, the point is to split the string with wildcards in strings with no wildcard.
    // Then search in the first string for the different chunks of the second in the correct order
    std::vector<std::string> vectStr;
    boost::split(vectStr, strWithWildcards, boost::is_any_of(wildcard));
    // I expected all the chunks in vectStr to be non-empty. It doesn't seem the be the case so let's remove them.
    vectStr.erase(std::remove_if(vectStr.begin(), vectStr.end(), std::mem_fun_ref(&std::string::empty)), vectStr.end());

    // Check if at least one element (to have first and last element)
    if (vectStr.empty())
    {
        const bool matchEmptyCase = (startWithWildcard || endWithWildcard || str.empty());
        PRINT("Match " << (matchEmptyCase?"":"un") << "successful (empty case) : '" << str << "' and '" << strWithWildcards << "'");
        return matchEmptyCase;
    }

    // First Element
    std::vector<std::string>::const_iterator vectStrIt = vectStr.begin();
    std::string aStr=*vectStrIt;
    if (!startWithWildcard && str.find(aStr, 0)!=0) {
        PRINT("Match unsuccessful (beginning) : '" << str << "' and '" << strWithWildcards << "'");
        return false;
    }

    // "Normal" Elements
    bool found(true);
    pos=0;
    std::vector<std::string>::const_iterator vectStrEnd = vectStr.end();
    for ( ; vectStrIt!=vectStrEnd ; vectStrIt++)
    {
        aStr=*vectStrIt;
        PRINT( "Searching '" << aStr << "' in '" << str << "' from  " << pos);
        pos=str.find(aStr, pos);
        if (pos==std::string::npos)
        {
            PRINT("Match unsuccessful ('" << aStr << "' not found) : '" << str << "' and '" << strWithWildcards << "'");
            return false;
        } else
        {
            PRINT( "Found at position " << pos);
            pos+=aStr.size();
        }
    }

    // Last Element
    const bool matchEnd = (endWithWildcard || str.rfind(aStr)+aStr.size()==str.size());
    PRINT("Match " << (matchEnd?"":"un") << "successful (usual case) : '" << str << "' and '" << strWithWildcards);
    return matchEnd;
}

   /* Tested on these values :
   assert( stringMatchWithWildcards("ABC","ABC"));
   assert( stringMatchWithWildcards("ABC","*"));
   assert( stringMatchWithWildcards("ABC","*****"));
   assert( stringMatchWithWildcards("ABC","*BC"));
   assert( stringMatchWithWildcards("ABC","AB*"));
   assert( stringMatchWithWildcards("ABC","A*C"));
   assert( stringMatchWithWildcards("ABC","*C"));
   assert( stringMatchWithWildcards("ABC","A*"));

   assert(!stringMatchWithWildcards("ABC","BC"));
   assert(!stringMatchWithWildcards("ABC","AB"));
   assert(!stringMatchWithWildcards("ABC","AB*D"));
   assert(!stringMatchWithWildcards("ABC",""));

   assert( stringMatchWithWildcards("",""));
   assert( stringMatchWithWildcards("","*"));
   assert(!stringMatchWithWildcards("","ABC"));
   */

这并不是我真正感到自豪的事情,但到目前为止似乎正在发挥作用。我希望你会发现它很有用。

I had this c++ piece of code which seems to be doing what you are trying to do (inputs are strings instead of arrays of characters but you'll have to adapt stuff anyway).

bool Utils::stringMatchWithWildcards (const std::string str, const std::string strWithWildcards)
    PRINT("Starting in stringMatchWithWildcards('" << str << "','" << strWithWildcards << "')");
    const std::string wildcard="*";

    const bool startWithWildcard=(strWithWildcards.find(wildcard)==0);
    int pos=strWithWildcards.rfind(wildcard);
    const bool endWithWildcard = (pos!=std::string::npos) && (pos+wildcard.size()==strWithWildcards.size());

    // Basically, the point is to split the string with wildcards in strings with no wildcard.
    // Then search in the first string for the different chunks of the second in the correct order
    std::vector<std::string> vectStr;
    boost::split(vectStr, strWithWildcards, boost::is_any_of(wildcard));
    // I expected all the chunks in vectStr to be non-empty. It doesn't seem the be the case so let's remove them.
    vectStr.erase(std::remove_if(vectStr.begin(), vectStr.end(), std::mem_fun_ref(&std::string::empty)), vectStr.end());

    // Check if at least one element (to have first and last element)
    if (vectStr.empty())
    {
        const bool matchEmptyCase = (startWithWildcard || endWithWildcard || str.empty());
        PRINT("Match " << (matchEmptyCase?"":"un") << "successful (empty case) : '" << str << "' and '" << strWithWildcards << "'");
        return matchEmptyCase;
    }

    // First Element
    std::vector<std::string>::const_iterator vectStrIt = vectStr.begin();
    std::string aStr=*vectStrIt;
    if (!startWithWildcard && str.find(aStr, 0)!=0) {
        PRINT("Match unsuccessful (beginning) : '" << str << "' and '" << strWithWildcards << "'");
        return false;
    }

    // "Normal" Elements
    bool found(true);
    pos=0;
    std::vector<std::string>::const_iterator vectStrEnd = vectStr.end();
    for ( ; vectStrIt!=vectStrEnd ; vectStrIt++)
    {
        aStr=*vectStrIt;
        PRINT( "Searching '" << aStr << "' in '" << str << "' from  " << pos);
        pos=str.find(aStr, pos);
        if (pos==std::string::npos)
        {
            PRINT("Match unsuccessful ('" << aStr << "' not found) : '" << str << "' and '" << strWithWildcards << "'");
            return false;
        } else
        {
            PRINT( "Found at position " << pos);
            pos+=aStr.size();
        }
    }

    // Last Element
    const bool matchEnd = (endWithWildcard || str.rfind(aStr)+aStr.size()==str.size());
    PRINT("Match " << (matchEnd?"":"un") << "successful (usual case) : '" << str << "' and '" << strWithWildcards);
    return matchEnd;
}

   /* Tested on these values :
   assert( stringMatchWithWildcards("ABC","ABC"));
   assert( stringMatchWithWildcards("ABC","*"));
   assert( stringMatchWithWildcards("ABC","*****"));
   assert( stringMatchWithWildcards("ABC","*BC"));
   assert( stringMatchWithWildcards("ABC","AB*"));
   assert( stringMatchWithWildcards("ABC","A*C"));
   assert( stringMatchWithWildcards("ABC","*C"));
   assert( stringMatchWithWildcards("ABC","A*"));

   assert(!stringMatchWithWildcards("ABC","BC"));
   assert(!stringMatchWithWildcards("ABC","AB"));
   assert(!stringMatchWithWildcards("ABC","AB*D"));
   assert(!stringMatchWithWildcards("ABC",""));

   assert( stringMatchWithWildcards("",""));
   assert( stringMatchWithWildcards("","*"));
   assert(!stringMatchWithWildcards("","ABC"));
   */

It's not something I'm really proud of but it seems to be working so far. I hope you can find it useful.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文