
How can I give all special characters their literal meaning in re.search?

Posted on 2025-01-27 04:45:05


I am trying to figure out if the manufacturer/brand owner is selling a product on an online platform. For example, for a product with the brand name “Hello Olly”, I would like the following seller names to show a match

HELlo ollY
HelloOlly Inc.
The hello olly Company

But not a match for:

XYZ Seller
Hello The olly company

Problem: I run into problems where the brand name has special characters, such as (

Goal: To treat all special characters as literal strings. For example,

‘Hello (olly’ should show a match with ‘The Hello (olly Company’
- Would be extra nice, if it also matches ‘Hello olly Company’ – note ( has been removed in seller name.
‘Hello (olly)’ should show a match with ‘The Hello (olly) Company’ – note the first instance had only opening (. This has both (). Having just ( in the product name creates extra complications, if there isn’t a matching closing bracket.

All of these problems should be resolved if special characters are treated as literal strings.

Note: There could be an arbitrary number of special characters at any position. I would like them all to have their literal meaning.

The following function works if there are no special characters. I tried to use re.escape() to deal with special characters, but it didn’t help:

import re

def match_string(brand, seller):
    # Normalize both names: lowercase, then drop spaces and hyphens.
    brand = str(brand).lower().replace(" ", "").replace("-", "")  # may not need replace("-", "") if I have a better process to deal with all special characters.
    seller = str(seller).lower().replace(" ", "").replace("-", "")

    # Tried the following two lines to give special characters their literal meaning. But it doesn't seem to work
    brand = re.escape(brand)
    seller = re.escape(seller)

    try:
        match = re.search(brand, seller).group()
        return True
    except AttributeError:
        return False

THANKS, everyone
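
A likely reason re.escape() did not help in the function above: it is applied to the seller string as well as to the brand. re.escape() inserts backslashes, so the escaped seller text no longer contains the literal brand. The sketch below escapes only the pattern. The helper name normalize and the regex-free variant match_string_plain are illustrative additions, not part of the original question.

```python
import re

def normalize(name):
    # Lowercase and drop spaces and hyphens, as in the original function.
    return str(name).lower().replace(" ", "").replace("-", "")

def match_string(brand, seller):
    # Escape only the pattern; the text being searched stays untouched.
    pattern = re.escape(normalize(brand))
    return re.search(pattern, normalize(seller)) is not None

def match_string_plain(brand, seller):
    # Hypothetical regex-free variant: a purely literal match is a substring test.
    return normalize(brand) in normalize(seller)
```

With this change, 'Hello (olly' is found in 'The Hello (olly Company', while 'Hello The olly company' is still rejected for the brand 'Hello Olly'. It does not cover the "extra nice" case where the seller name has dropped the bracket.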


时常饿 2025-02-03 04:45:05


I usually prefer to use just regex or just "standard" string manipulation and not mix them (it could be that this is just my personal preference). You could set the re.IGNORECASE flag to avoid converting everything with str.lower() first.

Now, the straightforward way would be to actually feed a regex pattern to your function. In your example, it would be 'Hello\s?\(?olly\)?(\s?(Inc\.?|Company|Comp\.))?'. You could of course only look for 'Hello\s?\(?olly', which would also return a match in all cases but would return only the part 'Hello (olly' if the input string were 'The Hello (olly) Company'.
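
As a rough illustration of the two hand-written patterns, the sketch below compiles both with re.IGNORECASE and returns the matched text. The names FULL, SHORT and matched_text are made up for this example, and the grouping of the optional suffix (so that the leading \s? applies to every alternative) is an assumption about what the pattern is meant to express.

```python
import re

# Hand-written pattern for the brand "Hello (olly)" with an optional suffix.
FULL = re.compile(r'Hello\s?\(?olly\)?(\s?(Inc\.?|Company|Comp\.))?', re.IGNORECASE)
# Shorter pattern that stops right after "olly".
SHORT = re.compile(r'Hello\s?\(?olly', re.IGNORECASE)

def matched_text(pattern, seller):
    # Return the matched substring, or None when the pattern is absent.
    m = pattern.search(seller)
    return m.group() if m else None

print(matched_text(FULL, 'The Hello (olly) Company'))
print(matched_text(SHORT, 'The Hello (olly) Company'))
```

Both patterns find the brand, but only the longer one captures the closing bracket and the company suffix.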

However, I fear that you are trying to write a function that builds a regex pattern from an input string. That is difficult, as it needs quite a few assumptions. One thing that makes constructing regex patterns programmatically confusing is backslash handling: Python does not change the string itself, but its repr shows every backslash doubled, so it looks as if extra escape characters had been added to the pattern.

That is why r'(\(|\[|\{)?' will look like

'(\\(|\\[|\\{)?'

when echoed in the interpreter. A related pitfall is the

re.error: bad escape \s at position 0

error when using re.sub(' +', '\s', text): \s is only meaningful in a pattern, and Python 3.7 and later reject it in a replacement string.

I never thought about how to avoid this, though...
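
The backslash behaviour can be checked directly. The sketch below, assuming Python 3.7 or later, prints a raw pattern next to its repr, reproduces the re.sub error, and shows one possible way around it: passing a function as the replacement, so the returned text is used as-is. The helper name spaces_to_whitespace_token is hypothetical.

```python
import re

pattern = r'(\(|\[|\{)?'

# The string holds three backslashes; repr() displays each one doubled.
print(pattern)        # the actual characters
print(repr(pattern))  # the interpreter's escaped display

error_message = None
try:
    # r'\s' is not a valid escape in a replacement template.
    re.sub(' +', r'\s', 'hello   olly')
except re.error as exc:
    error_message = str(exc)

def spaces_to_whitespace_token(text):
    # A function replacement returns its text literally, with no template escapes.
    return re.sub(' +', lambda m: r'\s', text)

print(spaces_to_whitespace_token('hello   olly'))
```

The pattern string is the same in both prints; only the display differs.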

BTW, the return value of re.search(brand, seller, re.IGNORECASE) can be interpreted as a boolean:

if re.search(brand, seller, re.IGNORECASE):
    print('I found the pattern!')
else:
    print('I did not find the pattern =(')
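
Putting these pieces together, here is one possible sketch of a function that builds a pattern from the brand string and relies on the truthiness of re.search. It rests on a stated assumption: letters and digits must match, while every other character in the brand (spaces, hyphens, brackets) is escaped and made optional. The names build_brand_pattern and seller_matches_brand are hypothetical.

```python
import re

def build_brand_pattern(brand):
    # Assumption: alphanumerics are required; any other character is
    # escaped to its literal meaning and followed by '?' to make it optional.
    parts = []
    for ch in str(brand):
        if ch.isalnum():
            parts.append(re.escape(ch))
        else:
            parts.append(re.escape(ch) + '?')
    return ''.join(parts)

def seller_matches_brand(brand, seller):
    # re.search returns a match object or None, so bool() gives True/False.
    return bool(re.search(build_brand_pattern(brand), str(seller), re.IGNORECASE))
```

Under that assumption, 'Hello (olly' matches both 'The Hello (olly Company' and 'Hello olly Company'. Special characters that appear only in the seller name are not tolerated, so this is a starting point rather than a complete solution.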
