当前位置：文江博客话题详情

Python String regex dictionary brackets

删除 unicode 行括号中的字符串 - python

发布于 2024-11-17 11:20:52 字数 889 浏览 4 评论 0原文

我的正则表达式有一些问题，并删除了括号内的强项。

这是我的代码：

import sys, re
import codecs

reload(sys)
sys.setdefaultencoding('utf-8')

reader = codecs.open("input",'r','utf-8')
p = re.compile('s/[\[\(].+?[\]\)]//g', re.DOTALL)
# i've also tried several regex but it didn't work
# p = re.compile('\{\{*?.*?\}\}', re.DOTALL)
# p = re.compile('\{\{*.*?\}\}', re.DOTALL)

for row in reader:
    if ("(" in row) and (")" not in row):
        continue
    if row.count("(") != row.count(")"):
        continue
    else:
        row2 = p.sub('', row)
        print row2

对于输入文本文件，它看起来像这样：

가시 돋친(신랄한)평 spinosity
가장 완전한 (같은 종류의 것 중에서)   unabridged
(알코올이)표준강도(50%) 이하의 underproof
(암초 awash
치명적인(fatal) capital
열을) 전도하다    transmit

所需的输出应如下所示：

가시 돋친평  spinosity
가장 완전한  unabridged
표준강도 이하의    underproof
치명적인    capital

i've got some problems with my regex and removing my the strongs bounded by brackets.

here's my code:

import sys, re
import codecs

reload(sys)
sys.setdefaultencoding('utf-8')

reader = codecs.open("input",'r','utf-8')
p = re.compile('s/[\[\(].+?[\]\)]//g', re.DOTALL)
# i've also tried several regex but it didn't work
# p = re.compile('\{\{*?.*?\}\}', re.DOTALL)
# p = re.compile('\{\{*.*?\}\}', re.DOTALL)

for row in reader:
    if ("(" in row) and (")" not in row):
        continue
    if row.count("(") != row.count(")"):
        continue
    else:
        row2 = p.sub('', row)
        print row2

for the input textfiles it looks something like this:

가시 돋친(신랄한)평 spinosity
가장 완전한 (같은 종류의 것 중에서)   unabridged
(알코올이)표준강도(50%) 이하의 underproof
(암초 awash
치명적인(fatal) capital
열을) 전도하다    transmit

the required output should look like this:

가시 돋친평  spinosity
가장 완전한  unabridged
표준강도 이하의    underproof
치명적인    capital

收藏 0

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

评论（1）

冰雪梦之恋 2024-11-24 11:20:52

这对你有用吗？

# -*- coding: utf-8 -*-
import sys, re
import codecs

#reload(sys)
#sys.setdefaultencoding('utf-8')

#prepareing the examples to work on
writer = codecs.open("input.txt",'w','utf-8')
examples = [u'가시 돋친(신랄한)평 spinosity',
            u'가장 완전한 (같은 종류의 것 중에서)',
            u'알코올이)표준강도(50%) 이하의 underproof',
            u'(암초 awash',
            u'치명적인(fatal) capital']
for exampl in examples:
    writer.write(exampl+"\n")
writer.write(exampl)
writer.close()

reader = codecs.open("input.txt",'r','utf-8')

#order of patterns is important,
#if you remove brackets first, the other won't find anything
patterns_to_remove = [r"\(.{1,}\)",r"[\(\)]"]

#one pattern would work just fine, with the loop is a bit more clear
#pat = r"(\(.{1,}\))|([\(\)])"    
#for row in reader:
#    row = re.sub(pat,'',row)#,re.U)
#    print row

reader.seek(0)
for row in reader:
    for pat in patterns_to_remove:
        row = re.sub(pat,'',row)#,re.U)
    print row
reader.close()

Would this work for you?

# -*- coding: utf-8 -*-
import sys, re
import codecs

#reload(sys)
#sys.setdefaultencoding('utf-8')

#prepareing the examples to work on
writer = codecs.open("input.txt",'w','utf-8')
examples = [u'가시 돋친(신랄한)평 spinosity',
            u'가장 완전한 (같은 종류의 것 중에서)',
            u'알코올이)표준강도(50%) 이하의 underproof',
            u'(암초 awash',
            u'치명적인(fatal) capital']
for exampl in examples:
    writer.write(exampl+"\n")
writer.write(exampl)
writer.close()

reader = codecs.open("input.txt",'r','utf-8')

#order of patterns is important,
#if you remove brackets first, the other won't find anything
patterns_to_remove = [r"\(.{1,}\)",r"[\(\)]"]

#one pattern would work just fine, with the loop is a bit more clear
#pat = r"(\(.{1,}\))|([\(\)])"    
#for row in reader:
#    row = re.sub(pat,'',row)#,re.U)
#    print row

reader.seek(0)
for row in reader:
    for pat in patterns_to_remove:
        row = re.sub(pat,'',row)#,re.U)
    print row
reader.close()

回复收藏 0 原文

~没有更多了~

关于作者

暂无简介

文章

评论

26 人气

关注发私信

相关话题

热门标签

操作系统程序设计 IT运维 Linux系统管理 JavaScript 服务器应用 solaris C/C++ PHP Shell BSD Vue.js aix Oracle Python HTML 系统管理 HTML5 CSS 前端

推荐作者

Promise

文章 0 评论 0

qq_lbRlsh

文章 0 评论 0

待＂谢繁草

文章 0 评论 0

yy2010hell

文章 0 评论 0

漫无边际

文章 0 评论 0

傲娇萝莉攻

文章 0 评论 0

友情链接

我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的隐私政策了解更多相关信息。单击 接受 或继续使用网站，即表示您同意使用 Cookies 和您的相关数据。

原文