Remove strings within brackets from unicode lines - python
I've got some problems with my regex for removing the strings bounded by brackets.
Here's my code:
import sys, re
import codecs
reload(sys)
sys.setdefaultencoding('utf-8')
reader = codecs.open("input", 'r', 'utf-8')
p = re.compile('s/[\[\(].+?[\]\)]//g', re.DOTALL)
# I've also tried several other regexes, but they didn't work either:
# p = re.compile('\{\{*?.*?\}\}', re.DOTALL)
# p = re.compile('\{\{*.*?\}\}', re.DOTALL)
for row in reader:
    # skip lines with an opening bracket but no closing one
    if ("(" in row) and (")" not in row):
        continue
    # skip lines whose brackets are unbalanced
    if row.count("(") != row.count(")"):
        continue
    else:
        row2 = p.sub('', row)
        print row2
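The first pattern probably never removes anything for a simple reason. s/pattern/replacement/g is sed's substitution syntax. Python's re.compile expects only the pattern, and sub() takes the replacement as a separate argument, so "s/" and "//g" are matched as literal characters. The quick check below, run on one sample line and assuming Python 3, compares that pattern with the same character classes without the sed wrapper:

```python
import re

line = "치명적인(fatal) capital"

# sed-style wrapper: Python's re has no s/.../.../g form, so "s/" and "//g"
# must appear literally in the text for this to match.
sed_style = re.compile(r's/[\[\(].+?[\]\)]//g')
print(sed_style.search(line))  # None: the line contains no literal "s/"

# The same bracket classes without the wrapper do match "(fatal)".
plain = re.compile(r'[\[\(].+?[\]\)]')
print(plain.sub('', line))  # 치명적인 capital
```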
The input text file looks something like this:
가시 돋친(신랄한)평 spinosity
가장 완전한 (같은 종류의 것 중에서) unabridged
(알코올이)표준강도(50%) 이하의 underproof
(암초 awash
치명적인(fatal) capital
열을) 전도하다 transmit
The required output should look like this:
가시 돋친평 spinosity
가장 완전한 unabridged
표준강도 이하의 underproof
치명적인 capital
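The sketch below shows one way to produce the required output above. It assumes Python 3 and treats only () and [] as brackets, as the original pattern did. Lines whose bracket counts do not match are dropped. Innermost bracketed groups are removed repeatedly, so nested brackets also go, and the leftover runs of whitespace are collapsed. The helper names (is_balanced, strip_brackets, clean_lines, clean_file) are hypothetical, not from the question.

```python
import io
import re

# Innermost bracketed group: "(...)" or "[...]" containing no brackets.
# Assumption: only () and [] count as brackets, as in the original pattern.
BRACKETED = re.compile(r'\([^()]*\)|\[[^\[\]]*\]')


def is_balanced(line):
    """Rough check (count-based, not order-based) that drops lines such as
    '(암초 awash' or '열을) 전도하다 transmit'."""
    return (line.count('(') == line.count(')')
            and line.count('[') == line.count(']'))


def strip_brackets(line):
    """Remove bracketed groups, repeating until none are left so that
    nested groups also disappear, then collapse whitespace runs."""
    count = 1
    while count:
        line, count = BRACKETED.subn('', line)
    # split()/join also drops the trailing newline and any double spaces
    # left behind, e.g. after removing "(같은 종류의 것 중에서)".
    return ' '.join(line.split())


def clean_lines(lines):
    """Yield cleaned versions of the balanced lines and skip the rest."""
    for line in lines:
        if is_balanced(line):
            yield strip_brackets(line)


def clean_file(path):
    """Hypothetical file-based entry point: read UTF-8 text and print results."""
    with io.open(path, encoding="utf-8") as reader:
        for cleaned in clean_lines(reader):
            print(cleaned)


# Sample lines taken from the question's input file.
sample = [
    "가시 돋친(신랄한)평 spinosity\n",
    "가장 완전한 (같은 종류의 것 중에서) unabridged\n",
    "(알코올이)표준강도(50%) 이하의 underproof\n",
    "(암초 awash\n",
    "치명적인(fatal) capital\n",
    "열을) 전도하다 transmit\n",
]

for cleaned in clean_lines(sample):
    print(cleaned)
```

To process the real file, call clean_file("input"). Because of the count-based check, a line like ")(" would pass unchanged. If that case matters, replace is_balanced with a check that tracks the order of the brackets.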
Would this work for you?