Regex and csv problems in Python 2.7

Posted on 2024-12-15 20:57:21

I used the following to fix the problems (for the remaining issues, I will change my code around). Sorry for the improper code formatting in my initial post.

import csv, re, mechanize

# br is a mechanize.Browser() that has already opened the page
htmlML = br.response().read()

# escaping the ? fixed the regex match; \d+ captures the numeric XID
patMemberName = re.compile(r'<a href=/foo\.php\?XID=(\d+) ><font color=#000000><b>(.*) </b>')
searchMemberName = re.findall(patMemberName, htmlML)

MembersCsv = 'path-to-csv'
# opening the file in binary mode ('wb') fixed the extra \n in the csv (Python 2.7)
with open(MembersCsv, 'wb') as f:
    MemberWriter = csv.writer(f)
    for i in searchMemberName:
        MemberWriter.writerow(i)
        print (i)

Thank you for your time.
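
For readers on Python 3, here is a minimal sketch of the same extract-and-write flow. The sample HTML, the member names, and the non-greedy `(.*?)` group are assumptions made for illustration; the real page comes from `br.response().read()`. On Python 3 the counterpart of the `'wb'` fix is opening the file in text mode with `newline=''`, and an `io.StringIO` buffer stands in for that file here.

```python
import csv
import io
import re

# Hypothetical sample page; the real HTML comes from br.response().read().
html = (
    '<a href=/foo.php?XID=123 ><font color=#000000><b>alice </b>\n'
    '<a href=/foo.php?XID=456 ><font color=#000000><b>bob </b>\n'
)

# Raw string, escaped "." and "?", digits for the XID, non-greedy name group.
pat_member = re.compile(
    r'<a href=/foo\.php\?XID=(\d+) ><font color=#000000><b>(.*?) </b>'
)
rows = pat_member.findall(html)  # list of (id, name) tuples

# io.StringIO(newline='') stands in for open(path, 'w', newline='').
buf = io.StringIO(newline='')
writer = csv.writer(buf)
writer.writerows(rows)

print(rows)                  # [('123', 'alice'), ('456', 'bob')]
print(repr(buf.getvalue()))  # '123,alice\r\n456,bob\r\n'
```

The csv writer emits its own `\r\n` line endings, so the file must not translate newlines a second time; that is why binary mode helps on Python 2 and `newline=''` helps on Python 3.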


爱人如己 2024-12-22 20:57:21

Python's re module does not support the "\Q...\E" quoting that some other regex flavours use to wrap text whose meta-characters should not be interpreted.

Try wrapping the literal parts of your string in re.escape(string) instead, leaving the capture group unescaped (re.escape would otherwise turn "(.*)" into literal text). So:

re.compile(re.escape('<font color=#000000><b>') + '(.*)' + re.escape('</b>'))
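
As a rough illustration of how re.escape treats a pattern that contains a capture group, the sketch below compares escaping the whole string with escaping only the fixed text. The sample fragment and the variable names are hypothetical.

```python
import re

# Hypothetical sample fragment for illustration.
html = '<font color=#000000><b>some user</b>'

# Escaping the whole pattern also escapes the group,
# so "(.*)" is searched for as literal text.
escaped_all = re.compile(re.escape('<font color=#000000><b>(.*)</b>'))

# Escape only the fixed text and keep the capture group as regex syntax.
escaped_parts = re.compile(
    re.escape('<font color=#000000><b>') + r'(.*?)' + re.escape('</b>')
)

print(escaped_all.findall(html))    # []
print(escaped_parts.findall(html))  # ['some user']
```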


黎歌 2024-12-22 20:57:21

For question 1), you have to escape the ? in the pattern.

import re

htmlML = '<a href=/foo.php?XID=123 ><font color=#000000><b>user</b>'
patMemberID = re.compile(r'<a href=/foo\.php\?XID=(\d*) ><font color=#000000><b>user</b>')

searchMemberID = re.findall(patMemberID, htmlML)
print len(searchMemberID)

for i in searchMemberID:
    print (i)

The 123 can then be extracted from the string.
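
To show why the escape matters, here is a small sketch that runs the same sample string against an unescaped and an escaped pattern. Without the backslash, `p?` means "an optional p", so the literal question mark in the URL is never matched. The variable names are chosen for illustration only.

```python
import re

html = '<a href=/foo.php?XID=123 ><font color=#000000><b>user</b>'

# Unescaped: "p?" is an optional "p"; nothing consumes the literal "?".
unescaped = re.findall(r'<a href=/foo.php?XID=(\d*) >', html)

# Escaped: "\?" matches the literal question mark.
escaped = re.findall(r'<a href=/foo\.php\?XID=(\d*) >', html)

print(unescaped)  # []
print(escaped)    # ['123']
```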

Question 2a)

You can use (.*?) in place of some string; the ? makes the match non-greedy.
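
A short sketch of the difference between greedy and non-greedy matching, using a made-up line that contains two members on the same line:

```python
import re

# Hypothetical line containing two bold names.
html = '<b>alice</b> and <b>bob</b>'

# Greedy: ".*" runs to the last "</b>" on the line.
greedy = re.findall(r'<b>(.*)</b>', html)

# Non-greedy: ".*?" stops at the first "</b>" it can.
lazy = re.findall(r'<b>(.*?)</b>', html)

print(greedy)  # ['alice</b> and <b>bob']
print(lazy)    # ['alice', 'bob']
```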
