Combining for loops
Both programs read the same XML file. The first program copies all the data between the <text></text> tags. The second program copies only a limited part of the data from the <text></text> tags.
I want only the limited data. Is it possible to use this statement in the first program?
m = re.search(r'(?ms).*?{{(Infobox film.*?)}}', t.text)
First Program
from lxml import etree

doc = etree.parse('file.xml')

def first(seq, default=None):
    # Return the first item of seq, or default if seq is empty.
    for item in seq:
        return item
    return default

NSMAP = dict(mw="http://www.mediawiki.org/xml/export-0.5/")

for i, page in enumerate(doc.xpath('/mw:mediawiki/mw:page', namespaces=NSMAP)):
    text = first(page.xpath('./mw:revision/mw:text/text()', namespaces=NSMAP))
    id = first(page.xpath('./mw:id/text()', namespaces=NSMAP))
    title = first(page.xpath('./mw:title/text()', namespaces=NSMAP))
    print(" %s" % (text))
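Second Program
import re
from xml.etree import ElementTree

with open('file.xml') as f:
    xml = ElementTree.parse(f)

# Use './/' to search the whole tree; ElementTree emits a FutureWarning
# for paths that start with '/'.
for t in xml.findall('.//{http://www.mediawiki.org/xml/export-0.5/}text'):
    print('====================')
    # t.text is None for an empty <text/> element, so fall back to ''.
    m = re.search(r'(?ms).*?{{(Infobox film.*?)}}', t.text or '')
    if m:
        print(m.group(1))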
UPDATE: Please help me. Is there an alternative approach?
Comments (1)
I don't see any reason why you couldn't apply that same re.search call to your text variable at the end of the loop in your first program.
As you describe it, your text variable should contain all the text, and your regexp should then be able to filter out the necessary parts from it.
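
A minimal sketch of that idea: the page loop from the first program with the regular expression from the second applied to each page's text. Several things here are assumptions rather than part of the original post: the helper name extract_infoboxes, the inline SAMPLE document standing in for file.xml, the fallback to the standard library's ElementTree when lxml is not installed, and the use of findall/findtext in place of xpath so that the same calls work with either library. The pattern is the one from the question with the braces escaped and the redundant leading ".*?" dropped.

```python
import re

try:
    from lxml import etree  # the library the first program uses
except ImportError:
    # Assumption: fall back to the standard library, which supports
    # the same findall/findtext calls used below.
    import xml.etree.ElementTree as etree

NSMAP = {"mw": "http://www.mediawiki.org/xml/export-0.5/"}

# re.S lets "." match newlines, since an infobox spans several lines.
INFOBOX_RE = re.compile(r"\{\{(Infobox film.*?)\}\}", re.S)


def extract_infoboxes(root):
    """Return (title, infobox) pairs for pages whose text has an infobox."""
    results = []
    for page in root.findall("mw:page", NSMAP):
        title = page.findtext("mw:title", namespaces=NSMAP)
        text = page.findtext("mw:revision/mw:text", default="", namespaces=NSMAP)
        # text can be empty for a page without content, so guard the search.
        m = INFOBOX_RE.search(text or "")
        if m:
            results.append((title, m.group(1)))
    return results


# Hypothetical sample data standing in for file.xml.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/">
  <page>
    <title>Example Film</title>
    <id>1</id>
    <revision><text>Intro. {{Infobox film
| name = Example Film
| director = Jane Doe
}} More text.</text></revision>
  </page>
  <page>
    <title>No Infobox</title>
    <id>2</id>
    <revision><text>Just prose.</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for title, infobox in extract_infoboxes(etree.fromstring(SAMPLE)):
        print("====================")
        print(title)
        print(infobox)
```

For the real file, the call would be extract_infoboxes(etree.parse('file.xml').getroot()). One caveat: the non-greedy match stops at the first "}}" it finds, so an infobox that contains a nested template will be cut short at that point.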