Combining for loops

Posted on 2024-12-11 10:06:53


Both programs read the same XML file. The first program copies all of the data between the <text></text> tags, while the second program copies only a limited part of it.

I only want the limited data. Is it possible to use this statement in the first program?

m = re.search(r'(?ms).*?{{(Infobox film.*?)}}', t.text)

First Program

from lxml import etree

# Namespace of the MediaWiki export format; must be defined at module level
# so the loop below can use it (it was unreachable inside first() before)
NSMAP = dict(mw="http://www.mediawiki.org/xml/export-0.5/")

def first(seq, default=None):
    """Return the first item of seq, or default if seq is empty."""
    for item in seq:
        return item
    return default

doc = etree.parse('file.xml')
for i, page in enumerate(doc.xpath('/mw:mediawiki/mw:page', namespaces=NSMAP)):
    text = first(page.xpath('./mw:revision/mw:text/text()', namespaces=NSMAP))
    page_id = first(page.xpath('./mw:id/text()', namespaces=NSMAP))
    title = first(page.xpath('./mw:title/text()', namespaces=NSMAP))
    print(" %s" % (text))

Second Program

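import re
from xml.etree import ElementTree

with open('file.xml', 'rb') as f:
    xml = ElementTree.parse(f)
    # './/' searches all descendants; the namespace URI goes in braces
    for t in xml.findall('.//{http://www.mediawiki.org/xml/export-0.5/}text'):
        print('====================')
        # t.text is None for an empty <text/> element, so fall back to ''
        m = re.search(r'(?ms).*?{{(Infobox film.*?)}}', t.text or '')
        if m:
            print(m.group(1))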
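To see what that regex actually keeps, here is a small sketch run against made-up wikitext (both sample strings are hypothetical, not taken from the poster's file). It also shows one limitation worth knowing: the non-greedy .*? stops at the first }}, so a nested template inside the infobox cuts the match short.

```python
import re

PATTERN = r'(?ms).*?{{(Infobox film.*?)}}'

# Hypothetical contents of one <text> element
sample = """Some intro text.
{{Infobox film
| name = Example
| director = Someone
}}
Plot summary follows."""

m = re.search(PATTERN, sample)
if m:
    # group(1) is everything after the opening "{{" up to the first "}}"
    print(m.group(1))

# Hypothetical infobox containing a nested template
nested = "{{Infobox film\n| released = {{Film date|2000}}\n}}"
m2 = re.search(PATTERN, nested)
# The match ends at the "}}" of the nested template, not the infobox's own "}}"
print(m2.group(1))
```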

UPDATE: Please help. Is there any other alternative?
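One possible alternative, not from the original thread: the non-greedy regex ends at the first }}, so an infobox that contains a nested template such as {{Film date|...}} gets truncated. A small brace-counting scan avoids that. This is a sketch under stated assumptions: extract_template is a hypothetical helper, it assumes {{ and }} pairs are balanced, and it ignores wikitext edge cases such as triple-brace parameters or <nowiki> blocks.

```python
def extract_template(text, name="Infobox film"):
    """Return the body of the first {{name ...}} template, honouring nesting.

    Hypothetical helper: returns None if the template is missing or its
    braces never balance. Assumes plain {{ ... }} pairs only.
    """
    start = text.find("{{" + name)
    if start == -1:
        return None
    depth = 0
    i = start
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                # Drop the outer "{{" and "}}" so the result matches the
                # regex's group(1)
                return text[start + 2:i - 2]
        else:
            i += 1
    return None


# Usage on a hypothetical infobox with a nested template
nested = "x {{Infobox film\n| released = {{Film date|2000}}\n}} y"
print(extract_template(nested))
```

Inside either program, extract_template(text or '') would take the place of the re.search call.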


月光色 2024-12-18 10:06:53


I don't see any reason why you couldn't do the following at the end of the loop body in your first program (with import re added at the top):

# text is None when a page has no <text> content, so fall back to ''
m = re.search(r'(?ms).*?{{(Infobox film.*?)}}', text or '')
if m:
    print(m.group(1))

As you describe it, your text variable holds the full contents of the <text> element, so the regex can then filter out just the part you need.
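Put together, the single-loop version might look like the sketch below. It is an illustration, not the answerer's code: the standard-library ElementTree is swapped in for lxml only so the sketch runs without third-party packages, infoboxes is a hypothetical helper name, and the two-page XML sample is invented. The regex line drops into the lxml loop unchanged.

```python
import re
import xml.etree.ElementTree as ET

NS = {"mw": "http://www.mediawiki.org/xml/export-0.5/"}
INFOBOX_RE = re.compile(r'(?ms).*?{{(Infobox film.*?)}}')


def infoboxes(root):
    """Yield (page id, title, infobox body) for each page that has one."""
    for page in root.findall("mw:page", NS):
        page_id = page.findtext("mw:id", namespaces=NS)
        title = page.findtext("mw:title", namespaces=NS)
        # findtext returns "" for an empty element and the default if missing
        text = page.findtext("mw:revision/mw:text", default="", namespaces=NS)
        m = INFOBOX_RE.search(text)
        if m:
            yield page_id, title, m.group(1)


# Hypothetical two-page export; the second page has no infobox
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.5/">
  <page>
    <title>Example Film</title>
    <id>1</id>
    <revision><text>{{Infobox film
| name = Example Film
}}
Plot.</text></revision>
  </page>
  <page>
    <title>Other</title>
    <id>2</id>
    <revision><text>No infobox here.</text></revision>
  </page>
</mediawiki>"""

root = ET.fromstring(SAMPLE)
for page_id, title, body in infoboxes(root):
    print(page_id, title)
    print(body)
```

For the real dump, root = ET.parse('file.xml').getroot() would replace the fromstring call.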
