Multiple for loops for parsing XML into CSV are not working

Posted on 2025-01-15 23:13:02

I want to write code that can be used on different XML files (all with TEI encoding) to see whether specific elements and attributes appear, how often they appear, and in what context. To do this I have written the following code:

from logging import root
import xml.etree.ElementTree as ET
import csv

f = open('orestes-elements.csv', 'w', encoding="utf-8")
writer = csv.writer(f)
writer.writerow(["Note Attributes", "Note Text", "Responsibility", "Certainty Element", "Certainty Attributes", "Certainty Text"])

tree = ET.parse(r"C:\Users\noahb\OneDrive\Desktop\Humboldt\Semester 2\Daten\Hausarbeit-TEI\edition-euripides\Orestes.xml")
root = tree.getroot()


try:
    for note in root.findall('.//note'):
        noteat = note.attrib
        notetext = note.text
        print(noteat)
        print(notetext)
    #attribute search
    for responsibility in root.findall(".//*[@resp]"):
        responsibilities = str(responsibility.tag, responsibility.attrib, responsibility.text)
    for certainty in root.findall(".//*[@cert]"):
        certaintytag = certainty.tag
        certaintyat = certainty.attrib
        certaintytext = certainty.text
    writer.writerow([noteat, notetext, responsibilities, certaintytag, certaintyat, certaintytext])
finally:
    f.close()

I get the error "NameError: name 'noteat' is not defined". I can indent writer.writerow, but then the information from the other for loops doesn't get added. How do I get the information from the different for loops into my CSV file? Help would be greatly appreciated. (The print() calls in the for loops give me the right results. For responsibilities I tried making it all one string, but that isn't necessary; I am just trying out different solutions, and none has worked so far.)

This is an example of my XML file (some of the elements and attributes will not appear in some of the files; might this be a reason for the errors?):

<?xml version="1.0" encoding="UTF-8"?>
<!--<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="grc">-->
<?oxygen RNGSchema="teiScholiaSchema2021beta.rng" type="xml"?>

<TEI xml:lang="grc">
 <teiHeader>
  <titleStmt>
    <title cert="high">Scholia on Euripides’ Orestes 1–500</title>
    <author><note>Donald J.</note> Mastronarde</author>
   </titleStmt>
</teiHeader>
 <text>
   <div1 type="subdivisionByPlay" xml:id="Orestes">
    <div2 type="hypotheseis" xml:id="hypOrestes">
     <head type="outer" xml:lang="en">Prefatory material (argumenta/hypotheseis) for Orestes</head>
       <p>Orestes, pursuing <note cert="low">(vengeance for)</note> the murder of his father, killed Aegisthus and
        Clytemnestra. Having dared to commit matricide he paid the penalty immediately, becoming
        mad. And after Tyndareus, the father of the murdered woman, brought an accusation, the
        Argives were about to issue a public vote about him, concerning what the man who had acted
        impiously should suffer.
        </p>    
    </div2>
   </div1>
 </text>
</TEI>

Example of what the CSV should look like:
[image: CSV that should result]

Comments (1)

一念一轮回 2025-01-22 23:13:02

The variables you pass to writer.writerow() are only assigned inside the for loops, so they will not be defined if a loop finds no matching element. You could just define some default values to avoid this.
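
To see why the name ends up undefined, here is a minimal sketch. The XML string is a made-up stand-in for a file that has no resp attribute anywhere; it is not taken from the real edition. findall() returns an empty list in that case, so the loop body never runs and the name assigned inside it is never created.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a TEI file without any @resp attribute.
root = ET.fromstring('<TEI><title cert="high">Scholia</title></TEI>')

# No descendant carries @resp, so findall() returns an empty list.
matches = root.findall(".//*[@resp]")

for responsibility in matches:
    responsibilities = responsibility.tag  # never executed

try:
    print(responsibilities)
    name_was_defined = True
except NameError:
    # The loop body never ran, so the name was never bound.
    name_was_defined = False
```

The same thing happens to noteat when root.findall('.//note') matches nothing in a given file.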

Try adding the following right after the try: line, before the first for loop:

noteat, notetext, responsibilities, certaintytag, certaintyat, certaintytext = [''] * 6

You could of course use 'NA' instead of the empty string, if preferred.
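
Note that default values only silence the NameError. writer.writerow() still runs once, after all the loops have finished, so it would write only the last value each loop assigned. If the goal is to see every occurrence, one possible approach is to write one row per matching element. The sketch below assumes a different column layout from the one in the question (Kind, Element, Attributes, Text), and the sample string and the collect_rows helper are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened, made-up sample modelled on the XML in the question.
SAMPLE = """<TEI>
  <title cert="high">Scholia</title>
  <author><note>Donald J.</note> Mastronarde</author>
  <p>Orestes, pursuing <note cert="low">(vengeance for)</note> the murder</p>
</TEI>"""


def collect_rows(root):
    """Return one row per hit: [kind, tag, attributes, text]."""
    rows = []
    # Every <note> element, with or without attributes.
    for note in root.findall(".//note"):
        rows.append(["note", note.tag, dict(note.attrib), note.text or ""])
    # Every element carrying one of the attributes of interest.
    for attr in ("resp", "cert"):
        for elem in root.findall(f".//*[@{attr}]"):
            rows.append([attr, elem.tag, dict(elem.attrib), elem.text or ""])
    return rows


root = ET.fromstring(SAMPLE)
rows = collect_rows(root)

# Written to an in-memory buffer here; for a real file, use
# open(path, "w", newline="", encoding="utf-8") inside a with-block.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Kind", "Element", "Attributes", "Text"])
writer.writerows(rows)
csv_text = buffer.getvalue()
```

With this layout a file that lacks an element or attribute simply contributes no rows for it, so no default values are needed, and counting occurrences becomes a matter of counting rows per kind.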
