How to parse this XML data into a table in Python

Posted on 2025-01-28 11:23:00

I have the following XML and I want to parse it into a table. I have looked around and could not find a good answer. The difficult parts are:

  1. The header and the data are in different subtrees (thead and tbody)
  2. All inner tags have the same name (th or td)

Desired output:

Vaccine    Date               Status  Dose  Route  Site  Comment  ID
Vaccine A  Mon, Mar 15, 2019  Done                                imm.
Vaccine B  Tue, Sep 20, 2019  Done                                imm.

<ns0:text xmlns:ns0="urn:hl7-org:v3">
  <ns0:table border="1" width="100%">
    <ns0:thead>
      <ns0:tr>
        <ns0:th>Vaccine</ns0:th>
        <ns0:th>Date</ns0:th>
        <ns0:th>Status</ns0:th>
        <ns0:th>Dose</ns0:th>
        <ns0:th>Route</ns0:th>
        <ns0:th>Site</ns0:th>
        <ns0:th>Comment</ns0:th>
      </ns0:tr>
    </ns0:thead>
    <ns0:tbody>
      <ns0:tr>
        <ns0:td>
          <ns0:content ID="immunizationDescription1">Vaccin A</ns0:content>
        </ns0:td>
        <ns0:td>Monday, March 15, 2019 at 4:46:00 pm</ns0:td>
        <ns0:td>Done</ns0:td>
        <ns0:td>
        </ns0:td>
        <ns0:td />
        <ns0:td />
        <ns0:td />
      </ns0:tr>
      <ns0:tr>
        <ns0:td>
          <ns0:content ID="immunizationDescription2">Vaccine B</ns0:content>
        </ns0:td>
        <ns0:td>Tuesday, September 20, 2019 at 12:00:00 am</ns0:td>
        <ns0:td>Done</ns0:td>
        <ns0:td>
        </ns0:td>
        <ns0:td />
        <ns0:td />
        <ns0:td />
      </ns0:tr>
    </ns0:tbody>
  </ns0:table>
</ns0:text>

Comments (1)

过气美图社 2025-02-04 11:23:00

As long as you handle the namespace, something like this should work, though it's a bit convoluted:

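import pandas as pd
from lxml import etree

# Assumption: xml_string holds the XML from the question as a str
doc = etree.fromstring(xml_string)

nsmap = {"ns0": "urn:hl7-org:v3"}
rows = []
# Column names come from the <th> cells; the ID attribute gets its own column
cols = doc.xpath('//ns0:thead//ns0:tr//ns0:th/text()', namespaces=nsmap)
cols.append("ID")

for tr in doc.xpath('//ns0:tbody//ns0:tr', namespaces=nsmap):
    vaccine = tr.xpath('.//ns0:content/text()', namespaces=nsmap)[0]
    content_id = tr.xpath('.//ns0:content/@ID', namespaces=nsmap)[0]
    # Keep only the date part, dropping everything from " at" onward
    date = tr.xpath('substring-before(.//ns0:td[position()=2]/text()," at")', namespaces=nsmap)
    # Remaining cells: Status, Dose, Route, Site, Comment (empty cells become "")
    rest = tr.xpath('.//ns0:td[position()>2]', namespaces=nsmap)
    row = [vaccine, date]
    row.extend([td.text.strip() if td.text else "" for td in rest])
    row.append(content_id)
    rows.append(row)

# Assumption: the output below looks like a pandas DataFrame, so build one here
df = pd.DataFrame(rows, columns=cols)
print(df)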

Output (pardon the formatting):

   Vaccine    Date                         Status  Dose  Route  Site  Comment  ID
0  Vaccin A   Monday, March 15, 2019       Done                                immunizationDescription1
1  Vaccine B  Tuesday, September 20, 2019  Done                                immunizationDescription2
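
For comparison, here is a rough sketch of the same idea using only the standard library's xml.etree.ElementTree, in case lxml is not available. It tackles the two difficulties from the question directly: headers are read from thead and paired by position with the td cells of each tbody row, and itertext() collects text from nested elements such as content. The names cell_text, table_to_records and SAMPLE, and the prefix h, are made up for illustration; SAMPLE is a shortened copy of the question's XML. Unlike the lxml version above, this sketch keeps the full date string instead of cutting it at " at".

```python
import xml.etree.ElementTree as ET

# The prefix is arbitrary; only the namespace URI has to match the document
NS = {"h": "urn:hl7-org:v3"}

# Shortened copy of the question's XML (assumption: one row, four columns)
SAMPLE = """<ns0:text xmlns:ns0="urn:hl7-org:v3">
  <ns0:table>
    <ns0:thead>
      <ns0:tr>
        <ns0:th>Vaccine</ns0:th>
        <ns0:th>Date</ns0:th>
        <ns0:th>Status</ns0:th>
        <ns0:th>Dose</ns0:th>
      </ns0:tr>
    </ns0:thead>
    <ns0:tbody>
      <ns0:tr>
        <ns0:td>
          <ns0:content ID="immunizationDescription2">Vaccine B</ns0:content>
        </ns0:td>
        <ns0:td>Tuesday, September 20, 2019 at 12:00:00 am</ns0:td>
        <ns0:td>Done</ns0:td>
        <ns0:td />
      </ns0:tr>
    </ns0:tbody>
  </ns0:table>
</ns0:text>"""


def cell_text(cell):
    """Return all text inside a cell, including nested elements, stripped."""
    return "".join(cell.itertext()).strip()


def table_to_records(xml_string):
    """Turn the thead/tbody table into a list of dicts, one per body row."""
    root = ET.fromstring(xml_string)
    headers = [cell_text(th) for th in root.findall(".//h:thead/h:tr/h:th", NS)]
    records = []
    for tr in root.findall(".//h:tbody/h:tr", NS):
        cells = tr.findall("h:td", NS)
        # Pair each header with the cell at the same position
        record = {name: cell_text(td) for name, td in zip(headers, cells)}
        # The ID attribute sits on the nested content element, if there is one
        content = tr.find(".//h:content", NS)
        record["ID"] = content.get("ID", "") if content is not None else ""
        records.append(record)
    return records


records = table_to_records(SAMPLE)
for record in records:
    print(record)
```

The resulting list of dicts can be passed to pandas.DataFrame(records) if a DataFrame is wanted, or written out with csv.DictWriter.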