Limiting BeautifulSoup

Posted 2025-02-12 04:45:20

I am trying to convert XML to JSON using BeautifulSoup. The XML file has the following structure:

<H3 id="LinkTarget_311">ISSS.1.A1Acceptance of Overall(B)</H3>

<Standard>An organisation's Top Management.</Standard>

<Standard>The Top Management MUST define.</Standard>

<H3 id="LinkTarget_3116">ISS.2.A2Acceptance of Overall(C)</H3>

<Standard>An organisation's Top.</Standard>

<Standard>Top Management.</Standard>

<H3 id="LinkTarget_316">ISS.2.2Acceptance of Overall(D)</H3>

<Standard>An organisation's Top resource.</Standard>

<Standard>Top Management resource.</Standard>
......
.......

The code I wrote is below:


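import re

# Matches the first capitalised word, used to split the ID off the heading text
extract2 = re.compile(r"[A-Z][a-z]\w*")

control_ids = {}
header = bs_content.find_all('h3', {'id': True})
sub = bs_content.find_all('standard')

# zip() pairs the n-th heading with the n-th <Standard> tag overall
for i, j in zip(header, sub):
    req_id = re.split(extract2, i.text)[0].strip()
    control_ids[req_id] = j.text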

The result is too long to paste in full.

Expected result: the text of each H3 tag paired with the text of the 'Standard' tags that follow it:

[{ISSS.1.A1Acceptance of Overall(B) : 'An organisation's Top Management.Top Management.'} , {ISS.2.A2Acceptance of Overall(C):'An organisation's Top.Top Management.'},....]

Comments (1)

鲜肉鲜肉永远不皱 2025-02-19 04:45:20

Try something simpler:

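control_ids = {}
headings = bs_content.select('H3')
for heading in headings:
    # Join the text of the first two tags that follow each heading
    value = " ".join(stan.text for stan in heading.find_next_siblings()[:2])
    control_ids[heading.text] = value
print(control_ids)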

Output, based on your sample HTML:

{'ISSS.1.A1Acceptance of Overall(B)': "An organisation's Top Management. The Top Management MUST define.", 
'ISS.2.A2Acceptance of Overall(C)': "An organisation's Top. Top Management.",  
'ISS.2.2Acceptance of Overall(D)': "An organisation's Top resource. Top Management resource."}
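
If a heading can be followed by more or fewer than two Standard tags, slicing a fixed number of siblings may pick up the wrong elements. A minimal sketch of one alternative is below: it walks the siblings of each H3 and stops at the next H3, then serialises the result with `json.dumps`. It assumes the document is parsed with `html.parser` (which lowercases tag names) and that the headings and Standard tags are siblings, as in the sample. The names `SAMPLE` and `group_standards` are made up for illustration.

```python
import json

from bs4 import BeautifulSoup

# Sample markup copied from the question (illustrative input only)
SAMPLE = """
<H3 id="LinkTarget_311">ISSS.1.A1Acceptance of Overall(B)</H3>
<Standard>An organisation's Top Management.</Standard>
<Standard>The Top Management MUST define.</Standard>
<H3 id="LinkTarget_3116">ISS.2.A2Acceptance of Overall(C)</H3>
<Standard>An organisation's Top.</Standard>
<Standard>Top Management.</Standard>
<H3 id="LinkTarget_316">ISS.2.2Acceptance of Overall(D)</H3>
<Standard>An organisation's Top resource.</Standard>
<Standard>Top Management resource.</Standard>
"""


def group_standards(markup):
    """Map each <h3> heading text to the joined text of its <standard> siblings."""
    # html.parser lowercases tag names, so we match "h3" and "standard"
    soup = BeautifulSoup(markup, "html.parser")
    grouped = {}
    for heading in soup.find_all("h3", id=True):
        texts = []
        for sibling in heading.find_next_siblings():
            if sibling.name == "h3":
                # The next heading starts a new group, so stop collecting here
                break
            if sibling.name == "standard":
                texts.append(sibling.get_text(strip=True))
        grouped[heading.get_text(strip=True)] = " ".join(texts)
    return grouped


control_ids = group_standards(SAMPLE)
print(json.dumps(control_ids, indent=2))
```

On the sample above this should give the same three key/value pairs as the shorter version, but it does not depend on every heading having exactly two Standard tags.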