Using BeautifulSoup in Python to parse a specific word from within a tag

Posted on 2025-02-10 11:49:57 | Word count: 827 | Views: 2 | Comments: 0

I use BeautifulSoup to parse an XML file, and so far I select elements by tag name.
Can I also filter on something inside the tag, such as an attribute value?

data_tags = soup.find_all('Data')
for tag in data_tags:
    text = tag.get_text()

Data is the name of the tag, but can I also select on a word inside the tag, maybe like this:

data_tags = soup.find_all("Data", name="ObjectClass")

for tag in data_tags:
    print(tag.get_text())

I tried this but got the following error:
TypeError: Tag.find_all() got multiple values for argument 'name'

This is an XML example:

<Document>
  <Data Name="ObjectClass">computer</Data>
  <Data Name="AttributeLDAPDisplayName">ms-Mcs-AdmPwdExpirationTime</Data>
  <Data Name="ObjectClass">computer</Data>
  <Data Name="AttributeLDAPDisplayName">ms-Mcs-AdmPwdExpirationTime</Data>
</Document>

So I want to match only the Data tags whose Name attribute is "ObjectClass".

Comments (2)

通知家属抬走 2025-02-17 11:49:57

This returns only the Data tags with Name="ObjectClass". It needs the external libraries bs4 and lxml (pip install bs4 lxml):

from bs4 import BeautifulSoup

xml = '''\
<Document>
  <Data Name="ObjectClass">computer</Data>
  <Data Name="AttributeLDAPDisplayName">ms-Mcs-AdmPwdExpirationTime</Data>
  <Data Name="ObjectClass">computer</Data>
  <Data Name="AttributeLDAPDisplayName">ms-Mcs-AdmPwdExpirationTime</Data>
  <Other Name="ObjectClass">other</Other>
</Document>
'''

soup = BeautifulSoup(xml, 'xml')
for data in soup.find_all('Data', Name='ObjectClass'):
    print(data.get_text())

Output:

computer
computer

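Note that case matters (Name, not name). The lowercase name is already the first positional parameter of find_all() (the tag name, here 'Data'), so passing name= as a keyword as well raises the TypeError from the question.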
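
As a complement to the keyword form above, here is a minimal sketch that passes the attribute filter through the attrs dictionary of find_all(). It assumes the same sample XML (shortened here) and the same bs4 plus lxml install. The dictionary form is mainly useful when an attribute's name would clash with one of find_all()'s own parameters, for example an attribute literally called name.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample document, for illustration only.
xml = '''\
<Document>
  <Data Name="ObjectClass">computer</Data>
  <Data Name="AttributeLDAPDisplayName">ms-Mcs-AdmPwdExpirationTime</Data>
  <Other Name="ObjectClass">other</Other>
</Document>
'''

soup = BeautifulSoup(xml, 'xml')  # the 'xml' parser relies on lxml

# attrs takes a dict, so the attribute name never collides with
# find_all()'s own parameters such as name, string or limit.
matches = soup.find_all('Data', attrs={'Name': 'ObjectClass'})
texts = [tag.get_text() for tag in matches]
print(texts)
```

The Other element is skipped because the first argument still restricts the search to Data tags.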

赠我空喜 2025-02-17 11:49:57

A one-liner without any external library :-)

import xml.etree.ElementTree as ET


xml = '''\
<Document>
  <Data Name="ObjectClass">computer1</Data>
  <Data Name="AttributeLDAPDisplayName">ms-Mcs-AdmPwdExpirationTime</Data>
  <Data Name="ObjectClass">compute2r</Data>
  <Data Name="AttributeLDAPDisplayName">ms-Mcs-AdmPwdExpirationTime</Data>
  <Other Name="ObjectClass">other</Other>
</Document>
'''
root = ET.fromstring(xml)
object_class_data = [x.text for x in root.findall('.//Data[@Name="ObjectClass"]')]
print(object_class_data)

Output:

['computer1', 'compute2r']
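
If the tag name, attribute name or value is only known at run time, the same lookup can be written as a plain loop instead of building an XPath string. The sketch below is one possible way to do that with the standard library. The helper name texts_by_attribute is hypothetical and not part of ElementTree.

```python
import xml.etree.ElementTree as ET

xml = '''\
<Document>
  <Data Name="ObjectClass">computer1</Data>
  <Data Name="AttributeLDAPDisplayName">ms-Mcs-AdmPwdExpirationTime</Data>
  <Data Name="ObjectClass">compute2r</Data>
  <Data Name="AttributeLDAPDisplayName">ms-Mcs-AdmPwdExpirationTime</Data>
  <Other Name="ObjectClass">other</Other>
</Document>
'''


def texts_by_attribute(xml_text, tag, attr, value):
    """Hypothetical helper: text of every <tag> element whose attr equals value."""
    root = ET.fromstring(xml_text)
    # iter(tag) walks the whole tree and yields elements with that tag;
    # get(attr) returns None when the attribute is missing.
    return [el.text for el in root.iter(tag) if el.get(attr) == value]


object_class_data = texts_by_attribute(xml, 'Data', 'Name', 'ObjectClass')
print(object_class_data)
```

Comparing with el.get() avoids quoting problems when the value itself contains quote characters, which an XPath predicate built by string formatting would have to handle.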