Nested XML attributes & text do not show in DF using pandas
I am new to Python and have a file.xml with the following structure:
<?xml version="1.0" encoding="UTF-8"?>
<HEADER>
  <PRODUCT_DETAILS>
    <DESCRIPTION_SHORT>blue dog w short hair</DESCRIPTION_SHORT>
    <DESCRIPTION_LONG>blue dog w short hair and unlimitied zoomies</DESCRIPTION_LONG>
  </PRODUCT_DETAILS>
  <PRODUCT_FEATURES>
    <FEATURE>
      <FNAME>Hair</FNAME>
      <FVALUE>short</FVALUE>
    </FEATURE>
    <FEATURE>
      <FNAME>Colour</FNAME>
      <FVALUE>blue</FVALUE>
    </FEATURE>
    <FEATURE>
      <FNAME>Legs</FNAME>
      <FVALUE>4</FVALUE>
    </FEATURE>
  </PRODUCT_FEATURES>
</HEADER>
I am using a very simple snippet (below) to turn it into file_export.csv:
import pandas as pd
df = pd.read_xml("file.xml")
# df
df.to_csv("file_export.csv", index=False)
The problem is that I end up with a table like this:
DESCRIPTION_SHORT      DESCRIPTION_LONG                              FEATURE
blue dog w short hair  blue dog w short hair and unlimitied zoomies  NaN
I tried removing the FEATURE attribute, but the earlier FNAME and FVALUE values ended up overwritten(?) by the last one, presumably because they all share the same names:
DESCRIPTION_SHORT      DESCRIPTION_LONG                              FNAME  FVALUE
blue dog w short hair  blue dog w short hair and unlimitied zoomies  None   NaN
None                   None                                          Legs   4.0
What do I need to add to my code to show the nested attributes including their text? Like this:
DESCRIPTION_SHORT      DESCRIPTION_LONG                              FEATURE  FNAME   FVALUE
blue dog w short hair  blue dog w short hair and unlimitied zoomies  NaN      Hair    short
blue dog w short hair  blue dog w short hair and unlimitied zoomies  NaN      Colour  blue
blue dog w short hair  blue dog w short hair and unlimitied zoomies  NaN      Legs    4
Thank you in advance!!
~ C
Comments (1)
First, the sample XML in your question (and probably your actual XML) doesn't really lend itself to read_xml(). In this case you are probably better off using an actual XML parser and handing the output over to pandas. In addition, I don't think your desired output is very efficient: in your example, you repeat each of the long and short descriptions 3 times, for no apparent reason.
Having said all that, I would suggest something like this:
Assuming your actual xml has more than one pet, something like:
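A minimal sketch of that situation, assuming a made-up `PETS` root element wrapping several `HEADER` records (the wrapper name and the second pet are hypothetical, invented only for illustration). It uses the standard library's `xml.etree.ElementTree` as the parser, which is a stand-in and not necessarily the parser the answer had in mind. Each `FEATURE` becomes one row, with the parent's descriptions repeated on it:

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a made-up <PETS> root holding two <HEADER> records.
xml_text = """<PETS>
  <HEADER>
    <PRODUCT_DETAILS>
      <DESCRIPTION_SHORT>blue dog w short hair</DESCRIPTION_SHORT>
      <DESCRIPTION_LONG>blue dog w short hair and unlimitied zoomies</DESCRIPTION_LONG>
    </PRODUCT_DETAILS>
    <PRODUCT_FEATURES>
      <FEATURE><FNAME>Hair</FNAME><FVALUE>short</FVALUE></FEATURE>
      <FEATURE><FNAME>Colour</FNAME><FVALUE>blue</FVALUE></FEATURE>
      <FEATURE><FNAME>Legs</FNAME><FVALUE>4</FVALUE></FEATURE>
    </PRODUCT_FEATURES>
  </HEADER>
  <HEADER>
    <PRODUCT_DETAILS>
      <DESCRIPTION_SHORT>red cat w long hair</DESCRIPTION_SHORT>
      <DESCRIPTION_LONG>red cat w long hair and many naps</DESCRIPTION_LONG>
    </PRODUCT_DETAILS>
    <PRODUCT_FEATURES>
      <FEATURE><FNAME>Hair</FNAME><FVALUE>long</FVALUE></FEATURE>
      <FEATURE><FNAME>Colour</FNAME><FVALUE>red</FVALUE></FEATURE>
    </PRODUCT_FEATURES>
  </HEADER>
</PETS>"""

root = ET.fromstring(xml_text)

rows = []
# iter() also yields the root itself when it matches, so this loop works
# unchanged when <HEADER> is the root element, as in the question's file.
for header in root.iter("HEADER"):
    short = header.findtext("PRODUCT_DETAILS/DESCRIPTION_SHORT")
    long_ = header.findtext("PRODUCT_DETAILS/DESCRIPTION_LONG")
    # One output row per FEATURE, carrying the parent's descriptions along.
    for feature in header.iter("FEATURE"):
        rows.append({
            "DESCRIPTION_SHORT": short,
            "DESCRIPTION_LONG": long_,
            "FNAME": feature.findtext("FNAME"),
            "FVALUE": feature.findtext("FVALUE"),
        })

for row in rows:
    print(row)
```

For a file on disk, `ET.parse("file.xml").getroot()` would replace the `ET.fromstring(...)` call. All values come back as strings, so `FVALUE` stays `"4"` rather than turning into `4.0`.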
If you want to be even more adventurous and use xpath 2.0 (which lxml doesn't support) as well as more list comprehensions, you can try this:
In either case:
should output:
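As a sketch of this last step, assume the parsing produced a list of dicts named `rows` (a hypothetical name, hard-coded here with the three features from the question so the snippet runs on its own). The rows can then be handed to pandas and serialized as CSV:

```python
import pandas as pd

SHORT = "blue dog w short hair"
LONG = "blue dog w short hair and unlimitied zoomies"

# Assumed shape of the parsed data: one dict per FEATURE element.
rows = [
    {"DESCRIPTION_SHORT": SHORT, "DESCRIPTION_LONG": LONG, "FNAME": "Hair", "FVALUE": "short"},
    {"DESCRIPTION_SHORT": SHORT, "DESCRIPTION_LONG": LONG, "FNAME": "Colour", "FVALUE": "blue"},
    {"DESCRIPTION_SHORT": SHORT, "DESCRIPTION_LONG": LONG, "FNAME": "Legs", "FVALUE": "4"},
]

columns = ["DESCRIPTION_SHORT", "DESCRIPTION_LONG", "FNAME", "FVALUE"]
df = pd.DataFrame(rows, columns=columns)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)
# DESCRIPTION_SHORT,DESCRIPTION_LONG,FNAME,FVALUE
# blue dog w short hair,blue dog w short hair and unlimitied zoomies,Hair,short
# blue dog w short hair,blue dog w short hair and unlimitied zoomies,Colour,blue
# blue dog w short hair,blue dog w short hair and unlimitied zoomies,Legs,4
```

Calling `df.to_csv("file_export.csv", index=False)` instead writes the same content to the file named in the question.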