Nested XML attributes & text not showing in DF using pandas

Posted 2025-01-17 13:13:26 · 2078 characters · 4 views · 0 comments

I am new to Python and have a file.xml with the following structure:

<?xml version="1.0" encoding="UTF-8"?>
<HEADER>
    <PRODUCT_DETAILS>
        <DESCRIPTION_SHORT>blue dog w short hair</DESCRIPTION_SHORT>
        <DESCRIPTION_LONG>blue dog w short hair and unlimitied zoomies</DESCRIPTION_LONG>
    </PRODUCT_DETAILS>
    <PRODUCT_FEATURES>
        <FEATURE>
            <FNAME>Hair</FNAME>
            <FVALUE>short</FVALUE>
        </FEATURE>
        <FEATURE>
            <FNAME>Colour</FNAME>
            <FVALUE>blue</FVALUE>
        </FEATURE>
        <FEATURE>
            <FNAME>Legs</FNAME>
            <FVALUE>4</FVALUE>
        </FEATURE>
    </PRODUCT_FEATURES>
</HEADER>

I am using a very simple snippet (below) to turn it into file_export.csv:

import pandas as pd

df = pd.read_xml("file.xml")

# df

df.to_csv("file_export.csv", index=False)

The problem is that I end up with a table like this:

DESCRIPTION_SHORT       DESCRIPTION_LONG                                FEATURE
blue dog w short hair   blue dog w short hair and unlimitied zoomies    NaN

I tried removing the FEATURE element, but then the earlier FNAME and FVALUE entries seem to be overwritten(?) by the last one, I assume because they all share the same names:

DESCRIPTION_SHORT       DESCRIPTION_LONG                                FNAME   FVALUE
blue dog w short hair   blue dog w short hair and unlimitied zoomies    None    NaN
None                    None                                            Legs    4.0

What do I need to add to my code to show the nested elements, including their text? Like this:

DESCRIPTION_SHORT       DESCRIPTION_LONG                                FEATURE FNAME   FVALUE
blue dog w short hair   blue dog w short hair and unlimitied zoomies    NaN     Hair    short
blue dog w short hair   blue dog w short hair and unlimitied zoomies    NaN     Colour  blue
blue dog w short hair   blue dog w short hair and unlimitied zoomies    NaN     Legs    4

Thank you in advance!!

~ C

Comments (1)

淡看悲欢离合 2025-01-24 13:13:26

First, the sample XML in your question (and probably your actual XML) doesn't really lend itself to read_xml(), which works best on shallow documents where each row's values sit directly under one repeated element. In this case you are probably better off using an actual XML parser and handing the output over to pandas.

In addition, I don't think your desired output is very efficient - in your example, you repeat each of the long and short description 3 times, for no apparent reason.

Having said all that, I would suggest something like this:

Assuming your actual xml has more than one pet, something like:

inventory="""<?xml version="1.0" encoding="UTF-8"?>
<doc>
<HEADER>
    <PRODUCT_DETAILS>
        <DESCRIPTION_SHORT>green cat w short hair</DESCRIPTION_SHORT>
        <DESCRIPTION_LONG>green cat w short hair and unlimitied zoomies</DESCRIPTION_LONG>
    </PRODUCT_DETAILS>
    <PRODUCT_FEATURES>
        <FEATURE>
            <FNAME>Hair</FNAME>
            <FVALUE>medium</FVALUE>
        </FEATURE>
        <FEATURE>
            <FNAME>Colour</FNAME>
            <FVALUE>green</FVALUE>
        </FEATURE>
        <FEATURE>
            <FNAME>Legs</FNAME>
            <FVALUE>14</FVALUE>
        </FEATURE>
    </PRODUCT_FEATURES>
</HEADER>
****the HEADER in your question goes here***
</doc>"""

from lxml import etree
import pandas as pd

doc = etree.XML(inventory.encode())
pets = doc.xpath('//HEADER')

# Column names: the PRODUCT_DETAILS tags, then the FNAME values of the first HEADER
headers = [elem.tag for elem in doc.xpath('//HEADER[1]//PRODUCT_DETAILS//*')]
headers.extend(doc.xpath('//HEADER[1]//FNAME/text()'))

rows = []

for pet in pets:
    # One row per HEADER: both descriptions, then every FVALUE in document order
    row = [pet.xpath(f'.//{headers[0]}/text()')[0], pet.xpath(f'.//{headers[1]}/text()')[0]]
    f_values = pet.xpath('.//FVALUE/text()')
    row.extend(f_values)
    rows.append(row)
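
The loop above relies on every HEADER listing the same features in the same order. If that can't be guaranteed, one option is to build a dict per HEADER keyed by FNAME. What follows is only a sketch under that assumption, using the standard-library parser instead of lxml; the sample data and the function name wide_records are made up for illustration (the second HEADER deliberately has no Legs and lists its features in a different order).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the second HEADER lacks "Legs" and reorders its features.
INVENTORY = """<doc>
<HEADER>
    <PRODUCT_DETAILS>
        <DESCRIPTION_SHORT>green cat w short hair</DESCRIPTION_SHORT>
        <DESCRIPTION_LONG>green cat w short hair and unlimitied zoomies</DESCRIPTION_LONG>
    </PRODUCT_DETAILS>
    <PRODUCT_FEATURES>
        <FEATURE><FNAME>Hair</FNAME><FVALUE>medium</FVALUE></FEATURE>
        <FEATURE><FNAME>Colour</FNAME><FVALUE>green</FVALUE></FEATURE>
        <FEATURE><FNAME>Legs</FNAME><FVALUE>14</FVALUE></FEATURE>
    </PRODUCT_FEATURES>
</HEADER>
<HEADER>
    <PRODUCT_DETAILS>
        <DESCRIPTION_SHORT>blue dog w short hair</DESCRIPTION_SHORT>
        <DESCRIPTION_LONG>blue dog w short hair and unlimitied zoomies</DESCRIPTION_LONG>
    </PRODUCT_DETAILS>
    <PRODUCT_FEATURES>
        <FEATURE><FNAME>Colour</FNAME><FVALUE>blue</FVALUE></FEATURE>
        <FEATURE><FNAME>Hair</FNAME><FVALUE>short</FVALUE></FEATURE>
    </PRODUCT_FEATURES>
</HEADER>
</doc>"""


def wide_records(xml_text):
    """Return one dict per HEADER: description tags plus one key per FNAME."""
    root = ET.fromstring(xml_text)
    records = []
    # iter() also yields the root itself, so a file whose root is HEADER works too
    for header in root.iter("HEADER"):
        record = {}
        details = header.find("PRODUCT_DETAILS")
        if details is not None:
            for child in details:
                record[child.tag] = child.text
        # Key each value by its feature name rather than by its position
        for feature in header.iter("FEATURE"):
            record[feature.findtext("FNAME")] = feature.findtext("FVALUE")
        records.append(record)
    return records


records = wide_records(INVENTORY)
```

pd.DataFrame(records) accepts a list of dicts like this one and fills keys that are missing from a record (Legs for the dog here) with NaN.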

If you want to be even more adventurous and use xpath 2.0 (which lxml doesn't support) as well as more list comprehensions, you can try this:

from elementpath import select

expression1 = '//HEADER[1]/string-join((./PRODUCT_DETAILS//*/name(),./PRODUCT_FEATURES//FNAME),",")'
expression2 = '//HEADER/string-join((./PRODUCT_DETAILS//*,./PRODUCT_FEATURES//FVALUE),",")'
headers = [h.split(',') for h in select(doc, expression1)]
rows = [r.split(',') for r in select(doc, expression2)]

In either case:

pd.DataFrame(rows, columns=headers)

should output:

   DESCRIPTION_SHORT       DESCRIPTION_LONG                               Hair    Colour  Legs
0  green cat w short hair  green cat w short hair and unlimitied zoomies  medium  green   14
1  blue dog w long hair    blue dog w long hair and limitied zoomies      short   blue    4
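
If the layout from the question is what is really needed (one row per FEATURE, with the descriptions repeated), something along these lines might do. It is a minimal sketch, not a definitive implementation: it uses only the standard library, the function names long_records and to_csv_text are invented for the example, and it leaves out the all-NaN FEATURE column shown in the question.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The XML from the question, passed as bytes so the encoding declaration is honoured.
QUESTION_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<HEADER>
    <PRODUCT_DETAILS>
        <DESCRIPTION_SHORT>blue dog w short hair</DESCRIPTION_SHORT>
        <DESCRIPTION_LONG>blue dog w short hair and unlimitied zoomies</DESCRIPTION_LONG>
    </PRODUCT_DETAILS>
    <PRODUCT_FEATURES>
        <FEATURE>
            <FNAME>Hair</FNAME>
            <FVALUE>short</FVALUE>
        </FEATURE>
        <FEATURE>
            <FNAME>Colour</FNAME>
            <FVALUE>blue</FVALUE>
        </FEATURE>
        <FEATURE>
            <FNAME>Legs</FNAME>
            <FVALUE>4</FVALUE>
        </FEATURE>
    </PRODUCT_FEATURES>
</HEADER>"""


def long_records(xml_bytes):
    """Return one dict per FEATURE, repeating the parent HEADER's descriptions."""
    root = ET.fromstring(xml_bytes)
    records = []
    # iter() includes the root element, so a single top-level HEADER is handled
    for header in root.iter("HEADER"):
        details_node = header.find("PRODUCT_DETAILS")
        details = {}
        if details_node is not None:
            details = {child.tag: child.text for child in details_node}
        for feature in header.iter("FEATURE"):
            records.append({
                **details,
                "FNAME": feature.findtext("FNAME"),
                "FVALUE": feature.findtext("FVALUE"),
            })
    return records


def to_csv_text(records):
    """Serialise the records to CSV text, taking the columns from the first record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = long_records(QUESTION_XML)
csv_text = to_csv_text(records)
```

The same records list can be handed to pd.DataFrame(records) and written out with to_csv("file_export.csv", index=False), as in the question.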