How to add all columns (XML to CSV)

Posted on 2025-01-20 18:13:08

I need help converting XML to a CSV file. I have partly succeeded, but I don't know how to add the time value and the phase id in the Python code.

I have the following XML, copied from an XML link:

<?xml version="1.0" encoding="UTF-8"?>
 <akouda>   
    <time value="2022-04-12 13:45:00">
        <phases>
            <phase id="0">          
                <act_energy>1.2000000000000455</act_energy>
                <react_energy>1.9711529080673937</react_energy>
                <current_inst>7.08</current_inst>
                <voltage_inst>242.7</voltage_inst>
                <power_inst>0.9</power_inst>
                <power_fact>0.52</power_fact>
                <thd>66.45</thd>
                </phase>
            <phase id="1">          
                <act_energy>0</act_energy>
                <react_energy>0</react_energy>
                <current_inst>16.1</current_inst>
                <voltage_inst>242</voltage_inst>
                <power_inst>2.38</power_inst>
                <power_fact>0.61</power_fact>
                <thd>31</thd>
                </phase>
            <phase id="2">          
                <act_energy>0</act_energy>
                <react_energy>0</react_energy>
                <current_inst>8.64</current_inst>
                <voltage_inst>242.7</voltage_inst>
                <power_inst>2.01</power_inst>
                <power_fact>0.95</power_fact>
                <thd>26.81</thd>
                </phase>
            </phases>
     </time>
<time value="2022-04-12 13:30:00">
        <phases>
            <phase id="0">          
                <act_energy>1.2999999999999545</act_energy>
                <react_energy>2.1354156504061876</react_energy>
                <current_inst>7.06</current_inst>
                <voltage_inst>242.2</voltage_inst>
                <power_inst>0.9</power_inst>
                <power_fact>0.52</power_fact>
                <thd>65.89</thd>
                </phase>
            <phase id="1">          
                <act_energy>0</act_energy>
                <react_energy>0</react_energy>
                <current_inst>16.95</current_inst>
                <voltage_inst>241</voltage_inst>
                <power_inst>2.61</power_inst>
                <power_fact>0.63</power_fact>
                <thd>29.1</thd>
                </phase>
            <phase id="2">          
                <act_energy>0</act_energy>
                <react_energy>0</react_energy>
                <current_inst>9.57</current_inst>
                <voltage_inst>242.4</voltage_inst>
                <power_inst>2.23</power_inst>
                <power_fact>0.96</power_fact>
                <thd>24.12</thd>
                </phase>
            </phases>
     </time>
    </akouda>

and the following code to convert the XML to CSV:

import xml.etree.ElementTree as Xet
import pandas as pd

rows = []

# Parsing the XML file
xmlparse = Xet.parse('sample.xml')
root = xmlparse.getroot()
for i in root.findall('phases'):
    act_energy = i.find("act_energy").text
    react_energy = i.find("react_energy").text
    current_inst = i.find("current_inst").text
    voltage_inst = i.find("voltage_inst").text
    power_inst = i.find("power_inst").text
    power_fact = i.find("power_fact").text
    thd = i.find("thd").text


    rows.append({
                "act_energy": act_energy,
                "react_energy": react_energy,
                "current_inst": current_inst,
                "voltage_inst": voltage_inst,
                "power_inst": power_inst,
                "power_fact": power_fact,
                "thd": thd,
                })

df = pd.DataFrame(rows )

# Writing dataframe to csv
df.to_csv('output.csv')

  1. How can I include the time value and the phase id in the Python code?
  2. How can I load the XML from the link instead of from a file?

Thanks

Comments (1)

轮廓§ 2025-01-27 18:13:08

You can use the pd.read_xml function with a suitable XPath, for example (you can also pass a URL to .read_xml()):

df = pd.read_xml("data.xml", xpath="//phases/* | //time")
df["value"] = df["value"].ffill()
print(df.dropna(how="all", axis=1).dropna(axis=0))

Prints:

                 value   id  act_energy  react_energy  current_inst  voltage_inst  power_inst  power_fact    thd
1  2022-04-12 13:45:00  0.0         1.2      1.971153          7.08         242.7        0.90        0.52  66.45
2  2022-04-12 13:45:00  1.0         0.0      0.000000         16.10         242.0        2.38        0.61  31.00
3  2022-04-12 13:45:00  2.0         0.0      0.000000          8.64         242.7        2.01        0.95  26.81
5  2022-04-12 13:30:00  0.0         1.3      2.135416          7.06         242.2        0.90        0.52  65.89
6  2022-04-12 13:30:00  1.0         0.0      0.000000         16.95         241.0        2.61        0.63  29.10
7  2022-04-12 13:30:00  2.0         0.0      0.000000          9.57         242.4        2.23        0.96  24.12
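
For comparison, here is a minimal sketch that stays with xml.etree.ElementTree, as in the question. It assumes the same layout (akouda > time > phases > phase) and uses a shortened, made-up sample with only two measurement columns. The function name phases_to_rows and the column names time and phase_id are hypothetical, chosen only for illustration. The idea is to loop over each time element first, so its value attribute is still in scope when the nested phase elements are read.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Shortened sample in the same shape as the XML from the question (assumed layout).
SAMPLE_XML = """<akouda>
    <time value="2022-04-12 13:45:00">
        <phases>
            <phase id="0">
                <act_energy>1.2</act_energy>
                <thd>66.45</thd>
            </phase>
            <phase id="1">
                <act_energy>0</act_energy>
                <thd>31</thd>
            </phase>
        </phases>
    </time>
    <time value="2022-04-12 13:30:00">
        <phases>
            <phase id="0">
                <act_energy>1.3</act_energy>
                <thd>65.89</thd>
            </phase>
        </phases>
    </time>
</akouda>
"""


def phases_to_rows(xml_text):
    """Return one dict per <phase>, carrying the parent <time> value and the phase id."""
    root = ET.fromstring(xml_text)
    rows = []
    for time_el in root.findall("time"):
        for phase_el in time_el.findall("./phases/phase"):
            row = {"time": time_el.get("value"), "phase_id": phase_el.get("id")}
            # Every child element of <phase> becomes a column named after its tag.
            for child in phase_el:
                row[child.tag] = child.text
            rows.append(row)
    return rows


rows = phases_to_rows(SAMPLE_XML)

# Write the rows as CSV into a string buffer; a real file opened with
# newline="" can be passed to csv.DictWriter in the same way.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

The rows list can also be handed to pd.DataFrame(rows) and written with to_csv, as in the question. Note that all values stay strings here, whereas pd.read_xml infers numeric types.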

EDIT: To read from the provided URL:

import requests
import pandas as pd
from html import unescape
from io import StringIO

url = "https://issat.ttn.tn/cu/export/akouda.php"

# quick-and-dirty method to remove the first <pre> and the last </pre>;
# ideally, you would do this with an HTML parser:
s = unescape(requests.get(url).text)[5:-6]

# pandas 2.1+ deprecates passing a literal XML string, so wrap it in StringIO
df = pd.read_xml(StringIO(s), xpath="//phases/* | //time")
df["value"] = df["value"].ffill()
print(df.dropna(how="all", axis=1).dropna(axis=0))
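
The comment above suggests an HTML parser instead of slicing off the tags. A rough sketch of that idea with the standard library's html.parser follows. It assumes, as the slicing code does, that the page wraps the escaped XML in a pre element. The names PreTextExtractor and extract_pre_text are hypothetical, and the page string is a made-up stand-in for the real response body.

```python
from html.parser import HTMLParser


class PreTextExtractor(HTMLParser):
    """Collect the text found inside <pre> elements."""

    def __init__(self):
        # convert_charrefs=True makes the parser unescape &lt;, &quot;, etc.
        super().__init__(convert_charrefs=True)
        self._depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        # Keep text only while inside a <pre> element.
        if self._depth:
            self.chunks.append(data)


def extract_pre_text(html_text):
    """Return the unescaped text inside <pre>...</pre>, stripped of outer whitespace."""
    parser = PreTextExtractor()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.chunks).strip()


# Made-up stand-in for requests.get(url).text
page = (
    "<pre>&lt;akouda&gt;&lt;time value=&quot;2022-04-12 13:45:00&quot;&gt;"
    "&lt;/time&gt;&lt;/akouda&gt;</pre>\n"
)
xml_text = extract_pre_text(page)
print(xml_text)
```

Unlike the fixed [5:-6] slice, this does not depend on the response starting and ending exactly at the tags, so stray whitespace around them should not break the result.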