Organizing XML data into a dictionary

Posted 2024-11-25 08:59:36

I'm trying to organize my data into a dictionary format from XML data. This will be used to run Monte Carlo simulations.

Here is an example of what a couple of entries in the XML look like:

<retirement>
    <item>
        <low>-0.34</low>
        <high>-0.32</high>
        <freq>0.0294117647058824</freq>
        <variable>stock</variable>
        <type>historic</type>
    </item>
    <item>
        <low>-0.32</low>
        <high>-0.29</high>
        <freq>0</freq>
        <variable>stock</variable>
        <type>historic</type>
    </item>
</retirement>

My current data sets only have two variables, and the type can be one of three or possibly four discrete types. Hard-coding two variables isn't a problem, but I would like to start working with data that has many more variables and automate this process. My goal is to automatically import this XML data into a dictionary so that I can manipulate it later without having to hard-code the array titles and the variables.

Here is what I have:

# Import XML Parser
import xml.etree.ElementTree as ET

# Parse XML directly from the file path
tree = ET.parse('xmlfile')

# Create iterable item list
Items = tree.findall('item')

# Create Master Dictionary
masterDictionary = {}

# Assign variables to dictionary
for Item in Items:
    thisKey = Item.find('variable').text
    if thisKey in masterDictionary == False:
        masterDictionary[thisKey] = []
    else:
        pass

thisList = masterDictionary[thisKey]
newDataPoint = DataPoint(float(Item.find('low').text), float(Item.find('high').text), float(Item.find('freq').text))
thisSublist.append(newDataPoint)

I'm getting a KeyError at thisList = masterDictionary[thisKey].

I am also trying to create a class to deal with some of the other elements of the xml:

# Define a class for each data point that contains low, hi and freq attributes
class DataPoint:
 def __init__(self, low, high, freq):
  self.low = low
  self.high = high
  self.freq = freq

Would I then be able to check a value with something like:

masterDictionary['stock'] [0].freq
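
A minimal sketch of that lookup, assuming Python 3 and a dictionary filled in by hand with the first sample entry (in the real code the parsing loop would populate it):

```python
class DataPoint:
    """One bin: lower bound, upper bound, and relative frequency."""
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq

# Hand-built stand-in for the dictionary the parsing loop is meant to produce.
masterDictionary = {'stock': [DataPoint(-0.34, -0.32, 0.0294117647058824)]}

# Key lookup gives the list, the index gives one DataPoint, the attribute gives the value.
first_bin = masterDictionary['stock'][0]
print(first_bin.freq)
```

The space between ['stock'] and [0] is legal but unusual; masterDictionary['stock'][0].freq is the same expression.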

Any and all help is appreciated

UPDATE

Thanks for the help John. The indentation issues are sloppiness on my part. It's my first time posting on Stack and I just didn't get the copy/paste right. The part after the else: is in fact indented to be a part of the for loop and the class is indented with four spaces in my code--just a bad posting here. I'll keep the capitalization convention in mind. Your suggestion indeed worked and now with the commands:

print masterDictionary.keys()
print masterDictionary['stock'][0].low

yields:

['inflation', 'stock']
-0.34

Those are indeed my two variables, and the value matches the XML listed at the top.

UPDATE 2

Well, I thought I had figured this one out, but I was careless again and it turns out that I hadn't quite fixed the issue. The previous solution ended up writing all of the data to my two dictionary keys so that I have two equal lists of all the data assigned to two different dictionary keys. The idea is to have distinct sets of data assigned from the XML to the matching dictionary key. Here is the current code:

# Import XML Parser
import xml.etree.ElementTree as ET

# Parse XML directly from the file path
tree = ET.parse(xml file)

# Create iterable item list
items = tree.findall('item')

# Create class for historic variables
class DataPoint:
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq

# Create Master Dictionary and variable list for historic variables
masterDictionary = {}
thisList = []

# Loop to assign variables as dictionary keys and associate their values with them
for item in items:
    thisKey = item.find('variable').text 
    masterDictionary[thisKey] = thisList
    if thisKey not in masterDictionary:
        masterDictionary[thisKey] = []
    newDataPoint = DataPoint(float(item.find('low').text), float(item.find('high').text), float(item.find('freq').text))
    thisList.append(newDataPoint)

When I input:

print masterDictionary['stock'][5].low
print masterDictionary['inflation'][5].low
print len(masterDictionary['stock'])
print len(masterDictionary['inflation'])

the results are identical for both keys ('stock' and 'inflation'):

-.22
-.22
56
56

There are 27 items with the stock tag in the XML file and 29 tagged with inflation. How can I make each list assigned to a dictionary key only pull the particular data in the loop?
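
The identical results are consistent with both keys being bound to the same list object: thisList is created once, assigned to every key, and every data point is appended to it. The later membership check can never be true, because the key has just been assigned. A small sketch of that aliasing, assuming Python 3 and plain integers standing in for the DataPoint objects:

```python
shared = []   # created once, outside the loop, like thisList in the code above
by_key = {}

for key, value in [('stock', 1), ('inflation', 2), ('stock', 3)]:
    by_key[key] = shared   # every key is bound to the one shared list
    shared.append(value)   # so every value ends up visible under every key

same_object = by_key['stock'] is by_key['inflation']
print(same_object, len(by_key['stock']), len(by_key['inflation']))
```

With 27 stock items and 29 inflation items, one shared list would hold all 56 entries under both keys, which matches the lengths reported.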

UPDATE 3

It seems to work with 2 loops, but I have no idea how and why it won't work in 1 single loop. I managed this accidentally:

# Import XML Parser
import xml.etree.ElementTree as ET

# Parse XML directly from the file path
tree = ET.parse(xml file)

# Create iterable item list
items = tree.findall('item')

# Create class for historic variables
class DataPoint:
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq

# Create Master Dictionary and variable list for historic variables
masterDictionary = {}

# Loop to assign variables as dictionary keys and associate their values with them
for item in items:
    thisKey = item.find('variable').text
    thisList = []
    masterDictionary[thisKey] = thisList

for item in items:
    thisKey = item.find('variable').text
    newDataPoint = DataPoint(float(item.find('low').text), float(item.find('high').text), float(item.find('freq').text))
    masterDictionary[thisKey].append(newDataPoint)

I have tried a large number of permutations to make it happen in one single loop but no luck. I can get all of the data listed into both keys--identical arrays of all the data (not very helpful), or the data sorted correctly into 2 distinct arrays for both keys, but only the last single data entry (the loop overwrites itself each time leaving you with only one entry in the array).
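
For a single loop, the list for a key has to be created only the first time that key is seen, and each data point appended to the list stored under its own key. One possible sketch, assuming Python 3 and using collections.defaultdict, which creates the empty list on first access. The XML is parsed from an inline string here, and the inflation entry is invented sample data, not taken from the real file:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Two stock entries from the question plus one made-up inflation entry.
SAMPLE_XML = """<retirement>
    <item><low>-0.34</low><high>-0.32</high><freq>0.0294117647058824</freq>
          <variable>stock</variable><type>historic</type></item>
    <item><low>-0.32</low><high>-0.29</high><freq>0</freq>
          <variable>stock</variable><type>historic</type></item>
    <item><low>0.01</low><high>0.02</high><freq>0.5</freq>
          <variable>inflation</variable><type>historic</type></item>
</retirement>"""

class DataPoint:
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq

root = ET.fromstring(SAMPLE_XML)
master = defaultdict(list)   # a missing key gets its own new empty list

for item in root.findall('item'):
    key = item.findtext('variable')
    point = DataPoint(float(item.findtext('low')),
                      float(item.findtext('high')),
                      float(item.findtext('freq')))
    master[key].append(point)   # append to the list owned by this key only

print(sorted(master), len(master['stock']), len(master['inflation']))
```

The two-loop version works for the same reason: its first loop gives each key its own list, and its second loop appends without ever rebinding the key.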


一束光，穿透我孤独的魂 2024-12-02 08:59:36

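You have severe indentation problems after the (unnecessary) else: pass. Fix that and try again. Does the problem occur with your sample input data? With other data? On the first pass through the loop? What is the value of thisKey that causes the problem [hint: it's reported in the KeyError error message]? What are the contents of masterDictionary just before the error happens [hint: sprinkle a few print statements around your code]?

Other remarks not relevant to your problem:

Instead of if thisKey in masterDictionary == False: consider using if thisKey not in masterDictionary: ... comparisons with True or False are almost always redundant and/or a bit of a "code smell".

Python convention is to reserve names with an initial capital letter (like Item) for classes.

Using only one space per indentation level makes code almost illegible and is severely deprecated. Always use 4 (unless you have a very good reason, but I've never heard of one).

Update: I was wrong: thisKey in masterDictionary == False is worse than I thought. Because in is a relational operator, chained evaluation is used (as with a <= b < c), so you have (thisKey in masterDictionary) and (masterDictionary == False), which always evaluates to False, and so the dictionary is never updated. The fix is as I suggested: use if thisKey not in masterDictionary:

Also, it looks like thisList (initialised but not used) should be thisSublist (used but not initialised).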
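
The chained evaluation described in the update can be checked directly. A small sketch, assuming Python 3, an empty dictionary, and illustrative variable names:

```python
d = {}
key = 'stock'

chained = key in d == False              # the condition as written in the question
expanded = (key in d) and (d == False)   # how Python evaluates the chained form
grouped = (key in d) == False            # explicit parentheses change the meaning
idiomatic = key not in d                 # the suggested spelling

print(chained, expanded, grouped, idiomatic)
```

The first two are False even though the key is missing, so the branch that creates the list never runs and the later lookup raises KeyError.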


十二 2024-12-02 08:59:36

Change:

if thisKey in masterDictionary == False:

to:

if thisKey not in masterDictionary:

That seems to be why you were getting that error.
Also, you need to assign something to 'thisSublist' before you try and append to it. Try:

thisSublist = []
thisSublist.append(newDataPoint)
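
One caveat with a fresh thisSublist = []: a newly created list is not the one stored in the dictionary, so points appended to it would not appear under the key. A minimal sketch, assuming Python 3 and made-up tuples standing in for the parsed DataPoint objects, that appends to the list held in the dictionary instead:

```python
masterDictionary = {}

# Made-up stand-in for (variable, data point) pairs read from the XML.
rows = [('stock', (-0.34, -0.32)), ('stock', (-0.32, -0.29))]

for thisKey, newDataPoint in rows:
    if thisKey not in masterDictionary:
        masterDictionary[thisKey] = []              # first time this key is seen
    masterDictionary[thisKey].append(newDataPoint)  # append to the stored list

print(len(masterDictionary['stock']))
```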


神也荒唐 2024-12-02 08:59:36

You have an error in your if-statement inside the for-loop. Instead of

if thisKey in masterDictionary == False:

write

if (thisKey in masterDictionary) == False:

Given the rest of your original code, you will be able to access data like so:

masterDictionary['stock'][0].freq

John Machin makes some valid points regarding style and smell (and you should think about his suggested changes), but those things will come with time and experience.
