将 Solr XML 解析为 Python 字典

发布于 2024-10-27 04:14:30 字数 926 浏览 8 评论 0原文

我是 python 新手，正在尝试将 xml 文档（填充了 solr 实例的文档）传递到 python 字典中。我在尝试实际完成此任务时遇到困难。我尝试过使用 ElementTree 和 minidom，但似乎无法获得正确的结果。

这是我的 XML 结构：

<add>
    <doc>
        <field name="genLatitude">45.639968</field>
        <field name="carOfficeHoursEnd">2000-01-01T09:00:00.000Z</field>
        <field name="genLongitude">5.879745</field>
    </doc>
    <doc>
        <field name="genLatitude">46.639968</field>
        <field name="carOfficeHoursEnd">2000-01-01T09:00:00.000Z</field>
        <field name="genLongitude">6.879745</field>
    </doc>
</add>

我需要将其转换为一本字典，如下所示：

doc {
    "genLatitude": '45.639968',
    "carOfficeHoursEnd": '2000-01-01T09:00:00.000Z',
    "genLongitude": '5.879745',
    }

我不太熟悉字典的工作原理，但还有一种方法可以将所有“文档”放入一个字典中。

干杯。

原文

I am new to python and am trying to pass an xml document (filled with documents for a solr instance) into a python dictionary. I am having trouble trying to actually accomplish this. I have tried using ElementTree and minidom but I can't seem to get the right results.

Here is my XML Structure:

<add>
    <doc>
        <field name="genLatitude">45.639968</field>
        <field name="carOfficeHoursEnd">2000-01-01T09:00:00.000Z</field>
        <field name="genLongitude">5.879745</field>
    </doc>
    <doc>
        <field name="genLatitude">46.639968</field>
        <field name="carOfficeHoursEnd">2000-01-01T09:00:00.000Z</field>
        <field name="genLongitude">6.879745</field>
    </doc>
</add>

And From this I need to turn it into a dictionary that looks like:

doc {
    "genLatitude": '45.639968',
    "carOfficeHoursEnd": '2000-01-01T09:00:00.000Z',
    "genLongitude": '5.879745',
    }

I am not too familiar with how dictionaries work but is there also a way to get all the "docs" into one dictionary.

cheers.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

浮光之海 2024-11-03 04:14:30

import xml.etree.cElementTree as etree
from pprint import pprint

root = etree.fromstring(xmlstr) # or etree.parse(filename_or_file).getroot()

docs = [{f.attrib['name']: f.text for f in doc.iterfind('field[@name]')}
        for doc in root.iterfind('doc')]
pprint(docs)

输出

[{'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z',
  'genLatitude': '45.639968',
  'genLongitude': '5.879745'},
 {'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z',
  'genLatitude': '46.639968',
  'genLongitude': '6.879745'}]

其中 xmlstr 为：

xmlstr = """
<add>
    <doc>
        <field name="genLatitude">45.639968</field>
        <field name="carOfficeHoursEnd">2000-01-01T09:00:00.000Z</field>
        <field name="genLongitude">5.879745</field>
    </doc>
    <doc>
        <field name="genLatitude">46.639968</field>
        <field name="carOfficeHoursEnd">2000-01-01T09:00:00.000Z</field>
        <field name="genLongitude">6.879745</field>
    </doc>
</add>
"""

import xml.etree.cElementTree as etree
from pprint import pprint

root = etree.fromstring(xmlstr) # or etree.parse(filename_or_file).getroot()

docs = [{f.attrib['name']: f.text for f in doc.iterfind('field[@name]')}
        for doc in root.iterfind('doc')]
pprint(docs)

Output

[{'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z',
  'genLatitude': '45.639968',
  'genLongitude': '5.879745'},
 {'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z',
  'genLatitude': '46.639968',
  'genLongitude': '6.879745'}]

Where xmlstr is:

xmlstr = """
<add>
    <doc>
        <field name="genLatitude">45.639968</field>
        <field name="carOfficeHoursEnd">2000-01-01T09:00:00.000Z</field>
        <field name="genLongitude">5.879745</field>
    </doc>
    <doc>
        <field name="genLatitude">46.639968</field>
        <field name="carOfficeHoursEnd">2000-01-01T09:00:00.000Z</field>
        <field name="genLongitude">6.879745</field>
    </doc>
</add>
"""

回复收藏 0 原文

甜柠檬 2024-11-03 04:14:30

如果在请求参数中添加 wt=python ，Solr 可以返回 Python 字典。要将此文本响应转换为 Python 对象，请使用 ast.literal_eval （text_response）。

这比解析 XML 简单得多。

回复收藏 0 原文

清风夜微凉 2024-11-03 04:14:30

使用 ElementTree 的可能解决方案，为了示例，输出格式非常漂亮：

>>> import xml.etree.ElementTree as etree
>>> root = etree.parse(document).getroot()
>>> docs = []
>>> for doc in root.findall('doc'):
...   fields = {}
...   for field in doc:
...     fields[field.attrib['name']] = field.text
...   docs.append(fields)
... 
>>> print docs
[{'genLongitude': '5.879745',
  'genLatitude': '45.639968',
  'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z'},
 {'genLongitude': '6.879745',
  'genLatitude': '46.639968',
  'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z'}]

您显示的 XML 文档没有提供区分每个 doc 的方法，因此我认为列表是最好的收集每个字典的结构。

事实上，如果您想将每个 doc 数据插入到另一个字典中，当然可以，但您需要为该字典选择合适的键。例如，使用Python为每个对象提供的id，您可以这样写：

>>> docs = {}
>>> for doc in root.findall('doc'):
...   fields = {}
...   for field in doc:
...     fields[field.attrib['name']] = field.text
...   docs[id(fields)] = fields
... 
>>> print docs
{3076930796L: {'genLongitude': '6.879745',
               'genLatitude': '46.639968',
               'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z'},
 3076905540L: {'genLongitude': '5.879745',
               'genLatitude': '45.639968',
               'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z'}}

这个例子只是为了让您了解如何使用外部字典。如果您决定走这条路，我建议您找到一个有意义且可用的键，而不是 id 返回的对象的内存地址，该地址在不同的运行中可能会发生变化。

A possible solution using ElementTree, with output pretty formatted for sake of example:

>>> import xml.etree.ElementTree as etree
>>> root = etree.parse(document).getroot()
>>> docs = []
>>> for doc in root.findall('doc'):
...   fields = {}
...   for field in doc:
...     fields[field.attrib['name']] = field.text
...   docs.append(fields)
... 
>>> print docs
[{'genLongitude': '5.879745',
  'genLatitude': '45.639968',
  'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z'},
 {'genLongitude': '6.879745',
  'genLatitude': '46.639968',
  'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z'}]

The XML document you show does not provide a way to distinguish each doc from the other, so I would maintain that a list is the best structure to collect each dictionary.

Indeed, if you want to insert each doc data into another dictionary, of course you can, but you need to choose a suitable key for that dictionary. For example, using the id Python provides for each object, you could write:

>>> docs = {}
>>> for doc in root.findall('doc'):
...   fields = {}
...   for field in doc:
...     fields[field.attrib['name']] = field.text
...   docs[id(fields)] = fields
... 
>>> print docs
{3076930796L: {'genLongitude': '6.879745',
               'genLatitude': '46.639968',
               'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z'},
 3076905540L: {'genLongitude': '5.879745',
               'genLatitude': '45.639968',
               'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z'}}

This example is designed just to let you see how to use the outer dictionary. If you decide to go down this path, I would suggest you to find a meaningful and usable key instead of the obejct's memory address returned by id, which can change from run to run.

回复收藏 0 原文

梓梦 2024-11-03 04:14:30

将来自外部的任何字符串直接评估到 python 中是有风险的。谁知道里面有什么。

我建议使用 json 接口。像这样的东西：

import json
import urllib2

response_dict = json.loads(urllib2.urlopen('http://localhost:8080/solr/combined/select?wt=json&q=*&rows=1').read())

#to view the dict
print json.dumps(answer, indent=1)

It's risky to eval any string that comes from the outside directly into python. Who knows what's in there.

I'd suggest using the json interface. Something like:

import json
import urllib2

response_dict = json.loads(urllib2.urlopen('http://localhost:8080/solr/combined/select?wt=json&q=*&rows=1').read())

#to view the dict
print json.dumps(answer, indent=1)

回复收藏 0 原文

~没有更多了~