Parsing Solr XML into a Python dictionary

Posted 2024-10-27 04:14:30

I am new to Python and am trying to parse an XML document (filled with documents for a Solr instance) into a Python dictionary. I am having trouble actually accomplishing this. I have tried using ElementTree and minidom, but I can't seem to get the right results.

Here is my XML structure:

<add>
    <doc>
        <field name="genLatitude">45.639968</field>
        <field name="carOfficeHoursEnd">2000-01-01T09:00:00.000Z</field>
        <field name="genLongitude">5.879745</field>
    </doc>
    <doc>
        <field name="genLatitude">46.639968</field>
        <field name="carOfficeHoursEnd">2000-01-01T09:00:00.000Z</field>
        <field name="genLongitude">6.879745</field>
    </doc>
</add>

From this, I need to build a dictionary that looks like:

doc = {
    "genLatitude": '45.639968',
    "carOfficeHoursEnd": '2000-01-01T09:00:00.000Z',
    "genLongitude": '5.879745',
}

I am not too familiar with how dictionaries work, but is there also a way to get all the "docs" into one dictionary?

Cheers.


浮光之海 2024-11-03 04:14:30
import xml.etree.ElementTree as etree  # cElementTree was removed in Python 3.9
from pprint import pprint

root = etree.fromstring(xmlstr) # or etree.parse(filename_or_file).getroot()

docs = [{f.attrib['name']: f.text for f in doc.iterfind('field[@name]')}
        for doc in root.iterfind('doc')]
pprint(docs)

Output

[{'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z',
  'genLatitude': '45.639968',
  'genLongitude': '5.879745'},
 {'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z',
  'genLatitude': '46.639968',
  'genLongitude': '6.879745'}]

Where xmlstr is:

xmlstr = """
<add>
    <doc>
        <field name="genLatitude">45.639968</field>
        <field name="carOfficeHoursEnd">2000-01-01T09:00:00.000Z</field>
        <field name="genLongitude">5.879745</field>
    </doc>
    <doc>
        <field name="genLatitude">46.639968</field>
        <field name="carOfficeHoursEnd">2000-01-01T09:00:00.000Z</field>
        <field name="genLongitude">6.879745</field>
    </doc>
</add>
"""
甜柠檬 2024-11-03 04:14:30

Solr can return a Python dictionary if you add wt=python to the request parameters. To convert this text response into a Python object, use ast.literal_eval(text_response).

This is much simpler than parsing the XML.
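A minimal sketch of this approach. The endpoint URL and core name are hypothetical placeholders, and `sample_text` is a hand-written illustrative stand-in for a `wt=python` response body, not real server output:

```python
import ast
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical Solr select endpoint; adjust host, port and core name
SOLR_SELECT_URL = 'http://localhost:8983/solr/mycore/select'


def fetch_python_response(query, rows=10):
    """Request wt=python from Solr and turn the text into Python objects."""
    params = urlencode({'q': query, 'rows': rows, 'wt': 'python'})
    with urlopen(SOLR_SELECT_URL + '?' + params) as response:
        # Assumes the response body is UTF-8 encoded
        text = response.read().decode('utf-8')
    # literal_eval only accepts Python literals, unlike eval()
    return ast.literal_eval(text)


# Illustrative stand-in for a wt=python response body
sample_text = (
    "{'responseHeader':{'status':0,'QTime':1},"
    "'response':{'numFound':1,'start':0,'docs':["
    "{'genLatitude':'45.639968','genLongitude':'5.879745'}]}}"
)
result = ast.literal_eval(sample_text)
docs = result['response']['docs']
print(docs)
```

`ast.literal_eval` raises `ValueError` or `SyntaxError` on anything that is not a plain Python literal, which is what makes it safer than `eval` for this purpose.
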

清风夜微凉 2024-11-03 04:14:30

A possible solution using ElementTree, with the output pretty-formatted for the sake of example:

>>> import xml.etree.ElementTree as etree
>>> root = etree.parse(document).getroot()  # document: a filename or file object
>>> docs = []
>>> for doc in root.findall('doc'):
...   fields = {}
...   for field in doc:
...     fields[field.attrib['name']] = field.text
...   docs.append(fields)
... 
>>> print(docs)
[{'genLatitude': '45.639968',
  'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z',
  'genLongitude': '5.879745'},
 {'genLatitude': '46.639968',
  'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z',
  'genLongitude': '6.879745'}]

The XML document you show does not provide a way to distinguish one doc from another, so I would argue that a list is the best structure for collecting the dictionaries.

That said, if you want to insert each doc's data into another dictionary, you certainly can, but you need to choose a suitable key for that dictionary. For example, using the id Python provides for each object, you could write:

>>> docs = {}
>>> for doc in root.findall('doc'):
...   fields = {}
...   for field in doc:
...     fields[field.attrib['name']] = field.text
...   docs[id(fields)] = fields
... 
>>> print(docs)
{3076905540: {'genLatitude': '45.639968',
              'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z',
              'genLongitude': '5.879745'},
 3076930796: {'genLatitude': '46.639968',
              'carOfficeHoursEnd': '2000-01-01T09:00:00.000Z',
              'genLongitude': '6.879745'}}

This example is only meant to show how to use the outer dictionary. If you decide to go down this path, I would suggest finding a meaningful and usable key instead of the object's memory address returned by id, which can change from run to run.
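As a sketch of that suggestion, the loop below keys the outer dictionary by the `(genLatitude, genLongitude)` pair. This key is purely an assumption for illustration: it only works if no two docs in your data share the same coordinates, so the loop raises an error on a collision. A real Solr schema usually defines a unique key field (often `id`), which would be a better choice if your docs carry it.

```python
import xml.etree.ElementTree as etree

xmlstr = """<add>
    <doc>
        <field name="genLatitude">45.639968</field>
        <field name="carOfficeHoursEnd">2000-01-01T09:00:00.000Z</field>
        <field name="genLongitude">5.879745</field>
    </doc>
    <doc>
        <field name="genLatitude">46.639968</field>
        <field name="carOfficeHoursEnd">2000-01-01T09:00:00.000Z</field>
        <field name="genLongitude">6.879745</field>
    </doc>
</add>"""

root = etree.fromstring(xmlstr)
docs = {}
for doc in root.findall('doc'):
    fields = {field.attrib['name']: field.text for field in doc}
    # Assumed key: the coordinate pair, stable across runs unlike id()
    key = (fields['genLatitude'], fields['genLongitude'])
    if key in docs:
        raise ValueError('duplicate key: %r' % (key,))
    docs[key] = fields

print(docs[('46.639968', '6.879745')]['carOfficeHoursEnd'])
```
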

梓梦 2024-11-03 04:14:30

It's risky to eval any string that comes from outside directly in Python; who knows what's in there.

I'd suggest using the json interface. Something like:

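import json
from urllib.request import urlopen

# Ask Solr for a JSON response (wt=json) and parse it into a Python dict
with urlopen('http://localhost:8080/solr/combined/select?wt=json&q=*&rows=1') as response:
    response_dict = json.load(response)

# to view the dict
print(json.dumps(response_dict, indent=1))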
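In Solr's JSON response, the matching documents sit in a list under `response_dict['response']['docs']`, and each element is already a plain dict, which is the structure the question asks for. A minimal sketch, using a hand-written illustrative response body rather than real server output:

```python
import json

# Illustrative stand-in for the body of a wt=json select response
body = """
{
  "responseHeader": {"status": 0, "QTime": 1},
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {"genLatitude": "45.639968", "genLongitude": "5.879745"},
      {"genLatitude": "46.639968", "genLongitude": "6.879745"}
    ]
  }
}
"""

response_dict = json.loads(body)
# Each element of this list is a plain dict, one per Solr document
docs = response_dict['response']['docs']
for doc in docs:
    print(doc['genLatitude'], doc['genLongitude'])
```
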