Converting a text file to a dictionary with multiple keys

Posted 2025-01-18 17:24:32 · 999 characters · 0 views · 0 comments

I'm trying to read in a text file formatted like the following:

student
    first name: John
    last name: Doe
    grade: 9
    gpa: 4.0
school
    name: Richard High School
    city: Kansas City

####

student
    first name: Jane
    last name: Doe
    grade: 10
    gpa: 3.0
school
    name: Richard High School
    city: Kansas City

into a Python dictionary. Trying to have the end result look like:

{0: {'student': {'first name': 'John',
                 'last name': 'Doe',
                 'grade': '9',
                 'gpa': '4.0'},
     'school': {'name': 'Richard High School',
                'city': 'Kansas City'}},
 1: {'student': {'first name': 'Jane',
                 'last name': 'Doe',
                 'grade': '10',
                 'gpa': '3.0'},
     'school': {'name': 'Richard High School',
                'city': 'Kansas City'}}
}

So far, I know how to handle the inner keys with:

with open('<filename>') as f:
    dict = {}
    for line in f:
        x, y = line.split(": ")
        dict[x] = y
    print(dict)

But beyond that I'm stuck.

Comments (4)

月下伊人醉 2025-01-25 17:24:32

That's a possible solution:

import re

dictionaryMain = {}

# Read the whole file, then split it into one chunk per record.
with open("a.txt") as file:
    text = file.read()
elements = text.split("####")

i = 0
for element in elements:
    # Search the current chunk (element), not the whole text.
    firstName = re.search(r'first name: (.+)', element).group(1)
    lastName = re.search(r'last name: (.+)', element).group(1)
    grade = re.search(r'grade: (.+)', element).group(1)
    gpa = re.search(r'gpa: (.+)', element).group(1)
    # Anchor to the start of a line so 'name:' does not match 'first name:'.
    name = re.search(r'^[ \t]*name: (.+)', element, re.MULTILINE).group(1)
    city = re.search(r'city: (.+)', element).group(1)
    # Create fresh dictionaries for every record so records do not share state.
    dictionaryStudent = {}
    dictionarySchool = {}
    dictionaryElement = {}
    dictionaryStudent['first name'] = firstName
    dictionaryStudent['last name'] = lastName
    dictionaryStudent['grade'] = grade
    dictionaryStudent['gpa'] = gpa
    dictionarySchool['name'] = name
    dictionarySchool['city'] = city
    dictionaryElement['student'] = dictionaryStudent
    dictionaryElement['school'] = dictionarySchool
    i = i+1
    dictionaryMain[i] = dictionaryElement

print(dictionaryMain)

Input file:

student
    first name: John
    last name: Doe
    grade: 9
    gpa: 4.0
school
    name: Richard High School
    city: Kansas City

####

student
    first name: Jane
    last name: Doe
    grade: 10
    gpa: 3.0
school
    name: Richard High School
    city: Kansas City

####

student
    first name: Jane
    last name: Doe
    grade: 10
    gpa: 3.0
school
    name: Richard High School
    city: Kansas City

Output:

{
  1: {
    'student': {
      'first name': 'John',
      'last name': 'Doe',
      'grade': '9',
      'gpa': '4.0'
    },
    'school': {
      'name': 'Richard High School',
      'city': 'Kansas City'
    }
  },
  2: {
    'student': {
      'first name': 'Jane',
      'last name': 'Doe',
      'grade': '10',
      'gpa': '3.0'
    },
    'school': {
      'name': 'Richard High School',
      'city': 'Kansas City'
    }
  },
  3: {
    'student': {
      'first name': 'Jane',
      'last name': 'Doe',
      'grade': '10',
      'gpa': '3.0'
    },
    'school': {
      'name': 'Richard High School',
      'city': 'Kansas City'
    }
  }
}

I don't know exactly what your use case is, but with such a strict format you should really consider using dataclasses.
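As a rough illustration of the dataclass suggestion, here is a minimal sketch. The class names `Student`, `School` and `Record`, the helpers `parse_fields` and `parse_records`, and the choice to convert `grade` to `int` and `gpa` to `float` are all assumptions made for this example and are not part of the answer above. It also assumes every record contains all six fields.

```python
from dataclasses import dataclass, asdict


@dataclass
class Student:
    first_name: str
    last_name: str
    grade: int
    gpa: float


@dataclass
class School:
    name: str
    city: str


@dataclass
class Record:
    student: Student
    school: School


def parse_fields(block):
    """Collect every 'key: value' line of one record into a flat dict."""
    fields = {}
    for line in block.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # header lines such as 'student' contain no colon
            fields[key.strip()] = value.strip()
    return fields


def parse_records(text):
    """Split the text on '####' and build one Record per non-empty block."""
    records = []
    for block in text.split("####"):
        f = parse_fields(block)
        if not f:
            continue
        records.append(Record(
            student=Student(f["first name"], f["last name"],
                            int(f["grade"]), float(f["gpa"])),
            school=School(f["name"], f["city"]),
        ))
    return records


sample = """student
    first name: John
    last name: Doe
    grade: 9
    gpa: 4.0
school
    name: Richard High School
    city: Kansas City

####

student
    first name: Jane
    last name: Doe
    grade: 10
    gpa: 3.0
school
    name: Richard High School
    city: Kansas City
"""

records = parse_records(sample)
print(asdict(records[0]))
```

`asdict` turns a record back into nested dictionaries if a plain-dict form is still needed, although the keys then follow the field names (`first_name`) rather than the labels in the file.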

少跟Wǒ拽 2025-01-25 17:24:32

If your data are patterned exactly as you have written, and you don't mind having flat dictionaries, one per student:

import re

# One named group per field; the literal layout must match the file exactly.
pattern = re.compile(r"""
student
    first name: (?P<first_name>.*)
    last name: (?P<last_name>.*)
    grade: (?P<grade>\d*)
    gpa: (?P<gpa>\d+\.?\d*)
school
    name: (?P<school>.*)
    city: (?P<city>.*)""".strip())


with open("<filename>", "r") as f:
    data = f.read()


# finditer yields one match per student record in the file.
students = [match.groupdict() for match in pattern.finditer(data)]

Output:

[{'first_name': 'John',
  'last_name': 'Doe',
  'grade': '9',
  'gpa': '4.0',
  'school': 'Richard High School',
  'city': 'Kansas City'},
 {'first_name': 'Jane',
  'last_name': 'Doe',
  'grade': '10',
  'gpa': '3.0',
  'school': 'Richard High School',
  'city': 'Kansas City'}]

I don't see the benefit of your desired data structure, hence my suggestion for something more conducive to tabular data analysis.
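If the nested, index-keyed layout from the question is still needed, the flat dictionaries can be reshaped afterwards. The following is only a sketch: the `students` list is typed out by hand to mirror the output shown above, and the mapping tables `STUDENT_KEYS` and `SCHOOL_KEYS` and the helper `nest` are names invented for this example.

```python
# Flat per-student dictionaries, in the shape produced by groupdict().
students = [
    {"first_name": "John", "last_name": "Doe", "grade": "9", "gpa": "4.0",
     "school": "Richard High School", "city": "Kansas City"},
    {"first_name": "Jane", "last_name": "Doe", "grade": "10", "gpa": "3.0",
     "school": "Richard High School", "city": "Kansas City"},
]

# Map each flat key to the key wanted inside the nested section.
STUDENT_KEYS = {"first_name": "first name", "last_name": "last name",
                "grade": "grade", "gpa": "gpa"}
SCHOOL_KEYS = {"school": "name", "city": "city"}


def nest(flat):
    """Regroup one flat dictionary into 'student' and 'school' sections."""
    return {
        "student": {new: flat[old] for old, new in STUDENT_KEYS.items()},
        "school": {new: flat[old] for old, new in SCHOOL_KEYS.items()},
    }


# Key each nested record by its position, starting at 0 as in the question.
nested = {index: nest(flat) for index, flat in enumerate(students)}
print(nested)
```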

EDIT: now that we're talking about Pandas,

In [4]: df = pd.DataFrame(students)

In [5]: df
Out[5]:
  first_name last_name grade  gpa               school         city
0       John       Doe     9  4.0  Richard High School  Kansas City
1       Jane       Doe    10  3.0  Richard High School  Kansas City

Getting the count of students in each grade:

In [6]: df.groupby("grade").size()
Out[6]:
grade
10    1
9     1
dtype: int64

You can also group by any number of columns, for instance by grade and school:

In [7]: df.groupby(["grade", "school"]).size()
Out[7]:
grade  school
10     Richard High School    1
9      Richard High School    1
dtype: int64
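In the grouped output above, grade 10 is listed before grade 9, which is what one would expect when the captured values are still strings and sort lexicographically. A small standard-library sketch of converting the fields to numbers before counting follows. The `students` list is again typed out by hand, and `typed` and `per_grade` are names chosen for this example.

```python
from collections import Counter

# Flat per-student dictionaries with string values, as captured by the regex.
students = [
    {"first_name": "John", "last_name": "Doe", "grade": "9", "gpa": "4.0",
     "school": "Richard High School", "city": "Kansas City"},
    {"first_name": "Jane", "last_name": "Doe", "grade": "10", "gpa": "3.0",
     "school": "Richard High School", "city": "Kansas City"},
]

# Copy each record, replacing the numeric fields with real numbers.
typed = [
    {**s, "grade": int(s["grade"]), "gpa": float(s["gpa"])}
    for s in students
]

# Count students per grade; integer keys now sort numerically.
per_grade = Counter(s["grade"] for s in typed)
for grade in sorted(per_grade):
    print(grade, per_grade[grade])
```

With pandas, the same idea would be to convert the columns to numeric types before grouping.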
2025-01-25 17:24:32
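import re

temp = 0          # index of the record currently being filled
data = {temp: {}}
main_key = None   # most recent section header ('student' or 'school')
with open('txt.txt') as f:
    for line in f:
        # Skip blank lines.
        if len(line.strip()) == 0:
            continue
        if re.match(r"^[^:]*:.*$", line):
            # 'key: value' line: store it under the current section.
            key, value = line.split(':', 1)
            data[temp][main_key][key.strip()] = value.strip()
        elif re.match(r"^[^#]*$", line):
            # Section header: any remaining line that contains no '#'.
            main_key = line.strip()
            # A repeated header means a new record has started.
            if main_key in data[temp]:
                temp += 1
                data[temp] = {}
            data[temp][main_key] = {}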

If I understood your goal correctly, this is the answer. Be aware that it relies on regular expressions; you can learn more about them at regex101.com.

The first if skips lines that are empty or contain only whitespace (blank lines).
The second checks whether the line looks like "key: value". If it does, I add the pair to the most recent dictionary in the main dict; if not, the line is a main key and I add it to the main dict.
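The same two checks can also be written without regular expressions. The sketch below is a variation on the approach, not the code from this answer: it treats any line starting with `#` as an explicit record separator instead of detecting a repeated header, and the function name `parse` and its error handling are assumptions made for the example.

```python
def parse(lines):
    """Build {index: {section: {key: value}}} from an iterable of lines."""
    data = {}
    index = 0
    section = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank line
        if line.startswith("#"):
            # Separator: move to the next index if the current record has data.
            if index in data:
                index += 1
            section = None
            continue
        key, sep, value = line.partition(":")
        if sep:
            # 'key: value' line belongs to the most recent section header.
            if section is None:
                raise ValueError(f"key/value line before any header: {line!r}")
            data[index][section][key.strip()] = value.strip()
        else:
            # A line without a colon is a section header such as 'student'.
            section = line
            data.setdefault(index, {})[section] = {}
    return data


sample = """student
    first name: John
    last name: Doe
    grade: 9
    gpa: 4.0
school
    name: Richard High School
    city: Kansas City

####

student
    first name: Jane
    last name: Doe
    grade: 10
    gpa: 3.0
school
    name: Richard High School
    city: Kansas City
"""

result = parse(sample.splitlines())
print(result)
```

Because `parse` accepts any iterable of lines, an open file object can be passed in directly instead of `sample.splitlines()`.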

枕头说它不想醒 2025-01-25 17:24:32

You could do it like this but bear in mind that this method is very specific to the input and output as defined in the original question:

# Requires Python 3.10+ for the match statement.
d = dict()
k = 0
td = None  # dictionary currently receiving 'key: value' pairs
with open('foo.txt') as infile:
    for line in map(str.strip, infile):
        if len(line) > 0:
            match line:
                case 'student':
                    # Start a new record keyed by the running index.
                    td = dict()
                    d[k] = {line: td}
                    k += 1
                case 'school':
                    # Attach 'school' beside 'student' in the current record.
                    td = dict()
                    d[k - 1][line] = td
                case _:
                    # Separator lines such as '####' have no ':' and are skipped.
                    k_, *v = line.split(':')
                    if v:
                        td[k_] = v[0].strip()

print(d)

Output:

{0: {'student': {'first name': 'John', 'last name': 'Doe', 'grade': '9', 'gpa': '4.0'}, 'school': {'name': 'Richard High School', 'city': 'Kansas City'}}, 1: {'student': {'first name': 'Jane', 'last name': 'Doe', 'grade': '10', 'gpa': '3.0'}, 'school': {'name': 'Richard High School', 'city': 'Kansas City'}}}