Creating a complex data structure by parsing an output file

Posted on 2025-01-07 15:29:42

I'm looking for some advice on how to create a data structure by parsing a file.
This is the list I have in my file:

'01bpar( 2)=  0.23103878E-01  half_life=  0.3000133E+02  relax_time=  0.4328278E+02',
'01bpar( 3)=  0.00000000E+00',
'02epar( 1)=  0.49998963E+02',
'02epar( 2)=  0.23103878E-01  half_life=  0.3000133E+02  relax_time=  0.4328278E+02',
'02epar( 3)=  0.00000000E+00',
'02epar( 4)=  0.17862340E-01  half_life=  0.3880495E+02  relax_time=  0.5598371E+02',
'02bpar( 1)=  0.49998962E+02',
'02bpar( 2)=  0.23103878E-01  half_life=  0.3000133E+02  relax_time=  0.4328278E+02',

What I need to do is construct a data structure which should look like this:

http://img11.imageshack.us/img11/7645/datastructure.gif

(couldn't post it because of the new-user restriction)

I've managed to write all the regexp filters that extract what is needed, but I fail to construct the structure.
Any ideas?

薆情海 2025-01-14 15:29:42

It's theoretically possible to have pyparsing create the whole structure using parse actions, but if you just name the various fields as I have below, building up the structure is not too bad. And if you want to convert to using RE's, this example should give you a start on how things might look:

source = """\
'01bpar( 2)=  0.23103878E-01  half_life=  0.3000133E+02  relax_time=  0.4328278E+02', 
'01bpar( 3)=  0.00000000E+00', 
'02epar( 1)=  0.49998963E+02', 
'02epar( 2)=  0.23103878E-01  half_life=  0.3000133E+02  relax_time=  0.4328278E+02', 
'02epar( 3)=  0.00000000E+00', 
'02epar( 4)=  0.17862340E-01  half_life=  0.3880495E+02  relax_time=  0.5598371E+02', 
'02bpar( 1)=  0.49998962E+02', 
'02bpar( 2)=  0.23103878E-01  half_life=  0.3000133E+02  relax_time=  0.4328278E+02', """

from pyparsing import Literal, Regex, Word, alphas, nums, oneOf, OneOrMore, quotedString, removeQuotes

EQ = Literal('=').suppress()
scinotationnum = Regex(r'\d\.\d+E[+-]\d+')
dataname = Word(alphas+'_')
key = Word(nums,exact=2) + oneOf("bpar epar")
index = '(' + Word(nums) + ')'

# a named value such as "half_life=  0.3000133E+02"
keyedValue = dataname + EQ + scinotationnum

# define an item in the source - suppress values with keys, just want the unkeyed ones
item = key('key') + index + EQ + OneOrMore(keyedValue.suppress() | scinotationnum)('data')

# initialize summary structure
from collections import defaultdict
results = defaultdict(lambda : {'epar':[], 'bpar':[]})

# extract quoted strings from list
quotedString.setParseAction(removeQuotes)
for raw in quotedString.searchString(source):
    parts = item.parseString(raw[0])
    num,par = parts.key
    results[num][par].extend(parts.data)

# dump out results, or do whatever
from pprint import pprint
pprint(dict(results))

Prints:

{'01': {'bpar': ['0.23103878E-01', '0.00000000E+00'], 'epar': []},
 '02': {'bpar': ['0.49998962E+02', '0.23103878E-01'],
        'epar': ['0.49998963E+02',
                 '0.23103878E-01',
                 '0.00000000E+00',
                 '0.17862340E-01']}}
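
The values collected above are kept as strings, and the named half_life and relax_time fields are discarded. If those are wanted too, a regex-only version might look like the sketch below. It is a minimal illustration, not part of the original answer: the helper name parse_line and the record layout (one dict per entry, holding the value plus any named fields) are assumptions made here.

```python
import re
from collections import defaultdict

# Sample lines copied from the question (quotes and trailing commas removed).
lines = [
    "01bpar( 2)=  0.23103878E-01  half_life=  0.3000133E+02  relax_time=  0.4328278E+02",
    "01bpar( 3)=  0.00000000E+00",
    "02epar( 1)=  0.49998963E+02",
    "02epar( 4)=  0.17862340E-01  half_life=  0.3880495E+02  relax_time=  0.5598371E+02",
]

NUMBER = r"\d\.\d+E[+-]\d+"
# Leading part of a line, e.g. "01bpar( 2)=  0.23103878E-01"
head = re.compile(r"(?P<num>\d{2})(?P<par>[eb]par)\(\s*\d+\)=\s*(?P<value>" + NUMBER + ")")
# Trailing "name=  value" pairs, e.g. "half_life=  0.3000133E+02"
named = re.compile(r"([A-Za-z_]+)=\s*(" + NUMBER + ")")

def parse_line(line):
    """Return (num, par, value, extras), or None if the line does not match."""
    m = head.match(line)
    if m is None:
        return None
    # Only look for named pairs after the leading value.
    extras = {name: float(val) for name, val in named.findall(line, m.end())}
    return m.group("num"), m.group("par"), float(m.group("value")), extras

results = defaultdict(lambda: {"epar": [], "bpar": []})
for line in lines:
    parsed = parse_line(line)
    if parsed is None:
        continue
    num, par, value, extras = parsed
    # Hypothetical record layout: the value plus whatever named fields followed it.
    results[num][par].append({"value": value, **extras})
```

Converting to float at parse time means later code can do arithmetic directly. If the exact text of each number matters, keep the strings instead.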

紫南 2025-01-14 15:29:42

Consider using a dict of dicts.

#!/usr/bin/env python
import re
import pprint
raw = """'01bpar( 2)=  0.23103878E-01  half_life=  0.3000133E+02  relax_time=  0.4328278E+02',
'01bpar( 3)=  0.00000000E+00',
'02epar( 1)=  0.49998963E+02',
'02epar( 2)=  0.23103878E-01  half_life=  0.3000133E+02  relax_time=  0.4328278E+02',
'02epar( 3)=  0.00000000E+00',
'02epar( 4)=  0.17862340E-01  half_life=  0.3880495E+02  relax_time=  0.5598371E+02',
'02bpar( 1)=  0.49998962E+02',
'02bpar( 2)=  0.23103878E-01  half_life=  0.3000133E+02  relax_time=  0.4328278E+02',"""

datastruct = {}
pattern = re.compile(r"""\D(?P<digits>\d+)(?P<field>[eb]par)[^=]+=\D+(?P<number>\d+\.\d+E[+-]\d+)""")
for line in raw.splitlines():
    result = pattern.search(line)
    parts = result.groupdict()
    if parts['digits'] not in datastruct:
        datastruct[parts['digits']] = {'epar':[], 'bpar':[]}
    datastruct[parts['digits']][parts['field']].append(parts['number'])

pprint.pprint(datastruct, depth=4)

Produces:

{'01': {'bpar': ['0.23103878E-01', '0.00000000E+00'], 'epar': []},
 '02': {'bpar': ['0.49998962E+02', '0.23103878E-01'],
        'epar': ['0.49998963E+02',
                 '0.23103878E-01',
                 '0.00000000E+00',
                 '0.17862340E-01']}}

Revised version in light of comments:

pattern = re.compile(r"""\D(?P<digits>\d+)(?P<field>[eb]par)[^=]+=\D+(?P<number>\d+\.\d+E[+-]\d+)""")

from collections import defaultdict

# each new key starts with empty 'epar' and 'bpar' lists
default = lambda: {'epar': [], 'bpar': []}
datastruct = defaultdict(default)

for line in raw.splitlines():
    result = pattern.search(line)
    parts = result.groupdict()
    datastruct[parts['digits']][parts['field']].append(parts['number'])

pprint.pprint(datastruct.items())

which produces:

[('02',
  {'bpar': ['0.49998962E+02', '0.23103878E-01'],
   'epar': ['0.49998963E+02',
            '0.23103878E-01',
            '0.00000000E+00',
            '0.17862340E-01']}),
 ('01', {'bpar': ['0.23103878E-01', '0.00000000E+00'], 'epar': []})]
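
Both versions above drop the index in parentheses and keep the numbers as strings. If the index matters (an assumption, since the target structure was only shown as an image), one possible variation stores each value under its index and converts it to float. The name indexed and the extra index group in the pattern are illustrative additions, not part of the original answer.

```python
import re
from collections import defaultdict

raw = """'01bpar( 2)=  0.23103878E-01  half_life=  0.3000133E+02  relax_time=  0.4328278E+02',
'01bpar( 3)=  0.00000000E+00',
'02epar( 1)=  0.49998963E+02',
'02epar( 4)=  0.17862340E-01  half_life=  0.3880495E+02  relax_time=  0.5598371E+02',"""

# Same idea as the pattern above, with an extra group for the index in "( 2)".
pattern = re.compile(
    r"(?P<digits>\d+)(?P<field>[eb]par)\(\s*(?P<index>\d+)\)=\s*(?P<number>\d+\.\d+E[+-]\d+)"
)

# digits -> field -> index -> float value
indexed = defaultdict(lambda: defaultdict(dict))

for line in raw.splitlines():
    result = pattern.search(line)
    if result is None:  # skip lines that do not look like a parameter entry
        continue
    parts = result.groupdict()
    indexed[parts['digits']][parts['field']][int(parts['index'])] = float(parts['number'])
```

With this layout, indexed['02']['epar'][4] looks up parameter 4 directly, and gaps in the numbering (such as the missing '01bpar( 1)') stay visible instead of shifting list positions.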

删除会话 2025-01-14 15:29:42

Your top-level structure is positional, so it's a perfect choice for a list. Since lists can hold arbitrary items, a named tuple is a good fit for each entry. Each field in the tuple can hold a list with its elements.

So, your code should look something like this pseudocode:

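from collections import namedtuple

data = []
# one record per top-level entry, holding the 'epar' and 'bpar' value lists
newTuple = namedtuple('stuff', ['epar', 'bpar'])
for line in theFile:
    eparVals = regexToGetThemFromString()  # placeholder: your existing regex for the epar values
    bparVals = regexToGetThemFromString()  # placeholder: your existing regex for the bpar values
    t = newTuple(eparVals, bparVals)
    data.append(t)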

You said you could already loop over the file and had various regexes to get the data, so I didn't bother adding all the details.
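
For completeness, here is one way the pseudocode could be filled in. It is only a sketch under stated assumptions: the two leading digits are treated as a 1-based position in the list, the pattern is a guess at what the existing regexes do, and the names Stuff, lines and pattern are made up for this example.

```python
import re
from collections import namedtuple

# One record per top-level entry; each field holds a list of floats.
Stuff = namedtuple('Stuff', ['epar', 'bpar'])

# Sample lines copied from the question.
lines = [
    "'01bpar( 2)=  0.23103878E-01  half_life=  0.3000133E+02  relax_time=  0.4328278E+02',",
    "'01bpar( 3)=  0.00000000E+00',",
    "'02epar( 1)=  0.49998963E+02',",
    "'02bpar( 1)=  0.49998962E+02',",
]

# Groups: leading digits, field name, first value after the '='.
pattern = re.compile(r"(\d+)([eb]par)\(\s*\d+\)=\s*(\d+\.\d+E[+-]\d+)")

data = []
for line in lines:
    m = pattern.search(line)
    if m is None:
        continue
    position = int(m.group(1))   # assumption: '01' -> first entry, '02' -> second
    while len(data) < position:  # grow the list until that slot exists
        data.append(Stuff(epar=[], bpar=[]))
    # The tuple itself is immutable, but the lists it holds can be appended to.
    getattr(data[position - 1], m.group(2)).append(float(m.group(3)))
```

After the loop, data[0].bpar holds the values from the '01bpar' lines and data[1].epar those from the '02epar' lines.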
