Parsing data from a keyword-based material data file with Python

Posted 2025-01-10 09:33:01

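I have a keyword-based materials data file. I want to parse the data in this file and create variables and matrices to work with in a Python script. The material file may have comment lines at the very top starting with the string "**"; I simply want to ignore these and parse the data on the lines that follow a keyword of the form *keyword_1, along with the keyword's comma-delimited parameters of the form param_1=param1.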


What is the fastest and easiest way to parse data from this kind of keyword-based text file with Python? Can I use pandas for this, and if so, how?

Below is a sample input material file: alloy_1.nam

*************************************************
**               ALLOY_1 MATERIAL DATA
*************************************************
*MATERIAL,NAME=ALLOY_1
*ELASTIC,TYPE=ISO
2.08E5,0.3,291.
2.04E5,0.3,422.
1.96E5,0.3,589.
1.85E5,0.3,755.
1.74E5,0.3,922.
1.61E5,0.3,1089.
1.52E5,0.3,1220.
*EXPANSION,TYPE=ISO,ZERO=293.
13.5E-6,291.
13.6E-6,422.
13.9E-6,589.
14.2E-6,755.
14.7E-6,922.
15.5E-6,1089.
16.4E-6,1200.
*DENSITY
7.92E-9
*CONDUCTIVITY
10.,273.
18.,873.
27.,1373.
*SPECIFIC HEAT
450.e6,273.
580.e6,873.
710.e6,1373.
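For reference, the kinds of line described in the question can be told apart by their prefix alone, and a keyword line splits into a name plus its `param=value` pairs. A minimal sketch, assuming that `**` always marks a comment and a single leading `*` marks a keyword; `classify_line` and `parse_keyword` are hypothetical helper names, not part of any library:

```python
def classify_line(line):
    """Return 'blank', 'comment', 'keyword' or 'data' for one line of the file.

    Assumption for this sketch: '**' starts a comment, a single '*' starts
    a keyword, and every other non-blank line is a data row.
    """
    stripped = line.strip()
    if not stripped:
        return 'blank'
    if stripped.startswith('**'):
        return 'comment'
    if stripped.startswith('*'):
        return 'keyword'
    return 'data'


def parse_keyword(line):
    """Split '*NAME,P1=V1,P2=V2' into ('NAME', {'P1': 'V1', 'P2': 'V2'})."""
    name, *params = line.strip().lstrip('*').split(',')
    options = {}
    for param in params:
        # partition() leaves value == '' when a parameter has no '='
        key, _, value = param.partition('=')
        options[key.strip()] = value.strip()
    return name.strip(), options


print(classify_line('**               ALLOY_1 MATERIAL DATA'))  # comment
print(parse_keyword('*EXPANSION,TYPE=ISO,ZERO=293.'))
# ('EXPANSION', {'TYPE': 'ISO', 'ZERO': '293.'})
```

Parameter values are kept as strings here; converting something like `ZERO=293.` to a number is left to the caller.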

Comments (1)

踏雪无痕 2025-01-17 09:33:01

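The approach is to build a list of dictionaries in which each element maps a category name (the key) to that category's data as a DataFrame. A temporary dictionary collects the comma-separated data, and its contents are appended to the list as a DataFrame each time a new category is found.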

Use pandas.DataFrame() to create each DataFrame.
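The temporary dictionary maps generated column names (`col1`, `col2`, ...) to lists of values, one list per column. A stdlib-only sketch of just that accumulation step, using `collections.defaultdict` in place of an explicit key check; `rows_to_columns` is a hypothetical name, and values stay strings as in the answer:

```python
from collections import defaultdict


def rows_to_columns(lines):
    """Collect comma-separated rows into {'col1': [...], 'col2': [...], ...}."""
    columns = defaultdict(list)  # a missing key starts as an empty list
    for line in lines:
        for i, value in enumerate(line.strip().split(','), start=1):
            columns[f'col{i}'].append(value)
    return dict(columns)


conductivity = ['10.,273.', '18.,873.', '27.,1373.']
print(rows_to_columns(conductivity))
# {'col1': ['10.', '18.', '27.'], 'col2': ['273.', '873.', '1373.']}
```

Passing such a dictionary to `pd.DataFrame(...)` gives one DataFrame column per key, provided every row in the block has the same number of fields.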

Below is the code:

import re

import pandas as pd

with open('/Users/rpghosh/scikit_learn_data/test.txt') as f:
    lines = f.readlines()

# empty list of dataframes
lst_dfs = []

# empty dictionary to store each dataframe temporarily
d = {}
dfName = ''
PrevdfName = ''

for line in lines:
    line = line.strip()

    # keyword line such as "*ELASTIC,TYPE=ISO"; "**" comment lines do not match
    if re.match(r'^\*[A-Za-z]', line):
        variable = line.lstrip('*').split(',')
        PrevdfName = dfName
        dfName = variable[0]

        # a new keyword closes the previous block
        if len(d) > 0:
            df = pd.DataFrame(d)
            # append a dictionary which has category and dataframe
            lst_dfs.append({PrevdfName: df})
            d = {}

    # data line such as "2.08E5,0.3,291." (digits, signs, '.', 'e'/'E', commas)
    elif re.match(r'^[-+0-9.eE,\s]+$', line):
        data = line.split(',')

        for i in range(len(data)):
            # customised column name
            colName = 'col' + str(i + 1)

            # if the colname is already present in the
            # dictionary keys then append the element
            # to existing key's list
            if colName in d:
                d[colName].append(data[i])
            else:
                d[colName] = [data[i]]

    # any other line (comment or blank) is skipped

# the last block is not followed by a keyword, so store it here
df = pd.DataFrame(d)
lst_dfs.append({dfName: df})

To view the output: you will have a list of DataFrames, so the code is:

for idx, df in enumerate(lst_dfs):
    print(f"{idx=}")
    print(df)
    print()

Output:

idx=0
{'elastic':   col1 col2 col3
0   21   22   23
1   11   12   13
2   31   32   33}

idx=1
{'expansion':   col1 col2 col3
0    4    5    6
1   41   15   16
2   42   25   26}

idx=2
{'density':     col1
0  12343}

idx=3
{'conductivity':   col1 col2 col3 col4 col5 col6
0   54   55   56   51   55   56
1   42   55   56   51   55   56
2   54   55   56   51   55   56
3   42   55   56   51   55   56}
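The answer's code keeps every value as a string and keeps only `variable[0]`, the keyword name, so parameters such as TYPE=ISO or ZERO=293. are discarded. If the goal is numeric matrices together with the parameters, an alternative sketch that needs no pandas for the parsing itself follows. It is not the answer author's method; `parse_material_text` is a hypothetical name, and the sketch assumes one material per file with unique keyword names (a repeated keyword would overwrite the earlier entry).

```python
def parse_material_text(text):
    """Parse keyword-based material data into a dict keyed by keyword name.

    Result shape (chosen for this sketch):
        {'EXPANSION': {'params': {'TYPE': 'ISO', 'ZERO': '293.'},
                       'rows': [[1.35e-05, 291.0], ...]}, ...}
    """
    materials = {}
    current = None  # entry of the keyword block currently being filled
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('**'):
            continue  # blank line or comment
        if line.startswith('*'):
            name, *params = line[1:].split(',')
            options = {}
            for param in params:
                key, _, value = param.partition('=')
                options[key.strip()] = value.strip()
            current = {'params': options, 'rows': []}
            materials[name.strip()] = current
        elif current is not None:  # data rows before the first keyword are skipped
            # float() accepts forms like '291.', '450.e6' and '13.5E-6'
            current['rows'].append([float(v) for v in line.split(',') if v.strip()])
    return materials


SAMPLE = """\
**               ALLOY_1 MATERIAL DATA
*MATERIAL,NAME=ALLOY_1
*EXPANSION,TYPE=ISO,ZERO=293.
13.5E-6,291.
13.6E-6,422.
*DENSITY
7.92E-9
"""

data = parse_material_text(SAMPLE)
print(data['EXPANSION']['params'])  # {'TYPE': 'ISO', 'ZERO': '293.'}
print(data['DENSITY']['rows'])      # [[7.92e-09]]
```

Each `rows` list can then be turned into a matrix with `numpy.array(rows)` or into a table with `pandas.DataFrame(rows)`. Because the file is a sequence of small tables of different widths rather than one table, a plain line loop is simpler here than a single `pandas.read_csv` call.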
