Parsing data from a keyword-based material data file with Python

Posted on 2025-01-10 09:33:01

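I have a keyword-based material data file. I want to parse the data in this file and create variables and matrices to work with in a Python script. The file may have comment lines at the very top that start with the string "**"; I simply want to ignore these. I want to parse the data on the lines that follow a keyword of the form *keyword_1, along with the keyword's comma-delimited parameters of the form param_1=param1.

What is the fastest and easiest way to parse data from this kind of keyword-based text file with Python? Can I use pandas for this, and if so, how?

Below is a sample input material file, alloy_1.nam: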

*************************************************
**               ALLOY_1 MATERIAL DATA
*************************************************
*MATERIAL,NAME=ALLOY_1
*ELASTIC,TYPE=ISO
2.08E5,0.3,291.
2.04E5,0.3,422.
1.96E5,0.3,589.
1.85E5,0.3,755.
1.74E5,0.3,922.
1.61E5,0.3,1089.
1.52E5,0.3,1220.
*EXPANSION,TYPE=ISO,ZERO=293.
13.5E-6,291.
13.6E-6,422.
13.9E-6,589.
14.2E-6,755.
14.7E-6,922.
15.5E-6,1089.
16.4E-6,1200.
*DENSITY
7.92E-9
*CONDUCTIVITY
10.,273.
18.,873.
27.,1373.
*SPECIFIC HEAT
450.e6,273.
580.e6,873.
710.e6,1373.

踏雪无痕 2025-01-17 09:33:01

The approach is to build a list of dictionaries, where each dictionary maps a category name to its data in the form of a DataFrame. A temporary dictionary collects the comma-separated data; each time a new category is found, it is converted to a DataFrame and appended to the list.

Use pandas.DataFrame() to create the DataFrame.

Below is the code:

import re
import pandas as pd

with open('/Users/rpghosh/scikit_learn_data/test.txt') as f:
    lines = f.readlines()

# list of {category name: DataFrame} dictionaries
lst_dfs = []

# temporary dictionary holding the columns of the current category
d = {}
dfName = ''
PrevdfName = ''

for line in lines:

    # keyword line: a single '*' followed by letters, digits, ',' or '='
    if re.match(r'^\*{1}([A-Za-z0-9,=]{1,})$', line):
        variable = line.lstrip('*').rstrip().split(',')
        PrevdfName = dfName
        dfName = variable[0]

        # a new category starts: store the data collected so far
        if len(d) > 0:
            df = pd.DataFrame(d)
            # append a dictionary which has category and dataframe
            lst_dfs.append({PrevdfName: df})
            d = {}

    # data line: digits separated by commas
    elif re.match(r'^[0-9]([0-9,]){1,}$', line):
        data = line.rstrip().split(',')

        for i in range(len(data)):
            # customised column name
            colName = 'col' + str(i + 1)

            # if the column name is already a key of the dictionary,
            # append the element to that key's list, else create the list
            if colName in d.keys():
                d[colName].append(data[i])
            else:
                d[colName] = [data[i]]
    else:
        # any other line (for example a comment) discards the pending data
        d = {}

# store the last category
df = pd.DataFrame(d)
lst_dfs.append({dfName: df})

The result is a list of dictionaries holding DataFrames, so the code to view the output is:

for idx, df in enumerate(lst_dfs):
    print(f"{idx=}")
    print(df)
    print()

Output:

idx=0
{'elastic':   col1 col2 col3
0   21   22   23
1   11   12   13
2   31   32   33}

idx=1
{'expansion':   col1 col2 col3
0    4    5    6
1   41   15   16
2   42   25   26}

idx=2
{'density':     col1
0  12343}

idx=3
{'conductivity':   col1 col2 col3 col4 col5 col6
0   54   55   56   51   55   56
1   42   55   56   51   55   56
2   54   55   56   51   55   56
3   42   55   56   51   55   56}
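The regular expressions in the answer accept only digits and commas on data lines, and only letters, digits, commas and "=" on keyword lines. Values such as 2.08E5 and keywords such as *SPECIFIC HEAT or NAME=ALLOY_1 from the question's sample file would therefore not match. As a rough sketch of one way to handle that format with the standard library alone, the function below skips "**" comment lines, splits keyword lines into a name and a parameter dictionary, and converts data lines to floats. The names parse_material and sections, and the dictionary keys "keyword", "params" and "rows", are hypothetical choices for illustration. It assumes every data line contains only comma-separated numbers.

```python
def parse_material(lines):
    """Parse keyword-based material data into a list of sections.

    Each section is a dict holding the keyword name, its parameters,
    and the numeric rows that follow the keyword line.
    """
    sections = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("**"):
            continue  # skip blank lines and "**" comment lines
        if line.startswith("*"):
            # keyword line, e.g. *EXPANSION,TYPE=ISO,ZERO=293.
            name, *args = [part.strip() for part in line[1:].split(",")]
            params = {}
            for arg in args:
                key, _, value = arg.partition("=")
                params[key.strip()] = value.strip()
            sections.append({"keyword": name, "params": params, "rows": []})
        elif sections:
            # data line: comma-separated floats for the most recent keyword
            row = [float(x) for x in line.split(",") if x.strip()]
            sections[-1]["rows"].append(row)
    return sections


# shortened excerpt of the sample file from the question
sample = """\
**  ALLOY_1 MATERIAL DATA
*MATERIAL,NAME=ALLOY_1
*ELASTIC,TYPE=ISO
2.08E5,0.3,291.
2.04E5,0.3,422.
*DENSITY
7.92E-9
"""

sections = parse_material(sample.splitlines())
for section in sections:
    print(section["keyword"], section["params"], section["rows"])
```

If pandas or NumPy is wanted afterwards, each section's rows can be passed to pandas.DataFrame(section["rows"]) or numpy.array(section["rows"]). Parameter values are left as strings here, since some are names (ISO) and some are numbers (293.).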
