Add an extra column to a file in Python based on the values of another column

Posted on 2025-01-16 17:15:17

I have a file, say temp.rule, with m rows and n columns, where each row looks like att1,att2,att3,...,attN,class,fitness. Suppose the file looks like this:

A,B,C,1,0.67
D,E,F,1,0.84
P,Q,R,2,0.77
S,T,U,2,0.51
G,H,I,1,0.45
J,K,L,1,0.82
M,N,O,2,0.28
V,W,X,2,0.41
Y,Z,A,2,0.51

In the first row, A, B, C are the attributes, 1 is the class, and 0.67 is the fitness. I want to sort the rows by fitness within each class and assign a rank to each row, so that the file ends up looking like this:

P,Q,R,2,0.77,5
S,T,U,2,0.51,3.5
Y,Z,A,2,0.51,3.5
V,W,X,2,0.41,2
M,N,O,2,0.28,1
D,E,F,1,0.84,4
J,K,L,1,0.82,3
A,B,C,1,0.67,2
G,H,I,1,0.45,1

Class 2 has 5 rows, so they are sorted by fitness and ranked from 1 to 5. Likewise, class 1 has 4 rows, so they are sorted by fitness and ranked from 1 to 4. I have done the sorting part but cannot work out how to assign the ranks. I have also built a dictionary that counts how many rows belong to class 1, class 2, and so on. The 3.5 appears because, in case of a tie, I want to take the average of the consecutive ranks (here, ranks 3 and 4).

My attempt so far:

from collections import Counter

rule_file_name = 'temp.rule'

# Read each non-empty line as a list of fields; the last field is the fitness.
rule_fit_val = []
with open(rule_file_name) as rule_fp:
    for line in rule_fp:
        line = line.strip()
        if line:
            rule_fit_val.append(line.split(","))

def convert_fitness_to_float(lst):
    return lst[:-1] + [float(lst[-1])]

rule_fit_val = [convert_fitness_to_float(i) for i in rule_fit_val]
# Sort by (class, fitness), both descending.
rule_fit_val = sorted(rule_fit_val, key=lambda x: x[-2:], reverse=True)

item_list = [','.join(map(str, i)) for i in rule_fit_val]
print(*item_list, sep='\n')

with open("check_sorted_fitness.rule", "w") as outfile:
    outfile.write("\n".join(item_list))

# Count how many rows belong to each class.
freq = Counter(i[-2] for i in rule_fit_val)
my_dict_new = dict(freq)

print(my_dict_new)

Please help me work out how to assign ranks like this.

厌味 2025-01-23 17:15:18

Consider using the pandas module; with it you can get something like this:

import pandas as pd

df = pd.read_csv('temp.rule', names=['att1','att2','att3','class','fitness'])
#-----------------^^^^^^^^^ your file ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ column headers
print(df)
'''
  att1 att2 att3  class  fitness
0    A    B    C      1     0.67
1    D    E    F      1     0.84
2    P    Q    R      2     0.77
3    S    T    U      2     0.51
4    G    H    I      1     0.45
5    J    K    L      1     0.82
6    M    N    O      2     0.28
7    V    W    X      2     0.41
8    Y    Z    A      2     0.51
'''
out = (df.assign(rank=df.groupby('class')['fitness'].
                 transform(lambda x: x.rank())).
       sort_values(['class','fitness'], ascending=False))

print(out)
'''
  att1 att2 att3  class  fitness  rank
2    P    Q    R      2     0.77   5.0
3    S    T    U      2     0.51   3.5
8    Y    Z    A      2     0.51   3.5
7    V    W    X      2     0.41   2.0
6    M    N    O      2     0.28   1.0
1    D    E    F      1     0.84   4.0
5    J    K    L      1     0.82   3.0
0    A    B    C      1     0.67   2.0
4    G    H    I      1     0.45   1.0
'''
out.to_csv('out.rule', header=False, index=False)
#-----------^^^^^^^^ new file
# contents of out.rule:
'''
P,Q,R,2,0.77,5.0
S,T,U,2,0.51,3.5
Y,Z,A,2,0.51,3.5
V,W,X,2,0.41,2.0
M,N,O,2,0.28,1.0
D,E,F,1,0.84,4.0
J,K,L,1,0.82,3.0
A,B,C,1,0.67,2.0
G,H,I,1,0.45,1.0
'''

UPD

Now it does not matter how many columns the file has, as long as the last two columns are 'class' and 'fitness' respectively:

import pandas as pd

df = pd.read_csv('temp.rule', header=None)
df = df.rename(columns={df.columns[-1]:'fitness',df.columns[-2]:'class'})
out = (df.assign(rank=df.groupby('class')['fitness'].
                 transform(lambda x: x.rank())).
       sort_values(['class','fitness'],ascending=False))
out.to_csv('out.rule',header=False,index=False)
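
If pandas is not available, the same per-class ranking with averaged ties can probably be done with the standard library alone. The following is only a minimal sketch: the function name rank_within_class and the in-memory sample rows are illustrative assumptions, and each row is assumed to be a list whose last two items are the class label and a float fitness (as produced by the question's convert_fitness_to_float).

```python
from collections import Counter, defaultdict


def rank_within_class(rows):
    """Return rows sorted by class and fitness (both descending), each with
    an average-tie rank appended (rank 1 = lowest fitness within its class).

    Hypothetical helper: assumes row[-2] is the class label and row[-1]
    is a float fitness.
    """
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[-2]].append(row)

    ranked = []
    for label in sorted(by_class, reverse=True):
        group = by_class[label]
        # Map each distinct fitness to the mean of the 1-based positions
        # it occupies when the group is sorted in ascending order.
        rank_of = {}
        position = 1
        for fitness, count in sorted(Counter(r[-1] for r in group).items()):
            rank_of[fitness] = position + (count - 1) / 2
            position += count
        # Stable descending sort keeps tied rows in their original order.
        for row in sorted(group, key=lambda r: r[-1], reverse=True):
            ranked.append(row + [rank_of[row[-1]]])
    return ranked


# Illustrative rows taken from the question's sample file.
sample = [
    ["P", "Q", "R", "2", 0.77],
    ["S", "T", "U", "2", 0.51],
    ["M", "N", "O", "2", 0.28],
    ["Y", "Z", "A", "2", 0.51],
]
for row in rank_within_class(sample):
    print(",".join(map(str, row)))
```

The returned rows can be joined with commas and written out the same way the question's code writes check_sorted_fitness.rule. Note that class labels read from the file are strings, so classes are ordered lexicographically here.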
