Pandas column aggregation for duplicate values using a custom function
I have a DataFrame and I want to aggregate matching IDs in a column, that is, count how often each ID occurs.
X_train['freq_qd1'] = X_train.groupby('qid1')['qid1'].transform('count')
X_train['freq_qd2'] = X_train.groupby('qid2')['qid2'].transform('count')
I understand the code above, but I want to write a custom function that I can apply to multiple columns.
I have attached a snapshot of the DataFrame for reference. On this DataFrame I tried to apply a custom function to qid1 and qid2.
I tried the code below:
def frequency(qid):
    freq = []
    for i in str(qid):
        if i not in freq:
            freq.append(i)
        ids = set()
        if i not in ids:
            ids.add(i)
            freq.append(ids)
    return freq
def extract_simple_feat(fe):
    fe['question1'] = fe['question1'].fillna(' ')
    fe['question2'] = fe['question2'].fillna(' ')
    fe['qid1'] = fe['qid1']
    fe['qid2'] = fe['qid2']
    token_feat = fe.apply(lambda x: get_simple_features(x['question1'],
                                                        x['question2']), axis=1)
    fe['q1_len'] = list(map(lambda x: x[0], token_feat))
    fe['q2_len'] = list(map(lambda x: x[1], token_feat))
    fe['freq_qd1'] = fe.apply(lambda x: frequency(x['qid1']), axis=1)
    fe['freq_qd2'] = fe.apply(lambda x: frequency(x['qid2']), axis=1)
    fe['q1_n_words'] = list(map(lambda x: x[2], token_feat))
    fe['q2_n_words'] = list(map(lambda x: x[3], token_feat))
    fe['word_common'] = list(map(lambda x: x[4], token_feat))
    fe['word_total'] = list(map(lambda x: x[5], token_feat))
    fe['word_share'] = list(map(lambda x: x[6], token_feat))
    return fe

X_train = extract_simple_feat(X_train)
After applying my own implementation I am not getting the desired result. I am attaching a snapshot of the result I got.
The desired result is below:
Can someone help me? I am really stuck and unable to fix it myself.
Here is a small text input:
qid1 qid2
23 24
25 26
27 28
318830 318831
359558 318831
384105 318831
413505 318831
451953 318831
530151 318831
I want the aggregated output to be:
qid1 qid2 freq_qid1 freq_qid2
23 24 1 1
25 26 1 1
27 28 1 1
318830 318831 1 6
359558 318831 1 6
384105 318831 1 6
413505 318831 1 6
451953 318831 1 6
530151 318831 1 6
Given: (I added an extra row for an edge case)
Doing:
Output:
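The code for these steps is not reproduced here. As a minimal sketch, assuming the question's sample rows plus one hypothetical extra row that repeats a qid1 value, the built-in `groupby(...).transform('count')` from the question can be wrapped in a small helper and looped over several columns. The helper name `add_frequency_columns` and the extra row `(23, 99)` are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Sample rows from the question. The last row (23, 99) is an assumed
# extra row that repeats a qid1 value, added only to exercise duplicates.
df = pd.DataFrame({
    "qid1": [23, 25, 27, 318830, 359558, 384105, 413505, 451953, 530151, 23],
    "qid2": [24, 26, 28, 318831, 318831, 318831, 318831, 318831, 318831, 99],
})


def add_frequency_columns(frame, columns):
    """Return a copy of `frame` with a freq_<column> count for each column."""
    out = frame.copy()
    for col in columns:
        # transform('count') broadcasts each group's size back to its rows
        out[f"freq_{col}"] = out.groupby(col)[col].transform("count")
    return out


result = add_frequency_columns(df, ["qid1", "qid2"])
print(result)
```

Because `transform` returns a result aligned with the original index, each row receives the count of its own ID without any merge step.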
If I wanted to do more of what you're doing...
Given:
Doing:
Output:
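For something closer to the custom function in the question, it helps to see why that attempt cannot work: `for i in str(qid)` iterates over the digits of a single ID, and a row-wise `apply` only ever sees one row, so it has no way to count occurrences across the column. A hedged sketch of a custom function that receives the whole column instead is shown below. The use of `collections.Counter` is an assumption made for illustration and may differ from what the original answer did.

```python
from collections import Counter

import pandas as pd


def frequency(values):
    """Return, for every item in `values`, how often it occurs in `values`."""
    counts = Counter(values)            # first pass: count each whole ID
    return [counts[v] for v in values]  # second pass: look up each row's ID


# Sample rows from the question
df = pd.DataFrame({
    "qid1": [23, 25, 27, 318830, 359558, 384105, 413505, 451953, 530151],
    "qid2": [24, 26, 28, 318831, 318831, 318831, 318831, 318831, 318831],
})

# Apply the same custom function to several columns
for col in ["qid1", "qid2"]:
    df[f"freq_{col}"] = frequency(df[col].tolist())

print(df)
```

Passing the full column keeps the function independent of pandas internals, at the cost of being slower than `groupby(...).transform('count')` on large frames.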