Python: converting a Counter into DataFrame columns

Posted on 2025-01-31 02:53:18 · Word count 2042 · Views 3 · Comments 0

I haven't been able to find an answer here specific to my issue and I'm wondering if I could get some help (apologies for the links, I'm not allowed to embed images yet).

I have stored Counter objects within my DataFrame and also want them added to the DataFrame as a column for each counted element.

Beginning data

import re
from collections import Counter
import pandas as pd

data = {
    "words": ["ABC", "BCDB", "CDE", "F"],
    "stuff": ["abc", "bcda", "cde", "f"]
}
df = pd.DataFrame(data)

Preliminary Data Frame

patternData = {
    "name": ["A", "B", "C", "D", "E", "F"],
    "rex": ["A{1}", "B{1}", "C{1}", "D{1}", "E{1}", "F{1}"]
}
patterns = pd.DataFrame(patternData)

Pattern DataFrame

def countFound(ps):
    """Count how often each pattern in `patterns` matches the string `ps`."""
    result = Counter()
    for _, row in patterns.iterrows():
        found = re.findall(row['rex'], ps)
        if found:  # only record patterns that matched at least once
            result[row['name']] += len(found)
    return result

df['found'] = df['words'].apply(countFound)

Found DataFrame

Desired Results

words	stuff	found	A	B	C	D	E	F
ABC	abc	`{'A': 1, 'B': 1, 'C': 1}`	1	1	1	0	0	0
BCDB	bcda	`{'B': 2, 'C': 1, 'D': 1}`	0	2	1	1	0	0
CDE	cde	`{'C': 1, 'D': 1, 'E': 1}`	0	0	1	1	1	0
F	f	`{'F': 1}`	0	0	0	0	0	1

江湖正好 2025-02-07 02:53:18

You can use json_normalize:

# Missing keys become NaN after normalizing, so fill with 0 and cast back to int
df.join(pd.json_normalize(df['found']).fillna(0).astype(int))

Output:

  words stuff                     found  A  B  C  D  E  F
0   ABC   abc  {'A': 1, 'B': 1, 'C': 1}  1  1  1  0  0  0
1  BCDB  bcda  {'B': 2, 'C': 1, 'D': 1}  0  2  1  1  0  0
2   CDE   cde  {'C': 1, 'D': 1, 'E': 1}  0  0  1  1  1  0
3     F     f                  {'F': 1}  0  0  0  0  0  1
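
One detail worth checking with this approach is index alignment: `df.join` matches rows by index label, so the normalized counts need the same index as `df`. The following is a minimal sketch, assuming a hypothetical frame with a non-default index and using `Counter` on each word's characters as a stand-in for `countFound`; it passes a plain list to `json_normalize` and then copies the index over explicitly.

```python
from collections import Counter

import pandas as pd

# Hypothetical frame whose index is not 0..n-1 (e.g. left over from a filter).
df = pd.DataFrame(
    {"words": ["ABC", "BCDB", "CDE", "F"]},
    index=[10, 20, 30, 40],
)

# Stand-in for the question's countFound: count each character of the word.
df["found"] = df["words"].apply(lambda w: Counter(w))

counts = (
    pd.json_normalize(df["found"].tolist())  # plain list -> fresh 0..n-1 index
    .fillna(0)                               # keys missing from a row become 0
    .astype(int)
    .set_axis(df.index)                      # realign with df before joining
)

result = df.join(counts)
print(result)
```

With the default `RangeIndex` used in the question, the `set_axis` step changes nothing, so it is only needed when the index has been altered.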

You can also get the columns directly, without your custom function. For this, use a dynamically built regex with named capturing groups and str.extractall:

regex = ('(?P<'+patterns['name']+'>'+patterns['rex']+')').str.cat(sep='|')
# (?P<A>A{1})|(?P<B>B{1})|(?P<C>C{1})|(?P<D>D{1})|(?P<E>E{1})|(?P<F>F{1})

df2 = df.join(df
 ['words']
 .str.extractall(regex)
 .groupby(level=0).count()
 )

Or a variant without named capturing groups that sets the column names afterwards:

regex = ('('+patterns['rex']+')').str.cat(sep='|')
# (A{1})|(B{1})|(C{1})|(D{1})|(E{1})|(F{1})

print(df.join(df
 ['words']
 .str.extractall(regex)
 .set_axis(patterns['name'], axis=1)
 .groupby(level=0).count()
 ))

Output:

  words stuff  A  B  C  D  E  F
0   ABC   abc  1  1  1  0  0  0
1  BCDB  bcda  0  2  1  1  0  0
2   CDE   cde  0  0  1  1  1  0
3     F     f  0  0  0  0  0  1
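
A caveat with the `extractall` route is that a row with no match at all does not appear in the grouped counts, so the join leaves `NaN` in that row. The sketch below is one way to handle this, assuming a hypothetical extra word `"XYZ"` that matches nothing and assuming every pattern name is a valid Python identifier (required for a named group); it reindexes the counts against `df.index` before joining.

```python
import pandas as pd

# "XYZ" is a hypothetical extra row that matches none of the patterns.
df = pd.DataFrame({"words": ["ABC", "BCDB", "CDE", "F", "XYZ"]})
patterns = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F"],
    "rex": ["A{1}", "B{1}", "C{1}", "D{1}", "E{1}", "F{1}"],
})

# One named group per pattern, joined as alternatives.
regex = "|".join(
    f"(?P<{name}>{rex})" for name, rex in zip(patterns["name"], patterns["rex"])
)

counts = (
    df["words"]
    .str.extractall(regex)            # one row per match, one column per group
    .groupby(level=0).count()         # non-null matches per original row
    .reindex(df.index, fill_value=0)  # restore rows that had no match
)

result = df.join(counts)
print(result)
```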

血之狂魔 2025-02-07 02:53:18

A Counter behaves a lot like a dictionary. Calling pd.DataFrame on a list of dictionaries will give you the matrix of counted values:

found = df['words'].apply(countFound).to_list()
pd.concat([
    df.assign(found=found),
    pd.DataFrame(found).fillna(0).astype("int")
], axis=1)
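
To see why the `fillna(0)` step is needed, here is a small self-contained sketch. It uses `Counter` on each word's characters as a simplified stand-in for `countFound`, which is an assumption rather than the original function. A `Counter` itself reports 0 for a missing key, but `pd.DataFrame` only sees the keys that are actually stored, so absent keys come out as `NaN`.

```python
from collections import Counter

import pandas as pd

words = ["ABC", "BCDB", "CDE", "F"]

# Simplified stand-in for countFound: count the characters of each word.
found = [Counter(word) for word in words]

# A Counter returns 0 for keys it has never seen...
missing_count = found[0]["Z"]

# ...but the DataFrame constructor only sees stored keys, so gaps are NaN.
raw = pd.DataFrame(found)

# Fill the gaps and restore an integer dtype.
matrix = raw.fillna(0).astype(int)
print(matrix)
```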

About the author

掩于岁月
