Is there anything in pandas similar to dplyr's 'list columns'?

Posted on 2025-02-13 20:44:05

I'm currently transitioning from R to Python in my data analysis, and there's one thing I haven't seen in any tutorials out there: is there anything in pandas similar to dplyr's 'list columns'?

Link to reference:
https://www.rstudio.com/resources/webinars/how-to-work-with-list-columns/

Comments (1)

小巷里的女流氓 2025-02-20 20:44:06

pandas will accept any Python object, including lists, in a column of dtype object.

import pandas as pd

df = pd.DataFrame()
df['genre'] = ['drama, comedy, action', 'romance, sci-fi, drama', 'horror']
# Split each comma-separated string into a list of genres
df['genre'] = df['genre'].str.split(', ')
print(df, '\n', df['genre'].dtype, '\n', type(df['genre'][0]))

# Output:

                      genre
0   [drama, comedy, action]
1  [romance, sci-fi, drama]
2                  [horror]
 object
 <class 'list'>

We can see that:

  • genre is a column of lists.
  • The dtype of the genre column is object.
  • The type of the first value of genre is list.
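For readers coming from tidyr's nest(), a list column can also be built by grouping. The following is only a minimal sketch: the long_df frame and its title and genre values are made up for illustration, and groupby(...).agg(list) is one of several ways to collect values into lists.

```python
import pandas as pd

# Hypothetical long-format data: one row per (title, genre) pair.
long_df = pd.DataFrame({
    "title": ["A", "A", "A", "B", "C", "C"],
    "genre": ["drama", "comedy", "action", "horror", "romance", "sci-fi"],
})

# Collect each title's genres into a single Python list per row.
nested = long_df.groupby("title")["genre"].agg(list).reset_index()

print(nested)
print(nested["genre"].dtype)           # dtype of the list column
print(type(nested["genre"].iloc[0]))   # type of one cell
```

The result has one row per title, and each cell of genre holds an ordinary Python list, just like the column produced by str.split above.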

A number of the .str accessor methods also work on lists.

For example:

print(df.genre.str.join(' | '))

# Output:

0     drama | comedy | action
1    romance | sci-fi | drama
2                      horror
Name: genre, dtype: object

print(df.genre.str[::2])

# Output:

0     [drama, action]
1    [romance, drama]
2            [horror]
Name: genre, dtype: object

If there isn't a built-in method, an apply function will typically do the job:

# max() on a list of strings returns the lexicographically largest one
print(df.genre.apply(max))

# Output:

0     drama
1    sci-fi
2    horror
Name: genre, dtype: object

See the pandas documentation on str functions for more.
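Two more operations tend to come up with list columns: counting the elements in each row, and unnesting the lists into one row per element (roughly what tidyr's unnest() does). This is a sketch under the assumption that the genre column looks like the one above; the frame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "genre": [
        ["drama", "comedy", "action"],
        ["romance", "sci-fi", "drama"],
        ["horror"],
    ],
})

# .str.len() also works on lists: number of elements per row.
counts = df["genre"].str.len()

# explode() gives each list element its own row, repeating the original index.
flat = df.explode("genre")

# Once flattened, ordinary column operations apply, e.g. counting genres.
genre_counts = flat["genre"].value_counts()

print(counts.tolist())
print(flat)
print(genre_counts["drama"])
```

After explode() the column holds plain strings, so anything that is awkward on lists (filtering, counting, joining to other tables) becomes a regular pandas operation.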


As for nesting dataframes within one another, it is possible, but I believe it's considered an anti-pattern, and pandas will fight you the whole way there:

data = {'df1': df, 'df2': df}
df2 = pd.Series(data.values(), data.keys()).to_frame()
df2.columns = ['dfs']
print(df2)

# Output:

                                                   dfs
df1                        genre
0   [drama, comedy...
df2                        genre
0   [drama, comedy...

print(df2['dfs'].iloc[0])

# Output:

                      genre
0   [drama, comedy, action]
1  [romance, sci-fi, drama]
2                  [horror]

See: https://stackoverflow.com/questions/17954520/pandas-dataframe-within-dataframe

A possibly acceptable workaround would be to store them as numpy arrays:

import numpy as np

# applymap was renamed to DataFrame.map in pandas 2.1
df2 = df2.applymap(np.array)
print(df2)
print(df2['dfs'].iloc[0])

# Output:

                                                   dfs
df1  [[[drama, comedy, action]], [[romance, sci-fi,...
df2  [[[drama, comedy, action]], [[romance, sci-fi,...

array([[list(['drama', 'comedy', 'action'])],
       [list(['romance', 'sci-fi', 'drama'])],
       [list(['horror'])]], dtype=object)
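If the goal is simply to keep several related frames together, it may be easier to avoid nesting altogether. The sketch below is one possible alternative rather than part of the approach above: it stacks the frames with pd.concat, using the dict keys as an outer index level. The level names source and row are arbitrary labels chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "genre": [
        ["drama", "comedy", "action"],
        ["romance", "sci-fi", "drama"],
        ["horror"],
    ],
})
data = {"df1": df, "df2": df}

# Stack the frames; the dict keys become the outer level of a MultiIndex.
combined = pd.concat(data, names=["source", "row"])

# Recover one of the original frames by selecting its key.
first = combined.loc["df1"]

print(combined)
print(first)
```

This keeps everything in one regular DataFrame, so groupby on the source level can stand in for iterating over nested frames.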