Sort durations expressed in minutes or seasons

Posted on 2025-02-13 09:49:39


The goal is to select movies or series that do not last more than 100 minutes.
The problem is that the duration is expressed either in minutes or in number of seasons.

Code:

import pandas as pd
import numpy as np

The data comes from Kaggle:

url = 'netflix_titles.csv'
df1 = pd.read_csv(url)
df1.head()

A look at the 'duration' column:

df1['duration'].head(10)
0       90 min
1    2 Seasons
2     1 Season
3     1 Season
4    2 Seasons
5     1 Season
6       91 min
7      125 min
8    9 Seasons
9      104 min
Name: duration, dtype: object
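To confirm that only a few unit spellings occur, the last token of each string can be tallied. This is a minimal sketch: the ten values printed above stand in for the full CSV (an assumption so it runs without the Kaggle file). The `.str` accessor also passes missing values through as NaN, whereas `str(x)` would turn them into the string 'nan'.

```python
import pandas as pd

# Stand-in for df1: only the ten 'duration' values printed above
# (an assumption so the sketch runs without netflix_titles.csv).
df1 = pd.DataFrame({"duration": [
    "90 min", "2 Seasons", "1 Season", "1 Season", "2 Seasons",
    "1 Season", "91 min", "125 min", "9 Seasons", "104 min",
]})

# The unit is the last whitespace-separated token of each string.
units = df1["duration"].str.split().str[-1]
counts = units.value_counts()
print(counts)
```

On this sample the units are `min`, `Season` and `Seasons`, so any filter on minutes only needs to match `min`.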

My solution:

df_US['duree'] = df_US['duration'].apply(lambda x: str(x)).apply(lambda x:x.split(' ')).apply(lambda x: x[0])
df_US['duree'] = df_US['duree'].astype('float')
df_US['duree_unit'] = df_US['duration'].apply(lambda x: str(x)).apply(lambda x:x.split(' ')).apply(lambda x: x[-1])
df_US[(df_US['duree_unit'] == 'min') & (df_US['duree'] < 100)].head(3)
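The three `apply` chains can likely be collapsed into a single vectorized `str.extract` call that captures the number and the unit at once. The sample below is hypothetical and stands in for `df_US`, whose construction is not shown; it also uses `<= 100`, since "not more than 100 minutes" includes exactly 100.

```python
import pandas as pd

# Hypothetical sample standing in for df_US (its real contents are not shown).
df = pd.DataFrame({"duration": [
    "90 min", "2 Seasons", "1 Season", "91 min", "125 min", "104 min", None,
]})

# One regex with two capture groups returns a two-column DataFrame:
# column 0 holds the number, column 1 the unit; unmatched or missing rows get NaN.
parts = df["duration"].str.extract(r"^(\d+)\s+(\w+)$")
df["duree"] = pd.to_numeric(parts[0])
df["duree_unit"] = parts[1]

# "Not more than 100 minutes" means <= 100; NaN rows compare as False.
short_movies = df[(df["duree_unit"] == "min") & (df["duree"] <= 100)]
print(short_movies)
```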

I get lots of warnings like:

C:\Users\Atapalou\AppData\Local\Temp\ipykernel_1436\2173588888.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_US['duree'] = df_US['duration'].apply(lambda x: str(x)).apply(lambda x:x.split(' ')).apply(lambda x: x[0]
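This warning usually means `df_US` was created by indexing `df1` (for example with a boolean mask), so pandas cannot tell whether it is a view or a copy. The line that builds `df_US` is not shown, so the country filter below is an assumption; the point is the explicit `.copy()`, after which assigning new columns should not trigger the warning under pandas' traditional (non Copy-on-Write) indexing rules.

```python
import warnings

import pandas as pd

# Hypothetical miniature of df1; the real file has many more columns.
df1 = pd.DataFrame({
    "country": ["United States", "India", "United States"],
    "duration": ["90 min", "2 Seasons", "125 min"],
})

# Assumption: df_US was built by filtering df1 like this. The explicit
# .copy() makes it a standalone frame rather than a possible view of df1.
df_US = df1[df1["country"] == "United States"].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Assign a new column; with the copy in place no warning is expected.
    df_US["duree"] = pd.to_numeric(df_US["duration"].str.extract(r"^(\d+)")[0])

# Collect any SettingWithCopyWarning by class name (version-agnostic check).
copy_warnings = [w for w in caught if w.category.__name__ == "SettingWithCopyWarning"]
print(df_US)
```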

This solution does not satisfy me; there must be a more elegant way. Any ideas?

Regards,
Atapalou


月亮是我掰弯的 2025-02-20 09:49:39


Given the example

df
Out[46]: 
    duration
0     90 min
1  2 Seasons
2   1 Season
3   1 Season
4  2 Seasons
5   1 Season
6     91 min
7    125 min
8  9 Seasons
9    104 min

you can parse the column to timedelta and then select (or sort) like this:

df['duration'] = pd.to_timedelta(
    pd.to_numeric(
        df['duration'].str.replace('min', ''), 
        errors='coerce'
    ), 
    unit='min'  # 'T' is a deprecated alias for minutes in recent pandas
)

df[df['duration']<=pd.Timedelta(minutes=100)]

         duration
0 0 days 01:30:00
6 0 days 01:31:00
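Since the title also asks about sorting, the same converted values can be ordered with `sort_values`, which places NaT (the season rows) last by default. This sketch writes to a new column, `minutes` (a name chosen here for illustration), so the original strings stay visible, uses `unit='min'` because the `'T'` alias is deprecated in recent pandas, and adds `.str.strip()` to remove the leftover space before parsing.

```python
import pandas as pd

df = pd.DataFrame({"duration": [
    "90 min", "2 Seasons", "1 Season", "1 Season", "2 Seasons",
    "1 Season", "91 min", "125 min", "9 Seasons", "104 min",
]})

# Season strings fail to parse, become NaN, then NaT after to_timedelta.
df["minutes"] = pd.to_timedelta(
    pd.to_numeric(df["duration"].str.replace("min", "").str.strip(), errors="coerce"),
    unit="min",
)

# na_position="last" is the default, so the season rows end up at the bottom.
by_length = df.sort_values("minutes")
print(by_length)
```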

That basically ignores the seasons: those strings cannot be converted to a numeric value, so pd.to_numeric(df['duration'].str.replace('min', ''), errors='coerce') returns NaN for them (which becomes NaT after the timedelta conversion). A `<=` comparison with NaT evaluates to False, so those rows drop out of the selection.
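A two-value check of the chain described above (string fails to parse, NaN, NaT, comparison is False); a minimal illustration, not part of the original answer:

```python
import pandas as pd

s = pd.Series(["90 min", "2 Seasons"])

# "2 Seasons" cannot be parsed, so errors="coerce" yields NaN for it.
numbers = pd.to_numeric(s.str.replace("min", "").str.strip(), errors="coerce")

# NaN becomes NaT when converted to a timedelta.
deltas = pd.to_timedelta(numbers, unit="min")

# Comparing NaT with <= gives False, so the season row is filtered out.
mask = deltas <= pd.Timedelta(minutes=100)
print(numbers, deltas, mask, sep="\n\n")
```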
