Sorting durations expressed in minutes or seasons
The goal is to select movies or series that do not last more than 100 minutes.
The problem is that the duration is expressed either in minutes or in number of seasons.
Code:
import pandas as pd
import numpy as np
The data is from Kaggle:
url = 'netflix_titles.csv'
df1 = pd.read_csv(url)
df1.head()
A look at the 'duration' column:
df1['duration'].head(10)
0 90 min
1 2 Seasons
2 1 Season
3 1 Season
4 2 Seasons
5 1 Season
6 91 min
7 125 min
8 9 Seasons
9 104 min
Name: duration, dtype: object
My solution:
df_US['duree'] = df_US['duration'].apply(lambda x: str(x)).apply(lambda x:x.split(' ')).apply(lambda x: x[0])
df_US['duree'] = df_US['duree'].astype('float')
df_US['duree_unit'] = df_US['duration'].apply(lambda x: str(x)).apply(lambda x:x.split(' ')).apply(lambda x: x[-1])
df_US[(df_US['duree_unit'] == 'min') & (df_US['duree'] < 100)].head(3)
I get lots of warnings like:
C:\Users\Atapalou\AppData\Local\Temp\ipykernel_1436\2173588888.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_US['duree'] = df_US['duration'].apply(lambda x: str(x)).apply(lambda x:x.split(' ')).apply(lambda x: x[0])
This solution doesn't satisfy me; there must be a more elegant one. Any ideas?
Regards,
Atapalou
Comments (1)
Given the example,
you can parse the durations to timedelta and then select (or sort) on them.
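One way this might look, as a sketch under assumptions: the small frame `df` is made up from the values shown in the question, and the column name `duration_td` is hypothetical. The extra `.str.strip()` is only a precaution against the space left behind after removing `min`.

```python
import pandas as pd

# Hypothetical sample built from the values shown in the question.
df = pd.DataFrame(
    {"duration": ["90 min", "2 Seasons", "1 Season", "91 min", "125 min", "104 min"]}
)

# Strip the unit, then coerce anything that is still non-numeric
# (the "Season"/"Seasons" rows) to NaN.
minutes = pd.to_numeric(
    df["duration"].str.replace("min", "", regex=False).str.strip(),
    errors="coerce",
)

# Interpret the numbers as minutes; NaN becomes NaT.
df["duration_td"] = pd.to_timedelta(minutes, unit="min")

# Titles lasting at most 100 minutes. Comparisons against NaT are False,
# so the series rows drop out on their own.
short = df[df["duration_td"] <= pd.Timedelta(minutes=100)]

# Sorting works as well; NaT rows are placed last by default.
by_length = df.sort_values("duration_td")

print(short)
print(by_length)
```

Because the result is a new column computed in one vectorized pass, no chain of `apply` calls is needed.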
That basically ignores the seasons, since those strings cannot be converted to a numeric value:
pd.to_numeric(df['duration'].str.replace('min', ''), errors='coerce')
returns NaN in those cases (which becomes NaT after the timedelta conversion).
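As for the warnings in the question, SettingWithCopyWarning usually appears when columns are assigned on a frame that was itself obtained by filtering another frame. The question does not show how `df_US` was created, so the following is only a sketch under that assumption: the `country` column, the filter and the sample rows are hypothetical. Taking an explicit `.copy()` makes `df_US` independent of `df1`, and `str.extract` splits the number and the unit in a single pass instead of several chained `apply` calls.

```python
import pandas as pd

# Hypothetical stand-in for the Kaggle file; only the 'duration' values
# come from the question.
df1 = pd.DataFrame(
    {
        "country": ["United States", "India", "United States",
                    "United States", "United States"],
        "duration": ["90 min", "2 Seasons", "1 Season", "125 min", "91 min"],
    }
)

# Assumed origin of df_US: a filtered subset of df1. The explicit .copy()
# makes it an independent frame, so adding columns to it does not raise
# SettingWithCopyWarning.
df_US = df1[df1["country"] == "United States"].copy()

# One regex pass: leading digits go to 'duree', the trailing word to 'duree_unit'.
parts = df_US["duration"].str.extract(r"^(?P<duree>\d+)\s+(?P<duree_unit>\w+)$")
df_US["duree"] = parts["duree"].astype(float)
df_US["duree_unit"] = parts["duree_unit"]

# Same selection as in the question: movies shorter than 100 minutes.
short_movies = df_US[(df_US["duree_unit"] == "min") & (df_US["duree"] < 100)]
print(short_movies)
```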