Fill in missing values using a function
Hello,
I'm working on a column that has missing values ('year_of_release'). The data type is 'datetime64' (the values are pandas Timestamp objects).
First, I wrote a function that "pulls" the year numbers out of a column in which years appear next to the names of some games, and then I stored the results in a new column, 'years_from_titles':
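    import re

    def get_year(row):
        # Return the first 4-digit number in the title that lies between 1971 and 2016;
        # if none qualifies, the function falls through and returns None.
        regex = r"\d{4}"
        match = re.findall(regex, row)
        for i in match:
            if int(i) > 1970 and int(i) < 2017:
                return int(i)

    gaming['years_from_titles']=gaming['name'].apply(lambda x: get_year(str(x)))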
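A quick check of how get_year behaves on a few titles. The DataFrame and its titles are made up for illustration; only the 'name' and 'years_from_titles' columns mirror the question. This copy uses a raw-string regex and int(i) in the return, and it shows that four-digit numbers outside the 1971 to 2016 window are skipped in favour of a later match.

```python
import re

import pandas as pd


def get_year(title):
    """Return the first 4-digit number in `title` strictly between 1970 and 2017, else None."""
    for match in re.findall(r"\d{4}", title):
        year = int(match)
        if 1970 < year < 2017:
            return year
    return None  # no plausible release year in this title


# Hypothetical titles chosen to exercise the different cases.
gaming = pd.DataFrame(
    {"name": ["FIFA Soccer 2004", "Tetris", "Game 1800 Remix 2010", "Future 2099", None]}
)
# str(None) is "None", which contains no digits, so missing names yield None as well.
gaming["years_from_titles"] = gaming["name"].apply(lambda x: get_year(str(x)))
print(gaming)
```

Rows with no usable year end up missing in 'years_from_titles', which is what the fill step later has to cope with.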
I tested the function and it works.
Now I'm trying to write another function that fills in the missing years in the original column, 'year_of_release', but only where a year was extracted on the same row:
    def year_row(row):
        if math.isnan(row['year_of_release']):
            return row['years_from_titles']
        else:
            return row['year_of_release']

    gaming['year_of_release']=gaming.apply(year_row,axis=1)
But when I run the code, I get a TypeError:
    /tmp/ipykernel_31/133192424.py in <module>
          7     return row['year_of_release']
          8 
    ----> 9 gaming['year_of_release']=gaming.apply(year_row,axis=1)

    /opt/conda/lib/python3.9/site-packages/pandas/core/frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
       7766             kwds=kwds,
       7767         )
    -> 7768         return op.get_result()
       7769 
       7770     def applymap(self, func, na_action: Optional[str] = None) -> DataFrame:

    /opt/conda/lib/python3.9/site-packages/pandas/core/apply.py in get_result(self)
        183             return self.apply_raw()
        184 
    --> 185         return self.apply_standard()
        186 
        187     def apply_empty_result(self):

    /opt/conda/lib/python3.9/site-packages/pandas/core/apply.py in apply_standard(self)
        274 
        275     def apply_standard(self):
    --> 276         results, res_index = self.apply_series_generator()
        277 
        278         # wrap results

    /opt/conda/lib/python3.9/site-packages/pandas/core/apply.py in apply_series_generator(self)
        288             for i, v in enumerate(series_gen):
        289                 # ignore SettingWithCopy here in case the user mutates
    --> 290                 results[i] = self.f(v)
        291                 if isinstance(results[i], ABCSeries):
        292                     # If we have a view on v, we need to make a copy because

    /tmp/ipykernel_31/133192424.py in year_row(row)
          2 # but only if a year is found, on the same row, and in correspond to years_from_titles column.
          3 def year_row(row):
    ----> 4     if math.isnan(row['year_of_release']):
          5         return row['years_from_titles']
          6     else:

    TypeError: must be real number, not Timestamp.
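The traceback comes from math.isnan: it only accepts real numbers, while each row's 'year_of_release' value is a pandas Timestamp. A minimal sketch reproducing the error with one hypothetical value, next to pandas' own pd.isna, which accepts any scalar and treats NaT, NaN and None as missing:

```python
import math

import pandas as pd

release = pd.Timestamp("2005-01-01")  # hypothetical value, like one row's 'year_of_release'

# math.isnan only accepts real numbers, so a Timestamp is rejected with a TypeError.
try:
    math.isnan(release)
    error = None
except TypeError as exc:
    error = exc
print("math.isnan raised:", repr(error))

# pd.isna works on any scalar: a real Timestamp is not missing,
# while NaT (pandas' missing datetime), NaN and None all count as missing.
print(pd.isna(release))
print(pd.isna(pd.NaT))
print(pd.isna(float("nan")))
print(pd.isna(None))
```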
If anyone knows how to overcome this I would greatly appreciate it.
Thanks
Comments (2)
You can use the fact that NaN is not equal to itself. Or use Series.mask, or Series.fillna.
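A minimal sketch of these three options applied to whole columns instead of row by row. It assumes a year number is all that is needed, so 'year_of_release' is first reduced to its year with .dt.year (a step not in the question); that turns NaT into NaN, so the gaps can be filled straight from 'years_from_titles'. The toy DataFrame is hypothetical.

```python
import pandas as pd

# Hypothetical data: one known release date, one gap a title can fill, one gap it cannot.
gaming = pd.DataFrame(
    {
        "year_of_release": pd.to_datetime(["2005-01-01", None, None]),
        "years_from_titles": [float("nan"), 2004, float("nan")],
    }
)

# Assumption: work with plain year numbers; .dt.year maps NaT to NaN.
years = gaming["year_of_release"].dt.year

# 1) NaN is not equal to itself, so `years == years` is False exactly where a value is missing.
#    where() keeps values where the condition is True and takes the other column elsewhere.
by_self_comparison = years.where(years == years, gaming["years_from_titles"])

# 2) mask() is the mirror image of where(): it replaces values where the condition is True.
by_mask = years.mask(years.isna(), gaming["years_from_titles"])

# 3) fillna() with a Series fills each missing value from the same index label.
by_fillna = years.fillna(gaming["years_from_titles"])

print(by_fillna.tolist())
```

All three produce the same column; fillna is the most direct, while where/mask are handy when the condition is something other than "is missing".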
Instead of using the math module to check for missing values, here's a more pandas-specific approach. Change this line:
to this:
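A minimal sketch of a pandas-based check inside the question's year_row, using pd.isna (which treats NaT, NaN and None as missing) in place of math.isnan. Two choices here are assumptions rather than part of the answer: a row with no title year gets pd.NaT, and a title year is stored as 1 January of that year so the column keeps holding Timestamps instead of a mix of Timestamps and numbers. The toy data is hypothetical.

```python
import pandas as pd


def year_row(row):
    release = row["year_of_release"]
    # pd.isna accepts any scalar, so a real Timestamp simply returns False here.
    if not pd.isna(release):
        return release
    title_year = row["years_from_titles"]
    if pd.isna(title_year):
        return pd.NaT  # nothing to fill with on this row
    # Assumption: represent a bare year as 1 January of that year.
    return pd.Timestamp(year=int(title_year), month=1, day=1)


# Hypothetical data: a known date, a gap a title can fill, and a gap it cannot.
gaming = pd.DataFrame(
    {
        "year_of_release": pd.to_datetime(["2005-01-01", None, None]),
        "years_from_titles": [float("nan"), 2004, float("nan")],
    }
)
gaming["year_of_release"] = gaming.apply(year_row, axis=1)
print(gaming)
```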