Fill in missing values using a function

Posted on 2025-02-13 05:57:02

Hello,

I'm working on a column ('year_of_release') that has missing values. Its data type is 'datetime64'.

First, I wrote a function that pulls four-digit years out of the 'name' column (some game names include a year), and stored the results in a new column, 'years_from_titles':

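import re

def get_year(name):
    # Return the first four-digit number in the name that looks like a release year
    for candidate in re.findall(r"\d{4}", name):
        year = int(candidate)
        if 1970 < year < 2017:
            return year
    return None  # no plausible year in this title

gaming['years_from_titles'] = gaming['name'].apply(lambda x: get_year(str(x)))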

I tested the function and it works.
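As an aside, the same extraction can be done without a Python-level loop by using pandas string methods. A minimal sketch, assuming the same 1970 < year < 2017 window as get_year; the helper name years_from_names and the sample titles are hypothetical:

```python
import pandas as pd


def years_from_names(names: pd.Series) -> pd.Series:
    """Hypothetical vectorized counterpart of get_year."""
    # One row per non-overlapping four-digit match, indexed by (original row, match number)
    found = names.astype(str).str.extractall(r"(\d{4})")[0].astype(int)
    # Keep only plausible release years, mirroring the 1970 < year < 2017 check
    found = found[(found > 1970) & (found < 2017)]
    # First plausible year per original row; rows with no such year become NaN
    return found.groupby(level=0).first().reindex(names.index)


# Usage with the question's column names (assumed DataFrame called gaming):
# gaming['years_from_titles'] = years_from_names(gaming['name'])
```

Like get_year, it keeps the first plausible year in each title. Rows without one come back as NaN, so the result is a float column whenever any title lacks a year.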

Now I'm trying to write another function that fills the missing values in the original 'year_of_release' column with the year from 'years_from_titles' on the same row:

import math

def year_row(row):
    if math.isnan(row['year_of_release']):
        return row['years_from_titles']
    else:
        return row['year_of_release']

gaming['year_of_release'] = gaming.apply(year_row, axis=1)

But when I run the code, I get a TypeError:

/tmp/ipykernel_31/133192424.py in <module>
      7         return row['year_of_release']
      8 
----> 9 gaming['year_of_release']=gaming.apply(year_row,axis=1)

/opt/conda/lib/python3.9/site-packages/pandas/core/frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
   7766             kwds=kwds,
   7767         )
-> 7768         return op.get_result()
   7769 
   7770     def applymap(self, func, na_action: Optional[str] = None) -> DataFrame:

/opt/conda/lib/python3.9/site-packages/pandas/core/apply.py in get_result(self)
    183             return self.apply_raw()
    184 
--> 185         return self.apply_standard()
    186 
    187     def apply_empty_result(self):

/opt/conda/lib/python3.9/site-packages/pandas/core/apply.py in apply_standard(self)
    274 
    275     def apply_standard(self):
--> 276         results, res_index = self.apply_series_generator()
    277 
    278         # wrap results

/opt/conda/lib/python3.9/site-packages/pandas/core/apply.py in apply_series_generator(self)
    288             for i, v in enumerate(series_gen):
    289                 # ignore SettingWithCopy here in case the user mutates
--> 290                 results[i] = self.f(v)
    291                 if isinstance(results[i], ABCSeries):
    292                     # If we have a view on v, we need to make a copy because

/tmp/ipykernel_31/133192424.py in year_row(row)
      2 # but only if a year is found, on the same row, and in correspond to years_from_titles column.
      3 def year_row(row):
----> 4     if math.isnan(row['year_of_release']):
      5         return row['years_from_titles']
      6     else:

TypeError: must be real number, not Timestamp.

If anyone knows how to overcome this, I would greatly appreciate it.
Thanks!

小草泠泠 2025-02-20 05:57:02

You can use the fact that NaN is not equal to itself. The same is true of NaT, the missing-value marker in datetime columns, so this check works here:

def year_row(row):
   if row['year_of_release'] != row['year_of_release']:
      return row['years_from_titles']
   else:
      return row['year_of_release']

gaming['year_of_release']=gaming.apply(year_row,axis=1)
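A quick check of why this works where math.isnan fails. It is a minimal sketch with made-up dates, assuming a datetime64 column like the one in the question, where missing entries are stored as NaT:

```python
import math

import numpy as np
import pandas as pd

# A datetime64 column stores missing entries as NaT, not as float NaN
dates = pd.Series(pd.to_datetime(["2006-01-01", None]))
present, missing = dates.iloc[0], dates.iloc[1]

# Both NaN and NaT compare unequal to themselves, so "x != x" flags them
nan_is_missing = np.nan != np.nan          # True
nat_is_missing = missing != missing        # True
timestamp_is_missing = present != present  # False

# math.isnan only accepts real numbers, which is what triggers the error in the question
try:
    math.isnan(present)
    isnan_raised = False
except TypeError:
    isnan_raised = True
```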

Or with Series.mask

gaming['year_of_release'] = gaming['year_of_release'].mask(gaming['year_of_release'].isna(), gaming['years_from_titles'])
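Series.mask(cond, other) replaces the values where cond is True with the corresponding values of other, matched by index label. A small sketch on hypothetical numeric years (the Series names and values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical release years with gaps, and years recovered from titles
release = pd.Series([2013.0, np.nan, np.nan], index=["a", "b", "c"])
from_titles = pd.Series([2014.0, 2004.0, np.nan], index=["a", "b", "c"])

# Where release is missing, take the value from from_titles at the same label;
# existing release years are left alone even if a title year is also present
filled = release.mask(release.isna(), from_titles)
```

Row "c" stays missing because neither source has a year for it.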

Or with Series.fillna

gaming['year_of_release'] = gaming['year_of_release'].fillna(gaming['years_from_titles'])
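One caveat worth checking: 'years_from_titles' holds plain numbers while 'year_of_release' is a datetime64 column, so filling one directly from the other mixes types and can leave the column as object dtype rather than datetime64. A minimal sketch that converts the extracted years to January 1 timestamps first, assuming 'years_from_titles' is numeric with NaN for titles without a year; the toy rows are hypothetical, only the column names come from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data using the question's column names
gaming = pd.DataFrame({
    "name": ["FIFA 2014", "Tetris", "Madden NFL 2004"],
    "year_of_release": pd.to_datetime(["2013-01-01", "1989-01-01", None]),
    "years_from_titles": [2014.0, np.nan, 2004.0],
})

# Float years -> nullable ints -> strings like "2004", then parse them as years.
# Missing years turn into a non-year string, which errors="coerce" maps to NaT.
title_dates = pd.to_datetime(
    gaming["years_from_titles"].astype("Int64").astype(str),
    format="%Y",
    errors="coerce",
)

# Fill only the missing release dates; both sides are datetime64 now
gaming["year_of_release"] = gaming["year_of_release"].fillna(title_dates)
```

January 1 is an arbitrary choice here, since only the year is known. Rows with neither a release date nor a year in the title stay NaT.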

淑女气质 2025-02-20 05:57:02

Instead of using the math module to check for missing values, here's a more pandas-specific approach.

Change this line:

if math.isnan(row['year_of_release']):

to this:

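if pd.isna(row['year_of_release']):

Here pd.isna is the top-level pandas function (import pandas as pd). It accepts a single scalar such as a Timestamp or NaT; a scalar Timestamp itself has no .isna() method.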
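For reference, a runnable sketch of the question's row-wise function using pd.isna, on a hypothetical two-row DataFrame. Because the filled values come straight from 'years_from_titles', the returned column mixes Timestamps and plain numbers, so it typically ends up as object dtype.

```python
import numpy as np
import pandas as pd


def year_row(row):
    # pd.isna handles any scalar: float NaN, None, and NaT
    if pd.isna(row["year_of_release"]):
        return row["years_from_titles"]
    return row["year_of_release"]


# Hypothetical toy data using the question's column names
gaming = pd.DataFrame({
    "year_of_release": pd.to_datetime(["2013-01-01", None]),
    "years_from_titles": [np.nan, 2004.0],
})

gaming["year_of_release"] = gaming.apply(year_row, axis=1)
```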

About the author: 您的好友蓝忘机已上羡