Filling in missing values using a function

Posted on 2025-02-13 05:57:02

Hello,

I'm working on a column that has missing values ('year_of_release'). The data type is 'timestamp64'.

First, I created a function that pulls year numbers out of a column in which years appear next to the names of some games, and then I stored the results in a new column, 'years_from_titles':

import re

def get_year(row):
    # Return the first four-digit number that looks like a release year, else None.
    regex = r"\d{4}"
    match = re.findall(regex, row)

    for i in match:
        if 1970 < int(i) < 2017:
            return int(i)

gaming['years_from_titles']=gaming['name'].apply(lambda x: get_year(str(x)))

I tested the function and it works.
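
A quick way to check a helper like this is to run it on a handful of titles. The sketch below restates the function in a self-contained form and uses made-up titles (they are not from the original dataset) to show which numbers are accepted and which are rejected.

```python
import re

def get_year(title):
    # Return the first four-digit number strictly between 1970 and 2017, else None.
    for candidate in re.findall(r"\d{4}", title):
        if 1970 < int(candidate) < 2017:
            return int(candidate)
    return None

# Hypothetical sample titles, for illustration only.
samples = ["FIFA 2004", "Tetris", "Madden NFL 2005", "Space Quest 3000"]
years = [get_year(s) for s in samples]
print(years)
```

Titles with no number, or with a number outside the accepted range, yield None, which pandas stores as a missing value in the new column.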

Now, I'm trying to create another function, which will fill in those missing years of the original column - 'year_of_release', but only if they appear on the same row:

def year_row(row):
   if math.isnan(row['year_of_release']):
      return row['years_from_titles']
   else:
      return row['year_of_release']

gaming['year_of_release']=gaming.apply(year_row,axis=1)

But when I run the code I get a TypeError:

/tmp/ipykernel_31/133192424.py in <module>
      7         return row['year_of_release']
      8 
----> 9 gaming['year_of_release']=gaming.apply(year_row,axis=1)

/opt/conda/lib/python3.9/site-packages/pandas/core/frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
   7766             kwds=kwds,
   7767         )
-> 7768         return op.get_result()
   7769 
   7770     def applymap(self, func, na_action: Optional[str] = None) -> DataFrame:

/opt/conda/lib/python3.9/site-packages/pandas/core/apply.py in get_result(self)
    183             return self.apply_raw()
    184 
--> 185         return self.apply_standard()
    186 
    187     def apply_empty_result(self):

/opt/conda/lib/python3.9/site-packages/pandas/core/apply.py in apply_standard(self)
    274 
    275     def apply_standard(self):
--> 276         results, res_index = self.apply_series_generator()
    277 
    278         # wrap results

/opt/conda/lib/python3.9/site-packages/pandas/core/apply.py in apply_series_generator(self)
    288             for i, v in enumerate(series_gen):
    289                 # ignore SettingWithCopy here in case the user mutates
--> 290                 results[i] = self.f(v)
    291                 if isinstance(results[i], ABCSeries):
    292                     # If we have a view on v, we need to make a copy because

/tmp/ipykernel_31/133192424.py in year_row(row)
      2 # but only if a year is found, on the same row, and in correspond to years_from_titles column.
      3 def year_row(row):
----> 4     if math.isnan(row['year_of_release']):
      5         return row['years_from_titles']
      6     else:

TypeError: must be real number, not Timestamp.

If anyone knows how to overcome this I would greatly appreciate it.
Thanks

Comments (2)

小草泠泠 2025-02-20 05:57:02

You can rely on the fact that NaN is not equal to itself (the same holds for NaT, the missing value pandas uses in datetime columns).

def year_row(row):
   if row['year_of_release'] != row['year_of_release']:
      return row['years_from_titles']
   else:
      return row['year_of_release']

gaming['year_of_release']=gaming.apply(year_row,axis=1)
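
To see why the self-comparison works where math.isnan fails, here is a minimal sketch on standalone values. The timestamp is an arbitrary example, not a value from the original data.

```python
import math
import pandas as pd

ts = pd.Timestamp("2008-01-01")  # arbitrary example value
missing = pd.NaT                 # missing-value marker in datetime columns

# math.isnan only accepts real numbers, so a Timestamp raises TypeError.
try:
    math.isnan(ts)
    raised = False
except TypeError:
    raised = True

nan_differs = float("nan") != float("nan")  # NaN never equals itself
nat_differs = missing != missing            # NaT behaves the same way
ts_differs = ts != ts                       # a real timestamp equals itself
print(raised, nan_differs, nat_differs, ts_differs)
```

The comparison trick therefore flags only the missing entries and never needs to convert the value to a float.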

Or with Series.mask

gaming['year_of_release'] = gaming['year_of_release'].mask(gaming['year_of_release'].isna(), gaming['years_from_titles'])

Or with Series.fillna

gaming['year_of_release'] = gaming['year_of_release'].fillna(gaming['years_from_titles'])
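
One thing to keep in mind: 'year_of_release' holds timestamps while 'years_from_titles' holds plain numbers, so filling one directly from the other may leave the column with mixed types, depending on the pandas version. A possible way around that, sketched here under the assumption that only the year matters, is to reduce the timestamps to year numbers first. The small frame and the 'release_year' column name are hypothetical.

```python
import pandas as pd

# Hypothetical miniature of the 'gaming' frame, for illustration only.
gaming = pd.DataFrame({
    "name": ["FIFA 2004", "Tetris", "Madden NFL 2005"],
    "year_of_release": pd.to_datetime(["2003-01-01", None, None]),
    "years_from_titles": [2004, None, 2005],
})

# Reduce the datetime column to plain year numbers (NaT becomes NaN) ...
release_year = gaming["year_of_release"].dt.year
# ... then fall back to the title-derived year only where the release year is missing.
gaming["release_year"] = release_year.fillna(gaming["years_from_titles"])
print(gaming["release_year"].tolist())
```

Rows that have a release date keep it, rows without one take the year from the title, and rows with neither stay missing.
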
淑女气质 2025-02-20 05:57:02

Instead of using the math module to check for missing values, here's a more pandas-specific approach.

Change this line:

if math.isnan(row['year_of_release']):

to this:

if pd.isna(row['year_of_release']):
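
For a check on a single cell, the top-level function pd.isna accepts scalars and returns True for NaN, None and NaT alike. The sketch below shows the row function written that way. The two rows are hypothetical stand-ins for what DataFrame.apply with axis=1 would pass in.

```python
import pandas as pd

def year_row(row):
    # pd.isna handles scalar values, including Timestamp and NaT.
    if pd.isna(row["year_of_release"]):
        return row["years_from_titles"]
    return row["year_of_release"]

# Hypothetical rows, for illustration only.
row_missing = pd.Series({"year_of_release": pd.NaT, "years_from_titles": 2005})
row_present = pd.Series({"year_of_release": pd.Timestamp("2003-01-01"),
                         "years_from_titles": 2004})

filled = year_row(row_missing)
kept = year_row(row_present)
print(filled, kept)
```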