Apply a lambda or define a function to return 1 else 0 in a dask dataframe

Posted on 2025-01-13 18:42:18

Probably easy, but I am still learning.

I am creating a new column in a dask dataframe. Its value is derived from the last four characters of a date column stored as a string in ddmmyyyy format.
What I did:

  1. created a list, inv_years,
  2. extracted the last four characters of the date string,
  3. tried to define a function that returns 1 in the new column if the extracted year is in inv_years, and 0 otherwise.

Issue: how do I write a working function or, better, a shorter lambda?

def valid_yr(x):
    inv_years = ['1921','1969','2026','2030','2041','2060','2062']
    validity_year = ddf['string_ddmmyyyy'].str[-4:] #extract the last four to get the year
    if validity_year.isin(inv_years): 
        x = 1
    else:
        x = 0
    return x

#create a new column and apply function
ddf['validity_year']= ??? # what to write here?

幸福不弃 2025-01-20 18:42:18

One rather clunky way I could come up with is:

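inv_years = ['1921','1969','2026','2030','2041','2060','2062']
# meta declares the output name and dtype so dask does not have to infer them
ddf['validity_year'] = ddf.apply(lambda row: 1 if row.string_ddmmyyyy[-4:] in inv_years else 0, axis=1, meta=('validity_year', 'int64'))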

Or, to get your approach working, first modify your function slightly so that its argument is a single row.

def valid_yr(row):
    inv_years = ['1921','1969','2026','2030','2041','2060','2062']
    validity_year = row.string_ddmmyyyy[-4:]
    if validity_year in inv_years:
        x = 1
    else:
        x = 0
    return x

Now we can apply this function to all rows.

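# meta declares the output name and dtype so dask does not have to infer them
ddf['validity_year'] = ddf.apply(valid_yr, axis=1, meta=('validity_year', 'int64'))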