
How do I extract specific substrings and separate text from numbers in a pandas DataFrame?

Posted on 2025-01-20 06:29:16

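I have some data in a DataFrame in the following format. Please see the image link below.

Current output

There are two parts to the problem I am trying to solve:

For the salary column, I want to separate the text from the numbers and extract the value. Wherever there is a range, I want to take the average.
Depending on whether the salary is hourly/weekly/yearly etc., I want to set the salary type based on whether a substring is present (e.g. 'year', 'month', 'week', 'hour', etc.).

The final output should look like the image below.

Thanks!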


夜光 2025-01-27 06:29:16

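This may work for you:

for idx, text in df["salary"].items():
    # e.g. "$8,500 - $10,500 a month" -> ["$8,500", "-", "$10,500", "a", "month"]
    splitted_value = text.split()
    # the last word is the pay period: "year" -> "Yearly", "hour" -> "Hourly"
    salary_type = (splitted_value[-1]+"ly").title()
    if "-" in splitted_value:
        # a range: average every token that carries a dollar sign
        ranged_salary = [int(x.replace("$","").replace(",","")) for x in splitted_value if "$" in x]
        salary = sum(ranged_salary)/len(ranged_salary)
    else:
        # a single amount sits third from the end: "<amount> a <period>"
        salary = int(splitted_value[-3].replace("$","").replace(",",""))
    # write by index label so the loop also works on a non-default index
    df.loc[idx,"salary_value"] = salary
    df.loc[idx,"salary_type"] = salary_type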
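The row-by-row loop above can also be written with pandas string methods. What follows is only a minimal sketch under a few assumptions, not the answerer's code: the column is named `salary` as in the loop, the sample strings are borrowed from the other answer because the question's data is only available as an image, and the helper name `split_salary` is hypothetical.

```python
import pandas as pd

# Sample rows borrowed from the other answer (assumption: the real data looks similar)
df = pd.DataFrame({
    "salary": [
        "Up to $80,000 a year",
        "$8,500 - $10,500 a month",
        "$25 - $40 an hour",
        "$1,546 a week",
    ]
})

# Pay-period keywords mentioned in the question, mapped to the labels used in the loop
PERIOD_LABELS = {"hour": "Hourly", "week": "Weekly", "month": "Monthly", "year": "Yearly"}


def split_salary(frame: pd.DataFrame, column: str = "salary") -> pd.DataFrame:
    """Return a copy with salary_value and salary_type added (hypothetical helper)."""
    out = frame.copy()
    # extractall yields one row per "$1,234"-style match, indexed by (original row, match number)
    amounts = (
        out[column]
        .str.extractall(r"\$([\d,]+(?:\.\d+)?)")[0]
        .str.replace(",", "", regex=False)
        .astype(float)
    )
    # Average the matches per original row; a single amount averages to itself
    out["salary_value"] = amounts.groupby(level=0).mean()
    # The pay period is taken from the last word of the string
    period = out[column].str.extract(r"(hour|week|month|year)\s*$", expand=False)
    out["salary_type"] = period.map(PERIOD_LABELS)
    return out


result = split_salary(df)
print(result)
```

Rows without any dollar amount, or with a pay period outside the four keywords, end up as NaN in the corresponding new column instead of raising an error, which may or may not be the behaviour you want.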


你与昨日 2025-01-27 06:29:16

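This is an interesting question, but next time please provide the input data as something we can copy/paste.

What you need is a function that converts the salary string into a value and a salary type.

You can iterate over the characters in the string to collect the digits, and flip a boolean switch when you encounter the - (dash) character, in case you need to calculate an average.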

lst = [
    "Up to $80,000 a year",
    "$8,500 - $10,500 a month",
    "$25 - $40 an hour",
    "$1,546 a week"
]


def convert(salary_data: str):
    value = ""
    value_max = ""
    need_average = False
    # iterate over the characters in the string
    for c in salary_data:
        if c.isdigit():
            if need_average:
                value_max += c
            else:
                value += c
        elif c == "-":
            # switch to adding to value_max after finding the dash
            need_average = True
    if not need_average:
        # slight cheating for the f-string below
        value_max = value
    value = f"{(int(value) + int(value_max)) / 2:.2f}"
    if "hour" in salary_data:
        salary_type = "hourly"
    elif "week" in salary_data:
        salary_type = "weekly"
    elif "month" in salary_data:
        salary_type = "monthly"
    else:
        # use this as fallback
        salary_type = "yearly"
    return value, salary_type


for element in lst:
    value, salary_type = convert(element)
    print(value, salary_type)

Output

80000.00 yearly
9500.00 monthly
32.50 hourly
1546.00 weekly
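The character-by-character scan works for the four samples, but it concatenates every digit it sees, so an amount with cents such as `$25.50` would be read as 2550. One possible alternative is to match whole dollar amounts with a regular expression. This is a sketch under assumptions rather than a replacement for the code above: the names `parse_salary`, `AMOUNT` and `PERIODS` are hypothetical, the value is returned as a float instead of a formatted string, and an unrecognised pay period is reported as "unknown" rather than falling back to yearly.

```python
import re

# Keyword found in the text -> label used in the answer above
PERIODS = {"hour": "hourly", "week": "weekly", "month": "monthly", "year": "yearly"}

# A dollar amount such as "$1,546" or "$25.50"
AMOUNT = re.compile(r"\$(\d[\d,]*(?:\.\d+)?)")


def parse_salary(text: str) -> tuple[float, str]:
    """Return (average amount, pay period) for one salary string."""
    # findall returns the captured group for every amount in the string
    amounts = [float(m.replace(",", "")) for m in AMOUNT.findall(text)]
    if not amounts:
        raise ValueError(f"no dollar amount found in {text!r}")
    # One amount averages to itself; a range averages its two ends
    value = sum(amounts) / len(amounts)
    # First keyword that occurs in the text, or "unknown" if none does
    salary_type = next(
        (label for key, label in PERIODS.items() if key in text), "unknown"
    )
    return value, salary_type


for sample in ["Up to $80,000 a year", "$25 - $40 an hour", "$25.50 an hour"]:
    print(parse_salary(sample))
```

With a DataFrame, a function like this could be applied per row, for example with `df["salary"].apply(parse_salary)`, and the resulting tuples then split into two columns.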

About the author: 撩人痒
