How to convert an ISO duration to minutes in PySpark or Python

Posted 2025-01-29 02:39:00

I have a column in a Python data frame that holds ISO duration values (the original post showed the values in an image, which is not reproduced here).

I want to convert these ISO durations to minutes. The output should be:

  • 15 minutes
  • 90 minutes
  • 5 minutes

import pandas as pd
import re
import json
from datetime import datetime
currentdate=datetime.today().strftime('%Y/%m/%d')
absolutepath='/project/sniper/'+'/'+currentdate+'/*.json'

df = pd.read_json('absolutepath', lines=True)
df_sugar = df.loc[df['ingredients'].str.contains("Sugar|sugar", case=True)]
def convertToInteger(my_str):
    if 'H' in my_str and PT in my_str:
      characters_to_remove_H = "H"
      for l in characters_to_remove_H:
           new_string_hour = my_str.replace(l, "*60")
           new_p=int(new_string_hour.replace(PT,""))
      return  pd.Series(new_p)
   
    elif  'M' in my_str and PT in my_str:
        characters_to_remove_M = "PTM"
        for m in characters_to_remove_M:
            new_string_minute = int(my_str.replace(m, ""))
        return  pd.Series(new_string_minute)

df2[["new_col_2"]] = df_beef["prepTime"].apply(convertToInteger)

Comments (1)

无力看清 2025-02-05 02:39:00

Assuming your data is something like this (you probably have more columns, but you get the point):

import pandas as pd

df = pd.DataFrame(['PT15M', 'PT1H30M', 'PT5M'], columns=['prepTime'])

I'd use the isodate package for a more robust approach to the problem:

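import isodate

def get_minutes(iso_str):
    # parse_duration returns a timedelta for durations without years or months
    iso_timedelta = isodate.parse_duration(iso_str)
    # total_seconds() counts whole days too; .seconds would drop them
    return int(iso_timedelta.total_seconds() // 60)

df['prepTimeMinutes'] = df['prepTime'].apply(get_minutes)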

Or as a one-liner:

df['prepTimeMinutes'] = df['prepTime'].apply(lambda x: int(isodate.parse_duration(x).total_seconds() // 60))
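A note on reading minutes off a parsed duration, as a small standard-library sketch (no isodate needed; the `minutes_from_timedelta` helper name is hypothetical). `timedelta.seconds` holds only the part of the duration below one day, while `total_seconds()` covers the whole span, so the two diverge once a duration reaches 24 hours:

```python
from datetime import timedelta

def minutes_from_timedelta(delta: timedelta) -> int:
    """Whole minutes in the full duration, days included."""
    return int(delta.total_seconds() // 60)

short = timedelta(hours=1, minutes=30)
long = timedelta(days=1, hours=1, minutes=30)

# For durations under a day both readings agree
print(short.seconds // 60, minutes_from_timedelta(short))
# From one day upward, .seconds silently drops the day component
print(long.seconds // 60, minutes_from_timedelta(long))
```

For prep times of a few minutes or hours either reading gives the same number; the difference only shows up for durations of a day or longer.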

If you don't want to use isodate, you could write a custom parser. You may need to generalize it for your requirements, but if all of your strings are in the format "PT[<hours>H]<minutes>M", you could simply do something like:

import re

def get_minutes(iso_str):
    # Capture the digits before "H" and "M"; a missing part counts as 0
    hours = re.search(r"(\d+)H", iso_str)
    hours = hours.group(1) if hours else 0
    minutes = re.search(r"(\d+)M", iso_str)
    minutes = minutes.group(1) if minutes else 0

    return int(hours) * 60 + int(minutes)

df['prepTimeMinutes'] = df['prepTime'].apply(get_minutes)

To generalize it, I'd suggest taking a look at the isodate source anyway.
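As a rough sketch of what a more general version could look like without isodate: the function below accepts weeks, days, hours, minutes, and seconds. The `iso_duration_to_minutes` name and the supported subset are assumptions made for illustration, not how isodate itself is implemented. Years and months are rejected because they have no fixed length in minutes.

```python
import re

# Assumed subset of ISO 8601 durations: optional weeks and days, then an
# optional time part (after "T") with hours, minutes, and seconds.
_ISO_DURATION = re.compile(
    r"P(?:(?P<weeks>\d+)W)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?"
)

# Length of each supported unit in seconds
_UNIT_SECONDS = {"weeks": 604800, "days": 86400, "hours": 3600, "minutes": 60, "seconds": 1}

def iso_duration_to_minutes(iso_str: str) -> int:
    """Return the whole minutes in a duration string such as 'PT1H30M'."""
    match = _ISO_DURATION.fullmatch(iso_str)
    # Keep only the components that actually appear in the string
    parts = {k: int(v) for k, v in match.groupdict().items() if v is not None} if match else {}
    if not parts:
        raise ValueError(f"unsupported ISO duration: {iso_str!r}")
    total_seconds = sum(_UNIT_SECONDS[unit] * value for unit, value in parts.items())
    return total_seconds // 60

for value in ["PT15M", "PT1H30M", "PT5M", "P1DT2H"]:
    print(value, iso_duration_to_minutes(value))
```

It can be applied to the column the same way as before, with `df['prepTime'].apply(iso_duration_to_minutes)`.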

There are many other ways to do the same thing; I hope this gives you some hints on how to proceed :)
