Advanced string splitting - how do I separate the products in an order from their prices?

Posted on 2025-02-05 02:17:20

I have a CSV file with a field called 'basket_items', which is just a string of items separated by commas. Each item has the product name, the flavour (if applicable) and the price.

Example of first two rows of CSV file:

timestamp,store,customer_name,basket_items,total_price,cash_or_card
06/06/2022 09:00,Chesterfield,Stephanie Neyhart,"Large Flat white - 2.45, Large Flavoured iced latte - Vanilla - 3.25, Large Flavoured iced latte - Hazelnut - 3.25",8.95,CASH
06/06/2022 09:02,Chesterfield,Donna Marley,"Large Flavoured iced latte - Hazelnut - 3.25, Regular Latte - 2.15, Large Flavoured iced latte - Vanilla - 3.25",8.65,CARD

Single row of basket_items field would be:

Large Flavoured iced latte - Hazelnut - 3.25, Regular Latte - 2.15, Large Flavoured iced latte - Vanilla - 3.25

I want to iterate through each row in this CSV file, obtain the product names and the prices separately, and then later match each product name to its price. I am struggling to figure out how to do this.

Maybe I could have it in dictionary format, or as a list of products; I'm really not sure how to do it. I tried to mess around with:

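import pandas as pd

# read_csv already returns a DataFrame, so no extra pd.DataFrame() wrapper is needed
df = pd.read_csv("team1-project/example_transactions.csv")

# Drop rows with null values
df = df.dropna()

basket_items_list = []

for row in df.basket_items:
    # Splits each basket into items, but each name and price stay in one string
    order = row.split(',')
    basket_items_list.append(order)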

But this got me further from, rather than closer to, what I'm trying to do.
I would appreciate any help. Thank you.

離人涙 2025-02-12 02:17:20

What about using a regex?

regex = r'(?P<designation>(?!\s)[^,]*[^\s,]+)\s*-\s*(?P<price>\d+(?:\.\d+)?)'
df['basket_items'].str.extractall(regex)

output:

                                   designation price
  match                                             
0 0                           Large Flat white  2.45
  1       Large Flavoured iced latte - Vanilla  3.25
  2      Large Flavoured iced latte - Hazelnut  3.25
1 0      Large Flavoured iced latte - Hazelnut  3.25
  1                              Regular Latte  2.15
  2       Large Flavoured iced latte - Vanilla  3.25
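
Note that `extractall` returns the `price` column as strings. As a minimal sketch, the prices can be converted to floats and summed per original row, which gives something to check against `total_price`. The two-row frame below is only a stand-in for the CSV in the question, and the name `basket_totals` is illustrative rather than part of the answer above.

```python
import pandas as pd

# Stand-in for the CSV from the question (assumption: same column names).
df = pd.DataFrame({
    "customer_name": ["Stephanie Neyhart", "Donna Marley"],
    "basket_items": [
        "Large Flat white - 2.45, Large Flavoured iced latte - Vanilla - 3.25, "
        "Large Flavoured iced latte - Hazelnut - 3.25",
        "Large Flavoured iced latte - Hazelnut - 3.25, Regular Latte - 2.15, "
        "Large Flavoured iced latte - Vanilla - 3.25",
    ],
    "total_price": [8.95, 8.65],
})

regex = r'(?P<designation>(?!\s)[^,]*[^\s,]+)\s*-\s*(?P<price>\d+(?:\.\d+)?)'
items = df["basket_items"].str.extractall(regex)

# The captured groups are strings, so convert the prices to floats.
items["price"] = items["price"].astype(float)

# Index level 0 is the original row label, so this sums the items per basket.
basket_totals = items.groupby(level=0)["price"].sum().round(2)

print(basket_totals)
```

Because the first index level of the `extractall` result is the original row label, `groupby(level=0)` regroups the items by transaction, and `basket_totals` lines up with `df["total_price"]` row by row.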

For the unique values:

regex = r'(?P<designation>(?!\s)[^,]*[^\s,]+)\s*-\s*(?P<price>\d+(?:\.\d+)?)'
(df['basket_items'].str.extractall(regex)
 .drop_duplicates(['designation'])
 .reset_index(drop=True)
)

output:

                             designation price
0                       Large Flat white  2.45
1   Large Flavoured iced latte - Vanilla  3.25
2  Large Flavoured iced latte - Hazelnut  3.25
3                          Regular Latte  2.15
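
Since the question mentions a dictionary, one possible way to turn this table of unique products into a name-to-price lookup is sketched below. The sample series and the name `price_by_product` are assumptions for illustration. The sketch also assumes each product has a single price: `drop_duplicates` keeps only the first price seen for a given name.

```python
import pandas as pd

# Sample basket strings taken from the two rows shown in the question.
baskets = pd.Series([
    "Large Flat white - 2.45, Large Flavoured iced latte - Vanilla - 3.25, "
    "Large Flavoured iced latte - Hazelnut - 3.25",
    "Large Flavoured iced latte - Hazelnut - 3.25, Regular Latte - 2.15, "
    "Large Flavoured iced latte - Vanilla - 3.25",
])

regex = r'(?P<designation>(?!\s)[^,]*[^\s,]+)\s*-\s*(?P<price>\d+(?:\.\d+)?)'
unique_items = (
    baskets.str.extractall(regex)
    .drop_duplicates(["designation"])
    .reset_index(drop=True)
)

# Map each product name to its price as a float.
price_by_product = (
    unique_items.set_index("designation")["price"].astype(float).to_dict()
)

print(price_by_product["Regular Latte"])
```

A lookup such as `price_by_product["Large Flat white"]` then returns the price for that product name.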

regex demo
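
For comparison, a regex-free sketch in plain Python is also possible. It is not part of the answer above. It assumes that product names never contain a comma and that every item ends with " - " followed by the price. The helper name `parse_basket` is hypothetical.

```python
def parse_basket(basket: str) -> list[tuple[str, float]]:
    """Split one basket_items string into (product name, price) pairs."""
    pairs = []
    for item in basket.split(","):
        # The price follows the last " - "; any earlier " - " belongs to the
        # product name (for example the flavour).
        name, price = item.strip().rsplit(" - ", 1)
        pairs.append((name, float(price)))
    return pairs


basket = (
    "Large Flavoured iced latte - Hazelnut - 3.25, Regular Latte - 2.15, "
    "Large Flavoured iced latte - Vanilla - 3.25"
)
parsed = parse_basket(basket)

for name, price in parsed:
    print(name, price)
```

Splitting from the right with `rsplit(" - ", 1)` is what keeps "Large Flavoured iced latte - Hazelnut" together as one name. The function could be applied to every row with `df["basket_items"].apply(parse_basket)`.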
