Remove all characters and numbers except commas

Posted on 2025-01-17 14:40:32 | Words: 411 | Views: 1 | Comments: 0

I am trying to remove all characters from the strings in a DataFrame column except the commas, but my code still removes everything, including the commas.

I know this question has been asked before, but I tried many of the answers and they all remove the commas as well.

df[new_text_field_name] = df[new_text_field_name].apply(lambda elem: re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)|^rt|http.+?", "", str(elem)))

Sample text:

'100 % polyester, Paperboard (min. 30% recycled), 100% polypropylene',

Required output:

' polyester, Paperboard , polypropylene',
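
A minimal reproduction of the problem, without pandas: the second alternative in the pattern, `[^0-9A-Za-z \t]`, matches every character that is not a letter, digit, space or tab, and a comma is one of those. The names `ORIGINAL` and `WITH_COMMA` are made up for this sketch, and adding `,` to the negated class is only one possible adjustment; it keeps the commas but does not by itself produce the required output.

```python
import re

sample = "100 % polyester, Paperboard (min. 30% recycled), 100% polypropylene"

# Pattern from the question, unchanged.
ORIGINAL = r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)|^rt|http.+?"
# Hypothetical variant: the comma is added to the negated class so it survives.
WITH_COMMA = r"(@[A-Za-z0-9]+)|([^0-9A-Za-z, \t])|(\w+:\/\/\S+)|^rt|http.+?"

without_commas = re.sub(ORIGINAL, "", sample)
with_commas = re.sub(WITH_COMMA, "", sample)

print(repr(without_commas))  # commas are gone
print(repr(with_commas))     # commas are kept, digits are still present
```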


Comments (2)

叫思念不要吵 2025-01-24 14:40:33

A possible solution is the following:

# pip install pandas

import re

import pandas as pd
pd.set_option('display.max_colwidth', 200)

# set test data and create dataframe
data = {"text": ['100 % polyester, Paperboard (min. 30% recycled), 100% polypropylene','Polypropylene plastic', '100 % polyester, Paperboard (min. 30% recycled), 100% polypropylene', 'Bamboo, Clear nitrocellulose lacquer', 'Willow, Stain, Solid wood, Polypropylene plastic, Stainless steel, Steel, Galvanized, Steel, 100% polypropylene', 'Banana fibres, Clear lacquer', 'Polypropylene plastic (min. 20% recycled)']}
df = pd.DataFrame(data)

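def cleanup(txt):
    # Drop everything except letters, commas, spaces and parentheses (case-insensitive).
    re_pattern = re.compile(r"[^a-z, ()]", re.I)
    # Collapse the double spaces left behind by removed characters, then trim the ends.
    return re.sub(re_pattern, "", txt).replace("  ", " ").strip()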

df['text_cleaned'] = df['text'].apply(cleanup)
df
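
Note that the pattern above keeps parentheses, so the first sample comes out as 'polyester, Paperboard (min recycled), polypropylene', while the required output in the question drops the parenthesised part. A rough sketch of a variant that removes it as well is below. The function name `keep_letters_and_commas` and the two pattern names are hypothetical, and the whitespace is normalised to a single space after each comma, which differs slightly from the spacing shown in the question.

```python
import re

# Anything between an opening and the next closing parenthesis.
PARENTHESISED = re.compile(r"\([^)]*\)")
# Anything that is not an ASCII letter, a comma or a space.
NOT_ALLOWED = re.compile(r"[^A-Za-z, ]")


def keep_letters_and_commas(text):
    """Strip parenthesised text, digits and symbols; keep letters and commas."""
    text = PARENTHESISED.sub("", str(text))
    text = NOT_ALLOWED.sub("", text)
    # Tidy each comma-separated part: collapse inner whitespace, drop empty parts.
    parts = (" ".join(part.split()) for part in text.split(","))
    return ", ".join(part for part in parts if part)


sample = "100 % polyester, Paperboard (min. 30% recycled), 100% polypropylene"
print(keep_letters_and_commas(sample))
```

With pandas this could be applied the same way as `cleanup`, for example `df['text'].apply(keep_letters_and_commas)`.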


街角卖回忆 2025-01-24 14:40:33

The Character.isDigit() and Character.isLetter() functions can be used to check whether a character is a digit or a letter. (These are Java methods; the Python counterparts are str.isdigit() and str.isalpha().)
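
In Python, the same character-by-character idea could look roughly like the sketch below, using `str.isalpha()` instead of a regular expression. The function name `filter_chars` is made up for illustration, and keeping spaces alongside commas is an assumption based on the sample text. Note that `str.isalpha()` is also true for non-ASCII letters, so this is not an exact equivalent of an `[A-Za-z]` character class.

```python
def filter_chars(text):
    """Keep letters, commas and spaces; drop digits and every other symbol."""
    return "".join(ch for ch in str(text) if ch.isalpha() or ch in ", ")


print(repr(filter_chars("100% polypropylene, Steel")))
```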
