How can I fix this re.sub problem?

Posted on 2025-01-26 12:04:51

I get this error: TypeError: sub() missing 1 required positional argument: 'string'

def preprocess_text(sentence):
    #Remove punctuations and numbers
    sentence = re.sub('[^a-zA-Z]', ' ', sentence)

    #Single character removal
    sentence = re.sub(r"\s+[a-zA-Z]\s+", ' ', sentence)
    
    #Removing multiple spaces
    #sentence = re.sub(r'\s+'+ ',', sentence)
    sentence = re.sub(r'\s+',' ',sentence)

    return sentence

TAG_RE = re.compile(r'<[^>]+>')

def remove_tags(text):
    return TAG_RE.sub('', text)

X = []
sentences = list(product_reviews['Görüş'])
for sentence in sentences :
    X.append(preprocess_text(sentence))

X[81] 

Comments (2)

—━☆沉默づ 2025-02-02 12:04:51

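After the comment #Removing multiple spaces there is a + where there should be a comma between the regex pattern and the replacement string. With the +, the two string literals are concatenated into a single argument, so re.sub() never receives its third argument, 'string'.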

sentence = re.sub(r'\s+'+' ',sentence)

should be

sentence = re.sub(r'\s+',' ',sentence)
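
To make the difference concrete, here is a minimal sketch that runs both calls side by side. The sample text and the variable names `error_message` and `cleaned` are made up for illustration; only the two `re.sub` calls come from the answer above.

```python
import re

# Hypothetical sample input, not taken from the question's data
sentence = "too    many   spaces"

# Buggy call: '+' joins the two literals into ONE pattern string (r'\s+ '),
# so re.sub gets (pattern, repl) and the 'string' argument is missing.
try:
    re.sub(r'\s+' + ' ', sentence)
except TypeError as exc:
    error_message = str(exc)

# Fixed call: the comma keeps pattern, replacement and input as three arguments.
cleaned = re.sub(r'\s+', ' ', sentence)

print(error_message)
print(cleaned)
```

The first print should show the same "missing 1 required positional argument" message as in the question, and the second shows the sentence with each run of whitespace collapsed to a single space.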

淡紫姑娘! 2025-02-02 12:04:51

Your code is missing the initialisation of certain variables. I have slightly modified it so that it runs on its own.

import re

def preprocess_text(sentence):
    # Replace punctuation and digits (anything that is not an ASCII letter) with a space
    sentence = re.sub('[^a-zA-Z]', ' ', sentence)

    # Remove single letters that are surrounded by whitespace
    sentence = re.sub(r"\s+[a-zA-Z]\s+", ' ', sentence)

    # Collapse runs of whitespace into a single space
    # sentence = re.sub(r'\s+'+ ',', sentence)  # earlier attempt: passes only two arguments
    sentence = re.sub(r'\s+', ' ', sentence)
    return sentence

TAG_RE = re.compile(r'<[^>]+>')

def remove_tags(text):
    # Strip HTML-like tags (defined here but not called below)
    return TAG_RE.sub('', text)

X = []
# Hard-coded sample input so the script runs without product_reviews
sentences = ['test1 a &_€', 'test   2']
# sentences = list(product_reviews['Görüş'])
for sentence in sentences:
    X.append(preprocess_text(sentence))
print(X[0])
print(X[1])

output

test
test
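
The script above defines remove_tags but never calls it, and each result ends with a trailing space. As a rough sketch of one possible arrangement (not the original poster's code), the variant below calls remove_tags first and strips the result. The `sample_reviews` list is hypothetical, and both the placement of the remove_tags call and the final `strip()` are assumptions added for illustration.

```python
import re

TAG_RE = re.compile(r'<[^>]+>')

def remove_tags(text):
    # Drop anything that looks like an HTML tag, e.g. <b> or </b>
    return TAG_RE.sub('', text)

def preprocess_text(sentence):
    # Assumption: tags are removed before the other clean-up steps
    sentence = remove_tags(sentence)
    # Replace everything that is not an ASCII letter with a space
    sentence = re.sub('[^a-zA-Z]', ' ', sentence)
    # Remove single letters surrounded by whitespace
    sentence = re.sub(r"\s+[a-zA-Z]\s+", ' ', sentence)
    # Collapse runs of whitespace into a single space
    sentence = re.sub(r'\s+', ' ', sentence)
    # Assumption: leading/trailing spaces are not wanted in the result
    return sentence.strip()

# Hypothetical sample data standing in for product_reviews['Görüş']
sample_reviews = ['<b>test1</b> a &_€', 'test   2']
X = [preprocess_text(review) for review in sample_reviews]
print(X)
```

Whether the tag removal belongs inside preprocess_text or in a separate step depends on the data, so treat this as one option rather than the intended design.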