Is there a way to fix this re.sub problem?

Posted 2025-01-26 12:04:51

def preprocess_text(sentence):
    #Remove punctuations and numbers
    sentence = re.sub('[^a-zA-Z]', ' ', sentence)

    #Single character removal
    sentence = re.sub(r"\s+[a-zA-Z]\s+", ' ', sentence)
    
    #Removing multiple spaces
    #sentence = re.sub(r'\s+'+ ',', sentence)
    sentence = re.sub(r'\s+',' ',sentence)

    return sentence

TAG_RE = re.compile(r'<[^>]+>')

def remove_tags(text):
    return TAG_RE.sub('', text)

X = []
sentences = list(product_reviews['Görüş'])
for sentence in sentences :
    X.append(preprocess_text(sentence))

X[81]

sub() missing 1 required positional argument: 'string'

—━☆沉默づ 2025-02-02 12:04:51

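After the comment #Removing multiple spaces there is a + where there should be a comma between the regex pattern and the replacement string. The + concatenates the two strings into a single pattern, so re.sub receives only two arguments and reports that 'string' is missing.

sentence = re.sub(r'\s+'+' ',sentence)

should be

sentence = re.sub(r'\s+',' ',sentence)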
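
The difference between the two calls can be reproduced in isolation. This is a minimal sketch, assuming Python 3; the sample string and the variable names `error_message` and `fixed` are made up for illustration and do not come from the question.

```python
import re

# Hypothetical sample input, not taken from the question's data.
sentence = "too    many   spaces"

# With '+', the pattern r'\s+' and the replacement ' ' are concatenated into
# ONE string, so re.sub gets only (pattern, repl) and no 'string' argument.
try:
    re.sub(r'\s+' + ' ', sentence)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# With ',', the three arguments are pattern, replacement, and input string.
fixed = re.sub(r'\s+', ' ', sentence)
print(fixed)
```

The first print shows the same message as in the question; the second shows the sentence with each run of whitespace collapsed to a single space.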


淡紫姑娘！ 2025-02-02 12:04:51

Your code is missing the initialisation of certain variables. I have slightly modified it so that it runs on its own.

import re

def preprocess_text(sentence):
    #Remove punctuations and numbers
    sentence = re.sub('[^a-zA-Z]', ' ', sentence)

    #Single character removal
    sentence = re.sub(r"\s+[a-zA-Z]\s+", ' ', sentence)

    #Removing multiple spaces
    #sentence = re.sub(r'\s+'+ ',', sentence)
    sentence = re.sub(r'\s+',' ',sentence)

    return sentence

TAG_RE = re.compile(r'<[^>]+>')

def remove_tags(text):
    return TAG_RE.sub('', text)

X = []
sentences=['test1 a &_€','test   2']
#sentences = list(product_reviews['Görüş'])
for sentence in sentences :
    X.append(preprocess_text(sentence))

print(X[0])
print(X[1])

Output

test
test
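
To see why both sample inputs reduce to test, the three substitutions can be traced one at a time. This is only a sketch: `trace_steps` is a hypothetical helper introduced here for illustration, not part of the answer's code, and it simply repeats the same three re.sub calls while keeping each intermediate result.

```python
import re

def trace_steps(sentence):
    """Return the string after each of the three substitutions (hypothetical helper)."""
    step1 = re.sub('[^a-zA-Z]', ' ', sentence)      # every non-letter becomes a space
    step2 = re.sub(r"\s+[a-zA-Z]\s+", ' ', step1)   # a lone letter between whitespace is dropped
    step3 = re.sub(r'\s+', ' ', step2)              # runs of whitespace collapse to one space
    return step1, step2, step3

for text in ['test1 a &_€', 'test   2']:
    for stage in trace_steps(text):
        # repr() makes leading and trailing spaces visible
        print(repr(stage))
```

Printing with repr shows that each result still ends in a single trailing space, which the plain print output hides; calling .strip() on the returned string would remove it if that matters.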
