How to remove all emojis (including the mobile phone emoji) with Python 3?
Right now I have a client string that contains the emoji "📲" (Mobile Phone With Arrow). I want to remove it in my text preprocessing step so that I can pass the text to my NLP model. I tried:
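import re

def remove_emojis(text: str) -> str:
    emojis = re.compile("["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
        u"\U00002500-\U00002BEF"  # box drawing through misc. symbols and arrows (not Chinese characters)
        u"\U00002702-\U000027B0"  # dingbats
        u"\U000024C2-\U0001F251"  # note: very broad, this range also spans CJK ideographs
        u"\U0001f926-\U0001f937"  # face palm through shrug
        u"\U00010000-\U0010ffff"  # note: every supplementary-plane character
        u"\u2640-\u2642"  # female sign through male sign
        u"\u2600-\u2B55"  # miscellaneous symbols through heavy large circle
        u"\u200d"  # zero width joiner
        u"\u23cf"  # eject symbol
        u"\u23e9"  # fast-forward symbol
        u"\u231a"  # watch
        u"\ufe0f"  # variation selector-16
        u"\u3030"  # wavy dash
        "]+", flags=re.UNICODE)
    return emojis.sub(r'', text)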
But it is not working. I think it does not cover this emoji. Is there any way I can remove emojis like this one?
PS: Keeping only English characters does not work in my case, because the client string is not in English.
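For anyone landing here, a minimal sketch of one possible approach, not a definitive fix. As far as I can tell, "📲" is U+1F4F2, which already falls inside the U+1F300 to U+1F5FF range in the pattern above, so it is worth checking which code points the string really contains. The names `EMOJI_PATTERN`, `strip_emojis` and `describe` are made up for illustration. The range list is an assumption and is not exhaustive. It is deliberately narrower than the pattern in the question, because ranges such as U+24C2 to U+1F251 also cover CJK ideographs and would strip non-English text.

```python
import re
import unicodedata

# Assumed, non-exhaustive set of emoji-related ranges; adjust for your data.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F1E6-\U0001F1FF"  # regional indicator symbols (flags)
    "\U0001F300-\U0001F5FF"  # symbols & pictographs (includes U+1F4F2)
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F900-\U0001F9FF"  # supplemental symbols & pictographs
    "\U0001FA70-\U0001FAFF"  # symbols & pictographs extended-A
    "\u2600-\u26FF"          # miscellaneous symbols
    "\u2700-\u27BF"          # dingbats
    "\u200d"                 # zero width joiner
    "\ufe0f"                 # variation selector-16
    "]+"
)


def strip_emojis(text: str) -> str:
    """Remove characters matched by EMOJI_PATTERN and leave everything else."""
    return EMOJI_PATTERN.sub("", text)


def describe(text: str) -> list:
    """List each character's code point and Unicode name, for debugging."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in text]


if __name__ == "__main__":
    sample = "下载 📲 应用"
    print(describe("📲"))        # shows which code point the emoji really is
    print(strip_emojis(sample))  # the Chinese text is kept, the emoji is dropped
```

Running `describe` on the raw client string first should show whether the emoji actually reaches the function as U+1F4F2 or as something else (for example, already-mangled bytes), which would explain why no regex matches it.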