
Python replace text-files whitespace special-characters

How do I replace all these special characters with spaces in Python?

Posted 2024-12-26 01:51:17

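How do I replace all these special characters with spaces in Python?

I have a list of company names...

For example: [myfiles.txt]

MY company.INC
Old Wine pvt
master-minds ltd
"apex-labs ltd"
"India-New corp"
Indo-American pvt/ltd

Given the example above, I need every special character [-, ", /, .] in the file myfiles.txt replaced with a single space, and the result saved to another text file, myfiles1.txt.

Can anyone help me?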


Comments (5)

蓝色星空 2025-01-02 01:51:17

Assuming you mean to replace everything that is not alphanumeric, you can do this on the command line:

sed 's/[^A-Za-z0-9]/ /g' foo.txt > bar.txt

Or in Python with the re module:

import re
# Read the whole input file
with open('foo.txt') as infile:
    original_string = infile.read()
# Replace every character that is not a letter, a digit or a newline with a space
new_string = re.sub(r'[^a-zA-Z0-9\n]', ' ', original_string)
with open('bar.txt', 'w') as outfile:
    outfile.write(new_string)
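
For the exact task in the question (only `-`, `"`, `/` and `.` replaced, result written to a second file), a minimal sketch might look like the following. The names `SPECIALS`, `replace_specials` and `convert_file` are hypothetical, and collapsing a run of adjacent special characters into one space is an assumption about what "a single space" means.

```python
import re

# Characters named in the question. A hyphen placed first in a character
# class is literal, and a dot inside a class needs no escaping.
# The trailing + collapses a run of special characters into one space (assumption).
SPECIALS = re.compile(r'[-"/.]+')

def replace_specials(text):
    """Replace each run of special characters with a single space."""
    return SPECIALS.sub(' ', text)

def convert_file(src, dst, encoding='utf-8'):
    """Read src line by line and write the cleaned lines to dst."""
    with open(src, encoding=encoding) as fin, open(dst, 'w', encoding=encoding) as fout:
        for line in fin:
            fout.write(replace_specials(line))

print(replace_specials('Indo-American pvt/ltd'))  # Indo American pvt ltd
```

Calling `convert_file('myfiles.txt', 'myfiles1.txt')` would then produce the second file. Newlines are not in the character class, so the line structure is preserved.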


江城子 2025-01-02 01:51:17

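specials = '-"/.'  # add any other characters to replace
# Python 3: str.maketrans replaces the Python 2 function string.maketrans
trans = str.maketrans(specials, ' ' * len(specials))
with open('myfiles.txt') as infile:
    for line in infile:
        cleanline = line.translate(trans)  # special characters become spaces

e.g.

>>> line = "Indo-American pvt/ltd"
>>> line.translate(trans)
'Indo American pvt ltd'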
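
As a variation on the translate approach, `str.maketrans` in Python 3 also accepts a dict, which avoids having to build a replacement string of matching length. The sketch below is illustrative only; `SPECIALS`, `TABLE` and `clean_lines` are hypothetical names.

```python
# Map each special character to a space; str.maketrans accepts a dict
# whose keys are single characters.
SPECIALS = '-"/.'
TABLE = str.maketrans({ch: ' ' for ch in SPECIALS})

def clean_lines(lines):
    """Yield each line with the special characters replaced by spaces."""
    for line in lines:
        yield line.translate(TABLE)

sample = ['MY company.INC\n', 'Indo-American pvt/ltd\n']
print(list(clean_lines(sample)))
# ['MY company INC\n', 'Indo American pvt ltd\n']
```

Because a file object iterates over its lines, the generator can be fed an open input file directly, for example `fout.writelines(clean_lines(fin))`.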


奈何桥上唱咆哮 2025-01-02 01:51:17

import re
strs = "how much for the maple syrup? $20.99? That's ridiculous!!!"
strs = re.sub(r'[?$.!]', '', strs)  # remove specific special characters
strs = re.sub(r'[^a-zA-Z0-9 ]', '', strs)  # remove everything except letters, digits and spaces
strs = re.sub(r'[0-9]', '', strs)  # remove digits
strs = re.sub(r' {2,}', ' ', strs)  # collapse runs of spaces into one
print(strs)

Output: how much for the maple syrup Thats ridiculous


苏大泽ㄣ 2025-01-02 01:51:17

While maketrans is the fastest way to do it, I never remember the syntax. Since speed is rarely an issue and I know regular expressions, I tend to do this:

>>> line = "-[myfiles.txt] MY company.INC"
>>> import re
>>> re.sub(r'[^a-zA-Z0-9]', ' ', line)
'  myfiles txt  MY company INC'

This has the additional benefit of declaring the characters you accept instead of the ones you reject, which feels easier in this case.

Of course, if you are using non-ASCII characters, you'll have to go back to removing the characters you reject. If there are only punctuation signs, you can do:

>>> import string
>>> chars = re.escape(string.punctuation)
>>> re.sub(r'[' + chars + ']', ' ', line)
'  myfiles txt  MY company INC'

But you'll notice
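
For text that contains non-ASCII letters, one alternative in Python 3 that keeps the whitelist style is the `\w` class, which matches Unicode letters and digits (plus the underscore) when the pattern is a `str`. The sketch below is an illustration under that assumption; `NON_WORD` and `keep_word_chars` are hypothetical names, and the sample strings are made up.

```python
import re

# For str patterns in Python 3, \w matches Unicode letters and digits as
# well as the underscore. The extra "|_" alternative makes the underscore
# count as a special character too.
NON_WORD = re.compile(r'[^\w\s]|_')

def keep_word_chars(text):
    """Replace everything except letters, digits and whitespace with a space."""
    return NON_WORD.sub(' ', text)

print(keep_word_chars('Straße-GmbH'))
print(keep_word_chars('老酒.pvt'))
```

Unlike `[^a-zA-Z0-9]`, this leaves existing whitespace (including newlines) untouched and does not turn accented or CJK letters into spaces.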


好久不见√ 2025-01-02 01:51:17

At first I thought of providing a string.maketrans/translate example, but maybe you are using UTF-8 encoded strings, and the ord()-sorted translation table will blow up in your face, so I thought of another solution:

conversion = '-"/.'  # characters to replace with a space
with open('myfiles.txt') as f:
    text = f.read()
# Build the result in one pass instead of repeated string concatenation
newtext = ''.join(' ' if c in conversion else c for c in text)

It's not the fastest way, but it is easy to grasp and modify.

So if your text is non-ASCII, you could decode conversion and the text strings to unicode and afterwards re-encode them in whichever encoding you want.
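
The decode, replace, re-encode flow described above could be sketched in Python 3 roughly as follows. The function names, the default encodings and the sample bytes are assumptions made for illustration, not part of the original answer.

```python
def replace_chars(text, conversion='-"/.'):
    """Return text with every character found in conversion replaced by a space."""
    return ''.join(' ' if c in conversion else c for c in text)

def convert_bytes(raw, src_encoding='utf-8', dst_encoding='utf-8'):
    """Decode raw bytes, replace the characters, and re-encode the result."""
    text = raw.decode(src_encoding)      # bytes -> str (Unicode)
    cleaned = replace_chars(text)        # work on characters, not bytes
    return cleaned.encode(dst_encoding)  # str -> bytes in the target encoding

# Hypothetical input: Latin-1 bytes converted to UTF-8 on the way out.
raw = 'Caf\u00e9-Bar s.a.'.encode('latin-1')
print(convert_bytes(raw, 'latin-1', 'utf-8').decode('utf-8'))
```

Working on decoded `str` objects means each replacement acts on a whole character, so multi-byte sequences are never split.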
