How do I remove all non-alphanumeric characters from a string in C++?
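I'm writing a piece of software that requires me to handle data fetched from a web page with libcurl. When I get the data, for some reason it contains extra line breaks. I need a way to allow only letters, numbers, and spaces, and to remove everything else, including line breaks. Is there an easy way to do this? Thanks.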
Comments (12)
Write a function that takes a `char` and returns `true` if you want to remove that character or `false` if you want to keep it. Then use the `std::remove_if` algorithm to remove the unwanted characters from the string.
Depending on your requirements, you may be able to use one of the Standard Library predicates, like `std::isalnum`, instead of writing your own predicate (you said you needed to match alphanumeric characters and spaces, so perhaps this doesn't exactly fit what you need).
If you want to use the Standard Library `std::isalnum` function, you will need a cast to disambiguate between the `std::isalnum` function in the C Standard Library header `<cctype>` (which is the one you want to use) and the `std::isalnum` in the C++ Standard Library header `<locale>` (which is not the one you want to use, unless you want to perform locale-specific string processing).
This works equally well with any of the sequence containers (including `std::string`, `std::vector` and `std::deque`). This idiom is commonly referred to as the "erase/remove" idiom. The `std::remove_if` algorithm will also work with ordinary arrays. `std::remove_if` makes only a single pass over the sequence, so it has linear time complexity.
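The erase/remove idiom described above can be sketched roughly as follows. This is a minimal illustration, not the answer's original code: the names `is_unwanted` and `keep_alnum_and_spaces` are made up for this example, and the predicate keeps letters, digits, and spaces, as the question asks.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical predicate: returns true for characters that should be removed.
// The cast to unsigned char matters because passing a negative char value
// to std::isalnum is undefined behavior.
bool is_unwanted(char c) {
    const unsigned char uc = static_cast<unsigned char>(c);
    return !(std::isalnum(uc) || c == ' ');
}

// Erase/remove idiom: remove_if moves the kept characters to the front and
// returns the new logical end; erase then trims the leftover tail.
std::string keep_alnum_and_spaces(std::string s) {
    s.erase(std::remove_if(s.begin(), s.end(), is_unwanted), s.end());
    return s;
}
```

Calling `keep_alnum_and_spaces` on text that contains punctuation and line breaks returns a copy with only letters, digits, and spaces left.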
Previous uses of `std::isalnum` won't compile with `std::ptr_fun` without passing the unary argument it requires, hence this solution with a lambda function should encapsulate the correct answer.
If you're using `string`, you could always loop through and just `erase` all non-alphanumeric characters. Someone better with the Standard Library can probably do this without a loop.
If you're using just a `char` buffer, you can loop through and, if a character is not alphanumeric, shift all the characters after it back by one (to overwrite the offending character).
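Both loops could be written along these lines. This is a sketch with illustrative function names, not the answer's original code. Note that each removal moves the rest of the text, so the worst case is quadratic in the length of the input.

```cpp
#include <cctype>
#include <cstring>
#include <string>

// std::string version: erase offending characters one at a time.
void erase_non_alnum_string(std::string& s) {
    for (std::string::size_type i = 0; i < s.size(); ) {
        if (std::isalnum(static_cast<unsigned char>(s[i]))) {
            ++i;            // keep this character and move on
        } else {
            s.erase(i, 1);  // remove it; the next character slides into position i
        }
    }
}

// char buffer version: shift the tail (including the terminator) left by one.
void erase_non_alnum_buffer(char* buf) {
    for (char* p = buf; *p != '\0'; ) {
        if (std::isalnum(static_cast<unsigned char>(*p))) {
            ++p;
        } else {
            std::memmove(p, p + 1, std::strlen(p + 1) + 1);
        }
    }
}
```

`std::memmove` is used rather than `std::memcpy` because the source and destination ranges overlap.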
Just extending James McNellis's code a little bit more. His function deletes alnum characters instead of non-alnum ones.
To delete non-alnum characters from a string (alnum = alphabetic or numeric), declare a predicate function that returns true for characters that are not alnum (`isalnum` returns 0 if the passed char is not alnum), then use it to erase the matching characters. After that, your string contains only alnum characters.
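The difference between the two predicate senses can be illustrated as below. All four function names are made up for this sketch. `remove_alnum` shows the inverted behavior being pointed out, and `remove_non_alnum` shows the intended one.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// True for letters and digits.
bool is_alnum(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

// True for everything else: std::isalnum returns 0 for non-alnum input.
bool is_not_alnum(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) == 0;
}

// Inverted sense: removes letters and digits and keeps the rest.
std::string remove_alnum(std::string s) {
    s.erase(std::remove_if(s.begin(), s.end(), is_alnum), s.end());
    return s;
}

// Intended sense: removes everything that is not a letter or digit.
std::string remove_non_alnum(std::string s) {
    s.erase(std::remove_if(s.begin(), s.end(), is_not_alnum), s.end());
    return s;
}
```

The predicate passed to `std::remove_if` answers "should this be removed?", so it has to return true for the characters you want gone.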
Benchmarking the different methods.
If you are looking for a benchmark, I made one.
NB: the selected answer had to be modified, as it was keeping only the special characters.
NB2: The test file is an (almost) 8192 kb text file built from roughly 62 alnum and 12 special characters, written randomly and evenly.
Benchmark source code
My solution
For the bitwise method, you can check it directly on my GitHub. Basically, I avoid branching instructions (if) thanks to the mask. I avoid posting bitwise operations under the C++ tag; I get a lot of hate for it.
For the C-style one, I iterate over the string with two indexes: `n` for the characters we keep and `i` to go through the string, testing each character in turn to see whether it is a digit, an uppercase letter, or a lowercase letter. Add a function that does this and call it on the string.
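The two-index loop described above might look roughly like this. It is a reconstruction from the description, not the benchmark's actual code. The function name is hypothetical, and the range checks assume an ASCII-compatible character set.

```cpp
#include <cstddef>
#include <string>

// Compacts the string in place: i scans every character, n marks where the
// next kept character is written. Assumes ASCII-style contiguous ranges.
void strip_non_alnum_c_style(std::string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char c = s[i];
        const bool keep = (c >= '0' && c <= '9') ||
                          (c >= 'A' && c <= 'Z') ||
                          (c >= 'a' && c <= 'z');
        if (keep) {
            s[n++] = c;
        }
    }
    s.resize(n);  // drop the leftover tail
}
```

Since `n` never exceeds `i`, each write lands on a position that has already been read, so the compaction is safe to do in place in a single pass.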
The `remove_copy_if` standard algorithm would be very appropriate for your case.
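A minimal sketch of that suggestion, assuming the goal is to keep letters, digits, and spaces as in the question. The function name `copy_alnum_and_spaces` is illustrative. Unlike the erase/remove idiom, `std::remove_copy_if` leaves the input untouched and writes the kept characters to a separate output.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Copies every character except those matched by the predicate into `out`.
std::string copy_alnum_and_spaces(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    std::remove_copy_if(in.begin(), in.end(), std::back_inserter(out),
                        [](unsigned char c) {
                            return !(std::isalnum(c) || c == ' ');
                        });
    return out;
}
```

This fits the libcurl scenario when the original buffer should stay intact and only a cleaned copy is needed.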
Results in:
You use `isalnum` to determine whether or not each character is alphanumeric, then use `ptr_fun` to pass the function to `not1`, which NOTs the returned value, leaving you with only the alphanumeric stuff you want.
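`std::ptr_fun` and `std::not1` were deprecated in C++11 and removed in C++17, so the sketch below swaps in a small hand-written functor to show what that composition computes. `NegatedCtype` and `keep_only_alnum` are hypothetical names, and this is not the answer's original code.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical stand-in for not1(ptr_fun(f)): wraps a <cctype>-style
// function and returns the logical negation of its result.
struct NegatedCtype {
    int (*fn)(int);
    bool operator()(char c) const {
        return fn(static_cast<unsigned char>(c)) == 0;
    }
};

std::string keep_only_alnum(std::string s) {
    // The pointer type selects the int(int) function from <cctype>.
    int (*alnum)(int) = std::isalnum;
    s.erase(std::remove_if(s.begin(), s.end(), NegatedCtype{alnum}), s.end());
    return s;
}
```

With a C++17 compiler, `std::not_fn` or a plain lambda would be the more usual way to express the same negation.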
You can use the remove-erase algorithm this way -
The code below should work just fine for a given string `s`. It uses the `<algorithm>` and `<locale>` libraries.
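One way such code could look, using the two-argument `std::isalnum` from `<locale>`. This is a sketch, not the answer's original code. The function name and the default of `std::locale::classic()` are assumptions made here so the behavior does not depend on the global locale.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Removes characters that the given locale does not classify as alphanumeric.
std::string strip_non_alnum_locale(
        std::string s, const std::locale& loc = std::locale::classic()) {
    s.erase(std::remove_if(s.begin(), s.end(),
                           [&loc](char c) { return !std::isalnum(c, loc); }),
            s.end());
    return s;
}
```

Passing a different `std::locale` changes which characters count as alphanumeric, which is the locale-specific processing the `<locale>` overload exists for.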
The mentioned solution
is very nice, but unfortunately doesn't work with characters like 'Ñ' in Visual Studio (debug mode), because of this line:
in isctype.c
So, I would recommend something like this:
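The recommended code did not survive on this page. One common way to avoid the problem, offered here only as a possibility and not necessarily what the author had in mind, is to convert each character to `unsigned char` before calling `std::isalnum`. Where `char` is signed, a byte such as 'Ñ' in a single-byte encoding has a negative value, and passing a negative value to the `<cctype>` functions is undefined behavior. The names below are illustrative.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Converting to unsigned char keeps the argument in the range that
// std::isalnum is defined for (0..UCHAR_MAX).
bool is_not_alnum_safe(char c) {
    return !std::isalnum(static_cast<unsigned char>(c));
}

std::string strip_non_alnum_safe(std::string s) {
    s.erase(std::remove_if(s.begin(), s.end(), is_not_alnum_safe), s.end());
    return s;
}
```

In the default "C" locale, bytes outside the ASCII range are not classified as alphanumeric, so they are removed rather than triggering the debug check.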
The following works for me.