How do I eliminate ALL line breaks in a string?
I need to get rid of all line breaks that appear in my strings (which come from a database).
I do it using the code below:
value.Replace("\r\n", "").Replace("\n", "").Replace("\r", "")
I can see that at least one character that acts like a line ending survives this. Its char code is 8232.
I'm a bit embarrassed to admit it, but this is the first time I've come across this character. Obviously I can just replace it directly, but I'd rather extend my current approach (based on replacing combinations of "\r" and "\n") into something more robust, so that it covers not only the 8232 char but also any others I haven't run into yet.
Do you have a bullet-proof approach for this kind of problem?
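To make the problem concrete, here is a small sketch of why the chain above leaves char 8232 behind: 8232 is 0x2028, the Unicode LINE SEPARATOR, which is neither "\r" nor "\n". The class and method names are made up for illustration only.

```csharp
public static class LineBreakDemo
{
    // Same replacement chain as in the question: only CR and LF are handled.
    public static string StripCrLf(string value)
    {
        return value.Replace("\r\n", "").Replace("\n", "").Replace("\r", "");
    }

    // True if the string still contains U+2028 (decimal 8232).
    public static bool ContainsLineSeparator(string value)
    {
        return value.IndexOf('\u2028') >= 0;
    }
}
```

For the input "a\r\nb\u2028c", StripCrLf removes the CR+LF pair, but ContainsLineSeparator still reports true on the result.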
EDIT#1:
It seems to me that there are several possible solutions:
- use Regex.Replace
- remove every char for which IsSeparator or IsControl is true
- replace every char for which IsWhiteSpace is true with " "
- create a list of all possible line endings ("\r\n", "\r", "\n", LF, VT, FF, CR, CR+LF, NEL, LS, PS) and just replace each of them with an empty string. That's a lot of replaces.
I would say the best results will come from the 1st and 4th approaches, but I can't decide which will be faster. Which one do you think is the most complete?
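One way to combine the 1st and 4th ideas is to put the explicit list of terminators into a single regex character class, so the string is scanned once instead of once per Replace call. This is only a sketch: the class and method names are hypothetical, and the list covers just the terminators named above (LF, VT, FF, CR, NEL, LS, PS).

```csharp
using System.Text.RegularExpressions;

public static class LineEndingList
{
    // LF, VT, FF, CR, NEL (U+0085), LS (U+2028), PS (U+2029).
    // CR+LF needs no separate entry because both halves are in the class.
    private static readonly Regex LineEndings =
        new Regex(@"[\n\v\f\r\u0085\u2028\u2029]+", RegexOptions.Compiled);

    public static string RemoveListed(string value)
    {
        return LineEndings.Replace(value, string.Empty);
    }
}
```

Whether this is actually faster than a chain of Replace calls depends on the input and would have to be measured.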
EDIT#2
I posted an answer below.
Comments (12)
Below is the extension method that solved my problem. LineSeparator and ParagraphEnding can of course be defined somewhere else, as static values, etc.
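The code for this answer is not reproduced in the text, so the following is only a guess at what such an extension method might look like. LineSeparator and ParagraphEnding are the names mentioned above; the class name and the method name RemoveLineEndings are assumptions.

```csharp
public static class LineEndingExtensions
{
    // Unicode LINE SEPARATOR (8232) and PARAGRAPH SEPARATOR (8233).
    private const string LineSeparator = "\u2028";
    private const string ParagraphEnding = "\u2029";

    // Hypothetical reconstruction: extends the CR/LF chain from the question.
    public static string RemoveLineEndings(this string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return value;
        }

        return value.Replace("\r\n", string.Empty)
                    .Replace("\n", string.Empty)
                    .Replace("\r", string.Empty)
                    .Replace(LineSeparator, string.Empty)
                    .Replace(ParagraphEnding, string.Empty);
    }
}
```

With this in scope, value.RemoveLineEndings() can be used in place of the original chain of Replace calls.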
According to Wikipedia, there are numerous line terminators you may need to handle (including the one you mention).
8232 (0x2028) and 8233 (0x2029) are the only other ones you might want to eliminate. See the documentation for char.IsSeparator.
Props to Yossarian on this one; I think he's right. Replace all whitespace with a single space:
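The snippet for this answer is missing from the text. A minimal regex-based sketch of the idea, assuming that each run of whitespace should collapse into one space (the class and method names are hypothetical):

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceCollapser
{
    // \s matches any Unicode whitespace in .NET, including U+2028 and U+2029.
    public static string CollapseWhitespace(string value)
    {
        return Regex.Replace(value, @"\s+", " ");
    }
}
```

Note that this also touches ordinary spaces and tabs, which may or may not be what you want.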
I'd recommend removing ALL the whitespace (char.IsWhiteSpace) and replacing it with a single space. IsWhiteSpace takes care of all the weird Unicode whitespace characters.
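A sketch of this suggestion without regex, using char.IsWhiteSpace directly. It assumes that consecutive whitespace characters should become a single space; the class and method names are made up for illustration.

```csharp
using System.Text;

public static class WhitespaceNormalizer
{
    public static string NormalizeWhitespace(string value)
    {
        var sb = new StringBuilder(value.Length);
        bool previousWasWhitespace = false;

        foreach (char c in value)
        {
            if (char.IsWhiteSpace(c))
            {
                // Emit one space for the whole run of whitespace.
                if (!previousWasWhitespace)
                {
                    sb.Append(' ');
                }
                previousWasWhitespace = true;
            }
            else
            {
                sb.Append(c);
                previousWasWhitespace = false;
            }
        }

        return sb.ToString();
    }
}
```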
This is my first attempt at this, but I think it will do what you want...
Also, see this link for details on other methods you can use: Char Methods
Have you tried the following? It usually gets a lot of them for me.
string.Replace(Environment.NewLine, "")
Check out this link: http://msdn.microsoft.com/en-us/library/844skk0h.aspx
You will have to play around and build a regex that works for you. But here's the skeleton...
Assuming that 8232 is a Unicode code point, you can do this:
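The code for this answer is not shown in the text. One plausible form, given as a sketch with hypothetical names, relies on 8232 being 0x2028, so the character can be written with a \u escape:

```csharp
public static class LineSeparatorRemoval
{
    // 8232 decimal == 0x2028, the Unicode LINE SEPARATOR.
    public static string RemoveLineSeparator(string value)
    {
        return value.Replace("\u2028", string.Empty);
    }
}
```

This handles only that one character; "\r", "\n" and U+2029 still need their own replacements.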
Personally, I'd go with
If you have a string, say theString, then use the Replace method with the arguments shown below:
theString = theString.Replace(System.Environment.NewLine, "");
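One caveat worth illustrating: Environment.NewLine is only the current platform's newline sequence ("\r\n" on Windows, "\n" on Unix-like systems), so this call leaves other terminators such as char 8232 untouched. A small sketch, with hypothetical class and method names:

```csharp
using System;

public static class NewLineDemo
{
    // Removes only the platform's own newline sequence.
    public static string StripEnvironmentNewLine(string value)
    {
        return value.Replace(Environment.NewLine, "");
    }
}
```

Calling it on "a" + Environment.NewLine + "b\u2028c" removes the platform newline but keeps the U+2028 between "b" and "c".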
Here are some quick solutions with .NET regex:
- To remove any whitespace: s = Regex.Replace(s, @"\s+", ""); (\s matches any Unicode whitespace char)
- To remove any whitespace except CR and LF: s = Regex.Replace(s, @"[\s-[\r\n]]+", ""); ([\s-[\r\n]] is a character class containing a subtraction construct; it matches any whitespace but CR and LF)
- To remove only vertical whitespace, subtract \p{Zs} (any horizontal whitespace but tab) and \t (tab) from \s: s = Regex.Replace(s, @"[\s-[\p{Zs}\t]]+", "");
Wrapping the last one into an extension method:
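The extension method itself is missing from the text, so here is a minimal sketch of how the last pattern could be wrapped. The class name and the method name RemoveVerticalWhitespace are assumptions, not the answer's original code.

```csharp
using System.Text.RegularExpressions;

public static class VerticalWhitespaceExtensions
{
    // Any whitespace except space separators (\p{Zs}) and tab, i.e.
    // CR, LF, VT, FF, NEL, U+2028 and U+2029.
    private static readonly Regex VerticalWhitespace =
        new Regex(@"[\s-[\p{Zs}\t]]+", RegexOptions.Compiled);

    public static string RemoveVerticalWhitespace(this string value)
    {
        return string.IsNullOrEmpty(value)
            ? value
            : VerticalWhitespace.Replace(value, string.Empty);
    }
}
```

Unlike the plain \s+ version, this keeps ordinary spaces and tabs, so "a b\tc\r\nd" becomes "a b\tcd".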
See the regex demo.