How do I remove extra returns and spaces from a string with a regular expression?

Posted 2024-10-16 19:06:31, 46 words, 6 views, 0 comments

I convert HTML code to plain text, but the result contains many extra returns and spaces. How can I remove them?

Comments (4)

枉心 2024-10-23 19:06:31

To remove all whitespace:

string new_string = Regex.Replace(orig_string, @"\s", "");

To collapse each run of whitespace into a single space:

string new_string = Regex.Replace(orig_string, @"\s+", " ");
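
The difference between the two patterns is easiest to see on a concrete input. The following is a minimal sketch, assuming C# 9 or later (top-level statements); the sample string and the variable names origString, stripped, and collapsed are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input: text extracted from HTML, with stray whitespace.
string origString = "Hello   world\r\n\r\n  second\tline ";

// \s matches any single whitespace character (space, tab, CR, LF, ...),
// so replacing it with "" removes all whitespace.
string stripped = Regex.Replace(origString, @"\s", "");

// \s+ matches a whole run of whitespace, so each run becomes one space.
string collapsed = Regex.Replace(origString, @"\s+", " ");

Console.WriteLine(stripped);          // Helloworldsecondline
Console.WriteLine($"[{collapsed}]");  // [Hello world second line ]
```

Note that the second pattern also turns line breaks into spaces and leaves a single space where the input had trailing whitespace.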

咆哮 2024-10-23 19:06:31

I'm assuming that you want to

  • find two or more consecutive spaces and replace them with a single space, and
  • find two or more consecutive newlines and replace them with a single newline.

If that's correct, then you could use

resultString = Regex.Replace(subjectString, @"( |\r?\n)\1+", "$1");

This keeps the original "type" of whitespace intact and also preserves Windows line endings correctly. If you also want to "condense" multiple tabs into one, use

resultString = Regex.Replace(subjectString, @"( |\t|\r?\n)\1+", "$1");

To condense a run of newlines and spaces into a single newline, use

resultString = Regex.Replace(subjectString, @"(?:(?:\r?\n)+ +){2,}", "\n");

This pattern matches two or more repetitions of "one or more newlines followed by one or more spaces". The replacement is a regular string literal ("\n"), not a verbatim one: .NET does not process character escapes in replacement patterns, so @"\n" would insert a literal backslash followed by the letter n.
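
As a rough check of how these patterns behave, here is a small sketch, assuming C# 9 or later (top-level statements). The sample inputs and the variable names subject, collapsed, indented, and condensed are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input mixing repeated spaces and repeated line endings.
string subject = "a   b\r\n\r\n\r\nc\n\nd";

// The backreference \1 only matches repeats of whatever group 1 captured,
// so a run of spaces becomes one space and a run of "\r\n" becomes one "\r\n".
string collapsed = Regex.Replace(subject, @"( |\r?\n)\1+", "$1");

// Newline-and-space runs collapse to one newline. The replacement is a
// regular string literal so that \n is a real line feed character.
string indented = "a\n  \n\n   b";
string condensed = Regex.Replace(indented, @"(?:(?:\r?\n)+ +){2,}", "\n");

Console.WriteLine(collapsed == "a b\r\nc\nd"); // True
Console.WriteLine(condensed == "a\nb");        // True
```

Because the backreference must repeat the captured text exactly, a mixed run such as a space followed by a newline is not merged by the first pattern.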
相思故 2024-10-23 19:06:31

I have tried a lot of approaches for this. Loops all worked, but this one is clear and direct. Note that it removes every tab, space, and line feed, not just the extra ones.

// Define the characters you want to remove

char tb = (char)9;    // tab (ASCII 9)
char spc = (char)32;  // space (ASCII 32)
char nwln = (char)10; // line feed (ASCII 10)

// Strings are immutable: Replace returns a new string, so assign the result
yourstring = yourstring.Replace(tb.ToString(), "");
yourstring = yourstring.Replace(spc.ToString(), "");
yourstring = yourstring.Replace(nwln.ToString(), "");

// Defining the characters explicitly gave a better result.
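
A runnable variant of this character-by-character approach might look like the sketch below, assuming C# 9 or later (top-level statements). The sample input is hypothetical, and the carriage return ('\r') is an addition that the answer above does not include; without it, text with Windows line endings would keep its '\r' characters.

```csharp
using System;

// Hypothetical sample input with a tab, spaces, and a Windows line ending.
string yourString = "a\tb  c\r\nd";

// Characters to strip; '\r' is an assumption added for Windows line endings.
char[] unwanted = { '\t', ' ', '\n', '\r' };

foreach (char c in unwanted)
{
    // Replace returns a new string, so the result is assigned back.
    yourString = yourString.Replace(c.ToString(), "");
}

Console.WriteLine(yourString); // abcd
```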
感悟人生的甜 2024-10-23 19:06:31

You can use Trim() to remove spaces and returns at the start and end of the string; it does not touch whitespace in the middle. In HTML, extra whitespace is not significant, so you can drop the leading and trailing whitespace with the Trim() method of the System.String class.
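
Since Trim() only affects the ends of the string, it is often combined with a regex for the interior. The following is a minimal sketch of that combination, assuming C# 9 or later (top-level statements); the sample input and the names text, trimmed, and cleaned are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input with whitespace at both ends and in the middle.
string text = "\r\n   Hello   world \r\n\r\n";

// Trim() strips whitespace only at the start and end of the string.
string trimmed = text.Trim();

// Trimming first and then collapsing runs also cleans up the interior.
string cleaned = Regex.Replace(text.Trim(), @"\s+", " ");

Console.WriteLine($"[{trimmed}]"); // [Hello   world]
Console.WriteLine($"[{cleaned}]"); // [Hello world]
```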
