How to eliminate ALL line breaks in a string?

Posted on 2024-11-24 22:33:30

I need to get rid of all line breaks that appear in my strings (coming from a database).
I currently do it with the code below:

value.Replace("\r\n", "").Replace("\n", "").Replace("\r", "")

I can see that at least one character that acts like a line ending survives it. Its char code is 8232 (U+2028).

It's very lame of me, but I must say this is the first time I've had the pleasure of seeing this char. Obviously I can just replace this char directly, but I was thinking about extending my current approach (based on replacing combinations of "\r" and "\n") to something much more solid, so that it covers not only the '8232' char but also all the others I haven't found yet.

Do you have a bullet-proof approach for such a problem?

EDIT#1:

It seems to me that there are several possible solutions:

  1. use Regex.Replace
  2. remove every char for which char.IsSeparator or char.IsControl returns true
  3. replace every char for which char.IsWhiteSpace returns true with " "
  4. create a list of all possible line endings (LF, VT, FF, CR, CR+LF, NEL, LS, PS) and replace each of them with an empty string. That's a lot of replaces.

I would say that the 1st and 4th approaches will give the best results, but I cannot decide which will be faster. Which one do you think is the most complete?

EDIT#2

I posted an answer below.

Comments (12)

一口甜 2024-12-01 22:33:30

Below is the extension method that solved my problem. lineSeparator and paragraphSeparator can of course be defined somewhere else, for example as static values.

public static string RemoveLineEndings(this string value)
{
    if(String.IsNullOrEmpty(value))
    {
        return value;
    }
    string lineSeparator = ((char) 0x2028).ToString();
    string paragraphSeparator = ((char)0x2029).ToString();

    return value.Replace("\r\n", string.Empty)
                .Replace("\n", string.Empty)
                .Replace("\r", string.Empty)
                .Replace(lineSeparator, string.Empty)
                .Replace(paragraphSeparator, string.Empty);
}
腻橙味 2024-12-01 22:33:30

According to Wikipedia, there are numerous line terminators you may need to handle (including the one you mention).

LF: Line Feed, U+000A
VT: Vertical Tab, U+000B
FF: Form Feed, U+000C
CR: Carriage Return, U+000D
CR+LF: CR (U+000D) followed by LF (U+000A)
NEL: Next Line, U+0085
LS: Line Separator, U+2028
PS: Paragraph Separator, U+2029
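
The list above can be turned into a single pass over the string instead of one Replace call per terminator. The following is only a minimal sketch under that assumption; the names LineBreakStripper, IsLineTerminator and RemoveLineTerminators are made up for illustration and do not come from the thread.

```csharp
using System.Text;

public static class LineBreakStripper
{
    // Hypothetical helper: true for the code points listed above.
    private static bool IsLineTerminator(char c)
    {
        switch (c)
        {
            case '\u000A': // LF
            case '\u000B': // VT
            case '\u000C': // FF
            case '\u000D': // CR
            case '\u0085': // NEL
            case '\u2028': // LS
            case '\u2029': // PS
                return true;
            default:
                return false;
        }
    }

    // Copies every char that is not a line terminator into a new string.
    // CR+LF needs no special case because CR and LF are both dropped.
    public static string RemoveLineTerminators(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return value;
        }

        var result = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            if (!IsLineTerminator(c))
            {
                result.Append(c);
            }
        }
        return result.ToString();
    }
}
```

Calling LineBreakStripper.RemoveLineTerminators(value) leaves spaces and tabs untouched, which is the main difference from the whitespace-based suggestions elsewhere in this thread.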

淡看悲欢离合 2024-12-01 22:33:30

8232 (0x2028) and 8233 (0x2029) are the only other ones you might want to eliminate. See the documentation for char.IsSeparator.

神仙妹妹 2024-12-01 22:33:30

Props to Yossarian on this one, I think he's right. Replace all whitespace with a single space:

data = Regex.Replace(data, @"\s+", " ");
方圜几里 2024-12-01 22:33:30

I'd recommend removing ALL the whitespace (char.IsWhiteSpace) and replacing it with a single space. IsWhiteSpace takes care of all the weird Unicode whitespace characters.
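
A minimal sketch of this idea without a regex, assuming that every run of whitespace should collapse to exactly one space. The names WhiteSpaceCollapser and CollapseWhiteSpace are illustrative only.

```csharp
using System.Text;

public static class WhiteSpaceCollapser
{
    // Replaces each run of whitespace (as defined by char.IsWhiteSpace)
    // with a single ' ' character.
    public static string CollapseWhiteSpace(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return value;
        }

        var result = new StringBuilder(value.Length);
        bool inWhiteSpaceRun = false;
        foreach (char c in value)
        {
            if (char.IsWhiteSpace(c))
            {
                // Emit one space for the first char of the run only.
                if (!inWhiteSpaceRun)
                {
                    result.Append(' ');
                }
                inWhiteSpaceRun = true;
            }
            else
            {
                result.Append(c);
                inWhiteSpaceRun = false;
            }
        }
        return result.ToString();
    }
}
```

Note that this keeps words apart where a line break used to be, but it also rewrites tabs and ordinary spaces, and leading or trailing whitespace ends up as a single space rather than being trimmed.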

椒妓 2024-12-01 22:33:30

This is my first attempt at this, but I think this will do what you want....

var controlChars = from c in value.ToCharArray() where Char.IsControl(c) select c;
foreach (char c in controlChars)  
   value = value.Replace(c.ToString(), "");

Also, see this link for details on other methods you can use: Char Methods

心不设防 2024-12-01 22:33:30

Have you tried string.Replace(Environment.NewLine, "") ? That usually gets a lot of them for me.

一腔孤↑勇 2024-12-01 22:33:30

Check out this link: http://msdn.microsoft.com/en-us/library/844skk0h.aspx

You will have to play around and build a regular expression that works for you. But here's the skeleton...

static void Main(string[] args)
{
    StringBuilder txt = new StringBuilder();
    txt.Append("Hello \n\n\r\t\t");
    txt.Append(Convert.ToChar(8232));

    System.Console.WriteLine("Original: <" + txt.ToString() + ">");
    System.Console.WriteLine("Cleaned: <" + CleanInput(txt.ToString()) + ">");

    System.Console.Read();
}

static string CleanInput(string strIn)
{
    // Replace invalid characters with empty strings.
    return Regex.Replace(strIn, @"[^\w\.@-]", "");
}
绝對不後悔。 2024-12-01 22:33:30

Assuming that 8232 is the Unicode code point (U+2028), you can do this:

value = value.Replace("\u2028", string.Empty);
强者自强 2024-12-01 22:33:30

Personally, I'd go with

    public static String RemoveLineEndings(this String text)
    {
        StringBuilder newText = new StringBuilder();
        for (int i = 0; i < text.Length; i++)
        {
            if (!char.IsControl(text, i))
                newText.Append(text[i]);
        }
        return newText.ToString();
    }
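
One thing to keep in mind with the loop above: char.IsControl covers LF, VT, FF, CR and NEL, but U+2028 and U+2029 belong to the Unicode categories LineSeparator (Zl) and ParagraphSeparator (Zp), so they are not control characters and would survive. A possible variant that also checks the category is sketched below; the class and method names are assumptions made for this example.

```csharp
using System.Globalization;
using System.Text;

public static class ControlAndSeparatorStripper
{
    // Drops control characters plus the Unicode line and paragraph separators.
    public static string RemoveControlAndLineSeparators(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        var newText = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            UnicodeCategory category = char.GetUnicodeCategory(c);
            bool isLineBreakLike = char.IsControl(c)
                || category == UnicodeCategory.LineSeparator       // U+2028
                || category == UnicodeCategory.ParagraphSeparator; // U+2029

            if (!isLineBreakLike)
            {
                newText.Append(c);
            }
        }
        return newText.ToString();
    }
}
```

Because the tab character is also a control character, this variant removes tabs as well; ordinary spaces are kept.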
南城追梦 2024-12-01 22:33:30

If you have a string, say "theString", then
call the Replace method with the arguments shown below:

theString = theString.Replace(System.Environment.NewLine, "");

陌伤ぢ 2024-12-01 22:33:30

Here are some quick solutions with .NET regex:

  • To remove any whitespace from a string: s = Regex.Replace(s, @"\s+", ""); (\s matches any Unicode whitespace chars)
  • To remove all whitespace BUT CR and LF: s = Regex.Replace(s, @"[\s-[\r\n]]+", ""); ([\s-[\r\n]] is a character class containing a subtraction construct, it matches any whitespace but CR and LF)
  • To remove any vertical whitespace, subtract \p{Zs} (any horizontal whitespace but tab) and \t (tab) from \s: s = Regex.Replace(s, @"[\s-[\p{Zs}\t]]+", "");.

Wrapping the last one into an extension method:

public static string RemoveLineEndings(this string value)
{
    return Regex.Replace(value, @"[\s-[\p{Zs}\t]]+", "");
}

See the regex demo.
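
If the subtraction syntax feels hard to read, the same set of vertical whitespace characters can be spelled out explicitly. This is only a sketch of that alternative, assuming the input is processed often enough that a cached, compiled Regex is worthwhile; RegexLineBreakStripper and RemoveLineBreaks are made-up names.

```csharp
using System.Text.RegularExpressions;

public static class RegexLineBreakStripper
{
    // LF, VT, FF, CR, NEL, LS and PS written out one by one.
    // CR+LF is covered because the class matches CR and LF individually.
    private static readonly Regex LineBreaks =
        new Regex(@"[\n\v\f\r\u0085\u2028\u2029]+", RegexOptions.Compiled);

    public static string RemoveLineBreaks(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return value;
        }
        return LineBreaks.Replace(value, string.Empty);
    }
}
```

Unlike the extension method above, this version returns null or empty input unchanged instead of passing null to Regex.Replace, which would throw.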
