Regex.Split() a sentence into words while keeping the whitespace

Posted on 2024-12-17 06:33:15

I'm using Regex.Split() to take user input and turn it into a list of individual words, but at the moment it removes any spaces the user typed. I would like it to keep the whitespace.

string[] newInput = Regex.Split(updatedLine, @"\s+");

Comments (2)

听不够的曲调 2024-12-24 06:33:15
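string text = "This            is some text";
// Split at each point where a run of whitespace begins (zero-width, so nothing is consumed).
string[] splits = Regex.Split(text, @"(?=(?<=[^\s])\s+)");

foreach (string item in splits)
    Console.WriteLine(item);       // one piece per line
Console.WriteLine(splits.Length);  // Length works on arrays without System.Linq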

This gives you 4 pieces, and every piece after the first keeps the whitespace that preceded it.

(?=\s+)

means: split at every position that is followed by whitespace. Used on its own, it produces 15 pieces on the sample text, because inside a run of repeated spaces every position is followed by another space and therefore also matches.

(?=(?<=[^\s])\s+)

means: split at a position that has a non-whitespace character before it and whitespace after it, which is exactly the start of each whitespace run.
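
To make the difference between the two patterns concrete, here is a minimal sketch that counts the pieces each one produces. The class and method names are made up for illustration, and it assumes the sample text has 12 spaces between "This" and "is", which is what the count of 15 implies.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadSplitDemo
{
    // Hypothetical helper: splits at every position that is followed by whitespace.
    public static string[] SplitBeforeEverySpace(string text) =>
        Regex.Split(text, @"(?=\s+)");

    // Hypothetical helper: splits only where a whitespace run begins.
    public static string[] SplitBeforeWhitespaceRun(string text) =>
        Regex.Split(text, @"(?=(?<=[^\s])\s+)");

    public static void Main()
    {
        // Assumption: 12 spaces between "This" and "is".
        string text = "This" + new string(' ', 12) + "is some text";

        string[] naive = SplitBeforeEverySpace(text);
        string[] runs = SplitBeforeWhitespaceRun(text);

        Console.WriteLine(naive.Length); // 15
        Console.WriteLine(runs.Length);  // 4

        // Brackets make the preserved leading whitespace visible.
        foreach (string piece in runs)
            Console.WriteLine("[" + piece + "]");

        // Both patterns are zero-width, so joining the pieces restores the input.
        Console.WriteLine(string.Concat(runs) == text); // True
    }
}
```

Because nothing is consumed by the split, concatenating the pieces should always give back the original string, which is a handy sanity check.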

If the text starts with whitespace and you want a split point at the very start as well (which gives an empty first piece with no text in it), you can modify the expression as follows:

(?=(?<=^|[^\s])\s+)

This means a run of whitespace must be preceded by either a non-whitespace character or the start of the string.
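
For input that begins with whitespace, a small sketch like the one below shows what the start-of-string variant changes. The names and the sample string are invented for the example. As far as I know, .NET includes an empty string in the result when a match sits at the start of the input, which is why the first element comes out empty here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LeadingWhitespaceDemo
{
    // Hypothetical helper: the pattern without the start-of-string alternative.
    public static string[] SplitAtRuns(string text) =>
        Regex.Split(text, @"(?=(?<=[^\s])\s+)");

    // Hypothetical helper: also splits when the whitespace run starts the string.
    public static string[] SplitAtRunsOrStart(string text) =>
        Regex.Split(text, @"(?=(?<=^|[^\s])\s+)");

    public static void Main()
    {
        string text = "  hello world"; // sample input, starts with two spaces

        string[] plain = SplitAtRuns(text);
        string[] withStart = SplitAtRunsOrStart(text);

        Console.WriteLine(plain.Length);     // 2: "  hello", " world"
        Console.WriteLine(withStart.Length); // 3: "", "  hello", " world"

        foreach (string piece in withStart)
            Console.WriteLine("[" + piece + "]");
    }
}
```

If the empty first element is not wanted, the unmodified pattern already keeps the leading whitespace attached to the first word.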

枫林﹌晚霞¤ 2024-12-24 06:33:15

I'm guessing that some of the "words" you're interested in are actually phrases where spaces are acceptable. You can't easily use the space character as both a phrase delimiter and an allowable character within the phrase itself. Try using a comma for a delimiter instead:

string updatedLine = "user,input,two words,even three words";
string[] newInput = Regex.Split(updatedLine, @",");

This version of the regex allows trailing spaces after the commas:

string updatedLine = "user, input,   two words,    even three words";
string[] newInput = Regex.Split(updatedLine, @",\s*"); // same matches as ",\s+|,", written more compactly
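
A runnable sketch of the comma-delimited approach, in case it helps to see the resulting array. The class and method names are hypothetical, and the input is the sample string from above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommaSplitDemo
{
    // Hypothetical helper: splits on a comma plus any whitespace that follows it,
    // so spaces inside a phrase are left untouched.
    public static string[] SplitPhrases(string line) =>
        Regex.Split(line, @",\s+|,");

    public static void Main()
    {
        string updatedLine = "user, input,   two words,    even three words";
        string[] newInput = SplitPhrases(updatedLine);

        Console.WriteLine(newInput.Length); // 4
        foreach (string phrase in newInput)
            Console.WriteLine("[" + phrase + "]");
        // [user]
        // [input]
        // [two words]
        // [even three words]
    }
}
```

Only whitespace directly after a comma is swallowed by the delimiter. Spaces before a comma or at the ends of the line would stay in the phrases, so a Trim() on each piece may still be needed for messier input.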