Finding words - string manipulation or LINQ
I have a string full of a few hundred words.
How can I get each "word" (which may also be a single letter, a number, or a punctuation mark), removing each "word" from the string as it is found?
Is this possible?
Example:
String:
"this is a string full of words and letters and also some punctuation! and num6er5."
As far as the algorithm is concerned, there are exactly 15 words in the above string.
Comments (1)
What you're trying to do is known as tokenizing.
In C#, the string Split() method works pretty well. If it's used without any parameters, as in Niedermair's code, it returns an array of strings split on whitespace, like this:
"I have spaces" -> {"I", "have", "spaces"}
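As a minimal sketch of that behaviour, here is the example sentence from the question run through a parameterless Split(). The variable names `text` and `tokens` are just illustrative, and the snippet assumes C# top-level statements (C# 9 or later):

```csharp
using System;

// The example sentence from the question.
string text = "this is a string full of words and letters and also some punctuation! and num6er5.";

// Split() with no arguments splits on whitespace characters.
string[] tokens = text.Split();

Console.WriteLine(tokens.Length); // prints 15

// Punctuation stays attached to its word, e.g. "punctuation!" and "num6er5."
foreach (string token in tokens)
{
    Console.WriteLine(token);
}
```

Note that this treats "punctuation!" as one token, which matches the count of 15 given in the question.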
You can also pass the characters to split on as parameters to Split() (for instance, ',' or ';' to handle simple CSV data).
The Split() method pays no heed to what ends up in the resulting strings, so letters, digits, and any other characters are all handled the same way.
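A short sketch of splitting on explicit separators follows. The input line `csvLine` is made up for illustration, and this is not a full CSV parser (quoted fields, for example, are not handled):

```csharp
using System;

// Hypothetical input mixing ',' and ';' separators, with one empty field.
string csvLine = "alpha,beta;gamma,,delta";

// Splitting on both characters keeps the empty field between the two commas.
string[] rawFields = csvLine.Split(',', ';');

// StringSplitOptions.RemoveEmptyEntries drops empty fields from the result.
string[] fields = csvLine.Split(new[] { ',', ';' }, StringSplitOptions.RemoveEmptyEntries);

Console.WriteLine(rawFields.Length); // prints 5
Console.WriteLine(fields.Length);    // prints 4
```

The same RemoveEmptyEntries option is useful when the input may contain runs of consecutive spaces.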
About removing the words from the string: you could write the string into a mutable buffer to achieve this, but I seriously think that's going too far. Strings are immutable, which means that every time you remove the "next word" a whole new string object has to be created.
It will be a lot easier to just Split() the entire string, throw the string away, and work with the array from there on.
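If the "remove each word as it is found" behaviour is still wanted, one possible approach (an assumption on my part, not something the question requires) is to load the split result into a Queue<string> and dequeue the words one at a time. The names `pending` and `processed` are illustrative:

```csharp
using System;
using System.Collections.Generic;

string text = "this is a string full of words";

// Split once, then let the queue play the role of the "remaining string".
var pending = new Queue<string>(text.Split());
var processed = new List<string>();

while (pending.Count > 0)
{
    // Dequeue returns the next word and removes it from the queue.
    string word = pending.Dequeue();
    processed.Add(word);
    Console.WriteLine($"{word} ({pending.Count} left)");
}
```

This avoids rebuilding the string after every word, since only the queue changes as words are consumed.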