C#: Removing separator characters from quoted strings
I'm writing a program that has to remove separator characters from quoted strings in text files.
For example:
"Hello, my name is world"
Has to be:
"Hello my name is world"
This sounds quite easy at first (I thought it would be), but you need to detect when the quote starts, when the quote ends, then search that specific string for separator characters. How?
I've experimented with some regexes, but I just keep getting myself confused!
Any ideas? Even just something to get the ball rolling, I'm just completely stumped.
Comments (8)
Why not try doing it with LINQ?
Using a regex pattern with a look-ahead, the pattern would be:
"\"(?=[^\"]+,)[^\"]+\""
The \" matches the opening double-quote. The look-ahead (?=[^\"]+,) will try to match a comma within the quoted text. Next we match the rest of the string as long as it's not a double-quote, [^\"]+, then we match the closing double-quote, \".
Using Regex.Replace allows for a compact approach to altering the result and removing the unwanted commas.
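For the look-ahead regex answer, the Regex.Replace call itself is not shown in the text. A minimal sketch of how it might look, assuming a MatchEvaluator lambda that strips commas from each match; the class and method names (QuotedCommaRegex, RemoveCommas) are hypothetical and not from the answer:

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedCommaRegex
{
    // Pattern quoted in the answer: a double-quoted run that contains a comma.
    private static readonly Regex QuotedWithComma =
        new Regex("\"(?=[^\"]+,)[^\"]+\"");

    // Hypothetical helper: for every match, return the matched text with its commas removed.
    public static string RemoveCommas(string input)
    {
        return QuotedWithComma.Replace(input, m => m.Value.Replace(",", ""));
    }
}
```

Calling QuotedCommaRegex.RemoveCommas on a,b,"Hello, my name is world",c should give a,b,"Hello my name is world",c. One caveat: the pattern does not track which quote is an opening quote, so on a line where a comma-free quoted string is followed by unquoted text containing commas, it can pair a closing quote with the next opening quote and strip commas from the unquoted text in between. It is probably best treated as a starting point for simple input.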
What you want to write is called a "lexer" (or alternatively a "tokenizer"), which reads the input character by character and breaks it up into tokens. That's generally how parsing in a compiler works (as a first step). A lexer will break text up into a stream of tokens (string literal, identifier, "(", etc). The parser then takes those tokens and uses them to produce a parse tree.
In your case, you only need a lexer. You will have two types of tokens: "quoted strings" and "everything else".
You then just need to write code to break the input up into tokens. By default something is an "everything else" token. A string token starts when you see a ", and ends when you see the next ". If you are reading source code you may have to deal with things like \" or "" as special cases.
Once you have done that, you can just iterate over the tokens and do whatever processing you need on the "string" tokens.
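The two-token lexer described in this answer could be sketched roughly as follows. This is only an illustration under some assumptions: the Token class and the names QuoteLexer, Tokenize, and RemoveSeparators are hypothetical, the separator defaults to a comma, and the \" and "" escape cases mentioned in the answer are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuoteLexer
{
    // Hypothetical token type: a piece of text plus a flag for "quoted string".
    public sealed class Token
    {
        public Token(string text, bool isQuoted) { Text = text; IsQuoted = isQuoted; }
        public string Text { get; }
        public bool IsQuoted { get; }
    }

    // Breaks the input into "quoted string" and "everything else" tokens.
    public static List<Token> Tokenize(string input)
    {
        var tokens = new List<Token>();
        var current = new StringBuilder();
        bool inQuotes = false;

        foreach (char c in input)
        {
            if (c == '"' && !inQuotes)
            {
                // Opening quote: flush pending "everything else" text, start a string token.
                if (current.Length > 0) tokens.Add(new Token(current.ToString(), false));
                current.Clear();
                current.Append(c);
                inQuotes = true;
            }
            else if (c == '"')
            {
                // Closing quote: the string token is complete.
                current.Append(c);
                tokens.Add(new Token(current.ToString(), true));
                current.Clear();
                inQuotes = false;
            }
            else
            {
                current.Append(c);
            }
        }

        // Trailing text, including an unterminated quote, is kept as "everything else".
        if (current.Length > 0) tokens.Add(new Token(current.ToString(), false));
        return tokens;
    }

    // Iterates over the tokens and removes the separator only from string tokens.
    public static string RemoveSeparators(string input, char separator = ',')
    {
        var output = new StringBuilder(input.Length);
        foreach (Token t in Tokenize(input))
        {
            output.Append(t.IsQuoted ? t.Text.Replace(separator.ToString(), "") : t.Text);
        }
        return output.ToString();
    }
}
```

With this sketch, QuoteLexer.RemoveSeparators applied to x,"a,b",y should return x,"ab",y, and text after an unmatched opening quote is left untouched. Keeping Tokenize separate from the processing step mirrors the answer's suggestion to build the tokens first and then iterate over them.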
I've had to do something similar in an application I use to translate flat files. This is the approach I took: (just a copy/paste from my application)
So I guess you have some long text with a lot of quotes inside? I would make a method that does something like this:
EDIT
I just got a better idea. What about this:
I think there should be a better way of combining the strings into one with a " between them at the end, but I don't know of one, so feel free to suggest a good method here :)
Ok, this is a bit wacky, but it works.
So first off you split your string up into parts, based on the " character:
Then you need to join the string back together on the " character, after removing each comma in every other part:
The removal function looks like this:
This does require that you include a using directive for System.Collections.Generic.
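The code for the split-and-join answer did not survive on this page. A rough sketch of the idea as described (split on ", remove commas from every other part, join on " again) might look like the following; it is not the answerer's original code, and the names SplitJoinCommaRemover, RemoveCommasInQuotes, and RemoveCommasFromOddParts are hypothetical:

```csharp
using System;
using System.Collections.Generic;

public static class SplitJoinCommaRemover
{
    public static string RemoveCommasInQuotes(string input)
    {
        // Splitting on '"' and keeping empty entries puts quoted text at odd indexes.
        string[] parts = input.Split('"');

        // Join the processed parts back together on the '"' character.
        return string.Join("\"", RemoveCommasFromOddParts(parts));
    }

    // Hypothetical stand-in for the answer's removal function:
    // strips commas from every other part (the parts that were inside quotes).
    private static List<string> RemoveCommasFromOddParts(string[] parts)
    {
        var result = new List<string>(parts.Length);
        for (int i = 0; i < parts.Length; i++)
        {
            bool insideQuotes = i % 2 == 1;
            result.Add(insideQuotes ? parts[i].Replace(",", "") : parts[i]);
        }
        return result;
    }
}
```

For example, SplitJoinCommaRemover.RemoveCommasInQuotes applied to a,"b,c",d should return a,"bc",d. The odd/even trick relies on empty entries being kept by Split and on the quotes being balanced; with an unmatched quote, everything after it is treated as quoted.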
There are many ways to do this:
Look at the functions string.Split() and string.IndexOfAny().
You can use string.Split(new char[] {',',' '}, StringSplitOptions.RemoveEmptyEntries) to split the phrase into words, then use the StringBuilder class to put the words together.
Calling string.Replace("[char to remove goes here]", "") multiple times, once for each char you want to remove, will also work.
EDIT:
Call string.Split(new char[] {'\"'}, StringSplitOptions.RemoveEmptyEntries) to obtain an array of the strings that are between quotes ( " ), then call Replace on each of them, then put the strings together with StringBuilder.