C#: Removing separator characters from quoted strings

Posted 2024-10-03 00:21:32 · 303 words · 10 views · 0 comments

I'm writing a program that has to remove separator characters from quoted strings in text files.

For example:

"Hello, my name is world"

Has to be:

"Hello my name is world"

This sounds quite easy at first (I thought it would be), but you need to detect when the quote starts, when the quote ends, then search that specific string for separator characters. How?

I've experimented with some regexes, but I just keep confusing myself!

Any ideas? Even just something to get the ball rolling, I'm just completely stumped.

Comments (8)

情徒 2024-10-10 00:21:32
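// Requires: using System.Text.RegularExpressions;
// Match the first double-quoted run in the text (quotes included).
string pattern = "\"([^\"]+)\"";
string value = Regex.Match(textToSearch, pattern).Value;

string[] removalCharacters = { ",", ";" }; // or any other characters
foreach (string character in removalCharacters)
{
    value = value.Replace(character, "");
}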
下雨或天晴 2024-10-10 00:21:32

Why not try doing it with LINQ?

// Requires: using System.Linq;
var x = @" this is a great whatever ""Hello, my name is world"" and all that";

// After splitting on the quote character, odd-indexed parts lie inside quotes.
var result = string.Join(@"""", x.Split('"')
    .Select((val, index) => index % 2 == 1 ? val.Replace(",", "") : val)
    .ToArray());
用心笑 2024-10-10 00:21:32

Using a regex with a look-ahead, the pattern (written as a C# string literal) would be: "\"(?=[^\"]+,)[^\"]+\""

The \" matches the opening double-quote. The look-ahead (?=[^\"]+,) will try to match a comma within the quoted text. Next we match the rest of the string as long as it's not a double-quote [^\"]+, then we match the closing double-quote \".

Using Regex.Replace allows for a compact approach to altering the result and removing the unwanted commas.

string input = "\"Hello, my name, is world\"";
string pattern = "\"(?=[^\"]+,)[^\"]+\"";
string result = Regex.Replace(input, pattern, m => m.Value.Replace(",", ""));
Console.WriteLine(result);
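
As a rough sketch of the same Regex.Replace idea extended to several separator characters and several quoted strings in one input: the class and method names below (QuotedSeparatorStripper, Strip) are made up for illustration, and the pattern assumes quoted text never contains an escaped quote.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedSeparatorStripper
{
    // Matches one double-quoted run that contains no embedded quote characters.
    private static readonly Regex Quoted = new Regex("\"[^\"]*\"");

    // Removes every character in `separators` from each quoted run in `input`.
    // Text outside the quotes is left untouched.
    public static string Strip(string input, params char[] separators)
    {
        return Quoted.Replace(input, m =>
        {
            string s = m.Value;
            foreach (char c in separators)
            {
                s = s.Replace(c.ToString(), "");
            }
            return s;
        });
    }
}
```

Calling QuotedSeparatorStripper.Strip(text, ',', ';') would then strip both commas and semicolons inside each quoted run while leaving the ones outside alone.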
随心而道 2024-10-10 00:21:32

What you want to write is called a "lexer" (or alternatively a "tokenizer"): it reads the input character by character and breaks it up into tokens. That's generally how parsing in a compiler works (as a first step). A lexer will break text up into a stream of tokens (string literal, identifier, "(", etc.). The parser then takes those tokens and uses them to produce a parse tree.

In your case, you only need a lexer. You will have two types of tokens: "quoted strings" and "everything else".

You then just need to write code to break the input up into tokens. By default something is an "everything else" token. A string token starts when you see a ", and ends when you see the next ". If you are reading source code you may have to deal with things like \" or "" as special cases.

Once you have done that, you can just iterate over the tokens and do whatever processing you need on the "string" tokens.
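
A minimal sketch of such a two-token lexer might look like the following. The names (TokenKind, Token, QuoteLexer) are invented for this example, and it deliberately ignores the \" and "" special cases mentioned above; an unterminated quote is simply passed through as "everything else".

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public enum TokenKind { Other, QuotedString }

public sealed class Token
{
    public TokenKind Kind { get; }
    public string Text { get; }
    public Token(TokenKind kind, string text) { Kind = kind; Text = text; }
}

public static class QuoteLexer
{
    // Breaks the input into "everything else" and quoted-string tokens.
    // Quoted tokens keep their surrounding quote characters.
    public static List<Token> Tokenize(string input)
    {
        var tokens = new List<Token>();
        var buffer = new StringBuilder();
        bool inQuotes = false;

        foreach (char c in input)
        {
            if (c == '"' && !inQuotes)
            {
                // Opening quote: flush any pending "everything else" text.
                if (buffer.Length > 0)
                {
                    tokens.Add(new Token(TokenKind.Other, buffer.ToString()));
                    buffer.Clear();
                }
                buffer.Append(c);
                inQuotes = true;
            }
            else if (c == '"')
            {
                // Closing quote: emit the complete quoted token.
                buffer.Append(c);
                tokens.Add(new Token(TokenKind.QuotedString, buffer.ToString()));
                buffer.Clear();
                inQuotes = false;
            }
            else
            {
                buffer.Append(c);
            }
        }

        // Trailing text (including an unterminated quote) is emitted as Other.
        if (buffer.Length > 0)
        {
            tokens.Add(new Token(TokenKind.Other, buffer.ToString()));
        }
        return tokens;
    }

    // Removes `separator` from quoted tokens only, then reassembles the text.
    public static string RemoveInsideQuotes(string input, char separator)
    {
        var result = new StringBuilder();
        foreach (Token t in Tokenize(input))
        {
            result.Append(t.Kind == TokenKind.QuotedString
                ? t.Text.Replace(separator.ToString(), "")
                : t.Text);
        }
        return result.ToString();
    }
}
```

Keeping tokenizing and processing separate means other per-token work (trimming, re-quoting, validation) can be added later without touching the scanning loop.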

可遇━不可求 2024-10-10 00:21:32

I've had to do something similar in an application I use to translate flat files. This is the approach I took (just a copy/paste from my application):

    // FieldDelimiter is not shown here; presumably a char member of the containing class.
    protected virtual string[] delimitCVSBuffer(string inputBuffer) {
        List<string> output       = new List<string>();
        bool insideQuotes         = false;
        StringBuilder fieldBuffer = new StringBuilder();
        foreach (char c in inputBuffer) {
            if (c == FieldDelimiter && !insideQuotes) {
                // End of a field: drop its first and last character (the surrounding
                // quotes), so every field is assumed to be quoted.
                output.Add(fieldBuffer.Remove(0, 1).Remove(fieldBuffer.Length - 1, 1).ToString().Trim());
                fieldBuffer.Clear();
                continue;
            } else if (c == '\"')
                insideQuotes = !insideQuotes;
            // Delimiters inside quotes fall through and are kept as field content.
            fieldBuffer.Append(c);
        }
        // Flush the last field, which has no trailing delimiter.
        output.Add(fieldBuffer.Remove(0, 1).Remove(fieldBuffer.Length - 1, 1).ToString().Trim());
        return output.ToArray();
    }
简单气质女生网名 2024-10-10 00:21:32

So I guess you have some long text with a lot of quotes inside? I would make a method that does something like this:

  1. Run through the string until you encounter the first "
  2. Then take the substring up to the next ", do a str.Replace(",","") on it, and also replace any other characters that you want to remove.
  3. Then go on without replacing until you encounter the next ", and continue like this until the end.
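
The three steps above could be sketched roughly as follows, using IndexOf to find each pair of quotes. QuoteScanner and its method name are hypothetical, and the sketch assumes quotes are never escaped; if a closing quote is missing, the remaining text is copied unchanged.

```csharp
using System;
using System.Text;

public static class QuoteScanner
{
    // Removes `separator` from every complete "..." run in `text`.
    public static string RemoveInsideQuotes(string text, char separator)
    {
        var result = new StringBuilder();
        int pos = 0;
        while (pos < text.Length)
        {
            int open = text.IndexOf('"', pos);
            int close = open < 0 ? -1 : text.IndexOf('"', open + 1);
            if (close < 0)
            {
                // No complete quoted run remains: copy the rest unchanged.
                result.Append(text, pos, text.Length - pos);
                break;
            }

            // Step 1: copy everything before the opening quote as-is.
            result.Append(text, pos, open - pos);

            // Step 2: strip the separator from the quoted run (quotes included).
            string quoted = text.Substring(open, close - open + 1);
            result.Append(quoted.Replace(separator.ToString(), ""));

            // Step 3: continue scanning after the closing quote.
            pos = close + 1;
        }
        return result.ToString();
    }
}
```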

EDIT

I just got a better idea. What about this:

  string mycompletestring = "This is a string\"containing, a quote\"and some more text";
  string[] splitstring = mycompletestring.Split('"');
  for (int i = 1; i < splitstring.Length; i += 2) {
    splitstring[i] = splitstring[i].Replace(",", "");
  }
  StringBuilder builder = new StringBuilder();
  foreach (string s in splitstring) {
    builder.Append(s + '"');
  }
  mycompletestring = builder.ToString().Substring(0, builder.ToString().Length - 1);

I think there should be a better way of combining the string into one with a " between them at the end, but I don't know any better ones, so feel free to suggest a good method here :)

瘫痪情歌 2024-10-10 00:21:32

Ok, this is a bit wacky, but it works.

So first off you split your string up into parts, based on the " character:

string msg = "this string should have a comma here,\"but, there should be no comma in this bit\", and there should be a comma back at that and";

var parts = msg.Split('"');

Then you need to join the string back together on the " character, after removing each comma in every other part:

string result = string.Join("\"", RemoveCommaFromEveryOther(parts));

The removal function looks like this:

IEnumerable<string> RemoveCommaFromEveryOther(IEnumerable<string> parts)
{
    using (var partenum = parts.GetEnumerator())
    {
        bool replace = false;
        while (partenum.MoveNext())
        {
            if(replace)
            {
                yield return partenum.Current.Replace(",","");
                replace = false;
            }
            else
            {
                yield return partenum.Current;
                replace = true;
            }
        }
    }
}

This does require that you include a using directive for System.Collections.Generic.

生生不灭 2024-10-10 00:21:32

There are many ways to do this:
Look at the functions string.Split() and string.IndexOfAny().

You can use string.Split(new char[] {',',' '}, StringSplitOptions.RemoveEmptyEntries) to split the phrase into words, then use the StringBuilder class to put the words together.

Calling string.Replace("[char to remove goes here]", "") multiple times, once for each char you want to remove, will also work.

EDIT:

Call string.Split(new char[] {'\"'}, StringSplitOptions.RemoveEmptyEntries) to obtain an array of the strings that are between quotes ( " ), then call Replace on each of them, then put the strings together with StringBuilder.
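
One caveat when splitting on the quote character: with RemoveEmptyEntries, a text that begins with a quote loses its empty first entry, which shifts which entries fall inside quotes. A small sketch that keeps empty entries (StringSplitOptions.None) so that odd-indexed parts are always the quoted ones is shown below; SplitParityDemo is an invented name, string.Join is used here in place of StringBuilder, and the code assumes the quotes in the text are balanced.

```csharp
using System;

public static class SplitParityDemo
{
    // Splits on '"' while keeping empty entries, so parts[1], parts[3], ...
    // are the segments that were inside quotes (given balanced quotes).
    public static string RemoveInsideQuotes(string text, char separator)
    {
        string[] parts = text.Split(new[] { '"' }, StringSplitOptions.None);
        for (int i = 1; i < parts.Length; i += 2)
        {
            parts[i] = parts[i].Replace(separator.ToString(), "");
        }
        // Re-insert the quote characters that Split consumed.
        return string.Join("\"", parts);
    }
}
```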
