C# - Remove lines that match a regular expression
I have some data that looks similar to this:
0423 222222 ADH, TEXTEXT
0424 1234 ADH,MORE TEXT
0425 98765 ADH, TEXT 3609
2000 98765-4 LBL,IUC,PCA,S/N
0010 99999-27 LBL,IUI,1.0x.25
9000 12345678 HERE IS MORE, TEXT
9010 123-123 SOMEMORE,TEXT1231
9100 SD178 YAYFOR, TEXT01
9999 90123 HEY:HOW-TO DOTHIS
I would like to remove every entire line that begins with 9xxx (a 9 followed by three more digits). So far I have tried replacing the value using Regex. Here is what I have for that:
output = Regex.Replace(output, @"^9[\d]{3}\s+[\d*\-*\w*]+\s+[\d*\w*\-*\,*\:*\;*\.*\d*\w*]+", "");
However, this is really hard to read, and it does not actually delete the entire line.
CODE:
Here is the section of the code I am using:
try
{
// Resets the formattedTextRichTextBox so multiple files aren't loaded on top of each other.
formattedTextRichTextBox.ResetText();
foreach (string line in File.ReadAllLines(openFile.FileName))
{
// Uses regular expressions to find a line that has, digit(s), space(s), digit(s) + letter(s),
// space(s), digit(s), space(s), any character (up to 25 times).
Match theMatch = Regex.Match(line, @"^[\.*\d]+\s+[\d\w]+\s+[\d\-\w*]+\s+.{25}");
if (theMatch.Success)
{
// Stores the matched value in string output.
string output = theMatch.Value;
// Replaces the text with the required layout.
output = Regex.Replace(output, @"^[\.*\d]+\s+", "");
//output = Regex.Replace(output, @"^9[\d]{3}\s+[\d*\-*\w*]+\s+[\d*\w*\-*\,*\:*\;*\.*\d*\w*]+", "");
output = Regex.Replace(output, @"\s+", " ");
// Sets the formattedTextRichTextBox to the string output.
formattedTextRichTextBox.AppendText(output);
formattedTextRichTextBox.AppendText("\n");
}
}
}
OUTCOME:
I would like the new data to look like this (9xxx lines removed):
0423 222222 ADH, TEXTEXT
0424 1234 ADH,MORE TEXT
0425 98765 ADH, TEXT 3609
2000 98765-4 LBL,IUC,PCA,S/N
0010 99999-27 LBL,IUI,1.0x.25
QUESTIONS:
- Is there an easier way to go about this?
- If so, can I use a regex for this, or must I use a different approach?
Comments (4)
Just reformulate the regex that tests your format to match everything that doesn't begin with 9 - that way lines starting with 9 are not added to the rich text box.
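One way to read this suggestion is to put a negative lookahead at the front of the format-checking pattern. The sketch below is only an illustration: the pattern is a simplified stand-in for the one in the question, the class name `SkipNineLines` is hypothetical, and the console output stands in for the rich text box.

```csharp
using System;
using System.Text.RegularExpressions;

class SkipNineLines
{
    static void Main()
    {
        // Sample lines taken from the question's data.
        string[] lines =
        {
            "0423 222222 ADH, TEXTEXT",
            "2000 98765-4 LBL,IUC,PCA,S/N",
            "9000 12345678 HERE IS MORE, TEXT",
            "9999 90123 HEY:HOW-TO DOTHIS",
        };

        // (?!9) is a negative lookahead: the match fails when the line starts with 9.
        // The remainder (four digits, a token, then the rest of the line) is an
        // assumed, simplified version of the format check in the question.
        Regex wanted = new Regex(@"^(?!9)\d{4}\s+\S+\s+.+$");

        foreach (string line in lines)
        {
            if (wanted.IsMatch(line))
            {
                // In the question this would be formattedTextRichTextBox.AppendText(...).
                Console.WriteLine(line);
            }
        }
    }
}
```

With the sample data above, only the 0423 and 2000 lines are printed, because the lookahead rejects the two lines that start with 9 before the rest of the pattern is tried.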
Try this (uses LINQ):
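The snippet that belonged to this comment is not present here. As a rough sketch of what a LINQ-based filter could look like (not the commenter's actual code), the lines can be filtered with `Where` before any formatting is applied. The class name `LinqFilter` and the in-memory sample array are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class LinqFilter
{
    static void Main()
    {
        // Sample data; in the question this would come from
        // File.ReadAllLines(openFile.FileName).
        string[] lines =
        {
            "0425 98765 ADH, TEXT 3609",
            "0010 99999-27 LBL,IUI,1.0x.25",
            "9010 123-123 SOMEMORE,TEXT1231",
            "9100 SD178 YAYFOR, TEXT01",
        };

        // Keep every line that does NOT start with 9 followed by three digits
        // and a whitespace character.
        IEnumerable<string> kept =
            lines.Where(line => !Regex.IsMatch(line, @"^9\d{3}\s"));

        Console.WriteLine(string.Join(Environment.NewLine, kept));
    }
}
```

Filtering whole lines this way avoids having to describe the rest of each line in the pattern; only the leading 9xxx prefix needs to be recognized.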
Yes, there is a simpler way. Just use the Regex.Replace method and provide the Multiline option.
Why don't you just match the leading 9xxx part and then use a wildcard to match the rest of the line? It would be a lot more readable.
output = Regex.Replace(output, @"^9\d{3}.*", "");
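The two suggestions above (the Multiline option and a short pattern anchored on the leading 9xxx) can be combined when the whole file content is held in a single string. The following is a minimal sketch under that assumption; the class name `MultilineRemove` and the sample text are illustrative, and the pattern also consumes the line break so that no empty lines are left behind.

```csharp
using System;
using System.Text.RegularExpressions;

class MultilineRemove
{
    static void Main()
    {
        // Assumed input: the whole file as one string with \n line breaks.
        string text = string.Join("\n", new[]
        {
            "0425 98765 ADH, TEXT 3609",
            "9000 12345678 HERE IS MORE, TEXT",
            "0010 99999-27 LBL,IUI,1.0x.25",
            "9999 90123 HEY:HOW-TO DOTHIS",
        });

        // With RegexOptions.Multiline, ^ matches at the start of every line,
        // not only at the start of the string.
        // [^\r\n]* consumes the rest of the line, and \r?\n? also removes the
        // line break (if any) so the removed line leaves no blank line.
        string result = Regex.Replace(
            text,
            @"^9\d{3}[^\r\n]*\r?\n?",
            "",
            RegexOptions.Multiline);

        Console.WriteLine(result);
    }
}
```

Without the Multiline option, `^` would only match at the very start of the string, so this approach fits a whole-text replace, whereas the line-by-line loop in the question already sees one line at a time.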