Regex match does not include spaces

Posted 2024-12-21 08:32:02

I have this regular expression:

(?'box_id'\d{1,19})","box_name":"(?'box_name'[\w\d\.\s]{1,19})

This works well, except when the box name contains spaces. For example, when I run it on 'my box' it returns 'mybox', without the space.

How can I make it include spaces in the box_name group?

Code:

Regex reg = new Regex(@"""object_id"":""(?<object_id>\d{1,19})"",""file_name"":""(?<file_name>[\w.]+(?:\s[\w.]+)*)""");
MatchCollection matches = reg.Matches(result);
// Regex.Matches never returns null; when nothing matches, Count is 0.
if ( matches.Count > 0 )
{
  FileArchive.FilesDataTable filesdataTable = new FileArchive.FilesDataTable();
  foreach ( Match match in matches )
  {
    FileArchive.FilesRow row = filesdataTable.NewFilesRow();
    row.ID = match.Groups["object_id"].Value;
    row.Name = match.Groups["file_name"].Value;
    // NewFilesRow() only creates a detached row; it still has to be added to the table.
    filesdataTable.Rows.Add(row);
  }
}

Input:

{"objects":[{"object_id":"135248","file_name":"some space here.jpg","video_status":"0","thumbnail_status":"1"},{"object_id":"135257","file_name":"jup 13.jpg","video_status":"0","thumbnail_status":"1"},{"object_id":"135260","file_name":"my pic.jpg","video_status":"0","thumbnail_status":"1"},{"object_id":"135262","file_name":"EveningWav)es,Hon(olulu,Hawaii.jpg","video_status":"0","thumbnail_status":"1"},{"object_id":"135280","file_name":"test with spaces.jpg","video_status":"0","thumbnail_status":"1"}],"status":"ok"}

Answers (2)

静若繁花 2024-12-28 08:32:02

It appears to me that your data is consistently delimited by double quotes, no? That fact should be the basis of the regex:

(?<box_id>\d{1,19})","file_name":"(?<box_name>[^"]{1,19})  //1 to 19 non " chars.
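A minimal C# sketch of this suggestion, assuming .NET's System.Text.RegularExpressions and two records trimmed from the question's input; the class and method names are made up for illustration. The pattern is the answer's, rewritten as a verbatim string in which "" stands for one double quote:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    // The answer's pattern as a C# verbatim string ("" is one literal ").
    // [^"]{1,19} accepts 1 to 19 characters that are not a double quote,
    // so spaces, commas and parentheses are all captured.
    public const string Pattern =
        @"(?<box_id>\d{1,19})"",""file_name"":""(?<box_name>[^""]{1,19})";

    // Returns the box_name capture of every match, in input order.
    public static List<string> Names(string json)
    {
        var names = new List<string>();
        foreach (Match m in Regex.Matches(json, Pattern))
            names.Add(m.Groups["box_name"].Value);
        return names;
    }

    public static void Main()
    {
        // Two records trimmed from the question's input.
        string json =
            @"{""object_id"":""135260"",""file_name"":""my pic.jpg""}," +
            @"{""object_id"":""135262"",""file_name"":""EveningWav)es,Hon(olulu,Hawaii.jpg""}";
        foreach (string name in Names(json))
            Console.WriteLine(name);
    }
}
```

Run as-is, it should print my pic.jpg and then EveningWav)es,Hon(o: the space survives, but the {1,19} bound cuts the longer name off after 19 characters.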

As for the missing spaces: this token, (?'box_name'[\w\d.\s]{1,19}), cannot match 'mybox' in a string containing 'my box', so that issue must be downstream.
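One way to check that claim in C# is to run the question's original pattern on a record containing a space. This is a sketch; the sample record is hypothetical and uses the box_name key the pattern expects, and the class name is illustrative only:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpaceCheckDemo
{
    // The question's original pattern as a C# verbatim string ("" is one literal ").
    // .NET accepts both (?'name'...) and (?<name>...) for named groups.
    // \s inside the character class lets box_name contain whitespace.
    public const string Pattern =
        @"(?'box_id'\d{1,19})"",""box_name"":""(?'box_name'[\w\d\.\s]{1,19})";

    // Returns the box_name capture, or null when the pattern does not match.
    public static string BoxName(string json)
    {
        Match m = Regex.Match(json, Pattern);
        return m.Success ? m.Groups["box_name"].Value : null;
    }

    public static void Main()
    {
        // Hypothetical record that uses the "box_name" key the pattern expects.
        string json = @"{""box_id"":""42"",""box_name"":""my box""}";
        Console.WriteLine("[" + BoxName(json) + "]");
    }
}
```

It should print [my box], with the space, which supports the point that the space is lost somewhere after matching.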

Typos and style: your pattern has the literal 'box_name', but the key in your data is 'file_name'. Also, why in the world would you switch to single quotes as the named-group delimiter when <> brackets, the default, are MORE readable (since quotes appear in the regex!)?

寻梦旅人 2024-12-28 08:32:02

In addition to what @sweaver2112 said, I think you need to expand the framing by including the surrounding quotes, and get rid of the {1,19} range.

These regexes work in Perl; I didn't want to fire up C# to test them.

"(?<box_id>\d+)","(?:${type})":"(?<box_name>[\w.]+(?:\s[\w.]+)*)"
or,
"\s*(?<box_id>\d+)\s*","\s*(?:${type})\s*":"\s*(?<box_name>[\w.]+(?:\s[\w.]+)*)\s*"
where $type = 'file_name';
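A rough C# equivalent of the first Perl pattern, assuming the key name is passed in as a string the way $type is interpolated in Perl. Regex.Escape is an extra precaution added here, not part of the original answer, and KeyedPatternDemo, Build and Names are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeyedPatternDemo
{
    // C# version of the first Perl pattern with the key spliced in,
    // similar to interpolating ${type} in Perl. Regex.Escape guards
    // against a key that contains regex metacharacters.
    public static Regex Build(string type) =>
        new Regex(@"""(?<box_id>\d+)"",""(?:" + Regex.Escape(type) +
                  @")"":""(?<box_name>[\w.]+(?:\s[\w.]+)*)""");

    // Returns the box_name capture of every record whose key equals type.
    public static List<string> Names(string json, string type)
    {
        var names = new List<string>();
        foreach (Match m in Build(type).Matches(json))
            names.Add(m.Groups["box_name"].Value);
        return names;
    }

    public static void Main()
    {
        // Two records trimmed from the question's input.
        string json =
            @"{""object_id"":""135280"",""file_name"":""test with spaces.jpg""}," +
            @"{""object_id"":""135262"",""file_name"":""EveningWav)es,Hon(olulu,Hawaii.jpg""}";
        foreach (string name in Names(json, "file_name"))
            Console.WriteLine(name);
    }
}
```

Only test with spaces.jpg should be printed: [\w.] rejects ')', ',' and '(', so the 135262 record never matches this stricter pattern.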

Realistically, though, this should work too (with the type substituted in). Its validation is more relaxed:
"(?<box_id>\d+)","file_name":"(?<box_name>[^"]*)"

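The relaxed pattern in C#, as a sketch run against the question's input; the class and method names are illustrative, and the (Id, Name) tuple shape is just a convenient choice:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RelaxedPatternDemo
{
    // The relaxed pattern: [^"]* takes everything up to the closing quote.
    public const string Pattern =
        @"""(?<box_id>\d+)"",""file_name"":""(?<box_name>[^""]*)""";

    // Returns (id, name) pairs in input order.
    public static List<(string Id, string Name)> Parse(string json)
    {
        var rows = new List<(string Id, string Name)>();
        foreach (Match m in Regex.Matches(json, Pattern))
            rows.Add((m.Groups["box_id"].Value, m.Groups["box_name"].Value));
        return rows;
    }

    public static void Main()
    {
        // The question's input.
        string json = @"{""objects"":[{""object_id"":""135248"",""file_name"":""some space here.jpg"",""video_status"":""0"",""thumbnail_status"":""1""},{""object_id"":""135257"",""file_name"":""jup 13.jpg"",""video_status"":""0"",""thumbnail_status"":""1""},{""object_id"":""135260"",""file_name"":""my pic.jpg"",""video_status"":""0"",""thumbnail_status"":""1""},{""object_id"":""135262"",""file_name"":""EveningWav)es,Hon(olulu,Hawaii.jpg"",""video_status"":""0"",""thumbnail_status"":""1""},{""object_id"":""135280"",""file_name"":""test with spaces.jpg"",""video_status"":""0"",""thumbnail_status"":""1""}],""status"":""ok""}";
        foreach (var (id, name) in Parse(json))
            Console.WriteLine($"{id}: {name}");
    }
}
```

Unlike the stricter [\w.]+(?:\s[\w.]+)* patterns, this one should return all five records, including EveningWav)es,Hon(olulu,Hawaii.jpg.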
Edit

"Not sure. What did my regex return for you? – sln yesterday
It returned correct results; for the input in my question I got 'somespacehere.jpg', 'jup13.jpg' and so on for the file_name group. – NET Developer yesterday"

I took your code and input and just printed the groups; it works perfectly. The spaces are there,
so something must be going wrong when you assign the values to your ROW data.

See it here http://www.ideone.com/HsTMF

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
      string input = @"{""objects"":[{""object_id"":""135248"",""file_name"":""some space here.jpg"",""video_status"":""0"",""thumbnail_status"":""1""},{""object_id"":""135257"",""file_name"":""jup 13.jpg"",""video_status"":""0"",""thumbnail_status"":""1""},{""object_id"":""135260"",""file_name"":""my pic.jpg"",""video_status"":""0"",""thumbnail_status"":""1""},{""object_id"":""135262"",""file_name"":""EveningWav)es,Hon(olulu,Hawaii.jpg"",""video_status"":""0"",""thumbnail_status"":""1""},{""object_id"":""135280"",""file_name"":""test with spaces.jpg"",""video_status"":""0"",""thumbnail_status"":""1""}],""status"":""ok""}";
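      // [\w.]+(?:\s[\w.]+)* allows only word characters and dots, separated by single
      // whitespace characters, so the 135262 entry (which contains ')', ',' and '(')
      // is not matched and does not appear in the output.
      Regex reg = new Regex(
                   @"""object_id"":""(?<object_id>\d{1,19})"",""file_name"":""(?<file_name>[\w.]+(?:\s[\w.]+)*)"""
      );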
      foreach ( Match match in reg.Matches(input) )
         Console.WriteLine(
                 "Id = '{0}',  File name = '{1}'", 
                 match.Groups["object_id"].Value,
                 match.Groups["file_name"].Value  );
   }
}

Output:

Id = '135248',  File name = 'some space here.jpg'
Id = '135257',  File name = 'jup 13.jpg'
Id = '135260',  File name = 'my pic.jpg'
Id = '135280',  File name = 'test with spaces.jpg'
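Since the matches themselves keep their spaces, the next place to look is the row-building loop. A minimal sketch, assuming an untyped System.Data.DataTable with hypothetical ID and Name columns standing in for the question's typed FileArchive.FilesDataTable: assigning the captured values keeps the spaces, but NewRow() only creates a detached row, and it is not part of the table until Rows.Add is called.

```csharp
using System;
using System.Data;
using System.Text.RegularExpressions;

public static class RowAssignmentDemo
{
    // Same capture groups as the question's code, with the relaxed [^"]* name class.
    static readonly Regex Reg = new Regex(
        @"""object_id"":""(?<object_id>\d{1,19})"",""file_name"":""(?<file_name>[^""]*)""");

    // Untyped stand-in for FileArchive.FilesDataTable; "ID" and "Name" are assumed column names.
    public static DataTable Load(string json, bool addRows)
    {
        var table = new DataTable("Files");
        table.Columns.Add("ID", typeof(string));
        table.Columns.Add("Name", typeof(string));

        foreach (Match match in Reg.Matches(json))
        {
            DataRow row = table.NewRow();   // detached row with the table's schema
            row["ID"] = match.Groups["object_id"].Value;
            row["Name"] = match.Groups["file_name"].Value;
            if (addRows)
                table.Rows.Add(row);        // only now does the row appear in table.Rows
        }
        return table;
    }

    public static void Main()
    {
        string json = @"{""object_id"":""135260"",""file_name"":""my pic.jpg""}";
        Console.WriteLine(Load(json, addRows: false).Rows.Count);   // detached rows are not counted
        foreach (DataRow row in Load(json, addRows: true).Rows)
            Console.WriteLine($"{row["ID"]}: [{row["Name"]}]");
    }
}
```

It should print 0 and then 135260: [my pic.jpg], so the space survives the assignment itself.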