Regex match does not include spaces
I have this regular expression:
(?'box_id'\d{1,19})","box_name":"(?'box_name'[\w\d\.\s]{1,19})
This works well, except when the box name contains spaces. For example, when I run it on "my box", it returns "mybox", without the space.
How can I make it include spaces in the box_name group?
Code:
Regex reg = new Regex(@"""object_id"":""(?<object_id>\d{1,19})"",""file_name"":""(?<file_name>[\w.]+(?:\s[\w.]+)*)""");
MatchCollection matches = reg.Matches(result);
if (matches == null) throw new Exception("There was an error while parsing data.");
if (matches.Count > 0)
{
    FileArchive.FilesDataTable filesdataTable = new FileArchive.FilesDataTable();
    foreach (Match match in matches)
    {
        FileArchive.FilesRow row = filesdataTable.NewFilesRow();
        row.ID = match.Groups["object_id"].Value;
        row.Name = match.Groups["file_name"].Value;
    }
}
Input:
{"objects":[{"object_id":"135248","file_name":"some space here.jpg","video_status":"0","thumbnail_status":"1"},{"object_id":"135257","file_name":"jup 13.jpg","video_status":"0","thumbnail_status":"1"},{"object_id":"135260","file_name":"my pic.jpg","video_status":"0","thumbnail_status":"1"},{"object_id":"135262","file_name":"EveningWav)es,Hon(olulu,Hawaii.jpg","video_status":"0","thumbnail_status":"1"},{"object_id":"135280","file_name":"test with spaces.jpg","video_status":"0","thumbnail_status":"1"}],"status":"ok"}
Comments (2)
It appears to me that your data is consistently double quote delimited, no? That fact should be the basis of the regex:
As for the missing spaces: this token, (?'box_name'[\w\d.\s]{1,19}), cannot return 'mybox' from a string containing 'my box', because \s is in the character class and the space is captured along with the letters. That issue must be downstream.
Typos and style: you have the literal 'box_name', but the tokens in your code are 'file_name'. Also, why switch to single quotes as the named-group delimiter when the default <> brackets are more readable (since there are already quotes in the regex)?
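One way to check the claim about the character class is to run it against a small sample. The following is a minimal sketch, not the asker's code: the class name BoxNameCheck, the method ExtractBoxName, and the sample string are all made up for illustration, and the group uses <> delimiters as suggested above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoxNameCheck
{
    // Returns the box_name capture, or an empty string when nothing matches.
    public static string ExtractBoxName(string input)
    {
        // Same character class as in the question, written with <> group delimiters.
        var re = new Regex(@"""box_name"":""(?<box_name>[\w\d.\s]{1,19})");
        Match m = re.Match(input);
        return m.Success ? m.Groups["box_name"].Value : string.Empty;
    }

    public static void Main()
    {
        // Hypothetical sample input, not taken from the real service.
        string sample = @"{""box_id"":""42"",""box_name"":""my box""}";

        // Brackets make any leading or trailing whitespace visible.
        Console.WriteLine("[" + ExtractBoxName(sample) + "]"); // prints [my box]
    }
}
```

If the space shows up here but not in the final data, the regex is not what removes it.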
In addition to what @sweaver2112 said, I think you need to expand the framing by adding the surrounding quotes and get rid of the {1,19} range.
These regexes work in Perl; I did not fire up C# to test them.
"(?<box_id>\d+)","(?:${type})":"(?<box_name>[\w.]+(?:\s[\w.]+)*)"
or,
"\s*(?<box_id>\d+)\s*","\s*(?:${type})\s*":"\s*(?<box_name>[\w.]+(?:\s[\w.]+)*)\s*"
where $type = 'file_name';
Realistically though, this should work too (type is substituted). Its validation is relaxed.
"(?<box_id>\d+)","file_name":"(?<box_name>[^"]*)"
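For reference, here is roughly how the relaxed pattern might look in C#, applied to a shortened copy of two entries from the question's input. This is only a sketch: the class and method names are invented, the groups are named object_id and file_name to match the question's code rather than box_id and box_name, and it assumes file names never contain an escaped quote (\"), which [^"]* would not handle.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FileNameExtractor
{
    // Relaxed pattern: capture everything up to the closing double quote.
    private static readonly Regex Pattern = new Regex(
        @"""object_id"":""(?<object_id>\d+)"",""file_name"":""(?<file_name>[^""]*)""");

    // Returns (object_id, file_name) pairs in the order they appear in the input.
    public static List<KeyValuePair<string, string>> Extract(string json)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in Pattern.Matches(json))
        {
            result.Add(new KeyValuePair<string, string>(
                m.Groups["object_id"].Value,
                m.Groups["file_name"].Value));
        }
        return result;
    }

    public static void Main()
    {
        // Two entries from the question's input, with trailing fields shortened.
        string input =
            @"{""objects"":[{""object_id"":""135248"",""file_name"":""some space here.jpg"",""video_status"":""0""}," +
            @"{""object_id"":""135262"",""file_name"":""EveningWav)es,Hon(olulu,Hawaii.jpg"",""video_status"":""0""}],""status"":""ok""}";

        foreach (var pair in Extract(input))
        {
            // Brackets make the captured spaces visible.
            Console.WriteLine(pair.Key + " -> [" + pair.Value + "]");
        }
    }
}
```

Unlike [\w.]+(?:\s[\w.]+)*, this version also captures names that contain characters such as ), ( or a comma, like the EveningWav)es entry.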
edit
"Not sure, what did my regex return to you? – sln yesterday
It returned correct results, in the input in my question i got 'somespacehere.jpg' 'jup13.jpg' and so on for file_name group. – NET Developer yesterday "
I took your code and input and just printed the groups; it works perfectly. The spaces are there,
so the problem must be in how the values are assigned to your row data.
See it here http://www.ideone.com/HsTMF
Output: