Regular expression for a Flash URL

Posted on 2024-07-24 23:32:49

Hi, I'm trying to develop a C# program to scrape the URLs of Flash movies on a website. This is the markup I'm trying to parse:

flashvars="file=http://cache01-videos02.myspacecdn.com/24/vid_878ccd5444874681845df39eb3f00628.flv"/>

The closest I got using regex was this expression:

file=http://[^/]+/(.*)flv

However, the output includes the file= portion. How do I filter out the file= part?

Comments (2)

送你一个梦 2024-07-31 23:32:49

I think you need this:

var url = @"flashvars=""file=http://cache01-videos02.myspacecdn.com/24/vid_878ccd5444874681845df39eb3f00628.flv""";
var match = Regex.Match(url, @"file=(?<flashurl>http://[^/]+/(.*)flv)");
var scrapedurl = match.Groups["flashurl"].Value;

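The (?<flashurl>...) part is a named capturing group: it captures the text matched between the parentheses and gives it the name "flashurl". Because file= sits outside the group, it is not included in the captured value.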

三生池水覆流年 2024-07-31 23:32:49

Change the regex to the following and read the URL from the Groups property:

public void ScrapeURLs(String input) {
  Regex regex = new Regex("file=(http://[^/]+/.*flv)");

  foreach(Match m in regex.Matches(input)) {
     //The URL should now be in the Groups property
     //Note that Groups is a zero based index but Groups[0] will give the complete match
     String url = m.Groups[1].Value;

     //Do something with the URL...
  }
}

Basically, regular expression syntax in .NET uses parentheses () for grouping, and each parenthesized expression in the pattern is accessible through the Groups property. Capturing groups are numbered from left to right starting at 1, because the entire match is always treated as a group and always has index 0 in the Groups collection.
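To make the group indexing concrete, here is a minimal sketch using the same pattern and the sample markup from the question. The class name GroupIndexDemo and the method Extract are made up for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, named only for this illustration.
public static class GroupIndexDemo
{
    // Returns the whole match (Groups[0]) and the first capturing group (Groups[1]).
    public static (string Whole, string Url) Extract(string input)
    {
        Match m = Regex.Match(input, @"file=(http://[^/]+/.*flv)");
        if (!m.Success)
        {
            return (string.Empty, string.Empty);
        }
        return (m.Groups[0].Value, m.Groups[1].Value);
    }

    public static void Main()
    {
        string input = @"flashvars=""file=http://cache01-videos02.myspacecdn.com/24/vid_878ccd5444874681845df39eb3f00628.flv""/>";
        var (whole, url) = Extract(input);
        Console.WriteLine(whole); // the complete match, including the "file=" prefix
        Console.WriteLine(url);   // only the text captured by the parentheses
    }
}
```

Groups[0] still carries the file= prefix, which is why the original attempt appeared to "output with file=": the whole match was being read instead of the first group.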

Edit

One thing to note with this pattern: if the input contains multiple Flash URLs on the same line, the greedy .* quantifier will give you a single odd match that spans all the text from the start of the first URL to the end of the last URL.
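One possible way around the greedy match, offered as a sketch rather than as part of the original answer, is to make the quantifier lazy (.*?) and escape the dot before flv, so each match ends at the first ".flv" it reaches. The class name LazyMatchDemo and the example.com hosts below are assumptions made up for the demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, named only for this illustration.
public static class LazyMatchDemo
{
    // Lazy variant of the pattern: ".*?" stops at the first ".flv" instead of the last one.
    private static readonly Regex FlashUrl = new Regex(@"file=(http://[^/]+/.*?\.flv)");

    // Collects the first capturing group of every match in the input.
    public static List<string> ScrapeUrls(string input)
    {
        var urls = new List<string>();
        foreach (Match m in FlashUrl.Matches(input))
        {
            urls.Add(m.Groups[1].Value);
        }
        return urls;
    }

    public static void Main()
    {
        // Made-up input with two URLs on a single line.
        string input = @"<a flashvars=""file=http://a.example.com/v/one.flv""/><b flashvars=""file=http://b.example.com/v/two.flv""/>";
        foreach (string url in ScrapeUrls(input))
        {
            Console.WriteLine(url);
        }
    }
}
```

With the lazy quantifier, each URL on the line comes back as its own match. This still assumes every URL ends in ".flv" and that no earlier ".flv" appears inside the path.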
