Regular expression in PHP to remove citations from wiki text

Posted 2024-11-29 18:37:50

From the given sample text, I want to keep everything except the parts enclosed in [[ ]] and {{ }}.

Sample Text:

On 11 December 1988, aged just 15 years and 232 days, Tendulkar scored 100 not out in his debut [[first-class cricket|first-class]] match for [[Mumbai cricket team|Bombay]] against [[Gujarat cricket team|Gujarat]], making him the youngest Indian to score a century on first-class debut. He followed this by scoring a century in his first Deodhar and Duleep Trophy.
{{cite web|url=http://www.espnstar.com/cricket/international-cricket/news/detail/item136972/Sachin-Tendulkar-factfile/|title=Sachin Tendulkar factfile |publisher=www.espnstar.com|accessdate=3 August 2009}} He was picked by the Mumbai captain [[Dilip Vengsarkar]] after seeing him negotiate [[Kapil Dev]] in the nets, and finished the season as Bombay's highest run-scorer.He scored 583 runs at an average of 67.77, and was the sixth highest run-scorer overall{{cite web|url=http://blogs.cricinfo.com/link_to_database/ARCHIVE/1980S/1988-89/IND_LOCAL/RANJI/STATS/IND_LOCAL_RJI_AVS_BAT_MOST_RUNS.html|title=1988–89 Ranji season – Most Runs|publisher=Cricinfo|accessdate=3 August 2009}} He also made an unbeaten century in the [[Irani Trophy]] final,{{cite web|url=http://cricketarchive.com/Archive/Scorecards/52/52008.html|title=Rest of India v Delhi in 1989/90
|publisher=Cricketarchive|accessdate=3 August 2009}} and was selected for the tour of Pakistan next year, after just one first class season.

I tried this:

$patterns = ("/^{{*/", "/*}}$/" );$replacements = "";
  preg_replace($patterns, $replacements, $parts);
  print_r($parts);

and this:

$parts = preg_replace("/\[(?:\\\\|\\\]|[^\]])*\]/", "", $ans_str);

and this too:

$pattern = ("/\[.*?\]/", "/\{.*?\}/");
  $ans = preg_replace($pattern, "", $parts);

None of these work.
Please help, thanks.

笛声青案梦长安 2024-12-06 18:37:50

This should do the trick

$str = "On 11 December 1988, ...";
$str = preg_replace('/\{\{.+\}\}/Us', '', $str);
var_dump($str);

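The U modifier makes quantifiers ungreedy, so each match stops at the first closing }} instead of running on to the last one (which would capture all citations as one giant match).

EDIT: added the s modifier so that . also matches newlines (one citation in the sample spans two lines); see comments.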
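
To make the greedy versus ungreedy difference concrete, here is a minimal sketch. The `$sample` string is a shortened, made-up stand-in for the asker's text, and the `.+?` variant is shown only as an assumed equivalent of the U modifier for this pattern.

```php
<?php
// Shortened, made-up sample with two citations (not the asker's full text).
$sample = "A{{cite web|title=one}} B{{cite web|title=two}} C";

// Greedy: .+ runs to the LAST "}}", so " B" between the citations is lost too.
$greedy = preg_replace('/\{\{.+\}\}/s', '', $sample);

// Ungreedy via the U modifier: each match stops at the first "}}".
$ungreedy = preg_replace('/\{\{.+\}\}/Us', '', $sample);

// Same effect without U, using a lazy quantifier (.+?).
$lazy = preg_replace('/\{\{.+?\}\}/s', '', $sample);

var_dump($greedy);   // string(3) "A C"
var_dump($ungreedy); // string(5) "A B C"
var_dump($lazy);     // string(5) "A B C"
```

Neither form handles templates nested inside other templates; for the sample text in the question that does not matter, since no citation contains another.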

紫瑟鸿黎 2024-12-06 18:37:50

// remove `{{cite}}` tags
$str = preg_replace('/\s*\{\{[^}{]*+\}\}\s*/', ' ', $str);

// remove links--including rollover text--leaving link text
$str = preg_replace('/\[\[(?:[^][|]*+\|)?+([^][]*+)\]\]/', '$1', $str);

see demo on ideone.com
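
As a rough illustration of how the two replacements above behave together, the sketch below wraps them in a helper and runs them on a shortened sample. The function name `stripWikiMarkup`, the final `trim()`, and the `example.com` URL are assumptions made for this example and are not part of the answer.

```php
<?php
// Hypothetical helper; the two patterns are the ones from the answer above.
function stripWikiMarkup(string $text): string
{
    // Drop {{...}} templates (no nested braces) and collapse the
    // whitespace around each one, newlines included, to a single space.
    $text = preg_replace('/\s*\{\{[^}{]*+\}\}\s*/', ' ', $text);

    // Reduce [[target|label]] or [[label]] to just the label.
    $text = preg_replace('/\[\[(?:[^][|]*+\|)?+([^][]*+)\]\]/', '$1', $text);

    return trim($text);
}

// Shortened stand-in for the asker's text; the URL is a placeholder.
$sample = "debut [[first-class cricket|first-class]] match for [[Mumbai cricket team|Bombay]].\n"
        . "{{cite web|url=http://example.com/|title=Example}} He was picked by [[Dilip Vengsarkar]].";

echo stripWikiMarkup($sample), "\n";
// debut first-class match for Bombay. He was picked by Dilip Vengsarkar.
```

Because the brace pattern uses a negated character class rather than `.`, it matches across line breaks without needing the s modifier. It will not match a template that contains another template.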

颜漓半夏 2024-12-06 18:37:50

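The following two lines did the trick:

// PHP patterns must be quoted strings; there is no g flag because preg_replace already replaces every match.
$str = preg_replace('/\s*\{\{.*?\}\}\s*/s', ' ', $str); // remove the curly braces and the text between them (s lets . match newlines)
$str = preg_replace('/[\[\]]/', '', $str); // remove the square brackets only, leaving the text between them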

Sorry it went wrong.
