PHP preg_replace pattern

Posted 2024-11-18 15:09:44

I have some problems with my patterns. I hope somebody can help me with this.

Given a string:

$string = Mutualism has been retrospectively characterised as ideologically situated between individualist and collectivist forms of anarchism.<ref>Avrich, Paul. ''Anarchist Voices: An Oral History of Anarchism in America'', Princeton University Press 1996 ISBN 0-691-04494-5, p.6<br />''Blackwell Encyclopaedia of Political Thought'', Blackwell Publishing 1991 ISBN 0-631-17944-5, p. 11.</ref> Proudhon first characterised his goal as a "third form of society, the synthesis of communism and property."<ref>Pierre-Joseph Proudhon. ''What Is Property?'' Princeton, MA: Benjamin R. Tucker, 1876. p. 281.</ref> Another is <ref name=rupert/>

I want to remove the text inside the <ref> tags (<ref name='something'></ref> or <ref></ref>), and also remove the self-closing ref tag <ref name='sss' />.

After replacing, the final output should be:

Mutualism has been retrospectively
characterised as ideologically
situated between individualist and
collectivist forms of anarchism.
Proudhon first characterised his goal
as a "third form of society, the
synthesis of communism and property."
Another is

My code doesn't seem to work:

$pattern1[] = "/&lt;ref[^\/]*\/&gt;/is"; //remove <ref name=something/>
$pattern1[] = "/&lt;ref[^\/]*&gt;(.*?)&lt;\/ref&gt;/s";  //remove ref <ref>some text here</ref>
preg_replace($pattern1,"\n", $string);

Instead it outputs:

Mutualism has been retrospectively
characterised as ideologically
situated between individualist and
collectivist forms of anarchism.
''Blackwell Encyclopaedia of Political
Thought'', Blackwell Publishing 1991
ISBN 0-631-17944-5, p. 11.</ref>
Proudhon first characterised his goal
as a "third form of society, the
synthesis of communism and
property." Another is

I guess it got caught up on the <br />.

Comments (4)

执着的年纪 2024-11-25 15:09:44

Not the most efficient, but very simple:

$text=strip_tags(str_replace(array('&lt;','&gt;'),array('<','>'),$text));

See strip_tags.
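
One caveat worth checking before relying on this: strip_tags removes the tags themselves but keeps the text between them, so the citation text would probably remain in the output. A minimal sketch, using a shortened, made-up sample string (not the full input from the question) and assuming the angle brackets arrive HTML-encoded:

```php
<?php
// Shortened, hypothetical sample with HTML-encoded tags.
$text = 'anarchism.&lt;ref&gt;Avrich, p.6&lt;br /&gt;Blackwell, p. 11.&lt;/ref&gt; Proudhon &lt;ref name=rupert/&gt;';

// Decode the encoded angle brackets, then strip the resulting tags.
$decoded  = str_replace(array('&lt;', '&gt;'), array('<', '>'), $text);
$stripped = strip_tags($decoded);

// The tags are gone, but the text that was inside <ref>...</ref> is still present.
echo $stripped, "\n";
```

If the goal is to drop the reference text as well, one of the regex-based answers is likely a better fit.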

离笑几人歌 2024-11-25 15:09:44

The problem is that your first pattern also matches:

<ref>Avrich, Paul. ''Anarchist
Voices: An Oral History of Anarchism
in America'', Princeton University
Press 1996 ISBN 0-691-04494-5,
p.6<br />

[^\/]* matches the following:

>Avrich, Paul. ''Anarchist Voices:
An Oral History of Anarchism in
America'', Princeton University Press
1996 ISBN 0-691-04494-5, p.6<br

The solution is to use /&lt;ref(?:[^\/&]|&(?!gt;))*\/&gt;/is to match self-closing tags such as <ref name=something/>.

In this case we use (?:[^\/&]|&(?!gt;))* instead of [^\/]*.

In the first pattern, (?:[^\/&]|&(?!gt;))* matches either any character other than / and & (the first alternative), or an & that is not followed by gt;, i.e. one that is not the start of an encoded > (&gt;) (the second alternative). Here (?!gt;) is a negative lookahead assertion (see http://www.php.net/manual/en/regexp.reference.assertions.php): without consuming any input, it ensures that the next 3 characters are not gt;. As a result, the match can no longer run past the &gt; that closes an opening tag.

In the second pattern, [^\/]* simply matches any character that is not a /.

So the following code:

$str = "Mutualism has been retrospectively characterised as ideologically situated between individualist and collectivist forms of anarchism.&lt;ref&gt;Avrich, Paul. ''Anarchist Voices: An Oral History of Anarchism in America'', Princeton University Press 1996 ISBN 0-691-04494-5, p.6&lt;br /&gt;''Blackwell Encyclopaedia of Political Thought'', Blackwell Publishing 1991 ISBN 0-631-17944-5, p. 11.&lt;/ref&gt; Proudhon first characterised his goal as a &quot;third form of society, the synthesis of communism and property.&quot;&lt;ref&gt;Pierre-Joseph Proudhon. ''What Is Property?'' Princeton, MA: Benjamin R. Tucker, 1876. p. 281.&lt;/ref&gt; Another is &lt;ref name=rupert/&gt;";
$match = array(
    "/&lt;ref(?:[^\/&]|&(?!gt;))*\/&gt;/is",
    "/&lt;ref[^\/]*&gt;(.*?)&lt;\/ref&gt;/s",
);
$str = preg_replace($match,'',$str);
echo $str;

outputs

Mutualism has been retrospectively
characterised as ideologically
situated between individualist and
collectivist forms of anarchism.
Proudhon first characterised his goal
as a "third form of society, the
synthesis of communism and property."
Another is
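
To see the difference between the two versions of the first pattern, both can be run against a shortened sample. This is only a sketch: the sample string and variable names are made up for illustration, and the tags are assumed to be HTML-encoded, which is what the &(?!gt;) alternative implies.

```php
<?php
// Shortened, hypothetical sample with HTML-encoded tags.
$sample = 'x.&lt;ref&gt;Avrich, p.6&lt;br /&gt;Blackwell, p. 11.&lt;/ref&gt; y &lt;ref name=rupert/&gt;';

// Pattern from the question: [^\/]* can run across the encoded > of the opening tag.
$original = '/&lt;ref[^\/]*\/&gt;/is';
// Adjusted pattern: an & is only allowed when it does not start "&gt;".
$fixed    = '/&lt;ref(?:[^\/&]|&(?!gt;))*\/&gt;/is';

preg_match_all($original, $sample, $m1);
preg_match_all($fixed, $sample, $m2);

print_r($m1[0]); // two matches, the first one running from the opening ref tag to the br tag
print_r($m2[0]); // one match: only the self-closing ref tag
```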

有木有妳兜一样 2024-11-25 15:09:44

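It's not recommended to parse HTML with regex, but for this simple case you could do:

<?php
// Assumes $string contains literal <ref> tags (decode HTML entities first if needed).
// [^>]* keeps each alternative inside a single tag, so <ref>...<br /> is not mistaken for a self-closing tag.
$string = preg_replace('/<ref[^>]*\/>|<ref[^>]*>.*?<\/ref>/s', '', $string);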
霓裳挽歌倾城醉 2024-11-25 15:09:44

I've enclosed your original string in double quotes:

$string = "Mutualism has been retrospectively characterised as ideologically situated between individualist and collectivist forms of anarchism.<ref>Avrich, Paul. ''Anarchist Voices: An Oral History of Anarchism in America'', Princeton University Press 1996 ISBN 0-691-04494-5, p.6<br />''Blackwell Encyclopaedia of Political Thought'', Blackwell Publishing 1991 ISBN 0-631-17944-5, p. 11.</ref> Proudhon first characterised his goal as a &quot;third form of society, the synthesis of communism and property.&quot;<ref>Pierre-Joseph Proudhon. ''What Is Property?'' Princeton, MA: Benjamin R. Tucker, 1876. p. 281.</ref> Another is <ref name=rupert/>";

$pattern = '#<ref.*?>(.*?</ref>)?#is';

print htmlspecialchars_decode(preg_replace($pattern, '', $string));

htmlspecialchars_decode is required to convert &quot; to double quotes; omit it if you are outputting to something that does this for you, such as a browser.

Output:

Mutualism has been retrospectively
characterised as ideologically
situated between individualist and
collectivist forms of anarchism.
Proudhon first characterised his goal
as a "third form of society, the
synthesis of communism and property."
Another is

Notes:

I've swapped the usual / delimiter for #, which means that / can be used inside the pattern without escaping it.

.* is greedy by default. Adding ? after the quantifier makes it ungreedy. The U pattern modifier has a similar effect: it makes every quantifier in the pattern ungreedy by default.
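
A small sketch of how greedy, ungreedy and U-modified quantifiers behave (the sample string and variable names are made up for illustration). Note that under the U modifier a trailing ? switches the quantifier back to greedy, so the two spellings should not be combined:

```php
<?php
// Hypothetical sample with two separate <ref> elements.
$s = '<ref>a</ref> b <ref>c</ref>';

preg_match('#<ref>.*</ref>#',   $s, $greedy);   // greedy: runs to the LAST </ref>
preg_match('#<ref>.*?</ref>#',  $s, $lazy);     // ungreedy: stops at the FIRST </ref>
preg_match('#<ref>.*</ref>#U',  $s, $withU);    // U modifier: .* is ungreedy by default
preg_match('#<ref>.*?</ref>#U', $s, $flipped);  // U plus ?: greedy again

echo $greedy[0], "\n";
echo $lazy[0], "\n";
echo $withU[0], "\n";
echo $flipped[0], "\n";
```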

<ref.*?> matches <ref followed by anything until the next > is found.

.*? matches anything until the next </ref>

Wrapping .*?</ref> in ()? makes that part optional: it may occur zero times or once. This caters for situations where there is an opening and a closing tag, and where there is an opening tag with no content following it.

If you want to also match an opening tag with content following it, but no closing tag, you can change the pattern to this:

$pattern = '#<ref.*?>(.*?</ref>|.*)#is';
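
A quick sketch of what this variant does, using made-up sample strings. It also shows a side effect to be aware of: because the .* alternative applies whenever no closing tag follows, any text after a self-closing tag is removed as well.

```php
<?php
$pattern = '#<ref.*?>(.*?</ref>|.*)#is';

// Opening tag with content but no closing tag: everything from <ref> onwards is removed.
$unclosed = preg_replace($pattern, '', 'text <ref>unclosed note');

// Self-closing tag followed by more text and no later </ref>:
// the trailing text is consumed by the .* alternative too.
$selfClosing = preg_replace($pattern, '', 'a <ref name=x/> b');

var_dump($unclosed);
var_dump($selfClosing);
```

For the string in the question this does not matter, since nothing follows the final <ref name=rupert/>.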