I'm having some problems with my patterns. I hope somebody can help me with this.
Given a string:
$string = Mutualism has been retrospectively characterised as ideologically situated between individualist and collectivist forms of anarchism.<ref>Avrich, Paul. ''Anarchist Voices: An Oral History of Anarchism in America'', Princeton University Press 1996 ISBN 0-691-04494-5, p.6<br />''Blackwell Encyclopaedia of Political Thought'', Blackwell Publishing 1991 ISBN 0-631-17944-5, p. 11.</ref> Proudhon first characterised his goal as a "third form of society, the synthesis of communism and property."<ref>Pierre-Joseph Proudhon. ''What Is Property?'' Princeton, MA: Benjamin R. Tucker, 1876. p. 281.</ref> Another is <ref name=rupert/>
I want to remove the text inside the <ref> elements (<ref name='something'></ref> or <ref></ref>) and also remove any single self-closing ref tag such as <ref name='sss' />.
After replacing, the final output should be:
Mutualism has been retrospectively
characterised as ideologically
situated between individualist and
collectivist forms of anarchism.
Proudhon first characterised his goal
as a "third form of society, the
synthesis of communism and property."
Another is
My code doesn't seem to work:
$pattern1[] = "/<ref[^\/]*\/>/is"; //remove <ref name=something/>
$pattern1[] = "/<ref[^\/]*>(.*?)<\/ref>/s"; //remove ref <ref>some text here</ref>
preg_replace($pattern1,"\n", $string);
Instead it outputs:
Mutualism has been retrospectively
characterised as ideologically
situated between individualist and
collectivist forms of anarchism.
''Blackwell Encyclopaedia of Political
Thought'', Blackwell Publishing 1991
ISBN 0-631-17944-5, p. 11.</ref>
Proudhon first characterised his goal
as a "third form of society, the
synthesis of communism and
property." Another is
I guess it got caught up with the <br />
Not the most efficient, but very simple:
strip_tags
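A quick way to see what strip_tags does with this kind of input is the sketch below. The sample string is a shortened, made-up stand-in for the one in the question and assumes literal tags. Note that strip_tags only drops the tags themselves, so the citation text between <ref> and </ref> stays in the result.

```php
<?php
// Shortened stand-in for the question's string (an assumption: literal tags,
// not entity-encoded ones).
$string = 'forms of anarchism.<ref>Avrich, p.6<br />Blackwell, p. 11.</ref> '
        . 'Another is <ref name=rupert/>';

// strip_tags() removes the tags but keeps whatever text sat between them.
$stripped = strip_tags($string);

echo $stripped, "\n";
```

So on its own this gets rid of the markup but not the reference text, which is the part the question wants gone.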
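The problem is that your first pattern also matches more than the self-closing tag:
[^\/]*
matches any character that is not a /, so it can run past the end of the opening tag. The solution is to use
/<ref(?:[^\/&]|&(?!gt;))*\/>/is
to match the self-closing tags. In this case we use
(?:[^\/&]|&(?!gt;))*
instead of [^\/]*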
The first
(?:[^\/&]|&(?!gt;))*
matches any character other than / and & as its first alternative, or, as its second alternative, an & that is not followed by gt; (i.e. one that is not the start of a > entity). Here (?!gt;)
is a negative lookahead assertion (see http://www.php.net/manual/en/regexp.reference.assertions.php). This simply means: without consuming any input, ensure that the next 3 characters are not gt;. The second simply matches any character that's not a /.
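To make the lookahead concrete, here is a minimal sketch. It assumes the input is entity-encoded (&lt;ref&gt; rather than a literal <ref>), which is what the &(?!gt;) test suggests but the answer does not state outright. The sample string is shortened, and the second pattern for paired tags is an illustrative addition built the same way, not something taken from the answer.

```php
<?php
// Assumption: tags arrive entity-encoded, e.g. "&lt;ref&gt;" instead of "<ref>".
$string = 'anarchism.&lt;ref&gt;Avrich, p.6&lt;br /&gt;Blackwell, p. 11.&lt;/ref&gt; '
        . 'Another is &lt;ref name=rupert/&gt;';

$patterns = [
    // Self-closing tag: the attribute part may not contain "/" and may not
    // run into the "&gt;" that closes the tag.
    '/&lt;ref(?:[^\/&]|&(?!gt;))*\/&gt;/is',
    // Paired tag (illustrative): opening tag, lazily matched body, closing tag.
    '/&lt;ref(?:[^\/&]|&(?!gt;))*&gt;.*?&lt;\/ref&gt;/is',
];

// Patterns are applied in order; each match is replaced by an empty string.
$clean = preg_replace($patterns, '', $string);

echo $clean, "\n";
```

Because the attribute part can no longer cross the end of the opening tag, the first pattern stops matching from &lt;ref&gt; all the way to the &lt;br /&gt; and only removes real self-closing tags.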
so the following code
outputs
It's not recommended to parse HTML with regex, but for this simple case you could do a:
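A rough sketch of such a call, pieced together from the pattern fragments described in the notes further down (# as delimiter, <ref.*?>, and .*?</ref> wrapped in ()?). The s modifier, the empty replacement and the shortened sample string are assumptions added for illustration, and the input is taken to contain literal tags.

```php
<?php
// Shortened stand-in for the question's string (assumed to hold literal tags).
$string = 'forms of anarchism.<ref>Avrich, Paul. p.6<br />Blackwell, p. 11.</ref> '
        . 'Proudhon first characterised his goal.<ref>Proudhon, p. 281.</ref> '
        . 'Another is <ref name=rupert/>';

// "#" is the delimiter, so "/" needs no escaping. The "s" modifier (an
// assumption) lets "." match newlines in case a reference spans lines.
$clean = preg_replace('#<ref.*?>(.*?</ref>)?#s', '', $string);

echo $clean, "\n";
```

One thing to watch with this shape of pattern: a self-closing <ref .../> that is followed later in the text by a paired <ref>...</ref> would let the optional group run on to that later </ref> and swallow the text in between. In the question's string the self-closing tag comes last, so that does not happen here.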
I've enclosed your original string in double quotes:
htmlspecialchars_decode is required to convert &quot; to double quotes. Omit this if you are outputting to a device that does this for you, such as a browser.
Output:
Notes:
I've swapped the usual / delimiter for #, which means that / can be used inside the pattern without escaping it.
.* is greedy by default. Adding the ? modifier within the pattern makes this ungreedy, which is equivalent to adding the U pattern modifier.
<ref.*?> matches <ref followed by anything until the next > is found.
.*? matches anything until the next </ref>
Wrapping .*?</ref> in ()? means that zero or one occurrence needs to be found. This caters for situations where there is an opening and closing tag, and where there is an opening tag with no content following it.
If you want to also match an opening tag with content following it, but no closing tag, you can change the pattern to this: