How do I fix a string replace that goes wrong?
NOTE: My problem is NOT that my links aren't being replaced. The problem is that they end up NESTED.
E.g., this is the comment:
some string with www.google.com/blah/blah also something else www.google.com
By the time the second string replace is done, part of the first link (www.google.com/blah/blah) also matches, so that link gets replaced twice.
I have a web app which lets users comment.
I process the input string and convert all links to HTML link format when I display it on the page. The original user input stays in the DB untouched, so it isn't corrupted by processing. I only run my function on it when I show it on the page.
Now, this is the logic I am using to replace all links with their HTML versions:
- Regex all links
- For each match, replace the link with its HTML version in the original string.
- Finally display the string.
ex: www.google.com
becomes <a href="http://www.google.com" target="_blank">www.google.com</a>
just before it's displayed on page.
This was working great until recently, when one of my customers posted content with two links from the same domain.
The links were, say,
- www.google.com/images/blahblah
- www.google.com
My problem is that when the second string replace is done (I am using StringBuilder.Replace), the first link gets replaced as well!
So, first,
www.google.com/images/blahblah
becomes
<a href="http://www.google.com/images/blahblah" target="_blank">www.google.com/images/blahblah</a>
which is fine. But the problem arises on the second string replace: since the replace is global, it also hits the already processed link, so the link above gets mangled because it contains www.google.com as well.
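A minimal sketch reproducing the nesting described above, assuming the two replacements run one after another with StringBuilder.Replace. The class and method names are hypothetical and only serve the illustration:

```csharp
using System.Text;

public static class NestedReplaceRepro
{
    public static string Run()
    {
        var sb = new StringBuilder("www.google.com/images/blahblah and www.google.com");

        // First pass: the longer link is wrapped correctly.
        sb.Replace(
            "www.google.com/images/blahblah",
            "<a href=\"http://www.google.com/images/blahblah\" target=\"_blank\">www.google.com/images/blahblah</a>");

        // Second pass: StringBuilder.Replace is global, so it also rewrites
        // "www.google.com" inside the href and the link text produced above.
        sb.Replace(
            "www.google.com",
            "<a href=\"http://www.google.com\" target=\"_blank\">www.google.com</a>");

        return sb.ToString();
    }
}
```

The returned string contains an anchor tag nested inside the href attribute of the first link, which is the mangled output in question.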
This messes things up so badly that I actually get a mutilated abomination of a string.
How do I avoid this?
Does Regex.Matches provide an index of the matched element for me to work with? I couldn't find it anywhere.
What's the best way to deal with this? Any suggestions?
Sorry for the lengthy question.
I can probably do this by manually traversing the string, but that's long and painful. There's got to be a good way to do it...
Edit: adding extra info as someone asked.
My regex:
string rPattern = @"(((http|ftp|https):\/\/)|www\.)[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#!]*[\w\-\@?^=%&/~\+#])?";
Regex rLinks = new Regex(rPattern, RegexOptions.IgnoreCase);
MatchCollection matches = rLinks.Matches(inputString);
Then I am using:
foreach (Match match in matches)
{
    if (match.Value.StartsWith("www.youtube.com/watch"))
    {
        // logic to embed youtube video - this works fine.
    }
}
// Here I replace all hyperlinks with their <a href> versions
Regex.Matches returns a MatchCollection. Match.Index is what you're looking for. But really, you're probably looking for something more like this:
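A minimal sketch of a single-pass replacement, assuming Regex.Replace with the `$0` (whole match) substitution and the URL pattern from the question. The class and method names are hypothetical. Because each match is substituted at its own position and the output is never rescanned, a link cannot be replaced twice:

```csharp
using System.Text.RegularExpressions;

public static class SinglePassLinkify
{
    // URL pattern copied from the question.
    private const string Pattern =
        @"(((http|ftp|https):\/\/)|www\.)[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#!]*[\w\-\@?^=%&/~\+#])?";

    private static readonly Regex Links = new Regex(Pattern, RegexOptions.IgnoreCase);

    // "$0" stands for the whole match. Assumption: links are scheme-less
    // ("www..."), so "http://" is always prepended to the href.
    public static string Linkify(string input)
    {
        return Links.Replace(input, "<a href=\"http://$0\" target=\"_blank\">$0</a>");
    }
}
```

With the comment from the question as input, both www.google.com/blah/blah and www.google.com are each wrapped exactly once. Note that this fixed replacement string would double the scheme for input that already starts with http://.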
Or, you can use a MatchEvaluator to do more advanced work (like ensuring we don't add a double http://).
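A rough sketch of the MatchEvaluator variant, assuming the same URL pattern from the question. The class and method names are hypothetical. The evaluator runs once per match, so it can decide per link whether a scheme needs to be prepended:

```csharp
using System;
using System.Text.RegularExpressions;

public static class EvaluatorLinkify
{
    // URL pattern copied from the question.
    private const string Pattern =
        @"(((http|ftp|https):\/\/)|www\.)[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#!]*[\w\-\@?^=%&/~\+#])?";

    private static readonly Regex Links = new Regex(Pattern, RegexOptions.IgnoreCase);

    public static string Linkify(string input)
    {
        // The lambda is converted to a MatchEvaluator delegate.
        return Links.Replace(input, match =>
        {
            string text = match.Value;

            // Only scheme-less matches (the "www." branch) get "http://" added.
            string href = text.StartsWith("www.", StringComparison.OrdinalIgnoreCase)
                ? "http://" + text
                : text;

            return "<a href=\"" + href + "\" target=\"_blank\">" + text + "</a>";
        });
    }
}
```

For example, `EvaluatorLinkify.Linkify("see http://example.com and www.example.com")` leaves the first href as http://example.com and turns the second into http://www.example.com.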
I had the same need and this is what I've been using for the past couple years now:
And here's a tip. If you really want this to perform well with lots of comments on the page, then store both the unsafe and safe versions in the database when the comment is posted. That way you don't have to call this function repeatedly when displaying every comment on a page.
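A minimal sketch of the store-both idea, assuming a hypothetical StoredComment type and a render function supplied by the caller (for instance the link conversion). Neither name comes from the answer. The rendering cost is paid once, when the comment is posted:

```csharp
using System;

// Hypothetical record: both versions are computed when the comment is
// posted and would be saved to the database together.
public sealed class StoredComment
{
    public string RawText { get; }      // original user input, kept untouched
    public string DisplayHtml { get; }  // processed version, ready to display

    private StoredComment(string rawText, string displayHtml)
    {
        RawText = rawText;
        DisplayHtml = displayHtml;
    }

    // 'render' is whatever turns raw input into display HTML; it is called
    // exactly once here, not on every page view.
    public static StoredComment Create(string rawText, Func<string, string> render)
    {
        return new StoredComment(rawText, render(rawText));
    }
}
```

Displaying a page then only reads DisplayHtml, while RawText stays available if the rendering logic changes later and comments need to be reprocessed.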
Use the Regex.Replace method, e.g.:
To play devil's advocate:
So, you want to correct strings that look like:
but not strings that look like:
www.example.com
www.example.com/foo/bar
www.example.co.tw/baz.moo?foo=1
I would guess that I am correct. Simple solution: expand your regex to look at either side of the thing that looks like a URL, and ignore it if it is:
- between an href=" and a " target="_blank">
- between a " target="_blank"> and a </a>
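One way to sketch this "look at either side" idea in .NET is a negative lookbehind placed in front of the URL pattern from the question. The class name, the method name, and the exact lookbehind are assumptions for illustration. The guard skips anything that directly follows href=" or target="_blank">, with or without a scheme in between:

```csharp
using System;
using System.Text.RegularExpressions;

public static class GuardedLinkify
{
    // Assumption: generated anchors always look like
    // <a href="..." target="_blank">...</a>, as in the question.
    private const string NotInsideAnchor =
        @"(?<!(?:href=""|target=""_blank"">)(?:(?:https?|ftp)://)?)";

    // URL pattern copied from the question.
    private const string UrlPattern =
        @"(((http|ftp|https):\/\/)|www\.)[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#!]*[\w\-\@?^=%&/~\+#])?";

    private static readonly Regex Links =
        new Regex(NotInsideAnchor + UrlPattern, RegexOptions.IgnoreCase);

    public static string Linkify(string input)
    {
        return Links.Replace(input, match =>
        {
            string text = match.Value;
            // Prepend a scheme only for scheme-less ("www.") matches.
            string href = text.StartsWith("www.", StringComparison.OrdinalIgnoreCase)
                ? "http://" + text
                : text;
            return "<a href=\"" + href + "\" target=\"_blank\">" + text + "</a>";
        });
    }
}
```

Under these assumptions, running Linkify on text that already contains a generated anchor leaves that anchor alone and wraps only the remaining plain links, so a second run over its own output changes nothing.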