I am working on an ASP.NET response filter that rewrites URLs to point to a different domain in specific situations.
Because ASP.NET chunks the response writes, my filter gets called several times before the page is fully streamed. This means I need to make sure that each call to Regex.Replace doesn't replace a URL twice (otherwise you end up with http://foo.comhttp://foo.com/path).
To do this, I'm trying to use a negative lookbehind expression in the replace, but it doesn't seem to be working:
content = Regex.Replace(content,"((?<!" + newDomain + ")" + match + ")", newDomain + match);
This creates a regex like:
((?<!http://www.foo.com/)actual/url)
However, it does not seem to respect the lookbehind, and everything still gets replaced twice.
Any ideas?
EDIT: This regex works great when I use a tool like Regex Coach to test it against sample data.
EDIT 2: Added the slash, it is actually there.
Comments (5)
I will try a third angle.
I think you are confusing the fact that your regex "matches" something in Regex Coach with it matching the part you want. That is why the replace results surprise you.
The replace swaps all of the matched input for the new token.
The negative lookbehind makes sure its pattern is not present before the match, but that pattern is not part of the matched input.
Only the path of your URL (your match string) is the matched input, and you are replacing it with newDomain + match, so anything that came before the path stays where it was.
That is why you are getting the results you are getting.
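To make the point about the matched input concrete, here is a minimal sketch. The old domain `http://old.example/` and the sample markup are assumptions made up for illustration; only `http://www.foo.com/` and `actual/url` come from the question. It shows that the lookbehind text never becomes part of the match value, so whatever precedes the path survives the replace.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample values; only newDomain and path mirror the question.
string newDomain = "http://www.foo.com/";
string path = "actual/url";
string content = "<a href=\"http://old.example/actual/url\">link</a>";

// Same shape as the pattern in the question, with the literals escaped.
string pattern = "((?<!" + Regex.Escape(newDomain) + ")" + Regex.Escape(path) + ")";

// The lookbehind only asserts what precedes the match; it consumes nothing.
Match m = Regex.Match(content, pattern);
Console.WriteLine(m.Value); // prints: actual/url

// Only the matched path is swapped, so the old domain stays in front of it.
string replaced = Regex.Replace(content, pattern, newDomain + path);
Console.WriteLine(replaced);
```

With this input the old domain is still present after the replace, followed directly by the new domain and the path, which looks much like the doubled URL described in the question.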
A couple of thoughts:
<! syntax and don't have my books to hand, so this may be a moot point.
Hope some of that is of help.
I would try this
This will match (and thus replace the domain part of the expression) only if the domain is not newDomain and the path is match.
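One pattern that behaves as described can be sketched as follows. This is an assumption about the approach, not necessarily the expression this answer had in mind: it uses a negative lookahead on the domain, assumes absolute `http`/`https` URLs, and `RewriteDomain` and `http://old.example/` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: rewrites the domain of absolute URLs that end in `path`,
// skipping URLs that already point at newDomain.
string RewriteDomain(string content, string newDomain, string path)
{
    string pattern =
        "(?!" + Regex.Escape(newDomain) + ")" +  // domain must not already be newDomain
        @"https?://[^/\s""']+/" +                 // any other scheme + host + slash
        "(" + Regex.Escape(path) + ")";           // group 1: the path to keep
    // Escape '$' so the domain is inserted literally in the replacement string.
    return Regex.Replace(content, pattern, newDomain.Replace("$", "$$") + "$1");
}

string html = "<a href=\"http://old.example/actual/url\">link</a>";
string once = RewriteDomain(html, "http://www.foo.com/", "actual/url");
string twice = RewriteDomain(once, "http://www.foo.com/", "actual/url");

Console.WriteLine(once);          // the link now points at http://www.foo.com/
Console.WriteLine(twice == once); // prints: True
```

Because the whole old domain is part of the match, it is replaced rather than left in front of the new one, and a second pass over the same text finds nothing to change.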
Maybe I'm missing something, but should you be using negative lookbehinds at all? A lookbehind, by nature, does not add anything to the match. You, on the other hand, want to match the domain and the path, and then replace the domain. Right?
So it should be something more like this:
The idea is to use grouping to your advantage. The $2 part grabs the second half of the match (the path) and appends it to the new domain. I tested this in Regex Hero (a .NET regex tester) and it works. By the way, The Regex Coach is Perl-based, so you may run into some differences compared to the .NET regex engine.
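A minimal sketch of the grouping idea, assuming the domain to be replaced is known in advance. The value `http://old.example/` is hypothetical, and the pattern is an illustration rather than the exact expression tested in Regex Hero.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical values; oldDomain in particular is an assumption.
string oldDomain = "http://old.example/";
string newDomain = "http://www.foo.com/";
string path = "actual/url";

// Group 1 captures the domain to discard, group 2 the path to keep.
string pattern = "(" + Regex.Escape(oldDomain) + ")(" + Regex.Escape(path) + ")";
// $2 in the replacement string refers to the second group (the path).
string replacement = newDomain.Replace("$", "$$") + "$2";

string content = "<a href=\"http://old.example/actual/url\">link</a>";
string firstPass = Regex.Replace(content, pattern, replacement);
string secondPass = Regex.Replace(firstPass, pattern, replacement);

Console.WriteLine(firstPass);               // the link now points at http://www.foo.com/
Console.WriteLine(secondPass == firstPass); // prints: True
```

Since the old domain is part of the match, it is consumed by the replace, and running the same replace again on already rewritten text is a no-op.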
How about replacing only when you don't find the new (to-be-replaced-with) domain part in the string?
I.e., to abuse perl as shorthand:
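The same check can also be sketched in C#, the language of the question. This is an illustration of the idea under stated assumptions, not the Perl from this answer: `RewriteOnce` is a hypothetical name, and the check is made against the whole chunk rather than per URL.

```csharp
using System;

// Hypothetical helper: prefixes `path` with newDomain unless the chunk
// already contains the rewritten URL.
string RewriteOnce(string content, string newDomain, string path)
{
    string rewritten = newDomain + path;
    if (content.Contains(rewritten))
    {
        return content; // already rewritten; leave the chunk alone
    }
    return content.Replace(path, rewritten);
}

string chunk = "<a href=\"actual/url\">link</a>";
string first = RewriteOnce(chunk, "http://www.foo.com/", "actual/url");
string second = RewriteOnce(first, "http://www.foo.com/", "actual/url");

Console.WriteLine(first);           // the link now points at http://www.foo.com/
Console.WriteLine(second == first); // prints: True
```

One limitation of checking the whole chunk: if it contains both an already rewritten URL and an untouched one, the untouched one is skipped.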