Regex to add the base domain to directory paths

Posted 2024-09-11 15:16:53

10 websites need to be cached. When they are cached, photos, CSS, JS, etc. are not displayed properly because the base domain isn't prepended to the relative paths. I need a regex that adds the base domain to those paths. Examples below.

Base domain: http://www.example.com

The problem occurs when reading cached pages with img src="thumb/123.jpg" or src="/inc/123.js".

They would display correctly if they were img src="http://www.example.com/thumb/123.jpg" or src="http://www.example.com/inc/123.js".

The regex should be something like: if src=" isn't followed by the base domain, then add the base domain.


Comments (2)

百思不得你姐 2024-09-18 15:16:54

Without knowing the language, you can use the (probably most portable) substitution operator:

s/(src=")\/?([^"]+")/$1http:\/\/www.example.com\/$2/g

This does the following:
1. Matches the string 'src="' (and captures it in $1).
2. Matches an optional leading slash, which is dropped so that "/inc/123.js" doesn't end up with a double slash.
3. Matches one or more non-double-quote (") characters followed by " (and captures them in $2).
4. Inserts 'http://www.example.com/' between the two capture groups; the g flag repeats this for every match on the line.
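For a PHP reader, the same plain substitution might look like the sketch below. It assumes relative paths resolve against the site root, as in the question's examples, and add_base_domain is a hypothetical helper name. It deliberately does no check for an existing domain; that is what the conditional described next is for.

```php
<?php
// Sketch of the s/// substitution using PHP's preg_replace.
// Assumption: relative paths resolve against the site root (http://www.example.com/).
function add_base_domain(string $html): string
{
    // $1 = 'src="'; an optional leading '/' is matched and dropped;
    // $2 = the rest of the value up to and including the closing quote.
    // No check for an already-present domain is made here.
    $pattern = '~(src=")/?([^"]+")~i';
    return preg_replace($pattern, '$1http://www.example.com/$2', $html);
}

echo add_base_domain('<img src="thumb/123.jpg">'), "\n";
// prints <img src="http://www.example.com/thumb/123.jpg">
echo add_base_domain('<script src="/inc/123.js"></script>'), "\n";
// prints <script src="http://www.example.com/inc/123.js"></script>
```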

Depending on the language, you can wrap this in a conditional that checks whether the domain is already present and substitutes only if it isn't.

To check for the domain, /www\.example\.com/i should do.
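A minimal sketch of that "check first, then substitute" idea in PHP, assuming preg_replace_callback so each src/href value can be tested on its own with /www\.example\.com/i. The names absolutize_links and BASE are hypothetical. Like the check it is built on, it only looks for www.example.com, so absolute URLs pointing at other hosts would also get prefixed.

```php
<?php
// Hypothetical base URL; relative paths are assumed to resolve against the site root.
const BASE = 'http://www.example.com/';

function absolutize_links(string $html): string
{
    return preg_replace_callback(
        '~\b(src|href)="([^"]*)"~i',
        function (array $m): string {
            $value = $m[2];
            // The domain check from the answer: leave the attribute untouched if present.
            if (preg_match('/www\.example\.com/i', $value)) {
                return $m[0];
            }
            // Strip leading slashes so "/inc/123.js" doesn't become ".com//inc/123.js".
            return $m[1] . '="' . BASE . ltrim($value, '/') . '"';
        },
        $html
    );
}

echo absolutize_links('<img src="thumb/123.jpg"><img src="http://www.example.com/a.jpg">'), "\n";
```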

EDIT: See comments:

For PHP, I would do this a bit differently: I would probably use SimpleXML. I don't think that would translate well, though, so here's a regex version...

$html = file_get_contents('/path/to/file.html');
// (?!https?://) skips values that are already absolute URLs; /? drops a leading slash.
$regex_match = '~(src="|href=")(?!https?://)/?([^"]+")~i';
$regex_substitute = '$1http://www.example.com/$2';
// preg_replace replaces every match by default and returns the result.
$html = preg_replace($regex_match, $regex_substitute, $html);

Note: I haven't actually run this to debug it; it's just off the cuff. I would be concerned about three things. First, / is the usual pattern delimiter in preg_replace, so literal slashes inside the pattern would need escaping; using ~ as the delimiter avoids that. I don't think you're concerned with this, though, unless VB has a similar problem. Second, if there's a chance that line breaks would get in the way, I might change the regex. Third, the (?!https?://) negative lookahead is what restricts the match to a src or href that doesn't already hold an absolute URL such as http://www.example.com/...; PCRE, which preg_replace uses, supports lookaheads, but POSIX regex flavors do not.

The rest of the changes should be fine: I added href=" and made the match case-insensitive with the i modifier. No g modifier is needed (PHP actually rejects it as an unknown modifier): preg_replace already replaces every match unless you pass its $limit argument.

I hope that helps.
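The answer mentions preferring a parser over a regex. As a rough sketch of that route, here is a version using PHP's DOMDocument, swapped in for SimpleXML because loadHTML() accepts ordinary HTML that isn't well-formed XML. The helper name absolutize_with_dom and its $base parameter are hypothetical, and relative paths are again assumed to resolve against the site root.

```php
<?php
// Sketch: rewrite src/href attributes with an HTML parser instead of a regex.
function absolutize_with_dom(string $html, string $base = 'http://www.example.com/'): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect warnings about sloppy markup instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@src] | //*[@href]') as $el) {
        foreach (['src', 'href'] as $attr) {
            if (!$el->hasAttribute($attr)) {
                continue;
            }
            $value = $el->getAttribute($attr);
            // Skip values with a scheme (http:, mailto:, ...) and protocol-relative //host URLs.
            if (preg_match('~^([a-z][a-z0-9+.-]*:|//)~i', $value)) {
                continue;
            }
            $el->setAttribute($attr, $base . ltrim($value, '/'));
        }
    }
    // saveHTML() serializes the whole document, including the doctype/html/body wrappers it adds.
    return $doc->saveHTML();
}
```

Because the parser, not a pattern, locates the attributes, quoting styles and line breaks inside tags stop being a concern.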

嘿咻 2024-09-18 15:16:54

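Matching regular expression (group 1 captures the base domain when it is already there; if it is empty, prepend the domain):

(?:src|href)="(http://www\.example\.com/)?([^"]*)"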
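A sketch of how this match could drive the rewrite in PHP, assuming preg_replace_callback: when the base-domain group matched, the value is left alone; otherwise the domain is prepended. The pattern is lightly adapted (the attribute is captured too, the value stops at the closing quote instead of a greedy .+ running to the end of the line, and matching is case-insensitive), and prefix_missing_domain is a hypothetical helper name.

```php
<?php
function prefix_missing_domain(string $html): string
{
    // $m[1] = 'src="' or 'href="', $m[2] = base domain if present, $m[3] = rest of the value.
    $pattern = '~((?:src|href)=")(http://www\.example\.com/)?([^"]*)"~i';
    return preg_replace_callback($pattern, function (array $m): string {
        if ($m[2] !== '') {
            return $m[0]; // base domain already present: keep as-is
        }
        // Group 2 did not match: prepend the domain, dropping leading slashes.
        return $m[1] . 'http://www.example.com/' . ltrim($m[3], '/') . '"';
    }, $html);
}

echo prefix_missing_domain('<img src="thumb/123.jpg">'), "\n";
// prints <img src="http://www.example.com/thumb/123.jpg">
```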