Using string manipulation to tame directory-separator madness?

Posted on 2024-12-05 03:33:16

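I'm working on converting a website. It involves standardizing the directory structure for images and media files. I'm parsing path information from various tags, standardizing it, checking whether the media exists in the new standardized location, and putting it there if it doesn't. I'm using string manipulation to do this.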

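This is a bit open-ended, but is there a class, tool, or concept out there that could save me some headaches? For example, I run into problems where a page in a subdirectory (website.com/subdir/dir/page.php) has a relative image path (../images/image.png), or something similar. It's not that there is one overarching problem; there are lots of small ones that add up.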

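Just when I think my script covers most cases, I get errors like Could not find file at export/standardized_folder/proper_image_folderimage.png, when it should have been export/standardized_folder/proper_image_folder/image.png. Parsing strings and checking that the directory separators are in the right places is driving me a bit crazy.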

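I feel like I'm putting too much effort into making a one-off import script really robust. Perhaps someone has already solved this mess in a reusable way that I could take advantage of?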

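Postscript: here's a more in-depth scoop. I write a script that parses one "type" of page and pulls content from pages of that kind. Then I point the script at another type of page, get all kinds of errors, and learn that all my assumptions about how paths are referenced have to be thrown out. Lather, rinse, repeat.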

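So I'm looking at doing some major refactoring of my script, throwing out all assumptions, and checking, re-checking, and double-checking path information. Since I'm really trying to build a robust path-building script, I'm hoping I can avoid reinventing the wheel. Is there a wheel out there?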

趁微风不噪 2024-12-12 03:33:16

If the root of your problem is resolving relative links in a document to absolute ones (which should be half the job of mapping the linked image paths onto the file system), I normally use Net_URL2 from PEAR. It's a simple class that just does the job.

To install it, run the following as root:

# pear install channel://pear.php.net/Net_URL2-0.3.1

Even though it's a beta package, it's really stable.

A small example: say you have an array with all the image srcs in question and a base URL for the document:

require_once('Net/URL2.php');

$baseUrl = 'http://www.example.com/test/images.html';

$docSrcs = array(...);

$baseUrl = new Net_URL2($baseUrl);

foreach($docSrcs as $href)
{
    $url = $baseUrl->resolve($href);
    echo ' * ', $href, ' -> ', $url->getURL(), "\n";
    // or
    echo " $href -> $url\n"; # Net_URL2 supports string context
}

This converts any relative link into an absolute one based on your base URL. The base URL is, first of all, the document's address. The document can override it by specifying another one with the base element, so you can look that up with the HTML parser you're already using (along with the src and href values).

Net_URL2 follows the current RFC 3986 when resolving URLs.
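As a rough illustration of the base element lookup, here is a minimal sketch using PHP's bundled DOM extension rather than whatever parser you already have. The function name find_base_url and both of its parameters are made up for this example, and it assumes PHP 7 or later. Note that a base href may itself be relative, in which case it would still need to be resolved against the document address.

```php
<?php
// Hypothetical helper (not part of Net_URL2): return the base URL a document
// declares via <base href="...">, or fall back to the document's own address.
function find_base_url(string $html, string $documentUrl): string
{
    $doc = new DOMDocument();

    // Real-world markup is rarely valid; collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('base') as $base) {
        $href = trim($base->getAttribute('href'));
        if ($href !== '') {
            return $href; // the first <base> carrying an href is used
        }
    }

    return $documentUrl; // no usable <base>, the document address applies
}
```

The returned string could then be passed to new Net_URL2(...) in place of the hard-coded $baseUrl above.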

Another thing that might be handy for your URL handling is the getNormalizedURL function. It removes some potential error cases, such as needless dot segments, which is useful if you need to compare one URL with another and, naturally, when you then map the URL to a path:

foreach($docSrcs as $href)
{
    $url = $baseUrl->resolve($href);
    $url = $url->getNormalizedURL();
    echo " $href -> $url\n";
}
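To make "needless dot segments" concrete, here is a simplified sketch of what removing them means for a URL path. It is not Net_URL2's implementation, only a plain-PHP approximation in the spirit of RFC 3986 section 5.2.4; the function name remove_dot_segments is an assumption for this example, and it targets PHP 7 or later.

```php
<?php
// Hypothetical helper: collapse "." and ".." segments in a URL path.
function remove_dot_segments(string $path): string
{
    $absolute = strncmp($path, '/', 1) === 0;
    $segments = explode('/', $absolute ? substr($path, 1) : $path);
    $last = count($segments) - 1;
    $output = [];

    foreach ($segments as $i => $segment) {
        if ($segment === '.' || $segment === '..') {
            if ($segment === '..') {
                array_pop($output); // step up one level; no-op at the root
            }
            if ($i === $last) {
                $output[] = ''; // "/a/b/.." keeps a trailing slash: "/a/"
            }
            continue;
        }
        $output[] = $segment;
    }

    return ($absolute ? '/' : '') . implode('/', $output);
}

echo remove_dot_segments('/subdir/dir/../images/image.png'), "\n";
// prints /subdir/images/image.png
```

With the library in place you would not need this; getNormalizedURL covers it.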

Once all URLs are resolved to absolute ones and normalized, you can decide whether they are relevant for your site. As long as the URL is still a Net_URL2 instance, you can use one of its many functions to do that:

$host = strtolower($url->getHost());
if (in_array($host, array('example.com', 'www.example.com')))
{
    # URL is on my server, process it further
}

What's left is the concrete path to the file in the URL:

$path = $url->getPath();

That path, assuming you're comparing against a UNIX file system, should be easy to prefix with a concrete base directory:

$filesystemImagePath = '/var/www/site-new/images';
$newPath = $filesystemImagePath . $path;
if (is_file($newPath))
{
    # new image already exists.
}

If you have trouble combining the base path with the image path, keep in mind that the image path will always have a slash at the beginning, so the base directory should not end with one.
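If you would rather not reason about leading and trailing slashes at every concatenation, a small join helper can take care of it. The sketch below is only one way to do it; join_path is a made-up name, not part of PHP or Net_URL2, and the variadic typed signature assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: join path parts with exactly one "/" between them,
// regardless of whether the parts carry leading or trailing slashes.
function join_path(string ...$parts): string
{
    $joined = '';
    foreach ($parts as $part) {
        if ($part === '') {
            continue; // ignore empty parts
        }
        if ($joined === '') {
            $joined = $part; // keep the first part as-is (may be absolute)
            continue;
        }
        $joined = rtrim($joined, '/') . '/' . ltrim($part, '/');
    }
    return $joined;
}

echo join_path('export/standardized_folder/proper_image_folder', 'image.png'), "\n";
// prints export/standardized_folder/proper_image_folder/image.png
echo join_path('/var/www/site-new/images/', '/test/image.png'), "\n";
// prints /var/www/site-new/images/test/image.png
```

This addresses exactly the proper_image_folderimage.png kind of error from the question: the separator is added once, whether or not either side already has one.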

Hope this helps.
