PHP: read the first 2 lines of a file into variables and loop through sub-folders

Posted on 2024-12-21 19:52:06

I am trying to do the following with PHP...

1. Read a directory.
2. Find all .md and .markdown files.
3. Read the first 2 lines of these Markdown files.
4. If a "Title: Title for the file" line is found on line 1, add it to the array.
5. If a "Description: Short description" line is found on line 2, add it to the array.
6. If a sub-directory is found, repeat steps 1-5 on it.
7. I should now have a nice list/array.
8. Print this list/array to the screen so that it shows up like this:

Directory 1 Name

<a href="LINK TO MARKDOWN FILE 1"> TITLE from line 1 of Markdown FILE 1</a> <br>
Description from Markdown FILE 1 line 2

<a href="LINK TO MARKDOWN FILE 2"> TITLE from line 1 of Markdown FILE 2</a> <br>
Description from Markdown FILE 2 line 2

<a href="LINK TO MARKDOWN FILE 3"> TITLE from line 1 of Markdown FILE 3</a> <br>
Description from Markdown FILE 3 line 2

Directory 2 Name

<a href="LINK TO MARKDOWN FILE 1"> TITLE from line 1 of Markdown FILE 1</a> <br>
Description from Markdown FILE 1 line 2

<a href="LINK TO MARKDOWN FILE 2"> TITLE from line 1 of Markdown FILE 2</a> <br>
Description from Markdown FILE 2 line 2

<a href="LINK TO MARKDOWN FILE 3"> TITLE from line 1 of Markdown FILE 3</a> <br>
Description from Markdown FILE 3 line 2

etc..........

Code so far

function getFilesFromDir($dir)
{
    $files = array();
    // scan the directory passed into the function
    if ($handle = opendir($dir)) {
        while (false !== ($file = readdir($handle))) {

            // If file is .md or .markdown continue
            if (preg_match('/\.(md|markdown)$/', $file)) {

                // Grab first 2 lines of Markdown file
                $content = file($dir . '/' . $file);
                $title = $content[0];
                $description = $content[1];

                // If first 2 lines of Markdown file have a 
                // "Title: file title" and "Description: file description" lines we then
                // add these key/value pairs to the array for meta data

                // Match Title line
                $pattern = '/^(Title|Description):(.+)/';
                if (preg_match($pattern, $title, $matched)) {
                    $title = trim($matched[2]);
                }

                // match Description line 
                if (preg_match($pattern, $description, $matched)) {
                    $description = trim($matched[2]);
                }

                // Add .md and .markdown files and folder path to array
                // Add captured Title and Description to array as well
                $files[$dir][] = array("filepath" => $dir . '/' . $file,
                                       "title" => $title,
                                       "description" => $description
                                    );

            }
        }
        closedir($handle);
    }

    return $files;
}

Usage

$dir = 'mdfiles';
$fileArray = getFilesFromDir($dir);

Help needed

The code mostly works. What it still needs is the ability to do the same on sub-directories. Also, it reads the first 2 lines and then runs the regex twice, which can probably be done differently?

I would think there is a better way, so that the regex I use to match the Title and Description only has to run once?

Can someone help me modify this code so that it detects and processes sub-directories, and improve the way it reads the first 2 lines of a Markdown file to get the title and description if they exist?

I also need help printing the array to the screen. I know how to dump the data, but the output has to group the files and show the folder name at the top of each set, like in my demo output above.

I appreciate any help


傾城如夢未必闌珊 2024-12-28 19:52:06

To recursively iterate over files, the RecursiveDirectoryIterator is quite handy (related: PHP recursive directory path). It already gives you an SplFileInfo object for every entry, which looks useful in your case because you want to read the files' contents (openFile() returns an SplFileObject).

It is also possible to parse the first two lines of the file with a single regular expression. PHP caches compiled patterns, so running the same pattern repeatedly is cheap. A single pattern keeps the code more structured, at the cost of a more complex pattern. The configuration could look like this:

#
# configuration
#

$path = 'md';
$fileFilter = '~\.(md|markdown)$~';
$pattern = '~^(?:Title: (.*))?(?:(?:\r\n|\n)(?:Description: (.*)))?~u';

Just in case the Markdown files are actually UTF-8 encoded, I added the u modifier (PCRE_UTF8).
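To make the behaviour of this pattern easier to see, here is a minimal sketch that applies it to a few in-memory strings. The helper name parseHeader() is hypothetical and not part of the answer, and the trim() calls are an added precaution against stray whitespace or carriage returns.

```php
<?php
// Same pattern as in the configuration above.
$pattern = '~^(?:Title: (.*))?(?:(?:\r\n|\n)(?:Description: (.*)))?~u';

// Hypothetical helper (not from the answer): returns [title, description],
// using '' for anything the pattern did not capture.
function parseHeader(string $pattern, string $lines): array
{
    if (preg_match($pattern, $lines, $matches) !== 1) {
        return ['', ''];
    }
    // Trailing groups that did not take part in the match are absent
    // from $matches, so pad them with empty strings.
    $matches += ['', '', ''];

    // trim() is an assumption added here to drop a stray "\r" or spaces.
    return [trim($matches[1]), trim($matches[2])];
}

// Usage with made-up sample input:
list($title, $description) = parseHeader($pattern, "Title: Hello\nDescription: World");
echo $title, ' / ', $description, "\n";
```

Note that the pattern is anchored at the start of the string, so a Description line is only picked up when line 1 is a Title line or is empty. If line 1 holds other text, both values come back empty.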

The processing part of the code then runs a recursive directory iterator over $path, skips files that do not match $fileFilter, parses the first two lines of each remaining file (provided the file is readable and has at least one line) and stores the result in $result, an array keyed by directory:

#
# main
#

# init result array (the nice one)
$result = array();

# recursive iterator for files
$iterator = new RecursiveIteratorIterator(
               new RecursiveDirectoryIterator($path, FilesystemIterator::KEY_AS_PATHNAME | FilesystemIterator::CURRENT_AS_FILEINFO), 
               RecursiveIteratorIterator::SELF_FIRST);

foreach($iterator as $path => $info)
{
    # skip directories and files that don't match
    if (!$info->isFile() || !preg_match($fileFilter, $path)) continue;

    # get first two lines
    try
    {
        for
        (
            $maxLines = 2,
            $lines = '',
            $file = $info->openFile()
            ; 
            !$file->eof() && $maxLines--
            ; 
            $lines .= $file->fgets()
        );
        $lines = rtrim($lines, "\n");

        if (!strlen($lines)) # skip empty files 
            continue;
    }
    catch (RuntimeException $e)
    {
        continue; # files which are not readable are skipped.
    }

    # parse md file
    $r = preg_match($pattern, $lines, $matches);
    if (FALSE === $r)
    {
        throw new Exception('Regular expression failed.');
    }
    # trim() also drops a stray "\r" left over from CRLF line endings
    list(, $title, $description) = array_map('trim', $matches + array('', '', ''));

    # grow result array
    $result[dirname($path)][] = array($path, $title, $description);
}
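If the compact for loop above is hard to read, the same "first two lines only" idea can be sketched with plain fopen()/fgets(). Unlike file(), which the question uses, this stops reading after the requested number of lines. The function name readFirstLines() and its return convention (null for unreadable files) are assumptions made for this illustration.

```php
<?php
// Hypothetical helper (not from the answer): read at most $maxLines lines
// from the start of a file. Returns null if the file cannot be opened.
function readFirstLines(string $filename, int $maxLines = 2): ?array
{
    // @ suppresses the warning for unreadable files; failure is reported via null.
    $handle = @fopen($filename, 'r');
    if ($handle === false) {
        return null;
    }

    $lines = [];
    // fgets() returns false at end of file, which ends the loop early
    // for files with fewer than $maxLines lines.
    while (count($lines) < $maxLines && ($line = fgets($handle)) !== false) {
        $lines[] = rtrim($line, "\r\n"); // strip the line ending only
    }
    fclose($handle);

    return $lines;
}
```

An empty file yields an empty array, so the caller can skip it with a simple count() check.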

What's left is the output. Because the array is already grouped by directory, it is fairly straightforward: iterate over the directories first and then over the files within each one:

#
# output
#

$dirCounter = 0;
foreach ($result as $name => $dirs)
{
    printf("Directory %d %s\n", ++$dirCounter, basename($name));
    foreach ($dirs as $entry)
    {
        list($path, $title, $description) = $entry;
        printf("<a href='%s'>%s from line 1 of Markdown %s</a> <br>\n%s\n\n", 
                htmlspecialchars($path), 
                htmlspecialchars($title),               
                htmlspecialchars(basename($path)),
                htmlspecialchars($description)
              );
    }
}


我很坚强 2024-12-28 19:52:06

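Adding a branch for directories inside your loop should work:

if (preg_match('/\.(md|markdown)$/', $file)) {
   // ...
} elseif ($file !== '.' && $file !== '..' && is_dir($dir . '/' . $file)) {
    // is_dir() needs the full path; "." and ".." are skipped to avoid endless recursion
    $files = array_merge($files, getFilesFromDir($dir . '/' . $file));
}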
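For context, here is a rough sketch of how that branch might fit into the complete function from the question. It is one possible arrangement, not a definitive version: it reads the two lines with fgets() instead of file(), and it parses them with the preg_replace() approach, so a line without a "Title:" or "Description:" prefix is kept as-is, as in the original code.

```php
<?php
// Sketch only: the question's function with a recursion branch added.
function getFilesFromDir(string $dir): array
{
    $files = [];
    $handle = @opendir($dir);
    if ($handle === false) {
        return $files; // unreadable directory: nothing to report
    }

    while (false !== ($file = readdir($handle))) {
        if ($file === '.' || $file === '..') {
            continue; // never recurse into the directory itself or its parent
        }
        $path = $dir . '/' . $file;

        if (is_dir($path)) {
            // Sub-directory: merge its results (keyed by its own path).
            $files = array_merge($files, getFilesFromDir($path));
        } elseif (preg_match('/\.(md|markdown)$/', $file)) {
            // Read only the first two lines; fgets() gives false past EOF.
            $fh = @fopen($path, 'r');
            $line1 = $fh ? (string) fgets($fh) : '';
            $line2 = $fh ? (string) fgets($fh) : '';
            if ($fh) {
                fclose($fh);
            }

            $files[$dir][] = [
                'filepath'    => $path,
                'title'       => trim(preg_replace('/^Title:(.+)/', '$1', $line1)),
                'description' => trim(preg_replace('/^Description:(.+)/', '$1', $line2)),
            ];
        }
    }
    closedir($handle);

    return $files;
}
```

The result keeps the shape used in the question: one key per directory path, each holding a list of filepath/title/description entries.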

Running the regex twice isn't so bad, and may be better than trying to cobble together a single pattern that spans both lines. However, you could achieve the same result with preg_replace:

$title = trim(preg_replace('/^Title:(.+)/', '$1', $content[0]));
$description = trim(preg_replace('/^Description:(.+)/', '$1', $content[1]));

To output your array as in the example, try this:

foreach ($fileArray as $directory => $files) {
    echo $directory . "\n\n";

    foreach ($files as $fileData) {
        echo '<a href="' . $fileData['filepath'] . '">' . $fileData['title'] . "</a><br />\n";
        echo $fileData['description'] . "\n\n";
    }
}
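Since titles, descriptions and paths come straight from files on disk, it may be worth escaping them before they go into HTML. The sketch below does the same grouping as the loop above but builds a string and runs every value through htmlspecialchars(). The function name renderListing() is hypothetical, and the array shape is assumed to be the one from the question.

```php
<?php
// Hypothetical helper (not from the answer): builds the grouped listing as a
// string so it can be echoed or inspected.
function renderListing(array $fileArray): string
{
    $out = '';
    foreach ($fileArray as $directory => $files) {
        // Directory heading first, then the files that belong to it.
        $out .= htmlspecialchars((string) $directory) . "\n\n";

        foreach ($files as $fileData) {
            $out .= sprintf(
                "<a href=\"%s\">%s</a><br>\n%s\n\n",
                htmlspecialchars($fileData['filepath']),
                htmlspecialchars($fileData['title']),
                htmlspecialchars($fileData['description'])
            );
        }
    }

    return $out;
}

// Usage with made-up sample data:
echo renderListing([
    'mdfiles' => [
        ['filepath' => 'mdfiles/intro.md', 'title' => 'Intro', 'description' => 'Getting started'],
    ],
]);
```

Returning a string instead of echoing directly is a design choice made here so the output can be checked or embedded in a template.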
