PHP parser: determine whether a string found via regex is inside an anchor tag

Posted on 2024-12-19 07:25:27

Edited. I know HTML should not be parsed with regex. I am asking for help. How can I find an arbitrary string in a mix of tags and text and then determine if it is inside an anchor?

I have an interactive glossary in my WordPress site. Part of its functionality is searching the content of a post for a glossary term (a text string). If found, the term is wrapped in a link to a custom taxonomy entry that contains the definition.

I like how it works, but one hitch is that if the term is already part of a link, the glossary parser hijacks the existing link by inserting a link within the link. The parser is purely regex-based; there is no DOM parsing. I know that HTML should not be parsed with regex, but currently the function just searches for a specific text string; it's not trying to do anything with tags at all.
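
To illustrate the hitch, here is a minimal standalone sketch with no WordPress calls. The sample content, the term `widget`, and the `/glossary/widget` URL are made up for the example; only the search pattern mirrors the parser's.

```php
<?php
// Hypothetical reproduction: the term already sits inside an existing link.
$content = '<p>See the <a href="/docs">widget manual</a> for details.</p>';
$term    = 'widget'; // stand-in for a glossary title

// Same shape as the parser's search: the lookahead only rules out matches
// inside a double-quoted attribute value, not matches inside <a>...</a>.
$search  = '/\b' . preg_quote($term, '/') . 's*?\b(?=([^"]*"[^"]*")*[^"]*$)/i';
$replace = '<a class="glossaryLink" href="/glossary/widget">$0</a>';

$result = preg_replace($search, $replace, $content);
echo $result, "\n"; // the glossary link ends up nested in the /docs link
```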

But is there a relatively fast (in terms of processing) and reliable way to check whether the found string is inside an anchor tag? Obviously this would not always be the case, as the word could seemingly be inside any tag. When it is inside an anchor, the glossary parser should not add a link. I know this feature would need a DOM parser, but I'm unsure where to go from here.
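
A rough sketch of what the DOM-based direction might look like, using PHP's built-in `DOMDocument` and `DOMXPath`. The function name `link_term_outside_anchors`, the wrapper id `glossary-root`, and the sample URL are assumptions for illustration and are not part of the plugin. The XPath `text()[not(ancestor::a)]` selects only text nodes that have no anchor ancestor.

```php
<?php
// Hypothetical helper: link a term only in text nodes with no <a> ancestor.
function link_term_outside_anchors(string $html, string $term, string $url): string
{
    $previous = libxml_use_internal_errors(true); // silence HTML5 tag warnings
    $dom = new DOMDocument();
    // The meta tag hints UTF-8; the wrapper div lets us find our fragment again.
    $dom->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div id="glossary-root">' . $html . '</div>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $root  = $xpath->query('//div[@id="glossary-root"]')->item(0);
    $nodes = $xpath->query('.//text()[not(ancestor::a)]', $root);

    // One capture group so preg_split() returns the matched terms as well.
    $pattern = '/(\b' . preg_quote($term, '/') . 's?\b)/iu';

    foreach (iterator_to_array($nodes) as $node) {
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // term not present in this text node
        }
        $fragment = $dom->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {            // odd indexes hold the matched term
                $a = $dom->createElement('a');
                $a->setAttribute('class', 'glossaryLink');
                $a->setAttribute('href', $url);
                $a->appendChild($dom->createTextNode($part));
                $fragment->appendChild($a);
            } elseif ($part !== '') {      // even indexes hold surrounding text
                $fragment->appendChild($dom->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the wrapper, not the implied <html>/<body>.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo link_term_outside_anchors(
    '<p>A widget is here. See the <a href="/docs">widget manual</a>.</p>',
    'widget',
    '/glossary/widget'
), "\n";
```

This sketch does not handle the `glossaryFirstOnly` option or skip `<script>` and `<style>` contents; both would need extra conditions.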

The parser:

function glossary_parse($content){

    //Run the glossary parser
    if (((!is_page() && get_option('glossaryOnlySingle') == 0) OR
    (!is_page() && get_option('glossaryOnlySingle') == 1 && is_single()) OR
    (is_page() && get_option('glossaryOnPages') == 1))){
        $glossary_index = get_children(array(
                                            'post_type'     => 'glossary',
                                            'post_status'   => 'publish',
                                            ));
        $current_title = get_the_title();                                   
        if ($glossary_index){
            $timestamp = time();
            foreach($glossary_index as $glossary_item){
                $timestamp++;
                $glossary_title = $glossary_item->post_title;
                if ($current_title == $glossary_title) {
                    continue;
                }
                // Escape regex metacharacters in the title before building the patterns
                $glossary_pattern = preg_quote($glossary_title, '/');
                $glossary_search = '/\b'.$glossary_pattern.'s*?\b(?=([^"]*"[^"]*")*[^"]*$)/i';
                $glossary_replace = '<a'.$timestamp.'>$0</a'.$timestamp.'>';
                if (get_option('glossaryFirstOnly') == 1) {
                    $content_temp = preg_replace($glossary_search, $glossary_replace, $content, 1);
                }
                else {
                    $content_temp = preg_replace($glossary_search, $glossary_replace, $content);
                }
                $content_temp = rtrim($content_temp);

                // Swap the temporary <aNNN> placeholder for the real glossary link
                $link_search = '/<a'.$timestamp.'>('.$glossary_pattern.'[A-Za-z]*?)<\/a'.$timestamp.'>/i';
                if (get_option('glossaryTooltip') == 1) {
                    $link_replace = '<a class="glossaryLink" href="' . get_permalink($glossary_item) . '" title="Glossary: '. $glossary_title . '" onmouseover="tooltip.show(\'' . addslashes($glossary_item->post_excerpt) . '\');" onmouseout="tooltip.hide();">$1</a>';
                }
                else {
                    $link_replace = '<a class="glossaryLink" href="' . get_permalink($glossary_item) . '" title="Glossary: '. $glossary_title . '">$1</a>';
                }
                $content_temp = preg_replace($link_search, $link_replace, $content_temp);
                $content = $content_temp;
            }
        }
    }
    return $content;
}
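
For comparison, a regex-only variant might split the content into tags and text, track whether the current position is inside an anchor, and replace only in text outside one. This is a sketch under assumptions: the function name `link_term_skip_anchors` and the sample URL are hypothetical, and the `<[^>]*>` tokenizer will misread a `>` inside an attribute value or an HTML comment.

```php
<?php
// Hypothetical helper: regex-only pass that skips text inside <a>...</a>.
function link_term_skip_anchors(string $html, string $term, string $url): string
{
    // With one capture group, odd indexes are tags and even indexes are text.
    $tokens  = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $pattern = '/\b' . preg_quote($term, '/') . 's?\b/i';
    $depth   = 0; // greater than 0 while inside an anchor element

    foreach ($tokens as $i => $token) {
        if ($i % 2 === 1) {
            if (preg_match('/^<a[\s>]/i', $token)) {
                $depth++;                        // opening <a ...>
            } elseif (preg_match('/^<\/a\s*>/i', $token)) {
                $depth = max(0, $depth - 1);     // closing </a>
            }
        } elseif ($depth === 0 && $token !== '') {
            // A callback avoids "$1"-style surprises if the URL contains "$".
            $tokens[$i] = preg_replace_callback(
                $pattern,
                function (array $m) use ($url): string {
                    return '<a class="glossaryLink" href="'
                        . htmlspecialchars($url) . '">' . $m[0] . '</a>';
                },
                $token
            );
        }
    }
    return implode('', $tokens);
}

echo link_term_skip_anchors(
    '<p>A widget is here. See the <a href="/docs">widget manual</a>.</p>',
    'widget',
    '/glossary/widget'
), "\n";
```

Because tags are never passed to the replacement step, this variant also avoids matching the term inside attribute values, so the quote-counting lookahead is not needed here.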
