DOM 错误 - ID“someAnchor”已在实体中定义,第 X 行

发布于 2024-09-16 23:47:59 字数 7883 浏览 4 评论 0原文

如果我尝试将 HTML 文档加载到 PHP DOM 中,我会收到以下错误:

Error DOMDocument::loadHTML() [domdocument.loadhtml]: ID someAnchor already defined in Entity, line: 9

我无法弄清楚原因。下面是一些将 HTML 字符串加载到 DOM 中的代码。

第一个不包含锚标记,第二个包含锚标记。第二个文档产生错误。

希望您能够将其剪切并粘贴到脚本中并运行它以查看相同的输出:

<?php
ini_set('display_errors', 1);
error_reporting(E_ALL);


$stringWithNoAnchor = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body >
<h1>Hello</h1>
</body>
</html>
EOT;

$stringWithAnchor = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body >
<h1>Hello</h1>
<a name="someAnchor" id="someAnchor"></a>
</body>
</html>
EOT;

class domGrabber
    {
    public $_FileErrorStr = '';

    /**
    *@desc DOM object factory does the work of loading the DOM object
    */
    public function getLoadAsDOMObj($htmlString)
        {
        $this->_FileErrorStr =''; //reset error container
        $xmlDoc = new DOMDocument();
        set_error_handler(array($this, '_FileErrorHandler')); // Warnings and errors are suppressed
        $xmlDoc->loadHTML($htmlString);
        restore_error_handler();
        return $xmlDoc;
        }

    /**
    *@desc public so that it can catch errors from outside this class
    */
    public function _FileErrorHandler($errno, $errstr, $errfile, $errline)
        {
        if ($this->_FileErrorStr === null)
            {
            $this->_FileErrorStr = $errstr;
            }
        else    {
            $this->_FileErrorStr .= (PHP_EOL . $errstr);
            }
        }
    }

$domGrabber = new  domGrabber();
$xmlDoc = $domGrabber->getLoadAsDOMObj($stringWithNoAnchor );

echo 'PHP Version: '. phpversion() .'<br />'."\n";

echo '<pre>';
print $xmlDoc->saveXML();
echo '</pre>'."\n";
if ($domGrabber->_FileErrorStr)
    {
    echo 'Error'. $domGrabber->_FileErrorStr;
    }



$xmlDoc = $domGrabber->getLoadAsDOMObj($stringWithAnchor);
echo '<pre>';
print $xmlDoc->saveXML();
echo '</pre>'."\n";
if ($domGrabber->_FileErrorStr)
    {
    echo 'Error'. $domGrabber->_FileErrorStr;
    }

我在 Firefox 源代码视图中得到以下输出:

PHP Version: 5.2.9<br />
<pre><?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml"><head><title>My document</title><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /></head><body>
<h1>Hello</h1>
</body></html>
</pre>
<pre><?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml"><head><title>My document</title><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /></head><body>
<h1>Hello</h1>
<a name="someAnchor" id="someAnchor"></a>

</body></html>
</pre>
Error
DOMDocument::loadHTML() [<a href='domdocument.loadhtml'>domdocument.loadhtml</a>]: ID someAnchor already defined in Entity, line: 9

那么,为什么 DOM 说 someAnchor 已经定义了?


更新:

我尝试了两者

  • 而不是使用 loadHTML() 我使用了 loadXML() 方法 - 并且修复了它
  • 而不是同时拥有 id 和名称我只使用了 id - 属性并修复了它。

为了完整起见,请参阅此处的比较脚本:

<?php
ini_set('display_errors', 1);
error_reporting(E_ALL);


$stringWithNoAnchor = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body >
<p>stringWithNoAnchor</p>
</body>
</html>
EOT;

$stringWithAnchor = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body >
<p>stringWithAnchor</p>
<a  name="someAnchor" id="someAnchor" ></a>
</body>
</html>
EOT;

$stringWithAnchorButOnlyIdAtt = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body >
<p>stringWithAnchorButOnlyIdAtt</p>
<a id="someAnchor"></a>
</body>
</html>
EOT;

class domGrabber
    {
    public $_FileErrorStr = '';
    public $useHTMLMethod = TRUE;

    /**
    *@desc DOM object factory does the work of loading the DOM object
    */
    public function loadDOMObjAndWriteOut($htmlString)
        {
        $this->_FileErrorStr ='';

        $xmlDoc = new DOMDocument();
        set_error_handler(array($this, '_FileErrorHandler')); // Warnings and errors are suppressed


        if ($this->useHTMLMethod)
            {
            $xmlDoc->loadHTML($htmlString);
            }
        else    {
            $xmlDoc->loadXML($htmlString);
            }


        restore_error_handler();

        echo "<h1>";
        echo ($this->useHTMLMethod) ? 'using xmlDoc->loadHTML() ' : 'using $xmlDoc->loadXML()';
        echo "</h1>";
        echo '<pre>';
        print $xmlDoc->saveXML();
        echo '</pre>'."\n";
        if ($this->_FileErrorStr)
            {
            echo 'Error'. $this->_FileErrorStr;
            }
        }

    /**
    *@desc public so that it can catch errors from outside this class
    */
    public function _FileErrorHandler($errno, $errstr, $errfile, $errline)
        {
        if ($this->_FileErrorStr === null)
            {
            $this->_FileErrorStr = $errstr;
            }
        else    {
            $this->_FileErrorStr .= (PHP_EOL . $errstr);
            }
        }
    }

$domGrabber = new  domGrabber();

echo 'PHP Version: '. phpversion() .'<br />'."\n";

$domGrabber->useHTMLMethod = TRUE; //DOM->loadHTML
$domGrabber->loadDOMObjAndWriteOut($stringWithNoAnchor);
$domGrabber->loadDOMObjAndWriteOut($stringWithAnchor );
$domGrabber->loadDOMObjAndWriteOut($stringWithAnchorButOnlyIdAtt);

$domGrabber->useHTMLMethod = FALSE; //use DOM->loadXML
$domGrabber->loadDOMObjAndWriteOut($stringWithNoAnchor);
$domGrabber->loadDOMObjAndWriteOut($stringWithAnchor );
$domGrabber->loadDOMObjAndWriteOut($stringWithAnchorButOnlyIdAtt);

If I try to load an HTML document into PHP DOM I get an error along the lines of:

Error DOMDocument::loadHTML() [domdocument.loadhtml]: ID someAnchor already defined in Entity, line: 9

I cannot work out why. Here is some code that loads an HTML string into DOM.

First without containing an anchor tag and second with one. The second document produces an error.

Hopefully you should be able to cut and paste it into a script and run it to see the same output:

<?php
ini_set('display_errors', 1);
error_reporting(E_ALL);


$stringWithNoAnchor = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body >
<h1>Hello</h1>
</body>
</html>
EOT;

$stringWithAnchor = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body >
<h1>Hello</h1>
<a name="someAnchor" id="someAnchor"></a>
</body>
</html>
EOT;

class domGrabber
    {
    public $_FileErrorStr = '';

    /**
    *@desc DOM object factory does the work of loading the DOM object
    */
    public function getLoadAsDOMObj($htmlString)
        {
        $this->_FileErrorStr =''; //reset error container
        $xmlDoc = new DOMDocument();
        set_error_handler(array($this, '_FileErrorHandler')); // Warnings and errors are suppressed
        $xmlDoc->loadHTML($htmlString);
        restore_error_handler();
        return $xmlDoc;
        }

    /**
    *@desc public so that it can catch errors from outside this class
    */
    public function _FileErrorHandler($errno, $errstr, $errfile, $errline)
        {
        if ($this->_FileErrorStr === null)
            {
            $this->_FileErrorStr = $errstr;
            }
        else    {
            $this->_FileErrorStr .= (PHP_EOL . $errstr);
            }
        }
    }

$domGrabber = new  domGrabber();
$xmlDoc = $domGrabber->getLoadAsDOMObj($stringWithNoAnchor );

echo 'PHP Version: '. phpversion() .'<br />'."\n";

echo '<pre>';
print $xmlDoc->saveXML();
echo '</pre>'."\n";
if ($domGrabber->_FileErrorStr)
    {
    echo 'Error'. $domGrabber->_FileErrorStr;
    }



$xmlDoc = $domGrabber->getLoadAsDOMObj($stringWithAnchor);
echo '<pre>';
print $xmlDoc->saveXML();
echo '</pre>'."\n";
if ($domGrabber->_FileErrorStr)
    {
    echo 'Error'. $domGrabber->_FileErrorStr;
    }

I get the following out put in my Firefox source code view:

PHP Version: 5.2.9<br />
<pre><?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml"><head><title>My document</title><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /></head><body>
<h1>Hello</h1>
</body></html>
</pre>
<pre><?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns="http://www.w3.org/1999/xhtml"><head><title>My document</title><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /></head><body>
<h1>Hello</h1>
<a name="someAnchor" id="someAnchor"></a>

</body></html>
</pre>
Error
DOMDocument::loadHTML() [<a href='domdocument.loadhtml'>domdocument.loadhtml</a>]: ID someAnchor already defined in Entity, line: 9

So, why is DOM saying that someAnchor is already defined?


Update:

I experimented with both

  • Instead of using loadHTML() I used the loadXML() method - and that fixed it
  • Instead of having both id and name I used just id - Attribute and that fixed it.

See the comparison script here for the sake of completion:

<?php
ini_set('display_errors', 1);
error_reporting(E_ALL);


$stringWithNoAnchor = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body >
<p>stringWithNoAnchor</p>
</body>
</html>
EOT;

$stringWithAnchor = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body >
<p>stringWithAnchor</p>
<a  name="someAnchor" id="someAnchor" ></a>
</body>
</html>
EOT;

$stringWithAnchorButOnlyIdAtt = <<<EOT
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>My document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body >
<p>stringWithAnchorButOnlyIdAtt</p>
<a id="someAnchor"></a>
</body>
</html>
EOT;

class domGrabber
    {
    public $_FileErrorStr = '';
    public $useHTMLMethod = TRUE;

    /**
    *@desc DOM object factory does the work of loading the DOM object
    */
    public function loadDOMObjAndWriteOut($htmlString)
        {
        $this->_FileErrorStr ='';

        $xmlDoc = new DOMDocument();
        set_error_handler(array($this, '_FileErrorHandler')); // Warnings and errors are suppressed


        if ($this->useHTMLMethod)
            {
            $xmlDoc->loadHTML($htmlString);
            }
        else    {
            $xmlDoc->loadXML($htmlString);
            }


        restore_error_handler();

        echo "<h1>";
        echo ($this->useHTMLMethod) ? 'using xmlDoc->loadHTML() ' : 'using $xmlDoc->loadXML()';
        echo "</h1>";
        echo '<pre>';
        print $xmlDoc->saveXML();
        echo '</pre>'."\n";
        if ($this->_FileErrorStr)
            {
            echo 'Error'. $this->_FileErrorStr;
            }
        }

    /**
    *@desc public so that it can catch errors from outside this class
    */
    public function _FileErrorHandler($errno, $errstr, $errfile, $errline)
        {
        if ($this->_FileErrorStr === null)
            {
            $this->_FileErrorStr = $errstr;
            }
        else    {
            $this->_FileErrorStr .= (PHP_EOL . $errstr);
            }
        }
    }

$domGrabber = new  domGrabber();

echo 'PHP Version: '. phpversion() .'<br />'."\n";

$domGrabber->useHTMLMethod = TRUE; //DOM->loadHTML
$domGrabber->loadDOMObjAndWriteOut($stringWithNoAnchor);
$domGrabber->loadDOMObjAndWriteOut($stringWithAnchor );
$domGrabber->loadDOMObjAndWriteOut($stringWithAnchorButOnlyIdAtt);

$domGrabber->useHTMLMethod = FALSE; //use DOM->loadXML
$domGrabber->loadDOMObjAndWriteOut($stringWithNoAnchor);
$domGrabber->loadDOMObjAndWriteOut($stringWithAnchor );
$domGrabber->loadDOMObjAndWriteOut($stringWithAnchorButOnlyIdAtt);

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

幸福%小乖 2024-09-23 23:47:59

如果您正在加载 XML 文件(就是这种情况,XHTML 就是 XML),那么您应该使用 DOMDocument::loadXML(),而不是 DOMDocument::loadHTML()

在 HTML 中,nameid 都引入了 ID。所以你重复了 id“someAnchor”,因此出现错误。

但是,W3C 验证器允许以 形式显示的重复 ID。这可能是 libmxl2 的一个错误。

在此 libxml2 的 错误报告 中,用户提出了一个补丁,仅考虑 < code>name 属性作为 ID:

根据 HTML 和 XHTML 规范,只有 a 元素的 name 属性
与 id 属性共享名称空间。对于某些要素,可以争论
具有相同名称的多个实例没有意义,但它们应该
尽管如此,不应将其视为与其他元素的 id 位于同一名称空间中
属性。

参见http://www.zvon.org/xxl/xhtmlReference /Output/Strict/attr_name.html 对于所有
采用名称属性及其语义的元素。

If you are loading XML files (that's the case, XHTML is XML), then you should use DOMDocument::loadXML(), not DOMDocument::loadHTML().

In HTML, both name and id introduce an ID. So you are repeating the id "someAnchor", hence the error.

However, the W3C validator allows repeated IDs in the form you show <a id="someAnchor" name="someAnchor"></a>. This may be a bug of libmxl2.

In this bug report for libxml2, a user proposes a patch to only consider the name attribute as an ID:

According to the HTML and XHTML specs, only the a element's name attribute
shares name space with id attributes. For some of the elements it can be argued
that multiple instances with the same name don't make sense, but they should
nevertheless not be considered in the same namespace as other elements' id
attributes.

See http://www.zvon.org/xxl/xhtmlReference/Output/Strict/attr_name.html for all
the elements that take name attributes and their semantics.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文