DOM navigation: eliminating text nodes

Posted 2024-11-04 00:22:45

I have a JavaScript script that reads and parses XML.
It obtains the XML through an XMLHttpRequest to a PHP script that returns XML.
The script is supposed to receive two or more nodes under the first parent node.
The two nodes it requires have well-defined names; the others can have any name.
The output from the PHP script may be:

<?xml version='1.0'?>
<things>
    <carpet>
        <id>1</id>
        <name>1</name>
        <desc>1.5</desc>
    </carpet>
    <carpet>
        <id>2</id>
        <name>2</name>
        <height>unknown</height>
    </carpet>
</things>

Here every carpet has 7 child nodes: 3 elements plus 4 whitespace-only text nodes.

But it may also be:

<?xml version='1.0'?>
<things>
    <carpet>
        <id>1</id>
        <name>1</name>
        <desc>1.5</desc>
    </carpet>
    <carpet><id>2</id><name>2</name><height>unknown</height></carpet>
</things>

Here the first carpet has 7 child nodes and the second has 3.
I want my JavaScript code to treat both exactly the same way, quickly and cleanly.
If possible, I'd like to remove all the text nodes between tags, so that XML like the above would always be treated as:

<?xml version='1.0'?>
<things><carpet><id>1</id><name>1</name><desc>1.5</desc></carpet><carpet><id>2</id><name>2</name><height>unknown</height></carpet></things>

Is that possible in a quick and efficient way? If possible, and if it is more efficient, I'd rather not use any of the get functions (getElementsByTagName(), getElementById(), ...).

和我恋爱吧 2024-11-11 00:22:45

It's pretty straightforward to walk the DOM and remove the nodes you consider empty (containing only whitespace).

I originally posted this untested; it has since been tested and fixed. It looks something like this (replace the magic numbers with named constants, obviously):

var reBlank = /^\s*$/;
function walk(node) {
    var child, next;
    switch (node.nodeType) {
        case 3: // Text node
            if (reBlank.test(node.nodeValue)) {
                node.parentNode.removeChild(node);
            }
            break;
        case 1: // Element node
        case 9: // Document node
            child = node.firstChild;
            while (child) {
                next = child.nextSibling;
                walk(child);
                child = next;
            }
            break;
    }
}
walk(xmlDoc); // Where xmlDoc is your XML document instance

Here my definition of "blank" is anything that contains only whitespace according to the JavaScript interpreter's understanding of the \s (whitespace) RegExp class. Note that some implementations have issues with \s not being inclusive enough (several Unicode "blank" characters outside the ASCII range are not matched, etc.), so be sure to test with your sample data.
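
If the behavior of \s across engines is a concern, one option is to spell out the whitespace set explicitly. The following is only a sketch of the same walk under two assumptions that are not part of the answer above: the node types are given local names (ELEMENT_NODE, TEXT_NODE, DOCUMENT_NODE, with the standard values 1, 3 and 9), and "blank" is narrowed to the four characters XML 1.0 treats as whitespace (space, tab, carriage return, line feed). The function name stripBlankText is made up for illustration.

```javascript
// Local names for the standard DOM node type values, defined here so the
// sketch does not depend on a global `Node` object being available.
var ELEMENT_NODE = 1, TEXT_NODE = 3, DOCUMENT_NODE = 9;

// Assumption: only XML 1.0 whitespace (space, tab, CR, LF) counts as blank.
var reXmlBlank = /^[ \t\r\n]*$/;

// Hypothetical name; same traversal as walk(), with the two changes above.
function stripBlankText(node) {
    var child, next;
    switch (node.nodeType) {
        case TEXT_NODE:
            if (reXmlBlank.test(node.nodeValue)) {
                node.parentNode.removeChild(node);
            }
            break;
        case ELEMENT_NODE:
        case DOCUMENT_NODE:
            child = node.firstChild;
            while (child) {
                // Remember the sibling first: the child may be removed below.
                next = child.nextSibling;
                stripBlankText(child);
                child = next;
            }
            break;
    }
}
```

Called as stripBlankText(xmlDoc), this should leave a text node that contains only a non-breaking space (U+00A0) in place, whereas a Unicode-aware \s would treat it as blank. Which behavior is right depends on your data.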


鹿童谣 2024-11-11 00:22:45

I would just try a very crude string replace. Assuming you store the XML in a variable called xml:

var rex = /(\<(\/)?[A-Za-z0-9]+\>)(\s)+/gi;
var a = xml.replace( rex, "$1" );

Here's the complete test I put together:

<html><head></head>

<body>
<script type="text/javascript">
var xml = "<?xml version='1.0'?>\n" + 
"<things>\n" +
"    <carpet>\n" +
"        <id>1</id>\n" +
"        <name>1</name>\n" +
"        <desc>1.5</desc>\n" +
"    </carpet>\n" +
"    <carpet>\n" +
"        <id>2</id>\n" +
"        <name>2</name>\n" +
"        <height>unknown</height>\n" +
"    </carpet>\n" +
"</things>";

var rex = /(\<(\/)?[A-Za-z0-9]+\>)(\s)+/gi;
var a = xml.replace( rex, "$1" );
alert( a );

</script>


</body></html>
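
One limitation worth noting: the pattern above only matches tags made of letters and digits, so a tag with attributes (or the `<?xml ...?>` declaration) keeps the whitespace that follows it. A looser variant, sketched below as an assumption rather than as part of the answer, matches on the `>` and `<` characters alone. The helper name stripInterTagWhitespace and the `type` attribute in the sample are made up for illustration. Like any string-level approach, it would also collapse whitespace inside comments or CDATA sections that happens to sit between a `>` and a `<`.

```javascript
// Hypothetical helper: remove whitespace-only runs between a closing '>'
// and the next opening '<', whatever the tags on either side look like.
function stripInterTagWhitespace(xml) {
    return xml.replace(/>\s+</g, "><");
}

// Sample with an attribute, which the letters-and-digits pattern would skip.
var sample = "<?xml version='1.0'?>\n" +
    "<things>\n" +
    "    <carpet type='wool'>\n" +
    "        <id>1</id>\n" +
    "        <name>1</name>\n" +
    "    </carpet>\n" +
    "</things>";

var stripped = stripInterTagWhitespace(sample);
```

The result can then be passed to the XML parser as usual, so the DOM never contains the whitespace-only text nodes in the first place.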
