Remove HTML tags in MongoDB

Posted on 2025-02-07 22:51:50 | Word count 829 | Views 3 | Comments 0

I am creating a query to extract the descriptions of customers in MongoDB. Unfortunately, the description is stored as HTML. Is there a way to strip all HTML tags, either by replacing them with " " or by removing them entirely?

Below is a sample document

{ 
        "_id" : ObjectId("61f72aefdc85500a8baa6bb8"),
        "CustomerPin" : "22010871", 
        "CustomerName" : "TestLastName, TestFirstName", 
        "Age" : 39.0, 
        "Gender" : "Male", 
        "Description" : "<p><span>This will be a test description</span><br/></p>", 
}

The output should have the "p", "span", and "br" tags removed. Is there a function in MongoDB that removes them all at once, without repeating $project for each tag?

This is the expected output:

{ 
        "_id" : ObjectId("61f72aefdc85500a8baa6bb8"),
        "CustomerPin" : "22010871", 
        "CustomerName" : "TestLastName, TestFirstName", 
        "Age" : 39.0, 
        "Gender" : "Male", 
        "Description" : "This will be a test description", 
}

Thanks!

Comments (2)

×纯※雪 2025-02-14 22:51:51

One way to do it is to strip all tags with a regex in a pre hook of the save method:

Description.replace(/(<([^>]+)>)/gi, "");

See hooks here
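
A minimal sketch of that idea in plain JavaScript, using a slightly simplified form of the regex above. The names `stripHtmlTags` and `cleanDescription` are made up for illustration, and the Mongoose `pre("save")` wiring shown in the comment is an assumption, since the answer does not name the library it has in mind.

```javascript
// Remove anything that looks like an HTML tag:
// "<", then one or more characters that are not ">", then ">".
function stripHtmlTags(html) {
  return html.replace(/<[^>]+>/g, "");
}

// Hypothetical helper: clean the Description field of a document in place.
function cleanDescription(doc) {
  if (typeof doc.Description === "string") {
    doc.Description = stripHtmlTags(doc.Description);
  }
  return doc;
}

// Assumed Mongoose wiring (not confirmed by the answer):
// customerSchema.pre("save", function (next) {
//   cleanDescription(this);
//   next();
// });

const sample = {
  Description: "<p><span>This will be a test description</span><br/></p>"
};
console.log(cleanDescription(sample).Description);
// prints: This will be a test description
```

This cleans the data at write time, so documents that are already stored keep their tags until they are saved again. It also leaves HTML entities such as `&nbsp;` untouched.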

抱着落日 2025-02-14 22:51:50

If you use MongoDB 4.2 or later, you need a regex that extracts the text content from the HTML. Below is an aggregation pipeline that applies such a regex with $regexFind.

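db.getCollection("name_of_your_collection").aggregate([
    {
        // Capture the first run of text that sits outside a tag.
        // $regexFind supports the i, m, x and s options only.
        $set: {
            contentRegex: {
                $regexFind: { input: "$Description", regex: /([^<>]+)(?!([^<]+)?>)/i }
            }
        }
    },
    {
        // Fall back to the raw Description when nothing matched.
        $set: {
            content: { $ifNull: ["$contentRegex.match", "$Description"] }
        }
    },
    {
        // Drop the temporary field.
        $unset: [ "contentRegex" ]
    }
])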
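
Because `$regexFind` returns only the first match, a description with text in several places (for example `<p>Hello <b>world</b></p>`) would be cut short. One possible extension, sketched here on the assumption that MongoDB 4.2 or later is available, uses `$regexFindAll` with `$reduce` and `$concat` to join every text fragment and overwrite `Description` in the query output. The names `textOutsideTags`, `stripTagsPipeline` and `joinTextFragments` are illustrative, and the pipeline has not been run against the asker's data. The helper at the end only mimics the regex locally with JavaScript's own engine.

```javascript
// Same pattern as in the answer, written as a string so it can be reused.
const textOutsideTags = "([^<>]+)(?!([^<]+)?>)";

// Single stage: find every run of text outside a tag and concatenate them.
const stripTagsPipeline = [
  {
    $set: {
      Description: {
        $reduce: {
          input: {
            $regexFindAll: { input: "$Description", regex: textOutsideTags }
          },
          initialValue: "",
          in: { $concat: ["$$value", "$$this.match"] }
        }
      }
    }
  }
];

// In the shell (collection name is a placeholder):
// db.getCollection("name_of_your_collection").aggregate(stripTagsPipeline)

// Local approximation of what the stage computes, for quick experiments.
function joinTextFragments(html) {
  const fragments = html.match(new RegExp(textOutsideTags, "g")) || [];
  return fragments.join("");
}

console.log(joinTextFragments("<p>Hello <b>world</b></p>"));
// prints: Hello world
```

This keeps everything in one `$set` stage, so no `$project` has to be repeated per tag. Text that itself contains a literal `<` or `>` will confuse the pattern, so it is best treated as a starting point for simple markup.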