Remove HTML tags in MongoDB

Posted on 2025-02-07 22:51:50

I am writing a query to extract customer descriptions from MongoDB. Unfortunately, the descriptions are stored as HTML. Is there a way to strip all HTML tags, either by replacing each tag with " " or by removing it entirely?

Below is a sample document

{
        "_id" : ObjectId("61f72aefdc85500a8baa6bb8"),
        "CustomerPin" : "22010871",
        "CustomerName" : "TestLastName, TestFirstName",
        "Age" : 39.0,
        "Gender" : "Male",
        "Description" : "<p><span>This will be a test description</span><br/></p>"
}

The output should have the "p", "span", and "br" tags removed. Is there a function in MongoDB that removes them all at once, without repeating a $project stage for each tag?

This is the expected output:

{
        "_id" : ObjectId("61f72aefdc85500a8baa6bb8"),
        "CustomerPin" : "22010871",
        "CustomerName" : "TestLastName, TestFirstName",
        "Age" : 39.0,
        "Gender" : "Male",
        "Description" : "This will be a test description"
}

Thanks!

×纯※雪 2025-02-14 22:51:51

One way is to strip all tags with a regex in a pre-save hook, so the description is stored without markup:

Description.replace(/(<([^>]+)>)/gi, "");

See hooks here
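
As a rough illustration of the regex approach above, here is a minimal sketch in plain JavaScript. The helper name `stripHtmlTags` is hypothetical and not part of the answer, and the pattern is a simplified form of the one quoted (the capture groups and the `i` flag are not needed for removal). Note that `replace` returns a new string, so the result has to be assigned back to the field inside whatever hook you use.

```javascript
// Hypothetical helper (not from the answer): removes anything that looks like an HTML tag.
function stripHtmlTags(html) {
  // <[^>]+> matches "<", one or more characters other than ">", then ">".
  // The g flag makes replace() remove every tag, not only the first one.
  return html.replace(/<[^>]+>/g, "");
}

// Description taken from the sample document in the question.
const description = "<p><span>This will be a test description</span><br/></p>";
const cleaned = stripHtmlTags(description);
console.log(cleaned);
```

A regex like this is only a rough filter: it assumes the text itself contains no literal "<" or ">" characters and does not decode entities such as `&amp;`.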

抱着落日 2025-02-14 22:51:50

If you use MongoDB 4.2 or later, you can do this in an aggregation pipeline with $regexFind, but you need a regex that extracts the text content from the HTML. The pipeline and the regex are below.

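db.getCollection("name_of_your_collection").aggregate([
    {
        // $regexFind returns only the first match, i.e. the first run of text outside a tag.
        $set: {
            contentRegex: {
                $regexFind: { input: "$Description", regex: /([^<>]+)(?!([^<]+)?>)/ }
            }
        }
    },
    {
        // Fall back to the raw description when nothing matched.
        $set: {
            content: { $ifNull: ["$contentRegex.match", "$Description"] }
        }
    },
    {
        // Drop the temporary field.
        $unset: [ "contentRegex" ]
    }
])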
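
Because $regexFind yields a single match, a description with text in several places (for example "<p>Hello <b>world</b></p>") would lose everything after the first run. One possible variation, sketched below under assumptions, collects every match with $regexFindAll (also available from MongoDB 4.2) and joins them with $reduce and $concat. The helper names `stripTagsStage` and `textOutsideTags` and the constant `TEXT_OUTSIDE_TAGS` are hypothetical, and the regex is the answer's pattern rewritten as a string without capture groups.

```javascript
// Same idea as the regex above: a run of non-bracket characters
// that is not followed by a closing ">" (i.e. text outside a tag).
const TEXT_OUTSIDE_TAGS = "[^<>]+(?![^<]*>)";

// Hypothetical helper: builds a $set stage that writes the tag-free
// text of `field` into `target`.
function stripTagsStage(field, target) {
  return {
    $set: {
      [target]: {
        $reduce: {
          // $regexFindAll returns an array of { match, idx, captures } documents.
          input: { $regexFindAll: { input: "$" + field, regex: TEXT_OUTSIDE_TAGS } },
          initialValue: "",
          // Append each matched run of text to the accumulated string.
          in: { $concat: ["$$value", "$$this.match"] }
        }
      }
    }
  };
}

const pipeline = [stripTagsStage("Description", "content")];

// Local check of the regex only (no database needed): String.matchAll
// stands in for $regexFindAll, and join("") for the $reduce/$concat step.
function textOutsideTags(html) {
  const matches = html.matchAll(new RegExp(TEXT_OUTSIDE_TAGS, "g"));
  return Array.from(matches, (m) => m[0]).join("");
}

console.log(textOutsideTags("<p>Hello <b>world</b></p>"));
```

In the shell, the stage would be used as db.getCollection("name_of_your_collection").aggregate(pipeline). The local check only exercises the pattern in JavaScript's regex engine; it is worth confirming the result against a few real documents before relying on it.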
