How can I tell which category an offer belongs to from its title alone?

Posted 2024-11-10 02:30:15

I am developing a new service that will query multiple offer sites (Groupon, etc.), and I would like to work out which category each offer belongs to.

Example:

I get this title: "Acqualina Wellness Expo – Acqualina Resort & Spa" and I need to find out which category this offer belongs to.

I tried playing with http://www.google.com/insights/search/, but it's not easy because it accepts only 7 parameters (terms), and sometimes we have compound words that cannot be separated.

Comments (1)

魂牵梦绕锁你心扉 2024-11-17 02:30:15

There are fun methods based on WordNet, search distance, and the like, but the standard way would be the Bayesian spam filter approach.

Step 1: Construct an example set of titles (or titles and bodies), each labeled with the category you think it belongs to. The larger and more diverse you make this set, the better. You need many different examples (let's say at least a two-digit number, but preferably hundreds) from each category you want to be able to recognize. If you want help constructing this set, you could use Amazon's Mechanical Turk and pay other people to do the categorization.
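As a rough sketch of what such an example set could look like in Python: the category names, the label given to the Acqualina title, and every other title below are invented for illustration, and the helper names are hypothetical. The small check at the end flags categories that still have too few examples.

```python
from collections import Counter

# Hypothetical labeled examples: (title, category).
# Only the first title comes from the question; its label and all
# other rows are assumptions made up for this sketch.
LABELED = [
    ("Acqualina Wellness Expo – Acqualina Resort & Spa", "wellness"),
    ("Relaxing spa day and massage", "wellness"),
    ("Two-course dinner for two", "food"),
    ("Sushi tasting menu", "food"),
    ("Weekend city break with breakfast", "travel"),
]


def category_counts(examples):
    """Count how many labeled examples each category has."""
    return Counter(category for _, category in examples)


def underrepresented(examples, minimum=10):
    """Return the categories that have fewer than `minimum` examples."""
    counts = category_counts(examples)
    return sorted(c for c, n in counts.items() if n < minimum)
```

With the default `minimum=10`, every category in this toy set would be reported, which matches the advice above: a handful of examples per category is not enough.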

Step 2: Run all your examples through CRM114 (http://crm114.sourceforge.net/) or something similar. If you want to use a cloud service, I think the Google Prediction API allows for text fields.
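To make the idea concrete, here is a minimal from-scratch sketch of a multinomial naive Bayes classifier with add-one smoothing. It is not CRM114 and does not use its interface; it is a stand-in for the general "Bayesian spam filter" technique, and the class name, tokenizer, and training titles are all assumptions for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(title):
    """Lowercase the title and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", title.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of titles
        self.vocab = set()                       # every word seen in training

    def train(self, title, category):
        words = tokenize(title)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocab.update(words)

    def classify(self, title):
        """Return the most probable category, or None if untrained."""
        words = tokenize(title)
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category in self.doc_counts:
            # Log prior: share of training titles in this category.
            score = math.log(self.doc_counts[category] / total_docs)
            total_words = sum(self.word_counts[category].values())
            denom = total_words + len(self.vocab)
            for w in words:
                # Smoothed log likelihood of each word given the category.
                score += math.log((self.word_counts[category][w] + 1) / denom)
            if score > best_score:
                best, best_score = category, score
        return best


# Toy training data, invented for this sketch.
nb = NaiveBayes()
nb.train("Relaxing spa day and massage", "wellness")
nb.train("Yoga and wellness retreat", "wellness")
nb.train("Two-course dinner for two", "food")
nb.train("Sushi tasting menu", "food")

print(nb.classify("Acqualina Wellness Expo – Acqualina Resort & Spa"))  # wellness
print(nb.classify("Sushi dinner for two"))                              # food
```

Four training titles are only enough to show the mechanics; a real classifier would need the much larger example set described in step 1.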

Step 3: For testing, don't let the categorizer see all examples. Keep some in what is called an out-of-sample set that you can test your categorizer on. It is much easier for it to categorize stuff it has already seen, so you want to make sure that you know how good it is on unseen examples. Some categorizers will do this test for you automatically.
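A holdout test along these lines could be sketched as follows. The function names, the 20% default, and the fixed seed are assumptions; `classify` stands for any function that maps a title to a category, whichever tool produced it.

```python
import random


def holdout_split(examples, test_fraction=0.2, seed=0):
    """Shuffle the labeled examples and set aside a fraction for testing.

    Returns (train, test). The test part is never shown to the
    categorizer during training.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(classify, test_set):
    """Fraction of held-out (title, category) pairs classified correctly."""
    if not test_set:
        raise ValueError("test set is empty")
    correct = sum(1 for title, category in test_set
                  if classify(title) == category)
    return correct / len(test_set)
```

Train only on the first returned list and report `accuracy` only on the second; a score measured on titles the categorizer was trained on will look better than it really is.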

Good luck!
