Methods for geotagging or geolabeling text content

Posted on 2024-07-06 18:30:21

What are some good algorithms for automatically labeling text with the city / region or origin? That is, if a blog is about New York, how can I tell programmatically? Are there packages / papers that claim to do this with any degree of certainty?

I have looked at some TF-IDF based approaches and proper noun intersections, but so far no spectacular successes, and I'd appreciate ideas!

The more general question is about assigning texts to topics, given some list of topics.

Simple / naive approaches are preferred to full-on Bayesian approaches, but I'm open.

Comments (2)

永不分离 2024-07-13 18:30:21

You're looking for a named entity recognition (NER) system. There are several good toolkits available to help you out. LingPipe in particular has a very decent tutorial. CAGEclass seems to be oriented toward NER on geographical place names, but I haven't used it yet.

If you're going with Java, I'd recommend using the LingPipe NER classes. OpenNLP also has some, but the former has better documentation.

If you're looking for some theoretical background, Chavez et al. (2005) have constructed an interesting system and documented it.
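To make the idea concrete without committing to any particular toolkit, here is a minimal Python sketch of the simplest possible stand-in for place-name NER: a gazetteer lookup. This is not LingPipe's or OpenNLP's API and it is not a trained NER model; the `GAZETTEER` table and the function names `tag_places` and `best_place` are hypothetical, made up for illustration. A real system would need a much larger place list and some way to handle ambiguous names.

```python
import re
from collections import Counter

# Hypothetical toy gazetteer: lowercase surface form -> canonical place name.
GAZETTEER = {
    "new york": "New York",
    "nyc": "New York",
    "manhattan": "New York",
    "brooklyn": "New York",
    "paris": "Paris",
    "louvre": "Paris",
    "london": "London",
}


def tag_places(text, gazetteer=GAZETTEER):
    """Count whole-word gazetteer hits in text, keyed by canonical place."""
    lowered = text.lower()
    counts = Counter()
    for surface, place in gazetteer.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        counts[place] += len(re.findall(pattern, lowered))
    return +counts  # unary plus drops places with a zero count


def best_place(text, gazetteer=GAZETTEER):
    """Return the most frequently mentioned place, or None if nothing matched."""
    counts = tag_places(text, gazetteer)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

Calling `best_place` on a blog post returns the place with the most mentions, so a post that names Brooklyn and Manhattan once each and Paris once would be labeled New York. Counting mentions is a crude proxy for what a text is "about", which is where a proper NER toolkit plus some disambiguation would earn its keep.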

贵在坚持 2024-07-13 18:30:21

Latent Semantic Mapping seems like a potentially good fit. It's just about as naive an algorithm as you're likely to find.
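For a rough picture of what that involves, here is a small Python sketch in the spirit of latent semantic analysis, the SVD-based technique that latent semantic mapping builds on as far as I understand it. It assumes NumPy is available; the function names (`fit_lsa`, `embed`, `classify`) and the tiny labeled corpus are hypothetical, and raw term counts are used where a real setup would likely apply some weighting first.

```python
import numpy as np


def tokenize(text):
    return text.lower().split()


def fit_lsa(docs, k):
    """Truncated SVD of the term-document count matrix; returns (term index, U_k)."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in tokenize(d):
            A[index[w], j] += 1.0
    U, _s, _Vt = np.linalg.svd(A, full_matrices=False)
    return index, U[:, :k]  # keep the k strongest latent directions


def embed(text, index, U_k):
    """Project a bag-of-words vector onto the latent axes (unknown words are ignored)."""
    q = np.zeros(U_k.shape[0])
    for w in tokenize(text):
        if w in index:
            q[index[w]] += 1.0
    return U_k.T @ q


def classify(text, labeled_docs, k=2):
    """Label text with the label of its nearest document by cosine in latent space."""
    index, U_k = fit_lsa([doc for _, doc in labeled_docs], k)
    q = embed(text, index, U_k)
    if np.linalg.norm(q) == 0:
        return None  # no known words, so no basis for a decision
    best_label, best_sim = None, -1.0
    for label, doc in labeled_docs:
        v = embed(doc, index, U_k)
        denom = np.linalg.norm(q) * np.linalg.norm(v)
        if denom == 0:
            continue
        sim = float(q @ v) / denom
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label


# Hypothetical labeled examples, one short text per entry.
LABELED = [
    ("new york", "subway manhattan brooklyn"),
    ("new york", "pizza brooklyn manhattan bridge"),
    ("paris", "croissant seine louvre"),
    ("paris", "metro louvre eiffel seine"),
]
```

With this toy corpus, `classify("brooklyn bridge walk", LABELED)` lands on the New York side because the query shares latent structure with those documents even though "walk" never appears in them. The model is refit on every call to keep the sketch short; in practice you would fit once and reuse the projection.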
