Text mining library or linguistic library?

Posted 2024-08-18 17:39:35

I have a bunch of data harvested from a forum I own, and I would like to do some text mining or use a linguistic library to extract useful information.

Any text mining or data mining library, in any language, will do.

Thank you.

Comments (6)

仙女 2024-08-25 17:39:35

I recommend that you have a look at R. It has a large number of text mining packages: see the Natural Language Processing task view on CRAN. In particular, look at the tm package. Here are some relevant links:

Another example of a useful package for this is Gary King's readme package.

爱情眠于流年 2024-08-25 17:39:35

You may like to have a look at the Python NLTK (Natural Language ToolKit): it's specifically designed for this kind of thing.

There is also a great book you can buy to get you started.

撑一把青伞 2024-08-25 17:39:35

Mallet is a Java library designed for text mining. Once you have preprocessed the text data, a general data mining tool like Weka would also suffice for your task.

If you have access to SPSS or SAS, their products should be easier to use.
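
To illustrate what "preprocessed text data" can look like before it goes into a general data mining tool, here is a minimal sketch in plain Python that turns tokenized posts into a document-term count matrix. It is not Mallet's or Weka's API; the function name `build_document_term_matrix` and the sample posts are hypothetical, made up for this example.

```python
from collections import Counter


def build_document_term_matrix(tokenized_docs):
    """Return (vocabulary, rows), where rows[i][j] is the number of
    times vocabulary[j] occurs in document i."""
    # Sorted vocabulary gives every document the same column order.
    vocabulary = sorted({token for doc in tokenized_docs for token in doc})
    rows = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        # Counter returns 0 for terms that do not occur in this document.
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


# Hypothetical forum posts, assumed to be lowercased and tokenized already.
docs = [
    ["login", "error", "after", "update"],
    ["update", "fixed", "login", "login"],
]

vocabulary, matrix = build_document_term_matrix(docs)
print(vocabulary)
for row in matrix:
    print(row)
```

Each row is one post and each column one term, which is the kind of table most general-purpose data mining tools expect as input.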

独﹏钓一江月 2024-08-25 17:39:35

Try GATE: it has a GUI, and of course you can use the Java API for more power:
http://gate.ac.uk/family/developer.html

You can also use Weka for processing text and doing text mining; have a look at these useful lectures:
http://sentimentmining.net/weka/

万劫不复 2024-08-25 17:39:35

Stanford CoreNLP is good for English text and has features such as named entity recognition. Take a look at: http://nlp.stanford.edu/software/corenlp.shtml

GATE, which Ehsan already recommended, is also good, but it can be a bit complicated if you need to write your own components. For large-scale stuff it's great though.

UIMA is similar to GATE, but not as easy to use because it doesn't feature an extensive GUI like GATE. (http://uima.apache.org)

鸩远一方 2024-08-25 17:39:35

I would recommend the following Python libraries:

  1. nltk
  2. keras
  3. tensorflow

Note: Before any text analysis, you should clean the data based on your requirements.
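
As a rough idea of what such cleaning might involve for forum posts, here is a minimal sketch that uses only the Python standard library (none of the libraries listed above). The regular expressions, the tiny `STOPWORDS` set, the `clean_post` helper, and the sample posts are all assumptions for illustration; real forum data will likely need rules tuned to its own markup and language.

```python
import html
import re
from collections import Counter

TAG_RE = re.compile(r"<[^>]+>")              # crude HTML tag matcher
URL_RE = re.compile(r"https?://\S+")         # bare http/https links
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")  # words, optionally with one apostrophe

# Deliberately tiny, hypothetical stopword list; swap in a fuller one as needed.
STOPWORDS = {"the", "a", "an", "is", "to", "and", "of", "it", "i", "my"}


def clean_post(raw):
    """Strip markup and links from one post and return its lowercase tokens."""
    text = TAG_RE.sub(" ", raw)      # drop tags before decoding entities
    text = html.unescape(text)       # e.g. "&amp;" becomes "&"
    text = URL_RE.sub(" ", text)     # remove links left in the body text
    tokens = TOKEN_RE.findall(text.lower())
    return [token for token in tokens if token not in STOPWORDS]


# Hypothetical raw posts as they might come out of a forum database.
posts = [
    "<p>My login is broken &amp; the page won't load!</p>",
    "See http://example.com/help <b>Login</b> works after the update.",
]

cleaned = [clean_post(post) for post in posts]
term_counts = Counter(token for tokens in cleaned for token in tokens)
print(cleaned)
print(term_counts.most_common(3))
```

Tags are removed before entities are decoded so that escaped text such as `&lt;b&gt;` written by a user is kept as content rather than being mistaken for markup.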
