IR and QA - Scope for an undergraduate project

Posted 2024-12-16 16:51:59

I have been brainstorming ideas for an undergraduate project in the Question Answering domain: a project that has both IR and NLP components.

The first thing that came to mind was, of course, factoid question answering, but that seemed to be an already conquered problem. #IBM Watson!

Non-factoid QA seems interesting, so I took it up. Now we are in the scoping phase of the project description. So, starting from the ambitious goal of answering any question the user puts up, I need to narrow down our project's scope.

So I took the following decisions:

  1. It will be closed-domain - C++ Programming
  2. The corpus will consist of just one website (cplusplus or Wikipedia) or just one document (The Complete Reference).
  3. We will develop only one module of the entire QA architecture - Passage Retrieval or Answer Extraction.
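To make the passage-retrieval option in point 3 more concrete, here is a minimal Python sketch of ranking passages (for example, paragraphs taken from a single C++ reference site) against a question with Okapi BM25. The class name `BM25PassageRetriever`, the tokenizer and the sample passages are hypothetical illustrations, not taken from any existing QA system; `k1=1.5` and `b=0.75` are common defaults, not tuned values.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter, digit, '+' or '#' (so 'C++' survives)."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


class BM25PassageRetriever:
    """Hypothetical helper: ranks passages against a question with Okapi BM25."""

    def __init__(self, passages: list[str], k1: float = 1.5, b: float = 0.75):
        self.passages = passages
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(p)) for p in passages]  # term frequencies per passage
        self.lengths = [sum(d.values()) for d in self.docs]
        self.avgdl = sum(self.lengths) / len(self.docs)
        self.n = len(self.docs)
        # document frequency: in how many passages each term occurs
        self.df = Counter(term for d in self.docs for term in d)

    def idf(self, term: str) -> float:
        # BM25 idf with +1 inside the log so it never goes negative
        df = self.df.get(term, 0)
        return math.log((self.n - df + 0.5) / (df + 0.5) + 1)

    def score(self, query: str, i: int) -> float:
        doc, dl = self.docs[i], self.lengths[i]
        total = 0.0
        for term in tokenize(query):
            tf = doc.get(term, 0)
            if tf:
                # term-frequency saturation (k1) and length normalisation (b)
                norm = tf + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                total += self.idf(term) * tf * (self.k1 + 1) / norm
        return total

    def top_k(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        ranked = sorted(
            ((self.score(query, i), p) for i, p in enumerate(self.passages)),
            reverse=True,
        )
        return ranked[:k]


# Illustrative passages (made up for this sketch)
passages = [
    "A virtual function is a member function that you expect to be redefined in derived classes.",
    "The std::vector container stores elements contiguously and grows dynamically.",
    "A reference is an alias for an existing variable and must be initialized when declared.",
]
retriever = BM25PassageRetriever(passages)
for score, passage in retriever.top_k("Why use a virtual function in derived classes?", k=1):
    print(f"{score:.2f}  {passage}")
```

A real system would index the corpus once with a search library instead of re-scoring every passage per query, but the ranking idea is the same; answer extraction would then work only on the top few passages.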

Our mentor insists on implementing an already existing solution to start with.
I am stuck at this point, searching for existing implementations. Here is one. But when I read through its environment requirements, they were staggering. There are a lot of libraries and toolkits, but I didn't find any non-factoid QA system that would be good to get to know, even on a very small scale.

Please suggest a good scope for the project. I wish to continue working on this through my master's, so what would be a good start? We have about 4 months for the project, and it is important not to end up doing a research project. It should have a tangible output.

Comments (1)

沫离伤花 2024-12-23 16:51:59

For IR you have Lucene/Solr.
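Lucene and Solr are built around an inverted index. The toy Python sketch below only illustrates that data structure (each term maps to a postings list of document ids, searched with a boolean AND); it is not Lucene's API, and the `InvertedIndex` class is a hypothetical name chosen for illustration.

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of letters, digits, '+' and '#' (so 'C++' is one token)."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


class InvertedIndex:
    """Hypothetical toy index: term -> sorted list of document ids (a postings list)."""

    def __init__(self) -> None:
        self.postings: dict[str, list[int]] = defaultdict(list)
        self.docs: list[str] = []

    def add(self, text: str) -> int:
        doc_id = len(self.docs)
        self.docs.append(text)
        for term in set(tokenize(text)):
            # ids are assigned in increasing order, so each postings list stays sorted
            self.postings[term].append(doc_id)
        return doc_id

    def search_and(self, query: str) -> list[int]:
        """Return ids of documents that contain every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        result = set(self.postings.get(terms[0], []))
        for term in terms[1:]:
            result &= set(self.postings.get(term, []))
        return sorted(result)


index = InvertedIndex()
index.add("Templates enable generic programming in C++.")
index.add("Virtual functions enable runtime polymorphism.")
index.add("C++ templates are instantiated at compile time.")
print(index.search_and("C++ templates"))  # → [0, 2]
```

Lucene adds much more on top of this (ranked scoring, analyzers, compressed on-disk postings), which is why using it is usually easier than building retrieval yourself.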

For machine learning and NLP, lots of libraries are available, primarily in Python and Java (at least the user-friendly ones).

Implementing Hoifung's system is pretty ambitious; I'd go for something simpler. Have you looked at his code at all?

The BioNLP challenges from the last few years are another place where you could find a lot of material, but those are also relatively complicated tasks.

How about Twitter movie review discovery? I.e., based on X tweets, does this movie suck?
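One way the tweet idea could start is a minimal sketch like the one below, assuming you already have a small set of tweets labelled positive or negative: train a multinomial Naive Bayes classifier with add-one smoothing, then take a majority vote over the tweets about one movie. The training tweets, the `pos`/`neg` labels, and the majority-vote rule in the hypothetical `movie_verdict` helper are illustrative assumptions; a real project would need far more labelled data.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> NaiveBayes:
        self.classes = sorted(set(labels))
        # log prior: fraction of training tweets in each class
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, text: str) -> str:
        v = len(self.vocab)

        def log_prob(c: str) -> float:
            return self.prior[c] + sum(
                math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
                for w in tokenize(text)
                if w in self.vocab  # words never seen in training are ignored
            )

        return max(self.classes, key=log_prob)


def movie_verdict(model: NaiveBayes, tweets: list[str]) -> str:
    """Hypothetical rule: majority vote over per-tweet predictions."""
    votes = Counter(model.predict(t) for t in tweets)
    return votes.most_common(1)[0][0]


# Tiny made-up training set, for illustration only
train = [
    "loved it, what a great film",
    "amazing acting and a great story",
    "this movie sucks",
    "boring and awful, total waste",
]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train, labels)

tweets = ["great film, loved the acting", "awful", "what a great story"]
print(movie_verdict(model, tweets))  # → pos
```

Naive Bayes is chosen here only because it is simple to implement and explain; the libraries mentioned above offer stronger classifiers once the pipeline works end to end.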
