I'm working on a process that will perform natural language processing (NLP) on one, and potentially several, of our content-rich sites. Once the NLP is complete, I'd like to automatically organize the output (generally a set of terms that you might think of as tags, given the prevalence of that metaphor) into some kind of standard or generally accepted organizational structure.
In a perfect world, I'd really like this to be crowdsourced under the folksonomy concept (as opposed to a taxonomy), since the ultimate goal is to appeal to real people rather than "domain experts", but I'm open to ideas and best practices. For the obvious purpose of scalability, I'd like to automate the population of this tax/folksonomy so that "some guy" on the team isn't responsible for looking at a bunch of words (with or without context) and arbitrarily fleshing out the contextual components of the tree.
I have a few ideas for doing this that require some research to establish viability, but I have exactly zero practical experience with this sort of thing so the ideas really just boil down to stuff I made up that might perform some role in accomplishing the task. Imagining that others have vastly more experience with this sort of thing, I'm hoping that I can stand on your shoulders.
Thanks for your thoughts and insights.
Practical Example
I ran the NLP against an article on my own blog. The NLP returned the following terms with a sufficient level of relevance:
Now I want to put those terms into a tax/folksonomy without human intervention. In this case, "Git" and "Rob Wilkerson" are terms that could be classified. The process includes (or will include) an additional stipulation that removes terms from the structure if they don't generate enough traction to be worth tracking. On the other hand, "change" is probably too vague/ambiguous to be worth the trouble.
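To make the two filtering steps in this example concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Term` class, the 0-to-1 relevance scale, the `hits` traction counter, both thresholds, and the scores themselves, which are made-up values rather than real NLP output.

```python
from dataclasses import dataclass


@dataclass
class Term:
    text: str
    relevance: float  # hypothetical 0..1 score from the NLP step
    hits: int = 0     # hypothetical traction metric, e.g. reader usage count


def select_candidates(terms, min_relevance=0.5):
    """Keep only terms the NLP step considers relevant enough to classify."""
    return [t for t in terms if t.relevance >= min_relevance]


def prune_low_traction(terms, min_hits=5):
    """Drop terms that have not generated enough traction to be worth tracking."""
    return [t for t in terms if t.hits >= min_hits]


# Made-up scores for illustration only; they are not real NLP output.
terms = [
    Term("Git", relevance=0.9, hits=12),
    Term("Rob Wilkerson", relevance=0.8, hits=1),
    Term("change", relevance=0.2, hits=40),
]

candidates = select_candidates(terms)    # vague terms such as "change" fall out here
tracked = prune_low_traction(candidates)  # later pass, once traction data exists
```

Keeping the two passes separate mirrors the description above: relevance decides what enters the structure, and traction decides what stays in it.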
Comments (1)
It looks like Freebase, perhaps in combination with DBpedia, might be just what I was looking for.
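As a rough idea of how a term could be placed in an existing structure via DBpedia, the sketch below builds a SPARQL query that looks up a resource by its label and asks for its categories and their parent categories. The function name `build_category_query` and its parameters are hypothetical. The sketch also assumes the term matches a DBpedia label exactly and that categories are exposed through `dct:subject` and `skos:broader`. It only builds the query text; sending it to a SPARQL endpoint is left out.

```python
def build_category_query(label, lang="en", limit=20):
    """Build a SPARQL query for the categories of a resource with this label.

    Hypothetical helper: it returns the query text and does not contact
    any endpoint.
    """
    # Escape backslashes and double quotes so the label stays a valid
    # SPARQL string literal.
    safe = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX dct: <http://purl.org/dc/terms/>\n"
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT DISTINCT ?resource ?category ?broader WHERE {\n"
        f'  ?resource rdfs:label "{safe}"@{lang} ;\n'
        "            dct:subject ?category .\n"
        "  OPTIONAL { ?category skos:broader ?broader . }\n"
        f"}} LIMIT {int(limit)}\n"
    )


query = build_category_query("Git")
```

Each `?category` to `?broader` pair returned by such a query would be one candidate parent-child edge in the tree. Ambiguous terms like "change" would likely match several unrelated resources, which is one signal for skipping them.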