How do I use SharpNlp in my C# application

Posted on 2024-10-02 02:00:51

I need POS tagging for the files in my corpus.
I have successfully followed the installation instructions for SharpNlp.
I am using the binary version.

I created a new C# project in:       E:\sharp\sharpapp
Location of the Models folder is:    E:\sharp\sharpapp\bin\Models
Location of my SharpNlp binary is:   E:\sharp\SharpNLP-1.0.2529-Bin

I have also followed the instructions to modify both .config files, "ParseTree.Exe" and "ToolsExamples.Exe".

Now in my C# project I have a class called tagging.cs, where I need to read my corpus text files and POS-tag them. Can anybody explain how I can use SharpNlp to do this?

Please provide the steps.

冷血 2024-10-09 02:00:51

In a nutshell, SharpNLP is

a port to C# of OpenNLP Tools and OpenNLP MaxEnt
a connector to WordNet
a set of pre-computed models, mostly for the English language
utility modules, such as integration with SQLite

It should be noted that the port of the OpenNLP libraries is relatively informal, with various class and property name changes, possibly loose preservation of features and semantics, and no apparent connection to the original Java projects' lifecycle. Over time, this will likely make the OpenNLP portion of SharpNLP and the original more like distant cousins than twin sisters...

Nevertheless, it is possible to use examples and documentation from OpenNLP to complement the relatively thin support material available with SharpNLP. Between the source code of SharpNLP and resources like the OpenNLP API reference and the OpenNLP wiki, one can generally map things across and adapt accordingly.

A loose guide could be the study of this particular source file, which makes use of OpenNLP in a way that seems close to what you may need. Note the name changes between OpenNLP and SharpNLP: for example, the POSTaggerME class becomes MaximumEntropyPosTagger, and the Parse() method and its overloads turn into TagSentence() and the like.

A more general hint is to understand the sequence of steps typically necessary to perform POS tagging.
This is a very high-level, approximate description but, I think, a useful one.

Get the text to be tagged = string(s) of text
Initialize a text parser
Parse the text = an "array" (or other container) of individual tokens, i.e. words and punctuation characters
Initialize the POS tagger, in particular telling it which model it should use
Feed the [ordered] sequence of tokens to the POS tagger
Ta-da! Use the POS tags for the eventual purpose of your NLP application.
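
The sequence above can be sketched in C# roughly as follows. This is only an illustration of the shape of the pipeline and does not call SharpNLP: the TaggingPipeline class, its regex tokenizer and its lookup-table "model" are hypothetical stand-ins invented for this example, so that the code runs without any library. In a real project, the two delegates passed to TagText would presumably wrap SharpNLP's own tokenizer and the MaximumEntropyPosTagger mentioned above, loaded with a model from your Models folder; the exact class and method names should be checked against the SharpNLP source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-ins that only illustrate the pipeline; NOT part of SharpNLP.
public static class TaggingPipeline
{
    // Toy "model": a lookup table standing in for a trained statistical model.
    private static readonly Dictionary<string, string> ToyModel =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "the", "DT" }, { "cat", "NN" }, { "sat", "VBD" }, { ".", "." }
        };

    // Parse step: split the text into word tokens and punctuation tokens.
    public static string[] Tokenize(string text)
    {
        return Regex.Matches(text, @"\w+|[^\w\s]")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    // Tagging step: one tag per token, in order; unknown tokens default to NN.
    public static string[] Tag(string[] tokens)
    {
        return tokens
            .Select(t => ToyModel.TryGetValue(t, out var tag) ? tag : "NN")
            .ToArray();
    }

    // Whole sequence: text -> tokens -> tags -> "token/TAG" pairs.
    // The delegates are the places where a real tokenizer and tagger would plug in.
    public static string TagText(
        string text,
        Func<string, string[]> tokenize,
        Func<string[], string[]> tag)
    {
        string[] tokens = tokenize(text);
        string[] tags = tag(tokens);
        return string.Join(" ", tokens.Zip(tags, (tok, tg) => tok + "/" + tg));
    }
}

public static class Program
{
    public static void Main()
    {
        string text = "The cat sat.";
        Console.WriteLine(
            TaggingPipeline.TagText(text, TaggingPipeline.Tokenize, TaggingPipeline.Tag));
    }
}
```

Passing the tokenizer and tagger in as delegates keeps the sequence of steps separate from whichever library implements them, so for a corpus you could read each file with File.ReadAllText and call TagText once per file.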

Note how the above sequence assumes that the model is readily available.
The model is a representation of the statistical "profile" of text in general, obtained by training the tagger on a set of already-tagged text.
SharpNLP comes with a model for generic English, but to tag other languages, or if the corpus to be tagged belongs to a particular domain (say medical reports or tweets or...), it may be preferable to re-train the tagger to improve its precision.
Open/SharpNLP, like most POS taggers, whether stand-alone or offered as an API, typically includes features to train it (= to produce a model from a sample set of already-tagged text) and also to verify the quality of the model/tagger so produced (= to compare the tags produced on a test set with the tags expected for this set).
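
To make the verification idea concrete, here is a minimal sketch of comparing produced tags against expected tags on a test set. It is not SharpNLP's own evaluation code; the TaggerEvaluation class and its Accuracy method are hypothetical names used only for this illustration, and the tag arrays are made-up sample data.

```csharp
using System;

// Hypothetical helper for illustration; not part of SharpNLP or OpenNLP.
public static class TaggerEvaluation
{
    // Fraction of positions where the produced tag equals the expected tag.
    public static double Accuracy(string[] predicted, string[] expected)
    {
        if (predicted.Length != expected.Length)
            throw new ArgumentException("Tag sequences must have the same length.");
        if (expected.Length == 0)
            return 0.0;

        int correct = 0;
        for (int i = 0; i < expected.Length; i++)
        {
            if (predicted[i] == expected[i])
                correct++;
        }
        return (double)correct / expected.Length;
    }
}

public static class AccuracyDemo
{
    public static void Main()
    {
        // Made-up sample: 3 of the 4 produced tags match the expected ones.
        string[] expected = { "DT", "NN", "VBD", "." };
        string[] predicted = { "DT", "NN", "NN", "." };
        Console.WriteLine(TaggerEvaluation.Accuracy(predicted, expected));
    }
}
```

Running the same comparison before and after re-training on domain-specific text is one simple way to see whether re-training actually helped.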
