ARFF for natural language processing

Posted on 2024-11-10 16:17:20

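I'm trying to take a set of reviews and convert them into the ARFF format for use with WEKA. Unfortunately, either I'm completely misunderstanding how the format works, or I'll have to declare an attribute for every possible word, followed by a presence indicator. Does anyone know a better approach, or ideally have a sample ARFF file?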

时间你老了 2024-11-17 16:17:20

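If you have your reviews in plain text files, sorted into different folders (positive and negative in your case), you can use the TextDirectoryLoader.

You can find it in Weka's KnowledgeFlow application, or run it from the command line. More info here: http://weka.wikispaces.com/ARFF+files+from+Text+Collections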
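
If you would rather build the file yourself, the same idea (one folder per class, one text file per review) can be sketched in Python. This is not Weka code and is not guaranteed to match the loader's exact output; the function names, the `review`/`sentiment` attribute names, the `.txt` extension and the folder names are all assumptions for illustration, and class folder names are assumed to contain no spaces or commas.

```python
from pathlib import Path


def arff_quote(text):
    """Wrap text in double quotes, escaping backslashes and double quotes.

    Line breaks are flattened to spaces so each review stays on one data line.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    escaped = escaped.replace("\r", " ").replace("\n", " ")
    return '"' + escaped + '"'


def directory_to_arff(root, relation="reviews"):
    """Build ARFF text from root/<class_name>/*.txt (hypothetical layout)."""
    root = Path(root)
    # Each sub-folder name becomes one value of the nominal class attribute.
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    lines = [
        f"@relation {relation}",
        "",
        "@attribute review string",
        "@attribute sentiment {" + ",".join(classes) + "}",
        "",
        "@data",
    ]
    for cls in classes:
        for path in sorted((root / cls).glob("*.txt")):
            text = path.read_text(encoding="utf-8").strip()
            lines.append(f"{arff_quote(text)},{cls}")
    return "\n".join(lines) + "\n"
```

For example, `Path("reviews.arff").write_text(directory_to_arff("reviews_dir"), encoding="utf-8")` would write one string instance per file, where `reviews_dir` is a placeholder for your own folder.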

橙味迷妹 2024-11-17 16:17:20

Took a while to work out, but with this input.arff:

@relation text_files

@attribute review string
@attribute sentiment {0, 1}

@data
"this is some text", 1
"this is some more text", 1
"different stuff", 0

And this command:

java -classpath "C:\\Program Files\\Weka-3-6\\weka.jar" weka.filters.unsupervised.attribute.StringToWordVector -i input.arff -o output.arff

The following is produced:

@relation 'text_files-weka.filters.unsupervised.attribute.StringToWordVector-R1-W1000-prune-rate-1.0-N0-stemmerweka.core.stemmers.NullStemmer-M1-tokenizerweka.core.tokenizers.WordTokenizer -delimiters \" \\r\\n\\t.,;:\\\'\\\"()?!\"'

@attribute sentiment {0,1}
@attribute different numeric
@attribute is numeric
@attribute more numeric
@attribute some numeric
@attribute stuff numeric
@attribute text numeric
@attribute this numeric

@data

{0 1,2 1,4 1,6 1,7 1}
{0 1,2 1,3 1,4 1,6 1,7 1}
{1 1,5 1}
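
The @data rows are in sparse ARFF form: each pair is a zero-based attribute index followed by its value, and anything left out is 0 (for a nominal attribute, its first label), which is why the last row has no entry for sentiment. To check how the rows line up with the attribute list, here is a small Python sketch that rebuilds them for this three-review example. It is not Weka code: it assumes the class attribute comes first, words are ordered alphabetically, values are plain presence flags, and the delimiters are the ones listed in the @relation line. The function names are made up for illustration.

```python
# Delimiters copied from the WordTokenizer options in the @relation line.
DELIMITERS = " \r\n\t.,;:'\"()?!"
_TO_SPACE = str.maketrans({ch: " " for ch in DELIMITERS})


def tokenize(text):
    """Split text on the delimiter set, dropping empty tokens."""
    return text.translate(_TO_SPACE).split()


def to_sparse_rows(reviews, labels):
    """reviews: list of (text, label); labels: nominal values in declared order."""
    vocab = sorted({tok for text, _ in reviews for tok in tokenize(text)})
    # Attribute 0 is the class; word attributes follow in alphabetical order.
    index = {word: i + 1 for i, word in enumerate(vocab)}
    rows = []
    for text, label in reviews:
        cells = []
        # The first nominal label counts as 0, so it is omitted from the row.
        if labels.index(label) != 0:
            cells.append((0, label))
        for i in sorted({index[tok] for tok in tokenize(text)}):
            cells.append((i, 1))
        rows.append("{" + ",".join(f"{i} {v}" for i, v in cells) + "}")
    return vocab, rows


reviews = [
    ("this is some text", "1"),
    ("this is some more text", "1"),
    ("different stuff", "0"),
]
vocab, rows = to_sparse_rows(reviews, ["0", "1"])
for row in rows:
    print(row)
```

Run on the three reviews from input.arff, this prints the same three rows as the output above. The real filter has many more options (word counts, pruning, stemming, and so on), so treat this only as a reading aid for the sparse format.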
