How can I loop over multiple files, keeping the base name for further processing?

Posted 2024-10-19 11:39:44

I have multiple text files that need to be tokenised, POS-tagged and NER-tagged. I am using the C&C taggers and have worked through their tutorial, but I am wondering whether there is a way to tag multiple files at once rather than one by one.

At the moment I tokenise each file like this:

bin/tokkie --input working/tutorial/example.txt --quotes delete --output working/tutorial/example.tok

then run part-of-speech tagging:

bin/pos --input working/tutorial/example.tok --model models/pos --output working/tutorial/example.pos

and lastly Named Entity Recognition:

bin/ner --input working/tutorial/example.pos --model models/ner --output working/tutorial/example.ner

I am not sure how to write a loop that does this while keeping each output file name the same as the input, but with an extension indicating which tagging step produced it. I was thinking of a Bash script, or perhaps Perl, to go through the directory, but I am not sure how to call the C&C commands from within the script.

At the moment I am doing it manually, which is time-consuming to say the least!

Answers (2)

谈情不如逗狗 2024-10-26 11:39:44

Untested; the directory paths will probably need some adjusting.

#!/usr/bin/perl
use strict;
use warnings;
# autodie's :all tag also makes system() die on failure;
# that part requires the IPC::System::Simple module to be installed.
use autodie qw(:all);
use File::Basename qw(basename);

my $dir = 'working/tutorial';

for my $text_file (glob "$dir/*.txt") {
    # e.g. working/tutorial/example.txt -> example
    my $base_name = basename($text_file, '.txt');

    # Tokenise: .txt -> .tok
    system 'bin/tokkie',
        '--input'  => $text_file,
        '--quotes' => 'delete',
        '--output' => "$dir/$base_name.tok";

    # Part-of-speech tagging: .tok -> .pos
    system 'bin/pos',
        '--input'  => "$dir/$base_name.tok",
        '--model'  => 'models/pos',
        '--output' => "$dir/$base_name.pos";

    # Named entity recognition: .pos -> .ner
    system 'bin/ner',
        '--input'  => "$dir/$base_name.pos",
        '--model'  => 'models/ner',
        '--output' => "$dir/$base_name.ner";
}
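In the Perl answer, `basename($text_file, '.txt')` returns only the file name with the `.txt` suffix removed, so the loop has to put the directory back in front of every name it builds. The POSIX `basename` utility takes the same optional suffix argument, so a quick shell sketch of what that stem looks like is below (the path is only an example, not one of your files):

```bash
#!/bin/bash
# Example path only; substitute one of your own input files.
text_file='working/tutorial/example.txt'

# basename with a suffix argument drops the leading directories and the
# trailing ".txt", like File::Basename's basename($path, '.txt') in Perl.
base_name=$(basename "$text_file" .txt)
# dirname recovers the directory part that basename threw away.
dir=$(dirname "$text_file")

echo "$base_name"            # example
echo "$dir/$base_name.tok"   # working/tutorial/example.tok
```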
み格子的夏天 2024-10-26 11:39:44

In Bash:

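#!/bin/bash
# Stop at the first command that fails instead of feeding a missing file to the next stage.
set -e
# If no .txt files match, skip the loop instead of iterating over the literal pattern.
shopt -s nullglob

dir='working/tutorial'
for file in "$dir"/*.txt
do
    # Strip the trailing .txt: working/tutorial/example.txt -> working/tutorial/example
    noext=${file/%.txt}

    bin/tokkie --input "$file" --quotes delete --output "$noext.tok"

    bin/pos --input "$noext.tok" --model models/pos --output "$noext.pos"

    bin/ner --input "$noext.pos" --model models/ner --output "$noext.ner"

done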
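`${file/%.txt}` is Bash pattern substitution with an empty replacement; the leading `%` anchors the pattern to the end of the value, so only a trailing `.txt` is removed. The more common `${file%.txt}` (remove the shortest matching suffix) gives the same result here. Without the `%`, the first `.txt` anywhere in the string would be removed instead. The sketch below uses a deliberately awkward, made-up file name to show the difference:

```bash
#!/bin/bash
# Made-up name containing ".txt" twice, to show where each form matches.
file='working/tutorial/my.txt.notes.txt'

anchored=${file/%.txt}   # pattern must match at the end: trailing .txt removed
suffix=${file%.txt}      # shortest matching suffix removed: same result
first=${file/.txt}       # no anchor: first .txt in the string removed

echo "$anchored"   # working/tutorial/my.txt.notes
echo "$suffix"     # working/tutorial/my.txt.notes
echo "$first"      # working/tutorial/my.notes.txt
```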