Help with a find-and-replace regular expression

Posted on 2024-08-06 13:47:51

I have a text file, and each line is of the form:

TAB WORD TAB PoS TAB FREQ#

Word    PoS     Freq
the     Det     61847
of      Prep    29391
and     Conj    26817
a       Det     21626
in      Prep    18214
to      Inf     16284
it      Pron    10875
is      Verb    9982
to      Prep    9343
was     Verb    9236
I       Pron    8875
for     Prep    8412
that    Conj    7308
you     Pron    6954

Would one of you regex wizards kindly help me isolate the WORDS from the file? I plan to do a find and replace in TextPad and leave it at that. Multiple find-and-replace passes are fine. One thing: note that searching for "verb" would also match the WORD "verb", not just the part of speech, so be careful. In the end I want one word per line.

Thanks so much!

Comments (4)

故乡的云 2024-08-13 13:47:51

I think Microsoft Excel can handle this more easily.

Just paste the whole text into Excel and it will be laid out as a table. Then select the cells of the column that holds the words and copy them into Notepad.

I bet this is the easiest path.

If Excel ends up storing each line in a single column, extract the word in a separate column with the following formula, where maxchar stands for the number of leading characters to keep:

=Trim(LEFT(C1,maxchar))

耶耶耶 2024-08-13 13:47:51

You could just use awk to print only the first column, as in

awk '{print $1}' /path/to/filename

Skip the first line by using

awk 'NR!=1 {print $1}' /path/to/filename
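
For readers without awk, the same field extraction can be sketched in Python. This is a minimal illustration, not part of the original answer; the function name first_fields and the sample text are assumptions made for the example. Like awk's default splitting, str.split() with no arguments ignores leading whitespace, so a leading tab does not push the word out of the first field.

```python
def first_fields(text, skip_header=True):
    """Return the first whitespace-separated field of each non-empty line."""
    lines = text.splitlines()
    if skip_header:
        lines = lines[1:]  # mirrors NR!=1 in the awk command
    words = []
    for line in lines:
        fields = line.split()  # splits on runs of spaces/tabs, ignoring leading whitespace
        if fields:
            words.append(fields[0])
    return words


# Hypothetical sample: a header row, one plain line, one line with a leading tab.
sample = "Word\tPoS\tFreq\nthe\tDet\t61847\n\tof\tPrep\t29391\n"
print("\n".join(first_fields(sample)))
```

Unlike the awk command, this sketch drops blank lines instead of printing an empty line for them.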

ら栖息 2024-08-13 13:47:51

There's not really any need to use a regular expression for this. For example, you can use cut:

cut -f1 <inputfile
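
One caveat worth checking: cut treats every tab as a delimiter, so if each line really starts with a TAB, as the format line in the question suggests, field 1 is empty and the word is field 2 (cut -f2). The short Python sketch below imitates that tab-only splitting to show the difference. tab_field is a hypothetical helper and the sample lines are made up for illustration.

```python
def tab_field(line, n):
    """Return the n-th (1-based) tab-separated field, roughly like cut -f n."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 1:
        return fields[0]  # cut prints lines without a delimiter unchanged
    return fields[n - 1] if n <= len(fields) else ""


no_leading_tab = "the\tDet\t61847"
leading_tab = "\tthe\tDet\t61847"

print(repr(tab_field(no_leading_tab, 1)))  # the word
print(repr(tab_field(leading_tab, 1)))     # empty: the leading tab creates an empty first field
print(repr(tab_field(leading_tab, 2)))     # the word again
```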

作业与我同在 2024-08-13 13:47:51

Something like \s*([a-zA-Z]+)\s*([a-zA-Z]+) would capture the word and the PoS as groups. You can then refer to them in the replacement as $1 and $2 to produce whatever output you want.

If you only want the WORD part you can just use $1 in the replace.
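
To see the capture-group idea outside a text editor, here is a small Python sketch. It is an illustration under assumptions rather than a TextPad recipe: it uses Python's re module, writes the character class as [a-zA-Z], restricts the separators to spaces and tabs so a match cannot run across a line break, and extends the pattern to consume the rest of the line so that the replacement leaves only the word. Python spells the backreference \1; the replacement syntax in TextPad may differ by version.

```python
import re

# Word, then PoS, then whatever remains on the line (the frequency).
# Only group 1 is kept; group 2 is captured to mirror the pattern in the answer.
LINE = re.compile(r"^[ \t]*([a-zA-Z]+)[ \t]+([a-zA-Z]+).*$", re.MULTILINE)


def keep_words(text):
    """Replace each matching line with its first capture group (the word)."""
    return LINE.sub(r"\1", text)


# Hypothetical sample, including a line whose WORD is "verb".
sample = "the\tDet\t61847\nverb\tNoun\t12\nis\tVerb\t9982"
print(keep_words(sample))
```

Because the pattern is anchored at the start of the line and only keeps the first group, a word such as "verb" is retained while the Verb tag in the second column is discarded. Words containing apostrophes or hyphens would need a wider character class.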
