Using regular expressions to insert text from one file into another?

Posted 2024-11-04 05:37:12 · 1046 characters · 2 views · 0 comments

I have two text files. I want to take the text between each </sup> and the next <sup> in the first file, and insert it between the empty {} in the second file.

A better example (something like a dictionary):

Text1:

<sup>1</sup>dog
<sup>2</sup>cat
<sup>3</sup>lion
<sup>1</sup>flower
<sup>2</sup>tree
.
.

Text2:

\chapter1
\pkt{1}{}{labrador retirever is..}
\pkt{2}{}{home pets..}
\pkt{3}{}{wild cats..}
\chapter2
\pkt{1}{}{red rose}
\pkt{2}{}{lemon tree}
.
.

What I want:

Text3:

\chapter1
\pkt{1}{dog}{labrador retirever is..}
\pkt{2}{cat}{home pets..}
\pkt{3}{lion}{wild cats..}
\chapter2
\pkt{1}{flower}{red rose}
\pkt{2}{tree}{lemon tree}

The text itself is arbitrary, but you can see what I want. Perl would be best.

So take each

</sup>**text**<sup>

and paste it into

\pkt{nr}{**here**}{this is translation of this word already stored in text2}.

Text A and Text B are in the same order. So it would be great if I could read the first </sup>text<sup> from Text A, save it in a temporary variable, delete that line from Text A, put the saved text into the first free {} slot in Text B, and then start over. The numbers will match because the order is preserved.
Sorry for my English :)
Thanks!


决绝 2024-11-11 05:37:12

This code puts all dictionary entries into an array, in the order they appear. It then loops over the TeX file, and each time it hits an empty \pkt{num}{} slot it inserts the next entry from the array.

Newlines inside a dictionary entry are replaced with spaces (remove that substitution in the map if you don't want this behavior). A \pkt is found as long as the \pkt{num}{} part does not span multiple lines. Otherwise, I think the easiest solution would be to undef $/ (the input record separator), read the whole file into a string, and run the replacement over that string (this could be a bit memory hungry, though).

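#!/usr/bin/perl

use strict;
use warnings;

my $dict_filename = 'text1';
my $tex_filename  = 'text2';
my $out_filename  = 'text3';

# Read the dictionary entries in the order they appear
open(my $dict_fh, '<', $dict_filename) or die "Cannot open $dict_filename: $!";
my @dict;
{
    # Use <sup> as the input record separator instead of newline
    local $/ = '<sup>';
    # Throw away the first "record"; it only holds what precedes the first <sup>
    <$dict_fh>;
    # Extract the text after </sup> and replace embedded newlines with spaces
    @dict = map {
        my ($text) = m@</sup>\s*(.*?)\s*(?:<sup>|$)@s;
        $text = '' unless defined $text;
        $text =~ s/\n/ /g;
        $text;
    } <$dict_fh>;
}
close($dict_fh);

open(my $tex_fh, '<', $tex_filename) or die "Cannot open $tex_filename: $!";
open(my $out_fh, '>', $out_filename) or die "Cannot open $out_filename: $!";

my $dict_pos = 0;
while (my $tex_line = <$tex_fh>)
{
    # Replace each empty \pkt{num}{} with \pkt{num}{text}
    $tex_line =~ s|(\\pkt\{\d+\}\{)(\})|$1$dict[$dict_pos++]$2|g;

    print {$out_fh} $tex_line;
}

close($tex_fh);
close($out_fh) or die "Cannot close $out_filename: $!";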
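
For the case where a \pkt{num}{} slot spans more than one line, here is a minimal sketch of the "read the whole file into a string" variant mentioned above. The helper names slurp, dict_words and fill_slots are made up for this illustration, and the demo uses short in-memory strings instead of the real text1 and text2 files. It also assumes that a slot with nothing but whitespace between its braces counts as empty, and that slots are left empty once the dictionary runs out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into one string.
sub slurp {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or die "Cannot open $filename: $!";
    local $/;    # undef input record separator: <$fh> returns everything
    my $content = <$fh>;
    close($fh);
    return $content;
}

# Hypothetical helper: return the text after each </sup>, in order.
sub dict_words {
    my ($dict) = @_;
    # Capture up to the next <sup> or the end of the string
    my @words = $dict =~ m{</sup>\s*(.*?)\s*(?=<sup>|\z)}gs;
    # Collapse line breaks inside an entry into single spaces
    s/\s*\n\s*/ /g for @words;
    return @words;
}

# Hypothetical helper: fill each empty \pkt{num}{} slot with the next word.
sub fill_slots {
    my ($tex, @words) = @_;
    # Hand out the words in order; use '' once the list is exhausted
    my $next = sub { @words ? shift @words : '' };
    # \s* lets the {num} and {} groups sit on different lines
    $tex =~ s/(\\pkt\s*\{\d+\}\s*\{)\s*(\})/$1 . $next->() . $2/ge;
    return $tex;
}

# Demo on in-memory data; the first slot is split across two lines
my $dict = "<sup>1</sup>dog\n<sup>2</sup>cat\n";
my $tex  = "\\pkt{1}\n{}{labrador}\n\\pkt{2}{}{home pets}\n";
print fill_slots($tex, dict_words($dict));
```

With real files the last line would become something like print fill_slots(slurp('text2'), dict_words(slurp('text1'))), with the output redirected to text3. Both files are held in memory at once, which is the trade-off mentioned above.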
