Ruby regex: replacing noun clusters in POS-tagged data

Posted 2024-10-29 11:15:22

I have POS-tagged English phrases of the form the_DT flower_NN pot_NN and want to combine each sequence of nouns into a single noun joined by underscores: the_DT flower_pot_NN.

I'm trying the following:

s.gsub!(/ ([^ ]+)_NN ([^ ]+)_NN/, " #{$1}_#{$2}_NN")

This fails when there are more than two nouns in a row, such as the_DT monster_NN truck_NN wallpaper_NN, which should become the_DT monster_truck_wallpaper_NN.

What should I do?


痴骨ら 2024-11-05 11:15:22

while s.gsub!(/\b(\S+)_NN\s+(\S+)_NN\b/, '\1_\2_NN')
end

A single substitution pass with a fixed replacement string cannot handle runs of arbitrary length, because the replacement has no way to recurse or iterate over a variable number of nouns. Instead, join adjacent pairs, then repeat until no adjacent NN tokens remain. gsub! returns nil when it replaces nothing, which ends the loop.

EDIT: Fixed the replacement part as well. It now uses the backreferences \1 and \2 in a single-quoted string instead of "#{$1}" interpolation, which Ruby evaluates before gsub! runs. Should work now.
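For anyone who wants to try this out, here is a minimal sketch that wraps the loop in a method. The method names merge_noun_runs and merge_noun_runs_single_pass are hypothetical and not part of the original answer. The second method is a different technique, not the one above: it uses the block form of gsub to match a whole run of _NN tokens at once and rebuild it in Ruby code, under the assumption that tokens are whitespace-separated and the tag is exactly _NN.

```ruby
# Hypothetical helper (name is an assumption): the pairwise loop from the answer.
def merge_noun_runs(text)
  result = text.dup # work on a copy so the caller's string is left untouched
  # Each pass joins adjacent noun pairs; gsub! returns nil once nothing matches.
  while result.gsub!(/\b(\S+)_NN\s+(\S+)_NN\b/, '\1_\2_NN')
  end
  result
end

# Hypothetical alternative (an assumption, not the answer's method):
# match an entire run of two or more _NN tokens and rebuild it in a block.
def merge_noun_runs_single_pass(text)
  text.gsub(/\S+_NN(?:\s+\S+_NN)+(?!\S)/) do |run|
    # Strip the tag from every token, join the words, then re-add one tag.
    words = run.split.map { |token| token.sub(/_NN\z/, "") }
    words.join("_") + "_NN"
  end
end

puts merge_noun_runs("the_DT monster_NN truck_NN wallpaper_NN")
# prints: the_DT monster_truck_wallpaper_NN
```

The loop version may scan the string several times for long runs, while the block version scans once. For short phrases the difference is unlikely to matter.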
