Ruby regex: replacing noun clusters in POS-tagged data
I have POS-tagged English phrases of the form: the_DT flower_NN pot_NN and want to combine all noun sequences into a single noun separated by underscores: the_DT flower_pot_NN.
I'm trying the following:
s.gsub!(/ ([^ ]+)_NN ([^ ]+)_NN/, " #{$1}_#{$2}_NN")
This solution fails when there are multiple nouns in a row, such as: the_DT monster_NN truck_NN wallpaper_NN, which should become the_DT monster_truck_wallpaper_NN.
What should I do?
Comments (2)
You cannot do it all at once in a regexp, since there is no provision for recursion (or iteration). What you do is join the adjacent pairs, then repeat until no adjacent NNs remain.
EDIT: Fixed the replacement part as well. Should work now.
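A minimal sketch of the join-and-repeat idea, assuming tokens are separated by single spaces. The method name `merge_noun_pairs` and the constant `NOUN_PAIR` are hypothetical. It relies on `gsub!` returning `nil` when nothing was replaced. The replacement is single-quoted `'\1_\2_NN'` because a double-quoted `"#{$1}_#{$2}_NN"` is interpolated before `gsub!` runs, so it would not see the current match.

```ruby
# Two adjacent tokens tagged exactly _NN (the lookahead keeps _NNS etc. from matching).
NOUN_PAIR = /(\S+)_NN (\S+)_NN(?!\S)/

# Hypothetical helper: joins adjacent noun pairs until none are left.
def merge_noun_pairs(s)
  result = s.dup
  loop do
    # gsub! returns nil once no adjacent pair remains, which ends the loop.
    break unless result.gsub!(NOUN_PAIR, '\1_\2_NN')
  end
  result
end

puts merge_noun_pairs("the_DT monster_NN truck_NN wallpaper_NN")
```

Each pass roughly halves the number of nouns left in a run, so long runs still finish in a few iterations.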
You can also try this, using one global replace.
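One way a single global replace could look, sketched under the assumption that tokens are separated by single spaces. The method name `merge_noun_runs` is hypothetical. The pattern matches a whole run of two or more `_NN` tokens, and the block rebuilds the run as one token.

```ruby
# A run of two or more whitespace-delimited tokens, each tagged exactly _NN.
NOUN_RUN = /(?<!\S)\S+_NN(?: \S+_NN)+(?!\S)/

# Hypothetical helper: collapses each run of nouns in a single gsub call.
def merge_noun_runs(s)
  s.gsub(NOUN_RUN) do |run|
    # Strip the tag from every token, join the words, then re-attach one tag.
    run.split(' ').map { |tok| tok.sub(/_NN\z/, '') }.join('_') + '_NN'
  end
end

puts merge_noun_runs("the_DT monster_NN truck_NN wallpaper_NN")
```

Using the block form of `gsub` avoids the need for repeated passes, because each run is matched in full before it is rewritten.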