How to correct misaligned substring offsets in a string with Python

Posted 2025-01-09 18:38:28


I have a list of strings, along with the start and end offsets of substrings that I need to use for training an NLP model.
Some of these substring offsets are misaligned. For example:

text = 'Car is blue'
start_offset = 0
end_offset = 2  # misaligned: should be 3
substring = text[start_offset:end_offset]  # should be 'Car', but the misaligned offset gives 'Ca'
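A quick way to detect offsets like these is to check whether the span cuts into a word at either edge. The sketch below is one possible check, not a standard library function. It assumes that word characters are letters, digits and underscores (roughly the regex class `\w`) and that a valid span has no leading or trailing whitespace. `is_whole_word_span` and `_is_word_char` are hypothetical helper names.

```python
def _is_word_char(ch: str) -> bool:
    # Assumption: a "word" character is a letter, digit or underscore (like regex \w).
    return ch.isalnum() or ch == "_"


def is_whole_word_span(text: str, start: int, end: int) -> bool:
    """Return True if text[start:end] is non-empty and covers whole words only."""
    if not 0 <= start < end <= len(text):
        return False
    span = text[start:end]
    # Reject spans that carry stray whitespace at either edge.
    if span != span.strip():
        return False
    # The span cuts a word if both characters around an edge are word characters.
    cuts_left = start > 0 and _is_word_char(text[start - 1]) and _is_word_char(text[start])
    cuts_right = end < len(text) and _is_word_char(text[end - 1]) and _is_word_char(text[end])
    return not (cuts_left or cuts_right)


if __name__ == "__main__":
    text = "Car is blue"
    print(is_whole_word_span(text, 0, 2))  # False: 'Ca' stops inside 'Car'
    print(is_whole_word_span(text, 0, 3))  # True: 'Car'
```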

The aim is to check whether the highlighted substring is a whole word in the full string. If it is not, the start and end offsets should be corrected.
What Python code could I use to get whole-word substrings?
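One possible approach is to trim any whitespace caught at the edges of the span and then widen each edge outward until it no longer sits inside a word. This is a minimal sketch under stated assumptions: word characters are letters, digits and underscores (roughly regex `\w`), and everything else counts as a separator. `snap_to_word_boundaries` is a hypothetical name, not a library function.

```python
from typing import Tuple


def _is_word_char(ch: str) -> bool:
    # Assumption: a "word" character is a letter, digit or underscore (like regex \w).
    return ch.isalnum() or ch == "_"


def snap_to_word_boundaries(text: str, start: int, end: int) -> Tuple[int, int]:
    """Return (start, end) adjusted so that text[start:end] covers whole words."""
    n = len(text)
    # Clamp the offsets into the valid range and keep start <= end.
    start = max(0, min(start, n))
    end = max(start, min(end, n))

    # Drop whitespace caught at either edge of the span.
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1

    # Move start left while it sits between two word characters (mid-word).
    while 0 < start < n and _is_word_char(text[start - 1]) and _is_word_char(text[start]):
        start -= 1
    # Move end right while it sits between two word characters (mid-word).
    while 0 < end < n and _is_word_char(text[end - 1]) and _is_word_char(text[end]):
        end += 1
    return start, end


if __name__ == "__main__":
    # Hypothetical (text, start, end) triples standing in for the training data.
    examples = [("Car is blue", 0, 2), ("Car is blue", 5, 9)]
    for text, start, end in examples:
        new_start, new_end = snap_to_word_boundaries(text, start, end)
        print((start, end), "->", (new_start, new_end), repr(text[new_start:new_end]))
    # (0, 2) -> (0, 3) 'Car'
    # (5, 9) -> (4, 11) 'is blue'
```

Apart from trimming whitespace, this only ever widens a span, so a span that cuts a word in half is extended to include the whole word rather than shrunk, and a span that is already aligned is returned unchanged. If the training pipeline uses spaCy v3, `Doc.char_span(start, end, alignment_mode="expand")` performs a similar snap against spaCy's own tokenization, which may not match this character-based rule exactly.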
