Python: Find the start index of a specific word number in a string

Posted on 2025-01-28 17:41:44

I have this string:

myString = "Tomorrow will be very very rainy"

I would like to get the start index of word number 5 (the second "very").

Currently, I split myString into words:

import re
words = re.findall(r'\w+|[^\s\w]+', myString)

But I am not sure how to get the start index of word number 5, which is words[4] because list indices are zero-based.

Using index() does not work, because it finds the first occurrence:

start_index = myString.index(words[4])  # 17, the start of the first "very"
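One way around the first-occurrence problem, sketched here as an assumption rather than taken from the answers, is to keep the match objects from re.finditer instead of only the matched strings: each match already carries its own start offset. The helper name word_start is hypothetical. It uses the same token pattern as the question, so punctuation runs count as tokens too.

```python
import re

def word_start(text, n):
    """Return the start index of the n-th (1-based) token.

    Uses the same token pattern as the question's re.findall call.
    Raises IndexError if the text has fewer than n tokens.
    """
    # finditer yields Match objects, so each token keeps its offset
    matches = list(re.finditer(r'\w+|[^\s\w]+', text))
    return matches[n - 1].start()

myString = "Tomorrow will be very very rainy"
print(word_start(myString, 5))  # 22, the second "very"
```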

Answers (4)

南笙 2025-02-04 17:41:44

It's not very elegant, but you can loop through the list of split words and compute the index from each word's length plus the separator (here a single space). This answer targets the fifth word in the sentence.

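myString = "Tomorrow will be very very rainy"
target_word = 5  # 1-based word number
split_string = myString.split()

idx_start = 0
for i in range(target_word - 1):
    # skip over each earlier word...
    idx_start += len(split_string[i])
    # ...and the single space that follows it
    if myString[idx_start] == " ":
        idx_start += 1

# the end index of a slice is exclusive, so no extra + 1 is needed
idx_end = idx_start + len(split_string[target_word - 1])

print(idx_start, idx_end, myString[idx_start:idx_end])  # 22 26 very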
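The loop above assumes exactly one space between words. A sketch of a variant that tolerates any whitespace: str.find (like str.index) accepts a start position, so each word is searched for only after the end of the previous one. The helper name nth_word_start is hypothetical.

```python
def nth_word_start(text, n):
    """Return the start index of the n-th (1-based) whitespace-separated word."""
    words = text.split()
    if not 1 <= n <= len(words):
        raise ValueError("n is out of range")
    pos = 0
    for word in words[:n - 1]:
        # find() only searches from pos onward, so earlier duplicates are skipped
        pos = text.find(word, pos) + len(word)
    return text.find(words[n - 1], pos)

myString = "Tomorrow will be very very rainy"
print(nth_word_start(myString, 5))  # 22
```

Because words from split() contain no whitespace, the first match at or after pos is always the word itself and never an earlier duplicate.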
深海蓝天 2025-02-04 17:41:44
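import re

myString = "Tomorrow will be very very rainy"
wordnum = 5  # 1-based word number
# the end of each run of spaces is the start of the word that follows it
l = [x.span()[1] for x in re.finditer(" +", myString)]
pos = l[wordnum - 2]  # word wordnum follows the (wordnum - 1)-th run of spaces
print(pos)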

output

22
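As written, l[wordnum-2] only works for wordnum >= 2: for the first word it becomes l[-1], the end of the last run of spaces. A sketch of one fix, assuming the string has no leading or trailing spaces, is to prepend 0 as the start of the first word. The helper name word_starts is hypothetical.

```python
import re

def word_starts(text):
    """Start index of every word, assuming words are separated by runs of
    spaces and the text has no leading or trailing spaces."""
    # the first word starts at 0; every later word starts where a run of spaces ends
    return [0] + [m.end() for m in re.finditer(" +", text)]

myString = "Tomorrow will be very very rainy"
starts = word_starts(myString)
print(starts[5 - 1])  # 22
print(starts[0])      # 0
```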
只是我以为 2025-02-04 17:41:44

If there is only a single space between words:

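  • Sum the lengths of all words before the wanted word
  • Add the number of spaces (one after each of those words)
myString = "Tomorrow will be very very rainy"
word_idx = 4  # zero-based index of the wanted word
words = myString.split()
# total length of the preceding words, plus one space after each of them
start_index = sum(len(word) for word in words[:word_idx]) + word_idx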

Result:

22
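The same arithmetic can produce every start index at once. A sketch using itertools.accumulate (its initial argument needs Python 3.8 or later), under the same single-space assumption:

```python
from itertools import accumulate

myString = "Tomorrow will be very very rainy"
words = myString.split()

# each word starts one space after the previous one ends: running sum of len(word) + 1
starts = list(accumulate((len(w) + 1 for w in words[:-1]), initial=0))
print(starts)     # [0, 9, 14, 17, 22, 27]
print(starts[4])  # 22
```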
岁月苍老的讽刺 2025-02-04 17:41:44

If the string starts with at least 5 words, you can match the first 4 words and capture the fifth one.

Then you can call the start method on the Match object and pass it 1, the number of the first capture group.

^(?:\w+\s+){4}(\w+)

Explanation

  • ^ Start of string
  • (?:\w+\s+){4} Repeat 4 times: match 1+ word characters followed by 1+ whitespace characters
  • (\w+) Capture group 1, match 1+ word characters

Example

import re

myString = "Tomorrow will be very very rainy"
pattern = r"^(?:\w+\s+){4}(\w+)"
m = re.match(pattern, myString)
if m:
    print(m.start(1))

Output

22

For a broader match, you can use \S+ to match one or more non-whitespace characters:

pattern = r"^(?:\S+\s+){4}(\S+)"
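To make the word number a parameter, the repetition count can be built into the pattern. A minimal sketch; the function name regex_word_start is hypothetical, and the leading \s* is an addition (not in the answer) so that leading whitespace is tolerated. In the f-string, {{ and }} produce literal braces around the count.

```python
import re

def regex_word_start(text, n):
    """Return the start index of the n-th (1-based) non-whitespace run, or None."""
    # skip optional leading whitespace, then n - 1 words, then capture word n
    pattern = rf"^\s*(?:\S+\s+){{{n - 1}}}(\S+)"
    m = re.match(pattern, text)
    return m.start(1) if m else None

myString = "Tomorrow will be very very rainy"
print(regex_word_start(myString, 5))  # 22
```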