Formatted input in Python
I have a peculiar problem. Using Python, I need to read from a txt file only those substrings that sit at predefined ranges of offsets. Let's say 5-8 and 12-16.
For example, if a line in the file is something like:
abcdefghi akdhflskdhfhglskdjfhghsldk
then I would like to read the two words "efgh" and "kdhfl", because in the word "efgh" the offset of the character "e" is 5 and that of "h" is 8. The same goes for the other word, "kdhfl", at offsets 12-16.
Please note that whitespace also counts toward the offset. In fact, the whitespace in my file does not occur consistently in every line and cannot be relied upon to extract the words of interest, which is why I have to bank on the offsets.
I hope I've been able to make the question clear.
Awaiting answers!
Edit -
Yes, the amount of whitespace in each line can change, and it counts toward the offsets as well. For example, consider these two lines -
abcz d
a bc d
In both cases, I view the offset of the final character "d" as the same. As I said, the white spaces in the file are not consistent and I cannot rely on them. I need to pick up the characters based on their offsets. Does your answer still hold?
Assuming it's a file,
To extract pieces from offsets, simply read each line into a string and then access a substring with a slice ([from:to]). Python slices count from 0 and exclude the end index, so offsets 5-8 counted from 1 correspond to [4:8].
It's unclear what you're saying about the inconsistent whitespace. If whitespace counts toward the offset, it must be consistent for the offsets to be meaningful. If the amount of whitespace varies and shifts the positions of the data you want, you can't reliably extract it by offset.
In your added example, as long as d's offset stays the same, you can extract it with slicing.
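A minimal sketch of the slicing approach, assuming the offsets in the question are 1-based and inclusive (which matches the "efgh" and "kdhfl" example). The names `extract_fields`, `read_fields`, and `RANGES` are made up for illustration, and the UTF-8 encoding is an assumption about the file.

```python
# Offsets as given in the question: 1-based, inclusive on both ends.
RANGES = [(5, 8), (12, 16)]


def extract_fields(line, ranges):
    """Return the substrings of `line` at the given 1-based, inclusive ranges."""
    # A 1-based inclusive range (start, end) maps to the slice [start - 1:end].
    return [line[start - 1:end] for start, end in ranges]


def read_fields(path, ranges):
    """Yield the extracted substrings for every line of a text file."""
    # Assumption: the file is UTF-8 text; adjust the encoding if it is not.
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            # Strip only the newline so that spaces keep counting as offsets.
            yield extract_fields(line.rstrip("\n"), ranges)


sample = "abcdefghi akdhflskdhfhglskdjfhghsldk"
fields = extract_fields(sample, RANGES)
print(fields)  # ['efgh', 'kdhfl']
```

Slicing never raises on a short line; it just returns a shorter or empty string, so lines that do not reach offset 16 may need a separate length check.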
What's to stop you from using a regular expression? Besides the whitespace, do the offsets vary?
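If the offsets are fixed, a regular expression could express the same thing by counting characters rather than matching whitespace. This is only a sketch of one way to read the suggestion; the pattern name `FIELDS` is hypothetical, and the counts are derived from the 5-8 and 12-16 ranges in the question.

```python
import re

# Skip 4 characters, capture 4 (offsets 5-8), skip 3 (offsets 9-11),
# then capture 5 (offsets 12-16). "." also matches spaces, so whitespace
# counts toward the offsets exactly like any other character.
FIELDS = re.compile(r".{4}(.{4}).{3}(.{5})")

sample = "abcdefghi akdhflskdhfhglskdjfhghsldk"
match = FIELDS.match(sample)
words = match.groups() if match else ()
print(words)  # ('efgh', 'kdhfl')
```

Unlike slicing, the pattern fails to match when a line is shorter than 16 characters, which can serve as a built-in length check.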