Programming idiom for parsing a string in multiple passes
I'm working on a Braille translation library, and I need to translate a string of text into braille. I plan to do this in multiple passes, but I need a way to keep track of which parts of the string have been translated and which have not, so I don't retranslate them.
I could always create a class that tracks the ranges of positions in the string that have already been processed, and then design my search/replace algorithm to ignore those ranges on subsequent passes, but I'm wondering whether there is a more elegant way to accomplish the same thing.
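The range-tracking idea above can be sketched roughly as follows. This assumes each pass searches the original, untranslated string and records its matches as claims instead of rewriting the string in place, so positions never shift between passes. The class name `RangeTracker`, the helper `cell`, and the tiny `CONTRACTIONS`/`LETTERS` tables are hypothetical illustrations, not real braille tables or any library's API.

```python
from __future__ import annotations

import re


def cell(*dots: int) -> str:
    """Unicode braille pattern for the given dot numbers (dot n sets bit n-1)."""
    return chr(0x2800 + sum(1 << (d - 1) for d in dots))


# Hypothetical, deliberately tiny tables; not a real braille code.
CONTRACTIONS = {"the": cell(2, 3, 4, 6), "and": cell(1, 2, 3, 4, 6)}
LETTERS = {"a": cell(1), "e": cell(1, 5), "h": cell(1, 2, 5),
           "t": cell(2, 3, 4, 5), " ": cell()}


class RangeTracker:
    """Records which [start, end) spans of the source text have been translated."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.claims: list[tuple[int, int, str]] = []  # (start, end, output)

    def is_free(self, start: int, end: int) -> bool:
        # A span is free if it overlaps no previously claimed span.
        return all(end <= s or start >= e for s, e, _ in self.claims)

    def apply_pass(self, table: dict[str, str]) -> None:
        # Search the ORIGINAL text, so positions never shift between passes.
        for src, dst in table.items():
            for m in re.finditer(re.escape(src), self.text):
                if self.is_free(m.start(), m.end()):
                    self.claims.append((m.start(), m.end(), dst))

    def render(self) -> str:
        # Stitch claimed outputs together; unclaimed text passes through unchanged.
        out, pos = [], 0
        for start, end, dst in sorted(self.claims):
            out.append(self.text[pos:start])
            out.append(dst)
            pos = end
        out.append(self.text[pos:])
        return "".join(out)


tracker = RangeTracker("the hat")
tracker.apply_pass(CONTRACTIONS)  # pass 1: claims "the" at [0, 3)
tracker.apply_pass(LETTERS)       # pass 2: skips the letters inside "the"
print(tracker.render())
```

Because the contraction pass runs first, the letters inside "the" are skipped in the second pass: their span is already claimed. The cost of this design is that every pass has to consult the claim list, which is the bookkeeping the question hopes to avoid.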
I would imagine that multi-pass string translation isn't all that uncommon; I'm just not sure what the options are for doing it.
A more usual approach would be to tokenize your input and then work on the tokens. For example, start by tokenizing the string into one token per character. Then, in a first pass, generate a straightforward braille mapping, token by token. In subsequent passes, you can replace more of the tokens, for example by replacing a sequence of input tokens with a single output token.
Because your tokens are objects or structs rather than simple characters, you can attach additional information to each one, such as the source token(s) from which you translated (or rather, transliterated) the current token.
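A minimal Python sketch of the token-based approach described above, under stated assumptions: every character starts as its own `Token`, a first pass gives each one a one-to-one braille cell, and a later pass merges runs of tokens into a single contraction token that keeps the replaced tokens in its `source` field. The names `Token`, `tokenize`, `letter_pass`, `contraction_pass`, `render`, and `cell` are hypothetical, and the tables are illustrative only; they do not implement real contraction rules (for example, there are no word-boundary checks).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


def cell(*dots: int) -> str:
    """Unicode braille pattern for the given dot numbers (dot n sets bit n-1)."""
    return chr(0x2800 + sum(1 << (d - 1) for d in dots))


# Hypothetical, deliberately tiny tables; not a real braille code.
LETTERS = {"a": cell(1), "d": cell(1, 4, 5), "e": cell(1, 5), "h": cell(1, 2, 5),
           "n": cell(1, 3, 4, 5), "t": cell(2, 3, 4, 5), " ": cell()}
CONTRACTIONS = {"the": cell(2, 3, 4, 6), "and": cell(1, 2, 3, 4, 6)}


@dataclass
class Token:
    text: str                      # source text this token covers
    braille: Optional[str] = None  # output; None until some pass translates it
    source: list[Token] = field(default_factory=list)  # tokens this one replaced


def tokenize(text: str) -> list[Token]:
    # Start with one token per character.
    return [Token(ch) for ch in text]


def letter_pass(tokens: list[Token]) -> list[Token]:
    # First pass: straightforward one-to-one mapping, token by token.
    for tok in tokens:
        if tok.braille is None and tok.text in LETTERS:
            tok.braille = LETTERS[tok.text]
    return tokens


def contraction_pass(tokens: list[Token]) -> list[Token]:
    # Later pass: replace a run of input tokens with a single output token.
    words = sorted(CONTRACTIONS, key=len, reverse=True)  # try longest first
    out: list[Token] = []
    i = 0
    while i < len(tokens):
        for word in words:
            window = tokens[i:i + len(word)]
            # Only merge plain character tokens that no earlier pass has merged.
            if (len(window) == len(word)
                    and all(not t.source for t in window)
                    and "".join(t.text for t in window) == word):
                out.append(Token(word, CONTRACTIONS[word], source=window))
                i += len(word)
                break
        else:  # no contraction starts here; keep the token as-is
            out.append(tokens[i])
            i += 1
    return out


def render(tokens: list[Token]) -> str:
    # Untranslated tokens fall back to their source text.
    return "".join(t.braille if t.braille is not None else t.text for t in tokens)


tokens = contraction_pass(letter_pass(tokenize("the hat and")))
print(render(tokens))
print([t.text for t in tokens[0].source])  # ['t', 'h', 'e']
```

Because a merged token remembers its `source`, a later pass can tell that it was already produced by a contraction and leave it alone, and you can still trace any output cell back to the input characters it came from. No separate table of processed ranges is needed.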
Check out some basic compiler theory.