Programming idiom for parsing a string in multiple passes

Posted 2024-08-07 02:57:22

I'm working on a Braille translation library, and I need to translate a string of text into braille. I plan to do this in multiple passes, but I need a way to keep track of which parts of the string have been translated and which have not, so I don't retranslate them.

I could always create a class which would track the ranges of positions in the string which had been processed, and then design my search/replace algorithm to ignore them on subsequent passes, but I'm wondering if there isn't a more elegant way to accomplish the same thing.

I would imagine that multi-pass string translation isn't all that uncommon; I'm just not sure what the options are for doing it.

Comments (2)

聊慰 2024-08-14 02:57:22

A more usual approach would be to tokenize your input, then work on the tokens. For example, start by tokenizing the string into a token for each character. Then, in a first pass generate a straightforward braille mapping, token by token. In subsequent passes, you can replace more of the tokens - for example, by replacing sequences of input tokens with a single output token.

Because your tokens are objects or structs, rather than simple characters, you can attach additional information to each - such as the source token(s) you translated (or rather, transliterated) the current token from.
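
To make the token idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Token` class, the `translated` flag, the pass functions, and the tiny `LETTERS` and `CONTRACTIONS` tables are hypothetical placeholders, not a real braille table or an existing library's API. Each token remembers the source text it came from, so a later pass can match on the source rather than on already-translated output, and the flag lets a pass skip tokens that an earlier pass has handled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    text: str                 # current output: a source character or a braille cell
    source: str               # the original input text this token was produced from
    translated: bool = False  # set once some pass has produced braille for this token


# Hypothetical, heavily abbreviated tables, for illustration only.
LETTERS = {
    "a": "\u2801", "c": "\u2809", "d": "\u2819", "e": "\u2811",
    "h": "\u2813", "n": "\u281d", "t": "\u281e", " ": "\u2800",
}
CONTRACTIONS = {"the": "\u282e", "and": "\u282f"}


def tokenize(text: str) -> List[Token]:
    """Start with one untranslated token per input character."""
    return [Token(text=ch, source=ch) for ch in text]


def letter_pass(tokens: List[Token]) -> List[Token]:
    """Map single characters to cells, skipping tokens already translated."""
    out = []
    for tok in tokens:
        cell = LETTERS.get(tok.source)
        if tok.translated or cell is None:
            out.append(tok)  # leave handled or unknown tokens untouched
        else:
            out.append(Token(cell, tok.source, translated=True))
    return out


def contraction_pass(tokens: List[Token]) -> List[Token]:
    """Replace a run of tokens whose sources spell a contraction with one token.

    Simplification: this matches anywhere, including inside longer words.
    """
    out = []
    i = 0
    while i < len(tokens):
        for word, cell in CONTRACTIONS.items():
            window = tokens[i:i + len(word)]
            if "".join(t.source for t in window) == word:
                out.append(Token(cell, word, translated=True))
                i += len(word)
                break
        else:
            out.append(tokens[i])  # no contraction starts here
            i += 1
    return out


def render(tokens: List[Token]) -> str:
    """Join the current text of every token into the output string."""
    return "".join(t.text for t in tokens)


if __name__ == "__main__":
    tokens = contraction_pass(letter_pass(tokenize("the cat")))
    print(render(tokens))
    print([t.source for t in tokens])
```

Because the contraction pass matches on `source` and the letter pass skips tokens flagged as translated, the two passes in this sketch give the same result in either order, and nothing gets translated twice.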

零度° 2024-08-14 02:57:22

Check out some basic compiler theory.
