How to count the number of replaced strings

Posted 2024-09-15 12:30:43

I have a massive string that I'm trying to parse as a series of tokens in string form, and I found a problem: because many of the tokens are alike, calling string.replace() repeatedly sometimes causes previously replaced characters to be replaced again.

Say the string being replaced is 'goto', and it gets replaced by '41' (hex), which is then converted into ASCII ('A'). Later on, the string 'A' is also due to be replaced, so the already-converted token gets replaced a second time, causing problems.
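To make the failure concrete, here is a minimal sketch of the cascading effect. The table is hypothetical: the question only says that 'goto' ends up as 'A' and that 'A' is itself a token; the replacement 'Z' for 'A' is made up for illustration.

```python
# Hypothetical token table: 'goto' becomes 'A', and 'A' is itself a token.
# The value 'Z' is an assumption; the question does not say what 'A' maps to.
table = {"goto": "A", "A": "Z"}

def chained_replace(text, table):
    """Apply str.replace() once per token, in table order."""
    for token, replacement in table.items():
        text = text.replace(token, replacement)
    return text

# The 'A' produced from 'goto' is picked up by the second replace() call,
# so the input 'goto' ends up as 'Z' instead of 'A'.
result = chained_replace("goto", table)
print(result)
```

Each replace() call rescans the whole string, including output from earlier calls, which is why the order of the calls cannot fix the problem in general.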

What would be the best way to get each string replaced only once? Breaking each token off the original string and searching for them one at a time takes a very long time.

This is the code I have now. Although it more or less works, it's not very fast:

# The largest token is 8 ASCII chars long.
# 'out' is the string with the final outputs.
while data:
    length = 8
    # Shrink the prefix until it matches a known token (longest match first).
    # (Sorry THC4k, I used your code at first, but it didn't work out for
    # this and I was too lazy to change it.)
    while reverse_search(data[:length]) is None:
        length -= 1
    out += reverse_search(data[:length])
    data = data[length:]
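One likely reason the loop is slow is that `data = data[length:]` copies the rest of the string on every iteration, and `out +=` may copy the output as well. Below is a sketch of the same longest-match-first idea using an index and a list instead. The `TOKENS` table and this `reverse_search` are stand-ins, since the real ones are not shown; passing unmatched characters through unchanged is also an assumption.

```python
# Hypothetical token table; the real one is not shown in the question.
TOKENS = {"goto": "A", "print": "B", "A": "Z"}
MAX_TOKEN_LEN = 8  # the question states the largest token is 8 chars long

def reverse_search(chunk):
    """Stand-in for the question's helper: return the replacement or None."""
    return TOKENS.get(chunk)

def tokenize(data):
    pieces = []  # collected output, joined once at the end
    pos = 0      # current position in data; data itself is never re-sliced
    while pos < len(data):
        # Try the longest possible token first, then shorter ones.
        for length in range(min(MAX_TOKEN_LEN, len(data) - pos), 0, -1):
            replacement = reverse_search(data[pos:pos + length])
            if replacement is not None:
                pieces.append(replacement)
                pos += length
                break
        else:
            # Assumption: no token starts here, so copy one character as-is.
            pieces.append(data[pos])
            pos += 1
    return "".join(pieces)
```

Because the output goes into `pieces` and is never scanned again, an 'A' produced from 'goto' cannot be replaced a second time.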

哽咽笑 2024-09-22 12:30:43

If you're trying to substitute all the strings in a single pass, you can use a dictionary:

translation = {'PRINT': '32', 'GOTO': '41'}
# Look each space-separated word up once; unknown words pass through unchanged.
code = ' '.join(translation.get(i, i) for i in code.split(' '))

This is basically O(2|S| + n*|dict|) in the worst case, where |S| is the length of the input and n is the number of words. Since Python dict lookups take constant time on average, in practice it is close to linear in |S|. Very fast, although memory usage could be quite substantial. Keeping track of substitutions would allow you to solve the problem in linear time, but only if you exclude the cost of looking up previous substitutions. Altogether, the problem seems to be polynomial by nature.

Unless there is a function in Python for translating strings via dictionaries that I don't know about, this seems to be the simplest way of putting it.
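For what it's worth, the standard `re` module can get close to such a function: `re.sub` accepts a callable as the replacement, so a single pattern built from the dictionary keys can do every substitution in one pass. This is only a sketch, not part of the approach above; the extra 'GO' entry is hypothetical and is there to show why longer keys are tried first.

```python
import re

# Table from the example above, plus a hypothetical 'GO' entry.
translation = {"PRINT": "32", "GOTO": "41", "GO": "40"}

# Alternation tries branches left to right, so sort longest first to let
# 'GOTO' win over 'GO'. re.escape guards any regex metacharacters in keys.
pattern = re.compile(
    "|".join(re.escape(t) for t in sorted(translation, key=len, reverse=True))
)

def translate(code):
    """Replace every token in one scan; replacement text is never rescanned."""
    return pattern.sub(lambda m: translation[m.group(0)], code)
```

Unlike the split-based version, this does not require tokens to be separated by spaces, which may or may not be what you want.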

it turns