Python - Check whether a character is in a dictionary, and handle it if it is not

Posted 2024-08-21 09:12:19, 152 words, 12 views, 0 comments

I am transliterating text from a source language (input file) to a target language (target file), so I look up each character's equivalent in a dictionary in my code. Certain characters in the source text, such as the comma (,) and other special symbols, have no equivalent mapping. How do I check whether a character is in the dictionary, and how do I make sure that the special symbols with no mapping are still written to the target file? Thank you :).

Comments (5)

述情 2024-08-28 09:12:19

My recommendation, given that rules is a mapping of the characters to their transliterated equivalents:

def transliterate(source_text, rules):
    results = []
    for char in source_text:
        # Fall back to the character itself when it has no mapping
        results.append(rules.get(char, char))
    return ''.join(results)    # turns the list back into a string

A dict's get method returns the value for a key, or a default value if the key does not exist. Normally the default is None, but here we pass the character itself as the default (the second argument), so a character with no mapping is simply returned unchanged.

A more compact way to write this, using a generator expression, would be:

''.join(rules.get(char, char) for char in source_text)
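
As a quick check of this approach, here is a minimal sketch with a made-up mapping. The contents of `rules` and the sample text are assumptions for illustration, not taken from the question. Characters without a mapping, such as the comma and the space, come through unchanged.

```python
# Hypothetical mapping: a few Cyrillic letters to Latin equivalents.
rules = {"д": "d", "а": "a", "н": "n", "е": "e", "т": "t"}

def transliterate(source_text, rules):
    """Map each character through rules, keeping unmapped ones as-is."""
    return "".join(rules.get(char, char) for char in source_text)

result = transliterate("да, нет", rules)
print(result)  # da, net
```
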

瞳孔里扚悲伤 2024-08-28 09:12:19

If you use the translate method of Unicode objects, as I recommended in my answer to another question of yours, everything is done for you automatically, exactly as you want: each Unicode character c whose code point (ord(c)) is not in the transliteration dictionary is passed unchanged from input to output. Why reinvent the wheel?
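
A minimal sketch of this suggestion, assuming Python 3, where every str is a Unicode string: str.translate accepts a table keyed by code points, and str.maketrans builds such a table from a dict with single-character keys. The mapping and sample text are made up for illustration.

```python
# Hypothetical character-keyed mapping
rules = {"д": "d", "а": "a", "н": "n", "е": "e", "т": "t"}

# str.maketrans converts the single-character keys to code points (ord values).
table = str.maketrans(rules)

# Characters missing from the table (comma, space, "!") pass through unchanged.
result = "да, нет!".translate(table)
print(result)  # da, net!
```

Values in the table may be longer than one character, and mapping a key to None deletes that character from the output.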

谁与争疯 2024-08-28 09:12:19

I think you want something like this:

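tokenMapping = {"&&": "and"}
source_tokens = ["a", "&&", "b"]  # hypothetical stand-in for tokens read from the source file

for token in source_tokens:
    # Use the mapping when the token has one, otherwise a placeholder
    translatedToken = tokenMapping[token] if token in tokenMapping else "transliteration unknown"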

If there's a translation in the dictionary (e.g. "&&" -> "and"), it will use that. Otherwise the token is translated to "transliteration unknown".

Hope that helped.

EDIT: As LeafStorm suggested, a dictionary's get method can be used to simplify the above code. The line inside the loop would become

    translatedToken = tokenMapping.get(token, "transliteration unknown")
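
To make the get-based version concrete, here is a small sketch. The token list and the extra "||" entry are hypothetical stand-ins for whatever the source file is split into. It also shows a variation that keeps an unmapped token as-is rather than substituting a placeholder string, which seems closer to what the question asks for.

```python
tokenMapping = {"&&": "and", "||": "or"}
tokens = ["a", "&&", "b", "||", "c"]  # hypothetical tokens from the source

# Placeholder for unmapped tokens, as in the answer above
flagged = [tokenMapping.get(token, "transliteration unknown") for token in tokens]

# Variation: pass unmapped tokens through unchanged
passed_through = [tokenMapping.get(token, token) for token in tokens]

print(" ".join(passed_through))  # a and b or c
```
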

老旧海报 2024-08-28 09:12:19

dictx = {}
for itm in my_source :
    dictx[itm] = dictx.get(itm, 0) + 1

I didn't completely understand the details of your question, but here's the simplest example I could think of that illustrates the pattern I think you are after.

The 'get' method, I believe, is what you want. It lets you retrieve the value for a key from a dictionary, and if the key is not there, you get a default value that you choose. In other words: "I want dictx[itm] (the value assigned to the key 'itm'), but if 'itm' is not in the dictionary, give me the default value instead."

This snippet loops through your source document ('my_source') and counts the frequency of the various items in it, adding to the counts stored under the keys already in your dictionary. When it reaches an item for which no key exists, no exception is thrown: get returns the default 0, so the key is added with a count of 1.
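
The counting pattern can be tied back to the original question. The sketch below assumes you want to know which characters lack a mapping, and tallies them; collections.Counter from the standard library does the same job as the get(itm, 0) + 1 idiom. The `rules` mapping and the sample text are invented for illustration.

```python
from collections import Counter

rules = {"a": "α", "b": "β"}   # hypothetical mapping
my_source = "ab, ba!!"

# Manual tally with dict.get, as in the snippet above
unmapped = {}
for itm in my_source:
    if itm not in rules:
        unmapped[itm] = unmapped.get(itm, 0) + 1

# Equivalent tally using Counter
unmapped_counter = Counter(ch for ch in my_source if ch not in rules)

print(unmapped)  # {',': 1, ' ': 1, '!': 2}
```
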

心碎的声音 2024-08-28 09:12:19

This seems pretty straightforward. If your dictionary maps chars to chars, then you would do something like this:

outstr = ''
for ch in instr:
    if ch in mydict:
        outstr += mydict[ch]
    else:
        outstr += ch

Here, instr is your input string and mydict contains your mapping of chars to chars.
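
Since the question mentions an input file and a target file, here is a sketch of the same character-by-character idea applied to files. The function name, the mapping, and the file names are hypothetical, and temporary files stand in for the real ones. It uses join with dict.get instead of repeated +=, which produces the same output.

```python
import os
import tempfile

mydict = {"a": "α", "b": "β"}  # hypothetical char-to-char mapping

def transliterate_file(src_path, dst_path, mydict):
    """Read src_path, map each character, and write the result to dst_path."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            # Unmapped characters (comma, space, newline) are written as-is
            dst.write("".join(mydict.get(ch, ch) for ch in line))

# Demo with temporary files standing in for the real input and target files
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "input.txt")
    dst_path = os.path.join(tmp, "output.txt")
    with open(src_path, "w", encoding="utf-8") as f:
        f.write("ab, ba\n")
    transliterate_file(src_path, dst_path, mydict)
    with open(dst_path, encoding="utf-8") as f:
        output = f.read()

print(output)
```
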

If you want to check parts of words, I would recommend using two dictionaries: one that contains the characters that are contained in any word, and one that contains the words. You could use it like this:

outstr = ''
word = ''
for ch in instr:
    if ch in chardict:
        word += ch
    else:
        if word:
            # Translate the finished word, or keep it if it has no mapping
            outstr += worddict.get(word, word)
            word = ''
        outstr += ch
# Flush the last word if the input ended in the middle of one
if word:
    outstr += worddict.get(word, word)

chardict might contain all of the alphabet for instance. Of course, you might want to do some parts a little bit differently (like use something other than chardict to check if a char is to be considered part of a valid word - perhaps something with a binary search), but hopefully you get the idea.
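
As one example of using something other than chardict, the word-level pass can be sketched with the standard re module: re.sub finds each run of word characters and a callback looks the word up. This is a swapped-in technique, not the answer's own code, and `worddict`, the function name, and the sample text are made up for illustration.

```python
import re

worddict = {"hello": "hola", "world": "mundo"}  # hypothetical word mapping

def translate_words(instr, worddict):
    """Replace each word found in worddict; leave everything else untouched."""
    # \w+ matches a run of letters, digits, or underscores
    return re.sub(r"\w+", lambda m: worddict.get(m.group(0), m.group(0)), instr)

result = translate_words("hello, big world!", worddict)
print(result)  # hola, big mundo!
```
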
