Python - Check whether a character is in a dictionary, and handle it if it is not
I am transliterating text from a source language (input file) to a target language (target file), so I look up equivalent mappings in a dictionary in my source code. Certain characters in the source text, such as the comma (,) and other special symbols, have no equivalent mapping. How do I check whether a character is in the dictionary, and how do I handle the special symbols that have no mapping so that they are still printed to the target file? Thank you :).
Comments (5)
My recommendation, given that `rules` is a mapping of the characters to their transliterated equivalents:
A dict's `get` method returns either the value for a key or a default value if the key does not exist. Normally the default value is `None`, but in this case we give the character itself as the default (the second argument), so that if the key is not found, `get` simply returns the character unchanged.
A more compact way to write this using generator expressions would be:
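As a minimal sketch of both forms, assuming a small hypothetical `rules` mapping (the sample entries and the function names `transliterate_loop` and `transliterate` are illustrative, not taken from the answer):

```python
# Hypothetical sample mapping; a real `rules` dict would hold the full table.
rules = {"a": "α", "b": "β", "g": "γ"}


def transliterate_loop(text, rules):
    """Explicit loop: look up each character, falling back to itself."""
    out = []
    for ch in text:
        # The second argument to get() is the default returned when
        # `ch` is not a key, so unmapped characters pass through.
        out.append(rules.get(ch, ch))
    return "".join(out)


def transliterate(text, rules):
    """The same logic written as a generator expression."""
    return "".join(rules.get(ch, ch) for ch in text)


print(transliterate("a, b!", rules))
```

The comma, space, and exclamation mark have no entry in `rules`, so they reach the output unchanged.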
If you use the `translate` method of Unicode objects, as I recommended in answer to another question of yours, everything is done automatically for you exactly as you desire: each Unicode character `c` whose code point (`ord(c)`) is not in the transliteration dictionary is simply passed unchanged from input to output, just as you want. Why reinvent the wheel?
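A short sketch of this approach, assuming Python 3 (where every `str` is a Unicode string) and a hypothetical sample mapping named `rules`:

```python
# Hypothetical sample mapping from single characters to replacements.
rules = {"a": "α", "b": "β", "g": "γ"}

# str.translate expects the table keys to be code points (integers),
# not one-character strings, so convert each key with ord().
table = {ord(src): dst for src, dst in rules.items()}

# Characters whose code point is missing from the table are copied as-is.
result = "a, b; g!".translate(table)
print(result)
```

`str.maketrans(rules)` builds an equivalent table from a dict with single-character keys, which avoids the explicit `ord()` step.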
I think you want something like this:
If there's a translation in the dictionary (e.g. "&&" -> "and"), it will use that. Otherwise it will translate to "transliteration unknown".
Hope that helps.
EDIT: As LeafStorm suggested, a dictionary's `get` function can be used to simplify the above code. The code line in the loop would become
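One way the pattern described here could look, as a sketch only: the names `translations`, `UNKNOWN`, `translate_tokens`, and `translate_tokens_get` are hypothetical, and the input is assumed to be already split into tokens.

```python
# Hypothetical mapping of source tokens to their translations.
translations = {"&&": "and", "||": "or"}
UNKNOWN = "transliteration unknown"


def translate_tokens(tokens, translations):
    """Explicit membership test with `in`."""
    result = []
    for token in tokens:
        if token in translations:
            result.append(translations[token])
        else:
            result.append(UNKNOWN)
    return result


def translate_tokens_get(tokens, translations):
    """Same behavior; dict.get supplies the fallback in one line."""
    result = []
    for token in tokens:
        result.append(translations.get(token, UNKNOWN))
    return result


print(translate_tokens_get(["x", "&&", "y"], translations))
```

Passing the token itself as the default, `translations.get(token, token)`, would instead keep unmapped symbols such as commas in the output.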
I didn't completely understand the details of your question, but here's the simplest example I could think of that illustrates the pattern I think you are after.
The `get` method, I believe, is what you want. It lets you retrieve the value for a key from a dictionary, and if the key is not there, it returns a default value that you choose instead. In other words: "I want dictx[itm] (the value assigned to the key 'itm'), but if 'itm' is not in the dictionary, give me a default value of 0 so the key can be created from it."
This snippet will loop through your source document ('my_source') and count the frequency of the various items in it, adding those counts to the values of the keys already in your dictionary. When it reaches an item for which no key exists, no exception is thrown; a key is added, starting from a default value of 0.
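A minimal sketch of the counting pattern described above. The names `my_source` and `dictx` follow the answer's wording; the sample string is a hypothetical stand-in for the source document.

```python
# Hypothetical stand-in for the source document: any iterable of items.
my_source = "a, b, a!"

dictx = {}
for itm in my_source:
    # get() returns 0 for an unseen item instead of raising KeyError.
    # The assignment is what actually adds the new key to the dict.
    dictx[itm] = dictx.get(itm, 0) + 1

print(dictx)
```

Note that `get` on its own never modifies the dictionary; the key only appears because the result is assigned back to `dictx[itm]`.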
This seems pretty straightforward. If your dictionary is char to char, then you would do something like
Here, `instr` is your input string and `mydict` contains your mapping of chars to chars.
If you want to check parts of words, I would recommend using two dictionaries: one that contains the characters that can appear in any word, and one that contains the words. You could use it like this:
`chardict` might contain the whole alphabet, for instance. Of course, you might want to do some parts a little differently (for example, use something other than `chardict` to check whether a char counts as part of a valid word, perhaps something with a binary search), but hopefully you get the idea.
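Both ideas can be sketched roughly as follows. The names `instr`, `mydict`, and `chardict` come from the answer; `worddict`, the function names, the sample data, and the choice to pass unknown words through unchanged are assumptions made for illustration.

```python
import string


def transliterate_chars(instr, mydict):
    """Char-to-char case: map known characters, keep the rest as-is."""
    out = []
    for ch in instr:
        if ch in mydict:
            out.append(mydict[ch])
        else:
            out.append(ch)  # e.g. commas and other special symbols
    return "".join(out)


def transliterate_words(instr, chardict, worddict):
    """Word case: collect runs of word characters, then map each word."""
    out = []
    word = ""
    for ch in instr:
        if ch in chardict:
            word += ch  # still inside a word
        else:
            if word:
                # Assumption: words with no entry are passed through.
                out.append(worddict.get(word, word))
                word = ""
            out.append(ch)  # separator or special symbol
    if word:
        out.append(worddict.get(word, word))
    return "".join(out)


# Hypothetical sample data.
chardict = dict.fromkeys(string.ascii_lowercase)
worddict = {"hello": "hola", "world": "mundo"}
print(transliterate_words("hello, big world!", chardict, worddict))
```

Since only membership is tested on `chardict`, a `set` of characters would work equally well there.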