Creating a dictionary from a list of special characters
I'm working on this small script: basically, it maps each character of a string (which contains special characters) to its index to create a dictionary.
#!/usr/bin/env python
#-*- coding: latin-1 -*-
ln1 = '?0>9<8~7|65"4:3}2{1+_)'
ln2 = "(*&^%$£@!/`'\][=-#¢"
refStr = ln2+ln1
keyDict = {}
for i in range(0,len(refStr)):
    keyDict[refStr[i]] = i
print "-" * 32
print "Originl: ",refStr
print "KeyDict: ", keyDict
# added just to test a few special characters
tsChr = ['£','%','\\','¢']
for k in tsChr:
    if k in keyDict:
        print k, "\t", keyDict[k]
    else: print k, "\t", "not in the dic."
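It returns the result like this:
Originl: (*&^%$£@!/`'\][=-#¢?0>9<8~7|65"4:3}2{1+_)
KeyDict: {'!': 9, '\xa3': 7, '\xa2': 20, '%': 4, '$': 5, "'": 12, '&': 2, ')': 42, '(': 0, '+': 40, '*': 1, '-': 17, '/': 10, '1': 39, '0': 22, '3': 35, '2': 37, '5': 31, '4': 33, '7': 28, '6': 30, '9': 24, '8': 26, ':': 34, '=': 16, '<': 25, '?': 21, '>': 23, '@': 8, '\xc2': 19, '#': 18, '"': 32, '[': 15, ']': 14, '\\': 13, '_': 41, '^': 3, '`': 11, '{': 38, '}': 36, '|': 29, '~': 27}
which is all good, except that the characters £, ¢ and \ are showing up as \xa3, \xa2 and \\ respectively. Does anyone know why printing ln1/ln2 is just fine but printing the dictionary is not? How can I fix this? Any help greatly appreciated. Cheers!!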
Update 1
I've added extra special characters (# and ¢), and this is what I get following @Duncan's suggestion:
! 9
? 7
? 20
% 4
$ 5
....
....
8 26
: 34
= 16
< 25
? 21
> 23
@ 8
? 19
....
....
Notice the 7th, 19th and 20th elements, which are not printing correctly at all. The 21st element is the actual ? character. Cheers!!
Update 2
Just added this loop to my original post to actually test my purpose:
tsChr = ['£','%','\\','¢']
for k in tsChr:
    if k in keyDict:
        print k, "\t", keyDict[k]
    else: print k, "\t", "not in the dic."
and this is what I get as the result:
£ not in the dic.
% 4
\ 13
¢ not in the dic.
While running the script, it thinks that £ and ¢ are not actually in the dictionary, and that's my problem. Does anyone know how to fix that, or what I am doing wrong and where?
Eventually, I'll be checking whether the characters from a file (or a line of text) exist in the dictionary, and there is a chance of having characters like é or £ and so on in the text. Cheers!!
Comments (2)
When you print a dictionary or list that contains strings, Python displays the repr() of the strings. If you print repr(ln2) you'll see that nothing has changed: your dictionary keys are just the latin-1 encodings of '£' etc. If you print each string itself rather than the whole dictionary, the characters will display as you expect.
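The difference between printing a container and printing a string can be reproduced with a small sketch. It assumes Python 3 (the question's script is Python 2), so a bytes object stands in for the Python 2 byte string; the names key and key_dict are illustrative and do not come from the thread.

```python
# Python 3 sketch: a bytes object stands in for a Python 2 byte string.
key = "£".encode("latin-1")      # the single latin-1 byte for the pound sign
key_dict = {key: 7}

# Printing the container shows the repr() of each key, escapes included.
print(key_dict)                  # {b'\xa3': 7}

# Printing each key on its own, decoded with the matching codec,
# shows the character itself.
for k, v in key_dict.items():
    print(k.decode("latin-1"), v)
```

The stored key is identical in both cases; only the way it is displayed differs.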
In my humble opinion it would be useful to learn about Unicode in general and its use in Python.
If you are not interested in why people had to mess things up so that you have to deal with a '\xa3' instead of just a plain £, then Duncan's answer above is perfect and tells you everything you want to know.
Update (regarding your Update #2)
Please make sure your file is actually saved with latin-1 encoding and not utf-8, as it is now, and your test will pass (or just change #-*- coding: latin-1 -*- to #-*- coding: utf-8 -*-).
This is something you could easily work out after reading (and understanding) the contents of my link above: your file is saved as utf-8, which means that 2 bytes are used for the char £, but since you tell the Python interpreter that the encoding is latin-1, it will use each of the 2 utf-8 bytes of £ as a separate key.
In fact I can count 19 chars in ln2, but if you issue len(ln2) it will return 21.
When you test for '£' in keyDict.keys() you are looking for a 2-char string, while each of the 2 chars got its own key in the dictionary; that's why it won't find it.
You can also test len(keyDict) and find it's longer than what you expect.
I guess this explains everything. Please understand that the whole story is not easy to explain in a single webpage, but the link above is, in my humble opinion, a nice starting point, mixing some history and some coding examples.
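The byte counts described above can be checked with a short sketch. It assumes Python 3, where the mismatch has to be simulated: the strings are encoded to UTF-8 and then indexed one byte at a time, which is what the Python 2 script effectively does with its byte strings. The names on_disk, key_dict and pound are illustrative only.

```python
# Python 3 sketch of the mismatch: UTF-8 bytes indexed one byte at a time.
ln1 = '?0>9<8~7|65"4:3}2{1+_)'
ln2 = "(*&^%$£@!/`'\\][=-#¢"

# 19 characters, but 21 bytes once saved as UTF-8 (two bytes each for £ and ¢).
print(len(ln2), len(ln2.encode("utf-8")))             # 19 21

# Build the dictionary the way the script does, one byte per key.
on_disk = (ln2 + ln1).encode("utf-8")
key_dict = {on_disk[i:i + 1]: i for i in range(len(on_disk))}

# The two-byte sequence for the pound sign is never a key on its own...
pound = "£".encode("utf-8")                            # b'\xc2\xa3'
print(pound in key_dict)                               # False

# ...but each of its bytes is, matching the keys in the question's output.
print(key_dict[b"\xc2"], key_dict[b"\xa3"], key_dict[b"\xa2"])   # 19 7 20

# One key more than the number of distinct characters.
print(len(ln2 + ln1), len(key_dict))                   # 41 42
```

The b'\xc2' key maps to 19 rather than 6 because the lead byte of ¢ overwrites the entry created by the lead byte of £.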
Cheers
P.S.: I'm using this code, saving it as UTF-8 and it works flawlessly:
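The code that followed this P.S. is not part of this copy of the thread. As a rough stand-in, and not the answerer's actual code, here is a sketch that assumes Python 3, where source files default to UTF-8 and string literals are Unicode text. The lookup helper and the sample line are hypothetical, added to show the file-checking use case the question mentions.

```python
# -*- coding: utf-8 -*-
# Python 3 sketch: strings are Unicode text, so each character is one key.
ln1 = '?0>9<8~7|65"4:3}2{1+_)'
ln2 = "(*&^%$£@!/`'\\][=-#¢"
ref_str = ln2 + ln1

key_dict = {ch: i for i, ch in enumerate(ref_str)}


def lookup(text):
    """Return (character, index or None) for each character of text."""
    return [(ch, key_dict.get(ch)) for ch in text]


# Bytes as they might arrive from a UTF-8 file: decode before looking up.
line = b"\xc2\xa3%\\\xc2\xa2\xc3\xa9".decode("utf-8")
for ch, idx in lookup(line):
    print(ch, "\t", idx if idx is not None else "not in the dict")
```

Here £ and ¢ are found, while é is reported as missing because it is not in ref_str. The indices after £ are one lower than in the question's output, since £ now occupies a single position.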