Hashing non-ASCII Python strings

Posted on 2024-12-11 10:30:18

I'm trying to extract a string from a file using Python's re module and then compute its MD5 hash, with something like:

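    # MD5er.py
    from md5 import md5  # Python 2 only; hashlib.md5 is the modern equivalent

    salt = extract_salt(file_foo)  # pulls the salt out of the file with re (not shown)
    print 'salt: %s' % salt
    print 'hash: %s' % md5(salt).hexdigest()

$ python MD5er.py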

    salt: \0001\072\206\277\354\107\134\061\361\076\150\047\010\124\200\315\100
    hash: ce24166858853dfb12a86d7d602b0638

However, doing the same thing in IPython:

    In [40]: salt = '\0001\072\206\277\354\107\134\061\361\076\150\047\010\124\200\315\100'

    In [41]: salt
    Out[41]: "\x001:\x86\xbf\xecG\\1\xf1>h'\x08T\x80\xcd@"

    In [42]: print salt
    1:���G\1�>hT��@

    In [43]: from md5 import md5

    In [44]: md5(salt).hexdigest()
    Out[44]: 'ebae47a953591f7448ff7079837fb534'

Any clues why the MD5 is different in the two scenarios? And why, in IPython, does typing the variable name show the value in a different format from the original string, while print shows a third format?

Hint:

    In [53]: import sys
    In [54]: sys.getdefaultencoding()
    Out[54]: 'ascii'

Answer from 阳光的暖冬, posted 2024-12-18 10:30:18:

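The string in the first case is exactly what you saw printed:

    >>> from md5 import md5
    >>> salt = '\\0001\\072\\206\\277\\354\\107\\134\\061\\361\\076\\150\\047\\010\\124\\200\\315\\100'
    >>> md5(salt).hexdigest()
    'ce24166858853dfb12a86d7d602b0638'

Notice how I've escaped the backslashes to keep the digits from being interpreted as octal byte values. So the script hashes a 69-character text made of literal backslashes and digits, while the literal typed into IPython is turned into 18 raw bytes before it is hashed.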
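To make the difference concrete, here is a minimal sketch written for Python 3 with `hashlib` (an assumption; the code in the question is Python 2 and uses the old `md5` module). The names `escaped_text` and `interpreted` are illustrative only. It shows that the text read from the file and the literal typed at the prompt are byte sequences of different lengths, so their digests cannot match:

```python
import hashlib

# What the script most likely read from the file: literal backslashes and digits.
# The rb prefix keeps every backslash as an ordinary character.
escaped_text = rb'\0001\072\206\277\354\107\134\061\361\076\150\047\010\124\200\315\100'

# What the interactive prompt built: each \ooo escape collapses into a single byte.
interpreted = b'\0001\072\206\277\354\107\134\061\361\076\150\047\010\124\200\315\100'

print(len(escaped_text))   # 69
print(len(interpreted))    # 18
print(interpreted)         # repr-style display, like typing the name at the prompt

# Two different inputs, hence two different digests.
print(hashlib.md5(escaped_text).hexdigest())
print(hashlib.md5(interpreted).hexdigest())
```

This also accounts for the three displays in the question: the source literal is the escaped spelling, typing the variable name shows its `repr()` with `\x` escapes, and `print` in Python 2 writes the raw bytes, which the terminal renders as replacement characters where they are not valid text.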

Edit:

Assuming you want to create a byte string from the octal values in this list:

    data = ['\\0001', '\\072', '\\206', '\\277', '\\354', '\\107', '\\134',
            '\\061', '\\361', '\\076', '\\150', '\\047', '\\010', '\\124',
            '\\200', '\\315', '\\100']

You can convert each value to an integer and then join the resulting characters, but the result differs from what you got in IPython because the first value has 4 digits instead of 3. Should it be treated as '\0' followed by an ASCII '1', or as '\1'? (Python's own escape rule reads at most three octal digits, which is why IPython showed '\x00' followed by '1'.) The following code does the latter, treating it as '\1':

    salt = ''.join(chr(int(d[1:], 8)) for d in data)
    print repr(salt)
    print md5(salt).hexdigest()

Output:

    "\x01:\x86\xbf\xecG\\1\xf1>h'\x08T\x80\xcd@"
    d2092426d1bd5bec1579c8b7ed9c73c2
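If the goal is instead to reproduce the bytes IPython built, the escaped text can be decoded with the same three-digit rule that Python applies to literals. The following is a sketch under assumptions: it targets Python 3, and `unescape_octal` is a hypothetical helper name, not something from the question or from any library.

```python
import hashlib
import re

# Escaped text as it would come out of the file (literal backslashes and digits).
escaped = rb'\0001\072\206\277\354\107\134\061\361\076\150\047\010\124\200\315\100'

# A backslash followed by an octal value that fits in one byte:
# either three digits starting with 0-3, or one to two digits.
OCTAL_ESCAPE = re.compile(rb'\\([0-3][0-7]{2}|[0-7]{1,2})')

def unescape_octal(data):
    """Replace each octal escape in data with the single byte it denotes."""
    return OCTAL_ESCAPE.sub(
        lambda m: bytes([int(m.group(1).decode('ascii'), 8)]),
        data,
    )

salt = unescape_octal(escaped)
print(salt)                          # b"\x001:\x86\xbf\xecG\\1\xf1>h'\x08T\x80\xcd@"
print(hashlib.md5(salt).hexdigest())
```

Here `\0001` is read as `\000` followed by the character `1`, so the result has 18 bytes and matches the value shown at `Out[41]` in the question rather than the `\x01` variant above.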

Author: 糖果控