How to correctly handle encoding when passing data from a news feed to an IRC server
Code:
import socket, feedparser

feed = feedparser.parse("http://pwnmyi.com/feed")
latest = feed.entries[0]
art_name = latest.title

network = 'irc.rizon.net'
port = 6667
irc = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
irc.connect((network, port))
print irc.recv(4096)
irc.send('NICK PwnBot\r\n')
irc.send('USER PwnBot PwnBot PwnBot :PwnBot by Fike\r\n')
irc.send('JOIN #pwnmyi\r\n')

while True:
    data = irc.recv(4096)
    if data.find('PING') != -1:
        irc.send('PONG ' + data.split()[1] + '\r\n')
    if data.find('!latest') != -1:
        irc.send('PRIVMSG #pwnmyi :Latest Article: ' + art_name + '\r\n')
It connects etc., but then when I do !latest in the channel, it just quits with this:
irc.send('PRIVMSG #pwnmyi :Latest Article: ' + art_name + '\r\n')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 55: ordinal not in range(128)
Could you please help me debug this code? It used to work for me before.
Comments (4)
The IRC protocol does not define a particular character set encoding for messages; rather, it is an 8-bit protocol in which certain octets are used as control characters. (See RFC 1459, section 2.2.)
Apparently the popular mIRC client will decode UTF-8 sequences if it recognizes them as such, and this makes pretty decent sense for IRC's use, since ASCII code points are encoded with the same bytes as the ASCII characters, and non-ASCII code points are all encoded as bytes with values > 127.
In Python, that's spelled
unicode.encode(encoding='utf8')
called on the unicode title before it is sent.
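As a rough illustration of encoding the outgoing line as UTF-8, here is a minimal sketch. The helper name build_privmsg and the sample title are assumptions made for this example (the real title comes from latest.title), and the socket call is left as a comment so the snippet runs without a network connection.

```python
def build_privmsg(channel, text):
    """Hypothetical helper: return one IRC PRIVMSG line as UTF-8 bytes."""
    line = u'PRIVMSG ' + channel + u' :' + text + u'\r\n'
    # Encode once, at the point where text becomes bytes for the socket.
    return line.encode('utf-8')

# Stand-in for latest.title; U+2013 is the en dash named in the traceback.
title = u'Jailbreak \u2013 News'
payload = build_privmsg(u'#pwnmyi', u'Latest Article: ' + title)

# In the bot this would be: irc.send(payload)
```

The ASCII part of the message keeps its usual byte values, and only the en dash expands into a multi-byte sequence whose bytes are all above 127.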
You'll have to encode the string you post to the IRC server. Also, depending on what feedparser returns, you might want to decode it from a specific encoding first.
The encoding depends on what the feed contains.
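One way to cope with not knowing in advance whether a title arrives as bytes or as text is sketched below. The helper to_text, the sample bytes, and the choice of ISO-8859-1 as the source encoding are all assumptions for illustration; the correct source encoding has to come from the feed itself.

```python
def to_text(value, encoding='utf-8'):
    """Hypothetical helper: decode bytes to text, pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

# Made-up sample: a title delivered as ISO-8859-1 bytes.
raw_title = b'Caf\xe9 news'
title = to_text(raw_title, 'iso-8859-1')

# Once the title is text, encode it for the wire.
wire = title.encode('utf-8')
```

If the title is already text, to_text returns it untouched, so the same call can be used on either kind of input.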
latest.title has non-ASCII characters in it. You must either remove them, escape them, or translate them.
The cheap and easy way out is to use repr()
Or better
In the long run, you need to address non-ASCII characters in your input.
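The three options named above (remove, escape, translate) can be sketched with standard codec error handlers and a small replacement table. The sample title and the ASCII_EQUIVALENTS mapping are assumptions for illustration; a real mapping would need an entry for every character the feed might contain.

```python
# Stand-in for latest.title, containing the en dash from the traceback.
title = u'Jailbreak \u2013 News'

# Remove: silently drop anything that is not ASCII.
removed = title.encode('ascii', 'ignore')

# Escape: keep a visible \uXXXX marker in place of each non-ASCII character.
escaped = title.encode('ascii', 'backslashreplace')

# Translate: map known characters to ASCII look-alikes, then encode strictly.
ASCII_EQUIVALENTS = {0x2013: u'-'}  # hypothetical table: en dash -> hyphen
translated = title.translate(ASCII_EQUIVALENTS).encode('ascii')
```

Removing loses information and escaping is ugly in a chat channel, so translating (or sending UTF-8 outright) is usually the friendlier result for readers.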
Personally, I'd recommend converting all strings to UTF-8; you can encode/decode unicode strings using this:
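A minimal sketch of such an encode/decode round trip, not necessarily the snippet this answer had in mind. The function decode_incoming and its Latin-1 fallback are assumptions: IRC does not declare an encoding, so incoming bytes may not be valid UTF-8.

```python
# Text -> bytes for sending, bytes -> text after receiving.
title = u'Jailbreak \u2013 News'
wire = title.encode('utf-8')
restored = wire.decode('utf-8')

def decode_incoming(raw):
    """Hypothetical helper: decode received bytes, preferring UTF-8."""
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this never raises.
        return raw.decode('latin-1')
```

In the bot, decode_incoming could be applied to the result of irc.recv(4096) before searching it for '!latest'.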
This is an excellent website that explains Python's Unicode: http://farmdev.com/talks/unicode
The best 3 tips from it are: