TCL-script (Eggdrop) having issues with special characters
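I have installed Eggdrop on a new Debian server, but it keeps having issues with processing special characters.
Eggdrop is running with UTF-8. I have even manually forced the Tcl encoding to utf-8 in the script, and I have tried recompiling Eggdrop following the instructions at http://eggwiki.org/Utf-8.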
22:00 <@me> !tr fr I have prepared lots of cookies for the entire family.
22:00 <@bot> J'ai préparé beaucoup de biscuits pour toute la famille.
22:00 <@me> !tr ar The special characters are processed.
22:00 <@bot> êêÃE ÃEùçÃDìé çÃDãÃÂñÃA çÃDîçõé.
(Also see a previous question that was never resolved: Issues with TCL encoding on Eggdrop)
namespace eval gTranslator {
    # Factor this out into a helper
    proc getJson url {
        set tok [http::geturl $url]
        set res [json::json2dict [http::data $tok]]
        http::cleanup $tok
        return $res
    }

    # How to decode _decimal_ entities; WARNING: high magic factor within!
    proc decodeEntities str {
        set str [string map {\[ {\[} \] {\]} \$ {\$} \\ \\\\} $str]
        subst [regsub -all {&#(\d+);} $str {[format %c \1]}]
    }

    bind pub - !tr gTranslator::translate
    proc translate { nick uhost handle chan text } {
        package require http
        package require json
        set lngto [string tolower [lindex [split $text] 0]]
        set text [http::formatQuery q [join [lrange [split $text] 1 end]]]
        set dturl "http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&q=$text"
        set lng [dict get [getJson $dturl] responseData language]
        if { $lng == $lngto } {
            putserv "PRIVMSG $chan :\002Error\002 translating $lng to $lngto."
            return 0
        }
        set trurl "http://ajax.googleapis.com/ajax/services/language/translate?v=1.0&langpair=$lng%7c$lngto&$text"
        putlog $trurl
        set res [getJson $trurl]
        putlog $res
        #putserv "PRIVMSG $chan :Language detected: $lng"
        set translated [decodeEntities [dict get $res responseData translatedText]]
        putserv "PRIVMSG $chan :[encoding convertto utf-8 $translated]"
    }
}
That ugly mess you are seeing is UTF-8 interpreted as ISO 8859-1. It indicates that somewhere there's a misinterpretation of what characters mean, and can be caused by either getting wires crossed over a communication channel, or by an extra round of encoding being applied. Because there are rather a lot of moving parts involved (IRC client, IRC server, eggdrop, your script, Google translate) it is necessary to talk you through debugging.
Tcl and Google communicate correctly with each other (I've double-checked the code) so we can eliminate that possibility. The problem is therefore between your IRC client, the IRC server, and eggdrop; if they don't agree on what the interpretation of the bytes “on the wire” is, you get mangling.
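The kind of mangling described above can be reproduced in a plain tclsh, outside Eggdrop. The following is only a sketch: the sample Arabic word and the variable names are assumptions made for illustration, not data from the question. It shows both causes mentioned, a receiver reading UTF-8 bytes as ISO 8859-1 and an extra round of UTF-8 encoding.

```tcl
# Sample text (an assumption for illustration): a five-letter Arabic word.
set text "\u0645\u0631\u062d\u0628\u0627"

# Encode the abstract characters into the UTF-8 bytes that belong on the wire.
set bytes [encoding convertto utf-8 $text]

# A receiver that wrongly assumes ISO 8859-1 turns every byte into one character.
set mangled [encoding convertfrom iso8859-1 $bytes]

# An extra round of encoding: the bytes are treated as characters and encoded again.
set twice [encoding convertto utf-8 $bytes]

puts "characters in text:         [string length $text]"     ;# 5
puts "bytes after one encode:     [string length $bytes]"    ;# 10
puts "characters after misread:   [string length $mangled]"  ;# 10
puts "bytes after a second encode: [string length $twice]"   ;# 20
```

Each Arabic letter occupies two bytes in UTF-8, so a misreading client displays two Latin-looking characters per letter, which is the pattern visible in the bot's output.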
You can add (or remove) mangling in the script through the use of encoding convertto (and encoding convertfrom), but it is necessary to be clear what you are doing in order to get it right. In memory, Tcl represents strings as sequences of abstract Unicode characters; the way in which they are “written down” in memory is not your business (and in fact varies from time to time in a complex way that's almost always highly efficient in terms of run-time). If there is a general agreement that the IRC server's channel will be passing through UTF-8, your requirement then is to:
1. Get the bytes that eggdrop writes to the IRC server to be UTF-8.
2. Get the IRC client to interpret the bytes it receives from the IRC server as UTF-8.
Dealing with the first point, I can't remember if eggdrop handles encodings automatically for you or not. If it does, the final stage of your binding should pass $translated to putserv as it is, with no encoding convertto call. If it does not, keep the explicit encoding convertto utf-8 around $translated, as the script in the question already does.
Experiment. Use the right one.
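The two alternatives can be sketched as a pair of small procs. The names sendPlain and sendEncoded are hypothetical, and the putserv stub exists only so the sketch also runs in a plain tclsh; inside Eggdrop the real putserv is used instead.

```tcl
# Outside Eggdrop there is no putserv; this stub (an assumption, for testing
# only) records each outgoing line in the global list ::sent.
if {[llength [info commands putserv]] == 0} {
    proc putserv {line} { lappend ::sent $line }
}

# Variant 1: use this if Eggdrop already encodes outgoing text itself.
# The characters are passed through untouched.
proc sendPlain {chan translated} {
    putserv "PRIVMSG $chan :$translated"
}

# Variant 2: use this if Eggdrop sends strings byte-for-byte.
# The text is converted to UTF-8 bytes explicitly before sending.
proc sendEncoded {chan translated} {
    putserv "PRIVMSG $chan :[encoding convertto utf-8 $translated]"
}
```

Calling each variant once with the same translated text and checking which one displays correctly in the channel is the experiment suggested above. If variant 2 is used when variant 1 is the right one, the text gets the extra round of encoding.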
On the second point (the client), explore its settings and get it right. Be aware that there can be additional problems if the client is running in a situation where it cannot display all Unicode characters correctly (a common problem if running in a terminal). There's nothing that your eggdrop script can do to fix that.
It may be worth noting that, if the creator of the data encodes it in "encoding a" and it's read in "encoding b", then the text is already broken by the time you're looking at it. You can't just tell Tcl to encode it in another encoding and expect it to work.
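Consider it something like a two-step pipeline: the data is encoded in one encoding and then decoded in another before your script ever sees it. Since the original decode didn't match the encoding, you have a problem that a further encode cannot undo. This isn't a perfect analogy, but it might help.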
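A minimal sketch of that situation in plain Tcl follows, taking UTF-8 as "encoding a" and ISO 8859-1 as "encoding b". Both choices, the sample word, and the variable names are assumptions for illustration.

```tcl
# Sample text (an assumption for illustration): a four-letter Arabic word.
set original "\u0633\u0644\u0627\u0645"

# The creator encodes the text in "encoding a" (UTF-8 here)...
set wire [encoding convertto utf-8 $original]

# ...but the reader decodes it in "encoding b" (ISO 8859-1 here).
set misread [encoding convertfrom iso8859-1 $wire]

# Encoding the misread text again does not bring the original bytes back.
set reencoded [encoding convertto utf-8 $misread]
puts [expr {$reencoded eq $wire}]       ;# 0

# Undoing the wrong decode first, then decoding correctly, does recover it.
set repaired [encoding convertfrom utf-8 [encoding convertto iso8859-1 $misread]]
puts [expr {$repaired eq $original}]    ;# 1
```

The repair step works here only because ISO 8859-1 assigns a character to every byte value, so the wrong decode loses nothing. With other mismatched encodings, information can be lost for good.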