Adding UTF-8 support to Tcl

Posted 2024-11-01 14:39:36

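set botlisten(port) "3333"
set botlisten(password) "123456"
set botlisten(channel) "#chan"

# Accept connections on the port and hand each new connection to botlisten.
listen $botlisten(port) script botlisten

# Send all further input on this connection to botlisten2.
proc botlisten {idx} {
    control $idx botlisten2
}

# Expected input: "<password> <message>"; relay the message to the channel.
proc botlisten2 {idx args} {
    global botlisten
    set args [join $args]
    set botlisten(pass) [lindex [split $args] 0]
    set botlisten(message) [join [lrange [split $args] 1 end]]
    # string equal rather than string match: match would treat the supplied
    # password as a glob pattern, so "*" would be accepted.
    if {[string equal $botlisten(pass) $botlisten(password)]} {
        putquick "PRIVMSG $botlisten(channel) :$botlisten(message)"
    } else {
        putlog "Unauthorized person tried to connect to the bot"
    }
}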

Suppose the message contains characters such as ąčęėįšųūž: the bot then outputs strange characters. So, in my opinion, the solution is to add UTF-8 support.

水中月 2024-11-08 14:39:37

Tcl has had fully-integrated UTF-8 support for well over a decade (since Tcl 8.1, though nobody sane uses that version any more as there are monotonically better ones).

However, in general it is necessary to tell Tcl about what encoding is used on a particular communications channel with the outside world (with fconfigure's -encoding option). Tcl uses a default guess that is system dependent; on my system, it's actually UTF-8 but on others it is ISO 8859-1 or -15 or the appropriate Windows codepage. (Tcl's good at making default guesses BTW.) On sockets it's more awkward, since the encoding is something that's really a protocol-level decision (some protocols specify a particular encoding – SMTP does, IIRC – some switch encodings during the operation of the protocol – HTTP is a prime example of that – and some don't specify at all – IRC is the classic example of that). In some cases, the encoding command is necessary, so that scripts can take manual control over the conversion between byte sequences and characters. It's rather rare though.
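
As a rough illustration of the -encoding option mentioned above, the sketch below writes a string through a channel configured for UTF-8 and then reads the file back twice, once as raw bytes and once as UTF-8 text. The scratch file name utf8-demo.txt and the variable names are assumptions made for this example; they are not part of the original script.

```tcl
# Three of the characters from the question (ą č ę), written as escapes so the
# result does not depend on the encoding of this script file.
set text "\u0105\u010d\u0119"

# Hypothetical scratch file in the current directory.
set path [file join [pwd] utf8-demo.txt]

# Write the characters through a channel that encodes them as UTF-8.
set out [open $path w]
fconfigure $out -encoding utf-8
puts -nonewline $out $text
close $out

# Read the file back as raw bytes: no decoding, one element per byte.
set in [open $path r]
fconfigure $in -translation binary
set raw [read $in]
close $in

# Read the file back as text, telling Tcl the bytes are UTF-8.
set in [open $path r]
fconfigure $in -encoding utf-8
set decoded [read $in]
close $in

file delete $path

puts "characters written: [string length $text]"
puts "bytes on disk:      [string length $raw]"
puts "round trip ok:      [string equal $text $decoded]"
```

The same fconfigure call applies to a socket opened with the socket command. Whether it helps with Eggdrop depends on whether the script ever gets hold of the underlying channel, which is the difficulty the next paragraph describes.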

Of course, if the code being used is just taking Tcl's strings and pushing them blindly across the net using low-level networking (hellooo, eggdrop!) then there's not really all that much the general Tcl level can do. The workarounds in that case are either to build eggdrop to use a different encoding (as Zero's link from his comment says) or to use the encoding command to do the munging, like this:

Convert a normal string into its UTF-8 encoded (byte) form:

set encoded [encoding convertto utf-8 $normalString]

Convert encoded UTF-8 back into a normal string:

set normalString [encoding convertfrom utf-8 $encoded]
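
To make the two conversions above concrete, here is a small self-contained sketch that can be run in a plain tclsh, with no Eggdrop involved. It also decodes the same bytes with the wrong encoding, which is one plausible way the "strange characters" from the question can arise. That this is what happens in the asker's bot is an assumption. The variable names hex, restored and garbled are illustrative only.

```tcl
# Three of the characters from the question (ą č ę), written as escapes so the
# result does not depend on the encoding of this script file.
set normalString "\u0105\u010d\u0119"

# Characters -> UTF-8 bytes (each element of $encoded is one byte).
set encoded [encoding convertto utf-8 $normalString]
binary scan $encoded H* hex

# UTF-8 bytes -> characters.
set restored [encoding convertfrom utf-8 $encoded]

# The same bytes decoded as ISO 8859-1: every byte becomes its own character,
# so the text comes out longer and unreadable.
set garbled [encoding convertfrom iso8859-1 $encoded]

puts "characters:        [string length $normalString]"
puts "utf-8 bytes:       [string length $encoded] ($hex)"
puts "round trip ok:     [string equal $normalString $restored]"
puts "misdecoded length: [string length $garbled]"
```

In the bot script, a conversion like this would presumably be applied to $botlisten(message) before the putquick call. Which direction is needed (convertto or convertfrom) depends on how the particular Eggdrop build hands strings to Tcl, so it is worth trying both and checking the output in the channel.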