When is ¦ not equal to ¦?

Posted on 2024-08-31 11:29:18

Background. I'm working with netlists, and in general, people specify different hierarchies by using /. However, it's not illegal to actually use a / as a part of an instance name.

For example, X1/X2/X3/X4 might refer to instance X4 inside another instance named X1/X2/X3. Or it might refer to an instance named X3/X4 inside an instance named X2 inside an instance named X1. Got it?

There's really no "regular" character that cannot be used as a part of an instance name, so you resort to a non-printable one, or ... perhaps one outside of the standard 0..127 ASCII chars.

I thought I'd try (decimal) 166, because for me it shows up as the pipe: ¦.

So... I've got some C++ code which constructs the path name using ¦ as the hierarchical separator, so the path above looks like X1¦X2/X3¦X4.
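A minimal sketch of what such path construction might look like, assuming the separator is the single byte 0xA6 (decimal 166). The names `kSeparator`, `joinPath`, and `splitPath` are hypothetical and are not taken from the original code. Because `/` is never treated as a delimiter, an instance name such as X2/X3 survives a round trip intact.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical separator: the single byte 0xA6 (decimal 166).
const char kSeparator = '\xA6';

// Join instance names into one hierarchical path.
std::string joinPath(const std::vector<std::string>& names) {
    std::string path;
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (i != 0) path += kSeparator;
        path += names[i];
    }
    return path;
}

// Split a path back into instance names; '/' is left untouched.
std::vector<std::string> splitPath(const std::string& path) {
    std::vector<std::string> names;
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = path.find(kSeparator, start);
        if (pos == std::string::npos) {
            names.push_back(path.substr(start));
            break;
        }
        names.push_back(path.substr(start, pos - start));
        start = pos + 1;
    }
    return names;
}
```

Note that the path built this way holds raw single bytes, not UTF-8, which matters as soon as the string is handed to code that expects UTF-8.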

Now the GUI is written in Tcl/Tk, and to properly translate this into human readable terms I need to do something like the following:

set path [getPathFromC++] ;# returns X1¦X2/X3¦X4
set humanreadable [join [split $path ¦] /]

Basically, replace the ¦ with / (I could also accomplish this with [string map]).

Now, the problem is that the ¦ in the string I get from C++ doesn't match the ¦ I can create in Tcl. That is, this fails:

set path [getPathFromC++] ;# returns X1¦X2/X3¦X4
string match $path [format X1%cX2/X3%cX4 166 166]

Visually, the two strings look identical, but string match fails. I even tried using scan to see if I'd mixed up the bit values. But

set path [getPathFromC++] ;# returns X1¦X2/X3¦X4
set path2 [format X1%cX2/X3%cX4 166 166]
for {set i 0} {$i < [string length $path]} {incr i} {
   set p [string range $path $i $i]
   set p2 [string range $path2 $i $i]
   scan $p %c c    ;# scan takes the string first, then the format
   scan $p2 %c c2
   puts [list $p $c :::: $p2 $c2 equal? [string equal $p $p2]]
}

Produces output which looks like everything should match, except the [string equal] fails for the ¦ characters with a print line:

¦ 166 :::: ¦ 166 equal? 0

For what it's worth, the character in C++ is defined as:

const char SEPARATOR = 166;

Any ideas why a character outside the regular ASCII range would fail like this? When I changed the separator to (decimal) 28 (^\), things worked fine. I just don't want to get bit by a similar problem on a different platform. (I'm currently using Redhat Linux).

Comments (3)

美人迟暮 2024-09-07 11:29:18

Latin-1 has two different vertical bar characters:

  • 124 | VERTICAL LINE
  • 166 ¦ BROKEN BAR

Some older fonts mixed up the two glyphs.

年少掌心 2024-09-07 11:29:18

As I understand it, modern versions of TCL use UTF-8 internally for string representation. In UTF-8, a lone byte with decimal value 166 (0xA6) is only a continuation byte, half of the two-byte sequence 0xC2 0xA6 that encodes U+00A6, so it's no wonder that all hell is breaking loose. ;-)

My guess is that your C++ code is using a Latin-1 string (i.e., char *) and you're passing that to TCL which is interpreting it as a UTF-8 string. You need to convert your C++ string to UTF-8 before passing it to any TCL C functions. TCL provides some functions for this purpose.

You can read more about TCL and UTF-8.
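The conversion described above can be sketched in plain C++, assuming the C++ side really does produce ISO-8859-1 (Latin-1) bytes. The function name `latin1ToUtf8` is hypothetical. In a real Tcl extension, the Tcl C API's own encoding routines (for example `Tcl_ExternalToUtfDString`) would probably be the more natural choice; this hand-rolled version only illustrates what happens to byte 166.

```cpp
#include <string>

// Convert an ISO-8859-1 (Latin-1) byte string to UTF-8.
// Every Latin-1 byte maps to the Unicode code point of the same value,
// so bytes >= 0x80 become a two-byte UTF-8 sequence.
std::string latin1ToUtf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size() * 2);
    for (unsigned char byte : latin1) {
        if (byte < 0x80) {
            // Plain ASCII is identical in both encodings.
            utf8 += static_cast<char>(byte);
        } else {
            // Lead byte 110xxxxx carries the top two bits,
            // continuation byte 10xxxxxx carries the low six bits.
            utf8 += static_cast<char>(0xC0 | (byte >> 6));
            utf8 += static_cast<char>(0x80 | (byte & 0x3F));
        }
    }
    return utf8;
}
```

Applied to the single byte 0xA6, this yields the two bytes 0xC2 0xA6, which is the UTF-8 encoding of U+00A6 BROKEN BAR.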

ι不睡觉的鱼゛ 2024-09-07 11:29:18

On my system, the tcl script puts [format %c 166] outputs in UTF-8 ("\xC2\xA6"), while the C++ statement cout << "\xA6"; outputs Latin-1. Make sure encoding differences aren't throwing you off.
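One way to check for such a difference is to look at the raw bytes instead of the rendered glyphs. The helper below is a small sketch for that purpose; the name `toHex` is hypothetical.

```cpp
#include <cstdio>
#include <string>

// Render each byte of a string as two uppercase hex digits,
// separated by spaces, so that encodings can be compared by eye.
std::string toHex(const std::string& bytes) {
    std::string out;
    char buf[4];
    for (unsigned char byte : bytes) {
        if (!out.empty()) out += ' ';
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(byte));
        out += buf;
    }
    return out;
}
```

For the Latin-1 string "\xA6" this gives A6, while for the UTF-8 string "\xC2\xA6" it gives C2 A6, even though both may display as the same ¦ on screen.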
