Look at the problem: normally, in an interactive Haskell environment, non-Latin Unicode characters that are part of a result are printed escaped, even if the locale allows such characters (as opposed to direct output through putStrLn or putChar, which looks fine and readable). The examples show GHCi and Hugs98:
$ ghci
GHCi, version 7.0.1: http://www.haskell.org/ghc/ :? for help
Prelude> "hello: привет"
"hello: \1087\1088\1080\1074\1077\1090"
Prelude> 'Я'
'\1071'
Prelude> putStrLn "hello: привет"
hello: привет
Prelude> :q
Leaving GHCi.
$ hugs -98
__ __ __ __ ____ ___ _________________________________________
|| || || || || || ||__ Hugs 98: Based on the Haskell 98 standard
||___|| ||__|| ||__|| __|| Copyright (c) 1994-2005
||---|| ___|| World Wide Web: http://haskell.org/hugs
|| || Bugs: http://hackage.haskell.org/trac/hugs
|| || Version: September 2006 _________________________________________
Hugs mode: Restart with command line option +98 for Haskell 98 mode
Type :? for help
Hugs> "hello: привет"
"hello: \1087\1088\1080\1074\1077\1090"
Hugs> 'Я'
'\1071'
Hugs> putStrLn "hello: привет"
hello: привет
Hugs> :q
[Leaving Hugs]
$ locale
LANG=ru_RU.UTF-8
LC_CTYPE="ru_RU.UTF-8"
LC_NUMERIC="ru_RU.UTF-8"
LC_TIME="ru_RU.UTF-8"
LC_COLLATE="ru_RU.UTF-8"
LC_MONETARY="ru_RU.UTF-8"
LC_MESSAGES="ru_RU.UTF-8"
LC_PAPER="ru_RU.UTF-8"
LC_NAME="ru_RU.UTF-8"
LC_ADDRESS="ru_RU.UTF-8"
LC_TELEPHONE="ru_RU.UTF-8"
LC_MEASUREMENT="ru_RU.UTF-8"
LC_IDENTIFICATION="ru_RU.UTF-8"
LC_ALL=
$
We can guess that this is because print and show are used to format the result, and these functions do their best to format the data in a canonical, maximally portable way, so they prefer to escape the strange characters (perhaps this is even spelled out in a standard for Haskell):
$ ghci
GHCi, version 7.0.1: http://www.haskell.org/ghc/ :? for help
Prelude> show 'Я'
"'\\1071'"
Prelude> :q
Leaving GHCi.
$ hugs -98
Type :? for help
Hugs> show 'Я'
"'\\1071'"
Hugs> :q
[Leaving Hugs]
$
But still, it would be nice if we knew how to hack GHCi or Hugs to print these characters in a pretty, human-readable way, i.e. directly, unescaped. This matters when using the interactive Haskell environment for educational purposes, for a tutorial or demonstration of Haskell in front of a non-English audience to whom you want to show some Haskell on data in their own human language.
Actually, it's useful not only for educational purposes but for debugging as well, namely when you have functions that are defined on strings representing words of other languages, with non-ASCII characters. If the program is language-specific, only words of another language make sense as the data, and your functions are defined only on such words, then it's important to be able to see this data when debugging in GHCi.
To sum up my question: what ways are there to hack the existing interactive Haskell environments for friendlier printing of Unicode in the results? ("Friendlier" even means "simpler" in my case: I'd like print in GHCi or Hugs to show non-Latin characters the simple, direct way, as done by putChar and putStrLn, i.e. unescaped.)
(Perhaps, besides GHCi and Hugs98, I'll also have a look at existing Emacs modes for interacting with Haskell to see if they can present the results in the pretty, unescaped fashion.)
One way to hack this is to wrap GHCi in a shell wrapper that reads its stdout and unescapes Unicode characters. This is not the Haskell way, of course, but it does the job :)
For example, this is a wrapper ghci-esc that uses sh and python3 (3 is important here):
Usage of ghci-esc:
Note that not all unescaping above is done correctly, but this is a fast way to show Unicode output to your audience.
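The unescaping step of such a wrapper can be sketched in Haskell rather than sh and python3. This is only an illustration of the idea, not the ghci-esc script itself: the name unescapeDecimal is made up here, and like the wrapper it is naive, since it rewrites every decimal escape it sees without checking whether it sits inside a string or character literal.

```haskell
import Data.Char (chr, isDigit)

-- | Replace decimal escapes such as \1087 with the character they denote.
-- Hypothetical helper for illustration; it works on raw text and does not
-- check whether the escape is inside a literal.
unescapeDecimal :: String -> String
-- An escaped backslash is copied as a unit, so "\\1071" stays untouched.
unescapeDecimal ('\\' : '\\' : rest) = '\\' : '\\' : unescapeDecimal rest
unescapeDecimal ('\\' : rest)
  | (ds@(_ : _), rest') <- span isDigit rest
  , length ds <= 7                 -- keeps 'read' within the range of Int
  , let n = read ds :: Int
  , n <= 0x10FFFF                  -- largest valid code point
  = chr n : unescapeDecimal (dropSeparator rest')
  where
    -- 'show' puts \& after a numeric escape that is followed by a digit.
    dropSeparator ('\\' : '&' : xs) = xs
    dropSeparator xs = xs
unescapeDecimal (c : rest) = c : unescapeDecimal rest
unescapeDecimal [] = []
```

Wired up as main = interact unescapeDecimal, this could sit behind a pipe from ghci, though the buffering of an interactive session would need some care.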
There has been some progress with this issue, thanks to bravit (Vitaly Bragilevsky)!
Probably incorporated into GHC 7.6.1. (Is it?..)
How to make it print Cyrillic now:
Practical problem: coming up with an alternative nice Show
So, now there is a practical problem: what to use as a substitute for the standard Show (which escapes the wanted symbols against our wish)?
Using others' work: other pretty-printers
Above, Text.PrettyPrint.Leijen is suggested, probably because it is known not to escape such symbols in strings.
Our own Show based on Show -- attractive, but not practical
What about writing our own Show, say, ShowGhci, as was suggested in an answer here? Is it practical?..
To save the work of defining instances for an alternative Show class (like ShowGhci), one might be tempted to use the existing instances of Show by default and only redefine the instances for String and Char. But that won't work: if you use showGhci = show, then for any complex data containing strings, show is "hard-compiled" to call the old show to show the string. This situation asks for the ability to pass different dictionaries implementing the same class interface to functions which use this interface (show would pass it down to the sub-shows). Any GHC extensions for this?
Basing it on Show and wanting to redefine only the instances for Char and String is not very practical if you want it to be as "universal" (widely applicable) as Show.
Re-parsing show
A more practical (and short) solution is in another answer here: parse the output from show to detect chars and strings, and re-format them. (Although it seems a bit ugly semantically, the solution is short and safe in most cases, namely when quotes are not used for other purposes in the output of show; that must not be the case for standard stuff, because the idea of show is to produce more-or-less correct, parsable Haskell.)
Semantic types in your programs
And one more remark.
Actually, if we care about debugging in GHCi (and not simply demonstrating Haskell and wanting to have a pretty output), the need for showing non-ASCII letters must come from some inherent presence of these characters in your program (otherwise, for debugging, you could substitute them with Latin characters or not care much about being shown the codes). In other words, there is some MEANING in these characters or strings from the point of view of the problem domain. (For example, I've been recently engaged with grammatical analysis of Russian, and the Russian words as part of an example dictionary were "inherently" present in my program. Its work would make sense only with these specific words. So I needed to read them when debugging.)
But look, if the strings have some MEANING, then they are not plain strings any more; they are data of a meaningful type. The program would probably become even better and safer if you declared a special type for this kind of meaning.
And then, hooray!, you simply define your instance of Show for this type. And you are OK with debugging your program in GHCi.
As an example, in my program for grammatical analysis, I have done:
and
(the extra fromString here is because I might switch the internal representation from String to ByteString or whatever)
Apart from being able to show them nicely, I got safer because I wouldn't be able to mix different types of words when composing my code.
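The semantic-type idea can be sketched roughly as follows. This is not the code from the grammatical-analysis program; the types Lemma and WordForm, the bracketed output format and the helper hasEnding are all invented for illustration.

```haskell
import Data.List (isSuffixOf)
import Data.String (IsString (..))

-- Hypothetical domain types: both wrap a String, but cannot be mixed up.
newtype Lemma = Lemma String deriving (Eq, Ord)
newtype WordForm = WordForm String deriving (Eq, Ord)

-- Construction goes through fromString, so the internal representation
-- could later change without touching the call sites.
instance IsString Lemma where
  fromString = Lemma

instance IsString WordForm where
  fromString = WordForm

-- Hand-written Show instances that emit the text as it is, unescaped.
instance Show Lemma where
  showsPrec _ (Lemma s) = showString "<lemma " . showString s . showChar '>'

instance Show WordForm where
  showsPrec _ (WordForm s) = showString "<form " . showString s . showChar '>'

-- Accepts only a WordForm; passing a Lemma is a type error.
hasEnding :: String -> WordForm -> Bool
hasEnding ending (WordForm s) = ending `isSuffixOf` s
```

Because the instances are hand-written, values nested in lists or Maybe are also printed unescaped at the GHCi prompt. The price is that the output is no longer valid Haskell source, which may or may not matter for debugging.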
Things will change in the next version of GHCi, 7.6.1, as it supplies a new GHCi option called -interactive-print.
The following is copied from the GHC manual (and I wrote myShow and myPrint as follows):
And they work well:
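For readers who want something to experiment with, here is one way a myShow and myPrint pair could be written, following the re-parsing idea mentioned in this thread. It is a sketch under assumptions, not the code from this answer: it finds string and character literals in the output of show with reads and prints them again with only the necessary escapes.

```haskell
import Data.Char (isPrint)

-- | Like 'show', but printable characters inside string and character
-- literals are left unescaped. Sketch only: a quote that is not the start
-- of a literal (for example in a primed name) may confuse it.
myShow :: Show a => a -> String
myShow = go . show
  where
    go [] = []
    go s@('"' : _)
      | [(str, rest)] <- (reads s :: [(String, String)])
      = '"' : concatMap (escape '"') str ++ '"' : go rest
    go s@('\'' : _)
      | [(c, rest)] <- (reads s :: [(Char, String)])
      = '\'' : escape '\'' c ++ '\'' : go rest
    go (c : rest) = c : go rest

    -- Escape the enclosing quote, the backslash and non-printable
    -- characters; everything else is emitted as it is. The result is meant
    -- for reading by humans and is not guaranteed to parse back.
    escape quote c
      | c == quote = ['\\', c]
      | c == '\\'  = "\\\\"
      | isPrint c  = [c]
      | otherwise  = init (tail (show c))

-- | Drop-in replacement for 'print' at the GHCi prompt.
myPrint :: Show a => a -> IO ()
myPrint = putStrLn . myShow
```

With the module loaded in GHCi 7.6.1 or later, :set -interactive-print=myPrint should make the prompt use it for displaying results.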
Option 1 (bad):
Modify this line of code:
https://github.com/ghc/packages-base/blob/ba98712/GHC/Show.lhs#L356
And recompile ghc.
Option 2 (lots of work):
When GHCi type checks a parsed statement, it ends up in tcRnStmt, which relies on mkPlan (both in https://github.com/ghc/ghc/blob/master/compiler/typecheck/TcRnDriver.lhs). This attempts to type check several variants of the statement that was typed in, including:
Specifically:
All that might need to change here is printName (which binds to System.IO.print). If it were instead bound to something like printGhci, which was implemented like:
GHCi could then change what is printed by bringing different instances into context.
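To make the idea concrete, a class-based printGhci might look roughly like the sketch below. The class ShowGhci and all of its instances are assumptions made for illustration and exist neither in GHC nor in this answer. The sketch also shows the cost: every container type needs its own instance, because the existing Show instances cannot be reused for nested strings.

```haskell
import Data.List (intercalate)

-- Hypothetical alternative to Show that never escapes characters.
class ShowGhci a where
  showGhci :: a -> String
  -- Same trick as showList in the standard class: it lets the Char
  -- instance render a list of characters as a quoted string.
  showGhciList :: [a] -> String
  showGhciList xs = "[" ++ intercalate "," (map showGhci xs) ++ "]"

instance ShowGhci Char where
  showGhci c = ['\'', c, '\'']       -- no escaping at all, sketch only
  showGhciList s = "\"" ++ s ++ "\""

instance ShowGhci Int where
  showGhci = show

instance ShowGhci a => ShowGhci [a] where
  showGhci = showGhciList

instance ShowGhci a => ShowGhci (Maybe a) where
  showGhci Nothing = "Nothing"
  showGhci (Just x) = "Just " ++ showGhci x   -- precedence is ignored here

-- What the interactive prompt would call instead of print.
printGhci :: ShowGhci a => a -> IO ()
printGhci = putStrLn . showGhci
```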
You could switch to using the 'text' package for IO. E.g.
The package is part of the standard Haskell distribution, the Haskell Platform, and provides an efficient packed, immutable Unicode text type with IO operations. Many encodings are supported.
Using a .ghci file you could set -XOverloadedStrings to be on by default, and write a :def macro to introduce a :text command that shows a value via text only. That would work.
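A minimal sketch of what printing through the text package looks like in a source file; the names greeting and printGreeting are made up for the example. Note that it is the IO function that prints the characters directly: the Show instance of Text escapes exactly as the one for String does, so a Text value evaluated at the prompt is still shown escaped.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- With OverloadedStrings the literal is a Text value, not a String.
greeting :: T.Text
greeting = "hello: привет"

-- Writes the characters themselves, using the encoding of the handle.
printGreeting :: IO ()
printGreeting = TIO.putStrLn greeting
```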
Now I know about ghci's -interactive-print; it is a great feature. Many thanks for writing the question and answers! By the way, the existing pretty-printers I could find on the web have some corner cases, and the problem of writing a good Unicode show turned out to be more complicated than it seems.
Therefore, I decided to write a Haskell package, unicode-show, for this purpose, which (hopefully) prints corner-case strings and compound types well.
I hope this package is useful to people who searched for this Q&A :)
What would be ideal is a patch to ghci allowing the user to :set a function to use for displaying results other than show. No such feature currently exists. However, Don's suggestion for a :def macro (with or without the text package) isn't bad at all.
One possible good solution is pretty-simple, for example with cabal:
~/.ghci:
The pretty-simple library provides additional benefits when printing various types of data.