Implementing Read typeclass where parsing strings includes "$"
I've been playing with Haskell for about a month. For my first "real" Haskell project I'm writing a part-of-speech tagger. As part of this project I have a type called Tag that represents a part-of-speech tag, implemented as follows:
    data Tag = CC | CD | DT | EX | FW | IN | JJ | JJR | JJS ...
The above is a long list of standardized part-of-speech tags which I've intentionally truncated. However, in this standard set of tags there are two that end in a dollar sign ($): PRP$ and NNP$. Because I can't have data constructors with $ in their name, I've elected to rename them PRPS and NNPS.
This is all well and good, but I'd like to read tags from strings in a lexicon and convert them to my Tag type. Trying this fails:
    instance Read Tag where
        readsPrec _ input =
            (\inp -> [((NNPS), rest) | ("NNP$", rest) <- lex inp]) input
The Haskell lexer chokes on the $. Any ideas how to pull this off?
Implementing Show was fairly straightforward. It would be great if there were some similar strategy for Read.
    instance Show Tag where
        showsPrec _ NNPS = showString "NNP$"
        showsPrec _ PRPS = showString "PRP$"
        showsPrec _ tag  = shows tag
Comments (2)
You're abusing Read here.
Show and Read are meant to print and parse valid Haskell values, to enable debugging, etc. This doesn't always work perfectly (e.g. if you import Data.Map qualified and then call show on a Map value, the call to fromList isn't qualified), but it's a valid starting point.
If you want to print or parse your values to match some specific format, then use a pretty-printing library for the former and an actual parsing library (e.g. uu-parsinglib, polyparse, parsec, etc.) for the latter. They typically have much nicer support for parsing than ReadS (though ReadP in GHC isn't too bad).
Whilst you may argue that this isn't necessary because this is just a quick'n'dirty hack you're doing, quick'n'dirty hacks have a tendency to linger around... Do yourself a favour and do it right the first time: it means there's less to rewrite when you want to do it "properly" later on.
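As a rough illustration of the advice above, here is a minimal sketch that keeps the lexicon format out of Read and Show altogether, using ReadP from base as the parsing library. The names tagTable, tagP, parseTag and renderTag are hypothetical, and the tag list is a shortened stand-in for the full set (NNP and PRP are added here only to show that a prefix of another tag is handled).

```haskell
import Data.Maybe (fromMaybe, listToMaybe)
import Data.Tuple (swap)
import Text.ParserCombinators.ReadP (ReadP, choice, eof, readP_to_S, skipSpaces, string)

-- Shortened tag set (assumption); PRPS and NNPS stand for PRP$ and NNP$.
data Tag = CC | CD | DT | NNP | NNPS | PRP | PRPS
  deriving (Eq, Show)

-- Lexicon spelling of each tag (hypothetical table; extend for the full set).
tagTable :: [(String, Tag)]
tagTable =
  [ ("CC", CC), ("CD", CD), ("DT", DT)
  , ("NNP", NNP), ("NNP$", NNPS)
  , ("PRP", PRP), ("PRP$", PRPS)
  ]

-- One alternative per table entry; ReadP tries all alternatives.
tagP :: ReadP Tag
tagP = choice [ t <$ string s | (s, t) <- tagTable ]

-- Parse a whole string as exactly one tag, allowing surrounding whitespace.
parseTag :: String -> Maybe Tag
parseTag input =
  listToMaybe [ t | (t, "") <- readP_to_S (skipSpaces *> tagP <* skipSpaces <* eof) input ]

-- Render a tag in lexicon notation; falls back to the constructor name
-- if the table has no entry for it.
renderTag :: Tag -> String
renderTag t = fromMaybe (show t) (lookup t (map swap tagTable))

main :: IO ()
main = do
  print (parseTag "NNP$")   -- Just NNPS
  print (parseTag "PRP")    -- Just PRP
  print (parseTag "XYZ")    -- Nothing
  putStrLn (renderTag PRPS) -- PRP$
```

Because parsing and rendering share one table, the derived Show instance stays available for debugging while the lexicon notation lives in ordinary functions.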
Don't use the Haskell lexer then. The read functions in GHC are built on ReadP-style parser combinators, which are similar in spirit to Parsec; you can find an excellent introduction to Parsec in the Real World Haskell book.
Here's some code that seems to work,
just run it with
The code is pretty self-explanatory. The string x parser matches x, and if it succeeds (doesn't throw an exception), then y is returned. We use choice to select among all of these. It will backtrack appropriately, so if you add a CCC constructor, then CC partially matching "CCC" will fail later, and it will backtrack to CCC. Of course, if you don't need this, then use the <|> combinator.
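The listing this answer refers to is not preserved here. The following is a minimal sketch of the approach it describes, not the answerer's original code: a Read instance defined through readPrec, with string and choice doing the matching. The tagTable name and the shortened tag list are assumptions for illustration. The first line of main also shows why the lex-based attempt in the question never matches: lex splits "NNP$" into the identifier NNP and leaves "$" behind.

```haskell
import Text.ParserCombinators.ReadP (skipSpaces, string)
import Text.Read (Read (readPrec), choice, lift, readMaybe)

-- Shortened tag set (assumption); PRPS and NNPS stand for PRP$ and NNP$.
data Tag = CC | CD | DT | NNP | NNPS | PRP | PRPS
  deriving (Eq, Show)

-- Lexicon spelling of each tag (hypothetical table; extend for the full set).
tagTable :: [(String, Tag)]
tagTable =
  [ ("CC", CC), ("CD", CD), ("DT", DT)
  , ("NNP", NNP), ("NNP$", NNPS)
  , ("PRP", PRP), ("PRP$", PRPS)
  ]

instance Read Tag where
  -- Match the raw characters with ReadP's string instead of the Haskell
  -- lexer, so "$" is treated as an ordinary character.
  readPrec = choice [ t <$ lift (skipSpaces *> string s) | (s, t) <- tagTable ]

main :: IO ()
main = do
  print (lex "NNP$")                   -- [("NNP","$")]
  print (read "NNP$" :: Tag)           -- NNPS
  print (readMaybe "PRP" :: Maybe Tag) -- Just PRP
  print (readMaybe "XYZ" :: Maybe Tag) -- Nothing
```

Note that with the derived Show, read . show does not round-trip for NNPS and PRPS (show gives "NNPS", which the table does not contain), which is the kind of mismatch the other answer warns about.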