Haskell program to remove comments
I'm trying to write a Haskell program that takes a Java program (.java) and outputs it with all of its comments removed. The input does not have to be syntactically correct. I've set up the IO component to look like so:
    main = do
      javaFile <- getFileName
      text <- readFile javaFile
      displayProgram (AAAA)
      return ()
AAAA is the expression that takes the text and produces the new text with comments removed. Notice that these functions are required:
    getFileName :: IO [Char]
    displayProgram :: [String] -> IO ()
I know the algorithm is pretty straightforward:
- Search for // and remove that entire line of text.
- Search for /* and remove all of the following text until you reach */, and remove */ as well. This should of course take care of both block comments and doc comments.
- Output the remaining text.
However, Haskell is not one of my strongest languages. Any help would be greatly appreciated.
Your algorithm is wrong: your search patterns may occur inside strings and your code needs to take that into account. The simplest example is a quine with comments:
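A much smaller input than a quine already shows the problem. Below is a minimal sketch (naiveStrip and skipBlock are hypothetical names) that follows the question's steps with no knowledge of string literals; as a small refinement, // removes only the text from the marker to the end of the line rather than the whole line:

```haskell
-- Naive stripper following the question's steps; it knows nothing about
-- string literals. 'naiveStrip' and 'skipBlock' are hypothetical names.
naiveStrip :: String -> String
naiveStrip [] = []
-- "//": drop everything up to (but not including) the newline.
naiveStrip ('/':'/':rest) = naiveStrip (dropWhile (/= '\n') rest)
-- "/*": drop everything up to and including the next "*/".
naiveStrip ('/':'*':rest) = naiveStrip (skipBlock rest)
-- Any other character is copied unchanged.
naiveStrip (c:rest) = c : naiveStrip rest

-- Skip the body of a block comment, consuming the closing "*/".
skipBlock :: String -> String
skipBlock ('*':'/':rest) = rest
skipBlock (_:rest)       = skipBlock rest
skipBlock []             = []
```

For example, given the Java line `String url = "http://example.com";` the sketch treats the `//` inside the literal as a comment start and leaves only `String url = "http:` on that line.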
You can use a function like this:
This simply "loops" through the string recursively (it is tail recursive, so it behaves like a loop) and copies each character that is not inside a comment.
The following functions detect the end of a comment. They ignore every character except the closing delimiter, hence the underscore (wildcard) in the pattern match.
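A minimal sketch of the functions described above, assuming a top-level name removeComments (hypothetical; inComment and inMultiComment are the helper names the answer uses). It keeps the newline that ends a // comment so the output has the same number of lines as the input:

```haskell
-- Sketch of the recursive approach; 'removeComments' is a hypothetical name.

-- Copy every character that is not inside a comment.
removeComments :: String -> String
removeComments ('/':'/':rest) = inComment rest
removeComments ('/':'*':rest) = inMultiComment rest
removeComments (c:rest)       = c : removeComments rest
removeComments []             = []

-- Line comment: skip characters up to the newline, but keep the newline.
inComment :: String -> String
inComment ('\n':rest) = '\n' : removeComments rest
inComment (_:rest)    = inComment rest
inComment []          = []

-- Block or doc comment: skip up to and including the closing "*/".
inMultiComment :: String -> String
inMultiComment ('*':'/':rest) = removeComments rest
inMultiComment (_:rest)       = inMultiComment rest
inMultiComment []             = []
```

In the question's main, AAAA could then be `lines (removeComments text)`, since displayProgram expects a `[String]`. Strictly speaking, the `c : removeComments rest` step is guarded recursion rather than tail recursion; under lazy evaluation it is the idiomatic equivalent, because each character can be consumed as soon as it is produced.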
If you need more sophisticated parsing, however, I recommend giving the Parsec monadic parsing library a try.
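A rough sketch of the same task with Parsec, assuming the parsec package is installed (stripComments, javaSource, quoted and the other parser names are hypothetical). Because string and char literals are parsed as pieces of their own, comment markers inside them are kept:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- "// ..." up to (not including) the newline; produces no output.
lineComment :: Parser String
lineComment = "" <$ (try (string "//") *> many (noneOf "\n"))

-- "/* ... */", including doc comments; an unterminated one runs to EOF.
blockComment :: Parser String
blockComment = "" <$ (try (string "/*") *> manyTill anyChar end)
  where end = try (string "*/") <|> ("" <$ eof)

-- A literal delimited by q, copied verbatim; a backslash escapes the
-- next character, so an escaped quote does not end the literal.
quoted :: Char -> Parser String
quoted q = do
  _ <- char q
  body <- many (escaped <|> fmap pure (noneOf [q, '\\']))
  _ <- char q
  return ([q] ++ concat body ++ [q])
  where escaped = (\a b -> [a, b]) <$> char '\\' <*> anyChar

-- Any other single character is kept as-is.
other :: Parser String
other = pure <$> anyChar

javaSource :: Parser String
javaSource = concat <$> many piece <* eof
  where piece = choice [lineComment, blockComment, quoted '"', quoted '\'', other]

stripComments :: String -> Either ParseError String
stripComments = parse javaSource "<java source>"
```

An unterminated string or char literal makes `parse` return a `Left` error, while an unterminated block comment simply runs to the end of the input.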
EDIT: As user268396 pointed out, you should be aware that something that looks like a comment may be hiding inside a string. You might want to extend the above functions with an inString function that does not ignore the characters it encounters, but also does not switch to inComment or inMultiComment when it encounters their starting delimiters.
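A sketch of that extension (removeComments and inChar are hypothetical names; inString, inComment and inMultiComment follow the answer's naming). A double quote hands control to inString, which copies characters verbatim, including backslash escapes, until the closing quote. inChar does the same for character literals such as '"' so that they do not open a string; Java text blocks (""") are not handled:

```haskell
-- Recursive stripper that also copies string and char literals verbatim.
removeComments :: String -> String
removeComments ('/':'/':rest) = inComment rest
removeComments ('/':'*':rest) = inMultiComment rest
removeComments ('"':rest)     = '"' : inString rest
removeComments ('\'':rest)    = '\'' : inChar rest
removeComments (c:rest)       = c : removeComments rest
removeComments []             = []

-- Inside "...": copy every character; a backslash also copies the next
-- character, so an escaped quote does not end the string.
inString :: String -> String
inString ('\\':c:rest) = '\\' : c : inString rest
inString ('"':rest)    = '"' : removeComments rest
inString (c:rest)      = c : inString rest
inString []            = []

-- Inside '...': same idea, so '"' does not open a string.
inChar :: String -> String
inChar ('\\':c:rest) = '\\' : c : inChar rest
inChar ('\'':rest)   = '\'' : removeComments rest
inChar (c:rest)      = c : inChar rest
inChar []            = []

-- Line comment: skip to the newline, keeping the newline itself.
inComment :: String -> String
inComment ('\n':rest) = '\n' : removeComments rest
inComment (_:rest)    = inComment rest
inComment []          = []

-- Block or doc comment: skip up to and including "*/".
inMultiComment :: String -> String
inMultiComment ('*':'/':rest) = removeComments rest
inMultiComment (_:rest)       = inMultiComment rest
inMultiComment []             = []
```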
As an example of something similar, see how I strip out comments, etc. from Dot code; note that I'm using combinators I've defined (but not fully commented) here for use with the Text parser in PolyParse.
It doesn't consider comment markers inside strings, etc., but it does remove all comments of the form /* ... */ and // ...
Three implementation approaches are possible:
- For homework, I'd go with manual matching.
- For a robust implementation, I'd go with Text.Parsec.
- For a quick and dirty solution, I'd go with Text.Regex.