Haskell program to remove comments

Posted 2024-12-11 22:03:21

I'm trying to write a Haskell program that takes a Java program (.java) and outputs it with all of its comments removed. The input does not have to be syntactically correct. I've set up the IO component to look like so:

main =
  do
     javaFile <- getFileName
     text <- readFile javaFile
     displayProgram ( AAAA )
     return ()

AAAA is the expression that takes the text and produces the new text with comments removed. Note that these functions are required:

getFileName :: IO [Char]
displayProgram :: [String] -> IO ()

I know the algorithm is pretty straightforward:

Search for // and remove that entire line of text.
Search for /* and remove all of the following text until you reach */ and remove */ as well. This should of course take care of both block comments and doc comments.
Output the remaining text.

However, Haskell is not one of my strongest languages. Any help would be greatly appreciated.

梦里兽 2024-12-18 22:03:21

Your algorithm is wrong: your search patterns may occur inside strings and your code needs to take that into account. The simplest example is a quine with comments:

package quine;
public class Quine {
   /**
    * This is a quine.
    */
   public static void main(String[] args) {
     String s1 = "package quine;\npublic class Quine {\n  /**\n   * This is a quine.\n   */\npublic static void main(String[] args) {\nString s1 = \"";
     // further code elided.
   }
}
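
A smaller case than the quine shows the same failure. The sketch below is a literal reading of the question's two rules; naiveStrip, skipBlock and sample are hypothetical names introduced only for this illustration.

```haskell
-- Naive stripper: applies the question's two rules and knows nothing
-- about string literals.
naiveStrip :: String -> String
naiveStrip [] = []
naiveStrip ('/':'/':xs) = naiveStrip (dropWhile (/= '\n') xs)
naiveStrip ('/':'*':xs) = naiveStrip (skipBlock xs)
naiveStrip (x:xs) = x : naiveStrip xs

-- Drop everything up to and including the closing */.
skipBlock :: String -> String
skipBlock ('*':'/':xs) = xs
skipBlock (_:xs) = skipBlock xs
skipBlock [] = []

-- One line of Java whose string literal happens to contain "//".
sample :: String
sample = "String url = \"http://example.com\"; // homepage"

-- naiveStrip sample == "String url = \"http:"
```

The "//" inside the URL is taken for the start of a comment, so the literal is cut in half and the rest of the statement is lost.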

清风无影 2024-12-18 22:03:21

You can use a function like this:

stripComments :: String -> String
stripComments [] = []
stripComments ('/':'/':xs) = inComment xs 
stripComments ('/':'*':xs) = inMultiComment xs
stripComments (x:xs) = x : stripComments xs

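This simply walks through the string recursively and copies each character that is not within a comment. It is not tail recursive, since the recursive call sits under the (:) constructor, but laziness lets the output be produced incrementally, so it still behaves much like a loop.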

The following functions are used to detect the end of a comment. They ignore any characters except for the ending delimiters, thus the underscore in the pattern match.

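inComment :: String -> String
-- The newline ends the comment; keep it so the surrounding lines do not merge.
inComment ('\n':xs) = '\n' : stripComments xs
-- Any other character is still inside the comment, so stay in inComment.
inComment (_:xs) = inComment xs
inComment [] = []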

inMultiComment :: String -> String
inMultiComment ('*':'/':xs) = stripComments xs
inMultiComment (_:xs) = inMultiComment xs
inMultiComment [] = []

For more sophisticated parsing, however, I recommend giving the Parsec monadic parsing library a try.

EDIT: As user268396 pointed out, something that looks like a comment may be hiding inside a string. You might want to extend the functions above with an “inString” function that copies the characters it encounters and does not switch to inComment or inMultiComment when it meets their starting delimiters.
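
Note that stripComments returns a single String, while the question's displayProgram expects [String], so the result still has to be split into lines. A minimal sketch of that last step follows; the body of displayProgram is an assumption (the question only gives its type), and strippedLines is a hypothetical helper name.

```haskell
-- Assumed definition: the question only states this function's type.
displayProgram :: [String] -> IO ()
displayProgram = mapM_ putStrLn

-- The AAAA expression, parameterised over whichever stripping
-- function is used, so this block does not depend on one version.
strippedLines :: (String -> String) -> String -> [String]
strippedLines strip = lines . strip

-- Intended use inside the question's main:
--   text <- readFile javaFile
--   displayProgram (strippedLines stripComments text)
```

Because readFile is lazy and the stripping functions produce their output incrementally, this should also cope with large files without reading everything up front.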

如梦初醒的夏天 2024-12-18 22:03:21

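As an example of something similar, see how I strip comments from Dot code; note that I am using combinators that I have defined (but not fully commented) here for the text parser in PolyParse.

It does not account for comment markers inside strings and the like, but it does remove all comments of the form /* ... */ and // ...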

习惯那些不曾习惯的习惯 2024-12-18 22:03:21

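There are probably three ways to implement this:

Manual pattern matching on the string
Using the Text.Parsec package
Using the Text.Regex package

For homework, I would use manual matching.
For a robust implementation, I would choose Text.Parsec.
For a quick and dirty solution, I would choose Text.Regex.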
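
For comparison, here is a rough sketch of what the Text.Parsec route might look like. It assumes the parsec package is installed; piece, lineComment, blockComment, stringLit and stripCommentsP are names made up for this illustration, and Java character literals such as '"' are deliberately not handled.

```haskell
import Control.Monad (void)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Each piece parser returns the text that should survive in the output.
piece :: Parser String
piece = lineComment <|> blockComment <|> stringLit <|> fmap (: []) anyChar

-- "//" up to, but not including, the newline (or the end of input).
lineComment :: Parser String
lineComment =
  try (string "//") *> manyTill anyChar (lookAhead (void newline) <|> eof) *> pure ""

-- "/*" up to and including "*/"; an unterminated comment runs to end of input.
blockComment :: Parser String
blockComment =
  try (string "/*") *> manyTill anyChar (void (try (string "*/")) <|> eof) *> pure ""

-- A double-quoted literal, copied verbatim; backslash escapes stay paired
-- so that \" does not end the literal.
stringLit :: Parser String
stringLit = do
  _ <- char '"'
  body <- many (try escaped <|> fmap (: []) (noneOf "\"\\"))
  close <- option "" (string "\"")
  pure ("\"" ++ concat body ++ close)
  where
    escaped :: Parser String
    escaped = (\a b -> [a, b]) <$> char '\\' <*> anyChar

-- Run the pieces over the whole input and glue the surviving text together.
stripCommentsP :: String -> Either ParseError String
stripCommentsP = parse (concat <$> many piece <* eof) "<input>"
```

The grammar is the same three-state idea as the manual version, only written declaratively; the benefit shows mostly when more cases (character literals, text blocks) need to be added.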

噩梦成真你也成魔 2024-12-18 22:03:21

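-- Normal code: copy characters until a comment or a string literal starts.
stripComments :: String -> String
stripComments [] = []
stripComments ('/':'/':xs) = inComment xs
stripComments ('/':'*':xs) = inMultiComment xs
stripComments ('"':xs) = '"' : inString xs
stripComments (x:xs) = x : stripComments xs

-- Inside a // comment: drop everything up to the newline, which is kept.
inComment :: String -> String
inComment [] = []
inComment ('\n':xs) = '\n' : stripComments xs
inComment (_:xs) = inComment xs

-- Inside a /* ... */ comment: drop everything up to and including */.
inMultiComment :: String -> String
inMultiComment [] = []
inMultiComment ('*':'/':xs) = stripComments xs
inMultiComment (_:xs) = inMultiComment xs

-- Inside a string literal: copy everything, including comment-like text.
inString :: String -> String
inString [] = []
-- An escaped character such as \" must not end the literal.
inString ('\\':c:xs) = '\\' : c : inString xs
inString ('"':xs) = '"' : stripComments xs
inString (x:xs) = x : inString xs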
