How do I read data from IO into a data structure and then process that data structure?

Posted on 2024-10-20 14:40:01


First off, sorry for doing the typical 'where do I begin' thing, but I'm totally lost.

I've been reading the 'Learn You a Haskell for Great Good' site for what feels like an age now (pretty much half a semester). I'm just about to finish the 'Input and Output' chapter, and I still have no clue how to write a multi-line program.

I've seen the do statement, and that you can only use it to chain IO actions into a single function, but I can't see how I'm going to go about writing a realistic application.

Can someone point me in the right direction?

I'm from a C background, and I'm using Haskell for one of my modules this semester at uni. I want to compare C++ against Haskell (in many aspects). I'm looking to create a series of searching and sorting programs so that I can comment on how easy they are to write in the respective languages versus their speed.

However, I'm really starting to lose my faith in using Haskell, as it's been six weeks and I still have no idea how to write a complete application, and the chapters in the site I'm reading seem to be getting longer and longer.

I basically need to create a basic object which will be stored in the structure (which I know how to do). What I'm struggling with is how to create a program which reads data in from some text file, populates the structure with that data in the first place, and then goes on to process it. As Haskell seems to split IO from other operations, and it won't just let me write multiple lines in a program, I'm looking for something like this:

main = data <- getContent
       let allLines = lines data
       let myStructure = generateStruct allLines
       sort/search/etc
       print myStructure

How do I go about this? Are there any good tutorials which will help me get going with realistic programs?

-A


Comments (4)

只是一片海 2024-10-27 14:40:01


You mentioned seeing do notation; now it's time to learn how to use it. Since main in your example is an IO action, you should be using do syntax or binds:

main = do
  dat <- getContents                    -- read all of stdin (the Prelude function is getContents)
  let allLines = lines dat
      -- generateStruct, mySort and mySearch are your own pure functions
      myStructure = generateStruct allLines
      sorted = mySort myStructure
      searchResult = mySearch myStructure
  print myStructure
  print sorted
  print searchResult

So now you have a main that reads stdin, turns it into [String] via lines, presumably parses it into a structure, and runs sorting and searches on that structure. Notice that the interesting code is all pure: mySort, mySearch, and generateStruct don't need to be IO (and can't be, being inside a let binding), so you are properly using pure and effectful code together.
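To make the shape above concrete, here is a minimal sketch of a complete program. Everything specific in it is an assumption made for illustration: the Person record, the "name age" line format, and the bodies of generateStruct, mySort and mySearch are not from the question, and mySearch takes an extra key argument here so that it has something to look for.

```haskell
import Data.List (find, sortOn)

-- Hypothetical record type; the question does not say what the data looks like.
data Person = Person { name :: String, age :: Int }
  deriving (Show, Eq)

-- Pure: parse lines of the assumed form "name age".
-- Lines with the wrong number of fields or a non-numeric age are skipped.
generateStruct :: [String] -> [Person]
generateStruct ls =
  [ Person n a
  | [n, aStr] <- map words ls
  , [(a, "")] <- [reads aStr]
  ]

-- Pure: sort by age, youngest first.
mySort :: [Person] -> [Person]
mySort = sortOn age

-- Pure: look up the first person with the given name.
mySearch :: String -> [Person] -> Maybe Person
mySearch wanted = find ((== wanted) . name)

-- The only IO is reading stdin and printing; everything else is pure.
main :: IO ()
main = do
  dat <- getContents
  let allLines     = lines dat
      myStructure  = generateStruct allLines
      sorted       = mySort myStructure
      searchResult = mySearch "bob" myStructure
  print myStructure
  print sorted
  print searchResult
```

Assuming the file is saved as Main.hs, it could be tried with something like `printf 'alice 30\nbob 25\n' | runghc Main.hs`.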

I suggest you look at how bind (>>=) works and how do notation desugars into bind. This SO question should help.
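As a rough illustration of that desugaring, the sketch below writes the same small action twice, once with do notation and once with (>>=) and a lambda. The names process, mainDo and mainBind are made up for this example, and read will fail at runtime on non-numeric input, which is acceptable only because this is a sketch.

```haskell
import Data.List (sort)

-- Pure step: turn each input line into an Int and sort the result.
process :: String -> [Int]
process = sort . map read . lines

-- Written with do notation.
mainDo :: IO ()
mainDo = do
  dat <- getContents
  let sorted = process dat
  print sorted

-- The same action written with (>>=) directly:
-- "x <- action" becomes "action >>= \x -> ...",
-- and a "let" line becomes an ordinary let ... in expression.
mainBind :: IO ()
mainBind =
  getContents >>= \dat ->
    let sorted = process dat
    in print sorted

main :: IO ()
main = mainBind
```

Both mainDo and mainBind have type IO () and behave the same way; do notation is just a more readable way of writing the chain of binds.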

羁〃客ぐ 2024-10-27 14:40:01


See also Explaining Haskell IO without Monads by Neil Mitchell.

故人爱我别走 2024-10-27 14:40:01


I'll try to start with a simplified example. Let's say this is what we want to do:

  1. Open a file which contains a list of integers and return it.
  2. Sort this list
  3. Let's also reverse the list
  4. Print the result on the screen

Let's also say that we have these functions that we can use:

getContent :: IO [Int]
sort :: [Int] -> [Int]
reverse :: [Int] -> [Int]
show :: Show a => a -> String
putStrLn :: String -> IO ()

Just so we are clear, I'll have a word about these functions:

  • getContent: I made up this function, but if such a function existed, that would be its signature (you can use getContent = return [3,7,2,1] for testing purposes). I'm sure you've seen such a signature before and at least vaguely understand that, since it does IO, its signature cannot be just getContent :: [Int].
  • sort: a function defined in the Data.List module; usage is simple: sort [3,1,2] returns [1,2,3].
  • reverse: available from the Prelude without an import (Data.List exports it as well): reverse [1,3,2] returns [2,3,1].
  • show: no need to import anything, just use it: show 11 returns the string "11"; show [1,2,3] returns the string "[1,2,3]", etc.
  • putStrLn: takes a string, puts it on the screen and returns IO (). Again, since it does IO, its signature cannot be just putStrLn :: String -> ().

OK, now we have all we need to create our program, the problem now is about connecting these functions together. Let's start with connecting functions:

getContent :: IO [Int] with sort :: [Int] -> [Int]

I think if you get this part, you'll easily get the rest as well. So, the problem is that since getContent returns IO [Int] and not just [Int], you can't just ignore or get rid of the IO part and shove it into sort. That is, this is what you can not do to connect these functions:


sort (getRidOfIO getContent)

Here is where the (>>=) :: Monad m => m a -> (a -> m b) -> m b operation comes to the rescue. Notice that m, a and b are type variables, so if we substitute IO for m, [Int] for a and [Int] for b, we get the signature:

(>>=) :: IO [Int] -> ([Int] -> IO [Int]) -> IO [Int]

Have another look at the getContent and sort functions and their signatures, and try to think about how they fit into >>=. I'm sure you'll notice that you can use getContent directly as the first argument to >>=. What >>= will then do is take the [Int] out of getContent and shove it into the function provided as the second argument. But what will the function in the second argument be? We can't use sort :: [Int] -> [Int] directly; the next best thing we can try is

\listOfInts -> sort listOfInts

but that still has the signature [Int] -> [Int], so it did not help much. Here is where the other hero comes into play:

return :: Monad m => a -> m a

Again, a and m are type variables; let's substitute them and we get

return :: [Int] -> IO [Int]

so putting \listOfInts -> sort listOfInts and return together we get:

(\listOfInts -> return $ sort listOfInts) :: [Int] -> IO [Int]

which is exactly what we want to put as the second argument to >>=. So let's finally connect getContent and sort using our glue:

getContent >>= (\listOfInts -> return $ sort listOfInts)

which is the same thing as (using the do notation):

do listOfInts <- getContent
   return $ sort listOfInts

There, that is the end of the most terrifying part. And now comes possibly one of the aha moments: try to think about what the result type of the connection we just made is. I'll spoil it for you: the type of

getContent >>= (\listOfInts -> return $ sort listOfInts) is IO [Int] again.

Let's summarize: we took something of type IO [Int] and something of type [Int] -> [Int], glued those two things together and again got something of type IO [Int]!

Now go ahead and try exactly the same thing: Take the IO [Int] object we have just created and glue it together (using >>= and return) with reverse :: [Int] -> [Int].
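One possible way to carry that exercise through to the printing step is sketched below. The names sortedIO, reversedIO and pipeline are made up for this illustration, and getContent is the fixed-list stand-in suggested earlier rather than real file reading.

```haskell
import Data.List (sort)

-- Stand-in for reading a file: always yields the same list.
getContent :: IO [Int]
getContent = return [3,7,2,1]

-- Steps 1 and 2: glue getContent and sort together with (>>=) and return.
sortedIO :: IO [Int]
sortedIO = getContent >>= (\listOfInts -> return $ sort listOfInts)

-- Step 3: exactly the same trick again, this time with reverse.
reversedIO :: IO [Int]
reversedIO = sortedIO >>= (\sorted -> return $ reverse sorted)

-- The pure part on its own, for comparison.
pipeline :: [Int] -> [Int]
pipeline = reverse . sort

-- Step 4: putStrLn already returns IO (), so no return is needed here.
main :: IO ()
main = reversedIO >>= (\result -> putStrLn (show result))
-- prints [7,3,2,1]
```

The pattern `action >>= (\x -> return (f x))` is common enough that it has a name: it is the same as `fmap f action`, so reversedIO could also be written as `fmap pipeline getContent`.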

I think I wrote way too much, but let me know if anything was not clear or if you need help with the rest.

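What I've described so far can look something like this:

import Data.List (sort)

-- Stand-in for real file reading: always yields the same list.
getContent :: IO [Int]
getContent = return [5,2,1,7]

main :: IO ()
main = do
  listOfInts <- getContent
  return $ sort listOfInts    -- builds an IO [Int], but its result is discarded here
  return ()                   -- This is only to satisfy the signature of main

灯角 2024-10-27 14:40:01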


If it is a question of reading from stdin and writing a result to stdout, with no further intervening user input (as your mention of getContents suggests), then the ancient interact :: (String -> String) -> IO (), or one of its several other versions, e.g. Data.ByteString.interact :: (ByteString -> ByteString) -> IO () or Data.Text.IO.interact :: (Text -> Text) -> IO (), is all that is needed. interact is basically the 'make a little unix tool out of this function' function: it maps pure functions of the right type to executable actions (i.e. values of the type IO ()). All Haskell tutorials should mention it on the third or fourth page, with instructions on compilation.

So if you write

main = interact arthur

arthur :: String -> String
arthur = reverse

and compile with ghc --make -O2 Reverse.hs -o reverse then whatever you pipe to ./reverse will be understood as a list of characters and emerge reversed. Similarly, whatever you pipe to

main = interact (unlines . meredith  . lines)

meredith :: [String] -> [String]
meredith = filter (not.null)

will emerge with the empty lines omitted. More interestingly,

main = interact ( unlines . map show . luther . map read . lines)

luther :: [Int] -> [Int]
luther = filter even 

will take a stream of numerals separated by newlines, read them as Ints, remove the odd ones, and yield the suitably filtered stream.

main = interact ( (++ "\n") . show . emma . map read . lines)  -- emma yields a single Int, so show it directly

emma :: [Int] -> Int
emma = sum . map square 
  where square x = x * x

will print the sum of the squares of the newline-separated numerals.

In these last two cases, luther and emma, the internal 'data structure' is [Int], which is pretty dull, and the function applied to it is idiot simple, of course. The main point is to let one of the forms of interact take care of all of the IO, and thus get images like 'populating a structure' and 'processing it' out of your head. To use interact you need to use composition to make the whole thing yield some sort of String -> String function. But even here, as in the runt first example arthur :: String -> String, you are defining a genuine function in something more like the mathematical sense. Values in the types String and ByteString are just as pure as those in Bool or Int.

In more complicated cases of this basic interact type, your task is thus, first, to think how the desired pure values of the function you will be focusing on can be mapped to String values (here, it's just show for an Int, or unlines . map show for a [Int]); interact knows what to "do" with the string. Second, you figure out how to define a pure mapping from Strings or ByteStrings (which will contain your 'raw' data) to values in the type or types your principal function takes as arguments. Here I was just using map read . lines, resulting in a [Int]. If you are working on some more complicated structure, say a tree, you'd need a function from [Int] to MyTree Int. A more elaborate function to put in this position would be a Parser, of course.
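As a sketch of what the tree case might look like, the block below builds a small binary search tree from the input numbers and prints them back in order. The text names MyTree but does not define it, so the type and the helpers insertTree, fromList, toSortedList, parse, render and pipeline are all assumptions made for illustration; read will also fail on non-numeric lines.

```haskell
-- Hypothetical tree type; only the name MyTree comes from the text.
data MyTree a = Leaf | Node (MyTree a) a (MyTree a)

-- Insert into a binary search tree, ignoring duplicates.
insertTree :: Ord a => a -> MyTree a -> MyTree a
insertTree x Leaf = Node Leaf x Leaf
insertTree x t@(Node l v r)
  | x < v     = Node (insertTree x l) v r
  | x > v     = Node l v (insertTree x r)
  | otherwise = t

-- The function "from [Int] to MyTree Int" mentioned above.
fromList :: Ord a => [a] -> MyTree a
fromList = foldr insertTree Leaf

-- In-order traversal: yields the elements in ascending order.
toSortedList :: MyTree a -> [a]
toSortedList Leaf         = []
toSortedList (Node l v r) = toSortedList l ++ [v] ++ toSortedList r

-- Raw text in, structure out.
parse :: String -> MyTree Int
parse = fromList . map read . lines

-- Structure in, text out.
render :: MyTree Int -> String
render = unlines . map show . toSortedList

-- The whole program as one pure String -> String function.
pipeline :: String -> String
pipeline = render . parse

main :: IO ()
main = interact pipeline
```

Anything more interesting (searching the tree, balancing it, and so on) would slot in between parse and render as another pure function, without touching main.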

Then you can go to town, in this sort of case: there is really no reason to think of yourself as 'programming', 'populating' and 'processing' at all. This is where all the cool devices of LYAH kick in. Your duty is to define a mapping within the specific definitional discipline. In the last two cases, these are from [Int] to [Int] and from [Int] to Int, but here is a similar example derived from the excellent, still incomplete, tutorial on the super-excellent Vector package, where the initial numerical structure one is dealing with is Vector Int:

{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy.Char8      as L
import qualified Data.Vector.Unboxed            as U

main :: IO ()
main = L.interact (L.pack . (++"\n") . show . roman . parse)
    where 
    parse :: L.ByteString -> U.Vector Int
    parse bytestr = U.unfoldr step bytestr
    -- read one Int, then skip the single separator character that follows it
    step !s = case L.readInt s of
        Nothing       -> Nothing
        -- L.drop 1 (unlike L.tail) is safe when the input lacks a trailing newline
        Just (!k, !t) -> Just (k, L.drop 1 t)

-- now the IO and stringy nonsense is out of the way 
-- so we can calculate properly:

roman :: U.Vector Int -> Int
roman = U.sum

Here again roman is moronic; any function from a Vector of Ints to an Int, however complex, can take its place. Writing a better roman will never be a question of "populating", "multi-line programming", "processing" etc., though of course we speak this way; it is just a question of defining a genuine function by composition of the functions in Data.Vector and elsewhere. The sky is the limit; check out that tutorial too.
