带有序列运算符的 F# 列表

发布于 2024-12-05 02:04:43 字数 961 浏览 3 评论 0原文

查看这两个线程后: Does F# 有与 Haskell 等效的吗采取?从 F# 中具有 N 个不同索引的序列中取出 N 个元素< /a> ,我一直想知道在列表上使用序列运算符或即使使用它们的最佳方法。

我目前是 F# 的新手,我正在编写一个程序,该程序必须处理从 HtmlAgilityPack 获得的大量序列。 Seq 模块中有一些有趣的运算符,但正如这些线程中所述,它可能与性能相关性较差,并且如果我们必须不断在 seq -> 之间进行转换。列出它还会使代码变得混乱,其中包含无法解决问题的内容...这就是我首先开始学习 F# 的原因。

一个简单的例子是,当我需要获取列表中的“N”个元素时:

               listOfRows
               |> Seq.take 2
               // Now I don't have a list anymore, it returns a sequence
               |> List.ofSeq

那么,有人可以阐明处理这些情况的最佳方法吗?我可以使用 Seq.take 和 Seq.skip 工作解决方案,但众所周知,这是非常低效的。另一方面,将功能内置到标准库中并必须重新实现它以在不同的集合上使用相同的概念,或者通过显式转换使代码变得更脏,这是一种耻辱。

我如何才能看到性能对 'list ->; 之间的每次转换的影响? seq' 和 'seq ->清单'有吗?

多谢。

After having a look at these two threads: Does F# have an equivalent to Haskell's take? , Take N elements from sequence with N different indexes in F#
, I've been wondering about the best way to use sequence operators on lists or even if using them.

I am at the moment a newbie at F# and I'm writing a program which has to deal with lots of sequences that I get from HtmlAgilityPack. There are some interesting operators in the Seq module but as stated in those threads, it might be poor related to performance and if we are obliged to constantly converting between seq -> list it also clutters code with things that are not problem-solving... which is why I started learning F# in the first place.

A simple example is when I need to take 'N' elements of a list:

               listOfRows
               |> Seq.take 2
               // Now I don't have a list anymore, it returns a sequence
               |> List.ofSeq

So, could anyone shed some light about the best way to deal with these scenarios? I can work solutions using Seq.take and Seq.skip but this is known to be very inefficient. On the other hand it's a shame to have the functionality built into the standard library and having to reimplement it for using the same concept on a different collection, or make the code dirtier with explicit conversions.

How could I see the performance impact each conversion between 'list -> seq' and 'seq -> list' has?

Thanks a lot.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

狼性发作 2024-12-12 02:04:43

这些实施起来相当简单。

module List =

  let take n items =
    let rec take' acc = function
      | 0, _ -> List.rev acc
      | _, [] -> invalidOp "count exceeds number of elements"
      | n, h::t -> take' (h::acc) (n-1, t)
    take' [] (n, items)

  let rec skip n items =
    match n, items with
    | 0, _ -> items
    | _, [] -> invalidOp "count exceeds number of elements"
    | n, _::t -> skip (n-1) t

以下是它们与 Seq 同行相比的表现。

let n = 10000000
let l = List.init n id
let test f = f (n-1) l

test List.take               //Real: 00:00:03.724, CPU: 00:00:03.822, GC gen0: 57, gen1: 34, gen2: 1
test Seq.take |> Seq.toList  //Real: 00:00:04.953, CPU: 00:00:04.898, GC gen0: 57, gen1: 33, gen2: 0
test List.skip               //Real: 00:00:00.044, CPU: 00:00:00.046, GC gen0: 0, gen1: 0, gen2: 0
test Seq.skip |> Seq.toList  //Real: 00:00:01.147, CPU: 00:00:01.154, GC gen0: 0, gen1: 0, gen2: 0

如果毫秒对您的应用程序很重要,那么也许值得创建“缺失的”List 函数。否则,我会说使用 Seq 版本完全没问题。

These are fairly trivial to implement.

module List =

  let take n items =
    let rec take' acc = function
      | 0, _ -> List.rev acc
      | _, [] -> invalidOp "count exceeds number of elements"
      | n, h::t -> take' (h::acc) (n-1, t)
    take' [] (n, items)

  let rec skip n items =
    match n, items with
    | 0, _ -> items
    | _, [] -> invalidOp "count exceeds number of elements"
    | n, _::t -> skip (n-1) t

And here's how they perform vs their Seq counterparts.

let n = 10000000
let l = List.init n id
let test f = f (n-1) l

test List.take               //Real: 00:00:03.724, CPU: 00:00:03.822, GC gen0: 57, gen1: 34, gen2: 1
test Seq.take |> Seq.toList  //Real: 00:00:04.953, CPU: 00:00:04.898, GC gen0: 57, gen1: 33, gen2: 0
test List.skip               //Real: 00:00:00.044, CPU: 00:00:00.046, GC gen0: 0, gen1: 0, gen2: 0
test Seq.skip |> Seq.toList  //Real: 00:00:01.147, CPU: 00:00:01.154, GC gen0: 0, gen1: 0, gen2: 0

If milliseconds count for your application then maybe it's worth it to create "missing" List functions. Otherwise, I would say using the Seq versions is perfectly fine.

许仙没带伞 2024-12-12 02:04:43

其中一些可能取决于您想要如何端到端地使用所有这些。

在许多情况下,预先转换为列表一次就可以了,然后只使用列表运算符来映射/遍历等。可能没有 List.take,但那是因为对于列表,如果您知道至少有两个元素并且您想获取这两个元素,则可以使用模式匹配来完成,例如,

let (item1::item2::rest) = someList

所以我怀疑这可能是您在这种情况下想要做的(我期望通过 HTML 解析,您可能有某种您正在寻找的元素的预期粗略模式,等等)。

(如果惰性/流是必需的,那么 Seq 就变得更加有用。)

简而言之,最常见的运算符(如 map)适用于所有集合类型(Seq>ListArray,...),而不常见的(例如 take)仅在 Seq 上可用,通常是因为有更好的方法来做事情当你有一个具体的类型(例如列表模式匹配采取第一项)。

Some of this might depend on exactly how you want to use all this end-to-end.

In many cases it will be fine to convert to a list once up front, and then only use the List operators to map/traverse/etc. There may not be a List.take, but that's because with lists, if you know there are going to be at least two elements and you want to grab those two, you can do it with a pattern match, e.g.

let (item1::item2::rest) = someList

So I suspect that may be what you want to do in this scenario (I expect with HTML parsing, you might have some kind of expected rough schema of elements you're looking for, etc.).

(If laziness/streaming is essential, then Seq becomes much more useful.)

Briefly, the commonest operators (like map) are on all the collection types (Seq, List, Array, ...) whereas the uncommoner ones (like take) are only available on Seq, often because there are better ways to do things when you have a concrete type (e.g. list pattern matching to take the first items).

不一样的天空 2024-12-12 02:04:43

添加注释

从纯粹的功能意义上来说,take 无法就地操作列表 - 考虑一下,

a::b::c::d::[]

如果我们只想要前 2 个元素,我们至少需要修改 b 这样我们就得到

a::b::[]

既然b被修改了,你还需要修改a,使它指向新修改的b。因此,不可能在列表上实现 take in place,这解释了为什么 List 模块中缺少它。

如果您确实担心性能,请先进行分析,然后考虑切换到不同的数据类型。您可能最好使用 .Net System.Collections.Generic.List<_>,它具有许多与 ListArray 相同的方法> - http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/manual/fsharp.powerpack/microsoft.fsharp.collections.resizearray.html

To add a comment

In a purely functional sense take cannot operate on the list in place - consider

a::b::c::d::[]

if we want only the first 2 elements, we need to at the very least modify b so that we get

a::b::[]

Since b was modified, you will also need to modify a so that it points to the new modified b. As a result of this, it is impossible to implement take in place on a list, which explains why it is missing from the List module.

If you are really worried about performance, profile first, and then consider switching to a different data type. You may be better off using the .Net System.Collections.Generic.List<_> which has many of the same methods as List and Array - http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/manual/fsharp.powerpack/microsoft.fsharp.collections.resizearray.html

三五鸿雁 2024-12-12 02:04:43

您可以充分了解转换 Seq -> 的性能影响列表与列表->通过检查相应的转换实现进行排序:

// List.ofSeq
let ofSeq source = Seq.toList source
// List.toSeq
let toSeq list = Seq.ofList list
// Seq.ofList
let ofList (source : 'T list) = 
        (source :> seq<'T>)
// Seq.toList
let toList (source : seq<'T>) = 
        checkNonNull "source" source
        match source with 
        | :? ('T list) as res -> res
        | :? ('T[]) as res -> List.ofArray res
        | _ -> 
            use e = source.GetEnumerator()
            let mutable res = [] 
            while e.MoveNext() do
                res <- e.Current :: res
            List.rev res

与集合上的实际操作相比,转换本身对性能的影响相对较小。在我的旧 Core 2 Duo 2.4Ghz 笔记本上运行以下代码片段,将 100 万个成员的列表转换为 seq,然后返回到另一个列表,

open System.Diagnostics

let tls = Stopwatch()
let l = [1..1000000]
tls.Start()
let s = List.toSeq l
//Seq.length s |> ignore
//Seq.length s |> ignore
tls.Stop()
printfn "List<int> of 1000000 -> Seq: %d ticks" tls.ElapsedTicks 

let tsl = Stopwatch()
tsl.Start()
let l' = Seq.toList s
//l'.Length |> ignore
//l'.Length |> ignore
tsl.Stop()
printfn "Seq<int> of 1000000 -> List: %d ticks" tsl.ElapsedTicks

相应地显示了 42 和 8 个刻度。如果我们取消注释第一行的长度计数器,执行将花费 18695 和 12952 个周期。取消注释带有长度计数器的第二条相应行后,执行持续时间显示 38377 和 25404 个刻度,这表明惰性与观察到的现象无关。

与集合操作执行本身相比,Seq 和 List 类型之间的转换开销似乎可以忽略不计。

You may fully understand performance impact of conversions Seq -> List and List -> Seq by inspecting correspondent conversion implementations:

// List.ofSeq
let ofSeq source = Seq.toList source
// List.toSeq
let toSeq list = Seq.ofList list
// Seq.ofList
let ofList (source : 'T list) = 
        (source :> seq<'T>)
// Seq.toList
let toList (source : seq<'T>) = 
        checkNonNull "source" source
        match source with 
        | :? ('T list) as res -> res
        | :? ('T[]) as res -> List.ofArray res
        | _ -> 
            use e = source.GetEnumerator()
            let mutable res = [] 
            while e.MoveNext() do
                res <- e.Current :: res
            List.rev res

Conversions themself are relatively light on performance when compared with actual operations on collections. Running the following snippet that converts a List of 1 million members to seq and then back to another List on my old Core 2 Duo 2.4Ghz notebook

open System.Diagnostics

let tls = Stopwatch()
let l = [1..1000000]
tls.Start()
let s = List.toSeq l
//Seq.length s |> ignore
//Seq.length s |> ignore
tls.Stop()
printfn "List<int> of 1000000 -> Seq: %d ticks" tls.ElapsedTicks 

let tsl = Stopwatch()
tsl.Start()
let l' = Seq.toList s
//l'.Length |> ignore
//l'.Length |> ignore
tsl.Stop()
printfn "Seq<int> of 1000000 -> List: %d ticks" tsl.ElapsedTicks

shows blasting 42 and 8 ticks correspondently. If we uncomment first respective lines with length counters execution will take 18695 and 12952 ticks. After uncommenting second respective lines with length counters execution duration is showing 38377 and 25404 ticks, which indicates that laziness is unrelated to observed fenomena.

It seems that overhead of conversions between Seq and List types may be negligible compared to Collections operations execution per se.

眼中杀气 2024-12-12 02:04:43

List to Seq只不过是在List上创建一个迭代器(在.net世界中是一个Enumerable),所以基本上它不是一个会导致很大性能问题的操作(它只是创建一个状态机来保存有关哪个状态的状态)列表中当前需要“yield”的元素,并随着请求更多元素而增加它)。另一方面,将 Seq(它将具有一些从中生成值的底层集合)转换为 List 在概念上与迭代列表并从中创建新列表相同,因此这可能是一个耗时和内存消耗的过程,以防万一清单足够长。

就这些运算符的使用而言,一种可能的方法是将所有序列运算符分组在一起(与 linq 查询相同,您创建一个管道,通过该管道将集合元素一一处理),然后最后如果您需要时,您可以从结果 Seq 创建列表,因为列表是在 seq 上的所有过滤、映射、获取等工作结束时创建的,当最终数据准备好时,您将其转换为列表。创建中间列表效果不佳,并且会导致性能问题。

List to Seq is nothing but creating a iterator (in .net world a Enumerable) on the List, so basically it is not a operation which will cause much of a performance issue (it just create a state machine which hold the state about which is the current element in the list that need to be 'yield' and the increment it as more elements are requested). On the other hand converting a Seq (which will have some underlying collection from which it produces values) to a List is conceptually same as iterating a list and creating a new list from it so it may be a time and memory consuming process in case the list is long enough.

As far as usage of these operator goes one possible approach would be to group all your sequence operator together (same as linq queries where your sort of create a pipeline through which the collection elements gets processed one by one) and then in the end if you need you can create the list from resulting Seq as the list is created at the end of all the filtering, mapping, take etc work on seq and when the final data is ready you convert it to List. Creating intermediate lists will not work out well and will cause performance issues.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文