当前位置：文江博客话题详情

F# 与 C# 性能签名及示例代码

发布于 2024-10-15 14:57:41 字数 3492 浏览 4 评论 0原文

关于这个话题已经有很多讨论了，但我更喜欢鞭打死马，尤其是当我发现它们可能还在呼吸时。

我当时正在解析 CSV 这种不寻常且奇特的文件格式，为了好玩，我决定针对我所知道的 2 种 .net 语言（C# 和 F#）来描述其性能。

结果……令人不安。 F# 以很大优势赢得了 2 倍或更多（实际上我认为它更像是 0.5n，但事实证明获得真正的基准测试非常困难，因为我正在针对硬件 IO 进行测试）。

像读取 CSV 这样常见的事情中不同的性能特征令我感到惊讶（请注意，该系数意味着 C# 在非常小的文件上胜出。我做的测试越多，感觉 C# 的扩展性就越差，这既令人惊讶又令人担忧，因为这可能意味着我做错了）。

一些注意事项：Core 2 双核笔记本电脑，主轴磁盘 80 GB，3 GB DDR 800 内存，Windows 7 64 位高级版，.Net 4，未打开电源选项。

30,000 行 5 宽 1 短语 10 个字符或更少给了我 3 的因子，有利于第一次运行后的尾部调用递归（它似乎缓存了文件）

300,000（重复相同的数据）是尾部的 2 因子使用 F# 的可变实现进行调用递归稍微胜出，但性能签名表明我正在访问磁盘而不是对整个文件进行 ram 磁盘，这会导致半随机性能峰值。

F# 代码

//Module used to import data from an arbitrary CSV source
module CSVImport
open System.IO

//imports the data froma path into a list of strings and an associated value
let ImportData (path:string) : List<string []> = 

    //recursively rips through the file grabbing a line and adding it to the 
    let rec readline (reader:StreamReader) (lines:List<string []>) : List<string []> =
        let line = reader.ReadLine()
        match line with
        | null -> lines
        | _ -> readline reader  (line.Split(',')::lines)

    //grab a file and open it, then return the parsed data
    use chaosfile = new StreamReader(path)
    readline chaosfile []

//a recreation of the above function using a while loop
let ImportDataWhile (path:string) : list<string []> =
    use chaosfile = new StreamReader(path)
    //values ina loop construct must be mutable
    let mutable retval = []
    //loop
    while chaosfile.EndOfStream <> true do
        retval <- chaosfile.ReadLine().Split(',')::retval 
    //return retval by just declaring it
    retval

let CSVlines (path:string) : string seq= 
    seq { use streamreader = new StreamReader(path)
          while not streamreader.EndOfStream do
            yield streamreader.ReadLine() }

let ImportDataSeq (path:string) : string [] list =
    let mutable retval = []
    let sequencer = CSVlines path
    for line in sequencer do
        retval <- line.Split()::retval
    retval

C# 代码

using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;
using System.Text;

namespace CSVparse
{
    public class CSVprocess
    {
        public static List<string[]> ImportDataC(string path)
        {
            List<string[]> retval = new List<string[]>();
            using(StreamReader readfile = new StreamReader(path))
            {
                string line = readfile.ReadLine();
                while (line != null)
                {
                    retval.Add(line.Split());
                    line = readfile.ReadLine();
                }
            } 

           return retval;
        }

        public static List<string[]> ImportDataReadLines(string path)
        {
            List<string[]> retval = new List<string[]>();
            IEnumerable<string> toparse = File.ReadLines(path);

            foreach (string split in toparse)
            {
                retval.Add(split.Split());
            }
            return retval;
        }
    }

}

请注意其中的各种实现。使用迭代器、使用序列、使用尾部调用优化、两种语言的 while 循环...

一个主要问题是我正在访问磁盘，因此可以解释一些特性，我打算重写此代码以读取内存流（假设我不开始交换，应该更加一致）

但是我所教/读的所有内容都表明 while 循环/for 循环比尾调用优化/递归更快，并且我运行的每个实际基准测试都在说与此完全相反。

所以我想我的问题是，我应该质疑传统智慧吗？

在 .net 生态系统中，尾调用递归真的比 while 循环更好吗？

这在 Mono 上效果如何？

原文

There are many discussions on this topic already, but I am all about flogging dead horses, particularly when I discover they may still be breathing.

I was working on parsing the unusual and exotic file format that is the CSV, and for fun I decided to characterize the performance against the 2 .net languages I know, C# and F#.

The results were...unsettling. F# won, by a wide margin, a factor of 2 or more(and I actually think it's more like .5n, but getting real benchmarks is proving to be tough since I am testing against hardware IO).

Divergent performance characteristics in something as common as reading a CSV is surprising to me(note that the coefficient means that C# wins on very small files. The more testing I am doing the more it feels like C# scales worse, which is both surprising and concerning, since it probably means I am doing it wrong).

Some notes : Core 2 duo laptop, spindle disk 80 gigs, 3 gigs ddr 800 memory, windows 7 64 bit premium, .Net 4, no power options turned on.

30,000 lines 5 wide 1 phrase 10 chars or less is giving me a factor of 3 in favor of the tail call recursion after the first run(it appears to cache the file)

300,000(same data repeated) is a factor of 2 for the tail call recursion with F#'s mutable implementation winning out slightly, but the performance signatures suggest that I am hitting the disk and not ram-disking the whole file, which causes semi-random performance spikes.

F# code

//Module used to import data from an arbitrary CSV source
module CSVImport
open System.IO

//imports the data froma path into a list of strings and an associated value
let ImportData (path:string) : List<string []> = 

    //recursively rips through the file grabbing a line and adding it to the 
    let rec readline (reader:StreamReader) (lines:List<string []>) : List<string []> =
        let line = reader.ReadLine()
        match line with
        | null -> lines
        | _ -> readline reader  (line.Split(',')::lines)

    //grab a file and open it, then return the parsed data
    use chaosfile = new StreamReader(path)
    readline chaosfile []

//a recreation of the above function using a while loop
let ImportDataWhile (path:string) : list<string []> =
    use chaosfile = new StreamReader(path)
    //values ina loop construct must be mutable
    let mutable retval = []
    //loop
    while chaosfile.EndOfStream <> true do
        retval <- chaosfile.ReadLine().Split(',')::retval 
    //return retval by just declaring it
    retval

let CSVlines (path:string) : string seq= 
    seq { use streamreader = new StreamReader(path)
          while not streamreader.EndOfStream do
            yield streamreader.ReadLine() }

let ImportDataSeq (path:string) : string [] list =
    let mutable retval = []
    let sequencer = CSVlines path
    for line in sequencer do
        retval <- line.Split()::retval
    retval

C# Code

using System;
using System.Collections.Generic;
using System.Linq;
using System.IO;
using System.Text;

namespace CSVparse
{
    public class CSVprocess
    {
        public static List<string[]> ImportDataC(string path)
        {
            List<string[]> retval = new List<string[]>();
            using(StreamReader readfile = new StreamReader(path))
            {
                string line = readfile.ReadLine();
                while (line != null)
                {
                    retval.Add(line.Split());
                    line = readfile.ReadLine();
                }
            } 

           return retval;
        }

        public static List<string[]> ImportDataReadLines(string path)
        {
            List<string[]> retval = new List<string[]>();
            IEnumerable<string> toparse = File.ReadLines(path);

            foreach (string split in toparse)
            {
                retval.Add(split.Split());
            }
            return retval;
        }
    }

}

Note the variety of implementations there. Using iterators, using sequences, using tail call optimizatons, while loops in 2 languages...

A major issue is that I am hitting the disk, and so some idiosyncracies can be accounted for by that, I intend on rewriting this code to read from a memory stream(which should be more consistent assuming I don't start to swap)

But everything I am taught/read says that while loops/for loops are faster than tail call optimizations/recursion, and every actual benchmark that I run is saying the dead opposite of that.

So I guess my question is, should I question the conventional wisdom?

Is tail call recursion really better than while loops in the .net ecosystem?

How does this work out on Mono?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

゛时过境迁 2024-10-22 14:57:41

我认为差异可能是由于 F# 和 C# 中不同的 List 引起的。 F# 使用单链表（请参阅 http://msdn.microsoft.com/en- us/library/dd233224.aspx），而在 C# 中使用的是基于数组的 System.Collections.Generic.List。

对于单链表来说，连接要快得多，特别是当您解析大文件时（您需要不时分配/复制整个数组列表）。

尝试在 C# 代码中使用 LinkedList，我对结果很好奇:) ...

PS：此外，这也是何时使用探查器的一个很好的示例。您可以轻松找到 C# 代码的“热点”...

编辑

因此，我自己尝试了这一点：我使用了两个相同的文件来防止缓存效应。这些文件有 3.000.000 行，包含 10 次“abcdef”，以逗号分隔。

主程序如下所示：（

static void Main(string[] args) {
   var dt = DateTime.Now;
   CSVprocess.ImportDataC("test.csv"); // C# implementation
   System.Console.WriteLine("Time {0}", DateTime.Now - dt);
   dt = DateTime.Now;
   CSVImport.ImportData("test1.csv"); // F# implementation
   System.Console.WriteLine("Time {0}", DateTime.Now - dt);
}

我也尝试过先执行 F# 实现，然后执行 C#...）

结果是：

C#：3.7 秒
F#：7.6 秒

运行 C# 解决方案之后 F# 解决方案为 F# 版本提供了相同的性能，但为 C# 提供了 4.7 秒（我认为是由于 F# 解决方案分配了大量内存）。单独运行每个解决方案不会改变上述结果。

对于 C# 解决方案，使用 6.000.000 行的文件大约需要 7 秒，F# 解决方案会产生 OutOfMemoryException（我在具有 12GB RAM 的机器上运行它......）

所以对我来说，传统的“智慧”似乎是这样的是的，C# 使用简单的循环对于此类任务来说速度更快......

I think that the difference may arise from different Lists in F# and C#. F# uses singly linked lists (see http://msdn.microsoft.com/en-us/library/dd233224.aspx) whereas in C# System.Collections.Generic.List ist used, which is based on arrays.

Concatenation is much faster for singly linked lists, especially when you're parsing big files (you need to allocate/copy the whole array list from time to time).

Try using a LinkedList in the C# code, I'm curious about the results :) ...

PS: Also, this would be a good example on when to use a profiler. You could easily find the "hot spot" of the C# code...

EDIT

So, I tried this for myself: I used two identical files in order to prevent caching effects. The files were 3.000.000 lines with 10 times 'abcdef', separated by comma.

The main program looks like this:

static void Main(string[] args) {
   var dt = DateTime.Now;
   CSVprocess.ImportDataC("test.csv"); // C# implementation
   System.Console.WriteLine("Time {0}", DateTime.Now - dt);
   dt = DateTime.Now;
   CSVImport.ImportData("test1.csv"); // F# implementation
   System.Console.WriteLine("Time {0}", DateTime.Now - dt);
}

(I also tried it with first executing the F# implementation and then the C#...)

The result is:

C#: 3.7 seconds
F#: 7.6 seconds

Running the C# solution after the F# solution gives the same performance for the F# version but 4.7 seconds for C# (I assume due to heavy memory allocation by the F# solution). Running each solution alone doesn't change the above results.

Using a file with 6.000.000 lines gives ~ 7 seconds for the C# solution, the F# solution produces an OutOfMemoryException (I'm running this on a maching with 12GB Ram ...)

So for me it seems that the conventional 'wisdom' is true and C# using a simple loop is faster for this kind of tasks ...

回复收藏 0 原文

寒尘 2024-10-22 14:57:41

您真的、真的、真的、真的不应该从这些结果中解读任何内容 - 要么对您的整个结果进行基准测试系统作为系统测试的一种形式，或者从基准测试中删除磁盘 I/O。这只会让事情变得混乱。采用 TextReader 参数而不是物理路径可能是更好的做法，以避免将实现链接到物理文件。

此外，作为微基准测试，您的测试还有一些其他缺陷：

您定义了许多在基准测试期间未调用的函数。您正在测试 ImportDataC 或 ImportDataReadLines 吗？为了清晰起见，进行选择 - 在实际应用程序中，不要重复实现，而是排除相似之处并根据另一个来定义一个。
您在 F# 中调用 .Split(',') 而在 C# 中调用 .Split() - 您打算按逗号还是空格进行拆分？
您正在重新发明轮子 - 至少将您的实现与使用高阶函数（又名 LINQ）的更短版本进行比较。