Reading a text file from a specified position for a specified length

Posted on 2024-12-22 14:41:54

Because I received a very badly formatted data file, I have to write code that reads a non-delimited text file from a specific starting position for a specific length, in order to build a workable dataset. The text file is not delimited in any way, but I do have the starting and ending position of each string that I need to read. I've come up with this code, but I'm getting an error and can't figure out why, because if I replace the 395 with a 0 it works.

e.g. Invoice number starting position = 395, ending position = 414, length = 20

using (StreamReader sr = new StreamReader(@"\\t.txt"))
{                    
    char[] c = null;                   
    while (sr.Peek() >= 0)
    {
        c = new char[20];//Invoice number string
        sr.Read(c, 395, c.Length); //THIS IS GIVING ME AN ERROR                      
        Debug.WriteLine(""+c[0] + c[1] + c[2] + c[3] + c[4]..c[20]);
    }
}

Here is the error that I get:

System.ArgumentException: Offset and length were out of bounds for the array 
                          or count is greater than the number of elements from
                          index to the end of the source collection. at
                          System.IO.StreamReader.Read(Char[] b


Comments (6)

琴流音 2024-12-29 14:41:54

Please Note

Seek() is too low level for what the OP wants. See this answer instead for line-by-line parsing.

Also, as Jordan mentioned, Seek() works in bytes rather than characters, so it runs into trouble with encodings whose characters vary in size (for example UTF-8 files that contain non-ASCII text; this is probably not applicable to this question). Thanks for pointing that out.


Original Answer

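Seek() is only available on a stream, so try using sr.BaseStream.Seek(..), or use a different stream, like this:

// buffer is a byte[] with room for at least `length` bytes; offset is a byte offset.
using (Stream s = new FileStream(path, FileMode.Open, FileAccess.Read))
{
    s.Seek(offset, SeekOrigin.Begin);
    int bytesRead = s.Read(buffer, 0, length); // may be fewer than `length` bytes
}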
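
To make the byte-versus-character point concrete, here is a minimal sketch of reading a fixed byte range and decoding it. The names ByteRangeReader and ReadRange are hypothetical and not part of the original answer, and the sketch assumes the file uses an encoding in which the requested byte range starts and ends on character boundaries (for example ASCII).

```csharp
using System;
using System.IO;
using System.Text;

public static class ByteRangeReader
{
    // Hypothetical helper: reads up to `length` bytes starting at byte `offset`
    // and decodes them with the given encoding.
    public static string ReadRange(Stream s, long offset, int length, Encoding encoding)
    {
        byte[] buffer = new byte[length];
        s.Seek(offset, SeekOrigin.Begin);

        int total = 0;
        while (total < length)
        {
            // Stream.Read may return fewer bytes than requested; 0 means end of stream.
            int n = s.Read(buffer, total, length - total);
            if (n == 0) break;
            total += n;
        }

        // Decode only the bytes that were actually read.
        return encoding.GetString(buffer, 0, total);
    }
}
```

For the invoice number in the question this would be called as ByteRangeReader.ReadRange(stream, 395, 20, Encoding.ASCII), assuming the file really is single-byte text.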
山人契 2024-12-29 14:41:54

Here is my suggestion for you:

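using (StreamReader sr = new StreamReader(@"\\t.txt"))
{
    char[] c = new char[20];  // Invoice number string
    // Position is a byte offset; it matches the character offset only for single-byte encodings.
    sr.BaseStream.Position = 395;
    // The second argument is the index into c at which to start writing, so it must be 0 here.
    sr.Read(c, 0, c.Length);
}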
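
One caveat with this approach: StreamReader keeps its own internal buffer, so moving BaseStream.Position after the reader has already read something does not take effect until the buffered data is thrown away with DiscardBufferedData(). The sketch below shows that pattern; ReaderSeek and ReadAt are hypothetical names, and byteOffset is assumed to fall on a character boundary.

```csharp
using System.IO;

public static class ReaderSeek
{
    // Hypothetical helper: repositions a StreamReader to a byte offset
    // and reads up to `count` characters from there.
    public static string ReadAt(StreamReader sr, long byteOffset, int count)
    {
        sr.BaseStream.Position = byteOffset;
        sr.DiscardBufferedData(); // drop characters buffered from the old position

        char[] c = new char[count];
        int total = 0;
        while (total < count)
        {
            // The second argument indexes into c, not into the file.
            int n = sr.Read(c, total, count - total);
            if (n == 0) break; // end of stream
            total += n;
        }
        return new string(c, 0, total);
    }
}
```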
反话 2024-12-29 14:41:54

(new answer based on comments)

You are parsing invoice data with one entry per line, and the required data sits at a fixed offset in every line. Stream.Seek() is too low level for what you want to do, because you would need a separate seek for every line. Instead, read the file line by line:

int offset = 395;
int length = 20;
using (StreamReader sr = new StreamReader(@"\\t.txt"))
{
    while (!sr.EndOfStream)
    {
        string line = sr.ReadLine();
        string myData = line.Substring(offset, length);
    }
}
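
Note that Substring throws an ArgumentOutOfRangeException when a line is shorter than offset + length, which is easy to hit with a messy data file. A small guard along these lines may help; FixedWidth.Slice is a hypothetical helper, not part of the original answer.

```csharp
using System;

public static class FixedWidth
{
    // Hypothetical helper: returns up to `length` characters of `line` starting
    // at `offset`, or an empty string when the line is too short to reach `offset`.
    public static string Slice(string line, int offset, int length)
    {
        if (line == null || offset >= line.Length) return string.Empty;
        int available = Math.Min(length, line.Length - offset);
        return line.Substring(offset, available);
    }
}
```

Inside the loop, string myData = FixedWidth.Slice(line, offset, length); would then replace the direct Substring call.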
极度宠爱 2024-12-29 14:41:54

Solved this ages ago, just wanted to post the solution that was suggested

using (StreamReader sr = new StreamReader(path2))
{
    // Create the columns once, before the loop: adding a column whose name
    // already exists throws a DuplicateNameException.
    dsnonhb.Tables[0].Columns.Add("InvoiceNum");
    dsnonhb.Tables[0].Columns.Add("Odo");
    dsnonhb.Tables[0].Columns.Add("PumpVal");
    dsnonhb.Tables[0].Columns.Add("Quantity");

    string line;
    while ((line = sr.ReadLine()) != null)
    {
        DataRow myrow = dsnonhb.Tables[0].NewRow();
        myrow["No"] = rowcounter.ToString();
        myrow["InvoiceNum"] = line.Substring(741, 6);
        myrow["Odo"] = line.Substring(499, 6);
        myrow["PumpVal"] = line.Substring(609, 7);
        myrow["Quantity"] = line.Substring(660, 6);
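
The posted snippet is cut off, so here is a self-contained sketch of the same idea rather than a reconstruction of the poster's code. It reuses the field offsets and lengths from the post; the class name InvoiceLoader, the table name, the creation of the "No" column, and the choice to skip lines that are too short are all assumptions.

```csharp
using System.Data;
using System.IO;

public static class InvoiceLoader
{
    // Hypothetical loader: turns fixed-width lines into DataTable rows.
    public static DataTable Load(TextReader reader)
    {
        var table = new DataTable("Invoices");
        // Columns are created once, before any rows are read.
        table.Columns.Add("No");
        table.Columns.Add("InvoiceNum");
        table.Columns.Add("Odo");
        table.Columns.Add("PumpVal");
        table.Columns.Add("Quantity");

        int rowCounter = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // 741 + 6 = 747: the furthest field must fit, otherwise skip the line (assumption).
            if (line.Length < 747) continue;

            DataRow row = table.NewRow();
            row["No"] = (++rowCounter).ToString();
            row["InvoiceNum"] = line.Substring(741, 6);
            row["Odo"] = line.Substring(499, 6);
            row["PumpVal"] = line.Substring(609, 7);
            row["Quantity"] = line.Substring(660, 6);
            table.Rows.Add(row);
        }
        return table;
    }
}
```

With a file, this would be called as InvoiceLoader.Load(new StreamReader(path2)) inside a using block.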
失去的东西太少 2024-12-29 14:41:54

I've created a class called AdvancedStreamReader in my Helpers project on GitHub:

https://github.com/jsmunroe/Helpers/blob/master/Helpers/IO/AdvancedStreamReader.cs

It is fairly robust. It is a subclass of StreamReader and keeps all of that functionality intact. There are a few caveats: a) it resets the position of the stream when it is constructed; b) you should not seek the BaseStream while you are using the reader; c) you need to specify the newline character type if it differs from the environment and the file can only use one type. Here are some unit tests to demonstrate how it is used.

    [TestMethod]
    public void ReadLineWithNewLineOnly()
    {
        // Setup
        var text = $"ƒun ‼Æ¢ with åò☺ encoding!\nƒun ‼Æ¢ with åò☺ encoding!\nƒun ‼Æ¢ with åò☺ encoding!\nHa!";
        var bytes = Encoding.UTF8.GetBytes(text);
        var stream = new MemoryStream(bytes);
        var reader = new AdvancedStreamReader(stream, NewLineType.Nl);
        reader.ReadLine();

        // Execute
        var result = reader.ReadLine();

        // Assert
        Assert.AreEqual("ƒun ‼Æ¢ with åò☺ encoding!", result);
        Assert.AreEqual(54, reader.CharacterPosition);
    }


    [TestMethod]
    public void SeekCharacterWithUtf8()
    {
        // Setup
        var text = $"ƒun ‼Æ¢ with åò☺ encoding!{NL}ƒun ‼Æ¢ with åò☺ encoding!{NL}ƒun ‼Æ¢ with åò☺ encoding!{NL}Ha!";
        var bytes = Encoding.UTF8.GetBytes(text);
        var stream = new MemoryStream(bytes);
        var reader = new AdvancedStreamReader(stream);

        // Pre-condition assert
        Assert.IsTrue(bytes.Length > text.Length); // More bytes than characters in sample text.

        // Execute
        reader.SeekCharacter(84);

        // Assert
        Assert.AreEqual(84, reader.CharacterPosition);
        Assert.AreEqual($"Ha!", reader.ReadToEnd());
    }

I wrote this for my own use, but I hope it will help other people.
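
The reason character positions need this kind of tracking is that in UTF-8 a character index and a byte offset diverge as soon as the text contains non-ASCII characters. The sketch below illustrates that with the standard Encoding API only; it does not use or describe AdvancedStreamReader, and the names Utf8Offsets and ByteOffsetOf are hypothetical.

```csharp
using System.Text;

public static class Utf8Offsets
{
    // Hypothetical helper: byte offset at which the character at `charIndex`
    // starts when `text` is encoded as UTF-8 without a byte order mark.
    // `charIndex` counts UTF-16 code units and should not split a surrogate pair.
    public static int ByteOffsetOf(string text, int charIndex)
    {
        return Encoding.UTF8.GetByteCount(text.ToCharArray(), 0, charIndex);
    }
}
```

For plain ASCII text the two values are equal, which is why seeking on the underlying stream works for simple files but not in general.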

烟柳画桥 2024-12-29 14:41:54

The second argument to Read is the index in the array c at which writing starts, not a position in the file. The array has no index 395; its highest index is 19.
I would suggest something like this:

StreamReader r;
...
string allFile = r.ReadToEnd();
int offset = 395;
int length = 20;

And then use

allFile.Substring(offset, length)
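
If the file is too large to load with ReadToEnd, the same character range can be obtained by reading and discarding the first characters. This is a minimal sketch under that assumption; CharRangeReader and ReadChars are hypothetical names. It also shows the intended use of the index argument of Read, which always refers to the buffer.

```csharp
using System;
using System.IO;

public static class CharRangeReader
{
    // Hypothetical helper: skips `offset` characters, then returns up to `length` characters.
    public static string ReadChars(TextReader reader, int offset, int length)
    {
        char[] skip = new char[4096];
        int remaining = offset;
        while (remaining > 0)
        {
            // Read into the start of the scratch buffer and throw the data away.
            int n = reader.Read(skip, 0, Math.Min(skip.Length, remaining));
            if (n == 0) return string.Empty; // input is shorter than `offset`
            remaining -= n;
        }

        char[] c = new char[length];
        int total = 0;
        while (total < length)
        {
            // `total` is an index into c, which is why it starts at 0 and never at 395.
            int n = reader.Read(c, total, length - total);
            if (n == 0) break; // end of input
            total += n;
        }
        return new string(c, 0, total);
    }
}
```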