Reading a file at a certain rate in Java

Posted on 2024-07-19 19:55:58

Is there an article/algorithm on how I can read a long file at a certain rate?

Say I do not want to exceed 10 KB/sec while issuing reads.

Comments (6)

雨后咖啡店 2024-07-26 19:55:58

A simple solution is to create a ThrottledInputStream.

It should be used like this:

        final InputStream slowIS = new ThrottledInputStream(new BufferedInputStream(new FileInputStream("c:\\file.txt"), 8000), 300);

300 is the rate in kilobytes per second. 8000 is the buffer size, in bytes, of the BufferedInputStream.

This should of course be generalized by implementing read(byte b[], int off, int len), which will spare you a ton of System.currentTimeMillis() calls. As written, System.currentTimeMillis() is called for every byte read, which can cause a bit of overhead. It should also be possible to store the number of bytes that can safely be read without calling System.currentTimeMillis().

Be sure to put a BufferedInputStream in between; otherwise the FileInputStream will be polled one byte at a time rather than in blocks. This will reduce the CPU load from 10% to almost 0. You risk exceeding the data rate by up to the number of bytes in the block.

import java.io.IOException;
import java.io.InputStream;

public class ThrottledInputStream extends InputStream {
    private final InputStream rawStream;
    private long totalBytesRead;
    private long startTimeMillis;

    private static final int BYTES_PER_KILOBYTE = 1024;
    private static final int MILLIS_PER_SECOND = 1000;
    private final double ratePerMillis; // bytes allowed per millisecond

    public ThrottledInputStream(InputStream rawStream, int kBytesPersecond) {
        this.rawStream = rawStream;
        // floating-point division, so low rates are not truncated by integer arithmetic
        ratePerMillis = (double) kBytesPersecond * BYTES_PER_KILOBYTE / MILLIS_PER_SECOND;
    }

    @Override
    public int read() throws IOException {
        if (startTimeMillis == 0) {
            startTimeMillis = System.currentTimeMillis();
        }
        long interval = System.currentTimeMillis() - startTimeMillis;
        // see if we are too fast
        if (interval * ratePerMillis < totalBytesRead + 1) { // +1 because we are reading 1 byte
            // time at which (totalBytesRead + 1) bytes become allowed, minus the time already elapsed
            final long sleepTime = (long) Math.ceil((totalBytesRead + 1) / ratePerMillis) - interval;
            try {
                Thread.sleep(Math.max(1, sleepTime));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
            }
        }
        final int b = rawStream.read();
        if (b != -1) {
            totalBytesRead += 1; // do not count the end-of-stream marker
        }
        return b;
    }
}
倾其所爱 2024-07-26 19:55:58

The crude solution is just to read a chunk at a time and then sleep, e.g. read 10 KB and then sleep for a second. But the first question I have to ask is: why? There are a couple of likely answers:

  1. You don't want to create work faster than it can be done; or
  2. You don't want to create too great a load on the system.

My suggestion is not to control it at the read level. That's kind of messy and inaccurate. Instead control it at the work end. Java has lots of great concurrency tools to deal with this. There are a few alternative ways of doing this.

I tend to like using a producer-consumer pattern for solving this kind of problem. It gives you good options for monitoring progress, for example by having a reporting thread, and it can be a really clean solution.

Something like an ArrayBlockingQueue can be used for the kind of throttling needed for both (1) and (2). With a limited capacity, the reader will eventually block when the queue is full, so it won't fill up too fast. The workers (consumers) can be limited to a fixed pace, which also throttles the rate and so covers (2).
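
A minimal sketch of the bounded-queue idea, under some assumptions: the class name BoundedPipeline, its parameters and the empty-array end marker are hypothetical and do not come from the answer. The reader thread blocks in put() once the queue is full, so it can never get further ahead of the consumer than the queue capacity.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical helper: names and sizes are illustrative assumptions.
final class BoundedPipeline {
    private static final byte[] EOF = new byte[0]; // sentinel marking end of input

    /** Reads in on a producer thread; returns the number of bytes the consumer handled. */
    static long run(InputStream in, int chunkSize, int capacity, long consumerDelayMillis)
            throws InterruptedException {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(capacity);

        Thread producer = new Thread(() -> {
            try {
                byte[] buf = new byte[chunkSize];
                int n;
                try {
                    while ((n = in.read(buf)) != -1) {
                        if (n > 0) {
                            queue.put(Arrays.copyOf(buf, n)); // blocks while the queue is full
                        }
                    }
                } catch (IOException e) {
                    // sketch only: a read error is treated as end of input
                }
                queue.put(EOF);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop quietly if interrupted
            }
        });
        producer.setDaemon(true);
        producer.start();

        long consumed = 0;
        for (byte[] chunk = queue.take(); chunk != EOF; chunk = queue.take()) {
            consumed += chunk.length;          // stand-in for the real work
            Thread.sleep(consumerDelayMillis); // the consumer's pace sets the overall rate
        }
        producer.join();
        return consumed;
    }
}
```

With 10 KB chunks and a one-second consumer delay, for instance, the reader would settle at roughly 10 KB/sec once the queue has filled.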

风渺 2024-07-26 19:55:58
  • while !EOF
    • store System.currentTimeMillis() + 1000 (1 sec) in a long variable
    • read a 10K buffer
    • check whether the stored time has passed
      • if it hasn't, Thread.sleep() for (stored time - current time)
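
The steps above can be sketched roughly as follows. The class name PacedCopy, the OutputStream destination and the parameters are assumptions made for illustration; the list only describes the read side.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper illustrating the loop; names and parameters are assumptions.
final class PacedCopy {
    /** Copies in to out, handing over at most chunkSize bytes per period. Returns bytes copied. */
    static long copy(InputStream in, OutputStream out, int chunkSize, long periodMillis)
            throws IOException, InterruptedException {
        byte[] buffer = new byte[chunkSize];
        long total = 0;
        while (true) {
            // earliest moment at which the next read may start
            long deadline = System.currentTimeMillis() + periodMillis;
            int n = in.read(buffer);
            if (n == -1) {
                return total; // EOF
            }
            out.write(buffer, 0, n);
            total += n;
            long remaining = deadline - System.currentTimeMillis();
            if (remaining > 0) {
                Thread.sleep(remaining); // wait out the rest of the period
            }
        }
    }
}
```

For the 10 KB/sec case from the question, a call would look like PacedCopy.copy(in, out, 10 * 1024, 1000).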

Creating a ThrottledInputStream that wraps another InputStream, as suggested, would be a nice solution.

苏辞 2024-07-26 19:55:58

If you have used Java I/O then you should be familiar with decorating streams. I suggest an InputStream subclass that takes another InputStream and throttles the flow rate. (You could subclass FileInputStream but that approach is highly error-prone and inflexible.)

Your exact implementation will depend upon your exact requirements. Generally you will want to note the time your last read returned (System.nanoTime). On the current read, after the underlying read, wait until sufficient time has passed for the amount of data transferred. A more sophisticated implementation may buffer and return (almost) immediately with only as much data as the rate dictates (be careful: a read should return a length of 0 only if the caller's buffer is of zero length).
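
A minimal sketch of the simpler variant described above, assuming a hypothetical class name PacedInputStream and a rate given in bytes per second. It extends FilterInputStream for brevity, notes when the previous read returned, and after each underlying read sleeps until enough time has passed for the bytes just transferred.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.util.concurrent.TimeUnit;

// Hypothetical decorator; the name and the bytes-per-second parameter are assumptions.
final class PacedInputStream extends FilterInputStream {
    private final long nanosPerByte;
    private long lastReturnNanos; // time the previous read returned

    PacedInputStream(InputStream in, long bytesPerSecond) {
        super(in);
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be positive");
        }
        this.nanosPerByte = TimeUnit.SECONDS.toNanos(1) / bytesPerSecond;
        this.lastReturnNanos = System.nanoTime();
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            pace(1);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len); // read first, then wait for what was transferred
        if (n > 0) {
            pace(n);
        }
        return n;
    }

    /** Sleeps until the time budget for the given number of bytes has elapsed. */
    private void pace(int bytes) throws IOException {
        long earliestReturn = lastReturnNanos + bytes * nanosPerByte;
        long waitNanos = earliestReturn - System.nanoTime();
        if (waitNanos > 0) {
            try {
                TimeUnit.NANOSECONDS.sleep(waitNanos);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new InterruptedIOException("interrupted while throttling");
            }
        }
        lastReturnNanos = System.nanoTime();
    }
}
```

Because the bulk read is overridden, the clock is consulted once per block rather than once per byte. The sketch does not throttle skip().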

仙女 2024-07-26 19:55:58

You can use a RateLimiter and write your own InputStream implementation whose read methods go through it. An example can be seen below:

import java.io.IOException;
import java.io.InputStream;
// RateLimiter is not a JDK class and the answer does not name its library;
// Guava's com.google.common.util.concurrent.RateLimiter offers a matching acquire(int).

public class InputStreamFlow extends InputStream {
    private final InputStream inputStream;
    private final RateLimiter maxBytesPerSecond;

    public InputStreamFlow(InputStream inputStream, RateLimiter limiter) {
        this.inputStream = inputStream;
        this.maxBytesPerSecond = limiter;
    }

    @Override
    public int read() throws IOException {
        maxBytesPerSecond.acquire(1); // one permit per byte
        return inputStream.read();
    }

    @Override
    public int read(byte[] b) throws IOException {
        // permits are taken for the requested length, even if fewer bytes arrive
        maxBytesPerSecond.acquire(b.length);
        return inputStream.read(b);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        maxBytesPerSecond.acquire(len);
        return inputStream.read(b, off, len);
    }
}

If you want to limit the flow to 1 MB/s, you can get the input stream like this:

final RateLimiter limiter = RateLimiter.create(RateLimiter.ONE_MB); 
final InputStreamFlow inputStreamFlow = new InputStreamFlow(originalInputStream, limiter);
她比我温柔 2024-07-26 19:55:58

It depends a little on whether you mean "don't exceed a certain rate" or "stay close to a certain rate."

If you mean "don't exceed", you can guarantee that with a simple loop:

 while not EOF do
    read a buffer
    Thread.sleep(time)
    write the buffer
 od

The amount of time to wait is a simple function of the size of the buffer; with the 10 KB/sec limit from the question, a buffer size of 10K bytes means you want to wait a second between reads.
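
As a small worked version of that function, here is a sketch with a hypothetical helper name (ThrottleMath.pauseMillis is an assumption, not part of the answer). It rounds up so that the average rate stays at or below the limit.

```java
// Hypothetical helper; the name and signature are assumptions for illustration.
final class ThrottleMath {
    /** Milliseconds to pause per buffer so the average rate stays at or below bytesPerSecond. */
    static long pauseMillis(int bufferBytes, long bytesPerSecond) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be positive");
        }
        // ceiling division: rounding down would let the rate creep above the limit
        return (bufferBytes * 1000L + bytesPerSecond - 1) / bytesPerSecond;
    }
}
```

For a 10 * 1024 byte buffer at 10 * 1024 bytes per second this gives 1000 ms, matching the one-second wait mentioned above.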

If you want to get closer than that, you probably need to use a timer.

  • create a Runnable to do the reading
  • create a Timer with a TimerTask to do the reading
  • schedule the TimerTask n times a second.
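
A rough sketch of the timer variant, using java.util.Timer with scheduleAtFixedRate. The class name TimedReader and its parameters are assumptions; the TimerTask itself does the reading, so no separate Runnable is shown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper; names and parameters are assumptions for illustration.
final class TimedReader {
    /** Reads up to chunkSize bytes every periodMillis until EOF; returns total bytes read. */
    static long readAll(InputStream in, int chunkSize, long periodMillis)
            throws InterruptedException {
        Timer timer = new Timer("timed-reader", true); // daemon timer thread
        CountDownLatch done = new CountDownLatch(1);
        AtomicLong total = new AtomicLong();
        byte[] buffer = new byte[chunkSize];

        timer.scheduleAtFixedRate(new TimerTask() {
            @Override
            public void run() {
                try {
                    int n = in.read(buffer);
                    if (n == -1) {          // EOF: stop the task and release the caller
                        cancel();
                        done.countDown();
                    } else {
                        total.addAndGet(n); // a real task would hand the data on here
                    }
                } catch (IOException e) {
                    cancel();               // sketch only: stop on a read error
                    done.countDown();
                }
            }
        }, 0, periodMillis);

        done.await();
        timer.cancel();
        return total.get();
    }
}
```

Fixed-rate scheduling runs delayed executions in quick succession to catch up, which keeps the long-run average close to the target but allows short bursts; Timer.schedule (fixed delay) is the stricter choice if the rate must never be exceeded.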

If you're concerned about the speed at which you're passing the data on to something else, instead of controlling the read, put the data into a data structure like a queue or circular buffer, and control the other end; send data periodically. You need to be careful with that, though, depending on the data set size and such, because you can run into memory limitations if the reader is very much faster than the writer.
