Node.js: how to read a large file synchronously, line by line?

Posted on 2024-12-06 22:26:30

I have a large UTF-8 file. I know fs.createReadStream can create a stream to read a large file, but it is not synchronous. So I tried fs.readSync, but the text I read comes out broken, like "迈�".

var fs = require('fs');
var util = require('util');
var textPath = __dirname + '/people-daily.txt';   
var fd = fs.openSync(textPath, "r");
var text = fs.readSync(fd, 4, 0, "utf8");
console.log(util.inspect(text, true, null));

Comments (6)

笨死的猪 2024-12-13 22:26:30

For large files, readFileSync can be inconvenient, as it loads the whole file in memory. A different synchronous approach is to iteratively call readSync, reading small bits of data at a time, and processing the lines as they come. The following bit of code implements this approach and synchronously processes one line at a time from the file 'test.txt':

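var fs = require('fs');
var StringDecoder = require('string_decoder').StringDecoder;
var filename = 'test.txt';

var fd = fs.openSync(filename, 'r');
var bufferSize = 1024;
var buffer = Buffer.alloc(bufferSize); // new Buffer(size) is deprecated
// The decoder holds back a multi-byte UTF-8 character that is cut off at the end of a read
var decoder = new StringDecoder('utf8');

var leftOver = '';
var read, line, idxStart, idx;
while ((read = fs.readSync(fd, buffer, 0, bufferSize, null)) !== 0) {
  leftOver += decoder.write(buffer.subarray(0, read));
  idxStart = 0;
  while ((idx = leftOver.indexOf("\n", idxStart)) !== -1) {
    line = leftOver.substring(idxStart, idx);
    console.log("one line read: " + line);
    idxStart = idx + 1;
  }
  leftOver = leftOver.substring(idxStart);
}
leftOver += decoder.end();
if (leftOver) {
  // last line when the file does not end with a newline
  console.log("one line read: " + leftOver);
}
fs.closeSync(fd);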
风渺 2024-12-13 22:26:30

Use https://github.com/nacholibre/node-readlines

var lineByLine = require('n-readlines');
var liner = new lineByLine('./textFile.txt');

var line;
var lineNumber = 0;
// next() returns each line as a Buffer, and false once the file is exhausted
while (line = liner.next()) {
    // decode as utf8 (not ascii) so multi-byte characters survive
    console.log('Line ' + lineNumber + ': ' + line.toString('utf8'));
    lineNumber++;
}

console.log('end of file reached');
执笔绘流年 2024-12-13 22:26:30

Use readFileSync:

fs.readFileSync(filename, [encoding]): synchronous version of fs.readFile. Returns the contents of the file.

If encoding is specified, this function returns a string. Otherwise it returns a buffer.

On a side note, since you are using Node, I'd recommend using the asynchronous functions.
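
For a file that fits comfortably in memory, this approach might look like the sketch below. The helper name readLinesInMemory, the BOM check, and the CRLF handling are assumptions added for illustration; they are not part of the answer.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: read the whole file as UTF-8 and split it into lines.
// Only suitable when the file fits in memory.
function readLinesInMemory(filename) {
  let text = fs.readFileSync(filename, 'utf8');
  if (text.charCodeAt(0) === 0xfeff) {
    text = text.slice(1); // drop a leading byte order mark
  }
  const lines = text.split(/\r?\n/);
  if (lines[lines.length - 1] === '') {
    lines.pop(); // a trailing newline does not start another line
  }
  return lines;
}

// Usage with a throwaway file in the OS temp directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'lines-'));
const demoFile = path.join(demoDir, 'demo.txt');
fs.writeFileSync(demoFile, '迈向\r\nsecond line\n', 'utf8');
const demoLines = readLinesInMemory(demoFile);
console.log(demoLines);
fs.rmSync(demoDir, { recursive: true, force: true });
```

Because the whole file is decoded in one step, no character can be cut in half, but memory use grows with file size.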

温折酒 2024-12-13 22:26:30

I built a simpler version of JB Kohn's answer that uses split() on the decoded buffer contents. It works on the larger files I tried.

var fs = require('fs');
var StringDecoder = require('string_decoder').StringDecoder;

/*
 * Synchronously call fn(text, lineNum) on each line read from file descriptor fd.
 */
function forEachLine (fd, fn) {
    var bufSize = 64 * 1024;
    var buf = Buffer.alloc(bufSize);            // new Buffer(size) is deprecated
    var decoder = new StringDecoder('utf8');    // keeps multi-byte characters split across reads intact
    var leftOver = '';
    var lineNum = 0;
    var lines, n;

    while ((n = fs.readSync(fd, buf, 0, bufSize, null)) !== 0) {
        lines = decoder.write(buf.subarray(0, n)).split('\n');
        lines[0] = leftOver + lines[0];     // add leftover string from previous read
        while (lines.length > 1) {          // process all but the last line
            fn(lines.shift(), lineNum);
            lineNum++;
        }
        leftOver = lines.shift();           // save last line fragment (may be '')
    }
    leftOver += decoder.end();              // flush any bytes the decoder still holds
    if (leftOver) {                         // process any remaining line
        fn(leftOver, lineNum);
    }
}
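
If a callback is inconvenient, the same loop can be wrapped in a generator so the caller uses a plain for...of loop while everything stays synchronous. This is only a sketch: the name readLinesSync, the default chunk size, and the trimming of a trailing \r are illustrative assumptions, not part of the answer above.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: lazily yield each line of a UTF-8 file, reading synchronously.
function* readLinesSync(filename, chunkSize = 64 * 1024) {
  const fd = fs.openSync(filename, 'r');
  const buf = Buffer.alloc(chunkSize);
  const decoder = new StringDecoder('utf8'); // keeps split multi-byte characters intact
  let leftOver = '';
  try {
    let n;
    while ((n = fs.readSync(fd, buf, 0, chunkSize, null)) !== 0) {
      const lines = (leftOver + decoder.write(buf.subarray(0, n))).split('\n');
      leftOver = lines.pop(); // last piece may be an unfinished line
      for (const line of lines) {
        yield line.replace(/\r$/, '');
      }
    }
    leftOver += decoder.end();
    if (leftOver) {
      yield leftOver.replace(/\r$/, '');
    }
  } finally {
    fs.closeSync(fd); // also runs if the caller breaks out of the loop early
  }
}

// Usage with a throwaway file; the tiny chunk size forces characters to be split across reads.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'lines-'));
const demoFile = path.join(demoDir, 'demo.txt');
fs.writeFileSync(demoFile, '迈向未来\nline two\r\nlast', 'utf8');
const collected = [];
for (const line of readLinesSync(demoFile, 4)) {
  collected.push(line);
}
console.log(collected);
fs.rmSync(demoDir, { recursive: true, force: true });
```

The try/finally block closes the file descriptor even when the consumer stops iterating before the end of the file.
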
断舍离 2024-12-13 22:26:30

Two potential problems:

  1. You did not skip the 3-byte BOM at the beginning of the file.
  2. The first 4 bytes may not decode into whole UTF-8 characters (UTF-8 is not a fixed-length encoding).
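
Both points can be reproduced in a few lines. The sketch below uses a made-up sample buffer (a UTF-8 BOM followed by two 3-byte characters) and a hypothetical helper named decodeInChunks. It shows that decoding a 4-byte slice leaves a replacement character, and that skipping the BOM and decoding through a StringDecoder avoids it.

```javascript
const { StringDecoder } = require('string_decoder');

// Hypothetical sample: a UTF-8 BOM followed by two 3-byte characters (9 bytes in total).
const bom = Buffer.from([0xef, 0xbb, 0xbf]);
const data = Buffer.concat([bom, Buffer.from('迈向', 'utf8')]);

// Decoding only the first 4 bytes cuts the first character after its first byte,
// so the result contains U+FFFD (the replacement character).
const broken = data.subarray(0, 4).toString('utf8');
console.log(JSON.stringify(broken));

// Skip the BOM if present, then feed fixed-size chunks through a StringDecoder,
// which holds back incomplete multi-byte sequences until the rest arrives.
function decodeInChunks(buf, chunkSize) {
  const hasBom = buf.length >= 3 && buf[0] === 0xef && buf[1] === 0xbb && buf[2] === 0xbf;
  const decoder = new StringDecoder('utf8');
  let text = '';
  for (let i = hasBom ? 3 : 0; i < buf.length; i += chunkSize) {
    text += decoder.write(buf.subarray(i, Math.min(i + chunkSize, buf.length)));
  }
  return text + decoder.end();
}

const fixed = decodeInChunks(data, 4);
console.log(fixed);
```

The same decoder can sit between fs.readSync and any line-splitting logic, since it only changes how bytes become strings.
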
青萝楚歌 2024-12-13 22:26:30

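Node 11 added async iterator support to readline, which covers this case (asynchronously rather than synchronously). See a related answer.

Long story short:

const fs = require('fs')
const readline = require('readline')

// for await must run inside an async function in a CommonJS module.
// processFile is a hypothetical wrapper name; processLine is supplied by the caller.
async function processFile(filename, processLine) {
    const rl = readline.createInterface({
        input: fs.createReadStream(filename),
        crlfDelay: Infinity   // treat \r\n as a single line break
    })

    for await (const line of rl) {
        await processLine(line)
    }
}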