从 C 语言的文件中读取特定行数(scanf、fseek、fgets)

发布于 2024-09-27 17:36:40 字数 1904 浏览 5 评论 0原文

我有一个进程主机,它生成 N 个子进程,通过未命名管道与父进程进行通信。我必须能够:

  • 让父亲打开文件,然后向每个孩子发送一个结构,告诉它必须从 min 行读取到 ma​​x 行;
  • 这将同时发生,所以我不知道:
  • 第一个如何划分 N 个地图的总行数,
  • 第二个如何让每个孩子只读取它应该读取的行?

我的问题不涉及操作系统概念,只涉及文件操作:S

也许是 fseek?我无法 mmap 日志文件(有些日志文件超过 1GB)。

我会很感激一些想法。提前谢谢您

编辑:我试图让孩子们在不使用 fseek 和块值的情况下阅读相应的行,所以,有人可以告诉我这是否有效吗? :

//somewhere in the parent process:

    FILE* logFile = fopen(filename, "r");
                while (fgets(line, 1024, logFile) != NULL) {
                    num_lines++;
                }
                rewind(logFile);
                int prev = 0;
                for (i = 0; i < maps_nr; i++) {
                    struct send_to_Map request;
                    request.fp = logFile;
                    request.lower = lowLimit;
                    request.upper = highLimit;

                    if (i == 0)
                        request.minLine = 0;
                    else
                        request.minLine = 1 + prev;
                    if(i!=maps_nr-1)
                        request.maxLine = (request.minLine + num_lines / maps_nr) - 1;
                    else
                       request.maxLine = (request.minLine + num_lines / maps_nr)+(num_lines%maps_nr);
                    prev = request.maxLine;

                }
                //write this structure to respective pipe


//child process:

while(1) {
      ...
      //reads the structure to pipe (and knows which lines to read)
      int n=0, counter=0;
      while (fgets(line, 1024, logFile) != NULL){
         if (n>=minLine and n<=maxLine)
              counter+= process(Line);//returns 1 if IP was found, in that line, between the low and high limit
         n++; 
      }
     //(...) 
}

不知道能不能成功,只是想成功!即使这样,是否有可能优于读取整个文件并打印日志文件中找到的 ip 总数的单个进程?

I have a process master that spawns N child processes that communicate with the parent through unnamed pipes. I must be able to:

  • make the father open the file and then send, to each child, a struct telling that it has to read from min to max line;
  • this is going to happen at the same time, so I don't know:
  • 1st how to divide total_lines for N maps and
  • 2nd how do I make each child read just the lines it is supposed to?

My problem does not concern the O.S. concepts, only the file operations :S

Perhaps fseek? I can't mmap the log file (some have more than 1GB).

I would appreciate some ideas. Thank you in advance

EDIT: I'm trying to make the children read the respective lines without using fseek and the value of chunks, so, could someone please tell me if this is valid? :

//somewhere in the parent process:

    FILE* logFile = fopen(filename, "r");
                while (fgets(line, 1024, logFile) != NULL) {
                    num_lines++;
                }
                rewind(logFile);
                int prev = 0;
                for (i = 0; i < maps_nr; i++) {
                    struct send_to_Map request;
                    request.fp = logFile;
                    request.lower = lowLimit;
                    request.upper = highLimit;

                    if (i == 0)
                        request.minLine = 0;
                    else
                        request.minLine = 1 + prev;
                    if(i!=maps_nr-1)
                        request.maxLine = (request.minLine + num_lines / maps_nr) - 1;
                    else
                       request.maxLine = (request.minLine + num_lines / maps_nr)+(num_lines%maps_nr);
                    prev = request.maxLine;

                }
                //write this structure to respective pipe


//child process:

while(1) {
      ...
      //reads the structure to pipe (and knows which lines to read)
      int n=0, counter=0;
      while (fgets(line, 1024, logFile) != NULL){
         if (n>=minLine and n<=maxLine)
              counter+= process(Line);//returns 1 if IP was found, in that line, between the low and high limit
         n++; 
      }
     //(...) 
}

I don't know if it's going to work, I just to make it work! Even like this, is it possible to outperform a single process reading the whole file and printing the total number of ips found in the log file?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

傾城如夢未必闌珊 2024-10-04 17:36:40

如果您不关心精确均匀地划分文件,并且行长度的分布在整个文件中有些均匀,则可以避免在父级中读取整个文件一次。

  1. 获取文件大小。
  2. chunk_size = file_size / number_of_children
  3. 当您在父级中生成每个子级时:
    • 寻求 (child_num+1) * chunk_size
    • 向前阅读,直到找到换行符。
    • 生成子级,告诉它从前一个块的末尾开始(或第一个子级为 0),以及该块的实际长度。
  4. 每个子进程都会寻求start并读取chunk_size字节。

这是该策略的粗略草图。

编辑以简化一些事情。

编辑:以下是下面第 3 步和第 4 步的一些未经测试的代码。这一切都未经测试,我也没有注意过差一错误,但它让您了解了 fseekftell 的用法,这听起来喜欢你正在寻找的东西。

// Assume FILE* f is open to the file, chunk_size is the average expected size,
// child_num is the id of the current child, spawn_child() is a function that
// handles the logic of spawning a child and telling it where to start reading,
// and how much to read. child_chunks[] is an array of structs to keep track of
// where the chunks start and how big they are.
if(fseek(f, child_num * chunk_size, SEEK_SET) < 0) { handle_error(); }
int ch;
while((ch = fgetc(f)) != FEOF && ch != '\n')
{/*empty*/}

// FIXME: needs to handle EOF properly.
child_chunks[child_num].end = ftell(f); // FIXME: needs error check.
child_chunks[child_num+1].start = child_chunks[child_num].end + 1;
spawn_child(child_num);

然后在您的子进程中(第 4 步),假设子进程有权访问 child_chunks[] 并知道其 child_num

void this_is_the_child(int child_num)
{
    /* ... */

    fseek(f, child_chunks[child_num].start, SEEK_SET); // FIXME: handle error
    while(fgets(...) && ftell(f) < child_chunks[child_num].end)
    {
    }
}

If you don't care about dividing the file exactly evenly, and the distribution of line lengths is somewhat even over the entire file, you can avoid reading the entire file in the parent once.

  1. Get the file size.
  2. chunk_size = file_size / number_of_children
  3. When you spawn each child do in the parent:
    • seek to (child_num+1) * chunk_size
    • Read forward until you find a newline.
    • Spawn the child, telling it to start at the end of the previous chunk (or 0 for the first child), and the actual length of the chunk.
  4. Each child seeks to start and reads chunk_size bytes.

That's a rough sketch of the strategy.

Edited to simplify things a bit.

Edit: here's some untested code for step 3, and step 4 below. This is all untested, and I haven't been careful about off-by-one errors, but it gives you an idea of the usage of fseek and ftell, which sounds like what you are looking for.

// Assume FILE* f is open to the file, chunk_size is the average expected size,
// child_num is the id of the current child, spawn_child() is a function that
// handles the logic of spawning a child and telling it where to start reading,
// and how much to read. child_chunks[] is an array of structs to keep track of
// where the chunks start and how big they are.
if(fseek(f, child_num * chunk_size, SEEK_SET) < 0) { handle_error(); }
int ch;
while((ch = fgetc(f)) != FEOF && ch != '\n')
{/*empty*/}

// FIXME: needs to handle EOF properly.
child_chunks[child_num].end = ftell(f); // FIXME: needs error check.
child_chunks[child_num+1].start = child_chunks[child_num].end + 1;
spawn_child(child_num);

Then in your child (step 4), assume the child has access to child_chunks[] and knows its child_num:

void this_is_the_child(int child_num)
{
    /* ... */

    fseek(f, child_chunks[child_num].start, SEEK_SET); // FIXME: handle error
    while(fgets(...) && ftell(f) < child_chunks[child_num].end)
    {
    }
}
画中仙 2024-10-04 17:36:40
/* get an array with line-startpositions (file-offsets) */
fpos_t readLineBegins(FILE *f,fpos_t **begins)
{
  fpos_t ch=0, mark=0, num=0;
  *begins = 0;
  do {
    if( ch=='\n' )
    {
       *begins = realloc( *begins, ++num * sizeof(fpos_t) );
      (*begins)[num-1] = mark;
        mark = ftell(f);
    }
  } while( (ch=fgetc(f))!=EOF );

  if( mark<ftell(f) )
  {
    *begins = realloc( *begins, ++num * sizeof(fpos_t) );
    (*begins)[num-1]=mark;
  }

  return num;
}

/* output linenumber beg...end */
void workLineBlocks(FILE *f,fpos_t *begins,fpos_t beg,fpos_t end)
{
  while( beg<=end )
  {
    int ch;
    fsetpos( f, &begins[beg] ); /* set linestart-position */
    printf("%ld:", ++beg );
    while( (ch=fgetc(f))!=EOF && ch!='\n' && ch!='\r' )
      putchar(ch);
    puts("");
  }
}

main()
{
  FILE *f=fopen("file.txt","rb");
  fpos_t *lineBegins, /* Array with line-startpositions */
  lb = readLineBegins(f,&lineBegins); /* get number of lines */

  workLineBlocks(f,lineBegins,lb-2,lb-1); /* out last two lines */
  workLineBlocks(f,lineBegins,0,1); /* out first two lines */

  fclose(f);
  free(lineBegins);
}
/* get an array with line-startpositions (file-offsets) */
fpos_t readLineBegins(FILE *f,fpos_t **begins)
{
  fpos_t ch=0, mark=0, num=0;
  *begins = 0;
  do {
    if( ch=='\n' )
    {
       *begins = realloc( *begins, ++num * sizeof(fpos_t) );
      (*begins)[num-1] = mark;
        mark = ftell(f);
    }
  } while( (ch=fgetc(f))!=EOF );

  if( mark<ftell(f) )
  {
    *begins = realloc( *begins, ++num * sizeof(fpos_t) );
    (*begins)[num-1]=mark;
  }

  return num;
}

/* output linenumber beg...end */
void workLineBlocks(FILE *f,fpos_t *begins,fpos_t beg,fpos_t end)
{
  while( beg<=end )
  {
    int ch;
    fsetpos( f, &begins[beg] ); /* set linestart-position */
    printf("%ld:", ++beg );
    while( (ch=fgetc(f))!=EOF && ch!='\n' && ch!='\r' )
      putchar(ch);
    puts("");
  }
}

main()
{
  FILE *f=fopen("file.txt","rb");
  fpos_t *lineBegins, /* Array with line-startpositions */
  lb = readLineBegins(f,&lineBegins); /* get number of lines */

  workLineBlocks(f,lineBegins,lb-2,lb-1); /* out last two lines */
  workLineBlocks(f,lineBegins,0,1); /* out first two lines */

  fclose(f);
  free(lineBegins);
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文