How to know whether a file ends with a newline character

Posted on 2025-01-22 17:04:56 · 1328 characters · 0 views · 0 comments

I'm trying to append a line of the form "1 :1 :1 :1" to the end of a file. The file may or may not already end with a newline character, and I have to handle both cases. I came up with the following solution: go to the end of the file, move back by 1 character (the length of the newline character on Linux, as far as I know), and read that character. If it isn't a newline character, insert one and then insert the whole line; otherwise just insert the line. Here is that solution translated into C:

int insert_element(char filename[]){
    elements *elem;
    FILE *p,*test;
    size_t size = 0;
    char *buff=NULL;
    char c='\n';
    if((p = fopen(filename,"a"))!=NULL){
        if(test = fopen(filename,"a")){
            fseek(test,-1,SEEK_END );
            c= getc(test);
            if(c!='\n'){
                fprintf(test,"\n");
            }
        }
        fclose(test);
        p = fopen(filename,"a");
        fseek(p,0,SEEK_END);
        elem=(elements *)malloc(sizeof(elements));
        fflush(stdin);
        printf("\ninput the ID\n");
        scanf("%d",&elem->id);
        printf("input the adress \n");
        scanf("%s",elem->adr);
        printf("innput the type \n");
        scanf("%s",elem->type);
        printf("intput the mark \n");
        scanf("%s",elem->mark);
        fprintf(p,"%d :%s :%s :%s",elem->id,elem->adr,elem->type,elem->mark);
        free(elem);
        fflush(stdin);
        fclose(p);
   return 1;
   }else{
       printf("\nRrror while opening the file !\n");
       return 0;
   }
}

As you may notice, the whole program depends on the length of the newline character (1 character, "\n"), so I wonder whether there is a better way, in other words one that works on all operating systems.


疯狂的代价 2025-01-29 17:04:56

It seems you already understand the basics of appending to a file, so we just have to figure out whether the file already ends with a newline.

In a perfect world, you'd jump to the end of the file, back up one character, read that character, and see if it matches '\n'. Something like this:

FILE *f = fopen(filename, "r");
fseek(f, -1, SEEK_END);  /* this is a problem */
int c = fgetc(f);
fclose(f);
if (c != '\n') {
  /* we need to append a newline before the new content */
}

Though this will likely work on POSIX systems, it won't work on many others. The problem is rooted in the many different ways systems separate and/or terminate lines in text files. In C and C++, '\n' is a special value that tells the text mode output routines to do whatever needs to be done to insert a line break. Likewise, the text mode input routines will translate each line break to '\n' as they return the data read.

On POSIX systems (e.g., Linux), a line break is indicated by a line feed character (LF), which occupies a single byte in UTF-8 encoded text. So the compiler just defines '\n' to be a line feed character, and then the input and output routines don't have to do anything special in text mode.
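The text mode translation described above can be observed directly. The following is a minimal sketch, not a definitive test: the function name and the scratch file path passed to it are made up for illustration. It writes a single '\n' in text mode and then counts the bytes that actually landed in the file by reading it back in binary mode.

```c
#include <assert.h>
#include <stdio.h>

/* Writes a single '\n' to `path` in text mode, then reopens the file in
   binary mode and returns how many bytes were actually stored.
   Returns -1 on any I/O error. `path` is a hypothetical scratch file
   that will be overwritten. */
long bytes_stored_for_newline(const char *path) {
  assert(path != NULL);
  FILE *f = fopen(path, "w");          /* text mode */
  if (f == NULL) return -1;
  if (fputc('\n', f) == EOF) { fclose(f); return -1; }
  if (fclose(f) != 0) return -1;

  f = fopen(path, "rb");               /* binary mode: no translation */
  if (f == NULL) return -1;
  long count = 0;
  while (fgetc(f) != EOF) count++;     /* count the raw bytes */
  fclose(f);
  return count;
}
```

On a system that uses LF this should report 1 byte, and on a system that uses CR+LF it should report 2, even though the program wrote the same single character in both cases.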

On some older systems (like old MacOS and Amiga) a line break might be represented by a carriage return character (CR). Many IBM mainframes used a different character encoding called EBCDIC that doesn't have direct mappings for LF or CR, but it does have a special control character called next line (NL). There were even systems (like VMS, IIRC) that didn't use a stream model for text files but instead used variable length records to represent each line, so the line breaks themselves were implicit rather than marked by a specific control character.

Most of those are challenges you won't face on modern systems. Unicode added more line break conventions, but very little software supports them in a general way.

The remaining major line break convention is the combination CR+LF. What makes CR+LF challenging is that it's two control characters, but the C I/O functions have to make them appear to the programmer as though they are the single character '\n'. That's not a big deal when streaming text in or out. But it makes seeking within a file hard to define. And that brings us back to the problematic line:

fseek(f, -1, SEEK_END);

What does it mean to back up "one character" from the end on a system where line breaks are indicated by a two character sequence like CR+LF? Do we really want the I/O system to have to possibly scan the entire file in order for fseek (and ftell) to figure out how to make sense of the offset?

The C standards people punted. In text mode, the offset argument for fseek must be either 0 or a value returned by a previous call to ftell on the same stream (and in the latter case the origin must be SEEK_SET). So the problematic call, with a negative offset, isn't valid. (On POSIX systems, the invalid call to fseek will likely work, but the standard doesn't require it to.)
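To illustrate what a seek that stays within those rules looks like, here is a hedged sketch. The function name read_last_line and its parameters are made up for this example. It remembers the ftell value at the start of each line while reading forward in text mode, then seeks back to the last remembered position with SEEK_SET. It still has to read the whole file, which is the price of staying in text mode.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copies the last line of the text file `path` into `out` (at most
   out_size - 1 characters). Returns 0 on success, -1 on error or if the
   file is empty. The only non-zero offset passed to fseek is one that
   ftell returned earlier, used with SEEK_SET. */
int read_last_line(const char *path, char *out, size_t out_size) {
  assert(path != NULL && out != NULL && out_size > 1);
  FILE *f = fopen(path, "r");          /* text mode */
  if (f == NULL) return -1;

  char chunk[256];
  long line_start = -1;                /* ftell value at start of last line */
  int at_line_start = 1;               /* is the next chunk a new line? */
  for (;;) {
    long pos = ftell(f);
    if (pos < 0 || fgets(chunk, sizeof chunk, f) == NULL) break;
    if (at_line_start) line_start = pos;
    at_line_start = (strchr(chunk, '\n') != NULL);
  }

  int status = -1;
  if (line_start >= 0 && fseek(f, line_start, SEEK_SET) == 0 &&
      fgets(out, (int)out_size, f) != NULL) {
    status = 0;
  }
  fclose(f);
  return status;
}
```

Whether the copied line ends with '\n' then tells you whether the file ends with a line break, without ever passing fseek a computed offset.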

Also note that POSIX defines LF as a line terminator rather than a separator, so a non-empty text file that doesn't end with a '\n' should be uncommon (though it does happen).

For a more portable solution, we have two choices:

1. Read the entire file in text mode, remembering whether the most recent character you read was '\n'.
   This option is hugely inefficient, so unless you're going to do this only occasionally or only with short files, we can rule that out.
2. Open the file in binary mode, seek backwards a few bytes from the end, and then read to the end, remembering whether the last thing you read was a valid line break sequence.
   This might be a problem if our fseek doesn't support the SEEK_END origin when the file is opened in binary mode. Yep, the C standard says supporting that is optional. However, most implementations do support it, so we'll keep this option open.
   Since the file will be read in binary mode, the input routines aren't going to convert the platform's line break sequence to '\n'. We'll need a state machine to detect line break sequences that are more than one byte long.
   Let's make the simplifying assumption that a line break is either LF or CR+LF. In the latter case, we don't care about the CR, so we can simply back up one byte from the end and test whether it's LF.
   Oh, and we have to figure out what to do with an empty file.
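For comparison, the first option (reading the whole file in text mode) might look like the sketch below. The function name is made up for illustration, and treating an empty or missing file as "no line break needed" is an assumption. The binary mode approach follows after it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Returns true if the file is non-empty and its last character, as seen
   through text mode translation, is not '\n'. Returns false for an empty,
   missing, or unreadable file. Reads the whole file, so the cost grows
   with the file size. */
bool needs_line_break_text_scan(const char *path) {
  assert(path != NULL);
  FILE *f = fopen(path, "r");          /* text mode */
  if (f == NULL) return false;
  int last = '\n';                     /* empty file: nothing to terminate */
  int c;
  while ((c = fgetc(f)) != EOF) last = c;
  fclose(f);
  return last != '\n';
}
```

This never calls fseek at all, so it stays portable, but it touches every byte of the file on every call.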

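#include <stdbool.h>
#include <stdio.h>

/* Returns true if a line break must be written before appending a new line. */
bool NeedsLineBreak(const char *filename) {
  const int LINE_FEED = '\x0A';
  FILE *f = fopen(filename, "rb");  /* binary mode */
  if (f == NULL) return false;      /* missing or unreadable: nothing to terminate */
  const bool empty_file = fseek(f, 0, SEEK_END) == 0 && ftell(f) == 0;
  /* A non-empty file needs a line break unless its last byte is LF. */
  const bool result = !empty_file &&
    !(fseek(f, -1, SEEK_END) == 0 && fgetc(f) == LINE_FEED);
  fclose(f);
  return result;
}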
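A check like this still has to be combined with the append itself. The following is a sketch under stated assumptions rather than a definitive implementation: the names last_byte_is_not_lf and append_record are made up, the record format is the "id :adr :type :mark" shape from the question, and each record is written with its own trailing '\n' so the file always ends with a line terminator. Line breaks are assumed to be LF or CR+LF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: true if `path` is non-empty and its last byte is
   not LF (0x0A). Assumes line breaks are LF or CR+LF. */
static bool last_byte_is_not_lf(const char *path) {
  FILE *f = fopen(path, "rb");         /* binary mode */
  if (f == NULL) return false;         /* no file yet: nothing to terminate */
  bool result = false;
  if (fseek(f, 0, SEEK_END) == 0 && ftell(f) > 0) {
    result = !(fseek(f, -1, SEEK_END) == 0 && fgetc(f) == '\x0A');
  }
  fclose(f);
  return result;
}

/* Appends one "id :adr :type :mark" record as its own line.
   Returns 1 on success, 0 on failure (the convention used in the question). */
int append_record(const char *path, int id, const char *adr,
                  const char *type, const char *mark) {
  assert(path != NULL && adr != NULL && type != NULL && mark != NULL);
  const bool need_break = last_byte_is_not_lf(path);
  FILE *f = fopen(path, "a");          /* text mode append */
  if (f == NULL) return 0;
  int ok = 1;
  /* Terminate a dangling last line first, then write the new record. */
  if (need_break && fputc('\n', f) == EOF) ok = 0;
  if (ok && fprintf(f, "%d :%s :%s :%s\n", id, adr, type, mark) < 0) ok = 0;
  if (fclose(f) != 0) ok = 0;
  return ok;
}
```

The check runs in binary mode while the write happens in text mode, so the '\n' that gets written is still translated to the platform's own line break. Ending every record with '\n' also means the check should only find a missing line break when some other program wrote the file.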
