What is a null-terminated string?

Posted 2024-08-17 01:20:38, 109 characters, 11 views, 0 comments

How does it differ from std::string?

Comments (7)

软糖 2024-08-24 01:20:38

A "string" is really just an array of chars; a null-terminated string is one where a null character '\0' marks the end of the string (not necessarily the end of the array). All string literals in code (delimited by double quotes "") are automatically null-terminated by the compiler.

So for example, "hi" is the same as {'h', 'i', '\0'}.
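
A minimal C++ sketch of the point above. The helper names (literal_storage_size, literal_length, same_as_explicit_array) are made up for illustration; the sketch only shows that the literal "hi" occupies three chars while its string length is two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Number of chars the literal "hi" occupies, terminator included.
std::size_t literal_storage_size() {
    return sizeof("hi");            // 'h', 'i', '\0'
}

// Length as seen by the C library, which stops at the first '\0'.
std::size_t literal_length() {
    return std::strlen("hi");
}

// True if the literal and the hand-written array hold the same chars.
bool same_as_explicit_array() {
    const char explicit_array[] = {'h', 'i', '\0'};
    return sizeof(explicit_array) == sizeof("hi") &&
           std::memcmp(explicit_array, "hi", sizeof(explicit_array)) == 0;
}
```

The difference of one between sizeof and strlen is the terminator the compiler appended.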

绮烟 2024-08-24 01:20:38

A null-terminated string is a contiguous sequence of characters, the last one of which has the binary bit pattern all zeros. I'm not sure what you mean by a "usual string", but if you mean std::string, then a std::string is not required (until C++11) to be contiguous, and is not required to have a terminator. Also, a std::string's string data is always allocated and managed by the std::string object that contains it; for a null-terminated string, there is no such container, and you typically refer to and manage such strings using bare pointers.
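
To make the ownership difference concrete, here is a rough sketch; duplicate_c_string and duplicate_std_string are hypothetical helpers, not library functions. The first returns a bare pointer that the caller has to free, the second lets the std::string object manage its own storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Copy a null-terminated string by hand: the caller owns the result
// and must release it with delete[].
char* duplicate_c_string(const char* source) {
    std::size_t length = std::strlen(source);
    char* copy = new char[length + 1];      // +1 for the terminator
    std::memcpy(copy, source, length + 1);  // copies the '\0' as well
    return copy;
}

// With std::string the object allocates and frees its own storage.
std::string duplicate_std_string(const std::string& source) {
    return source;                          // the copy is managed automatically
}
```

Forgetting the delete[] on the first result leaks memory; the second has no such obligation.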

All of this should really be covered in any decent C++ textbook; I recommend getting hold of Accelerated C++, one of the best of them.

岁月流歌 2024-08-24 01:20:38

There are two main ways to represent a string:

1) A sequence of characters with an ASCII null (nul) character, 0, at the end. You can tell how long it is by searching for the terminator. This is called a null-terminated string, or sometimes nul-terminated.

2) A sequence of characters, plus a separate field (either an integer length, or a pointer to the end of the string), to tell you how long it is.
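
The two representations listed above can be sketched roughly as follows. CountedString, terminated_length and counted_length are illustrative names, not standard types or functions.

```cpp
#include <cassert>
#include <cstddef>

// Type 1: find the length by scanning for the terminator (O(n)).
std::size_t terminated_length(const char* text) {
    std::size_t count = 0;
    while (text[count] != '\0') {
        ++count;
    }
    return count;
}

// Type 2: the length travels with the characters (O(1) lookup).
struct CountedString {
    const char* data;
    std::size_t length;
};

std::size_t counted_length(const CountedString& text) {
    return text.length;
}
```

One consequence of the second form: a CountedString{"hello", 3} can refer to the prefix "hel" without copying or modifying the underlying chars, which a type 1 string cannot do.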

I'm not sure about "usual string", but what quite often happens is that when talking about a particular language, the word "string" is used to mean the standard representation for that language. So in Java, java.lang.String is a type 2 string, so that's what "string" means. In C, "string" probably means a type 1 string. The standard is quite verbose in order to be precise, but people always want to leave out what's "obvious".

In C++, unfortunately, both types are standard. std::string is a type 2 string[*], but standard library functions inherited from C operate on type 1 strings.
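
A small sketch of where the two conventions disagree in C++: a std::string may contain a null character in the middle, but a function inherited from C stops at it. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Build a std::string holding five chars, one of them a null in the middle.
std::string make_string_with_embedded_null() {
    return std::string("ab\0cd", 5);     // length given explicitly
}

// Length as tracked by std::string itself.
std::size_t counted_size(const std::string& text) {
    return text.size();
}

// Length as seen by a C function, which stops at the first '\0'.
std::size_t c_library_size(const std::string& text) {
    return std::strlen(text.c_str());
}
```

For the string built here, counted_size reports 5 while c_library_size reports 2.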

[*] Actually, std::string is often implemented as an array of characters, with a separate length field and a nul terminator. That's so that the c_str() function can be implemented without ever needing to copy or re-allocate the string data. I can't remember off-hand whether it's legal to implement std::string without storing a length field: the question is what complexity guarantees are required by the standard. For containers in general size() is recommended to be O(1), but isn't actually required to be. So even if it is legal, an implementation of std::string that just uses nul-terminators would be surprising.

清引 2024-08-24 01:20:38
'\0' is the ASCII character with code 0, also known as the null terminator, the null character, or NUL. In C it serves as a reserved character that signifies the end of a string. Many standard functions, such as strcpy, strlen, and strcmp, rely on this. Without NUL, another way to signal the end of a string would have been needed:

This allows the string to be any length with only the overhead of one
byte; the alternative of storing a count requires either a string
length limit of 255 or an overhead of more than one byte.

(from Wikipedia)
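
The trade-off in that quote can be illustrated with a sketch of the counted alternative. encode_with_byte_count is a made-up helper that stores a one-byte count in front of the chars, which is why it has to reject anything longer than 255.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Encode text as [count byte][chars...]. Returns false and leaves `out`
// untouched if the text does not fit in a one-byte count (over 255 chars).
bool encode_with_byte_count(const char* text, std::vector<std::uint8_t>& out) {
    std::size_t length = std::strlen(text);
    if (length > 255) {
        return false;
    }
    out.clear();
    out.push_back(static_cast<std::uint8_t>(length));
    out.insert(out.end(), text, text + length);
    return true;
}
```

For "hi" both forms cost three bytes (two chars plus either a terminator or a count); the difference only shows up once the text is longer than the count can express.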

C++ std::string follows the other convention (it stores the length separately). In the libstdc++ implementation shipped with GCC 4.4, its data is represented by a structure called _Rep:

// _Rep: string representation
//   Invariants:
//   1. String really contains _M_length + 1 characters: due to 21.3.4
//      must be kept null-terminated.
//   2. _M_capacity >= _M_length
//      Allocated memory is always (_M_capacity + 1) * sizeof(_CharT).
//   3. _M_refcount has three states:
//      -1: leaked, one reference, no ref-copies allowed, non-const.
//       0: one reference, non-const.
//     n>0: n + 1 references, operations require a lock, const.
//   4. All fields==0 is an empty string, given the extra storage
//      beyond-the-end for a null terminator; thus, the shared
//      empty string representation needs no constructor.

struct _Rep_base
{
  size_type    _M_length;
  size_type    _M_capacity;
  _Atomic_word _M_refcount;
};

struct _Rep : _Rep_base
{
  // Types:
  typedef typename _Alloc::template rebind<char>::other _Raw_bytes_alloc;

  // (Public) Data members:

  // The maximum number of individual char_type elements of an
  // individual string is determined by _S_max_size. This is the
  // value that will be returned by max_size().  (Whereas npos
  // is the maximum number of bytes the allocator can allocate.)
  // If one was to divvy up the theoretical largest size string,
  // with a terminating character and m _CharT elements, it'd
  // look like this:
  // npos = sizeof(_Rep) + (m * sizeof(_CharT)) + sizeof(_CharT)
  // Solving for m:
  // m = ((npos - sizeof(_Rep))/sizeof(CharT)) - 1
  // In addition, this implementation quarters this amount.
  static const size_type _S_max_size;
  static const _CharT    _S_terminal;

  // The following storage is init'd to 0 by the linker, resulting
  // (carefully) in an empty string with one reference.
  static size_type _S_empty_rep_storage[];

  static _Rep&
  _S_empty_rep()
  {
    // NB: Mild hack to avoid strict-aliasing warnings.  Note that
    // _S_empty_rep_storage is never modified and the punning should
    // be reasonably safe in this case.
    void* __p = reinterpret_cast<void*>(&_S_empty_rep_storage);
    return *reinterpret_cast<_Rep*>(__p);
  }

  bool
  _M_is_leaked() const
  { return this->_M_refcount < 0; }

  bool
  _M_is_shared() const
  { return this->_M_refcount > 0; }

  void
  _M_set_leaked()
  { this->_M_refcount = -1; }

  void
  _M_set_sharable()
  { this->_M_refcount = 0; }

  void
  _M_set_length_and_sharable(size_type __n)
  {
#ifndef _GLIBCXX_FULLY_DYNAMIC_STRING
    if (__builtin_expect(this != &_S_empty_rep(), false))
#endif
      {
        this->_M_set_sharable();  // One reference.
        this->_M_length = __n;
        traits_type::assign(this->_M_refdata()[__n], _S_terminal);
        // grrr. (per 21.3.4)
        // You cannot leave those LWG people alone for a second.
      }
  }

  _CharT*
  _M_refdata() throw()
  { return reinterpret_cast<_CharT*>(this + 1); }

  _CharT*
  _M_grab(const _Alloc& __alloc1, const _Alloc& __alloc2)
  {
    return (!_M_is_leaked() && __alloc1 == __alloc2)
            ? _M_refcopy() : _M_clone(__alloc1);
  }

  // Create & Destroy
  static _Rep*
  _S_create(size_type, size_type, const _Alloc&);

  void
  _M_dispose(const _Alloc& __a)
  {
#ifndef _GLIBCXX_FULLY_DYNAMIC_STRING
    if (__builtin_expect(this != &_S_empty_rep(), false))
#endif
      if (__gnu_cxx::__exchange_and_add_dispatch(&this->_M_refcount,
                                                 -1) <= 0)
        _M_destroy(__a);
  }  // XXX MT

  void
  _M_destroy(const _Alloc&) throw();

  _CharT*
  _M_refcopy() throw()
  {
#ifndef _GLIBCXX_FULLY_DYNAMIC_STRING
    if (__builtin_expect(this != &_S_empty_rep(), false))
#endif
      __gnu_cxx::__atomic_add_dispatch(&this->_M_refcount, 1);
    return _M_refdata();
  }  // XXX MT

  _CharT*
  _M_clone(const _Alloc&, size_type __res = 0);
};

The character data is stored immediately after this _Rep header, so the header can be recovered from the data pointer with:

_Rep* _M_rep() const
      { return &((reinterpret_cast<_Rep*> (_M_data()))[-1]); }

This code snippet comes from the file basic_string.h, which on my machine is located at usr/include/c++/4.4/bits/basic_string.h

So as you can see, the difference is significant.
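
As a rough illustration of that layout (not the library's actual code), the sketch below puts a small header directly in front of the chars and steps back from the char pointer to find it, in the spirit of _M_refdata() and _M_rep(). Header, create_string_block, header_of and destroy_string_block are invented names, the reference count is left out, and the pointer arithmetic relies on the same layout assumption the quoted code makes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Simplified stand-in for _Rep_base (no reference count).
struct Header {
    std::size_t length;
    std::size_t capacity;
};

// Allocate one block laid out as [Header][chars...]['\0'] and return
// a pointer to the chars, or nullptr if the allocation fails.
char* create_string_block(const char* text) {
    std::size_t length = std::strlen(text);
    void* raw = std::malloc(sizeof(Header) + length + 1);
    if (raw == nullptr) {
        return nullptr;
    }
    Header* header = new (raw) Header{length, length};
    char* data = reinterpret_cast<char*>(header + 1);  // like _M_refdata()
    std::memcpy(data, text, length + 1);               // copies the terminator too
    return data;
}

// Step back one Header from the char pointer, like _M_rep().
Header* header_of(char* data) {
    return reinterpret_cast<Header*>(data) - 1;
}

// Release a block obtained from create_string_block.
void destroy_string_block(char* data) {
    if (data != nullptr) {
        std::free(header_of(data));
    }
}
```

The chars stay null-terminated, so the same pointer can be handed to C functions, while the length is available in constant time from the header.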

孤独难免 2024-08-24 01:20:38

A null-terminated string means that the end of your string is defined through the occurrence of a null-char (all bits are zero).

"Other strings", for example, have to store their own length.

不知所踪 2024-08-24 01:20:38

A null-terminated string is the native string format in C. String literals, for example, are implemented as null-terminated. As a result, a whole lot of code (the C run-time library, to begin with) assumes that strings are null-terminated.
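
Because so much code assumes the terminator, a char array without one is not a string as far as the C library is concerned. The sketch below uses two hypothetical helpers: one checks whether a buffer contains a terminator within a known size, the other builds a std::string from an explicit length instead of searching for '\0'.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// True if a '\0' appears within the first `size` chars of `buffer`.
bool is_terminated(const char* buffer, std::size_t size) {
    return std::memchr(buffer, '\0', size) != nullptr;
}

// Build a std::string from a buffer that may lack a terminator by passing
// the length explicitly instead of letting the library search for '\0'.
std::string to_string_with_length(const char* buffer, std::size_t size) {
    return std::string(buffer, size);
}
```

An array declared as const char raw[2] = {'h', 'i'} fails the first check, so passing it to strlen would read past its end; converting it with an explicit length is safe.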

残疾 2024-08-24 01:20:38

A null-terminated string (C string) is an array of chars whose final element is the value 0x0. std::string is essentially a vector, in that it is an auto-resizing container for values. It does not need a null terminator to find its end, since it must keep track of its size to know when a resize is needed.
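
A short sketch of the auto-resizing behaviour; count_reallocations and c_str_is_terminated are illustrative helpers. The exact number of reallocations depends on the standard library in use, so none is stated here. The second helper shows that, even though std::string tracks its size, c_str() still hands out a terminated array.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append `count` chars one at a time and report how many times the
// capacity changed, i.e. how often the string reallocated.
std::size_t count_reallocations(std::size_t count) {
    std::string text;
    std::size_t reallocations = 0;
    std::size_t last_capacity = text.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        text.push_back('x');
        if (text.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = text.capacity();
        }
    }
    return reallocations;
}

// c_str() points at size() chars followed by a '\0'.
bool c_str_is_terminated(const std::string& text) {
    return text.c_str()[text.size()] == '\0';
}
```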

Honestly, I prefer C strings over std::string: they are used more in the basic libraries, the ones with minimal code and allocations, which are harder to use because of that.
