MySQL 中日期是如何编码/存储的？

发布于 01-15 12:58 字数 415 浏览 6 评论 0原文

我必须从 C++ 上的应用程序的数据库中获取的原始字节中解析日期。我发现MySQL中的日期是4个字节，最后两个分别是月份和日期。但是前两个字节奇怪地编码了年份，因此如果日期是2002-08-30，则内容将为210,15,8,31。如果日期是1996-12-22，则日期将存储为204 、15、12、22。显然，第一个字节不能大于255，所以我检查了2047年——它是255、15和2048——它是128、16。

一开始我以为关键是二进制运算，但我做到了不太明白逻辑：

2047: 0111 1111 1111
255:  0000 1111 1111
15:   0000 0000 1111

2048: 1000 0000 0000
128:  0000 1000 0000
16:   0000 0001 0000

有什么想法吗？

原文

I have to parse date from raw bytes I get from the database for my application on C++. I've found out that date in MySQL is 4 bytes and the last two are month and day respectively. But the first two bytes strangely encode the year, so if the date is 2002-08-30, the content will be 210, 15, 8, 31. If the date is 1996-12-22, the date will be stored as 204, 15, 12, 22.
Obviously, the first byte can't be bigger than 255, so I've checked year 2047 -- it's 255, 15, and 2048 -- it's 128, 16.

At first I thought that the key is binary operations, but I did not quite understand the logic:

2047: 0111 1111 1111
255:  0000 1111 1111
15:   0000 0000 1111

2048: 1000 0000 0000
128:  0000 1000 0000
16:   0000 0001 0000

Any idea?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

浅语花开2025-01-22 12:58:31

我们遇到了同样的问题，并开发了以下 C++20 辅助方法，用于与 mysqlx (MySQL Connector/C++ 8.0 X DevAPI) 一起进行生产，以正确读取 DATE、DATETIME 和 TIMESTAMP 字段：

#pragma once

#include <vector>
#include <cstddef>
#include <chrono>
#include <mysqlx/xdevapi.h>

namespace mysqlx {

static inline std::vector<uint64_t>
mysqlx_raw_as_u64_vector(const mysqlx::Value& in_value)
{
  std::vector<uint64_t> out;

  const auto bytes = in_value.getRawBytes();
  auto ptr = reinterpret_cast<const std::byte*>(bytes.first);
  auto end = reinterpret_cast<const std::byte*>(bytes.first) + bytes.second;

  while (ptr != end) {
    static constexpr std::byte carry_flag{0b1000'0000};
    static constexpr std::byte value_mask{0b0111'1111};

    uint64_t v = 0;
    uint64_t shift = 0;
    bool is_carry;
    do {
      auto byte = *ptr;
      is_carry = (byte & carry_flag) == carry_flag;
      v |= std::to_integer<uint64_t>(byte & value_mask) << shift;

      ++ptr;
      shift += 7;
    } while (is_carry && ptr != end && shift <= 63);

    out.push_back(v);
  }

  return out;
}

static inline std::chrono::year_month_day
read_date(const mysqlx::Value& value)
{
  const auto vector = mysqlx_raw_as_u64_vector(value);
  if (vector.size() < 3)
    throw std::out_of_range{"Value is not a valid DATE"};

  return std::chrono::year{static_cast<int>(vector.at(0))} / static_cast<int>(vector.at(1)) / static_cast<int>(vector.at(2));
}

static inline std::chrono::system_clock::time_point
read_date_time(const mysqlx::Value& value)
{
  const auto vector = mysqlx_raw_as_u64_vector(value);
  if (vector.size() < 3)
    throw std::out_of_range{"Value is not a valid DATETIME"};

  auto ymd = std::chrono::year{static_cast<int>(vector.at(0))} / static_cast<int>(vector.at(1)) / static_cast<int>(vector.at(2));
  auto sys_days = std::chrono::sys_days{ymd};

  auto out = std::chrono::system_clock::time_point(sys_days);

  auto it = vector.begin() + 2;
  auto end = vector.end();

  if (++it == end)
    return out;
  out += std::chrono::hours{*it};

  if (++it == end)
    return out;
  out += std::chrono::minutes{*it};

  if (++it == end)
    return out;
  out += std::chrono::seconds{*it};

  if (++it == end)
    return out;
  out += std::chrono::microseconds{*it};

  return out;
}

} //namespace

然后可以按如下方式使用：

auto row = table.select("datetime", "date").execute().fetchOne();
auto time_point = read_date_time(row[0]);
auto year_month_day = read_date(row[1]);

getBytes 文档链接到 ColumnMetaData 文档 url。

ColumnMetaData文档链接到protobuf编码url。

protobuf 编码 url / Protocol Buffers 文档文档说：

基础 128 变体
可变宽度整数或变体是连线的核心
格式。它们允许在任何地方使用无符号 64 位整数进行编码
1 到 10 个字节之间，较小的值使用较少的字节。
varint 中的每个字节都有一个连续位，指示是否
它后面的字节是 varint 的一部分。这是最
字节的有效位 (MSB)（有时也称为符号
少量）。低7位是有效负载；生成结果整数
通过将其组成字节的 7 位有效负载附加在一起。
例如，这里是数字 1，编码为 01 – 它是一个
字节，因此未设置 MSB：
<前><代码>0000 0001
^ 最高位
这里是 150，编码为 9601 – 这有点复杂：
<前><代码>10010110 00000001
^ 最高位 ^ 最高位
你怎么知道这是150？首先你从
每个字节，因为这只是告诉我们是否已经到达
数字的末尾（如您所见，它设置在第一个字节中，如下所示）
varint 中多于一个字节）。然后我们连接7位
有效负载，并将其解释为小尾数、64 位无符号
整数：
10010110 00000001 // 原始输入。
 0010110 0000001 // 删除连续位。
 0000001 0010110 // 放入小端顺序。
 10010110 // 连接。
 128 + 16 + 4 + 2 = 150 // 解释为整数。
因为在 protoscope 中，varints 对于协议缓冲区非常重要
语法中，我们将它们称为普通整数。 150 与 9601 相同。

We had the same issue and developed the following C++20 helper methods for production use with mysqlx (MySQL Connector/C++ 8.0 X DevAPI) to properly read DATE, DATETIME and TIMESTAMP fields:

#pragma once

#include <vector>
#include <cstddef>
#include <chrono>
#include <mysqlx/xdevapi.h>

namespace mysqlx {

static inline std::vector<uint64_t>
mysqlx_raw_as_u64_vector(const mysqlx::Value& in_value)
{
  std::vector<uint64_t> out;

  const auto bytes = in_value.getRawBytes();
  auto ptr = reinterpret_cast<const std::byte*>(bytes.first);
  auto end = reinterpret_cast<const std::byte*>(bytes.first) + bytes.second;

  while (ptr != end) {
    static constexpr std::byte carry_flag{0b1000'0000};
    static constexpr std::byte value_mask{0b0111'1111};

    uint64_t v = 0;
    uint64_t shift = 0;
    bool is_carry;
    do {
      auto byte = *ptr;
      is_carry = (byte & carry_flag) == carry_flag;
      v |= std::to_integer<uint64_t>(byte & value_mask) << shift;

      ++ptr;
      shift += 7;
    } while (is_carry && ptr != end && shift <= 63);

    out.push_back(v);
  }

  return out;
}

static inline std::chrono::year_month_day
read_date(const mysqlx::Value& value)
{
  const auto vector = mysqlx_raw_as_u64_vector(value);
  if (vector.size() < 3)
    throw std::out_of_range{"Value is not a valid DATE"};

  return std::chrono::year{static_cast<int>(vector.at(0))} / static_cast<int>(vector.at(1)) / static_cast<int>(vector.at(2));
}

static inline std::chrono::system_clock::time_point
read_date_time(const mysqlx::Value& value)
{
  const auto vector = mysqlx_raw_as_u64_vector(value);
  if (vector.size() < 3)
    throw std::out_of_range{"Value is not a valid DATETIME"};

  auto ymd = std::chrono::year{static_cast<int>(vector.at(0))} / static_cast<int>(vector.at(1)) / static_cast<int>(vector.at(2));
  auto sys_days = std::chrono::sys_days{ymd};

  auto out = std::chrono::system_clock::time_point(sys_days);

  auto it = vector.begin() + 2;
  auto end = vector.end();

  if (++it == end)
    return out;
  out += std::chrono::hours{*it};

  if (++it == end)
    return out;
  out += std::chrono::minutes{*it};

  if (++it == end)
    return out;
  out += std::chrono::seconds{*it};

  if (++it == end)
    return out;
  out += std::chrono::microseconds{*it};

  return out;
}

} //namespace

Which can then be used as follows:

auto row = table.select("datetime", "date").execute().fetchOne();
auto time_point = read_date_time(row[0]);
auto year_month_day = read_date(row[1]);

getBytes document links to ColumnMetaData document url.

ColumnMetaData document links to protobuf encoding url.

protobuf encoding url / Protocol Buffers Documentation Documentation say :

Base 128 Varints
Variable-width integers, or varints, are at the core of the wire
format. They allow encoding unsigned 64-bit integers using anywhere
between one and ten bytes, with small values using fewer bytes.
Each byte in the varint has a continuation bit that indicates if the
byte that follows it is part of the varint. This is the most
significant bit (MSB) of the byte (sometimes also called the sign
bit). The lower 7 bits are a payload; the resulting integer is built
by appending together the 7-bit payloads of its constituent bytes.
So, for example, here is the number 1, encoded as 01 – it’s a single
byte, so the MSB is not set:
0000 0001
^ msb
And here is 150, encoded as 9601 – this is a bit more complicated:
10010110 00000001
^ msb    ^ msb
How do you figure out that this is 150? First you drop the MSB from
each byte, as this is just there to tell us whether we’ve reached the
end of the number (as you can see, it’s set in the first byte as there
is more than one byte in the varint). Then we concatenate the 7-bit
payloads, and interpret it as a little-endian, 64-bit unsigned
integer:
10010110 00000001        // Original inputs.
 0010110  0000001        // Drop continuation bits.
 0000001  0010110        // Put into little-endian order.
 10010110                // Concatenate.
 128 + 16 + 4 + 2 = 150  // Interpret as integer.
Because varints are so crucial to protocol buffers, in protoscope
syntax, we refer to them as plain integers. 150 is the same as 9601.

回复收藏 0 原文

温折酒2025-01-22 12:58:31

根据您提供的信息，它似乎是N1 - 128 + N2 * 128。

回复收藏 0 原文

因为看清所以看轻2025-01-22 12:58:31

看来编码的逻辑是擦除第一个数字的最高有效位，然后从这个擦除的位写入第二个数字，如下所示：

2002 from 210 and 15:

1101 0010 -> _101 0010;
0000 1111 + _101 0010 -> 0111 1101 0010

2048 from 128 and 16:

1000 0000 -> _000 0000
0001 0000 + _000 0000 -> 1000 0000 0000

It seems that the logic of encoding is to erase the most significant bit of the first number and to write the second number from this erased bit like this:

2002 from 210 and 15:

1101 0010 -> _101 0010;
0000 1111 + _101 0010 -> 0111 1101 0010

2048 from 128 and 16:

1000 0000 -> _000 0000
0001 0000 + _000 0000 -> 1000 0000 0000

回复收藏 0 原文

下雨或天晴2025-01-22 12:58:31

哪个版本？？？

DATETIME过去以压缩十进制（8 个字节）编码。但是，当添加秒小数部分时，格式会更改为类似

长度指示（1 字节）
INT UNSIGNED 的秒数 - 自 1970 年以来（4 字节）
小数秒（0-3 字节）

DATE 的存储方式与 MEDIUMINT UNSIGNED （3 个字节）一样，为自 0000-00-00（或类似的时间）以来的天数。

你是如何获得“原始字节”的？没有任何功能可以让你这样做。 Select HEX(some-date) 首先转换为字符串（如“2022-03-22”），然后取其十六进制。这将为您提供 323032322D30332D3232。

回复收藏 0 原文

烟酒忠诚2025-01-22 12:58:31

关于代码请参考答案。

关于文档内容检查以下文字：

getBytes文档链接到ColumnMetaData 文档 URL。

ColumnMetaData文档链接到protobuf编码url。

protobuf 编码 url / Protocol Buffers 文档文档说：

基础 128 变体
可变宽度整数或变体是连线的核心
格式。它们允许在任何地方使用无符号 64 位整数进行编码
1 到 10 个字节之间，较小的值使用较少的字节。
varint 中的每个字节都有一个连续位，指示是否
它后面的字节是 varint 的一部分。这是最重要的
字节的有效位 (MSB)（有时也称为符号
少量）。低7位是有效负载；生成结果整数
通过将其组成字节的 7 位有效负载附加在一起。
例如，这里是数字 1，编码为 01 – 它是一个
字节，因此未设置 MSB：
<前><代码>0000 0001
^ 最高位
这里是 150，编码为 9601 – 这有点复杂：
<前><代码>10010110 00000001
^ 最高位 ^ 最高位
你怎么知道这是150？首先你从
每个字节，因为这只是告诉我们是否已经到达
数字的末尾（如您所见，它设置在第一个字节中，如下所示）
varint 中多于一个字节）。然后我们连接7位
有效负载，并将其解释为小尾数、64 位无符号
整数：
10010110 00000001 // 原始输入。
 0010110 0000001 // 删除连续位。
 0000001 0010110 // 放入小端顺序。
 10010110 // 连接。
 128 + 16 + 4 + 2 = 150 // 解释为整数。
因为在 protoscope 中，varints 对于协议缓冲区非常重要
语法中，我们将它们称为普通整数。 150 与 9601 相同。

varint 中的每个字节都有一个连续位，指示其后面的字节是否是 varint 的一部分。

About Code refrence the answer.

About document content check below words :

getBytes document links to ColumnMetaData document url.

ColumnMetaData document links to protobuf encoding url.

protobuf encoding url / Protocol Buffers Documentation Documentation say :

Base 128 Varints
Variable-width integers, or varints, are at the core of the wire
format. They allow encoding unsigned 64-bit integers using anywhere
between one and ten bytes, with small values using fewer bytes.
Each byte in the varint has a continuation bit that indicates if the
byte that follows it is part of the varint. This is the most
significant bit (MSB) of the byte (sometimes also called the sign
bit). The lower 7 bits are a payload; the resulting integer is built
by appending together the 7-bit payloads of its constituent bytes.
So, for example, here is the number 1, encoded as 01 – it’s a single
byte, so the MSB is not set:
0000 0001
^ msb
And here is 150, encoded as 9601 – this is a bit more complicated:
10010110 00000001
^ msb    ^ msb
How do you figure out that this is 150? First you drop the MSB from
each byte, as this is just there to tell us whether we’ve reached the
end of the number (as you can see, it’s set in the first byte as there
is more than one byte in the varint). Then we concatenate the 7-bit
payloads, and interpret it as a little-endian, 64-bit unsigned
integer:
10010110 00000001        // Original inputs.
 0010110  0000001        // Drop continuation bits.
 0000001  0010110        // Put into little-endian order.
 10010110                // Concatenate.
 128 + 16 + 4 + 2 = 150  // Interpret as integer.
Because varints are so crucial to protocol buffers, in protoscope
syntax, we refer to them as plain integers. 150 is the same as 9601.