How a compiler handles structs

Posted on 2022-08-30 01:26:06

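For example, given struct {short :2; short b :14;}c;, how does the compiler work out from the syntax tree that c needs two bytes of storage, and that those two bytes contain two bit-fields?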

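This is just one example. More generally, once the compiler has built the syntax tree, how does it process and lay out a struct variable?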


Comments (1)

青巷忧颜 2022-09-06 01:26:06

(I'm only passing this along.)

Structure with information about how a bitfield should be accessed.

Often we lay out a sequence of bitfields as a contiguous sequence of bits.
When the AST record layout does this, we represent it in the LLVM IR's type
as either a sequence of i8 members or a byte array to reserve the number of
bytes touched without forcing any particular alignment beyond the basic
character alignment.

Then accessing a particular bitfield involves converting this byte array
into a single integer of that size (i24 or i40 -- may not be power-of-two
size), loading it, and shifting and masking to extract the particular
subsequence of bits which make up that particular bitfield. This structure
encodes the information used to construct the extraction code sequences.
The CGRecordLayout also has a field index which encodes which byte-sequence
this bitfield falls within. Let's assume the following C struct:

struct S {
  char a, b, c;
  unsigned bits : 3;
  unsigned more_bits : 4;
  unsigned still_more_bits : 7;
};

This will end up as the following LLVM type. The fourth and fifth i8 members
hold the bitfields, and the trailing array is the padding out to a 4-byte alignment.

%t = type { i8, i8, i8, i8, i8, [3 x i8] }
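
The 5 data bytes and 3 padding bytes in that type can be reproduced with a small layout pass. The following is only a sketch under stated assumptions, not Clang's implementation: `struct field`, `struct layout` and `lay_out` are hypothetical names, every type is assumed to be aligned to its own size, and a bit-field is assumed not to straddle a storage unit of its declared type (the rule used by Itanium-style ABIs). Zero-width bit-fields, packing attributes and the special alignment rules for unnamed bit-fields are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical field descriptor for illustration; not a Clang data structure. */
struct field {
    unsigned unit_bits;   /* size of the declared type in bits (assumed == alignment) */
    unsigned width;       /* bit-field width; equals unit_bits for ordinary members */
    int      is_bitfield; /* nonzero for bit-fields */
};

struct layout {
    unsigned size_bytes;
    unsigned align_bytes;
};

/* Walk the fields in declaration order and assign each one a bit offset.
   Assumed rule: an ordinary member starts on a boundary of its own type;
   a bit-field is packed right after the previous field unless it would
   cross a boundary of its declared type, in which case it moves to the
   next boundary. */
static struct layout lay_out(const struct field *f, size_t n, unsigned *bit_offset)
{
    unsigned pos = 0;   /* next free bit */
    unsigned align = 8; /* struct alignment in bits, at least one byte */

    for (size_t i = 0; i < n; i++) {
        unsigned unit = f[i].unit_bits;
        assert(unit != 0 && f[i].width <= unit);

        if (!f[i].is_bitfield || (pos % unit) + f[i].width > unit)
            pos = (pos + unit - 1) / unit * unit; /* round up to a unit boundary */

        bit_offset[i] = pos;
        pos += f[i].width;
        if (unit > align)
            align = unit;
    }

    /* Pad the total size up to a multiple of the alignment. */
    unsigned size = (pos + align - 1) / align * align;
    return (struct layout){ size / 8, align / 8 };
}
```

Under these assumptions, describing struct S as three 8-bit members followed by 32-bit bit-fields of width 3, 4 and 7 gives a size of 8 bytes with 4-byte alignment, and describing the struct from the question as two 16-bit bit-fields of width 2 and 14 gives a size of 2 bytes. Real compilers follow the target ABI, so actual results can differ.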

When generating code to access more_bits, we'll generate something
essentially like this:

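define i32 @foo(%t* %base) {
  %1 = getelementptr inbounds %t, %t* %base, i32 0, i32 3  ; address of the byte holding more_bits
  %2 = load i8, i8* %1
  %3 = lshr i8 %2, 3      ; drop the 3 low bits that belong to bits
  %4 = and i8 %3, 15      ; keep the 4 bits of more_bits
  %5 = zext i8 %4 to i32
  ret i32 %5
}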
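
The same load, shift and mask sequence can be written by hand in C. This is a minimal sketch, not code taken from Clang: `extract_bits` is a hypothetical helper, and it assumes the bytes are combined in little-endian order and that at most 8 bytes are read. It also covers the wider case mentioned earlier, where several bytes (for example 3 or 5) are first combined into a single integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration; not part of Clang or LLVM.
   Combines nbytes bytes (at most 8) into one little-endian integer, then
   shifts and masks out width bits starting at bit position shift. */
static uint64_t extract_bits(const unsigned char *bytes, size_t nbytes,
                             unsigned shift, unsigned width)
{
    assert(nbytes <= 8 && shift < 64);

    uint64_t word = 0;
    for (size_t i = 0; i < nbytes; i++)
        word |= (uint64_t)bytes[i] << (8 * i); /* byte i supplies bits 8i..8i+7 */

    /* Build a mask of width one-bits, avoiding an undefined 64-bit shift. */
    uint64_t mask = (width >= 64) ? ~(uint64_t)0 : (((uint64_t)1 << width) - 1);
    return (word >> shift) & mask;
}
```

With a single byte, a shift of 3 and a width of 4, this performs the same lshr and and steps as the IR for more_bits. Whether byte 3 of a real struct S object actually holds more_bits in those bit positions depends on the target ABI and endianness.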

References:
[1]: http://clang.llvm.org/doxygen/CGRecordLayout_8h_source.html
[2]: http://www.zhihu.com/question/26415342/answer/32741740
