Digging into assembly
The function in C:
PHPAPI char *php_pcre_replace(char *regex, int regex_len,
char *subject, int subject_len,
zval *replace_val, int is_callable_replace,
int *result_len, int limit, int *replace_count TSRMLS_DC)
{
pcre_cache_entry *pce; /* Compiled regular expression */
/* Compile regex or get it from cache. */
if ((pce = pcre_get_compiled_regex_cache(regex, regex_len TSRMLS_CC)) == NULL) {
return NULL;
}
....
}
Its assembly:
php5ts!php_pcre_replace:
1015db70 8b442408 mov eax,dword ptr [esp+8]
1015db74 8b4c2404 mov ecx,dword ptr [esp+4]
1015db78 56 push esi
1015db79 8b74242c mov esi,dword ptr [esp+2Ch]
1015db7d 56 push esi
1015db7e 50 push eax
1015db7f 51 push ecx
1015db80 e8cbeaffff call php5ts!pcre_get_compiled_regex_cache (1015c650)
1015db85 83c40c add esp,0Ch
1015db88 85c0 test eax,eax
1015db8a 7502 jne php5ts!php_pcre_replace+0x1e (1015db8e)
php5ts!php_pcre_replace+0x1c:
1015db8c 5e pop esi
1015db8d c3 ret
The C call pcre_get_compiled_regex_cache(regex, regex_len TSRMLS_CC) corresponds to 1015db7d~1015db80, which pushes the 3 parameters onto the stack and calls the function.
But my question is: among so many registers, how does the compiler decide to use eax, ecx and esi (esi is special, since it is saved with a push before being used; why?) as the intermediates that carry the values to the stack?
There must be some hidden indication in C that tells the compiler to do it this way, right?
No, there is no hidden indication.
This is a typical strategy for generating 80x86 instructions, used by many compiler implementations, C and otherwise. For example, the 1980s Intel Fortran-77 compiler did the same thing when optimization was turned on.

That it uses eax and ecx preferentially is probably an artifact of avoiding esi and edi, since those registers cannot be used directly to load byte operands.

Why not ebx and edx? Well, many code generators prefer those for holding intermediate pointers while evaluating complex structure expressions, which is to say, there isn't much reason at all. The compiler just looked for two available registers and overwrote them to buffer the values.

Why not reuse eax for each value in turn? Because that causes pipeline stalls while waiting for eax to complete the previous memory cycle, on 80x86 processors since the 80586 (maybe the 80486; it was too long ago for me to be sure off the top of my head).

The x86 architecture is a strange beast. Each register, though promoted by Intel as "general purpose", has its quirks (cx/ecx is tied to the loop instruction, for example, and eax:edx is tied to the multiply instruction). That, combined with the peculiar ways of optimizing execution to avoid cache misses and pipeline stalls, often leads to inscrutable output from a code generator that factors all of this in.