Two questions about the CPU

Posted on 2022-09-28 10:20:24

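When the CPU fetches an instruction or data from the cache and misses, it goes to memory. When it reads memory, does it read only the data it needs, or does it read ahead, as with disk reads? Also, do CPUs from the 586 onward use a Harvard architecture for their cache, with separate code and data caches? Thanks.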

Comments (6)

清眉祭 2022-10-05 10:20:24

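As far as I remember, the cache is read in blocks, which are called cache lines.
IA-32 seems to have separate instruction and data caches, though I'm not sure whether that is what "Harvard architecture" means.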
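
To make "read in blocks" concrete, here is a minimal sketch of which bytes arrive together on a miss. It assumes a 64-byte line (the figure given later in this thread for AMD64; other CPUs differ), and the helper names `line_base` and `same_line` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_BYTES 64u  /* assumption: 64-byte lines; check your CPU's manual */

/* Address of the first byte of the cache line that contains addr. */
static uint64_t line_base(uint64_t addr)
{
    return addr & ~(uint64_t)(LINE_BYTES - 1);
}

/* Non-zero when both addresses are covered by the same line fill. */
static int same_line(uint64_t a, uint64_t b)
{
    return line_base(a) == line_base(b);
}
```

Under this assumption, a miss on address 0x1234 fills the whole range 0x1200 to 0x123F, so a later access to 0x1238 hits without another memory read.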

你的呼吸 2022-10-05 10:20:24

Thanks, mingyanguo.

[ Last edited by gta on 2007-1-26 10:07 ]

睡美人的小仙女 2022-10-05 10:20:24

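Originally posted by gta on 2007-1-26 10:00 (post #3)
Thanks, mingyanguo. One more question: how big is the Pentium's cache line?

You're welcome.
I can't remember the size offhand; please look it up in the manual.
I don't have my references at hand.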

撩起发的微风 2022-10-05 10:20:24

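The cache on AMD64 is organized like this:

The cache is structured like a matrix.
Rows are sets and columns are ways.
In a 4-way cache organization, each set consists of 4 cache lines.
Each cache line has 3 parts: a tag field, a data field, and an other-information field.
Each cache line holds 64 bytes of data.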
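
The matrix description above can be sketched as plain C types. This is only a model for reasoning about the layout, not how the hardware is built: the set count of 256 is an arbitrary assumption, and the "other information" field is represented here by a valid bit and a dirty bit, which is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    LINE_BYTES = 64,   /* from the post: 64 bytes of data per line */
    NUM_WAYS   = 4,    /* from the post: 4-way organization */
    NUM_SETS   = 256   /* assumption: picked only to get a round example */
};

/* One cache line: tag field, other information, data field. */
struct cache_line {
    uint64_t tag;
    uint8_t  valid;             /* assumed part of "other information" */
    uint8_t  dirty;             /* assumed part of "other information" */
    uint8_t  data[LINE_BYTES];
};

/* One set (a row of the matrix) has one line per way (column). */
struct cache_set {
    struct cache_line way[NUM_WAYS];
};

struct cache {
    struct cache_set set[NUM_SETS];
};

/* Data capacity = sets * ways * bytes per line. */
static size_t cache_data_bytes(void)
{
    return (size_t)NUM_SETS * NUM_WAYS * LINE_BYTES;
}
```

With these assumed numbers the data capacity is 256 * 4 * 64 = 65536 bytes; the tag and status bits are stored in addition to that.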

_蜘蛛 2022-10-05 10:20:24

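The physical address, produced from the virtual address by the MMU, is divided into three parts.

index field: selects the cache set; in the original post's figure, the index selects set 2.
tag field: the tag of the physical address is compared with the tag of the cache line in each way of the set until one matches (a hit). The matching way is then selected through an n:1 multiplexer.
offset field: on a hit, the offset field selects the specific data within the cache line's data field.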
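
The three fields can be sketched in C as follows. The 6 offset bits follow from the 64-byte line; the 8 index bits (256 sets) are an assumption, and `split_address`, `lookup`, and `install` are hypothetical helper names. Hardware compares the tags of all ways in parallel, so the loop below is only a sequential stand-in.

```c
#include <assert.h>
#include <stdint.h>

enum {
    OFFSET_BITS = 6,                /* 64-byte line -> 6 offset bits */
    INDEX_BITS  = 8,                /* assumption: 256 sets */
    NUM_WAYS    = 4,
    NUM_SETS    = 1 << INDEX_BITS
};

struct addr_fields {
    uint64_t tag;
    uint32_t index;
    uint32_t offset;
};

/* Split a physical address into tag | index | offset. */
static struct addr_fields split_address(uint64_t paddr)
{
    struct addr_fields f;
    f.offset = (uint32_t)(paddr & ((1u << OFFSET_BITS) - 1));
    f.index  = (uint32_t)((paddr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1));
    f.tag    = paddr >> (OFFSET_BITS + INDEX_BITS);
    return f;
}

/* Tag store only; the data field is left out to keep the sketch short. */
struct tag_entry {
    uint64_t tag;
    int      valid;
};
static struct tag_entry tags[NUM_SETS][NUM_WAYS];

/* Returns the way that hits, or -1 on a miss. */
static int lookup(uint64_t paddr)
{
    struct addr_fields f = split_address(paddr);
    for (int w = 0; w < NUM_WAYS; w++) {
        if (tags[f.index][w].valid && tags[f.index][w].tag == f.tag)
            return w;
    }
    return -1;
}

/* Record that the line containing paddr now lives in the given way. */
static void install(uint64_t paddr, int way)
{
    struct addr_fields f = split_address(paddr);
    tags[f.index][way].tag = f.tag;
    tags[f.index][way].valid = 1;
}
```

For example, with these assumed widths the address 0x12345 splits into tag 4, index 141, and offset 5. After `install(0x12345, 2)`, a lookup of any address in the same 64-byte line returns way 2, while the next line (0x12345 + 64) maps to set 142 and misses.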

沉溺在你眼里的海 2022-10-05 10:20:24

Intel provides 4 cache prefetch instructions:
prefetchnta, prefetcht0, prefetcht1, and prefetcht2

AMD adds two more instructions:
prefetch and prefetchw, both of which come from AMD's own 3DNow! instruction set.

Here is some sample code:

C code:

#define num 65536
#define ARR_SIZE (num * 8)   /* bytes per array: num doubles of 8 bytes each */

double array_a[num];
double array_b[num];
double array_c[num];

void multiply_arrays(void)
{
    int i;
    for (i = 0; i < num; i++) {
        array_a[i] = array_b[i] * array_c[i];
    }
}

Assembly code:

mov edx, (-num) ; Use biased index.
mov eax, OFFSET array_a ; Get address of array_a.
mov ebx, OFFSET array_b ; Get address of array_b.
mov ecx, OFFSET array_c ; Get address of array_c.
mul_loop: ; "loop" itself is an instruction mnemonic, so use another label
; The prefetches are indexed like the loads so that the prefetch address advances each iteration.
prefetchw [eax+edx*8+ARR_SIZE+256] ; Four cache lines ahead
prefetch [ebx+edx*8+ARR_SIZE+256] ; Four cache lines ahead
prefetch [ecx+edx*8+ARR_SIZE+256] ; Four cache lines ahead
fld QWORD PTR [ebx+edx*8+ARR_SIZE] ; b[i]
fmul QWORD PTR [ecx+edx*8+ARR_SIZE] ; b[i] * c[i]
fstp QWORD PTR [eax+edx*8+ARR_SIZE] ; a[i] = b[i] * c[i]
fld QWORD PTR [ebx+edx*8+ARR_SIZE+8] ; b[i+1]
fmul QWORD PTR [ecx+edx*8+ARR_SIZE+8] ; b[i+1] * c[i+1]
fstp QWORD PTR [eax+edx*8+ARR_SIZE+8] ; a[i+1] = b[i+1] * c[i+1]
fld QWORD PTR [ebx+edx*8+ARR_SIZE+16] ; b[i+2]
fmul QWORD PTR [ecx+edx*8+ARR_SIZE+16] ; b[i+2] * c[i+2]
fstp QWORD PTR [eax+edx*8+ARR_SIZE+16] ; a[i+2] = b[i+2] * c[i+2]
fld QWORD PTR [ebx+edx*8+ARR_SIZE+24] ; b[i+3]
fmul QWORD PTR [ecx+edx*8+ARR_SIZE+24] ; b[i+3] * c[i+3]
fstp QWORD PTR [eax+edx*8+ARR_SIZE+24] ; a[i+3] = b[i+3] * c[i+3]
fld QWORD PTR [ebx+edx*8+ARR_SIZE+32] ; b[i+4]
fmul QWORD PTR [ecx+edx*8+ARR_SIZE+32] ; b[i+4] * c[i+4]
fstp QWORD PTR [eax+edx*8+ARR_SIZE+32] ; a[i+4] = b[i+4] * c[i+4]
fld QWORD PTR [ebx+edx*8+ARR_SIZE+40] ; b[i+5]
fmul QWORD PTR [ecx+edx*8+ARR_SIZE+40] ; b[i+5] * c[i+5]
fstp QWORD PTR [eax+edx*8+ARR_SIZE+40] ; a[i+5] = b[i+5] * c[i+5]
fld QWORD PTR [ebx+edx*8+ARR_SIZE+48] ; b[i+6]
fmul QWORD PTR [ecx+edx*8+ARR_SIZE+48] ; b[i+6] * c[i+6]
fstp QWORD PTR [eax+edx*8+ARR_SIZE+48] ; a[i+6] = b[i+6] * c[i+6]
fld QWORD PTR [ebx+edx*8+ARR_SIZE+56] ; b[i+7]
fmul QWORD PTR [ecx+edx*8+ARR_SIZE+56] ; b[i+7] * c[i+7]
fstp QWORD PTR [eax+edx*8+ARR_SIZE+56] ; a[i+7] = b[i+7] * c[i+7]
add edx, 8 ; Compute next 8 products
jnz mul_loop ; until none left.

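The code prefetches data four cache lines ahead of the current position, and those lines land in four consecutive sets of the 4-way cache.
Each cache line is 64 bytes, so 4 * 64 bytes gives the 256-byte offset used in the prefetch instructions.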
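
For comparison, roughly the same idea can be written in C. This is a sketch under assumptions, not the code from the post: it uses `__builtin_prefetch`, a GCC/Clang builtin (on other compilers the hints compile to nothing), it assumes 64-byte lines, and the function name `multiply_with_prefetch` is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define LINE_BYTES  64  /* from the post: 64-byte cache lines */
#define LINES_AHEAD 4   /* from the post: prefetch four lines ahead */

/* __builtin_prefetch(addr, rw, locality): rw is 0 for read, 1 for write. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(p)  __builtin_prefetch((p), 0, 3)
#define PREFETCH_WRITE(p) __builtin_prefetch((p), 1, 3)
#else
#define PREFETCH_READ(p)  ((void)(p))
#define PREFETCH_WRITE(p) ((void)(p))
#endif

/* a[i] = b[i] * c[i], with one prefetch hint per array per cache line. */
static void multiply_with_prefetch(double *a, const double *b,
                                   const double *c, size_t n)
{
    const size_t per_line = LINE_BYTES / sizeof(double);
    const size_t ahead = LINES_AHEAD * per_line;

    for (size_t i = 0; i < n; i++) {
        /* Hint once per line's worth of elements, and only while the
           target is still inside the arrays, so that no out-of-range
           pointer is ever formed. */
        if (i % per_line == 0 && i + ahead < n) {
            PREFETCH_WRITE(&a[i + ahead]);
            PREFETCH_READ(&b[i + ahead]);
            PREFETCH_READ(&c[i + ahead]);
        }
        a[i] = b[i] * c[i];
    }
}
```

The hints never change the result, only the timing. Whether they help at all depends on the CPU, since hardware prefetchers often already detect a simple sequential pattern like this one, so it is worth measuring before keeping them.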

[ Last edited by mik on 2007-1-27 12:11 ]
