什么是FFP机?
在 R. Kent Dybvig 的论文“Scheme 的三种实现模型”中,他谈到了“FFP 语言”和“FFP 机器”。显然,FFP 机器与多处理器上的字符串缩减之间存在某种相关性。
谷歌搜索并没有真正发现太多解释或例子。
谁能解释一下这个话题? 谢谢。
In R. Kent Dybvig's paper "Three Implementation Models for Scheme" he speaks of "FFP languages" and "FFP machines". Apparently there is some correlation between FFP machines, and string-reduction on multiple processors.
Googling doesn't really uncover much in terms of explanations or examples.
Can anyone shed some light on this topic?
Thanks.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
Kent Dybvig 的顾问 Gyula A. Mago 在 1987 年由 Mago 和 Stanat 发表的《FFP 机器:技术报告 87-014》中发表了详细描述。
截至撰写本文时,该 PDF 可在以下位置免费获取:
http://www.cs.unc.edu/techreports/87-014。 pdf
Kent Dybvig's advisor, Gyula A. Mago, published a detailed description in "The FFP Machine: Technical Report 87-014" in 1987 by Mago and Stanat.
As of this writing, the PDF is freely available at:
http://www.cs.unc.edu/techreports/87-014.pdf
FFP Machine 是一种非常细粒度的并行计算机架构:
每个处理器保存一个符号/原子/值。
它使用计算的字符串缩减模型,其中
最里面的函数应用程序被找到并替换为它们的
同等结果(热切评价)。
如果结果在多个地方使用,则往往会重新评估
而不是承担访问某些全球商店的费用
(但是请参阅 Mago 的论文“复制操作数与复制结果”,或者更好的是 Mago 在 1982 年函数编程语言和计算机体系结构会议上的“FFP 机器中的数据共享”)。
持有 FFP 表达的 L 细胞减少
通过 T 细胞的树状结构排列进行通信。
请注意,IC 基本上是二维的并且带有接线,
电路可以在物理空间中走向三维。
占据更高维度的互连网络
(如Hypercube、Omega、Banyan、Star等网络)
最终将无法达到理论极限附近的性能。
该通信网络是电路交换的而不是分组交换的。
数据包不包含地址,不需要路由。
来自不同缩减的数据包不能相遇,不能冲突
并且不会出现彼此拥塞的情况。
执行配置活动(称为“分区”)
在树上向上扫过一次,使用一把
对 3 位消息进行逻辑运算,留下“区域机器”,
每个都被创建来最多推进一个可简化的应用程序。
虽然从技术上讲它是时间的对数,
由此产生的区域机器可以开始通信
在分区波后面以流水线方式,
实际上会花费恒定的时间损失。
(区域机器的拆除仍然是时间的对数成本)。
一次缩减内的数据包应该而且必须满足
从而提供常用的同步。
数据包序列按其上升顺序进行排序和组合
在一个区域内,从该区域机器的根广播。
提供并行前缀和并行后缀操作
减少区域交通,因为仍然存在潜在的瓶颈
在单个可简化应用程序中。
这是在不需要展示的情况下完成的
超级计算机(纽约大学的杰克(雅各布?)施瓦茨)
每个中都有一个单独的对数大小的高速缓存
通信节点。
每个T单元(内部树节点)只需要一个FIFO缓冲区
(为了效率)尺寸大于管道路径
树顶然后返回。
(后者是我的推测,但似乎有道理)。
由于树保持数据从左到右的顺序
(与其他一些组合网络不同),该系统使细胞能够
以对数而不是线性时间旋转数据,
避免区域机器根部出现可能的拥塞。
再次值得注意的是,一个区域内的并行性
机器独立于其他机器的同时并行性
区域机器,并有许多可用的处理器
与操作数中的数据量成正比。
The FFP Machine is a very fine-grained parallel computer architecture:
each processor holds a single symbol / atom / value.
It uses a string reduction model of computation in which
innermost function applications are found and replaced by their
equivalent result (eager evaluation).
Where a result is used in several places, it tends to be re-evaluated
instead of incurring the costs of accessing some global store
(but see Mago's paper on "Copying Operands vs Copying Results", or better yet Mago's "Data Sharing in an FFP Machine" in the 1982 Functional Programming Languages and Computer Architecture conference).
The L cells holding the FFP expression being reduced
communicate through a tree structured arrangement of T cells.
Note that IC's are basically two dimensional and with wiring,
circuits can move towards being three dimensional in physical space.
Interconnection networks that occupy higher dimensions
(such as the Hypercube, Omega, Banyan, Star, etc. networks)
will eventually be unable to perform near their theoretical limit.
This communication network is circuit-switched rather than being packet-switched.
Data packets contain no addresses and do not need routing.
Packets from distinct reductions cannot meet, cannot conflict
and cannot experience congestion with each other.
The configuring activity (called "Partitioning") is performed
in a single sweep upwards in the tree, using a handful
of logic operations on 3-bit messages, leaving "area machines" in its wake,
each created to advance at most a single reducible application.
While it is technically logarithmic in time,
the resulting area machines can begin communicating
in a pipelined fashion behind the partitioning wave,
practically costing a constant time penalty.
(The dismantling of area machines remains a logarithmic cost in time).
Packets within a single reduction should, and must, meet
and thus provide a often-useful synchronization.
Sequences of packets are sorted and combined as they rise
within an area, to be broadcast from the root of the area machine.
Parallel Prefix and Parallel Suffix operations are provided
to reduce area traffic, since there remains a potential bottleneck
within an individual reducible application.
This is accomplished without the need exhibited in
the Ultracomputer (Jack (Jacob?) Schwartz at NYU)
for a separate logarithmic-sized cache memory in each
communication node.
Each T cell (internal tree node) only needs a FIFO buffer
(for efficiency) of size greater than the pipeline path to
the top of the tree and back down.
(This latter is a conjecture of mine, but it seems reasonable).
Since the tree maintains the left-to-right order of data
(unlike some other combining networks), the system enables cells
to rotate their data in logarithmic rather than linear time,
avoiding the plausible congestion at the root of the area machine.
It's worth noting again, that the parallelism within an area
machine is independent of the simultaneous parallelism in other
area machines, and has available to it a number of processors
proportional to the quantity of data in the operand.
您遇到过这个吗?:编译 APL 以在 FFP 机器上并行执行
Have you come across this yet?: Compiling APL for parallel execution on an FFP machine
正式 FP。与 FP 类似,但使用常规的无糖语法,我能为您提供的只是机器执行。
请参阅Wikis Fp 页面。
Formal FP. Similar to FP, but with regular sugarless syntax, for machine execution is all I can offer you.
See Wikis Fp page.