Conclusions and solutions in computer architecture and organization

Systolic arrays are optimal for multimedia processing, while coarse-grained reconfigurable arrays and dynamic dataflow processors may be better suited to artificial intelligence. Wavefront array processors may also suit machine learning.
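A systolic array pumps data rhythmically through a chain of multiply-accumulate processing elements, which is why it maps so naturally onto multimedia kernels such as FIR filtering. The sketch below is only a behavioral model under that assumption: it collapses the per-PE pipeline into a single shift register rather than simulating cycle-accurate PE-to-PE links, and `systolic_fir` and its tap model are illustrative names, not any real design.

```python
def systolic_fir(x, w):
    """Behavioral model of a weight-stationary 1-D systolic FIR filter.

    Each processing element holds one tap weight w[k]; the shift
    register `taps` models input samples advancing one PE per cycle.
    """
    taps = [0.0] * len(w)   # sample held at each PE, newest first
    out = []
    for sample in x:
        taps = [sample] + taps[:-1]          # samples shift through the array
        out.append(sum(wk * xk for wk, xk in zip(w, taps)))  # per-PE MACs
    return out

# e.g. systolic_fir([1, 2, 3, 4], [1, 1]) → [1.0, 3.0, 5.0, 7.0]
```

In a real array the multiply-accumulates happen concurrently in separate PEs each cycle; this model only reproduces the numerical result, y[n] = Σₖ w[k]·x[n−k].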

Use a von Neumann architecture for the CPU and a Harvard architecture for the coprocessors. The former issues instructions straight to the latter, while both the CPU and the coprocessors load and store data straight from and to memory. For an attached processor, use a dataflow processor such as a coarse-grained reconfigurable array (CGRA).
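As a toy illustration of that split, the hypothetical sketch below has a von Neumann CPU step sequentially through a program and issue selected opcodes straight to a coprocessor, while both sides load and store the same shared memory. The instruction set, the `MAC` opcode, and the memory model are all invented for the example.

```python
# Shared data memory: both CPU and coprocessor read/write it directly.
MEM = {"a": 3, "b": 4, "r": 0}

class Coprocessor:
    """Harvard-style worker: receives instructions from the CPU,
    but moves data straight to and from shared memory itself."""
    def execute(self, op, dst, src1, src2):
        if op == "MAC":                     # multiply-accumulate
            MEM[dst] += MEM[src1] * MEM[src2]

class CPU:
    COPRO_OPS = {"MAC"}                     # opcodes forwarded downstream

    def __init__(self):
        self.copro = Coprocessor()

    def run(self, program):
        for op, *args in program:           # von Neumann: sequential fetch
            if op in self.COPRO_OPS:
                self.copro.execute(op, *args)   # issue straight to coprocessor
            elif op == "ADD":
                dst, s1, s2 = args
                MEM[dst] = MEM[s1] + MEM[s2]

CPU().run([("MAC", "r", "a", "b"),          # r = 0 + 3*4 = 12 (coprocessor)
           ("ADD", "r", "r", "a")])         # r = 12 + 3 = 15  (CPU)
```

The point of the sketch is the issue path: the CPU never marshals operands for the coprocessor; it only forwards the instruction, and the coprocessor touches memory directly.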

Enhanced dataflow, control flow/dataflow, and dataflow/control flow are all more effective than enhanced control flow, and dataflow/control flow is the most advantageous of the four. Enhanced dataflow and control flow/dataflow are about equally effective, though the latter may have a slight edge. To combine the best qualities, use dataflow at the hyperblock level, von Neumann control flow at the basic-block level, and dataflow again at the instruction level. This merges the dataflow/control-flow approach of data-driven multithreading (DDM), applied at all levels but the bottom one, with the control-flow/dataflow approach of TRIPS or DySER, applied at all levels but the topmost, yielding three levels rather than two. Such a hierarchical design would be a good fit for a datacenter processing unit (DPU); I wonder why the Fungible F1 DPU isn't designed that way. One could have more than one dataflow level, but it wouldn't make sense to have more than one control-flow level.
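One way to picture the three-level scheme is as nested schedulers: token-driven firing at the hyperblock and instruction levels, sequential stepping at the basic-block level in between. The interpreter below is purely illustrative; its token model and function names are my own invention, not taken from DDM, TRIPS, or DySER.

```python
def run_instruction_dataflow(instrs, env):
    """Bottom level: fire any instruction whose operands are all ready."""
    pending = list(instrs)
    while pending:
        for ins in pending:
            dst, fn, srcs = ins
            if all(s in env for s in srcs):     # operands available?
                env[dst] = fn(*(env[s] for s in srcs))
                pending.remove(ins)
                break
        else:
            raise RuntimeError("deadlock: unmet operand dependency")
    return env

def run_hyperblock(basic_blocks, env):
    """Middle level: von Neumann control flow, one basic block after another."""
    for bb in basic_blocks:
        env = run_instruction_dataflow(bb, env)
    return env

def run_program(hyperblocks, env):
    """Top level: dataflow again; a hyperblock fires when its inputs exist."""
    waiting = list(hyperblocks)
    while waiting:
        for inputs, blocks in waiting:
            if all(i in env for i in inputs):   # input tokens present?
                env = run_hyperblock(blocks, env)
                waiting.remove((inputs, blocks))
                break
        else:
            raise RuntimeError("deadlock at hyperblock level")
    return env

# The first hyperblock listed needs "s", so the second fires first:
prog = [
    (("s",),     [[("d", lambda a: a * a, ("s",))]]),
    (("x", "y"), [[("s", lambda a, b: a + b, ("x", "y"))]]),
]
env = run_program(prog, {"x": 2, "y": 3})       # s = 5, then d = 25
```

Note that program order is irrelevant at the top and bottom levels (readiness decides), while the basic-block list inside each hyperblock executes strictly in sequence.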

Von Neumann/dataflow hybrid architectures like MT.Monsoon, data-driven multithreading, scheduled dataflow, TRIPS, and DySER may be suitable for infrastructure offload such as datacenter processing units (DPUs). Hyperprocessor, MLCA, and Task Superscalar would also be suitable, as would a massively parallel processor array (MPPA) in which each worker processor incorporates Tartan, Conservation Cores, or DySER.

Correct me if I’m wrong, but I think a field-programmable object array (FPOA) is analogous to composable disaggregated infrastructure: instead of pooling compute, storage, and accelerators at the datacenter level, it pools arithmetic logic units, register files, and multiply-accumulators at the chip level, and instead of composing servers, it composes processor cores. Similarly, polymorphous architectures like TRIPS can dynamically allocate cores and register files to particular tasks, creating virtual cores within a physical core.
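The analogy can be made concrete with a toy allocator: chip-level pools of functional units stand in for datacenter-level pools of servers, and composing a "virtual core" is just claiming units from each pool. Everything here (the class names, the pool sizes, the `compose_core` interface) is hypothetical, not any real FPOA's programming model.

```python
class ResourcePool:
    """A pool of interchangeable functional units of one kind."""
    def __init__(self, kind, count):
        self.kind, self.free = kind, list(range(count))
    def allocate(self, n):
        if len(self.free) < n:
            raise RuntimeError(f"not enough free {self.kind} units")
        taken, self.free = self.free[:n], self.free[n:]
        return taken
    def release(self, units):
        self.free.extend(units)

class Chip:
    """Chip-level pools of ALUs, register files, and MACs."""
    def __init__(self):
        self.pools = {"alu":     ResourcePool("alu", 16),
                      "regfile": ResourcePool("regfile", 8),
                      "mac":     ResourcePool("mac", 32)}
    def compose_core(self, **needs):
        # Compose a virtual core from the pools, e.g. compose_core(alu=2, mac=8)
        return {kind: self.pools[kind].allocate(n) for kind, n in needs.items()}
    def release_core(self, core):
        # Return the virtual core's units to the pools for reuse
        for kind, units in core.items():
            self.pools[kind].release(units)

chip = Chip()
dsp_core = chip.compose_core(alu=2, regfile=1, mac=8)   # a MAC-heavy core
```

A TRIPS-style polymorphous machine differs in granularity: it would be composing whole cores and register files into larger virtual cores, but the pool-and-compose pattern is the same.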
