Optimization of IT infrastructure and platform technologies

Computer architecture and organization

Physical CPUs should be general-purpose register machines, preferably load/store designs such as DLX/MIPS and the IBM/Freescale Power Architecture. Application virtual machines should be stack-based, like Java and Forth, and should rely on interpretation and just-in-time compilation rather than ahead-of-time compilation. Bytecode interpreters and abstract-syntax-tree just-in-time compilers are preferred; threaded code and template interpreters are other options. Avoid register-memory, memory-memory, and accumulator machines.
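As a rough illustration of the stack-based bytecode interpreter idea, here is a minimal sketch. The opcode names and encoding are hypothetical, invented for this example, and are not taken from the JVM or from any Forth implementation.

```python
# Hypothetical opcodes for a toy stack machine (assumption, not a real VM's encoding).
PUSH, ADD, MUL, HALT = range(4)


def run(bytecode):
    """Interpret a flat list of opcodes and operands; return the final stack."""
    stack = []
    pc = 0  # program counter into the bytecode list
    while pc < len(bytecode):
        op = bytecode[pc]
        pc += 1
        if op == PUSH:
            # The operand follows the opcode inline.
            stack.append(bytecode[pc])
            pc += 1
        elif op == ADD:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        elif op == HALT:
            break
        else:
            raise ValueError(f"unknown opcode {op}")
    return stack


# (2 + 3) * 4 expressed without any register names, only stack operations.
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]
result = run(program)
```

The instructions carry no register operands, which is what keeps the bytecode compact and the interpreter loop simple. Mapping the stack onto the registers of a load/store CPU would be the job of the just-in-time compiler.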

At least one scalar fixed-precision CPU and multiple vector floating-point coprocessors. The CPU should have a von Neumann architecture, while the coprocessors have a Harvard architecture: the CPU issues instructions straight to each coprocessor via a dedicated bus, but the coprocessors, like the CPU, load and store data straight from and to memory via the main bus. As an alternative to a von Neumann CPU, use a CPU with a modified Harvard architecture, in which system programs are stored in and fetched from boot flash, while application software is loaded from DRAM and issued to the coprocessors. Either way, also attach a dataflow processor.

  • If the vector floating-point accelerator is a digital signal processor, it should have one instruction bus, one read/write data bus for memory access, one or more write-only data buses for audio output (speakers, headphones) and video output (monitors), and at least one read-only data bus for audio input (microphones).
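The split between instruction path and data path described above can be modeled loosely in software. The sketch below is only a toy model under stated assumptions: the class names, the `vadd`/`vmul` operations, and the instruction tuple layout are all hypothetical, and no timing or bus arbitration is modeled.

```python
class Memory:
    """Main memory, reached by the CPU and every coprocessor over the main bus."""

    def __init__(self, size):
        self.cells = [0.0] * size


class VectorCoprocessor:
    """Receives instructions only from the CPU; it never fetches them from memory."""

    def __init__(self, memory):
        self.memory = memory

    def execute(self, op, dst, src_a, src_b, length):
        # Operands are loaded from, and results stored to, memory directly.
        cells = self.memory.cells
        a = cells[src_a:src_a + length]
        b = cells[src_b:src_b + length]
        if op == "vadd":
            out = [x + y for x, y in zip(a, b)]
        elif op == "vmul":
            out = [x * y for x, y in zip(a, b)]
        else:
            raise ValueError(f"unknown vector op {op}")
        cells[dst:dst + length] = out


class ScalarCPU:
    """Holds the program and issues vector instructions to the coprocessors."""

    def __init__(self, memory, coprocessors):
        self.memory = memory
        self.coprocessors = coprocessors

    def issue(self, index, instruction):
        # Stands in for the dedicated instruction bus to one coprocessor.
        self.coprocessors[index].execute(*instruction)


memory = Memory(12)
memory.cells[0:4] = [1.0, 2.0, 3.0, 4.0]
memory.cells[4:8] = [10.0, 20.0, 30.0, 40.0]
cpu = ScalarCPU(memory, [VectorCoprocessor(memory), VectorCoprocessor(memory)])
cpu.issue(0, ("vadd", 8, 0, 4, 4))
vector_sum = memory.cells[8:12]
```

Only the instruction tuple passes through the CPU. The vector data itself moves between memory and the coprocessor without the CPU touching it.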

Use Data-Driven Multithreading for chip multiprocessors, with a Thread Synchronization Unit for each core. Synchronize only within a CPU socket, not between sockets.
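To make the scheduling rule concrete, here is a minimal software sketch of what a Thread Synchronization Unit does in principle: a thread becomes runnable only once all of its producers have finished. The class and method names are hypothetical, and the model is a single sequential unit rather than one unit per core.

```python
from collections import deque


class ThreadSynchronizationUnit:
    """Toy model: schedules a thread when its ready count drops to zero."""

    def __init__(self):
        self.ready_count = {}  # thread name -> producers still outstanding
        self.consumers = {}    # thread name -> threads waiting on it
        self.bodies = {}       # thread name -> callable to run
        self.ready = deque()   # threads whose inputs are all available

    def add_thread(self, name, body, producers=()):
        self.bodies[name] = body
        self.ready_count[name] = len(producers)
        self.consumers.setdefault(name, [])
        for producer in producers:
            self.consumers.setdefault(producer, []).append(name)
        if not producers:
            self.ready.append(name)

    def run(self):
        """Run every thread in data-driven order; return the execution order."""
        order = []
        while self.ready:
            name = self.ready.popleft()
            self.bodies[name]()
            order.append(name)
            # Completing a thread decrements the ready count of its consumers.
            for consumer in self.consumers[name]:
                self.ready_count[consumer] -= 1
                if self.ready_count[consumer] == 0:
                    self.ready.append(consumer)
        return order


values = {}
tsu = ThreadSynchronizationUnit()
tsu.add_thread("load_a", lambda: values.update(a=2))
tsu.add_thread("load_b", lambda: values.update(b=3))
tsu.add_thread("sum", lambda: values.update(sum=values["a"] + values["b"]),
               producers=("load_a", "load_b"))
tsu.add_thread("scale", lambda: values.update(scaled=values["sum"] * 10),
               producers=("sum",))
order = tsu.run()
```

No thread ever blocks waiting for data, because it is not scheduled until its inputs exist. That is the property that makes this model attractive for hiding memory latency.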

In a Massively Parallel Processor Array (MPPA), each group of processor cores sharing memory is called a “cluster.” Have a few control clusters and many data clusters. The control clusters might place and issue tasks to the data clusters, while each cluster stores data in its own memory and the data clusters pass data to one another as messages without going through the control clusters. Whether to add dedicated I/O clusters is an open question. Use a 2D folded torus for the Network-on-Chip.
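A folded torus keeps the logical wraparound connections of a torus but interleaves the physical placement so that no link has to span the whole die. The sketch below shows one common way to do the folding along a single dimension; the function names are hypothetical, and the 8-wide grid is just an example size.

```python
def torus_neighbors(x, y, width, height):
    """Logical neighbors of cluster (x, y) in a 2D torus with wraparound."""
    return [
        ((x + 1) % width, y),
        ((x - 1) % width, y),
        (x, (y + 1) % height),
        (x, (y - 1) % height),
    ]


def torus_hops(a, b, width, height):
    """Minimum hop count between clusters a and b, using wraparound links."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, width - dx) + min(dy, height - dy)


def folded_position(i, n):
    """Physical slot of logical ring index i when a ring of n nodes is folded.

    The first half of the ring takes the even slots going out, and the
    second half takes the odd slots coming back.
    """
    half = (n + 1) // 2
    return 2 * i if i < half else 2 * (n - 1 - i) + 1


def longest_link(n):
    """Longest physical distance between logically adjacent ring nodes."""
    return max(
        abs(folded_position(i, n) - folded_position((i + 1) % n, n))
        for i in range(n)
    )


layout = [folded_position(i, 8) for i in range(8)]
corner_to_corner = torus_hops((0, 0), (7, 7), 8, 8)
```

In an unfolded 8-node ring the wraparound link would span 7 slots. With this folding, every link spans at most 2 slots, so link latency stays uniform across the chip. Applying the same mapping independently to rows and columns gives the 2D case.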

Programming

Assembly for firmware and BIOS. Compiled PL/X, C, and/or Modula-3 for the operating system.

C and C++ for filesystems. C, C++, Java, or Python for object stores. Java is also well suited to wide-column (column-family) stores.

Java for web applications.

Java wide-column store on top of either a C/C++ distributed filesystem or a C/C++ or Java object store.

Operating systems and virtual platforms

Microkernels should be multi-server, and monolithic kernels should have loadable kernel modules.

Kernel replication is for microkernels, not for monolithic kernels: only microkernels should be replicated into multikernels.

Fabrics and interconnects

Switched fabric of dynamic topology for shared-everything and shared-disk systems, and direct static mesh or torus for shared-nothing systems.

For shared-memory systems, a shared bus could replace the switched fabric. A bus would also suit peripherals and expansion targets.

InfiniBand for storage front-end, Fibre Channel for storage back-end.

For a Cloud Data Center Infrastructure Processing Unit (DCPU, DCIPU, CDCPU, CIPU, CDCIPU, or just DPU or IPU), have von Neumann control-flow semantics for the control plane and dataflow for the data plane. That means RISC CPU cores like ARM Neoverse or DLX/MIPS/Gullwing for the control plane, and for the data plane an architecture like, in order of preference, Data-Driven Multithreading, WaveScalar, Scheduled Dataflow, or MT.Monsoon. Tartan or Conservation Cores may be nested inside Data-Driven Multithreading. The DCPU could perhaps be an MPPA, as described under Computer architecture and organization above.
