Introduction

Advanced discussions of computer architecture are often still confined to the classic view of generic processors and serial computation. And yet there is so much more: the semiconductor industry is full of surprising ideas with tremendous potential that attract little public interest. Even more tragically, the terms of the discussion are often downright wrong and miss the big picture.

This article is meant as a quick and fairly basic primer to help explore both classic and exotic architectures in the coming months, right here on Beyond3D. It is neither exhaustive nor highly precise, and that is fully intentional; otherwise there wouldn’t be much left to talk about in the architecture-specific articles to come. Some parts may be harder to follow if you have little practical experience in the field, but they are not essential and can safely be skipped.

If you disagree with some of our claims or explanations, we’d love to hear from you in our forum thread – we're hopeful most disagreements will fall squarely in the field of semantics (and perhaps of emphasis and the quality of the examples). With that out of the way, let’s get right to it…

The Universal Machines

It would be poor form on our part to start such an article with anything other than a quick mention of the design of Planet Earth’s very first Turing Machine – an event that, millions of years from now, will likely be considered even more of a true landmark. That machine, which was never actually constructed, was Charles Babbage’s ‘Analytical Engine’. Certainly it wasn’t a minimal implementation of a Turing Machine, and Babbage didn’t yet have a firm grasp of all the theoretical concepts that would come later, but it remains a truly brilliant design, especially for something first described in 1837! (Coincidentally, that was the same year Charles Darwin drew the very first sketch of a tree of life…)

The Analytical Engine was Turing-complete: it could store a program, read and write data, execute basic arithmetic operations (which can provably emulate any other operation) and conditionally jump to another part of the program. It could effectively compute any computable function, but it still wasn’t as flexible as the classic von Neumann architecture: the result of a computation could not be used to determine where to load or store data (i.e., it didn’t support pointers), and stored data could not be read back as a program (the stored-program concept that is the foundation of executable files on modern computers).
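To make the first of those two limitations concrete, here is a small Python sketch – purely our own illustration, nothing resembling Babbage’s notation – of what pointer-style indirection looks like: a value that was itself loaded or computed is used as the address of a later load and store.

    # A toy illustration of indirection: a value loaded earlier is used
    # as the *address* of a later load and store.
    memory = [10, 20, 30, 40, 2]   # a small, hypothetical flat data store

    index = memory[4]              # load a value (here: 2)...
    value = memory[index]          # ...and use it as an address: an indirect load
    memory[index + 1] = value      # a computed address can pick the store target too

    print(value)                   # 30 -- the Analytical Engine could not express this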

Let us consider what an extremely lightweight implementation of the von Neumann architecture would look like. Given that any logic gate, and thus any operation, can be built exclusively from NAND gates (or exclusively from NOR gates), a minimal implementation doesn’t need to support any other operation, and the pieces of stored data we access may be single bits*.
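As a quick sanity check of that claim, here is a small Python sketch – our own toy functions, not part of any real toolchain – building NOT, AND, OR and XOR out of nothing but a single NAND primitive, verified exhaustively over all one-bit inputs:

    # A minimal sketch: every operation below is built from a single NAND primitive.
    def nand(a: int, b: int) -> int:
        return 1 - (a & b)

    def not_(a):
        return nand(a, a)

    def and_(a, b):
        return nand(nand(a, b), nand(a, b))

    def or_(a, b):
        return nand(nand(a, a), nand(b, b))

    def xor_(a, b):
        n = nand(a, b)
        return nand(nand(a, n), nand(b, n))

    # Exhaustive check over all single-bit inputs:
    for a in (0, 1):
        for b in (0, 1):
            assert not_(a) == 1 - a
            assert and_(a, b) == (a & b)
            assert or_(a, b) == (a | b)
            assert xor_(a, b) == (a ^ b)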

Each instruction needs to encode only three storage locations, plus a single bit that tells us what the third operand refers to: is it the location where the result of the operation should be stored, or is it the location of the next instruction to execute? If it is the latter and the result of the operation is true, we execute the instruction at that storage location; otherwise, we simply increment our instruction location by the length of our instructions (i.e. we read the instruction directly after the current one).
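Put together, the whole machine fits in a few lines. The Python sketch below simulates exactly the instruction format just described; the tuple encoding and the run-off-the-end halting convention are our own assumptions for illustration, not a real ISA.

    # A sketch of the one-operation machine described above.
    # Each instruction is (a, b, c, is_branch):
    #   r = NAND(mem[a], mem[b])
    #   if is_branch: jump to instruction c when r is 1, else fall through
    #   otherwise:    store r at mem[c] and fall through
    def run(program, mem, max_steps=1000):
        pc = 0
        for _ in range(max_steps):
            if pc >= len(program):
                break                      # halted by running off the end
            a, b, c, is_branch = program[pc]
            r = 1 - (mem[a] & mem[b])      # the machine's only operation: NAND
            if is_branch:
                pc = c if r else pc + 1
            else:
                mem[c] = r
                pc += 1
        return mem

    # Example: mem[4] = mem[0] AND mem[1], then a conditional jump that is not taken.
    mem = [1, 1, 0, 0, 0]
    program = [
        (0, 1, 3, 0),   # mem[3] = NAND(mem[0], mem[1])
        (3, 3, 4, 0),   # mem[4] = NOT mem[3], i.e. mem[0] AND mem[1]
        (4, 4, 9, 1),   # branch taken only if mem[4] is 0 (NOT mem[4] is 1)
    ]
    print(run(program, mem))   # -> [1, 1, 0, 0, 1]

Notice how even a simple AND costs two instructions here, and a conditional skip a third – which is precisely the kind of overhead the next paragraph is about.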

Such an implementation of the von Neumann architecture would be infinitely flexible; it could emulate any other Universal Turing Machine as well as any piece of fixed-function logic. However, it would be seriously inefficient for any practical purpose. This is why more complex Turing-complete processor architectures need to be considered (which is the main subject of this series), and also why there remains a significant amount of fixed-function logic in modern digital chips.

*: Advanced: Any other set of operations capable of emulating a NAND or NOR gate may also be appropriate, of course. Note that if you only support NAND and/or NOR operations, you need to be able to specify distinct storage locations for the two operands being read and the operand being written (i.e. the x86-like sharing of a source and destination operand wouldn’t work here) – this restriction may be less severe with other sets of operations, or even more severe if some operations use more than three operands.