Please humor me as I theorize about an alternative CPU architecture, one that hopes to be simpler than what we've got whilst doing a better job at extracting parallelism and protecting sandboxes.
I don't think imperative programming was the correct paradigm for machine code. Functional would've been better.
Let's start with a data model. I'd store all data as consisting of:
* a callback function
* a refcount
* a fixed number of callback-specific fields.
The only data and operations that can be processed would be stored in those fields, or compiled into the callback.
If more data is needed, more of these "thunks" can be used. And if less is needed, you don't have to use all those fields.
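Here's a rough Python sketch of that data model. The field count of 4 and the names `Thunk`, `FIELDS`, `force` and `add` are just my illustrative assumptions, not part of the design.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List

FIELDS = 4  # assumed fixed field count; the real number is an open design choice


@dataclass
class Thunk:
    callback: Callable[["Thunk"], Any]  # code that knows how to read the fields
    fields: List[Any] = field(default_factory=list)
    refcount: int = 1

    def __post_init__(self):
        if len(self.fields) > FIELDS:
            raise ValueError("too many fields; chain another thunk instead")
        # Unused slots stay empty, so using fewer fields is always allowed.
        self.fields = self.fields + [None] * (FIELDS - len(self.fields))


def force(thunk: Thunk) -> Any:
    """Run the thunk's callback against its own fields."""
    return thunk.callback(thunk)


def add(t: Thunk) -> Any:
    # A field may hold a plain value or another thunk (that's the "more data" case).
    a, b = t.fields[0], t.fields[1]
    a = force(a) if isinstance(a, Thunk) else a
    b = force(b) if isinstance(b, Thunk) else b
    return a + b


inner = Thunk(add, [1, 2])
outer = Thunk(add, [inner, 10])
result = force(outer)
```

The refcount is only stored here, since Python manages memory itself; in hardware it would decide when a thunk's slot can be reused.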
I'd further split the memory up so each CPU core only has access to its own smallish section (thereby decreasing pointer sizes).
Though they'd also be able to address a few thunks from their "neighboring" cores and the RAM in order to store data there when they fill up, and request the computed data later. The RAM might require the data to be precomputed, and might compact the data for storage.
So each core would track two thunks it's allowed to directly access data from: one for variables, another for arguments.
And they'd have operations for:
* allocating a new thunk, filling it with data, and storing it in a variable slot.
* tail-recursing to one of the accessible thunks or a compiled-in function.
* pushing a new context with a branch table
* popping the current context whilst executing one of those branches
It might be easiest to lower these to smaller micro ops.
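To make those four operations concrete, here's a toy interpreter in Python. It's a sketch under my own assumptions: the `Core` class, the 4 variable slots, the branch tags and the example program are all hypothetical, and a context here holds only a branch table (a real design would presumably save the variables thunk too).

```python
class Core:
    def __init__(self):
        self.variables = [None] * 4  # stand-in for the "variables" thunk
        self.arguments = None        # fields of the thunk being evaluated
        self.contexts = []           # stack of branch tables
        self.next = None             # code to run next; None means halt
        self.result = None

    def alloc(self, slot, callback, *fields):
        # Allocate a thunk, fill it with data, store it in a variable slot.
        self.variables[slot] = (callback, list(fields))

    def tail_call(self, slot):
        # Jump to an accessible thunk; nothing is pushed, so the stack doesn't grow.
        callback, fields = self.variables[slot]
        self.arguments = fields
        self.next = callback

    def push(self, branches):
        # Push a new context carrying a branch table.
        self.contexts.append(branches)

    def pop(self, tag, value):
        # Pop the current context and continue in the chosen branch.
        if not self.contexts:
            self.result = value  # nothing left to return to: halt
            return
        branches = self.contexts.pop()
        self.next = lambda core: branches[tag](core, value)

    def run(self, slot):
        self.tail_call(slot)
        while self.next is not None:
            code, self.next = self.next, None
            code(self)
        return self.result


def is_zero(core):
    (n,) = core.arguments
    core.pop("yes" if n == 0 else "no", n)


def describe(core):
    core.push({
        "yes": lambda c, n: c.pop("done", "zero"),
        "no": lambda c, n: c.pop("done", "nonzero"),
    })
    core.tail_call(0)  # slot 0 holds the is_zero thunk


core = Core()
core.alloc(0, is_zero, 5)
core.alloc(1, describe)
result = core.run(1)
```

The `run` loop is where the smaller micro-ops would live: each step is just "fetch the next piece of code and execute it".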
Between threads, I'd allow tasks chosen by the software or a JIT (or upon stack overflow) to be offloaded onto another core, and for a core to wait for that computation to complete.
Maybe it'd queue up work (represented as always as thunks) to do in the meantime. It could even invent busywork by computing whatever data it has lying around.
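A loose Python sketch of offload-and-wait, using a thread pool as a stand-in for neighbouring cores. The names `neighbours`, `offload`, `wait` and `slow_sum` are hypothetical, and thunks are modelled as plain zero-argument functions.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

neighbours = ThreadPoolExecutor(max_workers=2)  # stand-in for neighbouring cores


def offload(work, *args):
    """Hand a task to another core; returns a handle to wait on later."""
    return neighbours.submit(work, *args)


def wait(pending, queue):
    """Wait for an offloaded result, draining local queued work in the meantime."""
    done_locally = []
    while not pending.done() and queue:
        task = queue.popleft()
        done_locally.append(task())
    # Either the result is ready or we've run out of local work: block for it.
    return pending.result(), done_locally


def slow_sum(n):
    return sum(range(n))


pending = offload(slow_sum, 100_000)
queue = deque([lambda: 1 + 1, lambda: 2 * 3])
value, extra = wait(pending, queue)
neighbours.shutdown()
```

How much queued work gets done before the result arrives depends on timing, which is rather the point: the waiting core fills whatever gap there is.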
But the biggest opportunity I see to drive parallelism is the output circuitry: what I've described so far wouldn't have computed enough for it.
To handle output it'd need to have circuitry that asks for each field to be computed, before serializing it into an acceptable format. Though if one of those fields represents *when* this output needs to be delivered, this circuit needs to be able to handle that.
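Roughly, in Python, with JSON standing in for "an acceptable format" and a thread pool standing in for the cores doing the computing. The `emit` function, its `deliver_at` parameter and the example fields are my own hypothetical choices.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def emit(fields, deliver_at=None):
    """Ask for every field to be computed (in parallel), then serialise the results."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(thunk) for name, thunk in fields.items()}
        values = {name: f.result() for name, f in futures.items()}
    # The delivery time is itself a thunk; it's forced and returned separately
    # so whatever schedules the output can hold the payload until then.
    when = deliver_at() if deliver_at is not None else None
    return when, json.dumps(values, sort_keys=True)


when, payload = emit(
    {"x": lambda: 2 + 3, "y": lambda: "hi".upper()},
    deliver_at=lambda: 1000,
)
```

Since every field is demanded at once, this is where the lazy program finally gets forced, and where independent fields can be spread across cores.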
Maybe the machine code would be compiled in this way.
And for input I'd need a circuit that deserializes and time stamps external data, whilst having a special response for not yet received data.
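The input side could look something like this in Python. `InputPort`, `NOT_YET` and the JSON wire format are assumptions for the sake of the sketch, and the timestamp is passed in explicitly rather than read from a clock.

```python
import json

NOT_YET = object()  # hypothetical sentinel meaning "no data received yet"


class InputPort:
    def __init__(self):
        self.events = []

    def receive(self, raw: bytes, now: float):
        # Deserialise the external data and stamp it with its arrival time.
        self.events.append((now, json.loads(raw)))

    def read(self, index: int):
        # Data that hasn't arrived gets the special response instead of blocking.
        if index >= len(self.events):
            return NOT_YET
        return self.events[index]


port = InputPort()
port.receive(b'{"key": "a"}', now=1.5)
first = port.read(0)
second = port.read(1)
```

Software can then treat `NOT_YET` like any other value, for example by queueing other work and asking again later.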
Finally, as for math, that'd benefit from a different form of parallelism. So I'd give software a pseudo-function that sends math operators to a special circuit. And if multiple formulas are sent before the previous one finishes, I'd have it send them back to the main processor for optimization/merging.
That's because this maths processor would essentially be a large SIMD circuit, mostly yielding comparison results to the main CPU.
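As a very small Python illustration of the lane-wise idea only (it leaves out the merging of queued formulas entirely), with a hypothetical `simd` helper standing in for the pseudo-function:

```python
import operator


def simd(op, xs, ys):
    """Apply one operator across every lane at once (modelled as a list)."""
    return [op(x, y) for x, y in zip(xs, ys)]


sums = simd(operator.add, [1, 2, 3, 4], [4, 3, 2, 1])
# What goes back to the main CPU is mostly a comparison result per lane.
mask = simd(operator.gt, sums, [4, 5, 6, 7])
```

Here `mask` is the kind of thing the main CPU would branch on, while the arithmetic itself stays inside the maths circuit.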
@alcinnz You should check out the late-80s/early-90s work on dataflow machines, by Arvind and others. Also maybe TRIPS/EDGE more recently