This may all sound fine and dandy until we write a program and want to switch to another thread. What thread? How do threads get started? How do threads get stopped? We need a means of getting threads going.
Lubomir Bic [5] describes one functionally nice way to initiate parallel processes: a cobegin S0 | S1 | ... | Sn coend construct, which executes the segments Si of sequential code in parallel. However, this approach may require a major modification to the syntax of ``C'' or a complex function call with a variable number of arguments. More importantly, the forced nesting of the construct lacks the expressive power of other approaches...
A more general (but more complex) approach would be to use fork, join, and quit primitives. In this interface, an execution flow may be forked at a point in the code, or two or more flows later joined; superfluous flows can also be quit (terminated). UNIX provides a variant on this interface with its fork, wait, and exit primitives. The UNIX fork can appear anywhere in a block of code, and after the fork, both processes have access to all local variables currently defined. This is done by making an almost exact copy of the process dataspace (both stack and heap) into an identical but separate address space. This technique does an excellent job at protecting processes from each other, but it makes shared memory extremely difficult, is slow, and requires a lot of special hardware (virtual memory support) not usually found on personal computers.
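As a rough illustration of the UNIX variant described above, here is a minimal sketch using the POSIX calls fork, waitpid (a close relative of wait) and _exit. The function name fork_demo is made up for this example. It shows that both processes see the local variable after the fork, but that each works on its own copy.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: fork a child that changes its own copy of `local`,
 * then join it. Returns the parent's value of `local` after the child
 * has exited and stores the child's exit status in *child_status. */
int fork_demo(int *child_status)
{
    assert(child_status != NULL);

    int local = 1;                  /* defined before the fork, so both see it */
    pid_t pid = fork();             /* "fork": the execution flow splits here */

    if (pid < 0)
        return -1;                  /* fork failed */

    if (pid == 0) {                 /* child: works on a private copy */
        local += 41;
        _exit(local);               /* "quit": end this flow, reporting 42 */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)   /* "join": wait for the child */
        return -1;
    *child_status = WIFEXITED(status) ? WEXITSTATUS(status) : -1;

    return local;                   /* still 1: the child's write stayed in its copy */
}
```

The parent's copy of the variable is untouched by the child, which is exactly the protection, and the obstacle to shared memory, that the paragraph above describes.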
For these reasons, I provide the primitive mpthd, which simply creates a thread data object (see mpthd.c) that the caller can manipulate as data. Neither the mpthd function nor its corresponding mpthd function starts or stops the thread. But the caller can, by explicitly switching to the new thread when appropriate. In this way the mpthd routine, which suspends and resumes threads, can also be used to start and stop them (indeed, there is no difference).
Now we've got threads going, but how do we tell them what to do? And how do we pass them arguments? The UNIX fork provides a beautiful way to specify the starting point for a newly forked process (the point of the fork), and allows the forking user to pass arguments to the new process via local variables. However, as mentioned previously, that beauty has a high price: essentially all data needs to be copied with the help of extensive hardware support. Even if we could afford the processing cost, there is no easy way to emulate this on a personal computer, as simply copied pointers will still point into their old dataspace. So, keeping consistent with ``C'', mpthd instead identifies the new code to be executed with a function pointer. When the thread is invoked, it calls the function, and as ``C'' is statically scoped we need not worry about duplicating local variables (a function cannot directly access variables in higher scopes). But what do we do when the function returns? We simply require the function to return a new thread id, which we switch to when the function returns; and if the switch returns, we just call the function again. (Thus, in this special thread function (THDFN), the return argument is quasi-equivalent to its return address, but unlike a normal function call, this one doesn't have to return.)
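The thread-function convention can be sketched roughly as follows. This is not the code from mpthd.c: the POSIX ucontext calls (getcontext, makecontext, swapcontext) stand in for the article's own context switch, and every name here (thd_t, thdfn_t, thd_create, thd_switch, trampoline, thdfn_demo) is an assumption made for the example. It relies on a platform that still provides ucontext, such as Linux with glibc.

```c
#include <assert.h>
#include <stddef.h>
#include <ucontext.h>

typedef struct thd thd_t;
typedef thd_t *(*thdfn_t)(thd_t *self);   /* returns the thread to run next */

struct thd {
    ucontext_t ctx;    /* saved registers and stack pointer */
    thdfn_t    fn;     /* the thread's function */
    int        runs;   /* times fn has been called (for the demo only) */
};

static thd_t *current;                    /* thread being executed right now */

/* Suspend the current thread and resume `to`. */
static void thd_switch(thd_t *to)
{
    assert(to != NULL);
    thd_t *from = current;
    current = to;
    swapcontext(&from->ctx, &to->ctx);    /* returns when someone switches back */
}

/* Entry point of every new thread: call the function, switch to the
 * thread it returned, and call the function again if that switch returns. */
static void trampoline(void)
{
    thd_t *self = current;
    for (;;) {
        thd_t *next = self->fn(self);
        thd_switch(next);
    }
}

/* Build a thread object on a caller-supplied stack. It does not start it. */
static int thd_create(thd_t *t, void *stack, size_t size, thdfn_t fn)
{
    if (getcontext(&t->ctx) != 0)
        return -1;
    t->ctx.uc_stack.ss_sp = stack;
    t->ctx.uc_stack.ss_size = size;
    t->ctx.uc_link = NULL;                /* trampoline never returns */
    t->fn = fn;
    t->runs = 0;
    makecontext(&t->ctx, trampoline, 0);
    return 0;
}

static thd_t main_thd, worker;
static char worker_stack[64 * 1024];

static thd_t *worker_fn(thd_t *self)
{
    self->runs++;
    return &main_thd;                     /* "return address": who runs next */
}

int thdfn_demo(void)
{
    current = &main_thd;
    if (thd_create(&worker, worker_stack, sizeof worker_stack, worker_fn) != 0)
        return -1;
    thd_switch(&worker);                  /* first switch starts the thread */
    thd_switch(&worker);                  /* second switch calls fn again */
    return worker.runs;                   /* 2 */
}
```

Note that starting the worker and resuming it are the same operation, a plain switch, and that the function's return value decides where control goes next.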
And what about arguments? We give each thread a little heap which lives in the space left over by its stack. In this local heap, a creating thread can store arguments for the new thread, and after a thread has executed, it can leave arguments in its heap for other threads to read. Admittedly, this technique is a bit awkward, but it is a lot more efficient than copying the entire data space.
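One way to picture the local heap is a bump allocator that grows upward from the bottom of the thread's buffer while the stack grows downward from the top. The sketch below is only an assumption about how such a layout might look. The names thd_hdr_t, thd_format, thd_heap_alloc and the stack_reserve field are invented here, and the "thread" is an ordinary function call, so no real stack switching takes place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header at the bottom of a thread's buffer.
 * Layout: [ header | local heap -> ... free ... <- stack ] */
typedef struct {
    size_t size;           /* total bytes in the buffer, header included */
    size_t stack_reserve;  /* bytes at the top kept free for the stack */
    size_t heap_used;      /* bytes already handed out from the heap */
} thd_hdr_t;

/* The heap starts right after the header. */
static unsigned char *heap_base(thd_hdr_t *t)
{
    return (unsigned char *)(t + 1);
}

/* Lay out a caller-supplied buffer. buf must be aligned for size_t. */
thd_hdr_t *thd_format(void *buf, size_t size, size_t stack_reserve)
{
    if (buf == NULL || size < sizeof(thd_hdr_t) ||
        size - sizeof(thd_hdr_t) < stack_reserve)
        return NULL;
    thd_hdr_t *t = buf;
    t->size = size;
    t->stack_reserve = stack_reserve;
    t->heap_used = 0;
    return t;
}

/* Hand out n bytes from the local heap, or NULL if that would eat into
 * the space reserved for the stack. Blocks are size_t-aligned. */
void *thd_heap_alloc(thd_hdr_t *t, size_t n)
{
    size_t unit = sizeof(size_t);
    size_t room = t->size - sizeof *t - t->stack_reserve - t->heap_used;
    if (n > room)
        return NULL;
    n = (n + unit - 1) / unit * unit;     /* round up to keep alignment */
    if (n > room)
        return NULL;
    void *p = heap_base(t) + t->heap_used;
    t->heap_used += n;
    return p;
}

typedef struct { int a, b, sum; } add_args_t;

/* Stand-in for the new thread: reads its arguments from the start of
 * its own heap and leaves the result there for others to read. */
static void adder_thread(thd_hdr_t *self)
{
    add_args_t *args = (add_args_t *)heap_base(self);
    args->sum = args->a + args->b;
}

int heap_demo(void)
{
    static size_t buf[1024];              /* size_t array for alignment */
    thd_hdr_t *t = thd_format(buf, sizeof buf, 2048);
    if (t == NULL)
        return -1;
    add_args_t *args = thd_heap_alloc(t, sizeof *args);
    if (args == NULL)
        return -1;
    args->a = 20;                         /* creator stores the arguments */
    args->b = 22;
    adder_thread(t);                      /* the "thread" runs */
    return args->sum;                     /* creator reads the result: 42 */
}
```

Only the few bytes of arguments are written, which is the efficiency gain over copying a whole data space.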
In the implementation of mpthd, there are a few interesting things to note. Notice that the caller must also specify the size of the buffer in which it intends to store the thread context. The first use of this information is to figure out where to start the new stack, since on most machines the stack grows downward and must be started from the top of the buffer. Another use of this number is to check that the context will be large enough for any thread to run; as most personal computer operating system calls use the caller's stack to do their work, a reasonably sized stack will be needed for even the simplest "Hello, World!" function.
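The two uses of the buffer size might look something like the following sketch. The names thd_ctx_t and thd_init, the 16-byte stack alignment, and the THD_MIN_STACK threshold are all assumptions chosen for illustration. The real minimum depends on the machine and operating system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define THD_MIN_STACK 4096u   /* assumed minimum usable stack, in bytes */

/* Hypothetical thread context stored at the bottom of the caller's buffer. */
typedef struct {
    void  *stack_top;   /* initial stack pointer (the stack grows downward) */
    size_t size;        /* size of the whole buffer */
} thd_ctx_t;

/* Initialize a thread context inside buf (aligned for thd_ctx_t), or
 * return NULL if the buffer cannot hold the context plus a usable stack. */
thd_ctx_t *thd_init(void *buf, size_t size)
{
    if (buf == NULL || size < sizeof(thd_ctx_t))
        return NULL;

    uintptr_t base = (uintptr_t)buf + sizeof(thd_ctx_t);
    /* Use 1: the stack starts at the top of the buffer, aligned downward. */
    uintptr_t top = ((uintptr_t)buf + size) & ~(uintptr_t)15;

    /* Use 2: refuse buffers too small for any thread to run in. */
    if (top < base || top - base < THD_MIN_STACK)
        return NULL;

    thd_ctx_t *t = buf;
    t->stack_top = (void *)top;
    t->size = size;
    return t;
}
```

Rejecting an undersized buffer at initialization time is cheaper to debug than a stack that silently overruns the context during the first system call.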
And in the implementation of mpthd, also notice that the function waits until the processor can temporarily claim the thread. In other words, it waits to be sure no processor is executing the thread: you don't destroy the house while you (or someone else) are in it.
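The "claim before destroying" idea can be approximated with a C11 atomic_flag used as a tiny spin lock. This is a sketch under assumptions, not the article's implementation: the names thd_t, thd_setup, thd_try_claim, thd_claim, thd_release and thd_dinit are hypothetical, and a real multiprocessor kernel would likely do more than spin.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical thread object: `busy` is set while some processor is
 * executing the thread or has temporarily claimed it. */
typedef struct {
    atomic_flag busy;
    bool        alive;
} thd_t;

void thd_setup(thd_t *t)
{
    assert(t != NULL);
    atomic_flag_clear(&t->busy);
    t->alive = true;
}

/* Try once to claim the thread. Returns true on success. */
bool thd_try_claim(thd_t *t)
{
    return !atomic_flag_test_and_set_explicit(&t->busy, memory_order_acquire);
}

/* Wait until nobody else holds the thread, then claim it. */
void thd_claim(thd_t *t)
{
    while (!thd_try_claim(t))
        ;                                 /* spin: someone is still "in the house" */
}

void thd_release(thd_t *t)
{
    atomic_flag_clear_explicit(&t->busy, memory_order_release);
}

/* Deinitialize: tear the thread down only while holding the claim. */
void thd_dinit(thd_t *t)
{
    thd_claim(t);
    t->alive = false;                     /* safe: no one is executing it */
    thd_release(t);
}
```

A processor that runs the thread would hold the same claim for as long as it executes it, so thd_dinit cannot proceed until that processor has switched away.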
Finally, a last note about the interface of all the initialization and deinitialization functions. Notice that they all take or return a pointer to a buffer in which to store the structure, rather than dynamically allocating the structure with, say, a malloc call. The reason is threefold: it decreases dependence on the operating system and ``C'' environment; the caller may want to do his own (perhaps static) allocation for speed, or want to store the structure in his own field; and finally, dynamic memory allocation is a shared resource, and we haven't gotten to the proper management of shared resources yet (see upcoming articles).