Threaded IO on the MacOS without Bleeding from the Ears

I'll explain input/output.
I'll explain the MacOS's various IO models.
Synchronous
Asynchronous
I'll explain how to extend the Thread Manager for effective IO
Finally, I'll explain how to couple Asynchronous IO with an extended Thread Manager to realize Threaded IO

Input/Output (IO) Explained

Whenever you need to move information into and out of your Mac's RAM, you have to deal with IO issues.
Examples of IO devices
SCSI devices like hard drives and CD-ROM drives
Serial devices like modems, printers and other Macs
ADB devices like mice and keyboards

The MacOS's IO Programming Interface

The MacOS uses chunks of code called drivers to interface with the hardware
Since drivers are rather low-level, Apple usually builds "Managers" -- high-level interfaces -- on top of the drivers.
The SCSI Manager
The Serial Manager
The File Manager

The MacOS's Serial IO Programming Interface

We'll concentrate on the MacOS's Serial Drivers
Why?
The Serial Drivers have always been Asynchronous
The SCSI Manager (and thus the File Manager) only recently went Asynchronous -- and only on 68040 Macs and better.
Serial ports are slow, and benefit greatly from Asynchronous IO
The Threaded IO concepts apply to all MacOS IO, including the File Manager

IO Programming Example

To help contrast the differing IO models, we'll code the same task to each model.
We'll attempt to reset the modem attached to the modem port. We'll accomplish this by writing "ATZ\r" to the modem port.

What does Synchronous IO mean?

When you execute IO synchronously, the MacOS executes the job and loops inside a routine called vSyncWait until the job is completed.
The effect presented to your program and your user is that the Mac halts until the job is completed.
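To make the "loops until the job is completed" idea concrete, here is a small portable C sketch. It does not call the Device Manager and is not how vSyncWait is implemented; MockJob, mock_device_tick() and mock_sync_write() are hypothetical names invented for this illustration. It only mimics the behavior the caller sees: nothing else happens until the loop exits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one IO job; not a real MacOS structure. */
typedef struct {
    int  result;      /* 1 = still in progress, 0 = finished without error */
    long requested;   /* bytes the caller asked to transfer */
    long transferred; /* bytes moved so far */
} MockJob;

/* Stand-in for the hardware: moves one byte per call. */
static void mock_device_tick(MockJob *job)
{
    if (job->transferred < job->requested)
        job->transferred++;
    if (job->transferred == job->requested)
        job->result = 0;
}

/* Synchronous write: start the job, then spin until it is done.
 * Returns the number of loop iterations spent waiting, which is
 * time the caller could not use for anything else. */
static long mock_sync_write(MockJob *job, long byteCount)
{
    long spins = 0;

    assert(job != NULL && byteCount > 0);
    job->result = 1;
    job->requested = byteCount;
    job->transferred = 0;

    while (job->result > 0) {   /* the caller is stuck in this loop */
        mock_device_tick(job);
        spins++;
    }
    return spins;
}
```

Writing the four bytes of "ATZ\r" through mock_sync_write() costs four trips round the loop in this model, during which the caller gets nothing else done.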

Synchronous IO Example

First we'll open the modem port's serial drivers.
There are two drivers for each serial port -- an input driver and output driver. The output driver should always be opened first.
Next we'll write "ATZ" to the modem port.
Finally we'll close the modem port's serial drivers.

Opening the Modem Port's Serial Drivers

Like files, when you open a driver, a "reference number" is returned. You use this reference number whenever referring to the driver.

    short inRefNum = 0, outRefNum = 0;
    OSErr err = OpenDriver( "\p.AOut", &outRefNum );
    if( !err )
        err = OpenDriver( "\p.AIn", &inRefNum );


We initially set inRefNum and outRefNum to zero to indicate the reference numbers are invalid. Bad things happen if you attempt to close a driver passing an invalid reference number.

Writing to the Modem Port's Output Driver Synchronously

We'll send the reset command

    Str255 resetCmd = "\pATZ\r";
    long count = resetCmd[0];   // FSWrite() takes the byte count by address
    if( !err )
        err = FSWrite( outRefNum, &count, resetCmd + 1 );
    Foo();


Foo() will be called once FSWrite() completes.

Closing Modem Port's Serial Drivers

Now we'll close both the input and output drivers, input side first.

    if( inRefNum )
        (void) CloseDriver( inRefNum );
    if( outRefNum )
        (void) CloseDriver( outRefNum );


Note how we don't attempt to close the drivers unless the reference numbers are non-zero. Only a non-zero reference number is valid.

The Summary of Synchronous IO

Advantages
Very easy to program.
Disadvantages
Your computer is effectively frozen while the job completes.

What does Asynchronous IO mean?

When you execute IO asynchronously, the MacOS places your job in a queue and hands control back to your program immediately.
You supply a callback (called a "completion routine" in MacOS IO parlance) to be called when the job is completed.
The effect presented to your program is the Mac executes the job while your code executes.

Explanation of the ParamBlockRec union

Almost every asynchronous routine on the MacOS takes a pointer to a ParamBlockRec union. This is known as the "parameter block."

    union ParamBlockRec {
        IOParam       ioParam;
        FileParam     fileParam;
        VolumeParam   volumeParam;
        CntrlParam    cntrlParam;
        SlotDevParam  slotDevParam;
        MultiDevParam multiDevParam;
    };

Explanation of the ParamBlockRec union

The different fields of ParamBlockRec are used by different routines for different purposes.
The ioParam field is used for transport devices like the serial driver.
The fileParam field is used for the File Manager
The volumeParam is used for managing storage volumes like floppies, hard drives, CDs, etc.
The cntrlParam is used for controlling the drivers themselves.
We're interested in the ioParam field, and thus the IOParam structure.

Explanation of the IOParam structure

The IOParam structure.


    QElemPtr        qLink;
    short           qType;
    short           ioTrap;
    Ptr             ioCmdAddr;
    IOCompletionUPP ioCompletion;   // your callback
    OSErr           ioResult;       // job's status
    StringPtr       ioNamePtr;
    short           ioVRefNum;
    short           ioRefNum;       // reference number
    SInt8           ioVersNum;
    SInt8           ioPermssn;
    Ptr             ioMisc;
    Ptr             ioBuffer;       // the buffer
    long            ioReqCount;     // size requested
    long            ioActCount;     // size completed
    short           ioPosMode;
    long            ioPosOffset;

Asynchronous IO Example

    // open driver code here
    ParamBlockRec pb;
    pb.ioParam.ioCompletion = MyCompletion;
    pb.ioParam.ioRefNum = outRefNum;
    pb.ioParam.ioBuffer = (Ptr) resetCmd+1;
    pb.ioParam.ioReqCount = resetCmd[0];
    if( !err )
        err = PBWriteAsync( &pb );
    Foo();
    // close driver code here


Foo() will be called immediately.
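pb.ioParam.ioResult is set to 1 once the job is successfully installed into the driver's queue, and stays there while the job is in progress. When the job completes, ioResult holds the result code: noErr (0) on success or a negative error code.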
pb.ioParam.ioCompletion is called when the job completes.
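The life cycle of ioResult can be sketched in portable C. This is a simulation, not Device Manager code: MockParamBlock, mock_write_async(), mock_driver_finish() and count_completion() are hypothetical names, and the callbackRuns field exists only so the demo can observe the callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the handful of IOParam fields used here. */
typedef struct MockParamBlock MockParamBlock;
typedef void (*MockCompletion)(MockParamBlock *pb);

struct MockParamBlock {
    MockCompletion ioCompletion; /* callback, may be NULL */
    int            ioResult;     /* 1 while queued, <= 0 when done */
    long           ioReqCount;   /* bytes requested */
    long           ioActCount;   /* bytes actually transferred */
    int            callbackRuns; /* bookkeeping for this demo only */
};

/* Queue the job and return immediately, as an asynchronous call does. */
static int mock_write_async(MockParamBlock *pb)
{
    assert(pb != NULL);
    pb->ioActCount = 0;
    pb->ioResult = 1;          /* "in progress" */
    return 0;                  /* the request itself was accepted */
}

/* Called by the simulated driver when the transfer finishes. */
static void mock_driver_finish(MockParamBlock *pb, int resultCode)
{
    pb->ioActCount = (resultCode == 0) ? pb->ioReqCount : 0;
    pb->ioResult = resultCode; /* 0 or a negative error code */
    if (pb->ioCompletion != NULL)
        pb->ioCompletion(pb);
}

/* Example completion routine: only records that it ran. */
static void count_completion(MockParamBlock *pb)
{
    pb->callbackRuns++;
}
```

Between mock_write_async() and mock_driver_finish() the caller is free to run, and ioResult stays at 1 the whole time. That positive value is what the polling technique later in the talk tests for.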

Asynchronous IO Summary

Advantages
Your program continues to execute while IO completes.
Disadvantages
Uses error-prone parameter blocks.
Completion Routines are executed at interrupt time, and are subject to the restrictions of interrupt time code (can't use handles, can't use QuickDraw, can't access globals without extra code, etc).
What do you do while your IO job is executing? Call WaitNextEvent()? Spin the cursor? Ideally, you'd get some real work done.

The Thread Manager

The Thread Manager implements threads on the MacOS
The Thread Manager is a great way to divide up the tasks your program performs. Compiling, printing, saving and downloading are all great candidates for threading.
The Thread Manager is cooperative -- a thread must specifically yield control to another thread in order for the model to work. It's like the MacOS's cooperative multitasking.

The Thread States Explained

The Thread Manager defines three mutually exclusive thread states.
Stopped
The thread cannot be selected when your thread yields.
Ready
The thread can be selected to run when your thread yields.
Running
The currently executing thread.

The Threaded IO Model

The goal is to write a function that executes an IO job asynchronously, stopping our thread.
When the IO job completes, your callback executes and sets your thread's state to ready.
In this model, your program's other tasks (compiling, downloading, printing, etc.) continue to execute while your IO job executes.

Ideal Threaded IO Example

    // open driver code here
    // parameter block setup code here
    if( !err )
        err = PBWriteAsync( &pb );
    if( !err )
        err = SetThreadState( thisThread, kStoppedThreadState, kNoThreadID );
    // close driver code here

However, there's a window of death here. Between PBWriteAsync() and stopping our thread, the IO job can and will complete, executing our callback.

The Window of Death

Ideally, the execution of the job looks like
Asynchronously Write
Stop Thread
Callback is executed, set thread to ready
However, this course of execution is possible
Asynchronously Write
Callback is executed, set thread to ready
Stop Thread
Your thread is stopped, and will never wake up. Your thread is dead!
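The two orderings can be replayed in a few lines of portable C. This is a toy model of the race, not Thread Manager code: MockState and the three functions are hypothetical, and a plain variable stands in for the thread's state.

```c
#include <assert.h>

typedef enum { kMockReady, kMockStopped, kMockRunning } MockState;

/* Completion callback in the ideal model: mark the IO thread ready. */
static void mock_callback(MockState *ioThread)
{
    *ioThread = kMockReady;
}

/* The IO thread stops itself after issuing the write. */
static void mock_stop_self(MockState *ioThread)
{
    *ioThread = kMockStopped;
}

/* Returns the thread's final state for either ordering.
 * callbackFirst != 0 models the job completing inside the window. */
static MockState run_ideal_model(int callbackFirst)
{
    MockState ioThread = kMockRunning;   /* write has just been issued */

    if (callbackFirst) {
        mock_callback(&ioThread);        /* wake-up arrives too early */
        mock_stop_self(&ioThread);       /* ...and is then overwritten */
    } else {
        mock_stop_self(&ioThread);
        mock_callback(&ioThread);
    }
    return ioThread;
}
```

With callbackFirst set, the thread ends up stopped and nothing is left to wake it. That is the lost wake-up described above.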

Combating the Window of Death (PowerPlant)

PowerPlant, Metrowerks' C++ framework, defers the callback.
The callback checks the state of the IO thread. If the thread is not stopped, PowerPlant sets a Time Manager task to execute 100 microseconds in the future. Hopefully by then the thread will be stopped.
This is a good work-around; however, it complicates the callback.
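The deferral idea can be sketched as a retry loop. This is not PowerPlant's code and does not use the Time Manager; try_wake() and run_deferred_model() are hypothetical, and one trip round the loop stands in for one timer period.

```c
#include <assert.h>

typedef enum { kMockReady, kMockStopped, kMockRunning } MockState;

/* One attempt by the callback. Returns 1 if the thread was woken,
 * 0 if the callback has to be retried later (for example from a timer). */
static int try_wake(MockState *ioThread)
{
    if (*ioThread != kMockStopped)
        return 0;              /* thread has not stopped yet: defer */
    *ioThread = kMockReady;
    return 1;
}

/* Simulates the job completing stopDelay timer periods before the
 * thread manages to stop itself (0 = the thread was already stopped).
 * Returns how many times the callback had to be deferred. */
static int run_deferred_model(int stopDelay, MockState *finalState)
{
    MockState ioThread = (stopDelay == 0) ? kMockStopped : kMockRunning;
    int deferrals = 0;

    assert(stopDelay >= 0 && finalState != 0);
    while (!try_wake(&ioThread)) {
        deferrals++;                   /* callback re-arms its timer */
        if (deferrals >= stopDelay)
            ioThread = kMockStopped;   /* thread finally stops itself */
    }
    *finalState = ioThread;
    return deferrals;
}
```

The thread always ends up ready, but the callback now carries retry logic, which is the complication mentioned above.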

Combating the Window of Death (develop)

develop, Apple's technical journal, had an article on the Thread Manager. They advocated a dual-thread solution.
There are two threads per IO job -- the IO thread and the waker thread.
The IO job stops the waker thread. The IO thread executes the IO job and stops itself.
The callback readies the waker thread.
The waker thread wakes the IO thread.
This is a poor work-around. You have to manage two threads per job and the scheduling overhead to run the IO thread is too high.

Combating the Window of Death (Polling)

Using this technique, the thread is never stopped.
After executing the IO job, the thread simply loops, testing the ioResult field in the parameter block. The thread continually yields until ioResult is less than 1.
This method is as fast as PowerPlant's method and doesn't require a callback. This would be the best work-around.
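A minimal sketch of the polling loop, in portable C. MockJob, mock_advance(), poll_until_done() and count_yield() are hypothetical; on a real Mac the yield would be a Thread Manager call such as YieldToAnyThread(), and the driver, not mock_advance(), would change ioResult.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int ioResult;      /* 1 while in progress, <= 0 when finished */
    int ticksLeft;     /* simulated time until the job completes */
} MockJob;

typedef void (*YieldProc)(void *context);

/* Stand-in for the driver making progress while other threads run. */
static void mock_advance(MockJob *job)
{
    if (job->ticksLeft > 0)
        job->ticksLeft--;
    if (job->ticksLeft == 0)
        job->ioResult = 0;
}

/* Poll until the job leaves the "in progress" state, yielding each
 * time round the loop. Returns the job's final result code. */
static int poll_until_done(MockJob *job, YieldProc yield, void *context)
{
    assert(job != NULL && yield != NULL);
    while (job->ioResult >= 1) {
        yield(context);        /* let other threads do real work */
        mock_advance(job);     /* simulation only: time passes */
    }
    return job->ioResult;
}

/* Example yield procedure: counts how often other threads got a turn. */
static void count_yield(void *context)
{
    (*(int *)context)++;
}
```

No callback is involved and the thread is never stopped, so there is no window to fall into. The price is that the thread wakes up on every pass through the scheduler just to test one field.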

Problems with the Threaded IO Models

The Thread Manager wasn't designed with IO in mind. Ideally, none of these work-arounds would be necessary -- we'd use the Ideal Threaded IO model.
The latency of the models is high. Imagine you have 25 threads running. You execute your IO and stop the thread. Even if the IO completes immediately, you have to wait your turn behind 24 other threads. One of those threads is the event loop, which may switch out your application.

Enhancing the Thread Manager for Effective IO

Metaphysical question: What does it mean to stop a thread?
Thread Manager's answer: mark it as ineligible for scheduling and schedule another thread.
My solution: write a routine that simply marks a thread as ineligible for scheduling -- but doesn't reschedule. This puts your thread into a known state before executing an IO job.

Extending the Thread Manager

SetXThreadState() is the new routine. It works just like SetThreadState() except it doesn't reschedule.
This line of execution works:
SetXThreadState( currentThread, kIneligible );
PBWriteAsync( &pb );
YieldToAnyThread();
Callback: SetXThreadState( ioThread, kEligible );
So does this:
SetXThreadState( currentThread, kIneligible );
PBWriteAsync( &pb );
Callback: SetXThreadState( ioThread, kEligible );
YieldToAnyThread();
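Both orderings can be checked with a toy model in portable C. MockThread, mock_set_x_state(), mock_yield() and run_extended_model() are hypothetical stand-ins; the point is only that the ineligible mark is written before the job is issued, so the callback's write always comes last.

```c
#include <assert.h>

typedef enum { kMockIneligible, kMockEligible } MockEligibility;

typedef struct {
    MockEligibility eligibility;
    int             yielded;   /* has the thread given up the CPU yet? */
} MockThread;

/* Hypothetical analogue of SetXThreadState(): flips the flag only. */
static void mock_set_x_state(MockThread *t, MockEligibility e)
{
    t->eligibility = e;
}

static void mock_yield(MockThread *t)
{
    t->yielded = 1;
}

/* Returns 1 if the thread can be scheduled again once both the yield
 * and the callback have happened, in whichever order. */
static int run_extended_model(int callbackBeforeYield)
{
    MockThread io = { kMockEligible, 0 };

    mock_set_x_state(&io, kMockIneligible);  /* before issuing the job */
    /* ...the asynchronous write would be issued here... */
    if (callbackBeforeYield) {
        mock_set_x_state(&io, kMockEligible);   /* callback */
        mock_yield(&io);
    } else {
        mock_yield(&io);
        mock_set_x_state(&io, kMockEligible);   /* callback */
    }
    return io.yielded && io.eligibility == kMockEligible;
}
```

Unlike the ideal model, neither ordering leaves the thread stranded.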

Prioritizing Threads

However, our latency is still high. We want our thread to be the first thread in line when the IO job is completed.
Solution: be able to mark a thread as "priority".
A priority thread is the same as an eligible thread -- however the scheduler always schedules a priority thread first.
When our IO callback executes, it will mark the thread as priority.

How do you do it?

The Thread Manager provides a hook for you to place your own scheduler.
However, the Thread Manager's data structures are completely opaque -- there's no "thread queue" to access from within the scheduler. You can't even access a reference constant given a ThreadID!
So, even though we can replace the default scheduler, we don't know what to schedule!

Creating a Thread Queue

There is a way -- create and maintain your own thread queue.
The Thread Manager provides three hooks meant for debugging: DebuggerNotifyNewThread(), DebuggerNotifyDisposeThread() and DebuggerNotifyScheduler(). We'll plug into these hooks.
We'll define three thread queues: an ineligible queue, an eligible queue and a priority queue.

Maintaining the Thread Queue

When our DebuggerNotifyNewThread() is called, we'll add an element to the eligible queue with the new thread's ID.
When our DebuggerNotifyDisposeThread() is called, we'll remove the element that matches our disposed thread's ID.
When our DebuggerNotifyScheduler() is called, we'll look at our priority queue. If there's a priority thread waiting, we'll move it into the eligible queue and schedule it. Priority status should be fleeting -- otherwise it will hog the processor.
Otherwise, we check our eligible queue for waiting threads.
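The bookkeeping above can be sketched in portable C. This is a simplified model under loud assumptions: the queues are small fixed arrays of integer IDs rather than MacOS queues in a locked handle, nothing here is interrupt-safe, and XScheduler, on_new_thread(), on_dispose_thread(), move_thread() and on_schedule() are hypothetical names, not the Thread Manager's hook signatures.

```c
#include <assert.h>

#define MAX_THREADS 8
#define NO_THREAD   (-1)

typedef enum { kQIneligible, kQEligible, kQPriority, kQCount } QueueKind;

/* Each queue is a small FIFO of thread IDs. */
typedef struct { int ids[MAX_THREADS]; int count; } IdQueue;
typedef struct { IdQueue q[kQCount]; } XScheduler;

static void push(IdQueue *q, int id)
{
    assert(q->count < MAX_THREADS);
    q->ids[q->count++] = id;
}

/* Removes id if present; returns 1 on success. */
static int remove_id(IdQueue *q, int id)
{
    int i, j;
    for (i = 0; i < q->count; i++) {
        if (q->ids[i] == id) {
            for (j = i + 1; j < q->count; j++)
                q->ids[j - 1] = q->ids[j];
            q->count--;
            return 1;
        }
    }
    return 0;
}

/* New-thread hook: every thread starts out eligible. */
static void on_new_thread(XScheduler *s, int id)
{
    push(&s->q[kQEligible], id);
}

/* Dispose hook: drop the thread from whichever queue holds it. */
static void on_dispose_thread(XScheduler *s, int id)
{
    int k;
    for (k = 0; k < kQCount; k++)
        if (remove_id(&s->q[k], id))
            return;
}

/* Moves a thread to another queue (the job of SetXThreadState()). */
static void move_thread(XScheduler *s, int id, QueueKind to)
{
    on_dispose_thread(s, id);
    push(&s->q[to], id);
}

/* Scheduler hook: priority threads first, then eligible ones in
 * round-robin order. A chosen priority thread is demoted to eligible. */
static int on_schedule(XScheduler *s)
{
    IdQueue *src = s->q[kQPriority].count ? &s->q[kQPriority]
                                          : &s->q[kQEligible];
    int id;

    if (src->count == 0)
        return NO_THREAD;
    id = src->ids[0];
    remove_id(src, id);
    push(&s->q[kQEligible], id);   /* back of the eligible queue */
    return id;
}
```

A thread moved into the priority queue is picked on the very next call to on_schedule() and then rejoins the back of the eligible queue, so priority status lasts for exactly one scheduling decision.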

The Thread Queue Code

We'll create the XThreadElem structure to hold individual thread queue elements.

    typedef struct XThreadElem *XThreadElemPtr;

    struct XThreadElem {
        XThreadElemPtr next;
        ThreadID       threadID;
    };

We can use the standard MacOS Queue utilities to manipulate our queues.
We'll store all three queues in one handle as an array of XThreadElem structures.
We'll keep the handle locked because the Thread Queue routines will be called at interrupt time.

The Thread Queue Code

We'll create an XThreadQueue structure to manage each of our three queues

    struct XThreadQueue {
        short          ignored;
        XThreadElemPtr head;
        XThreadElemPtr tail;
        XThreadElemPtr mark;
    };

This is a standard MacOS queue header except for the mark field. When moving through the eligible thread queue, the mark field points to the next thread to schedule.
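Here is one way the mark field could drive round-robin selection, as a portable C sketch. The two structures mirror the ones above, with MockThreadID standing in for ThreadID. x_enqueue() and x_next() are hypothetical helpers written by hand for this example; they are not the MacOS queue utilities.

```c
#include <assert.h>
#include <stddef.h>

typedef long MockThreadID;              /* stand-in for ThreadID */
typedef struct XThreadElem *XThreadElemPtr;

struct XThreadElem {
    XThreadElemPtr next;
    MockThreadID   threadID;
};

struct XThreadQueue {
    short          ignored;
    XThreadElemPtr head;
    XThreadElemPtr tail;
    XThreadElemPtr mark;   /* next thread to schedule */
};

/* Append an element to the tail of the queue. */
static void x_enqueue(struct XThreadQueue *q, XThreadElemPtr e)
{
    assert(q != NULL && e != NULL);
    e->next = NULL;
    if (q->tail != NULL)
        q->tail->next = e;
    else
        q->head = e;
    q->tail = e;
}

/* Return the marked element and advance the mark, wrapping back to
 * the head after the last element. Returns NULL for an empty queue. */
static XThreadElemPtr x_next(struct XThreadQueue *q)
{
    XThreadElemPtr chosen;

    assert(q != NULL);
    chosen = (q->mark != NULL) ? q->mark : q->head;
    if (chosen == NULL)
        return NULL;
    q->mark = (chosen->next != NULL) ? chosen->next : q->head;
    return chosen;
}
```

Each call to x_next() hands back a different element until the end of the list is reached, then starts again from the head, so every eligible thread gets a turn.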

Writing the SetXThreadState() routine

All our SetXThreadState() routine has to do is place the thread in question into the correct queue. Duck soup.

Threaded IO (Finally!)

We have extended the Thread Manager to effectively support IO. Now to write the IO code.
We want two new routines: a threaded read routine and a threaded write routine.
It turns out these routines are very similar; the only difference is the actual Device Manager call they make: PBRead() or PBWrite().

ThreadedRead()

Here's the ThreadedRead() function prototype:
    OSErr ThreadedRead( short refNum, void *buffer, long *size, long offset, long patience );
The only really new thing here is the patience argument. You can specify how long you're willing to wait for the IO job to complete.
This function is very similar to FSRead(), except you get the advantages of threaded IO.

ThreadedRead() Under the Hood

This is what ThreadedRead() does:
Marks itself ineligible using SetXThreadState()
Creates and fills out a parameter block
If a patience was specified, creates a Time Manager task set to kill the IO job if it doesn't finish in time.
Calls PBRead()
Calls YieldToAnyThread()
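The steps above can be walked through against a simulated device in portable C. This is a sketch of the control flow only, not the real ThreadedRead(): MockDevice, mock_threaded_read() and the two result codes are hypothetical, the offset argument is left out, and a counting loop stands in for the time the other threads would spend running.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { kMockNoErr = 0, kMockTimeout = -1 };   /* hypothetical codes */

typedef struct {
    const char *source;     /* bytes the simulated device will deliver */
    long        available;  /* how many bytes it has */
    long        latency;    /* ticks before the read completes */
    int         eligible;   /* 1 if the calling thread may be scheduled */
} MockDevice;

/* Sketch of the ThreadedRead() steps against a simulated device.
 * patience <= 0 means "wait forever". */
static int mock_threaded_read(MockDevice *dev, void *buffer,
                              long *size, long patience)
{
    long ticks = 0;
    int  result = 1;                       /* 1 = in progress */

    assert(dev != NULL && buffer != NULL && size != NULL && *size >= 0);

    dev->eligible = 0;                     /* step 1: mark ineligible */
    /* steps 2-4: the parameter block, the optional timeout and the
     * asynchronous read are all folded into the loop below */
    while (result > 0) {                   /* step 5: other threads run */
        ticks++;
        if (ticks >= dev->latency) {       /* completion callback */
            if (*size > dev->available)
                *size = dev->available;
            memcpy(buffer, dev->source, (size_t)*size);
            result = kMockNoErr;
        } else if (patience > 0 && ticks >= patience) {
            *size = 0;                     /* timer task kills the job */
            result = kMockTimeout;
        }
    }
    dev->eligible = 1;                     /* either path wakes the thread */
    return result;
}
```

Whether the job completes or the patience runs out, the calling thread is made eligible again, so it always gets to see the result.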

Summary of Threaded IO

Easy
Fast
Extends the Thread Manager -- all your current Thread Manager-based code will work.