How do I employ Critical Sections to synchronize two or more threads accessing the same data?
Note: A demonstration program is available.
Getting Things In Sync
As you write multi-threaded applications, a situation sometimes occurs in which you have two threads that access the same resource, such as a file. In some rare cases, it's perfectly all right to allow simultaneous access to a resource. But in most instances, two or more threads accessing the same global resource can have disastrous effects, such as corrupting data or even crashing your system. Not good. To deal with this properly, you need a way for a thread to tell the other threads that may have access to the global resource to back off until it has finished what it needs to do. This type of communication between threads falls under the loose category of thread synchronization.
I call thread synchronization a loose category because synchronization can take on different forms. For instance, in a previous article, Running Queries in Threads, I spoke at length about the Synchronize procedure of TThread, which basically runs a thread's method in the context of the application's main thread. This is one form of thread synchronization. The other form is the one I mentioned above, and that is what we will be discussing here.
Let's Get Critical, Critical... (sorry, couldn't resist)
In the most basic sense, critical sections define a way of designating that a particular resource is being used exclusively and is off limits to other threads. Here's another way to look at it: Imagine two people travelling along a path that is just wide enough to allow them to walk abreast of each other. Suddenly, they come to a footbridge that is only wide and strong enough for one person to cross at a time. Naturally, one person has to yield to allow the other to cross. Once that person is across, the other can follow and they can resume their sojourn side-by-side.
Critical sections are similar to the bridge-crossing analogy. If you look at the figure above, you'll see two threads (denoted by the cyan arrows) running in a single process (the black section). When the threads reach a critical section in the code (the blue area), one thread yields (the yield sign) while the other thread crosses the critical section. In actuality, one thread will have reached the critical section first, so it can flag the other threads not to mess with the data. The net result is that one thread is allowed to proceed while the others wait for it to finish.
That said, let's get into particulars about critical sections.
Threads and their associated routines and methodologies have always been relegated to the guru developers' realm. In my previous articles dealing with threaded technology, I've demonstrated that there's really nothing supernatural about performing tasks in threads. Likewise, using critical sections to synchronize threads is a fairly simple matter. It's actually one of the more literal methodologies you'll see implemented in Delphi. In fact, to implement critical sections in your code, you mark the block of code in each of your threads that accesses the global resource using EnterCriticalSection and LeaveCriticalSection calls (don't worry, we'll discuss these below). That's pretty much all there is to it.
Implementing a Critical Section
In order to implement critical sections in your code, you first have to define a critical section in memory. This is done with a call to InitializeCriticalSection. It is declared as follows in Windows.PAS:
procedure InitializeCriticalSection(var lpCriticalSection: TRTLCriticalSection); stdcall;
As you can see, InitializeCriticalSection takes one parameter of type TRTLCriticalSection, which is a record type that holds information about a defined critical section. You really needn't know anything about this structure because there's nothing that you have to manipulate within it. Simply declare a global variable (or at least one that is accessible to all threads that require access to it) of type TRTLCriticalSection; the compiler will do the rest. If you want more information about this structure, unfortunately you won't find anything in the manuals or the online help. However, if you look in Windows.PAS, you'll find the structure listed. But like I mentioned above, unless you're absolutely curious about the structure, don't bother, because there's nothing you need to do with it.
After you're done with a critical section, delete it to free up resources using the DeleteCriticalSection function. Like InitializeCriticalSection, this function takes one parameter: a variable declared as TRTLCriticalSection. In this case, it would be the same one you initialized with InitializeCriticalSection. As to where to place the initialization and deletion calls, I've found it most useful to put them in the initialization and finalization sections of the unit where you declare the critical section variable.
initialization
  InitializeCriticalSection(CritSect);
finalization
  DeleteCriticalSection(CritSect);
These sections go below all the code in the implementation section, right above the unit's final end. statement.
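To make the placement concrete, here is a minimal sketch of a complete unit laid out that way. The unit name SharedData is made up for illustration; only CritSect and the two API calls come from the discussion above, and the sketch assumes a 32-bit Delphi whose Windows unit exposes them.

```pascal
unit SharedData; // hypothetical unit name, for illustration only

interface

uses
  Windows; // newer Delphi versions call this unit Winapi.Windows

var
  // Declared in the interface section so that every unit containing
  // a thread that needs the lock can see the same variable.
  CritSect: TRTLCriticalSection;

implementation

// ...the unit's ordinary routines would go here...

initialization
  // Runs once when the program starts, before any thread can use the lock.
  InitializeCriticalSection(CritSect);

finalization
  // Runs once at shutdown and releases the operating system resources.
  DeleteCriticalSection(CritSect);

end.
```

Any unit that lists SharedData in its uses clause can then pass CritSect to EnterCriticalSection and LeaveCriticalSection.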
Using a Critical Section In Your Code
Putting a critical section to use in your thread is insanely easy. All you need to do is locate the section in your code where access to a globally accessible resource is made, and enclose it with Enter- and LeaveCriticalSection calls. For example, look at the pseudo-code listing below:
procedure TMyThread.ChangeData;
begin
  // ...some code
  // CritSect is an interface-section variable of type TRTLCriticalSection,
  // initialized in the initialization section of the unit
  EnterCriticalSection(CritSect);
  try
    // ...change data in the resource here
  finally
    // Always release the critical section, even if an exception is raised
    LeaveCriticalSection(CritSect);
  end;
end;
Pretty easy stuff, huh? Believe me, of all the things I've written about, this is one of the easiest to implement. What's really cool about implementing critical sections in your code is what happens to the threads that have to wait while another thread processes its code within a critical section. They go to sleep; that is, no CPU cycles are committed to the threads waiting for a critical section to become free. This makes for extremely efficient code in addition to ensuring corruption-free data. This is the crux of thread synchronization.
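As a small, self-contained illustration of the pattern, here is a sketch of a console program in which two threads increment the same global counter. It is not taken from the demo program: TCounterThread, Counter and Iterations are hypothetical names, and the sketch assumes a 32-bit Delphi with the Windows and Classes units.

```pascal
program CritSectCounter; // hypothetical example, not part of the demo

{$APPTYPE CONSOLE}

uses
  Windows, Classes; // Winapi.Windows and System.Classes in newer Delphi

const
  Iterations = 100000; // increments performed by each thread

var
  CritSect: TRTLCriticalSection;
  Counter: Integer; // the shared resource both threads modify

type
  TCounterThread = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TCounterThread.Execute;
var
  I: Integer;
begin
  for I := 1 to Iterations do
  begin
    // Only one thread at a time gets past this call.
    EnterCriticalSection(CritSect);
    try
      Inc(Counter);
    finally
      LeaveCriticalSection(CritSect);
    end;
  end;
end;

var
  A, B: TCounterThread;
begin
  InitializeCriticalSection(CritSect);
  try
    Counter := 0;
    A := TCounterThread.Create(False); // False = start running immediately
    B := TCounterThread.Create(False);
    A.WaitFor; // block until each thread has finished
    B.WaitFor;
    A.Free;
    B.Free;
    WriteLn('Counter = ', Counter);
  finally
    DeleteCriticalSection(CritSect);
  end;
end.
```

With the Enter/Leave pair in place, every increment is applied and the program should report a counter of 200000 on every run. Take the pair out and the two threads can overwrite each other's updates, so the total may come up short.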
Why Thread Synchronization Is So Important
In the old Windows 3.1 world, which is a cooperative multitasking environment, multiple tasks run in the same memory space, which means that if two processes step on each other's data, there is a strong probability that the entire system will be brought to its knees as the two processes fight for the same space. This was alleviated to a great extent with Win32 because each process runs in its own private address space, so if problems occur, they generally affect only that process's memory. But the same situation can occur as in Windows 3.1 when two threads running in the same process try to access the same global data. Fortunately, with things like critical sections, mutexes and semaphores, developers have the ability to avoid those problems.
You see, thread synchronization is all about managing access to resources that can only accommodate one thread at a time. Accessing the Delphi Visual Component Library (VCL) with multiple threads in a program is a perfect example of this. Whenever you run a thread that requires access to a VCL component on a form, you need to call Synchronize to synchronize the thread with the main thread of the program, which has ownership of the VCL. Essentially, what happens here is that the calling thread is made to wait while the method it passed to Synchronize runs in the context of the main thread, where it can safely do its processing, like changing the Caption text of a TLabel. If there weren't any synchronization, the two threads would vie for control of the VCL object, and your program would likely crash due to resource contention. So I can't stress enough the importance of making sure you've covered the bases with synchronizing access to global data.
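For comparison with critical sections, here is a minimal sketch of the Synchronize approach. TLabelThread, FLabel, FText and UpdateLabel are hypothetical names chosen for this example, and the label is assumed to live on a form owned by the main thread.

```pascal
unit LabelThread; // hypothetical unit, for illustration only

interface

uses
  Classes, StdCtrls;

type
  // A worker thread that reports its result through a TLabel.
  TLabelThread = class(TThread)
  private
    FLabel: TLabel;
    FText: string;
    procedure UpdateLabel; // executed in the main thread via Synchronize
  protected
    procedure Execute; override;
  public
    constructor Create(ALabel: TLabel);
  end;

implementation

constructor TLabelThread.Create(ALabel: TLabel);
begin
  // Create suspended so the fields are set before Execute can start.
  inherited Create(True);
  FLabel := ALabel;
  FreeOnTerminate := True;
  Resume; // newer Delphi versions prefer Start
end;

procedure TLabelThread.Execute;
begin
  // ...do the background work here without touching any VCL objects...
  FText := 'Finished';
  // This thread waits here while UpdateLabel runs in the main thread.
  Synchronize(UpdateLabel);
end;

procedure TLabelThread.UpdateLabel;
begin
  FLabel.Caption := FText;
end;

end.
```

A form would start it with something like TLabelThread.Create(Label1). The design point is that the worker never touches the label directly; it only hands the main thread a method to run on its behalf.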
A Real-Life Implementation
I wouldn't have felt complete in writing this article without providing a real-life example of using critical sections in code, so I've designed a program that demonstrates this. Actually, it's a program that I need to use in my work that has turned out to be a good instructional aid. The link above and the link in the Editor's Note table at the top of the article are links to the application's ZIP file. While I'll be listing some of the code here, you'll do yourself a favor by having the code open in Delphi. Let's move on, shall we?
In my current job, I spend a great deal of time with customer transaction data, which I receive as several files on tape or CD in the form of ASCII files, which I copy to my local hard drive for processing. Each of these files typically contains a month's worth of data, but unfortunately, my processing programs require a single file which comprises either a quarter's worth of data, or at times, an entire year. So what this means is that I have to perform a join of the files. Traditionally, I had used the DOS Copy command with the /B parameter for joining files. This was somewhat tricky at best because it required me to enter the file names one by one on the command line. And if I misspelled a file name, I'd have to start all over again. This was a pain. So I came up with a program that automated that for me, allowing me to choose the files I wanted to copy from a file list box, then perform a join of the files.
I could have easily called the Copy command using CreateProcess and passed the string of file names as part of the parameter of CreateProcess. But I wanted to do something a bit sexier. So here's what I came up with: Copying would take place using two instances of a thread designed to copy one file to another. I would select files using system components for specifying drive, directory, and files. The program is a pretty no-frills one, consisting of a TDriveComboBox, a TDirectoryListBox, and a TFileListBox for selecting files; a TEdit for typing in the destination file, and a TButton to execute copying. If you have the source code, consider opening it up in Delphi and executing it now to see the interface in action.
On a conceptual level, the way the threads do their work is thus: The two threads are allowed to read their source files at will. However, when they write to the single destination file, they must wait until the other thread finishes its copying. The write phase for the thread is contained in a critical section to ensure that the current thread writing to the file has exclusive access to it. Look at the figure to the left. As you can see, the separate threads have free access to the source files, but when they have to write to the destination file, they have to essentially stand in line. The sample program I've included performs the copying with only two threads, but it follows the concept of the figure.
What makes this a more efficient way of copying than serially copying files with one process? The most obvious reason is that as one thread is writing, the other will either be reading another file or waiting in line to write its contents. This is in stark contrast to a single process that has to read a file, then write to its destination, then go back and repeat the process.
Here are some key points about the program that you should know about:
The first thing you'll notice in the code is that I actually employ two types of threads. One, TMasterThr, is used to spawn the other two threads. That's pretty much all it does. Why did I do that? I wanted to include a WaitForMultipleObjects call in my code which would make the program wait for the copying threads to terminate before moving on. But the calling thread doesn't just wait; it sleeps, which means that it doesn't process any messages while it's waiting. Therefore, if I put the WaitForMultipleObjects call in the main thread of the program, it would lock up and not receive messages. Not good.
There are a couple of things I should mention about the program. First of all, the reads and writes are not buffered into small chunks. This isn't an issue for writing, but it is an issue for reading. You see, the read operation will read the entire contents of a file into memory. This means that if you copy large files (on the order of 15MB+), they'll be copied entirely into RAM, and will probably spill over into virtual memory. I tested the program using files ranging in size from 500K to 5MB, and it worked flawlessly.
This is a demonstration program, so I didn't provide any status for the copying. You can play around with the code to provide status messages. The groundwork has been laid for using more than two threads to perform the copying. However, if you want to experiment with that, you'll have to alter the code a bit to provide more signalling between the threads.
The demo program was written with just two threads in mind. I didn't think about using more threads until after I had completed the code, and because I wanted to get down to writing this article, I didn't want to go back and do a serious rewrite of the signalling logic to make more than two threads work together. However, I'll issue a challenge: If someone comes up with the code to make this copying operation work with more than two threads, I'll send them an inquiry.com baseball cap, and give them the chance to write about the technique(s) they employed.
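The master-thread idea from the first point above can be sketched roughly as follows. This is not the demo's TMasterThr: TWaitingThread and TWorkerThread are hypothetical stand-ins, and a short Sleep takes the place of the actual file copying.

```pascal
program MasterWaitDemo; // hypothetical example, not part of the demo

{$APPTYPE CONSOLE}

uses
  Windows, Classes; // Winapi.Windows and System.Classes in newer Delphi

type
  TWorkerThread = class(TThread)
  protected
    procedure Execute; override;
  end;

  TWaitingThread = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TWorkerThread.Execute;
begin
  Sleep(200); // stand-in for copying a file
end;

procedure TWaitingThread.Execute;
var
  Workers: array[0..1] of TWorkerThread;
  Handles: array[0..1] of THandle;
  I: Integer;
begin
  for I := 0 to 1 do
  begin
    Workers[I] := TWorkerThread.Create(False);
    Handles[I] := Workers[I].Handle;
  end;
  // Puts only this thread to sleep until both workers have terminated
  // (True = wait for all handles, INFINITE = no timeout).
  WaitForMultipleObjects(2, @Handles, True, INFINITE);
  for I := 0 to 1 do
    Workers[I].Free;
end;

var
  Master: TWaitingThread;
begin
  Master := TWaitingThread.Create(False);
  // A GUI program would simply return to its message loop here. This
  // console demo waits so that the process does not exit early.
  Master.WaitFor;
  Master.Free;
  WriteLn('Both workers finished');
end.
```

Because the blocking call sits in its own thread, a form's main thread stays free to repaint and respond to the user while the copy is in progress.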
It's source code time! I've commented the code rather extensively to aid in reading it. However, I still recommend that you download the source so you can have it at your fingertips.
      {with the finally statements, need to make sure we get rid of
       everything.}
      finally
        FreeMem(pBuf, bufSize);
        dStream.Free;
      end;
    finally
      sStream.Free;
    end;
  until
    ((NOT Assigned(Files)) OR (Files.Count = 0));
  NoFiles := True;
end;

{TForm2 Code}
procedure TForm2.FormCreate(Sender: TObject);
begin
  //This was set to a temporary directory for testing.
  //Go ahead and delete this entry.
  DirectoryListBox1.Directory := 'C:\CopyFolder';
end;

procedure TForm2.Button1Click(Sender : TObject);
var
  I : Integer;
  thr : TMasterThr;
begin
  //Load up the master string list from which
  //the threads will get their file names to copy
  Files := TStringList.Create;
  with FileListBox1 do
    for I := 0 to Items.Count - 1 do
      if Selected[I] then
        Files.Add(Items[I]);
  thr := TMasterThr.Create(Edit3.Text);
end;

initialization
  InitializeCriticalSection(CritSect);
  NoFiles := False;

finalization
  DeleteCriticalSection(CritSect);

end.
This is a big piece of code. So what should you look for? Well, the most important part is the small section in TCopyThr that contains the write operation into the destination file:
In order to make the program work, all access to the file has to be exclusive during this phase. What happens in this section of code is that the destination file is either created or opened, depending on whether it already exists; then, using the TFileStream Seek method, the program moves the file pointer to the end of the file and writes the buffer filled in the section just above the critical section. This is the crux of the program. If I didn't enclose this in a critical section, I would get some serious access violations because the two threads could write to the destination file at the same time.
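What follows is not the demo's actual code, just a minimal sketch of the steps described above: create or open the destination, seek to the end, and write the buffer, all inside the critical section. The unit name AppendWrite, the AppendBuffer procedure and its parameters are hypothetical names introduced for this example.

```pascal
unit AppendWrite; // hypothetical unit, for illustration only

interface

uses
  Windows, SysUtils, Classes;

var
  CritSect: TRTLCriticalSection;

// Appends Count bytes from Buf to the end of the file named DestName.
procedure AppendBuffer(const DestName: string; const Buf; Count: Integer);

implementation

procedure AppendBuffer(const DestName: string; const Buf; Count: Integer);
var
  Dest: TFileStream;
begin
  // Only one thread at a time may touch the destination file.
  EnterCriticalSection(CritSect);
  try
    if FileExists(DestName) then
      Dest := TFileStream.Create(DestName, fmOpenWrite or fmShareExclusive)
    else
      Dest := TFileStream.Create(DestName, fmCreate);
    try
      Dest.Seek(0, soFromEnd);      // move to the end of the file
      Dest.WriteBuffer(Buf, Count); // raises an exception on a short write
    finally
      Dest.Free;
    end;
  finally
    // Released even if opening or writing the file fails.
    LeaveCriticalSection(CritSect);
  end;
end;

initialization
  InitializeCriticalSection(CritSect);

finalization
  DeleteCriticalSection(CritSect);

end.
```

A copying thread holding its data in a buffer pointed to by pBuf would call something like AppendBuffer(DestName, pBuf^, bufSize). Reading the source file stays outside the lock, so the lock is held only for the duration of the write.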
Wrapping It Up
Thread synchronization plays a key role in writing clean multithreaded programs, and critical sections are just one of the many ways to employ thread synchronization. In future articles I'll cover mutexes and semaphores, which allow you to do things a bit differently with thread synchronization. But for now, please feel free to rip up this code to see what you can come up with.
Copyright © 1997 Brendan V. Delumpa All Rights Reserved