How can I run queries in threads? I have several programs that run large queries, and would like to move the query processing to one or more background threads. How can I implement this in my program? NOTE: It's tough to discuss this subject without providing some background information. Threaded programming is so new to many programmers, I'd be remiss in my duties as The Delphi Pro if I didn't cover the most fundamental subjects of threaded programming. If you already have a good idea of the fundamentals of creating and running threads with the TThread component, you can skip to the section that deals with running queries in threads. -- The Delphi Pro Process and Thread Basics Many programmers, especially those new to programming with multiple threads, believe threads are the sole domain of programming gurus. But it's all a matter of understanding some fundamental concepts about how threads work in Delphi and how you can apply threads in your code. If what you're programming has to do with the User Interface, you don't want to use multiple threads because threads require resources. In fact, every time you create a thread, it gets the same size stack as the main thread of a program (we'll discuss this below). If you have a lot of threads, you'll take up a lot of resources. So the idea is to be judicious in your use of threads. It may be tempting to create a bunch of threads to handle a bunch of different tasks because creating threads, as you will see, is fairly easy. But you don't want to create threads just for the sake of creating threads. In addition to taking up resources, every executing thread creates another time slice on the CPU, forcing it to handle more tasks. The net result is that the computer will slow way down. But there are some instances in which running background threads makes a lot of sense: It's ideal to create background threads when: your program will execute a long process like a huge query or complex calculation that will take several seconds or minute to execute. In single-threaded programs in Win16 and Win32 alike, the interface becomes totally unresponsive if you execute a long process. Creating a separate thread of execution to run in the background frees up the user interface. your program runs in a multi-processor system. Windows NT, for example, has the ability to work SMP systems. With multi-threaded programs, individual threads may be executed on different processors, which means you take full advantage of balancing processor load. your program will need to execute a few processes at a time. One caveat: If you're running on a single CPU system and if the processes your program will be executing are pretty complex and will require lots of CPU cycles, it doesn't make sense to have several simultaneous threads running. However, I've found that with smaller operations that can execute in a few seconds, it's a pretty nice feature to have. In Windows 95 and NT (I'll refer to both systems as Win32 throughout this article), every program loaded into memory is called a process. Many people new to threads (including myself) make the mistake of believing that the definition of a process is interchangeable with that of a thread. It's not. Processes are their own entities. Though the name "processes" implies a form of activity, a process does nothing. It is merely a memory address placeholder for its threads and a space where executable code gets loaded. A process' thread is what actually does the work in a program. When a process is created in Win32, it automatically has an execution thread associated with it. This is called the main thread of the program. Other threads can be instantiated within a process, but you won't see many programs using multiple threads of execution. A thread can only be associated with one process, but a process can have many threads. Therefore, there is a distinct one-way, one-to-many relationship between processes and threads. The TThread Object Traditionally, processes (executing programs) are created in Win32 using the WinAPI CreateProcess and threads are created using CreateThread. In fact, many advanced Delphi and Windows programmers I've spoken with say that using the WinAPI call is their method of preference. With Delphi 2.0, the Delphi team created a wrapper class called TThread that encapsulates the WinAPI thread calls. TThread provides developers with a programmatic interface for creating multiple threads of execution in their programs. It also makes the job of creating and maintaining threads easier than directly using WinAPI calls. Does this come at a price? I don't know. I have several multithreaded applications using both straight WinAPI calls and the TThread object and haven't noticed any significant differences. But my test arena was not as wide as it should have been to accurately gauge performance differences. Most of the VCL is not thread-safe -- it's very important to take this into consideration when creating multiple threads of execution. If you call a VCL object from within a thread, most likely you'll raise an exception, because many of the VCL objects were not written with any type of synchronization code to ensure data integrity when called at random times from anything but the main thread. Essentially, they can only receive messages from a single thread. If they get a message from another thread, they'll hiccup, and your program will probably crash. Fortunately, TThread has a very simple way of safely making calls into the VCL that we'll discuss in a bit. Let's look at the TThread's structure. Here's the declaration for the TThread object: TThread = class private FHandle: THandle; FThreadID: THandle; FTerminated: Boolean; FSuspended: Boolean; FMainThreadWaiting: Boolean; FFreeOnTerminate: Boolean; FFinished: Boolean; FReturnValue: Integer; FOnTerminate: TNotifyEvent; FMethod: TThreadMethod; FSynchronizeException: TObject; procedure CallOnTerminate; function GetPriority: TThreadPriority; procedure SetPriority(Value: TThreadPriority); procedure SetSuspended(Value: Boolean); protected procedure DoTerminate; virtual; procedure Execute; virtual; abstract; procedure Synchronize(Method: TThreadMethod); property ReturnValue: Integer read FReturnValue write FReturnValue; property Terminated: Boolean read FTerminated; public constructor Create(CreateSuspended: Boolean); destructor Destroy; override; procedure Resume; procedure Suspend; procedure Terminate; function WaitFor: Integer; property FreeOnTerminate: Boolean read FFreeOnTerminate write FFreeOnTerminate; property Handle: THandle read FHandle; property Priority: TThreadPriority read GetPriority write SetPriority; property Suspended: Boolean read FSuspended write SetSuspended; property ThreadID: THandle read FThreadID; property OnTerminate: TNotifyEvent read FOnTerminate write FOnTerminate; end; Its structure is quite simple -- and simple is good In most components there are only a few procedures and properties you need to think about; this is not an exception with TThread. The only methods you'll need to worry about are Execute, Create, and Synchronize; and the only property that you'll usually need to access is FreeOnTerminate. Key Methods and Properties of TThread The key methods and property of TThread are listed below in Table 1. Attribute: Create Parameters: CreateSuspended: Boolean The Create method allocates memory, starts the thread and specifies the thread function in CreateThread as the Execute procedure. Here, as in any Create method, you can initialize vars and perform some preliminary operations. However, unlike a normal Create method, the thread is already executing by the time the method ends. Usually this isn't a problem because you'll just create the object and forget about it. But if you have to do any processing before the thread starts executing, set the CreateSuspended parameter to False. This allocates memory for the thread and sets it up, but the thread will not execute until you make a call to the Resume procedure. This is useful, especially if you need to set up values your thread will need over the course of its lifetime. Attribute: Execute Parameters: None Execute is your TThread's central execution method. It's fairly synonymous with a main processing or central control procedure or function you might use to execute all the procedures in a particular unit. The only difference is that with a TThread object, the first method that is called must be called Execute. This doesn't mean that you can't call another procedure which acts in that capacity from Execute. It's definitely more convenient to put your execution code here because that's Execute's purpose. One important note: If you look in the object declaration of TThread, you'll see that it is declared as a virtual; abstract; method. This means it's not implemented in any way, shape or form; therefore, it's up to you to provide the code. And there's no inherited functionality, so you'll never make a call to inherited Execute; in your own implementation. Attribute: Synchronize Parameters: Method: TThreadMethod Synchronize is your key to safely accessing the VCL in your thread. When Synchronize executes, your thread becomes a part of the main thread of the program, and in the process, suspends the operation of the main thread. This means the VCL can't receive messages from other threads other than the one you're synchronizing to the main thread, which in turn makes it safe to execute any VCL calls. Think of Synchronize as a middleman in a transaction. You have a buyer, which is the main thread of the program, and a seller of services, which is another thread of execution created within the same process. The seller would like to sell the buyer some goods -- in our case, do some operation on the VCL components running in the main thread -- but the buyer doesn't really know the seller, and is appalled at how the seller actually performs his service, so is afraid the seller's service may have a negative effect on him. So the seller enlists the help of an agent (the Synchronize procedure) to smooth things out, take the buyer out to lunch so the seller can do his thing. The seller is a procedure that performs the action on behalf of the thread. It doesn't have to be a specific type, but it must be a method of the thread. Say I have a long process running in a background thread that at certain times must update the text in a TPanel on the main form to give feedback about the thread's current status to the user. If I wrote in my code Form1.Panel1.Caption := 'This is my update message', I'd raise an exception -- most likely an access violation error. But if I encapsulate the call in a procedure, then call Synchronize, the message will be sent and text will be updated. Synchronize acted as a middleman so my status message could be passed to the TPanel. I've may have confused rather than enlightened you! Just remember this: When you want to call VCL objects from another thread, create a wrapper procedure for the operations you want to carry out, then use Synchronize to synchronize your thread with the main thread so your call can be safely executed. Synchronize is meant to be used really quickly. Execute it, then get out of it as soon as you can. While Synchronize is running, you essentially have only one thread working. The main thread is suspended. Attribute: FreeOnTerminate Parameters: Set property to Boolean value This is a really useful property. Typically, once your execute procedure finishes, the Terminated Boolean property is set to False. However, the thread still exists. It's not running, but it's taking up space, and it's up to you to free the thread. But by setting this property in the Create constructor to True, the thread is immediately freed -- and with it, all the resources it took up -- after it's finished executing. Table 1 -- Key Attributes of TThread A Real-life Example Most people get more out of seeing code examples to implement a certain concept. Below are code excerpts from a program that I wrote that performs a bunch of queries in sequence. First we have the type declaration: type TEpiInfoProc = class(TThread) {Base processing class for Episode Information processing} private FStatMsg : String; FSession : TSession; tblPrimary, tblHistory, tblSymmetry : String; FIter : Integer; property Iteration : Integer read FIter write FIter; procedure eiCreateEpiTemp1; //Performs initial joins procedure eiCreateEpiTemp2; //Creates new table of summarized info procedure eiGetClassifications(clState, //'AMI', 'Asthma', etc. clName, //'H1', 'H2', etc. priFld, //Join field from Primary hstFld : String; //Join field from History bMode : TBatchMode); //Batch Mode (will always be //batCopy 1st time); procedure eiCreateEpiInfoTable(clSrc, clDst, clName : String); procedure eiCreateHistory(HistIndicator : String); //Generic processing methods procedure EnableBtns; procedure GetTableNames; procedure UpdStatus; procedure IndexTbl(dbName, tblName, idxName, fldName : String; idxOpts : TIndexOptions); protected procedure Execute; override; public constructor Create; end; The above is like anything you see in Delphi when you declare a descendant of a class. You declare your variables, properties, and methods just like anything else. Here's the Create constructor for the thread: constructor TEpiInfoProc.Create; begin inherited Create(True); FSession := TSession.Create(Application); with FSession do begin SessionName := 'EpiInfoSession'; NetFileDir := Session.NetFileDir; PrivateDir := 'D:\EpiPriv'; end; FreeOnTerminate := True; Resume; end; Notice I'm creating a new TSession instance. When we discuss running queries in threads, I'll go into more depth about this. At this point the important thing to note is that I create the thread in a suspended state by calling the inherited Create and setting its CreateSuspended parameter to True. This allows me to perform the code immediately below the inherited Create before the thread code actually starts running. Now, on to the Execute procedure. procedure TEpiInfoProc.Execute; var N, M : Integer; begin try Synchronize(EnableBtns); //Set enabled property of buttons to opposite of current state Synchronize(GetTableNames);//Get Names FStatMsg := 'Now performing initial summarizations'; Synchronize(UpdStatus); ShowMessage('Did it!'); Exit; eiCreateEpiTemp1; eiCreateEpiTemp; {Create H1 - H9 tables} for N := 0 to 8 do for M := 0 to 5 do begin FStatMsg := 'Now performing '+ arPri[M] + ' field extract for ' + arDis[N]; Synchronize(UpdStatus); Iteration := M; //first iteration must be a batCopy, then batAppend thereafter if (M = 0) then eiGetClassifications(arDis[N], arCls[N], arPri[M], arHst[M], batCopy) else eiGetClassifications(arDis[N], arCls[N], arPri[M], arHst[M], batAppend); end; {Now do Outer Joins} for N := 0 to 8 do begin FStatMsg := 'Now creating ' + arDst[N]; Synchronize(UpdStatus); eiCreateEpiInfoTable(arSrc[N], arDst[N], arCls[N]); end; IndexTbl('EPIINFO', 'EpiInfo', 'Primary', 'Episode',[ixPrimary]); for N := 0 to 8 do eiCreateHistory(arCls[N]); FStatMsg := 'Operation Complete!'; Synchronize(UpdStatus); Synchronize(EnableBtns); except Terminate; Abort; end; end; Notice all my calls to Synchronize, especially the synchronized call to UpdStatus. Immediately preceding this call, I set a variable called FStatMsg to some text. UpdStatus uses this text to set the SimpleText of a TStatusBar on the main form of the program. Why not just pass this as a string variable parameter to UpdStatus? Because synchronized methods cannot have parameters. If you need to pass parameters, you must either set them in variables or create a structure to hold values your synchronized method can then access and pass to your main form. Also take note of the looping structures I've written. You might be tempted to run a loop within a synchronized method. Although your code will work, this defeats the purpose of multiple threads because of what we discussed above. Synchronization makes your thread part of the main thread during the time it is executing, meaning you have only one thread actually running. This reduces your program to a single thread of execution. And if you have a potentially long process like several sequential queries executed from within a loop like I have above, forget it! Your program will act just like a single-threaded application until you're out the loop. Running Queries in Threads In order to successfully run queries in threads, you must follow a few cardinal rules: For each thread you create that will be executing queries, you must have a separate instance of a TSession created along with the thread. (See Create constructor above) Simply put, TSessions essentially define the multi-threadness of the BDE. The BDE requires separate TSessions to allow multiple access to shared resources. Failing to do this will cause some very strange things to occur in your program (it will crash with an BDE exception of some sort.) Every data-access VCL component you instantiate within a thread must have its SessionName property set to that of the SessionName of the TSession created with your thread. If you don't do this, the data-access component will default to the TSession created along with the main thread; this could have serious ramifications. If you assign components to a new TSession and forget to do so for others, when the components interact with each other with let's say a TBatchMove moving data from a query to a TTable, you'll get an error that says you're trying to perform an operation on objects in different sessions. Not good. So just to ensure that everything is safe, keep the data-access components you create in separate threads distinct by assigning the SessionNames to the SessionName of the TSession you create along with your thread. You must create synchronized methods for any operation that will be accessing data-access components embedded on a form. I can't stress enough the importance of performing any call that will access VCL components within a Synchronize call. You're looking for trouble if you don't. Let's say you run a background query that will provide information in a visual form in a TDBGrid. Running the query is the easy part. But once you're finished, you have to have a way of manipulating a TDataSource so that its DataSet property will point to the query you've run in the thread. Setting this must be done in a call to a synchronized method that does the setting for you. Building on the previous rule, queries that will be providing data for data-aware components such as a TDBGrid must be persistent. Once you free a thread, its resources are freed too. If you create a TQuery within the context of a TThread, your query is gone once the thread terminates and frees. The easiest thing to do then is to drop a TQuery on a form and run it from within the thread. That way, its handle remains valid even though the thread has finished its work. Of the above rules, 1 and 2 are the most important elements for successfully running queries in separate threads of execution. While rules 3 and 4 are important, under certain conditions your queries will not run if you don't obey rules 1 and 2. "Visual" and Non-visual Background Queries Now that we've established ground rules regarding running queries in threads, we must take into consideration a couple of implementation paths. I put the visual in quotes above to denote queries whose results will be displayed in a data-aware component of some sort. Non-visual background queries don't provide any visual result. They just run, write to persistent data stores and return. How you implement these methods is significantly different -- we'll discuss them in separate sections below. In either of these cases, the aim is to free up the user interface. One of my biggest frustrations with writing programs that process a lot of information is that once I've started execution, I can't do anything with the interface. I can't minimize or maximize the application; I can't move the window. Ever since I've learned to implement threaded technology in my programs, I need not worry about that. It's a real boon to my productivity. Running Totally Background Queries This method is a challenge; you have to code everything yourself. It's not a lot of code, but there are certain things to consider that you can take for granted when you drop VCL components onto a form. This method is very useful for "Data Mining" types of programs, in which you're querying huge tables and coming up with highly refined, summarized result sets. The operations that typify this type of application are several queries performed in succession. Were you to run this kind of application in a single-threaded program, you can effectively kiss your productivity goodbye because your user interface will be locked. The example I'll be using for this discussion is the example I used above because that application is a data mining type of application. It was built to run several sequential queries against Paradox tables with hundreds of thousands of records (pretty big for Paradox tables). Let's revisit the type declaration of thread: type TEpiInfoProc = class(TThread) {Base processing class for Episode Information processing} private FStatMsg : String; FSession : TSession; tblPrimary, tblHistory, tblSymmetry : String; FIter : Integer; property Iteration : Integer read FIter write FIter; procedure eiCreateEpiTemp1; //Performs initial joins procedure eiCreateEpiTemp2; //Creates new table of summarized info procedure eiGetClassifications(clState, //'AMI', 'Asthma', etc. clName, //'H1', 'H2', etc. priFld, //Join field from Primary hstFld : String; //Join field from History bMode : TBatchMode); //Batch Mode (will always be //batCopy 1st time); procedure eiCreateEpiInfoTable(clSrc, clDst, clName : String); procedure eiCreateHistory(HistIndicator : String); //Generic processing methods procedure EnableBtns; procedure GetTableNames; procedure UpdStatus; procedure IndexTbl(dbName, tblName, idxName, fldName : String; idxOpts : TIndexOptions); protected procedure Execute; override; public constructor Create; end; There's something in the type declaration that I didn't bring up previously. Notice the second private variable FSession. Then, look at the constructor Create code below: constructor TEpiInfoProc.Create; begin inherited Create(True); FSession := TSession.Create(Application); with FSession do begin SessionName := 'EpiInfoSession'; NetFileDir := Session.NetFileDir; PrivateDir := 'D:\EpiPriv'; end; FreeOnTerminate := True; Resume; end; I simply instantiate a new session in my Create constructor for my queries and tables to point to during the course of execution. If you don't have the latest update to Delphi and you look in the help file under TSession, you're warned not to create a TSession on the fly. Why? Truthfully, I don't know. I broke the rules anyway when I originally saw this warning because after looking at the VCL source for the TSession object in the DB.PAS file, I didn't see anything in the TSession object that would lead me to believe that I couldn't instantiate a TSession object on the fly. This changed in 2.01 -- the help file makes no such warning -- so it's not an issue. Of all the things that I'm doing in the thread, creating this TSession is the absolute key operation because it provides a common ground for all the data access components that I instantiate in the thread. During the course of execution, I make calls to several procedures that perform queries on several different tables. However, they all operate similarly, so it's only necessary to list one of the procedures to illustrate how you should do your queries in a thread. Here's an example procedure: procedure TEpiInfoProc.eiCreateEpiTemp1; var sqlEI : TEnhQuery; begin sqlEI := TEnhQuery.Create(Application); with sqlEI do begin SessionName := FSession.SessionName; DatabaseName := 'Primary'; DestDatabaseName := 'PRIVATE'; DestinationTable := 'epitemp1.db'; DestBatchMoveMode := batCopy; DoBatchMove := True; with SQL do begin Clear; Add('SELECT DISTINCT d.Episode, d.PatientID, d.Paid AS TotPd, d.Charge AS TotChg, D1.Start'); Add('FROM "' + tblPrimary + '" d LEFT OUTER JOIN "'+ tblSymmetry+'" D1 '); Add('ON (D1.Episode = d.Episode)'); Add('ORDER BY d.Episode, d.PatientID, d.Paid, d.Charge, D1.Start'); end; try try Execute; except raise; Abort; end; finally Free; end; end; end; The important thing to notice in the code example above is that the first property I set for the query I create is its SessionName property. This falls in line with obeying rules 1 and 2 that I mentioned above. I did this first so I'd be in compliance with them right away, though you can set properties in any order. The whole point to this is that looking at the procedure, it's no different from any type of procedure you'd use to create a query and execute it in code. The only difference is that you don't rely on the default TSession; you use the one you created at construction time. While I can't release what the application above actually does, I'll give you some gory details. On the average, the application requires three to four hours to execute completely, based on the size of the client database. Most of the client databases are several hundred megabytes in size, and all are in excess of 600MB per table, so you can imagine that with the enormity of the data sets, a single threaded application would just sit there and appear to be locked. But running the queries in the background frees the interface, so status messages can be supplied quite easily, and the form remains active during the run. I took an easy way out in this program to prevent multiple threads from executing. Typically what you'll do is use a system mutex or a semaphore to provide a mechanism signaling that you have a process running. But I found it much easier to just disable all the buttons on the form, so the user could still move the form around and even enter selection criteria but couldn't execute the new process until the current process has finished. It took much less code to implement a disabling feature than to implement a system mutex. Running Background Queries that will Produce Visible Result Sets You won't need to do this often because typically, queries run to display data in a grid or some other data-aware component are usually run against smaller tables and execute fairly quickly, so you can get away with running a single thread of execution. There are some instances in which you might need to query a couple of unrelated tables at the same time, so running the queries in separate threads of execution makes a lot of sense. Below is unit code from a test unit I wrote to illustrate this purpose. On my form I've dropped two of each of the following components: TQuery, TSession, TDataSource and a TDBGrid. I also have a button that will perform the thread creation. Let's look at the code, then discuss it: unit thrtest; interface uses Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs, DB, DBTables, Grids, DBGrids, StdCtrls; type TForm1 = class(TForm) DataSource1: TDataSource; Query1: TQuery; DBGrid1: TDBGrid; Button1: TButton; DataSource2: TDataSource; Query2: TQuery; DBGrid2: TDBGrid; Session1: TSession; Session2: TSession; procedure Button1Click(Sender: TObject); end; //Thread class declaration - very simple TQThread = class(TThread) private FQuery: TQuery; protected procedure Execute; override; public constructor Create(Query: TQuery); end; var Form1: TForm1; implementation {$R *.DFM} constructor TQThread.Create(Query: TQuery); begin inherited Create(True); // Create thread in a suspendend state so we can prepare vars FQuery := Query; //Set up local query var to be executed. FreeOnTerminate := True; //Free thread when finished executing Resume; end; procedure TQThread.Execute; begin FQuery.Open; //Perform the query end; procedure TForm1.Button1Click(Sender: TObject); begin TQThread.Create(Query1); TQThread.Create(Query2); end; end. I've made the thread intentionally easy. The only thing you pass to the constructor is a query. The rest is all taken care of. Before I did any coding of the above, I did the following with the components on the form: DataSource1 and DataSource2 have their DataSet properties set to Query1 and Query2, respectively. Likewise, DBGrid1 and DBGrid2 have their DataSource properties set to DataSource1 and DataSource2, respectively. Query1 and Query2 have their SessionName property set to the SessionName property of Session1 and Session2, respectively. Both TQuerys have their DatabaseName properties set to DBDEMOS. Query1's SQL is simple : 'SELECT * FROM "employee.db".' Query2's SQL is simple as well: 'SELECT * FROM "customer.db".' This is a really simplistic example that works amazingly well. When the user presses the button, the program creates two new threads of type TQThread. Since we pass in the query we want to execute, the thread knows which one to operate on. Notice that I didn't put any synchronization code in this example. Some might think that you have to do this for a DataSource component, but the DataSource is Session-less, so it's impossible. Besides, the DataSource is merely a conduit between the Query and the DBGrid. It's fairly inert. As in any program, you can make this much more complex. For example, you can set the DatabaseName and SQL of the TQuerys at runtime, making this a really flexible display tool. You can add a batchmove facility at the tail-end of the run so that in addition to displaying the results in a DBGrid, the program will also write to result tables. Play around with this to see how you can expand on the concept. Conclusion Multi-threading is a breakthrough and revolutionary programming technology for many programmers. Note, that on Unix systems, multithreading has existed for several years. But for Windows platforms, this hasn't been the case until a few years ago. Also, I didn't cover some of the really advanced stuff that you can do with threads. For that, you'll have to refer to advanced-level Win32 development books. But hopefully, in the course of this discussion, you've reached a level of understanding that will get you on your way to constructing your own multi-threaded applications.