C/C++ Interactive Guide

home *** CD-ROM | disk | FTP | other *** search

/ C/C++ Interactive Guide / c-cplusplus-interactive-guide.iso / c_ref / csource4 / 209_01 / ldhfitr.doc < prev next >

Wrap

Text File | 1990-03-05 | 22.4 KB | 732 lines

LDHFITR.DOC VERS:- 01.00 DATE:- 09/26/86 TIME:- 10:02:47 PM description of computer reduction of data from kinetic measurements for the enzyme lactate dehydrogenase information on how to run program LDHFIT By J. A. Rupley, Tucson, Arizona NOTES ON DATA REDUCTION BY COMPUTER AND INFORMATION ON RUNNING THE PROGRAM LDHFIT INTRODUCTION: In order to obtain conclusions from quantitative measurements, there must be some form of data reduction. This can be as simple as a comparison by eye of two curves drawn through the data. If, however, the data set is large and complex, for example with more than one independent variable, and if the questions posed are detailed or involve a complicated nonlinear model, then visual or graphical methods are less satisfactory than a computer-based analysis. Procedures of the latter type are now widely used. This laboratory is a short introduction to data reduction by use of a computer. The intent is to show that a sophisticated computer program can be handled easily, that its use saves time and effort, that it can treat a more complicated model than can be treated graphically, and that it produces information such as estimates of uncertainties in the parameters that is difficult or impossible to obtain from graphical methods. The data to be analyzed are initial rate measurements made on the lactate dehydrogenase catalyzed reaction of pyruvate with NADH, in the presence and absence of lactate as inhibitor. The results of the computer fit are the following: (1) values of the kinetic constants V, KmA, KmB, KmAB, KmQ/KmPQ, and KBInhib. The first five constants are those that can be evaluated by the standard graphical methods of primary and secondary reciprocal plots. The constant KBInhib is the dissociation constant for the dead-end complex LDH-NADH-lactate, which is included in the mechanism fit by the computer program but cannot be included in the mechanism on which the graphical methods are based. (2) Estimates of the standard deviations of the kinetic constants. These are needed for an understanding of the reliability and significance of the values calculated for the kinetic constants. (3) A list of the coordinates of points suitable for construction of the lines of the reciprocal plots of the standard graphical methods. 1 By J. A. Rupley, Tucson, Arizona THEORY: A. REMARKS ON FITTING OF A MODEL TO DATA In a typical data reduction, a particular model to be tested is fit to a set of data points under some criterion for best fit. The ith data point of a set of N data points consists of a single value for the dependent variable Yobserved(i) measured for corresponding single values for the one or more independent variables Xobserved(i). The commonly-used least squares criterion for quality of fit is the minimum value of the sum of the squares of the deviations between the observed values of Y and the values of Y calculated according to the model being tested. Working from the model to be fit to the data, one develops an equation relating, for each of the N data points, the dependent variable Y to the independent variables X and to a set of M variable parameters p: Ymodel(i) = F(Xobserved(i); p(j), j=1,M) eq. 1 For example, if the model predicts a linear relationship between Y and a single independent variable X : Ymodel(i) = p(1) + p(2) * Xobserved(i) eq. 2 The constants p(1) and p(2) of equation (2) are the Y axis intercept and the slope, respectively, and of course are the same for all data points (for all pairs of values Y(i) and X(i)). The fitting of a model to data consists of finding the values of the M variable parameters p that give the best agreement between the N pairs of values of Ymodel(i) and Yobserved(i). Best agreement can be defined as the minimum value of the least squares function y: N y = SUM (Yobserved(i) - Ymodel(i))^2 * W(i) eq. 3 i=1 The factor W(i) of equation (3) is the normalized reciprocal variance (the statistical weighting) of the ith data point, and it can be set at unity if the data points are all of equal estimated uncertainty. Combining equations (1) and (3), one sees that the least squares function y of equation (3) is a function of the full set of N data points and a set of M variable parameters: y = f(Yobserved(i), Xobserved(i), i=1,N; p(j), j=1,M) eq. 4 2 The fitting problem therefore consists of finding the minimum value of the least squares function y, which for a given set of data depends only on the M variable parameters p (the data points Yobserved(i)---Xobserved(i) in equation (4) are constant in the fitting). There are several methods commonly used to find the minimum of y and thus evaluate the best fit values of the parameters p. The more useful of these can handle nonlinear model functions F (equation (1)) of arbitrary mathematical form. The rate law for lactate dehydrogenase is an example of a nonlinear model function. In the simplex method used here, one constructs an M dimensional polyhedron with M + 1 vertices (the simplex). Each dimension of the simplex corresponds to a variable parameter of equation (4). Each vertex of the simplex is a point in the M dimensional space, which is called "parameter space" or "factor space." The M coordinates of each vertex are values of the M parameters. Thus each vertex of the simplex has an associated value of the least squares function y. The starting simplex is constructed to be so large as to include within it the point corresponding to the minimum value of y. This minimum point has as its coordinates of the best fit values of the parameters. The minimization process shrinks the simplex about the minimum point, even though the coordinates of the minimum are not known beforehand, until the vertices of the simplex are so close together and so nearly equal that an exit test is satisfied. The exit test is set so that a desired level of accuracy is obtained. The values of the M parameters averaged over all the vertices, ie the parameter values for the centroid of the simplex, serve as reliable estimates of the best fit parameter values (those for the least squares function minimum), because the minimum point is known to be inside the shrunken simplex and thus near the centroid. We generally want to estimate the uncertainties in the parameter values obtained for a model fit to a particular set of data points. Standard deviations of the parameters are calculated by the program used here. There are likely to be large uncertainties in the parameters if there are few data points or if there are large deviations between Ymodel and Yobserved. As a rule, one should have 5 to 10 times as many data points as parameters. The first try at estimating uncertainties of the parameters can fail. The calculation involves matrix inversion, the use of differences between nearly equal large numbers, and the approximation of a complex surface by a simple quadratic function. It may be necessary to change certain test values and then to repeat the calculation of the standard deviations. In particular, if a parameter is close to a bound, so that expansion of the simplex in that dimension is not possible, then that parameter should be fixed in the quadratic fit. 3 All fitting methods can fail. We will not discuss problems with bounds, local minima, ill-behaved functions, poor quality data, physically unreasonable best fits, etc. References given in subsection C below discuss fitting methods in more detail. For additional discussion of fitting by use of the simplex method, see "Notes on the fitting program", and the article by Nelder and Mead (1965). B. THE FUNCTION FIT TO THE DATA In the computer program you will use, the function fit to the data, which you obtained in the kinetic experiment with lactate dehydrogenase, is for an ordered ternary-complex pathway with dead-end complexes (EAP and EQB). The following scheme describes the pathway: EAP | KPInhib = ea * p/eap | ----------EA----------- k1 * a | k-1 k-2 | k2 * b | | | | E EAB <---> EPQ eq. 5 | | | | k4 | k-4 * q k-3 * p | k3 ----------EQ----------- | | KBInhib = eq * b/eqb EQB This pathway is more complicated than the one, without dead- end complexes, on which are based the standard graphical methods of analysis of two-substrate--two-product reactions: -----------EA---------- | | | | E EAB <---> EPQ eq. 6 | | | | -----------EQ---------- For the direction of reaction (pyruvate reduction by NADH) and the product inhibitor (lactate) studied in these measurements, the rate law for the pathway of equation (5) is: 4 vo = V / [ 1 + KmQ/KmPQ * p + KmA/a eq. 7 + KmB * (1 + KmQ/KmPQ * p) * (1 + 1/KPInhib * p) + KmAB/(a * b) * (1 + KmQ/KmPQ * p) + k3/(k3 + k4) * 1/KBInhib * b ] The presence in the pathway of equation (5) of the dead end complexes EAP and EQB leads to a significantly more complicated rate law than is found for the simpler pathway of equation (6); compare equation (7) with the following equation, for the "bare" compulsory order pathway without dead-end complexes (the pathway of equation (6)): vo = V / [ 1 + KmQ/KmPQ * p + KmA/a eq. 8 + KmB * (1 + KmQ/KmPQ * p) + KmAB/(a * b) * (1 + KmQ/KmPQ * p) ] The computer program used to reduce the kinetic data fits equation (7) to measurements of the initial rate v(0) as a function of the concentrations of NADH, pyruvate, and lactate. The parameters that are evaluated in the fitting are listed in Table I and are those of equation (7). Several points should be noted regarding equations (7) and (8): (1) The last term of equation (7), containing the equilibrium constant KBInhib, probably can be neglected under initial rate conditions with q=0; in the fitting program this is recognized by setting the last term at a small value, 1E-10, which removes it from the equation. (2) With this change, equations (7) and (8) are identical except for the appearance in one term of equation (7) of a factor containing the equilibrium constant KPInhib, which is for dissociation of the dead-end complex EAP defined in the pathway of equation (5). 5 By J.A. Rupley, Tucson, Arizona REFERENCES: A simplex method for function minimization. J.A. Nelder and R. Mead (1965. Computer J. 7, 308. Digital computer user's handbook. M. Klerer and G.A. Korn (1967). Mcgraw-Hill, New York. Data analysis in biochemistry and biophysics. M.E. Magar (1972). Academic, New York. The solution of the general least squares problem with special reference to high-speed computers. R.H. Moore and R.K. Ziegler (1960). Los Alamos Scientific Laboratory Report LA-2367. 6 By J.A. Rupley, Tucson, Arizona TABLE I: EQUATION (7) FITTING RATE LAW PARAMETERS PARAMETERS 1 V 2 KmA = KmNADH 3 KmB = KmPyruvate 4 KmAB = KmNADH-Pyruvate 5 KmQ/KmPQ = KmNAD/KmLactate-NAD 6 1/KPInhib = 1/KInhibLactate 7 k3/(k3 + k4) * 1/KInhibPyruvate parm(7) approx. equal to 0 at t=0 INDEPENDENT VARIABLES: a = [NADH] b = [Pyruvate] p = [Lactate] q = [NAD] q = 0 at t=0 DEPENDENT VARIABLE: vo = initial rate of conversion of pyruvate to lactate 7 By J. A. Rupley, Tucson, Arizona PROCEDURES: A. SUMMARY OF STEPS IN THE DATA REDUCTION (1) Startup. Create a new file with your data, following the template of the DATA SHEET. Setup the title and the starting ranges of the parameters, from which the starting simplex is constructed. Be sure to save the setup on disk. (2) Minimization. The minimum of the least squares function is located by an iterative method, in which the simplex, a polyhedron in parameter space, is shrunk about the point representing the minimum. The starting simplex is large, sufficiently so to certainly include the minimum point. After shrinking, all the vertices of the simplex are so close together, and thus so close to the minimum point, that the average of the vertices, the centroid, is a good estimate of the minimum point. The minimization can take more than two hours. These results should be saved on a disk file, so if the fitting fails, it is possible to quickly return to this point. (3) Quadratic fit and standard deviations. The least squares function about the minimum is approximated by a quadratic surface, the shape of which gives estimates of the standard deviations of the parameters. This step takes about 10 minutes. There is considerable output, the last of which is a list of the standard deviations. The fitting program can be set to cycle between a set of minimization iterations and the quadratic approximation. This option can produce quicker convergence. B. FAILURE OF THE FITTING Fitting can lead to physically unreasonable values of the parameters, for example for a kinetics experiment a set of parameters for which V is unreasonably large or small, or a set with standard deviations such that one or more of the principal parameters (V, KmA, KmB, KmAB) have no significance. The best fit may be unacceptable. To judge whether a fit is acceptable, compare the root mean square error (RMS error) with the expected uncertainty in your independent variable, the meas- ured rates. The most likely causes of failure are: (1) A few data points (flyers) that are widely wrong. The results should always be examined for flyers, which we will somewhat arbitrarily define as measured values differing from calculated values by more than 3 standard deviations: ABS(Ymodel 8 - Yobserved) > 3 x RMS ERROR. If there are flyers, delete them and repeat the calculation, or correct the values by recalculation of vo from your experimental data. (2) Trapping of the fit in a local minimum in parameter space not near the true minimum, or a broad and ill-defined minimum region. To test for trapping or a broad minimum, start the fitting with a different simplex. For example, set tighter starting ranges on one or more of the principal parameters (V, KmA, KmB, KmAB). Alternatively, set one principal parameter, e.g. V, at a reasonable value. Generally, to test for trapping, one repeats the fitting with a more expanded starting simplex, not with a tighter one as suggested above. However, assuming the tighter ranges include physically reasonable values of the parameters, the approach above, using a restricted simplex, has the advantage of testing whether a set of physically reasonable values can give an acceptable fit of the data. To judge whether a fit is acceptable, even though it may not be as good as a physically unreasonable best fit, compare the root mean square error (RMS error) with the expected uncertainty in your dependent variable, the measured rates. (3) Systematic error in the measurements or poor data. Look for systematic bias in the extraction of the slope. Look for errors in calculating the value of vo. One can plot the measured values with the calculated lines, ie construct the reciprocal plots. It should then be clear what the problem is. Correction, if it is possible, generally requires examination of the original measurements, ie the tracings of transmittance versus time. A point to be made is that the computer fitting, because it is unbiased, is more likely than the experimenter to detect systematic error or poor data. 9 By J. A. Rupley, Tucson, Arizona INSTRUCTIONS FOR FITTING KINETIC DATA FOR LACTATE DEHYDROGENASE WITH THE PROGRAM LDHFIT.COM (C-LANGUAGE): A. CREATE THE DATA FILE By use of a word processing program, modify the template file LDHDATA, incorporating your vo values and so constructing your own data file, eg TUE-C.DAT. B. RUN THE FITTING PROGRAM Run the program LDHFIT with your data file (eg TUE-C.DAT) to produce the results file (eg TUE-C.PRT) A>LDHFIT TUE-C.DAT TUE-C.PRT It takes about 50 minutes for 100 iterations. 200 to 400 iterations are commonly required for convergence. The fitting will cycle between sessions of Nelder-Mead minimization (30 iterations) and quadratic fitting. You should watch the fitting. Let the program run freely until the test value is < 10-4. Then: (1) See if a parameter needs to be eliminated. If the calculation of -pmin- fails at this stage in the fitting, probably the fitting will not converge. If one of the parameter values listed under -pmin- in the summary of the quadratic fit results is negative, then this parameter should be fixed at 1.e-10. This can be done by use of a program option (see below). (2) Examine the data listing for flyers: (Yobserved(i) - Ycalculated(i)) > 3 * rms error. If these are present, you should delete them from the data set, or set their weight to zero. This is done by reworking the data file, with a word processor, then rerunning LDHFIT. You will get faster convergence by setting the starting simplex in the data file at an intermediate point in the fitting. To exercise options, during the minimization type: CTRL-X After some output, you will be asked to select an option: display of the data and then quadratic fit (<CR>) fix of a parameter (CTRL-F) exit from the program to operating system (CTRL-C) return to the fitting (CTRL-X) 10 B. AFTER THE FITTING CONVERGES When the simplex has shrunk to where it satisfies the exit test, the minimization pauses and expects input from the operator: respond with <RETURN> when asked if you want to display data; respond with <RETURN> when asked at the next pause if you want to carry out quadratic fit; at the next pause, record the number of the iteration, and exit by typing CTRL-C; D. CONDENSE THE RESULTS FILE The results file (eg TUE-C.PRT) will be too long to print. Select the information from the last data display and quadratic fitting by use of a word processor or a special search program. E. FAILURE OF THE FIT Repeat the fit after checking your data and possibly resetting the starting parameter values. 11