Logarithmic Growth in Biological Processes


2010 12th International Conference on Computer Modelling and Simulation

M. Paltanea, S. Tabirca
Dept. of Computer Science, University College Cork, Cork, Ireland
Email: [email protected], [email protected]

E. Scheiber
Dept. of Computer Science, Transilvania University, Brașov, Romania
Email: [email protected]

M. Tangney
Cork Cancer Research Centre, University College Cork, Cork, Ireland
Email: [email protected]

Abstract—The aim of our paper is to approximate the parameters of a generic logarithmic function that can be used to model various biological processes. We consider the Gauss-Newton algorithm for solving the non-linear least squares problem and we propose a method for selecting the initial choice for the algorithm. This method proves to considerably increase the convergence probability of the Gauss-Newton algorithm applied to our function, compared to its convergence probability when a purely heuristic initial approximation is chosen.

Keywords-logarithmic growth; least squares; approximation

I. INTRODUCTION

The increasing importance of approximation in practical problems has led to the foundation of a new branch of mathematical analysis, approximation theory. Its main goal is to determine how best to approximate functions or given quantities with simpler functions, and also how to represent the error of approximation. Nowadays, its applications are vast, as it can provide sufficiently accurate solutions to very complex problems that cannot be solved by analytical tools alone. A specific research area that can benefit from the use of approximation methods is the interdisciplinary field of mathematical modeling of biological processes. During the past few decades, mathematical models, and in particular cancer growth models, have proven their worth in biological research. These models are generally defined by differential or partial differential equations which contain a number of parameters. Determining these parameters can be achieved by means of approximation in those cases in which the modeled biological process can be subjected to measurements over time, such as tumor growth or bacterial growth. Moreover, although subject to uncertainty, mathematical models can be used to predict the future evolution of the biological process they represent, by feeding the measurements recorded for the process up until a certain point into the model. This approach has been considered for VisEx, a software application developed at University College Cork which enables cancer researchers to visualize tumor growth and other cancer-related data in a 3D environment. The application also includes a module for cancer growth modeling based on the recorded measurements of the tumor up until a certain point. The method which was applied was to consider some of the most widely used tumor growth mathematical models and to approximate their parameters based on the available experimental data. The first section of this paper presents an overview of the models for which the parameters were determined with methods of approximation, while the second section presents the method for determining the parameters of a generic logarithmic function, which can be applied as the model for phases of growth in different biological processes.

II. SIMILAR MODELS

The models described in this section are cancer tumor growth models that have successfully been used to capture cancer evolution. The data representing the tumor size come from measurements of the tumor's volume at different points in time. These measurements and the general equation of the model can be used to define a system of equations whose unknowns are the parameters of the model. Solving this system gives the best fit of the chosen model with respect to the available data. The system is overdetermined, since the number of measurements is larger than the number of parameters. Therefore, the non-linear least squares method can be applied in order to approximate the parameters of the model. The method attempts to minimize the sum of squared residuals, where a residual is the difference between the value given by the model and the measured value. The Gompertz model is one of the first and most widely used mathematical models for tumor growth. Studies have shown that, out of the existing models for tumor growth, this model is one of the most accurate [6]. The model holds that growth rates of tumor cells are exponential at early stages of development and slower at later stages. The general form of the Gompertz equation is given by:


f(t) = e^{a - b\,e^{-ct}}, \quad b, c > 0, \quad a \in \mathbb{R}    (1)

and it can be used in this form to give the dependency between tumor volume and time. By applying the least squares method to this equation, the parameters a, b and c can be best approximated. The results of the experiments carried out have highlighted that the Gompertz model can be the model of choice for predicting the growth of tumour xenografts in vivo.
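A fit of this kind can be reproduced with an off-the-shelf non-linear least squares routine. The sketch below is our illustration rather than part of the original study: it applies SciPy's curve_fit to the melanoma measurements reported later in the tumor growth experiment, and the initial guess p0 and the maxfev setting are assumptions on our part.

# Illustrative sketch (not the authors' code): least squares fit of the Gompertz
# curve in Eq. (1) to the melanoma measurements listed in the experiments section.
import numpy as np
from scipy.optimize import curve_fit

def gompertz(t, a, b, c):
    """Gompertz growth curve f(t) = exp(a - b*exp(-c*t)), with b, c > 0."""
    return np.exp(a - b * np.exp(-c * t))

t = np.arange(1, 10, dtype=float)                      # time in days
y = np.array([25.8, 55.6, 91.4, 104.9, 139.2,
              192.4, 234.5, 305.1, 377.2])             # tumor volume in mm^3

# p0 is a rough manual guess; curve_fit minimizes the sum of squared residuals.
(a, b, c), _ = curve_fit(gompertz, t, y, p0=(7.0, 4.0, 0.15), maxfev=10000)
print("Gompertz parameters:", a, b, c)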

The Fister-Panetta model [5] is a slightly modified variant of the Gompertz model and provides a simple scheme that encompasses both the double exponential Gompertzian law for cancer growth and the treatment effect on the tumor size. Bocu et al. [8] have determined an upper bound for this model, given by:

N(t) = N_0^{\,e^{-rt}}    (2)

where N(t) represents the tumor volume over time. The unknown parameters in this equation are N_0, representing the initial tumor size, and r, representing the growth rate. Again, by applying the least squares method, the two parameters could be best approximated and the curve thus obtained has provided accurate results for extrapolation.

III. LOGARITHMIC MODEL

Using roughly the same technique that was applied for the models mentioned in the previous section, we will attempt to approximate the parameters of a generic logarithmic function of the form:

f(t) = a \cdot \ln(bt + c)    (3)

As in the previous models, this function will be used to represent the tumor growth over time. Suppose that we have a series of measurements of a biological process, such as cell growth, over the time interval T = 0, 1, 2, ..., n. Therefore, we have the set of data \{(t_i, y_i), i = 1, 2, ..., n\}, where y_i is the size of the cell colony at the time t_i. Our prediction scheme finds the best possible fit of the equation parameters to the real data provided by the input series. The approximation process follows an iterative algorithm, which is basically the same as the one used for the previously described models. The difficulty for this particular model arises when trying to choose the initial approximation. This is in fact a common issue when applying iterative algorithms for solving the least squares problem, and there are as yet no general rules on how to select the initial approximation, meaning that the problem still has to encompass a heuristic element. We have consulted several other sources concerned with parameter approximation, including the vast repository at zunzun.com [9], and we have not found an approximation method for the parameters of an equation of the type (3). This is perhaps due to the fact that an initial approximation that would assure the convergence of the approximation algorithm is hard to find in this particular case. We will present a solution which describes both the iterative process and a method for choosing the initial approximation.

A. Iteration Process

We consider (3). The parameters a, b and c are going to be approximated to best fit the set of data \{(t_i, y_i), i = 1, 2, ..., n\} by using the least squares method [1]. The least squares estimate is given by:

(a^*, b^*, c^*) \in P := \mathbb{R}^3 \quad \text{such that} \quad F(a^*, b^*, c^*) = \inf_{(a,b,c) \in P} F(a, b, c),

F(a, b, c) = \sum_{i=1}^{n} \left( y_i - a \cdot \ln(b t_i + c) \right)^2    (4)

The least squares method can be implemented using the Gauss-Newton algorithm [2], [3], an iterative process which starts with an initial choice (guess) for the minimum and then refines the estimate with each iteration. Suppose that X_k = (a_k, b_k, c_k) is the approximation at the k-th iteration. The relation between consecutive iterations is defined by:

X_{k+1} = X_k + \delta X

with the increment \delta X satisfying the normal equations:

(J_f^T J_f)\, \delta X = J_f^T R    (5)

where R is the vector of residuals and J_f is the Jacobian of f(t), both evaluated at X_k:

R = \begin{pmatrix} y_1 - a_k \ln(b_k t_1 + c_k) \\ y_2 - a_k \ln(b_k t_2 + c_k) \\ \vdots \\ y_n - a_k \ln(b_k t_n + c_k) \end{pmatrix}

J_f = \begin{pmatrix} \ln(b_k t_1 + c_k) & \dfrac{a_k t_1}{b_k t_1 + c_k} & \dfrac{a_k}{b_k t_1 + c_k} \\ \ln(b_k t_2 + c_k) & \dfrac{a_k t_2}{b_k t_2 + c_k} & \dfrac{a_k}{b_k t_2 + c_k} \\ \vdots & \vdots & \vdots \\ \ln(b_k t_n + c_k) & \dfrac{a_k t_n}{b_k t_n + c_k} & \dfrac{a_k}{b_k t_n + c_k} \end{pmatrix}

At any iteration k, the increment \delta X is determined from (5) as:

\delta X = (J_f^T J_f)^{-1} J_f^T R    (6)

As a common practice, the termination rule for an iterative algorithm such as Gauss-Newton is to stop the approximation process when either a maximum number of iterations has been reached, or when the absolute difference between the sums of squares calculated for two successive estimates X_k and X_{k+1} falls within a predefined range.

As an alternative to the Gauss-Newton algorithm (GNA), we have also considered the similar Levenberg-Marquardt algorithm (LMA) [4], [3]. The difference between the two algorithms is that the LMA introduces a non-negative damping factor \lambda in the equation of the increment, which provides a certain control over the reduction between iterations. Thus, the increment between consecutive iterations is given by:

(J_f^T J_f + \lambda\, \mathrm{diag}(J_f^T J_f))\, \delta X = J_f^T R    (7)

The LMA is generally regarded as more robust than the GNA, since in many cases it can find the solution even if the initial choice is very far from the actual minimum. However, this did not apply for our function, as the LMA also had a low probability of convergence when choosing more or less random initial approximations. We stress that the convergence probability of both algorithms was vastly improved when providing them with the initial approximation obtained by applying the method described in the following section.
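To make the iteration above concrete, the following is a minimal Python/NumPy sketch of the update in (5)-(6) together with the stopping rule just described. It is our illustration rather than the authors' implementation (their experiments relied on their own code plus Mathematica and Scilab), and the function name, the domain check and the default tolerances are assumptions.

# Minimal sketch (our illustration) of Gauss-Newton for f(t) = a*ln(b*t + c).
import numpy as np

def gauss_newton_log_fit(t, y, x0, max_iter=50, tol=1e-30):
    """Fit (a, b, c) by the Gauss-Newton iteration (5)-(6)."""
    a, b, c = x0
    prev_sse = None
    for _ in range(max_iter):
        arg = b * t + c
        if np.any(arg <= 0):                    # ln() undefined: treat as divergence
            raise FloatingPointError("b*t + c must stay positive")
        r = y - a * np.log(arg)                 # residual vector R at X_k
        sse = float(r @ r)                      # sum of squares F(a, b, c)
        if prev_sse is not None and abs(prev_sse - sse) < tol:
            break                               # consecutive square sums close enough
        prev_sse = sse
        J = np.column_stack((np.log(arg),       # df/da
                             a * t / arg,       # df/db
                             a / arg))          # df/dc
        delta, *_ = np.linalg.lstsq(J, r, rcond=None)   # solves (J^T J) dX = J^T R
        a, b, c = a + delta[0], b + delta[1], c + delta[2]
    return a, b, c

A damped Levenberg-Marquardt step in the spirit of (7) is also available off the shelf, for instance through scipy.optimize.least_squares with method="lm", should that variant be preferred.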

B. Choosing the Initial Approximation

As stated before, the Gauss-Newton algorithm is an iterative process which requires an initial approximation. Choosing this approximation is a challenge in itself, because a bad initial choice will not assure the convergence of the iterative process. A first choice of initial approximation was a purely heuristic one:

a = \frac{1}{n} \sum_{i=1}^{n} y_i, \quad b = 0, \quad c = e

This initial approximation has assured the convergence of the iterative process in only a limited number of test cases, as the simulations section will show. We will now present another method for choosing the initial approximation, which has assured the convergence of the iterative process in all our test cases. We start from the observation that the measured values y_i must be close to the calculated values:

y_i \approx a \cdot \ln(b t_i + c), \quad i = 1, 2, ..., n

Therefore, we should choose three data points (t_{i_1}, y_{i_1}), (t_{i_2}, y_{i_2}), (t_{i_3}, y_{i_3}) with i_1, i_2, i_3 \in \{1, ..., n\} and t_{i_1} < t_{i_2} < t_{i_3}, and find the initial approximation a_0, b_0, c_0 as the solution of the following non-linear system of equations:

a \cdot \ln(b t_{i_1} + c) = y_{i_1}
a \cdot \ln(b t_{i_2} + c) = y_{i_2}
a \cdot \ln(b t_{i_3} + c) = y_{i_3}

Solving this system is not trivial, since two of the unknowns appear under the logarithm. We eliminate c from the system and rewrite it as:

b(t_{i_2} - t_{i_1}) = e^{y_{i_2}/a} - e^{y_{i_1}/a}, \qquad b(t_{i_3} - t_{i_1}) = e^{y_{i_3}/a} - e^{y_{i_1}/a}    (8)

and then we eliminate b, which leaves us with a non-linear equation in a:

\frac{e^{y_{i_3}/a} - e^{y_{i_1}/a}}{e^{y_{i_2}/a} - e^{y_{i_1}/a}} - \frac{t_{i_3} - t_{i_1}}{t_{i_2} - t_{i_1}} = 0    (9)

Proposition. Equation (9) has a unique solution, except for the case in which

\frac{y_{i_3} - y_{i_1}}{y_{i_2} - y_{i_1}} = \frac{t_{i_3} - t_{i_1}}{t_{i_2} - t_{i_1}}

Proof: We have t_{i_1} < t_{i_2} < t_{i_3}. The hypothesis that the function in (3) is increasing (equivalent to ab > 0) assures that y_{i_1} < y_{i_2} < y_{i_3}. Let u = y_{i_3} - y_{i_1} and v = y_{i_2} - y_{i_1}, so that v < u and u/v > 1. We rewrite (9) as

g(a) := \frac{e^{u/a} - 1}{e^{v/a} - 1} = \frac{t_{i_3} - t_{i_1}}{t_{i_2} - t_{i_1}}

In these conditions we have the following limits:

\lim_{a \nearrow 0} g(a) = 1, \qquad \lim_{a \searrow 0} g(a) = \infty, \qquad \lim_{a \to \pm\infty} g(a) = \frac{u}{v}

We will show that g(a) is decreasing on the intervals (-\infty, 0) and (0, \infty). The derivative of g(a) is

g'(a) = \frac{u\, e^{u/a}\,(1 - e^{v/a}) - v\, e^{v/a}\,(1 - e^{u/a})}{a^2\,(e^{v/a} - 1)^2}

Lemma. The function h(x) = \dfrac{x\, e^{x/a}}{1 - e^{x/a}} is decreasing. The proof is immediate: we have

h'(x) = \frac{e^{x/a}}{(1 - e^{x/a})^2}\left(1 + \frac{x}{a} - e^{x/a}\right)

The first factor of the product is positive, while the second is negative (which follows from the Taylor series expansion of the exponential function). Therefore, h'(x) < 0, so h(x) is decreasing.

Thus we have

v < u \Rightarrow h(v) > h(u) \Leftrightarrow \frac{v\, e^{v/a}}{1 - e^{v/a}} > \frac{u\, e^{u/a}}{1 - e^{u/a}} \Leftrightarrow u\, e^{u/a}(1 - e^{v/a}) < v\, e^{v/a}(1 - e^{u/a})

Therefore, the numerator of g'(a) is negative, so g'(a) < 0 and g(a) is decreasing. Taking this result and the values of the limits into account, we conclude that g(a) decreases from u/v to 1 on the interval (-\infty, 0) and decreases from \infty to u/v on the interval (0, \infty). We go back to (9) and, since (t_{i_3} - t_{i_1})/(t_{i_2} - t_{i_1}) > 1, we deduce that the equation has a solution and that the solution is unique.

Fig. 1 shows the plot of g(a) for u = 0.679 and v = 0.105. In this case, \lim_{a \nearrow 0} g(a) = 1, \lim_{a \searrow 0} g(a) = \infty and \lim_{a \to \pm\infty} g(a) = 6.46667.

Figure 1. Plot of g(a)

In these conditions we can determine the value of a by applying either the bisection method or the secant method, whose computer implementations are robust and effective means of finding the approximate root. After finding the value of a, we determine the values of b and c from the initial system of equations as:

b = \frac{e^{y_{i_3}/a} - e^{y_{i_1}/a}}{t_{i_3} - t_{i_1}}, \qquad c = e^{y_{i_1}/a} - b\, t_{i_1}

We mention that if \dfrac{y_{i_3} - y_{i_1}}{t_{i_3} - t_{i_1}} < \dfrac{y_{i_2} - y_{i_1}}{t_{i_2} - t_{i_1}} then the solution is positive, and if \dfrac{y_{i_3} - y_{i_1}}{t_{i_3} - t_{i_1}} > \dfrac{y_{i_2} - y_{i_1}}{t_{i_2} - t_{i_1}} then the solution is negative; when the two quotients are equal, the equation has no solution. Therefore, in the first case the solution should be searched for in the interval (0, +\infty) for the bisection method, and a positive number should be chosen as the initial approximation for the secant method, while in the second case the solution should be searched for in the interval (-\infty, 0) for the bisection method, and a negative number should be chosen as the initial approximation for the secant method.
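To make this initialization concrete, the following is a minimal Python sketch of the three-point procedure: it solves (9) with SciPy's brentq bracketing root finder (the paper itself recommends bisection or the secant method) and then recovers b and c from (8). The function name, the bracket constants and the use of SciPy are assumptions on our part.

# Sketch (our illustration) of the initial approximation from three data points.
import math
from scipy.optimize import brentq

def initial_guess(p1, p2, p3):
    """p1, p2, p3 are (t, y) pairs with t1 < t2 < t3; returns (a0, b0, c0)."""
    (t1, y1), (t2, y2), (t3, y3) = p1, p2, p3
    u, v = y3 - y1, y2 - y1                       # as in the proposition
    w = (t3 - t1) / (t2 - t1)                     # right-hand side of Eq. (9)

    def phi(a):                                   # g(a) - w, with g as in the proof
        return math.expm1(u / a) / math.expm1(v / a) - w

    # The sign rule above tells us on which side of zero the root of phi lies.
    sign = 1.0 if (y3 - y1) / (t3 - t1) < (y2 - y1) / (t2 - t1) else -1.0
    near, far = sign * 1e-2 * u, sign * 1e3 * u   # crude bracket around the root
    while phi(near) * phi(far) > 0 and abs(far) < 1e12:
        far *= 10.0                               # expand the bracket if needed
    a0 = brentq(phi, min(near, far), max(near, far))

    b0 = (math.exp(y3 / a0) - math.exp(y1 / a0)) / (t3 - t1)
    c0 = math.exp(y1 / a0) - b0 * t1
    return a0, b0, c0

Applied, for instance, to the points (1, 2.76), (5, 5.54) and (9, 6.71) from the generated series in the next section, a sketch of this kind should reproduce, up to rounding and root-finder tolerance, the initial approximation (2.162, 2.341, 1.236) reported there.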

IV. SIMULATIONS AND EXPERIMENTS

In the experiments that we have carried out, we have relied both on our own implementations of some of the algorithms mentioned in the previous section and on the facilities offered by specialized tools for numerical mathematics such as Mathematica and Scilab.

A. Generated Data

The purpose of this experiment was to determine the accuracy of the least squares method applied to a logarithmic function. Therefore, we have considered the logarithmic function for a = 2, b = 3 and c = 1 and we have added random perturbations to its values. Thus, the data points for the experiment were generated as follows:

t_i = i, \quad i = 1, ..., 10
y_i = 2 \ln(3 t_i + 1) + \epsilon, \quad \epsilon \in (-0.5, 0.5)

The resulting series:

t_i : 1     2     3     4     5     6     7     8     9
y_i : 2.76  3.90  4.57  5.12  5.54  5.93  6.21  6.48  6.71

Three of the data points were chosen to form the system for determining the initial approximation. By applying the method described in the previous section, we have reduced the system to a single equation in a. We have then applied the secant method in order to determine the value of a_0. Going back to the system, we have determined the values of b_0 and c_0, so that the initial approximation is:

(a_0, b_0, c_0) = (2.162, 2.341, 1.236)

Having this initial approximation, we have then applied the Gauss-Newton algorithm. The finishing conditions for the algorithm were set as: a maximum of 50 iterations, or a difference between two consecutive sums of squares smaller than 10^{-30}. The algorithm finished after 9 iterations (the difference between the last two sums of squares had become 0), returning the following values for a, b and c:

(a, b, c) = (2.046, 2.823, 1.002)

As can also be observed in Fig. 2, the original data points are well approximated by the least squares curve fitting.

Figure 2. Approximation function
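Under the same assumptions as before (initial_guess and gauss_newton_log_fit are the hypothetical helpers sketched in the previous section, and the random seed is arbitrary), the data generation and the fit of this experiment could be sketched as follows.

# Sketch (our illustration) of the generated-data experiment.
import numpy as np

rng = np.random.default_rng(0)                  # arbitrary seed for repeatability
t = np.arange(1, 10, dtype=float)               # t_i = i for the nine tabulated points
y = 2.0 * np.log(3.0 * t + 1.0) + rng.uniform(-0.5, 0.5, size=t.size)

# Three well-spread points (first, middle, last) form the system for (a0, b0, c0).
x0 = initial_guess((t[0], y[0]), (t[4], y[4]), (t[8], y[8]))
a, b, c = gauss_newton_log_fit(t, y, x0, max_iter=50, tol=1e-30)
print("initial:", x0, "fitted:", (a, b, c))     # should land near (2, 3, 1) up to noise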

B. Initial Choice Comparison

A theorem concerning the convergence of the Gauss-Newton method with respect to the choice of the initial approximation is yet to be established. However, in an attempt to analyse the convergence probability, we have conducted the following experiment. We have executed 10,000 tests for each fixed number of generated points (generated as in the previous experiment) and we have counted the cases of convergence, both for the heuristic choice of the initial approximation and for the initial approximation given by our proposed method. Table I shows the results of these tests. There were 2 × 10,000 tests executed for every number of data points in the interval 7-31 (only odd numbers considered). As the results show, on the one hand, the percentage of convergent cases for the heuristic initial choice is almost zero; on the other hand, our proposed method for the initial choice assures convergence in over 80% of the test cases, and the percentage even tends to increase with the number of points.

Table I. Convergence percentage comparison

No. points   % heuristic   % proposed method
7            4.19          80.69
9            3.88          82.23
11           1.36          83.44
13           3.05          84.55
15           1.84          84.7
17           0.79          85.17
19           0.73          85.85
21           2.54          85.74
23           1.75          85.88
25           1.68          85.95
27           2.14          86.05
29           0.21          86.14
31           0.47          86.15
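The counting experiment behind Table I could be replicated along the lines below. This is our reconstruction under stated assumptions: the per-trial data generation, the choice of the three points and the use of "no numerical failure" as a stand-in for convergence are simplifications of what the authors describe, and initial_guess and gauss_newton_log_fit are the hypothetical sketches from Section III.

# Sketch (our illustration) of the convergence-rate comparison.
import numpy as np

def convergence_rate(n_points, n_trials=10_000, use_proposed=True, seed=1):
    rng = np.random.default_rng(seed)
    converged = 0
    for _ in range(n_trials):
        t = np.arange(1, n_points + 1, dtype=float)
        y = 2.0 * np.log(3.0 * t + 1.0) + rng.uniform(-0.5, 0.5, size=t.size)
        try:
            if use_proposed:
                mid = n_points // 2               # first, middle and last point
                x0 = initial_guess((t[0], y[0]), (t[mid], y[mid]), (t[-1], y[-1]))
            else:
                x0 = (float(y.mean()), 0.0, float(np.e))   # heuristic choice from Sec. III-B
            gauss_newton_log_fit(t, y, x0)
            converged += 1
        except (FloatingPointError, ValueError, OverflowError, ZeroDivisionError):
            pass                                  # trial counted as non-convergent
    return 100.0 * converged / n_trials

A fuller version would also inspect the final sum of squares, since the Gauss-Newton sketch above returns after the iteration cap even when the tolerance has not been met.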

C. Bacterial Growth Experiment

As a prerequisite for this experiment, we briefly present the four-phase pattern of bacterial growth. The first phase is the Lag phase, a period of slow growth in which the bacteria are adapting to the environment. This is followed by the Log phase, a period of exponential growth characterized by cell doubling. The third phase is the Stationary phase, a period when the growth slows down due to nutrient limitation. The final phase is the Death phase, in which the nutrients are depleted and the bacteria die [7]. The following set of data represents measurements of an Escherichia coli bacterial colony in its Log phase of growth; t denotes the time in minutes, while Tu denotes the turbidity:

t (min)   60     90     120    150    180    210    240
Tu        0.017  0.035  0.056  0.083  0.124  0.171  0.274

Using the same steps as in the previous experiment, we have first determined the initial approximation as:

(a_0, b_0, c_0) = (-0.1097, -0.0043, 1.1145)

Having this initial approximation, we have applied the Gauss-Newton algorithm. The finishing conditions for the algorithm were set as: a maximum of 50 iterations, or a difference between two consecutive sums of squares smaller than 10^{-30}. The algorithm finished after 10 iterations (the difference between the last two sums of squares had become 0), returning the following values for a, b and c:

(a, b, c) = (-0.1142, -0.0042, 1.1229)

As can be observed in Fig. 3, the original data points are well approximated by the logarithmic function.

Figure 3. Approximation function for bacterial Log phase growth

D. Tumor Growth Experiment

This experiment illustrates the case in which the considered logarithmic function can be used to model cancer tumor growth. The following set of data represents real measurements of a human melanoma tumor injected in mice; the time is given in days and the tumor volume in mm^3:

t (days)        1     2     3     4      5      6      7      8      9
Volume (mm^3)   25.8  55.6  91.4  104.9  139.2  192.4  234.5  305.1  377.2

Using the same steps as in the previous experiments, we have first determined the initial approximation as:

(a_0, b_0, c_0) = (-228.4082, -0.0880, 0.9843)

Having this initial approximation, we have applied the Gauss-Newton algorithm with the same finishing conditions. The algorithm finished after 14 iterations (the difference between the last two sums of squares had become 0), returning the following values for a, b and c:

(a, b, c) = (-252.1570, -0.0841, 0.9796)

As can be observed in Fig. 4, the original data points are well approximated in this experiment too.

Figure 4. Approximation function for melanoma tumor growth

We also mention that we have considered this particular data set for analyzing the dependence of the initial approximation calculation on the three data points selected to form the system. The first choice was the middle point and the end points (1, 5, 9), which are actually the ones for which the above values were obtained. The second choice consisted of the set of points (2, 5, 8), for which the method generated the initial approximation (-176.6486, -0.0924, 0.9173). This approximation still assured the convergence of the Gauss-Newton algorithm, after 19 iterations. However, the third choice of (3, 5, 7) generated an initial approximation (-100.5543, -0.0767, 0.6347) for which the Gauss-Newton algorithm did not converge. This analysis suggests that the method is rather strongly dependent on the choice of the data points used to form the system.
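The point-selection sensitivity check just described can be replayed with the same hypothetical helpers from Section III; the short loop below simply re-runs the initialization and the Gauss-Newton iteration for the three candidate triples on the melanoma data.

# Sketch (our illustration) of the sensitivity of the method to the chosen points.
import numpy as np

t = np.arange(1, 10, dtype=float)
y = np.array([25.8, 55.6, 91.4, 104.9, 139.2, 192.4, 234.5, 305.1, 377.2])

for i1, i2, i3 in [(0, 4, 8), (1, 4, 7), (2, 4, 6)]:        # points (1,5,9), (2,5,8), (3,5,7)
    try:
        x0 = initial_guess((t[i1], y[i1]), (t[i2], y[i2]), (t[i3], y[i3]))
        a, b, c = gauss_newton_log_fit(t, y, x0)
        # Note: hitting the iteration cap without meeting the tolerance is not
        # detected here; a fuller version would also check the final sum of squares.
        print((i1 + 1, i2 + 1, i3 + 1), "initial", x0, "fit", (a, b, c))
    except (FloatingPointError, ValueError, OverflowError) as exc:
        print((i1 + 1, i2 + 1, i3 + 1), "failed:", exc)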

V. CONCLUSIONS

Mathematical models aimed at growth monitoring assist biologists and medical personnel in assessing the evolution of the observed biological processes. In the particular field of cancer research, mathematical models can provide vital information about cancer progression and indicate the opportune timing for different types of treatment strategies. As part of our mathematical modeling attempts, we have focused on a generic logarithmic function that can be used to represent some very important biological processes such as cancer tumor growth and bacterial growth. Since these types of growth are represented in biological research as series of volumetric measurements over time, we were presented with a classic case of least squares curve fitting. In order to solve this problem, we have applied the iterative Gauss-Newton algorithm, and also the Levenberg-Marquardt algorithm for some tests. As the results have shown, the model has proven quite successful in approximating these types of growth curves. Approximating the parameters for this particular function has raised an additional issue, as the Gauss-Newton algorithm would not converge unless a very good initial approximation was chosen. In this regard, we have proposed a method for selecting such an initial approximation, which has ultimately assured the convergence of the algorithm in the vast majority of the test cases.

REFERENCES

[1] A. Björck. Numerical Methods for Least Squares Problems. SIAM, Philadelphia, 1996.
[2] D. Jukić, G. Kralik and R. Scitovski. Least-squares fitting Gompertz curve. Journal of Computational and Applied Mathematics, vol. 169, 2004.
[3] J. Nocedal and S. Wright. Numerical Optimization. Springer, 2000.
[4] C. Kelley. Iterative Methods for Optimization. SIAM, Philadelphia, 1999.
[5] D. Mackenzie. Mathematical modeling and cancer. SIAM News, vol. 37, 2004.
[6] M. Marušić. Mathematical Models of Tumor Growth. Mathematical Communications, Department of Mathematics, University of Osijek, 1996.
[7] M. H. Zwietering, I. Jongenburger, F. M. Rombouts and K. van 't Riet. Modeling of the Bacterial Growth Curve. Applied and Environmental Microbiology, vol. 56(6), 1990.
[8] R. Bocu, S. Tabirca and Y. Chen. Fister-Panetta Upper Bound for Cancer Growth. Some Computational Remarks. International Conference on Biocomputation, Bioinformatics, and Biomedical Technologies, 2008.
[9] Zunzun.com online curve fitting and surface fitting. http://www.zunzun.com/.
