A Preliminary Investigation into Biological Modeling: A Comparison, Evaluation, and Discussion of Logistic ODEs and Applications

May 24, 2017 | Author: Shantanu Jakhete | Category: Mathematical Biology, Optimization (Mathematics), Mathematical Modelling

Shantanu S. Jakhete
January 22, 2017

1 Introduction

As resources and labour become scarcer, our need to optimize means of production increases. One area of interest for many scientists, engineers, and mathematicians is food production, where efforts are made to optimize crop yields. Prior technical literature suggests that agricultural professionals agree on logistic s-curve models as the basis for modeling biological growth. However, it is unclear which model offers the best fit and the most utility to the user. Based on prior literature, the models of interest are the 3-parameter sigmoidal, 3-parameter logistic, and 3-parameter Gompertz models. All three belong to the s-curve family yet differ in their differential equations. This investigation primarily focuses on analysing these models with respect to their application. We are moving toward systems that accurately integrate hundreds of factors into a dynamical model. For such systems, it will be imperative to have a robust modeling form and to choose a model that can be applied well beyond the scope of an elementary mathematics course. Also central to this investigation is a discussion of the models in the context of larger systems. These include systems that are chaotic, multivariate, or simply extended past the range of the measured data (interpolation versus extrapolation).

1.1 Personal Note

As a student in the Mathematics HL course, I have had the opportunity to explore different areas of mathematics, including statistics, calculus, and algebra, all of which are useful for this investigation. I intend to pursue an education in the sciences and engineering, and I am also interested in how these branches of mathematics and science intersect to provide a thorough understanding of how our world works. While gaining mathematical experience, I have also learned the limits of our current mathematical abilities and applications. This allows me to explore natural phenomena with an awareness of these limitations.

2 Investigation Approach

2.1 Scientific Procedures

To explore the functionality of the three models, two plant test groups were grown over a period of sixteen days. Standard scientific procedures (USDA ARS) were followed in growing the plants. These included watering twice daily at standard times and measuring plant stalk height in the late afternoon.

2.2 Mathematical Analyses

For this exploration, mathematical modeling software (SIGMAPLOT 12) is used to perform the nonlinear regressions and associated statistical analyses. Time (day) is the independent variable of the regression, and the mean plant height is the dependent variable. Three models are considered: sigmoidal, logistic, and Gompertz. All three models have three parameters and belong to the s-curve family. However, differences in their respective differential equations lead to differences in how they model the data. These differences and ODEs are discussed in detail in section 3.

The regression in SIGMAPLOT determines the coefficients of each model iteratively. It stops when the fit converges within the default step-size tolerance ($10^{-5}$) or after 200 iterations. Each fit minimizes the sum of squared residuals between the data and the model. For each model, SIGMAPLOT reports every coefficient with its associated standard error, as well as the R² value used to compare fits. For the scope of this study, the coefficients in each model carry equal weight, so the final regression is independent of coefficient significance.

For both test groups, the 3-parameter sigmoidal model is used to plot 95% confidence and prediction bands. This demonstrates the interval estimates the models can provide. Residual plots were also created from the 3-parameter sigmoidal model to isolate bias patterns and qualitatively discuss scatter.
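SIGMAPLOT's solver is not reproduced here. As a rough stand-in, the sketch below fits equations (2), (3), and (5) by nonlinear least squares using SciPy's `curve_fit`, which uses the Levenberg-Marquardt algorithm when no bounds are given. It reports each model's coefficients, standard errors taken from the covariance matrix, and the ordinary R². Two inputs are assumptions and do not come from the study:

- The data are synthetic, generated from the Table 1 sigmoidal coefficients plus seeded noise.
- The starting guesses are chosen for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit


# The three 3-parameter models in the forms of equations (2), (3) and (5).
def sigmoidal(x, a, b, x0):
    return a / (1.0 + np.exp(-(x - x0) / b))


def logistic3(x, a, b, x0):
    # abs() keeps the power real if the solver wanders to a negative x0.
    return a / (1.0 + np.abs(x / x0) ** b)


def gompertz(x, a, b, x0):
    return a * np.exp(-np.exp(-(x - x0) / b))


# Model function plus an assumed starting guess (a, b, x0) built from the data.
MODELS = {
    "sigmoidal": (sigmoidal, lambda y: (y.max(), 2.0, 8.0)),
    "logistic": (logistic3, lambda y: (y.max(), -4.0, 10.0)),
    "gompertz": (gompertz, lambda y: (y.max(), 4.0, 10.0)),
}


def r_squared(y, y_hat):
    """Ordinary R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_all(days, heights):
    """Least-squares fit of every model; returns coefficients, std. errors, R^2."""
    results = {}
    for name, (f, guess) in MODELS.items():
        popt, pcov = curve_fit(f, days, heights, p0=guess(heights), maxfev=5000)
        std_err = np.sqrt(np.diag(pcov))
        results[name] = (popt, std_err, r_squared(heights, f(days, *popt)))
    return results


# Hypothetical data: heights generated from the Table 1 sigmoidal coefficients
# plus seeded noise. These are NOT the measurements from this study.
rng = np.random.default_rng(0)
days = np.arange(1, 17, dtype=float)
heights = sigmoidal(days, 32.63, 2.08, 11.00) + rng.normal(0.0, 0.2, days.size)

for name, (popt, se, r2) in fit_all(days, heights).items():
    print(f"{name:10s} a, b, x0 = {np.round(popt, 2)}  std. err = {np.round(se, 3)}  R^2 = {r2:.4f}")
```

Replacing the synthetic arrays with the daily mean heights of a test group would give a rough cross-check of Tables 1 and 2. Agreement with SIGMAPLOT is not guaranteed, because the starting values and stopping rules differ.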

3 Models

3.1 3-Parameter Sigmoidal Model

3.1.1 Deriving

The logistic differential equation, referred to here as the sigmoidal model, is generally written as [4]:

$$\frac{dP}{dt} = kP(A - P) \tag{1}$$

The ODE can be solved by separating variables and using partial fractions [8, 9]:

$$\int \frac{dP}{P(A - P)} = \int k\,dt$$

$$\frac{1}{A}\int \frac{(A - P) + P}{P(A - P)}\,dP = kt + c_1$$

$$\frac{1}{A}\int \left(\frac{1}{P} + \frac{1}{A - P}\right) dP = kt + c_1$$

$$\frac{1}{A}\left[-\ln|A - P| + \ln P\right] = kt + c_1$$

$$\frac{1}{A}\ln\frac{P}{A - P} = kt + c_1$$

$$\ln\frac{P}{A - P} = Akt + c_2$$

$$\frac{P}{A - P} = e^{Akt + c_2}$$

$$P = e^{Akt + c_2}(A - P)$$

$$P = \frac{A}{1 + Ce^{-Akt}}, \qquad C = e^{-c_2}$$

Moving the constant C into the exponent lets us match the logistic solution to a form SIGMAPLOT can interpret. Write $C = e^{x_0/b}$ and $Ak = 1/b$, so that $Ce^{-Akt} = e^{-(t - x_0)/b}$. Renaming A as a and t as x gives

$$f(x) = \frac{a}{1 + e^{-(x - x_0)/b}} \tag{2}$$

for which a, b, and x0 are determined by the regression.

3.1.2 Discussion

The generalized logistic differential equation captures two regimes. When P is small, A − P ≈ A, so dP/dt ≈ kAP and the population grows approximately exponentially at rate kA. As P increases over time, the factor A − P shrinks and the growth rate decreases. P eventually converges to the asymptote A, the carrying capacity. The differential equation looks deceptively simple, yet this simplicity is exactly what makes it useful for modeling natural and biological systems [8].
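The derivation and the two regimes above can be checked numerically. The sketch below integrates equation (1) with SciPy's `solve_ivp` and compares the result with the closed form P = A/(1 + Ce^{−Akt}). It also confirms that the substitution b = 1/(Ak), x0 = b ln C reproduces equation (2). The values of k, A, and P(0) are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical parameters, chosen only for illustration.
k, A, P0 = 0.02, 30.0, 0.5
C = (A - P0) / P0                     # from P(0) = A / (1 + C)


def closed_form(t):
    """P(t) = A / (1 + C e^{-Akt}), the solution derived in section 3.1.1."""
    return A / (1.0 + C * np.exp(-A * k * t))


def sigmaplot_form(t):
    """Same curve written as equation (2): a = A, b = 1/(Ak), x0 = b ln C."""
    a, b = A, 1.0 / (A * k)
    x0 = b * np.log(C)
    return a / (1.0 + np.exp(-(t - x0) / b))


t = np.linspace(0.0, 20.0, 201)
sol = solve_ivp(lambda _t, P: k * P * (A - P), (0.0, 20.0), [P0],
                t_eval=t, rtol=1e-9, atol=1e-9)

max_err = np.max(np.abs(sol.y[0] - closed_form(t)))
print(f"max |numerical - analytic| = {max_err:.2e}")

# Relative growth rate (1/P) dP/dt = k(A - P): close to kA when P is small, close to 0 near A.
for P in (P0, A / 2, 0.99 * A):
    print(f"P = {P:5.2f}: relative rate = {k * (A - P):.4f}  (kA = {k * A:.2f})")
```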


3.2 3-Parameter Logistic Model

3.2.1 Discussion

This second model appears (based on the literature review) to be available only in SIGMAPLOT. I believe its general solution may be common in probability, most likely as a cumulative distribution function, but I am unsure why this form is included. Despite my unfamiliarity with this model, I chose it because of its simple and elegant form. Consider its simplest form (equation 3) with a = 1, x0 = 1, and b = 2. The model then reduces to g(x) = 1/(1 + x²), the Cauchy (Lorentzian) profile, which is bell-shaped like a Gaussian curve and has a similar algebraic form. I was unable to find a differential equation for this model in the literature, or any reference explaining the premise behind its algebraic form. However, I strongly believe the underlying case is a cumulative probability function, for which a generalized differential equation most likely exists in a table of integrals or a similar reference.

$$g(x) = \frac{a}{1 + \left(\frac{x}{x_0}\right)^{b}} \tag{3}$$

for which a, b, and x0 are determined by the regression.
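A differential equation for equation (3) can in fact be derived directly; the following derivation is not taken from the cited literature. Let u = (x/x0)^b, so that du/dx = bu/x. Then

dg/dx = −(b/x) · g · (1 − g/a),

which is equivalent to dg/d(ln x) = −b · g · (1 − g/a). This is the logistic equation (1) with ln x in place of time. For b < 0, g/a is also the cumulative distribution function of the log-logistic (Fisk) distribution, which is consistent with the hunch above. The sketch below checks both claims numerically using the Table 1 coefficients. `scipy.stats.fisk` is SciPy's log-logistic distribution.

```python
import numpy as np
from scipy.stats import fisk

# Coefficients from Table 1 (3-parameter logistic, plant test group 1).
a, b, x0 = 40.64, -3.96, 12.12


def g(x):
    """Equation (3)."""
    return a / (1.0 + (x / x0) ** b)


x = np.linspace(1.0, 30.0, 300)
step = 1e-5
dg_numeric = (g(x + step) - g(x - step)) / (2.0 * step)   # central difference
dg_ode = -(b / x) * g(x) * (1.0 - g(x) / a)                # derived ODE
max_err = np.max(np.abs(dg_numeric - dg_ode))
print(f"max |numerical - ODE| = {max_err:.1e}")

# With b < 0, g/a matches SciPy's log-logistic (Fisk) CDF with shape -b and scale x0.
print(np.allclose(g(x) / a, fisk.cdf(x, -b, scale=x0)))
```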

3.3 3-Parameter Gompertz Model

3.3.1 Deriving

The Gompertz differential equation is generally written as:

$$\frac{dy}{dt} = ry\ln\left(\frac{K}{y}\right) \tag{4}$$

(Adapted from Dr. Benchow, UCSD Mathematics.) Dividing by K and substituting turns this ODE into a separable form [5]:

$$\frac{d}{dt}\left(\frac{y}{K}\right) = \frac{ry\ln(K/y)}{K} = r\,\frac{y}{K}\ln\left(\frac{K}{y}\right) = -r\,\frac{y}{K}\ln\left(\frac{y}{K}\right)$$

Setting $z = y/K$ then allows for easier separation:

$$\frac{dz}{dt} = -rz\ln z \qquad\Longrightarrow\qquad \int \frac{dz}{z\ln z} = -\int r\,dt$$

A secondary substitution, $w = \ln z$ with $dw = dz/z$, can then be used:

$$\int \frac{dw}{w} = -\int r\,dt$$

$$\ln|w| = -rt + c_1$$

$$\ln|\ln z| = -rt + c_1$$

$$\ln z = \pm e^{c_1}e^{-rt} = ce^{-rt}$$

$$z = e^{ce^{-rt}}$$

From the substitution $z = y/K$:

$$\frac{y}{K} = e^{ce^{-rt}} \qquad\text{so that}\qquad y = Ke^{ce^{-rt}}$$

For growth from below K we have c < 0. Setting a = K, b = 1/r, and $x_0 = \ln(-c)/r$ gives $ce^{-rt} = -e^{-(t - x_0)/b}$. This matches the final solution to a form SIGMAPLOT interprets:

$$h(x) = ae^{-e^{-(x - x_0)/b}} \tag{5}$$

for which a, b, and x0 are determined by the regression.

3.3.2 Discussion

The Gompertz differential equation is a special limiting case of the generalized logistic differential equation. According to Weisstein [6], the Gompertz model is used in actuarial science to specify a simplified mortality law. Its other applications include tumor growth modeling, newborn age probabilities, and various biological settings where a resource or volume is limited. The double exponential is what gives the curve its s-shape. The logistic curve has its inflection point at half the carrying capacity, whereas the Gompertz curve inflects at y = K/e (about 0.37K), so its growth is asymmetric.
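As with the sigmoidal model, the Gompertz solution and its parameter mapping can be checked numerically. The sketch below integrates equation (4) with `solve_ivp` and compares the result with y = Ke^{ce^{−rt}}. It also verifies that a = K, b = 1/r, x0 = ln(−c)/r reproduces equation (5), and that the curve passes through K/e at t = x0, the inflection point mentioned above. The values of r, K, and y(0) are hypothetical.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical parameters, chosen only for illustration.
r, K, y0 = 0.25, 35.0, 0.5
c = np.log(y0 / K)                    # from y(0) = K e^{c}; c < 0 because y0 < K


def closed_form(t):
    """y(t) = K e^{c e^{-rt}}, the solution derived in section 3.3.1."""
    return K * np.exp(c * np.exp(-r * t))


# Equation (5) parameters: a = K, b = 1/r, x0 = ln(-c)/r.
a, b, x0 = K, 1.0 / r, np.log(-c) / r


def sigmaplot_form(t):
    return a * np.exp(-np.exp(-(t - x0) / b))


t = np.linspace(0.0, 30.0, 301)
sol = solve_ivp(lambda _t, y: r * y * np.log(K / y), (0.0, 30.0), [y0],
                t_eval=t, rtol=1e-9, atol=1e-9)

max_err = np.max(np.abs(sol.y[0] - closed_form(t)))
print(f"max |numerical - analytic| = {max_err:.2e}")
# The inflection point sits at t = x0, where y = K/e (about 0.37 K), not K/2.
print(f"y(x0) / K = {closed_form(x0) / K:.4f}")
```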


4 Statistical Results (Comparison & Evaluation)

4.1 Models

The complete data set, including each model's coefficients, their standard errors, the R² values, and convergence information, can be found in Tables 1 and 2 in the appendices. For both plant test groups, all three models had R² values above 0.99. For plant test group 1, the 3-parameter sigmoidal regression had the highest R² value (0.997922). For plant test group 2, the 3-parameter Gompertz regression had the highest R² value (0.998981). Averaged across both test groups, the 3-parameter logistic model had the highest mean R² value (0.998173). It is important to note that for a nonlinear regression, a high R² value does not always indicate a good fit; this is discussed in section 5.1. The 95% confidence bands of the 3-parameter sigmoidal regression (Figures 2 and 6) appear narrower for plant test group 1 and wider for test group 2, particularly at larger time values.
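The averaged figure is simply the mean of each model's two R² values from Tables 1 and 2. The short calculation below reproduces it.

```python
# R^2 values from Table 1 (plant group 1) and Table 2 (plant group 2).
R2 = {
    "3-Parameter Sigmoidal": (0.997922, 0.995114),
    "3-Parameter Logistic": (0.99745, 0.998896),
    "3-Parameter Gompertz": (0.997051, 0.998981),
}

means = {model: sum(values) / len(values) for model, values in R2.items()}
best = max(means, key=means.get)

for model, mean in means.items():
    print(f"{model}: {mean:.6f}")
# 3-Parameter Sigmoidal: 0.996518
# 3-Parameter Logistic: 0.998173
# 3-Parameter Gompertz: 0.998016
print("highest mean R^2:", best)
```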

4.2 Residuals

The 3-parameter sigmoidal regression offered an opportunity to examine patterns in the residual data. The (elementary) residual of each data point is calculated as observed − predicted, using the regression analysis module in SIGMAPLOT. Generally speaking, the residual error should not be predictable or consistent with respect to either the independent or the dependent variable [1]. The residual plots, with respect to day and to stalk height, can be seen in Figures 3, 4, 7, and 8 in the appendices. For test group 1, there are systematic high and low values. This suggests that the residuals are not stochastic and that non-zero values can be predicted on certain intervals. Test group 2 shows a stronger pattern, indicating that its residuals are far from stochastic. In fact, the residuals for test group 2 could probably be fitted by a sinusoidal regression. However, fitting a regression to plots meant to diagnose a regression would be circular.
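The "systematic high and low values" described above can be put into a single number. One common option, not used in the original study, is the Durbin-Watson statistic. It is close to 2 for uncorrelated residuals and falls well below 2 when neighbouring residuals tend to share a sign. The residual patterns in the sketch are hypothetical and only mimic the two cases discussed: a smooth, sinusoid-like pattern and one that flips sign every day.

```python
import numpy as np


def residuals(observed, predicted):
    """Elementary residuals: observed - predicted."""
    return np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float)


def durbin_watson(e):
    """Durbin-Watson statistic: about 2 for uncorrelated residuals, well below 2
    when neighbouring residuals share a sign (systematic runs)."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)


# Hypothetical fitted values and residual patterns over 16 days (not study data).
days = np.arange(1, 17)
fitted = 1.5 * days
wave = residuals(fitted + 0.5 * np.sin(np.pi * days / 4), fitted)   # sinusoid-like
alternating = residuals(fitted + 0.5 * (-1.0) ** days, fitted)      # sign flips daily

print(f"sinusoid-like residuals: DW = {durbin_watson(wave):.2f}")
print(f"alternating residuals:   DW = {durbin_watson(alternating):.2f}")
```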

5 Discussion

5.1 R² Determination

The R² used in this investigation was SIGMAPLOT's default setting. A brief look at the regression solver code suggests that this R² is adapted from the definition intended for linear regression. The literature on logistic regression does not agree on a single way of measuring fit, although several measures exist, each with different assumptions. Specialized R² measures for sigmoidal curves are meant to account for diminishing returns. Data near the extremities of the domain are less informative, because the derivative there approaches 0 [2].
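The sketch below shows why a high R² alone can mislead. It computes the ordinary R² alongside the standard error of the regression, S = √(SS_res/(n − p)), which is expressed in the units of the response. S is an alternative often suggested for nonlinear fits; it was not used in this study. The example data are hypothetical: the Table 1 sigmoidal curve with a deliberately systematic, wave-shaped error added. The R² still comes out above 0.99.

```python
import numpy as np


def fit_statistics(y, y_hat, n_params=3):
    """Return (R^2, S): the ordinary R^2 and the standard error of the
    regression S = sqrt(SS_res / (n - p)), in the units of y."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    s = np.sqrt(ss_res / (y.size - n_params))
    return r2, s


# Hypothetical example: a sigmoidal curve (Table 1 coefficients) with a
# systematic, wave-shaped error added. Not the study's data.
days = np.arange(1, 17, dtype=float)
y_hat = 32.63 / (1.0 + np.exp(-(days - 11.00) / 2.08))
y = y_hat + 0.5 * np.sin(np.pi * days / 4)

r2, s = fit_statistics(y, y_hat)
print(f"R^2 = {r2:.4f}, S = {s:.3f}")
```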


5.2 Residuals

The residual plots in this investigation showed a predictability in residual data that should have been stochastic. According to Frost [1], this may suggest that the model is not explaining everything it could. Possible causes include a missing variable, a missing higher-order term to fit curvature, or a missing interaction between terms already in the model. The residual plots only demonstrate the limitations of the 3-parameter sigmoidal regression, but future studies could investigate residuals across the s-curve family. The residual plots in this investigation used a first-order calculation. This meant that statistics such as the mean, sample size, or standard deviation of the data set were not factored into the final residual value. The literature suggests that for an advanced regression such as an s-curve family model, a standardized residual calculation may provide more insight. One example is the Pearson residual, "the difference between the observed and estimated probabilities divided by the binomial standard deviation of the estimated probability" [3]. For large sample sizes, Pearson residuals are approximately normally distributed, which could be checked with a Shapiro-Wilk test of normality. From there, further analysis could indicate whether the residuals are independent of the sample mean. For the regression, this would allow a more accurate and less biased depiction of biological function.
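As a sketch of the quoted definition, the code below computes Pearson residuals for grouped binomial data, (y − np)/√(np(1 − p)), and then applies SciPy's Shapiro-Wilk test (`scipy.stats.shapiro`). This applies to models that estimate probabilities, as in logistic regression. It is not a direct drop-in for the plant-height fits here, and the data are simulated from known probabilities purely for illustration.

```python
import numpy as np
from scipy.stats import shapiro


def pearson_residuals(successes, trials, p_hat):
    """(observed - expected) / binomial SD for grouped binomial data:
    (y - n p) / sqrt(n p (1 - p)), i.e. (y/n - p) / sqrt(p (1 - p) / n)."""
    y = np.asarray(successes, dtype=float)
    n = np.asarray(trials, dtype=float)
    p = np.asarray(p_hat, dtype=float)
    return (y - n * p) / np.sqrt(n * p * (1.0 - p))


# Hypothetical grouped data simulated from known probabilities (not study data).
rng = np.random.default_rng(1)
p_hat = rng.uniform(0.2, 0.8, size=50)
trials = np.full(50, 100)
successes = rng.binomial(trials, p_hat)

res = pearson_residuals(successes, trials, p_hat)
stat, p_value = shapiro(res)
print(f"mean = {res.mean():.2f}, sd = {res.std(ddof=1):.2f}, Shapiro-Wilk p = {p_value:.3f}")
```

A large Shapiro-Wilk p-value means normality is not rejected; it does not prove the residuals are normal.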

5.3 Applications & Remarks

One reason for having two test groups was to see differences in the models' ability to predict plant height. My initial hypothesis was correct: the Gompertz regression was the best at modeling the low-height plant growth (test group 2), whereas the sigmoidal regression was the best at modeling the high-height plant growth (test group 1). The second model, the logistic equation, fell in between and was average at predicting both the low-height and high-height plant growth. These results raise the question of interpolation versus extrapolation, and of how such models apply to our everyday lives. I only explored plant growth with three parameters and created a two-dimensional model for analysis, but we can push our regression abilities to create complex multivariate models. But what good does that do? Even though we can predict growth within the time domain modeled, we do not know how the model will behave once the regression interval is exceeded. Even if the model retains its fit after the interval has passed, what use is it? In a society constrained by time and resource demands, the s-curves in this paper have shown that utility diminishes over time (diminishing returns). In the context of crops, it could mean risking a wait of 2 more days in order to secure a 2 cm increase in growth. As I have learned in the process of completing this investigation, the closer we get to modeling the world around us, the further we move away from understanding its true elegance. Perhaps we are not ready to unlock the secrets of the universe, and the ability to comprehend natural dynamical phenomena may elude us, whether we like it or not.
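The interpolation versus extrapolation point can be made concrete with the fitted coefficients themselves. The sketch below evaluates the three Table 1 (plant group 1) models at days 11 and 16, inside the measured range, and at day 30, beyond it. It then reports how far apart the predictions are. The choice of days is arbitrary.

```python
import numpy as np

# Table 1 (plant group 1) coefficients: (a, b, x0).
COEF = {
    "sigmoidal": (32.63, 2.08, 11.00),
    "logistic": (40.64, -3.96, 12.12),
    "gompertz": (41.01, 4.38, 10.61),
}
# Equations (2), (3) and (5).
MODELS = {
    "sigmoidal": lambda x, a, b, x0: a / (1.0 + np.exp(-(x - x0) / b)),
    "logistic": lambda x, a, b, x0: a / (1.0 + (x / x0) ** b),
    "gompertz": lambda x, a, b, x0: a * np.exp(-np.exp(-(x - x0) / b)),
}


def predict(day):
    """Predicted height from each fitted model on a given day."""
    return {m: float(MODELS[m](day, *COEF[m])) for m in COEF}


def spread(day):
    """Largest minus smallest prediction across the three models."""
    preds = predict(day).values()
    return max(preds) - min(preds)


for day in (11.0, 16.0, 30.0):
    rounded = {m: round(v, 2) for m, v in predict(day).items()}
    print(day, rounded, "spread:", round(spread(day), 2))
```

With these coefficients, the three curves stay within about one unit of each other inside the measured interval. By day 30 they differ by several units, mostly because the sigmoidal fit has a much lower asymptote a. All three fit the measured data well, yet they disagree once extrapolated.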


Table 1: Plant 1 Group Regression Data

Model                   Coefficient   Value    Std. Error   R² and Convergence
3-Parameter Sigmoidal   a             32.63    0.8215       R² = 0.997922, converged in 11 iterations
                        b             2.08     0.0094
                        x0            11.00    0.1537
3-Parameter Logistic    a             40.64    2.6570       R² = 0.99745, converged in 2 iterations
                        b             -3.96    0.2561
                        x0            12.12    0.4655
3-Parameter Gompertz    a             41.01    2.5901       R² = 0.997051, converged in 10 iterations
                        b             4.38     0.3652
                        x0            10.61    0.3384

Table 2: Plant 2 Group Regression Data

Model                   Coefficient   Value    Std. Error   R² and Convergence
3-Parameter Sigmoidal   a             24.99    1.1820       R² = 0.995114, converged in 10 iterations
                        b             2.38     0.1631
                        x0            11.38    0.3012
3-Parameter Logistic    a             35.19    2.3101       R² = 0.998896, converged in 12 iterations
                        b             -3.39    0.1511
                        x0            13.57    0.5527
3-Parameter Gompertz    a             32.93    1.521        R² = 0.998981, converged in 11 iterations
                        b             5.05     0.2631
                        x0            11.21    0.2728

Figure 1: Plant 1 Test Group Height vs. Time
Figure 2: Plant 1 Test Group Height vs. Time (Sigmoidal Regression with Bands)
Figure 3: Plant 1 Test Group Sigmoidal Regression vs. Residuals (by Height)
Figure 4: Plant 1 Test Group Sigmoidal Regression vs. Residuals (by Day)
(Produced using SIGMAPLOT 12)

Figure 5: Plant 2 Test Group Height vs. Time
Figure 6: Plant 2 Test Group Height vs. Time (Sigmoidal Regression with Bands)
Figure 7: Plant 2 Test Group Sigmoidal Regression vs. Residuals (by Height)
Figure 8: Plant 2 Test Group Sigmoidal Regression vs. Residuals (by Day)
(Produced using SIGMAPLOT 12)

Bibliography

[1] J. Frost, "Why you need to check your residual plots for regression analysis: Or, to err is human, to err randomly is statistically divine," 2012. [Online]. Available: http://blog.minitab.com/blog/adventures-in-statistics-2/why-you-need-to-check-your-residual-plots-for-regression-analysis. Accessed: Jan. 22, 2017.

[2] J. Frost, "Regression analysis: How do I interpret R-squared and assess the goodness-of-fit?," 2013. [Online]. Available: http://blog.minitab.com/blog/adventures-in-statistics-2/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit. Accessed: Jan. 22, 2017.

[3] S. W. Menard, Applied Logistic Regression Analysis (Quantitative Applications in the Social Sciences series), 2nd ed. Thousand Oaks, CA: Sage Publications, 2001.

[4] Northwestern Mathematics, "The Logistic Equation." [Online]. Available: http://www.math.northwestern.edu/~mlerma/courses/math214-2-04f/notes/c2-logist.pdf. Accessed: Jan. 22, 2017.

[5] University of California, San Diego Mathematics, "How to solve the Gompertz equation." [Online]. Available: http://www.math.ucsd.edu/~benchow/20D-F13/Gompertz.pdf. Accessed: Jan. 22, 2017.

[6] E. W. Weisstein, "Gompertz curve," Wolfram Research, 2017. [Online]. Available: http://mathworld.wolfram.com/GompertzCurve.html. Accessed: Jan. 22, 2017.

[7] J. Cohen, P. Cohen, S. G. West, and L. S. Aiken, Applied Multiple Regression - Correlation Analysis for the Behavioral Sciences, 3rd ed. United States: L. Erlbaum Associates, 2002.

[8] R. Larson, Calculus of a Single Variable, 7th ed. Boston, MA, United States: Houghton Mifflin (Academic), 2001.

[9] C. Quinn, C. Sangwin, R. Haese, and M. Haese, Mathematics for the International Student: Mathematics HL (Option): Calculus, HL Topic 9, FM Topic 5, for use with IB Diploma Programme. 2013.
