Parameter uncertainty and interaction in complex environmental models

Share Embed


Descripción

WATERRESOURCES RESEARCH, VOL.30,NO. 11,PAGES 3159-3169, NOVEMBER 1994

Parameter uncertainty andinteraction in complex environmental models Robert C. Spear Environmental Engineering andHealthSciences Laboratory, Richmond Research Center

Universityof California, Berkeley

Thomas M. Grieb

Tetra Tech,Inc.,Lafayette, California, andEnvironmental Engineering andHealth Sciences Laboratory, Richmond ResearchCenter,Universityof California,Berkeley

Nong Shang Environmental Engineering andHealthSciences Laboratory,Richmond ResearchCenter Universityof California, Berkeley

Abstract.Recentlydeveloped modelsfor the estimation of risksarisingfromthe release of toxicchemicals fromhazardous wastesitesareinherently complexboth structurally andparametrically. To betterunderstand theimpactof uncertainty and interaction in the high-dimensional parameterspacesof thesemodels,the set of procedures termedregionalsensitivityanalysishasbeenextendedandappliedto the groundwaterpathway of the MMSOILS model. The extensionconsistsof a tree-

structured densityestimationtechniquewhichallowsthe characterization of complex interactionin that portionof the parameterspacewhichgivesrise to successful simulation. Resultsshowthat the parameterspacecanbe partitionedinto small, denselypopulatedregionsand relativelylarge, sparselypopulatedregions.From the high-density regionsone can identifythe importantor controllingparametersas well as theinteractionbetweenparametersin differentlocalareasof the space.This new tool canprovideguidancein the analysisandinterpretationof site-specific applicationof thesecomplex models. Introduction

In this paper we extend the approach to this type of problem developed by R. C. Spear and G. M. Hornberger Complexnumericaland analyticalmodelshave beenused [Hornbergerand Spear, 1980;Spear and Hornberger, 1980], tosimulatethe release,transport,andfate of toxicchemicals who in 1978 carried out an analysis of a multiparameter fromhazardouswaste sites [Mills, 1993; Huyakorn et al., eutrophicationmodel whose parameters were describedby !987].Recently, a new set of these modelshas been devel- statistical distribution functions. The Monte Carlo methodopedto simulate both the release of toxic chemicalsfrom ology that they developed to evaluate the relative imporseveraltypes of waste management units (e.g., landfills, tance of individual parametersin determiningmodel perforsurfaceimpoundments,waste piles, undergroundinjection mance has now come to be called regional sensitivity wells,andunderground storagetanks)andthe movementof analysis (RSA). chemicals within differentenvironmentalmedia(groundwa- In subsequentyears the RSA procedurehas been applied ter, surfacewater, soil, air, and biota). Examples of these to other water quality investigations [Van $traten, 198!; multimedia exposuremodelsincludeMMSOILS [U.S. En- Whiteheadand Hornberger, !984; Lence and Takyi, 1992; vironmentalProtectionAgency (EPA), 1992; Staskoand Jakernan eta!., 1990] and to quite different problems, Fthenakis,1993],MULTIMED [EPA, 1990],and MEPAS includingthe dynamics of moth [Auslander, 1982] and mos[Whelanet al., 1992].Multimediamodelshavebeendevel- quito [Eisenberget al., 1994]populations,toxicology[Spear opedas screeningtoolsto guidethe assessment of potential et al., 199!], nuclear safety [Cook and Gimblett, 1991], and humanand ecologicalrisks and to formulatealternative control engineering[Auslander et al., 1982; Tsai and Ausactionoptionsfor regulatoryagencies.To ensuretheir lander, 1988]. However, Beck [1987, p. 1401], in his encyeffectiveuse, however, there is a need to establishthe clopedicreview of uncertainty in water quality modeling,

magnitude andsourcesof uncertaintyassociated with model pointedout that a disadvantageof the approachwas that predictionsin order to achieve a better understandingof "the interpretation of the derived a posteriori parameter

simulated systems, to increase thereliabilityof modelpre-

dictions,andto definerealisticvaluesthat shouldbe usedin subsequent risk assessments.

distributions becomes more difficult as the dimension of the

Papernumber94WR01732.

parametervectorincreases,andfor all practicalpurposes,it seems probable that any conclusionswill have to be restricted to the properties of, at most, the univariate and bivariatemarginaldistributions."This has been the casein

0043-1397/94/94WR-01732505.00

all subsequentstudies carried out by our group using this

Copyright 1994bytheAmerican Geophysical Union.

3159

3160

SPEAR ET AL.: UNCERTAINTY AND INTERACTION IN COMPLEX MODELS

ping.If suchdifferencecanbe discernedby an appropriate statistical testappliedto the sampledistribution functions,

methodology.In our analysesof complexnumericalmodels we havealsofoundthat there are extensivesetsof parameter valuesthat result in plausiblemodelresultsthat are consistent with experimental observations.Moreover, we have found linear statisticalmodelsto be ineffectivein identifying the underlyingstructureof parameterinteractions. These resultshave led us to explorea particularclassof

conditionfor sensitivity.One canenvisionvarioustypesof parametricinteractionsthat would not be observablefrom

computer intensive algorithms for multivariable data analy-

the univariate marginal distributions, a fact that motivated

then evidenceexiststhat xi is an "important" or a "sensitive" parameter.However, it was realized from the outset

thatthisindexof sensitivity isa sufficient, butnotnecessary,

analysiseven in the earlieststudyusingthis sis. In this paper, we describea new approachwhich has multivariable arisenout of theseexplorationsandgive preliminaryresults approach.In our experience,however, conventionalmultivariable analysis of the parameter vectors has not been arisingfrom an applicationto the MMSOILS model. particularlyrevealing.

The RSA Concept The most attractive feature of the RSA idea is its conceptual simplicity. A model G contains a set of constant parametersx and a set of inputs z which togetherproducean outputy. Uncertainty and/or variabi!ityin the parametersis describedby assigningto each element of x a statistical distribution function which, taken together, constitute a multivariatedistributionfunctionf. This assignmentresults in an ensemble of models, each of structure G and with a

ExperienceWith the RSA Approach An importantdifferencebetweenthe Peel Inlet studyand all of the other applicationsof RSA that our grouphas carried out concerns the fraction of the total number of

simulations that resultin goodoutcomesor "passes." In the PeelInlet casethisfractionwasabout45%. In all subsequent studies,it has been a struggleto achieve valuesas highas 5%. The worstcasethat we have encounteredwas 20 passes

parameter vector which is a random member of the multi-

in 2.6 million simulations.In all of these subsequentcases

variate distributionf. The goal is often to ascertainwhich elementsof the parametervector are importantin producing simulationswhich mimicthe essentialfeaturesof the system being studiedas reflectedby the outputvector y. To this end, a criterionfunctionC is defined,which classifies anyy

therewassomeminorpruningof the rangeusedin thevery first runs, but for the most part these ad hoc adjustments, which were always based on the univariate marginaldistributions,were very modest, e.g., 10% to 20% reductionsin the range of one or two distributions out of 20 to 25. The

as representing a "good" simulation or a "bad" one. Good

interesting observationhas beenthat even beforethe prun-

simulationsare often referred to as "passes" or "behaviors." The good/badclassificationschememakespossible the app!icationof the largeand powerfularmoryof methods of multivariatestatisticalanalysisin exploringdifferences in the posterior distributionsassociatedwith good and bad simulationsor in exploringthe structureinducedinto the joint distributionof parametersassociatedwith goodsimu-

ing, the univariatemarginaldistributionsof passingparametersextendedover the entire allowable range for almostall parametersand that after the prune it extended acrossthe entirerangeof all parameters.We have never seenevidence that the passingparametersoccupieda single well-defined region internal to the hypercube, which might arise, for example, from a multidimensional normal distribution centered at some interior point. However, we have made explorationsof the connectednessof the pass regionusing nearest neighbor metrics of several sorts [Li et al., 1994]. These results suggestthat the pass region is generallya singleconnectedregion. The generalpictureof the passregionthat we haveformed is oneof very complexcorrelationstructurebetweenparameterswhichvarieslocallyover the space.That is, we arenot dealingwith hyperellipsoids or othermacrogeometries of the shortwhichunderlieprincipalcomponentsanalysisor other standardmultivariatestatisticalprocedures.This observation suggeststhat there is much information of interest contained in the correlation structure, but it has been a challengeto describeand analyze these interactionsin the

lations.

The power of the method arises from the classification

notion. The situationcan be visualizedgeometricallyby consideringeach element of the p-dimensionalvector of allowableparametervalues to be independentlyand uniformly distributedover the interval [0, 1]. When the initial parameterdistributionsare not sensiblymodeledas uniform or there is prior knowledge of a covariancestructure,this

informationis best dealt with in a secondary phaseof the analysis.Hence for presentpurposes,the prior parameter spaceis theunit hypercube.Any pointwithinthehypercube canbe identifiedas leadingto a goodor badsimulation by runningthe modelwith the corresponding parametervector and applyingthe classification criteriato the outputso produced. Hence the model and the classification criterion

20- and 30-dimensional spacestypical of the appliedprobprovidea meansof dividingthe hypercube intotwo regions, lemsthat we have addressed.In our work on toxicological one associatedwith good simulationsand the other with bad. modelswe had occasionto explorethe classification and The informationavailablefrom the approachis containedin regression trees(CART) procedure[Breimanet al., 1984]to the problem-specificinterpretationof the observablefea- definethe regionsin the spacethat are most usefulin tures of these two regions. classifying a parametervectoras one leadingto a goodora As wasmentionedat the outset,mostapplications of the badsimulation. CART is a nonanalytic,computerintensive RSA concepthavebeenconfinedto viewingthe good/bad procedure which leads to classification rules based on inresultsfrom the perspectiveof the univariatemarginal equalityconstraints appliedto individualparametervalues distributions. For example,if F{xi) is thepriordistribution or to linearcombinations of parameters.While CART ledto of theith elementof the parameter vectorxi, thenoneasks usefulinsights,it was not the ideal too1 since it usedboth whetherF(x•lG) = F(x•IB), thatis, whethertheconditional goodandbadparametervectors,andthe badweregenerally distributions showanydifference underthegood/bad map- in muchgreaternumbersthan the good. However,the

SPEARET AL.: UNCERTAINTYAND INTERACTIONIN COMPLEXMODELS

potential oftheCARTapproach ledustoinvestin exploring

3161

m

variationson the tree-structured,computerintensivestrat-

(4)

egywhichis its foundation.

i=1

For any partitioningof the space, the value of the loss functioncanbe calculated.As the partitioninggetsfiner,the The challengeof describingthe passregionbeyondwhat valuegetssmaller.Hence the challengeis to devisea search canbe observedfrom the'univariatemarginaldistributions, procedurethat findsa seriesof splitsthat resultin a smallL wecontend,is best approachedas a problemin multivariate while retainingrelatively simple structure. The series of densityestimation[Silv•,rrnan,1986]. Here we treat the splits is referred to as a tree for reasons that will become process of gettingpasspointsasa randomprocess. Usingan apparent.The procedurebeginsby splittingS, the parameter

Tree-Structured DensityEstimationApproach

intotwosmaller hypervolumes S½andSr. The appropriate scalingfactor, the densityof passesat a given hypercube, parameter pointis definedastheproportion of passpointsin rangeof eachparameteris equally divided into k bins, and a sufficientlysmall neighborhood of the point or, equiva- the grid pointsare usedto splitS. Hence thereare a total of lently,the chanceof gettingpassesper unit volume of the parameterspaceif samplesare taken randomlyfrom the neighborhood. Hence the densityvalue is 1 or 0, depending onwhetherthe point is containedwithin the passregionor thefail region. For thoseboundarypoints, the densitieswill bebetween0 and 1. The underlyingconceptof a new method developed by oneof usinvolvesthe extensionof the concept of the variable length histogramto multiple dimensions [Shang,1993].That is, the complexdensityof passesin the parameterspace is approximated by uniform densitiesin localregions,just as the histogramwith variablelengthdoes in a singledimension.The idea is based on the following philosophy:if the density is uniformly distributedin a local region(includingthe extreme cases where the density is equal to 0 or 1), then no further information about the parameterand parameter interactions can be extractedfrom thelocal region. The first challenge, however, is to find the localregions, determine their extent, and arrange them in somesystematic way. To gain an overview of the approach, we begin with a

k x p ways to make a split. If no informationabout the

parametersand their relationshipsexisted, i.e., if the data

pointswere uniformlydistributedin the originalspace,the

estimated density, f0, wouldbe I/S andthecorresponding L, L0, is equal to -l/S,

as can be seen from (1) and (4).

Otherwise,the unevendistributionof the densitysuggests that more informationabout parameterscan be obtainedby splittingthe regioninto two subregions.Actually,for a given

split,suppose S isdivided intotwoareas,Sq and$r, withnq andrtr pointsin eachandwherertq+ nr = n. Thenthenew density function is rtq

fl(x) = dq: • (Sq)-I

xGSq (5)

n I.

fl(x ) = dr =_n (Sr)- I L 1=

/lq

F/r

x {• Sr

dq- • dr

(6)

samplespaceS and a set of observations, x 1, x2, ß' ß, Xn, whichare independentand identically distributedsamples It is easyto showthat L 1 -< L0, and L0 - L• measuresthe froma population with unknowndensityf(x). Eachof thexi increasein accuracy when a split is made. A search is then is a vectorof p parameters.The object is to constructan conductedto findthe bestsplitby maximizingL0 - L1. The

estimate, f(x), so thatf is a goodapproximation of f.

splittingprocesscontinuesfor the newly generatedregions

Suppose S is partitionedinto m subspaces, S = U/n=•Si, in until the data in each region are judged to be uniformly whichthe n observations are splitas n = •1 hi. Thenthe distributed or the number of observations in the region is mostnatural estimate of density in subspaceSi is

fi(x) ni(•i) = --

(1)

smaller than some critical value. The splitting process and the resultingregionswhich can be describedby a binary tree structureare presentedin the following example. Figure I showsa test casein whichf comprisesa weighted sum of four bivariate normal distributions

of different means

The accuracyof estimationis defined by an integrated and variances. A random sample of size 200 was drawn from squareerror (ISE): this complexbivariatedistribution,and thesepairsof values were input to the densityestimationprocedure.The task is

ISE =f.r{f(x) -j'(x)} 2dx

(2)

Es

wherethe smallerthe ISE, the betterthe fit. Note that

•S

•S

rl i= 1 (3)

to divide the spaceinto regionsof relatively uniform density within the constraintsof the precisionallowed by the sample size. The scatterplotfor the 200 samplesis shownin Figure 2. The irregularrectangularbox structure superimposedon it representsthe boundariesof 13 subregionsdeterminedby the estimationprocedure.Figure 3 showsthe corresponding tree. The density values in the nodes are relative densities, which are calculatedby assumingthat the parameter space has a unit volume. From (!), it is easy to show that the real

densityisjust the relativedensitydividedby the real areaor

It canbeshown thatforanyf withstructure (1),the!SEcan volume of the parameterspace. The first split is at x2 -< '• estimated to withina constant by a lossfunctionL, -9.7, As is notedon the tree, only 5 of the 200 points lie in definedas

that region, with 195beinggreater than that value, as can be

3162

SPEARET AL.: UNCERTAINTY AND INTERACTION IN COMPLEXMODELS

comparing trees,adjusting thesplittingcriterionto avoidend cuts, and developingpruning strategiesfor generatingthe best subtrees.To give some senseof the concept, we define the roughnessparameter as

1 m I rn i• Si Rs =-1--

(7)

The definitionhastwo importantproperties'R -> rn for any tree with rn terminalnodesand R = rn if and only if Si =

Figure 1. Perspectiveplot of the real density.

S/rn. Hence equal splittinggives the least roughness.Conversely,the more uneventhe splitting,the larger the rough. nessparameter.In general,R is an increasingfunctionof m. That means that the roughness of a tree comes from two aspectsof a partition:finenessand evenness.Many criteria in the tree-generating processneedto be critically examined. For example,the modifiedcriterionfor makinga split is to maximize

iX(s)= (L o - L1)/(R• - Ro) seenin Figure 2. The correspondingdensitiesare 0.07 and 1.48.The proceduredeterminesthat it hasno furtherinterest in the low density region and concentrateson the higher densityportion of the space.The next split is at x 1 > -7.8. Five of the remaining195 points go to the right into a region of density0.16, and the other 190 lie in a regionof density 1.89. The processcontinuesuntil the proceduredetermines that it does not have the sample size to proceedfurther. With 200 samples in two dimensions, however, the method has discernedthe four separateclustersof pointscenteredon the subpopulationmeans, which is consistentwith the true density shown in Figure 1.

(8)

thereforethe criterion is trying to find the split with a small estimation error and, simultaneously, a small roughness parameter.

The roughnessparameter plays a central role in "tree

pruning,"which is basicallythe processof arrivingat the besttree amongall subtreeswith the sameroughnessparameter. The secondconceptimportant to arriving at a goodtree is the notion of cross validation, which is used to arrive at a defensibleestimate of the ISE. Cross validation is a repeti-

Tree Variability and Complexity The foregoing discussion and example suggesta deceptively simple picture of the art and scienceof successful tree-structured density estimation. In practice, there are many difficulties.Among them are the following. 1. For a given sample, how doesone decidewhen a node is a terminal node? Any partitioning of S correspondsto a

specialsmoothingof the data. The finer the partitioning,the rougher the estimation. Both oversmoothing and undersmoothingresult in bad estimations. 2. How doesone verify the accuracyof a particulartree? If the samesampleis used to constructa tree and to estimate

0.69 0,09 ,

.

o

_

•,16' ß

0.16

1.06

0.17

ß

.

ß

.-

o

,

, .....

,

-

'..

the ISE, the error is called a resubstitution error, and it is not, in general, a reliable estimate of ISE. 3. How doesone avoid bad splits?The simplifiedpicture givenabove for selectingsplitsfavors "end cuts," i.e., splits closeto the boundariesof regions. This can be seenfrom (6), since smaller volumes result in higher densities and smaller

':' ,,

.

,

.... .........

0.52.

ß •.:•...-•

-

..

estimation errors. 0.07

4. Becausea tree is built from a random sample,the tree itseft is subject to random variability. How does one deter-

mine thosefeatureswhich persistfrom sampleto sample? 5. How can one find hidden interactions?The simplified version looks for the largest decreasein the loss function at any stageof the splittingprocess,and as a result, it can lead one to fall into a local minimum.

The key notion in dealing with many of these problemsis to introduce a roughnessparameter, which measuresthe complexity of a tree. This measure provides a basis for

,,,

-15

-10

-5

0 Variable

5

10

15

1

Figure2. Partitioning of parameter spaceanddensity estimates based on 200 observations.

SPEAR ET AL.' UNCERTAINTY AND INTERACTION IN COMPLEX MODELS

6(•Intermediate Node 6.49 = Density

5

3163

195 X2 < -9.7

0[• Terminal Node 0.07 = Density

0.34 o.3.._•4 = Area

X1 > -7.8

183

Xl • 5.4

137

X2•

1.1

X2 > -4.3

51

Xl •;o

11

34

Xl •-1.2

/ 0.026

41

11

Xl < 3.0

i

\

/

0.026

3

S0

x2 • lO.1

0.0096

0,.,,0. 52

\

0.0.$7

6

Xl •-1.8

0.029

X2 • 6.5

/

0.024

4

X2 • 12.8

0.019

0.022

\

0.025

Figure 3. Tree diagramfor exampleproblem.

tiveprocedurein which the complete data set is randomly dividedinto v subsets(v = 10 was used). In each of the v repetitions,v - 1 subsets are combined into the learning samplethat is used to constructa tree, and the remaining subset is usedas a test sampleto estimatethe ISE. For each valueof the roughnessparameter,the estimatedISEs are averaged over the v repetitions.The resultingquantityis a reasonableestimation of the real ISE and is referred to as the

crossvalidationerror. The best roughnessparameteris that

includinglandfills, surfaceimpoundments,waste piles, undergroundinjection wells, and undergroundstoragetanks. The physicalcharacteristicsof each type of waste management unit are different, but in each case source characteristics are defined and mass balance calculations are made on an annual basis to ensure that mass is conserved in waste

managementunits that have multiple release pathways. Several mathematical models are used to represent fate and transport processesin five pathways: atmospheric,

surfacewater, groundwater,soil erosion, and food chain imatelythe smallestISE. The final tree is constructedby bioaccumulation.The uncertaintyanalysesconductedin this usingthe best roughness parameterand the completedata studyfocusedonthe simulationof contaminantmovementin set. the groundwaterpathway.The simulatedgroundwaterpathway includesthefollowingprocesses:net recharge,leaching onewhich has the smallest cross validation error, or approx-

ModelDescription MMSOILS was used to explorethe applicationof the tree-structured densityestimationmethodology to investigating modeluncertaintyandparameterinteraction.It is a multimediacontaminant fate, transport, and exposure model.It wasdeveloped as partof an assessment methodology[EPA, 1992]to estimateexposures andhealthrisks

of contaminantsfrom the soil, transportthroughthe partially saturatedzone, and contaminantdispersionwithin an under-

lying aquifer.A seriesof equationsand simplemodelsare used to link the movement of water into the soil, fluid flow

and contaminanttransport through the partially saturated zone, and mixingat the boundaryof the saturatedzone. The fate and transport of a contaminantin the aquifer are calculatedby using a quasi-analyticalsolution to a threetransientequation.It incorporatesthe effectsof associated with the release and subsequentfate and trans- dimensional

longitudinal advectionandthree-dimensional dispersion, rewastesites.The modelis capableof simulating chemical tardation, and first-order decay. •leasesfrom severaltypesof wastemanagement units, MMSOILS was designedto 'run on a personal computer port of chemicals from contaminated soils and hazardous

3164

SPEAR ET AL.: UNCERTAINTY AND INTERACTION IN COMPLEX MODELS Table

!.

Model Parameters

Parameter

Definition

Range of Values

Saturated Zone Compartment

GRAD*

aquifer gradient, m/m

COND* AQUIDP*

aquiferconductivity, m/yr aquiferdepth,m

DEPLYR2* FIELDC*

soil layer thickness,cm field capacity

1 x 10-5-0.11 0.31-6300 5.0-13.0

Unsaturated Zone Compartment

ROOTZN KS1 KS2

rootzonedepth, cm layer1soilsaturated hydraulic conductivity, cm/h layer2 soilsaturated hydraulic conductivity, cm/h

POERGMI* POERGM2

layer 1% organicmatter layer 2 % organicmatter

CN

runoff curve number

RAIN ! *

monthly rain, cm/yr

83-1090 0.01-1.0

1 x 10-7-100 1 X 10-4-25 1 x 10-6-35 2.5-10.0 1.0-4.0 72-92

78--117 Landfill

AREAl* CONDWB! CONDWB3

layerarea,m2 hydraulic conductivity, cm/h hydraulic conductivity, cm/h

58-232 1 x 10-4-35.1 1 x 10-4-0.78

THICK3*

waste layer thickness,m

ICLOSE*

waste site closure date

0.38-1.5 25-75

AREAINC

incremental wastearea,m2/yr

1.2-3.4

SOLB* CSL

benzenesolubilitylimit, mg/L benzenesoil concentration,mg/kg

315-1260 3.0-30.0

*Parametersthatexhibitedstatistically significant differences in theirdistributionsbetweenpassand nonpass groups.

system, and the model output consistsof the predicted The initial analysisinvolvedthe applicationof regional concentrationin eachenvironmentalpathwayanda selected sensitivityanalysisto investigatewhich modelparameters exposure point. For a previous study [Tetra Tech, Inc., make the largest contribution to the ability to meet the 1992],the model has also been incorporatedinto a Monte performance criterion.The methodinvolvesthe application Carlosimulation packageto randomlyselectvaluesof input of univariatestatisticaltechniquesto examine differences in parameters from specified distributions. The simulation the marginaldistributionsof each parameterbetweenthe packageallows up to !80 of the 300 MMSOILS input passand nonpassgroups.In the primary set of analyses the parameters to be varied in each model run. tree-structureddensity estimation technique was usedto. examinethe interactionsbetween parametersin the pass region of the parameterspace. Analyses

The analysesdescribedin thispaperexaminethe relative importanceof model parametersin predictingchemical Results concentrations in the groundwater at a solidwastemanage- UnivariateAnalysis ment site regulated under the Resource Conservationand

The model simulationsproduced 1090 passesand 3910 the model simulationshad previouslybeen obtainedfrom nonpasses.The hydraulicconductivityand hydraulicgra•numerousdocumentspreparedduringthe courseof remedial ent showedthe largestdifference in the cumulative distribuinvestigationand feasibility studiesconductedat the site tionfunctionbetweenthe two groups.Nine additionalm•el in [Tetra Tech, Inc., 1992].Twentymodelparameters, de- parametersexhibitedstatisticallysignificantdifferences betweenthe passand nonpassgroups. scribedin Table 1, were randomlyvaried in 5000 model their distributions simulations.These20 parameterswere selectedon the basis Differenceswere exhibitedbetweenthe two groupsin t•½

RecoveryAct of 1976.Modelparameter valuesrequired for

of theirrelativeimportance in preliminary regional sensitiv- typesof distributionbut not in the range of valuesthatmet ity analyses [TetraTech,Inc., 1992]using54parameters. the performance criteria. Correlation coefficientswere a!s0 Eachmodelrun wasseparated intotwo groups on the calculatedto evaluatethe relationshipbetweenmodelpabasisofthemodel performance criterion: predicted ground- rameters. The results, summarized in Figure 4, indicateaa water concentration of benzeneat a specified receptor absenceof importantinterparameterrelationships.Thelarg-

location(a drinking-water well located505m fromthe center estcorrelationcoefficient(-0.41) occurredbetweenconduc, of the site)andtime(77 yearsafteroperation of the solid tivity and gradient.The absolutevalue of the majorityof waste management site began). Model runs in which the

correlation coefficients was less than 0.10.

predictedbenzeneconcentration exceededthe U.S. Envi-

ronmental Protection Agency's chemical-specific applicablePartitioningof the PassPoints or relevantandappropriate requirement for benzene (1 Thediagram produced in the evaluation of thedensity :of /zg/L)wereclassified aspasses. All othermodelrunswere thepasspointsis shownin Figure5. The relativedensity classifiedas nonpasses. valueis shownfor eachnode,and the proportion of •

SPEARET AL.: UNCERTAINTYAND INTERACTIONIN COMPLEXMODELS n= !65

3165

d)/N. This informationon the density in each node is also summarized in Table 2. In terminal nodes 4, 10, 1, 3, and 14,

the relative density values were greater than 2.0, which meansthat we expect that at least 2.0 x 1090/5000= 44% of the pointsin the local regionsare passpoints. The expected number of pass points in terminal node 4, for example, is 95%. Overall, 858 of the 1090passes,or 79% were found in six of the terminal nodes,which representedapproximately

n =23

28% of the volumeof the parameterspace.Examinationof the other terminal nodes shows that 51% of the parameter space has a relative density of less than 0.5. The expected proportion of passes in these regions is less than 10%. n=l n=l m • m Generally, these results show that the parameter spacecan 0.1 0.2 0.3 0.4 0.5 be partitioned into relatively small, densely populated regionsand large, sparselypopulatedregions. The model parametersthat are responsiblefor the partiFigure4. Correlationbetween model parametersin the tioning of the parameterspaceand the valueson which the passregion(n = 1090). parameter space is partitioned are also shown in the tree diagram(Figure 5). The five parametersthat definedthe high density terminal nodes are COND (aquifer conductivity), volume of the parameter space occupied by the terminal GRAD (aquifer hydraulic gradient), SOLB (chemical solunodeis also indicated. The relative density value indicates bility), ICLOSE (waste site closure date), and DPLYR2 the density of pass points in each node. In general, if we (unsaturatedzone thickness). These parameters are a subset haven passesout of N simulations,then a relative density, of those identified in the regional sensitivity analysis that d, in a local region can be explained as the expectationof accounted for the partitioning of the pass and nonpass gettingn x d passesif we took N samplesfrom the local regions of the total parameter space (Table 2). region,or the proportion of passes in the region is (n x The parameterrangeswithin the high density nodes of the

3!4

776

CONDUCT > 630.3

275

501

GRAD> 0.01

472

72

TN1

TN:

242

SOLB.• 12.45

0.0.._•.9 0.,035

SOLB > 9.75

88

154

I,CLOSE =• 5O TN4 370

102

CONDUCT

26

> 3150.2

CONDUCT

3

• 3465.1 0.033

74

86

28

o 033

284

GRAD > 0.061

GRAD 5 0.033

O.lO!

I I

1

0.1,35, 11

0.07_•5 2 AREA1 > 101.5

O.lOl

3 iii

ICLOSE < 55

i

iiiiiiiim

iii iiiiiiiiiiiiiii

iiii

ii

iiiiii

i

iii

i i

Q Intermediate Node 1.26 = Density

0•o•,•

0.054

1

TN13/



TN3

Terminal Node 3

DPLYR2 • 586.5

AQUIDP • 9

14

2,48 = Density 0.033 = VolUme

0'033(Highdensity termtna! nodes are shaded) o_..•.•

o,o9•

0,041

ii

o,o•_!

Figure5. Partitioning ofpass points using density estimation approach (number of cases = 1090).

3166

SPEAR ET AL.: UNCERTAINTY AND INTERACTION IN COMPLEX MODELS

Table 2. Terminal

Tree DiagramStructureand ImportantParameters Relative

Node

Density

4 10 1 3 14 2 7 8 13 5 11 12 6 9

Number

of

Cases

Volume

4.35 2.94 2.80 2.48 2.17 ! .89

154 173 275 88 96 72

0.033 0.054 0.09 0.033 0.041 0.035

0.91 0.58 0.34

74 86 15

0.075 0.135 0.041

0.24 0.22 0.037 0.027 0.00

26 24 4 3

0.101 0.098 0.098 0.101 0.066

0

COND, SOLB, ICLOSE COND, GRAD, SOLB, ICLOSE COND, GRAD COND, SOLB, ICLOSE

COND, GRAD, SOLB, ICLOSE, DPLYR 2 COND, COND, COND, COND, COND, COND, COND, COND, COND,

pass region provide an effective method for explainingthe tree diagram. Figure 6 depicts the numericalrangesfor each of these five parameters within the six terminal nodesthat exhibit high densities(i.e., relative densitiesgreaterthan 1). This figure shows that the high-densityregions take on valuesover the full range of each of thesefive parameters. The passregion is not a singlehigh-densityarea describedby restrictedrangesof values for the five parameters.Instead, there are several distinct high-density areas defined by differentcombinationsof parameterranges. Examination of these combinationsprovides a means of comparingand contrastingthe effectof model parametersin differentportionsof the passregionaswell as informationon the interactionbetweenparametersthat was obscuredin the initial correlationanalysis(Figure4). For example,very low values of COND

Parameters

SOLB GRAD, GRAD, GRAD, SOLB, GRAD, GRAD, SOLB, GRAD,

SOLB SOLB SOLB, ICLOSE, DPLYR 2 GRAD SOLB SOLB, AREAl, AQUIDP GRAD SOLB, AREAl

The tree structure and these parameter ranges within the terminal nodes also provide information on the relative importanceof individualparameters. The initial split of the tree is made on the basis of values of COND, and the importanceof COND is indicated by the fact that several subsequentsplitsof the tree refine the acceptablerangesof COND within additional high-density terminal nodes. In contrast,DPLYR2 appearsto be less important in defining the high density nodes. Full ranges of values were observed in all but one of the high density terminal nodes (terminal node 14).

Another view of the importance of these parametersin differentportionsof the pass region is provided in Figure7. Terminal node 4 has the highest relative density, contains

14% of the observedpasseswithin 3.5% of the parameter

and unrestricted values for GRAD charac-

space, and is defined by very low values of COND •d terize terminal nodes 2, 3, and 4. Terminal node 1, on the relatively high values for SOLB and ICLOSE. Terminal other hand, is characterizedby very low values of GRAD node 1 encompassesa relatively large portion of the pass and all values for COND except thoseincludedin terminal region (approximately 9%) and is defined on the basisof nodes2, 3, and 4. The importanceof these combinationsin restrictedvaluesofjust two parameters,GRAD and COND. definingthesefour high-densitynodeshelpsto explainthe Terminal nodes 10, 13, and 14 have the same values for relatively strong linear relationshipthat was observedbe- COND, GRAD, and SOLB but different values for ICLOSE tween these parameters in the initial correlationanalysis and DPLYR2 at separatenodes, which results in different (r = -0.41). The tree-structured density estimation tech- densities. Terminal nodes 10 and 14 are defined on the basis

nique, however, also explainswhy a strongerrelationship of differences in the values of ICLOSE and DEPLYR2. Both does not emerge from the linear correlation analysis;while are parametersthat affect contaminantavailability, andit terminal nodes 1, 2, 3, and 4 exhibit restrictedrangesof seemslikely that theseparametersfunction as surrogates. eitherCOND or GRAD, the remaininghigh-density terminal Differencesin the densitybetweenterminal nodes13 and14 nodes (10 and 14) are characterized by wider ranges of showthe effectsof a singleparameterin defininga highmoderate values for both COND and GRAD. densityarea within the passregion. Waste Site

Conductivity TN 2

(COND)

Gradient

(GRAD)

ChemicalSolubility ClosureDate

(SOLB)

(ICLOSE)

Unsaturated

Zone

Thickness

(DPLYR2)

TN 3 TN4 TN1

TN 10

TN 14 Parameter

Ranges

Figure 6.

Parameter

Parameter

Parameter

Parameter

Ranges

Ranges

Ranges

Ranges

Numerical ranges of fiveparameters withinhighdensity terminalnodes(TN).

SPEAR ET AL.' UNCERTAINTY AND INTERACTION IN COMPLEX MODELS

3167

Terminal

Terminal

Node

Density

Node Densi_t• T" 4.35

I '..• ::.

10 !4

2.94 2.14

TN1



13

0.34

2.80

Parameter

Parameter

Conductivity (COND)

Conductivity (COND)

Gradient

Gradient

(GRAD)

(GRAD)

Solubility (SOLB)

Solubility (SOLe)

Waste Site Closure Date

Waste Site Closure Date

(ICLOSE)

(ICLOSE)

Unsaturated Zone Thickness

Unsaturated Zone Thickness

(DPLYR2)

(DPLYR2) ••



.•

ParameterRanges

•,•.__..

'v"'

•_..•

Parameter Ranges

Figure 7. Parameterrangeswithinselectedterminalnodes.

terminalnodes.Noneof theseparameters wereinvolvedin the original delineation of terminal node 5 (Figure 5), andtwo of The terminal nodesin the tree structurediagram(Figure 5) the parameters (monthly rainfall, RAIN1, and wastelayer canbe individuallysampledby rerunningthe Monte Carlo thickness, THICK3) played no role in the original parta•oning simulations usingthe restrictedrangesof the parameters that of the parameter space (Figure 5). The remaining three paramdefinethe boundaries of the terminal node of interest. These

Direet•

Search

importance in thislocalized directedsearchesof local regions of the parameter space etersappearto haveincreased wereconductedwith the goalof extractingadditionalinformationon the importanceof individualparameters. The terminalnodesalready describedin Figure 5 can be separated intotwo categories basedon information content.

region.Landfillarea (AREAl) was only importantin the partitioning of lowdensityareasin the initialanalysis. Unsat-

uratedzone thickness(DPLYR2) was importantin defining

onlyoneof thesixhighdensityterminalnodesin the original analysis (Figure6). Whilethe wasteclosuredate(ICLOSE), which defines the numberof yearsthat wasteswere addedto densitiesd contain little additional information. The lowthe landfill,was importantin definingthree of the original density portionsof the parameterspacecontainlittle inforterminalnodes,it is the singlemostimportant mationbecausethey are sparselypopulated.The high- high-density parameter in thepartitioning of thelocalized region. density nodes(d _>2.0) containlittle additional information While new parameterswere identifiedin the directed because almostall pointsarepasspointsandthe important search,a patternin parameterrelationships similarto that parameters as well as their rangesare alreadywell defined. observedin the originalanalysisexists: localizedinteracHowever,thoseterminalnodeswith a relativedensityof tionsareimportantin describing thesehighdensityregions. about1.0 representan unclearregionat the boundary Forexample, high-density regions includebothlowandhigh betweenthe low- and high-densityregions.They are of values of the landfill closure date (ICLOSE), but distinct specialinterest becausefurther partitioningof this space interactionsare observablebetweenother parametersin these

Thoseterminal nodesthat exhibit either low or high relative

mayidentify other high-densitysubspaces.

Terminalnode5 from the originaltree structurediagram

differenthigh-density regions.In TN5-7, low valuesin the

range ofICLOSEareassociated withhighvalues in (Figure 5), witha relativedensityof 0.91,wasexamined by parameter theparameter rangeof wastelayerthickness (THICK3)and usingthe directedsearchstrategy.A new set of 5000 thefull rangeof valuesfor monthlyrainfall(RAIN1).High simulations wererunusingrestrictedrangesforthreeparam- valuesof ICLOSE in TNS-10andTNS-11, on the otherhand, eters:conductivity (COND), hydraulicgradient(GRAD), areassociated withthefullrangeof valuesin THICK3 butwith •d chemical solubility(SOLB).Theresultsof thisanalysis a relativelylimitedrangein the valuesof RAIN 1.

aresummarized in Figures8 and9. Figure8 showsthenew

treestructurein which the portion of the parameterspace

defined by terminalnode5 has beenpartitionedinto 11 Discussion terminal nodes.The relativedensityin threeof thesenodes Monte Carlo simulationis a basic tool that is now used (TNS-7,TNS-10,andTN5-11)wasgreaterthantwicethe associated with model relative density intheparentterminal node(TN5,Figure5). routinelyto quantifyuncertainties

It hasalsobeenusedin examining therelative While these nodes represented only14%ofthevolume ofthe predictions. of modelparameters in affecting modelperforl•a! parameter space, theycontained 42%,or448of 1067, importance ofthenew modelrunsdesignated as passes.

mance.In the mostelementarysensitivityanalyses,single

Figure 9 shows theranges offiveadditional paramete• that modelp•ametersarevariedusingMonteCarlosimulation

are heldconstant.Regionalsencontribute ß to the delineationof the three new high-density whileall otherparameters

3168

SPEAR ET AL.: UNCERTAINTY AND INTERACTION IN COMPLEX MODELS

767

3OO

!CLOSE < 52.5

48

252

DPLYR2

! 93

59

2

THICK3 > 0.938

3

56

I

66

TNS.1

>

46

DPLYR2• 385.1

0.111

0.165

701

AREA1 ,• 92.8

0.0.._.•9

56

645

TN5-S DPLYR2< 284.4

0i-;

353

0.165

292

RAIN1 • 1.08

AREA1 _ 0.881

175

DPLYR2 < 737.55 o

1(•)Intermediate Node 1,64 = Density

0.111

0,091

0.0,,9

TNS-11

0.03

ms.?

Terminal

Node

5-7

2.42 = Density 0.03 = Volume

0.03 (High density terminal nodes are shaded)

Figure8. Results of directed search ofterminal node5 (number of cases= 1067). sitivityanalysis(RSA) techniquesextendthe basicsensitiv- observed relationships are discontinuous andarenotnecessar-

ity analysis by identifying therelativeimportance ofindivid- ily monotonic. TheseresultsalsoshowthatsincetheimporuaIparameters throughout the multidimensional parametertanceofindividual parameters varieswithindifferent regions of space.However,sinceRSA techniques havebeenlimitedto theparameter space, notonlyaretheresults of theelementary theexamination of the multidimensional parameter space sensitivity analyses limitedto particular regions, butthegenoneparameterat a time, they haveprovidedlittle informa- eralization of theseresults mayalsobemisleading. tionaboutthe interaction betweenparameters. Oneof theprincipalbenefitsof thetree-structured density The tree-structured densityestimation technique de- estimationtechniqueis the informationthat is obtainedto scribed inthispaperextends theabilityoftheRSAapproachguidedirected searches of theparameter space.Theprelimtoelucidate multivariable interaction intheparameter space. inaryanalyses presented inthispapershowtheabilitytouse In the preliminaryanalyticalresultspresented,we have the parameterboundariesof individual terminal nodesm shown theabilityto identifyrelatively small,densely popu- guidetheselection of theinputparameter rangesin orderto latedregionsand large,sparselypopulated, unstructureddramaticallyincreasethe fraction of model simulationsthat regions of the parameter space.Fromthe high-density meetthespecified modelperformance criteria.Thisdirected regions we are ablenot onlyto identifythe important or searchwas used to identify a secondaryset of model controllingparameters,but also to demonstratethe interac- parameters with well-defined,localizedimportance,butit tionbetweenparameters in particularlocalizedareasof the couldalsobeusedto improvetheefficiency of MonteCarlo parameterspace. The high-densityregionsidentifiedare techniques in sensitivity analyses. In theexample analysis,

characterized bydistinctly different combinations ofparam- thenumberof passesout of 5000simulationswasincreased eter ranges.High values of one parameterare associated from74atterminal node5 intheinitialanalysis (Figure 5)to withlowvaluesof another parameter in onehigh-density 1067in the directed search. region,andin a differenthigh-density regionthecorrelation The preliminary results from the examination of the betweenthe sameparameterscan be reversed.In both MMSOILS model are intended to show the use of the

cases, otherparameters govern theobserved interparameter tree-structured densityestimationtechniquein uncertain• relationships. Theseresultsexplainthe low correlation that analyses.Only a singlesubmodelof the MMSOILS model was observedin the originalanalysisof linear correlation (groundwater fate andtransport) was evaluated in •ese

between modelparameters inthepassregion (Figure 4).The preliminary analyses. Although theanalyses described repparameter spaceof thepassregioncannot bedescribed by resentonly a cursoryexaminationof the selectedsubmodel, simple linearrelationships between parameters. Instead, the theresults showhowthistechnique canbeusedto ident'ffy

SPEAR ET AL.: UNCERTAINTY AND iNTERACTION IN COMPLEX MODELS Terminal

Node

Beck,M. B., Waterqualitymodeling:A reviewof the analysisof

Density

I

5-11

4.34

:".,'• •

5-10 5-7

2.26 2.42

uncertainty,Water Resour. Res., 23(8), 1393-1442, 1987. Breiman, L., J. Friedman, R. Olshen, and C. Stone, Classification

and RegressionTrees,WadsworthInternationalGroup, Pacific Grove, Calif., !984.

Cook, I., and C. G. Gimblett, A risk perspectiveon fusion safety phenomena,FusionEng. Des., 17, 301-306, 1991. Eisenberg,J. N., W. K. Reisen, and R. C. Spear, Sensitivity analysisof a dynamic model comparingthe bionomicsof two isolatedCu!extarsalis(Diptera: Culicidae)populations,J. Med.

Parameter Waste Site Closure Date

Entomol., in press, 1994.

Hornberger,G. M., andR. C. Spear,Eutrophicationin PeelInlet, I, The problem:Definingbehaviorand a mathematicalmodelfor the phosphorusscenario,Water Res., 14, 29-42, 1980. Huyakorn,P.S., M. J. Ungs,L. A. Mulkey, and E. A. Sudicky,A three-dimensionalanalytical method for predictingleachate mi-

(ICLOSE) Landfill

3!69

Area

(AREA1)

gration, Ground Water, 25(5), 588-598, !987. Jakeman, A. J., F. Ghassemi, and C. R. Dietrich, Calibration and reliabilityof an aquifersystemmodelusinggeneralizedsensitivity analysis. IAHS Pub!., 195, 45-51, 1990. Lence, B. J., and A. K. Takyi, Data requirements for seasonal dischargeprograms:An applicationof a regionalizedsensitivity analysis, Water Resour. Res., 28(7), 1781-1789, 1992. Li, H., K. Watanabe, D. Auslander, and R. C. Spear, Model parameter estimation and analysis: Understanding parametric structure,Ann. Biotaed. Eng., 22, 97-!11, 1994. Mills, W. B., Development of soil cleanup levels at a former manufacturedgas plant site in Sacramento, California, in Pro-

Unsaturated Zone Thickness

(DPLYR2) Monthly Rain

(;AIN•) Waste Layer Thickness

(THICK3) Parameter Ranges

ceedingsof DevelopingCleanup Standardsfor Contaminated

Figure9. Parameterrangeswithinselectedterminalnodes defined in directed search.

Soil, Sediment and Groundwater--How Clean Is Clean?, pp. 277-288, Water Environment Federation, Alexandria, Va., 1993.

Shang, N., New developmentsin tree-structuredmethodology,

thoseparametersthat controlmodel outcomesand to examincthe complexinteractionsthat exist betweenparameters. This techniquealso has far greater applicabilityin model evaluationthan just the definition of parameterinteractions. Fromthe perspectiveof the groundwaterfate and transport submode!, this techniquecan be usedto comparealternative submodels or the relative importance of the different compartments of thesemodels(i.e., saturatedzone,unsaturated zone,andlandfillparameters).From the perspectiveof the overall risk assessmentframework which is provided by MMSOILS and similar models, this techniqueprovidesa toolwith which to evaluate the relative importanceof other exposuresubmodels(e.g., sourcecharacterization,surface

Ph.D. dissertation, Univ. of Calif., Berkeley, 1993. Silverman, B. W., Density Estimation for Statistics and Data Analysis, Chapman and Hall, London, 1986.

Spear,R. C., andG. M. Hornberger,Eutrophicationin Peel Inlet, II, Identificationof critical uncertaintiesvia generalizedsensitivity analysis,Water Res., 14, 43-49, 1980.

Spear,R. C., F. Bois,T. Woodruff,D. Auslander,J. Parker,andS. Selvin,Modelingbenzenepharmaco-kinetics acrossthree setsof animal data: Parametricsensitivityand risk implications,Risk Anal., 1I, 641-654, 1991.

Stasko, S., and V. M. Fthenakis, Software review: MMSOILS, version 2.2, Risk Anal., 13(5), 575-579, !993.

Tetra Tech, Inc., Uncertainty analysisof the Poppy and Petunia RCRA facilities for RCRA corrective action regulatory impact

analysis,U.S. Environ.Prot. AgencyEnviron.Res. Lab., Athens, Ga., 1992.

Tsai, K. C., andD. M. Auslander,A statisticalmethodologyfor the waterpathways,atmosphericpathways,and food chain designof robustprocesscontrollers,J. Dyn. Syst. Meas. Con-

transport) andto comparealternativemodelconfigurations. trol., I10, 126-133, 1988. Risk assessment frameworks such as MMSOILS

are ulti-

matelyintendedto guide the clean-upor remediationof hazardous waste managementunits. The use of the treestructured densityestimationtechniquein conjunction with

U.S. EnvironmentalProtectionAgency(EPA), A subtitleD landfill applicationmanualfor the multimediaexposureassessment model(MULTIMED), preparedby Aqua Term Consultantsand ComputerScienceCorporation,U.S. Environ. Prot. Agency Office of Res. and Der., Athens, Ga., 1990.

therisk assessmentmodel framework providesnot only a

U.S. EnvironmentalProtectionAgency(EPA), MMSOILS: Multimedia contaminantfate, and exposure model, Environ. Prot. toolforidentifying thosesitesthatrepresent thegreatest risk andthereforeshouldbe cleanedup first, but alsoa method AgencyOfficeof Res. andDev., Athens,Ga., 1992. Van Straten,G., Analysisof modeland parameteruncertaintyin

for identifyingwhich sites are most likely to respondto available clean-upstrategies.

simplephytoplankton modelsfor Lake Balaton,in Progressin Ecological Engineering andManagementby MathematicalModelling, edited by D. M. DuBois, pp. 107-134, Editions

CEBEDOC, Liege, Belgium, 1981. Acknowledgments. Partialfinancial support forthisresearch was provided by a ChevronPostdoctoral Fellowshipfor RiskAnalysis Whelan,G., J. W. Buck,D. L. Strenge,J. G. DroppoJr., andB. L. Hoopes,Overviewof the MultimediaEnvironmental Pollutant (NS)andby the U.S. Environmental Protection Agency,Officeof System(MEPAS), Hazard. WasteHazard. Mater., Research and Development (contract68-CO-0019). The authors Assessment

gratefully acknowledge threeanonymous reviewers for theiruseful commentson the manuscript.

References

9(2), 191-208, 1992.

Whitehead,P. G., andG. M. Hornberger,Modellingalgalbehaviour in the River Thames, Water Resour. Bull., 18(8), 945-953, 1984.

T. M. Grieb, N. Shang,and R. C. Spear, EnvironmentalEngi-

andHealthSciences Laboratory, Richmond Research CenAus!ander, D. M., Spatialeffects onthestability ofa food-limitedneering mothpopulation, J. FranklinInst., $14, 347-365,1982.

ter, Universityof California,Berkeley,CA 94720.

Auslander, D. M., R. C. Spear,andG. E. Young,A simulation January24, 1994;mvis.ed June6, 1994; based approach to thedesign of controlsystems withuncertain (Received acceptedJune27, 1994.) parameters, J. Dyn. Syst.Meas.Control,!04, 20-26,1982.

Lihat lebih banyak...

Comentarios

Copyright © 2017 DATOSPDF Inc.