Environ Model Assess (2011) 16:239–250 DOI 10.1007/s10666-011-9249-3
A Quantile Regression Approach to Evaluate Factors Influencing Residential Indoor Radon Concentration Riccardo Borgoni
Received: 2 September 2009 / Accepted: 3 January 2011 / Published online: 27 January 2011 © Springer Science+Business Media B.V. 2011
Abstract Indoor radon concentrations depend on building characteristics such as building materials, ventilation and water supply. In this paper, a quantile regression approach is proposed to evaluate the effect of some building factors potentially influencing indoor radon concentration. Many of the considered factors, such as soil connection, age of construction and being a single family building, are found to have a statistically significant effect; however, this effect is far from constant across the entire support of the indoor radon concentration distribution. A potential impact due to geological and geo-physical reasons is also found using the altitude of building locations as a surrogate variable. In addition, a clear local spatial effect is detected by a spatial autoregression approach.
Keywords Building factors · Spatial quantile additive autoregression · Spatial lag
R. Borgoni (*) Department of Statistics, University of Milano – Bicocca, Via Bicocca degli Arcimboldi, 20121 Milan, Italy e-mail: [email protected]
1 Introduction Radon (the 222Rn isotope, with a half-life of 3.8 days) is a naturally occurring decay product of uranium, which is commonly found in rocks and soils. It is an odourless, colourless and tasteless gas that drifts upward through the ground to the Earth's surface and is undetectable to humans except by means of specialised measurement devices. The SI unit of radon activity, a measure of the amount of radioactive material, is the becquerel (Bq). Radon is known to be the main contributor to natural background radiation exposure. Epidemiological studies have established that the health risk related to radon
and radon progeny exposure is lung cancer. In fact, radon is considered to be the second leading cause of lung cancer after smoking. The US Environmental Protection Agency estimates that 7,000 to 30,000 annual lung cancer deaths in the USA are caused by exposure to residential radon. For this reason, monitoring surveys have been promoted in a number of Western European countries in order to assess the exposure of people to this radioactive gas [10]. The primary source of radon gas in the home is the soil [23]; radon enters through cracks or holes in the foundations and concrete floors and may reach high indoor concentrations. Air pressure differences between the soil and the house can cause the soil air to flow towards the foundation of a building. Soil porosity and permeability can also affect indoor radon concentrations (IRC). However, IRC can largely depend on building characteristics such as building materials, ventilation and water supply, which affect the entry of radon into buildings and its movement between rooms. The dependence of IRC on building factors has been studied in various papers. Lèvesque et al. [22] examined statistical associations between radon measurements in Quebec, Canada, and housing factors as well as geologic indicators of high radon potential. Similar associations were studied using data from New Hampshire, Minnesota and Iowa [1, 25, 29]. All of these studies aimed at evaluating the impact of building factors on the average IRC. However, modelling the mean of the conditional distribution may provide a picture which is neither complete nor even appropriate when the actual interest is in the tail of the distribution. This is the case in the presence of outliers, asymmetry or reference concentration values endorsed by law or international recommendations. In all these cases, the interest focuses more on the tail of the distribution of the pollutant concentration than on the
average value. Hence, it can be preferable to fit a family of robust regression models, each summarising the behaviour at a different probability level of this conditional distribution. Quantile models offer one way of achieving this. In recent decades, quantile regression has been used in various fields such as medicine [7, 13], economics [4, 14, 21], ecology [5, 6] and hydrology [26]. A broad review of the topic is given by Yu et al. [31]. So far, no attempt has been made to employ this approach in environmental modelling and particularly in the radioprotection context. In this field, modelling higher concentration levels can also be relevant, adding information to that provided by models of mean concentration levels. In this paper, it is shown how quantile regression can be usefully adopted to investigate the effect of influential factors on IRC, focusing on building characteristics. Spatial effects are controlled for by employing a lagged variable scheme, while the potential effect of altitude, which is known to be due to (unmeasured) geo-physical underground characteristics, is accounted for by extending the quantile regression model non-parametrically. The paper is organised as follows. In the next section, the dataset is introduced. A brief review of quantile regression is provided in Section 3, whereas Section 4 presents the main results of this study. A discussion in Section 5 concludes the paper.
2 The Data The data considered in the present study were collected within an indoor radon gas monitoring survey conducted by the Agency of Environmental Protection (ARPA) of the Lombardy Region (Italy) in 2003, aimed at mapping IRC across the regional territory. Lombardy is the most populated region of Italy and one of the most exposed to high radon concentrations. The survey monitored 3,646 buildings: 1,928 workplaces and schools and 1,718 dwellings. As people spend most of their life in houses, only dwellings are considered in this paper. Monitored houses were located in 547 of the 1,500 municipalities of the region (as shown in Fig. 1). A detailed description of the survey is reported by de Bartolo et al. [8]. The IRC was missing for nine measurement points, and the corresponding buildings were discarded from the dataset. Four further IRC measurements were excluded from the final analysis: two below 10 Bq/m3, which is the natural concentration of radon, and two unusually large measurements of more than 1,600 Bq/m3, leaving a sample of 1,705 buildings. Measurements were performed in dwellings located on the ground floor, whose basic characteristics were fixed in order to ensure the representativeness of the test. Long-term
measurements were carried out using CR-39 track detectors,1 contained in closed plastic canisters, positioned in situ between the end of September and the beginning of November 2003. The detectors were changed after 6 months and the two semester measurements were recorded. The annual average values are considered in the present paper. The mean value of IRC is 118 Bq/m3 (standard deviation 127 Bq/m3), ranging from 11 to 1,337 Bq/m3. Figure 2 depicts the cumulative distribution function and the histogram of IRC. The results of the national survey [2] already showed that the mean indoor radon concentration in Lombardy (116 Bq/m3) was considerably higher than the national mean value (70 Bq/m3).2 The data of the Lombardy regional survey in this study (all measurements performed on the ground floor) confirm even higher concentration levels. Figure 1 also suggests that higher values tend to cluster geographically and concentrate in the Alpine area in the north of the region, whereas lower IRC values characterise the southern plain. The effect of altitude on IRC is depicted in Fig. 3 and appears to be clearly non-linear. Such an effect is known to be due to the geomorphological underground characteristics; hence, altitude can be considered a surrogate measure of them. In the regional survey, the principal characteristics of the investigated rooms and their buildings were also collected by means of a questionnaire administered to dwellers. More technical structural building characteristics were not available. Table 1 shows summary statistics of IRC as a function of the collected building factors. As far as the floor material is concerned, the questionnaire did not distinguish between marble and granite; therefore, it was not possible to evaluate their effects separately. However, it is likely that granite could induce a higher IRC in a room, even though it is used far less often than marble.
Some of the considered factors seem to affect the IRC distribution quite heavily. The average concentration, for instance, is about 40 Bq/m3 higher for single buildings than for non-single ones, about 27 Bq/m3 higher for buildings in direct contact with the ground than for those with a basement or wasp's nest, about 48 Bq/m3 higher for buildings with stone walls than for those with walls made of other materials, and about 33 Bq/m3 higher for buildings without an air conditioning system than for those with one. Other characteristics seem to have just a negligible effect.
1 An error of about 20% of the radon concentration measurement has been estimated for the detectors.
2 It can be noted that the Italian legislation does not define action levels explicitly. In many circumstances, the 90/143/Euratom recommendation is adopted, which suggests 200 and 400 Bq/m3 as the reference values for, respectively, the future construction standard and for considering remedial interventions in existing dwellings.
Fig. 1 The study region: measurement point locations (indoor radon regional survey 2003–2004) classified according to whether the recorded value is above or below the sample median (77 Bq/m3)
However, Table 2 shows that the effect of such factors differs considerably across levels of the IRC distribution. This is also clearly suggested by Fig. 4a, where the sample quantiles of IRC are shown separately for the two types of wall material. While the difference in IRC at the fifth percentile is about 7 Bq/m3, it increases to about 264 Bq/m3 at the 95th percentile. In other words, a building characteristic may lower or raise IRC to an extent that depends on whether the building is exposed to high or low concentrations. This differential effect seems to be confirmed even for a building characteristic that has a less pronounced effect on the average IRC, for instance the year of construction or last refurbishment (Fig. 4b).
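The widening gap between two groups from the lower to the upper percentiles can be illustrated in miniature. The following is only a sketch under stated assumptions: it uses synthetic lognormal samples with hypothetical parameters, not the survey data, and the helper names `empirical_quantile` and `quantile_gaps` are introduced here purely for illustration. It compares the difference between group quantiles at several probability levels with the difference between group means.

```python
import random


def empirical_quantile(values, p):
    """Return the p-th sample quantile (0 <= p <= 1) by linear interpolation."""
    ordered = sorted(values)
    position = p * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def quantile_gaps(group_a, group_b, probs):
    """Quantile of group_b minus quantile of group_a at each probability."""
    return {
        p: empirical_quantile(group_b, p) - empirical_quantile(group_a, p)
        for p in probs
    }


# Synthetic, right-skewed "concentrations" (NOT the survey data): two lognormal
# samples with hypothetical parameters, standing in for two wall materials.
rng = random.Random(0)
other_walls = [rng.lognormvariate(4.2, 0.7) for _ in range(5000)]
stone_walls = [rng.lognormvariate(4.6, 0.9) for _ in range(5000)]

gaps = quantile_gaps(other_walls, stone_walls, probs=(0.05, 0.50, 0.95))
mean_gap = (sum(stone_walls) / len(stone_walls)
            - sum(other_walls) / len(other_walls))

for p, gap in gaps.items():
    print(f"quantile {p:.2f}: gap = {gap:7.1f}")
print(f"mean gap      = {mean_gap:7.1f}")
```

In such skewed samples a single mean difference sits between a small gap in the lower tail and a much larger gap in the upper tail, which is the pattern that motivates modelling several quantiles rather than the mean alone.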
Fig. 2 Cumulative distribution function, Fn(x) (a), and histogram (b) of frequencies of IRC in the Lombardy area (regional survey 2003–2004). Horizontal axis in both panels: radon concentration (Bq/m3)
3 Quantile Regression
Quantile regression was developed as an extension of the linear regression model [17]. The approach is semiparametric in the sense that no parametric distributional form (e.g. normal) is assumed for the random component of the model. A brief review of some of the key aspects related to quantile regression is presented in this section.
Fig. 3 IRC (Bq/m3) versus the altitude (metres above sea level) of the measurement points. The line represents a local polynomial fit
Table 1 Summary statistics of radon concentration by building characteristics. Missing values were due to information not reported by the dweller for that particular item of the questionnaire
Let Y be a real-valued response variable and X a set of p explanatory variables. In the analysis reported in the following section, Y represents the IRC and X a set of building characteristics. A standard goal of statistical analysis is to infer the relationship between Y and X. Regression analysis is used to model such a relationship. The standard regression model is
$$Y = \mu(x, \beta) + \varepsilon$$
where β is a vector of unknown parameters, ε is a zero-mean, constant-variance stochastic error which is assumed to be independently distributed in the population of interest, and μ(x, β) is a known function. In standard linear regression models, it is assumed that μ(x, β) = x′β. In other words, what is modelled is the conditional mean, E(Y | X = x), of the response variable Y given X = x as a function of a set of unknown parameters β. Given data, the parameters β can be estimated via ordinary least squares (OLS). This amounts to solving the minimisation problem
$$\min_{\beta} \sum_{i=1}^{n} \left(y_i - \mu(x_i, \beta)\right)^2$$
where n is the number of units in the sample and i represents a unit of the sample. Given a value τ, 0