Do fee descriptors influence treatment choices in general practice? A multilevel discrete choice model



Journal of Health Economics 16 (1997) 323-342

Anthony Scott a,*, Alan Shiell b
a Health Economics Research Unit, Department of Public Health, University Medical Buildings, University of Aberdeen, Aberdeen, AB25 2ZD, UK
b Centre for Health Economics Research and Evaluation, Division of Community Medicine, University of Sydney, Sydney, Australia

Received 1 September 1995; accepted 1 April 1996

Abstract

Before 1990 Australian general practitioners (GPs) were remunerated according to consultation length. This was assumed to encourage GPs to prescribe more, counsel less and provide fewer treatments than were 'appropriate'. In an attempt to change this behaviour, the remuneration system was altered to reflect the content of consultations. This paper analyses, through the use of multilevel modelling, the effect of content-based descriptors on the discrete choice behaviour of GPs while controlling for patient, GP and practice characteristics. GPs who used content-based descriptors were just as likely to prescribe, counsel and treat as GPs who used time-based descriptors. © 1997 Elsevier Science B.V. All rights reserved.

JEL classification: C25; I10; J33

Keywords: Remuneration; General practice; Discrete choice; Multilevel modelling

* Corresponding author. Tel.: 0224 681 818, ext. 53866; fax: 0224 662 994; e-mail: [email protected].
0167-6296/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved. PII S0167-6296(96)00520-6


A. Scott, A. Shiell / Journal of Health Economics 16 (1997) 323-342

1. Introduction

Research on the effects of remuneration on general practitioner (GP) behaviour remains scarce (Rosen, 1989; Kristiansen and Mooney, 1993; Scott and Hall, 1995). Understanding how the financing and organisation of general practice influence GP behaviour is crucial if incentives provided are to encourage efficient medical practice and a more complete principal-agent relationship (Rosen, 1989; Mooney, 1992). Previous studies have examined the effects of, first, changes in the level of remuneration (Hadley et al., 1979; Rice, 1983; Barer et al., 1988; Mitchell et al., 1989; Labelle et al., 1990; Hughes and Yule, 1992); second, special payments and financial inducements (Hillman et al., 1989; Hemenway et al., 1990); and third, different remuneration systems (Hickson et al., 1987; Krasnik et al., 1990; Kristiansen and Hjortdahl, 1992; Kristiansen and Holtedahl, 1993; Kristiansen and Mooney, 1993). One aspect of fee-for-service (FFS) remuneration that has not been considered in the literature, however, is the effect of fee descriptors on GP behaviour. Fee descriptors describe the particular service for which the GP is reimbursed. In essence they represent a contract between the payer (who is usually a third party) and the physician that specifies the service to be provided and its associated level of remuneration. Ideally, the contract should be specified so that the physician is reimbursed on the basis of performance in relation to the patient's welfare (more specifically, that part of the patient's utility function that is influenced by health care). In health care, however, the specification of such a contract is problematical for several reasons. First, the contents of the patient's utility function are unlikely to be fully known to the physician or third-party payer. It is therefore difficult to specify the contract to maximise the patient's utility (Mooney and Ryan, 1993).
Second, even if the physician knew the contents of the patient's utility function, there would still be measurement problems for the purposes of remuneration. Third, the effects of the physician's actions on patient welfare are characterised by risk and uncertainty. Specifying a contract which accounts for all possible contingencies assumes that knowledge is available on all possible states of the world and their associated probabilities. This is not usually the case. Fourth, patient welfare may be influenced by factors other than the actions of the physician, such as the quality of housing and/or environmental factors. Thus it is difficult to attribute any gains (or losses) in welfare to the actions of the physician, and therefore difficult to remunerate the physician on the basis of outcome. Given these difficulties, fee schedules have been specified in terms of the volume of services provided rather than by improvements in patient welfare. As a type of contract, fee descriptors can therefore be an important aspect of FFS remuneration. If the fee descriptor is inadequately defined then a physician who, for example, wishes to minimise time input, may provide a service which in


Table 1
Time-based and content-based fee descriptors a

Time-based descriptors
  Brief consultation: not more than 5 minutes' duration
  Standard consultation: more than 5 minutes' duration but not more than 25 minutes' duration
  Long consultation: more than 25 minutes' duration but not more than 45 minutes' duration
  Prolonged consultation: more than 45 minutes' duration

Content-based descriptors b
  Level A: professional attendance for an obvious problem characterised by the straightforward nature of the task that requires a short patient history and, if required, limited examination and management
  Level B: professional attendance involving taking a selective history, examination of the patient with implementation of a management plan in relation to one or more problems, or a professional attendance of less than 20 minutes' duration involving components of an attendance of the type otherwise covered by levels C and D
  Level C: professional attendance involving taking a detailed history, an examination of multiple systems, arranging any necessary investigations and implementing a management plan in relation to one or more problems, and lasting at least 20 minutes, or a professional attendance of less than 40 minutes' duration involving components of an attendance of the type otherwise covered by level D
  Level D: professional attendance involving taking a detailed history, an examination of multiple systems, arranging any necessary investigations and implementing a management plan in relation to one or more complex problems, and lasting at least 40 minutes, or a professional attendance of at least 40 minutes' duration for implementation of a management plan

a Descriptors refer to a surgery consultation. The same descriptors were used for home visits and consultations at institutions (e.g. nursing homes).
b "Professional attendances by vocationally registered general practitioners cover consultations during which the general practitioner evaluates the patient's problem (which may include certain health screening services) and formulates a management plan, in relation to one or more conditions present in the patient. The service also includes advice to the patient and/or relatives and the recording of appropriate detail of the particular services."
Source: Doessel (1990).

some way falls short of that intended by the descriptor and by its associated level of remuneration. In Australia, all physicians are paid on a fee-for-service basis according to the Medicare Fee Schedule which is in turn funded out of public expenditure. GPs are paid for each consultation and are reimbursed by sending their bill to either the patient, who can claim back 85% of the schedule fee from Medicare, or to Medicare (bulk-billing), who reimburse 85% of the schedule fee as full payment. If a GP bulk-bills then the patient faces a zero copayment. Around 75% of GP services in Australia are bulk-billed. Prior to 1990 fee descriptors for GP consultations were based solely on the time it took to conduct a consultation (see Table 1). Fees could be claimed for four levels of consultation: under 5 minutes; between 5 and 25 minutes; between 25 and 45 minutes; and over 45 minutes. Consultations normally lasted between 5 and


25 minutes. The descriptors were alleged to create incentives for 'six minute medicine' (Dickinson and Doessel, 1990). It was suggested that GPs could reduce consultation time to 6 minutes to secure the same level of remuneration as a 25 minute consultation but with minimum effort. If there were excess demand for GP services, then the throughput of patients could be increased in a given period of time, thus increasing total net income. 'Six minute medicine' was assumed to encourage high rates of referral and prescribing, and discourage counselling and the provision of treatments (Doessel, 1990). For example,

    There is anecdotal evidence that the time-tiered items have induced medical practices to adopt 'prescription pad' and 'referral' type medical practices to bring consultations to a conclusion and start the next patient consultation (Doessel, 1990).

As a response to these concerns, vocational registration (VR) was introduced in 1989. This was a voluntary scheme and involved first altering the fee descriptors, and second requiring GPs to participate in continuing medical education (CME) and quality assurance (QA). GPs who chose to become vocationally registered were required to use content-based descriptors, where the descriptions of consultations were based more on their content than on their length (see Table 1). Content-based fee descriptors were claimed to be specified more in terms of process and outputs rather than of inputs (Doessel, 1990):

    It is more reasonable to value a medical service by what is done for the patient, rather than how long the service has taken (Tulloch, 1990).

This was assumed to alter the incentives generated by time-based descriptors, thus increasing the likelihood that counselling and treatments would be provided, and reducing the likelihood of referral and prescribing. Fee descriptors were changed first, and then the CME and QA schemes were set up.
The data used in this study were collected at the same time as the CME and QA were being set up. That is, not all vocationally registered GPs were participating in these programmes. These changes provided an opportunity to compare vocationally registered GPs with non-vocationally registered GPs. In particular, given the fact that not all vocationally registered GPs were participating in CME and QA, it was possible to compare the behaviour of GPs who used content-based descriptors and GPs who used time-based descriptors. The main aim of this paper is to test the hypotheses that, ceteris paribus, GPs who used content-based descriptors (i.e. who were vocationally registered) were more likely to counsel and provide treatments and less likely to prescribe compared to GPs who used time-based descriptors. To this end, a model of the effect of fee descriptors on GPs' discrete choice behaviour is estimated while controlling for supply and demand-side characteristics.
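The 'six minute medicine' incentive attributed to the time-based descriptors can be illustrated with a short calculation. The sketch below classifies a consultation by the time bands in Table 1 and compares gross fee income per hour for 6 minute and 25 minute consultations. The fee of 20.0 and the function names are hypothetical and are used only for illustration; back-to-back consultations (i.e. excess demand) are assumed.

```python
def time_based_level(minutes: float) -> str:
    """Classify a consultation using the time-based descriptors in Table 1."""
    if minutes <= 0:
        raise ValueError("duration must be positive")
    if minutes <= 5:
        return "Brief"
    if minutes <= 25:
        return "Standard"
    if minutes <= 45:
        return "Long"
    return "Prolonged"


def fees_per_hour(minutes: float, fee: float) -> float:
    """Gross fee income per hour if every consultation lasts `minutes`
    and attracts `fee` (assumes consultations follow one another directly)."""
    return fee * 60.0 / minutes


# Hypothetical fee for a Standard consultation (not taken from the fee schedule).
STANDARD_FEE = 20.0

for duration in (6, 25):
    level = time_based_level(duration)
    print(duration, level, fees_per_hour(duration, STANDARD_FEE))
```

Both durations fall in the same band and so attract the same fee, which is why shortening the consultation raises income per hour under these assumptions.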


2. Theoretical model

The economic theory of agency applied to health care suggests that the utility functions of the physician and patient are not independent as in the conventional principal-agent relationship (Evans, 1984; Mooney and Ryan, 1993). This is due to the physician's ethical interest in 'doing his or her best for the patient', usually defined in terms of health status. Thus, the health status of the patient ($HS_p$) is assumed to be an argument in the physician's utility function ($U_s$), along with income ($Y$) and other arguments ($X$), such as leisure, professional status and intellectual satisfaction (Kristiansen, 1993). The GP's utility function can therefore be seen as

$$U_s = U(HS_p, Y, X). \tag{1}$$

Choices during consultations are made on behalf of the patient and maximise the physician's utility (including the physician's perception of the patient's utility) subject to various resource constraints, including physician time. Thus, choices are influenced by both demand and supply-side factors. Treatment choices are discrete in nature and for the purposes of this paper, the choice set facing the GP is assumed to be binary. GPs are assumed to weigh up the expected utilities of, for example, prescribing or not prescribing. The latter includes other elements in the choice set (e.g. referring to a specialist, recommending a follow-up visit, diagnostic testing, providing (non-drug) treatments, counselling, providing reassurance, or doing nothing). The discrete choice behaviour of individuals has been modelled in econometrics using the logit model (Pudney, 1989; Cramer, 1991). This model applies the theory of probabilistic choice to the theory of consumer behaviour. The utility associated with each mutually exclusive alternative in the choice set is assumed to vary depending on the tastes and preferences of the individual making the choice and on the utility-bearing characteristics of each alternative. Thus, the explanatory variables represent the contents of the physician's utility function and his/her tastes and preferences. Discrete choice is based on the maximisation of a random utility function, in which random errors represent the indeterminacy of individual behaviour, and various factors that determine the nature of preferences but are not known to the outside observer (McFadden, 1974; Pudney, 1989; Cramer, 1991). In the binary choice logit model, the probability that a specific decision is made in the $i$th consultation, $P(Y_i = 1) = \pi_i$, is as follows:

$$\pi_i = \frac{\exp(\alpha + \beta X)}{1 + \exp(\alpha + \beta X)}. \tag{2}$$

Rearranging the equation to make it linear in parameters we have

$$\ln\left[\frac{\pi_i}{1 - \pi_i}\right] = \alpha + \beta X \tag{3}$$


where $\alpha$ is a constant, $\beta$ is a vector of the coefficients to be estimated and $X$ is a vector of the utility-bearing attributes of each alternative and the tastes and preferences representing the contents of the physician's utility function.
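The relationship between Eqs. (2) and (3) can be checked numerically. The following is a minimal sketch, not part of the original analysis: the intercept, coefficients and covariate values are hypothetical and chosen only so that the linear predictor equals one.

```python
import math
from typing import Sequence


def logit_probability(alpha: float, beta: Sequence[float], x: Sequence[float]) -> float:
    """Eq. (2): pi = exp(alpha + beta'x) / (1 + exp(alpha + beta'x))."""
    eta = alpha + sum(b * v for b, v in zip(beta, x))
    # Algebraically identical to exp(eta) / (1 + exp(eta)).
    return 1.0 / (1.0 + math.exp(-eta))


def log_odds(pi: float) -> float:
    """Eq. (3): ln[pi / (1 - pi)], which recovers the linear predictor."""
    return math.log(pi / (1.0 - pi))


# Hypothetical values (not estimates from this paper).
alpha = 0.5
beta = [0.03, -0.4]   # e.g. a coefficient on age and one on a dummy variable
x = [30.0, 1.0]       # e.g. age 30, dummy equal to 1

pi = logit_probability(alpha, beta, x)
print(pi, log_odds(pi))
```

Applying `log_odds` to the probability returns the linear predictor $\alpha + \beta X$, which is the sense in which Eq. (3) is linear in parameters.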

3. Methods

3.1. Multilevel analysis

Multilevel analysis is a relatively new technique and has been developed and used extensively in education research (Goldstein, 1995). It is a technique used to analyse data with a hierarchical structure, that is, data where observations are nested or clustered in groups with particular characteristics. For example, data on general practice consultations may be nested within GPs where each GP has contributed more than one consultation to the data set. GPs may in turn be nested in local areas. In this case there are three levels: the consultation, the GP and the local area. In cross-sectional data sets, clustering of observations may occur because of the type of sampling used to collect primary data (e.g. multi-stage or stratified random sampling) or because data on individuals have been linked to those from other secondary sources, for example, on the characteristics of the local area. The clustering of observations has implications for the nature of the regression analysis. In hierarchical data sets observations are not independent of one another. Thus, data on consultations within each GP and on consultations and GPs within each type of local area might be correlated. The use of standard regression techniques can produce small standard errors which overestimate the statistical significance of explanatory variables (Goldstein, 1995; Woodhouse, 1995). Multilevel analysis overcomes this problem by analysing variation that occurs at the two higher levels (i.e. variation amongst GPs and amongst types of local area) separately from variation in decision making occurring at the level of the consultation. Both the intercept and the regression coefficients from the fixed part of the model can be included in the random part of the model and allowed to vary across GPs and across different types of local area. The concept of multilevel analysis can be understood by examining the variation in the intercept of a regression model.
This is usually handled by including a dummy variable in the model. However, if the intercept varies across more than two categories, then more dummy variables need to be included, thus reducing degrees of freedom. This soon becomes intractable when the intercept varies across many categories. Similarly, variation in the slope of a coefficient, if it varies across two or three categories, is easily handled by including an interaction term. Analysing variation in the slope across many categories is, however, problematical. Multilevel analysis overcomes these problems by enabling both the intercept and the slope coefficients to be included in the random


part of the model, thereby allowing them to vary randomly across many categories. For example, a simple two-level logit model, where the intercept is allowed to vary randomly at level 2, is

$$\ln\left[\frac{\pi_{ij}}{1 - \pi_{ij}}\right] = \alpha + \beta x_{ij} + \lambda v_j + \varepsilon_{ij} + \mu_j \tag{4}$$

where $\pi_{ij}$ is the probability that a decision occurs in the $i$th consultation of the $j$th GP. The fixed part of the model is $\alpha + \beta x_{ij} + \lambda v_j$, where $\alpha$ is the intercept for the model as a whole, $\beta x_{ij}$ is the vector of coefficients ($\beta$) and variables ($x$) measured at level 1 (e.g. characteristics of the patient in the consultation), and $\lambda v_j$ is the vector of coefficients ($\lambda$) and variables ($v$) measured at level 2 (e.g. characteristics of GPs). The random part of the model is $\varepsilon_{ij} + \mu_j$, where $\varepsilon_{ij}$ is the usual level-1 random error term and $\mu_j$ is the level-2 random term. The coefficient of $\mu_j$ measures the random variation of the intercept, $\alpha$, amongst GPs. This represents the departure of the $j$th GP's intercept from the intercept estimated for all GPs, $\alpha$, and estimates the variation in decision making amongst GPs. The distributional assumptions are that $\varepsilon_{ij}$ has a binomial distribution with a mean of zero and a variance $\sigma^2(\varepsilon_{ij})$ equal to $\pi_{ij}(1 - \pi_{ij})$, and $\mu_j$ is normally distributed with a mean of zero and variance $\sigma^2(\mu_j)$, independent of $j$ (Prosser et al., 1991). Both of these assumptions can be tested. The simple two-level model with a random intercept (Eq. (4)) is algebraically similar to the random effects panel data model (or error components model). The multilevel model with random coefficients is also similar to the random coefficient panel data model (Hsaio, 1986; Chow, 1984; Chamberlain, 1984; Greene, 1990; Madalla, 1987). In panel data models, level-2 units are defined in terms of time, whereas multilevel models have been developed specifically to analyse cross-sectional data. Generally, most random effects panel data models can be regarded as special cases of the multilevel model with random intercept and coefficients (Dalton, 1993).
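To make the structure of the two-level random-intercept model concrete, the sketch below simulates binary decisions for consultations nested within GPs. It is an illustration under stated assumptions rather than the estimation procedure used in the paper: all parameter values, the covariate ranges and the function name are hypothetical, the level-2 term is drawn from a normal distribution, and the level-1 binomial variation enters through the Bernoulli draw for each consultation.

```python
import math
import random


def simulate_two_level(n_gps, n_per_gp, alpha, beta, lam, sigma_mu, seed=0):
    """Simulate decisions from a two-level random-intercept logit model.

    Returns a list of (gp, x_ij, v_j, pi_ij, y_ij) tuples.
    """
    rng = random.Random(seed)
    rows = []
    for j in range(n_gps):
        v_j = rng.choice([0, 1])         # level-2 covariate (e.g. a GP dummy)
        mu_j = rng.gauss(0.0, sigma_mu)  # level-2 random intercept term
        for _ in range(n_per_gp):
            x_ij = rng.uniform(0.0, 80.0)  # level-1 covariate (e.g. patient age)
            eta = alpha + beta * x_ij + lam * v_j + mu_j
            pi_ij = 1.0 / (1.0 + math.exp(-eta))
            # Level-1 (binomial) variation enters through this Bernoulli draw.
            y_ij = 1 if rng.random() < pi_ij else 0
            rows.append((j, x_ij, v_j, pi_ij, y_ij))
    return rows


# Hypothetical parameter values, chosen only for illustration.
rows = simulate_two_level(n_gps=50, n_per_gp=10, alpha=0.5, beta=0.02,
                          lam=0.0, sigma_mu=0.8, seed=1)

# Share of consultations with y = 1 for each simulated GP.
by_gp = {}
for j, _, _, _, y in rows:
    by_gp.setdefault(j, []).append(y)
rates = [sum(ys) / len(ys) for ys in by_gp.values()]
print(min(rates), max(rates))
```

Setting `sigma_mu` to zero removes the GP-level term, so any remaining differences between GPs' rates reflect only their covariates and level-1 sampling variation.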
The main advantage of multilevel modelling over the random intercepts/coefficients panel data approach is that the former allows intercepts and coefficients to vary randomly, not just at level 2, but also at higher levels of aggregation that often occur in large cross-sectional data sets with complex hierarchical structures. It is possible to test whether intercepts and/or coefficients are random at these higher levels of aggregation. Furthermore, the estimation of these models, in particular models with discrete dependent variables and complex random effects, is computationally easier through the availability of specialist software (Kreft et al., 1994). Multilevel analysis also overcomes problems related to aggregation. It is often necessary to decide at what level of aggregation any analysis should be undertaken. Analyses of GP behaviour have, in the past, been conducted separately at each level. Some studies have been conducted using local area data, some with GP or practice-level data and some using data collected at the level of the consultation. One problem with using aggregate data is 'aggregation bias' or ecological fallacy, where incorrect inferences are made about individual behaviour from aggregate data (Robinson, 1950). Analyses of GP behaviour using aggregate data have also been unable to collect data on important confounding variables, such as the health status of patients and GP characteristics. This has encouraged the use of often poor proxy variables and has added to the difficulty in interpreting results. Multilevel modelling enables data measured at different levels of aggregation to be analysed together and therefore provides much more flexibility when analysing the behaviour of individuals.
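The ecological fallacy can be demonstrated with a small constructed example. The data below are entirely artificial (three hypothetical areas with three observations each) and were chosen so that the association within every area is negative while the association between area means is positive; they are not drawn from the AMTS.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Artificial (x, y) pairs for three hypothetical areas.
areas = {
    "area_1": [(1, 5), (2, 4), (3, 3)],
    "area_2": [(4, 8), (5, 7), (6, 6)],
    "area_3": [(7, 11), (8, 10), (9, 9)],
}

# Correlation within each area (individual-level relationship).
within = {name: pearson_r([x for x, _ in pts], [y for _, y in pts])
          for name, pts in areas.items()}

# Correlation between area means (what aggregate data would show).
mean_x = [sum(x for x, _ in pts) / len(pts) for pts in areas.values()]
mean_y = [sum(y for _, y in pts) / len(pts) for pts in areas.values()]
between = pearson_r(mean_x, mean_y)

print(within, between)
```

An analysis restricted to the area means would suggest a positive relationship even though the relationship is negative within every area, which is the kind of incorrect inference that analysing the levels together is intended to avoid.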

3.2. Data

Data were taken from the Australian Morbidity and Treatment Survey 1990-1991 (AMTS). This was a one year Australia-wide survey of morbidity and treatments in a stratified (by state) random sample of 495 general practitioners. GPs recorded information about all surgery and home consultations for a total of 954 GP-weeks. Information was provided about the decisions made during the consultation, the characteristics of the patient, and other details relating to the medical condition. Recording was usually for two periods of one week, six months apart. Information on sampling, recruitment and descriptive results can be found elsewhere (Bridges-Webb et al., 1992). The national study has data on 98796 physician-patient encounters. A practice profile, giving details of their own characteristics and those of their practice, was also completed by 95% of participating GPs. A subset of these data was created. This included consultations for upper respiratory tract infection (URTI; n = 3348), and sprain/strain (n = 837). A total of 4185 consultations were therefore included. Analysing the data by medical condition controls for variation in decision making across medical conditions. Some authors have suggested that the effect of financial incentives may vary depending upon the presenting medical condition (Hurley and Labelle, 1994). In turn, this might be related to the degree of uncertainty about the treatment of the condition and hence the amount of discretion available to the GP. For those conditions where there is a high degree of uncertainty (and therefore discretion) there will be more random variation in decision making and it would be expected that decisions are more likely to be influenced by non-clinical factors (e.g. the remuneration system). URTI and sprain/strain were chosen because they were the most commonly occurring conditions in the data set and because there was sufficient variation in the dependent variables (Scott et al., 1993).
The consultations included in the data set were those where the medical conditions were new to the patient and where they were the only condition treated in that consultation. Selecting a condition that is new to the patient means that the data used are at the beginning of the episode of care rather than at different stages. This thereby


Table 2
Dependent variables

                       Prescribe   Treat   Counsel
Number of decisions    2721        454     923
Proportion             0.65        0.11    0.22

avoids problems associated with the fact that decisions may depend on the stage of treatment or on decisions made in previous consultations. The requirement that the condition be the only condition that was treated in the consultation ensures that the choice is correctly attributed to the condition of interest. Thus the nature of the choice process is more homogeneous for all observations. The data set has a hierarchical/multilevel structure and is therefore amenable to analysis using multilevel modelling. There are 4185 consultations nested within 412 GPs, who are in turn nested within 25 types of local area. 1

3.3. Dependent variables

Three decision-specific empirical models were estimated. The descriptive statistics of the dependent variables are shown in Table 2. For the decision to prescribe, GPs gave details of any prescriptions given to the patient for each problem managed. This variable was coded as 1 if at least one prescription was given and 0 otherwise. Similarly, if the GP provided a treatment or counselling, he or she was asked to record details of these actions for each problem managed. The administration of therapeutic treatment and psychological counselling were coded using the International Classification of Process in Primary Care (Bridges-Webb et al., 1992). For the decision to treat, the dependent variable was coded as 1 if at least one treatment was provided, and as 0 if no treatment was provided. For the decision to counsel, the dependent variable was coded as 1 if counselling was provided, and 0 if no counselling was provided. The coding of data was strictly monitored and inter/intra-coder reliability was shown to be very high (Bridges-Webb et al., 1992).
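The coding rule for the three dependent variables, and the proportions reported in Table 2, can be sketched as follows. The consultation records and their field names are hypothetical and serve only to illustrate the rule "1 if at least one item was recorded, 0 otherwise"; the counts and the total of 4185 consultations are those reported in the text and in Table 2.

```python
def code_decision(items):
    """Return 1 if at least one item was recorded for the problem, else 0."""
    return 1 if len(items) > 0 else 0


# Hypothetical consultation records (field names and entries are illustrative).
consultations = [
    {"prescribe": ["drug A"], "treat": [], "counsel": []},
    {"prescribe": [], "treat": ["strapping"], "counsel": ["advice"]},
    {"prescribe": [], "treat": [], "counsel": []},
]
coded = [{k: code_decision(v) for k, v in c.items()} for c in consultations]

# Proportions in Table 2 follow from the counts and the 4185 consultations.
N_CONSULTATIONS = 4185
counts = {"prescribe": 2721, "treat": 454, "counsel": 923}
proportions = {k: round(n / N_CONSULTATIONS, 2) for k, n in counts.items()}
print(coded)
print(proportions)
```

Because each decision is coded separately, the three proportions need not sum to one: a single consultation can involve more than one of the decisions, or none of them.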

3.4. Explanatory variables

The main explanatory variables are shown in Table 3. Patient age, gender and status to the practice (i.e. whether the patient was new to the practice or not) are proxies for the existence of the patient's welfare in the GP's utility function (or

1 For the purposes of the multilevel analysis, local areas were classified into 25 groups according to the extent of competition (measured by GP density). GPs who practised in local areas characterised by the same degree of competition were therefore assumed to have a similar practice style.


Table 3
Explanatory variable definitions

Explanatory variable             Coding a                                   Mean/proportion
Patient age                      Continuous                                 24.3
Patient sex                      Female (1); Male (0)                       0.53
Status of patient to practice    New to practice (1); Seen previously (0)   0.11
Medical condition                Sprain/strain (1); URTI (0)                0.20
GP age                           < 35 years old (0)                         0.15
                                 35-54 years old (1)                        0.71
                                 55+ years old (1)                          0.15
GP sex                           Male (1); Female (0)                       0.90
Number of other doctors          Continuous                                 2.16
  in the practice
Place of graduation              Other (1); Australia (0)                   0.23
Years in general practice        > 10 years (1); 10 years or fewer (0)      0.67
Postgraduate qualifications      Has postgraduate qualification (1);        0.56
                                 Has no postgraduate qualifications (0)
Vocational registration          Yes (1); No (0)                            0.61
GP density (full-time            Continuous                                 8.59
  equivalent GPs per 10000
  population)

a Bivariate cross tabulations of each explanatory variable and each dependent variable were used to code categorical and dummy variables. These codings are the same across all three models.

more precisely, the existence of the GP's perception of the patient's welfare which may include only health status). A dummy variable was also included for the medical condition (either URTI or sprain/strain). The relationship with age is expected to be positive although a U-shaped relationship might be expected since severity of URTI and sprain/strain may be higher in children and the elderly. Status of the patient to the practice is defined by whether the patient was new to the practice or had visited before. This reflects the prior knowledge of the GP about the patient's medical history, social and economic circumstances (Nazereth and King, 1993). The direction of the relationship for both gender and status to the practice is, however, difficult to predict. The influence of fee descriptors is captured by whether the GP was vocationally registered or not (i.e. used content-based or time-based descriptors). GP density was used as a proxy for the extent of excess demand which may interact with VR status. GPs faced with excess demand and who use time-based descriptors (i.e. are not vocationally registered) may prescribe more and treat and counsel less, compared to GPs who are in areas of excess supply and who use time-based


descriptors. In other words, GPs with a waiting list and who engage in six minute medicine can increase throughput and thus generate more income per unit of time more readily than can similar GPs without a waiting list. Other supply-side factors which may influence treatment choices include the physician's tastes and preferences and arguments in the physician's utility function. These are partly captured by GP and practice characteristics. Although these factors have been shown in the literature to be associated with decision making, there are no specific hypotheses of the direction of their effects (Eisenberg, 1979; Gray, 1982; Rossiter and Wilensky, 1983; Epstein et al., 1984; Waitzkin, 1984; Denig et al., 1988; Alemayehu et al., 1991; Baker and Klein, 1991; Bradley, 1991; Newton et al., 1991; Greenfield et al., 1992; Kristiansen and Hjortdahl, 1992; Kristiansen and Holtedahl, 1993). They are therefore included as potential confounding variables.

3.5. Modelling strategy

MLn V1.0 software was used (Woodhouse, 1995). MLn can use several methods to estimate the models' parameters. All are based on generalised linear modelling. In this analysis the iterative generalised least squares (IGLS) algorithm was used in conjunction with a second-order predictive quasi-likelihood (PQL) procedure (for further details see Goldstein, 1991, 1995; Goldstein and Rasbash, 1995; Rodriguez and Goldman, 1995). A simple two-level model was estimated initially. More complex patterns of variation were explored for each model. This included testing for random variation between types of local area (i.e. allowing the intercept to vary randomly at level 3) and random variation of the coefficients of explanatory variables at levels 2 and 3. The distributional assumptions of normality at level 2 and binomial variation at level 1 were also tested. The former are tested by examining a normal plot of standardised level-2 residuals, $u_j$. The assumption of a binomial level-1 error term can be more formally examined by testing for the existence of extra-binomial variation (Williams, 1982; Wright, 1995). This involves testing the null hypothesis that $\sigma^2(\varepsilon_{ij}) = 1$ (binomial variation) against a two-sided alternative that $\sigma^2(\varepsilon_{ij}) \neq 1$ (extra-binomial variation). If $\sigma^2(\varepsilon_{ij})$ is significantly different from one, then this suggests that there is some unmeasured source of heterogeneity at level 1 (across consultations). In other words, the model may be misspecified. This may be due to the omission of explanatory variables at level 1, a 'sparse' data structure (that is where there are relatively few consultations per GP: Wright, 1995), or the omission of a level. The inclusion of extra-binomial variation in the model controls for such heterogeneity when evaluating the fixed parameters (Prosser et al., 1991). Since all main explanatory variables were hypothesised to influence choice, all were included in the initial multilevel logit models.
The interaction term between GP density and vocational registration status, squared terms for continuous


variables, and more complex random effects were then added to each main effects model one at a time. The statistical significance of higher order terms and random effects was assessed using the Wald test. They remained in the model if the Wald test was significant at the 10% level, otherwise they were removed. In logistic models, goodness of fit is usually assessed using the likelihood ratio test, which tests the null hypothesis that all elements of the vector of coefficients ($\beta$) are equal to zero. In addition to this, a 'pseudo' $R^2$ measure is often used. This is equal to the percentage decrease in the log-likelihood of the full model compared to a model with only the constant. However, MLn uses a quasi-likelihood procedure (rather than maximum likelihood) based on a linear approximation (Goldstein, 1995). The consequence of this is that the log-likelihood is often unreliable and may switch from positive to negative. A better measure of goodness of fit is one based on the predictive power of the model and the difference between the observed and fitted values. The Pearson $\chi^2$ test was therefore used (Hosmer and Lemeshow, 1989; Cramer, 1991). The Pearson $\chi^2$ statistic is distributed with $n - (p + 1)$ degrees of freedom. However, this is calculated on the basis of level-1 residuals and may be sensitive to the existence of extra-binomial variation, where this exists.
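The two goodness-of-fit measures discussed above can be written out for a binary outcome as follows. This is a minimal sketch under stated assumptions and does not reproduce MLn's calculations: the function names and the small data set are hypothetical, the 'pseudo' $R^2$ is computed as the proportional decrease in the log-likelihood relative to a constant-only model, and the ratio of the Pearson statistic to its degrees of freedom is offered only as a rough indication of departure from binomial variation.

```python
import math


def pearson_chi2(y, p):
    """Sum of squared Pearson residuals for observed y (0/1) and fitted p."""
    return sum((yi - pi) ** 2 / (pi * (1.0 - pi)) for yi, pi in zip(y, p))


def log_likelihood(y, p):
    """Bernoulli log-likelihood of observed y given fitted probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def pseudo_r2(y, p_full):
    """Proportional decrease in log-likelihood versus a constant-only model."""
    p0 = sum(y) / len(y)
    ll_null = log_likelihood(y, [p0] * len(y))
    return 1.0 - log_likelihood(y, p_full) / ll_null


def pearson_ratio(y, p, n_params):
    """Pearson chi-square divided by n - (p + 1) degrees of freedom."""
    return pearson_chi2(y, p) / (len(y) - (n_params + 1))


# Hypothetical observed decisions and fitted probabilities.
y = [1, 0, 1, 1, 0, 1]
p_hat = [0.8, 0.3, 0.7, 0.9, 0.2, 0.6]
print(pearson_chi2(y, p_hat), pseudo_r2(y, p_hat), pearson_ratio(y, p_hat, 1))
```

A fitted model that predicts no better than the constant gives a pseudo $R^2$ of zero, and fitted probabilities closer to the observed outcomes reduce the Pearson statistic.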

4. Results

Table 4 shows the results for the fixed and random part of each model. The main hypotheses, that the use of content-based descriptors increases the likelihood of counselling and treatments and reduces the likelihood of prescribing, are not supported. All t-ratios for the VR variable are insignificant (p > 0.10). Since neither VR nor GP density variables had large t-values, the interaction term between them was not included in any model. The most important factors influencing decisions in consultations for URTI and sprain/strain are the age and gender of patients, suggesting that patient characteristics are the most influential in the GP's utility function. Older patients were more likely to be prescribed medication and less likely to be counselled. Females were more likely to be counselled than males, but less likely to receive a therapeutic treatment. Patients were more likely to receive a treatment if they presented with a sprain or strain than if they had URTI. However, patients with URTI were more likely to receive a prescription compared to those with a sprain or strain. Some GP characteristics were also associated with decisions but their statistical significance varied across models. GPs aged over 55 years old were less likely to counsel their patients and more likely to prescribe medication than GPs aged under 35 years old. Compared to male GPs, female GPs were less likely to provide a treatment. GPs in practices with a low number of partners were less likely to prescribe relative to GPs in practices with more partners. GPs who held postgraduate qualifications were more likely to counsel relative to GPs with no such

Table 4
Regression results (a)

Explanatory variables               Prescribe              Treat                  Counsel

Fixed effects
Constant                            0.981 (2.98) ***       -2.885 (-7.04) ***     -2.161 (-4.71) ***
Patient age                         0.027 (9.18) ***       -0.001 (-0.33)         -0.018 (-7.23) ***
Patient sex                         -0.107 (-1.37)         -0.437 (-2.99) ***     0.174 (1.99) **
Status to practice                  0.090 (0.70)           0.121 (0.63)           -0.129 (-0.89)
Medical condition                   -1.810 (-12.90) ***    3.415 (23.73) ***      0.109 (0.91)
GP age (35-54 years)                -0.162 (-0.56)         -0.201 (-0.49)         -0.221 (-0.60)
GP age (55+ years)                  0.717 (1.90) *         -0.083 (-0.18)         -1.361 (-2.42) **
GP sex                              0.024 (0.10)           -0.620 (-2.10) **      0.270 (0.79)
Number of other doctors             -0.054 (-1.69) *       -0.029 (-0.72)         0.030 (0.67)
Place of graduation                 0.270 (1.32)           -0.089 (-0.35)         0.297 (1.01)
Years in general practice           0.327 (1.51)           0.074 (0.26)           -0.181 (-0.57)
Postgraduate qualifications         -0.184 (-1.09)         -0.055 (-0.26)         0.622 (2.49) **
VR                                  -0.186 (-1.26)         -0.130 (-0.66)         0.134 (0.69)
GP density                          -0.011 (-1.68) *       0.002 (0.30)           0.009 (1.16)

Random effects
Level 1
σ²(e_ij)                            0.810 (40.85) ***      0.733 (42.52) ***      0.791 (43.61) ***
Level 2
σ²(μ_j)                             2.34 (7.84) ***        2.044 (6.02) ***       3.374 (9.26) ***
σ²(patient age)                     0.0007 (3.77) ***      -                      -
σ²(medical condition)               2.470 (5.09) ***       -                      -
σ(μ_j, patient age)                 -0.008 (-1.31)         -                      -
σ(μ_j, medical condition)           -1.636 (-5.35) ***     -                      -
σ(patient age, medical condition)   0.001 (0.16)           -                      -
σ²(patient sex)                     -                      1.336 (2.79) ***       -
σ(μ_j, patient sex)                 -                      -0.578 (-1.80) *       -

Pearson χ² (d.f.)                   4234 (4170)            4287 (4170) *          4203 (4170)

(a) Table shows regression coefficients with t-ratios in parentheses. * = 0.05 < p ≤ 0.1; ** = 0.01 < p < 0.05; *** = p < 0.01.

qualifications. Finally, GPs located in areas of low GP density were less likely to prescribe.

The bottom half of Table 4 shows the results for the random part of each model. For all three models, the intercept was allowed to vary randomly at levels 2 (GP) and 3 (area). However, no random variation in decision making across different types of local area was found and so the intercept was removed from the third level of the model. This indicates that GPs had similar 'average' levels of prescribing, counselling and treating across local areas. The intercept was found to vary at level 2.

In Table 4, σ²(e_ij) is the variance of the level-1 error term. This shows random variation in decision making across consultations. Each model allowed for extra-binomial variation, which was shown to be present since σ²(e_ij) is less than one for all models. Although this suggests some misspecification, the inclusion of extra-binomial variation in the model controls for this. Misspecification would have arisen if the level-1 variation was constrained to be binomial, when in fact it was extra-binomial.

At level 2 (GP), σ²(μ_j) is the variance of the level-2 error term. This measures the random variation of the intercept amongst GPs and therefore shows the extent of random variation in decision making across GPs. For all three models, σ²(μ_j) is significant and indicates the extent to which the jth GP's intercept is different from the intercept estimated for all GPs (i.e. the constant in the fixed part of the model).

More complex patterns of variation were found for the decisions to prescribe and treat. The coefficients of all level-1 variables were included, one at a time, in the random part of each model. Each of these random variables was retained in the model if the value of the t-ratio was significant at the 10% level; otherwise the variable was removed from the model. For the decision to prescribe, the coefficients of 'patient age' and 'medical condition' were found to vary randomly across GPs at level 2. This is shown by the parameters for σ²(patient age) and σ²(medical condition). These variances show the extent to which the coefficient of each variable varies across GPs. Thus, controlling for patient and GP characteristics, the effect of the patients' age on the decision to prescribe varies amongst GPs. Similarly, the effect of the medical condition on the decision to prescribe varies amongst GPs. As well as estimating the variances of these variables, MLn also estimates their covariances with the other variables in the random part of the model. The significant value of σ(μ_j, medical condition), the covariance between σ²(μ_j) and σ²(medical condition), indicates that the effect of 'medical condition' on the decision to prescribe was greater for those GPs who prescribed less (that is, those GPs who had a lower intercept). In other words, those GPs with a low level of prescribing are more likely to consider the medical condition when deciding to prescribe or not, compared to GPs with a higher level of prescribing. The other covariances were not significant.

For the decision to treat, the coefficient of patient sex was found to vary significantly at level 2, where σ²(patient sex) is the variance of the coefficient of patient sex across GPs. This indicates that the effect of patient sex on the decision to treat varies amongst GPs. The covariance between σ²(μ_j) and σ²(patient sex), σ(μ_j, patient sex), is negative and indicates that the effect (i.e. slope) of patient sex on the decision to treat is greater for those GPs who provided fewer treatments (i.e. have a lower intercept).
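The random-part estimates can be read more easily with two small calculations, sketched here in Python under stated assumptions. The first recovers a standard error from each reported level-1 variance and its t-ratio (assuming the t-ratio is the estimate divided by its standard error) and forms a Wald-type z statistic for the null hypothesis σ²(e_ij) = 1. The second converts an intercept-slope covariance into a correlation. The figures are those reported in Table 4; the function names are hypothetical, and pairing the level-2 intercept variances 2.34 and 2.044 with the prescribe and treat models respectively is an assumption about the table layout.

```python
import math


def wald_z_against_one(estimate, t_ratio):
    """Wald-type z statistic for H0: variance = 1.

    Assumes the reported t-ratio equals estimate / SE, so SE = estimate / t.
    """
    se = estimate / t_ratio
    return (estimate - 1.0) / se


def random_effect_correlation(cov, var_a, var_b):
    """Correlation implied by a covariance and the two variances."""
    return cov / math.sqrt(var_a * var_b)


# Level-1 variance estimates and t-ratios reported in Table 4 (one per model).
level1 = [(0.810, 40.85), (0.733, 42.52), (0.791, 43.61)]
z_values = [wald_z_against_one(est, t) for est, t in level1]

# Intercept-slope correlations (assumed pairing of variances with models).
r_prescribe = random_effect_correlation(-1.636, 2.34, 2.470)  # medical condition
r_treat = random_effect_correlation(-0.578, 2.044, 1.336)     # patient sex

print([round(z, 2) for z in z_values])
print(round(r_prescribe, 3), round(r_treat, 3))
```

On these assumptions each level-1 variance lies well below one, and both intercept-slope correlations are negative, in line with the interpretation given above that GPs with lower intercepts show larger slopes.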

5. Discussion

The results suggest that, for the management of URTI and sprain/strain, the use of content-based descriptors has had little effect on discrete choices made by GPs. Patient age and sex were the most important factors influencing decisions, suggesting that patient-related factors were important in GPs' decision making and therefore a dominant factor in the GP's utility function. The results confirm the findings from at least some previous studies that the patient's medical condition and health status (proxied by age and sex) were the most important factors influencing decision making and that the effect of remuneration is small (Kristiansen and Hjortdahl, 1992; Kristiansen and Mooney, 1993).

A possible interpretation of these results is that the wording of the new fee descriptors was such that it made no impact on the behaviour of GPs. The old and new descriptors (Table 1) were similar in scope and GPs may have considered the four new content-based descriptors to be equivalent to the four old time-based descriptors. Vocational registration was introduced with no system of monitoring the use of the new descriptors. It is then perhaps not surprising that the behaviour of GPs did not change. In practice, GPs can interpret fee descriptors however they want. For example, the differences between the new level A and B descriptors (Table 1) include taking a 'short patient history' (A) and taking a 'selective history' (B). However, these are not defined and the GP is left to interpret them as they wish.

Our results do not, however, rule out the possibility that decisions were influenced by fee descriptors during the management of other medical conditions. It is possible that for URTI and sprain/strain there was little opportunity for GPs to engage in 'six minute' medicine. 'Six minute' medicine implies that GPs have control over the length of the consultation and whether certain actions were taken. This might not be the case if the management of URTI and sprain/strain were characterised by consensus.
If there was consensus amongst GPs about when to prescribe, treat and counsel for URTI and for sprain/strain then the introduction of content-based descriptors would have had little impact. The effect of content-based descriptors may be stronger for medical problems where GPs can exercise more discretion. This discretion may come in the form of genuine uncertainty about management or GPs may simply disagree about treatment (Evans, 1990). It is difficult to assess, however, whether consensus existed for the treatment of URTI and sprain/strain. 'Consensus' is difficult to measure and, even if guidelines exist for these conditions, this does not mean that GPs are adhering to them. The results presented here may well not be representative of medical problems generally in Australian general practice.

Several methodological issues deserve further discussion. The first is the use of multilevel modelling. This technique is relatively new and has been used little in applied economics. It is similar to random-effects panel data modelling, but is more flexible in terms of the number of levels and the number of random parameters that can be included in the model at any one time. The use of disaggregated 'micro' data increases the need for multilevel analysis, especially where local area data are linked to individual data. In a preliminary analysis, standard logit analysis was used (Scott and Shiell, 1994). The results of that analysis suggested VR was a significant predictor of the decisions to prescribe and to counsel. However, the results of the multilevel modelling reported here confirm that the earlier result was due to the inadequacy of using non-nested techniques with nested or hierarchical data. Multilevel analysis is therefore important in preventing inappropriate inferences being made when using hierarchical data.

The second issue is the existence of extra-binomial variation. A sparse data structure and/or a missing level are unlikely to have caused this. A more likely reason is the possibility of omitted variables at level 1. This may indeed be the case since the process of a GP consultation is complex and there may be many factors, such as the nature of doctor-patient communication, that could influence decision making. In single-level logit models, Cramer (1991) showed that the omission of any relevant explanatory variables can cause the estimated coefficients to be biased towards zero. Thus, if variables have been omitted and the regression coefficients in the models estimated here are smaller than they should be, then their statistical significance will have been underestimated. Although this reinforces conclusions about variables that are statistically significant, the conclusions that other variables have no significant effect may be incorrect. It is plausible, therefore, that existing explanatory variables may exhibit a stronger association than we have shown here.

Third, the goodness of fit of all models, as indicated by the Pearson χ² statistic, is poor. Given that this statistic is based on the level-1 residuals, which exhibit extra-binomial variation, it is not surprising that the models appear to fit poorly. However, poor goodness of fit is usual for cross-sectional studies of individual decision making where unexplained random variation is to be expected (Cramer, 1991).
Fourth, it is unclear what types of counselling were included in the dependent variable, to counsel (1) or not (0). If counselling was defined narrowly as specific psychological counselling then any differences in GP behaviour due to content-based descriptors may not have been picked up in the dependent variable. The variable definition is, however, more likely to have included counselling in the form of lifestyle advice, which the authors of the survey refer to (Bridges-Webb et al., 1992).

Fifth, the data used in this study were collected when not all vocationally registered GPs (VR GPs) were participating in CME and QA, since these programmes were not yet fully operational. There are no data on the proportion of VR GPs who were participating at this time. If there was a small proportion, as we have assumed, then this study has measured the effect of fee descriptors only. It may be the case that GP behaviour did subsequently change once CME and QA were operational. If there was a large proportion of VR GPs participating, then this strengthens our results in that the use of content-based descriptors and participation in CME/QA had no effect on decision making.

Sixth, vocational registration was voluntary and so GPs who used content-based descriptors represent a self-selected group. Any difference in the behaviour of VR and non-VR GPs might therefore have been explained by characteristics which were not adequately reflected in the variables contained in the model. For example, GPs who already had a firm commitment to professional development and continuing education would be among the first to take up VR status. These GPs may have already been prescribing less and counselling and treating more before VR was introduced. If this was the case then any difference in behaviour may have been due to self-selection. However, no differences in behaviour between VR and non-VR GPs were found and so this scenario was unlikely. This does not, however, rule out selection bias since it may have been the case that VR GPs were prescribing more and counselling and treating less before VR was introduced. VR may have brought their behaviour 'in line' with that of non-VR GPs. A non-significant VR variable may still reflect a real difference in behaviour. It is unlikely, however, that GPs who opted for VR were prescribing more and counselling and treating less than those who did not. As mentioned above, it is more likely that VR GPs already had a firm commitment to professional development and continuing education and therefore were more likely to prescribe less and counsel and treat more before VR was introduced.

In conclusion, the change in fee descriptors that occurred did not influence the decisions to prescribe, treat, or counsel for URTI or sprain/strain. The implication of these results for understanding physician behaviour is that any further changes in fee descriptors need to be more fundamental than the change evaluated here. In changing from the four time-based descriptors to the four content-based descriptors, GPs may have considered them to be equivalent. Furthermore, the lack of monitoring of the use of the new descriptors may also have been a factor in their failure to alter behaviour.
In Australian general practice, reducing reliance on FFS payment and having a mixture of different types of remuneration would most likely reduce the adverse incentives caused by six minute medicine. This has already been suggested but must be accompanied by careful evaluation (National Health Strategy, 1992).
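One way to see why single-level logit analysis can overstate significance with clustered data, as discussed under the first methodological issue, is through the intra-GP correlation implied by the level-2 variance. The Python sketch below uses the standard latent-variable approximation for a logit model, in which the level-1 variance is taken as π²/3, together with the usual design-effect formula 1 + (m - 1)ρ. It is an illustration only and was not part of the analysis reported here: it ignores the extra-binomial scale parameter, the function names are hypothetical, and the cluster size of 10 consultations per GP is an arbitrary assumed value rather than a figure from the data.

```python
import math

# Variance of the standard logistic distribution (latent-variable scale).
LOGISTIC_VARIANCE = math.pi ** 2 / 3


def latent_icc(var_between):
    """Share of latent-scale variance lying between GPs (level 2)."""
    return var_between / (var_between + LOGISTIC_VARIANCE)


def design_effect(icc, cluster_size):
    """Approximate variance inflation from ignoring clustering."""
    return 1.0 + (cluster_size - 1) * icc


# Level-2 intercept variances reported in Table 4 for the three models.
icc_values = [latent_icc(v) for v in (2.34, 2.044, 3.374)]

# Assumed, illustrative cluster size: 10 consultations per GP.
deff = design_effect(icc_values[0], cluster_size=10)

print([round(r, 3) for r in icc_values])
print(round(deff, 2))
```

Under these assumptions a sizeable share of the latent variation lies between GPs, so treating consultations as independent would understate standard errors for GP-level variables such as VR.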

Acknowledgements

The authors are grateful to John Cairns, Ivar Kristiansen, Gavin Mooney and two anonymous referees for useful comments on earlier drafts of this paper. Thanks also go to members of the 'multilevel modelling' and 'econometrics research' e-mail discussion lists who provided useful information in response to my questions. Thanks go to Professor Charles Bridges-Webb, Ms. Helena Britt, and David Miles of the Family Medicine Research Unit, University of Sydney, for providing data. Also acknowledged is the GP Branch, Commonwealth Department of Health, Housing, Local Government and Community Services, for providing the GP density data. The General Practice Evaluation Program of the Commonwealth Department of Health (Australia) provided financial support for an earlier stage of this project. The Health Economics Research Unit is funded by the Chief Scientist Office of the Scottish Office Department of Health, and the Centre for Health Economics Research and Evaluation is funded by the New South Wales Department of Health. Any errors or omissions are the responsibility of the authors.

References

Alemayehu, E., D.W. Molloy, G.H. Guyatt, et al., 1991, Variability in physicians' decisions on caring for chronically ill elderly patients: An international study, Canadian Medical Association Journal 144, 1133-1138.
Baker, D. and R. Klein, 1991, Explaining outputs of primary health care: Population and practice factors, British Medical Journal 303, 225-229.
Barer, M., R.G. Evans, and R.J. Labelle, 1988, Fee controls and cost controls: Tales from the frozen north, Milbank Quarterly 66, 1-64.
Bradley, C.P., 1991, Decision making and prescribing patterns: A literature review, Family Practice 8, 276-287.
Bridges-Webb, C., H. Britt, D.A. Miles, et al., 1992, Morbidity and treatment in general practice in Australia 1990-1991, Medical Journal of Australia 157 (supplement), S1-56.
Chamberlain, G., 1984, Panel data, in: Z. Griliches and M.D. Intriligator, eds., Handbook of econometrics, Vol. 2 (North-Holland, Amsterdam).
Chow, G.C., 1984, Random and changing coefficient models, in: Z. Griliches and M.D. Intriligator, eds., Handbook of econometrics, Vol. 2 (North-Holland, Amsterdam).
Cramer, J.S., 1991, The logit model: An introduction for economists (Edward Arnold, New York).
Dalton, P., 1993, ML3: Software for three-level analysis, software review, Economic Journal 103, 1592-1595.
Denig, P., F.M. Haaijer-Ruskamp, and D.H. Zijsling, 1988, How physicians choose drugs, Social Science and Medicine 27, 1381-1386.
Dickinson, J.A. and D.P. Doessel, 1990, Some evaluation issues in general practice, in: D.P. Doessel, ed., Towards evaluation in general practice: A workshop on vocational registration (Department of Community Services and Health, Canberra) 107-122.
Doessel, D.P., ed., 1990, Towards evaluation in general practice: A workshop on vocational registration (Department of Community Services and Health, Canberra).
Eisenberg, J., 1979, Sociologic influences on decision making by clinicians, Annals of Internal Medicine 90, 957-964.
Epstein, A.M., C.B. Begg, and B.J. McNeil, 1984, The effects of physicians' training and personality on test ordering for ambulatory patients, American Journal of Public Health 74, 1271-1273.
Evans, R.G., 1984, Strained mercy: The economics of Canadian health care (Butterworths, Toronto).
Evans, R.G., 1990, The dog in the night time: Medical practice variations and health policy, in: T.F. Andersen and G. Mooney, eds., The challenges of medical practice variations (Macmillan, London) 117-152.
Goldstein, H., 1991, Nonlinear multilevel models, with an application to discrete response data, Biometrika 78, 45-51.
Goldstein, H., 1995, Multilevel statistical methods (Kluwer Academic, London).
Goldstein, H. and J. Rasbash, 1995, Improved approximations for multilevel models with binary responses, Mimeo. (Institute of Education, London).
Gray, J., 1982, The effect of the doctor's sex on the doctor-patient relationship, Journal of the Royal College of General Practitioners 32, 167-169.

Greene, W.H., 1990, Econometric analysis, 2nd edn (Prentice-Hall, New Jersey).
Greenfield, S., E.C. Nelson, M. Zubkoff, et al., 1992, Variations in resource utilisation among medical specialties and systems of care: Results from the Medical Outcomes Study, Journal of the American Medical Association 267, 1624-1630.
Hadley, J., J. Holahan, and W. Scanlon, 1979, Can fee-for-service reimbursement coexist with demand creation?, Inquiry 16, 247-258.
Hemenway, D., A. Killen, and S.B. Cashman, 1990, Physicians' responses to financial incentives: Evidence from a for-profit ambulatory center, New England Journal of Medicine 322, 1059-1063.
Hickson, G.B., W.A. Altemeier, and J.M. Perrin, 1987, Physician reimbursement by salary or fee-for-service: Effect on physician practice behaviour in a randomized prospective study, Paediatrics 80, 344-350.
Hillman, A.L., M.V. Pauly, and J.J. Kernstein, 1989, How do financial incentives affect physicians' clinical decisions and the financial performance of health maintenance organisations?, New England Journal of Medicine 321, 86-92.
Hosmer, D.W. and S. Lemeshow, 1989, Applied logistic regression (John Wiley, New York).
Hsiao, C., 1986, Analysis of panel data (Cambridge University Press, Cambridge).
Hughes, D. and B. Yule, 1992, The effect of per-item fees on the behaviour of general practitioners, Journal of Health Economics 4, 413-438.
Hurley, J. and R. Labelle, 1994, Relative fees and the utilisation of physicians' services in Canada, Paper 94-6 (Centre for Health Economics and Policy Analysis, McMaster University, Ontario).
Krasnik, A., P.P. Groenewegen, P.A. Pedersen, et al., 1990, Changing remuneration systems: Effects on activity in general practice, British Medical Journal 300, 1698-1701.
Kreft, I.G.G., J. DeLeeuw, and R. VanderLeeden, 1994, Review of five multilevel programs, American Statistician 48, 324-335.
Kristiansen, I.S., 1993, What is in the doctor's utility function? A theoretical and empirical investigation into what influences doctors' decision making (PhD Thesis, Institute of Community Medicine, University of Toronto).
Kristiansen, I.S. and P. Hjortdahl, 1992, The general practitioner and laboratory utilisation: Why does it vary?, Family Practice 9, 22-27.
Kristiansen, I.S. and K. Holtedahl, 1993, The effect of the remuneration system on the general practitioner's choice between surgery consultations and home visits, Journal of Epidemiology and Community Health 47, 481-484.
Kristiansen, I.S. and G. Mooney, 1993, Remuneration of GP services: Time for more explicit objectives? A review of the systems in five industrialised countries, Health Policy 24, 203-212.
Labelle, R., J. Hurley, and T. Rice, 1990, Financial incentives and medical practice: Evidence from Ontario on the effect of changes in physician fees on medical care utilisation, Paper 90-4 (Centre for Health Economics and Policy Analysis, McMaster University, Ontario).
Maddala, G.S., 1987, Limited dependent variable models using panel data, Journal of Human Resources 22, 307-338.
McFadden, D., 1974, Conditional logit analysis of qualitative choice behaviour, in: P. Zarembka, ed., Frontiers in econometrics (Academic Press, New York) 105-142.
Mitchell, J.B., G. Wedig, and J. Cromwell, 1989, The Medicare physician fee freeze, Health Affairs 8, 21-32.
Mooney, G., 1992, What do we want from our health care services? What can we expect from our physicians?, Working Paper C92-1 (Centre for Health Economics and Policy Analysis, McMaster University, Ontario).
Mooney, G. and M. Ryan, 1993, Agency in health care: Getting beyond first principles, Journal of Health Economics 2, 125-136.
National Health Strategy, 1992, The future of general practice, Issues Paper 3 (National Health Strategy, Canberra).

Nazareth, I. and M. King, 1993, Decision making by general practitioners in diagnosis and management of lower urinary tract symptoms in women, British Medical Journal 306, 1103-1106.
Newton, J., V. Hayes, and A. Hutchinson, 1991, Factors influencing general practitioners' referral decisions, Family Practice 8, 308-313.
Prosser, R., J. Rasbash, and H. Goldstein, 1991, ML3 software for three-level analysis: Users' guide for V.2 (Institute of Education, University of London).
Pudney, S., 1989, Modelling individual choice: The econometrics of corners, kinks and holes (Basil Blackwell, Oxford).
Rice, T., 1983, The impact of changing Medicare reimbursement rates on physician induced demand, Medical Care 21, 803-815.
Robinson, W.S., 1950, Ecological correlations and the behaviour of individuals, American Sociology Review 31, 106-128.
Rodriguez, G. and N. Goldman, 1995, An assessment of estimation procedures for multilevel models with binary responses, Journal of the Royal Statistical Society A 158, 73-89.
Rosen, B., 1989, Professional reimbursement and professional behaviour: Emerging issues and research challenges, Social Science and Medicine 29, 455-462.
Rossiter, L.F. and G.R. Wilensky, 1983, A re-examination of the use of physician services: The role of physician-initiated demand, Inquiry 20, 162-172.
Scott, A., M. King, and A. Shiell, 1993, Factors influencing decision making in general practice: The feasibility of analysing secondary data, Discussion Paper 16 (Centre for Health Economics Research and Evaluation, University of Sydney).
Scott, A. and J. Hall, 1995, Evaluating the effects of GP remuneration: Problems and prospects, Health Policy 31, 183-195.
Scott, A. and A. Shiell, 1994, The influence of fee descriptors and the supply of general practitioners on treatment choices in general practice, in: C. Selby-Smith, ed., Economics and health: 1993, Proceedings of the fifteenth Australian Conference of Health Economists (Public Sector Management Institute, Monash University) 165-178.
Tulloch, I., 1990, Descriptors for general practitioner services, in: D.P. Doessel, ed., Towards evaluation in general practice: A workshop on vocational registration (Department of Community Services and Health, Canberra) 55-62.
Waitzkin, H., 1984, Doctor-patient communication: Clinical implications of social scientific research, Journal of the American Medical Association 252, 2441-2446.
Williams, D.A., 1982, Extra-binomial variation in logistic linear models, Applied Statistics 31, 144-148.
Woodhouse, G., ed., 1995, A guide to MLn for new users (Institute of Education, University of London).
Wright, D.B., 1995, Extra-binomial variation in multilevel logistic models with sparse structures (Department of Psychology, University of London).
