A truncated bivariate generalized Pareto distribution

May 25, 2017 | Autor: Masoom Ali | Categoría: Distributed Computing, Modeling, Communication Networks, Communication Network, Maximum Likelihood, Computer Communications, Long Tail, Pareto Distribution, Moments, Electrical And Electronic Engineering, Generalized Pareto Distribution, Computer Communications, Long Tail, Pareto Distribution, Moments, Electrical And Electronic Engineering, Generalized Pareto Distribution

Share Embed

Laporkan tautan ini

Descripción

Computer Communications 30 (2007) 1926–1930 www.elsevier.com/locate/comcom

Short Communication

A truncated bivariate generalized Pareto distribution M. Masoom Ali a, Saralees Nadarajah a

b,*

Department of Mathematical Sciences, Ball State University, Muncie, IN 47306, USA b School of Mathematics, University of Manchester, Manchester M13 9PL, UK Received 10 July 2006; accepted 8 March 2007 Available online 13 March 2007

Abstract Following up the recent work of Ali and Nadarajah [M.M. Ali, S. Nadarajah, A truncated Pareto distribution, Computer Communications, in press], we introduce a truncated version of the bivariate generalized Pareto distribution – the most commonly known longtailed distribution – which possesses ﬁnite moments of all orders and could therefore be a better model for communication networks. Explicit expressions are derived for the product moments of the truncated distribution. An estimation procedure based on the method of maximum likelihood is also provided. Ó 2007 Elsevier B.V. All rights reserved. Keywords: Bivariate generalized Pareto distribution; Communication networks; Moments; Truncated ditribution

1. Introduction In recent years, many traﬃc measurement studies in modern communication networks such as the Internet have found long-tailed distributions [7,8,2,4,5,1]. This means that the behavior of these data signiﬁcantly departs from the traditional telephone traﬃc and its related Markov models with short-range dependence. In particular, the common Poisson arrival process and corresponding analysis based on Erlang formula are no longer valid. Despite being prevalent and important for communication networks, long-tailed distributions are diﬃcult to deal with because they do not possess ﬁnite moments of all orders. This has posed great diﬃculty with respect to modeling problems in communication networks. In this note, we propose a way to overcome this by introducing a truncated version. We consider the most popular long-tailed bivariate distribution: the bivariate generalized Pareto distribution given by the joint probability density function (pdf):

p1

gðx; yÞ ¼

Corresponding author. Tel.: +44 402 475 0464. E-mail address: [email protected] (S. Nadarajah).

0140-3664/$ - see front matter Ó 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.comcom.2007.03.003

ð1Þ

for x > 0, y > 0, p > 0 and q > 0, where K = K(a, b, c, d, p, q) denotes the normalizing constant. Let G denote the corresponding joint cumulative distribution function (cdf) given by: Z xZ y ðuvÞp1 Gðx; yÞ ¼ K ð2Þ pþq du dv: 0 0 ða þ bu þ cv þ duvÞ The truncated version is given by the pdf: p1

f ðx; yÞ ¼

KðxyÞ pþq Xða þ bx þ cy þ dxyÞ

ð3Þ

for 0 < B 6 x 6 A < 1 and 0 < D 6 y 6 C < 1, where X = G(A, C) G(A, D) G(B, C) + G(B, D). The cdf associated with (3) is: F ðx; yÞ ¼

1 fGðx; yÞ Gðx; DÞ GðB; yÞ þ GðB; DÞg: X

ð4Þ

The corresponding marginal cdfs and marginal pdfs are 1 fGðx; CÞ Gðx; DÞ GðB; CÞ þ GðB; DÞg; X 1 F Y ðyÞ ¼ fGðA; yÞ GðA; DÞ GðB; yÞ þ GðB; DÞg; X

F X ðxÞ ¼ *

Kðx; yÞ pþq ða þ bx þ cy þ dxyÞ

M.M. Ali, S. Nadarajah / Computer Communications 30 (2007) 1926–1930

fX ðxÞ ¼

Kxp1 Cðc þ dxÞ p C F p; p þ q; p þ 1; 2 1 pþq a þ bx Xpða þ bxÞ Dðc þ dxÞ Dp 2 F 1 p; p þ q; p þ 1; a þ bx

and Ky p1 Aðb þ dyÞ p fY ðyÞ ¼ A 2 F 1 p; p þ q; p þ 1; a þ cy Xpða þ cyÞpþq Bðb þ dyÞ Bp 2 F 1 p; p þ q; p þ 1; ; a þ cy

where 2F1 denotes the Gauss hypergeometric function deﬁned by 1 X ðaÞk ðbÞk xk ; F ða; b; c; xÞ ¼ 2 1 ðcÞk k! k¼0 where (e)k = e(e + 1) (e + k 1) denotes the ascending factorial. We refer to (3) as the truncated bivariate generalized Pareto distribution. Because (3) is deﬁned over a ﬁnite interval, the truncated bivariate generalized Pareto distribution has all its moments. Thus, (3) may prove to be a much better model for communication networks. The truncation limits A, B, C and D could be determined from the nature of the communication networks. The rest of this note is organized as follows: the explicit expressions for the product moments of (3) are derived in Sections 2 and 3; the estimation procedure for (3) is given in Section 4. The calculations use the Gauss hypergeometric function deﬁned above and the Appell function of the ﬁrst kind deﬁned by F 1 ða; b; b0 ; c; z; nÞ ¼

1 X 1 X ðaÞkþl ðbÞk ðb0 Þl zk nl : ðcÞkþl k!l! k¼0 l¼0

The properties of the above special functions can be found in Prudnikov et al. [9] and Gradshteyn and Ryzhik [6]. 2. Product moments We derive two representations for the product moment E(XmYn). Theorem 1 expresses it as an inﬁnite sum of Appell functions while the expression given by Theorem 2 is a ﬁnite sum of elementary and Appell functions.

EðX m Y n Þ ¼ KfH ðA; CÞ GðA; DÞ GðB; CÞ þ GðB; DÞg

ð5Þ

where H ðP ; QÞ ¼

Z 0

P

Z 0

Q

xmþp1 y nþp1 pþq dy dx: ða þ bx þ cy þ dxyÞ

ð6Þ

Setting z = a + bx + cy + dxy, the double integral H(P, Q) can be expressed as Z Q Z P y nþp1 H ðP ; QÞ ¼ xmþp1 pþq dy dx ða þ bx þ cy þ dxyÞ 0 0 Z P Z QðcþdxÞþaþbx nþp1 ðz a bxÞ mþp1 ¼ dz dx: x zpþq ðc þ dxÞ 0 aþbx ð7Þ Now, an application of Eq. (2.2.6.1) in Prudnikov et al. [9, vol. 1] to calculate the inner integral in (7) shows that Z P mþp1 nþp1 Qnþp x ðc þ dxÞ H ðP ; QÞ ¼ pþq nþp 0 ða þ bxÞ Qðc þ dxÞ 2 F 1 n þ p; p þ q; n þ p þ 1; dx a þ bx Z P mþp1 nþp1 Qnþp x ðc þ dxÞ ¼ pþq nþp 0 ða þ bxÞ k k 1 X ðn þ pÞk ðp þ qÞk ðQÞ c þ dx dx a þ bx ðn þ p þ 1Þk k! k¼0 k 1 Qnþp X ðn þ pÞk ðp þ qÞk ðQÞ n þ p k¼0 ðn þ p þ 1Þk k! Z P mþp1 nþpþk1 x ðc þ dxÞ dx; pþqþk ða þ bxÞ 0

ð8Þ

k 1 P mþp Qnþp cnþp1 X ck ðn þ pÞk ðp þ qÞk ðQÞ ðm þ pÞðn þ pÞapþq k¼0 ak ðn þ p þ 1Þk k!

where we have used the deﬁnition of the Gauss hypergeometric function. The result of the theorem follows by applying Eq. (2.2.8.5) in Prudnikov et al. [9, vol. 1] to calculate the integral in (8). h

F 1 ðm þ p; 1 n p k; p þ q þ k; m þ p þ 1; dP bP : ; c a

Theorem 2. If (X, Y) has the joint pdf (3) and if p P 1 and q P 1 are integers then E(XmYn) is given by (5) for m P 1 and n P 1, where

for m P 1 and n P 1, where H ðP ; QÞ ¼

Proof. Using (3), one can write Z AZ C xmþp1 y nþp1 EðX m Y n Þ ¼ K pþq dy dx B D ða þ bx þ cy þ dxyÞ Z A Z C xmþp1 y nþp1 dy dx ¼K ða þ bx þ cy þ dxyÞpþq 0 0 Z AZ D xmþp1 y nþp1 pþq dy dx ða þ bx þ cy þ dxyÞ 0 0 Z BZ C xmþp1 y nþp1 pþq dy dx ða þ bx þ cy þ dxyÞ 0 0 Z BZ D xmþp1 y nþp1 þ pþq dy dx ða þ bx þ cy þ dxyÞ 0 0 ¼ KfH ðA; CÞ H ðA; DÞ H ðB; CÞ þ H ðB; DÞg;

¼

Theorem 1. If (X, Y) has the joint pdf (3) then

1927

1928

H ðP ;QÞ ¼

M.M. Ali, S. Nadarajah / Computer Communications 30 (2007) 1926–1930 nþp1 X k¼0

np1k nþpk1 X n þ p k 1 bl n þ p 1 ðaÞ k l a n p 1 k l¼0

fSðk;lÞ T ðk;lÞg; 8 kpqþ1 P > kpqþ1 ðQd þ bÞr k p q þ 1 ðQc þ aÞ > > > rþmþpþl > r > d ðQc þ aÞr r¼0 > > > rþmþpþl rþmþpþl1 > P r þ m þ p þ l 1 ðcÞ > > > fðc þ dP Þs cs g; > > s > sðcÞ1þs s¼0 > > > > if k > p þ q 1; > > Z > < P xmþpþl1 logfQðc þ dxÞ þ a þ bxgdx; Sðk; lÞ ¼ c þ dx 0 > > > > > if k ¼ p þ q 1; > > mþpþl > ðQc þ aÞ1þk > > P > pþq F 1 ðm þ p þ l;1;p þ q k 1; > > > > ðm þ p þ lÞcðQc þ aÞ > > P d P ðQd þ bÞ > > > ; m þ p þ l þ 1; ; > > c Qc þ a > : if k < p þ q 1

and 8 kpqþ1 P k p q þ 1 akpqþ1 br > > > > > r drþmþpþl ar > r¼0 > > rþmþpþl1 > P > rþmþpþl1 > > > > s > s¼0 > > rþmþpþl > > ðcÞ > > > fðc þ dP Þs cs g; > 1þs > sðcÞ > > < if k > p þ q 1; T ðk;lÞ ¼ Z P mþpþl1 > x > > log fa þ bxgdx; > > c þ dx > 0 > > if k ¼ p þ q 1; > > > > > P mþpþl a1þk > > > > > ðm þ p þ lÞcapþq F 1 ðm þ p þ l;1;p þ q k 1; > > > > > m þ p þ l þ 1; P d ; P b ; > > a c > : if k < p þ q 1:

where Sðk; lÞ ¼

Z

P

0

xmþpþl1 kpqþ1 fQðc þ dxÞ þ a þ bxg dx c þ dx

and T ðk; lÞ ¼

Z

P

0

xmþpþl1 kpqþ1 dx: fa þ bxg c þ dx

Sðk;lÞ ¼

kpqþ1 X

Z

P 0

H ðP ;QÞ ¼

0

nþp1 k

Z

QðcþdxÞþaþbx

aþbx nþp1k kpq

k¼0 mþp1

ða bxÞ z dzdx c þ dx nþp1 X n þ p 1 Z P xmþp1 ða bxÞnþp1k ¼ k ðc þ dxÞðk p q þ 1Þ 0 k¼0

¼

x

½fQðc þ dxÞ þ a þ bxgkpqþ1 fa þ bxgkpqþ1 dx nþp1 X n þ p 1 ð1Þnþp1k k¼0 nþpk1 X l¼0

Z

¼

k pqþ1 n þ p 1 k nþp1kl l b a l k

P

xmþpþl1 ½fQðc þ dxÞ þ a þ bxgkpqþ1 c þ dx 0 fa þ bxgkpqþ1 dx nþp1 X n þ p 1 ð1Þnþp1k k pqþ1 n þ p 1 k nþp1kl l b a l

r

r¼0

kpqþ1

ðQc þ aÞkpqþ1r ðQd þ bÞr

kpqþ1 X k p q þ 1 xrþmþpþl1 dx ¼ c þ dx r r¼0

ðQc þ aÞkpqþ1r ðQd þ bÞr Z cþdP ðu cÞrþmþpþl1 du drþmþpþl u c kpqþ1 X k p q þ 1 ðQc þ aÞkpqþ1r ðQd þ bÞr ¼ r r¼0 rþmþpþl1 X r þ m þ p þ l 1 ðcÞrþmþpþl1s s drþmþpþl s¼0 Z cþdP us1 du c

¼

kpqþ1 X

kpqþ1

r

r¼0

kpqþ1r

ðQc þ aÞ

r

ðQd þ bÞ

rþmþpþl1 X

rþmþpþl1

ðcÞ

s

s

s¼0 rþmþpþl1s

Proof. Since p is an integer, one can express (7) as P nþp1 X

ð10Þ

If k > p + q 1 then S(k, l) can be calculated as

Z

ð9Þ

s

fðc þ dP Þ c g : sdrþmþpþl ð11Þ

If k > p + q 1 then T(k, l) is the particular case of (11) for Q = 0. If k < p + q 1 then (9) and (10) can be calculated by application of equation (2.2.8.5) in Prudnikov et al. [9, vol. 1]. h 3. Particular cases Using special properties of the Appell functions, one can obtain elementary forms for E(XmYn) for given m and n. This is illustrated in the corollaries below. Corollary 1. If (X, Y) has the joint pdf (3) with p = 1 and d = 0 then E(X) is given by (5) with H ðA; BÞ ¼ aq1 f1 ð1 þ hAÞa ð1 þ hAÞa ahA ð1 þ /BÞ

a

a

þ ð1 þ /B þ hAÞ ahA a

a

þ ð1 þ /B þ hAÞ /B ð1 þ /BÞ /B þ ð1 þ /B þ hAÞa g=f/h2 aða2 1Þg

k

k¼0 nþpk1 X

l¼0

fSðk; lÞ T ðk;lÞg;

and K = a(a + 1)h/, where h = b/a, / = c/a and a = q 1. Corollary 2. If (X, Y) has the joint pdf (3) with p = 1 and d = 0 then E(X2) is given by (5) with

M.M. Ali, S. Nadarajah / Computer Communications 30 (2007) 1926–1930

H ðA; BÞ ¼ aq1 f2 þ ð1 þ hAÞa h2 A2 a ð1 þ hAÞa h2 A2 a2 a

2ð1 þ hAÞ ahA 2ð1 þ hAÞ a 2

a

2ð1 þ /BÞ

2 2

þ ð1 þ /B þ hAÞ h A a þ 2ð1 þ /B þ hAÞ 4ð1 þ /BÞa B/ þ 4ð1 þ /B þ hAÞa B/

4. Estimation of the truncated distribution’s parameters

a

a

þ 2ð1 þ /B þ hAÞa ahA 2ð1 þ /BÞa /2 B2 a

þ 2ð1 þ /B þ hAÞ ahA/B a 2

Here, we consider estimation of the parameters of (3) by the method of maximum likelihood. The log-likelihood for a random sample (x1, y1), . . . , (xn, yn) from (3) is: log LðA; B; C; D; a; b; c; d; p; qÞ ¼ n log K n log X þ ðp 1Þ

2

ð1 þ /B þ hAÞ h A a þ 2ð1 þ /B þ hAÞ /2 B2 g 2

n X

log xj þ ðp 1Þ

j¼1

a

3

1929

3

=fah /ð2a a þ 2 þ a Þg

n X

log y j ðp þ qÞ

j¼1

n X

logða þ bxj þ cy j þ dxj y j Þ:

j¼1

and K = a(a + 1)h/, where h = b/a, / = c/a and a = q 1. Corollary 3. If (X, Y) has the joint pdf (3) with p = 1 and d = 0 then E(Y) is given by (5) with a

H ðA; BÞ ¼ aq1 f1 ð1 þ hAÞ hA ð1 þ hAÞ ð1 þ /BÞa þ ð1 þ /B þ hAÞa

a

a

a

þ ð1 þ /B þ hAÞ a/B ð1 þ /BÞ a/B a

þ ð1 þ /B þ hAÞ hAg=fh/2 aða2 1Þg and K = a(a + 1)h/, where h = b/a, / = c/a and a = q 1. Corollary 4. If (X, Y) has the joint pdf (3) with p = 1 and d = 0 then E(Y2) is given by (5) with a

a

H ðA; BÞ ¼ aq1 f2ð1 þ hAÞ 2ð1 þ hAÞ h2 A2 þ 2 4ð1 þ hAÞa hA þ 2ð1 þ /B þ hAÞa a/B a

a

þ 2ð1 þ /B þ hAÞ h2 A2 ð1 þ /BÞ a2 /2 B2 a

a

þ 4ð1 þ /B þ hAÞ hA ð1 þ /B þ hAÞ a/2 B2 a

a

2ð1 þ /BÞ a/B þ 2ð1 þ /B þ hAÞ ahA/B 2ð1 þ /BÞa þ ð1 þ /BÞa a/2 B2 þ ð1 þ /B þ hAÞa a2 /2 B2 þ 2ð1 þ /B þ hAÞa g

The derivatives of this with respect to the 10 parameters are: o log L n ¼ oA X o log L n ¼ oB X o log L n ¼ oC X o log L n ¼ oD X

oX ; oA oX ; oB oX ; oC oX ; oD

n X o log L n oK n oX 1 ¼ ðp þ qÞ ; oa K oa X oa a þ bxj þ cy j þ dxj y j j¼1 n X o log L n oK n oX xj ¼ ðp þ qÞ ; ob K ob X ob a þ bxj þ cy j þ dxj y j j¼1 n X yj o log L n oK n oX ¼ ðp þ qÞ ; oc K oc X oc a þ bxj þ cy j þ dxj y j j¼1 n X xj y j o log L n oK n oX ¼ ðp þ qÞ ; od K od X od a þ bxj þ cy j þ dxj y j j¼1

=f/3 hað2a2 a þ 2 þ a3 Þg and K = a(a + 1)h/, where h = b/a, / = c/a and a = q 1.

n n X o log L n oK n oX X ¼ þ log xj þ log y j op K op X op j¼1 j¼1

Corollary 5. If (X, Y) has the joint pdf (3) with p = 1 and d = 0 then E(XY) is given by (5) with a

H ðA; BÞ ¼ aq1 f1 ð1 þ hAÞ ahA ð1 þ hAÞ h A2 a a

a

2

a

and a

2

ð1 þ /B þ hAÞ / B ð1 þ /BÞ a/B a 2

a

þ ð1 þ /B þ hAÞ a hA/B þ ð1 þ /B þ hAÞ ahA þ ð1 þ /B þ hAÞa ð1 þ /B þ hAÞa ahA/B ð1 þ /BÞa a/2 B2 þ ð1 þ /B þ hAÞa h2 A2 a ð1 þ /BÞ

a

a 2

ð1 þ /B þ hAÞ h A a

logða þ bxj þ cy j þ dxj y j Þ;

j¼1

a 2

þ ð1 þ hAÞ h2 A2 ð1 þ hAÞ

n X

2

a

þ ð1 þ /B þ hAÞ a/B þ ð1 þ /BÞ /2 B2 a

þ ð1 þ /B þ hAÞ a/2 B2 g =f/2 h2 að2a2 a þ 2 þ a3 Þg and K = a(a + 1)h/, where h = b/a, / = c/a and a = q 1.

n o log L n oK n oX X ¼ logða þ bxj þ cy j þ dxj y j Þ: oq K oq X oq j¼1

Since o log L/o A < 0 and o log L/o B > 0, it follows that the maximum likelihood estimators of A and B are max xj and min xj, respectively. Similarly, the maximum likelihood estimators of C and D are max yj and min yj, respectively. However, usually, A, B, C and D will be determined by some prior knowledge. The maximum likelihood estimators of a, b, c, d, p and q are the solutions of the six equations:

1930

M.M. Ali, S. Nadarajah / Computer Communications 30 (2007) 1926–1930

n X n oK n oX 1 ¼ ðp þ qÞ ; K oa X oa a þ bx þ cy j þ dxj y j j j¼1 n X n oK n oX xj ¼ ðp þ qÞ ; K ob X ob a þ bx þ cy j þ dxj y j j j¼1 n X yj n oK n oX ¼ ðp þ qÞ ; K oc X oc a þ bx þ cy j þ dxj y j j j¼1 n X xj y j n oK n oX ¼ ðp þ qÞ ; K od X od a þ bx þ cy j þ dxj y j j j¼1 n X n oK n oX ¼ logða þ bxj þ cy j þ dxj y j Þ K op X op j¼1

n X j¼1

log xj

n X

log y j ;

j¼1

and n n oK n oX X ¼ logða þ bxj þ cy j þ dxj y j Þ; K oq X oq j¼1

respectively.

References [1] J. Abate, W. Whitt, Computing Laplace transforms for numerical inversion via continued fractions, INFORMS Journal on Computing 11 (1999) 394–405. [2] R.J. Adler, R.E. Feldmann, M.S. Taqqu, A Practical Guide to Heavy Tails, Birkha¨user Boston, Boston, MA, 1998. [3] M.M. Ali, S. Nadarajah, A truncated Pareto distribution, Computer Communications, in press. [4] M.E. Crovella, M.S. Taqqu, A. Bestavros, Heavy-tailed probability distributions in the World Wide Web, in: R.J. Adler, R.E. Feldmann, M.S. Taqqu (Eds.), A Practical Guide to Heavy Tails, Birkha¨user Boston, Boston, MA, 1998, pp. 3–25. [5] A. Feldmann, W. Whitt, Fitting mixtures of exponentials to long-tail distributions to analyze network performance models, Performance Evaluation 31 (1998) 963–976. [6] I.S. Gradshteyn, I.M. Ryzhik, Table of Integrals, Series, and Products, sixth ed., Academic Press, San Diego, CA, 2000. [7] W.E. Leland, M.S. Taqqu, W. Willinger, D.V. Wilson, On the selfsimilar nature of Ethernet traﬃc, IEEE/ACM Transactions on Networking 2 (1994) 1–15. [8] V. Paxson, S. Floyd, Wide-area traﬃc: the failure of Poisson modeling, IEEE/ACM Transactions on Networking 3 (1995) 226–244. [9] A.P. Prudnikov, Y.A. Brychkov, O.I. Marichev, Integrals and Series (volumes 1, 2 and 3), Gordon and Breach Science Publishers, Amsterdam, 1986.

Lihat lebih banyak...

A truncated bivariate generalized Pareto distribution

Descripción

Comentarios