Object-Oriented Decision Support System

Share Embed


Descripción

OBJECT-ORIENTED DECISION SUPPORT SYSTEM Franck Ravat, Olivier Teste Université Paul Sabatier - IRIT/SIG 118 Route de Narbonne, 31062 Toulouse cedex 04, France E-mail: {ravat, teste}@irit.fr

Object-oriented model, conceptual warehouse design, temporal information.

We develop a decision support system, based on the data warehousing approach. We define a conceptual object-oriented data warehouse model, which describes the data warehouse as a central repository of relevant, complex and temporal data. Our model integrates three concepts: warehouse object, environment and warehouse class. Each warehouse object representing a source entity, is composed of one current state, several past states (modelling its detailed evolutions) and several archive states (modelling its evolutions within a summarised form). The environment concept defines temporal parts in a data warehouse schema with significant granularities (attribute, class, sub graph). We also provide seven functions that are used to define warehouse classes.

1

INTRODUCTION

Data warehousing is emerging as a key technology for enterprises that wish to improve their data analyses and decision support (Inmon 1996). A data warehouse is defined as a repository of data that has been extracted from operational databases, called source databases (Widom 1995). Extracted data are integrated and stored in a data warehouse to perform efficient analyses. Our research centre intends to design a decision support system in French Social Security context1. In (Pedersen and Jensen 1999) the authors argue that the medical domain requires more powerful data model than conventional relational approaches. The data model should provide advanced temporal support (Chaudhuri and Dayal 1997, Yang and Widom 2000). The database community is devoting increasing attention to the data warehousing approach (Chaudhuri and Dayal 1997, Widom 1995). Technical aspects that mainly determine the 1

This study was partially financed by CTI-Sud (computing organism of the French Social Security) and it was supported by the EVOLUTION group (www.prism.uvsq.fr/dataware/coop/ evolution.html).

data warehouse performances, such as multidimensional data models (Agrawal et al. 1997, Pedersen and Jensen 1999) and materialised views (Gupta and Mumick 1995, Huyn 1997) are study while conceptual issues have not been investigated enough (Gatziu et al. 1999). Little attention is currently paid to design a data warehouse. Nevertheless, the European DWQ project studies techniques for measuring and optimising some aspects of warehouse quality (Jarke et al. 1998). In this paper, we provide a solution to overcome this insufficiency. We focus on conceptual warehouse design. We define a data warehouse as a central repository for managing efficiently extracted, complex and temporal data. We provide a warehouse data model based on the object paradigm. Section 2 presents our decision support system. Section 3 provides our data warehouse model. Section 4 describes a prototype allowing the administrator to generate data warehouses.

2

DECISION SUPPORT SYSTEM

We specify a decision support system allowing us to distinguish several research issues. Figure 1 depicts the functional architecture of our decision support system.

Querying, Analysis, Data Mining Data Mart

Data Mart

...

Data Mart

ORGANISATION

Data Warehouse CONSTRUCTION

Global Source

INTEGRATION

Source

Source

...

Source

Figure 1 - Decision support architecture.

The integration step consists in regrouping several sources into one global data source, which can be virtual, e.g. relevant source data is extracted only when the global source is querying. The integration is based on federated (Samos et al. 1998) and/or distributed database approaches (Ravat and Zurfluh 1995). The construction step intends to both design and generate a data warehouse as a materialised view over the global data source. The materialised view approach (Gupta and Mumick 1995) consists in collecting relevant source data to store extracted data and their evolutions in a database, called the data warehouse. In our system, the data warehouse is a central repository dedicated to manage efficiently relevant data. The organisation step organises data (available in the data warehouse) into several data marts for supporting efficiently OLAP (On-line Analytical Processing) applications. Multidimensional models are commonly used in the data marts to improve analyses. Note that the multidimensional approach is not adapted to model the data of our central repository, called data warehouse.

3

WAREHOUSE OBJECT MODEL

This section depicts our conceptual data warehouse data model.

3.1 Global source example Data warehouse stores data extracted from a global data source. In our context, the global

source is described within an object-oriented model. The motivating for using the object paradigm at the integration step is that it has proven to be successful in complex data modelling (Bukhres and Elmagarmid 1993). The case study is taken from the medical domain. It concerns the patients and the doctors in French hospitals. We use the object definition language (ODL) provided by the object database management group (ODMG) and we extend this standard model with composite objects (Bertino et al. 1998). interface PERSON { attribute String firstname; attribute String lastname; attribute Struct Address{String street, String city, Long code} address; attribute Short birth; } interface PATIENT (extend Person) { attribute Short patient_no; } interface DOCTOR (extend Person) { attribute Long doctor_no; attribute String speciality; attribute Double income; relationship Set works inverse WARD::staffs; relationship manages inverse WARD::managed_by; } interface ESTABLISHMENT { attribute String name; attribute String status; attribute Struct Address{String street, String city, Long code} address; attribute Double budget; composition Set organisation; } interface WARD { attribute String title; attribute String phone; relationship Set staffs inverse DOCTOR::works; relationship managed_by inverse DOCTOR::manages; }

3.2 Warehouse object Our data warehouse model is based on the object paradigm. In a data warehouse, each object models one (or several) extracted source object(s). The basic characteristics of a warehouse object is its distinct identity which is immutable and is independent of the source objects. Moreover, decision support requires historical data, while data sources often store only current data (Chaudhuri and Dayal 1997). The data warehouse must keep only significant updates,

e.g. some temporal data may be or may become non-relevant. For example, to analyse an hospital budget, the data warehouse must store the current budget and several evolutions of the hospital budget. The data warehouse can store detailed budget evolutions during two years and after this delay, the old budgets are aggregated using functions such as sum, max, min, avg… This problem is important because very large volumes of data are involved in a data warehousing context. It is important to store only relevant data to decision support. Figure 2 depicts our approach. The main characteristic of a warehouse object is that it regroups several states: one current state, several past states representing its detailed evolutions and possibly, several archive states representing summarised evolutions. Global source Data warehouse

Sa2

In order to define the temporal domain hi, we use a temporal model based on a discrete and linear time model (Goralwalla et al. 1998) where the time line is structured in a multigranular way. We integrate three basic temporal types: duration, instant, interval. A temporal domain hi is a set of intervals which are non-empty, disjoint, non contiguous, ordered and use the same temporal unit. EXAMPLE. Figure 3 depicts a warehouse object (identified by "S_oid1") representing a surgeon. It is composed of one current state, two past states (keeping two detailed evolutions of the income) and one archive state (storing the average of the income between 1990 to 1998). S_oid1:Surgeons current state: firstname="Olivier" lastname="Teste" address=[street="1, rue Alsace", city="Paris", code=75000] birth=1972 doctor_no=1027281001 speciality="surgery" income=15000,00 past state: income=13000,00

Sa1 Sp2 Sp1 S0 time Figure 2 - Warehouse object modelling.

We adopt a periodically refreshing approach, e.g. source updates are periodically propagated in the data warehouse at an extracting point. Therefore a current value in the data warehouse may be not the current value in the source. This seems to be a real nuisance, but our application context does not require to propagate immediately the source updates. A warehouse object o is defined by a tuple (oid, S0, PS, AS) where - oid is an object identity, - S0 is the current state, - PS={Sp1,…, Spp} is a finite set of past states, - AS={Sa1,…, Saa} is a finite set of archive states. Note that each warehouse object requires one current state. A state Si of a warehouse object is defined by a couple (hi, vi) where - hi is a temporal domain representing instants, - vi is a value, which is the object value during instants of hi.

past state: income=12000,00 archive state: income=10000,00

Figure 3 - Example of warehouse object.

3.3 Warehouse class Objects that share common structure and behaviour are grouped into a warehouse class. A warehouse class c is defined by a tuple (Name, Type, Super, Extension, β , ϕ, ψ) where - Name is a unique class name, - Type is a type defining structure and behaviour of the warehouse objects, - Super is a set of the super classes of c, - Extension = {o1, o2,…, os} is a finite set of warehouse objects, - β is a building function, which specifies the warehouse class origin, - ϕ is a temporal filter defining a set of temporal attributes (their detailed value evolutions are kept), - ψ is an archive filter, defining a set of archive attributes (their value evolutions are summarised). We provide seven functions to define the building function β. - The project function πP is used to extract source properties. These properties that are

defined by P={p1, p2,…, pm} are a sub set of the source class properties. - The hide function µP defines the nonextracted properties P={p1, p2,…, pm'} of a source class cs. - The grow function αP:F consists in creating new properties in a generated warehouse class. These properties P={p1, p2,…, pm"} can be calculated attributes or specific attributes. The calculated attributes are defined by an aggregate function fi∈{count, sum, avg, max, min} or a source method. The specific attributes are defined by a type. - The select function σp consists in restricting the source class extension. The warehouse class is populated with warehouse objects representing each selected source object according to the selection predicate. - The join function ⋈p consists in filtering the Cartesian product between two classes with a join predicate. Each generated warehouse object is defined from a pair of two source objects filtered by the predicate p. - The generalisation function ΛP consists in grouping together shared common properties and/or methods which must be defined at the generated super classes. The generated super class groups together the shared properties of P={p1, p2,…, pq} (specified by the administrator). - The specialisation function Σp consists in creating a sub class from one or several warehouse classes. It is important to note that the warehouse class is defined through two filters allowing the administrator to specify temporal and archive properties. A temporal filter specify a sub set of temporal properties. When a property is temporal (p∈ϕ), its detailed evolutions are kept by several past states. An archive filter specifies a sub set of couples (archive_propety, aggregate_function). When a property is an archive property (p∈ψ), some evolutions are summarised by archive states. EXAMPLE. To define a warehouse class named "Surgeons", we use a select function over the source class named "DOCTOR". The source class extension is restricted to only the doctors who are surgeons. A temporal filter and an archive filter are speficied. interface Surgeons = σd.speciality="surgery" (d DOCTOR) with temporal filter { income, speciality, works, manages }, archive filter {avg(income)};

The class named "Persons" is a super class of the warehouse class named "Surgeons".

interface Persons =Λ Λ s.firstname,s.lastname,s.address,s.birth(s Surgeons) with temporal filter {address};

The sub class named "YoungSurgeons" (birth≥1970) inherits from "Surgeons". interface YoungSurgeons = Σs.birth≥1970 (s Surgeons)

The classes named "Hospitals" and "Services" represents the public hospitals in French. "Hospitals" are described by two added attributes (nb_services, creation_year), which are not derived from the source class. A project function defines the derived properties of the class "Hospitals". interface Hospitals = αnb_services:count(h.organisation), creation_year:Short πh.name, h.address.city, h.budget, h.organisation (h σe.status="public"(e ESTABLISHMENT)) with temporal filter { budget, nb_services, organisation}, archive filter {avg(budget), avg(nb_services)};

A hide function is used to define non-derived source properties.

interface Services = µsr.name, sr.status, sr.address, sr.budget, sr.phone (sr ⋈h.organisation=s (h σe.status="public"(e ESTABLISHMENT), s WARD))) with temporal filter {staffs, managed_by};

3.4 Environment Decision support requires some warehouse objects with its different evolutions (current, past and archive states) and some warehouse objects with only its current value (current state). We introduce the environment concept allowing us to define an homogeneous temporal part in a data warehouse. An environment Envi is defined by (NameEnvi, CEnvi, ConfigEnvi) where - NameEnvi is a unique environment name. - CEnvi={cEnvi1,…, cEnvini} is a set of classes. - ConfigEnvi is a set of constraints which configure the environment. The environments are disjoints, ∀Envi, ∀Envj≠i,CEnvi∩CEnvj=∅. Warehouse classes are organised in a hierarchy according to inheritance relationships (Super). Our environment concept extends the inheritance concept to the filters. ∀ci∈CEnv, cj≠i∈CEnv, if ci is a subclass of cj, denoted by ci≼cj, then Typeci⊇Typecj ∧ ci cj Extension ⊆Extension ∧ ϕci⊆ϕcj ∧ ψci⊆ψcj. The configurations ConfigEnvi allows the environment object management configuration

(for example, we can define the archive rule specifying when the temporal states must be summarised). Our approach is based on the ECA-paradigm; see (Ravat et al. 1999) for more details. EXAMPLE. We consider the previous data warehouse example. The administrator creates one environment, named "Evolutions", containing four classes.

a finite set of warehouse classes and ConfigDW is a set of global configuration constraints. The data warehouse is configured by a set of constraints. We studied the data warehouse configuration in (Ravat et al. 1999).

environment Evolutions { Persons,Surgeons,Hospitals,Services }

In this section, we briefly describe a prototype allowing the administrator to both define and generate warehouses in an object-oriented database management system. This prototype, called WarGen (Warehouse Generator) is based on a graphical interface and a generator (Bret and Teste 1999, Ravat et al. 1999). The interface displays a graphical representation of a global source schema. The administrator defines graphically, interactively and incrementally a warehouse schema. The generator creates automatically the data warehouse according to the graphical warehouse definition. The warehouse classes, the first extraction (which populates the warehouse classes) and the refresh process (which propagates the source evolutions into the warehouse) are generated automatically. Figure 4 presents examples of a global source schema (see section 3.1 for a textual definition) and a warehouse schema (according to the building functions given section 3.3). The symbol ' ' defines a property belonging to a temporal filter and the symbol ' ' defines a property belonging to an archive filter.

Note that the sub class "YoungSurgeons" does not inherit from the filters of "Surgeons" and "Persons" because it is not in the "Evolutions" environment. On the other hand, "Surgeons" inherits from the temporal filter of "Persons" because they are in the same environment. The administrator defines a numeric threshold which limits the past state storage; only two past states are kept. When three past states are described, the warehouse aggregates the oldest past states in an archive state. create constraint NUMERIC_THRESHOLD on environment Evolutions when self.refresh if select PS from C in self.classes, OG in C.extension, PS in OG.history where PS.numerous() > 2 then OG.archive(PS,OG.ancestor[1]) ;

3.5 Warehouse schema A data warehouse schema, named SDW, is defined by (NameDW, CDW, ConfigDW) where NameDW is a schema name, CDW={c1,…, cnc} is

4

IMPLEMENTATION

warehouse window

source window

Figure 4 - WarGen interface.

The WarGen source code represents approximately 5000 lines of Java code (Jdk

1.1.6). The warehouses are generated in O2 (4.6 version). We have extended O2 with several

meta-classes and tool classes representing 2000 lines of O2C code.

5

CONCLUSION

This paper describes a decision support system divided in three steps (integration, construction, organisation). We deal with data warehouse design issues. Firstly, we provide a data model for objectoriented data warehouses, which is based on the concepts of warehouse object, warehouse class and environment. A warehouse object is composed of one current state, several past states representing detailed evolutions and several archive states representing summarised evolutions. A warehouse class integrates a temporal filter (detailed evolutions) and an archive filter (summarised evolutions) for defining temporal data management. Our solution stores only relevant information in such a way that temporal data are kept within significant details. We also provide seven functions that are used to define warehouse classes. We introduce the environment concept allowing definitions of temporal parts in a warehouse schema. Environments are used to define temporal parts with a relevant size. Secondly, our solution is implemented in a prototype, called WarGen. The administrator designs graphically, interactively and incrementally a conceptual data warehouse schema whereas a generator creates automatically temporal object data warehouses. There are several possible extensions to this work. The extraction process can be extended to extract the class behaviour. Also, we want to specify an object query language handling our warehouse elements.

REFERENCES Agrawal R., Gupta A., Sarawagi A., 1997. "Modeling Multidimensional Databases", ICDE'97. Bertino E., Ferrari E., Guerrini G., Merlo I., "Extending the ODMG Object Model with Composite Objects", OOPSLA'98, Vancouver (Canada), 1998. Bret F., Teste O., 1999. "Construction Graphique d'Entrepôts et de Magasins de Données", INFORSID'99, La Garde (Fr.).

Bukhres O.A., Elmagarmid A.K., 1993. "Object-Oriented Multidatabase Systems - A solution for Advanced Applications", Prentice Hall, ISBN 0-13-103813-2, 1993. Chaudhuri S., Dayal U., 1997. "An Overview of Data Warehousing and OLAP Technology", SIGMOD Record, 26(1):65-74. Gatziu S., Jeusfeld M.A., Staudt M., Vassiliou Y., 1999. "Design and Management of Data Warehouses", Report on the DMDW'99 Workshop, SIGMOD Record, 28, 4. Goralwalla A., Özsu M.T., Szafron D., 1998. "An Object-Oriented Framework for Temporal Data Models", LNCS - Temporal Databases: Research and Practice, ISBN N°3-540-64519-5. Gupta A., Mumick I.S., 1995. "Maintenance of Materialized Views: Problems, Techniques, and Applications", IEEE Data Eng. Bulletin. Hyun N., 1997. "Multiple-View SelfMaintenance in Data Warehousing Environments", VLDB'97, Athens. Inmon W.H., 1996. "Building the Data Warehouse", 2nd Edition John Wiley & Sons, Inc., ISBN n°0471-14161-5, USA. Jarke M., Jeusfeld M.A., Quix C., Vassiliadis P., 1998. "Architecture and Quality in Data Warehouses", CAISE’98, Italy. Pedersen T.B., Jensen C.S, 1999. "Multidimensional Data Modeling for Complex Data", ICDE'99. Ravat F., Teste O., Zurfluh G., 1999. "Towards the Data Warehouse Design", ACM CIKM'99, Kansas City (Kansas, USA). Ravat F., Zurfluh G., 1995. "Class partitioning", Journal of Computing and Information, vol. 1, n°2. -Special Issue. Samos J., Saltor F., Sistrac J., Bardés A., 1998. "Database Architecture for Data Warehousing: An evolutionary Approach", DEXA'98, Vienna (Austria). Widom J., 1995. "Research problems in data warehousing", ACM CIKM'95. Yang J., Widom J., 2000. "Temporal View SelfMaintenance in a Warehousing Environment", EDBT'00, Konstanz (Germany).

Lihat lebih banyak...

Comentarios

Copyright © 2017 DATOSPDF Inc.