Software comprehension based on database relational algebra

June 24, 2017 | Autor: Bassam Mohd | Categoría: Relational Database, Software Maintenance, Information Communication Technology

Share Embed

Laporkan tautan ini

Descripción

58

Int. J. Information and Communication Technology, Vol. 6, No. 1, 2014

Software comprehension based on database relational algebra Sahel Alouneh* Computer Engineering Department, German Jordanian University, P.O. Box 35247, Amman 11180, Jordan E-mail: [email protected] *Corresponding author

Sa’ed Abed and Bassam Jamil Mohd Computer Engineering Department, Faculty of Engineering, Hashemite University, P.O. Box: 150459, Zarqa, 13115 Jordan Fax: +962-5-3826613 E-mail: [email protected] E-mail: [email protected]

Ahmad Al-Khasawneh Department of Computer Information System, Hashemite University, Zarqa 13115, Jordan Fax: +962-5-3826625 E-mail: [email protected] Abstract: Execution trace file analysis facilitates software comprehension which is a fundamental part of the software maintenance process. However, the scalability of large execution trace files limits understanding programmes and increases the complexity of the maintenance process. A programme interior representation which reflects the actual data structure is constructed from reduced execution traces. These reduced execution traces are extracted from a profiled run or logs of a programme. As a result, the generated output of these representations can then be easily analysed and visualised. This paper presents an approach to analyse execution trace files using relational database algebra methods to better comprehend and maintain software(s). Furthermore, this work can be helpful in evaluating the software’s security using the logs trace files. The proposed approach was tested on real open source software. The results indicate that the proposed approach simplifies software comprehension, and may reduce the execution trace file size for up to 70% of its original size. Keywords: software maintenance; execution traces; comprehension; relational database; logs.

Copyright © 2014 Inderscience Enterprises Ltd.

Software comprehension based on database relational algebra

59

Reference to this paper should be made as follows: Alouneh, S., Abed, S., Mohd, B.J. and Al-Khasawneh, A. (2014) ‘Software comprehension based on database relational algebra’, Int. J. Information and Communication Technology, Vol. 6, No. 1, pp.58–70. Biographical notes: Sahel Alouneh received his BSc in Electrical and Computer Engineering from Jordan University of Science and Technology, Jordan in 2000. His MSc and PhD were obtained from Concordia University in 2004 and 2008, respectively. He is currently working at the German Jordanian University as an Assistant Professor at the Computer Engineering Department. Also, he is the Director of the Computer Center. His research interest includes computer networks fault tolerance, security, software engineering and computer hardware design. Sa’ed Abed received his BSc and MSc in Computer Engineering from Jordan University of Science and Technology (JUST), Jordan in 1994 and 1996, respectively. In 2008, he received his PhD in Computer Engineering from Concordia University, Canada. He has previously worked as a Lecturer at the Department of Computer Science, King Faisal University in Saudi Arabia from 1997 until 2003. His research interests include formal methods, theorem proving, model checking, SAT-solvers, VLSI design and automation, synthesis, computer architecture and fault tolerance. Bassam J. Mohd received his BS in Computer Engineering from KFUPM of Dhahran-KSA, his MS in Computer Engineering from the University of Louisiana at Lafayette and his PhD from the University of Texas – Austin, 2008. He has worked for several semiconductor companies including Intel, SUN, Synopsys and Qualcomm. He is currently an Assistant Professor at Hashemite University, Jordan. His research interest includes DSP designs, steganographic processor and power reduction/estimation techniques. Ahmad Al-Khasawneh is an Associate Professor of Computer Information System at Hashemite University. He holds a PhD and MSc in Information and Software Systems and Technology from Newcastle University, Australia and BS in Computer and Automatic Control Engineering. He has 20 years of experience in ICT field and acts as a Technical Advisor to the Royal Jordanian Airlines Director on issues related to the development of travel industry and ICT strategy. Currently, he is the Dean of the Prince Hussein Bin Abdullah II for Information Technology at Hashemite University.

1

Introduction

Software engineering is a multidisciplinary system that has many aspects to it. In particular, in the context of software maintenance, one of the most challenging tasks is to understand the software system. In fact, the software engineer attempts to build a rational map that relates the system’s methodologies and concepts to its source code and design. Understanding a system’s behaviour requires studying existing code, documentation, and other design artefacts in order to gain a level of understanding that is sufficient for a given maintenance task. This programme comprehension process is known to be very time-consuming, and in an investigation done by Basili, he reports that 50% to 60% of

60

S. Alouneh et al.

the software engineering effort is spent on understanding the software system (Basili, 1997; Cornelissen et al., 2008; LaToza et al., 2006). Execution traces formed from implemented code mirrors software’s actual implementation. This trace information can be used to recover relational patterns between different entities such as functions, files, or modules. Some solutions for the detection of patterns and their visualisation exist, but are limited to small amounts of trace information data and are incapable of comparing data from different versions of a large software system. However, execution traces can be comprised of millions of lines. Therefore, it is not feasible to explore all the content of the execution traces in order to be able to understand the software. Dynamic analyses based on execution traces are used in software testing, software performance analysis, distributed and parallel systems evaluation, and to some extent also in programme comprehension and reengineering. One challenge with dynamic data is its size: simple scenarios can result in very large execution traces. Because of that, researchers have investigated compression techniques to cope with the size challenge (Bogdan et al., 2013). It is worth to note that measuring a trace is a straightforward process. One simple approach is to report the file size or the number of lines in the trace file. In fact, measuring and analysing a trace file could be a very sophisticated and advanced process. Execution traces have been used in programme comprehension to facilitate understanding about interactions between building blocks of a software system. Further, they have been used to dynamically discover likely programme invariants that must be preserved when modifying and evolving a source code (Ernst et al., 2001). We present in this paper an approach to analyse and view the amount of information contained in an execution trace. In addition, this proposal can be used to evaluate the software’s security using the logs files. The core approach of this work is based on the relational algebra database concepts. Our method can be divided into two parts. The first part defines the appropriate database operations that can be applied to execution trace files. The second part transforms the execution trace files into compatible database files. The reason why we are using execution traces is that execution traces contain more relevant information such as thread data or interaction patterns. In some cases, source codes of some software objects are not available when doing maintenance due to legacy codes and others. The organisation of this paper is as follows. Section 2 reviews the related work in this field. Section 3 presents preliminary background on execution trace analysis. Section 4 illustrates our methodology. The evaluation and analysis of this approach is presented in Section 5. Finally, Section 6 concludes the paper and gives some future research directions.

2

Related work

In this section, we highlight the importance of analysing execution traces by showing their key role in software maintenance. We introduce the concept of trace summarisations that will help us propose our approach. Execution trace summarisation is defined as a process of summarising the main contents of a large trace by removing the details. This is similar to text summarisation where abstracts can be extracted from large documents without reading entirely its

Software comprehension based on database relational algebra

61

content. It is a powerful technique for extracting the behavioural models in a poorly documented system. However, based on software engineer’s expertise as well as the maintenance task performed, we expect that software engineers will need to have the flexibility to adjust the content of the summary if they find it too abstract or too detailed. Amyot et al. (2002) suggest tagging the source code at particular places in order to generate a trace that can later be represented using a use case map. This approach has the obvious drawback that it requires from the software engineers to know, in advance, where to insert the tags. It also necessitates the usage of static analysis tools. However, the authors recognised that visualising the main flow of execution of the feature under study can help understand the system dynamics. Systä’s et al. (2001) approach involves using a static analysis tool called Rigi (Müller and Klashinsky, 1988) to allow software engineers of her dynamic analysis tool to select the components of the system that need to be traced. This approach requires a considerable amount of time and precludes prior knowledge of the system’s static aspects. Other techniques that are used in reducing data redundancy of execution traces are based on compression techniques and algorithms known in information theory. All of them are based on compressing data by removing redundancy (De Pauw et al., 1998). Based on this, the key concept of data compression is to “assign short codes to common events (symbols or phrases) and long codes to rare events” (De Pauw et al., 1998). Even though these techniques produce very good results, the information, once compressed, is no longer readable by humans. So such algorithms certainly will not help in programme comprehension. Hamou-Lhadj and Lethbridge (2010) propose a framework for compression which allows software engineers to better comprehend an execution trace by significantly reducing its size and at the same time keeping its content readable, hence the name comprehension-driven compression framework. Several existing trace analysis tools and techniques were introduced to achieve good coverage of the types of features available and to simplify the content of execution traces; we did not attempt to examine in detail all the tools that exist in the literature. Using these tools, software engineers can search for specific components, filter out unwanted events, collapse or expand parts of the trace, and perform various other analyses. A survey of many of these tools and their supported techniques has been investigated in Hamou-Lhadj and Lethbridge (2004). None of these approaches, however, analyse the differences between different execution traces. Other related works were presented by Jerding et al. (1997), and Hamou-Lhadj and Lethbridge (2004). Both present different but similar techniques to identify patterns and similarities in execution traces. This allows them to compress execution traces and store them in a more compact form. In Bogdan et al. (2013), the authors apply the data fusion process to combine multiple sources of execution information. This process can determine the location of the source code that implements a specific functionality in software. This paper uses techniques such as web mining and dynamic analysis algorithms. The main contribution of this paper is the use of web mining algorithms to analyse execution trace information. To better comprehend a software source code in maintenance process, the authors in reference (Silva et al., 2011) proposes a new method based on the analysis of execution traces for locating points in the source code where perfective changes should be performed. The assumptions in this paper assume that source codes are open source and poorly documented, and these assumptions are important for the maintenance process.

62

S. Alouneh et al.

More paper works on execution trace analysis and programme comprehension are presented in references (Basili, 1997; Cornelissen et al., 2008; LaToza et al., 2006; Fischer et al., 2005; Eitan et al., 2011). This paper focuses on execution traces to improve programme comprehension using relational database concepts. This paper is an extension to the work of Alouneh et al. (2012). In this paper, the evaluation and analysis section has been added and more details on background and related work are presented.

3

Background

When building a query, the objective is to find the function sequence that will produce a proper result. Since the operations of the relational algebra are quite simple, many inbetween results might have to be produced before the final result. The intermediate results are used as operands in the operations that produce new intermediate results. The structure of the query with its operations and intermediate results may be presented as a flow chart, as in the Figure 1. The rectangles are relations (original or operation results), and the ovals are operations. The operation name is inside each oval, and a condition or other specification may be written inside or beside it. We use these diagrams to clarify the stepwise construction of a query. Figure 1

Tree representation of routine calls (see online version for colours)

The relational algebra is considered the base for implementing and optimising queries in relational database management systems (RDBMSs). The basic set of operations of relational algebra is used in defining database structure and constraints. A sequence of relational algebra operations forms a relational algebra expression, whose result will be a

Software comprehension based on database relational algebra

63

relation that represents the result of database query or retrieval request. In the proposed approach, the required operations to be performed on execution traces are defined in Table 1. The relational algebra is often considered to be an integral part of the relational data model. Its operations are divided into two groups. First group includes set operations from mathematical set theory as shown in Table 1. Table 1

Database operation on files

Union

U

Project

π

Intersection Cartesian product

∩

Join

Χ

Select

σ

There are many rules which can be used to transform relational algebra operations to equivalent ones. We will state here some useful rules for query optimisation. In this section, we use the following notation: •

E1, E2, E3: denote relational algebra expressions.

•

X, Y, Z: denote set of attributes.

•

F1, F2, F3, F4: denote predicates (selection or join conditions).

The relational algebra operations have the following properties: •

Commutativity Join, Cartesian product operations E1 F E2 ≡ E2 F E1 E1× E2 ≡ E2 × E1

•

Cascade projection

(

)

π x l π x 2 (... ( π x n (E) ) ...) ≡ π x1 (E) •

Cascade selection σ F1∧ F2 ∧...∧ Fn (E) ≡ σ F1 ( σ F2 (... ( σ Fn (E) ) ...) )

The queries are posed to a DBMS by interactive users or by programmes written in any high level language. The queries under go through a path which consists of the following: Query parser → Query optimiser → Code generator/interpreter → Query processor.

4

Relational database execution trace analysis approach

Software engineers generate several types of traces including routines calls, inter-process communication, and executed statements. In fact, one can trace just about any aspect of the software system. It is very common that traces, once generated, are saved in text files. A trace file usually contains a sequence of lines where each line represents an event. For the following example, each line record consists of the following items: the class of sender, the identity of the sender, the class of the receiver, the identity of the receiver, the method/function invoked, and the order of calls can be maintained using an integer that is

64

S. Alouneh et al.

added to represent the nesting level of calls or recording the entry and exit of the method. Figure 2 and Table 2 illustrate this example for a generated test execution traces file. In relational database model, Table 2 represents an entity or relation. Therefore, the representation of execution traces in the form of relational database tables is considered the first step in analysing the trace files. The second step is to interpret and analyse the content of execution trace tables. Figure 3 shows an architectural view of our approach. The input to our tool is the execution trace file(s), and in addition the logs file(s) which are used for security analysis. After that, the execution trace tables are generated and checked to be compatible for database relational algebra operations. The next stage is to apply the required operations on these generated execution trace tables. Finally, we are able to get the final reduced execution trace files. Our main purpose when analysing trace files is to reduce the redundant and un-necessary information. Also, it is very important to retrieve the required information in an optimised form. Figure 2

Table 2

Routine calls representation (see online version for colours)

Events that occur in the trace of Figure 2

Sender class

Sender ID

Receiver class

Receiver ID

Called method

Nesting level

C1

Obj1

m0

0

C1

Obj1

m1

0

C1

Obj1

C2

Obj2

m1

1

C2

Obj2

C3

Obj3

m1

2

C2

Obj2

C2

Obj2

m2

2

C3

Obj3

C3

Obj3

m2

3

C3

Obj3

C3

Obj3

m3

3

The relational algebra operations provide means to handle these issues. In the following discussion, we show how relational algebra operations can be used in trace execution analysis.

Software comprehension based on database relational algebra

65

When maintenance programmers inspect the trace files, sometimes it is very important to retrieve the required information efficiently. The select relational operation helps to do that. The select operation on a relation R(Xn) is denoted by:

σ R ( X n )

(1)

The select operation reduces the number of tuples (i.e., rows) in a relational database table. From the trace execution analysis point of view, the select operation can be used to reduce the number of execution trace tuples. The reduction of tuples depends on the condition applied. On the other hand, to reduce the number of attributes (i.e., columns) in relation R, the project operation can be used. In execution trace analysis, it is more efficient to include the attributes of our interest in the table. This operation helps to reduce the size of tracing tables, especially when more than one tracing tables are required to be joint. The explanation of this point is illustrated when we discuss the JOIN operation. The project operation is denoted by: Π < list of attributes > R ( X n )

(2)

If we think of the trace file as a relational database table, the operations 1 and 2 can be applied to the trace events of Table 2. The programmer can select the required and useful information for his analysis using the select condition of his interest. The same applies for the project operation. The programmer can represent the trace table by including only the required and useful information. This approach will have more impact when it is required to join more one than trace files together. Figure 3

Architectural view of our approach (see online version for colours)

66

S. Alouneh et al.

It is worth to note that software traces can be separated into many files. Therefore, it is very important to find a mechanism that can combine these files in the most efficient and optimised manner. A relational database model can achieve this goal by using its implemented and optimised operations. The JOIN operation, denoted by: R ( X n ) < join condition > S

(3)

where S and R denote the two trace execution tables to be combined. The join operation provides an efficient method to combine trace execution tables by only selecting the trace tuples that match the joining condition. It is worth to note that relational database operations can help to restructure the trace execution files in order to improve programme comprehension and analysis. The following points illustrate the main advantages of our approach: •

Reducing redundancy: this goal can be achieved by relational algebra operations select, project, join, sets (i.e., union, intersection, and set difference).

•

Reducing trace analysis complexity: when trace execution files are organised according to relational database structure such as tabular format, then searching for needed trace data becomes more efficient. In addition, the use of relational database operations can help to retrieve information of interest in an optimised time and overhead.

•

Improve software compression: The use of relational database operations can ease the software comprehension. This is specially an important factor in software maintenance phase. The tabular structure of execution trace files make these files more readable and easier to be understood.

•

Security of execution traces information: The relational DBMS provide means of access control to the data. Therefore, this security option can be applied to execution trace data files by using a discretionary authorisation mechanism (i.e., views). For example, the owner of the execution traces can designate different view levels of the execution trace tables to different programmers. In other words, each maintenance programmer files include only those attributes that are granted by the owner for that specific maintenance task.

On the other hand, in computer systems administration, logs files are used by system engineers to trace and evaluate the performance and security of software applications and operating systems. In very often cases, the logs files are huge and very complex to be traced and analysed. Therefore, our scheme that uses execution trace analysis based on relational algebra can be used to evaluate the security of software for some security threats such as buffer overflow vulnerabilities. To illustrate more, the logs files may include tracking information about the number and locations of stack return address override occurrences. Also, the source files and execution trace files consist of the vulnerable functions which cause these buffer overflow attacks. Therefore, we can feed all these key elements to be part of the relational algebra testing factors, thus help in understanding the security risks in the software to be maintained.

Software comprehension based on database relational algebra

5

67

Evaluation and analysis

In this section, the evaluation results that are obtained from testing the Clipper open source benchmark library (http://angusj.com/delphi/clipper.php) are presented. The clipper library is used to handle Boolean clipping operations on 2D polygons. Figure 4 shows the reduction ratio in the size of execution trace files. Generally, the reduction ratio is found to be more significant as the size of execution trace files increases. In addition to that, the reduction ratio is found to be greater on the JOIN relational operation than the SELECT operation. The reason for this is that the JOIN operation is accompanied with SELECT statement and can combine records from two or more tables in the database. In Figure 4, it can be noticed that the reduction ratio increases whenever the trace execution files get larger. Indeed, this is a realistic conclusion because the reduced files consist of tables abstracts of same redundant exaction traces. Figure 5 shows the impact of our method on security. Indeed, factors that have been studied are the false positive and false negative alarms for the analysis of logs files. The false positive alarm is triggered to indicate a possible security risk in the programme. However, the risk might not exist. On the other hand, false negative factor is used to indicate that the programme is safe while in fact there are risky parts that might cause security wholes. The false positive ratio can be high, however, the false negative ratio should be low as much as possible as it is not desirable to have some potential risks in the system that are not discovered. Figure 4

Reduction ratio for execution trace file with variable sizes (see online version for colours)

68

S. Alouneh et al.

Figure 5

Logs file security testing (see online version for colours)

The false and negative factors are of great importance for understanding and or comprehending the security risks in software. These factors should be tested in any maintenance process in advance in order to give the appropriate decision when evaluating and analysing the programme from the security point of view. Moreover, our method is found to be very efficient in making the programme comprehension easier due to the reduction in the execution trace files sizes. The ability to combine different intersected tables for variable execution trace files make the trace analysis and tracing more flexible and time efficient. This is possible because of the database relational algebra operations that can be applied on execution trace files. The reduction in the execution trace file size can help in making the readability and understanding easier. Also, this technique can limit the information to be viewed in the maintenance and analysis processes.

6

Conclusions

Execution trace data is another keystone for software evolution analysis. The properties of execution traces, such as detailed information about ‘scheduling’ data, invocation patterns, call frequency, nesting levels, or threading, can complement results gained from release history and architectural analysis. Further research will have to show how dependencies between these three dimensions can be exploited. In this paper, we presented an approach that is based on analysing execution trace files using relational database methods to better understand and maintain software(s). Our method is found to be efficient in making programme comprehension an easier process by reducing the size of execution trace file(s). In addition, we also tested this approach on log trace files. This has an advantage to get security warnings from the logs files and re-present then in tabular database forms for further analysis and from a security point of view. For example, buffer overflow security warnings generated from the log files can be simply combined with execution trace files

Software comprehension based on database relational algebra

69

tables so that maintenance software engineers can have a better understanding on the security holes in the software product and then can decide on the appropriate actions. For future work, we plan to integrate existing pattern detection approaches to reduce the amount of information and improve the comparability of different software versions. Further, we need to develop an automatism to compute differences for given execution traces which would allow us a more focused analysis. However, we plan to use other compression techniques to analyse the differences on a finer grained level and to identify the similarities and differences between the traces of different releases.

References Alouneh, S., Abed, S., Mohd, B.J. and Al-Khasawneh, A.M. (2012) ‘Relational database approach for execution trace analysis’, in Proc. of the IEEE International Conference on Computer, Information and Telecommunication Systems (CITS 2012), Amman, Jordan, 14–16 May, pp.1–4. Amyot, D., Mussbacher, G. and Mansurov, N. (2002) ‘Understanding existing software with use case map scenarios’, in Proceedings of the 3rd SDL and MSC Workshop, LNCS, Vol. 2599, pp.124–140. Basili, V.R. (1997) ‘Evolving and packaging reading technologies’, Journal of Systems and Software, Vol. 38, No. 1, pp.3–12. Bogdan, D., Meghan, R. and Denys, P. (2013) ‘Integrating information retrieval, execution and link analysis algorithms to improve feature location in software’, Empirical Software Engineering Journal, Vol. 18, No. 2, pp.277–309, Springer, ISSN: 1382-3256. Clipper open source benchmark library [online] http://angusj.com/delphi/clipper.php. Cornelissen, B., Zaidman, A., Holten, D., Moonen, L., van Deursen, A. and van Wijk, J.J. (2008) ‘Execution trace analysis through massive sequence and circular bundle views’, J. Syst. Softw., Vol. 81, No. 12, pp.2252–2268. De Pauw, W., Lorenz, D., Vlissides, J. and Wegman, M. (1998) ‘Execution patterns in object-oriented visualization’, in Proc. of the 4th USENIX Conference on Object-Oriented Technologies and Systems, Santa Fe, NM, pp.219–234. Eitan, N., Gordon, M., Harel, D., Marron, A. and Weiss, G. (2011) ‘On visualization and comprehension of scenario-based programs’, 2011 IEEE 19th International Conference on Program Comprehension (ICPC), 22–24 June, pp.189–192. Ernst, M.D., Cockrell, J., Griswold, W.G. and Notkin, D. (2001) ‘Dynamically discovering likely program invariants to support program evolution’, IEEE Transactions on Software Engineering, Vol. 27, No. 2, pp.99–123. Fischer, M., Oberleitner, J., Gall, H. and Gschwind, T. (2005) ‘System evolution tracking through execution trace analysis’, in Proceedings of the 13th International Workshop on Program Comprehension (IWPC ‘05), IEEE Computer Society, Washington, DC, USA, pp.237–246. Hamou-Lhadj, A. and Lethbridge, T. (2004) ‘A survey of trace exploration tools and techniques’, in Proc. of the 14th IBM Conference of the Centre for Advanced Studies on Collaborative Research, Toronto, ON, Canada, pp.42–55. Hamou-Lhadj, A. and Lethbridge, T.C. (2004) ‘A metamodel for dynamic information generated from object-oriented systems’, Electronic Notes in Theoretical Computer Science, May, Vol. 94, pp.59–69. Hamou-Lhadj, A. and Lethbridge, T.C. (2010) ‘Understanding the complexity embedded in large routine call traces with a focus on program comprehension tasks’, IET Software, Vol. 4, No. 2, pp.161–177.

70

S. Alouneh et al.

Jerding, D.F., Stasko, J.T. and Ball, T. (1997) ‘Visualizing interactions in program executions’, in Proceedings of the 19th International Conference on Software Engineering (ICSE), IEEE Computer Society Press, May, pp.360–371. LaToza, T.D., Venolia, G. and DeLine, R. (2006) ‘Maintaining mental models: a study of developer work habits’, in Proc. 28th Int. Conf. on Software Engineering (ICSE), ACM, pp.492–501. Müller, H.A. and Klashinsky, K. (1988) ‘Rigi – a system for programming in-the-large’, in Proceedings of the 10th International Conference on Software Engineering, ACM Press, pp.80–86. Silva, L.L.P.o., de Amo, K.R. de Almeida Maia, S.M. (2011) ‘On the use of execution trace alignment for driving perfective changes’, 15th European Conference on Software Maintenance and Reengineering (CSMR) 2011, 1–4 March, pp.221–230. Systä, T., Koskimies, K. and Müller, H. (2001) ‘Shimba – an environment for reverse engineering Java software systems’, Software – Practice and Experience, Vol. 31, No. 4, pp.371–394.

Lihat lebih banyak...

Software comprehension based on database relational algebra

Descripción

Comentarios