EASE 2007

11th International Conference on Evaluation and Assessment in Software Engineering

2-3 April 2007

Program

Monday 2nd April 2007
8:30-9:15 Registration
9:15-10:30

Keynote Address - Khaled El-Emam

Chair: Barbara Kitchenham

10:30-11:00 Coffee
11:00-12:30

Experimental Comparison of the Comprehensibility of a UML-based Formal Specification versus a Textual One
Rozilawati Razali, Colin F Snook, Mike R Poppleton, Paul W Garratt, and Robert J Walters

Assessment of a Framework for Comparing Software Architecture Analysis Methods
Muhammad Ali Babar, and Barbara Kitchenham

12:30-14:00 Lunch
14:00-15:30

Feature Selection and Clustering in Software Quality Prediction
Qi Wang, Jie Zhu, and Bo Yu

Predicting Short-Term Defect Inflow in Large Software Projects – An Initial Evaluation
Miroslaw Staron, and Wilhelm Meding

15:30-16:00 Coffee
16:00-17:30

An Experiment Measuring the Effects of Maintenance Tasks on Program Knowledge
Alistair Hutton and Ray Welland

Outsourcing and Knowledge Management in Software Testing
Katja Karhu, Ossi Taipale, and Kari Smolander

18:30 Reception and Conference Dinner

 

Tuesday 3rd April 2007
9:00-10:30

Keynote Address - Barbara Kitchenham
The Current State of Evidence-Based Software Engineering

Preliminary results of a study of the completeness and clarity of structured abstracts
David Budgen, Barbara Kitchenham, Stuart Charters, Mark Turner, Pearl Brereton, and Stephen Linkman
10:30-11:00 Coffee
11:00-12:30

A framework for effort management in software projects
Topi Haapio

Predicting Web Development Effort Using a Bayesian Network
Emilia Mendes

12:30-14:00 Lunch
14:00-15:30

Systematic Review of Statistical Process Control: An Experience Report
Maria Teresa Baldassarre, Danilo Caivano, Barbara Kitchenham, and Giuseppe Visaggio

Motivators of Software Process Improvement: An Analysis of Vietnamese Practitioners’ Views
Mahmood Niazi, and Muhammad Ali Babar

Closing Remarks

15:30 - 16:00 Coffee

Posters

The following papers will be presented as informal posters during the conference.

The use of Grounded Theory in the evaluation of computer software
Trevor Barker

 

Abstracts

Experimental Comparison of the Comprehensibility of a UML-based Formal Specification versus a Textual One
Rozilawati Razali, Colin F Snook, Mike R Poppleton, Paul W Garratt, and Robert J Walters

The primary objective of a software specification is to promote a shared understanding of system properties among stakeholders. Specification comprehensibility is particularly essential during software validation and maintenance, as it allows system properties to be understood more easily and quickly prior to the required tasks. A formal notation such as B increases a specification's precision and consistency. However, such notation is regarded as difficult to comprehend due to its unfamiliar symbols and rules of interpretation. A semi-formal notation such as the Unified Modelling Language (UML) is perceived as more accessible, but it cannot be verified systematically to ensure a specification's accuracy. Integrating UML and B could perhaps produce a specification that is both accurate and approachable. This paper presents an experimental comparison of the comprehensibility of a UML-based graphical formal specification versus a purely textual formal specification. The measurement focused on efficiency in performing comprehension tasks. The experiment employed a cross-over design and was conducted with forty-one third-year and masters students. The results show that the integration of semi-formal and formal notations expedites the subjects' comprehension tasks, with accuracy, even after limited hours of training.

 

Assessment of a Framework for Comparing Software Architecture Analysis Methods
Muhammad Ali Babar, and Barbara Kitchenham

We have developed a framework, FOCSAAM, for comparing software architecture analysis methods. FOCSAAM can help architects and managers choose a specific method to support the architecture analysis process. We have been assessing the suitability of the framework's elements in different ways. During the development of FOCSAAM, a theoretical assessment was performed by relating each of its elements to the published literature on quality assurance, process improvement, and software development approaches. Moreover, we have found that most of the elements of FOCSAAM can be mapped onto the elements of NIMSAD, a well-known framework for comparing information systems development methods. The goal of this study was to further assess the suitability of the different elements of FOCSAAM using the expert opinion approach. We asked 17 practicing architects with extensive experience to assess the suitability of the elements of FOCSAAM for selecting a particular method to support the software architecture analysis process. The findings of this study provide support for including each element of FOCSAAM in criteria for comparing software architecture analysis methods.

 

Feature Selection and Clustering in Software Quality Prediction
Qi Wang, Jie Zhu, and Bo Yu

Software quality prediction models use software metrics and fault data collected from previous software releases or similar projects to predict the quality of software components in development. Previous research has shown that such models can yield predictions with impressive accuracy. However, building an accurate software quality prediction model remains challenging for the following two reasons. Firstly, outliers in software data often have a disproportionate effect on the overall predictive ability of the model. Secondly, not all collected software metrics should be used to construct the model, because of the curse of dimensionality. To resolve these two problems, we present a new software quality prediction model based on a genetic algorithm (GA) in which outlier detection and feature selection are executed simultaneously. The experimental results illustrate that this model performs better than recently proposed software quality prediction models based on S-PLUS and TreeDisc. Furthermore, the clustered software components and selected features are easier for software engineers and data analysts to study and interpret.
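As a hedged illustration of the kind of search this abstract describes (not the authors' actual model, which couples outlier detection and clustering with the GA), the sketch below evolves binary feature-selection masks with elitism, tournament selection, one-point crossover and bit-flip mutation. The `USEFUL` set and all parameters are hypothetical toy values:

```python
import random

def evolve(fitness, n_features, pop_size=20, generations=40, seed=1):
    """Minimal GA over binary feature-selection masks (illustrative sketch)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = [fitness(best)]                # best score seen at each generation
    for _ in range(generations):
        nxt = [best[:]]                      # elitism: never lose the best mask
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=fitness)   # tournament selection
            b = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_features)
            child = a[:cut] + b[cut:]        # one-point crossover
            child = [bit ^ 1 if rng.random() < 0.05 else bit
                     for bit in child]       # bit-flip mutation
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history

# Toy fitness: reward overlap with a hypothetical useful subset, penalise size.
USEFUL = {0, 2, 5}
def fitness(mask):
    chosen = {i for i, bit in enumerate(mask) if bit}
    return 3 * len(chosen & USEFUL) - len(chosen)

best, hist = evolve(fitness, n_features=8)
```

Thanks to elitism, the best score per generation never decreases; in a real model the fitness would instead score a clustering or prediction built from the selected metrics.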

 

Predicting Short-Term Defect Inflow in Large Software Projects – An Initial Evaluation
Miroslaw Staron, and Wilhelm Meding

Predicting defect inflow is important for project planning and monitoring purposes. For project planning and quality management, an important measure is the trend of defect inflow in the project, i.e. how many defects are reported at a particular stage of the project. Predicting the defect inflow provides an early notification of whether the project is going to meet its goals. In this paper we present and evaluate a method for predicting defect inflow in large software projects: a method for short-term predictions, up to three weeks in advance, on a weekly basis. The method is evaluated by comparing it to existing defect inflow prediction practices (e.g. expert estimations) in one of the large projects at Ericsson. The results show that the method provides more accurate predictions in most cases, while requiring less time to construct than the company's current practices.
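The paper's own prediction method is not reproduced here; as a point of reference, a naive moving-average baseline for three-weeks-ahead weekly defect inflow might look like the following sketch (all numbers are made up):

```python
def predict_inflow(weekly_counts, horizon=3, window=4):
    """Naive baseline: forecast the next `horizon` weeks of defect inflow
    as the mean of the last `window` observed weeks. A hypothetical
    baseline, not the method evaluated in the paper."""
    recent = weekly_counts[-window:]
    level = sum(recent) / len(recent)
    return [round(level, 1)] * horizon

history = [12, 15, 11, 14, 18, 16, 17, 19]   # made-up weekly defect counts
forecast = predict_inflow(history)           # three identical weekly forecasts
```

Any proposed short-term predictor would need to beat this kind of trivial baseline, as well as the expert estimations mentioned in the abstract.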

 

An Experiment Measuring the Effects of Maintenance Tasks on Program Knowledge
Alistair Hutton and Ray Welland

Objective: To ascertain whether programmers gain more knowledge about an unfamiliar program by enhancing the code or documenting the code. The context of this work was investigating whether maintenance programmers faced with an unfamiliar system should start by actively working on the system or spend time passively exploring the system before attempting to make changes.
Method: We designed a laboratory experiment where subjects initially either enhanced or documented a program and then we measured how they performed when carrying out a further task on the given code. Our hypothesis was that programmers would gain more knowledge performing one of the two tasks. The experiment was repeated three times with different groups of students, all at the same stage of their education.
Results: There was no significant difference between the performance of the two groups who had performed different initial tasks. However, there was a strong correlation between performance in the measured task and the students’ programming ability, as measured by a previous academic assessment. As not all subjects completed the measured task within the given time, we needed to use Kaplan-Meier survival curves and the Cox Proportional Hazard Model to analyse our data. Detailed inspection of the code produced during the experiment revealed some interesting qualitative results.
Conclusions: We were unable to show a significant difference between the value of enhancing or documenting code as a way of gaining knowledge about unfamiliar programs. In the context of software maintenance this means that there is no advantage in spending unproductive time documenting code to gain knowledge.
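The Kaplan-Meier estimator mentioned in the Results can be computed directly. The sketch below is a plain-Python product-limit estimate over hypothetical task-completion times, treating subjects who ran out of time as right-censored; a Cox proportional hazards model would normally be fitted with a statistics package rather than by hand:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate for right-censored data.
    times  : observed durations (e.g. minutes until task completion)
    events : 1 if the task was completed, 0 if censored (time ran out)
    Returns [(t, S(t))] at each time where a completion occurred."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    s, curve = 1.0, []
    for t in sorted(set(times)):
        group = [e for tt, e in data if tt == t]
        d = sum(group)                      # completions at time t
        if d:
            s *= 1 - d / at_risk            # product-limit step
            curve.append((t, s))
        at_risk -= len(group)               # everyone at t leaves the risk set
    return curve

# Hypothetical data: six subjects, 40-minute limit; two never finished.
curve = kaplan_meier([20, 25, 25, 30, 40, 40], [1, 1, 0, 1, 0, 0])
```

Here S(t) is the estimated probability that a subject is still working at time t; the censored subjects reduce the risk set without registering a completion.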

 

The Current State of Evidence-Based Software Engineering
Barbara Kitchenham

Background: Recently, Dybå, Kitchenham, and Jørgensen wrote a series of articles proposing the adoption of the evidence-based paradigm in Software Engineering research and practice. The evidence-based paradigm relies on the use of systematic literature reviews to provide a rigorous method for identifying, analyzing, and synthesizing research on a specific topic or research question. It is also concerned with providing best practice guidelines for practitioners.
Aims: The goal of this presentation is to describe the extent to which the evidence-based paradigm has been adopted in Software Engineering since 2004.
Method: We performed a tertiary review to determine the current extent of adoption of the evidence-based paradigm. A tertiary review is a review of systematic literature reviews (which are themselves secondary studies). We manually searched articles in IEEE Trans on SE, IEEE Software, JSS, IST, CACM, ACM Surveys, ISESE05, ICSE05 and ICSE06 and identified all articles that conformed to our inclusion criteria. We excluded papers that were subjective reviews (with no defined question, minimal search criteria, no data extraction process, and no formal aggregation process). Papers were classified by type (SLR, EBG, MA), by subject area, and by scope (research trends or specific research question). SLR papers were also evaluated for quality against four criteria.
Results: We found 23 papers. There was one meta-analysis, 20 systematic literature reviews (SLRs), 4 evidence-based guidelines (two of which included SLRs), and two papers that positioned themselves as evidence-based software engineering papers (both of which included SLRs). It is interesting to note that we found no SE or CS SLRs in ACM Surveys.
Nine of the 20 SLRs addressed research trends rather than specific research questions. In terms of topic area, nine articles related to cost estimation, four related to research trends in SE experiments and three related to test methods. In the area of cost estimation, researchers are addressing well-defined research questions and obtaining useful results. This is also an area where researchers have proposed evidence-based guidelines for practitioners.
EBSE is strongly supported by the Simula Research Laboratory (11 out of 23 papers) and European researchers (17 papers). There are relatively few North American researchers (4 papers).
In terms of quality, very few SLRs provide a quality assessment of their primary studies (only 3 fully and 4 partly).
Conclusions: Currently, researchers in Europe (particularly those interested in cost estimation) are at the forefront of evidence-based software engineering, but it appears to have had limited take-up in North America.
The large number of papers addressing research trends is rather disappointing since research trends are primarily of interest to academics rather than practitioners. However, five of these papers suggest ways in which software engineering experiments can be improved. In the longer term this should have a benefit for practitioners by improving the quality of software engineering research and hence increasing the reliability of software engineering evidence.
Another particular concern is the lack of quality assessment of primary studies, since this is a critical factor in SLR methodology. However, this is partly due to the relatively large number of SLRs that consider research trends rather than specific research questions.

 

Outsourcing and Knowledge Management in Software Testing
Katja Karhu, Ossi Taipale, and Kari Smolander

The objective of this empirical study was to explore outsourcing in software testing and to shape hypotheses that explain the association between outsourcing and knowledge management. First, a survey of testing practices was conducted and 26 organizational units (OUs) were interviewed. From this sample, five OUs were further selected for an in-depth case study. The study used qualitative grounded theory as its research method, and the data was collected from 41 theme-based interviews. The analysis yielded several hypotheses: the business orientation of an OU affects both the outsourcing of testing and the knowledge management strategy; outsourcing seems to be more effective when independent testing agencies have sufficient domain knowledge; and outsourcing verification tasks is more difficult than outsourcing validation tasks. The results of this study can be used in developing a knowledge management strategy and as guidance in making outsourcing decisions.

 

Preliminary results of a study of the completeness and clarity of structured abstracts
David Budgen, Barbara Kitchenham, Stuart Charters, Mark Turner, Pearl Brereton, and Stephen Linkman

Context: Systematic literature reviews rely largely upon the titles and abstracts of primary studies as the basis for determining their relevance. However, our experience indicates that the abstracts of software engineering papers are frequently of such poor quality that they cannot be used to determine the relevance of papers. Both medicine and psychology recommend the use of structured abstracts to improve abstract quality.
Aim: This study investigates whether structured abstracts are more complete and easier to understand than non-structured abstracts for software engineering papers that describe experiments.
Method: We constructed structured abstracts for a random selection of 25 papers describing software engineering experiments. The original abstract was assessed for clarity (assessed subjectively on a scale of 1 to 10) and completeness (measured with a questionnaire of 18 items) by the researcher who constructed the structured version. The structured abstract was reviewed for clarity and completeness by another member of the research team. We used a paired t-test to compare the word length, clarity and completeness of the original and structured abstracts.
Results: The structured abstracts were significantly longer than the original abstracts (size difference = 106.4 words with 95% confidence interval 78.1 to 134.7). However, the structured abstracts had a higher clarity score (clarity difference = 1.47 with 95% confidence interval 0.47 to 2.41) and were more complete (completeness difference = 3.39 with 95% confidence interval 4.76 to 7.56).
Conclusions: The results of this study are consistent with previous research on structured abstracts. However, in this study, the subjective estimates of completeness and clarity were made by the research team. Future work will solicit assessments of the structured and original abstracts from independent sources (students and researchers).
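For readers unfamiliar with the paired t-test used above, the statistic is the mean within-pair difference divided by its standard error. A minimal sketch on hypothetical completeness scores follows; the critical value or confidence interval would normally come from a t-table or a statistics library:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t statistic and degrees of freedom for two matched samples
    (illustrative sketch; not the study's actual data)."""
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)      # standard error of the mean difference
    return mean(diffs) / se, n - 1

# Hypothetical completeness scores (out of 18) for 4 abstract pairs
original   = [10, 12, 14, 16]
structured = [12, 13, 17, 18]
t, df = paired_t(original, structured)
```

Pairing each original abstract with its own structured rewrite removes between-paper variation, which is why the study compares differences rather than the two groups' raw means.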

 

A framework for effort management in software projects
Topi Haapio

Objective: The objective of this paper is to provide a framework for effort management in software projects to increase effort estimation accuracy.
Method: We applied a multimethodological approach employing a case study and constructive research.
Results: Based on a case study of four previously proposed frameworks related to effort management and three popular process maturity assessment models, we constructed a framework for managing effort in a software project in a proactive manner while still fulfilling the requirements of the assessment models.
Conclusions: A project's software engineering process involves various functions which affect the success of cost estimation. Two approaches, proactive and reactive, can be taken to manage effort in a software project. In this paper, a case study covering both approaches is conducted. Based on the case study, we provide a new framework for effort management in software projects to increase effort estimation accuracy.

 

Predicting Web Development Effort Using a Bayesian Network
Emilia Mendes

OBJECTIVE - The objective of this paper is to investigate the use of a Bayesian Network (BN) for Web effort estimation.
METHOD - We built a BN automatically using the HUGIN tool and data on 120 Web projects from the Tukutuku database. In addition, the BN model and node probability tables were validated by a Web project manager from a well-established Web company in Rio de Janeiro (Brazil). Accuracy was measured using data on 30 projects (validation set) and point estimates (1-fold cross-validation using an 80%-20% split). The estimates obtained using the BN were also compared to estimates obtained using forward stepwise regression (SWR), as this is one of the most frequently used techniques for software and Web effort estimation.
RESULTS - Our results showed that BN-based predictions were better than previous predictions from Web-based cross-company models, and significantly better than predictions using SWR.
CONCLUSIONS - Our results suggest that, at least for the dataset used, the use of a model that allows the representation of uncertainty, inherent in effort estimation, can outperform other commonly used models, such as those built using multivariate regression techniques.

 

Systematic Review of Statistical Process Control: An Experience Report
Maria Teresa Baldassarre, Danilo Caivano, Barbara Kitchenham, and Giuseppe Visaggio

Background: A systematic review is a rigorous method for assessing and aggregating research results. Unlike an ordinary literature review consisting of an annotated bibliography, a systematic review analyzes existing literature with reference to specific research questions on a topic of interest.
Objective: Statistical Process Control (SPC) is a well-established technique in manufacturing contexts that has only recently been used in software production. Software production is unlike manufacturing because it is human- rather than machine-intensive, and results in the production of single, one-off items. It is therefore pertinent to assess how successful SPC is in the context of software production. These considerations motivated us to define and carry out a systematic review to assess whether SPC is being used effectively and correctly by software practitioners.
Method: A protocol was defined according to the systematic literature review process and was revised and refined by the authors. The review is currently being carried out.
Results: We report our considerations and preliminary results in defining and carrying out a systematic review on SPC, and describe how graduate students were involved in reviewing a first set of the papers.
Conclusions: Our first results and impressions are positive. Moreover, involving graduate students has been a successful experience.

 

Motivators of Software Process Improvement: An Analysis of Vietnamese Practitioners’ Views
Mahmood Niazi, and Muhammad Ali Babar

OBJECTIVE - In this paper we present findings from an empirical study that was aimed at determining software process improvement (SPI) motivators. The main objective of this study is to provide SPI practitioners with some insight into designing appropriate SPI implementation strategies and to maximize practitioner support for SPI.
METHOD - We used face-to-face, questionnaire-based survey sessions as our main approach to collect data from twenty-three software development practitioners at eight Vietnamese software development companies. We asked practitioners to choose and rank various SPI motivators against five types of assessment (high, medium, low, zero, or do not know). From this, we propose the notion of 'perceived value' associated with each SPI motivator.
RESULTS - We have identified six 'high' perceived-value SPI motivators that are generally considered critical for successfully implementing SPI initiatives. These motivators are: cost beneficial, job satisfaction, knowledgeable team leaders, maintainable/easy processes, shared best practices, and top-down commitment. Our results show that developers are highly motivated by: career prospects, communication, cost beneficial, empowerment, knowledgeable team leaders, maintainable/easy processes, resources, shared best practices, top-down commitment, and visible success; managers are motivated by: job satisfaction, knowledgeable team leaders, maintainable/easy processes, meeting targets, shared best practices, and top-down commitment. Our results also show that practitioners in small and medium-sized companies are highly motivated by: cost beneficial, job satisfaction, knowledgeable team leaders, and maintainable/easy processes; practitioners in large companies are highly motivated by: cost beneficial, reward schemes, shared best practices, and top-down commitment.
CONCLUSIONS - We believe practitioners respond to SPI motivators on the basis of the perceived value of each motivator. This focus on the perceived value associated with each SPI motivator offers SPI practitioners opportunities to implement motivators that improve an organisation's SPI implementation capabilities. By analysing the degree to which practitioners value each of the SPI motivators, we can identify a number of critical SPI motivators for SPI practitioners.