EASE 2002


Case Study: Evaluating the Effect of Email Interruptions within the Workplace
Four Degrees of Separation: Intrusive and non-intrusive data collection
On the Vital Role and Difficulty of Definition and Measurement for the Validation of S.E. Theory
Supporting Communicability with Use Case Guidelines: An Empirical Study
Comparing Effort Estimates Based on Use Case Points with Expert Estimates
Requirements Problems in Twelve Software Companies: An Empirical Analysis
Techniques for Gathering Information about an Organisation and its Needs for Software Process Improvement
A Framework for Evaluating a Software Bidding Model
Benchmarking of Processes for Managing Product Platforms - a Case Study
An Approach Towards a Universal Software Evaluation Scheme
A Concerted Family of Experiments to Investigate the Influence of Context on the Effect of Inspection Techniques
Experiment about Test-first programming
Investigating the Influence of Software Inspection Process Parameters on Inspection Meeting Performance
Aggregating Viewpoints for Strategic Software Process Improvement – a Method and Case Study
Combining Quantitative Software Development Cost Estimation Precision Data with Qualitative Data from Project Experience Reports at Ericsson
Investigation of Product Process Dependency Models through Probabilistic Modeling
Validating Metrics for Data Warehouses
How productive is software development across time zones
Business Needs Driving IT Decisions - Using Feature Analysis and Stakeholder Evaluation in Rolls-Royce
Making Inferences with Small Numbers of Training Sets

Case Study: Evaluating the Effect of Email Interruptions within the Workplace

Thomas Jackson1,2, Ray Dawson2 and Darren Wilson 1
1 Danwood Group, Lincoln, UK
2 Loughborough University, Leics., UK – (t.w.jackson@lboro.ac.uk)


This experience report outlines the value of measuring communication processes through electronic monitoring and is a follow-up to the paper presented at the EASE 2001 conference. The use of email by employees at the Danwood Group was studied and it was found that the interrupt effect of email is non-trivial. Most users react to the arrival of an email almost as quickly as they respond to a telephone call, so the interrupt effect is comparable with that of a telephone call. The recovery time from an email interruption was, however, found to be significantly less than the published recovery time for telephone calls. It can be concluded, therefore, that while email is still less disruptive than the telephone, the way the majority of users handle their incoming email gives far more interruption than expected.
Through analysing the data captured, the authors have been able to create a set of recommended guidelines for email usage. The guidelines aim to increase employee efficiency by reducing the prominence of interruptions, restricting the use of email-to-all messages, setting up the email application to display the first three lines of each message, and checking for email less frequently. It is also recommended that staff be trained in how to use email more effectively.

Four Degrees of Separation: Intrusive and non-intrusive data collection

Stephen Owen, David Budgen and Pearl Brereton
Department of Computer Science
University of Keele


Though the process of data collection forms the foundation of any empirical enquiry, it is often overshadowed by the more obvious problems of data analysis. The type of data collected can dictate the form of analysis that can be applied, and it is the type and quality of the collection method employed that determines whether the resultant analysis stands or falls.

The selection of collection methods may be further motivated by a need to ascertain why a given action is taken rather than simply what is done or how an action is performed. The need to determine 'why' requires different information to the other types of questions since enquiring into the reasoning behind the performance of a task requires access to the results of the cognitive process of the subject.
In an investigation into the manner in which software documentation is consulted during the development process, a combination of intrusive and non-intrusive methods of data collection has been used. We describe some suitable methods, and review the experiences of using two of them in combination.

On the Vital Role and Difficulty of Definition and Measurement for the Validation of S.E. Theory

Franck Xia, Dept. of Computer Science, University of Missouri-Rolla, Rolla, MO 65409, USA
E-mail: xiaf@umr.edu, Phone: 1-573-341-6737, Fax: 1-573-341-4501


This paper articulates two fundamental issues vital for the evaluation and assessment of software engineering theory: concept definition and measurement. We argue that the SE research community needs to define rigorously technical concepts and then develop sound measures before we design experiments for testing hypotheses and theories.

Supporting Communicability with Use Case Guidelines: An Empirical Study

Keith Phalp & Karl Cox
{kphalp, coxk}@bournemouth.ac.uk
Empirical Software Engineering Research Group,
School of Design, Engineering & Computing, Bournemouth University
Poole House, Talbot Campus, Fern Barrow, Poole, Dorset, UK


Recent research into use cases advocates guidance in the composition and structuring of
their descriptions. For example, the CREWS research project has proposed Authoring
Guidelines, and Cockburn's recent book has also furthered the field of writing
descriptions [1]. This paper therefore describes an experiment that compares the utility
of use-case writing guidelines, specifically our own (CP) guidelines, with those from the
CREWS group.
Replication of CREWS studies has indicated that use cases are improved through the
application of guideline sets, but that their evaluation is possibly flawed. Further
experimentation has focused on comparing our (leaner) CP Rules to those of CREWS.
Our pilot experiment appeared to show that the CP Rules performed as well as those
from CREWS across two types of scenario or domain. However, we further noted from
the pilot that the CP Structure Rules actually fared better for one of the scenarios. Hence,
the pilot seemed to suggest that the CP Rules would perform at least as well as the
CREWS Guidelines, whilst also flagging a possible need to consider the type of scenario.
The experiment now described thus attempts to confirm the findings of the pilot. In some
senses, this has been successful, in that we conclude the CP Structure Rules either fare as
well or better than those from CREWS. However, we also note that the improved CP
scores seen were for the opposite scenario to those for the pilot. Hence, whilst being
gratified by the general trend of the result we also consider the issues of combining
studies in this way.

Comparing Effort Estimates Based on Use Case Points with Expert Estimates

Bente Anda
Department of Informatics
University of Oslo
P.O. Box 1080 Blindern
NO–0316 Oslo
IBM Norway AS
P.O. Box 500
NO–1411 Kolbotn


Use case models are used in object-oriented analysis for capturing
and describing the functional requirements of a system. Attributes of a use case
model may therefore serve as measures of the size and complexity of the
functionality of a system. Many organizations use a system's use case model in
the estimation process.
This paper reports the results from a study conducted to evaluate a method for
estimating software development effort based on use cases, the use case points
method, by comparing it with expert estimates. A system was described by a
brief problem statement and a detailed use case model. The use case points
method gave an estimate that was closer to the actual effort spent on
implementing the system than most estimates made by 37 experienced
professional software developers divided into 11 groups (MRE of 0.21 versus
MMRE of 0.37).
The results support existing claims that the use case points method may be
used successfully in estimating software development effort. They also show
that the combination of expert estimates and method based estimates may be
particularly beneficial when the estimators lack specific experience with the
application domain and the technology to be used.
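The accuracy figures quoted above (MRE of 0.21 versus MMRE of 0.37) follow the standard definitions from the estimation literature: MRE is the magnitude of relative error of a single estimate, and MMRE its mean over a sample of estimates. A minimal sketch in Python, with invented numbers for illustration (not the study's raw data):

```python
def mre(actual: float, estimated: float) -> float:
    """Magnitude of relative error: |actual - estimated| / actual."""
    return abs(actual - estimated) / actual

def mmre(actuals, estimates) -> float:
    """Mean magnitude of relative error over paired samples."""
    return sum(mre(a, e) for a, e in zip(actuals, estimates)) / len(actuals)

# Invented example: an actual effort of 1000 hours and an estimate of
# 790 hours give an MRE of 0.21.
print(mre(1000, 790))  # 0.21
```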

Requirements Problems in Twelve Software Companies: An Empirical Analysis

Tracy Hall, Sarah Beecham & Austen Rainer
Department of Computer Science, University of Hertfordshire
<t.hall, s.beecham, a.w.rainer>@herts.ac.uk


In this paper we discuss our study of the problems twelve software companies experienced in their requirements process. The aim of the work is to develop a more holistic understanding of the
requirements process, so that companies can more effectively organise and manage requirements.
Our main findings support the conventional wisdom that the requirements process is a major source of problems in software development. Furthermore, our findings suggest that most requirements problems are organisational rather than technical. Our results also show that there is a relationship between companies' maturity and patterns of requirements problems.

Techniques for Gathering Information about an Organisation and its Needs for Software Process Improvement

Espen Frimann Koren, Simula Research Laboratory, Norway
Hans Westerheim, SINTEF Informatics, Norway


This paper reports on how the authors planned and conducted the work in the initial
phase of software process improvement in a Norwegian software house. The work
was motivated by the results from the projects SPIQ (Dybå, Wedde et al. 2000) and
PROFIT (http://www.geomatikk.no/profit), where a pragmatic approach to software
process improvement was the main focus. The way the work was conducted was
further motivated by the results of Tore Dybå (Dybå 2001). He claimed that for
instance participation of employees and exploitation of existing knowledge has
positive influence on the success of software process improvement. This led us to
arrange workshops with participants from all relevant parts of the organisation to
gather a picture as broad as possible of the situation.
The paper is organised as follows: First we describe the motivation behind the
choice of method. Then we describe the case study. Then we discuss the results we
got from our investigations. The paper ends with a conclusion and a description of
further work.

A Framework for Evaluating a Software Bidding Model

Barbara Kitchenham, Lesley Pickard, Stephen Linkman and Peter Jones
Email: barbara@cs.keele.ac.uk
Software Engineering Group
Department of Computer Science
Keele University
Keele, Staffs, ST5 5BG


This paper discusses the issues involved in evaluating a software bidding model. We found it difficult to assess the appropriateness of any model evaluation activities without a baseline or standard against which to assess them. This paper describes our attempt to construct such a baseline. We reviewed evaluation criteria used to assess cost models and an evaluation framework that was intended to assess the quality of requirements models. We developed an extended evaluation framework that will be used to evaluate our bidding model. Furthermore, we suggest the evaluation framework might be suitable for evaluating other models derived from expert opinion based influence diagrams.

Benchmarking of Processes for Managing Product platforms - a Case Study

Martin Höst, Enrico Johansson, Adam Noren, Lars Bratthall


This paper presents a case study where two organisations participate in a benchmarking
initiative in order to find improvement suggestions for their processes for managing
product platforms. The initiative is based on an instrument which consists of a list of
questions. It has been developed as part of this study and it contains eight major categories
of questions that guide the participating organisations to describe their processes.
The descriptions are then reviewed by the organisations cross-wise in order to
identify areas for improvement. The major objective of the case study is to evaluate the
benchmarking instrument in practice. The result is that the benchmarking procedure
with the benchmarking instrument was well received in the study. We can therefore
conclude that the approach is probably applicable for other similar organisations as well.

An Approach Towards a Universal  Software Evaluation Scheme

Franz Gruber, Christa Illibauer
Software Competence Center Hagenberg
Hauptstraße 99 A-4232 Hagenberg, Austria


Introducing new and innovative software solutions requires an appropriate underlying software and hardware infrastructure. Nearly every project therefore contains a task of comparing products on the market and finding the "optimal" one. Because requirements differ between projects, the design of a universal evaluation procedure is a challenging task.
Our approach differs from other work on evaluation in that we present a general evaluation scheme which can be applied to every software evaluation project. This general evaluation scheme includes mandatory criteria and aspects as well as their weights. We also describe the manner in which our evaluation scheme can be adapted to special needs. This approach has the advantage of being customizable and highly flexible with respect to specific, differing requirements.
Another main advantage of our approach is that the whole process can be supported by software. This software covers administration, execution and weighting, as well as documentation of the evaluated products. It also provides the administration of different weightings in one

A Concerted Family of Experiments to Investigate the Influence of Context on the Effect of Inspection Techniques

Marcus Ciolkowski
University of Kaiserslautern
and Fraunhofer IESE,
Kaiserslautern, Germany
Forrest Shull
Fraunhofer Center Maryland,
Stefan Biffl
Fraunhofer IESE,
Kaiserslautern, Germany
and Technische Univ. Wien


For a growing population of researchers in software engineering, empirical studies have become a key approach of research. Empirical studies may be used, for example, to evaluate technologies and help to direct further research by revealing what problems and difficulties people have in practice. Without empirical studies, we have to rely on intuition or educated opinion.
Individual empirical studies often yield interesting results for their particular context, but typically this context is not described in sufficient detail to decide whether another context is similar enough for the conclusions of the study to apply there too.
Approaches which abstract away study context, such as meta-analysis, can by definition do little to answer questions on the influence of context on the effectiveness of software engineering techniques. The generalization of individual study results needs a consistent and systematic definition of context variables for the studies involved.
This paper describes a method to plan, conduct, and analyze concerted families of experiments. The goal of the method is to maximize the quality and benefit of the individual empirical studies as part of the family and to minimize the effort for researchers by reusing experiment know-how. This is achieved by providing, for all studies of the experiment family, a common framework for context measurement, study preparation, material, and analysis.
We apply the method to describe the planning steps for an experiment family on the influence of context on the effectiveness of defect reduction techniques, with a focus on reading techniques for inspection. The experiment family is planned and conducted by members and industrial collaborators of the International Software Engineering Research Network (ISERN). First results are expected by mid-2002; final results should be available by 2003.
The first step of this experiment family is a broad survey of software companies on the state of the practice of inspection process and inspection techniques. The second step is to benchmark state-of-the-art inspection techniques with the participating organization’s own documents and inspection techniques.

Experiment about Test-first programming

Matthias M. Müller, Oliver Hagner
Computer Science Department
University of Karlsruhe, Germany


Test-first programming is one of the central techniques of Extreme Programming.
Programming test-first means (1) write down a test case before coding and (2) make all the
tests executable for regression testing. Thus far, knowledge about test-first programming
is limited to experience reports. Nothing is known about the benefits of test-first compared
to traditional programming (design, implementation, test). This paper reports on an
experiment comparing test-first to traditional programming. It turns out that test-first
does not accelerate the implementation and the resulting programs are not more reliable,
but test-first seems to support better program understanding.

Investigating the Influence of Software Inspection Process Parameters on Inspection Meeting Performance

Michael Halling
Institute for Software Technology
Vienna University of Technology
Karlsplatz 13, A-1040 Vienna, Austria
Stefan Biffl
Fraunhofer Institute for Experimental
Software Engineering
D-67661 Kaiserslautern, Germany
halling@swt.tuwien.ac.at Stefan.Biffl@iese.fhg.de


The question of whether meetings justify their costs has been discussed in several studies. However, it is still an open question how modern defect detection techniques and team size influence meeting performance, particularly with respect to defect severity classes.
In this paper we investigate the influence of software inspection process parameters (defect detection technique, team size, meeting effort) on defect detection effectiveness, i.e., the number of defects found, for 31 teams, which inspected a requirements document, to shed light on the performance of inspection meetings. We compare the set of defects reported by each team after the individual preparation phase – nominal-team performance – and after the team meeting – real-team performance.
Main findings are that nominal teams perform significantly better than real teams with respect to effectiveness for all defect classes. This implies that meeting losses are on average higher than meeting gains. Meeting effort was positively correlated with meeting gains indicating that synergy effects can only be realized if enough time is available. Regarding meeting losses we confirm existing reports that for a defect the probability to get lost in a meeting decreases with an increase of the number of inspectors who detected this defect during individual preparation. In the experiment context different defect detection techniques and team sizes did not systematically influence meeting performance. This indicates that defect detection techniques applied during individual preparation do not influence meeting performance.
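The nominal- versus real-team comparison described above amounts to simple set arithmetic over defect reports. A small illustrative sketch (all defect identifiers are invented, not taken from the experiment):

```python
# A nominal team's defect set is the union of its members' individual
# findings; meeting gains are defects first found in the meeting, and
# meeting losses are defects found by some individual during preparation
# but absent from the meeting report.

individual_findings = [{1, 2, 3}, {2, 4}, {3, 5}]   # defects per inspector
meeting_report = {1, 2, 3, 4, 6}                     # real-team result

nominal = set().union(*individual_findings)          # union of preparation
gains = meeting_report - nominal                     # found only in meeting
losses = nominal - meeting_report                    # lost in the meeting

print(len(nominal), sorted(gains), sorted(losses))   # 5 [6] [5]
```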

Aggregating Viewpoints for Strategic Software Process Improvement – a Method and Case Study

Daniel Karlström
Dept. Communication Systems,
Lund University,
Box 118, SE-221 00 Lund,

Per Runeson
Dept. Communication Systems,
Lund University,
Box 118, SE-221 00 Lund,

Claes Wohlin
Dept. Software Eng. & Comp. Sc. ,
Blekinge Institute of Technology,
Box 520, SE-372 25 Ronneby,


Decisions regarding strategic software process improvement (SPI) are generally based on
management's viewpoint of the situation, and in some cases also on the viewpoints of an SPI group of some kind. This may result in strategies that are not accepted throughout the organisation, as views of how the process is functioning differ throughout the company. This paper describes a method for identifying the major factors affecting a process improvement goal. The method lets individuals from the whole development organisation rate the expected effect of these factors from their own viewpoint. In this way the strategic SPI decision can be taken using input from the entire organisation, and any discrepancies in the ratings can also provide important SPI decision information.
The method is applied in a case study performed at Fuji Xerox, Tokyo, which is reported in this paper. In the case study, significantly different profiles of the factor ratings came from management compared to the ones from the engineering staff. This result can be used to support the strategy decision as such, but also to anchor the decision in the organisation.

Combining Quantitative Software Development Cost Estimation Precision Data with Qualitative Data from Project Experience Reports at Ericsson Design Center in Norway

Magne Jørgensen, Simula Research Laboratory, Norway
Normann Løvstad and Liv Moen, Ericsson Design Center, Norway


This paper suggests and exemplifies how project cost estimation precision data should be
analysed to enable better learning from completed projects. The suggestions are based on
an analysis of data included in an experience database recently introduced at Ericsson
Design Center in Norway and other project information sources.
Ericsson Design Center in Norway has approx. 1000 employees. Most employees develop
software and hardware for telecom applications, e.g., network management systems. The
development projects follow well-defined processes and deliver an experience report before
completion. A project’s performance is, amongst others, measured as the cost estimation
precision, i.e., the relative deviation between estimated and actual cost.
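The precision measure described above can be sketched as a signed relative deviation. The sign convention and the tolerance threshold below are assumptions for illustration, not Ericsson's actual definitions:

```python
def estimation_precision(estimated: float, actual: float) -> float:
    """Signed relative deviation between estimated and actual cost.
    Negative values indicate underestimation (a cost overrun)."""
    return (estimated - actual) / actual

def classify(deviation: float, tolerance: float = 0.1) -> str:
    """Coarse bucket, e.g. for pairing precision data with the qualitative
    experience reports; the 10% tolerance is an invented threshold."""
    if deviation < -tolerance:
        return "underestimated"
    if deviation > tolerance:
        return "overestimated"
    return "on target"

# Invented example: a project estimated at 800 hours that actually took
# 1000 hours was underestimated by 20%.
print(classify(estimation_precision(800, 1000)))  # underestimated
```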

Investigation of Product Process Dependency Models through Probabilistic Modeling

Manoranjan Satpathy           Rachel Harrison          Daniel Rodriguez
School of Computer Science, Cybernetics and Electronic Engineering
University of Reading, Reading RG6 6AY, UK
{M.Satpathy, Rachel.Harrison, d.rodriguez-garcia}@reading.ac.uk


The motivation behind product focused process improvement is to make a process improvement program address certain product quality features explicitly. The PROFES methodology [PROFES (2001)] describes such an improvement program through the notion of a PPD (Product-Process Dependency) repository. A PPD model indicates which improvement action will result in the improvement of which product quality. Because of the associated cost, only a small number of improvement actions can be implemented, so we must be confident of the impact of the actions that are chosen. In this paper, we discuss how Bayesian Networks [Jensen (1996)] can be used to predict the outcome of PPD models and hence the impact of the associated improvement actions.
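As an illustration of the kind of inference involved, a minimal two-node Bayesian network (one improvement action influencing one product quality) can be evaluated by direct enumeration. All probabilities and variable names below are invented for illustration and are not taken from the PROFES repository:

```python
# Prior on whether the improvement action A is taken.
p_action = {True: 0.5, False: 0.5}

# Conditional probability table: P(Q = good | A).
p_quality_given_action = {True: 0.8, False: 0.3}

def p_quality_good() -> float:
    """Marginal P(Q = good), summing out the action variable."""
    return sum(p_action[a] * p_quality_given_action[a] for a in (True, False))

def p_action_given_quality_good() -> float:
    """Posterior P(A = taken | Q = good) via Bayes' rule."""
    return p_action[True] * p_quality_given_action[True] / p_quality_good()

print(round(p_quality_good(), 3))               # 0.55
print(round(p_action_given_quality_good(), 3))  # 0.727
```

Real PPD models would have many interacting actions and qualities, where the same sum-over-hidden-variables idea is applied network-wide.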

Validating Metrics for Data Warehouses

Manuel Serrano, Coral Calero, Mario Piattini,
{mserrano, ccalero, mpiattini}@inf-cr.uclm.es
Alarcos Research Group
Escuela Superior de Informática
University of Castilla – La Mancha
Paseo de la Universidad, 4
13071 Ciudad Real


A data warehouse (DW) is a set of data and technologies aimed at enabling executives, managers and analysts to make better and faster decisions, and organizations are adopting data warehouses to manage information efficiently as "the" main organizational asset. Because of the central role a data warehouse plays in strategic decision making, its quality is fundamental. The design of the star schema of a data warehouse is one of the most important factors affecting the quality of the final system. In recent years various authors have proposed useful guidelines for designing data warehouse models; however, more objective indicators are needed to help designers and managers develop quality data warehouses. In this paper we propose a set of metrics for data warehouse models and show their empirical validation.

How productive is software development across time zones

Adel Taweel, Pearl Brereton
Department of Computer Science
Keele University
E-mail: {O.P.Brereton, A.A.M.Taweel}@cs.keele.ac.uk


This paper introduces a sequential collaborative software engineering process involving shift working across time zones and describes an exploratory empirical study of this working pattern involving the implementation of a small-scale software system.
The paper reports on the organisation of the study and (briefly) on the results obtained through questionnaires, observations and measurement.

Business Needs Driving IT Decisions - Using Feature Analysis and Stakeholder Evaluation in Rolls-Royce

Mark de Chazal, Rolls-Royce and Loughborough University (Mark.dechazal@virgin.net)
Heulwen Pearce, Rolls-Royce, Derby
Ray Dawson, Loughborough University (R.J.Dawson@lboro.ac.uk)


This experience paper is a follow-on from an experience paper presented at last year’s
EASE conference which looked at the use of feature analysis to support strategic IT
decision-making within an engineering support function in Rolls-Royce [1].
The feature analysis tool is exceedingly powerful and informative, and was particularly
appreciated by senior managers. The graphical output enabled senior managers to
make strategic decisions quickly and effectively. The tool was extended by the
creation of different views of the business, of which the feature and system
comparison view has been by far the most useful. The other views proved interesting,
but did not ultimately have the impact of the feature and system comparison.
Feature analysis is an informative and effective tool for managers and developers but
is not complete in itself. A stakeholder analysis also proved extremely useful and
powerful. It can concisely summarise who has influence and why. The analysis
focussed thinking on how to manage the stakeholders. Feature analysis combined with
stakeholder analysis proved to be particularly effective.

Making Inferences with Small Numbers of Training Sets

Colin Kirsopp, Martin Shepperd
Empirical Software Engineering Research Group
School of Design, Engineering & Computing
Bournemouth University
Royal London House
Bournemouth, BH1 3LT, UK
{ckirsopp, mshepper}@bmth.ac.uk


This paper discusses a potential methodological problem with empirical studies
assessing project effort prediction systems. Frequently a hold-out strategy is deployed
so that the data set is split into a training and a validation set. Inferences are then
made concerning the relative accuracy of the different prediction techniques under
examination. Typically this is done on very small numbers of sampled training sets.
We show that such studies can lead to almost random results (particularly where
relatively small effects are being studied). We analyse two data sets, using a
configuration problem for case-based prediction as an example, and generate results
from 100 training sets. This enables us to produce results with quantified confidence
limits. From this we conclude that in both cases using fewer than five training sets leads
to untrustworthy results, and that ideally more than 20 sets should be deployed.
Unfortunately, this casts doubt on a number of empirical validations of prediction
techniques, and so we suggest that further research is needed as a matter of urgency.
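The instability the authors describe can be reproduced with a toy repeated hold-out experiment. The data set and the naive mean predictor below are stand-ins for illustration, not the paper's case-based prediction technique:

```python
import random
import statistics

# Accuracy measured on a few random training/validation splits varies
# widely; averaging over many splits (the paper uses 100) stabilises it.

random.seed(1)
data = [(x, 2 * x + random.gauss(0, 5)) for x in range(60)]  # toy projects

def holdout_error(dataset, train_frac=0.67):
    """Mean absolute error of a trivial mean-effort predictor on one split."""
    shuffled = random.sample(dataset, len(dataset))
    cut = int(len(shuffled) * train_frac)
    train, valid = shuffled[:cut], shuffled[cut:]
    prediction = statistics.mean(y for _, y in train)  # naive predictor
    return statistics.mean(abs(y - prediction) for _, y in valid)

few = [holdout_error(data) for _ in range(3)]
many = [holdout_error(data) for _ in range(100)]
print(f"3 splits:   spread {max(few) - min(few):.1f}")
print(f"100 splits: mean {statistics.mean(many):.1f} "
      f"+/- {statistics.stdev(many):.1f}")
```

The spread across only three splits is of the same order as the error itself, which is exactly why the paper argues for many more training sets.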
