An Initial Quality Analysis of the Ohloh Software Evolution Data

Magiel Bruntink

doi:10.14279/tuj.eceasst.0.906

Abstract

Large public data sets on software evolution promise great value to both researchers and practitioners, in particular for software (development) analytics. To realise this value, the data quality of such data sets needs to be studied and improved. Although these data sets are secondary in nature, i.e., they were not collected by the people using them, their quality is often taken for granted, which casts doubt on conclusions drawn from the data. This paper reports on an initial investigation of the quality of the software evolution data available on Ohloh, and describes the steps taken to cleanse the data set. Our goal is for other researchers, practitioners, and parties responsible for data sets such as Ohloh to use the outcomes of the validation and cleansing steps to improve the quality of data sets in the public domain.

DOI: http://dx.doi.org/10.14279/tuj.eceasst.0.906

DOI (PDF): http://dx.doi.org/10.14279/tuj.eceasst.0.906.889

Hosted By Universitätsbibliothek TU Berlin.

Electronic Communications of the EASST is a peer-reviewed, scientific and open access journal ISSN 1863-2122
