Publication Details
 
Matthew Tonkin, Jan Lemeire, Jessica Woodhams, Dalal Alrajeh, Mark Webb, Sarah Galambos, Harriet Smailes, Amy Burrell
 

Contribution to journal

Abstract 

Objectives: Develop machine learning algorithms to support behavioural crime linkage of serial sexual offences and test these algorithms in an ecologically valid way.

Methods: Geographical, temporal, and Modus Operandi (MO) information relating to 10,918 solved stranger sexual offences committed in the United Kingdom (UK) was used to compare 35 algorithmic approaches in terms of their ability to distinguish between linked crimes (committed by the same offender) and unlinked crimes (committed by different offenders). The 35 approaches included different types of algorithm (Bayesian, regression, and classification tree) and different methods of utilising MO data. The discrimination accuracy of these 35 approaches was compared using six performance metrics.

Results: The algorithm that utilised the new measure of behavioural similarity developed in this study and the Four Quartiles approach clearly outperformed the remaining 34 approaches across all six performance metrics (% linked pairs in top 100 ranks = 95.00%; % linked pairs in top 500 ranks = 68.20%; AUPRC Mean [SD] = 0.26 [0.10]; AUC Mean [SD] = 0.95 [0.02]; Median First Rank = 2; Median Rank All Series = 5). Collapsing MO variables did not enhance discrimination accuracy. The new similarity metric developed in this study for quantifying behavioural similarity enhanced discrimination accuracy compared to the metric most commonly used by previous crime linkage research, Jaccard’s coefficient.

Conclusions: Machine learning algorithms demonstrate significant potential for supporting the early identification of linked series of sexual offences in the UK. These findings provide a robust evidence base with which to begin building and implementing computer software to support human decision-making in this domain.
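For readers unfamiliar with the baseline metric named in the abstract, the following is a minimal sketch of Jaccard's coefficient applied to a pair of crimes, each coded as a binary vector of MO behaviours (1 = behaviour present). It is not the study's implementation, and it does not reproduce the new similarity measure or the Four Quartiles approach, which the abstract does not specify. The function names, the toy vectors, and the `linked_share_in_top_k` helper are hypothetical; the helper reflects only one plausible reading of "% linked pairs in top k ranks".

```python
from typing import Sequence


def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard's coefficient for two binary behaviour vectors.

    Counts behaviours present in both crimes, divided by behaviours
    present in at least one. Joint absences are ignored.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumption: two crimes with no coded behaviours score 0.0.
    return both / either if either else 0.0


def linked_share_in_top_k(scores: Sequence[float],
                          linked: Sequence[bool],
                          k: int) -> float:
    """Share of the k highest-scoring crime pairs that are truly linked.

    Hypothetical helper: one possible reading of the abstract's
    "% linked pairs in top k ranks", not a confirmed definition.
    """
    if k <= 0 or not scores:
        raise ValueError("need k > 0 and at least one scored pair")
    ranked = sorted(zip(scores, linked), key=lambda p: p[0], reverse=True)
    top = ranked[:k]
    return sum(1 for _, is_linked in top if is_linked) / len(top)


# Toy data (invented for illustration only).
crime_a = [1, 1, 0, 0]
crime_b = [1, 0, 1, 0]
similarity = jaccard(crime_a, crime_b)  # 1 shared / 3 present in either

pair_scores = [0.9, 0.1, 0.5, 0.7]
pair_linked = [True, False, False, True]
top2_share = linked_share_in_top_k(pair_scores, pair_linked, k=2)
```

Because Jaccard's coefficient ignores behaviours absent from both crimes, it is a common choice when the absence of a behaviour in police records may simply mean it was not recorded.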

Reference