Objectives: Develop machine learning algorithms to support behavioural crime linkage of serial sexual offences and to test these algorithms in an ecologically valid way.

Methods: Geographical, temporal, and Modus Operandi (MO) information relating to 10,918 solved stranger sexual offences committed in the United Kingdom (UK) were used to compare 35 algorithmic approaches in terms of their ability to successfully distinguish between linked crimes (committed by the same offender) and unlinked crimes (committed by different offenders). The 35 approaches included different types of algorithm (Bayesian, regression and classification tree) and different methods of utilising MO data. The discrimination accuracy of these 35 approaches was compared using six performance metrics.

Results: The algorithm that utilised the new measure of behavioural similarity developed in this study and the Four Quartiles approach clearly outperformed the remaining 34 approaches across all six performance metrics (% linked pairs in top 100 ranks = 95.00%; % linked pairs in top 500 ranks = 68.20%; AUPRC Mean [SD] = 0.26 [0.10]; AUC Mean [SD] = 0.95 [0.02]; Median First Rank = 2; Median Rank All Series = 5). Collapsing MO variables did not enhance discrimination accuracy. The new similarity metric developed in this study for quantifying behavioural similarity enhanced discrimination accuracy compared to the metric most commonly used by previous crime linkage research, Jaccard's coefficient.

Conclusions: Machine learning algorithms demonstrate significant potential for supporting the early identification of linked series of sexual offences in the UK. These findings provide a robust evidence base with which to begin building and implementing computer software to support human decision-making in this domain.
Tonkin, M, Lemeire, J, Woodhams, J, Alrajeh, D, Webb, M, Galambos, S, Smailes, H & Burrell, A 2025, 'Building the Statistical Evidence Base for Crime Linkage Decision-Support Tools with Sexual Offences', Journal of Quantitative Criminology, vol. 42, no. 1, pp. 171-201. https://doi.org/10.1007/s10940-025-09622-w
Tonkin, M., Lemeire, J., Woodhams, J., Alrajeh, D., Webb, M., Galambos, S., Smailes, H., & Burrell, A. (2025). Building the Statistical Evidence Base for Crime Linkage Decision-Support Tools with Sexual Offences. Journal of Quantitative Criminology, 42(1), 171-201. https://doi.org/10.1007/s10940-025-09622-w
@article{048fb3c23cf34db18e7b2b7ec693c109,
title = "Building the Statistical Evidence Base for Crime Linkage Decision-Support Tools with Sexual Offences",
abstract = "Objectives: Develop machine learning algorithms to support behavioural crime linkage of serial sexual offences and to test these algorithms in an ecologically valid way. Methods: Geographical, temporal, and Modus Operandi (MO) information relating to 10,918 solved stranger sexual offences committed in the United Kingdom (UK) were used to compare 35 algorithmic approaches in terms of their ability to successfully distinguish between linked crimes (committed by the same offender) and unlinked crimes (committed by different offenders). The 35 approaches included different types of algorithm (Bayesian, regression and classification tree) and different methods of utilising MO data. The discrimination accuracy of these 35 approaches was compared using six performance metrics. Results: The algorithm that utilised the new measure of behavioural similarity developed in this study and the Four Quartiles approach clearly outperformed the remaining 34 approaches across all six performance metrics (% linked pairs in top 100 ranks = 95.00%; % linked pairs in top 500 ranks = 68.20%; AUPRC Mean [SD] = 0.26 [0.10]; AUC Mean [SD] = 0.95 [0.02]; Median First Rank = 2; Median Rank All Series = 5). Collapsing MO variables did not enhance discrimination accuracy. The new similarity metric developed in this study for quantifying behavioural similarity enhanced discrimination accuracy compared to the metric most commonly used by previous crime linkage research, Jaccard{\textquoteright}s coefficient. Conclusions: Machine learning algorithms demonstrate significant potential for supporting the early identification of linked series of sexual offences in the UK. These findings provide a robust evidence base with which to begin building and implementing computer software to support human decision-making in this domain.",
author = "Matthew Tonkin and Jan Lemeire and Jessica Woodhams and Dalal Alrajeh and Mark Webb and Sarah Galambos and Harriet Smailes and Amy Burrell",
note = "Publisher Copyright: {\textcopyright} The Author(s) 2025.",
year = "2025",
month = jul,
day = "3",
doi = "10.1007/s10940-025-09622-w",
language = "English",
volume = "42",
pages = "171--201",
journal = "Journal of Quantitative Criminology",
issn = "0748-4518",
publisher = "Springer Nature",
number = "1",
}