Data Driven Business Solutions

We specialise in Data Science and AI solutions for complex engineering and business problems.

  • Varun Rao

Road Accident Hotspot Identification using Machine Learning

Updated: Jul 21, 2020

Road traffic accidents kill hundreds of Victorians each year. For people aged 1 to 44, land transport accidents are among the top three leading causes of death. Globally, road traffic accidents are the eighth leading cause of death, and it is estimated that the 1.3 million fatalities occurring each year cost most countries around 3% of their GDP.

Alongside various global initiatives to reduce the road toll, the Victorian government launched its flagship road safety programme, Towards Zero, in 2014. The campaign targeted a 20% reduction in deaths and a 15% reduction in serious injuries by 2020. In a previous article, we presented a straightforward examination of statewide road accidents over the years 2013-2019 from data of over 74,000 vehicular accidents compiled by the state government agency VicRoads; these accidents are shown in the figure below.

Distribution of all accidents considered

In this article, we summarise work that is described in detail in our technical report "Using Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to identify Geographic Clusters of Road Traffic Accidents in Victoria".

Deep Blue AI Technical Report

This extension of our previous article uses an unsupervised machine learning approach called clustering to identify accident hotspots from the same accident data investigated in our previous article. The clustering algorithm used is called Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and offers many advantages over the more common, and practically ubiquitous, k-means clustering algorithm.

Accident hotspots

Traditional identification of accident black spots is based on road segmentation or accident characteristics, both of which ignore the geo-spatial element that is crucial to understanding underlying causes.

While a visual inspection of a simple overlay of traffic accidents is useful for ad-hoc purposes, this approach is not suitable for large sample sizes. Besides the obvious issues of visually identifying clusters in high-density areas, this method is subjective and subject to bias and interpretation. Even if humans could be trained to correctly identify clusters by hand, it would not be practical to scale this approach. This is illustrated in the figure below: the left figure shows a simple scatter plot of accidents overlaid on a map of Melbourne, while the right figure shows 5 accident clusters identified by the DBSCAN clustering algorithm. It is clear that the left image is of little value to discern any patterns or hotspots, whereas the clustered image on the right offers significantly more utility.

Unclustered (left) versus clustered (right) images of accidents in Melbourne

The insurance company AAMI publishes an annual Crash Index, which highlights hotspots from more than 340,000 accidents; however, the report does not describe the methodology used. Melbourne fared particularly badly in this analysis, with Plenty Road in Bundoora being rated the worst location in the country. A detailed breakdown of accidents by location is also provided by the Victorian Transport Accident Commission (TAC), but hotspots are not identified.

DBSCAN clustering algorithm

The purpose of a clustering algorithm is to identify clusters of points from unlabelled data. Clustering has many applications, such as computer vision, insurance, information retrieval, document clustering, marketing, character recognition and genomic studies. The k-means algorithm is the most popular clustering method, largely due to its simplicity and robustness, but suffers from a number of drawbacks, which are discussed in detail in our technical report. In view of these limitations, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm was used for the present work instead.

A detailed technical description of the DBSCAN algorithm is provided in our report, but a summary is presented here for non-technical readers. DBSCAN was first proposed by researchers at the University of Munich, and was recently awarded the SIGKDD Test of Time award.

DBSCAN uses a density-based approach, operating on the principle that the density of points within a cluster will be higher than outside it. As a consequence, the algorithm is able to identify clusters of arbitrary shape, an important capability that k-means lacks. The authors of the algorithm illustrated this with the figure below, showing clusters of arbitrary shape that can be easily identified by the density variation. Crucially, DBSCAN labels isolated, low-density points as noise rather than forcing them into a cluster, whereas k-means assigns every point to a cluster. This is an important distinction given the noisy nature of our dataset.

DBSCAN has two hyperparameters, ε and MinPts. Briefly, ε is a measure of how close points need to be to each other to be considered part of a cluster. MinPts is a threshold value indicating the minimum number of points within a radius of ε of a point to form a dense region. A more formal definition is provided in our technical report.
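To make the roles of ε and MinPts concrete, here is a minimal sketch of the DBSCAN procedure in plain Python. It is an illustration only and not the implementation used for this work; the function names and the toy coordinates are hypothetical, distances are plain Euclidean, and the neighbour count is assumed to include the point itself.

```python
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, 2, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        cluster += 1  # points[i] is a core point, so start a new cluster
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not dense
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also core: keep expanding
    return labels


# Toy data (made up): two tight groups of four points and one isolated point.
points = [
    (0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (0.1, 0.1),
    (5.0, 5.0), (5.0, 5.1), (5.1, 5.0), (5.1, 5.1),
    (10.0, 0.0),
]
labels = dbscan(points, eps=0.2, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Increasing ε or lowering MinPts makes it easier for a point to count as dense, which tends to merge and enlarge clusters; the opposite change pushes more points into the noise category.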

Clusters of arbitrary shapes (from DBSCAN authors)


The results presented here used data from 74,400 accidents and 1,413 fatalities collected by VicRoads, spanning 2013-2019. Only accidents rated as "Serious injury accident" or "Fatal accident" were considered, and clusters with fewer than 20 data points were excluded. The DBSCAN algorithm was implemented in Python, and the results were visualised using Tableau.
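The two filtering steps described above could look roughly like the following sketch. The record layout, the "severity" field name and the helper names are assumptions made for illustration; they are not taken from the VicRoads data or from our code.

```python
from collections import Counter

# Severity ratings retained for the analysis, as described in the text.
SEVERITIES = {"Serious injury accident", "Fatal accident"}
MIN_CLUSTER_SIZE = 20  # clusters with fewer points are excluded
NOISE = -1


def filter_by_severity(records):
    """Keep only serious-injury and fatal accidents.

    Each record is assumed to be a dict with a hypothetical "severity" key.
    """
    return [r for r in records if r["severity"] in SEVERITIES]


def drop_small_clusters(labels, min_size=MIN_CLUSTER_SIZE):
    """Relabel members of clusters smaller than min_size as noise."""
    sizes = Counter(label for label in labels if label != NOISE)
    return [
        label if label != NOISE and sizes[label] >= min_size else NOISE
        for label in labels
    ]


# Made-up example records and cluster labels.
records = [
    {"id": 1, "severity": "Fatal accident"},
    {"id": 2, "severity": "Some other rating (placeholder)"},
    {"id": 3, "severity": "Serious injury accident"},
]
kept = filter_by_severity(records)

labels = [0] * 25 + [1] * 5 + [NOISE] * 3
cleaned = drop_small_clusters(labels)
print([r["id"] for r in kept])  # [1, 3]
print(Counter(cleaned))         # cluster 0 keeps 25 points; 8 points are noise
```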

The sensitivity of the model to the hyperparameters ε and MinPts is discussed in the report, and an interactive dashboard covering combinations of these parameters is available at the end of this article. For the purposes of this article, we used ε = 0.005 and MinPts = 11, as these offered the most useful clustering results, but the reader is free to select any combination using the dashboard.
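For readers wondering what ε = 0.005 means on the ground, the sketch below converts it to an approximate distance. This rests on an assumption that is not stated in this article: that ε is measured in decimal degrees of latitude and longitude. The constants and the function name are illustrative, and the technical report remains the authority on the units actually used.

```python
from math import cos, radians

KM_PER_DEGREE_LAT = 111.32  # approximate length of one degree of latitude
MELBOURNE_LAT = -37.81      # approximate latitude of central Melbourne


def eps_in_km(eps_deg, lat_deg=MELBOURNE_LAT):
    """Approximate north-south and east-west extent of eps, in kilometres.

    Assumes eps_deg is in decimal degrees (an assumption, see above).
    A degree of longitude shrinks with the cosine of the latitude.
    """
    north_south = eps_deg * KM_PER_DEGREE_LAT
    east_west = north_south * cos(radians(lat_deg))
    return north_south, east_west


ns_km, ew_km = eps_in_km(0.005)
print(f"north-south: {ns_km:.2f} km, east-west: {ew_km:.2f} km")
```

Under that assumption, the neighbourhood radius is roughly half a kilometre, and slightly shorter east-west than north-south because the calculation treats degrees as if they were a flat grid.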

The figure below shows a high-level overview of accident clusters in the Melbourne area using this combination of ε and MinPts. The freeways are clearly indicated as problem roads, accounting for virtually all the hotspots identified. This result is not surprising: our previous article describes the increase in serious and fatal accidents at higher road speeds such as those found on freeways. In the following sections, we examine some of these locations in detail.

Accident clusters in the Melbourne area (ε = 0.005 and MinPts = 11)

M1 - Port Melbourne and South Melbourne

The figure below shows accident clusters on the M1 freeway in the Port Melbourne area. The dominant cluster is at the intersection with Citylink, but there are smaller clusters at the ramps to Kings Way and Power Street. The Westgate Bridge is also a problem area, which is unusual because there are no intersections.

Accident clusters on the M1 freeway near the Citylink intersection

M1 - Laverton and Laverton North

The figure below shows the distribution of accidents on the M1 freeway near the M80 intersection. There are three distinct areas of concern, chief among which are the intersection with the M80, as well as the section immediately to the west. One drawback of the current method of clustering is that it is not possible to tell the direction of traffic for a given accident. However, in the personal experience of the author, who drives on this section of road frequently, the cluster is likely to be caused by vehicles travelling towards the city, and is possibly indirectly linked to the M80 intersection.

The western-most cluster is the most curious. While it occurs near a bend on the road, this is a relatively gentle turn and there are no major intersections. The reason for this high-density cluster requires further investigation.

Accident clusters on the M1 freeway near the M80 intersection

M80 - Tullamarine

The figure below shows a stretch of the Western Ring Road (M80) between the Calder Freeway and Tullamarine Freeway (M2). This short section experiences a large number of accidents, while sections to either side, ostensibly similar in nature, are relatively trouble free. This anomaly also deserves further investigation, particularly in view of its proximity to the airport.

Accident clusters on the M80 freeway near the Calder Freeway intersection

Plenty Road - Bundoora

As mentioned earlier, research published by AAMI identified Plenty Road in Bundoora as the worst accident location in the country. A direct comparison to the present work is not meaningful because we only included serious or fatal injuries in our analysis, and also excluded clusters with fewer than 20 accidents. Nevertheless, the figure below shows DBSCAN clusters for Plenty Road, clearly indicating that this section of road experiences a large number of accidents.

Accident clusters on Plenty Road in Bundoora

Interactive Dashboard

The dashboard below visualises DBSCAN clusters for the combination of ε and MinPts selected using the filters on the right-hand side. This dashboard was created using Tableau and is hosted on Tableau Public.

Further Links

Deep Blue AI Technical Report: Varun Rao & Rahul Rao (2020), Using Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to identify Geographic Clusters of Road Traffic Accidents in Victoria. [LINK].

Deep Blue AI Blog: Towards Zero - fact or fiction?
