To cite this article: Katharine Reeves, Joht Singh Chandan & Siddhartha Bandyopadhyay
(2019): Using statistical modelling to analyze risk factors for severe and fatal road
traffic accidents, International Journal of Injury Control and Safety Promotion, DOI:
10.1080/17457300.2019.1635625
Using statistical modelling to analyze risk factors for severe and fatal road
traffic accidents
Katharine Reevesᵃ, Joht Singh Chandanᵇ and Siddhartha Bandyopadhyayᵃ
ᵃDepartment of Economics, University of Birmingham, Birmingham, UK; ᵇInstitute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
(Ball, Edwards, Ross, & McGwin Jr., 2010; Langford & Koppel, 2006; Rolison, Hanoch, Wood, & Liu, 2014; Rolison, Regev, Moutari, & Feeney, 2018). Surveys in the UK and Europe have explored drivers' perceptions of the factors which contributed to the severity of RTAs in their past experience; these include driving behaviour (speeding, distraction, lapses of attention and aggression), risk taking and driving when fatigued (Antov et al., 2010; Smith, 2016). However, because risk profiles in the UK differ from those of other global cohorts, it is important to continue exploring possible risk factors for severe and fatal RTAs that could influence future policy making in this area. In addition, few studies worldwide have used statistical modelling methods such as the classification and regression tree (CART), which allow combinations of risk factors to be identified, as opposed to the individual effect of each risk factor.

The aim of the research in this article is to identify driver characteristics and environmental factors which affect the severity of RTAs in a population in two English counties (Norfolk and Suffolk) using statistical methods. This type of analysis would benefit the police, health services and road safety agencies, as it can identify the groups of drivers and circumstances most at risk of killed or seriously injured (KSI) accidents. The two methods used here could also be applied in other public health settings, particularly where a change in the dependent variable has multifactorial causes, as in a complex service analysis.

Methods

Study population and period

This article uses a driver-level dataset of 76,334 records compiled by Norfolk and Suffolk police forces from RTAs during the years 2005 to 2014. Police reports are used to record the driver characteristics and environmental factors surrounding an accident, and the details of these variables are described in Supplementary 2.

The dataset includes driver characteristics such as gender, age, ethnicity, breath test result, whether they were wearing a seatbelt and whether or not they hold a UK driving licence. Additionally, several variables describe the environment and characteristics of the accident itself, including the condition of the road, visibility, number of casualties, time of day, day of the week, road class and type, speed limit and weather conditions. The outcome variable for this analysis is the severity of an accident, which is recorded categorically as 1, 2 or 3 for the categories ‘slight’, ‘severe’ and ‘fatal’, respectively, as defined by the UK government (Department for Transport, 2017) (see Supplementary 3 for definitions of slight, severe and fatal).

Statistical analysis

Partially constrained generalized ordered logistic regression (PC-GOLOGIT)

The first model used to identify individual risk factors associated with severity is the PC-GOLOGIT model. The severity level of an accident (slight, severe or fatal) is recorded as a categorical variable, and the numerical value assigned to this variable (1, 2 or 3) is ordinal in nature. Ordered categorical variables are non-continuous, bounded and cannot be measured on an interval or ratio scale; therefore, an ordinary least squares approach would not be suitable for assessing the risk factors leading to such outcomes. Instead, it is necessary to use a group of models specifically designed for this type of dependent variable, known as ordered logistic regression (OLOGIT) or proportional odds models. In an OLOGIT model, the dependent variable has M discrete levels, and M − 1 binary logistic regressions are estimated using grouped values of the dependent variable. Supplementary 4 illustrates an example where the dependent variable has three levels and the model estimates two logistic regressions. For the first regression, the dependent variable is equal to 0 when Y = 1 and 1 when Y = 2 or 3; for the second, it is equal to 0 when Y = 1 or 2 and 1 when Y = 3. Accordingly, the estimated coefficients are the effect of a change in the confounding factors on the odds that Y = 2 or 3 in the first regression and Y = 3 in the second. In OLOGIT, these coefficients are assumed to be the same in every one of the M − 1 regressions (the proportional odds assumption), so that only the intercepts differ.

Equation 1 gives the model for the generalized ordered logit regression (GOLOGIT), of which OLOGIT is a special case. As can be seen in the model, the proportional odds assumption is relaxed, and a separate set of coefficients is estimated for each logistic regression. This model is suitable when tests of the proportional odds assumption indicate that it does not hold for OLOGIT.

Equation 1:

$$P(Y_i > j) = \frac{\exp(\alpha_j + X_i \beta_j)}{1 + \exp(\alpha_j + X_i \beta_j)}, \quad j = 1, 2, \ldots, M - 1$$

The disadvantage of this, however, is that there are now M − 1 sets of coefficients to interpret and work with, rather than just one. The most common approach is therefore a mixture of the two models above, called the partially constrained generalized ordered logit regression (PC-GOLOGIT) (Long, 1997; Williams, 2016; Williams & Williams, 2006). This model is often the most appropriate in practice, as it can accommodate both OLOGIT and GOLOGIT specifications where needed. It is shown in Equation 2 and is used for the first part of the analysis. In Equation 2, the coefficients β1 and β2 on X1 and X2 are constrained to be the same for every j, as in OLOGIT, whereas the coefficient β3j on X3 is allowed to differ between the M − 1 regressions, as in GOLOGIT.

Equation 2:

$$P(Y_i > j) = \frac{\exp(\alpha_j + X_{1i}\beta_1 + X_{2i}\beta_2 + X_{3i}\beta_{3j})}{1 + \exp(\alpha_j + X_{1i}\beta_1 + X_{2i}\beta_2 + X_{3i}\beta_{3j})}, \quad j = 1, 2, \ldots, M - 1$$

Statistical significance is set at p < 0.05.

Classification and regression tree analysis

The second method used is the CART model. This method enables us to identify groups of risk factors that are related to the severity of an accident. This type of analysis lends itself well to datasets with several categorical variables and is also good at handling interactions in addition to heterogeneity. CART is a data mining technique which seeks to predict the outcome of future events based on the characteristics of past
events. There are several algorithm options, all of which split the observations into groups to build a tree. There are several advantages to using this type of model, particularly when analyzing categorical variables, as is often the case when the data are derived from forms or surveys such as this road accident data. The repeated splitting of observations into groups is most natural for categorical variables, which can be split into clearly defined categories. An additional benefit of CART over traditional regressions is that it works well with interactions: for example, a fatal accident may be more likely if a driver is both of a certain age and driving at a certain speed. These interactions can be difficult to include in regressions since there are so many possibilities to consider. Adding further interaction terms to a logistic regression model requires substantially more computing power and decreases the degrees of freedom. In a CART model, however, the structure of the tree means that interactions are included naturally whenever the corresponding splits are optimal, without having to specify them in advance or reduce the degrees of freedom.

There are several versions of the CART model which use slightly different algorithms. The analysis in this article uses the C5.0 algorithm (Quinlan, 1993). Owing to the versatility of this method and advances in computing power, it can be applied, for example, to designing predictive models of medical outcomes such as mortality rates, or in other healthcare settings (Morgan, 2014).

The C5.0 algorithm starts with a training dataset, a subset of the original dataset on which the model is built in order to predict the outcomes of the remaining observations.

For this analysis, driver-level data were used in order to analyze the effect of driver characteristics, in addition to the factors surrounding the accident, on the severity of an accident. The classification tree in this analysis is used both to identify groups and situations where the risk of KSI accidents is higher and to predict periods of peak demand on services. This is done by building a model which has predictive power for a test dataset.

The driver-level dataset for Norfolk and Suffolk contains 76,334 records, for which the accident severity, driver characteristics and other confounding factors are recorded. In order to avoid selection bias, half of the dataset is selected at random to create the training set on which to build the model, and the other half is reserved for testing once the model is complete. The training set, therefore, contains 38,167 observations, each of which belongs to one of three severity classes: slight, severe or fatal.

The first decision is the proportion of the data to use as the training set and, consequently, the size of the test set: the set of observations on which the model is tested and on which a percentage error is calculated, showing the proportion of observations whose outcome class is incorrectly predicted.

As shown in Figure 1, the percentage of observations used in the training set affects both the size of the tree produced and the percentage of incorrectly predicted observation classes. As the size of the training set increases, the size of the tree increases and the percentage error falls. There is, therefore, a trade-off between producing a small tree which is easy to interpret and over-fitting the model to the dataset in order to achieve a low percentage error. Most papers that use this analysis method settle on using 50% of the dataset as the training set to strike a balance between these two objectives, and this is the training set size used in this article.

The test which maximizes the information gain is chosen at each stage in order to create the next set of branches. A test on a continuous variable usually takes the form of an inequality, while for a categorical variable each branch represents one or more categories of that variable.

Repeated splitting of observations in a non-trivial way will eventually produce leaves that each contain a single class, or as close to this as the data allow, but the aim of designing a model is to produce a tree which is small enough to interpret with the minimum percentage error possible (Quinlan, 1993). With this in mind, Supplementary 5 illustrates the tree size and the percentage error when the minimum number of cases per leaf is restricted. This is a form of ‘pre-pruning’, where a decision is made about the size of the tree before it is run. By increasing the minimum number of cases per leaf to eight, the size of the tree becomes manageable without much effect on the percentage error. The result of this process is the classification tree in Figure 3, which, when applied to the test dataset, correctly predicts the severity of accidents for 83.7% of observations.

Results

Individual variables influencing the severity of traffic accidents using PC-GOLOGIT

The results for the partially constrained model are presented in Table 1 as the change in log odds. Each variable has two estimated coefficients, one for changing from ‘slight’ to ‘severe or fatal’ and one for changing from ‘slight or severe’ to ‘fatal’. Where there is no significant difference between them, only one is reported. In summary, the following variables have statistically significant estimated coefficients in terms of log odds (95% confidence intervals in parentheses):

Female driver: −0.41 (−0.59, −0.23)
Positive breath test: 1.04 (0.69, 1.39)
No seatbelt: 1.42 (1.03, 1.81)
No UK licence: −0.63 (−1.12, −0.14)
Dark with no street lighting: 0.33 (0.13, 0.53) and 0.85 (0.42, 1.29)
Casualties: 0.39 (0.32, 0.46)
Slip road: −1.98 (−3.62, −0.33)
Speed limit: 0.02 (0.01, 0.03)
Raining with high winds: −0.94 (−1.72, −0.16)
Old age pensioners (OAPs): 0.44 (0.30, 0.57) and 0.67 (0.43, 0.90)
Pedestrians: 1.25 (0.63, 1.88)

As the coefficients are in log odds, they can be difficult to interpret; therefore, Supplementary 6 presents the marginal effects, which give the probability of a particular outcome
that this is not the only combination of factors which will result in a fatal accident; there may also be drivers in other leaves who fall into the same class. The classification merely shows that the majority of this group were involved in a fatal accident and also share these characteristics. Another point to keep in mind is that the results refer to combinations of factors, which differ from the marginal effects of individual variables found in regression analysis. For example, a speed limit over 40 mph increases the probability of an accident being fatal only when combined with the other relevant factors.

which are a positive breath test, not wearing a seatbelt and having an accident while not on a slip road.

Using CART analysis, we have identified several combinations of factors that are associated with the risk of fatal accidents, such as accidents that involve more than one OAP, one or no cycles and zero pedestrians, occur where the speed limit is over 40 mph on a dual carriageway, one-way street or single carriageway, and have more than five casualties and more than five vehicles. This indicates that there are specific groups which could be the focus of policies aimed at reducing the severity of future accidents.
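To make the difference between a combination of factors and a marginal effect concrete, the sketch below writes the group of fatal accidents described in the text as a single rule. It is a minimal Python illustration under stated assumptions: the `Accident` record and its field names are hypothetical, the thresholds are the ones given in the text, and `in_fatal_group` is not the authors' C5.0 tree, only one path through a tree of that kind written out by hand.

```python
from dataclasses import dataclass, replace


# Hypothetical record layout: these field names are illustrative only and are
# not the variable names used in the Norfolk and Suffolk police dataset.
@dataclass(frozen=True)
class Accident:
    oaps: int          # number of old age pensioners involved
    cycles: int        # number of pedal cycles involved
    pedestrians: int   # number of pedestrians involved
    speed_limit: int   # speed limit in mph
    road_type: str     # e.g. "single carriageway"
    casualties: int    # number of casualties
    vehicles: int      # number of vehicles involved


# Road types named in the text for this group of fatal accidents.
LEAF_ROAD_TYPES = {"dual carriageway", "one-way street", "single carriageway"}


def in_fatal_group(a: Accident) -> bool:
    """Return True only when every condition on the path to the leaf holds.

    A tree leaf is a conjunction of split tests, so failing any single test
    (for example, too few casualties) sends the record to a different leaf.
    """
    return (
        a.oaps > 1
        and a.cycles <= 1
        and a.pedestrians == 0
        and a.speed_limit > 40
        and a.road_type in LEAF_ROAD_TYPES
        and a.casualties > 5
        and a.vehicles > 5
    )


example = Accident(oaps=2, cycles=0, pedestrians=0, speed_limit=60,
                   road_type="single carriageway", casualties=6, vehicles=6)
print(in_fatal_group(example))                           # True
print(in_fatal_group(replace(example, speed_limit=30)))  # False
print(in_fatal_group(replace(example, casualties=2)))    # False
```

In this sketch a speed limit of 60 mph does not place a record in the group on its own: changing only the number of casualties moves it out again, which loosely mirrors the point that a speed limit over 40 mph raises the probability of a fatal outcome only in combination with the other factors.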
factors, inappropriate or excessive speed, not using seatbelts or child restraints, not using helmets on two-wheeled vehicles, insufficient crash protection, involvement of drugs or alcohol, and roadside objects that are not crash-protective. However, less research has been conducted specifically in high-income countries such as the UK to identify the importance of these risk factors for the different severity levels of accidents.

Using multivariate modelling methods, a recent paper assessed the impact of speed variation on the severity of accidents, finding that speed is not the only factor contributing to accident severity but acts in combination with several other factors, such as weather, which were also identified in the current study (Choudhary, Imprialou, Velaga, & Choudhary, 2018).

What this study adds

Our study provides an up-to-date analysis of a wide variety of risk factors, including some not previously clearly suggested in the literature on accident severity, such as possession of a UK licence and involvement of OAPs. By using two complementary methods, we identified both the strength of individual risk factors and the importance of combinations of risk factors which can be targeted. Some of the notable risk factors for accident severity in this study have been demonstrated elsewhere in the literature. One of the most strongly associated risk factors, a positive breath test, is a clear indicator of the extent to which alcohol impairs driving ability. Previous literature has indicated that breath testing is still not necessarily carried out routinely at all RTAs within the UK (Rolison et al., 2018; Tunbridge & Harrison, 2017). Considering the substantially increased risk of serious injury and fatality associated with alcohol use, policy must focus on educating drivers about the dangers of drink driving (Tunbridge & Harrison, 2017). Not wearing a seatbelt, identified as an important risk factor in our study, has also been highlighted in the WHO report, and evaluations conducted globally make it clear that encouraging seatbelt use is a good means of reducing the severity of RTAs (WHO, 2015a).

Of particular interest, our study identified that unlicensed drivers were also a contributing factor to the severity of RTAs. A possible explanation is that unlicensed drivers are thought to engage in more high-risk driving behaviours and also have much lower odds of using vehicle safety features such as seatbelts (Fu, Anderson, Dziura, Crowley, & Vaca, 2012). Within the UK, it is currently against the law to drive without a licence; however, we hope this research highlights the risks of doing so and helps to further educate those who may consider engaging in such activities. Finally, another key result was the negative impact of driving on unlit roads. The Cochrane Collaboration has identified street lighting as a low-cost and cost-effective method of reducing RTAs (Beyer & Ker, 2009). However, more recent research indicates that amending street lighting alone is not an effective response; it must happen in conjunction with other policy measures that tackle the other risk factors identified in our study and in previous literature (Fotios & Price, 2017). In addition, this study has demonstrated the use of CART methodology in assessing risk factors related to a public health issue. Increasing the use of decision tree models in public health and epidemiological research can help to deepen our understanding of complex problems with multiple risk factors, as regression models may not always be well suited to these challenges (Lemon, Roy, Clark, Friedmann, & Rakowski, 2003; Venkatasubramaniam et al., 2017).

risk factors in RTAs. Through the use of sophisticated statistical modelling, several risk factors have been identified which could be used to target policies that discourage drivers from acting in ways that increase the probability of being KSI if they are involved in an accident, and to identify at-risk groups who could be targeted in periods of peak demand. However, further work is needed across England to establish whether this model appropriately predicts the risk of KSI in other settings.

Acknowledgements

We would like to thank Anindya Banerjee and Eddie Kane for comments on a draft of this article and Norfolk and Suffolk Constabulary for assisting and providing us with accident data.

Funding

This work was supported by funding from Norfolk and Suffolk Constabulary. KR also received funding from the ESRC for a doctoral studentship, during which time this article was written.
References

Ernstberger, A., Joeris, A., Daigl, M., Kiss, M., Angerpointner, K., Nerlich, M., & Schmucker, U. (2015). Decrease of morbidity in road traffic accidents in a high income country – An analysis of 24,405 accidents in a 21 year period. Injury, 46, S135–S143. doi:10.1016/S0020-1383(15)30033-4
Fotios, S., & Price, T. (2017). Road lighting and accidents: Why lighting is not the only answer. Lighting Journal, 82(5), 22–26. Retrieved from http://eprints.whiterose.ac.uk/116229/
Fu, J., Anderson, C.L., Dziura, J.D., Crowley, M.J., & Vaca, F.E. (2012). Young unlicensed drivers and passenger safety restraint use in U.S. Fatal crashes: Concern for risk spillover effect? Paper presented at Annals of Advances in Automotive Medicine, Association for the Advancement of Automotive Medicine, Annual Scientific Conference, 56, 37–43. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/23169115
Jackson, L., & Cracknell, R. (2018). Road accident casualties in Britain and the world. London: House of Commons Library. Retrieved from https://researchbriefings.parliament.uk/ResearchBriefing/Summary/CBP-7615
Langford, J., & Koppel, S. (2006). Epidemiology of older driver crashes – Identifying older driver risk factors and exposure patterns. Transportation Research Part F: Traffic Psychology and Behaviour, 9(5), 309–321. doi:10.1016/j.trf.2006.03.005
Lemon, S.C., Roy, J., Clark, M.A., Friedmann, P.D., & Rakowski, W. (2003). Classification and regression tree analysis in public health: Methodological review and comparison with logistic regression. Annals of Behavioral Medicine, 26(3), 172–181. doi:10.1207/S15324796ABM2603_02
Long, J.S. (1997). Regression models for categorical and limited dependent variables. Sage Publications. Retrieved from https://uk.sagepub.com/en-gb/eur/regression-models-for-categorical-and-limited-dependent-variables/book6071
Morgan, J. (2014). Classification and regression tree analysis. Retrieved from https://www.bu.edu/sph/files/2014/05/MorganCART.pdf
Peden, M., Scurfield, R., Sleet, D., Mohan, D., Hyder, A.A., Jarawan, E., & Mathers, C. (2004). World report on road traffic injury prevention. Retrieved from http://www.who.int/violence_injury_prevention/publications/road_traffic/world_report/intro.pdf
Quinlan, J.R. (1993). C4.5: Programs for machine learning. Morgan Kaufmann Publishers. Retrieved from https://books.google.co.uk/books?hl=en&lr=&id=b3ujBQAAQBAJ&oi=fnd&pg=PP1&dq=%5B5%5D+Ross+Quinlan.+C4.5:+Programs+for+Machine+Learning.+Morgan+Kaufmann+Publishers,+1993.&ots=sQ4vQLGoF5&sig=RrwZAxg4UU-I4DWfYyADv_vZX5A#v=onepage&q=%5B5%5D Ross Quinlan. C4.5%3A Programs for Machine Learning. Morgan Kaufmann Publishers%2C 1993.&f=false
Racioppi, F., Eriksson, L., Tingvall, C., & Villaveces, A. (2004). Preventing road traffic injury: A public health perspective for Europe. Copenhagen: World Health Organization. Retrieved from www.euro.who.int
Road Safety Observatory (2018). Seat belts – How effective? Retrieved June 9, 2019, from https://www.roadsafetyobservatory.com/HowEffective/vehicles/seat-belts
Rolison, J.J., Hanoch, Y., Wood, S., & Liu, P.-J. (2014). Risk-taking differences across the adult life span: A question of age and domain. The Journals of Gerontology: Series B, 69(6), 870–880. doi:10.1093/geronb/gbt081
Rolison, J.J., Regev, S., Moutari, S., & Feeney, A. (2018). What are the factors that contribute to road accidents? An assessment of law enforcement views, ordinary drivers’ opinions, and road accident records. Accident Analysis & Prevention, 115, 11–24. doi:10.1016/j.aap.2018.02.025
Sherafati, F., Homaie-Rad, E., Afkar, A., Gholampoor-Sigaroodi, R., & Sirusbakht, S. (2017). Risk factors of road traffic accidents associated mortality in Northern Iran; a single center experience utilizing Oaxaca blinder decomposition. Bulletin of Emergency and Trauma, 5(2), 116–121. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/28507999
Smith, A.P. (2016). A UK survey of driving behaviour, fatigue, risk taking and road traffic accidents. BMJ Open, 6(8), e011461. doi:10.1136/bmjopen-2016-011461
Tunbridge, R., & Harrison, K. (2017). Fifty years of the breathalyser – where now for drink driving? Retrieved from http://www.pacts.org.uk/wp-content/uploads/sites/2/129256_PACTS_50YearsBreathalyser_V5-1.pdf
Venkatasubramaniam, A., Wolfson, J., Mitchell, N., Barnes, T., JaKa, M., & French, S. (2017). Decision trees in epidemiological research. Emerging Themes in Epidemiology, 14(1), 11. doi:10.1186/s12982-017-0064-4
Williams, R. (2016). Understanding and interpreting generalized ordered logit models. The Journal of Mathematical Sociology, 40(1), 7–20. doi:10.1080/0022250X.2015.1112384
Williams, R., & Williams, R. (2006). Generalized ordered logit/partial proportional odds models for ordinal dependent variables. Stata Journal, 6(1), 58–82. Retrieved from https://econpapers.repec.org/article/tsjstataj/v_3a6_3ay_3a2006_3ai_3a1_3ap_3a58-82.htm
World Health Organization (WHO). (2015a). Global status report on road safety 2015. Injury Prevention. Retrieved from http://www.who.int/violence_injury_prevention/road_safety_status/2013/en/index.html
WHO. (2015b). Road traffic injuries: The facts. Global Status Report on Road Safety, 2015. Retrieved from http://www.who.int/news-room/fact-sheets/detail/road-traffic-injuries