Detection and Prediction of Driver Drowsiness Using Artificial Neural Network
ARTICLE INFO

Keywords: Drowsiness; Prediction; Artificial neural network; Physiological measurement; Behavioral measurement; Driving performance and activity

ABSTRACT

Not just detecting but also predicting impairment of a car driver’s operational state is a challenge. This study aims to determine whether the standard sources of information used to detect drowsiness can also be used to predict when a given drowsiness level will be reached. Moreover, we explore whether adding data such as driving time and participant information improves the accuracy of detection and prediction of drowsiness. Twenty-one participants drove a car simulator for 110 min under conditions optimized to induce drowsiness. We measured physiological and behavioral indicators such as heart rate and variability, respiration rate, head and eyelid movements (blink duration, frequency and PERCLOS) and recorded driving behavior such as time-to-lane-crossing, speed, steering wheel angle and position in the lane. Different combinations of this information were tested against the real state of the driver, namely the ground truth, as defined from video recordings via the Trained Observer Rating. Two models using artificial neural networks were developed, one to detect the degree of drowsiness every minute, and the other to predict every minute the time required to reach a particular drowsiness level (moderately drowsy). The best performance in both detection and prediction is obtained with behavioral indicators and additional information. The model can detect the drowsiness level with a mean square error of 0.22 and can predict when a given drowsiness level will be reached with a mean square error of 4.18 min. This study shows that, in a controlled and very monotonous environment conducive to drowsiness in a driving simulator, the dynamics of driver impairment can be predicted.
⁎ Corresponding author at: Aix Marseille Univ, CNRS, ISM, Marseille, France.
E-mail address: [email protected] (C. Jacobé de Naurois).
https://doi.org/10.1016/j.aap.2017.11.038
Received 27 July 2017; Received in revised form 12 October 2017; Accepted 27 November 2017
Available online 06 December 2017
0001-4575/ © 2017 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/BY-NC-ND/4.0/).
C. Jacobé de Naurois et al. Accident Analysis and Prevention 126 (2019) 95–104
awake. Secondly, according to Friedrichs and Yang (2010), when the experiment involves more than three hours of monotonous driving, the KSS becomes inadequate because drivers have difficulty judging their alertness. Lastly, subjective assessment clearly does not constitute an objective measure of drowsiness, and when the task is very monotonous, individual ratings on drowsiness differ from the person’s physiological alertness level (Brown, 1997).

Features extracted from eye and head movements, classified as sensorimotor indicators, are also promising parameters to detect the operational state and are now included in many research approaches (Chen and Ji, 2012; Liu et al., 2009). Video-oculo-graphy (VOG) is commonly used to study the following features: blink frequency, blink duration and PERCLOS (PERcentage of eye CLOSure). Changes in these features are considered under low-level control, offering an easy way to monitor the activity of the neurovegetative system (Caffier et al., 2003; Wierwille and Ellsworth, 1994). These features are generally extracted with image processing algorithms based on eye, head and gaze movement tracking. Thus, the quality of the estimation is highly dependent on this first signal-processing step.

Physiological features are also frequently used to assess drowsiness because they are continuously available and could be considered as an objective, more direct, measure of the functional state. The main recordings of signals related to drowsiness are the electroencephalogram (EEG), the electrocardiogram (EKG) and electro-dermal activity (EDA) (Borghini et al., 2014; Dong et al., 2011). The gold standard appears to be the EEG, the most direct indicator of central nervous system activity (De Gennaro et al., 2001). However, the EEG is quite intrusive, and proper installation of an extensive set of electrodes on the participant’s scalp requires expertise and time. It has been established that when a change in vigilance is observed, changes in psychophysiological arousal can also be observed, and these changes can be monitored by measures of the central and autonomic nervous system activity (Haarmann and Boucsein, 2008). Concerning EKG, since heart rate variability (HRV) is linked to the autonomic nervous system, this feature is often used as an indicator of drowsiness because changes in HRV can provide information about the autonomic nervous system (Elsenbruch et al., 1999; Lal and Craig, 2001; Riemersma et al., 1977; Stein and Pu, 2012). Moreover, some studies on drowsiness, vigilance or workload also record and analyze respiration rate and amplitude (Besson et al., 2013; Ju et al., 2015; Reimer et al., 2009; Rodriguez Ibañez et al., 2011).

Yet a direct relationship between physiological features and cognitive state is hard to define, because these physiological features vary with other states (including, but not limited to, emotion, workload, physical fatigue) or with the context. These variations according to state also differ from one person to another. Thus, each physiological indicator has its own limits. Heart rate usually decreases during driving and when the driver is tired (Lal and Craig, 2001), but the opposite may also occur (Apparies et al., 1998). Peiris et al. (2005) showed that two independent experts analyzing EEGs to detect drowsiness may not make the same assessment for the same participant at the same time. On the other hand, EDA can be influenced by stress (Healey and Picard, 2005) and emotions (Rebolledo-Mendez et al., 2014). Taken alone, therefore, these indicators cannot be considered as adequate and exclusive indicators of drowsiness or fatigue.

Driving behavior and performance analyses have the main advantage of being non-intrusive. Some signals such as pressure on pedals or car movements are easily available. The standard deviation of car position relative to lane midline (also named standard deviation of lane position (SDLP)), and steering wheel movements, are the most common features used to detect drowsiness (Arnedt et al., 2001; De Valck et al., 2003; Liu et al., 2009; Philip et al., 2004). However, here again, driving performance and activity are not specific indicators of drowsiness. For example, driving performance can decrease with other factors such as distraction (Tango et al., 2009), or with a decline in attention (Marin-Lamellet et al., 2003).

Since none of these feature families is consensually considered as a specific indicator of drowsiness, various measures are often used jointly. Such a hybrid approach minimizes the number of false alarms while maintaining a high rate of recognition (essential for good acceptance of the system by the human operator, Dong et al., 2011), mainly because no signal emerges as the reference marker allowing real time measurement that is both relatively non-invasive and reliable. Moreover, there is no direct link between all these features and the “operational state”, which is why methods such as machine learning or statistical models are used, combining the different measures.

The different algorithms used include k-nearest neighbors (Chauhan et al., 2015), decision trees (Lee et al., 2010; Sukanesh and Vijayprasath, 2013), Bayesian classifiers (Lee and Chung, 2012; Yang et al., 2010), Support Vector Machines (Bhowmick and Chidanand Kumar, 2009; Krajewski et al., 2009a,b; Liang et al., 2007; Yeo et al., 2009), artificial neural networks (ANN) (Bundele and Banerjee, 2009; Eskandarian et al., 2007; Sayed and Eskandarian, 2001; Samiee et al., 2014), ensemble methods like random forest (Krajewski et al., 2009a,b; McDonald et al., 2013; Torkkola et al., 2008; Zhang et al., 2004) and, more recently, deep learning (Hajinoroozi et al., 2015). Most studies consider the problem of estimating the driver’s impaired operational state as a classification problem. Is the driver in an impaired state or not? Is the driver drowsy or not? However, the evolution of the state of the driver can also be considered as a regression problem, i.e. the driver goes through various continuous states, although regression models are rarely used in the literature (Murata and Naitoh, 2015). Nonlinear modeling machine learning (such as with ANNs) is also often used. With these techniques, the model can extract information from noisy data, and can avoid over-fitting, making it generally more robust (Dong et al., 2011). Since in the context of driving we expected over-fitting and noisy data, the present study uses machine-learning techniques based on artificial neural networks.

Most research focuses on the detection/estimation of an impaired state, rather than on its prediction, even though they adopt the term “prediction” (Chen, 2013; Hargutt and Kruger, 2001; Ji et al., 2004; Verwey and Zaidel, 2000). This is because in machine learning, the term “prediction” is used to infer the label of an object not seen during the learning phase. However, some studies try to predict what the ground truth will be in the subsequent few minutes: the ground truth was shifted for one epoch (Kaida et al., 2007), while different lags (+1, +2, +3, +4, +5, +7, +10 min) were tested by Larue (2010). Murata et al. (2016) obtained the highest prediction accuracy using the data between 20 and 120 s before the prediction. Watson and Zhou (2016) detect micro-sleep with 96% accuracy and are able to predict, between 15 s and 5 min in advance, the time when the next micro-sleep will occur. However, the time when the first micro-sleep occurs obviously cannot be predicted by such methods.

As explained above, using a single source of information does not seem to be an efficient way to accurately assess the state of the driver. Different sources of information and different models are used in the literature, and results are hard to generalize away from well-controlled laboratory conditions. In the present study, we collected information originating from different sources: physiological, behavioral, and psychological data from the driver, as well as performance information from the vehicle. The goal of this study is to develop and evaluate a model with an artificial neural network (ANN), so as to predict when a given impaired state will be reached in addition to detecting this impaired state. We deliberately chose unobtrusive recording techniques easily applicable in a car. Different datasets using different sources of information were tested, to determine which kind of information yields the most powerful model. We put forward two hypotheses. First, we hypothesized that it is possible to predict when the impaired state will arise by using the sensorimotor, physiological and performance indicators used to detect drowsiness. Second, we hypothesized that adding information such as driving time and participant information will improve the accuracy of the model.
2. Materials and methods

2.1. Participants

A total of 21 participants were included in the study (mean age ± SD: 24.09 ± 3.41 years; 11 men and 10 women). On the day of the experiment, the participants were not allowed to drink alcohol, coffee or tea. Inclusion criteria were: valid driver’s licence for at least 6 months, no visual correction needed to drive, not susceptible to simulator sickness (as assessed by the Motion Sickness Susceptibility Questionnaire, Short-form (MSSQ-Short, Golding, 1998)) and an Epworth scale score (assessing susceptibility to drowsiness) below 14 (Johns, 1991). A score below 8 on this scale means the person has no sleep debt. A score from 9 to 14 means the person shows signs of sleepiness, and if the score is above 15, the person shows signs of excessive sleepiness. Before the experiment, participants were questioned on their age, their quality of sleep (on a scale of 1–10), their caffeine consumption (never, rarely, one or two cups per day, more than two cups per day), driving frequency (occasionally, several times a month, a week or a day) and number of kilometers driven per year. To assess their circadian typology, their score on the Horne and Ostberg morning/evening questionnaire (Horne and Ostberg, 1975) was also noted. All these indicators concerning the participants were later considered as participant information, and used with a view to improving the performance of the model.

2.2. Protocol

The participants drove for between 100 and 110 min in a static driving simulator in an air-conditioned room with temperature control set at 24° Celsius, after lunchtime. According to the literature about circadian rhythms, the probability of falling asleep between 02:00 and 06:00 and between 14:00 and 16:00 is 3 times higher than at 10:00 or at 19:00, respectively (Horne and Reyner, 1999). We chose a period corresponding to an intermediate level between a low risk of drowsiness (in the morning) and the highest risk (end of the night). The road and traffic were generated with SCANeR Studio®. While driving, data on driving performance, eyelid and head movements, and physiological data were recorded using the following hardware and software: SCANeR Studio® for driving performance at 10 Hz, faceLAB® for sensorimotor signals at 60 Hz, and EKG, pulse plethysmography (PPG), EDA and Respiration with the Biopac® MP150 system and Acqknowledge® software at 1000 Hz. In this study, EDA was also recorded but not used due to extensive signal loss. A webcam was placed on top of the central screen of the simulator to video-record the participants during the session.

At the beginning of the session, the participants drove along a highway for roughly 90 min, then turned off the highway and drove for around 5 min to reach a city. Finally, they drove in an urban environment for roughly 5 min. During most of the highway stretch, there was no traffic. Some 2/3 of the way along, 22 cars appeared from the right of the highway, disappearing a few kilometers later. This sudden addition of traffic was intended to change the driver’s level of drowsiness. Rossi et al. (2011) demonstrated that a driver is more susceptible to sleepiness in a simulator with a monotonous scenario, and during the afternoon.

2.3. Data analysis and modeling

The level of drowsiness used as the reference in this study, the so-called ground truth (the real state of the driver is not directly accessible and must be evaluated), is based on subjective assessment by video analysis, independently coded by two raters. Their evaluation was based on a method proposed by Wierwille and Ellsworth (1994), which used a scale between 0 and 100. For practical reasons in relation with the ANN, we decided to use a smaller scale (from 0 to 4 with a step of 0.5) as proposed by Belz et al. (2001). This ground-truth determination method was chosen because the assessment by video coding is reliable and allows a comprehensive assessment of the driver state. Other methods, such as questionnaires (e.g. KSS), reaction time to a double task or even EEG are quite invasive and may disturb the driver and thus influence his/her state. However, video analysis is long and requires several observers with a certain level of training. In order to be more reliable, this method can use criteria and a rating scale as a basis for different observers. The ORD relies on a continuous scale from “alert” to “extremely drowsy” with a list of observable criteria that characterize a drowsy driver (Wierwille and Ellsworth, 1994). The two trained raters evaluated each minute of video and rated each segment on a scale ranging from 0 (alert) to 4 (extremely drowsy). The mean of the two raters was taken as the drowsiness level. Inter-rater reliability was computed with Pearson’s linear correlation (R = 0.71 and p = 0.00).

In order to synchronize data obtained at various sampling frequencies, we averaged data over periods of 1 min. Thus, the final sampling rate is 1/min for each feature, including ground truth.

The modeling process can be divided into two phases. First, one Artificial Neural Network (ANN) detects the level of drowsiness from a predetermined set of features (detection model). This ANN is used to detect the impaired state (level of drowsiness). Second, if drowsiness is under 1.5, a second ANN predicts (in min) when it will reach 1.5 and gives this time as its output; otherwise (for instance when the level has already been reached), its output is 0 (prediction model). The threshold was set at 1.5 for the following reason. McDonald et al. (2013) defined the limit between “not drowsy” and “drowsy” at a level between 1 and 2 (0 or 1: not drowsy; 2, 3, 4: moderately, very or extremely drowsy). We chose the level of 1.5 as a threshold for defining the impaired state because this level means that at a given time, one of the two raters has evaluated the state of the participant as moderately drowsy (level 2) while the other evaluated the state as 1. These two ANNs were trained independently.

The neural network toolbox (Beale et al., 1992) of Matlab R2013a was used to create the ANNs. Two feedforward neural networks were used with 2 hidden layers, and a back propagation training method was applied using the Levenberg-Marquardt algorithm (Levenberg, 1944). The error was validated by ten-fold cross-validation and a grid search. The performance function used for learning was the mean squared error (the average squared error between the network outputs and the target output). To avoid overfitting, the total dataset was distributed in a training sub-dataset (70% of the total set, to learn the network’s node weights), a validation sub-set (15%, to stop learning and avoid overtraining) and a testing sub-set (15%, to evaluate the model’s ability to work on previously unseen data, a property also called ‘generalization’).

In addition, three other metrics were used to evaluate the model: first, the percentage of absolute errors below a threshold on the testing dataset (0.5 for detection of the degree of impairment and 5 min for prediction; the higher this metric, the better the model performs); second, the range of errors containing 95% of the values; and third, the coefficient R of the correlation between outputs and targets.

Driving performance and driving behavior indicators (car dataset) used in the model were: lateral distance relative to the midline, time-to-line-crossing (Bergasa et al., 2006), steering wheel angle, accelerator pedal angle, shift relative to the lateral line, speed, and number of line crossings. Physiological features used in the model (physiological dataset) were the heart rate and its variability, and the respiration rate and its variability. Sensorimotor features (behavioral dataset) extracted from faceLAB data were blink duration and its frequency, PERCLOS, head movement in translation and rotation, and saccade frequency. Participant information recorded consisted of score on circadian typology, score on Epworth scale, sleep quality, driving frequency, number of cups of coffee a day and age. Driving time (the time elapsed since the beginning of the driving session, in minutes) was also used as an input feature for the model (see Table 1). In an attempt to correct for individual differences, we subtracted from each signal the mean of the first five minutes of this signal, so that the signal represents variation from an initial state. To optimize learning, each feature was normalized such that minimum and maximum values lie within [−1; 1].

Table 1
All the variables (grouped by source of information, in columns) computed for each participant for each minute, used as input for the ANNs.

Physiological dataset | Behavioral dataset | Car dataset
HR: Heart Rate (average and standard deviation) (beat/min) | Blink duration (average and standard deviation) | Lateral distance from the closest lane and the center of the car in m (average and standard deviation)
Svlf: HR signal Very Low Frequency Power (0.0-0.04 Hz) | Blink frequency (average and standard deviation) (per minute) | Time to lane crossing (average and standard deviation)
Slf: HR signal Low Frequency Power (0.04-0.15 Hz) | PERCLOS (average and standard deviation) (% of eye-closure time) | Steering angle (average and standard deviation)
Shf: HR signal High Frequency Power (0.15-0.4 Hz) | Head position x (average and standard deviation) | Steering angle velocity (average and standard deviation)
Svhf: HR signal Very High Frequency Power (0.4-3.0 Hz) | Head position y (average and standard deviation) | Steering entropy (computed from steering angle)
Sympathetic ratio (Slf/(Svlf + Slf + Shf)) | Head position z (average and standard deviation) | Number of direction changes (0-crossings) per minute (computed from steering angle)
Vagal ratio (Shf/(Svlf + Slf + Shf)) | Head rotation x (average and standard deviation) | Accelerator pedal angle (average and standard deviation)
Sympathetic-vagal ratio (Slf/Shf) | Head rotation y (average and standard deviation) | Lateral shift of the vehicle center relative to the lane center (average and standard deviation)
Respiration Rate (average and standard deviation) (per minute) | Head rotation z (average and standard deviation) | Vehicle speed (km/h) (average and standard deviation)
 | Saccade frequency (average and standard deviation) (per minute) | Number of out-the road per minute

3. Results

The ANNs were trained 16 times (4 × 2 × 2) with different datasets. Each dataset results from the combination of the following: the three sources of information tested alone or all together (thus 4 combinations), with or without elapsed time (2 cases) and with or without information about the participants (2 cases). Tables 2 and 3 present the performance obtained with each of the 16 datasets. In this section, the results will be presented with the driving time (labeled with ‘1’ in Tables 2 and 3) and without (labeled with ‘0’), with the information about the participant (labeled with ‘1’) and without (labeled with ‘0’). The grouping was decided according to how these variables were recorded in our experiment (and possibly in a real car), that is to say with which equipment. Indeed, the vehicle information can be recorded from the vehicle’s Controller Area Network (in our experiment with SCANeR® software), the behavioral measurements with a camera and a specific image processing system (in our experiment with faceLAB®) and physiological measurements with physiological sensors and an A/D system (in our experiment with Biopac®, in a real car it could be with a smart-watch).

3.1. Detection

In this section, we present model performance in detecting drowsiness level, as defined by the ORD scale (from 0 to 4, see Methods section). The error is the difference between the real state (as given by the subjective evaluation, the so-called ground truth) and the output, squared and averaged over epochs to provide the mean squared error of the trained model.

From an absolute point of view, the dataset configuration providing the best performance (lowest mean square error) in training the model contains driving time, participant information and behavioral features (# in Table 2). With this dataset, the mean square error is 0.22 ± 0.02 and, for more than 80% of the testing data, the absolute value of the error is under 0.5 (less than one-half of a state level, as defined by the ORD scale). Ninety-five percent of the absolute value of the error is under 0.87. In other words, the model is off by less than one drowsiness level on our scale, in 95% of cases. Performance is similar when car information is included. The mean square error is 0.23 ± 0.06. For more than 86.34% of the testing data, the absolute value of the error is under 0.5. Ninety-five percent of the absolute value of the error is under 0.73, i.e. in 95% of cases the model is off by less than one drowsiness level on our scale.
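As an illustration of the evaluation measures named in the Methods (mean squared error, share of absolute errors below a threshold, the bound containing 95% of absolute errors, and the correlation coefficient R), the following minimal Python sketch computes them for a pair of target and output series. It is not the authors' Matlab code: the function name `evaluation_metrics`, the nearest-rank percentile definition and the toy values are assumptions made for the example.

```python
import math


def evaluation_metrics(targets, outputs, threshold):
    """Compare model outputs with ground-truth targets (hypothetical helper)."""
    n = len(targets)
    if n != len(outputs) or n < 2:
        raise ValueError("targets and outputs must have the same length (>= 2)")

    errors = [out - tgt for tgt, out in zip(targets, outputs)]
    abs_errors = sorted(abs(e) for e in errors)

    # Mean squared error between outputs and targets.
    mse = sum(e * e for e in errors) / n
    # Share of epochs whose absolute error is strictly below the threshold.
    share_below = sum(1 for a in abs_errors if a < threshold) / n
    # Assumption: nearest-rank definition of the 95th percentile of |error|.
    bound_95 = abs_errors[math.ceil(0.95 * n) - 1]

    # Pearson correlation coefficient R between targets and outputs.
    mean_t = sum(targets) / n
    mean_o = sum(outputs) / n
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(targets, outputs))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_o = sum((o - mean_o) ** 2 for o in outputs)
    if var_t > 0 and var_o > 0:
        r = cov / math.sqrt(var_t * var_o)
    else:
        r = float("nan")  # R is undefined for a constant series

    return {"mse": mse, "share_below": share_below, "bound_95": bound_95, "r": r}


# Toy data on the 0-4 drowsiness scale (illustrative values, not from the study).
toy = evaluation_metrics([0, 1, 2, 3, 4], [0.5, 1, 2, 3, 3.5], threshold=0.5)
```

For the detection model the threshold would be 0.5 on the drowsiness scale, and for the prediction model 5 (minutes), matching the thresholds stated in the Methods.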
Table 2
Model performance in detecting drowsiness level for the testing dataset: mean square error (MSE), standard deviation (STD), according to dataset used, with (1) or without (0) driving
time, with (1) or without (0) participant information. The worst performance (highest MSE) is highlighted in bold and with a * while the best performance (lowest MSE) is highlighted in
bold and with a #.
Driving Time Participant information Dataset Source MSE STD |Error |95% % Error < 0.5
98
C. Jacobé de Naurois et al. Accident Analysis and Prevention 126 (2019) 95–104
Table 3
Performance of the model in predicting drowsiness level with the testing dataset: mean square error (MSE), standard deviation (STD), according to whether dataset is used with (1) or
without driving time (0), participant information, and source of recorded information. The * symbol indicates the worst performance and the # symbol the best performance. The best and
worst performance are also highlighted in bold.
Driving Time Participant information Dataset Source MSE STD |Error |95% % Error < 5
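The target of the prediction model, as described in the Methods, is the time (in minutes) before the drowsiness level reaches 1.5, and 0 once that level is reached. The sketch below shows one possible way to build such a target from two raters' per-minute scores. The function name `time_to_threshold`, the use of `None` when the threshold is never reached later in the session, and the toy ratings are assumptions for illustration only; the source does not say how such epochs were handled.

```python
def time_to_threshold(levels, threshold=1.5):
    """Minutes until the per-minute drowsiness level is at or above `threshold`.

    Returns 0 for epochs already at or above the threshold, and None when the
    threshold is not reached later in the series (an assumption of this sketch).
    """
    targets = [None] * len(levels)
    next_reach = None  # index of the nearest epoch, from here on, at/above threshold
    # Walk backwards so each epoch knows the closest upcoming threshold crossing.
    for minute in range(len(levels) - 1, -1, -1):
        if levels[minute] >= threshold:
            next_reach = minute
        if next_reach is not None:
            targets[minute] = next_reach - minute
    return targets


# Toy ratings on the 0-4 scale (illustrative, not data from the study).
rater_a = [0, 1, 1, 2, 2, 1]
rater_b = [0, 0, 1, 1, 2, 1]
# Ground truth is the mean of the two raters for each 1-min epoch.
ground_truth = [(a + b) / 2 for a, b in zip(rater_a, rater_b)]
targets = time_to_threshold(ground_truth)
```

With these toy ratings the ground truth first reaches 1.5 at the fourth minute, which is the situation described in the text where one rater scores 2 and the other scores 1.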
When neither driving time nor participant information is used (line 0-0 in Table 2), or when only one of these is used (0-1 or 1-0), the model performs better with all datasets used together or with the behavioral dataset used alone; performance is slightly worse with the physiological or car datasets used alone. As stated above, the model performs best, for each dataset or for all three datasets used together, when both driving time and participant information are included (1-1).

Figs. 1 and 2 present, respectively with (Fig. 1) and without (Fig. 2) driving time and participant information, the frequency histogram of distribution of error (left panel, A) and the correlation (right panel, B) between real state (target, horizontal axis) and estimated state, the output of the ANN (vertical axis). The model is trained with behavioral data in Fig. 1 and with all datasets in Fig. 2; thus, Fig. 1 illustrates the best, and Fig. 2 the worst, performance for the training, validation and testing datasets. Linear regressions were applied to the output of the model to correlate them with the ground truth. With a perfect model, all data points would be on the diagonal line of the correlation graph. Fig. 1 shows that, for each of the three datasets, simulated values are well correlated with expected values (ground truth). The R-values are actually very close to unity (0.93, 0.91, 0.91 respectively for the training, validation and testing datasets). Moreover, the slopes of the regression lines are very close to unity (0.87, 0.88, 0.88 respectively for the training, validation and testing datasets) and the intercepts are close to zero (0.17 for all three datasets). Errors are calculated, at each 1 min epoch, as the difference between the output of the model and the ground truth. The graph on the left of Fig. 1 shows a peak at 0.05, meaning that most of the errors are close to 0. Also, more than 95% of the instances had an error of between −1.16 and 1.16. In Fig. 2, the correlations between output and target are still good but there is greater variability (R = 0.87, 0.74, 0.78 respectively for the training, validation and testing datasets). The model used for the results presented in Fig. 2 (behavior, physiology and car) is less accurate than the model whose results are presented in Fig. 1 (behavior, elapsed time and participant information). As for errors, the graph on the left shows a single but broader peak at 0.2 and −0.02, also meaning that most of the errors are close to 0.

Fig. 1. Frequency histogram of error distribution (left panel) and correlation (right panel) between real and estimated state, for a model trained with behavioral dataset, driving time and participant information.

Fig. 2. Frequency histogram of error distribution (left panel) and correlation (right panel) between real and estimated state, for a model trained with behavioral, car and physiological datasets.

3.2. Prediction

This section presents the performance of the second model, aimed at predicting when a driver will reach a given drowsiness level (here 1.5). The error, for each epoch, is the difference between the time remaining from the current epoch before the target level is really reached (as per the subjective evaluation) and the time predicted by the trained model (squared and averaged over epochs to provide the mean squared error).

The best performance is achieved with a combination of driving time, participant information and the behavioral dataset. The mean square error is 4.18 ± 1.17 min. For 95% of the testing data, the absolute value of error is under 2 min and more than 99% of the absolute value of error is under 5 min. Similar, but not higher, accuracy is achieved with the car and physiological datasets (4.67 ± 1.33 and 5.51 ± 1.84). Ninety-five percent of the absolute value of error is under 2.43 and 2.62, respectively. For more than 97% of the testing data, the absolute value of error is under 5 min.

The worst model performance in predicting drowsiness is with the car dataset alone (60.09 ± 6.19 min). Performance improves with the addition of participant information (50.21 ± 8.84 min), or of driving time (31.14 ± 10.72 min). The model becomes very accurate when both driving time and participant information are included with the car dataset (4.67 ± 1.33 min).

For each source of information (all, behavioral, car and physiological datasets), the model is more accurate when both driving time and participant information are included in the dataset than with either driving time or participant information alone, or with no additional information.

Figs. 3 and 4 present the frequency histogram of distribution of errors (left panel) and the correlation between real time (target, horizontal axis) and estimated time (vertical axis) of appearance of drowsiness, respectively with (Fig. 3) and without (Fig. 4) driving time and participant information. The model is trained with behavioral data in Fig. 3 and with all datasets in Fig. 4, so that Fig. 3 illustrates the best, and Fig. 4 the worst, performance. In Fig. 3, the graph on the right shows that the relation between target and output is very precise: data are close to the diagonal (very high R, better than 0.98 for the training, validation and testing datasets; the slopes are better than 0.99). In the left part of Fig. 3, the main peak is at 0.3, meaning that the model’s error is mostly below 0.3.

4. Discussion

Detecting impairment of a driver’s operational state is a major safety issue, addressed in numerous studies. While recent car models go some way towards providing this detection capacity, it is clear that recent technological developments are not sufficient to meet the challenge of safety in modern vehicles. Predicting the degree of driver impairment, and when it will occur, remain important research objectives requiring more complex treatment of heterogeneous information from diverse sources. The objective of this study was to assess whether the time of occurrence of a given state of drowsiness could be predicted by using ANN models (one to detect drowsiness and a second one to predict drowsiness).

Overall, our results demonstrate that, using an ANN trained with the same information used to detect drowsiness, it is possible to predict when a driver’s impairment will appear to an accuracy of approximately 5 min. Moreover, to further improve accuracy, external information such as driving time or a driver profile can be added to the model. In his study, Larue (2010) accurately predicted a driver’s decreased vigilance up to five minutes in advance, and up to 10 min in advance with 70% to 80% accuracy. Under quite different conditions, and with different types of information, our model seems to be more accurate. In our worst case, for 95% of the test dataset, the model can predict when the impairment will appear to within 13.11 min. In our best case, for 95% of the test dataset, the model can predict the impairment to within 1.97 min.

As explained in the results section, model performance, both on detecting a drowsiness level and on predicting when this level will be reached, varies considerably according to the datasets used to train the model. This raises the question of the relevance of using physiological signals, behavioral features and driving activity, and of the respective roles of these different datasets in model performance. An important point highlighted by our results is how temporal (driving time) and idiosyncratic (participant information) data impact model performance. The limitations of our model with regard to generalization (i.e. the ability of the model to accurately treat previously unseen data), and from a more general point of view, inter-individual variability, will also be discussed.

4.1. Dataset comparison: behavioral/physiological/car

Our objective was to use the same information both to detect drowsiness and to predict the time when a given drowsiness level would be reached. Interestingly, when trained with all datasets, either singly or in combination, the model gave satisfactory results. The dataset giving the best performance is the behavioral dataset (followed by the
100
C. Jacobé de Naurois et al. Accident Analysis and Prevention 126 (2019) 95–104
Fig. 3. frequency histogram of error distribution (left panel) and correlation (right panel) between real and estimated times, for a model trained with behavioral dataset, driving time and
participant information.
physiological dataset and finally the car dataset), both in detecting the 2015). Wang and Xu (2016) consider eye features as the prime input for
degree of drowsiness and in predicting when a given drowsiness level detection of drowsiness. However, since they are usually computed by
will occur. Similar results were previously reported. Samiee et al. image processing, these features cannot be considered fully reliable.
(2014) showed that information about blinks leads to highly accurate Although techniques have progressed considerably in recent years,
detection (90.74% detection of a drowsy state), while lateral deviation detecting face and gaze movements remains tricky in complex situa-
of the car and steering wheel angle provide 85.37% and 87.22% ac- tions (for example, subjects with glasses, variable or low light condi-
curacy, respectively. However, when all three sources of information tions, Benoit and Caplier, 2005; Friedrichs and Yang, 2010).
(blinking, lateral position and steering angle) were used together, ac- Our behavioral, physiological and, to a lesser extent, car datasets led
curacy increased to 94.69%, although this was not borne out by our to the best model performance. With all sources of information in the
study. As in our study, Daza et al. (2014) obtained better results with same neural network, performance could be expected to improve be-
features extracted from eyelid movement (such as PERCLOS) than with cause the neural network can better learn dependencies between dif-
features extracted from driving behavior. In the literature, HRV data ferent kinds of information. Unfortunately, our results do not bear this
showed a correlation with drowsiness (Elsenbruch et al., 1999; Lal and out. A single ANN-based model may not be the best way to take ad-
Craig, 2001; Stein and Pu, 2012). Yet our model gave better results with vantage of the dependencies between the different sources of in-
ocular and head parameters than with physiological variables: the ORD formation. An alternative, inspired by Samiee et al. (2014), might be to
scale showed a stronger correlation with the ocular parameters than linearly combine the outputs of three ANNs, each trained with a dif-
with physiological variables such as EKG and Respiration (Rost et al., ferent dataset: car, physiological or behavioral.
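The idea of linearly combining the outputs of three separately trained networks can be illustrated with a minimal sketch. It is not the method used in this study or in Samiee et al. (2014): the function names (`fit_linear_fusion`, `fuse`, `rmse`), the least-squares fitting of the weights and the synthetic noise levels are all assumptions made for illustration, and the three columns merely stand in for the outputs of the car, physiological and behavioral models.

```python
import numpy as np


def fit_linear_fusion(outputs, target):
    """Fit least-squares weights and an intercept for combining model outputs.

    outputs: array-like, shape (n_samples, n_models); one column per model.
    target:  array-like, shape (n_samples,); ground-truth values.
    Returns (weights, intercept).
    """
    outputs = np.asarray(outputs, dtype=float)
    target = np.asarray(target, dtype=float)
    # Append a column of ones so the fit also learns a constant offset.
    design = np.column_stack([outputs, np.ones(outputs.shape[0])])
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coeffs[:-1], coeffs[-1]


def fuse(outputs, weights, intercept):
    """Apply the fitted linear combination to a matrix of model outputs."""
    return np.asarray(outputs, dtype=float) @ weights + intercept


def rmse(prediction, target):
    """Root mean square error between two 1-D arrays."""
    diff = np.asarray(prediction, dtype=float) - np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic stand-ins for the outputs of three separately trained networks
# (hypothetical noise levels, not values from the study).
rng = np.random.default_rng(0)
target = rng.uniform(0.0, 60.0, size=500)        # time to drowsiness, in min
model_outputs = np.column_stack([
    target + rng.normal(0.0, 8.0, size=500),     # stand-in for the car model
    target + rng.normal(0.0, 5.0, size=500),     # stand-in for the physiological model
    target + rng.normal(0.0, 2.0, size=500),     # stand-in for the behavioral model
])

weights, intercept = fit_linear_fusion(model_outputs, target)
fused = fuse(model_outputs, weights, intercept)

single_rmse = [rmse(model_outputs[:, i], target) for i in range(3)]
fused_rmse = rmse(fused, target)
print("per-model RMSE:", np.round(single_rmse, 2))
print("fused RMSE:", round(fused_rmse, 2))
```

On the data used to fit the weights, the fused error cannot exceed that of the best single model, because each single model is itself one admissible linear combination. In practice the weights would be fitted on a validation set and the comparison made on held-out test data.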
Fig. 4. Frequency histogram of error distribution (left panel) and correlation (right panel) between real and estimated times, for a model trained with behavioral, car, and physiological datasets.
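The quantities summarized in figures of this kind (error histogram, correlation R and slope between target and output) and the 95% absolute-error bound quoted in the text can be computed along the following lines. This is a hedged sketch: the function name `summarize_prediction_errors`, the choice of root mean square error as the error measure, the number of histogram bins and the toy data are assumptions, not details taken from the study.

```python
import numpy as np


def summarize_prediction_errors(target, output, percentile=95.0, bins=20):
    """Summarize how well predicted times (output) match real times (target)."""
    target = np.asarray(target, dtype=float)
    output = np.asarray(output, dtype=float)
    error = output - target

    # Straight-line fit of output against target: a slope near 1 and an
    # intercept near 0 mean the points lie close to the diagonal.
    slope, intercept = np.polyfit(target, output, deg=1)

    # Frequency histogram of the signed errors (left panel of such a figure).
    counts, edges = np.histogram(error, bins=bins)

    return {
        "rmse": float(np.sqrt(np.mean(error ** 2))),
        # Bound below which the given share of absolute errors falls.
        "abs_error_percentile": float(np.percentile(np.abs(error), percentile)),
        # Pearson correlation between target and output (R).
        "r": float(np.corrcoef(target, output)[0, 1]),
        "slope": float(slope),
        "intercept": float(intercept),
        "hist_counts": counts,
        "hist_edges": edges,
    }


# Toy data (hypothetical): real times in minutes and predictions off by 1 min.
target = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
output = target + np.array([1.0, -1.0, 1.0, -1.0, 1.0])

summary = summarize_prediction_errors(target, output)
for key in ("rmse", "abs_error_percentile", "r", "slope"):
    print(key, round(summary[key], 4))
```

Reporting a percentile of the absolute error alongside the mean-based error is useful because a few large misses can dominate the latter while leaving the former almost unchanged.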
networks or dynamic neural networks to add temporality to the model, or adding other features like context information (traffic, type of road, weather etc.). These factors can influence the driver’s state. However, as eyelid and head movements are difficult to record in a real car, the focus should be on improving a model using only driving performance, driving behavior (based on data provided by sensors in the car) and physiological measurements. Finally, a larger and more realistic dataset (far more subjects, with a wider age range for example), recorded in real on-road conditions (at different times of the day for example), would be required to validate these models.
Conflict of interests
None.
Acknowledgments
This research was funded by a PhD grant from PSA Group via the OpenLab agreement with Aix-Marseille University and CNRS entitled “Automotive Motion Lab”. We thank Marjorie Sweetko for correcting and improving the English manuscript, all the participants in this study, L. Marrou for helping to evaluate video recordings, P. Vars, V. Honnet, and M. Hing for their help with SCANeR® Software Developments.
References
Alhazmi, S., 2013. Towards Context-based Fatigue Detection System in Vehicular Area Network. University of Ottawa, Canada (Unpublished doctoral thesis).
Anund, A., Fors, C., Kecklund, G., Leeuwen, W.V., Åkerstedt, T., 2015. Countermeasures for Fatigue in Transportation: a Review of Existing Methods for Drivers on Road, Rail, Sea and in Aviation. Statens väg- Och Transportforskningsinstitut. (VTI report 852A).
Apparies, R.J., Riniolo, T.C., Porges, S.W., 1998. A psychophysiological investigation of the effects of driving longer-combination vehicles. Ergonomics 41 (5), 581–592.
Arnedt, J.T., Wilde, G.J., Munt, P.W., MacLean, A.W., 2001. How do prolonged wakefulness and alcohol compare in the decrements they produce on a simulated driving task? Accid. Anal. Prev. 33 (3), 337–344.
Beale, M., Hagan, M.T., Demuth, H.B., 1992. Neural Network Toolbox. Neural Network Toolbox, The Math Works 5. pp. 25.
Belz, S.M., Robinson, G.S., Casali, J.G., 2001. An on-Road investigation of commercial motor vehicle operator self assessment of fatigue as an indicator of driver fatigue. In: SAGE Publications Sage CA: Los Angeles, CA. Proceedings of the Human Factors and Ergonomics Society Annual Meeting Vol. 45. pp. 1576–1580.
Benoit, A., Caplier, A., 2005. Hypovigilence analysis: open or closed eye or mouth? Blinking or yawning frequency? In: IEEE Conference on Advanced Video and Signal Based Surveillance. AVSS 2005. pp. 207–212.
Bergasa, L.M., Nuevo, J., Sotelo, M.A., Barea, R., Lopez, M.E., 2006. Real-time system for monitoring driver vigilance. IEEE Trans. Intell. Transp. Syst. 7 (1), 63–77.
Besson, P., Bourdin, C., Bringoux, L., Dousset, E., Maiano, C., Marqueste, T., Vercher, J.-L., 2013. Effectiveness of physiological and psychological features to estimate helicopter pilots’ workload: a bayesian network approach. IEEE Trans. Intell. Transp. Syst. 14 (4), 1872–1881.
Bhowmick, B., Chidanand Kumar, K.S., 2009. Detection and classification of eye state in IR camera for driver drowsiness identification. 2009 IEEE International Conference on Signal and Image Processing Applications (ICSIPA) 340–345.
Borghini, G., Astolfi, L., Vecchiato, G., Mattia, D., Babiloni, F., 2014. Measuring neurophysiological signals in aircraft pilots and car drivers for the assessment of mental workload, fatigue and drowsiness. Neurosci. Biobehav. Rev. 44, 58–75.
Brown, I.D., 1997. Prospects for technological countermeasures against driver fatigue. Accid. Anal. Prev. 29 (4), 525–531.
Bundele, M.M., Banerjee, R., 2009. Detection of fatigue of vehicular driver using skin conductance and oximetry pulse: a neural network approach. In: Proceedings of the 11th International Conference on Information Integration and Web-based Applications & Services. New York, NY, USA: ACM. pp. 739–744.
Caffier, P.P., Erdmann, U., Ullsperger, P., 2003. Experimental evaluation of eye-blink parameters as a drowsiness measure. Eur. J. Appl. Physiol. 89 (3–4), 319–325.
Chauhan, A., Saroliya, A., Sharma, V., 2015. Design & Analysis of KNN algorithm for fatigue detection in vehicular drivers using Pulse Oximetry parameter. Int. J. Eng. Technol. Manage. 2 (3), 107–110.
Chen, J., Ji, Q., 2012. Drowsy driver posture, facial, and eye monitoring methods. In: Eskandarian, A. (Ed.), Handbook of Intelligent Vehicles. Springer, London, pp. 913–940.
Chen, R., 2013. Sitting Behaviour-based Pattern Recognition for Predicting Driver Fatigue. Deakin University, Australia (Unpublished doctoral thesis).
Daza, I.G., Bergasa, L.M., Bronte, S., Yebes, J.J., Almazán, J., Arroyo, R., 2014. Fusion of optimized indicators from Advanced Driver Assistance Systems (ADAS) for driver drowsiness detection. Sensors 14 (1), 1106–1131.
De Gennaro, L., Ferrara, M., Curcio, G., Cristiani, R., 2001. Antero-posterior EEG changes during the wakefulness-sleep transition. Clin. Neurophysiol. 112 (10), 1901–1911.
De Valck, E., De Groot, E., Cluydts, R., 2003. Effects of slow-release caffeine and a nap on driving simulator performance after partial sleep deprivation. Percept. Mot. Skills 96 (1), 67–78.
Dong, Y., Hu, Z., Uchimura, K., Murayama, N., 2011. Driver inattention monitoring system for intelligent vehicles: a review. IEEE Trans. Intell. Transp. Syst. 12 (2), 596–614.
Elsenbruch, S., Harnish, M.J., Orr, W.C., 1999. Heart rate variability during waking and sleep in healthy males and females. Sleep 22 (8), 1067–1071.
Eskandarian, A., Sayed, R., Delaigue, P., Blum, J., Mortazavi, A., 2007. Advanced Driver Fatigue Research. Federal Motor Carrier Safety Administration, Washington, DC Report: FMCSA-RRR-07–001.
Friedrichs, F., Yang, B., 2010. Drowsiness monitoring by steering and lane data based features under real driving conditions. In: Proceedings of the European Signal Processing Conference. Aalborg, Denmark. pp. 23–27.
Golding, J.F., 1998. Motion sickness susceptibility questionnaire revised and its relationship to other forms of sickness. Brain Res. Bull. 47 (5), 507–516.
Hajinoroozi, M., Mao, Z., Huang, Y., 2015. Prediction of driver’s drowsy and alert states from EEG signals with deep learning. 2015 IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP) 493–496.
Hargutt, V., Kruger, H.-P., 2001. Eyelid movements and their predictive value for fatigue stages. In: Presented at the International Conference on Traffic and Transport Psychology - ICTTP 2000. Held 4–7 September 2000, Bern, Switzerland.
Healey, J.A., Picard, R.W., 2005. Detecting stress during real-world driving tasks using physiological sensors. IEEE Trans. Intell. Transp. Syst. 6 (2), 156–166.
Horne, J.A., Ostberg, O., 1975. A self-assessment questionnaire to determine morningness-eveningness in human circadian rhythms. Int. J. Chronobiol. 4 (2), 97–110.
Horne, J., Reyner, L., 1999. Vehicle accidents related to sleep: a review. Occup. Environ. Med. 56 (5), 289–294.
Ingre, M., Åkerstedt, T., Peters, B., Anund, A., Kecklund, Gör., 2006. Subjective sleepiness, simulated driving performance and blink duration: examining individual differences. J. Sleep Res. 15 (1), 47–53.
Ji, Q., Zhu, Z., Lan, P., 2004. Real-time nonintrusive monitoring and prediction of driver fatigue. IEEE Trans. Veh. Technol. 53 (4), 1052–1068.
Johns, M.W., 1991. A new method for measuring daytime sleepiness: the Epworth sleepiness scale. Sleep 14 (6), 540–545.
Ju, J.H., Park, Y.J., Park, J., Lee, B.G., Lee, J., Lee, J.Y., 2015. Real-Time driver’s biological signal monitoring system. Sens. Mater. 27 (1), 51–59.
Kaida, K., Åkerstedt, T., Kecklund, G., Nilsson, J.P., Axelsson, J., 2007. Use of subjective and physiological indicators of sleepiness to predict performance during a vigilance task. Ind. Health 45 (4), 520–526.
Karrer, K., Vöhringer-Kuhnt, T., Baumgarten, T., Briest, S., 2004. The role of individual differences in driver fatigue prediction. In: Third International Conference on Traffic and Transport Psychology. Nottingham, UK. pp. 5–9.
Krajewski, J., Batliner, A., Golz, M., 2009a. Acoustic sleepiness detection: framework and validation of a speech-adapted pattern recognition approach. Behav. Res. Methods 41 (3), 795–804.
Krajewski, J., Sommer, D., Trutschel, U., Edwards, D., Golz, M., 2009b. Steering wheel behavior based estimation of fatigue. Proceedings of the Fifth International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design 118–124.
Lal, S.K., Craig, A., 2001. A critical review of the psychophysiology of driver fatigue. Biol. Psychol. 55 (3), 173–194.
Larue, G.S., 2010. Predicting effects of monotony on driver’s vigilance. Centre for Accident Research and Road Safety. Queensland University of Technology, Australia (Unpublished doctoral thesis).
Lee, B.G., Chung, W.-Y., 2012. Driver alertness monitoring using fusion of facial features and bio-Signals. IEEE Sens. J. 12 (7), 2416–2422.
Lee, J.D., Fiorentino, D., Reyes, M.L., Brown, T., Ahmad, O., Fell, J., Dufour, R., 2010. Assessing the Feasibility of Vehicle-based Sensors to Detect Alcohol Impairment 811. National Highway Traffic Safety Administration, Washington, DC, DOT HS, pp. 358.
Lee, B.L., Lee, B.G., Chung, W.Y., 2016. Standalone wearable driver drowsiness detection system in a smartwatch. IEEE Sens. J. 16 (13), 5444–5451.
Levenberg, K., 1944. A method for the solution of certain non-linear problems in least squares. Q. Appl. Math. 2 (2), 164–168.
Li, L., Werber, K., Calvillo, C.F., Dinh, K.D., Guarde, A., König, A., 2014. Multi-Sensor soft-Computing system for driver drowsiness detection. In: Snášel, V., Krömer, P., Köppen, M., Schaefer, G. (Eds.), Soft Computing in Industrial Applications. Springer International Publishing, pp. 129–140.
Liang, Y., Reyes, M.L., Lee, J.D., 2007. Real-Time detection of driver cognitive distraction using support vector machines. IEEE Trans. Intell. Transp. Syst. 8 (2), 340–350.
Liu, C.C., Hosking, S.G., Lenné, M.G., 2009. Predicting driver drowsiness using vehicle measures: recent insights and future challenges. J. Saf. Res. 40 (4), 239–245.
Marin-Lamellet, C., Paire-Ficout, L., Lafont, S., Amieva, H., Laurent, B., Thomas-Antérion, C., Fabrigoule, C., 2003. Mise En Place d’un Outil d’évaluation Des déficits Attentionnels Affectant Les Capacités De Conduite Au Cours Du Vieillissement Normal Et Pathologique: L’étude SÉROVIE 81. Recherche – Transports – Sécurité, pp. 177–189.
McDonald, A.D., Lee, J.D., Schwarz, C., Brown, T.L., 2013. Steering in a random forest ensemble learning for detecting drowsiness-Related lane departures. Hum. Factors J. Hum. Factors Ergon. Soc (18720813515272).
Murata, A., Naitoh, K., 2015. Multinomial logistic regression model for predicting driver’s drowsiness using only behavioral measures. J. Traffic Trans. Eng. 3, 80–90.
Murata, A., Ohta, Y., Moriwaka, M., 2016. Multinomial logistic regression model by stepwise method for predicting subjective drowsiness using performance and behavioral measures. In: Goonetilleke, R., Karwowski, W. (Eds.), Advances in Physical Ergonomics and Human Factors 489. Springer International Publishing, Cham, pp.
665–674.
Peiris, M.T.R., Jones, R.D., Davidson, P.R., Carroll, G.J., Signal, T.L., Parkin, P.J., Bones, P.J., 2005. Identification of vigilance lapses using EEG/EOG by expert human raters. 2005 27th Annual International Conference of the IEEE Engineering in Medicine and Biology Society 1–7, 5735–5737.
Philip, P., Taillard, J., Guilleminault, C., Quera, S., Bioulac, B., Ohayon, M., 1999a. Long distance driving and self-induced sleep deprivation among automobile drivers. Sleep 22 (4), 475–480.
Philip, P., Taillard, J., Quera-Salva, M., Bioulac, B., Åkerstedt, T., 1999b. Simple reaction time, duration of driving and sleep deprivation in young versus old automobile drivers. J. Sleep Res. 8 (1), 9–14.
Philip, P., Taillard, J., Sagaspe, P., Valtat, C., Sanchez-Ortuno, M., Moore, N., Bioulac, B., 2004. Age, performance and sleep deprivation. J. Sleep Res. 13 (2), 105–110.
Philip, P., Sagaspe, P., Taillard, J., Valtat, C., Moore, N., Åkerstedt, T., Bioulac, B., 2005. Fatigue, sleepiness, and performance in simulated versus real driving conditions. Sleep 28 (12), 1511.
Rebolledo-Mendez, G., Reyes, A., Paszkowicz, S., Domingo, M.C., Skrypchuk, L., 2014. Developing a body sensor network to detect emotions during driving. IEEE Trans. Intell. Transp. Syst. 15 (4), 1850.
Reimer, B., Coughlin, J.F., Mehler, B., 2009. Development of a driver aware vehicle for monitoring, managing & motivating older operator behavior. Proceedings of the ITS-America 1–9.
Riemersma, J.B.J., Sanders, A.F., Wildervanck, C., Gaillard, A.W., 1977. Performance decrement during prolonged night driving. Vigilance. Springer, pp. 41–58.
Rodriguez Ibañez, N., García González Á, M., Ramos Castro, J.J., Fernández Chimeno, M., 2011. Drowsiness detection by thoracic effort signal analysis with professional drivers in real environments. Driver Distraction & Inattention 2011: Program, Presentations & Reviewed Papers.
Rossi, R., Gastaldi, M., Gecchele, G., 2011. Analysis of driver task-related fatigue using driving simulator experiments. Proc. Soc. Behav. Sci. 20, 666–675.
Rost, M., Zilberg, E., Xu, Z.M., Feng, Y., Burton, D., Lal, S., 2015. Comparing contribution of algorithm based physiological indicators for characterisation of driver drowsiness. J. Med. Bioeng. 4 (5), 391–398.
Samiee, S., Azadi, S., Kazemi, R., Nahvi, A., Eichberger, A., 2014. Data fusion to develop a driver drowsiness detection system with robustness to signal loss. Sensors 14 (9), 17832 (14248220).
Sayed, R., Eskandarian, A., 2001. Unobtrusive drowsiness detection by neural network learning of driver steering. Proceedings of The Institution of Mechanical Engineers Part D-Journal of Automobile Engineering 215 (9), 969–975.
Shahid, A., Wilkinson, K., Marcu, S., Shapiro, C.M., 2011. Karolinska sleepiness scale (KSS). In: Shahid, A., Wilkinson, K., Marcu, S., Shapiro, C.M. (Eds.), STOP, THAT and One Hundred Other Sleep Scales. Springer, New York, pp. 209–210 (ch 47).
Stein, P.K., Pu, Y., 2012. Heart rate variability, sleep and sleep disorders. Sleep Med. Rev. 16 (1), 47–66.
Sukanesh, R., Vijayprasath, S., 2013. Certain investigations on drowsiness alert system based on heart rate variability using LabVIEW. WSEAS Trans. Inf. Sci. Appl. 10 (11).
Tango, F., Calefato, C., Minin, L., Canovi, L., 2009. Moving attention from the road: a new methodology for the driver distraction evaluation using machine learning approaches. 2nd Conference on Human System Interactions 2009, 596–599 (HSI ’09).
Thiffault, P., Bergeron, J., 2003. Fatigue and individual differences in monotonous simulated driving. Personality Individual Differences 34 (1), 159–176.
Torkkola, K., Gardner, M., Schreiner, C., Zhang, K., Leivian, B., Zhang, H., Summers, J., 2008. Understanding driving activity using ensemble methods. In: Prokhorov, D. (Ed.), Computational Intelligence in Automotive Applications. Springer, Berlin Heidelberg, pp. 39–58.
Van Dongen, H.P.A., Rogers, N.L., Dinges, D.F., 2003. Sleep debt: theoretical and empirical issues. Sleep Biol. Rhythms 1 (1), 5–13.
Van Dongen, H.P.A., Baynard, M.D., Maislin, G., Dinges, D.F., 2004a. Systematic interindividual differences in neurobehavioral impairment from sleep loss: evidence of trait-like differential vulnerability. Sleep 27 (3), 423–433.
Van Dongen, H.P.A., Maislin, G., Dinges, D.F., 2004b. Dealing with inter-individual differences in the temporal dynamics of fatigue and performance: importance and techniques. Aviat. Space Environ. Med. 75 (3), A147–A154.
Verwey, W.B., Zaidel, D.M., 2000. Predicting drowsiness accidents from personal attributes, eye blinks and ongoing driving behaviour. Personality Individual Differences 28 (1), 123–142.
Wang, X., Xu, C., 2016. Driver drowsiness detection based on non-intrusive metrics considering individual specifics. Accid. Anal. Prev. 95, 350–357 (Part B).
Watson, A., Zhou, G., 2016. Microsleep prediction using an EKG capable heart rate monitor. 2016 IEEE First International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE) 328–329.
Wesensten, N.J., Belenky, G., Thorne, D.R., Kautz, M.A., Balkin, T.J., 2004. Modafinil vs. caffeine: effects on fatigue during sleep deprivation. Aviat. Space Environ. Med. 75 (6), 520–525.
Wierwille, W.W., Ellsworth, L.A., 1994. Evaluation of driver drowsiness by trained raters. Accid. Anal. Prev. 26 (5), 571–581.
Yang, G., Lin, Y., Bhattacharya, P., 2010. A driver fatigue recognition model based on information fusion and dynamic Bayesian network. Inf. Sci. 180 (10), 1942–1954.
Yeo, M.V.M., Li, X., Shen, K., Wilder-Smith, E.P.V., 2009. Can SVM be used for automatic EEG detection of drowsiness during car driving? Saf. Sci. 47 (1), 115–124.
Zhang, Y., Owechko, Y., Zhang, J., 2004. Driver cognitive workload estimation: a data-driven perspective. The 7th International IEEE Conference on Intelligent Transportation Systems, 2004. Proceedings 642–647.