
The perils of data analysis – speeding fines and the road toll


Excel formulas discussed: Slope, Correl
SPSS formulas discussed: Correlation, Regression, Oneway

I was reading an article on CarAdvice recently that argued that increased speeding fine revenue in Victoria had done nothing to reduce accidents, and that in fact speed cameras were dangerous. The argument was that while deaths had decreased, safer cars had just transferred deaths into hospitalisations; accidents were continuing at roughly the same rate but the outcomes were better. Whilst I would tend to agree with a number of the statements regarding low-level violations (particularly as I recently copped a speeding fine on a motorbike when sunglare meant I couldn’t see the speedo for a minute, and I clearly misjudged engine sound), I thought that some of the logical steps and data analysis (or lack thereof) were worth looking into.
In making the claim that speed cameras don’t reduce the road toll, the article relied on the data in this graph.
deaths,-hosp-and-fines

The conclusion from this graph was that hospitalisations had remained constant whilst fines had increased, and therefore fines had no effect on hospitalisation statistics. This is a common error when reasoning from correlations. Just as correlation does not equal causality, the absence of a visible correlation does not prove the absence of an effect, and the direction of any effect is not given either: it could be that accidents are somehow causing speeding fines rather than the other way around. The author is also forgetting (or ignoring) that there may be a third variable (or multiple ones!) serving to increase accidents while speeding fines serve to decrease them, with the two effects cancelling each other out. Without controlling for these other variables, it is difficult to determine whether the first two things are related or not.
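To illustrate the cancelling-out problem, here is a minimal Python sketch. The numbers are invented for the example (they are not from the workbook) and the variable names are hypothetical. Hospitalisations are built so that a third variable pushes them up by 10 per unit while fines push them down by 10 per unit. A simple regression on fines alone sees nothing, while a regression that also includes the third variable recovers both effects.

```python
def centred(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def simple_slope(x, y):
    """Least-squares slope of y on x alone (what Excel's SLOPE returns)."""
    xc, yc = centred(x), centred(y)
    return dot(xc, yc) / dot(xc, xc)


def two_predictor_slopes(x, z, y):
    """Least-squares slopes of y on x and z together (normal equations)."""
    xc, zc, yc = centred(x), centred(z), centred(y)
    sxx, szz, sxz = dot(xc, xc), dot(zc, zc), dot(xc, zc)
    sxy, szy = dot(xc, yc), dot(zc, yc)
    det = sxx * szz - sxz ** 2
    b_x = (szz * sxy - sxz * szy) / det
    b_z = (sxx * szy - sxz * sxy) / det
    return b_x, b_z


# Invented illustration data, not real Victorian figures.
fines = [1, 2, 3, 4, 5, 6]
traffic = [2, 0, 4, 5, 3, 7]  # hypothetical third variable
hospitalisations = [100 + 10 * t - 10 * f for f, t in zip(fines, traffic)]

slope_fines_only = simple_slope(fines, hospitalisations)
slope_fines, slope_traffic = two_predictor_slopes(fines, traffic, hospitalisations)

print(slope_fines_only)            # flat: fines appear to do nothing
print(slope_fines, slope_traffic)  # the opposing effects reappear
```

With real data the cancellation would never be this tidy, but the point carries over: a flat line on a chart cannot by itself rule out an effect.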

Data sources

To expand on the analysis, I went searching for relevant data. Key inputs I was searching for were fatalities, hospitalisations and speeding fines (both revenue and the number of infringements). To look at the likelihood of any of these applying to an individual, I then sought population parameters, such as Victorian population data, vehicle numbers and travel volumes. All of these were obtained from official sources such as the Transport Accident Commission (www.tac.vic.gov.au), a Victorian Government road safety site (www.camerassavelives.vic.gov.au), the Australian Bureau of Statistics (www.abs.gov.au), the Victorian Department of Treasury and Finance (www.dtf.vic.gov.au), plus the Sydney Morning Herald (www.smh.com.au) for early fine data.
The data was then combined into a single spreadsheet (which you can download here or at the bottom of this post) and then cleaned up.

Data cleaning and calculations

There are minor discrepancies between yearly bases. For example, accident data is on a calendar year basis whilst fines are on a financial year basis, although given the steady trend of fines, this should not pose too much of an issue. Population is an estimate as at 30 June of each year (approximating an average population for that calendar year).
Prior to 1995, the number of registered vehicles was not calculated every year. In the intervening years between two official statistics, a linear change in vehicle numbers has been assumed.
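The linear assumption for the gap years can be sketched in a few lines of Python. The function name and the vehicle counts below are hypothetical, chosen only to show the arithmetic.

```python
def interpolate_years(year_a, value_a, year_b, value_b):
    """Fill every year from year_a to year_b (inclusive) on a straight line."""
    if year_b <= year_a:
        raise ValueError("year_b must be later than year_a")
    step = (value_b - value_a) / (year_b - year_a)
    return {
        year: value_a + step * (year - year_a)
        for year in range(year_a, year_b + 1)
    }


# Hypothetical official counts three years apart.
filled = interpolate_years(1985, 2_000_000, 1988, 2_300_000)
for year, vehicles in filled.items():
    print(year, round(vehicles))
```

The two end years keep their official values; only the years in between are estimates, which is why they would be highlighted in the spreadsheet.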
Speeding fines cover fixed and mobile cameras, but not on-the-spot fines issued by police, as police fines included too many different types of fines to rely on them as a proxy for speeding. Dollar amounts of fines include red-light infringements which are captured by fixed speed cameras, although the number of infringements removes the red-light offences.
To move past just looking at speeding fines over the five year period over which fine data was available (prior to FY2010, speeding fines were included in a general fines category in Victorian financial accounts), a number of other columns were included as to whether particular laws or enforcement programs were in place. For the purposes of analysis, three of these changes were delayed until the next year (Australian Design Rules for seatbelts and the introduction of demerit points from 1970 to 1971, and the urban 50km/h speed limit from 2001 to 2002) as otherwise they would only have been in operation for a single year before another significant change occurred, and their effects would likely have been lost.
Any cells that contain a calculated rather than official value have been highlighted in yellow.
Calculations were also made to determine the intensity of fatalities, hospitalisations and total casualties with respect to the population parameters (population, vehicles and distance (vehicle kilometres travelled or VKT)).
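As a rough sketch of these intensity calculations in Python (the helper name and the input figures are made up for illustration), each rate is just a count scaled by the relevant base. A missing base, such as VKT for a year where it has not been published, yields no rate rather than a misleading zero.

```python
def rate_per(count, base, per):
    """Return `count` per `per` units of `base`, or None if the base is missing."""
    if base is None or base == 0:
        return None
    return count * per / base


# Invented inputs, not real Victorian figures.
deaths = 250
population = 5_000_000
vkt_km = None  # e.g. a year where VKT has not been published

deaths_per_million_people = rate_per(deaths, population, 1_000_000)
deaths_per_billion_vkt = rate_per(deaths, vkt_km, 1_000_000_000)

print(deaths_per_million_people)  # 50.0
print(deaths_per_billion_vkt)     # None
```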

The simple analysis

For the analysis, I used Excel to create the charts and SPSS to calculate most of the statistics. SPSS (or PASW depending on which year you got the software) was used as it can easily calculate whether a relationship is statistically significant, whilst Excel tends to only have formulas to calculate the output without significance (although you can make your own significance calculations, it’s just harder to do). For a refresher on what statistical significance means, either Google it for a definition that makes sense to you, or try this one – https://www.statpac.com/surveys/statistical-significance.htm.

Speeding fines

For the five years of available data, speeding fines in Victoria have followed an interesting pattern.
historical-fines
average-fines
What we find is that the number of speeding fines has declined on average (trendline shows an annual decrease of nearly 14,000 p.a.) but that the funds raised from speeding fines have increased in every single year. This is particularly evident in 2013 when the number of infringements issued declined by around 6% but revenue increased by nearly 9% after the base fine amount (in the form of the value of a penalty unit) was increased by 12.5% and had CPI indexation applied. This reminds me of the treatment consumers receive from regulated businesses such as water and power distribution charges, where they are encouraged to consume less, but then required to pay more so that the distributor’s profits are maintained, as allowed under the DORC model used by regulators such as the Australian Energy Regulator.
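A quick back-of-the-envelope check on the 2013 figures: if revenue rose by about 9% while infringements fell by about 6%, the average amount collected per infringement must have risen by roughly 16%. The Python sketch below uses the rounded percentages quoted above, so the result is only approximate. The function name is hypothetical.

```python
def implied_average_fine_change(infringement_change, revenue_change):
    """Change in revenue per infringement implied by the two percentage changes."""
    return (1 + revenue_change) / (1 + infringement_change) - 1


# Rounded figures from the text: infringements down ~6%, revenue up ~9%.
change = implied_average_fine_change(-0.06, 0.09)
print(f"{change:.1%}")  # 16.0%
```

That is in the same ballpark as a 12.5% rise in the penalty unit plus indexation, although changes in the mix of offences (and the red-light fines included in the dollar amounts) would also move the average.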

Deaths

The CarAdvice article only looked at deaths from 2006-2013 and showed a 23% decrease over this period after excluding pedestrian fatalities (which seems a strange exclusion to make, as a fatality caused by a traffic accident is still a fatality, regardless of whether the person is inside or outside the vehicle, and they may have still included people who were on a bicycle). After including all fatalities, 2006-2013 showed a 28% decrease, and looking at 2000-2013 (the period for which there is also hospitalisation data) there is a 40% decrease.
death-metrics-over-time
Deaths per million of population and per million registered vehicles declined even more rapidly, down 36% and 38% respectively from 2006-2013, and by 51% and 55% from 2000-2013. Deaths per billion VKT declined slightly less at 23% and 42% from 2006 and 2000, but only until 2012 (the last available year of VKT data). Over longer periods the decline is even more dramatic (particularly since around 1970).

Hospitalisations

The CarAdvice article was correct in that we have not observed as dramatic a reduction in hospitalisations as in fatalities. Note that casualties are defined as total hospitalisations plus fatalities.
hosp-over-time
casualty-metrics-over-time
However, they seemingly deliberately selected years that could best show an increase in injury rates rather than representative years. If they had selected a year either side of their base date of 2006, then they would have shown a decrease in injuries rather than the marginal increases that supported their claims of unsafe driving.

Extended analysis

As outlined above, a number of problems were evident in the CarAdvice analysis – the key ones being selective use of particular periods, and basing analysis on visual observations or single periods rather than using proper statistical techniques. It’s the stats that we’ll look at in this section.

Fatalities, hospitalisations and casualties

Regression analyses were undertaken to test whether injuries and deaths were increasing or decreasing over time. The table below shows that the results (in the form of average annual change) can change dramatically depending on the time period chosen.
time-regression-deaths-hosp-cas
Regardless of the behaviour of hospitalisation (where over some periods one of the hospitalisation measures may show an increase), overall casualty rates have declined over time. Over the years from 2000-2013 (the years for which hospitalisation data is available), casualties declined by an average of 32.5 per year, with a significance level of 0.028 (meaning that if there were truly no trend, a decline at least this steep would show up by chance only about 2.8% of the time). However, over the shorter time period of 2010-2013 (where fine data is available), whilst Excel can calculate a regression slope of -15 casualties per year, SPSS shows that this result is not statistically significant (significance level of 0.847), as four data points are too few to distinguish a trend from year-to-year noise.
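To see why four data points struggle to reach significance, here is a small Python sketch of the slope and its two-sided significance level. It assumes the usual t-test on a regression slope with n - 2 degrees of freedom. With exactly four points there are 2 degrees of freedom, and the t distribution then has a simple closed form, so no statistics package is needed. The casualty numbers are invented for illustration and the function names are hypothetical.

```python
import math


def slope_and_r(x, y):
    """Least-squares slope of y on x, plus the Pearson correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


def p_value_four_points(r):
    """Two-sided p-value for the slope when n = 4 (2 degrees of freedom).

    Uses the closed-form t CDF for 2 df: F(t) = 1/2 + t / (2 * sqrt(t^2 + 2)).
    """
    if abs(r) >= 1:
        return 0.0
    t = r * math.sqrt(2 / (1 - r * r))
    return 1 - abs(t) / math.sqrt(t * t + 2)


# Invented casualty counts for four consecutive years.
years = [2010, 2011, 2012, 2013]
casualties = [5100, 5300, 4900, 5150]

slope, r = slope_and_r(years, casualties)
p = p_value_four_points(r)
print(f"slope={slope:.1f} per year, r={r:.3f}, p={p:.3f}")
```

Working through the algebra, the p-value for four points reduces to 1 - |r|, so the correlation between year and casualties would need to exceed 0.95 in magnitude before a four-year trend counted as significant at the 5% level.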
CarAdvice claim that the data “supports the theory that while cars have become significantly safer, people are still making the wrong choices behind the wheel and driving poorly.” For this to be true, the number of casualties should be roughly flat over time, which means that hospitalisations should be negatively correlated with fatalities. We have just shown that casualties have been declining, but hospitalisations could still be negatively correlated with fatalities, just not enough to remove the effects of declining fatalities on overall casualties. The three scatter-plots below show the relationship between hospitalisation and fatalities, including the trendline calculated by Excel.
scatter-greater-14-days
scatter-less-14-days
scatter-total-hosp
Visual inspection seems to indicate that hospitalisations of greater than 14 days are actually positively correlated with fatalities (i.e. when deaths increase, so do hospitalisations and vice versa), while there appears to be little relationship between shorter hospitalisations and fatalities. Regressions and correlation analysis confirm this to be the case, as shown in the table below.
corr-hosp-deaths
The relationship between total hospitalisations and fatalities could be significant depending on your choice of significance level. 1% and 5% are commonly used levels in research, depending on what you are testing – such as 1% or less for medical/drug applications, 5% in psychology research – although 0.1% and 10% are also used in certain circumstances. What the results show is that there is a statistically significant positive correlation between longer hospitalisations and fatalities, and almost significant positive correlations between short and total hospitalisations and fatalities. These results indicate that it is not just safer cars turning deaths into emergency room visits that is reducing our road toll.
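The sign of the correlation is what separates the two stories. The Python sketch below computes a Pearson correlation (the same quantity as Excel's CORREL) for two invented scenarios. In the first, every death avoided becomes a hospitalisation, so casualties stay flat. In the second, deaths and hospitalisations fall together. None of these numbers come from the workbook.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Invented illustration data.
deaths = [300, 280, 260, 240]

# Scenario 1: deaths are converted one-for-one into hospitalisations,
# so total casualties stay at 5300 every year.
hosp_conversion = [5300 - d for d in deaths]

# Scenario 2: hospitalisations fall alongside deaths.
hosp_joint_decline = [5000, 4900, 4850, 4700]

r_conversion = pearson_r(deaths, hosp_conversion)
r_joint = pearson_r(deaths, hosp_joint_decline)
print(round(r_conversion, 3), round(r_joint, 3))
```

Pure conversion gives a correlation of -1, whereas a joint decline gives a positive value, which is the pattern the scatter-plots point towards.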
I earlier showed that overall casualties may or may not be significantly decreasing over time (depending on the time period chosen). But what about casualty and hospitalisation rates taking into account the increased population, vehicles and travel distance?
time-regression-casualty-metrics
Regression and correlation tests indicated statistically significant reductions in casualty and hospitalisation rates over time, apart from when analysing the short 2010-2013 period (when speeding fine data is available).
fine-regression-death-hosp-cas
What we can see here is that there is a statistically significant relationship only between the number of road deaths and the total funds raised by traffic camera fines. But as usual, particularly for this small sample size, you should remember that just because two variables are correlated, it does not mean that one is causing the other to behave in a certain way.

Longer time periods and road rule changes

A longer term view (from 1952) shows overall Victorian road deaths peaking in 1969-70 and generally declining thereafter. Deaths as a proportion of the population may have peaked around the same time (although it could be argued it had an extended peak from around 1965), whilst deaths as a proportion of registered vehicles started declining around 1960-66. Governments would argue that a number of road rule changes beginning in 1970 served to assist in reducing the road toll. For the purposes of examining these claims, the years from 1952 have been broken up into 10 distinct stages.
table-road-rule-changes
These changes are a mixture of actual safety changes (seat belts, bike helmets) and “enhanced” enforcement activities (random breath tests, general speed camera use). The chart below plots road deaths over time, broken into the different road rule stages.
deaths-by-rule-changes
There are a couple of different statistical tests that could be used to analyse the performance of the rule changes. We could run a regression or correlation analysis and show that as the period number increases, fatalities decrease. This is likely to be a bit of an obvious result though, as the death toll has (in general) fallen over time. It is not obvious how hospitalisations or total casualties would perform though, as there are only three stages (7, 8 and 9) for which this data is available.
corr-cas-by-flag
A better technique could be to run an Analysis of Variance (“ANOVA”) test to see which of the groups are different from the others, or which seemingly produced an immediate reduction in the road toll.
ANOVA-deaths-cas-hosp
ANOVA-death-metrics
ANOVA-hosp-metrics
The ANOVA tests showed that significant differences exist between groups for Deaths, Death metrics (population, VKT and vehicles), Hospitalisation metrics (population, VKT and vehicles), but not Hospitalisation or Casualties.
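For readers without SPSS, the F statistic behind a one-way ANOVA is straightforward to compute by hand. The Python sketch below uses made-up groups (think of each list as the yearly deaths within one road rule stage). It stops at the F statistic and its degrees of freedom. Turning that into a significance level requires the F distribution, which a package such as SPSS handles for you.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented example: three stages with three years of data each.
stages = [[10, 12, 14], [20, 22, 24], [30, 32, 34]]
f_stat, df_b, df_w = one_way_anova_f(stages)
print(f_stat, df_b, df_w)  # 75.0 2 6
```

A large F means the stage averages differ by much more than the year-to-year scatter within each stage would explain.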
The next step is to conduct post-hoc tests to see which rule changes produced a significantly different result from the preceding regime. I chose to use the Scheffé test because, with a large number of comparisons, it controls overall Type I errors well, although it has the downside of potentially missing some significant differences. For overall Deaths, SPSS produces the following output table. Note that to produce this table, 2014 has to be ignored, as it is the only year in stage 10, and SPSS will not calculate post-hoc results where any group has only 1 data point.
post-hoc-deaths
What we want to do is compare a stage to its immediate predecessor for significance. The only one that has a significance less than 0.05 is stage 6 compared to stage 5 – the difference being the introduction of compulsory bike helmets and the widespread adoption of speed cameras. It can also be seen that none of stages 7, 8 or 9 produce significantly different results than stage 6.
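The stage-versus-predecessor comparison can be sketched in Python as follows. This is a minimal version of the Scheffé criterion for pairwise comparisons, not a reproduction of the SPSS output: the groups are invented, the function name is hypothetical, and the critical F value has to be looked up and supplied by hand (5.14 is the tabulated 5% value for 2 and 6 degrees of freedom, which matches this toy example).

```python
def scheffe_adjacent(groups, f_critical):
    """Compare each group with its predecessor using the Scheffé criterion.

    Returns a list of (previous_index, index, statistic, is_significant).
    The statistic is compared against the critical F for (k - 1, N - k) df.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_within = ss_within / (n_total - k)

    results = []
    for i in range(1, k):
        n_prev, n_curr = len(groups[i - 1]), len(groups[i])
        diff_sq = (means[i] - means[i - 1]) ** 2
        stat = diff_sq / (ms_within * (1 / n_prev + 1 / n_curr) * (k - 1))
        results.append((i - 1, i, stat, stat > f_critical))
    return results


# Invented example: the first two stages are similar, the third is not.
stages = [[10, 12, 14], [11, 13, 15], [30, 32, 34]]
comparisons = scheffe_adjacent(stages, f_critical=5.14)
for prev, curr, stat, significant in comparisons:
    print(f"stage {curr} vs stage {prev}: {stat:.2f} significant={significant}")
```

Dividing by k - 1 is what makes Scheffé conservative: the more groups there are, the bigger a difference has to be before it is flagged.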
For the death metrics (which essentially measure the likelihood of a particular individual dying), there are slightly different results. Deaths as a proportion of population follows a similar pattern to overall deaths. Deaths as a proportion of vehicles also has an almost significant difference (0.053) between stages 5 and 4 (speed camera trial period), while deaths as a proportion of VKT has significant reductions at stages 5 and 6. Click on a thumbnail for a larger view of the death metrics tables.
post-hoc-death-vkt
post-hoc-death-vehicles
post-hoc-death-population
But what about hospitalisation metrics? We have already seen that there has been no significant effect on overall hospitalisation levels, but there were differences in the rates of hospitalisation. For hospitalisation as a proportion of population, stage 9 was significantly different to stage 8 (introduction of more stringent mobile phone laws), with the same result for hospitalisation as a proportion of vehicles. For hospitalisation as a proportion of VKT, stage 9 was significantly different to stage 7, i.e. the cumulative effect of stages 8 and 9 (stringent mobile phone laws, upgraded camera operations and the 50km/h urban limit) was enough to make a difference.
post-hoc-hosp-metrics

Conclusion

Unfortunately, only having a combined four years of speeding fine revenue and hospitalisation data makes it pretty hard to conclude whether safer cars and not speeding fines are responsible for reduced fatalities and their conversion into hospitalisations.
It does seem fair to conclude that the general introduction of speed cameras has served to reduce fatalities on our roads. Whether higher penalties, rather than just the fear of getting caught, serve to reduce it even further is an open point. Whether governments are raising penalties primarily because it’s an easy way to raise revenue (every politician loves a good “sin tax”) or for more altruistic reasons I’ll leave to other people to argue.
I was surprised to find that the introduction of mandatory seatbelt legislation did not seem to produce a statistically significant reduction in road deaths. This probably says more about the pattern of the data at a time when cars were becoming more common and the road toll was increasing accordingly, than it does about the effectiveness of seatbelts.
It is also important to remember that just because no statistically significant result was found, it does not mean that a road law change did not reduce accidents. This analysis was conducted in a real-world environment, and was not an experiment that was able to control for all the variables, including major ones such as society’s view of speeding and drink-driving.
I guess I’ll just have to wear the latest speeding fine, and chalk it up to an experience in keeping me alive.

Feel free to pick holes in my arguments (and I’m sure there are a few) in the comments below. And if you have suggestions for other tests that could be run, weigh in with them too. Or just download the workbook to conduct your own analysis.



Brendan Walpole
