What Started This Off:
One of the benefits of living at Quayside is easy access to New Westminster Skytrain station. I take the train (and a bus) to work every day.
Generally the Skytrain service is quite good. Major delays seem quite rare, but I was getting a feeling that they had been more frequent this winter. Reliability is my line of work, so I figured I’d take a look-see.
Disclosure – I have no affiliation with Translink or either Skytrain
operating company.  My only interaction with
the system is as a rider.  While I have
spent nearly 10 years working in heavy industry as a maintenance and
reliability professional and have a degree in Mechanical Engineering, I am not certified
as a Professional Engineer (yet), and none of this should be considered
professional analysis or advice.  This is
just an overactive blogger’s exploration of a collision of interests.
At first I looked at a few news articles discussing Skytrain reliability (note: in this article Skytrain refers to both the Expo and Millennium Lines run by BC Rapid Transit Company, and the Canada Line run by inTransitBC). These were just regurgitated customer complaints and Translink talking points, so I decided, “no, I need to go to the source” and dug into the actual reports from Translink. Big mistake. What they reported was so infuriatingly vague and useless that I made a project out of this to try and get a real sense of Skytrain reliability.
Highlights from the 2013 Q2 report.  The highlights are colour-coded indicating my
rage level at reading them.  Keep in mind
this is only for half a year.  
Expo/Millennium Lines:
- Service reliability (% service hours delivered): 99.6% (Target 99.8%)
- On-time performance: 95.7% (Target 94.5%)
- Complaints per million boarded passengers: 6.4 (Target 5.1)
- Service Hours: 550,415
- Collisions and derailments per million km: 0
- Utilization: 18%
- Service km: 22,181,836
- Mean Distance Between Vehicle Failures: 563,654 (no units given, presumably km. If it’s meters, we’re in trouble)
- Boardings per Service Hour: 68.7
- Boardings (BpSH * Service Hours) = 38,723,030
Canada Line: 
- Service reliability (% service hours delivered): Not reported
- On-time performance: Not reported
- Complaints per million boarded passengers: 12.3 (Target 2.7)
- Service Hours: 96,622
- Collisions and derailments per million km: Not reported
- Utilization: 35.9%
- Service km: 3,381,770
- Mean Distance Between Vehicle Failures: Not reported
- Boardings per Service Hour: 209.2
- Boardings (BpSH * Service Hours) = 20,213,322
- Contract Adherence Monitoring: 95.8% (Target 98.5%) – How’s that for a clear metric?
So by the time the 2013 reports are out, I should have about
3 heart attacks from reading Translink’s uselessness.  Reporting for the bus fleet was even worse,
but that’s a story for another day.  
Why Is The Report So Bad?
The fundamental problem with the reported figures is, “What
the bleep does any of this mean?”   The first question that comes up in any proper
discussion of reliability is, “How are you defining reliability?”  As Translink does not define it for us, I am
left to guess.  There are a tremendous
number of questions that impact how the metric is reported vs. what people
actually experience.  
On-time performance: there is no publicly available Skytrain schedule that I can find – all the schedules I found on www.translink.ca define first and last train times, and service frequency during blocks of time. Customer service tweets regularly say there is no set schedule, just a train frequency. Does BCRTC/inTransitBC have a set schedule for when a train is due to arrive at each station, or is “on-time” service measured by trains meeting frequencies? Who knows?!? In theory you could back out a schedule from the first train time and service frequency; however, the frequencies are given as ranges (e.g. 3-4 minutes) and change through the day, so this gets ugly quickly.
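This is a minimal sketch of what backing out a schedule involves. It assumes a first-train time and a list of service blocks, each with a single headway. The block times and headways below are hypothetical and are not Translink’s published frequencies. Note that a range like “3-4 minutes” has to be collapsed to one number before any schedule exists at all.

```python
from datetime import datetime, timedelta

def back_out_schedule(first_train, blocks):
    """Generate nominal departure times from a first-train time.

    blocks: (block_end, headway_minutes) pairs in time order. A published
    range such as "3-4 minutes" must already be collapsed to one number.
    """
    departures = []
    t = first_train
    for block_end, headway in blocks:
        while t < block_end:
            departures.append(t)
            t += timedelta(minutes=headway)
    return departures

# Hypothetical service pattern, NOT Translink's real frequencies.
day = datetime(2013, 11, 15)
blocks = [
    (day.replace(hour=6), 6),  # 5:30-6:00, every 6 minutes
    (day.replace(hour=9), 4),  # 6:00-9:00, "3-4 minutes" taken as 4
]
schedule = back_out_schedule(day.replace(hour=5, minute=30), blocks)
print(len(schedule), schedule[-1].strftime("%H:%M"))
```

If you change the 4-minute headway to 3, every departure after 6:00 shifts. That is exactly why a schedule backed out this way is shaky.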
So is a late trip one that departs or arrives at a station later
than a set clock time?  Or is a late trip
defined as a train that misses its headway (e.g. the train is supposed to be
four minutes behind the train ahead of it, but is actually running five minutes
behind)?  Is it a late trip every time a
train is late into a station?  Or is it
calculated over an entire run? Or a day? 
For example, suppose our train is on the Expo line heading
downtown.  It departs King George on-time,
but due to mechanical malfunction is late into Surrey Central, falling behind
the train in front, and consequently is late into every station down the
line.  Is this counted as one late trip
or 19?  If it is calculated over the run,
at what point or points are lateness determined?  If it’s late to Metrotown but makes up the
time by Waterfront, is that trip on time, or not?  The longer the timeframe a trip is measured against, the easier it is to miss out on small delays as they get averaged out if the train meets the larger schedule on time.  
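Translink doesn’t publish its definition, so the sketch below simply scores one run in three different ways. It assumes the delay at every stop is known. These three scores are illustrative readings of “late trip” and are not Translink’s method. The stop count mirrors the King George example above, and the delays are hypothetical.

```python
def late_counts(delays_min, tolerance_min=0.0):
    """Score one run three ways, given per-stop delays in minutes
    (actual minus scheduled arrival)."""
    late = [d > tolerance_min for d in delays_min]
    return {
        "late_stops": sum(late),              # every late arrival counts
        "run_late_anywhere": int(any(late)),  # one strike per run
        "run_late_at_end": int(late[-1]),     # judged at the terminus only
    }

# Hypothetical 20-stop run: on time at King George, 3 minutes late after.
print(late_counts([0] + [3] * 19))
# Same run, but the time is made up by the final stop (Waterfront).
print(late_counts([0] + [3] * 18 + [0]))
```

Under these three readings, the same delay counts as 19 late trips, 1 late trip, or none at all.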
Are particular kinds of problems not counted against on-time performance? That is, is performance measured strictly against mechanical or operational issues (i.e. things within Translink’s control), or do they count it anytime the train is late regardless of the reason (e.g. medical emergencies, Godzilla attacks)? While I certainly sympathize with medical incidents, if medical emergencies on the platform shut down the train seven times in a week, I want to see that somewhere.
Why is there a two-minute buffer in the on-time performance?  Given that Expo Line trains are often running
at 2 minute headways or less, depending on how performance is measured you
could rig this calculation to the point that service could be cut in half and
it would still measure as perfectly on time. 
At least it’s not the airline industry’s 15 minute buffer, but on-time
is on-time, not two minutes late.  
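To see how a buffer interacts with short headways, here is a sketch that assumes one of the candidate definitions above. Under it, a train is on time if it arrives no more than the scheduled headway plus the buffer behind the train ahead. Translink doesn’t say which definition it uses, and the train counts are hypothetical.

```python
def headway_on_time_share(actual_headways, scheduled_headway, buffer_min=2.0):
    """Share of arrivals whose gap behind the previous train is within
    scheduled_headway + buffer_min (ASSUMED headway-based definition)."""
    on_time = [h <= scheduled_headway + buffer_min for h in actual_headways]
    return sum(on_time) / len(on_time)

# Scheduled: a train every 2 minutes for an hour (30 trains).
normal = [2.0] * 30
# Every other train cancelled: a train every 4 minutes (15 trains).
halved = [4.0] * 15
print(headway_on_time_share(normal, 2.0))
print(headway_on_time_share(halved, 2.0))  # still 1.0 with half the trains
```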
The service hours metric raises many similar questions. Does that mean the stations were open 99.7% of their scheduled times? Does that mean at least one train was running 99.7% of the time? That trains were on the tracks and not immobilized 99.7% of the time? And this doesn’t even start us down the road of cost-effectiveness – how does train availability compare with utilization, and what are the opportunity costs of lateness and the call-out and emergency costs of breakdowns?
On-time performance of 96.5%, and 99.7% service hour
delivery.  What do those two numbers mean
to me as a transit user?  I have no idea.  These are meaningless management metrics meant
to look favourable in the annual public reporting and to give Translink a
defence whenever Skytrain reliability is questioned.  Is Skytrain reliability bad this year?  We had an on-time performance of 96.5%.  Ninety-six point five!
What I care about as a customer is when I go to a station,
will a train show up in 2-4 minutes (or whatever the designated frequency is),
and will there be any major delays en-route to my stop.  I want to have an idea how frequently the
system is significantly slowed down or delayed. 
I don’t care as much about a train being held at a station an extra minute for a police incident, or other minor localized delays. I build buffers into my trip time to deal with that, much like I add buffers to deal with traffic if I’m driving. What I want to know is how often the whole system grinds down – a train malfunction forcing single-tracking that delays a whole line, station shut-downs, bus bridges, the works. I want to know how likely it is that my 10 minute train ride is going to take 30 minutes.
As a point of reference, my worst experience was a nominal 15 minute
ride taking 1.5 hours.  Translink’s
reporting gives me no sense of this at all. 
So What Did I Do?
The proper way to do this would be to dig through reams of data from the Skytrain Operations and Maintenance systems, detail every incident regarding the who, what, when, where, how and why, and do some analysis to figure out how often the system is delayed, why those delays are occurring, and recommend what to do about it.
I don’t have access to this data. I could likely make a Freedom of Information request for it, but the result would be massive, unwieldy, and missing two-thirds of what I need. And without direct access to their systems to delve into it further, and to the personnel who could answer questions about context, I really couldn’t do a proper job of it anyway.
However, to get a higher level estimate of major problems, I
don’t need access to the down and dirty data. 
Through Translink’s amazing customer service Twitter feed (seriously, if you take transit in Metro Vancouver, follow them!) they announce when there are major problems. If there’s an issue on any Skytrain line, these guys and gals will be tweeting it. My giant make-an-ass-out-of-you-and-me assumption for this whole analysis is this:
If the delay isn’t bad
enough for someone to tweet about, it’s not bad enough for me to care about.
This goes back to my earlier point.  I have buffers in my travel schedule to deal
with the minor problems. It’s the major ones that concern me.  
Now just because Translink tweeted it doesn’t necessarily
mean I will count an incident as a major delay, and vice versa.  Some judgment is involved.   Also, I did see at least one tweet mention that
the Customer Service folks don’t get proactively notified unless the delay is
estimated at 10 minutes or longer.  
So basically, I scoured Translink’s Twitter feed for Skytrain problems. I looked for a tweet indicating a major Skytrain incident had occurred and was causing delays. Then I looked for a corresponding tweet indicating it had ended. The time difference between the two tweets gives the approximate duration of the event. Knowing the duration of events and the scheduled hours of service, I can estimate how often the Skytrain system is under some kind of major slowdown or delay. Even better, many of the tweets indicate where the incident occurred and a reason for the delay, letting me dive into things to a small degree.
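The pairing step can be sketched as follows. It assumes each tweet has already been classified by hand as a “start” (delay announced) or an “end” (service restored), which is where the judgment calls happen. The code only does the bookkeeping. The tweet times are hypothetical.

```python
from datetime import datetime

def pair_incidents(events):
    """Pair hand-classified tweets into incidents.

    events: (timestamp, kind) tuples for one line, kind "start" or "end".
    Extra "start" tweets while an incident is open are treated as updates;
    an "end" with no open incident is skipped (see assumption 9).
    Returns (start, end, minutes) tuples.
    """
    incidents = []
    open_start = None
    for ts, kind in sorted(events):
        if kind == "start" and open_start is None:
            open_start = ts
        elif kind == "end" and open_start is not None:
            minutes = (ts - open_start).total_seconds() / 60
            incidents.append((open_start, ts, minutes))
            open_start = None
    return incidents

# Hypothetical tweets: a delay, an update, and the all-clear.
tweets = [
    (datetime(2013, 11, 15, 8, 12), "start"),
    (datetime(2013, 11, 15, 8, 20), "start"),
    (datetime(2013, 11, 15, 8, 47), "end"),
]
print(pair_incidents(tweets))
```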
There are lots and lots and lots and lots of simplifications and assumptions going on here.  Even with these, there are a number of
challenges with the data and analysis.  Namely:  
1. Downloading someone else’s Twitter feed is hard. Seriously – try it some time. I looked at Translink’s Twitter feed early on November 15th, and they had apparently posted 133,822 tweets since the beginning of time. My Google skills were not sufficient to find an easy way to download all of their tweets. http://snapbird.org/ worked well for the first 3500 tweets, but wouldn’t give me anything beyond that. An old html/xml command didn’t work. All the other solutions involved learning programming and Twitter’s API. I’m already spending enough time on this.
Solution? Make Translink do all the hard work for me. I submitted a Freedom of Information request for all of their tweets up to November 15th, 2013. After 60ish days I got the tweets from November 21st, 2011 through February 11th, 2014. According to Translink, Twitter could or would not give them the earlier data (about another 1.5 years’ worth), but they were continuing to pursue it. Rather than wait, and given assumption 7, I ran with what was provided (November 21st, 2011, through February 11th, 2014), plus a few days of hand-gathered data from Twitter to round it out to February 15th, 2014.
2. Coverage Gaps: The customer service tweeting hours do not line up exactly with Skytrain operating hours. On weekdays, the trains start running about an hour before Customer Service comes online, and keep running 1-1.5 hours after it signs off.
Assumption: Assume any delay that is in progress when Customer Service starts up in the morning has been ongoing from the start of service. Likewise, if a delay is still ongoing when they close, assume the delay continues until the Skytrain shuts down for the night. If a major delay happened entirely within the pre- or post-Customer Service hours and was not tweeted, for the purpose of this analysis it never happened. Starting early or running late was an issue in about 5% of all incidents. There is some fudge factor here, as I frequently found tweets indicating a problem before the official opening time of the Twitter service, and in those cases I ran with the tweet times.
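Assumption 2 can be written down as a small rule. This is a sketch only. The two flags stand in for the reading of the tweets, and the service-day times below (05:00 to 01:00, with Customer Service opening at 06:00) are hypothetical.

```python
from datetime import datetime

def apply_coverage_assumption(start, end, service_start, service_end,
                              in_progress_at_open=False,
                              ongoing_at_close=False):
    """Assumption 2 as a sketch: stretch an incident to the edge of the
    service day when Customer Service found it already under way at
    opening, or left it unresolved at closing. Returns minutes."""
    if in_progress_at_open:
        start = service_start
    if ongoing_at_close:
        end = service_end
    # Clamp to the hours trains actually run.
    start = max(start, service_start)
    end = min(end, service_end)
    return (end - start).total_seconds() / 60

# Hypothetical day: trains run 05:00 to 01:00 the next morning.
service_start = datetime(2013, 1, 14, 5, 0)
service_end = datetime(2013, 1, 15, 1, 0)
# First tweet at 06:05 says delays are already under way; all-clear at 06:40.
print(apply_coverage_assumption(datetime(2013, 1, 14, 6, 5),
                                datetime(2013, 1, 14, 6, 40),
                                service_start, service_end,
                                in_progress_at_open=True))
```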
3. Delays in announcements.  There is some delay between the time an
incident occurs and when customer service tweets the announcement.  Similarly there is a delay between when the
system is restored and when the announcement is made. 
Assumption:  assume that the delay before
announcing a problem and delay announcing a resolution are the same, and so there
is no effect on the incident duration. 
4. Restored means something different to Translink
than it does to me.  Translink announces
when the technical problem is resolved – the problem train is off the tracks,
the medical emergency is cleared, etc. 
That doesn’t mean the system is instantly back to normal – trains are
still bunched up and overcrowded, stations might still have passups.  Depending on the type and length of problem
it could take significant time for the system to stretch out and get back to
normal.  
Assumption: when Translink announces the problem is resolved, the system is
magically restored to normality, but with a minimum incident time of 10
minutes. 
5. Inconsistent terminology.  Skytrain delays.  Expo Line problems.  C-line issues.  MAJOR SYSTEM WIDE DELAYS.  There is no standard terminology for
describing Skytrain problems.  Customer
Service is pretty good at putting #Skytrain on really big issues, but not
perfect, and definitely not for everything I was after.  I used keywords in various combinations to
pull up relevant tweets, and even those required judgment on when it was a
major problem vs. a minor single train delay. 
Did I miss some incidents because of this?  Most likely. 
6. Other Data Sources.  If I personally experienced a delay or had
news reports discussing a major delay, I went with that over what I could
estimate from the tweets. 
7. Reporting consistency.  The way the twitter feed has been used has
likely changed over the 3.5 years it’s been running.  The account has gone from zero to over 40,000
followers in its time.  I have no way to
account for that, so I won’t, but I’m missing the entire first year of usage,
so I would expect things were reasonably settled down by the time my data
starts.   
8. Any tweet that just says “Skytrain” is counted against the Expo and/or Millennium Line. Translink seems diligent about calling the Canada Line out separately, so there might be a slight bias in favour of the C-line, but I don’t believe it is much.
9. Sometimes they will report a problem cleared,
but no start of a problem, or vice versa. 
In that case I assumed the problem started or ended 10 minutes before or
after the relevant tweet. 
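Assumptions 4 and 9 both come down to a 10-minute rule. Here is a minimal sketch of how they combine when turning tweet times into a duration. The timestamps are hypothetical.

```python
from datetime import datetime, timedelta

MIN_INCIDENT = timedelta(minutes=10)  # floor from assumption 4
IMPUTE_GAP = timedelta(minutes=10)    # gap from assumption 9

def incident_minutes(start=None, end=None):
    """Duration under assumptions 4 and 9: a missing start or end tweet is
    placed 10 minutes from the one we have, and nothing counts as shorter
    than 10 minutes."""
    if start is None and end is None:
        raise ValueError("need at least one tweet time")
    if start is None:
        start = end - IMPUTE_GAP
    if end is None:
        end = start + IMPUTE_GAP
    return max(end - start, MIN_INCIDENT).total_seconds() / 60

t = datetime(2013, 1, 14, 8, 0)
print(incident_minutes(end=t))                        # only an all-clear tweet
print(incident_minutes(t, t + timedelta(minutes=4)))  # floored to 10
print(incident_minutes(t, t + timedelta(minutes=35)))
```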
10. Delays or slowdowns for planned maintenance are
not counted as incidents.  So all those
evenings for the past year and a half where things slow down for the power rail
replacement project don’t affect the numbers. 
When the Canada Line goes to single tracking after 11 pm for regular
maintenance, it is not included.  If an
actual incident occurs and is reported during that time, that is still
counted.  I think maintenance
announcements represented nearly half the tweets I had to wade through.  
This can get fuzzy.  On January 31st,
2014, the Canada Line had a major problem. 
The next three days there was ‘track maintenance’ causing delays.  That maintenance was likely a result of the
failure that occurred on the 31st – do I count that as planned
maintenance and ignore it, or do I count it against the system?  In this case I was nice and considered it
planned maintenance, but realistically delays like that should be counted
against system performance. 
11. When a problem was reported over a range or area without specifying exactly where it occurred, I tracked it against the station closest to Waterfront Station on the system map. If no location was reported at all, I classified it against the Expo/Millennium Line or the Canada Line in general. Incidents were tracked with the following data:
a. Date
b. Failure Mode, if any information was given (why it failed, e.g. train issue, medical incident, track intrusion alarm)
c. Train Line and Location
d. Start and End time of the incident, giving a duration
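The record described above maps naturally onto a small data structure. The sketch below is one possible shape for it, with field names chosen for illustration. It also includes the “closest to Waterfront” rule from item 11. The station ordering has to be supplied by the caller, and the short list used here is a partial, illustrative one.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional, Sequence

@dataclass
class Incident:
    day: date                    # a. Date
    failure_mode: Optional[str]  # b. e.g. "Train", "Medical"; None if not given
    line: str                    # c. "Expo" (incl. Millennium) or "Canada"
    location: Optional[str]      # c. station; None means "line in general"
    start: datetime              # d. start of the incident
    end: datetime                # d. end of the incident

    @property
    def minutes(self) -> float:
        return (self.end - self.start).total_seconds() / 60

def resolve_location(reported: Sequence[str],
                     outward_from_waterfront: Sequence[str]) -> Optional[str]:
    """Rule 11: when a range of stations is reported, keep the one closest
    to Waterfront. Returns None if no reported station is recognized."""
    known = [s for s in reported if s in outward_from_waterfront]
    if not known:
        return None
    return min(known, key=outward_from_waterfront.index)

# Partial, illustrative ordering of Expo Line stations from Waterfront.
order = ["Waterfront", "Burrard", "Granville", "Stadium-Chinatown",
         "Main Street-Science World"]
print(resolve_location(["Main Street-Science World", "Granville"], order))
```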
So What Does All That Mean?
Throw all those assumptions and ~107,000 tweets in a
blender.  Note that for the remainder of
this article, “Expo” or “Expo Line” refers to the combined Expo and Millennium
Lines (both lines through Burnaby and the line to Surrey) and “Canada” or
“Canada Line” includes both the Richmond and Vancouver Airport segments.  What we get is from November 21st,
2011, through February 15th, 2014…
Table 1:
Summary of incidents. All times reported as hh:mm. MTBF = Mean Time Between Failures
This gives us a little bit of information. Clearly, there are more incidents on the Expo Lines. This makes sense – older tracks, older trains, more service, and much less of the system is underground and thus it is more exposed to weather and morons compared to the Canada Line. What surprised me the most was the overall frequency – a major incident every three days on the Expo Line!
Table 2:
Incident Durations and Overall System Reliability
In this case the reliability is calculated as the percentage of time the system is not operating under an incident, based on an average of 20 service hours per day and 817 days in the study. I don’t have any significant transit experience to compare this against, but my gut reaction to these numbers is they feel pretty good. Not that it compares directly, but it is below the 2013 Q2 reported Service Reliability of 99.6% for the Expo Line.
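The reliability and MTBF figures reduce to simple ratios over the study window. This sketch uses the article’s 20 service hours per day and its November 21st, 2011 to February 15th, 2014 window. Two points are assumptions. First, the MTBF here is measured in calendar days, because Table 1 reports hh:mm without saying whether it uses calendar or service time. Second, the function names are illustrative.

```python
from datetime import date

STUDY_DAYS = (date(2014, 2, 15) - date(2011, 11, 21)).days  # 817
SERVICE_HOURS_PER_DAY = 20.0  # the article's average

def reliability(total_incident_hours, days=STUDY_DAYS,
                hours_per_day=SERVICE_HOURS_PER_DAY):
    """Share of scheduled service time not spent under an incident."""
    return 1.0 - total_incident_hours / (days * hours_per_day)

def mtbf_days(incident_count, days=STUDY_DAYS):
    """Mean time between failures in calendar days (ASSUMED unit)."""
    return days / incident_count

print(STUDY_DAYS, STUDY_DAYS * SERVICE_HOURS_PER_DAY)
```

The window holds 16,340 service hours, so every 16.3 hours of incident time costs 0.1 percentage points of reliability. An incident “every three days” works out to roughly 817 / 3 ≈ 270 incidents.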
Curiously, the Canada Line tends to have significantly longer incidents, with both average and median delays roughly double those of the Expo Line. The Canada Line had about 80% fewer incidents. The Canada Line runs about 20% of the service hours and 15% of the service km of the Expo Line, so that matches up nicely. However, the total duration of incidents was half that of the Expo Line, waaaay higher than you would expect. For both lines the most frequent delay (the mode) was 10 minutes, the shortest delay counted.
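Average, median and mode can tell quite different stories when a few incidents run long. Python’s standard `statistics` module computes all three. The durations below are hypothetical and are not the study data.

```python
import statistics

def duration_summary(minutes):
    """Count, total, mean, median and mode of incident durations."""
    return {
        "count": len(minutes),
        "total": sum(minutes),
        "mean": statistics.mean(minutes),
        "median": statistics.median(minutes),
        "mode": statistics.mode(minutes),
    }

# Hypothetical durations in minutes, with a long tail.
print(duration_summary([10, 10, 10, 15, 20, 35, 60, 150]))
```

A mean well above the median, as here, is the signature of a long tail. This is the same pattern that the median-rank chart makes visible.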
Figure 1:
Histogram of Incident Durations
55% of incidents were resolved in 20 minutes or less, and
once you get past 30 minutes the incident durations seem pretty flat.  Let’s plot this out to see what else we can
tell from the delay times.
Figure 2:
Median Ranks of Incidents vs. Duration
This is a fairly ugly, but fairly typical chart of incident times from a maintenance and reliability perspective. What we would like to see here is a nice straight, steep line. That would indicate the delays are short (the steep part) and distributed evenly (the straight part).
What we actually see here, and in most other organizations, is a straight,
steep part for the start of the chart, with a kink in the upper half.  That indicates the bulk of the incidents are
being resolved in a regular period of time – the Expo Lines have 80% of their
incidents resolved in 50 minutes or less, the Canada Line has 70% in an hour or
less.  However, once things stretch
beyond that mark, the duration gets very uncertain and potentially very lengthy.  
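The text doesn’t say how the median ranks were computed. A common choice in reliability work is Benard’s approximation, and the sketch below assumes it. Plotting each duration against its rank gives a chart of this kind. Short, consistent durations climb quickly and smoothly, and a kink appears as a change in slope. The durations are hypothetical.

```python
def median_ranks(durations):
    """Pair each sorted duration with Benard's approximation of its
    median rank, (i - 0.3) / (n + 0.4) for i = 1..n."""
    xs = sorted(durations)
    n = len(xs)
    return [(x, (i - 0.3) / (n + 0.4)) for i, x in enumerate(xs, start=1)]

# Hypothetical durations in minutes.
for minutes, rank in median_ranks([35, 10, 150, 10, 20]):
    print(f"{minutes:>4} min  {rank:.3f}")
```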
What To Do About It
Translink’s goal should be twofold.
- Get rid of all the stuff that’s happening after the kink. Long duration incidents are a killer – they cost Translink money in terms of the direct costs of the incident, lost fare revenue, and lost goodwill. Also they create a public relations problem – once skytrain incidents reach two hours, or stretch through rush hour, they become headline news (but we have 96.5% on-time service!).
- Once the long duration issues are well handled, focus on shortening up the “regular” durations, i.e. cut it from almost all incidents being resolved in under 50 minutes to being resolved in under 30 minutes.
Interestingly, the Canada Line is less consistent in its repair times (the kink in the chart comes lower down, and the tail stretches out much further). We’ll get into that as we continue. First, let’s delve into those long duration incidents, specifically those lasting at least 45 minutes. Why 45?
- That’s right around the kink in the chart – once incidents hit the 45 minute mark, it becomes much less predictable how long it will last.
- Travelling the Expo Line takes roughly 40 minutes end to end, and the Canada Line is shorter. That means an incident lasting 45 minutes will affect trains along the entire line.
- Major incidents this long are much more likely to get reported and updated on twitter, so the data is likely better for longer incidents.
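The split at 45 minutes can be summarized by two shares: the fraction of incidents above the threshold and the fraction of total incident time they account for. Here is a minimal sketch with hypothetical durations.

```python
def long_incident_share(minutes, threshold=45):
    """Return (share of incidents, share of incident time) at or above
    the threshold."""
    long_ = [m for m in minutes if m >= threshold]
    return len(long_) / len(minutes), sum(long_) / sum(minutes)

# Hypothetical durations in minutes.
count_share, time_share = long_incident_share([10, 10, 10, 15, 20, 35, 60, 150])
print(f"{count_share:.0%} of incidents, {time_share:.0%} of incident time")
```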
Table 3:
Summary of Incidents Lasting at Least 45 Minutes
I have to say, this really surprised me. Basically, Translink has a major Skytrain issue that lasts 45 minutes or longer nearly three times a month! The long incidents represent only 24% of the total number of incidents, but 76% of the total incident time. Both the average and median are well removed from the lower limit of 45 minutes. It’s not a raft of issues at 45 minutes with a few outliers lasting longer; the incident duration is very widely distributed, as we saw in Figure 1. How serious a problem is this? From our earlier 2013 Q2 data:
- ~58.7 million boardings in Q2
- 3,650 system operating hours, based on 20 hours per day for ½ a year
- 16,000 boardings per hour (this number seems really high to me, but works out to about 300 boardings per hour per station which feels reasonable. It also gives ~320,000 boardings per day, which is in line with other Translink reports of ridership)
- An average fare of $1.86 (from 2012 annual budget)
- 188.5 hours of lost service
- $100,000 in repair labour (say four tradesmen @ $75 per hour for 188.5 hours plus two hours for every incident)
- $100,000 in repair parts (guesstimate: parts $$$ = labour $$$ - probably low but on the right order of magnitude)
Add that all up to get $5.8 million over roughly two years,
of which $5.6 million is lost fare revenue. 
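Here is the arithmetic behind those totals, using the inputs listed above. The labour and parts figures are the author’s round $100,000 guesstimates. The labour helper reproduces the four-tradesmen rule, but the number of incidents is an input you have to supply, since the count isn’t restated here.

```python
def boardings_per_hour(boardings=58_700_000, operating_hours=3_650):
    """~58.7 million Q2 boardings over 20 h/day for half a year."""
    return boardings / operating_hours

def incident_cost(lost_hours=188.5, per_hour=16_000, avg_fare=1.86,
                  labour=100_000, parts=100_000):
    """Rough cost of long incidents. Returns (lost fare revenue, total)."""
    lost_fares = per_hour * lost_hours * avg_fare
    return lost_fares, lost_fares + labour + parts

def labour_estimate(lost_hours, n_incidents, crew=4, rate=75.0, overhead_h=2.0):
    """Four tradesmen at $75/h for the incident hours plus 2 h per incident."""
    return crew * rate * (lost_hours + overhead_h * n_incidents)

lost, total = incident_cost()
print(round(boardings_per_hour()), round(lost), round(total))
```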
Now, this overstates the costs, probably to a large degree. Boardings are not the same as fare-paying passengers. A lot of the fares will be delayed, or shift to another Translink service, or are monthly passes and so don’t represent lost revenue. And while not chump change, $5.6 million over two years is small next to the ~$900 million of total Translink fare revenue in that period. Still, we’re likely talking seven figures between costs and lost revenue, and that doesn’t touch the lost goodwill or scaring away potential users.
Switching It Up
All this emphasizes what I said above – to improve skytrain reliability, first Translink needs to get a handle on why long incidents occur and work to prevent them. Let’s look at some Pareto Charts for these incidents.
Figure 3: Pareto of Skytrain Failure Modes
What strikes me most about the Pareto is the Rail and Switch incidents, given their long durations and relatively small counts. Interestingly, they carry through to the Paretos for both the Expo and Canada Lines, though in slightly different ways.
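A Pareto ranks failure modes by their contribution and tracks the cumulative share. Ranking by count and ranking by total duration can put different modes on top, which is the Rail and Switch effect described above. Here is a minimal sketch with hypothetical incidents.

```python
from collections import defaultdict

def pareto(incidents, by="minutes"):
    """incidents: iterable of (failure_mode, minutes) pairs.
    by="minutes" ranks modes by total duration, by="count" by frequency.
    Returns (mode, value, cumulative_share) rows, largest first."""
    if by not in ("minutes", "count"):
        raise ValueError("by must be 'minutes' or 'count'")
    totals = defaultdict(float)
    for mode, minutes in incidents:
        totals[mode] += minutes if by == "minutes" else 1
    grand = sum(totals.values())
    rows, running = [], 0.0
    for mode, value in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        running += value
        rows.append((mode, value, running / grand))
    return rows

# Hypothetical incidents: several short train faults, one long switch failure.
sample = [("Train", 15), ("Train", 20), ("Train", 10),
          ("Switch", 180), ("Medical", 30)]
print(pareto(sample, by="count")[0])    # Train leads by count
print(pareto(sample, by="minutes")[0])  # Switch leads by duration
```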
Figure 4:
Pareto of Expo Line Failure Modes
When we break it down by line, “Trains” pop up higher on the Expo Line. Again, this is expected – a good portion of the trains on that line are nearing 30 years old, and even the newer trains are older than the Canada Line trains. On the Canada Line, a single major problem with the control system put Control near the top of the Pareto.
Rail and switch items show a very high average duration –
that is they don’t break as often as the trains, but when they do it takes a
long time to fix the problem.  This seems
intuitive – a train can be pulled off the tracks, while a rail or switch issue
needs to be dealt with in place, and also has higher overhead when being fixed
– you need to lock out the power system, workers need to travel to the problem,
and then clear the tracks. 
This helps explain the longer tail of the Canada Line incidents compared to the Expo Line. The Canada Line has proportionally more Rail and Switch issues, as well as the major Control issue. Given that the Canada Line is less than five years old, I wasn’t expecting such a high rate of those issues. Had this been 2010/2011, I would have explained these issues as infant mortality: you would expect a higher incident rate as you worked out the kinks in a new system.
But nearing five years in, those kinks should be worked out,
particularly since the Canada Line uses traditional electric train design, not
fancy Linear Propulsion.  It seems more likely that there are some
design issues with the Canada Line track/switching system.  The switch incidents are split evenly over
the years, but fall exclusively within the months of January-May.  There’s not enough information to say what
the problem is, but if I had to take a guess I would venture there’s an issue
in terms of robustness regarding weather and/or cold, compounded by the
complicated track geometry near Bridgeport.
Two incidents accounted for nearly 1/3 of the total time for rail and switching issues: a problem at Richmond-Brighouse on January 31st, 2014 and a problem at New Westminster on April 25th, 2013. Of the remaining issues, nearly half happened at Bridgeport and Columbia stations, with the rest scattered about. Again, this makes sense. As confluence points, both stations have a high switching frequency, and Bridgeport has complicated track geometry and additional complications from the maintenance centre there.
These stations have had a switch/rail failure on average every four months between them. We know the rail system undergoes nightly inspection, but it appears a significant portion of those inspections is directed towards the linear propulsion rail and not necessarily the power or traction rails. Switches and rail are not new technology, and there are numerous papers on rail and switch maintenance. Other transit agencies certainly have rapid transit lines with frequent switching. To what extent has Translink adopted best practices from the rail industry in general, and from other transit agencies, to manage this? Given the high stress at these stations, maybe best practices are not enough and Translink needs to go beyond them to maintain a reliable system.
Trains
Trains are by far the most frequent cause of grief, both for long duration (20 incidents) and when counting everything (111 incidents). The average duration for Train incidents is 30 minutes, with a median of 15 minutes.
Train Incidents are all over the map. As mentioned, many Expo Line trains are getting old and that is reflected in the higher number of issues on the Expo Line. There are certainly analyses that can be done on the trains to ensure they are being maintained to optimal reliability (Reliability Centered Maintenance being the traditional one). However, there are many ways to optimize maintenance, and no one involved says to what end they are directing their maintenance. For example, you could optimize maintenance to
- Maximize train reliability to minimize service disruptions
- Minimize direct maintenance costs while maintaining a set level of reliability
- Minimize total system costs (including emergency repair costs, lost fare revenue, etc)
- No optimization, just follow the schedule in the maintenance manual
All of these will result in different train reliabilities,
direct costs, and extended costs.  Which
is the right one to use?  Any of them are
valid choices (except for just follow the manual – that’s lazy and rarely
optimal for anything) and can be justified. 
But it would be nice to know what philosophy is being used and how train
(or track for that matter) maintenance has been changed to deal with trains
that are 30 years old now.  
Another problem with Trains is that many of the train issues are time-outs from people holding the doors open too long at stations. This was reported about 20% of the time for the long duration incidents, and 25% of the time for incidents shorter than 45 minutes (these are likely low, since train incidents were not always specified beyond being a train incident). The best solution to this problem I can think of is to replace the door seals with giant blades that sever anything caught between the doors as they close. The cleanup crews would have a few more limbs to sweep up, but it would cut down on time-outs significantly.
Indictables, Ills, and Idiots
People do other stupid things too, like criminal acts and trespassing into the tracks. They also get sick sometimes.
Intrusion Alarms are the second most frequent issue after trains, with 58 incidents during the two years. They tend to be short, with an average duration of 23 minutes and a median of 15 minutes. There were only 8 incidents longer than 45 minutes.
Depending on the nature of the Intrusion Alarms, improved station design may be able to reduce these incidents. I find it absolutely insane that you can have a passenger platform open to tracks energized to 600 VDC, and where trains blow past at 30+ kph, with no safeguards other than a yellow strip of paint and a safety switch on the tracks.
Medical Incidents are also frequent but short – 43 incidents, 31 minute average, and seven lasting at least 45 minutes. There’s not much that can be done about Medical Incidents other than having more Skytrain Attendants scattered around to reduce the response time. I don’t know for certain, but I assume all attendants are trained in first aid. I suppose we could add biometrics to the Compass Card and do a medical scan before you enter the station.
Police incidents are less frequent still, with only 16 incidents (including the single incident that occurred at YVR-Airport Station). They tend to be longer, though, with the average duration 51 minutes. One of the supposed benefits of faregates is a reduction in criminal activity – that should show up in a reduction in police incidents in the years to come.
Location, Location, Location
Not too many surprises here: the incident rate along the combined portion of the Expo/Millennium line is about twice as high as the other segments. It has older tracks, older trains, and it runs twice as many trains as the other lines – a recipe for trouble. Columbia and Bridgeport stand out due to their track/switching issues. Edmonds gets a lot of train incidents, presumably since the maintenance centre is there – how many of those truly occurred at Edmonds compared to how many are just being reported there is anyone’s guess, particularly as Bridgeport doesn’t show this.
Somehow Brentwood, Braid, Lake City Way, Surrey Central (seriously?) and King Edward escaped scot-free. Scott Road, however, was another story – this was a big surprise when looking at the location data. It tied Columbia for the most incidents, but I’m not sure what is driving it. It’s largely Train incidents by count, so people holding the doors perhaps, but Intrusions, Trains and Police Incidents all had about the same total duration.
Other surprises were Patterson and Olympic Village. Along Patterson’s section of the line, the other stations are fairly consistent at about 10 incidents each, while Patterson has only two. Olympic Village reversed that, with seven incidents compared to an average of two along the rest of that section of the Canada Line.
Now, onto what prompted this insanity in the first place.
How Bad Was This Winter?
Figure 7: Incidents by month and average monthly incident rate for winter months. Non-winter months greyed out. Note: November 2011 not included; February 2014 estimated from half a month of data.
Well, if we look at the average number of monthly incidents and the length of incidents over winter (loosely defined as November to February), it's trending up. The number of monthly incidents lasting at least 45 minutes decreased slightly in 2014, but not back to 2012 levels. This is where it would have been nice to have that missing data from 2011, although I don't know if I could survive combing through another year of Twitter data. Still, the rest of November 2011 would have been nice to compare with the massive spikes there in 2012 and 2013. Why is there such a spike in November? I'm not sure – the incidents are a typical mish-mash consisting mostly of Trains, then Intrusions, and a smattering of others. Maybe it reflects a surge of ridership with the first winter storms.
Incident duration follows the same trend: winter 2012 was about 40 minutes on average, 2013 and 2014 were right around 60 minutes. Something interesting on the chart is the uptick of incidents in the springtime. The types of incidents occurring in March and April of 2012 and 2013 appear to be a general mishmash, with no particular type standing out of line. Spring storms, mayhaps?
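The winter averages in Figure 7 need two small adjustments: winter spans a year boundary, and February 2014 only had half a month of data. Here's a sketch of one way to handle both, assuming monthly incident counts keyed by (year, month). The partial month is scaled up linearly by the number of days observed, which assumes incidents arrive at a steady rate through the month; that may not be exactly how the figure's estimate was produced. A month with no data at all (like November 2011) simply drops out of that winter's average. The function names and counts below are invented for illustration.

```python
import calendar

WINTER_MONTHS = {11, 12, 1, 2}  # "loosely defined as November to February"


def winter_of(year, month):
    """Label a winter by the year it ends in: Nov/Dec 2013 -> winter 2014."""
    return year + 1 if month >= 11 else year


def winter_monthly_average(monthly_counts, days_observed=None):
    """Average monthly incident count for each winter.

    monthly_counts: {(year, month): incidents recorded that month}
    days_observed:  {(year, month): days of data} for partial months; those
                    counts are scaled up to a full-month estimate.
    """
    days_observed = days_observed or {}
    per_winter = {}
    for (year, month), count in monthly_counts.items():
        if month not in WINTER_MONTHS:
            continue
        days_in_month = calendar.monthrange(year, month)[1]
        seen = days_observed.get((year, month), days_in_month)
        estimate = count * days_in_month / seen
        per_winter.setdefault(winter_of(year, month), []).append(estimate)
    return {w: sum(v) / len(v) for w, v in sorted(per_winter.items())}


# Invented counts; February 2014 has only 14 of its 28 days observed.
counts = {(2013, 11): 20, (2013, 12): 12, (2014, 1): 14, (2014, 2): 7}
print(winter_monthly_average(counts, {(2014, 2): 14}))
# {2014: 15.0}
```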
Curiosities
- Longest stretch of time with no incidents: 19 days: July 18th to August 5th, 2013
- Longest stretch of time with no incidents > 45 minutes: 61 days (twice): May 3rd to July 2nd, 2012 and August 27th to October 26th, 2012.
- Incidents occur most commonly on Mondays and in the afternoon rush. Sundays appear eerily quiet.
Figure 9: Incidents by Start Time
Figure 10: Incidents by Day of the Week
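The quiet-stretch and day-of-week curiosities come straight from the incident dates. Here's a minimal sketch, assuming one date per incident. The quiet stretch counts whole days strictly between consecutive incident days, which is the inclusive counting that makes July 18th to August 5th come out to 19 days. The example dates are assumptions (incidents on July 17th and August 6th are what such a stretch implies under whole-day counting), not records pulled from the dataset, and the function names are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def longest_quiet_stretch(incident_dates):
    """Longest run of whole days with no incident.

    Returns (days, first_quiet_day, last_quiet_day), counting both ends.
    """
    days = sorted(set(incident_dates))
    best = (0, None, None)
    for prev, nxt in zip(days, days[1:]):
        quiet = (nxt - prev).days - 1  # days strictly between two incident days
        if quiet > best[0]:
            best = (quiet, prev + timedelta(days=1), nxt - timedelta(days=1))
    return best


def incidents_by_weekday(incident_dates):
    """Tally incidents by day of the week (date.weekday(): Monday == 0)."""
    return Counter(WEEKDAYS[d.weekday()] for d in incident_dates)


# Assumed incident days bracketing the 19-day quiet stretch.
dates = [date(2013, 7, 17), date(2013, 8, 6), date(2013, 8, 12)]
print(longest_quiet_stretch(dates))
# (19, datetime.date(2013, 7, 18), datetime.date(2013, 8, 5))
print(incidents_by_weekday(dates))
```

The weekday tally deliberately keeps duplicate dates, since two incidents on the same Monday should count twice; the quiet-stretch search collapses them with set() because only the days matter there.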
So What?
So, to answer the question: was winter 2013/2014 worse than previous years? Yes, it was. More incidents and longer incidents mean more grief for passengers.
How good is Skytrain reliability overall? Well, it really depends on how you look at the numbers. The optimist would say the combined system is operating normally 98.5% of the time, and the individual lines even better than that. The pessimist would say there's a significant issue every 2.5 days, and major issues lasting more than 45 minutes occur nearly three times a month. Knowing all this does make me want to increase the buffer in my schedule a bit, but it doesn't fundamentally alter how I will use the Skytrain system.
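The optimist's and pessimist's numbers are two views of the same data: if incidents don't overlap, the fraction of time spent disrupted is roughly the average incident length divided by the average gap between incidents. Here's a quick sketch of that relationship, using invented round numbers (not this post's actual totals) and measuring against a full 24-hour day by default; this isn't necessarily how the 98.5% figure above was calculated, and the function name is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def reliability_views(n_incidents, total_incident_minutes, period_days,
                      service_minutes_per_day=MINUTES_PER_DAY):
    """Return (mean days between incidents, fraction of time running normally).

    Assumes incidents don't overlap, so their minutes simply add up.
    service_minutes_per_day defaults to a full 24 hours; pass the actual
    daily operating window to measure against service time instead.
    """
    mean_gap_days = period_days / n_incidents
    disrupted = total_incident_minutes / (period_days * service_minutes_per_day)
    return mean_gap_days, 1 - disrupted


# Invented round numbers, not the totals from this post's dataset:
# 100 incidents averaging 54 minutes over 250 days.
gap, normal = reliability_views(100, 100 * 54, 250)
print(f"one incident every {gap:.1f} days, normal {normal:.1%} of the time")
# one incident every 2.5 days, normal 98.5% of the time
```

In other words, an incident every 2.5 days averaging under an hour still leaves the system running normally well over 98% of the time, which is how both the optimist and the pessimist can be right.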
There is nothing here that Translink and the operating companies don’t (or at least shouldn’t) already know. Trains are an issue on the Expo Line. Track and system issues are a problem on the Canada Line. My questions for Translink are about what is being done to address these issues.
- What is being done to fix the problems with the Bridgeport tracks and switches?
- What parts of the Mark I train refurbishment program will address train reliability issues?
- What actions are being taken to reduce the number and duration of intrusion incidents?
And similar questions for the other issues. I would also encourage Translink to be more transparent in their reliability reporting. Show us what's really happening; tell us why and what you're doing about it. It shouldn't take a blogger with too much time on his hands to do this. Finally, my advice if you want to avoid Skytrain issues:
Travel only on Sunday nights during the summer.
I would like to give a great big sloppy wet kiss to Translink’s customer service team, firstly for providing such a great service, and secondly for providing me a giant set of data to use against them. Uber thanks to Matt Lorenzi for the killer skytrain map, and my editor wife for paring down my raging rant into a semi-coherent line of thought. You guys are all the best.
Interesting and nice work! I read up to about half of the text and stopped, and I want to ask a question and see if you want to do some more work :) ... Does it make more sense to do your Skytrain incident analysis based on the impact on the number of affected riders? A 45-minute delay during the rush hour will certainly affect more people than one in the late evening.