The last time I looked, it was a week ago. Where has the week gone?
First off, we had three problems this week. Although the public-facing site is free to use, we operate the site professionally. Part of that involves being transparent and honest, especially when things go wrong. So, what went wrong?
- The outage on Tuesday 26th July was caused by scheduled maintenance to increase capacity which shouldn’t have affected the site. However, due to the volume of messages we receive during the evening rush hour, we didn’t have enough spare capacity to cope. We lost several minutes’ worth of messages which were held within one of the systems we took down, and these weren’t replicated on to the second system. The site appeared to have suffered an outage, with trains appearing in the wrong position
- The outage on Wednesday 27th July was caused by more scheduled maintenance and a problem with some database commands. Rather than deleting all old schedules prior to 6th July, they deleted all schedules prior to 6th August, which was everything. As a result, we had to rebuild our timetable database from a backup. We took the decision not to restore the full backup as it would take too long, and we wanted to get the site back up and running quickly
- The outage on Thursday 28th July was related to the outage the previous evening. One of the database queries that runs to create trains runs very slowly under some circumstances, taking up to 4 seconds to find data rather than about 4 milliseconds. As a result, every message we received took far longer to process and eventually one of the processes on the server failed. We didn’t notice until the next morning as our automated monitoring doesn’t currently check for this specific scenario. However, when we found out what had happened, we restarted services, found the issue and put in a tactical fix to get the site up and running. We lost about 9 hours’ worth of real-time data
So that we’re not hit by these problems in the future, we’re fine-tuning how we operate the site. Nobody likes outages!
That’s the bad news dealt with – on to the good news! What’s new on the site this week?
- The Banbury area is being resignalled, and we’ve updated the Didcot Parkway to Banbury map to include the renumbered platforms at Oxford and the new line toward Oxford Parkway (moved from the Chilterns map), and extended the map further toward Leamington Spa. As the new signalling is commissioned, you’ll be able to see signals light up on the map
- When clicking on a location in a schedule for a train which has run, the location link is supposed to take you to the location at the time the train was booked. However, it was taking you to the time in GMT, not local time – so we’ve fixed it to return the right time
- The Grove Park to Bromley North and Hildenborough map now includes route indications all the way to Orpington, which we’ll be extending in the coming weeks. Due to data limitations, we can’t tell which sets of points some routes use, so there may be a green signal but no route set where there are alternate routes
- On the East London Line map, the crossover between platforms 2 and 3 at Dalston Junction was mis-positioned, causing conflicting routes to appear to be set. This has now been moved so it’s possible to make a move from signal 206 to platform 3 at the same time as a train moves from signal 203 to signal 211
- On the Esher to Basingstoke map, a route from signal 132 was mis-drawn, which has now been fixed
- On the Exeter to Liskeard map, the inter-map links weren’t working, which we’ve tidied up
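The GMT versus local time fix mentioned in the list above comes down to converting a stored time into UK local time before building the link. A minimal Python sketch follows, assuming times are held as timezone-aware UTC values; the function name `to_local` and the sample times are hypothetical, and this is not necessarily how the site does it.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library from Python 3.9; needs system tz data

LONDON = ZoneInfo("Europe/London")


def to_local(booked_utc: datetime) -> datetime:
    """Convert a timezone-aware UTC time to UK local time (GMT or BST)."""
    return booked_utc.astimezone(LONDON)


# In summer, UK local time is one hour ahead of GMT.
summer = to_local(datetime(2016, 7, 28, 17, 30, tzinfo=timezone.utc))
# In winter, local time and GMT are the same.
winter = to_local(datetime(2016, 1, 28, 17, 30, tzinfo=timezone.utc))

print(summer.strftime("%H:%M"), winter.strftime("%H:%M"))
```

A bug of this sort is easy to miss in winter, when GMT and local time agree, and only shows up once the clocks go forward.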
Until next time, enjoy the new Banbury map and the rest of the fixes!