Interview

Learning from data

Joachim Betz, Principal Transport Solutions at T-Systems, in conversation with Peter Schütz, Head of Traveler Information at Deutsche Bahn AG.
Author: Thomas van Zütphen
Photos: Oliver Krato

Mr. Schütz, what exactly is the “forecasting machine”? 

The forecasting machine provides customers with highly precise information on train arrival and departure times along their planned itineraries and shows alternatives in the event of deviations from the plan. This information is distributed centrally by the forecasting machine from a “Single Point of Truth” to all of our output media: for example, the display boards in train stations, various apps, websites and travel portals in the network.

What’s the status quo? 

On long-distance services, the best-networked system in the country with up to 1,000 trains a day, our passengers, and those picking them up at stations, have been benefiting from the forecasting machine since May of last year. An improved travel experience in this form was also the aim of the project when we launched it in 2015. The initial question was: How do we get the best information on connections or delays for the arrival and departure times of our trains? Our hypothesis at the time was: Huge amounts of data and intelligent algorithms should make it possible to produce and deliver a more reliable forecast in real time than our old forecasting system could achieve. The algorithm continuously recalculates the forecast against the current operating status of the customer’s itinerary, and that is refreshed every minute. Particularly on long-distance routes, the operating status can change from one minute to the next. When a customer travels directly from A to B, a single minute usually makes little difference. When changing trains, however, a minute can be decisive. That’s why it’s important that we produce a good forecast for connections at an early stage and can offer the customer a consistently reliable forecast for the entire travel chain.

What kind of data does the forecasting machine use? 

Train-running information and scheduling decisions, but also secondary information that the system derives from the delays of other trains and that the algorithm has learned to handle. In today’s live operation of the forecasting machine, this is all existing inventory data that we feed through the algorithms. In the future, it will be supplemented by vehicle circulation information, weather data, GPS data and so on, in order to achieve a shorter and at the same time even more valid information cycle. On the technical side, there are only two levers for improving this service: on the one hand, the data, that is, the diversity of its sources and the frequency with which it is provided; on the other, the algorithm.


Speaking of the “future” – what are you planning next? 

The aim is to further improve the forecasting quality for long-distance services and, in parallel, to start integrating regional and metropolitan trains in 2019. It’s always a matter of learning from the data and deducing from it which improvements can be rolled out. That means nothing more than reading the right things from the data. If you like, it’s a standard process in the Big Data world. This also includes failing fast, that is, sorting out data in order to say at the end of a portfolio funnel: These are the drivers that give us the biggest leverage for improving forecasts. The next step is to adopt these elements and put them into production. To this end, we want to successively expand our landscape of data sources, and our long-term goal is to include local public transport as well, i.e. to integrate buses, subways and suburban trains.

And when does a cooperation partner on the IT side come into play? 

First of all: we use state-of-the-art technologies for the many millions of forecasts that are made every day. Microservices, for example, which consolidate the data by the minute, are automated and scalable, running on Big Data platforms in the cloud. This gives us the scalability that allows us to be very flexible. Today, modern IT is a cloud operation. But the idea behind our cooperation with T-Systems is a different one: we bring the railway knowledge, while T-Systems brings the algorithmic knowledge, that is, the knowledge of how to find the right algorithm to solve a given problem. They provide a competent team that knows analytics inside out and can handle both AI and Big Data. If both groups, and not just the respective data scientists, work together in a highly integrated manner, then there is real power behind it: analytical power, implementation power and a great deal of competence. We have made a really good start, but we are sure that we can continue to improve here as well. And with every improvement, we will raise the bar. The benchmark is the so-called “forecast quality.” In long-distance services today, measured 30 minutes before a train arrives, it stands at 87.5 percent. However, we are aiming for a forecast quality of 95 percent, ideally even 99 percent.

How will you achieve this? 

A travel guide that is as comprehensive as possible needs information from all the modes of transport involved. And we’re talking about over 800,000 trips a day on public transport in Germany. That’s why we are already integrating a large number of external companies into the travel information platform, so that we can supply the information channels consistently and well. We have deliberately decided to make the basic information available not only to DB companies but also to all external transport companies, so that the entire industry can benefit. In the long term, this also includes air traffic, i.e. the real-time supply of take-off and landing information to airlines and airports. That is the future. In principle, the ability to forecast is not a USP in itself; it is more of an industry standard. But making forecasts of this quality, as the forecasting machine can already do for long-distance train services today, is something special compared with other countries. That’s why we are investing, so that we can expand this service even further and make it faster.


You can’t get much faster than “real time”. 

It’s not about system speed, but about the speed of project implementation. The quality of the algorithm accounts for 30 percent. The quality of the data accounts for 30 percent. But 40 percent comes down to the way we work. At some points we still work redundantly as a team. Through the way we work, we have to get to market even faster. That’s the next step. And that’s where I have confidence in the partnership. It’s working.

Contact: J.Betz@t-systems.com