Wednesday, June 30, 2010

Doing your own on-time flight time analysis Part III

In the last post, the data from the on-time flight database was loaded in a column-orientated storage engine. Now the numbers can be crunched.

The original goal of this exercise was to find the flight from Los Angeles International Airport, LAX, to Dallas Fort Worth International Airport, DFW, that was the most likely to arrive on-time.

The data is 'opportunity rich' in that there is a lot information in there. It is easy to start wondering about the various nuggets of information in there. Are their certain aircraft (tail numbers) that are routinely bad performers? Are some days of the week better than others? Do national holidays have an effect on the on-time performance? If you are delayed, is there a 'regular amount' of delay? Does early departure make for an early arrival? Can the flight crew make up for a late departure? How much time is usually spend on runways?

But to look for the flight from LAX to DFW with the lowest arrival delay1.

SELECT FlightNum, SUM(ArrDelayMinutes) as Delay
FROM flight
WHERE Origin = 'LAX'
AND Dest = 'DFW'
GROUP BY FlighNum
ORDER BY FlightNum;

So there are lots of good questions to ask with this data but it ends this series on doing your own on-time flight analysis. If anyone digs into the flight data further, please let the rest of us know what you are finding.

1. For the data on hand the answer is American Airlines flight 2463.

No comments: