Load Packages
# numerical calculation & data frames
import numpy as np
import pandas as pd
# visualization
import matplotlib.pyplot as plt
import seaborn as sns
import seaborn.objects as so
# statistics
import statsmodels.api as sm
R for Data Science by Wickham & Grolemund
# pandas options
pd.set_option("mode.copy_on_write", True)
pd.options.display.precision = 2
pd.options.display.float_format = '{:.2f}'.format  # pd.reset_option('display.float_format')
pd.options.display.max_rows = 7
# NumPy options
np.set_printoptions(precision=2, suppress=True)
# Load the nycflights13 dataset
flights = sm.datasets.get_rdataset("flights", "nycflights13").data.drop(columns="time_hour")
Filter the flights that meet each of the following conditions (1–6).
Flew to Houston (IAH or HOU)
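A minimal sketch of the Houston filter. demo is a hypothetical toy frame that borrows a few real flights column names so the snippet runs without downloading the dataset; apply the same expressions to flights. It shows a boolean mask built with Series.isin and the equivalent DataFrame.query form.

```python
import pandas as pd

# Hypothetical toy frame with a few of the real `flights` columns
demo = pd.DataFrame({
    "carrier": ["UA", "AA", "DL", "UA", "WN"],
    "dest": ["IAH", "MIA", "ATL", "HOU", "HOU"],
    "arr_delay": [11.0, 33.0, -18.0, 130.0, None],
})

# Boolean mask: keep rows whose destination is either Houston airport
to_houston = demo[demo["dest"].isin(["IAH", "HOU"])]

# The same filter written with DataFrame.query
to_houston_q = demo.query('dest in ["IAH", "HOU"]')

print(to_houston)
```

isin scales better than chaining (dest == "IAH") | (dest == "HOU") once the list of airports grows.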
Sort flights to find the most delayed flights. Find the flights that left earliest (i.e., departed furthest ahead of their scheduled time).
Defining cancelled flights as those where dep_delay or arr_delay is missing is slightly suboptimal. Why? Which is the most important column?
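A sketch of both questions on a hypothetical toy frame (demo, not the real data). Sorting dep_delay descending puts the most delayed flights first and ascending puts the earliest departures (most negative delays) first; sort_values places NaN rows last either way. For the cancellation question, one common reading is that dep_delay is the key column: a flight with no departure delay never left, while a flight that departed but has no arr_delay may, for example, have been diverted. That interpretation is an assumption, not something the dataset documents.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for `flights`
demo = pd.DataFrame({
    "dep_delay": [2.0, 1301.0, -43.0, np.nan, 15.0],
    "arr_delay": [11.0, 1272.0, -48.0, np.nan, np.nan],
})

# Most delayed first (NaN delays are placed last by default)
most_delayed = demo.sort_values("dep_delay", ascending=False)

# Left earliest: the most negative dep_delay comes first
left_earliest = demo.sort_values("dep_delay")

# The exercise's definition: either delay column is missing
cancelled_either = demo[demo["dep_delay"].isna() | demo["arr_delay"].isna()]

# Stricter alternative: the flight has no departure delay at all
never_departed = demo[demo["dep_delay"].isna()]
```

Row 4 is caught by the first definition but not the second: it departed 15 minutes late yet has no recorded arrival.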
Challenges:
Which carrier has the worst arrival delays? Challenge: can you disentangle the effects of bad airports vs. bad carriers? Why/why not?
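A rough sketch, on a hypothetical toy frame, of ranking carriers by mean arr_delay and then adjusting each flight's delay by the mean delay at its destination. The adjustment is one simple heuristic chosen for illustration, not a full disentangling of airport and carrier effects.

```python
import pandas as pd

# Hypothetical toy frame; swap in the real `flights` data
demo = pd.DataFrame({
    "carrier": ["AA", "AA", "UA", "UA", "UA", "F9"],
    "dest": ["MIA", "ORD", "ORD", "ORD", "IAH", "DEN"],
    "arr_delay": [10.0, 20.0, 50.0, 40.0, -5.0, 60.0],
})

# Raw ranking: mean arrival delay per carrier, worst first (NaN delays are skipped)
by_carrier = demo.groupby("carrier")["arr_delay"].mean().sort_values(ascending=False)

# How many flights each carrier flies to each destination
pairs = demo.groupby(["carrier", "dest"]).size()

# Rough adjustment: each flight's delay minus the mean delay at its destination
dest_mean = demo.groupby("dest")["arr_delay"].transform("mean")
demo["delay_vs_dest"] = demo["arr_delay"] - dest_mean

# Adjusted ranking: how far above its destinations' averages each carrier runs
relative = demo.groupby("carrier")["delay_vs_dest"].mean().sort_values(ascending=False)
```

In this toy example F9 ranks worst on raw delay but scores exactly 0 after adjustment, because it is the only carrier flying to DEN. When a carrier and an airport always appear together, the data alone cannot separate their effects.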
Which plane (tailnum) has the worst on-time record?
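A minimal sketch on a hypothetical toy frame. It assumes "on time" means arr_delay <= 0 and uses an arbitrary minimum flight count (MIN_FLIGHTS, a made-up name) so that a plane with a single bad trip does not top the list.

```python
import pandas as pd

# Hypothetical toy frame; swap in the real `flights` data
demo = pd.DataFrame({
    "tailnum": ["N1", "N1", "N1", "N2", "N2", "N3", "N3"],
    "arr_delay": [-5.0, 30.0, 45.0, -10.0, 0.0, 300.0, None],
})

# Drop flights with no recorded arrival, then flag on-time arrivals
# (assumption: "on time" means arr_delay <= 0)
arrived = demo.dropna(subset=["arr_delay"]).assign(on_time=lambda d: d["arr_delay"] <= 0)

record = arrived.groupby("tailnum").agg(
    n=("on_time", "size"),
    on_time_rate=("on_time", "mean"),
    mean_arr_delay=("arr_delay", "mean"),
)

# Ignore planes with very few flights; the cutoff is arbitrary
MIN_FLIGHTS = 2
worst = record[record["n"] >= MIN_FLIGHTS].sort_values(
    ["on_time_rate", "mean_arr_delay"], ascending=[True, False]
)
```

Without the cutoff, N3 would rank worst on the strength of one 300-minute delay.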
Look at each destination. Can you find flights that are suspiciously fast? (i.e. flights that represent a potential data entry error).
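One way to approach this, sketched on a hypothetical toy frame: compare each flight's air_time with other flights to the same destination using a within-group z-score, and also compute ground speed (distance in miles, air_time in minutes) as a sanity check. The -1.5 cutoff only suits this tiny toy sample; on the real data a stricter threshold would likely be needed.

```python
import pandas as pd

# Hypothetical toy frame; swap in the real `flights` data
demo = pd.DataFrame({
    "dest": ["ATL"] * 5 + ["BOS"] * 4,
    "distance": [762] * 5 + [187] * 4,              # miles
    "air_time": [112.0, 115.0, 110.0, 118.0, 65.0,  # minutes
                 38.0, 40.0, 42.0, 39.0],
})

# Ground speed in miles per hour
demo["mph"] = demo["distance"] / (demo["air_time"] / 60)

# Standardize air_time within each destination (z-score)
grp = demo.groupby("dest")["air_time"]
demo["z"] = (demo["air_time"] - grp.transform("mean")) / grp.transform("std")

# Flag flights much faster than is typical for their destination (threshold is arbitrary)
suspicious = demo[demo["z"] < -1.5]
```

The flagged ATL flight implies roughly 700 mph, which is the kind of value worth checking against the raw record.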
Compute the air time of a flight relative to the shortest flight to that destination. Which flights were most delayed in the air?
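A sketch on a hypothetical toy frame: groupby(...).transform("min") broadcasts the shortest air_time per destination back to every row, so each flight can be expressed as a ratio to, or excess over, the fastest trip to the same place.

```python
import pandas as pd

# Hypothetical toy frame; swap in the real `flights` data
demo = pd.DataFrame({
    "dest": ["ATL", "ATL", "ATL", "BOS", "BOS"],
    "air_time": [100.0, 110.0, 140.0, 40.0, 52.0],
})

# Shortest air time observed for each destination, aligned to every row
shortest = demo.groupby("dest")["air_time"].transform("min")

# Ratio >= 1; larger values mean more time in the air than the fastest trip
demo["air_time_ratio"] = demo["air_time"] / shortest
demo["air_time_excess"] = demo["air_time"] - shortest  # minutes over the minimum

most_delayed_in_air = demo.sort_values("air_time_ratio", ascending=False)
```

The ratio makes short and long routes comparable, while the excess in minutes is easier to interpret directly.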
For each plane, count the number of flights before the first delay of more than 1 hour. (Hint: use np.cumsum.)
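A sketch of the np.cumsum hint on a hypothetical toy frame. It assumes the delay in question is dep_delay and orders each plane's flights by month, day and dep_time; both choices are assumptions. The running count of >1 h delays stays 0 until a plane's first big delay, so counting the zeros per plane gives the answer. groupby(...).cumsum() would work equally well in place of the np.cumsum lambda.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; swap in the real `flights` data
demo = pd.DataFrame({
    "tailnum":   ["N1", "N1", "N1", "N1", "N2", "N2", "N3"],
    "month":     [1, 1, 2, 3, 1, 1, 1],
    "day":       [1, 2, 1, 1, 5, 3, 1],
    "dep_time":  [700, 800, 900, 1000, 600, 1200, 900],
    "dep_delay": [5.0, 90.0, 0.0, 120.0, 10.0, -3.0, 70.0],
})

# Put each plane's flights in chronological order
ordered = demo.sort_values(["tailnum", "month", "day", "dep_time"])

# 1 for a delay over 60 minutes, else 0 (a NaN delay counts as 0 here)
big = ordered["dep_delay"].gt(60).astype(int)

# Running count of big delays within each plane, via np.cumsum
running = big.groupby(ordered["tailnum"]).transform(lambda s: np.cumsum(s))

# Flights before the first big delay are those where the running count is still 0
before_first = (running == 0).groupby(ordered["tailnum"]).sum()
```

A plane that never has a big delay (N2 here) gets credit for all of its flights.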