Load Packages
# numerical calculation & data frames
import numpy as np
import pandas as pd
# visualization
import matplotlib.pyplot as plt
import seaborn as sns
import seaborn.objects as so
# statistics
import statsmodels.api as sm
R for Data Science by Wickham & Grolemund
# pandas options
pd.set_option("mode.copy_on_write", True)
pd.options.display.precision = 2
pd.options.display.float_format = '{:.2f}'.format # pd.reset_option('display.float_format')
pd.options.display.max_rows = 7
# Numpy options
np.set_printoptions(precision=2, suppress=True)
# Load the nycflights13 dataset
flights = sm.datasets.get_rdataset("flights", "nycflights13").data.drop(columns="time_hour")
Filter the flights that satisfy the following conditions. (1~6)
Flew to Houston (IAH or HOU)
Sort flights to find the most delayed flights. Find the flights that left earliest (i.e., departed furthest ahead of their scheduled time).
Our definition of cancelled flights (dep_delay or arr_delay is missing) is slightly suboptimal. Why? Which is the most important column?
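The filtering and sorting asked for above can be sketched on a small stand-in table. This is only an illustration: `toy` and its values are made up for the example (only the column names match `flights`), and the same expressions should carry over to the real data frame.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names come from nycflights13
toy = pd.DataFrame({
    "dest": ["IAH", "HOU", "ORD", "IAH", "ATL"],
    "dep_delay": [5.0, -12.0, 130.0, np.nan, -3.0],
    "arr_delay": [125.0, -20.0, 140.0, np.nan, np.nan],
})

# Flights to Houston (IAH or HOU)
houston = toy[toy["dest"].isin(["IAH", "HOU"])]

# Most delayed departures first (missing values are sorted last by default)
most_delayed = toy.sort_values("dep_delay", ascending=False)

# Flights that left earliest relative to schedule (most negative dep_delay)
left_earliest = toy.sort_values("dep_delay")

# Rows where dep_delay or arr_delay is missing
missing_either = toy[toy["dep_delay"].isna() | toy["arr_delay"].isna()]
```

Note that the last toy row has a departure delay but no arrival delay, which is the kind of row worth inspecting when thinking about the question on missing values.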
Challenges:
Which carrier has the worst arrival delays? Challenge: can you disentangle the effects of bad airports vs. bad carriers? Why/why not?
Which plane (tailnum) has the worst on-time record?
Look at each destination. Can you find flights that are suspiciously fast? (i.e. flights that represent a potential data entry error).
Compute the air time of a flight relative to the shortest flight to that destination. Which flights were most delayed in the air?
** For each plane, count the number of flights before the first delay of greater than 1 hour.
Hint: use np.cumsum.
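One way the cumulative-sum hint might be applied is sketched below on made-up data. The assumptions are loud ones: `toy` is hypothetical, "delay" is taken to mean `dep_delay` (the question does not say which delay), and rows are assumed to be in chronological order within each plane. The grouped `.cumsum()` used here is the per-group counterpart of `np.cumsum`.

```python
import pandas as pd

# Hypothetical toy data, assumed sorted by time within each tailnum
toy = pd.DataFrame({
    "tailnum": ["N1", "N1", "N1", "N1", "N2", "N2", "N3"],
    "dep_delay": [5.0, 20.0, 75.0, 10.0, 90.0, 0.0, 15.0],
})

# 1 where the delay exceeds one hour, 0 otherwise (missing delays count as 0)
long_delay = (toy["dep_delay"] > 60).astype(int)

# Running count of long delays per plane; it stays at 0 until the first one
seen = long_delay.groupby(toy["tailnum"]).cumsum()

# Number of flights per plane that occur before the first long delay
before_first = (seen == 0).groupby(toy["tailnum"]).sum()
```

On the real data the frame would first need sorting by date and departure time so that the running count follows the order in which each plane actually flew.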