In this blog we will apply data cleaning, data visualization, and hypothesis testing to study how the weather in Finland has changed over the past 10 years.
The hypothesis for the analysis is: “The Apparent temperature and humidity compared monthly across 10 years of the data indicate an increase due to Global warming.”
To test this hypothesis, we need to find out whether the average apparent temperature for a given month, say April, has increased from 2006 to 2016, and whether the average humidity over the same period has increased as well. This monthly analysis has to be done for all 12 months over the 10-year period. In practice, that means resampling the data from hourly to monthly and then comparing the same month across the years. The analysis is supported by appropriate visualizations using the matplotlib and/or seaborn libraries.
Step 1: Importing of libraries and Dataset.
Step 2: Looking at the dataset.
Step 3: Cleaning Dataset.
Step 4: Plotting of Data.
Here is the link to my entire code if you would like to refer to it: https://colab.research.google.com/drive/1MacKo_MLx6TmVoS4XdZiAVwoaGCzKa8I?usp=sharing
Step 1: Importing of libraries and Dataset
Here we import pandas, NumPy, and matplotlib. Pandas is mainly used for data analysis. It supports data manipulation operations such as merging, reshaping, and selecting, as well as data cleaning and data wrangling. NumPy is an open-source numerical Python library. It can be used to perform mathematical operations on arrays, such as trigonometric, statistical, and algebraic routines. matplotlib.pyplot is a collection of functions that make matplotlib work like MATLAB. Each pyplot function makes some change to a figure: e.g., creates a figure, creates a plotting area in a figure, plots some lines in a plotting area, or decorates the plot with labels.
We then load the dataset and check what it contains. Here is the link to the original dataset: https://www.kaggle.com/muthuj7/weather-dataset
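As a minimal sketch of the loading step: the file name `weatherHistory.csv` and the column names below are assumptions about the Kaggle download, and the helper `load_weather` is a hypothetical name used only for illustration. A tiny inline sample with made-up values stands in for the real file so the snippet runs on its own.

```python
import io

import pandas as pd

# Tiny inline sample standing in for the real CSV.
# Column names are assumed; the values are made up for illustration.
SAMPLE_CSV = """Formatted Date,Summary,Apparent Temperature (C),Humidity
2006-04-01 00:00:00.000 +0200,Partly Cloudy,7.39,0.89
2006-04-01 01:00:00.000 +0200,Partly Cloudy,7.23,0.86
2006-04-01 02:00:00.000 +0200,Mostly Cloudy,9.38,0.89
"""


def load_weather(source):
    """Read the weather CSV from a file path or a file-like object."""
    return pd.read_csv(source)


df = load_weather(io.StringIO(SAMPLE_CSV))
# With the downloaded file this would be (assumed file name):
# df = load_weather("weatherHistory.csv")
print(df.shape)
```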
Step 2: Looking at the dataset.
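A first look at a dataset usually means checking the first rows, the column types, and some summary statistics. The sketch below does this on a small made-up DataFrame; the column names are assumptions about the real dataset.

```python
import pandas as pd

# Made-up rows standing in for the loaded dataset (column names are assumed).
df = pd.DataFrame({
    "Formatted Date": [
        "2006-04-01 00:00:00.000 +0200",
        "2006-04-01 01:00:00.000 +0200",
        "2006-04-01 02:00:00.000 +0200",
    ],
    "Apparent Temperature (C)": [7.39, 7.23, 9.38],
    "Humidity": [0.89, 0.86, 0.89],
})

print(df.head())        # first rows
print(df.dtypes)        # column types; the date column is still plain text here
summary = df.describe() # count, mean, std, min, quartiles, max for numeric columns
print(summary)
```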
Step 3: Cleaning Dataset
In this step we prepare the data for plotting. We first drop the unwanted columns (all except temperature and humidity).
Then we check for null values in the selected columns.
Finally, we convert the timestamps to +00:00 UTC.
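The three cleaning steps can be sketched as follows. This is only an illustration under assumptions: the column names, the timestamp format string, and the helper name `clean_weather` are not taken from the original notebook, and the rows are made up.

```python
import pandas as pd

# Made-up rows standing in for the raw dataset (column names are assumed).
raw = pd.DataFrame({
    "Formatted Date": [
        "2006-04-01 00:00:00.000 +0200",
        "2006-04-01 01:00:00.000 +0200",
        "2006-11-01 00:00:00.000 +0100",
    ],
    "Summary": ["Partly Cloudy", "Partly Cloudy", "Foggy"],
    "Apparent Temperature (C)": [7.39, 7.23, 3.10],
    "Humidity": [0.89, 0.86, 0.93],
})


def clean_weather(raw):
    """Keep date, apparent temperature, and humidity; index by UTC timestamp."""
    keep = ["Formatted Date", "Apparent Temperature (C)", "Humidity"]
    df = raw[keep].copy()
    # Parse the local timestamps (with their offsets) and convert them to UTC.
    df["Formatted Date"] = pd.to_datetime(
        df["Formatted Date"], format="%Y-%m-%d %H:%M:%S.%f %z", utc=True
    )
    return df.set_index("Formatted Date").sort_index()


clean = clean_weather(raw)
null_counts = clean.isnull().sum()  # number of missing values per column
print(null_counts)
print(clean.index.tz)
```

Setting the timestamp as the index is a design choice that makes the later monthly resampling straightforward.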
Step 4: Plotting of Data
In this final step we will plot the data for the analysis.
First, we plot the whole dataset for all months.
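A rough sketch of this plot, resampling hourly values to monthly means first. The data here is synthetic (two years of generated hourly values), and the column names are assumptions; with the real cleaned dataset you would resample that DataFrame instead.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic hourly data standing in for the cleaned dataset.
idx = pd.date_range("2006-01-01", "2007-12-31 23:00",
                    freq=pd.Timedelta(hours=1), tz="UTC")
rng = np.random.default_rng(0)
day_of_year = idx.dayofyear.to_numpy()
hourly = pd.DataFrame({
    "Apparent Temperature (C)":
        10 - 12 * np.cos(2 * np.pi * day_of_year / 365) + rng.normal(0, 2, len(idx)),
    "Humidity": np.clip(0.7 + rng.normal(0, 0.05, len(idx)), 0, 1),
}, index=idx)

# Resample from hourly to monthly averages ("MS" labels each month by its first day).
monthly = hourly.resample("MS").mean()

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(monthly.index, monthly["Apparent Temperature (C)"], label="Apparent Temperature (C)")
ax.plot(monthly.index, monthly["Humidity"], label="Humidity")
ax.set_title("Monthly average apparent temperature and humidity")
ax.set_xlabel("Date")
ax.legend()
# In a notebook, plt.show() would display the figure.
```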
As we can see, the temperature rises sharply and then drops sharply back to the same level, and this pattern repeats throughout the 10 years, whereas there is no change in humidity over the past 10 years.
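To compare a single month across the years, the monthly averages can be filtered by month number. Below is a minimal sketch for April, using synthetic monthly values and assumed column names in place of the real resampled data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic monthly averages for 2006-2016, standing in for the resampled dataset.
idx = pd.date_range("2006-01-01", periods=132, freq="MS", tz="UTC")
rng = np.random.default_rng(0)
month = idx.month.to_numpy()
monthly = pd.DataFrame({
    "Apparent Temperature (C)":
        10 - 12 * np.cos(2 * np.pi * (month - 1) / 12) + rng.normal(0, 1, len(idx)),
    "Humidity": 0.7 + rng.normal(0, 0.02, len(idx)),
}, index=idx)

# Keep only the April row of every year.
april = monthly[monthly.index.month == 4]

fig, (ax_temp, ax_hum) = plt.subplots(2, 1, sharex=True, figsize=(10, 6))
ax_temp.plot(april.index.year, april["Apparent Temperature (C)"], marker="o")
ax_temp.set_ylabel("Apparent Temperature (C)")
ax_hum.plot(april.index.year, april["Humidity"], marker="o")
ax_hum.set_ylabel("Humidity")
ax_hum.set_xlabel("Year")
fig.suptitle("April averages by year")
```

Changing the `4` to another month number gives the same comparison for any other month.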
As we can see, there isn’t any change in humidity over the 10 years (2006–2016) for the month of April, whereas the temperature increases sharply in 2009 and drops in 2015. For the rest of the years there isn’t any sharp change in the temperature.
Comparing all 12 months, from April to August there is a slight change in temperature but nearly no change in humidity over the 10 years (2006–2016). For the months from September to March, however, there is a large change in temperature, while humidity again remains unchanged.
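One way to look at all 12 months at once is a year-by-month table of the monthly averages. This is a sketch on synthetic values with assumed column names, not the method used in the original notebook.

```python
import numpy as np
import pandas as pd

# Synthetic monthly averages for 2006-2016, standing in for the resampled dataset.
idx = pd.date_range("2006-01-01", periods=132, freq="MS", tz="UTC")
rng = np.random.default_rng(0)
month = idx.month.to_numpy()
monthly = pd.DataFrame({
    "Apparent Temperature (C)":
        10 - 12 * np.cos(2 * np.pi * (month - 1) / 12) + rng.normal(0, 1, len(idx)),
    "Humidity": 0.7 + rng.normal(0, 0.02, len(idx)),
}, index=idx)

# Rows are years, columns are months (1-12), cells are the monthly average temperature.
table = (
    monthly.assign(year=monthly.index.year, month=monthly.index.month)
    .pivot_table(index="year", columns="month", values="Apparent Temperature (C)")
)

# Difference between the last and the first year, for each month.
change = table.loc[2016] - table.loc[2006]
print(table.round(1))
print(change.round(1))
```

Reading down a column of `table` follows one month across the years, which is the comparison the hypothesis asks for. The same table can be built for humidity by changing `values`.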
In these 10 years of data, we can see that apparent temperature and humidity do not change together as the years pass. The monthly average humidity is the same for all years, but the apparent temperature differs. Global warming is affecting the earth’s temperature, which may be why we see some uncertainty in this data.