
Weather prediction is one of the most practical applications of machine learning. Using historical weather data, we can build models that forecast temperature, rainfall, and other weather conditions.
In this article, we'll learn how to create a simple weather prediction AI model using Python. Here, we'll build two separate models:
- One to predict the maximum temperature using linear regression.
- Another to predict whether it will rain tomorrow using logistic regression.
Don't worry, this tutorial is completely beginner-friendly. We will explain each part of the code to help you understand the logic behind this AI weather prediction model.
Requirements
Before we start coding, make sure the following are installed on your system:
- Python 3.8+
- A weather dataset (a CSV file)
- Install pandas, scikit-learn, and matplotlib using the following command:
pip install pandas scikit-learn matplotlib
Also, keep a weather dataset named weather.csv in the same directory as your script. You can download the sample dataset from here or directly using the "Download" button.
🌡️ Temperature Prediction Model (Linear Regression)
Here, we will create a model that will predict the maximum temperature based on some key weather features.
Import the Required Libraries
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
- pandas: It is used for reading and processing the dataset.
- LinearRegression: We'll use this algorithm to predict temperature.
- train_test_split: It helps us split the data into training and testing sets.
Load and Clean the Dataset
Real-world data often has missing values. The simplest way to handle them is to drop every row that has a missing entry:
data = pd.read_csv("weather.csv").dropna()
We load the weather.csv file and remove any rows with missing values using the .dropna() method.
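Before dropping rows, it can help to see how many values are actually missing. Here is a minimal sketch of that check. It uses a small made-up DataFrame in place of weather.csv, so the values and the three columns shown are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical mini-dataset standing in for weather.csv
sample = pd.DataFrame({
    "MinTemp": [8.0, None, 12.5, 9.1],
    "MaxTemp": [24.3, 26.9, None, 22.0],
    "Humidity9am": [68, 80, 82, 62],
})

# Count the missing values in each column before removing anything
missing_per_column = sample.isna().sum()
print(missing_per_column)

# dropna() returns a new DataFrame that keeps only the complete rows
cleaned = sample.dropna()
print(f"Rows before: {len(sample)}, after: {len(cleaned)}")
```

If dropping rows would throw away a large share of your data, filling the gaps (for example with a column average) may be a better choice than removing them.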
Select Features and Target
X = data[['MinTemp', 'Humidity9am', 'Pressure9am', 'WindSpeed9am']]
y = data['MaxTemp']
- X includes the input features that affect temperature.
- y is the target value: the maximum temperature we want to predict.
Split Data into Training and Testing Sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Here, we divide the dataset into two parts using the train_test_split function:
- 80% for training the model,
- 20% for testing how well it performs on unseen data.
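To see what an 80/20 split means in practice, here is a simplified stand-in written with only the standard library. It is not scikit-learn's actual implementation, and the simple_split name is made up for this sketch. It only shows the idea: shuffle the rows with a fixed seed, then cut off a test portion.

```python
import random

def simple_split(rows, test_size=0.2, seed=42):
    """Shuffle a copy of rows and cut it into (train, test) lists."""
    shuffled = list(rows)
    # A fixed seed makes the shuffle, and so the split, repeatable
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(100))  # stand-in for 100 rows of weather data
train_rows, test_rows = simple_split(rows)
print(len(train_rows), len(test_rows))  # 80 20
```

The fixed seed plays the same role as random_state=42 in train_test_split: running the script again gives the same split, so the scores stay comparable between runs.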
Train the Linear Regression Model
model = LinearRegression()
model.fit(X_train, y_train)
In the above code, we create a linear regression model and train it using the training data. The model learns how the input features affect the output temperature.
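If you are curious what "learning" means here, the sketch below fits a straight line to a single feature by ordinary least squares, using plain Python. The temperature values are made up, and the fit_line helper is hypothetical. scikit-learn handles several features at once, but the goal is the same: pick the slope and intercept that minimise the squared error.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one input feature."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How x and y vary together, divided by how much x varies
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical readings where MaxTemp is always MinTemp + 12
min_temps = [5.0, 8.0, 10.0, 14.0]
max_temps = [17.0, 20.0, 22.0, 26.0]

slope, intercept = fit_line(min_temps, max_temps)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")  # slope=1.00, intercept=12.00
```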
Evaluate the Model
print(f"R² Score: {model.score(X_test, y_test):.2f}")
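The R² Score measures how much of the variation in the test-set temperatures the model explains. A score closer to 1 means a better fit, while a score of 0 means the model does no better than always predicting the average temperature.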
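The R² formula is short enough to compute by hand. Here is a minimal sketch with made-up actual and predicted temperatures. The r2_score_manual name is hypothetical, and for a regression model like ours, model.score() reports the same quantity.

```python
def r2_score_manual(y_true, y_pred):
    """R² = 1 - (squared error of the model) / (squared error of the mean)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical test-set temperatures and model predictions
actual = [20.0, 22.0, 25.0, 30.0]
predicted = [21.0, 21.5, 26.0, 29.0]

r2 = r2_score_manual(actual, predicted)
print(f"R² Score: {r2:.2f}")  # R² Score: 0.94
```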
Output
R² Score: 0.74
Temperature Prediction
new_data = pd.DataFrame([[10.0, 60, 1015, 15]], columns=X_train.columns)
print(f"Predicted MaxTemp: {model.predict(new_data)[0]:.1f}°C")
Here, we use a new sample input to predict the maximum temperature. You can change the values to experiment with different weather conditions.
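Under the hood, a linear regression prediction is just the intercept plus each feature multiplied by its learned coefficient. The sketch below checks this on toy data. The data is invented (MaxTemp is built from a known formula, with only two features), so the numbers say nothing about the real dataset.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: MaxTemp = 2 * MinTemp - 0.1 * Humidity9am + 20
toy = pd.DataFrame({
    "MinTemp": [5.0, 8.0, 10.0, 12.0, 15.0],
    "Humidity9am": [90.0, 70.0, 60.0, 75.0, 50.0],
})
toy["MaxTemp"] = 2 * toy["MinTemp"] - 0.1 * toy["Humidity9am"] + 20

features = ["MinTemp", "Humidity9am"]
toy_model = LinearRegression().fit(toy[features], toy["MaxTemp"])

sample = pd.DataFrame([[10.0, 60.0]], columns=features)
predicted = toy_model.predict(sample)[0]

# Rebuild the same number by hand from the learned parameters
by_hand = toy_model.intercept_ + sum(
    coef * value for coef, value in zip(toy_model.coef_, sample.iloc[0])
)
print(f"predict(): {predicted:.2f}, by hand: {by_hand:.2f}")
```

Printing model.coef_ on the real model is a quick way to see which features push the predicted temperature up or down.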
Output
R² Score: 0.74
Predicted MaxTemp: 23.0°C
🌧️ Rain Prediction Model (Logistic Regression)
Now we will create the rain prediction model using logistic regression.
Import Required Libraries
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
From the above code, we'll use:
- LogisticRegression to build a binary classification model.
- accuracy_score and classification_report to evaluate how well our model predicts rain.
- train_test_split() to split our dataset into two parts:
  - Training Set: The part of the data used to train our machine learning model.
  - Testing Set: The part of the data used to test how well our model performs on unseen data.
Load and Preprocess the Data
data = pd.read_csv("weather.csv").dropna(subset=['RainTomorrow'])
data['RainTomorrow'] = data['RainTomorrow'].map({'Yes': 1, 'No': 0})
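In the above code:
- We remove all the rows that don't have a value for RainTomorrow.
- Then, we convert the "Yes"/"No" labels into 1 and 0, which are easier for machine learning algorithms to work with.
Note that this only checks the RainTomorrow column. LogisticRegression cannot train on missing values, so if your dataset has gaps in the feature columns, drop or fill those rows as well before training.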
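One thing to watch with .map(): any label that is not in the dictionary silently becomes NaN. Here is a small sketch with made-up labels. The lowercase "yes" is an assumption for illustration, and the sample dataset may not contain such values.

```python
import pandas as pd

# Hypothetical labels with inconsistent capitalisation
labels = pd.Series(["Yes", "No", "yes", "No"])

# "yes" is not a key in the dictionary, so it turns into NaN
encoded = labels.map({"Yes": 1, "No": 0})
print("Unmatched labels:", encoded.isna().sum())

# Normalising the text first lets every label match a key
normalised = labels.str.strip().str.capitalize().map({"Yes": 1, "No": 0})
print(normalised.tolist())
```

Checking encoded.isna().sum() right after the mapping is a cheap way to catch unexpected labels before they reach the model.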
Select Features and Target
features = ['MinTemp', 'MaxTemp', 'Humidity9am', 'Rainfall', 'Pressure9am']
X = data[features]
y = data['RainTomorrow']
Here:
- X contains the features that we believe affect rain (temperature, humidity, rainfall, etc.).
- y is the target column: will it rain tomorrow (1 = Yes, 0 = No).
Split Data into Training and Testing Sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
As with the temperature model, we divide the dataset into two parts: 80% for training and 20% for testing.
Train the Logistic Regression Model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
Using the above code, we train our model to classify the next day as rainy or not. max_iter=1000 raises the solver's iteration limit from scikit-learn's default of 100, which gives it more room to converge on unscaled features like ours.
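To get a feel for how logistic regression turns numbers into a Yes/No answer, here is a minimal sketch in plain Python. The weights, the bias, and the predict_rain helper are all made up for illustration and were not learned from any data. A trained model finds its own values during fit().

```python
import math

def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_rain(feature_values, weights, bias, threshold=0.5):
    """Weighted sum -> probability -> 1 (rain) or 0 (no rain)."""
    z = bias + sum(w * x for w, x in zip(weights, feature_values))
    probability = sigmoid(z)
    return probability, int(probability >= threshold)

# Hypothetical weights for [Humidity9am, Rainfall]; not learned from data
weights = [0.05, 0.2]
bias = -4.0

probability, label = predict_rain([90, 0.0], weights, bias)
print(f"Rain probability: {probability:.2f}, predicted label: {label}")
```

On the scikit-learn model, predict_proba() returns these probabilities, which is useful when you want more than a plain Yes or No.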
Evaluate the Model
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print(classification_report(y_test, y_pred))
In the above code:
- Accuracy shows the fraction of predictions that were correct.
- Classification Report includes precision, recall, and F1-score for each class. These metrics matter most when one class (here, rainy days) is much rarer than the other.
Predict if It Will Rain Tomorrow
new_data = pd.DataFrame([[15.0, 25.0, 80, 0.0, 1015]], columns=features)
print("Will it rain tomorrow?", "Yes" if model.predict(new_data)[0] == 1 else "No")
We have given a sample input to test our model. You can adjust the input values to simulate various weather conditions and see if it predicts rain or not.
Output
Accuracy: 0.82
              precision    recall  f1-score   support
           0       0.82      1.00      0.90        58
           1       1.00      0.19      0.32        16
    accuracy                           0.82        74
   macro avg       0.91      0.59      0.61        74
weighted avg       0.86      0.82      0.77        74
Will it rain tomorrow? No
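The report is easier to read once you see how each number comes from simple counts. The sketch below uses counts that are inferred from the report above, not taken from the original run: 58 dry days all predicted dry, and only 3 of the 16 rainy days caught. These counts reproduce the rounded figures shown in the output.

```python
# Assumed confusion counts, inferred from the report (class 1 = rain)
tn, fp, fn, tp = 58, 0, 13, 3
total = tn + fp + fn + tp

accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # of the days predicted rainy, how many were
recall = tp / (tp + fn)      # of the truly rainy days, how many we caught
f1 = 2 * precision * recall / (precision + recall)

# Accuracy of a "model" that always answers No
baseline = (tn + fp) / total

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f} baseline={baseline:.2f}")
# accuracy=0.82 precision=1.00 recall=0.19 f1=0.32 baseline=0.78
```

Notice that always answering "No" would already score about 0.78 on this test set, so the 0.82 accuracy looks better than it is. The low recall for class 1 shows the model misses most rainy days.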
Summary
In this tutorial, we learned how to build a Weather Prediction AI Model using Python. We created two separate models using scikit-learn:
- A Linear Regression model to predict the maximum temperature.
- A Logistic Regression model to predict whether it will rain tomorrow.
We also learned how to preprocess data, select features, split datasets, train models, and make predictions on new inputs, all using just a few lines of Python code.
Remember, this is just the beginning. You can enhance these models using more real-world data, feature engineering, and advanced algorithms for even better predictions.
For any queries related to this topic, contact me at contact@pyseek.com.
Happy Coding!
Frequently Asked Questions
Q: Where can I find weather datasets?
A: Try Kaggle.
Q: Why is my accuracy low?
A: Try collecting more data or adding better features.
Q: Can I predict temperature instead?
A: Yes! Use LinearRegression instead of LogisticRegression, with a numeric column such as MaxTemp as the target.