Create a Weather Prediction AI Model using Python

Weather prediction is one of the most practical applications of machine learning. Using historical weather data, we can build models that forecast temperature, rainfall, and other weather conditions.

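In this article, we’ll learn how to create a simple weather prediction AI model using Python. We’ll build two separate models:

  • A Linear Regression model that predicts the maximum temperature.
  • A Logistic Regression model that predicts whether it will rain tomorrow.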

Don’t worry, this tutorial is completely beginner-friendly. We will explain each part of the code to help you understand the logic behind this AI weather prediction model.

Requirements

Before we start coding, make sure the following are installed on your system:

  • Python 3.8+
  • A weather dataset (a CSV file)
  • Install pandas, scikit-learn, and matplotlib using the following command:
pip install pandas scikit-learn matplotlib

Also, keep a weather dataset named weather.csv in the same directory as your script. You can download the sample dataset from here or directly using the ‘Download’ button.
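Before training anything, it can help to confirm that the dataset has the columns this tutorial relies on. The following is a minimal sketch: the helper name missing_columns and the one-row sample are hypothetical and only stand in for your real weather.csv.

```python
import io
import pandas as pd

# Columns used by the two models in this tutorial.
REQUIRED_COLUMNS = ["MinTemp", "MaxTemp", "Rainfall", "Humidity9am",
                    "Pressure9am", "WindSpeed9am", "RainTomorrow"]

def missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from df."""
    return [col for col in required if col not in df.columns]

# Tiny in-memory stand-in for weather.csv (made-up values, illustration only).
sample_csv = io.StringIO(
    "MinTemp,MaxTemp,Rainfall,Humidity9am,Pressure9am,RainTomorrow\n"
    "8.0,24.3,0.0,68,1019.7,Yes\n"
)
sample = pd.read_csv(sample_csv)

print(missing_columns(sample))  # ['WindSpeed9am']
```

With a real file, you would pass pd.read_csv("weather.csv") to the helper; an empty list means every column used later is present.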

šŸŒ”ļø Temperature Prediction Model (Linear Regression)

Here, we will create a model that will predict the maximum temperature based on some key weather features.

Import the Required Libraries

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
  • pandas: It is used for reading and processing the dataset.
  • LinearRegression: We’ll use this algorithm to predict temperature.
  • train_test_split: It helps us split the data into training and testing sets.

Load and Clean the Dataset

Real-world data often has missing values. The simplest way to handle them is to drop every row that has a missing entry:

data = pd.read_csv("weather.csv").dropna()

We load the weather.csv file and remove any rows with missing values using the .dropna() method.
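To see what .dropna() actually removes, here is a small sketch on a made-up three-row table (the values are invented for illustration and are not taken from weather.csv). Counting missing values per column first shows how much data the cleaning step will cost.

```python
import numpy as np
import pandas as pd

# Made-up rows: the second and third each have one missing value.
toy = pd.DataFrame({
    "MinTemp": [8.0, 14.0, np.nan],
    "MaxTemp": [24.3, np.nan, 23.4],
})

# Number of missing entries in each column.
print(toy.isna().sum())

# Keep only the rows where every column has a value.
cleaned = toy.dropna()
print(f"Rows before: {len(toy)}, after: {len(cleaned)}")  # Rows before: 3, after: 1
```

If this step removes a large share of your rows, dropping may be too aggressive for that dataset, and filling the gaps could be worth considering instead.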

Select Features and Target

X = data[['MinTemp', 'Humidity9am', 'Pressure9am', 'WindSpeed9am']]
y = data['MaxTemp']
  • X includes the input features that affect temperature.
  • y is the target value — the maximum temperature we want to predict.

Split Data into Training and Testing Sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Here, we divide the dataset into two parts using the train_test_split function:

  • 80% for training the model,
  • 20% for testing how well it performs on unseen data.
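The effect of test_size and random_state can be checked on a toy table. This is a sketch with ten made-up rows, so the 80/20 split gives eight training rows and two test rows; repeating the split with the same random_state picks the same rows again.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Ten made-up rows standing in for the real dataset.
toy_X = pd.DataFrame({"MinTemp": range(10)})
toy_y = pd.Series(range(10, 20), name="MaxTemp")

# Two splits with identical settings.
first = train_test_split(toy_X, toy_y, test_size=0.2, random_state=42)
second = train_test_split(toy_X, toy_y, test_size=0.2, random_state=42)

toy_X_train, toy_X_test, toy_y_train, toy_y_test = first
print(len(toy_X_train), len(toy_X_test))                # 8 2
print(list(toy_X_test.index) == list(second[1].index))  # True
```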

Train the Linear Regression Model

model = LinearRegression()
model.fit(X_train, y_train)

In the above code, we create a linear regression model and train it using the training data. The model learns how the input features affect the output temperature.
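One way to see what the model has learned is to look at its coef_ and intercept_ attributes after fitting. The sketch below uses made-up data where the target is built from a known formula (MaxTemp = 1.0 × MinTemp − 0.1 × Humidity9am + 20), so the fitted coefficients should come out close to 1.0 and −0.1. The formula is an assumption for illustration, not a property of real weather.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up inputs; the two columns are deliberately not multiples of each other.
toy_X = pd.DataFrame({
    "MinTemp":     [5.0, 10.0, 15.0, 20.0, 25.0],
    "Humidity9am": [80.0, 60.0, 70.0, 50.0, 40.0],
})

# Target generated from a known linear rule, so we know what to expect.
toy_y = 1.0 * toy_X["MinTemp"] - 0.1 * toy_X["Humidity9am"] + 20.0

toy_model = LinearRegression().fit(toy_X, toy_y)

# One coefficient per feature, plus the intercept.
for name, coef in zip(toy_X.columns, toy_model.coef_):
    print(f"{name}: {coef:+.2f}")
print(f"intercept: {toy_model.intercept_:.2f}")
```

On the real dataset, the same loop over X_train.columns and model.coef_ shows how each feature shifts the predicted maximum temperature.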

Evaluate the Model

print(f"R² Score: {model.score(X_test, y_test):.2f}")

The R² score (the coefficient of determination) measures how much of the variation in the maximum temperature the model explains on the test set. A score closer to 1 means a better fit.

Output

R² Score: 0.74
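To make the R² score less of a black box, it can be computed by hand as 1 minus the ratio of the squared prediction errors to the squared deviations from the mean. The sketch below does this on four made-up temperatures and compares the result with scikit-learn's r2_score.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up actual and predicted maximum temperatures.
y_true = np.array([20.0, 22.0, 25.0, 30.0])
y_hat = np.array([21.0, 21.0, 26.0, 29.0])

# Sum of squared prediction errors.
ss_res = np.sum((y_true - y_hat) ** 2)
# Sum of squared deviations from the mean of the actual values.
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2_manual = 1 - ss_res / ss_tot
print(f"Manual R²:  {r2_manual:.2f}")                # Manual R²:  0.93
print(f"sklearn R²: {r2_score(y_true, y_hat):.2f}")  # sklearn R²: 0.93
```

A model that always predicted the mean would score 0, and a model with no error at all would score 1.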

Temperature Prediction

new_data = pd.DataFrame([[10.0, 60, 1015, 15]], columns=X_train.columns)
print(f"Predicted MaxTemp: {model.predict(new_data)[0]:.1f}°C")

Here, we use a new sample input to predict the maximum temperature. You can change the values to experiment with different weather conditions.

Output

R² Score: 0.74
Predicted MaxTemp: 23.0°C

šŸŒ§ļø Rain Prediction Model (Logistic Regression)

Now we will create the rain prediction model using logistic regression.

Import Required Libraries

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

From the above code, we’ll use:

  • LogisticRegression to build a binary classification model.
  • accuracy_score and classification_report to evaluate how well our model predicts rain.
  • train_test_split() to split our dataset into two parts:
    • Training Set – The part of the data used to train our machine learning model.
    • Testing Set – The part of the data used to test how well our model performs on unseen data.

Load and Preprocess the Data

data = pd.read_csv("weather.csv").dropna(subset=['RainTomorrow'])
data['RainTomorrow'] = data['RainTomorrow'].map({'Yes': 1, 'No': 0})

In the above code:

  • We remove all the rows that don’t have a value for RainTomorrow.
  • Then, we convert the ‘Yes’/‘No’ labels into 1 and 0, which are easier for machine learning algorithms to work with.
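One thing to watch with .map() is that any label not listed in the dictionary becomes NaN rather than raising an error. The sketch below shows this on a made-up series with a lowercase ‘yes’ and a missing value, which is why a quick check after mapping can be useful.

```python
import pandas as pd

# Made-up labels: one has unexpected casing, one is missing.
labels = pd.Series(["Yes", "No", "yes", None])

# Only exact matches for 'Yes' and 'No' are converted.
encoded = labels.map({"Yes": 1, "No": 0})

unmapped = int(encoded.isna().sum())
print(f"Unmapped labels: {unmapped}")  # Unmapped labels: 2
```

If the count is not zero on your data, the labels probably need cleaning (for example, fixing the casing) before training.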

Select Features and Target

features = ['MinTemp', 'MaxTemp', 'Humidity9am', 'Rainfall', 'Pressure9am']
X = data[features]
y = data['RainTomorrow']

Here:

  • X contains the features that we believe affect rain (temperature, humidity, rainfall, etc.).
  • y is the target column — will it rain tomorrow (1 = Yes, 0 = No).
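The loading step above only drops rows where RainTomorrow is missing. LogisticRegression does not accept NaN values in its inputs, so if your dataset has gaps in the feature columns, those rows need handling too. The following is a minimal sketch on made-up rows, assuming you simply want to drop them.

```python
import numpy as np
import pandas as pd

features = ['MinTemp', 'MaxTemp', 'Humidity9am', 'Rainfall', 'Pressure9am']

# Made-up rows: the second lacks a feature value, the third lacks the label.
toy = pd.DataFrame({
    'MinTemp':      [8.0, 14.0, 13.7],
    'MaxTemp':      [24.3, 26.9, 23.4],
    'Humidity9am':  [68, np.nan, 82],
    'Rainfall':     [0.0, 3.6, 3.6],
    'Pressure9am':  [1019.7, 1012.4, 1009.5],
    'RainTomorrow': ['Yes', 'Yes', None],
})

# Dropping on the label alone keeps the row with the missing feature value.
label_only = toy.dropna(subset=['RainTomorrow'])

# Dropping on the label plus the features leaves only fully usable rows.
fully_clean = toy.dropna(subset=features + ['RainTomorrow'])

print(len(label_only), len(fully_clean))  # 2 1
```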

Split Data into Training and Testing Sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

As with the temperature model, we divide the dataset into two parts: 80% for training and 20% for testing.

Train the Logistic Regression Model

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

Using the above code, we train our model to classify the next day as rainy or not. Setting max_iter=1000 raises the solver’s iteration limit above scikit-learn’s default of 100, which gives it more room to converge on unscaled features.
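To check whether the iteration limit was actually needed, you can read the fitted model's n_iter_ attribute. This sketch uses synthetic data from make_classification as a stand-in for the weather features, so the number it prints says nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary-classification data (a stand-in, not weather data).
toy_X, toy_y = make_classification(n_samples=200, n_features=5, random_state=42)

toy_clf = LogisticRegression(max_iter=1000)
toy_clf.fit(toy_X, toy_y)

# n_iter_ holds the number of iterations the solver actually ran.
iterations_used = int(toy_clf.n_iter_[0])
print(f"Iterations used: {iterations_used} (limit: {toy_clf.max_iter})")
```

If the value is well below the limit, the solver converged on its own; if it equals the limit, scikit-learn will also have emitted a convergence warning.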

Evaluate the Model

y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print(classification_report(y_test, y_pred))

In the above code:

  • Accuracy shows the percentage of correct predictions.
  • Classification Report includes precision, recall, and F1-score — these are important metrics for classification problems.

Predict if It Will Rain Tomorrow

new_data = pd.DataFrame([[15.0, 25.0, 80, 0.0, 1015]], columns=features)
print("Will it rain tomorrow?", "Yes" if model.predict(new_data)[0] == 1 else "No")

We have given a sample input to test our model. You can adjust the input values to simulate various weather conditions and see if it predicts rain or not.

Output

Accuracy: 0.82
              precision    recall  f1-score   support

           0       0.82      1.00      0.90        58
           1       1.00      0.19      0.32        16

    accuracy                           0.82        74
   macro avg       0.91      0.59      0.61        74
weighted avg       0.86      0.82      0.77        74

Will it rain tomorrow? No
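The report above is worth reading beyond the headline accuracy. A recall of 0.19 for class 1 means the model caught only a few of the 16 rainy days in the test set. The sketch below rebuilds the rainy-day metrics from confusion-matrix counts. The counts are inferred from the rounded figures in the report (they are not printed by the script), so treat them as an assumption.

```python
# Counts inferred from the report: 58 dry days all predicted correctly,
# 3 of the 16 rainy days caught, 13 missed, and no false rain alarms.
tn, fp, fn, tp = 58, 0, 13, 3

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision_rain = tp / (tp + fp)  # of the days predicted rainy, how many were
recall_rain = tp / (tp + fn)     # of the truly rainy days, how many were found
f1_rain = 2 * precision_rain * recall_rain / (precision_rain + recall_rain)

print(f"accuracy={accuracy:.2f} precision={precision_rain:.2f} "
      f"recall={recall_rain:.2f} f1={f1_rain:.2f}")
# accuracy=0.82 precision=1.00 recall=0.19 f1=0.32
```

Because dry days make up most of the test set, a model can reach high accuracy while still missing most rainy days, which is why recall and F1-score for class 1 deserve attention here.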

Summary

In this tutorial, we learned how to build a Weather Prediction AI Model using Python. We created two separate models using scikit-learn:

  • A Linear Regression model to predict the maximum temperature.
  • A Logistic Regression model to predict whether it will rain tomorrow.

We also learned how to preprocess data, select features, split datasets, train models, and create real-world predictions — all using just a few lines of Python code.

Remember, this is just the beginning. You can enhance these models using more real-world data, feature engineering, and advanced algorithms for even better predictions.

For any queries related to this topic, contact me at contact@pyseek.com.

Happy Coding!

Frequently Asked Questions

Q: Where can I find weather datasets?

A: Try Kaggle.

Q: Why is my accuracy low?

A: Try collecting more data or adding better features.

Q: Can I predict temperature instead?

A: Yes! Use LinearRegression instead of LogisticRegression.

Subhankar Rakshit

Hey there! I’m Subhankar Rakshit, the brains behind PySeek. I’m a Post Graduate in Computer Science. PySeek is where I channel my love for Python programming and share it with the world through engaging and informative blogs.