Excel Spreadsheet Manipulation in Python: A Complete Tutorial

working with excel spreadsheet in python using openpyxl library

Introduction

Excel is a widely used tool for data analysis, but sometimes it can be difficult to manipulate and analyze large datasets efficiently in Excel. Python, on the other hand, is a powerful programming language that offers a vast array of tools for data analysis and manipulation.

Python provides several libraries for working with different file formats. One such library is `openpyxl`, which is a Python library used for working with Excel spreadsheets. This library allows developers to manipulate Excel files using Python code, making it a handy tool for data analysis and processing. In this article, we will explore how to work with Excel spreadsheets using the `openpyxl` library.

Installing openpyxl

Before we start working with openpyxl, we need to install the library. The easiest way to install the library is to use pip, which is a package manager for Python. To install openpyxl, open your command prompt or terminal and enter the following command:

pip install openpyxl

The Basics

This section covers the fundamental concepts of using Python with Excel Spreadsheets. Additionally, the Advanced section below provides numerous techniques with a range of features.

Working with Excel Spreadsheets

Once you have installed openpyxl, you can start working with Excel spreadsheets. The first step is to open the Excel file that you want to work with. You can do this using the `load_workbook()` method of the openpyxl library.


from openpyxl import load_workbook

workbook = load_workbook(filename="example.xlsx")

This code loads an Excel file called “example.xlsx” into a workbook object.

Accessing Sheets

Once you have loaded the workbook, you can access the sheets within the workbook using the sheet-names attribute of the workbook object.


sheet = workbook['Sheet1']

This code retrieves the sheet called “Sheet1” from the workbook.

Reading Cell Values

You can read the value of a cell in a sheet by specifying its row and column index.


value = sheet.cell(row=1, column=1).value

This code reads the value of the cell in the first row and first column of the sheet.

Writing to a Cell

You can write to a cell in a sheet by specifying its row and column index.


sheet.cell(row=1, column=1).value = "Hello, World!"

This code writes the string “Hello, World!” to the cell in the first row and first column of the sheet.

Iterating Through Rows and Columns

You can iterate through rows and columns in a sheet using the for loops. The `max_row` and `max_column` attributes of the sheet object can be used to determine the last row and column of the sheet.


for row in range(1, sheet.max_row + 1):
    for column in range(1, sheet.max_column + 1):
        value = sheet.cell(row=row, column=column).value
        print(value)

This code iterates through all the cells in the sheet and prints their values.

Filtering Rows

You can filter rows in a sheet based on a condition using the `filter()` method of the sheet object. In the following example, we filter rows based on the value in the second column.


for row in sheet.iter_rows(min_row=2, min_col=2, max_col=2):
    for cell in row:
        if cell.value == "Yes":
            print(cell.offset(column=-1).value)

This code filters rows based on the value “Yes” in the second column and prints the value in the first column.

Saving Changes

After you have made changes to the Excel spreadsheet, you need to save them. You can do this using the `save()` method of the workbook object.


workbook.save(filename="example.xlsx")

This code saves the changes made to the workbook object back to the Excel file called “example.xlsx”.

Advanced Tutorial

In this section, we will explore multiple tasks involving Excel Spreadsheets that have the potential to elevate the complexity of this topic towards an Advanced Tutorial.

Reading Data from Multiple Cells

Here is an example of how to read data from multiple cells in an Excel spreadsheet using Python:


from openpyxl import load_workbook

# Load the workbook
workbook = load_workbook(filename='example.xlsx')

# Select the sheet you want to work with
sheet = workbook['Sheet1']

# Create an empty list to hold the data
data = []

# Iterate through the rows and columns you want to read from
for row in sheet.iter_rows(min_row=2, min_col=1, max_row=5, max_col=3):
    # Create a list to hold the values for each row
    row_data = []
    for cell in row:
        # Append each cell value to the row_data list
        row_data.append(cell.value)
    # Append the row_data list to the data list
    data.append(row_data)

# Print the data
print(data)

In
this example, we are reading data from cells A2 to C5 on the “Sheet1”
worksheet. We use the `iter_rows()` method to iterate over the rows and
columns we want to read from. We specify the range of rows and columns
using the min_row, min_col, max_row, and max_col parameters. In this
case, we want to start at row 2 (min_row=2) and column 1 (min_col=1) and
end at row 5 (max_row=5) and column 3 (max_col=3).

Inside
the for loop, we create a list called row_data to hold the values for
each row. We iterate over each cell in the row using the for loop and
append the value of each cell to the row_data list. After we have
iterated over all the cells in the row, we append the row_data list to
the data list. This creates a list of lists, where each inner list
contains the values for one row of cells.

Finally,
we print the data list to the console. This will output a list of lists
that contains the values from the cells we read from the Excel
spreadsheet.

You can also read data from multiple cells using cell names. Here’s an example code snippet that demonstrates how to achieve this:


import openpyxl

def read_cells(filename, cell_names):
    workbook = openpyxl.load_workbook(filename)
    sheet = workbook.active

    data = {}

    for cell_name in cell_names:
        cell = sheet[cell_name]
        data[cell_name] = cell.value

    workbook.close()

    return data

# Usage example
filename = 'example.xlsx'
cell_names = ['A1', 'B2', 'C3']

result = read_cells(filename, cell_names)
print(result)

In the code above, we define the `read_cells` function that takes a filename (path to the Excel file) and a list of cell names as input. The function loads the workbook, accesses the active sheet, and iterates over the provided cell names.

For each cell name, it retrieves the corresponding cell from the sheet using `sheet[cell_name]` and stores the cell value in a dictionary called `data`, where the cell name is the key.

Finally, the function returns the `data` dictionary containing the cell values. In the example usage, the function is called with a filename and a list of cell names, and the resulting dictionary is printed.

Writing to an Excel Spreadsheet

In this scenario, our objective is to generate a fresh Excel spreadsheet where we can write data. The process will automatically create the file for us. Here is a sample piece of code for reference:


import openpyxl

def write_cells(filename, data):
    workbook = openpyxl.Workbook()
    sheet = workbook.active

    for cell_name, value in data.items():
        sheet[cell_name] = value

    workbook.save(filename)
    workbook.close()

# Usage example
filename = 'example.xlsx'
data = {'A1': 'Hello', 'B2': 'World', 'C3': 42}

write_cells(filename, data)

In the code above, we define the `write_cells` function that takes a filename (path to the Excel file) and a dictionary of data as input. The function creates a new workbook using `openpyxl.Workbook()` and accesses the active sheet.

It then iterates over the items in the `data` dictionary. For each key-value pair, it assigns the value to the corresponding cell in the sheet using `sheet[cell_name] = value`.

Finally, the function saves the workbook to the specified filename using `workbook.save(filename)` and closes the workbook.

In the example usage, the function is called with a filename and a dictionary of data, where the keys are the cell names and the values are the data to be written. The code will create a new Excel file or overwrite the existing one if it already exists, and write the data to the specified cells.

Append Data to an Excel Spreadsheet

Here, we will add data to existing Excel spreadsheets, a process commonly referred to as appending. Here’s an illustrative example to demonstrate this process:


import openpyxl

def append_to_sheet(filename, sheet_name, data):
    workbook = openpyxl.load_workbook(filename)
    sheet = workbook[sheet_name]

    for row in data:
        sheet.append(row)

    workbook.save(filename)
    workbook.close()

# Usage example
filename = 'example.xlsx'
sheet_name = 'Sheet1'
data = [
    ['John', 25, 'Engineer'],
    ['Lisa', 30, 'Manager'],
    ['David', 28, 'Analyst']
]

append_to_sheet(filename, sheet_name, data)

In the code above, we define the `append_to_sheet` function that takes a filename (path to the Excel file), sheet name, and a list of rows of data as input. The function loads the existing workbook using `openpyxl.load_workbook()` and accesses the specified sheet.

It then iterates over each row in the data list and appends it to the sheet using `sheet.append(row)`. The `row` variable represents a list containing the values for each column in the respective row.

Finally, the function saves the workbook to the specified filename using `workbook.save(filename)` and closes the workbook.

In the example usage, the function is called with a filename, sheet name, and a list of rows of data. Each row in the data list contains values for each column. The code will open the existing Excel file, navigate to the specified sheet, and append the data rows to the sheet.

Adding New Rows and Columns to an Excel Spreadsheet

In the case of an existing Excel spreadsheet, it is possible to incorporate additional rows and columns. In the following explanation, we will explore the steps required to accomplish this task.


import openpyxl

def append_rows_columns(filename, sheet_name, new_rows, new_columns):
    workbook = openpyxl.load_workbook(filename)
    sheet = workbook[sheet_name]

    # Appending new rows
    for row_data in new_rows:
        sheet.append(row_data)

    # Appending new columns
    for column_data in new_columns:
        sheet.append(column_data)

    workbook.save(filename)
    workbook.close()

# Usage example
filename = 'example.xlsx'
sheet_name = 'Sheet1'
new_rows = [
    ['John', 25, 'Engineer'],
    ['Lisa', 30, 'Manager'],
    ['David', 28, 'Analyst']
]
new_columns = [
    ['NewColumn1', 'NewColumn2', 'NewColumn3'],
    [1, 2, 3],
    ['A', 'B', 'C']
]

append_rows_columns(filename, sheet_name, new_rows, new_columns)

In the code above, we define the `append_rows_columns` function that takes a filename (path to the Excel file), sheet name, lists of new rows, and lists of new columns as input. The function loads the existing workbook using `openpyxl.load_workbook()` and accesses the specified sheet.

To append new rows, it iterates over each row data in the `new_rows` list and uses `sheet.append(row_data)` to add the row to the sheet.

To append new columns, it iterates over each column data in the `new_columns` list and uses `sheet.append(column_data)` to add the column to the sheet.

Finally, the function saves the workbook to the specified filename using `workbook.save(filename)` and closes the workbook.

In the example usage, the function is called with a filename, sheet name, lists of new rows, and lists of new columns. Each new row or column is represented by a list of values. The code will open the existing Excel file, navigate to the specified sheet, and append the new rows and columns to the sheet.

Arithmetic Operations on an Excel Spreadsheet

By utilizing the `openpyxl` library along with the built-in mathematical operators, it is possible to execute diverse arithmetic operations on an Excel spreadsheet.


import openpyxl

def perform_arithmetic(filename, sheet_name, cell1, cell2, operator):
    workbook = openpyxl.load_workbook(filename)
    sheet = workbook[sheet_name]

    # Accessing cell values
    value1 = sheet[cell1].value
    value2 = sheet[cell2].value

    # Performing arithmetic operation
    if operator == '+':
        result = value1 + value2
    elif operator == '-':
        result = value1 - value2
    elif operator == '*':
        result = value1 * value2
    elif operator == '/':
        result = value1 / value2

    # Writing result to a new cell
    new_cell = 'C1'  # Choose a cell to store the result
    sheet[new_cell] = result

    workbook.save(filename)
    workbook.close()

# Usage example
filename = 'example.xlsx'
sheet_name = 'Sheet1'
cell1 = 'A1'
cell2 = 'B1'
operator = '+'

perform_arithmetic(filename, sheet_name, cell1, cell2, operator)

In the code above, we define the `perform_arithmetic` function that takes a filename (path to the Excel file), sheet name, cell references for two input cells, and an operator as input. The function loads the existing workbook using `openpyxl.load_workbook()` and accesses the specified sheet.

It retrieves the values of `cell1` and `cell2` using `sheet[cell1].value` and `sheet[cell2].value`, respectively.

Next, it performs the arithmetic operation based on the provided operator. In the example code, the supported operators are `+` (addition), `` (subtraction), `*` (multiplication), and `/` (division).

After obtaining the result, it writes the result to a new cell using `sheet[new_cell] = result`, where `new_cell` is a cell reference (e.g., ‘C1’).

Finally, the function saves the workbook to the specified filename using `workbook.save(filename)` and closes the workbook.

In the example usage, the function is called with a filename, sheet name, cell references for two input cells, and an operator. The code will open the existing Excel file, access the specified sheet, perform the arithmetic operation, and store the result in a new cell.

Plotting Charts on an Excel Spreadsheet

By utilizing the capabilities of the openpyxl library, you have the ability to plot different types of charts directly on an Excel Spreadsheet. This example demonstrates how to create a bar chart, but you can explore other chart types and customization options in the openpyxl documentation to create different types of charts. Let’s explore the following steps to gain a better understanding.

1. Import the necessary modules:


import openpyxl
from openpyxl.chart import BarChart, Reference

2. Load the Excel spreadsheet using `openpyxl`:


# Load the workbook
workbook = openpyxl.load_workbook('example.xlsx')

# Select the sheet you want to work with
sheet = workbook['Sheet1']

3. Create a reference to the data you want to plot:


# Define the data range for the chart
data = Reference(sheet, min_col=1, min_row=1, max_col=3, max_row=6)

4. Create a new chart object:


# Create a bar chart
chart = BarChart()

5. Assign the data to the chart:


# Add the data to the chart
chart.add_data(data)

6. Add the chart to the worksheet:


# Add the chart to the worksheet
sheet.add_chart(chart, "E2")

7. Save the modified workbook:


# Save the workbook
workbook.save('example.xlsx')

In this example, the data range for the chart is specified as columns 1 to 3 and rows 1 to 6. Adjust these values according to your data range. The `add_chart()` method is used to add the chart to the worksheet, and the argument “E2” specifies the top-left cell where the chart will be placed.

Remember to replace `‘example.xlsx’` with the actual path and filename of your Excel spreadsheet.

Adding an Image to an Excel Spreadsheet

Adding images to an Excel Spreadsheet is made possible by the functionality provided by the openpyxl library. Here’s an example to accomplish this.


from openpyxl import Workbook
from openpyxl.drawing.image import Image

# Create an Excel workbook
workbook = Workbook()

# Select the active worksheet
worksheet = workbook.active

# Replace with the actual path to your image file
image_path = 'path_to_image/image.jpg' 

# Load the image
image = Image(image_path)

# Add the image to the worksheet
# Specify the cell location where the image should be placed
worksheet.add_image(image, 'A1')

# Save the workbook
workbook.save('example.xlsx')

Replace ‘example.xlsx’ with the desired filename and path where you want to save the Excel file.

In this example, we assume that you have an image file named ‘image.jpg’ and provide the path to that image file. Make sure to replace ‘path_to_image’ with the actual path to your image file.

This example demonstrates how to add a single image to an Excel spreadsheet. If you want to add multiple images, you can repeat steps 4 and 5 for each image, specifying different cell locations.

Note that the image will be embedded within the Excel file, so make sure to provide the correct path to the image file when loading it.

Setting Font Styles of an Excel Spreadsheet

The subsequent steps showcase how you can customize font styles in Excel spreadsheets using Python and the openpyxl library.

1. Import the necessary modules:


from openpyxl import Workbook
from openpyxl.styles import Font

2. Create an Excel workbook and select the active worksheet:


# Create an Excel workbook
workbook = Workbook()

# Select the active worksheet
worksheet = workbook.active

3. Customize the font style for a specific cell:


# Customize the font style
cell = worksheet['A1']
font = Font(name='Arial', size=12, bold=True, italic=True, color='FF0000')

# Apply the font style to the cell
cell.font = font

In this example, the font style for cell ‘A1‘ is customized. Adjust the cell reference and the font properties (`name`, `size`, `bold`, `italic`, `color`) according to your requirements.

4. Save the workbook as an Excel file:


# Save the workbook
workbook.save('example.xlsx')

In this example, the font style for cell ‘A1’ is customized. Adjust the cell reference and the font properties (name, size, bold, italic, color) according to your requirements.

Replace ‘example.xlsx‘ with the desired filename and path where you want to save the Excel file.

In addition to the properties shown in the example (name, size, bold, italic, color), the `Font` class in `openpyxl` provides many other options for customizing font styles, such as underline, strike-through, subscript, superscript, and more. You can explore the `Font` class in the `openpyxl` documentation for more customization options.

Remember to specify the cell or range of cells where you want to apply the font style by accessing the corresponding cell or range of cells in the worksheet (e.g., `worksheet[‘A1’]`, `worksheet[‘A1:C3’]`).

Styling Cells of an Excel Spreadsheet

By manually adjusting cell properties individually, you can achieve the same outcome in Python using the openpyxl library. Here is an illustrative example:


from openpyxl import Workbook
from openpyxl.styles import Font, Color, PatternFill, Alignment

# Create a new workbook and select the active sheet
workbook = Workbook()
sheet = workbook.active

# Set some cell values
sheet['A1'] = "Hello"
sheet['B1'] = "World!"

# Apply styling to cells
font = Font(color=Color(rgb="FF0000"), bold=True)
fill = PatternFill(fill_type="solid", fgColor=Color(rgb="FFFF00"))
alignment = Alignment(horizontal="center", vertical="center")

sheet['A1'].font = font
sheet['B1'].fill = fill
sheet['B1'].alignment = alignment

# Save the workbook
workbook.save("styled_spreadsheet.xlsx")

In this example, we create a new workbook and select the active sheet. We then set values for cells ‘A1’ and ‘B1’. Next, we define a `Font` object with red color and bold style, a `PatternFill` object with yellow color, and an `Alignment` object with centered horizontal and vertical alignment.

Finally, we assign these style objects to the corresponding cells using the `font`, `fill`, and `alignment` properties of the cell objects. Afterward, we save the workbook to a file called “styled_spreadsheet.xlsx”.

You can customize the styling further by exploring other attributes and properties provided by the `Font`, `Color`, `PatternFill`, and `Alignment` classes from `openpyxl.styles`.

Create Excel Formulas

You can generate a variety of Excel formulas using the openpyxl library in Python. These formulas can incorporate a range of functions, operators, and cell references to suit your specific requirements. To gain a clearer understanding, follow the subsequent steps.

1. Import the necessary modules:


from openpyxl import Workbook
from openpyxl.utils import get_column_letter

2. Create an Excel workbook and select the active worksheet:


# Create an Excel workbook
workbook = Workbook()

# Select the active worksheet
worksheet = workbook.active

3. Write data to the worksheet:


# Write data to the worksheet
worksheet['A1'] = 5
worksheet['A2'] = 10
worksheet['A3'] = 15
worksheet['A4'] = 20
worksheet['B1'] = 3
worksheet['B2'] = 4
worksheet['B3'] = 5
worksheet['B4'] = 6

4. Create different formulas and assign them to cells:


# Formula 1: Sum of values in a range
cell1 = worksheet['C1']
cell1.value = '=SUM(A1:A4)'

# Formula 2: Average of values in a range
cell2 = worksheet['C2']
cell2.value = '=AVERAGE(A1:A4)'

# Formula 3: Multiplication of two cells
cell3 = worksheet['C3']
cell3.value = '=B1 * B2'

# Formula 4: Concatenation of two cells
cell4 = worksheet['C4']
cell4.value = '=CONCATENATE(A1, B1)'

In this example, we demonstrate different types of formulas:

  • Formula 1 calculates the sum of values in the range A1 to A4.
  • Formula 2 calculates the average of values in the range A1 to A4.
  • Formula 3 multiplies the values of cells B1 and B2.
  • Formula 4 concatenates the values of cells A1 and B1.

You can create and assign other formulas based on your specific requirements. Make sure to use the appropriate formula syntax and adjust the cell references accordingly.

5. Save the workbook as an Excel file:


# Save the workbook
workbook.save('example.xlsx')

Replace ‘example.xlsx‘ with the desired filename and path where you want to save the Excel file.

Miscellaneous

We are almost at the very end of this tutorial. In this section, we will explore how to perform operations such as adding, updating, duplicating, and removing Excel sheets using Python. Refer to the following example to enhance your comprehension.


from openpyxl import Workbook, load_workbook

# Load an existing workbook
workbook = load_workbook('example.xlsx')

# Get the active sheet
active_sheet = workbook.active

# Add a new sheet
new_sheet = workbook.create_sheet(title="New Sheet")

# Update sheet name
active_sheet.title = "Updated Sheet Name"

# Remove a sheet
sheet_to_remove = workbook['Sheet2']
workbook.remove(sheet_to_remove)

# Duplicate a sheet
original_sheet = workbook['Sheet1']
duplicated_sheet = workbook.copy_worksheet(original_sheet)
duplicated_sheet.title = "Duplicated Sheet"

# Save the modified workbook
workbook.save('modified_example.xlsx')

In this example, we first load an existing workbook called `example.xlsx`. We then retrieve the active sheet using the `active` attribute. To add a new sheet, we use the `create_sheet()` method and specify the desired title. The `title` parameter can be set to any string.

To update the name of the active sheet, we simply assign a new value to the `title` attribute.

For removing a sheet, we select the sheet to be removed (in this case, ‘Sheet2’) using its name or by accessing it through its key. Then we use the `remove()` method of the workbook object.

To duplicate a sheet, we first select the original sheet (in this case, ‘Sheet1’) and use the `copy_worksheet()` method of the workbook. We assign the duplicated sheet a new title and modify it as needed.

Finally, we save the modified workbook to a new file called ‘modified_example.xlsx’ using the `save()` method.

Remember to replace the file names and sheet names according to your requirements when using the code.

Exercises

Here are some exercises to practice working with Excel files using the `openpyxl` library in Python:

Exercise 1: Reading and Printing Data

Write a Python program that reads an Excel file named “data.xlsx” and prints the contents of each cell in the first sheet.

Exercise 2: Writing Data

Create a new Excel file named “output.xlsx” and write the following data into it:

  • In cell A1, write “Name”
  • In cell B1, write “Age”
  • In cell A2, write your name
  • In cell B2, write your age

Exercise 3: Updating Existing Data

Modify the “output.xlsx” file from Exercise 2 and update your age to the current year minus your birth year (assuming you wrote your birth year in cell B2).

Exercise 4: Formatting Cells

Write a Python program that reads an Excel file named “data.xlsx”, applies the following formatting to the cells in the first sheet, and saves the modified file as “formatted_data.xlsx”:

  • Set the font style to bold for the headers (first row).
  • Apply a background color to the header cells.
  • Apply a number format to the “Salary” column, displaying values with two decimal places. 

Exercise 5: Working with Multiple Sheets

Create a new Excel file named “multisheet.xlsx” with two sheets named “Sheet1” and “Sheet2”. Write some random data into each sheet, and then read and print the contents of both sheets.

Exercise 6: Data Manipulation

Write a Python program that reads an Excel file named “data.xlsx” with a sheet named “Sheet1”. The sheet contains the following columns: “Name”, “Age”, and “Gender”. Your program should calculate and print the average age of all individuals in the sheet.

These exercises cover various aspects of working with Excel files using the `openpyxl` library. Feel free to explore the library’s documentation for more advanced operations and functionalities.

Conclusion

The openpyxl library is a useful tool for working with Excel spreadsheets in Python. It allows you to access and modify the contents of Excel spreadsheets using Python code. With openpyxl, you can read and write cell values, access and modify sheets and save changes to Excel files. By combining the power of Python with the convenience of Excel, you can take your data analysis and processing to the next level.

Share your love
Subhankar Rakshit
Subhankar Rakshit

Hey there! I’m Subhankar Rakshit, the brains behind PySeek. I’m a Post Graduate in Computer Science. PySeek is where I channel my love for Python programming and share it with the world through engaging and informative blogs.

Articles: 147