Working with CSV files in Python

Image of three girls actively conversing. One girl holds two books with green covers, while a CSV file icon hovers above them. A nearby text reads 'Working with CSV Files in Python,' implying a cooperative learning and problem-solving setting.

Introduction

CSV (Comma-Separated Values) files are a widely used format for storing tabular data. They provide a simple and human-readable way to represent data, making them a popular choice for data interchange between different applications and systems.

Python, with its rich ecosystem of libraries, offers powerful tools for working with CSV files, enabling developers to easily read, write, and manipulate data. In this article, we will explore the basics to advanced of working with CSV files in Python.

Introduction to CSV Files

A CSV file consists of rows, each representing a record, and columns, which contain values for specific attributes. Columns are separated by a delimiter, usually a comma, but other delimiters like tabs or semicolons can also be used. Each row ends with a line break character.

Example CSV file (Get the file from here: ‘employee.csv‘):


Name,Age,Contact,Country
Alice,28,512-123-8182,USA
Bob,35,512-123-8183,UK
Charlie,22,512-123-8184,Canada

Reading CSV Files

Reading a CSV file using `csv.reader`

Python’s built-in `csv` module provides a straightforward way to read and write CSV files. To read a CSV file, you can use the `csv.reader` class:


import csv

with open('employee.csv', 'r') as file:
    csv_reader = csv.reader(file)
    for row in csv_reader:
        print(row)

Output

[‘Name’, ‘Age’, ‘Contact’, ‘Country’]
[‘Alice’, ’28’, ‘512-123-8182’, ‘USA’]
[‘Bob’, ’35’, ‘512-123-8183’, ‘UK’]
[‘Charlie’, ’22’, ‘512-123-8184’, ‘Canada’]

Reading CSV into Dictionaries

While the basic `csv.reader` returns rows as lists, you can also read CSV data into dictionaries for more structured access. The `csv.DictReader` class offers reading CSV file data into a Python Dictionary.

The primary benefit of opting for the second approach is that it provides you with an organized set of data for your tasks.


import csv

with open('employee.csv', 'r') as file:
    csv_dict_reader = csv.DictReader(file)
    for row in csv_dict_reader:
        print(row['Name'],row['Age'],row['Contact'],row['Country'])

Output

Alice 28 512-123-8182 USA
Bob 35 512-123-8183 UK
Charlie 22 512-123-8184 Canada

Writing CSV Files

Writing to a CSV file using `csv.writer`

To write data to a CSV file, you can use the `csv.writer` class. Take a quick look at the information in the ’employee.csv’ file shown above. We’re going to include an additional pair of data rows immediately adjacent to the current ones.

In the following code snippet, you’ll observe that the CSV file is opened in the append (‘a’) mode. With this approach, fresh data will be inserted alongside the current dataset.

If you intend to replace the existing data with new information, opt for the write (‘w’) mode instead of the append (‘a’) mode.


import csv

headers = ['Name','Age','Contact','Country']
rows = [['Alex','24',"512-123-8182",'Italy'],
['Peter','23',"650-231-8188",'Spain']
]

with open('employee.csv','a') as file:
    csv_writer = csv.writer(file)
    # csv_writer.writerow(headers)
    csv_writer.writerows(rows)

Output

The current arrangement of data within the ’employee.csv’ file is now as follows:

Name,Age,Contact,Country
Alice,28,512-123-8182,USA
Bob,35,512-123-8183,UK
Charlie,22,512-123-8184,Canada
Alex,24,512-123-8182,Italy
Peter,23,650-231-8188,Spain

The `.writerow()` method is employed to insert a single data row, while `.writerows()` is utilized to add multiple rows that are organized within a Python list, dictionary, tuple, and so on.

As we’ve utilized the append mode on the non-empty ’employee.csv’ file, there’s no necessity to include the header rows more than once. This is the reason we’ve added a hashtag(‘#’) to that line (the yellow one).

If you’re contemplating generating a fresh file or overwriting the data, you should include the header rows. In such a scenario, eliminate the hashtag (‘#’) from that location (the yellow line).

Writing to a CSV file using `csv.DictWriter`

An alternative approach to writing data in a CSV file involves utilizing the `csv.DictWriter` class. This method offers better clarity and control over the order of columns and the data being written.

In this instance, we will be demonstrating how to input a data dictionary into a CSV file, complete with headers. For this scenario, we will be storing the information by employing the write (‘w’) method rather than the append (‘a’) method.


import csv

headers = ['Name','Age','Contact','Country']
rows = [{'Name':'John','Age':'26','Contact':"512-123-6841",'Country':'USA'},
{'Name':'Harry','Age':'27','Contact':"650-231-2078",'Country':'UK'}
]

with open('employee.csv','w') as file:
    csv_writer = csv.DictWriter(file, headers)
    csv_writer.writeheader()
    csv_writer.writerows(rows)

Output

The contents of the ’employee.csv’ file now stand as follows:

Name,Age,Contact,Country
John,26,512-123-6841,USA
Harry,27,650-231-2078,UK

Customizing CSV Output

You can customize the CSV output by specifying delimiters, quote characters, and quoting styles while writing CSV data:


import csv

data = [
    ['Name', 'Age', 'Country'],
    ['Alice', 28, 'USA'],
    ['Bob', 35, 'Canada'],
    ['Charlie', 22, 'UK']
]

with open('output.csv', 'w', newline='') as file:
    csv_writer = csv.writer(file, delimiter=';', 
    quotechar='"', quoting=csv.QUOTE_MINIMAL)

    csv_writer.writerows(data)

Output

The data within the ‘output.csv’ file is currently presented in the following manner:

Name;Age;Country
Alice;28;USA
Bob;35;Canada
Charlie;22;UK

Manipulating CSV Data

Filtering and Transforming Data

Python’s list comprehensions make it easy to filter and transform CSV data. In this scenario, we will display all the rows where the age exceeds 25.

import csv

with open('employee.csv', 'r') as file:
    csv_dict_reader = csv.DictReader(file)
    filtered_data = 
    [row for row in csv_dict_reader if int(row['Age']) > 25]
    for row in filtered_data:
        print(row)

Output

{‘Name’: ‘John’, ‘Age’: ’26’, ‘Contact’: ‘512-123-6841’, ‘Country’: ‘USA’}
{‘Name’: ‘Harry’, ‘Age’: ’27’, ‘Contact’: ‘650-231-2078’, ‘Country’: ‘UK’}

Aggregating and Summarizing Data

You can use the `collections` module to perform aggregation and summary tasks on CSV data:


import csv
from collections import defaultdict

age_count = defaultdict(int)

with open('employee.csv', 'r') as file:
    csv_dict_reader = csv.DictReader(file)
    for row in csv_dict_reader:
        age_count[int(row['Age'])] += 1

print(age_count)

Output

defaultdict(<class ‘int’>, {26: 1, 27: 1})

Joining and Merging CSV Files

Method 1: Merging Two CSV Files Based on a Key Column

Suppose you have two CSV files: `file1.csv` and `file2.csv`, and you want to merge them based on a common key column, such as `id`. Here’s how you can do it:


import csv

data1 = []
with open('file1.csv', 'r') as file1:
    csv_reader = csv.DictReader(file1)
    for row in csv_reader:
        data1.append(row)

data2 = []
with open('file2.csv', 'r') as file2:
    csv_reader = csv.DictReader(file2)
    for row in csv_reader:
        data2.append(row)

merged_data = []
for row1 in data1:
    for row2 in data2:
        if row1['id'] == row2['id']:
            merged_row = {**row1, **row2}
            merged_data.append(merged_row)
            
# Union of keys from both dictionaries
fieldnames = data1[0].keys() | data2[0].keys()  
with open('merged.csv', 'w', newline='') as merged_file:
    csv_writer = csv.DictWriter(merged_file, fieldnames=fieldnames)
    csv_writer.writeheader()
    csv_writer.writerows(merged_data)

Method 2: Joining Two CSV Files Based on a Key Column

Joining CSV files is a similar process to merging, but it typically involves adding selected columns from one file to another based on a common key. Here’s how you can perform a join using the `csv` module:


import csv

joined_data = []
for row1 in data1:
    for row2 in data2:
        if row1['id'] == row2['id']:
            joined_row = 
            {'id': row1['id'], 'column1': row1['column1'], 
            'column2': row2['column2']}
            joined_data.append(joined_row)

# Selected columns for the join
fieldnames = ['id', 'column1', 'column2']  
with open('joined.csv', 'w', newline='') as joined_file:
    csv_writer = csv.DictWriter(joined_file, fieldnames=fieldnames)
    csv_writer.writeheader()
    csv_writer.writerows(joined_data)

Method 3: Merging the Entire Files

It’s also possible to combine two CSV files without indicating specific key columns. In such a situation, both files need to share identical key columns.


import csv

combined_data = []

for filename in ['file1.csv', 'file2.csv']:
    with open(filename, 'r') as file:
        csv_dict_reader = csv.DictReader(file)
        combined_data.extend(list(csv_dict_reader))

with open('combined.csv', 'w', newline='') as file:
    fieldnames = combined_data[0].keys()
    csv_dict_writer = csv.DictWriter(file, fieldnames=fieldnames)
    csv_dict_writer.writeheader()
    csv_dict_writer.writerows(combined_data)

Reading a CSV File using UTF-16 Encoding Technique

UTF-8 vs UTF-16

UTF-8 and UTF-16 are both character encodings that are used to represent and store textual data in computers. They are part of the Unicode standard, which aims to provide a consistent way of encoding characters from all the world’s writing systems. However, they differ in terms of how they encode and store characters. For Example:

UTF-8 is a variable-width encoding, meaning that it uses 1 to 4 bytes to represent characters. ASCII characters (English letters, digits, basic symbols) are represented using a single byte (8 bits), making it storage-efficient for predominantly ASCII text. Non-ASCII characters use more bytes, depending on the character.
UTF-16 uses 2 bytes (16 bits) for most characters, but characters outside the Basic Multilingual Plane (BMP) use 4 bytes (2 pairs of 2 bytes). This makes UTF-16 storage-efficient for a broader range of languages, as it can represent a wide variety of characters with a fixed 2-byte encoding.

UTF-16 is often used in Windows environments and for applications that require support for a wide range of characters, especially those outside the BMP.

So, If you need to support a wide range of characters or are working in a Windows environment, UTF-16 might be more suitable.

In this example, we will use a different CSV file (Get the file from here: ‘python-trends.csv‘) and use the `open()` function to open the CSV file, specifying the encoding parameter as ‘utf-16’.


import csv

# Opening the CSV file in read mode(by default)
with open('python-trends.csv', encoding='utf-16') as f:
    csv_reader = csv.reader(f)
    # column names
    headers = next(reader)
    # printing the column names
    print(headers, 'n')

    # printing the data from the reader object
    for row in csv_reader:
        print(row)

Reading a CSV File with Optional Arguments

Numerous columns or cells within a CSV file may hold data containing different special characters. To illustrate, let’s examine this with an example.

Remember that we initially interacted with a CSV file called ’employee.csv’. I introduced a small modification there to highlight some intriguing insights. Now, let’s take a look at the information stored within it.

Name Age Contact Joining_Date
“Alice” 28 512-123-8182 21-JUN, 2015
Bob 35 512-123-8183 17-SEP, 2011
Charlie’s 22 512-123-8184 12-JAN, 2020

In this file, the delimiter used is a space (‘ ‘), indicating that two values are separated by this space character.

Orange Mark: `“Alice”` represents a name (or value) enclosed within double quotation marks.
Green Mark: The word `Charlie’s` contains an apostrophe (‘), as shown.
Yellow Mark: Three instances involve dates in which the year is distinctively separated by a comma from the day and month.

To get the result properly there are several optional arguments we can pass. Now we will tell the program to follow the simple steps:

Utilize the space (‘ ‘) character as the delimiter by including the `delimiter=’ ‘` parameter. The default delimiter is a comma. If necessary, you can also indicate a semicolon (‘;’), tab (‘t’), or pipe (‘|’) as an alternative.
Exclude the comma (‘,’) found within the date segment by using the `escapechar=’,’` parameter.


import csv

with open('employee.csv') as f:
    csv_reader = csv.reader(f, delimiter=' ', escapechar=',')
    header = next(csv_reader)
    print(header,'n')

    # printing all the rows of values
    for row in csv_reader:
        print(row)

Output

[‘Name’, ‘Age’, ‘Contact’, ‘Joining_Date’]

[‘Alice’, ’28’, ‘512-123-8182′, ’21-JUN 2015’]
[‘Bob’, ’35’, ‘512-123-8183′, ’17-SEP 2011′]
[“Charlie’s”, ’22’, ‘512-123-8184′, ’12-JAN 2020’]

Exercises

Here are some exercises to practice working with CSV files in Python:

1. Reading CSV Files:

Read a CSV file named “data.csv” and print each row.
Read a CSV file named “data.csv” with headers and print only the values in the “Name” column.
Read a CSV file named “data.csv” and calculate the total number of rows in the file.

2. Writing CSV Files:

Create a list of dictionaries containing information about books (title, author, year) and write it to a CSV file named “books.csv”.
Write a program that generates a CSV file with a multiplication table for numbers 1 to 10.
Create a CSV file named “grades.csv” and write student names and their corresponding grades to the file.

3. Manipulating CSV Data:

Read a CSV file named “sales.csv” containing sales data (product, quantity, price) and calculate the total revenue.
Read a CSV file named “students.csv” with student data (name, age, grade) and filter out students older than 18.
Read a CSV file named “temperature.csv” with temperature data in Fahrenheit. Convert temperatures to Celsius and write them to a new CSV file.

4. Joining and Merging CSV Files:

Merge two CSV files, “orders.csv” and “customers.csv,” based on the common column “customer_id,” and write the merged data to a new file.
Join two CSV files, “products.csv” and “sales.csv,” based on the common column “product_id,” and write the joined data to a new file.
Concatenate three CSV files, “file1.csv,” “file2.csv,” and “file3.csv,” vertically to create a single combined file.

5. Advanced Exercises:

Explore a real-world dataset in CSV format (e.g., from Kaggle), and perform data cleaning, aggregation, and visualization using Python.

Remember to break down the exercises into smaller steps, and feel free to modify the scenarios to match your interests and learning goals.

Working with CSV files in Python

Introduction

Introduction to CSV Files

Reading CSV Files

Reading a CSV file using `csv.reader`

Reading CSV into Dictionaries

Writing CSV Files

Writing to a CSV file using `csv.writer`

Writing to a CSV file using `csv.DictWriter`

Customizing CSV Output

Manipulating CSV Data

Filtering and Transforming Data

Aggregating and Summarizing Data

Joining and Merging CSV Files

Method 1: Merging Two CSV Files Based on a Key Column

Method 2: Joining Two CSV Files Based on a Key Column

Method 3: Merging the Entire Files

Reading a CSV File using UTF-16 Encoding Technique

Reading a CSV File with Optional Arguments

Exercises

Subhankar Rakshit

Build a Web Scraper to Compare E-Commerce Prices Using Python

Web Scraping for Job Listings: Automate Job Search with Python

Advanced Web Scraping with Scrapy: Building a Scalable Scraper

How to Scrape Websites Using BeautifulSoup and Requests

The Best Way to Learn Python: A Guide for Beginners & Beyond

Python Program to Calculate Body Mass Index (BMI)

Python Program to Convert Decimal to Binary, Octal, & Hexadecimal

Write a Python Program to Find HCF (GCD)

Write a Python Program to Find LCM

Write a Python Program to Find the Sum of Natural Numbers

Introduction

Introduction to CSV Files

Reading CSV Files

Reading a CSV file using `csv.reader`

Reading CSV into Dictionaries

Writing CSV Files

Writing to a CSV file using `csv.writer`

Writing to a CSV file using `csv.DictWriter`

Customizing CSV Output

Manipulating CSV Data

Filtering and Transforming Data

Aggregating and Summarizing Data

Joining and Merging CSV Files

Method 1: Merging Two CSV Files Based on a Key Column

Method 2: Joining Two CSV Files Based on a Key Column

Method 3: Merging the Entire Files

Reading a CSV File using UTF-16 Encoding Technique

Reading a CSV File with Optional Arguments

Exercises

Subhankar Rakshit

You may also like

Trending now