AI Project: Six Degrees of Kevin Bacon in Python

Hollywood star Kevin Bacon at a press conference


Introduction

Have you ever played the game “Six Degrees of Kevin Bacon”? It’s a fun and intriguing way to connect any actor or actress to Kevin Bacon within six movies. But did you know that the game is tied to a well-known theory about how people are connected? In this article, we will delve into the origins of the game, its rules, and some fascinating examples of how actors and actresses can be linked to Kevin Bacon. Then we will build a project named Six Degrees of Kevin Bacon in Python.

In this project, we will use a classic Artificial Intelligence search technique to find the shortest chain between two actors or actresses, where each shared movie counts as one step of distance.

The Origins of Six Degrees of Kevin Bacon

The concept of Six Degrees of Kevin Bacon emerged in the mid-1990s when a group of college students at Albright College in Pennsylvania started playing a game based on the idea that any actor could be connected to Kevin Bacon through six or fewer co-starring roles. It was inspired by the popular theory of six degrees of separation, which suggests that any two people in the world can be connected through a chain of acquaintances in six steps or less.

The game gained widespread recognition after a conversation between Kevin Bacon and a journalist, where Bacon mentioned that he had worked with almost everyone in Hollywood or someone who had worked with them. This led to the creation of the “Six Degrees of Kevin Bacon” game, which quickly gained popularity and became a cultural phenomenon.

The Rules of the Game

The rules of the game are simple. The objective is to connect any actor or actress to Kevin Bacon within six movies. Each movie connection counts as one degree of separation. For example, if an actor has worked directly with Kevin Bacon in a film, they have a Bacon number of one. If they have worked with someone who has worked with Kevin Bacon, their Bacon number is two, and so on.

It’s important to note that the game is not limited to Kevin Bacon alone. The concept can be applied to any actor or actress in the film industry. However, Kevin Bacon is often used as the central figure due to his extensive filmography and the diverse range of actors he has worked with.
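To make the idea of a Bacon number concrete, here is a minimal sketch on a tiny, made-up set of casts (the movie and actor names are placeholders, not real filmographies). The helper name `bacon_numbers` and its `center` parameter are assumptions for illustration; they are not part of the project code.

```python
from collections import deque

# Toy, made-up casts: movie title -> set of actors (not real filmographies)
CASTS = {
    "Movie A": {"Kevin Bacon", "Actor 1"},
    "Movie B": {"Actor 1", "Actor 2"},
    "Movie C": {"Actor 2", "Actor 3"},
}


def bacon_numbers(casts, center="Kevin Bacon"):
    """Return {actor: degrees of separation from `center`} using breadth-first search."""
    # Build an adjacency map: actor -> set of co-stars
    costars = {}
    for cast in casts.values():
        for actor in cast:
            costars.setdefault(actor, set()).update(cast - {actor})

    numbers = {center: 0}
    queue = deque([center])
    while queue:
        actor = queue.popleft()
        for other in costars.get(actor, ()):
            if other not in numbers:  # the first visit gives the shortest distance
                numbers[other] = numbers[actor] + 1
                queue.append(other)
    return numbers


numbers = bacon_numbers(CASTS)
print(numbers["Actor 1"], numbers["Actor 2"], numbers["Actor 3"])  # 1 2 3
```

Passing a different `center`, for example `bacon_numbers(CASTS, center="Actor 3")`, measures distances from that actor instead, which is the same game with a different central figure.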

The Project Details

There are a huge number of movies worldwide, featuring numerous actors and actresses. The goal of this project is to find the shortest chain of connections between any two actors or actresses. As discussed earlier, a movie they both starred in serves as a connection.

The project file follows this hierarchy:

six-degrees-of-kevin-bacon
- large
  - people.csv
  - movies.csv
  - stars.csv
- small
  - people.csv
  - movies.csv
  - stars.csv
- degrees.py
- util.py

The large folder contains the full dataset, stored in separate CSV files, each containing over a million data entries. In the small folder, you’ll find the same CSV files, but they contain a much smaller dataset intended for quick testing.
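As a rough sketch of how the three files relate, the snippet below parses tiny in-memory stand-ins for them. The column names (`id`, `name`, `birth`, `title`, `year`, `person_id`, `movie_id`) are the ones read by `load_data` in degrees.py; the rows themselves, and the helper `read_rows`, are made up for illustration.

```python
import csv
import io

# Hypothetical sample rows; only the column names are taken from load_data()
PEOPLE_CSV = "id,name,birth\n1,Actor One,1960\n2,Actor Two,1970\n"
MOVIES_CSV = "id,title,year\n10,Some Movie,1995\n"
STARS_CSV = "person_id,movie_id\n1,10\n2,10\n"


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


people_rows = read_rows(PEOPLE_CSV)
movie_rows = read_rows(MOVIES_CSV)
star_rows = read_rows(STARS_CSV)

# stars.csv acts as a join table: each row links one person to one movie
cast_by_movie = {}
for row in star_rows:
    cast_by_movie.setdefault(row["movie_id"], set()).add(row["person_id"])

print(people_rows[0])  # every value is a string, including ids and years
print(cast_by_movie)
```

Because `csv.DictReader` returns every field as a string, ids are compared as strings throughout the project.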

Real-life Examples

Let’s explore some real-life examples that can be solved using this Artificial Intelligence Project.

Example 1

Suppose we want to find the connection between Tom Cruise and Tom Hanks. Running the project yields the following optimal result:

2 degrees of separation.
1: Tom Hanks and Bill Paxton starred in Apollo 13
2: Bill Paxton and Tom Cruise starred in Edge of Tomorrow

Connections between Tom Hanks and Tom Cruise

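Keep in mind that two actors can often be linked by several equally short chains. The program returns whichever one it happens to reach first, and because co-stars are stored in Python sets, whose iteration order is not fixed, that can differ from run to run. For instance, with the same input as above, the program may also return the following result: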

2 degrees of separation.
1: Tom Cruise and Craig T. Nelson starred in All the Right Moves
2: Craig T. Nelson and Tom Hanks starred in Turner & Hooch

Connections between Tom Cruise and Tom Hanks

The main point is that, although the distance stays the same every time, the connections might be different.
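The effect can be reproduced on a toy graph. This is a minimal sketch, not the project's code: the graphs and the helper `bfs_path` are made up, and neighbor order is fixed with lists here so the outcome is predictable. In degrees.py the neighbors come from a set, so the order is likely what changes between runs.

```python
from collections import deque

# Toy graph with two equally short routes from "A" to "D" (made-up data)
GRAPH_1 = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
# Same edges, but A's neighbors are listed in the opposite order
GRAPH_2 = {"A": ["C", "B"], "B": ["D"], "C": ["D"], "D": []}


def bfs_path(graph, source, target):
    """Return one shortest path from source to target, or None if unreachable."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            # Walk back through the parents to rebuild the path
            path = []
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for neighbor in graph[current]:
            if neighbor not in parents:
                parents[neighbor] = current
                queue.append(neighbor)
    return None


path_1 = bfs_path(GRAPH_1, "A", "D")
path_2 = bfs_path(GRAPH_2, "A", "D")
print(path_1)  # ['A', 'B', 'D']
print(path_2)  # ['A', 'C', 'D']
```

Both paths have the same length; only the intermediate stop differs, which mirrors the two Tom Cruise and Tom Hanks results.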

Example 2

Let’s find a connection between Bollywood and Hollywood using our Python AI project. This time we will look for the shortest chain of connections between renowned Bollywood star Shah Rukh Khan and Hollywood icon Tom Cruise. Here is one possible result:

3 degrees of separation.
1: Tom Cruise and Samantha Morton starred in Minority Report
2: Samantha Morton and Om Puri starred in Code 46
3: Om Puri and Shah Rukh Khan starred in Don 2

Connections between Tom Cruise and Shah Rukh Khan

Set Up the Environment

Download the zip file of the dataset and extract its content. After that, make two Python files in the unzipped folder: degrees.py and util.py. The code you need is provided in the following sections.


degrees.py

import csv
import sys
from util import Node, StackFrontier, QueueFrontier

# Maps names to a set of corresponding person_ids
names = {}

# Maps person_ids to a dictionary of: name, birth, movies (a set of movie_ids)
people = {}

# Maps movie_ids to a dictionary of: title, year, stars (a set of person_ids)
movies = {}


def load_data(directory):
    """
    Load data from CSV files into memory.
    """
    # Load people
    with open(f"{directory}/people.csv", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        for row in reader:
            people[row["id"]] = {
                "name": row["name"],
                "birth": row["birth"],
                "movies": set()
            }
            if row["name"].lower() not in names:
                names[row["name"].lower()] = {row["id"]}
            else:
                names[row["name"].lower()].add(row["id"])

    # Load movies
    with open(f"{directory}/movies.csv", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        for row in reader:
            movies[row["id"]] = {
                "title": row["title"],
                "year": row["year"],
                "stars": set()
            }

    # Load stars
    with open(f"{directory}/stars.csv", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        for row in reader:
            try:
                people[row["person_id"]]["movies"].add(row["movie_id"])
                movies[row["movie_id"]]["stars"].add(row["person_id"])
            except KeyError:
                pass


def main():
    if len(sys.argv) > 2:
        sys.exit("Usage: python degrees.py [directory]")
    directory = sys.argv[1] if len(sys.argv) == 2 else "large"

    # Load data from files into memory
    print("Loading data...")
    load_data(directory)
    print("Data loaded.")

    source = person_id_for_name(input("Name: "))
    if source is None:
        sys.exit("Person not found.")
    target = person_id_for_name(input("Name: "))
    if target is None:
        sys.exit("Person not found.")

    path = shortest_path(source, target)

    if path is None:
        print("Not connected.")
    else:
        degrees = len(path)
        print(f"{degrees} degrees of separation.")
        path = [(None, source)] + path
        for i in range(degrees):
            person1 = people[path[i][1]]["name"]
            person2 = people[path[i + 1][1]]["name"]
            movie = movies[path[i + 1][0]]["title"]
            print(f"{i + 1}: {person1} and {person2} starred in {movie}")


def shortest_path(source, target):
    """
    Returns the shortest list of (movie_id, person_id) pairs
    that connect the source to the target.

    If no possible path, returns None.
    """
    # Same person: zero degrees of separation
    if source == target:
        return []

    # person_ids that have already been added to the frontier
    explored = {source}

    start = Node(state=source, parent=None, action=None)
    frontier = QueueFrontier()
    frontier.add(start)

    while not frontier.empty():
        # Choose the oldest node from the frontier (breadth-first order)
        node = frontier.remove()

        for movie_id, person_id in neighbors_for_person(node.state):
            # Skip people we have already reached by an equal or shorter route
            if person_id in explored:
                continue
            explored.add(person_id)
            child = Node(state=person_id, parent=node, action=movie_id)

            # Checking the goal as soon as a child is created ends the search early
            if child.state == target:
                solution = []
                while child.parent is not None:
                    solution.append((child.action, child.state))
                    child = child.parent
                solution.reverse()
                return solution

            frontier.add(child)

    # Frontier exhausted without reaching the target
    return None


def person_id_for_name(name):
    """
    Returns the IMDB id for a person's name,
    resolving ambiguities as needed.
    """
    person_ids = list(names.get(name.lower(), set()))
    if len(person_ids) == 0:
        return None
    elif len(person_ids) > 1:
        print(f"Which '{name}'?")
        for person_id in person_ids:
            person = people[person_id]
            name = person["name"]
            birth = person["birth"]
            print(f"ID: {person_id}, Name: {name}, Birth: {birth}")
        try:
            person_id = input("Intended Person ID: ")
            if person_id in person_ids:
                return person_id
        except ValueError:
            pass
        return None
    else:
        return person_ids[0]


def neighbors_for_person(person_id):
    """
    Returns (movie_id, person_id) pairs for people
    who starred with a given person.
    """
    movie_ids = people[person_id]["movies"]
    neighbors = set()
    for movie_id in movie_ids:
        for person_id in movies[movie_id]["stars"]:
            neighbors.add((movie_id, person_id))
    return neighbors


if __name__ == "__main__":
    main()

This is the runner program. It finds the shortest connections between two people using data from CSV files. It works with information about actors/actresses, movies, and the relationships between them. Here’s a simplified breakdown:

Key Components

Data Structures: The program creates three dictionaries: names, people, and movies, to store information about people, movies, and their connections.
Loading Data: The load_data function loads information from CSV files into memory, organizing data into dictionaries.
Main Function: The main function prompts users to input the names of two people and then calculates the shortest path (connections) between them in terms of the movies they starred in.
Shortest Path Function: It determines the shortest path between two people using a breadth-first search algorithm.
Person Id for Name: The person_id_for_name function in the program plays a crucial role in resolving ambiguities related to individuals’ names.
- When a user inputs a person’s name, this function converts the name to the corresponding IMDB ID.
- It utilizes the names dictionary, which maps lowercase names to sets of person IDs.
- If there’s more than one person with the same name, the function prompts the user to choose the intended person.
Neighbors for Person: The neighbors_for_person function identifies the movie connections of a given person.
- Given a person’s ID, the function looks up their associated movies from the people dictionary, which contains information about individuals and the movies they’ve been in.
- It then explores each movie associated with the person and identifies other individuals (persons) who starred in the same movies.
- For each movie, it retrieves the person IDs of co-stars from the movies dictionary.
- The function forms pairs of (movie_id, person_id) to represent the connections between the given person and their co-stars in specific movies.

Important Note: Our program can handle cases where there are multiple people with the same name and offers the user options to choose the intended person.
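The two lookups described above can be tried on miniature, made-up versions of the `names`, `people`, and `movies` dictionaries. The helper `costars` below is a hypothetical variation on `neighbors_for_person` that leaves out the person themselves; the original function includes them.

```python
# Made-up miniature versions of the dictionaries built by load_data()
names = {"actor one": {"1"}, "actor two": {"2", "3"}}
people = {
    "1": {"name": "Actor One", "birth": "1960", "movies": {"10"}},
    "2": {"name": "Actor Two", "birth": "1970", "movies": {"10", "11"}},
    "3": {"name": "Actor Two", "birth": "1985", "movies": {"11"}},
}
movies = {
    "10": {"title": "Some Movie", "year": "1995", "stars": {"1", "2"}},
    "11": {"title": "Another Movie", "year": "2005", "stars": {"2", "3"}},
}


def costars(person_id):
    """Like neighbors_for_person, but without the person themselves."""
    pairs = set()
    for movie_id in people[person_id]["movies"]:
        for other_id in movies[movie_id]["stars"]:
            if other_id != person_id:
                pairs.add((movie_id, other_id))
    return pairs


# A name shared by two people maps to two ids, so the user has to pick one
ambiguous = sorted(names["Actor Two".lower()])
print(ambiguous)             # ['2', '3']
print(sorted(costars("2")))  # [('10', '1'), ('11', '3')]
```

The birth year stored with each person is what lets the prompt in `person_id_for_name` tell two people with the same name apart.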

util.py

class Node():
    def __init__(self, state, parent, action):
        self.state = state
        self.parent = parent
        self.action = action

class StackFrontier():
    def __init__(self):
        self.frontier = []

    def add(self, node):
        self.frontier.append(node)

    def contains_state(self, state):
        return any(node.state == state for node in self.frontier)

    def empty(self):
        return len(self.frontier) == 0

    def remove(self):
        if self.empty():
            raise Exception("empty frontier")
        else:
            node = self.frontier[-1]
            self.frontier = self.frontier[:-1]
            return node


class QueueFrontier(StackFrontier):

    def remove(self):
        if self.empty():
            raise Exception("empty frontier")
        else:
            node = self.frontier[0]
            self.frontier = self.frontier[1:]
            return node

In this code, we have two classes to help us navigate through possible paths:

Node Class: Represents a point in our exploration. Each node has a state (current situation), a parent (the node we came from), and an action (the move we made to get here).
StackFrontier Class: Manages a stack (last in, first out) of nodes. It keeps track of the nodes that have been discovered but not yet explored. We can add nodes, check if a state is already in the frontier, check if it’s empty, and remove the last node.
QueueFrontier Class (inherits from StackFrontier): A variation that works like a queue (first in, first out). It’s similar to the StackFrontier but removes nodes from the front of the queue instead of the end.
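One design note: `QueueFrontier.remove` copies the whole list with `self.frontier[1:]` on every call, which can become slow when the frontier holds a very large number of nodes. A possible alternative, sketched here under the assumption that the rest of the program only uses `add`, `empty`, `contains_state`, and `remove`, is a frontier backed by `collections.deque`. The class name `DequeFrontier` is hypothetical and not part of the original util.py.

```python
from collections import deque


class Node:
    def __init__(self, state, parent, action):
        self.state = state
        self.parent = parent
        self.action = action


class DequeFrontier:
    """Hypothetical FIFO frontier with the same interface as QueueFrontier."""

    def __init__(self):
        self.frontier = deque()

    def add(self, node):
        self.frontier.append(node)

    def contains_state(self, state):
        return any(node.state == state for node in self.frontier)

    def empty(self):
        return len(self.frontier) == 0

    def remove(self):
        if self.empty():
            raise Exception("empty frontier")
        # popleft() returns the oldest node in O(1) without copying the container
        return self.frontier.popleft()


frontier = DequeFrontier()
for state in ["a", "b", "c"]:
    frontier.add(Node(state=state, parent=None, action=None))
order = [frontier.remove().state for _ in range(3)]
print(order)  # ['a', 'b', 'c']
```

Nodes come out in the order they went in, which is the first in, first out behavior that breadth-first search relies on.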

Output

Loading data...
Data loaded.
Name: Tom Cruise
Name: Shah Rukh Khan
3 degrees of separation.
1: Tom Cruise and Samantha Morton starred in Minority Report
2: Samantha Morton and Om Puri starred in Code 46
3: Om Puri and Shah Rukh Khan starred in Don 2

Summary

The “Six Degrees of Kevin Bacon” game is a captivating way to explore the interconnection of the film industry. It showcases how actors and actresses from different generations, genres, and backgrounds can be linked through a relatively small number of connections.

The idea reaches beyond the film industry: the six degrees of separation theory, on which the game is based, suggests that any two people in the world can be connected through a chain of acquaintances in six steps or fewer.

In this tutorial, we built a project called Six Degrees of Kevin Bacon in Python. We worked with a large dataset, with millions of entries, covering actors, actresses, and the movies they have appeared in from around the world. The project uses breadth-first search, a classic Artificial Intelligence search technique, to find the shortest chain of connections between two actors or actresses, where each shared movie counts as one connection.

As our ‘Six Degrees of Kevin Bacon in Python’ journey wraps up, know that connections are endless. Grab your metaphorical popcorn, because in the vast universe of cinema, every ending is just the beginning of a new story.

If you have any burning questions about this Python project, drop them in the comments below. I’m here and ready to help you out.


Subhankar Rakshit
