Learn How to Check and Correct Spellings Using Python


Introduction

Misspellings and typos, those ever-present gremlins, can haunt even the most meticulous writer. While eagle-eyed proofreading remains crucial, Python, with its arsenal of Natural Language Processing (NLP) libraries, offers invaluable tools to combat these errors.

Today, we’ll explore two methods for building your own spelling checker and corrector using the TextBlob and PyEnchant libraries, with programming examples along the way to sharpen your Python skills.

To get started with TextBlob, install it with pip:

pip install textblob

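Frequency-based Approach with TextBlob

TextBlob’s spelling correction is based on Peter Norvig’s well-known spelling corrector. For each word, it generates candidates that are a small number of edits away (a deleted, swapped, replaced, or inserted letter) and picks the candidate that occurs most often in a word-frequency list that ships with TextBlob. Let’s explore some of its functionality: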
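Under the hood, TextBlob’s correct() follows Norvig’s idea: generate every string one or two edits away from the input, keep the ones that are known words, and pick the most frequent. Here is a minimal, self-contained sketch of that idea using only the standard library. It is not TextBlob’s actual code: WORD_COUNTS is a hypothetical toy frequency table standing in for the much larger word list TextBlob uses, and correct() is our own illustrative function.

```python
import string
from collections import Counter

# Hypothetical toy frequency table; a real corrector counts words in a large corpus
WORD_COUNTS = Counter({
    "tiger": 50, "tigers": 10, "largest": 40, "living": 60,
    "cat": 80, "the": 500, "is": 300,
})

LETTERS = string.ascii_lowercase


def edits1(word: str) -> set[str]:
    """All strings one edit away: deletes, transposes, replaces, inserts."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def known(words) -> set[str]:
    """Keep only the candidates that appear in the frequency table."""
    return {w for w in words if w in WORD_COUNTS}


def correct(word: str) -> str:
    """Prefer the word itself, then 1-edit, then 2-edit candidates; pick the most frequent."""
    lower = word.lower()
    candidates = (
        known([lower])
        or known(edits1(lower))
        or known(e2 for e1 in edits1(lower) for e2 in edits1(e1))
        or {lower}  # nothing known: return the word unchanged
    )
    best = max(candidates, key=lambda w: WORD_COUNTS[w])
    # Restore leading capitalisation (e.g. "Tigerr" -> "Tiger")
    return best.capitalize() if word[:1].isupper() else best


print(correct("Tigerr"))   # Tiger
print(correct("largast"))  # largest
print(correct("livingg"))  # living
```

Note that "Tigerr" has two known one-edit candidates in this table, "tiger" and "tigers"; frequency breaks the tie, which is exactly where the quality of the word list matters.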

You can easily check and correct a single word using TextBlob. Here is an example:

# Import the Word class from the textblob module
from textblob import Word

# Define the word to be corrected
word = 'Tigerr'

# Create a Word object from the given word
given_word = Word(word)

# Use the correct() method to correct the spelling of the word
corrected_word = given_word.correct()

# Print the original word and the corrected word
print(f"Original word: {given_word}")
print(f"Corrected word: {corrected_word}")

Output

Original word: Tigerr
Corrected word: Tiger

Now let’s run the spelling corrector on a whole sentence. This time, we use the TextBlob class.

from textblob import TextBlob

text = "Tiger is the largast livingg cat."

# Correct each word using TextBlob's built-in spelling corrector
corrected_text = TextBlob(text).correct()

print(f"Original text: {text}")
print(f"Corrected text: {corrected_text}")

Output

Original text: Tiger is the largast livingg cat.
Corrected text: Tiger is the largest living cat.

As you can see, TextBlob corrected “largast” to “largest” and “livingg” to “living”.
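To see exactly which words were changed, you can compare the original and corrected sentences word by word. The changed_words() helper below is a hypothetical addition, not part of TextBlob. It assumes both strings contain the same number of whitespace-separated words, which holds when a corrector replaces words one for one.

```python
def changed_words(original: str, corrected: str) -> list[tuple[str, str]]:
    """Pair up whitespace-separated words and return the (before, after) pairs that differ.

    Assumes both strings contain the same number of words.
    """
    before, after = original.split(), corrected.split()
    if len(before) != len(after):
        raise ValueError("word counts differ; cannot align word by word")
    return [(b, a) for b, a in zip(before, after) if b != a]


# Strings taken from the example output
changes = changed_words(
    "Tiger is the largast livingg cat.",
    "Tiger is the largest living cat.",
)
for before, after in changes:
    print(f"{before} -> {after}")
# largast -> largest
# livingg -> living
```

With TextBlob you would call it as changed_words(text, str(corrected_text)), since str() on a TextBlob returns its raw text.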

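However, because TextBlob corrects each word in isolation, it can struggle with complex errors, with typos that happen to form another valid word (such as "form" typed for "from"), and with valid words missing from its word list (names, brands, domain jargon), which it may wrongly "correct".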


N-gram-based Approach with PyEnchant

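Install PyEnchant with pip. PyEnchant is a binding to the Enchant C library: the Windows wheels bundle Enchant, but on Linux and macOS you may need to install it separately through your system package manager (for example, the enchant package from your Linux distribution or from Homebrew).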

pip install pyenchant

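PyEnchant checks each word against a full language dictionary provided by a spell-checking backend such as Hunspell or Aspell. For words it doesn’t recognize, the backend proposes suggestions ranked by similarity, using techniques such as edit distance and letter n-gram overlap (shared sequences of n consecutive letters), depending on the backend. Let’s see it in action: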
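To make the idea of letter n-grams concrete, here is a minimal sketch that ranks words from a small hypothetical vocabulary by how many character bigrams (n = 2) they share with a misspelling, using the Dice coefficient. It illustrates n-gram similarity in general; it is not PyEnchant’s or Hunspell’s actual ranking algorithm, which combines several heuristics.

```python
def bigrams(word: str) -> set[str]:
    """Character bigrams of a word, padded with ^ and $ so the first/last letters count too."""
    padded = f"^{word.lower()}$"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient: 2 * |shared bigrams| / (|bigrams(a)| + |bigrams(b)|)."""
    ga, gb = bigrams(a), bigrams(b)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def suggest(word: str, vocabulary: list[str], n: int = 3) -> list[str]:
    """Return the n vocabulary words most similar to `word` by bigram overlap."""
    return sorted(vocabulary, key=lambda v: dice(word, v), reverse=True)[:n]


# Hypothetical mini-vocabulary standing in for a real dictionary
VOCAB = ["library", "literary", "liberty", "tomorrow", "borrow", "to", "the"]

print(suggest("librery", VOCAB, n=2))   # ['library', 'liberty']
print(suggest("Tommorow", VOCAB, n=1))  # ['tomorrow']
```

Here "librery" and "library" share 6 of their 8 bigrams each (Dice = 12/16 = 0.75), which is why "library" ranks first.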

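import string

from enchant import Dict

# Load the US English dictionary once and reuse it for every word
us_dict = Dict("en_US")

def spell_suggest(word):
  # Split off trailing punctuation such as the "." in "librery."
  core = word.rstrip(string.punctuation)
  trailing = word[len(core):]
  # Keep words that are empty or already in the dictionary
  if not core or us_dict.check(core):
    return word
  # Otherwise use the top-ranked suggestion, re-attaching the punctuation
  suggestions = us_dict.suggest(core)
  if suggestions:
    return suggestions[0] + trailing
  # No suggestions: leave the word unchanged rather than inserting a message
  return word

text = "Tommorow I will go to the librery."

corrected_words = [spell_suggest(word) for word in text.split()]
corrected_text = " ".join(corrected_words)

print(f"Original text: {text}")
print(f"Corrected text: {corrected_text}")

Output

Original text: Tommorow I will go to the librery.
Corrected text: Tomorrow I will go to the library.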

Custom dictionaries: You can extend a dictionary with domain-specific terms to improve accuracy. For example, suppose you want to add medical terms. Here is an example:

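from enchant import Dict

custom_dict = Dict("en_US")
# add() saves each word to your personal word list, so it persists across sessions
custom_dict.add("diabetes")
custom_dict.add("cardiovascular")

# Suggestions draw on both the built-in dictionary and your personal word list
suggestions = custom_dict.suggest("diabtes")
print(suggestions)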

Output

['diabetes', 'debates', "debate's", 'debaters', "diabetes's", 'dilates', 'dates', 'debts', "debt's", "debater's", "date's", "Tibet's"]
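PyEnchant also offers enchant.DictWithPWL, which combines a language dictionary with a plain-text word-list file, if you prefer not to modify your personal word list. If you cannot install the Enchant C library at all, a rough stand-in is the standard-library difflib module. This sketch swaps in difflib’s similarity ratio (not Enchant’s ranking), and MEDICAL_TERMS and GENERAL_WORDS are hypothetical word lists, so its suggestions will generally differ from Enchant’s.

```python
import difflib

# Hypothetical domain vocabulary; in practice you might load this from a file
MEDICAL_TERMS = ["diabetes", "cardiovascular", "hypertension", "insulin", "metformin"]
# Hypothetical general-purpose words to combine with the domain list
GENERAL_WORDS = ["debates", "dates", "debts", "heart", "pressure"]

VOCABULARY = sorted(set(MEDICAL_TERMS) | set(GENERAL_WORDS))


def suggest(word: str, n: int = 3) -> list[str]:
    """Return up to n vocabulary words whose difflib similarity ratio is at least 0.6."""
    return difflib.get_close_matches(word.lower(), VOCABULARY, n=n, cutoff=0.6)


print(suggest("diabtes"))      # best match first: 'diabetes'
print(suggest("hypertenson"))  # best match first: 'hypertension'
```

get_close_matches returns matches sorted from most to least similar, so the first element plays the same role as suggestions[0] in the PyEnchant examples.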

While we’ve explored the foundations of spell checking and correction with TextBlob and PyEnchant, this is just the beginning! To go further, explore more advanced NLP tools:

  • spaCy: spaCy has no spell checker built in, but third-party pipeline components such as contextualSpellCheck add context-aware correction. By analyzing the surrounding words, they can choose corrections that fit the meaning of the sentence rather than just the closest dictionary word.
  • TensorFlow/PyTorch: You can train your own spell correction model tailored to your specific needs and vocabulary. Trained on text from your domain, such a model can learn jargon and writing patterns that a general-purpose dictionary misses.
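Context-aware correction ranks candidate fixes using the surrounding words. Production tools use neural language models, but the core idea can be shown with a toy bigram model: among the candidates, prefer the one that most often follows the previous word. The counts in BIGRAM_COUNTS are hypothetical, and pick_in_context() is an illustrative function, not how spaCy or any particular extension works.

```python
from collections import Counter

# Hypothetical bigram counts: (previous word, word) -> occurrences in some corpus
BIGRAM_COUNTS = Counter({
    ("over", "there"): 30,
    ("over", "their"): 1,
    ("lost", "their"): 25,
    ("lost", "there"): 2,
})


def pick_in_context(previous: str, candidates: list[str]) -> str:
    """Choose the candidate that most often follows `previous` (unseen pairs count as 0)."""
    return max(candidates, key=lambda c: BIGRAM_COUNTS[(previous.lower(), c)])


# The misspelling "thier" has two plausible fixes; the previous word decides
candidates = ["their", "there"]
print(pick_in_context("over", candidates))  # there
print(pick_in_context("lost", candidates))  # their
```

A word-by-word corrector would return the same fix for "thier" in both sentences; the bigram counts let the previous word break the tie.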

Remember, the tools mentioned here are just the tip of the iceberg. As you explore further, you’ll discover a vast array of NLP libraries and techniques waiting to be harnessed. Experiment, innovate, and push the boundaries of spell correction in Python. Who knows, you might discover the next groundbreaking technique in this exciting field!

Subhankar Rakshit

Hey there! I’m Subhankar Rakshit, the brains behind PySeek. I’m a Post Graduate in Computer Science. PySeek is where I channel my love for Python programming and share it with the world through engaging and informative blogs.
