Extract Emails and Phone numbers from a text using Python

Introduction

Have you ever had to pick email addresses and phone numbers one by one out of a large text document or a webpage? Done manually, it's a tedious task that can eat up a lot of your time. But what if you had a program that could do all that searching for you? Pretty cool, right?

In this tutorial, we'll build a Python program that uses regular expressions to extract email addresses and phone numbers from text you copy to the clipboard, whether it comes from a document or a web page.
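
If regular expressions are new to you, here is a minimal sketch of the idea using Python's built-in re module. The sample sentence is made up for illustration; the pattern \d+ simply means "one or more digits in a row".

```python
import re

# Made-up sample text, used only to illustrate re.findall()
text = 'Room 12 is booked for 45 minutes, room 7 for 30.'

# \d+ matches one or more consecutive digits
numbers = re.findall(r'\d+', text)
print(numbers)
```

The phone and email patterns later in this tutorial work the same way, just with longer expressions.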

Requirements

Please install the pyperclip module using the following command. It will allow you to work with the clipboard functions (copy and paste).

pip install pyperclip

The Python Program

import pyperclip, re

# Regular expression for Indian phone numbers
IndianNumber = re.compile(r'''(
([+]\d{1,2})                    # Plus(+) Sign and Country Code
(\d{3,10})                      # Rest of the Number
)''',re.VERBOSE)

# Regular expression for US phone numbers
phoneRegex = re.compile(r'''(
(\d{3}|\(\d{3}\))?              # Area Code(Optional)
(\s|-|\.)                       # Separator
(\d{3})                         # First Three Digits
(\s|-|\.)                       # Separator
(\d{4})                         # Last Four Digits
(\s*(ext\.?|x)\s*(\d{2,5}))?    # Extension (Optional)
)''', re.VERBOSE)

# Regular expression for Emails
emailRegex = re.compile(r'''(
[a-zA-Z0-9._%+-]+       # Username
@                       # @Symbol
[a-zA-Z0-9.-]+          # Domain Name
(\.[a-zA-Z]{2,4})       # dot-something
)''', re.VERBOSE)

# Find Matches in Clipboard Text
text = str(pyperclip.paste())
phone_groups = phoneRegex.findall(text)
email_groups = emailRegex.findall(text)
Indian_Contacts = IndianNumber.findall(text)

matched = []

for group in phone_groups:
    # Append full phone number to matched list
    matched.append(group[0])

for group in Indian_Contacts:
    # Keep the number only if the Indian country code is present
    if group[1] == '+91':
        matched.append(group[1] + group[2])

for group in email_groups:
    matched.append(group[0])

if len(matched) > 0:
    # Copy matched items to Clipboard Again
    pyperclip.copy('\n'.join(matched))
    print('Copied to clipboard!\n')
    # Print matched items
    print('\n'.join(matched))
else:
    print('No Phone Numbers or Emails found')

Output
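
To see what the program produces without going through the clipboard, here is a minimal sketch that runs the same three patterns on a hard-coded string. The helper name extract_contacts and the sample text are assumptions made for illustration and are not part of the program above. The patterns are the same as in the main program, except that the dot in "ext." is escaped so it matches a literal dot.

```python
import re

indian_number = re.compile(r'''(
([+]\d{1,2})                    # plus sign and country code
(\d{3,10})                      # rest of the number
)''', re.VERBOSE)

phone_regex = re.compile(r'''(
(\d{3}|\(\d{3}\))?              # area code (optional)
(\s|-|\.)                       # separator
(\d{3})                         # first three digits
(\s|-|\.)                       # separator
(\d{4})                         # last four digits
(\s*(ext\.?|x)\s*(\d{2,5}))?    # extension (optional)
)''', re.VERBOSE)

email_regex = re.compile(r'''(
[a-zA-Z0-9._%+-]+               # username
@                               # @ symbol
[a-zA-Z0-9.-]+                  # domain name
(\.[a-zA-Z]{2,4})               # dot-something
)''', re.VERBOSE)


def extract_contacts(text):
    """Hypothetical helper: collect matches in the same order as the program."""
    matched = []
    # When a pattern contains groups, findall() returns one tuple per match.
    # Index 0 is the outermost group, which holds the whole match.
    for group in phone_regex.findall(text):
        matched.append(group[0])
    for group in indian_number.findall(text):
        if group[1] == '+91':
            matched.append(group[1] + group[2])
    for group in email_regex.findall(text):
        matched.append(group[0])
    return matched


# Made-up sample text with one US number, one Indian number and one email
sample = 'Call 415-555-0134 or +919876543210, or write to info@example.com.'
print(extract_contacts(sample))
```

For this sample, the printed list is ['415-555-0134', '+919876543210', 'info@example.com']. The main program joins the same kind of list with newlines before copying it back to the clipboard.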

Summary

In this tutorial, you learned how to extract email addresses and phone numbers from text or a webpage using Python regular expressions. The email pattern covers most common address formats, though it only accepts top-level domains of two to four letters. For phone numbers, we built custom expressions for US and Indian formats. Extracting numbers for other countries follows similar logic. Check out this article on Regular Expressions in Python for more details!
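
As a rough sketch of how another country's format might be handled, the pattern below targets UK numbers. It rests on a simplifying assumption that is not from the original program: the number is written in international form as +44 followed by exactly ten digits, optionally split up by single spaces or hyphens. The names uk_number and find_uk_numbers are hypothetical.

```python
import re

# Assumption for illustration: +44, then ten digits that may each be
# preceded by a single space or hyphen.
uk_number = re.compile(r'''(
(\+44)                  # country code
((?:[\s-]?\d){10})      # ten digits with optional separators
)''', re.VERBOSE)


def find_uk_numbers(text):
    """Hypothetical helper: return matched numbers with separators removed."""
    return [re.sub(r'[\s-]', '', group[0]) for group in uk_number.findall(text)]


# Made-up sample numbers
print(find_uk_numbers('Office: +44 20 7946 0958, mobile: +447700900123'))
```

With these two sample numbers the result is ['+442079460958', '+447700900123']. Real-world formats vary more than this, so treat the pattern as a starting point and adjust it to the data you actually have.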

Need help with specific country formats? Feel free to ask in the comments. I'm happy to assist!

Want more Python topics like this? Visit our separate page packed with several Cool Python Programs.

Happy Coding!

Subhankar Rakshit

Hey there! I'm Subhankar Rakshit, the brains behind PySeek. I'm a Post Graduate in Computer Science. PySeek is where I channel my love for Python programming and share it with the world through engaging and informative blogs.