Python Code to Extract Emails by Reading File [Complete Script]

In this tutorial, we will write our own Python script to extract all the email IDs from the given text file. Using this script, you don’t need any external tool to extract emails.

First of all, make sure you have Python installed on your system.

Python to extract emails from file:

To make it simple, divide the problem into multiple tasks.

Read each line from the text file.

fileToRead = 'readText.txt'
# Read every line into a list; the with statement closes the file automatically
with open(fileToRead, 'r') as file:
    listLine = file.readlines()

Read each word from the line and save it into the list.

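We can use the Python split() function to get the words from a line of text. Because split() breaks on whitespace by default, first replace delimiters such as commas and semicolons with spaces.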

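fileToRead = 'readText.txt'
delimiterInFile = [',', ';']
with open(fileToRead, 'r') as file:
    listLine = file.readlines()
for itemLine in listLine:
    item = str(itemLine)
    # Replace each delimiter with a space so split() can separate the words
    for delimiter in delimiterInFile:
        item = item.replace(str(delimiter), ' ')
    # split() without arguments breaks the line on whitespace
    wordList = item.split()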

Note: If you use the replace() string method, you have to assign the result to a variable. replace() returns a new string; it cannot change the original in place because strings are an immutable data type in Python.
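
A minimal sketch of this behavior, using a made-up sample line (the variable names and addresses are illustrative and not part of the script):

```python
# Hypothetical sample line; the addresses are made up for illustration.
line = "alice@example.com,bob@example.com;carol@example.com"

# replace() returns a new string; calling it without assignment changes nothing.
line.replace(',', ' ')
unchanged = line

# Assign the result back on each pass to keep the change.
cleaned = line
for delimiter in [',', ';']:
    cleaned = cleaned.replace(delimiter, ' ')

# With the delimiters gone, split() yields one word per address.
words = cleaned.split()
print(unchanged)
print(words)
```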

Python to Validate / Verify Email ID: 

Using the Python re module for pattern matching makes our job easy. Check each string to see whether it is a valid email ID.

import re
def validateEmail(strEmail):
    # .+ matches one or more characters; \. matches a literal dot
    if re.match(r"(.+)@(.+)\.(.+)", strEmail):
        return True
    return False

You can learn more about regular expressions in Python.
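
An email pattern is easy to get subtly wrong, because an unescaped dot matches any character rather than a literal period. The sketch below compares a loose pattern with a stricter one. The name isLikelyEmail and the stricter pattern are assumptions for illustration, and the stricter pattern is still far from a complete address validator.

```python
import re

# Unescaped dot: "." matches any character, so no literal period is required.
loosePattern = r"(.*)@(.*).(.*)"
# Hypothetical stricter pattern: one or more non-space, non-@ characters on
# each side of "@", with a literal dot inside the domain part.
strictPattern = r"^[^@\s]+@[^@\s]+\.[^@\s]+$"

def isLikelyEmail(strEmail):
    return re.match(strictPattern, strEmail) is not None

samples = ["alice@example.com", "user@localhost", "@x", "no-at-sign.com"]
for sample in samples:
    looseResult = re.match(loosePattern, sample) is not None
    # Print the sample, the loose result, and the strict result side by side.
    print(sample, looseResult, isLikelyEmail(sample))
```

Only the first sample passes the stricter check, while the loose pattern also accepts "user@localhost" and "@x".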

Save all the extracted email IDs in the file.

After validation, save all the valid email IDs into the list listEmail. Check whether all the list items are unique (unique email IDs) and remove any duplicates from the list. Save the list into the file emailExtracted.txt.

If there is no email in the text file, listEmail will be empty. Print “No email found.”

For instance, if you found 40 emails in the file, print “40 emails collected!”.
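
These steps can be sketched as a small helper. The function name saveUniqueEmails is hypothetical. The sketch keeps the first occurrence of each address so the output order is stable, which a plain set does not guarantee.

```python
def saveUniqueEmails(listEmail, fileToWrite):
    # Keep the first occurrence of each email ID, preserving input order.
    uniqEmail = []
    seen = set()
    for email in listEmail:
        if email not in seen:
            seen.add(email)
            uniqEmail.append(email)

    # Nothing to save: report it and leave the file untouched.
    if not uniqEmail:
        return "No email found."

    # Join everything first so the file is written with a single call.
    with open(fileToWrite, 'w') as outFile:
        outFile.write('\n'.join(uniqEmail) + '\n')
    return "{} emails collected!".format(len(uniqEmail))
```

Calling print(saveUniqueEmails(listEmail, 'emailExtracted.txt')) would then write the file and print the matching message.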

Python Script to Extract Emails from the file:

You can run this code with both Python 2 and Python 3.

Here is the complete code:

import re

fileToRead = 'readText.txt'
fileToWrite = 'emailExtracted.txt'
delimiterInFile = [',', ';']

def validateEmail(strEmail):
    # .+ matches one or more characters of any type; \. matches a literal dot.
    if re.match(r"(.+)@(.+)\.(.+)", strEmail):
        return True
    return False

def writeFile(listData):
    # Build the whole output first, then write it to the file in one call.
    strData = ""
    for item in listData:
        strData = strData + item + '\n'
    with open(fileToWrite, 'w') as file:
        file.write(strData)

listEmail = []
# The with statement closes the file automatically after reading.
with open(fileToRead, 'r') as file:
    listLine = file.readlines()

for itemLine in listLine:
    item = str(itemLine)
    # Replace each delimiter with a space so split() can separate the words.
    for delimiter in delimiterInFile:
        item = item.replace(str(delimiter), ' ')
    wordList = item.split()
    for word in wordList:
        if validateEmail(word):
            listEmail.append(word)

if listEmail:
    # A set keeps only the unique email IDs.
    uniqEmail = set(listEmail)
    print("{} emails collected!".format(len(uniqEmail)))
    writeFile(uniqEmail)
else:
    print("No email found.")

Most of the code in this Python script is self-explanatory. If you still have doubts, you can ask in the comment section.

The scope of this Python Script

Using Python as a scripting language has its perks.

Automate email marketing: You can use this Python script to extract emails from a text file, which is often needed when collecting addresses for marketing.

You are ready to automate your email-extracting job with this simple Python script.

Extracting emails from web pages is also simple. Get the source code of the web page using the browser. You can simply use the view-source feature, for example view-source:http://example.com/.

Open it in the browser and copy and paste the source code into the file readText.txt. Running this script will give you all the email IDs present on the web page.
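
With page source, splitting on whitespace tends to leave HTML tags and quotes attached to the addresses, so searching the whole text with a pattern can work better. A rough sketch, assuming a hypothetical fragment of saved source and an illustrative pattern that will miss some valid addresses:

```python
import re

# Hypothetical fragment of saved page source; the addresses are made up.
pageSource = '''
<p>Contact: <a href="mailto:info@example.com">info@example.com</a></p>
<p>Sales team: sales@example.org, or call us.</p>
'''

# Illustrative pattern: name characters, "@", domain labels, then a
# top-level domain of at least two letters.
emailPattern = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"

# findall returns every non-overlapping match; set() removes the duplicates.
foundEmails = sorted(set(re.findall(emailPattern, pageSource)))
print(foundEmails)
```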

You can also use a CSV file rather than a text file to extract email IDs and save them. Using a CSV file in Python is pretty simple.
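
A minimal sketch of the CSV variant using the standard csv module, written for Python 3. The column layout, the sample rows, and the simple pattern are all assumptions for illustration. The data is read from and written to in-memory strings so the example runs on its own; with real files you would pass opened file objects to csv.reader and csv.writer instead.

```python
import csv
import io
import re

# Hypothetical CSV content; a real script would open a .csv file instead.
csvText = "name,email\nAlice,alice@example.com\nBob,not-an-email\nCarol,carol@example.org\n"

# Illustrative pattern: text, "@", text, a literal dot, text.
emailPattern = r"^[^@\s]+@[^@\s]+\.[^@\s]+$"

listEmail = []
for row in csv.reader(io.StringIO(csvText)):
    # Check every cell, since the email column may differ between files.
    for cell in row:
        if re.match(emailPattern, cell.strip()):
            listEmail.append(cell.strip())

# Write one email ID per row to an in-memory CSV.
output = io.StringIO()
writer = csv.writer(output, lineterminator='\n')
for email in listEmail:
    writer.writerow([email])
print(output.getvalue())
```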

Automation: I use this script to extract the email IDs of the students subscribed to my Python channel so that I can import them into the email server and send a programming newsletter. It saves me a lot of time compared with adding each email ID by hand.

That’s all for this Python script to extract emails from a file.

Kindly share what you have automated using Python. I would like to hear from you.

28 Comments

        1. You need two text files: ‘readText.txt’ and ‘emailExtracted.txt’. Keep both files in the directory from which you run this program. ‘readText.txt’ is the input text file from which you want to extract the emails. After the script runs, all the extracted emails will be saved in ‘emailExtracted.txt’.

  1. thank you so much. By the way, how can you write such complex codes like this? Please share your idea with me.

    1. Follow these steps for solving any complex programming questions.

      > break down the problem into small tasks.
      > write code for each small task
      > integrate these tasks

      For example: In the above program, I followed the below steps.

      > Breaking down the problem into multiple tasks like reading file data, writing a regular expression for email…
      > Write functions for each task
      > Integrate each task. Read each line from the file and extract emails from each line.

  2. Hello! Thank you so much. One question, please: how can I extract phone numbers this way?

  3. What about closing the file?
    Why are you using such a complex method for writing to the file?

    def writeFile(listData):
        with open(fileToWrite, 'w+') as f:    
            for item in listData:
                f.write(item+'\n')
    

    That would be more efficient and take less memory.

    1. Hi Vladimir,

      You are performing a write operation for each item in the list. A file write operation is relatively expensive, so it is always good to replace many write operations with a single one.

  4. Hi,
    I was writing the code and got below error. Please help me out.

    >>> file=open(filetoread,’r’)
    Traceback (most recent call last):
    File “”, line 1, in
    file=open(filetoread,’r’)
    TypeError: ‘tuple’ object is not callable
    
      1. I have list of companies with company name, all these companies are newly registered. Now my task is to get email ids for all these companies. Please do suggest.

        1. This is not so straightforward. You have to find each company's official website and its about-us or contact page. These pages will mostly have a contact email. You have to scrape the page to read it and get the email ID.

          If you want to do it without programming, there is a Chrome plugin by hunter.io. Using it, you can find the email ID for any given company.

    1. In that case, you can modify the regular expression to match the desired email addresses from the university. I assume the email address from the university will have a specific domain (example: university-name.edu). Change the regular expression to “(.*)@” in validateEmail() function. That’s all.

  5. Thank you so much for this program. I need your help i.e. what if one wanted to extract specific animals image from the big dataset which contains multiple images of different animals? please help me with this.

    1. One simple solution is to name the images accordingly, e.g. cat-01, cat-02, dog-01. You can then identify the images from their names. But this requires access to the dataset. Otherwise, if you want to automate this, you can use the Pillow Python package for reading images and any efficient open-source image recognition package to identify the animals in a given image.

  6. I have 100’s of files with emails in them. I need to write a python script to hide the user part in the email in those files, do you have something for a similar approach.

  7. Great Explanation. Can we do this email finding without using re? I am using the core Python language. Can you give a hint in this regard, Thanks a lot.

    1. Thanks Amma!

      Then there is a lot of stuff to verify.

      For a string to be an email ID, it should contain the special characters ‘@’ and ‘.’ in that order, and it should have exactly one ‘@’ character. You have to do this verification manually using Python string methods.
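
A minimal sketch of these checks using only string methods. The function name validateEmailNoRegex is hypothetical, and the checks are deliberately simple rather than a complete validator.

```python
def validateEmailNoRegex(strEmail):
    # The string must contain exactly one '@' character.
    if strEmail.count('@') != 1:
        return False
    userPart, domainPart = strEmail.split('@')
    # Something must come before the '@'.
    if not userPart:
        return False
    # A '.' must appear after the '@', with text on both sides of it.
    dotIndex = domainPart.rfind('.')
    if dotIndex <= 0 or dotIndex == len(domainPart) - 1:
        return False
    return True
```

It can replace validateEmail() in the script, for example as validateEmailNoRegex(word).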

  8. Hello, thanks for this code I have a question plz.

    What if I want to exclude some emails from the file, for example if I don’t want to print the “hotmail.com” emails?

    1. You can simply update the validateEmail() function in the program mentioned in this tutorial.

      Add the following line of code inside the validateEmail() function.

      if "hotmail.com" in strEmail:
          return False
      
