CSEstack

Python Code to Extract Emails by Reading File [Complete Script]

Aniruddha Chaudhari
Code, Python

In this tutorial, we will write our own Python script to extract all the email IDs from the given text file. Using this script, you don’t need any external tool to extract emails.

First, make sure you have Python installed on your system.

Python to extract emails from file:

To make it simple, divide the problem into multiple tasks.

Read each line from the text file.

fileToRead = 'readText.txt'
# The with statement closes the file automatically after reading.
with open(fileToRead, 'r') as file:
    listLine = file.readlines()

Related Read: Python code to check whether a file exists

Read each word from the line and save it into the list.

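We can use Python's split() string method to get the words from each line of text. Before splitting, replace the delimiters (comma and semicolon) with spaces so that email IDs separated only by a delimiter end up as separate words.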

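fileToRead = 'readText.txt'
delimiterInFile = [',', ';']
with open(fileToRead, 'r') as file:
    listLine = file.readlines()
for itemLine in listLine:
    item = str(itemLine)
    # Replace each delimiter with a space so that split() can separate the words.
    for delimiter in delimiterInFile:
        item = item.replace(str(delimiter), ' ')
    # Split the cleaned line on whitespace to get the list of words.
    wordList = item.split()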

Note: If you are using the replace() string method, you have to save the result in a new string. Strings are an immutable data type in Python, so characters cannot be replaced in place.
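
As a small illustration of the note above, the following sketch shows that replace() leaves the original string unchanged and returns a new one. The sample line and the variable names are made up for this example.

```python
# Hypothetical sample line with comma and semicolon delimiters.
line = "a@x.com,b@y.org;c@z.net"

# replace() does not modify 'line'; each call returns a new string.
cleaned = line.replace(",", " ").replace(";", " ")

# split() without arguments splits on any run of whitespace.
words = cleaned.split()

print(line)   # the original string is unchanged
print(words)  # the words taken from the cleaned copy
```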

Python to Validate / Verify Email ID: 

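Python's re module for pattern matching makes this job easy. Check each string to see whether it is a valid email ID. Note that a dot in a regular expression matches any character, so the literal dot in the domain must be escaped as \. in the pattern.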

import re
def validateEmail(strEmail):
    # Text without '@' or spaces, then '@', then a domain that contains a literal dot.
    if re.match(r"[^@\s]+@[^@\s]+\.[^@\s]+$", strEmail):
        return True
    return False

You can learn more about regular expressions in Python.
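
To see why the dot in an email pattern needs escaping, the sketch below compares a pattern with an unescaped `.` against one that uses `\.`. Both pattern names, the helper functions, and the sample strings are assumptions made for this illustration, and the stricter pattern is only a rough check, not a full email validator.

```python
import re

# Unescaped '.' matches any character, so almost anything containing '@' passes.
LOOSE = re.compile(r"(.*)@(.*).(.*)")

# Hypothetical stricter pattern: text, '@', then a domain with a literal dot.
STRICT = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+$")

def is_email_loose(text):
    return bool(LOOSE.match(text))

def is_email_strict(text):
    return bool(STRICT.match(text))

samples = ["user@example.com", "user@localhost", "@x", "no-at-sign.com"]

# Map each sample to (loose result, strict result).
results = {s: (is_email_loose(s), is_email_strict(s)) for s in samples}

for sample, (loose, strict) in results.items():
    print(sample, "loose:", loose, "strict:", strict)
```

With these samples, the loose pattern accepts "user@localhost" and "@x", while the stricter one accepts only "user@example.com".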

Save all the extracted email IDs in the file.

After validation, save all the valid email IDs in the list listEmail. Remove the duplicate email IDs from the list so that every entry is unique, then save the list in the file emailExtracted.txt.

If there is no email in the text file, listEmail will be empty. Print “No email found.”

For instance, if you found 40 emails in the file, print “40 emails collected!”.
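
The deduplicate-and-report step can be sketched as a small function. The function name summarize and the sample addresses are hypothetical; the sketch uses dict.fromkeys so that the first-seen order is kept (guaranteed on Python 3.7 and later), whereas a plain set gives no particular order.

```python
def summarize(list_email):
    """Return the unique emails (first-seen order) and the message to print."""
    # dict keys are unique and keep insertion order on Python 3.7+.
    unique = list(dict.fromkeys(list_email))
    if unique:
        message = "%d emails collected!" % len(unique)
    else:
        message = "No email found."
    return unique, message

unique, message = summarize(["a@x.com", "b@y.org", "a@x.com"])
print(message)
print(unique)
```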

Python Script to Extract Emails from the file:

You can run this code with both Python 2 and Python 3.

Here is the complete code:

import re
fileToRead = 'readText.txt'
fileToWrite = 'emailExtracted.txt'
delimiterInFile = [',', ';']
def validateEmail(strEmail):
    # Text without '@' or spaces, then '@', then a domain that contains a literal dot.
    if re.match(r"[^@\s]+@[^@\s]+\.[^@\s]+$", strEmail):
        return True
    return False
def writeFile(listData):
    # Build the whole output first, then write it to the file in one operation.
    strData = ""
    for item in listData:
        strData = strData + item + '\n'
    with open(fileToWrite, 'w+') as file:
        file.write(strData)
listEmail = []
with open(fileToRead, 'r') as file:
    listLine = file.readlines()
for itemLine in listLine:
    item = str(itemLine)
    # Replace each delimiter with a space so that split() can separate the words.
    for delimiter in delimiterInFile:
        item = item.replace(str(delimiter), ' ')
    wordList = item.split()
    for word in wordList:
        if validateEmail(word):
            listEmail.append(word)
if listEmail:
    # A set keeps only the unique email IDs.
    uniqEmail = set(listEmail)
    # String formatting prints the same message on Python 2 and Python 3.
    print("%d emails collected!" % len(uniqEmail))
    writeFile(uniqEmail)
else:
    print("No email found.")

Most of the code in this Python script is self-explanatory. If you still have doubts, you can ask in the comment section.
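
As an alternative to splitting each line into words, the same job can be sketched with re.findall over a block of text. This is a different technique from the script above, not a replacement for it. The pattern, the function name extract_emails, and the sample text are assumptions for illustration, and the pattern covers only common address shapes.

```python
import re

# Hypothetical pattern for common addresses: local part, '@', domain, dot, TLD.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(text):
    """Return the unique email-like strings in text, in first-seen order."""
    seen = []
    for match in EMAIL_PATTERN.findall(text):
        if match not in seen:
            seen.append(match)
    return seen

sample = "Contact: alice@example.com, bob@test.org; alice@example.com."
emails = extract_emails(sample)
print(emails)
```

Because the pattern itself decides where an address starts and ends, there is no need to replace delimiters first, and trailing punctuation such as the final full stop in the sample is left out of the match.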

The scope of this Python Script

Using Python as a scripting language has its own perks.

Automate Email Marketing: You can use this Python script to extract emails from a text file. Marketing work often requires collecting all the email IDs from a document.

You are ready to automate your email extracting job with this simple Python script.

Extracting emails from web pages is also simple. Get the source code of the web page using the browser. You can simply use the view-source feature, for example, view-source:http://example.com/.

Open it in the browser, then copy and paste the source code into the file readText.txt. Running this script will give you all the email IDs present on the web page.
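
For page source that has already been copied into a string, a minimal sketch might look like the following. The function name emails_from_html, the pattern, and the sample markup are hypothetical. The sketch decodes HTML entities first, since some pages write the @ sign as an entity such as `&#64;`.

```python
import html
import re

# Hypothetical pattern for common addresses; not a full validator.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def emails_from_html(source):
    """Return the sorted unique email-like strings found in HTML source."""
    # html.unescape turns entities such as '&#64;' back into '@'.
    text = html.unescape(source)
    return sorted(set(EMAIL_PATTERN.findall(text)))

# Made-up markup standing in for pasted page source.
page = '<a href="mailto:info@example.com">Mail us</a> or sales&#64;example.com'
found = emails_from_html(page)
print(found)
```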

You can also use a CSV file rather than a text file, both to extract the email IDs and to save them. Using a CSV file in Python is pretty simple.
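
A rough sketch of the CSV variant, using the standard csv module, is shown below. The function names, the column layout, and the sample data are assumptions; io.StringIO stands in for real files so the example runs on its own, and with real files you would pass handles opened with open(..., newline='').

```python
import csv
import io
import re

# Hypothetical pattern for common addresses; not a full validator.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def emails_from_csv(handle):
    """Collect email-like strings from every cell of a CSV file handle."""
    found = []
    for row in csv.reader(handle):
        for cell in row:
            found.extend(EMAIL_PATTERN.findall(cell))
    return found

def write_emails_csv(emails, handle):
    """Write one email per row under a single 'email' header."""
    writer = csv.writer(handle)
    writer.writerow(["email"])
    for email in emails:
        writer.writerow([email])

# In-memory stand-ins for the input and output CSV files.
sample = io.StringIO("name,contact\nAlice,alice@example.com\nBob,bob@test.org\n")
emails = emails_from_csv(sample)

out = io.StringIO()
write_emails_csv(emails, out)
print(out.getvalue())
```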

Automation: I use this script to extract the email IDs of the students subscribed to my Python channel, so that I can import these emails into the email server and send them a programming newsletter. It saves me a lot of time compared with adding each email ID individually.

That’s all for this Python script to extract emails from a file.

What have you automated using Python? I would like to hear from you.

Aniruddha Chaudhari
I am a complete Python nut and love Linux, with vim as my editor. I hold a Master of Computer Science from NIT Trichy. I dabble in C/C++ and Java too. I keep sharing my coding knowledge and my own experience on the CSEstack.org portal.


Comments

  • Reply
    Nari
    September 18, 2019 at 6:55 pm

    Nice work!!

    • Reply
      Aniruddha Chaudhari
      September 18, 2019 at 6:57 pm

      Thanks! Glad you like it.

  • Reply
    Meru Wa
    October 11, 2019 at 7:27 pm

    TypeError: ‘str’ object is not callable

    • Reply
      Aniruddha Chaudhari
      October 11, 2019 at 11:08 pm

      What Python version are you using?

      • Reply
        Ved
        November 22, 2019 at 2:00 am

        What should be the input? Please tell.

        • Reply
          Aniruddha Chaudhari
          November 23, 2019 at 10:39 am

          The input is the text file ‘readText.txt’, from which you want to extract the emails. Keep it in the same directory from which you run this program. After the script runs, all the extracted emails are saved in the file ‘emailExtracted.txt’ in that directory; the script creates this file if it does not exist.

  • Reply
    Varun
    January 3, 2020 at 2:04 pm

    thank you so much. By the way, how can you write such complex codes like this? Please share your idea with me.

    • Reply
      Aniruddha Chaudhari
      January 4, 2020 at 3:03 pm

      Follow these steps for solving any complex programming questions.

      > break down the problem into small tasks.
      > write code for each small task
      > integrate these tasks

      For example: In the above program, I followed the below steps.

      > Breaking down the problem into multiple tasks like reading file data, writing a regular expression for email…
      > Write functions for each task
      > Integrate each task. Read each line from the file and extract emails from each line.

  • Reply
    Sai Leng Wan
    January 3, 2020 at 3:09 pm

    Hello! Thank you so much. One question please. How to extract phone number by doing this way.? please

    • Reply
      Aniruddha Chaudhari
      January 4, 2020 at 2:54 pm

      Hi Sai,

      Everything will be the same. Only, you have to write different regular expressions to extract phone numbers based on the rules (like in India, the phone number is 10 digits).

  • Reply
    Vladimir
    February 10, 2020 at 12:50 pm

    What about closing the file?
    Why are you using such a complex method for writing to the file?

    def writeFile(listData):
        with open(fileToWrite, 'w+') as f:    
            for item in listData:
                f.write(item+'\n')
    

    That would be more efficient and take less memory.

    • Reply
      Aniruddha Chaudhari
      February 13, 2020 at 6:43 pm

      Hi Vladimir,

      You are performing a write operation for each item in the list. A file write operation has more overhead associated with it. It’s always good to replace that many write operations with a single one.

  • Reply
    anand
    July 11, 2020 at 8:32 pm

    Hi,
    I was writing the code and got below error. Please help me out.

    >>> file=open(filetoread,'r')
    Traceback (most recent call last):
    File “”, line 1, in
    file=open(filetoread,'r')
    TypeError: 'tuple' object is not callable
    
    • Reply
      Aniruddha Chaudhari
      July 11, 2020 at 8:57 pm

      Hi Anand,

      If you are trying to run the above program, don’t try it in the Python interpreter console. Instead, save the above code in a .py file and run it.

      • Reply
        Madapati Sindhuri
        September 26, 2022 at 12:28 pm

        I have list of companies with company name, all these companies are newly registered. Now my task is to get email ids for all these companies. Please do suggest.

        • Reply
          Aniruddha Chaudhari
          October 4, 2022 at 3:35 pm

          This is not so straightforward. You have to find the official websites and their about-us or contact pages. These pages will mostly have a contact email. You have to scrape each page to read it and get the email ID.

          If you want to do it without programming, there is a Chrome plugin by hunter.io. Using it, you can find the email ID for any given company.

      • Reply
        Mayank Dev
        December 9, 2022 at 2:18 pm

        Is this your college assignment?

  • Reply
    Sami Ullah
    August 30, 2020 at 6:18 am

    What if I want to extract emails from all university personnel in a specific country?

    • Reply
      Aniruddha Chaudhari
      September 5, 2020 at 8:40 am

      In that case, you can modify the regular expression to match the desired email addresses from the university. I assume the email address from the university will have a specific domain (example: university-name.edu). Change the regular expression to “(.*)@” in validateEmail() function. That’s all.

  • Reply
    Supriya
    August 15, 2021 at 5:35 pm

    Thank you so much for this program. I need your help i.e. what if one wanted to extract specific animals image from the big dataset which contains multiple images of different animals? please help me with this.

    • Reply
      Aniruddha Chaudhari
      September 14, 2021 at 9:05 am

      One simple solution is to name the images accordingly, e.g. cat-01, cat-02, dog-01. From the name, you can identify the images. But this requires your access to the dataset. Otherwise, if you want to automate this, you can use the pillow Python package for reading images and any efficient open-source image recognition package to identify the animals in the given image.

  • Reply
    chang
    September 14, 2021 at 7:53 am

    I have 100’s of files with emails in them. I need to write a python script to hide the user part in the email in those files, do you have something for a similar approach.

    • Reply
      Aniruddha Chaudhari
      September 14, 2021 at 7:36 pm

      You can read each file separately and then you can perform required operations on the emails in that file.

  • Reply
    Amna
    September 12, 2022 at 9:26 pm

    Great Explanation. Can we do this email finding without using re? I am using the core Python language. Can you give a hint in this regard, Thanks a lot.

    • Reply
      Aniruddha Chaudhari
      September 21, 2022 at 10:34 pm

      Thanks, Amna!

      Then there is a lot of stuff to verify.

      For the string to be an email ID, it should contain the special characters ‘@’ and ‘.’ in that order, and it should contain exactly one ‘@’ character. You have to do this verification manually using Python string methods.

  • Reply
    hadz
    October 24, 2022 at 6:51 pm

    Hello, thanks for this code I have a question plz.

    What if I want to exclude some emails from the file? For example, I don’t want to print the “hotmail.com” emails.

    • Reply
      Aniruddha Chaudhari
      October 31, 2022 at 5:49 pm

      You can simply update the validateEmail() function in the program mentioned in this tutorial.

      Add the following line of code inside the validateEmail() function.

      if "hotmail.com" in strEmail:
          return False
      
      • Reply
        Aniruddha Chaudhari
        October 31, 2022 at 5:49 pm

        Let me know if it solves your problem.


© 2022 – CSEstack.org. All Rights Reserved.
