CSEstack

How to Solve Captcha Using Python?

Aniruddha Chaudhari

You have probably encountered those annoying messages on registration or feedback pages that read, “Enter the letters you see on the image,” or “Select the images with a…” These are known as captchas, and they act as gates that let humans in and keep bots out.

CAPTCHA stands for “Completely Automated Public Turing Test to Tell Computers and Humans Apart”.

Simply put, they are intended to differentiate between humans and automated users, such as bots. The text is created so that a human can read it without difficulty, whereas a machine cannot.

In practice, however, this rarely holds for long, because almost every simple text captcha deployed on a site is cracked within a few months.

What are CAPTCHAs used for?

As we have mentioned, sites use CAPTCHAs to restrict bots. But why shouldn’t bots be allowed to access these sites? Here are some more specific uses.

  • CAPTCHAs are used to prevent online poll skewing by ensuring that every vote is entered by a human. They also protect poll accuracy by discouraging multiple voting, since each vote takes longer to cast.
  • Sites also use captchas to prevent bots from accessing registration pages and creating fake accounts. This reduces the waste of the site’s resources and lowers the chance of fraud.
  • Ticketing sites use CAPTCHAs to stop scalpers from making false registrations for free events and buying multiple tickets for resale.
  • Most systems expect their contact forms, reviews, and message boards to be filled in by humans. CAPTCHAs prevent false registrations, and hence false comments and online harassment.

How Do CAPTCHAs Hinder Web Scraping?

Most websites have automatic captchas, which are triggered when the site detects unusual activity that resembles bot behavior. This includes sending a large number of requests within a fraction of a second and clicking on links at a far higher rate than a human would.
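To make the idea concrete, here is a minimal sketch of how a site might flag a burst of requests. The `RateMonitor` class, its limit of five requests per second, and the timestamps are all assumptions made for illustration; real sites use their own, usually more elaborate, detection rules.

```python
from collections import deque


class RateMonitor:
    """Flags a client that sends more than max_requests within window seconds."""

    def __init__(self, max_requests=5, window=1.0):
        self.max_requests = max_requests
        self.window = window
        self.timestamps = deque()

    def looks_like_bot(self, now):
        # Record this request, then drop timestamps that fell out of the window.
        self.timestamps.append(now)
        while self.timestamps and now - self.timestamps[0] > self.window:
            self.timestamps.popleft()
        return len(self.timestamps) > self.max_requests


monitor = RateMonitor(max_requests=5, window=1.0)
# Ten requests sent 50 ms apart: the sixth request exceeds the limit.
burst = [monitor.looks_like_bot(i * 0.05) for i in range(10)]
print(burst)
```

A scraper that spaces its requests further apart than the window would never cross such a threshold, which is why request pacing matters as much as solving the captcha itself.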

Captchas can be a major impediment during web scraping, as most scraping operations are carried out by automated bots. However, this should not worry you.

There are several ways to overcome captchas when scraping the web. One way is to use Python, either by writing your own code from scratch or by adapting existing code. To avoid the extra work, you can also opt for an automatic site unblocker to help you dodge captchas.

Decoding Image Captchas Using Python

The most common captcha is the image code captcha, which contains distorted letters that a computer program cannot easily detect but a human can still read. When web scraping, you can extract the letters from the image using Python. Here’s how.

After accessing the captcha in a useful format, you can employ the help of Optical Character Recognition, which comes in handy for extracting text from images.
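How you access the captcha depends on the page. As a rough sketch, assume the form embeds the image as a base64 data URI inside an `<img>` tag (this is an assumption; many sites serve the image from a separate URL instead). The helper names `extract_captcha_bytes` and `get_captcha` below are hypothetical, and `get_captcha` needs the Pillow library.

```python
import base64
import io
import re


def extract_captcha_bytes(html):
    """Return the raw bytes of the first base64-embedded <img> in the page."""
    match = re.search(r'<img[^>]+src="data:image/[^;]+;base64,([^"]+)"', html)
    if match is None:
        raise ValueError('no embedded captcha image found')
    return base64.b64decode(match.group(1))


def get_captcha(html):
    """Load the embedded captcha as a PIL Image (requires Pillow)."""
    # Imported here so the decoding step above works without Pillow installed.
    from PIL import Image
    return Image.open(io.BytesIO(extract_captcha_bytes(html)))


# Stand-in page with fake image bytes, just to show the extraction step.
sample = '<form><img src="data:image/png;base64,{}"></form>'.format(
    base64.b64encode(b'fake-image-bytes').decode('ascii'))
print(extract_captcha_bytes(sample))
```

With a real page, `get_captcha(html)` would return an image object that can be saved, converted, and passed to the OCR step.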

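You can use Tesseract, an open-source optical character recognition engine, to recognize and “read” the text embedded in the image. The pytesseract package is a Python wrapper around it and can be installed using the pip command. Note that the Tesseract engine itself has to be installed separately, and the Pillow library is needed for the image handling.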

pip install pytesseract

The first step is to extend the original Python script that loaded the captcha. The extended script converts the captcha to black and white before reading it, as follows.

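import pytesseract

# get_captcha() and html come from the earlier script that loaded the form;
# get_captcha() is expected to return a PIL Image of the captcha.
img = get_captcha(html)
img.save('captcha_original.png')

# Convert to grayscale
gray = img.convert('L')
gray.save('captcha_gray.png')

# Threshold: keep only pure black pixels and turn everything else white.
# Raise the cutoff (1) if the captcha text is not pure black.
bw = gray.point(lambda x: 0 if x < 1 else 255, '1')
bw.save('captcha_thresholded.png')

# The image is now clean enough to be passed to Tesseract
print(pytesseract.image_to_string(bw))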

When run, the result of the final call is the text that Tesseract reads from the captcha of the form you are trying to access.
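OCR output often carries stray whitespace or punctuation, so it usually helps to clean it before submitting it. The sketch below assumes the captcha contains only lowercase letters, which will not be true for every site; `clean_ocr_text` is a hypothetical helper, and the `allowed` set should be adjusted to match the captcha you are dealing with.

```python
import string


def clean_ocr_text(raw, allowed=string.ascii_lowercase):
    """Keep only the characters the captcha is assumed to contain."""
    return ''.join(ch for ch in raw.lower() if ch in allowed)


print(clean_ocr_text(' Str-ange\n'))
```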

If you are new to web scraping, read frequently asked questions on web scraping.

Extra Tips for Bypassing Captchas

Rotate Proxies

As we mentioned earlier, sending frequent requests and clicking on links continuously are considered bot behavior and can make websites use captchas to block access. To avoid this, rotate proxies so that consecutive requests to the website come from different IP addresses. Clean residential IP proxies help avoid triggering captchas while you scrape, as your own IP address is not shown.
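One possible way to rotate proxies, sketched with the standard library only. The proxy addresses are placeholders and the `next_opener` and `fetch` helpers are hypothetical names; you would substitute proxies you actually have access to.

```python
import itertools
import urllib.request

# Placeholder proxy addresses (hypothetical); replace with proxies you can use.
PROXIES = [
    'http://proxy1.example.com:8080',
    'http://proxy2.example.com:8080',
    'http://proxy3.example.com:8080',
]
_proxy_pool = itertools.cycle(PROXIES)


def next_opener():
    """Return (proxy, opener); the opener routes traffic through the next proxy."""
    proxy = next(_proxy_pool)
    handler = urllib.request.ProxyHandler({'http': proxy, 'https': proxy})
    return proxy, urllib.request.build_opener(handler)


def fetch(url, timeout=10):
    """Fetch url through the next proxy in the pool."""
    _, opener = next_opener()
    with opener.open(url, timeout=timeout) as response:
        return response.read()


# Show the rotation order without making any network request.
used = [next_opener()[0] for _ in range(4)]
print(used)
```

Each call to `fetch` picks the next proxy in round-robin order, so after the last proxy the pool wraps around to the first one.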

Rotate User Agents

Changing your user agent once will not be enough to prevent websites from restricting access when you send many requests at the same time. You will have to rotate user agents so that the target website sees the requests as coming from different devices.
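A minimal sketch of user agent rotation, again using only the standard library. The user agent strings are illustrative examples and `build_request` is a hypothetical helper; in practice you would keep a larger, up-to-date list.

```python
import random
import urllib.request

# Example user agent strings (illustrative); keep your own list current.
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 '
    '(KHTML, like Gecko) Version/17.0 Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0',
]


def build_request(url):
    """Return a Request carrying a randomly chosen User-Agent header."""
    headers = {'User-Agent': random.choice(USER_AGENTS)}
    return urllib.request.Request(url, headers=headers)


req = build_request('http://example.com/')
# urllib stores header names capitalized, hence 'User-agent' here.
print(req.get_header('User-agent'))
```

The request object can then be passed to `urllib.request.urlopen` or to an opener that routes through a proxy, so both techniques can be combined.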

This is all about how to solve captchas using Python. If you still fail to solve the captcha with your code, let’s discuss it in the comments.

Aniruddha Chaudhari
I am a complete Python Nut and love Linux and vim as an editor. I hold a Master of Computer Science from NIT Trichy. I dabble in C/C++ and Java too. I keep sharing my coding knowledge and my own experience on the CSEstack.org portal.


© 2022 – CSEstack.org. All Rights Reserved.
