An Unsuccessful Attempt to Use OCR to Pass Text-Based Captchas in Selenium Automated Tests
Not recommended, but a fun exercise in using OCR

A repost of my daughter’s article with permission. I added it here for the convenience of my blog subscribers.
This article is included in my “How to in Selenium WebDriver” series.
First of all, Captchas are designed to stop automation. Ideally, Captchas would be disabled for automated testing, but this is not always possible (it may be out of your control, or ruled out for human reasons).
Most Captchas these days are more advanced: they use images and very heavily distorted text. Even so, some sites still use more basic text captchas.
Today, I had the idea of using OCR (Optical Character Recognition) in automated test scripts to read these basic text-based captchas. The result first: accuracy was very low, so the approach is of no practical use. Still, I think it is a good exercise.
Tesseract OCR library
Tesseract is a popular, free and open-source OCR library, and it runs on multiple platforms. Use the package manager to install it, for example, Homebrew for macOS.
brew install tesseract
The usage is tesseract <image_file> <output_file>, for example:
tesseract captcha1.png output
The output on screen looks like this:
Estimating resolution as 272
A new file, output.txt, is also created. In this case, its content is:
4g34
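In a test script, the same two steps (run the command, then read output.txt) could be wrapped in a small helper. The following is a minimal Python sketch, assuming the tesseract binary is on the PATH. The function names build_command, read_result and ocr_image are made up for illustration and are not part of Tesseract.

```python
import subprocess
from pathlib import Path


def build_command(image_path, output_base):
    """Return the argument list for: tesseract <image_file> <output_file>."""
    return ["tesseract", str(image_path), str(output_base)]


def read_result(output_base):
    """Read <output_base>.txt and strip surrounding whitespace."""
    text = Path(f"{output_base}.txt").read_text(encoding="utf-8")
    # strip() also drops the trailing newline and form feed that
    # Tesseract may append to its text output.
    return text.strip()


def ocr_image(image_path, output_base="output"):
    """Hypothetical helper: run Tesseract on an image and return the text."""
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(
        build_command(image_path, output_base),
        check=True,
        capture_output=True,
    )
    return read_result(output_base)
```

Calling ocr_image("captcha1.png") would then return whatever Tesseract wrote to output.txt, which a test script could type into the captcha field.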
Advanced Tesseract
We can improve the recognition accuracy by tweaking some of Tesseract’s settings, for example by telling it that the image contains a single word, a single character, or vertically aligned text.
Here I will use the page segmentation mode option (--psm). For instance, if I know the Captcha is always one word (as in our 3g34 example), I can specify this with the relevant page segmentation mode (8: "Treat the image as a single word."). Our new command looks like this:
tesseract captcha1.png output --psm 8
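The same option can be passed from a test script. Below is a hedged Python sketch, assuming Tesseract 4 or later (where --psm accepts values 0 to 13) and the tesseract binary on the PATH. The constants and the helpers psm_command and read_captcha are illustrative names, not part of Tesseract. The sketch uses the special output name stdout so that the text is printed rather than written to a file.

```python
import subprocess

# Page segmentation modes matching the cases mentioned in the text
# (the full list is printed by `tesseract --help-psm`).
PSM_VERTICAL_BLOCK = 5  # single uniform block of vertically aligned text
PSM_SINGLE_WORD = 8     # treat the image as a single word
PSM_SINGLE_CHAR = 10    # treat the image as a single character


def psm_command(image_path, psm=PSM_SINGLE_WORD):
    """Build a tesseract command that prints its result to stdout."""
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    # Using "stdout" as the output name makes tesseract print the text
    # instead of creating a .txt file.
    return ["tesseract", str(image_path), "stdout", "--psm", str(psm)]


def read_captcha(image_path, psm=PSM_SINGLE_WORD):
    """Hypothetical helper: OCR a captcha image, removing all whitespace."""
    result = subprocess.run(
        psm_command(image_path, psm),
        check=True,
        capture_output=True,
        text=True,
    )
    return "".join(result.stdout.split())
```

Trying a different mode is then a one-argument change, for example read_captcha("captcha1.png", psm=PSM_SINGLE_CHAR) for a single-character image.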