Creative Web Automation: Generate Ad-Free Study Notes

How to use an automated script to extract only the text content from online study notes

Zhimin Zhan

This is a repost of my daughter’s article, with permission; I added the Review section. It is also included in my “How to in Selenium WebDriver” series.

For my high-school English book studies, I occasionally referred to online study notes such as SparkNotes. SparkNotes (and similar sites) provided the content for free but split it into many small sections interspersed with commercial ads. I could live with that; it just meant a lot of pages to click through. My father noticed this and used automation scripts to create clean, combined versions containing only the text from these sites.

This article documents how I repeated his work, using raw Selenium in Ruby scripts.

Table of Contents:
· Target Website
    Main Page (for Macbeth)
    A Section Page:
· Automation Design
· Implementation
    1. Save note pages
    2. Parse the HTML to extract the content
    3. Filter out unneeded links.
    4. Combine all section files.
    5. Optimize and Print out
· Complete Script
    Script 1 — Save the notes (multiple) to separate HTML files
    Script 2 — Parse and generate the clean HTML version
· Review (by Zhimin)
    Tip 1: Flexible to use the best tool for the job
    Tip 2: Automation Script ≠
