Creative Web Automation: Generate Ad-Free Study Notes
How to use an automated script to extract only the text content from online study notes
A repost of my daughter’s article with permission. I added the review part. This is also included in my “How to in Selenium WebDriver” series.
For my high-school English book studies, I occasionally referred to online study notes, such as SparkNotes. SparkNotes (and similar sites) provided the content for free but split it into many small sections interleaved with commercial ads. I could live with that; it just meant a lot of pages to click through. My father noticed and used automation scripts to create clean, combined versions containing only the text from these sites.
This article documents how I repeated his work, using raw Selenium in Ruby scripts.
Table of Contents:
· Target Website
∘ Main Page (for Macbeth)
∘ A Section Page
· Automation Design
· Implementation
∘ 1. Save note pages
∘ 2. Parse the HTML to extract the content
∘ 3. Filter out unneeded links
∘ 4. Combine all section files
∘ 5. Optimize and Print out
· Complete Script
∘ Script 1 — Save the notes (multiple) to separate HTML files
∘ Script 2 — Parse and generate the clean HTML version
· Review (by Zhimin)
∘ Tip 1: Flexible to use the best tool for the job
∘ Tip 2: Automation Script ≠…