
Craigslist’s Devious Strategy to Prevent Scraping

The bell character is not your friend

Thomas Smith
2 min read · Dec 24, 2019
Photo by Luís Perdigão on Unsplash

In the early 2010s, screen scraping was all the rage. Everyone wanted to scrape data from websites, and I did a ton of scraping projects.

Lots of sites set up elaborate systems to prevent scraping, like populating data on the page with JavaScript onload handlers, since Mechanize and other scraping tools didn’t execute JS.

I remember that Craigslist, though, had the most devious strategy. If you tried to scrape their site with the wrong user-agent, they didn’t just send back an empty response.

Instead, they sent back megabytes of a single character repeated over and over: the bell character (ASCII code 7)!

This meant that if you printed the resulting data to the console for debugging on Linux, it would ring your motherboard speaker over and over again, thousands of times. On many systems, each beep was a blocking call, so you would end up with a frozen computer making a constant “BONG BONG BONG BONG” sound from its internal speaker, which couldn’t be muted or switched off. You couldn’t even press Ctrl+C.

The only solution was to restart the whole thing, or wait out hours of dinging.

It was a devious strategy, and definitely discouraged scraping the site. You could find ways around it, but mess up once and print to the terminal, and you got…DING DING DING.
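One way around the "mess up once" problem is to never print a raw response at all. The following is only a minimal sketch in Python, not anything Craigslist-specific: the helper names `looks_like_bell_flood` and `safe_preview`, the 50% threshold, and the 200-byte limit are all assumptions made up for illustration. It checks whether a payload is mostly bell characters and escapes control characters before anything reaches the terminal.

```python
BEL = b"\x07"  # the ASCII bell character (code 7)


def looks_like_bell_flood(payload: bytes, threshold: float = 0.5) -> bool:
    """Return True if more than `threshold` of the payload is bell characters.

    The 0.5 default is an arbitrary, illustrative cutoff.
    """
    if not payload:
        return False
    return payload.count(BEL) / len(payload) > threshold


def safe_preview(payload: bytes, limit: int = 200) -> str:
    """Return a terminal-safe preview of the first `limit` bytes.

    Printable characters pass through unchanged; everything else
    (including the bell) is shown as a visible escape such as \\x07.
    """
    text = payload[:limit].decode("utf-8", errors="replace")
    return "".join(
        ch if ch.isprintable() else ch.encode("unicode_escape").decode("ascii")
        for ch in text
    )


if __name__ == "__main__":
    response = BEL * 5 + b"<html>"
    if looks_like_bell_flood(response):
        print("Response is mostly bell characters; not printing it raw.")
    print(safe_preview(response))  # prints \x07\x07\x07\x07\x07<html>
```

Truncating before decoding keeps the preview cheap even when the response is megabytes long, and escaping rather than deleting the control characters still shows you what the server actually sent.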



