User blog comment:Kagimizu/New Hope for an Old Cause?/@comment-1272757-20120301183555/@comment-4748628-20120301191457

It involves a lot of Unix Bash and Perl scripting, which is too much to explain here without confusing everyone. I've got some scripts that have already proven they can pull both the text and the images from the pages.
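
To give a rough idea of what "getting the text and the images from a page" means, here is a minimal sketch. It is written in Python rather than Bash or Perl, it is not the actual scripts, and the names `PageExtractor` and `extract` are made up for illustration. It assumes the page is already available as an HTML string and only collects the visible text and the `src` of each `<img>` tag.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Hypothetical helper: collects visible text and image URLs from one page."""

    SKIP = {"script", "style"}  # contents of these tags are not visible text

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.image_urls = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "img":
            # attrs is a list of (name, value) pairs
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is outside <script>/<style>
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract(html):
    """Return (text, image_urls) for an HTML string."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.image_urls
```

Calling `extract(page_html)` gives back the page text as one string plus a list of image addresses. Downloading the pages and saving the images are separate steps that this sketch leaves out.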