Tor-ifying CasperJS

Over the past few months, one of my favorite tools has become CasperJS, which is a navigation and testing utility that runs on top of PhantomJS, a headless web browser. This is a great tool for doing web scraping, which you can use to automate the retrieval of data from webpages, among other things. Sometimes though, you want to test a target web page from a variety of different IP addresses, or find yourself behind a block of banned IP addresses, or just need to anonymize your activity. This is where Tor comes in handy.

Web scraping is a lot of fun, but make sure you are following the commonly accepted rules of web scraping:

  1. Make sure you’re following the target site’s Terms of Service. This means respecting robots.txt and any other restrictions there may be.
  2. Limit your requests. Scraping bots can navigate webpages much faster than normal humans, and you don’t want to accidentally DoS a site with an out-of-control scraper.
  3. Be nice to the server. If you don’t need images, modify your scraper so it doesn’t download images (PhantomJS has a --load-images=false flag for this). If you want to be really nice to the server, put your e-mail address in the scraper’s HTTP headers so the server admin can contact you if your scraper is giving them a problem.

(These rules were partially adapted from this list)
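As a rough sketch of rules 2 and 3, the snippet below builds an options object that could be passed to require('casper').create() and computes a pause between requests. The helper names, the contact address, and the delay values are all made up for illustration; loadImages and userAgent are PhantomJS page settings.

```javascript
// Hypothetical helper: builds an options object for require('casper').create().
// The contact address is a placeholder, not a real mailbox.
function buildPoliteOptions(contactEmail) {
  return {
    verbose: true,
    logLevel: "info",
    pageSettings: {
      loadImages: false, // rule 3: skip images we do not need
      userAgent: "my-scraper (contact: " + contactEmail + ")" // rule 3: let the admin reach us
    }
  };
}

// Rule 2: wait at least baseMs between requests, plus some random jitter.
// rand is injectable so the function can be checked deterministically.
function politeDelayMs(baseMs, jitterMs, rand) {
  var r = (rand || Math.random)();
  return baseMs + Math.floor(r * jitterMs);
}

var options = buildPoliteOptions("me@example.com");
console.log(options.pageSettings.userAgent);
console.log(politeDelayMs(2000, 1000, function () { return 0.5; })); // 2500
```

Inside a CasperJS script, the delay could be fed to casper.wait() between steps.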

Now, on to how to “Tor-ify” CasperJS.

Download Tor

Well, obvs. I normally use RHEL-based Linux distros, so I’m going to link to those instructions. Once torproject.repo is in your /etc/yum.d or /etc/yum.repos.d, you should be able to yum install tor with no problem. Then start the service with service tor start.

After you’ve confirmed that Tor can run on your machine, feel free to shut it down, as we’ll be coming back to that later.

Write a Script for Testing

How will you be able to tell that CasperJS is properly proxying through Tor? Why not write a script that scrapes whatismyip.com?

Taking a look at the source of whatismyip.com, it looks pretty simple to scrape. The IP address is contained in a div that has a handy id that we can pull data from. My workflow for writing scripts with CasperJS is to fire up the target webpage, write some code in Chrome Developer Tools, then copy that code back into my CasperJS script. In this case, some vanilla JavaScript is enough to grab the IP address.

var ip = document.getElementById('greenip').textContent.trim()

As a force of habit, I normally add .trim() on to the end of any text I pull out of a DOM node since there’s no point in keeping useless whitespace.
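To see what .trim() buys you, here is a tiny sketch that runs outside the browser. The string stands in for the text of a DOM node; the whitespace and the address are made up.

```javascript
// A stand-in for the textContent of a DOM node, with made-up padding.
var rawText = "\n    203.0.113.7  \n";

// trim() drops leading and trailing whitespace, including newlines.
var ip = rawText.trim();

console.log(rawText.length); // length including the padding
console.log(ip);             // 203.0.113.7
```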

Now, let’s start writing our CasperJS script.

var document = [];
var casper = require('casper').create({
  verbose: true,
  logLevel: "info"
});

I like to keep logging on for most of my Casper scripts, just because it’s helpful to see what’s going on and the output looks cool. Now we’ll define the first step of our scraping process:

casper.start("http://whatismyip.com/", function() {
});

This simply tells CasperJS to use PhantomJS to load up http://whatismyip.com.

An important note about CasperJS: even though everything is written in JavaScript, you can’t actually manipulate or read the page you’ve loaded into CasperJS without running the evaluate function. So we’ll add to our first step:

casper.start("http://whatismyip.com/", function() {
  this.evaluate(function () {
    console.log(document.getElementById("greenip").textContent.trim());
  });
});

Once you’re inside this.evaluate, you have access to the document object. Also, notice how we do output with console.log, instead of this.echo. This is because once inside the evaluate function, you no longer have access to the this that refers to the Casper object.
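The sandboxing can be imitated in plain Node.js, which may make it easier to picture. In this sketch, fakeEvaluate is a hypothetical stand-in for evaluate (it is not CasperJS code), and the fake document only implements the one call the example needs. The function is turned into source text and run in a separate context, so it sees the page’s globals but nothing from the calling script.

```javascript
var vm = require("vm");

// Hypothetical stand-in for evaluate(): serialize the function to source text
// and run it in a fresh context that only contains the "page" globals.
function fakeEvaluate(fn, pageGlobals) {
  return vm.runInNewContext("(" + fn.toString() + ")()", pageGlobals);
}

// A fake page with just enough of the DOM API for this example.
var pageGlobals = {
  document: {
    getElementById: function (id) {
      return id === "greenip" ? { textContent: "  203.0.113.7 \n" } : null;
    }
  }
};

var scraperName = "outer-scope-value"; // lives in the script, not in the page

// Inside the sandbox, document is available...
var ip = fakeEvaluate(function () {
  return document.getElementById("greenip").textContent.trim();
}, pageGlobals);

// ...but variables from the calling script are not.
var seesOuterScope = fakeEvaluate(function () {
  return typeof scraperName !== "undefined";
}, pageGlobals);

console.log(ip);             // 203.0.113.7
console.log(seesOuterScope); // false
```

In a real CasperJS script, evaluate also hands the function’s return value back to the caller, so returning the text and printing it with this.echo is an alternative to console.log.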

Ok, so tying everything together for this really complicated script:

var document = [];
var casper = require('casper').create({
  verbose: true,
  logLevel: "info"
});

// Step 1: load the page, then read the IP address from inside the page context.
casper.start("http://whatismyip.com/", function() {
  this.evaluate(function () {
    console.log(document.getElementById("greenip").textContent.trim());
  });
});

// Kick off the queued steps.
casper.run();

You always have to put casper.run() at the end of your scripts to kick off the process of running through the steps. Now, if you watch your terminal output, you should see your IP address in the [info] [remote] section of the output. Yay!

Now, start up Tor again and we’ll pass in some parameters and see if the IP address changes.

Proxy PhantomJS

Tor is a SOCKS proxy, not an HTTP proxy. (Most of my previous exposure to Tor was through the Tor Browser Bundle, so this was interesting to me). But PhantomJS can also run through a SOCKS proxy, so no worries. Add these parameters when you start the script:

--proxy=127.0.0.1:9050 --proxy-type=socks5

(If your Tor proxy isn’t running on 127.0.0.1:9050, you should change that parameter. You’ll see where the Tor proxy is running when you run service tor start.)

So, running my final Tor-ified CasperJS setup looks a little like this:

casperjs --proxy=127.0.0.1:9050 --proxy-type=socks5 whatismyip.js
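If you launch the scraper from another script, the same command line can be assembled programmatically. This is only a sketch: torCasperArgs is a hypothetical helper, not part of CasperJS, and the defaults assume Tor is listening on its usual SOCKS port.

```javascript
// Hypothetical helper: assembles the arguments from the command line shown above.
function torCasperArgs(script, host, port) {
  host = host || "127.0.0.1";
  port = port || 9050; // Tor's default SOCKS port
  return ["--proxy=" + host + ":" + port, "--proxy-type=socks5", script];
}

var args = torCasperArgs("whatismyip.js");
console.log(["casperjs"].concat(args).join(" "));
// casperjs --proxy=127.0.0.1:9050 --proxy-type=socks5 whatismyip.js
```

The resulting array could be handed to something like Node’s child_process.spawn("casperjs", args).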

Note: while using Tor, requests can take significantly longer to go through. I’ve seen some requests with this script take as long as 114 seconds to resolve, but such is the nature of Tor.

Won’t you do the world a favor and add a relay to the Tor network?

Creating Following and Followers Tabs With Flags in Drupal 7

I’ve been using Drupal to rapidly prototype an MVP (and I have many, many thoughts on Drupal, but I won’t get into them here). One of the features was to have a list of followers (who follow a user) and a list of the users that a certain user follows, which is a very typical setup. I’m writing this post because I had one hell of a time putting this thing together in Drupal 7, and I’m hoping it will save someone else from going through the pain I went through.

First, let’s rubber duck and clearly define our requirements for the two lists.

Following: A list of users that a specific user has flagged ‘follow’. In other words, a list of users flagged by another user.

Followers: A list of the users that have flagged a specific user to ‘follow’. In other words, a list of users that have flagged another user.

I’m using the terminology flag here since we’ll be using the flag module to put together the following/followers functionality. The two modules we’ll need are:

First, you’re going to need to create a new flag for ‘follow’. Setting up a new flag is fairly trivial, and I called my flag ‘follow_user’ and set it to be a global flag.

Now, create a new view for Followers. You’re going to need to set the following options:

And that’s how you do the followers tab. For the following tab, create another view with basically the same parameters (the path will be different, obvs), but the major difference is with the Contextual Filter:

For reference sake, here is what the views look like in my admin:

Followers

Followers Settings

Following

Following Settings

Hope this helps someone!

An Overview of the Upcoming Facebook Privacy Changes

I can haz a vote?

Facebook is making changes to its Data Use Policy and Terms of Service (Statement of Rights and Responsibilities). Per the current policies, these changes have to be put up to a vote by Facebook users. Of course, the vote is only valid if 30% of the user base participates in the voting (that’s about 300 million people). At the time of this writing, only about 16 thousand users have voted and a majority of those (90%) have voted against the new policies.

In addition to minor changes clarifying data collection, privacy settings, and affiliates, the major change that comes with the new policies is the abolition of voting itself. If the new policies are passed, then there will be no more voting on new policies. Instead, there will be a seven day comment period before they are put in place.

Given the virality of the fake Facebook copyright notice, I’m surprised that the voter turnout is so low. On the other hand, Facebook has not done much in terms of publicising this effort. I suspect they want to do away with voting altogether, as it has never had a meaningful impact on the site’s policies and operations.

In lieu of a document comparing the two policies (other than this pdf from Facebook), I’ve created a little voter’s guide.

Here are the upcoming Facebook policy changes:

Data Use Policy

Signing up

Old Policy: Required to provide name, email, birthday, and gender.

New Policy: Have to provide information such as that in the old policy, but may use a telephone number.

Information received by Facebook

Old Policy: Does not mention Facebook’s affiliates.

New Policy: Language modified to include Facebook’s affiliates (see note).

Note on Affiliates: Facebook defines affiliates to be businesses that are legally part of the same group Facebook is a part of. I take this to mean Facebook Ireland, which is a company set up to take advantage of Ireland’s tax laws, and Facebook Hyderabad.


Messaging (@facebook.com email address)

Old Policy: Contained details on how to control who messages you.

New Policy: Now anyone in a conversation can message you.

How Facebook uses your information

New Policy: Insertion of a clause – “in addition to helping people see and find things that you do and share” – prefacing examples of how Facebook uses your information.


Timeline

New Policy: A reminder that even though you may hide a post, people may see it elsewhere, like on someone else’s timeline or search results.


Finding you on Facebook

Old Policy: Only friends will be able to find you via e-mail address or phone number, depending on your privacy settings.

New Policy: People will be able to find you through a post to a public page or if you are tagged in a friend’s post or photo.

Personalized Ads

New Policy: Clarifies how personalized ads work.

Sponsored Stories

New Policy: Subscribers, in addition to friends, will see sponsored stories.


Data Retention

New Policy: Facebook may retain information from suspended accounts for up to a year to detect repeat offenders.


Invitations

Old Policy: Facebook will send up to two reminders to friends you invite.

New Policy: Facebook will send a few reminders to friends you invite.

Affiliates

New Policy: Facebook may share information about affiliates (see note above).


Opportunity to Comment and Vote

New Policy: Voting is removed, as is the 7000 comment threshold for triggering a vote. Now users have seven days to comment before a change goes into effect.

Statement of Rights and Responsibilities

Your Facebook Timeline

Old Policy: You will not use a personal Facebook account primarily for commercial gain.

New Policy: You will not use a personal Facebook account primarily for commercial gain, and will create a Facebook Page to do so.

Promotions from Pages

New Policy: If you run a promotion on your timeline from your page, you agree to the Pages Terms of Service.


Amendments to the Policy

New Policy: Voting removed as well as the 7000 comment threshold (similar to the changes to the Data Use Policy). Seven day comment rule also applies here.

You can vote on these policies here.

I Haz Make Octopress Plugin

This week I’m deep in the process of converting a variety of PHP/MySQL backed sites (mostly Joomla and Symfony 1.4) over to Octopress, mostly because I don’t want to deal with the overhead of running MySQL (In the past, I’ve had to upgrade my Slicehost VPS in order to keep MySQL from hanging). Some of the pages had a little Facebook likebox included with them to collect likes. My search for an Octopress aside that would create a likebox was fruitless, so I made one.

How it works

Basically there is a plugin (likebox.rb) and an aside (likebox.html). To use the plugin/aside you move the files into their respective locations, then modify the default layout to load the likebox.rb plugin if Facebook likebox configuration is detected in _config.yml. The only reason there is a plugin is to load up the Facebook JavaScript SDK code so the markup in likebox.html works correctly.

I’m wondering if there is a more elegant way to load up the JavaScript SDK rather than just adding an if statement that will spit out static markup if true.

Check it out on github: octopress-facebook-likebox

How to Add Video to Octopress, the non-Hacky Way

Well, now that you’ve seen the hacky way to embed videos in Octopress, let me share with you the non-hacky way: the octopress-responsive-video-embed plugin.

It’s a simple plugin that generates some interesting markup for embedding videos (you essentially embed the raw source into an iframe and hope the server on the other end will serve up a player with the html5 or flash stream).

The size of the player/viewport is controlled by CSS that defines a responsive, intrinsic ratio for the video.
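The arithmetic behind an intrinsic ratio is short enough to sketch. The usual form of the trick (an assumption here, I have not checked that the plugin does exactly this) gives the wrapper a padding-bottom equal to height divided by width, as a percentage. Percentage padding is resolved against the width of the containing block, so the box keeps its shape as the page resizes. ratioPadding is a hypothetical helper name.

```javascript
// Computes the padding-bottom value for a wrapper with the given aspect ratio.
function ratioPadding(width, height) {
  return (height / width * 100) + "%";
}

console.log(ratioPadding(16, 9)); // 56.25%
console.log(ratioPadding(4, 3));  // 75%
```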

Pretty interesting stuff.

Deconstructed Star Trek - TNG Season 5 Ep 20 Cost of Living

The Cost of Living on Memory Alpha

Deanna’s mother takes Worf’s son to a skitzoid paradise and explains to him that he is only a body filled with organs. Meanwhile, the space parasites that were eating the ship are ejected into an asteroid belt, and the Enterprise becomes a body without organs.

Deconstructed Star Trek - TNG Season 6 Ep 2 the Chase

The Chase on Memory Alpha

Picard betrays his surrogate father, then discovers the meaning of life, but the search overextends the propulsion system and the ship has to be repaired for two days.

N900 Circles of Hell

So I recently bought a Nokia N900 from ebay. It arrived in a nice little package and while the seller promised the phone would be new, it was actually a refurb.

The phone came pre-loaded with a name and some contacts entirely in Chinese. I tried calling the numbers with my SIM card, but I only received an automated recording saying the number you have dialed is no longer available and then some code with the letters NY in it… which was interesting in its own right as I was calling from New York State.

I set out to perform a factory reset on my N900. Doing a factory reset, which to me entails wiping the contacts, calendars, cookies, and any other personal information from the device and resetting it to its fresh-out-of-the-factory state, seems like it would be a simple operation. However, as I would soon come to learn, nothing is simple with the Nokia N900.

The Nokia N900 is more computer than phone. It has a 600 MHz processor and runs Maemo, a Debian-based Linux. iPhones and Android phones have a built-in software factory reset, and older Nokia phones can simply use a four-digit numerical code to reset. Since the N900 is anything but simple, doing a factory reset is at the opposite end of the spectrum of ease from the above methods.

The Maemo wiki details a two-step process for performing a factory reset (i.e. flashing the device). There are two types of storage on the N900: the eMMC storage, which holds user-specific settings (/home/user, for example), and the rootfs storage, which holds the root file system. According to the wiki, it’s very important to flash the eMMC storage first, then take care of the rootfs. Luckily, the commands for flashing the two are virtually identical.

First, I needed to download the flasher program from the Maemo development site. The Linux-based package I downloaded was a nicely compiled binary that found my USB ports and ran fine. YMMV.

Second, I needed to download both the eMMC image and the rootfs image from here. Note: the device ID number is also printed on a sticker on the box; it’s not necessary to pull out the battery (yet). Renaming the eMMC and rootfs .bin files to something less verbose makes life easier.

Once all the *.bin files were placed in my flasher-3.5 folder, doing the flashing was simple.

Don’t forget, eMMC first, then rootfs

./flasher-3.5 -F eMMC.bin -f

A message saying Suitable USB device not found, waiting indicates that the N900 needs to be turned off, then plugged into a USB port while the ‘u’ key is held down. I guess this boots the device into a special USB flashing mode, and I still can’t figure out if this key is easier to hold down with my finger or the stylus.

Flashing the eMMC takes a few minutes, then to be sure everything worked, I pulled out the battery (per the wiki instructions (!)). Once again, leaps and bounds from the iPhone and Android in terms of simplicity.

After the eMMC is flashed, the rootfs needs to be flashed. It’s the same command, only the -R flag is tagged on to reboot the device.

./flasher-3.5 -F rootfs.bin -f -R

Ok, yay! Now the N900 is back to factory settings and essentially a new device. All that I had to do was:

  1. Download a binary
  2. Boot the device in a special ‘flashing’ mode
  3. Flash one part of the disc
  4. Pull out the battery
  5. Flash the other part of the disc!

Wasn’t that simple?

How to Create a Custom Article Type in Octopress, the Hacky Way

Inspired by this Stack Overflow question (which I think is a totally legitimate question and should not have been closed), here is how I did something similar.

The problem

I needed a way to easily embed Vimeo videos in posts. I wanted to use the Vimeo video id to reference the video, and I wanted the top of my markdown files to look like this:

source/_posts/my-video.markdown
layout: post
title: "What a cool video "
date: 2012-09-02 21:35
comments: false
vimeo: 11122132
sharing: false
categories:

The video id (in the vimeo param) is used in Vimeo’s generic embed code. Now came time to dig through the Octopress directory structure. It’s pretty complex, and tree is quite helpful for visualizing this. In the source directory, you have layouts and includes. layouts is mostly for page layouts made up of different components in the includes folder.

Once inside includes, I found it easiest to simply modify article.html. Best practices probably dictate going back and tweaking something with the layout, but I don’t know enough about Octopress and Ruby and was having tremendous difficulties with variable scoping (and also escaping Ruby control structures in a codeblock).

source/_includes/article.html
---snip---
    {% if post.vimeo %}
    <div class="entry-video">
      <iframe
        src="http://player.vimeo.com/video/{{ post.vimeo }}?byline=0&portrait=0&color=ffffff"
        width="800"
        height="600"
        frameborder="0"
        webkitAllowFullScreen
        mozallowfullscreen
        allowFullScreen>
      </iframe>
    </div>
    {% endif %}
---snip---
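To make the substitution concrete, here is a small sketch of the URL the iframe should end up with for the front matter shown earlier. The video id goes between /video/ and the query string. vimeoEmbedSrc is a hypothetical helper for illustration only; it is not part of Octopress or the include.

```javascript
// Builds the iframe src for a given Vimeo video id, using the same
// query string as the include.
function vimeoEmbedSrc(videoId) {
  return "http://player.vimeo.com/video/" + encodeURIComponent(String(videoId)) +
    "?byline=0&portrait=0&color=ffffff";
}

console.log(vimeoEmbedSrc(11122132));
// http://player.vimeo.com/video/11122132?byline=0&portrait=0&color=ffffff
```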

And that’s a quick and easy hacky way to customize articles. I’m sure there’s a better way to do this, but this method is working for my needs right now.

Lazy Bash Auto Install

Sometimes you have a dotfile repository and need an auto installer.

I always like to execute bash with bash -x for maximum debugging goodness.

My dotfiles repo consists of both regular files and directories. I’m going to loop through both and run a subroutine on them, for ultimate complexity.

Artificially constructing a for loop here is easy, and it’s going to be simpler to just manually keep track of files. So let’s say you have profile, screenrc, and vimrc.


Don’t forget you have to declare your subroutines/functions before they are executed

And now the sub_file can just unlink each file (note it still has access to the $f variable).


I believe taking advantage of the built-in $HOME and $PWD is good practice.

Ok, cool, you’re done. For directories, use the -d flag to check if the directory exists or not.
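For comparison, here is the same idea sketched in Node.js rather than bash. This is not the original script: the dot prefix on the target names and the use of symlinks are assumptions, and installDotfiles is a made-up name. It loops over a hand-kept list of files, unlinks whatever is already in the home directory, and links the repo copy in its place. It handles regular files only. The demo runs in throwaway temp directories, so nothing in the real $HOME is touched.

```javascript
var fs = require("fs");
var path = require("path");
var os = require("os");

// Sketch only: links repoDir/<name> to homeDir/.<name>, replacing any old entry.
function installDotfiles(names, repoDir, homeDir) {
  names.forEach(function (name) {
    var target = path.join(homeDir, "." + name);
    try {
      fs.unlinkSync(target); // remove an existing file or link first
    } catch (err) {
      if (err.code !== "ENOENT") throw err; // nothing there yet is fine
    }
    fs.symlinkSync(path.join(repoDir, name), target);
  });
}

// Demo setup: a fake repo and a fake home directory under the temp dir.
var names = ["profile", "screenrc", "vimrc"];
var repoDir = fs.mkdtempSync(path.join(os.tmpdir(), "dotfiles-repo-"));
var homeDir = fs.mkdtempSync(path.join(os.tmpdir(), "dotfiles-home-"));
names.forEach(function (name) {
  fs.writeFileSync(path.join(repoDir, name), "# " + name + "\n");
});

installDotfiles(names, repoDir, homeDir);
installDotfiles(names, repoDir, homeDir); // running it twice is safe

console.log(fs.readdirSync(homeDir).sort().join(" ")); // .profile .screenrc .vimrc
```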