Tracking AT&T Data Plan Use With PhantomJS / Part 1

When my family moved to the cheaper AT&T Shared Data Plan, we saved hundreds of dollars every year. But now we had to manage our data use more carefully because it was shared by four people. Most months we don’t have a problem, but I would like to know that we are over budget early enough to tell everyone to conserve. No problem, I thought. I will use my Raspberry Pi to scrape the AT&T web pages and send me an alert when the use is too high. I already use my RPi to gather information like the outside temperature and stock quotes, and this shouldn’t be any different. I was wrong. The AT&T site makes heavy use of JavaScript and jQuery, and I was not able to use the simple methods I had used in the past. That is where PhantomJS comes into the picture.

PhantomJS is a full WebKit browser that can be scripted with JavaScript. So it renders all the jQuery goodness AT&T puts on the page AND I can create a simple automation script to scrape the data. In Part 1, I will cover the automation framework I created. There is a full testing framework for PhantomJS called CasperJS, but I thought that was overkill for my project. I just needed to step through a few pages and grab some content. In Part 2, I will cover scraping the AT&T site and some of the unique steps I had to take to script the site.

Step 0 – Get PhantomJS

If you need PhantomJS, see this post on how to compile it on the Raspberry Pi.

The Basic Loop

The framework is basically a list of steps that get executed one after another until the list runs out. I created an array of functions that are called in order, one per tick, by the JavaScript setInterval() method. The core of the script looks like this:

var page = new WebPage(), testindex = 0;
var steps = [
  function() {
    //Load a Page
    page.open("http://www.google.com");
  }
];
// loop through each function call in the steps array
var interval = setInterval(function() {
  steps[testindex]();
  // Save screenshot for debugging purposes
  page.render("step" + (testindex + 1) + ".png");
  testindex++;
}, 3000);

This snippet uses two PhantomJS methods. The first is page.open(). This does what you would expect: it browses to a URL and opens the webpage. The page object can then be manipulated as needed. The second method is page.render(). This will create an image of the rendered page, which can be very useful when debugging.
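To look at the loop logic on its own, here is a minimal sketch that runs in any JavaScript engine, not just PhantomJS. The names runNextStep and state are made up for this illustration, and the steps only record a string instead of driving a page.

```javascript
// One pass of the loop: run the current step, then advance the index.
// `runNextStep` and `state` are illustrative names, not part of PhantomJS.
function runNextStep(state, steps) {
  if (typeof steps[state.index] !== "function") {
    return false; // nothing left to run
  }
  steps[state.index]();
  state.index++;
  return true;
}

var log = [];
var steps = [
  function() { log.push("open page"); },
  function() { log.push("grab content"); }
];

var state = { index: 0 };
while (runNextStep(state, steps)) {
  // in the real script, setInterval() supplies the delay between passes
}
console.log(log.join(", ")); // open page, grab content
```

Keeping each step as a plain function in an array means a new step can be added without touching the loop.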

Making sure the script doesn’t run too quickly

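The setInterval() method above is set to execute a new step every 3 seconds. But sometimes webpages take longer than that to render. This is especially true on the Pi. To make sure the page is fully loaded before the next step runs, I use the page.onLoadStarted and page.onLoadFinished callbacks to set and clear a loadInProgress flag. The main loop skips its turn while the flag is set.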

page.onLoadStarted = function() {
  loadInProgress = true;
};

page.onLoadFinished = function() {
  loadInProgress = false;
};
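Here is a small sketch of how that flag holds the loop back. It uses a stand-in object (fakePage, a made-up name) instead of the real PhantomJS page so it can run anywhere, and it fires the two callbacks by hand to simulate a slow page.

```javascript
// Stand-in for the PhantomJS page object so the sketch runs anywhere.
// Only the two load callbacks are modelled; fakePage is a made-up name.
var fakePage = { onLoadStarted: null, onLoadFinished: null };
var loadInProgress = false;

fakePage.onLoadStarted = function() { loadInProgress = true; };
fakePage.onLoadFinished = function() { loadInProgress = false; };

var testindex = 0;
var ran = [];
var steps = [
  // pretend page.open() was called and the page began loading
  function() { ran.push("step 1"); fakePage.onLoadStarted(); },
  function() { ran.push("step 2"); }
];

// One timer tick: do nothing while a page is still loading.
function tick() {
  if (!loadInProgress && typeof steps[testindex] == "function") {
    steps[testindex]();
    testindex++;
  }
}

tick();                             // runs step 1, which starts a load
tick();                             // skipped: the load is still in progress
var countWhileLoading = ran.length; // still 1
fakePage.onLoadFinished();          // the page finishes loading
tick();                             // now step 2 runs
console.log(ran.join(", "));        // step 1, step 2
```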

Other Items

Passing in parameters

I also added in a little code to support passing in parameters and an error message if the parameters were missing:

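// system.args[0] is the script name, so the first real argument is args[1]
var system = require('system'), someValue;
if (system.args.length < 2) {
   console.log('Usage: framework.js some-value');
   console.log('Example: framework.js xyzzy');
   phantom.exit(1);
} else {
   someValue = system.args[1];
}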
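The same check can be pulled into a small function that is easy to try outside PhantomJS. This is only a sketch: readArgs is a hypothetical helper, and the array passed in mimics system.args, where the first entry is the script name.

```javascript
// args mimics PhantomJS's system.args, where args[0] is the script name.
// readArgs is a hypothetical helper, not a PhantomJS function.
function readArgs(args) {
  if (args.length < 2) {
    return { ok: false, message: "Usage: " + args[0] + " some-value" };
  }
  return { ok: true, someValue: args[1] };
}

console.log(readArgs(["framework.js"]).message);            // Usage: framework.js some-value
console.log(readArgs(["framework.js", "xyzzy"]).someValue); // xyzzy
```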

Printing Page Console Messages

When PhantomJS renders a page, it does not dump the page’s console messages to the system console. The messages go to the PhantomJS internal console. But it does generate an event for each message, which can be used to display them. I added the following code:

page.onConsoleMessage = function(msg) {
    console.log(msg);
};
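Some pages are chatty, so the complete script further down only prints messages that start with "###". A minimal sketch of that filter, with an illustrative function name (isTagged) and made-up sample messages:

```javascript
// Decide whether a page console message should reach the terminal.
// The "###" marker matches the check in the full script;
// isTagged and the sample messages are illustrative only.
function isTagged(msg) {
  return String(msg).substr(0, 3) == "###";
}

var messages = ["noise from the page", "### found it", "more noise"];
var printed = messages.filter(isTagged);
console.log(printed.join("\n")); // ### found it
```

With this in place, any console.log() call made inside the page only shows up in the terminal if its text begins with the marker.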

The Escape Clause

I found that not all web pages load correctly every time. When a page didn’t load correctly, PhantomJS would just get stuck on that step and never continue. So I created a check that kills the whole thing after running for 15 minutes. At the top of the code I figure out when the script should escape and then I check for that time at the bottom of the loop.

Start of script...
// Allow the routine to run for 15 minutes then die
var d = new Date();
var t = d.getTime() + 1000*60*15;

More code until the bottom of the loop ...

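// exit if the routine runs too long or there are no steps left
// (read the clock fresh here; d was set once at startup and never changes)
 if ((new Date().getTime() > t) || (typeof steps[testindex] != "function")) {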
   //console.log("test complete!");
   phantom.exit();
 }
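A timeout check like this only works if the current time is read again on every pass. Here is a small sketch of the idea with a clock that can be swapped out for testing. The makeDeadline helper and its now parameter are illustrative, not part of PhantomJS.

```javascript
// Build a checker that reports whether `limitMs` has elapsed.
// `now` is injectable so the sketch can be exercised without waiting;
// makeDeadline is an illustrative helper, not a PhantomJS API.
function makeDeadline(limitMs, now) {
  now = now || function() { return new Date().getTime(); };
  var expires = now() + limitMs;
  return function() {
    return now() > expires; // read the clock again on every call
  };
}

// Fake clock for demonstration: starts at 0 and is advanced by hand.
var fakeTime = 0;
var expired = makeDeadline(1000 * 60 * 15, function() { return fakeTime; });

console.log(expired()); // false
fakeTime = 1000 * 60 * 16;
console.log(expired()); // true
```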

Using jQuery to test for content on the page

The AT&T site uses jQuery, so I was able to make use of that for the AT&T script. But if you are scraping a site that doesn’t include jQuery by default, it can be included using page.injectJs() in the main loop. I created a loadJQ flag to indicate when jQuery needs to be loaded. One note: the jQuery JavaScript file needs to be stored on your local disk in the same directory as the framework script.
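The flag logic is easier to see in isolation. This sketch swaps the real page for a stand-in (fakePage, a made-up name) that just records what would have been injected. The file name is the one used in the full script, and injectIfNeeded is an illustrative helper.

```javascript
// Stand-in page object that records injectJs() calls so the sketch runs
// outside PhantomJS. fakePage and injectIfNeeded are illustrative names.
var injected = [];
var fakePage = {
  injectJs: function(file) { injected.push(file); return true; }
};

var loadJQ = false;

// In the real script this flag is set inside page.onLoadStarted.
function onLoadStarted() { loadJQ = true; }

// Called at the top of each loop pass, before the step runs.
function injectIfNeeded(page) {
  if (loadJQ) {
    page.injectJs("jquery-1.6.1.min.js");
    loadJQ = false; // inject only once per page load
  }
}

onLoadStarted();
injectIfNeeded(fakePage);
injectIfNeeded(fakePage);     // no new load, so nothing is injected
console.log(injected.length); // 1
```

Resetting the flag after each injection matters because every new page load starts with a fresh page context, so jQuery has to be injected again after the next load and not before.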

Putting it all together

OK, so maybe the framework script is a little complicated when explained in chunks. Here is the whole script all together:


var page = new WebPage(), system = require('system'), testindex = 0, loadInProgress = false, loadJQ;

// Allow the routine to run for 15 minutes then die
var d = new Date();
var t = d.getTime() + 1000*60*15;

if (system.args.length < 2) {
   console.log('Usage: framework.js some-value');
   console.log('Example: framework.js xyzzy');
   phantom.exit(1);
} else {
   someValue = system.args[1];
}

page.onConsoleMessage = function(msg) {
  if (msg.substr(0,3) == "###") {
    console.log(msg);
  }
};

page.onLoadStarted = function() {
  loadInProgress = true;
  // uncomment this line if you want to use jquery and the target page does not include it
  //loadJQ = true;
  //console.log("load started");
};

page.onLoadFinished = function() {
  loadInProgress = false;
  //console.log("load finished");
};

var steps = [
  function() {
    //Load a Page
    page.open("http://www.google.com");
  },
  function() {
    //echo the value
    console.log('You passed in:' + someValue);
  }
];

// loop through each function call in the steps array
var interval = setInterval(function() {
  if (!loadInProgress && typeof steps[testindex] == "function") {
    console.log("step " + (testindex + 1));
    // Inject jQuery for scraping (you need to save jquery-1.6.1.min.js in the same folder as this file)
    if (loadJQ) {
        page.injectJs("jquery-1.6.1.min.js");
        loadJQ = false;
        //console.log("injected JQ");
    }
    // Save screenshot for debugging purposes
    page.render("step" + (testindex + 1) + ".png");
    steps[testindex]();
    testindex++;
  }
  // exit if the routine runs too long or there are no steps left
  // (read the clock fresh here; d was set once at startup and never changes)
  if ((new Date().getTime() > t) || (typeof steps[testindex] != "function")) {
    //console.log("test complete!");
    phantom.exit();
  }
}, 3000);

Some things to point out: This example has two steps. The first displays the Google homepage and the second prints the command line argument. Also, the first debug image is always black because the first step has not run yet. I just ignore that image to keep the script simple. Last, most of the debugging messages are commented out but still in the code. Add them back in to get a better understanding of what is happening.

When you run framework.js, you should see the following:

./phantomjs framework.js 12345
step 1
step 2
You passed in:12345

That’s it!

That is all for now. In Part 2 I will use the framework to log into the AT&T site and grab my data plan usage numbers.