Sixsided
Interactive Dev in Portland, OR

SVG continues its march towards ascendancy as a web graphic format. There are tools for generating SVG at runtime -- d3.js, svg.js, and raphael.js, to name a few -- but what if you just want a procedurally generated graphic for use as a background image? It makes more sense to generate it once than to have each client regenerate it on every page load.

I found a few people doing this online, with approaches falling into three categories:

  • node + jsdom + an SVG library like Raphaël (example)
  • a server-side tool like SVuGy to build an SVG file from Ruby, etc.
  • PhantomJS + one of the popular SVG libraries

I'm really enjoying client-side Web technology lately, so I decided to take the third path: Load an SVG-generating page into a headless browser, grab the SVG node out of the DOM, and save it to a file. For the browser I used the excellent PhantomJS. My script was this:

#!/Users/sixsided/bin/phantomjs

var page = require('webpage').create();
var system = require('system');
var args = system.args;

var svg_url = args[1];
var svg_jquery_selector = args[2] || 'svg';  // default to reading first <svg> in page

// echo the arguments to stderr so stdout carries nothing but the SVG
system.stderr.writeLine([svg_url, svg_jquery_selector].join(', '));

page.onConsoleMessage = function (msg) {
    // relay anything the page logs to its console to stderr
    system.stderr.writeLine(msg);
};

page.open(svg_url, function (status) {
    if (status !== 'success') {
        system.stderr.writeLine('could not load ' + svg_url);
        phantom.exit(1);
        return;
    }

    // debugging: show what the top-level scope looks like from out here
    system.stderr.writeLine('this: ' +  this +  '  window: ' +  window);

    // the function below runs inside the page, which must have jQuery loaded;
    // only the selector string goes in and only the markup string comes out
    system.stdout.write(page.evaluate(function (selector) {
        // SVG is a different XML dialect from HTML, so it has no innerHTML;
        // wrap it in a div in order to get at its content
        var d = document.createElement('div');
        d.appendChild(jQuery(selector)[0].cloneNode(true /* deep copy */));
        return d.innerHTML;
    }, svg_jquery_selector));
    phantom.exit();
});
Usage:
    scrape_svg.js http://d3js.org/ svg
PhantomJS isn't especially complicated, but it is different from straight-up browser scripting. The page's js environment is sandboxed away from the outer script environment (which looks a lot like nodejs, with its require() calls, but is PhantomJS's own). The two sides talk via callbacks that can only accept (and return) simple JavaScript objects (i.e., only what you could pass through JSON). There's a DOMWindow object available in the top-level scope of the outer script, on which you can call setTimeout, etc.
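
To get a feel for what "only what you could pass through JSON" means in practice, here is a rough sketch that runs anywhere JavaScript does. It approximates the sandbox boundary with a plain JSON round-trip; crossBoundary is a made-up helper for illustration, not a PhantomJS function, and PhantomJS's real serialization may differ in the details.

```javascript
// Hypothetical stand-in for the page.evaluate() boundary: whatever goes
// through here is flattened to JSON text and rebuilt on the other side.
function crossBoundary(value) {
    var text = JSON.stringify(value);
    // JSON.stringify gives back undefined for a bare function or undefined
    return text === undefined ? undefined : JSON.parse(text);
}

var sent = {
    selector: 'svg',                        // plain string
    size: [640, 480],                       // plain array of numbers
    when: new Date(0),                      // not a simple object
    callback: function () { return 'hi'; }  // functions can't cross
};

var received = crossBoundary(sent);

console.log(received.selector);        // the string survives
console.log(received.size);            // so does the array
console.log(typeof received.when);     // the Date arrives as a string
console.log('callback' in received);   // the function is gone
```

This is why the script above hands page.evaluate a selector string and gets a string of markup back, rather than trying to return the SVG node itself.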