<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Stuart A. Thompson</title>
	<atom:link href="http://www.stuartathompson.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.stuartathompson.com</link>
	<description>A multimedia journalist practicing digital journalism in Toronto</description>
	<lastBuildDate>Sat, 12 May 2012 02:20:04 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>We are the class of 2012</title>
		<link>http://www.stuartathompson.com/2012/05/we-are-the-class-of-2012/</link>
		<comments>http://www.stuartathompson.com/2012/05/we-are-the-class-of-2012/#comments</comments>
		<pubDate>Sat, 12 May 2012 02:20:04 +0000</pubDate>
		<dc:creator>Stuart</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[The Globe and Mail]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://www.stuartathompson.com/?p=23030</guid>
		<description><![CDATA[This folio package was a two-page spread in the paper focusing on poor prospects for recent grads. It wouldn&#8217;t translate well online: not only would the text appear in a linear way, but you&#8217;d lose the scanability from the blurbs. I built this interactive using gRaphael&#8230; for some reason. Really, it would have been easier to create static images since the interactivity was low. But it was good practice and an exercise in turning around an interactive very quickly. See the full version.]]></description>
			<content:encoded><![CDATA[
<!-- wp-jquery-lightbox, a WordPress plugin by ulfben --> 
<p><a href="http://www.theglobeandmail.com/news/national/debt-ridden/article2430353/"><img class="aligncenter size-full wp-image-23031" title="Screen Shot 2012-05-11 at 10.19.13 PM" src="http://www.stuartathompson.com/wp-content/uploads/2012/05/Screen-Shot-2012-05-11-at-10.19.13-PM.png" alt="" width="611" height="702" /></a></p>
<p>This folio package was a two-page spread in the paper focusing on poor prospects for recent grads. It wouldn&#8217;t translate well online: not only would the text appear in a linear way, but you&#8217;d lose the scanability from the blurbs.</p>
<p>I built this interactive using gRaphael&#8230; for some reason. Really, it would have been easier to create static images since the interactivity was low. But it was good practice and an exercise in turning around an interactive very quickly.</p>
<p><a href="http://www.theglobeandmail.com/news/national/debt-ridden/article2430353/">See the full version.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.stuartathompson.com/2012/05/we-are-the-class-of-2012/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>One of my interactives nominated for a Data Journalism Award</title>
		<link>http://www.stuartathompson.com/2012/04/one-of-my-interactives-nominated-for-a-data-journalism-award/</link>
		<comments>http://www.stuartathompson.com/2012/04/one-of-my-interactives-nominated-for-a-data-journalism-award/#comments</comments>
		<pubDate>Sat, 28 Apr 2012 12:07:15 +0000</pubDate>
		<dc:creator>Stuart</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[The Globe and Mail]]></category>

		<guid isPermaLink="false">http://www.stuartathompson.com/?p=23028</guid>
		<description><![CDATA[I&#8217;m honoured to have one of my interactives nominated for the first annual Data Journalism Awards. The international competition is supported by Google and run by the Global Editors Network. It&#8217;s also the first of its kind in the world. The Globe submitted a handful of pieces and the Sunshine List interactive was nominated for international/national data application. The sortable, searchable table took a couple weeks of scraping, refining and developing to get in working order. I also found a way to search the previous year&#8217;s records to show any change in income or benefits, adding an interesting layer to the data. There are some fantastic entries from The Guardian, The Australian, the BBC, The Toronto Star and more. See all the nominees.]]></description>
			<content:encoded><![CDATA[
<p>I&#8217;m honoured to have one of my interactives nominated for the first annual <a href="http://datajournalismawards.org/">Data Journalism Awards</a>. The international competition is supported by Google and run by the Global Editors Network. It&#8217;s also the first of its kind in the world.</p>
<p>The Globe submitted a handful of pieces and the <a title="Searching the sunshine list" href="http://www.stuartathompson.com/2012/03/searching-the-sunshine-list/">Sunshine List interactive</a> was nominated for international/national data application. The sortable, searchable table took a couple weeks of scraping, refining and developing to get in working order. I also found a way to search the previous year&#8217;s records to show any change in income or benefits, adding an interesting layer to the data.</p>
<p>There are some fantastic entries from The Guardian, The Australian, the BBC, The Toronto Star and more. <a href="http://datajournalismawards.org/nominees/">See all the nominees</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.stuartathompson.com/2012/04/one-of-my-interactives-nominated-for-a-data-journalism-award/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Explore the data behind Vancouver&#8217;s high school system</title>
		<link>http://www.stuartathompson.com/2012/04/explore-the-data-behind-vancouvers-high-school-system/</link>
		<comments>http://www.stuartathompson.com/2012/04/explore-the-data-behind-vancouvers-high-school-system/#comments</comments>
		<pubDate>Sun, 22 Apr 2012 00:47:22 +0000</pubDate>
		<dc:creator>Stuart</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[The Globe and Mail]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://www.stuartathompson.com/?p=23025</guid>
		<description><![CDATA[This interactive lets you explore mucho data behind Vancouver&#8217;s high schools, from capacity to graduation rates. It was created using Raphael, a JavaScript library that creates vector images, so it&#8217;s great for maps. I also used gRaphael to create and animate some pie charts and bar charts. My favourite part, though, came from mapping the location of all the students. It took a lot of work — resizing the dots, scaling and positioning them to match the native data. But the final result is pretty cool. See the full interactive.]]></description>
			<content:encoded><![CDATA[
<p><a href="http://www.theglobeandmail.com/news/national/british-columbia/interactive-explore-the-data-behind-vancouvers-high-schools/article2409271/"><img class="aligncenter size-full wp-image-23026" title="Screen Shot 2012-04-21 at 8.34.42 PM" src="http://www.stuartathompson.com/wp-content/uploads/2012/04/Screen-Shot-2012-04-21-at-8.34.42-PM.png" alt="" width="605" height="381" /></a></p>
<p>This interactive lets you explore mucho data behind Vancouver&#8217;s high schools, from capacity to graduation rates.</p>
<p>It was created using Raphael, a JavaScript library that creates vector images, so it&#8217;s great for maps. I also used gRaphael to create and animate some pie charts and bar charts.</p>
<p>My favourite part, though, came from mapping the location of all the students. It took a lot of work — resizing the dots, scaling and positioning them to match the native data. But the final result is pretty cool.</p>
<p><a href="http://www.theglobeandmail.com/news/national/british-columbia/interactive-explore-the-data-behind-vancouvers-high-schools/article2409271/">See the full interactive.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.stuartathompson.com/2012/04/explore-the-data-behind-vancouvers-high-school-system/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>My new pet project: Ad Hoc Data</title>
		<link>http://www.stuartathompson.com/2012/04/my-new-pet-project-ad-hoc-data/</link>
		<comments>http://www.stuartathompson.com/2012/04/my-new-pet-project-ad-hoc-data/#comments</comments>
		<pubDate>Sun, 22 Apr 2012 00:41:41 +0000</pubDate>
		<dc:creator>Stuart</dc:creator>
				<category><![CDATA[Data]]></category>

		<guid isPermaLink="false">http://www.stuartathompson.com/?p=23022</guid>
		<description><![CDATA[I&#8217;ve started a new collective, kind of like the Open House Arts Collective of yore, but this time about data journalism. It&#8217;s called Ad Hoc Data. It&#8217;s a chance to work with like-minded nerds interested in exploring the tech and story-telling that comes from data. Our first project, a federal budget calculator, took quite a bit of time and offered some interesting results. Try it yourself. I&#8217;m excited to see where it goes next. It&#8217;s a chance for me to experiment and create with more freedom than usually afforded to me. And a chance to meet other like-minded folks interested in creating, creating and creating.]]></description>
			<content:encoded><![CDATA[
<p><a href="http://www.adhocdata.ca"><img class="aligncenter size-large wp-image-23023" title="Screen Shot 2012-04-21 at 8.35.25 PM" src="http://www.stuartathompson.com/wp-content/uploads/2012/04/Screen-Shot-2012-04-21-at-8.35.25-PM-640x271.png" alt="" width="640" height="271" /></a></p>
<p>I&#8217;ve started a new collective, kind of like the Open House Arts Collective of yore, but this time about data journalism. It&#8217;s called <a href="http://www.adhocdata.ca">Ad Hoc Data</a>. It&#8217;s a chance to work with like-minded nerds interested in exploring the tech and story-telling that comes from data.</p>
<p>Our first project, a <a href="http://www.adhocdata.ca/federal">federal budget calculator</a>, took quite a bit of time and offered some interesting results. <a href="http://www.adhocdata.ca/federal">Try it yourself</a>.</p>
<p>I&#8217;m excited to see where it goes next. It&#8217;s a chance for me to experiment and create with more freedom than usually afforded to me. And a chance to meet other like-minded folks interested in creating, creating and creating.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.stuartathompson.com/2012/04/my-new-pet-project-ad-hoc-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Searching the sunshine list</title>
		<link>http://www.stuartathompson.com/2012/03/searching-the-sunshine-list/</link>
		<comments>http://www.stuartathompson.com/2012/03/searching-the-sunshine-list/#comments</comments>
		<pubDate>Sat, 24 Mar 2012 21:20:35 +0000</pubDate>
		<dc:creator>Stuart</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[The Globe and Mail]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://www.stuartathompson.com/?p=23014</guid>
		<description><![CDATA[The Ontario government released their annual Sunshine List on March 24, detailing public sector employees earning more than $100,000 per year. I created a table so readers can explore the list in more detail, letting you search, sort and filter by name, salary and more. This list is published each year on the Government’s website in a way that’s hard to search, impossible to sort and difficult to navigate. The Globe wanted to pull the data from this year’s list and publish it in a more usable way, as a tool for our reporters and our readers. Here’s a little background on how I made the tool. I started by building a scraper, a program that trolls web pages for content and saves it in a more sophisticated way than copy-paste. Using a coding language called Python, I built a universal scraper that could pull all the data back [...]]]></description>
			<content:encoded><![CDATA[
<p><a href="http://www.theglobeandmail.com/news/national/who-cracks-six-figures-among-ontarios-best-paid-civil-servants/article2375973"><img class="aligncenter size-large wp-image-23015" title="sunshine" src="http://www.stuartathompson.com/wp-content/uploads/2012/03/sunshine-640x360.png" alt="" width="640" height="360" /></a></p>
<p>The Ontario government released their annual Sunshine List on March 24, detailing public sector employees earning more than $100,000 per year. I created a table so readers can explore the list in more detail, letting you search, sort and filter by name, salary and more.</p>
<p>This list is published each year on the Government’s website in a way that’s hard to search, impossible to sort and difficult to navigate. The Globe wanted to pull the data from this year’s list and publish it in a more usable way, as a tool for our reporters and our readers.</p>
<p>Here’s a little background on how I made the tool.</p>
<p>I started by building a scraper, a program that trolls web pages for content and saves it in a more sophisticated way than copy-paste. Using a coding language called Python, I built a universal scraper that could pull all the data back to 1997 – the first year it was released.</p>
<p>I cleaned this data using Google Refine, converting encoded HTML characters and renaming some categories. The next challenge was cutting the data down as much as possible. While there were only a few thousand records back in the 1990s, other years had as many as 79,000 records, making the file sizes very large. While Chrome and Firefox could handle it well, Internet Explorer chugged slowly with each new megabyte I pushed its way.</p>
<p>So I divided the master file into several chunks by category and opted to not use a JSON array, since the keys added unnecessary kilobytes to the file. A standard JavaScript array was used instead.</p>
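<p>To see why the keys matter, here&#8217;s a rough Python sketch comparing the same rows saved as keyed objects and as plain arrays. The names, employers and salaries are made up for illustration, and the real field list may differ.</p>
<pre class="Plum_Code_Box"><code class="python">import json

# Hypothetical sample rows; the real Sunshine List fields may differ.
fields = ['name', 'employer', 'position', 'salary']
rows = [
    ['Jane Doe', 'City of Toronto', 'Engineer', 101250.0],
    ['John Roe', 'Hydro One', 'Analyst', 118400.5],
]

# Keyed objects repeat every field name in every record.
keyed = json.dumps([dict(zip(fields, row)) for row in rows], separators=(',', ':'))

# Plain arrays store the field names once, then only the values.
compact = json.dumps({'fields': fields, 'rows': rows}, separators=(',', ':'))

# Number of characters saved by dropping the repeated keys.
saved = len(keyed) - len(compact)
print(len(keyed), len(compact), saved)
</code></pre>
<p>The gap widens with every row, since the keyed version repeats each field name once per record.</p>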
<p>Usually, when dealing with a big dataset, programmers use a server-side language like PHP to query the data. But we were wary of doing this because the technical and administrative overhead seemed insurmountable within our timeframe. So we explored browser-side options and settled on SlickGrid, an open-source JavaScript plug-in that handles massive amounts of data very well. The plug-in had to be customized to handle some extra functionality: currency sorting, historical comparisons and a universal search box.</p>
<p>Since I had data from the 2010 release, I added a feature to let readers compare increases or decreases. In an earlier version of the table, I also included the employer name and position with this pop-up. But I had to cut it late in development because it nearly doubled the size of the 2010 dataset.</p>
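<p>A comparison like this can be sketched in a few lines of Python. This is only an illustration: the records are invented, and matching people on name plus employer is my assumption here, not necessarily how the published tool paired them.</p>
<pre class="Plum_Code_Box"><code class="python"># Hypothetical records as (name, employer, salary); matching on name plus
# employer is an assumption made for this sketch.
previous = [
    ('Jane Doe', 'City of Toronto', 101250.0),
    ('John Roe', 'Hydro One', 118400.0),
]
current = [
    ('Jane Doe', 'City of Toronto', 104000.0),
    ('Sam Poe', 'Hydro One', 100500.0),
]

# Index last year's salaries by (name, employer) for quick lookups.
last_year = {(name, employer): salary for name, employer, salary in previous}

changes = []
for name, employer, salary in current:
    old = last_year.get((name, employer))
    # None marks someone who was not on the previous list.
    change = None if old is None else salary - old
    changes.append((name, change))

print(changes)
</code></pre>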
<p>The final tool is very simple to use and, admittedly, not very flashy. But it lets readers dig a little deeper into the list, find notable people or search specific jobs.</p>
<p><a href="http://www.theglobeandmail.com/news/national/who-cracks-six-figures-among-ontarios-best-paid-civil-servants/article2375973">View the full interactive.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.stuartathompson.com/2012/03/searching-the-sunshine-list/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The most congested intersection in Toronto</title>
		<link>http://www.stuartathompson.com/2012/03/the-most-congested-intersection-in-toronto/</link>
		<comments>http://www.stuartathompson.com/2012/03/the-most-congested-intersection-in-toronto/#comments</comments>
		<pubDate>Sat, 10 Mar 2012 22:51:02 +0000</pubDate>
		<dc:creator>Stuart</dc:creator>
				<category><![CDATA[The Globe and Mail]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://www.stuartathompson.com/?p=23009</guid>
		<description><![CDATA[This map takes data from the Toronto Traffic Safety Unit and plots over 2,000 points measuring traffic volume. I&#8217;ve also added a unique view showing just the top 100, where the radius of each circle corresponds to the number of cars passing through that point. Finally, we used qualitative data to plot the ten most congested points in the city. Each uses a different technique: the first uses standard Fusion Tables plotting; the second uses a custom Fusion Tables query and several calculations to plot circle objects onto the map; the third makes another query and uses the latitude and longitude as coordinates to plot custom icons. Overall this took about a week and accompanies a great story on traffic congestion.]]></description>
			<content:encoded><![CDATA[
<p>This map takes data from the Toronto Traffic Safety Unit and plots over 2,000 points measuring traffic volume. I&#8217;ve also added a unique view showing just the top 100, where the radius of each circle corresponds to the number of cars passing through that point. Finally, we used qualitative data to plot the ten most congested points in the city.</p>
<p><a href="http://www.stuartathompson.com/wp-content/uploads/2012/03/Screen-Shot-2012-03-10-at-5.45.53-PM.png" rel="lightbox[23009]"><img class="aligncenter size-large wp-image-23010" title="Screen Shot 2012-03-10 at 5.45.53 PM" src="http://www.stuartathompson.com/wp-content/uploads/2012/03/Screen-Shot-2012-03-10-at-5.45.53-PM-640x507.png" alt="" width="640" height="507" /></a></p>
<p>Each uses a different technique: the first uses standard Fusion Tables plotting; the second uses a custom Fusion Tables query and several calculations to plot circle objects onto the map; the third makes another query and uses the latitude and longitude as coordinates to plot custom icons. Overall this took about a week and accompanies a great story on traffic congestion.</p>
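<p>For the circle view, the calculation boils down to turning a traffic count into a radius. Here&#8217;s a minimal Python sketch of one way to do that, using a simple linear scale. The counts, the scaling rule and the radius limits are all assumptions for illustration, not the values from the published map.</p>
<pre class="Plum_Code_Box"><code class="python"># Hypothetical traffic counts; the scaling rule and radius limits below are
# assumptions, not the values used in the published map.
volumes = {'A': 52000, 'B': 31000, 'C': 10000}

MIN_RADIUS = 50.0   # metres, for the quietest point
MAX_RADIUS = 400.0  # metres, for the busiest point

low = min(volumes.values())
high = max(volumes.values())
span = (high - low) or 1  # avoid dividing by zero when all counts match

# Scale each count linearly between the smallest and largest radius.
radii = {
    name: MIN_RADIUS + (count - low) / span * (MAX_RADIUS - MIN_RADIUS)
    for name, count in volumes.items()
}
print(radii)
</code></pre>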
]]></content:encoded>
			<wfw:commentRss>http://www.stuartathompson.com/2012/03/the-most-congested-intersection-in-toronto/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Perils threatening the Northern Gateway pipeline</title>
		<link>http://www.stuartathompson.com/2012/03/perils-threatening-the-northern-gateway-pipeline/</link>
		<comments>http://www.stuartathompson.com/2012/03/perils-threatening-the-northern-gateway-pipeline/#comments</comments>
		<pubDate>Thu, 01 Mar 2012 00:20:22 +0000</pubDate>
		<dc:creator>Stuart</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[The Globe and Mail]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://www.stuartathompson.com/?p=22981</guid>
		<description><![CDATA[My latest map plots the various hazards that threaten the Northern Gateway pipeline — from underground earthquakes to landslides to protected animal populations. The pipeline ends at Kitimat, B.C., a small coastal port on the West Coast. From there, oil&#8217;s transferred onto tankers and tugged through the maze of the Douglas Channel. There are three exit options but only one approved anchor point. It&#8217;s an interesting look at the tremendous difficulty in getting oil out of Canada. View the full interactive]]></description>
			<content:encoded><![CDATA[
<p>My <a href="http://www.theglobeandmail.com/news/national/perils-threatening-the-northern-gateway-pipeline/article2353245/">latest map</a> plots the various hazards that threaten the Northern Gateway pipeline — from underground earthquakes to landslides to protected animal populations. The pipeline ends at Kitimat, B.C., a small coastal port on the West Coast. From there, oil&#8217;s transferred onto tankers and tugged through the maze of the Douglas Channel. There are three exit options but only one approved anchor point. It&#8217;s an interesting look at the tremendous difficulty in getting oil <em>out</em> of Canada.</p>
<p><a href="http://www.theglobeandmail.com/news/national/perils-threatening-the-northern-gateway-pipeline/article2353245/"><img class="aligncenter size-large wp-image-22982" title="Screen Shot 2012-02-29 at 7.00.08 PM" src="http://www.stuartathompson.com/wp-content/uploads/2012/03/Screen-Shot-2012-02-29-at-7.00.08-PM-640x336.png" alt="" width="640" height="336" /></a></p>
<p><a href="http://www.theglobeandmail.com/news/national/perils-threatening-the-northern-gateway-pipeline/article2353245/">View the full interactive</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.stuartathompson.com/2012/03/perils-threatening-the-northern-gateway-pipeline/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Random Drummond</title>
		<link>http://www.stuartathompson.com/2012/02/random-drummond/</link>
		<comments>http://www.stuartathompson.com/2012/02/random-drummond/#comments</comments>
		<pubDate>Sat, 25 Feb 2012 21:32:11 +0000</pubDate>
		<dc:creator>Stuart</dc:creator>
				<category><![CDATA[Data]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://www.stuartathompson.com/?p=22976</guid>
		<description><![CDATA[After scraping the content for all the Drummond reports, I wanted to show them in a simple way. Inspired by What The F#*! Should I Make For Dinner, I created this simple site that shows a random recommendation with an option to tweet the ones you like. The site works by pulling in all the recommendations in a JSON file, then creating a random number and pulling that record. I also added a query option, so if you add &#8220;?37&#8221; to the end of the URL, it&#8217;ll pull the 37th recommendation. This is helpful for tweeting and recalling an interesting recommendation. Visit the site.]]></description>
			<content:encoded><![CDATA[
<p>After <a title="Drummond Report recommendations via scrape" href="http://www.stuartathompson.com/2012/02/drummond-report-recommendations-via-scrape/">scraping the content for all the Drummond reports</a>, I wanted to show them in a simple way. Inspired by <a href="http://whatthefuckshouldimakefordinner.com/">What The F#*! Should I Make For Dinner</a>, I created <a href="http://www.stuartathompson.com/drummond/">this simple site</a> that shows a random recommendation with an option to tweet the ones you like.</p>
<p><a href="http://www.stuartathompson.com/drummond/"><img class="aligncenter size-large wp-image-22977" title="Screen Shot 2012-02-25 at 4.27.37 PM" src="http://www.stuartathompson.com/wp-content/uploads/2012/02/Screen-Shot-2012-02-25-at-4.27.37-PM-640x425.png" alt="" width="640" height="425" /></a></p>
<p>The site works by pulling in all the recommendations in a JSON file, then creating a random number and pulling that record. I also added a query option, so if you add &#8220;?37&#8221; to the end of the URL, it&#8217;ll pull the 37th recommendation. This is helpful for tweeting and recalling an interesting recommendation.</p>
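<p>The site itself runs in the browser, but the logic is easy to sketch in Python. The list of recommendations and the pick() helper below are hypothetical stand-ins, written only to show the idea of &#8220;use the number in the query if there is one, otherwise choose at random.&#8221;</p>
<pre class="Plum_Code_Box"><code class="python">import random

# Hypothetical stand-in for the recommendations loaded from the JSON file.
recommendations = ['Recommendation %d' % n for n in range(1, 101)]


def pick(records, query=''):
    """Return the record named by a query like '?37', else a random one."""
    digits = query.lstrip('?')
    if digits.isdecimal():
        index = int(digits) - 1  # '?37' means the 37th record
        if index in range(len(records)):
            return records[index]
    # No usable query: fall back to a random record.
    return random.choice(records)


print(pick(recommendations, '?37'))
</code></pre>
<p>A query that is missing or out of range simply falls through to the random pick.</p>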
<p><a href="http://www.stuartathompson.com/drummond/">Visit the site.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.stuartathompson.com/2012/02/random-drummond/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Drummond Report recommendations via scrape</title>
		<link>http://www.stuartathompson.com/2012/02/drummond-report-recommendations-via-scrape/</link>
		<comments>http://www.stuartathompson.com/2012/02/drummond-report-recommendations-via-scrape/#comments</comments>
		<pubDate>Sat, 25 Feb 2012 00:42:50 +0000</pubDate>
		<dc:creator>Stuart</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Data]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://www.stuartathompson.com/?p=22970</guid>
		<description><![CDATA[The Drummond Report offered mucho insight into Ontario&#8217;s future. But the recommendations themselves were buried in 300 pages of background and chitter. What if you just wanted the recommendations? You could browse all 20 chapters on the government website. Or, with some deft scraping, you could pull them down and throw them into a table of your own. That&#8217;s what I did last week with the help of ScraperWiki, a super-handy website that gets you up-and-running with Python, Mechanize and other scraping libraries in no time flat. Download the Drummond report recommendations (CSV) or visit the website for the full text. How the scrape worked Here&#8217;s a look at the full scrape from ScraperWiki. You can see this on the site too. import scraperwiki import lxml.html import re from BeautifulSoup import BeautifulSoup for x in range(1,21): src = 'http://www.fin.gov.on.ca/en/reformcommission/chapters/ch%d.html' % (x) html = scraperwiki.scrape(src) soup = BeautifulSoup(html) filter = soup.findAll('p') for [...]]]></description>
			<content:encoded><![CDATA[
<p>The Drummond Report offered mucho insight into Ontario&#8217;s future. But the recommendations themselves were buried in 300 pages of background and chitter. What if you just wanted the recommendations? You could browse <a href="http://www.fin.gov.on.ca/en/reformcommission/">all 20 chapters</a> on the government website. Or, with some deft scraping, you could pull them down and throw them into a table of your own.</p>
<p>That&#8217;s what I did last week with the help of ScraperWiki, a super-handy website that gets you up-and-running with Python, Mechanize and other scraping libraries in no time flat.</p>
<p>Download <a href="http://www.stuartathompson.com/wp-content/uploads/2012/02/fin_gov_ch6.csv">the Drummond report recommendations</a> (CSV) or visit the website for <a href="http://www.fin.gov.on.ca/en/reformcommission/">the full text</a>.</p>
<h2>How the scrape worked</h2>
<p>Here&#8217;s a look at the full scrape from ScraperWiki. You can see this <a href="https://scraperwiki.com/scrapers/fin_gov_ch6/edit/">on the site</a> too.</p>
<p><span id="more-22970"></span></p>
<p><p>
								<pre class="Plum_Code_Box"><code class="javascript">import scraperwiki           
import lxml.html
import re
from BeautifulSoup import BeautifulSoup

# Loop over chapters 1 to 20 and fetch each page
for x in range(1,21):
    src = 'http://www.fin.gov.on.ca/en/reformcommission/chapters/ch%d.html' % (x)
    html = scraperwiki.scrape(src)
    soup = BeautifulSoup(html)
    filter = soup.findAll('p')
    for y in filter:
        # Look for a strong tag inside the paragraph
        z = y.find(re.compile('^strong'))
        if z:
            # Keep only strong tags whose text starts with Recommendation
            a = z.find(text=re.compile('^Recommendations*'))
            if a:
                # Split at the first colon only, so colons in the body text survive
                location, text = y.getText().split(':', 1)
                chapter = location.split('-')[0].split(' ')[1]
                recId = location.split('-')[1]
                data = {
                'rec':chapter + '-' + recId,
                'chapter':chapter,
                'recId':recId,
                'data':text
                }
                # The rec value is the unique key for each saved row
                scraperwiki.sqlite.save(unique_keys=[&quot;rec&quot;], data=data)
</code>
									</pre>
							</p></p>
<p>Breaking this down a bit, we first import a few necessary scripts, the most important of which (for our purposes) is <a href="http://www.crummy.com/software/BeautifulSoup/">BeautifulSoup</a>, a text parser that makes it really easy to drill into HTML and find specific tables, tags and text.</p>
<p>Next, I&#8217;ve set up a loop that cycles from chapter 1 to chapter 20 and dynamically subs the appropriate chapter number into the URL. This uses <a href="http://python.org/">Python</a> string formatting: wherever %d appears in the string, it&#8217;s replaced by the integer in the brackets at the end. In this case, %d stands for x, which is the chapter number.</p>
<p><p>
								<pre class="Plum_Code_Box"><code class="javascript">    src = 'http://www.fin.gov.on.ca/en/reformcommission/chapters/ch%d.html' % (x)</code>
									</pre>
							</p></p>
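<p>If the %d trick is new to you, this short standalone snippet shows what the loop produces. It only builds the URLs and doesn&#8217;t fetch anything; the variable names are mine.</p>
<pre class="Plum_Code_Box"><code class="python"># The URL pattern is the one used in the scraper above.
pattern = 'http://www.fin.gov.on.ca/en/reformcommission/chapters/ch%d.html'

# range(1, 21) yields 1 through 20; %d is replaced by each number in turn.
urls = [pattern % chapter for chapter in range(1, 21)]

print(urls[0])
print(len(urls))
</code></pre>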
<p>I&#8217;ve taken this HTML and pushed it through BeautifulSoup, which gives me access to a set of new functions like find() and findAll(). In this case, I want just the recommendations, which conveniently begin with a &lt;strong&gt; tag inside a paragraph. So each recommendation looks something like:</p>
<p><p>
								<pre class="Plum_Code_Box"><code class="html">&lt;p&gt;&lt;strong&gt;Recommendation 1-1:&lt;/strong&gt; This is the first recommendation...&lt;/p&gt;</code>
									</pre>
							</p></p>
<p>So on each page, I want to look at all the paragraphs. This translates to <strong>soup.findAll('p')</strong>. Now that I have them, I want to go through each one and check whether it contains a bold (or strong) tag. So <strong>y.find(re.compile('^strong'))</strong>. The '^' symbol is regex for &#8220;at the beginning of,&#8221; and here it anchors the match to the start of the tag name. Note that find() returns the first strong tag anywhere in the paragraph, not only one at the very start, so this check alone doesn&#8217;t prove the paragraph is a recommendation.</p>
<p>But to be double-sure, I also want to check that strong tag for the word recommendation. So each time I find the strong tag, I also check if there&#8217;s the word &#8220;Recommendation&#8221; at the beginning of it. Thus:</p>
<p><p>
								<pre class="Plum_Code_Box"><code class="javascript">       z = y.find(re.compile('^strong'))
        if z: 
            a = z.find(text=re.compile('^Recommendations*'))</code>
									</pre>
							</p></p>
<p>Now I&#8217;m sure I&#8217;ve found the word &#8220;Recommendation&#8221; at the start of a &lt;strong&gt; tag, which sits inside a paragraph (&lt;p&gt;) tag. Phew! Now, at <a href="https://twitter.com/#!/frabcus">frabcus</a>&#8217; suggestion, I also want to store the chapter and recommendation number. So I&#8217;ve split the text inside the &lt;p&gt; tag at the first colon (returning the label, such as &#8220;Recommendation 1-1&#8221;), then split the label on the &#8220;-&#8221; character, then split the first half on the space. This gives me two values: the chapter number and the recommendation number within that chapter. For reasons unclear at the moment, it sometimes didn&#8217;t get a value here, but no bother. I can clean this up in the final version.</p>
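<p>One possible way to tidy up that splitting is a single regular expression that returns nothing when the label doesn&#8217;t fit. This is a sketch rather than the scraper&#8217;s actual code: the parse() helper is hypothetical, the sample sentences are invented, and the label format is assumed from the example paragraph shown earlier.</p>
<pre class="Plum_Code_Box"><code class="python">import re

# Matches labels such as 'Recommendation 1-1: text'; the exact label format
# is assumed from the example paragraph shown earlier.
LABEL = re.compile(r'^Recommendations?\s+(\d+)\s*-\s*(\d+)\s*:\s*(.*)$', re.S)


def parse(paragraph):
    """Return (chapter, rec_id, text), or None if the label is missing."""
    match = LABEL.match(paragraph.strip())
    if match is None:
        return None
    chapter, rec_id, text = match.groups()
    return chapter, rec_id, text


print(parse('Recommendation 1-1: First point: with a second colon.'))
print(parse('Recommendations are listed below.'))
</code></pre>
<p>Returning None for paragraphs that don&#8217;t match makes the odd cases easy to log and inspect instead of crashing the scrape.</p>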
<p>I&#8217;ve then pushed this all into a ScraperWiki table using their SQLite save() command. To do that, I needed a unique key for each row, which I created using the chapter and recommendation number in the text itself.</p>
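<p>To show what the unique key buys you, here&#8217;s a small sketch using Python&#8217;s built-in sqlite3 module. It is not the ScraperWiki API: the table layout and the save() helper are assumptions that only mimic the idea of overwriting a row when its key already exists.</p>
<pre class="Plum_Code_Box"><code class="python">import sqlite3

# An in-memory database stands in for the ScraperWiki datastore here.
db = sqlite3.connect(':memory:')
db.execute(
    'CREATE TABLE recs (rec TEXT PRIMARY KEY, chapter TEXT, recId TEXT, data TEXT)'
)


def save(row):
    # INSERT OR REPLACE overwrites any existing row with the same key,
    # so re-running the scrape does not create duplicates.
    db.execute(
        'INSERT OR REPLACE INTO recs VALUES (:rec, :chapter, :recId, :data)', row
    )


save({'rec': '1-1', 'chapter': '1', 'recId': '1', 'data': 'first draft'})
save({'rec': '1-1', 'chapter': '1', 'recId': '1', 'data': 'second draft'})

rows = db.execute('SELECT rec, data FROM recs').fetchall()
print(rows)
</code></pre>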
<p>Any questions? I&#8217;m happy to help where I can.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.stuartathompson.com/2012/02/drummond-report-recommendations-via-scrape/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data journalism: How to create a map with Fusion Tables</title>
		<link>http://www.stuartathompson.com/2012/02/data-journalism-how-to-create-a-map-with-fusion-tables/</link>
		<comments>http://www.stuartathompson.com/2012/02/data-journalism-how-to-create-a-map-with-fusion-tables/#comments</comments>
		<pubDate>Wed, 22 Feb 2012 05:03:39 +0000</pubDate>
		<dc:creator>Stuart</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Data]]></category>

		<guid isPermaLink="false">http://www.stuartathompson.com/?p=22949</guid>
		<description><![CDATA[Google has a lovely suite of tools for creating custom maps, chief among them the Maps API and Fusion Tables. They work together like brother and sister to create respectable visualizations with mucho data. Let&#8217;s dive in. Get your data ready Your data should be in a clean CSV or XLS table. You&#8217;ll probably have to do some work refining things first. (Hey, check out Google Refine for that!) Then spend some time doing simple graphs and interviewing the data in Excel for trends and clues. When you have an idea of what you want to map, open up Fusion Tables. For the purposes of this tutorial, let&#8217;s use my spreadsheet on B.C. federal prisons, available on BuzzData. Uploading to Fusion Tables Go to Fusion Tables (click &#8220;see my tables&#8221; from the splash screen). But DO NOT try to just upload the file to Google Docs. Instead, go to Create &#62; [...]]]></description>
			<content:encoded><![CDATA[
<p>Google has a lovely suite of tools for creating custom maps, chief among them the Maps API and Fusion Tables. They work together like brother and sister to create respectable visualizations with mucho data. Let&#8217;s dive in.</p>
<h2>Get your data ready</h2>
<p>Your data should be in a clean CSV or XLS table. You&#8217;ll probably have to do some work refining things first. (Hey, check out <a href="http://code.google.com/p/google-refine/">Google Refine</a> for that!) Then spend some time doing simple graphs and interviewing the data in Excel for trends and clues.</p>
<p>When you have an idea of what you want to map, open up <a href="http://www.google.com/fusiontables/Home/">Fusion Tables</a>.</p>
<p>For the purposes of this tutorial, let&#8217;s use my spreadsheet on B.C. federal prisons, available on <a href="http://buzzdata.com/stuartathompson/b-c-federal-prison-seizures-2008-2010-breakdown-by-the-numbers#!/overview">BuzzData</a>.</p>
<h2>Uploading to Fusion Tables</h2>
<p>Go to Fusion Tables (click &#8220;see my tables&#8221; from the splash screen). But <strong>DO NOT</strong> try to just upload the file to Google Docs. Instead, go to Create &gt; Table. Choose your file, preview, upload.</p>
<p>Once loaded, you should see the table. A bunch of fields are listed at the top, most importantly Latitude and Longitude. Fusion Tables recognizes these fields as coordinates, so you don&#8217;t have to do any work to map them.</p>
<p style="text-align: center;"><a href="http://www.stuartathompson.com/wp-content/uploads/2012/02/ft2.png" rel="lightbox[22949]"><img class="size-full wp-image-22950 aligncenter" title="ft2" src="http://www.stuartathompson.com/wp-content/uploads/2012/02/ft2.png" alt="" width="620" height="297" /></a></p>
<p>If your latitude and longitude columns were instead called &#8220;lat&#8221; and &#8220;lon&#8221; or &#8220;x&#8221; and &#8220;y,&#8221; you could tell Fusion Tables they are coordinates by going to Edit &gt; Modify Columns and changing the Type of each column.</p>
<p>Since they&#8217;re already specified, go to Visualize &gt; Map. And presto! You have a map. You can embed it at this point, but it&#8217;s really ugly and you get an even uglier InfoWindow when you click on a marker.</p>
<p style="text-align: center;"><a href="http://www.stuartathompson.com/wp-content/uploads/2012/02/Screen-Shot-2012-02-21-at-11.41.26-PM.png" rel="lightbox[22949]"><img class="size-large wp-image-22951 aligncenter" title="ft2" src="http://www.stuartathompson.com/wp-content/uploads/2012/02/Screen-Shot-2012-02-21-at-11.41.26-PM-640x340.png" alt="" width="640" height="340" /></a></p>
<h2>Make it prettier</h2>
<p>There are two simple tools you need to use here:</p>
<ol>
<li><strong>Configure styles</strong>: this changes the markers</li>
<li><strong>Configure info windows</strong>: this changes the pop-up window</li>
</ol>
<div>First, click <strong>configure styles</strong> (the small link above the map). Since we&#8217;re working with markers (not polylines or polygons) you can go straight to <strong>buckets</strong>. These are filters you use to specify different colours based on values in your table. We&#8217;re going to set one colour for prisons with 0-100 prisoners and another colour for 100 or more prisoners. Thus:</div>
<div><a href="http://www.stuartathompson.com/wp-content/uploads/2012/02/Screen-Shot-2012-02-21-at-11.47.27-PM.png" rel="lightbox[22949]"><img class="size-full wp-image-22952 aligncenter" title="Screen Shot 2012-02-21 at 11.47.27 PM" src="http://www.stuartathompson.com/wp-content/uploads/2012/02/Screen-Shot-2012-02-21-at-11.47.27-PM.png" alt="" width="612" height="308" /></a></div>
<p>Clicking <strong>Save</strong> refreshes the map with the new colours. You should see at least one of them at your zoom level with a yellow marker.</p>
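<p>Conceptually, a bucket is just a numeric range mapped to a marker. The Python sketch below mirrors the two buckets set up above. It&#8217;s only an illustration, not something Fusion Tables asks you to write: putting a prison with exactly 100 inmates in the upper bucket, the marker names and the function name <code>marker_for</code> are all assumptions for this example.</p>

```python
from bisect import bisect_right

# Lower edges of every bucket after the first; 100 starts the second bucket.
BOUNDARIES = [100]
# One marker name per bucket (placeholder names, not from the tutorial).
MARKERS = ["marker_low", "marker_high"]


def marker_for(inmates):
    """Return the marker name for a prison with the given inmate count."""
    # bisect_right counts how many boundaries are at or below the value,
    # which is exactly the index of the bucket the value falls into.
    return MARKERS[bisect_right(BOUNDARIES, inmates)]


if __name__ == "__main__":
    for count in (0, 99, 100, 450):
        print(count, marker_for(count))
```

<p>Adding a third bucket would mean adding one more number to <code>BOUNDARIES</code> and one more name to <code>MARKERS</code>.</p>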
<p>Next, click <strong>configure info windows</strong>. Fusion Tables feeds some info here automatically, but it&#8217;s pretty hideous. So go to the second tab, <strong>Custom</strong>, and add some HTML. I&#8217;ve prepared this simple version:</p>
<pre class="Plum_Code_Box"><code class="html">&lt;div class='googft-info-window' style='font-family: sans-serif'&gt;
&lt;h3&gt;{Prison}&lt;/h3&gt;
&lt;strong&gt;Date opened:&lt;/strong&gt; {Date Opened}&lt;br /&gt;
&lt;strong&gt;Number of inmates:&lt;/strong&gt; {Number of inmates}&lt;br /&gt;
&lt;hr /&gt;
&lt;strong&gt;Total seizures:&lt;/strong&gt; {total seizures}
&lt;/div&gt;</code></pre>
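<p>The <code>{Column name}</code> placeholders in the template are filled in from the table row behind whichever marker gets clicked. The Python sketch below imitates that substitution to show the idea. It isn&#8217;t how Fusion Tables is implemented, and the function name <code>fill_template</code> and the sample row values are invented for this example.</p>

```python
import re


def fill_template(template, row):
    """Replace each {Column name} placeholder with that column's value."""
    def lookup(match):
        column = match.group(1)
        # Leave unknown placeholders untouched so typos stay visible.
        return str(row.get(column, match.group(0)))

    return re.sub(r"\{([^{}]+)\}", lookup, template)


if __name__ == "__main__":
    # Column names follow the template above; the values are made up.
    row = {"Prison": "Example Institution", "Number of inmates": 120}
    template = "{Prison}: {Number of inmates} inmates, {total seizures} seizures"
    print(fill_template(template, row))
```

<p>In this sketch a placeholder only matches when it is spelled exactly like the column name, capitalization included, so it&#8217;s worth double-checking each name against your table&#8217;s headers.</p>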
<p>Now it looks like this:</p>
<p style="text-align: center;"><a href="http://www.stuartathompson.com/wp-content/uploads/2012/02/Screen-Shot-2012-02-21-at-11.52.02-PM.png" rel="lightbox[22949]"><img class="size-full wp-image-22953 aligncenter" title="Screen Shot 2012-02-21 at 11.52.02 PM" src="http://www.stuartathompson.com/wp-content/uploads/2012/02/Screen-Shot-2012-02-21-at-11.52.02-PM.png" alt="" width="571" height="309" /></a></p>
<p>Spiffy! Now you can embed this in your site. First, go to <strong>Share</strong> in the top right and set it to <strong>Unlisted</strong>, then hit <strong>Save</strong>. Then click on <strong>Get embeddable link</strong>.</p>
<p>But that&#8217;s no fun. You have no legend and no controls. Plus there&#8217;s more you could do with the map, like custom icons and manual filters.</p>
<p>In Part 2, we&#8217;ll visualize the map using the Maps API and the FusionTablesLayer to add filters, custom icons and views.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.stuartathompson.com/2012/02/data-journalism-how-to-create-a-map-with-fusion-tables/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

