<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Aaron Schiff &#187; Data Wrangling</title>
	<atom:link href="http://www.aaronschiff.net/category/data-wrangling/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.aaronschiff.net</link>
	<description>The personal website of Aaron Schiff</description>
	<lastBuildDate>Thu, 24 Jun 2010 04:31:55 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>World development indicators now free</title>
		<link>http://www.aaronschiff.net/data-wrangling/world-development-indicators-now-free/</link>
		<comments>http://www.aaronschiff.net/data-wrangling/world-development-indicators-now-free/#comments</comments>
		<pubDate>Tue, 20 Apr 2010 19:50:58 +0000</pubDate>
		<dc:creator>aaron</dc:creator>
				<category><![CDATA[Data Wrangling]]></category>

		<guid isPermaLink="false">http://www.aaronschiff.net/data-wrangling/world-development-indicators-now-free/</guid>
		<description><![CDATA[Woohoo, the World Bank&#8217;s World Development Indicators dataset is now available for free. This is an excellent dataset of cross-country economic and social development indicators. There&#8217;s even an API for accessing the data directly! 
]]></description>
			<content:encoded><![CDATA[<p>Woohoo, the World Bank&#8217;s World Development Indicators dataset is now <a href='http://flowingdata.com/2010/04/20/world-data-released-is-a-dream-come-true/'>available for free</a>. This is an excellent dataset of cross-country economic and social development indicators. There&#8217;s even an API for accessing the data directly! </p>
]]></content:encoded>
			<wfw:commentRss>http://www.aaronschiff.net/data-wrangling/world-development-indicators-now-free/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Excel Data Validation &#8211; Add New Items</title>
		<link>http://www.aaronschiff.net/data-wrangling/excel-data-validation-add-new-items-2/</link>
		<comments>http://www.aaronschiff.net/data-wrangling/excel-data-validation-add-new-items-2/#comments</comments>
		<pubDate>Thu, 15 Apr 2010 05:03:22 +0000</pubDate>
		<dc:creator>aaron</dc:creator>
				<category><![CDATA[Data Wrangling]]></category>

		<guid isPermaLink="false">http://www.aaronschiff.net/data-wrangling/excel-data-validation-add-new-items-2/</guid>
		<description><![CDATA[A nice Excel tip:

In this Excel data validation example, you&#8217;ll create an Excel Data Validation drop down list that allows users to add new items. 

]]></description>
			<content:encoded><![CDATA[<p>A nice <a href='http://www.contextures.com/excel-data-validation-add.html'>Excel tip</a>:</p>
<blockquote>
<p>In this Excel data validation example, you&#8217;ll create an Excel Data Validation drop down list that allows users to add new items. </p>
</blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.aaronschiff.net/data-wrangling/excel-data-validation-add-new-items-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data&#8217;s dirty secret</title>
		<link>http://www.aaronschiff.net/data-wrangling/datas-dirty-secret/</link>
		<comments>http://www.aaronschiff.net/data-wrangling/datas-dirty-secret/#comments</comments>
		<pubDate>Mon, 08 Mar 2010 07:58:59 +0000</pubDate>
		<dc:creator>aaron</dc:creator>
				<category><![CDATA[Data Wrangling]]></category>

		<guid isPermaLink="false">http://www.aaronschiff.net/?p=31</guid>
		<description><![CDATA[Data sounds so clean and clinical, but the dirty little secret of data is that it&#8217;s messy. When working with data, the first step invariably involves cleaning the data and processing it into a form that is usable for analysis. This process is even more complicated when bringing together multiple datasets.
Basic data problems that need [...]]]></description>
			<content:encoded><![CDATA[<p>Data sounds so clean and clinical but the dirty little secret of data is that it&#8217;s messy. When working with data invariably the first step involves cleaning the data and processing it into a form that is usable for analysis. This process is even more complicated when bringing together multiple datasets.</p>
<p>Basic data problems that need to be dealt with before analysis include:</p>
<ul>
<li>Missing or incomplete data</li>
<li>Inconsistent definitions across datasets</li>
<li>Different frequencies (e.g. annual, monthly, quarterly) used in different datasets</li>
</ul>
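<p>To illustrate the last item, here is a minimal sketch in Python that converts a monthly series to quarterly frequency. It assumes the data are held as a dict keyed by (year, month). The function name <code>monthly_to_quarterly</code> and the sample figures are hypothetical. Whether to sum or average depends on the variable: flows such as sales are usually summed, while rates and price levels are usually averaged.</p>
<pre><code class="language-python">from collections import defaultdict


def monthly_to_quarterly(monthly, how="mean"):
    """Convert {(year, month): value} into {(year, quarter): value}.

    how="sum" suits flows (e.g. sales); how="mean" suits rates and
    levels (e.g. an interest rate). A quarter with any month missing
    is left out rather than guessed.
    """
    if how not in ("sum", "mean"):
        raise ValueError("how must be 'sum' or 'mean'")
    buckets = defaultdict(list)
    for (year, month), value in monthly.items():
        quarter = (month - 1) // 3 + 1  # months 1-3 map to Q1, and so on
        buckets[(year, quarter)].append(value)
    quarterly = {}
    for key, values in sorted(buckets.items()):
        if len(values) != 3:
            continue  # incomplete quarter: leave it missing
        total = sum(values)
        quarterly[key] = total if how == "sum" else total / 3
    return quarterly


# Hypothetical data: June 2009 has not been published yet.
monthly = {(2009, 1): 10, (2009, 2): 20, (2009, 3): 30,
           (2009, 4): 40, (2009, 5): 50}
print(monthly_to_quarterly(monthly, how="sum"))  # {(2009, 1): 60}
print(monthly_to_quarterly(monthly))             # {(2009, 1): 20.0}
</code></pre>
<p>Dropping the incomplete second quarter, rather than summing two months, avoids silently understating it. The gap can then be dealt with explicitly, like any other missing data.</p>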
<p>Data wrangling is the process of sorting out these types of problems to produce a nice clean dataset. The objective is usually a flat, database-style dataset: one row per observation, with columns for the data descriptors (such as country, series and date) followed by the data value itself. Missing data often needs to be imputed or otherwise estimated. It is also often necessary to come up with creative ways to adjust the data to account for inconsistent definitions across datasets. Inconsistencies can even arise within a single data series, for example when the definition of a time series changes at some point. In such cases, some kind of back-casting (re-estimating the earlier values on the new definition) can often be used to produce a series that is consistent over time.</p>
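<p>Imputing missing values and back-casting a series whose definition has changed can be sketched in plain Python as follows. This illustrates one simple choice for each under stated assumptions, not the author's method. Gaps are filled by straight-line interpolation between the nearest known values. Back-casting assumes that the ratio between the old and new definitions, measured in a single overlapping period, held constant in earlier periods. The function names and sample figures are hypothetical.</p>
<pre><code class="language-python">def interpolate_gaps(series):
    """Fill interior gaps (None) by straight-line interpolation.

    series is a list of numbers or None at equally spaced dates.
    Leading and trailing gaps stay None: there is no second known
    point to interpolate towards.
    """
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


def backcast(old, new, overlap):
    """Splice a series on an old definition onto one on a new definition.

    old and new map each period to a value, and overlap is a period
    published on both definitions. Periods covered only by the old
    series are rescaled by new[overlap] / old[overlap], which assumes
    the ratio between the two definitions is constant over time.
    """
    ratio = new[overlap] / old[overlap]
    spliced = {p: v * ratio for p, v in old.items() if p not in new}
    spliced.update(new)  # new-definition values take precedence
    return dict(sorted(spliced.items()))


print(interpolate_gaps([1.0, None, None, 4.0, None]))
# [1.0, 2.0, 3.0, 4.0, None]

old = {2005: 100, 2006: 110, 2007: 120}  # old definition
new = {2007: 240, 2008: 250}             # new definition from 2007
print(backcast(old, new, overlap=2007))
# {2005: 200.0, 2006: 220.0, 2007: 240, 2008: 250}
</code></pre>
<p>A single-period ratio is the crudest form of splicing. Averaging the ratio over several overlapping periods, where they exist, makes the back-cast less sensitive to noise in any one of them.</p>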
<p>I use Excel most often for data wrangling. Useful tools include pivot tables, database functions like DSUM, and conditional functions like SUMIF or the newer SUMIFS (added in Excel 2007). When there is too much data for manual editing, I also sometimes write custom macros that reshape the data into the format I want. However, writing and debugging a macro can often take longer than just making the edits manually.</p>
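<p>For readers working outside Excel, here is a rough Python analogue of two of these jobs. It is a sketch, not a replacement for the tools above. The first function unpivots a wide table (one column per year) into the flat, database-style layout, which is the kind of reshaping a macro might do. The second sums over the flat records with exact-match conditions, in the manner of SUMIFS. The table layout, column names and figures are hypothetical. Only equality criteria are supported, whereas Excel&#8217;s SUMIFS also accepts comparison and wildcard criteria.</p>
<pre><code class="language-python">def wide_to_long(rows, id_cols):
    """Unpivot wide rows into flat records.

    rows is a list of dicts; every key not listed in id_cols is treated
    as a period column. Blank cells (None or "") are skipped.
    """
    records = []
    for row in rows:
        ids = {col: row[col] for col in id_cols}
        for key, value in row.items():
            if key in id_cols or value is None or value == "":
                continue
            records.append({**ids, "period": key, "value": value})
    return records


def sumifs(records, value_col="value", **criteria):
    """Sum value_col over records matching every column=value pair,
    similar to Excel's SUMIFS with exact-match criteria."""
    return sum(
        r[value_col]
        for r in records
        if all(r.get(col) == want for col, want in criteria.items())
    )


# Hypothetical wide table: one row per country, one column per year.
rows = [
    {"country": "NZ", "series": "GDP", "2008": 10, "2009": 11},
    {"country": "AU", "series": "GDP", "2008": 50, "2009": None},
]
flat = wide_to_long(rows, id_cols=["country", "series"])
print(flat[0])  # {'country': 'NZ', 'series': 'GDP', 'period': '2008', 'value': 10}
print(sumifs(flat, country="NZ"))   # 21
print(sumifs(flat, period="2008"))  # 60
</code></pre>
<p>Once the data are in this flat form, any combination of descriptors can be used as a filter, which is the same property that makes pivot tables and SUMIFS convenient in a spreadsheet.</p>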
]]></content:encoded>
			<wfw:commentRss>http://www.aaronschiff.net/data-wrangling/datas-dirty-secret/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
