Some potentially interesting Twitter data for sale from infochimps:
For $300 you can buy the dataset containing an hour-by-hour breakdown of the occurrence of hashtags, URLs, and smileys in the 1.6 billion tweets created between March 2006 and March 2010. For $250 you can purchase a dataset extracted from those same 1.6 billion tweets with all mentions of stock tokens and related keywords.
The next step after data wrangling is usually data analysis, although sometimes you might jump straight to presentation if no particular analysis is required. The main objective of data analysis is to let the data tell a story, which may involve formal hypothesis testing or estimating coefficients of models, but can be as simple as visualising the data in different ways to look for patterns and trends.
In fact, I find the latter to be increasingly helpful as a complement to formal statistical analysis. The challenge is visualising complex datasets that have many dimensions. Since we can look across at most two or three dimensions of a dataset at once, we have to be careful about the way we choose to ‘cut’ the data. Excel pivot tables and pivot charts are a pretty good way of doing this, although the presentation of the data in Excel is not as clean as it could be.
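For anyone who wants to try the same kind of 'cut' outside Excel, here is a minimal sketch of a pivot in plain Python. The records are made-up hourly counts loosely modelled on the hashtag/URL/smiley dataset above, and the `pivot` helper is hypothetical, not part of any library; it only illustrates the idea of summing a measure across two chosen dimensions.

```python
from collections import defaultdict

# Hypothetical records (invented for illustration): (hour, token type, count).
records = [
    ("2010-03-01 09:00", "hashtag", 120),
    ("2010-03-01 09:00", "url", 80),
    ("2010-03-01 09:00", "smiley", 15),
    ("2010-03-01 10:00", "hashtag", 150),
    ("2010-03-01 10:00", "url", 95),
    ("2010-03-01 10:00", "hashtag", 30),
]


def pivot(rows):
    """Sum values into a {row_key: {column_key: total}} table."""
    table = defaultdict(lambda: defaultdict(int))
    for row_key, col_key, value in rows:
        table[row_key][col_key] += value
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {row_key: dict(cols) for row_key, cols in table.items()}


table = pivot(records)

# Collect every column that appears, so missing cells can be shown as 0.
columns = sorted({col for cols in table.values() for col in cols})

print("hour".ljust(18) + "".join(col.rjust(10) for col in columns))
for hour in sorted(table):
    cells = "".join(str(table[hour].get(col, 0)).rjust(10) for col in columns)
    print(hour.ljust(18) + cells)
```

Swapping which field is used as the row key and which as the column key gives a different cut of the same data, which is essentially what dragging fields around in a pivot table does.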
I also find it very helpful to think about the possible relationships that could exist in a dataset before diving into the analysis. This does not necessarily involve coming up with formal hypotheses to test, but rather just thinking about which correlations or patterns it makes sense to look for.
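Once a plausible relationship has been picked out, a quick check can be as simple as a correlation coefficient. The sketch below computes Pearson's r by hand; the two series are invented hourly counts, and the `pearson` function is an illustrative helper rather than anything from the datasets or tools mentioned here.

```python
from math import sqrt


def pearson(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical hourly counts (invented): hashtags and URLs per hour.
hashtags_per_hour = [120, 150, 90, 200, 170]
urls_per_hour = [80, 95, 60, 130, 110]

print(round(pearson(hashtags_per_hour, urls_per_hour), 3))
```

A high value here would only say that the two counts move together, not why, so it is a prompt for a closer look rather than a conclusion.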