I made this graph today showing the number of merger and acquisition applications to the New Zealand Commerce Commission.
Over the past ten years applications have held reasonably steady at about 20 per year, except for a big drop in 2009. Since demand for mergers in a recession should be at least as high as usual, the drop suggests that most applications in the recent past were for acquisitions rather than mergers. I also wonder what drove the high level of applications in the early 90s.
An economic model is a tool that gives insight into a particular economic situation of interest to us. For example, we might want to understand what drives changes in the price of oil, or what the effects of a merger between two firms in the same industry will be.
To build a model we must first strip out all the complexities of the real world that we are not interested in. The real world is a very complicated and messy place, and if we try to capture all of its complexity in our model then the model itself will be complicated to the point of being useless.
We decide what to include in a model and what to leave out, and so simplify the real world, by making assumptions. For example, we might assume that demand is linear, and that many things such as the weather, people’s incomes and unemployment remain constant. Once again, the point of all these assumptions is to make the model simple enough that we can understand it and focus our attention on what we really want to understand. Most of the skill of model building is in choosing appropriate assumptions.
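To make the idea of a simplifying assumption concrete, here is a minimal sketch in Python of a market with linear demand and linear supply, holding everything else constant. The functional forms, the parameter names (a, b, c, d) and the numbers are all made up for illustration; they are assumptions, not estimates of any real market.

```python
def equilibrium(a, b, c, d):
    """Solve a linear market: demand Q = a - b*P, supply Q = c + d*P.

    All four parameters are hypothetical. Everything else (weather,
    incomes, unemployment) is assumed constant and folded into a and c.
    """
    if b + d == 0:
        raise ValueError("demand and supply have the same slope; no unique price")
    price = (a - c) / (b + d)
    quantity = a - b * price
    return price, quantity


# Illustrative numbers only.
base_price, base_quantity = equilibrium(a=100, b=2, c=10, d=1)

# Relax one assumption: suppose higher incomes shift demand out by 15 units.
new_price, new_quantity = equilibrium(a=115, b=2, c=10, d=1)

print(base_price, base_quantity)
print(new_price, new_quantity)
```

The model is obviously too simple to be "true", but it lets us see the direction of the effect we care about: when demand shifts out, both price and quantity rise.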
For this reason, a model should not be judged solely by its assumptions (although highly dubious assumptions are not a good thing). Rather, we should focus on the model’s ability to teach us something, and its ability to explain the economics of something in a plausible way.
To summarise, a model does:
A model does not:
I was very interested today to read about the launch of Pacific Fibre, which plans to build an international fibre network to compete with Southern Cross.
The New Zealand government has put a lot of effort over the past five years or so into promoting competition in retail broadband markets through local loop unbundling and wholesale access to Telecom’s copper network. However, international bandwidth has not been subject to regulation, and there has not been any significant build of international fibre since Southern Cross.
The next step after data wrangling is usually data analysis, although sometimes you might jump straight to presentation if no particular analysis is required. The main objective of data analysis is to let the data tell a story, which may involve formal hypothesis testing or estimating coefficients of models, but can be as simple as visualising the data in different ways to look for patterns and trends.
In fact, I find the latter to be increasingly helpful as a complement to formal statistical analysis. The challenge is visualising complex datasets that have many dimensions. Since at best we can only look across two or three dimensions of a dataset at once, we have to be careful about the way we choose to ‘cut’ the data. Excel pivot tables and pivot charts are a pretty good way of doing this, although the presentation of the data in Excel is not as clean as it could be.
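The same kind of ‘cut’ can be sketched outside Excel. The example below is a minimal Python illustration using only the standard library: it sums a value over two chosen dimensions of a small dataset, much as a pivot table would. The field names, the figures and the pivot_sum helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical flat dataset: one row per observation.
rows = [
    {"year": 2008, "region": "North", "product": "A", "sales": 10},
    {"year": 2008, "region": "South", "product": "A", "sales": 7},
    {"year": 2009, "region": "North", "product": "B", "sales": 4},
    {"year": 2009, "region": "North", "product": "A", "sales": 6},
    {"year": 2009, "region": "South", "product": "B", "sales": 9},
]


def pivot_sum(rows, row_key, col_key, value_key):
    """Sum value_key over every (row_key, col_key) combination."""
    table = defaultdict(lambda: defaultdict(float))
    for r in rows:
        table[r[row_key]][r[col_key]] += r[value_key]
    # Return plain dicts so looking up a missing cell raises KeyError.
    return {k: dict(v) for k, v in table.items()}


# Two different cuts of the same three-dimensional data.
by_year_region = pivot_sum(rows, "year", "region", "sales")
by_year_product = pivot_sum(rows, "year", "product", "sales")

print(by_year_region)
print(by_year_product)
```

Each call collapses one dimension and shows the other two, which is exactly the choice we have to make carefully when the dataset has more dimensions than we can look at in one go.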
I also find it very helpful to think about the possible relationships that could exist in a dataset before diving into the analysis. This does not necessarily involve coming up with formal hypotheses to test, but rather just thinking about which correlations or patterns it is sensible to look for.
Data sounds so clean and clinical, but the dirty little secret of data is that it’s messy. When working with data, the first step invariably involves cleaning the data and processing it into a form that is usable for analysis. This process is even more complicated when bringing together multiple datasets.
Basic data problems that need to be dealt with before analysis include:
Data wrangling is the process of sorting out these types of problems to produce a nice clean dataset. The objective is usually a flat database-style dataset, with columns for data descriptors and then the data itself. Missing data often needs to be imputed or otherwise estimated. It is often necessary to come up with creative ways to adjust the data to account for inconsistencies in definitions across datasets. In fact, inconsistencies can arise within the same data series, for example when the definition of a time-series has changed at some point. In such a case, some kind of back-casting can often be used to produce a series that is consistent over time.
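As one illustration of back-casting, here is a minimal Python sketch that splices an old-definition series onto a new-definition one using the ratio between the two in a single overlap year. This is only one simple approach, and it assumes the ratio seen in the overlap year also held in earlier years. The backcast function and all the figures are hypothetical.

```python
def backcast(old_series, new_series, overlap_year):
    """Rescale old-definition values onto the new definition.

    Assumes the ratio between the two definitions observed in
    overlap_year also held in every earlier year. That is a strong
    assumption, and only one of several ways to splice series.
    """
    if old_series[overlap_year] == 0:
        raise ValueError("cannot compute a ratio from a zero overlap value")
    ratio = new_series[overlap_year] / old_series[overlap_year]
    # Scale only the years before the overlap; later years use the new series.
    spliced = {
        year: value * ratio
        for year, value in old_series.items()
        if year < overlap_year
    }
    spliced.update(new_series)
    return dict(sorted(spliced.items()))


# Made-up series keyed by year; the definition changed in 2005.
old = {2003: 80.0, 2004: 90.0, 2005: 100.0}
new = {2005: 125.0, 2006: 130.0}

consistent = backcast(old, new, overlap_year=2005)
print(consistent)
```

With more than one overlapping year, an average ratio or a regression on the overlap period might be a better choice; the point here is just the mechanics.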
I use Excel most often for data wrangling. Useful tools include pivot tables, database functions like DSUM, and conditional functions like SUMIF or the newer SUMIFS. I also sometimes write custom macros that re-shape data into the format that I want, if there is too much data for manual editing. However, the process of writing and debugging a macro can often take longer than just doing the data edits manually.
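To give a flavour of the kind of re-shaping involved, here is a minimal sketch in Python (not an Excel macro, and not one of my own macros) that turns a wide table with one column per year into a flat database-style list of records. The wide_to_long function, the column names and the numbers are hypothetical.

```python
def wide_to_long(header, data_rows, id_columns):
    """Turn a wide table (one column per period) into flat records.

    header: list of column names; the first id_columns entries are
    descriptors and the rest are treated as period labels.
    Blank cells (None or "") are skipped rather than imputed.
    """
    records = []
    for row in data_rows:
        descriptors = dict(zip(header[:id_columns], row[:id_columns]))
        for period, value in zip(header[id_columns:], row[id_columns:]):
            if value is None or value == "":
                continue
            records.append({**descriptors, "period": period, "value": value})
    return records


# Made-up wide table, as it might be copied out of a spreadsheet.
header = ["region", "product", "2008", "2009"]
wide = [
    ["North", "A", 10, 6],
    ["South", "A", 7, None],  # missing 2009 value
]

flat = wide_to_long(header, wide, id_columns=2)
for record in flat:
    print(record)
```

The result has one row per region, product and year, which is the shape that pivot tables and functions like SUMIFS work with most easily.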