Recently I was talking with a colleague about whether it is better to buy packaged software or build your own. This is something our customers continue to discuss, and the question is increasingly prevalent in the analytic space. Previously the build vs. buy conversation was around operational applications, and that has been, for all intents and purposes, settled: buy, then build to fill in the gaps. I wrote about this in my column for Oracle Magazine in 2011 (http://www.oracle.com/technetwork/issue-archive/2011/11-jul/o41field-398497.html).
Now, with packaged analytics increasingly in vogue, similar conversations are occurring. Do we build it all ourselves with our data integration / wrangling platforms and reporting / analytic platforms? Or do we buy packaged solutions? The answer to both is yes! But only after you’ve framed the question properly. Too often we buy to scratch an itch or eliminate an acute and discrete pain.
With operational systems we needed to implement or improve specific functional capabilities. Little concern was given to functional or data integration. This worked well in a silo, but eventually the data (or even functionality) of these packaged systems needed to be shared more broadly, and the structure and granularity of the data rarely matched well with the other systems deployed in the organization. Inevitably, we bought, configured, implemented, and then built integration. Functional integration was handled with a service bus, with complications arising from whether the purchased solution, by design or license, exposed functionality openly. Data integration was generally more problematic and was resolved, with mixed results, through huge projects to build data warehouses and/or operational data stores. The data integration challenge also spawned the need for active data governance, which, in part, forces the examination and enforcement of rules for common data attributes across all solutions.
With packaged analytic solutions, the challenge is inverted. There’s an expectation that the data required is, or can easily be, integrated. Packaged reporting and analytic solutions have been around for decades, although they are becoming more valuable as they target highly valuable, regulated information in financial and health care organizations. The problem is, still, integration. The same problems classic data warehouses addressed are still the fundamental challenges we have today. Whether we build physical integrated repositories or some virtual, on-demand solution with streaming and APIs, we still have to account for a semantic model, keys, and history. Often on-demand data does not have history, or the data structures are keyed so differently that matching like-for-like records is computationally too costly to do on demand. Conversely, traditional batch ETL integration is too slow to meet our on-demand expectations. No method of integration does everything well.
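To make the keys-and-history point concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the two source systems ("billing" and "clinical"), their key formats, the shared enterprise key, and the "segment" attribute. It shows a crosswalk that resolves differently keyed records to one entity, plus a simple effective-dated history so a question can be answered as of a past date. It is not how any particular packaged product does this.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical crosswalk: (source system, source key) -> shared enterprise key.
# In practice this mapping is itself the hard, governed part of integration.
CROSSWALK = {
    ("billing", "C-1001"): "CUST-1",
    ("clinical", "88231"): "CUST-1",
    ("billing", "C-1002"): "CUST-2",
}


@dataclass
class Version:
    enterprise_key: str
    segment: str                      # illustrative attribute only
    valid_from: date
    valid_to: Optional[date] = None   # None means "still current"


class History:
    """Keeps every version of a record instead of overwriting it."""

    def __init__(self) -> None:
        self.versions: list[Version] = []

    def apply(self, source: str, source_key: str, segment: str, as_of: date) -> None:
        # Resolve the source-specific key to the shared key first.
        key = CROSSWALK[(source, source_key)]
        current = next(
            (v for v in self.versions
             if v.enterprise_key == key and v.valid_to is None),
            None,
        )
        if current is not None:
            if current.segment == segment:
                return  # nothing changed, so no new version
            current.valid_to = as_of  # close out the old version
        self.versions.append(Version(key, segment, as_of))

    def as_of(self, key: str, when: date) -> Optional[str]:
        # Return the attribute value that was in effect on the given date.
        for v in self.versions:
            if (v.enterprise_key == key and v.valid_from <= when
                    and (v.valid_to is None or when < v.valid_to)):
                return v.segment
        return None


history = History()
history.apply("billing", "C-1001", "standard", date(2020, 1, 1))
history.apply("clinical", "88231", "premium", date(2021, 6, 1))
```

With those two updates applied, `history.as_of("CUST-1", date(2020, 12, 31))` returns `"standard"` and the same call for `date(2021, 6, 1)` returns `"premium"`, even though the two updates arrived from systems with unrelated keys. A purely on-demand call to either source would see only its own current value, which is the gap described above.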
This is not to say that advances in packaged solutions, and particularly the evolution of the technology, don’t warrant promotion and use. On the contrary, I’m a fan. Yet I’m pragmatic enough to know there are no silver-bullet answers. Just as I wrote in 2011, it’s “Build AND Buy”. First understand what your ecosystem needs, then buy the platforms, tools, and packages that facilitate the assembly of your optimal ecosystem.