Big Data? Big Deal!

What do yellow elephants, hives and pigs have in common? Is Cassandra the Zookeeper? Because if so, the cats need to be herded!

No, stay with me, I haven’t lost it just yet! If I had, I could find it using HDFS and MapReduce processes.

Don’t be alarmed! What you’ve just experienced is something we can help with: confusion! One of the biggest hurdles we have been helping our clients over in recent times is the barrier to Big Data adoption: a confusion of technologies, concepts and strategies, blurred together to sell one of the most on-trend IT initiatives today.

In this article I will break down and separate the components of a Big Data solution, and provide some meaningful examples of how Big Data strategies can be leveraged to drive real value for your organisation. I will use my experience in “traditional” enterprise business intelligence and information management to highlight synergies I can see between the two, and how your organisation can adopt one in addition to the other. There are some very good reasons for running these initiatives in parallel: they are extremely complementary, and together can provide a powerful mixture of agility and performance for your organisation.

So what is Big Data?

Just in case you weren’t aware, we live in a data rich world! Most people have a number of personal smart devices (phones, tablets, watches, televisions and so on) which are capable of connecting to networks, tracking events, displaying information, talking to us, and more. In each instance these devices produce data, both qualitative and quantitative. Most of us understand this demand on a personal level, but this consumerism also drives industry. The networks that carry data, the applications that move data between devices, our electricity/gas suppliers, our traffic lights, our airports, booking systems and cars all have points of communication along their disparate processing channels. Our world, and our businesses, create enormous amounts of data.

The problem

Some organisations have always dealt with enormous data volumes, so Big Data remains a very relative term. Generally speaking though, the concern is not so much with the volume of data specifically, but rather with how easily a quantity of data can be used to generate value for an organisation. There are a number of issues for consideration. How accessible is the information in your organisation? How secure is it? How do we get insight from it? How do we derive real strategic value?

Some of the more traditional forms of collating, processing and displaying information struggle under the requirements of the modern data explosion, and for that reason alternative approaches have been created to deal with the burden of knowing!

The Big Data approach is a distinct change in philosophy from that of a traditional enterprise data warehouse, but still deals with some familiar issues:

  • What data is important to our organisation?
  • How do we get access to it?
  • How do we take that raw data and present it as useful information?
The answer? It’s a hardware/software/people thing. This is where the fun really begins! Let’s deal with how these questions can be answered conceptually, and then move on to how the yellow elephants, big blue and good people in long white coats can help.

All data is important and insightful – we just don’t know it yet

Determining the importance of data is a difficult equation. We don’t just look at the data, we look at the cost of churning raw data into something useful, we look at the impacts of that data tactically and strategically, and then we prioritize according to highest value and business focus. Fundamentally we make decisions around investing today for future value.

Big Data is a change in focus. It assumes that all data in an organisation is of value today, or will become visibly valuable in the future. Big Data takes the prioritization process out of the business case, and justifies visibility over maturity. Invest today and have it all, find uses for it over time through explorative analysis.

I haven’t met a lock I couldn’t pick with a crowbar

Accessing data (and by that I mean obtaining it, transforming it, conforming it) is a long-winded process, made simpler by a variety of great tools in the ETL space, but nonetheless a time- and resource-intensive part of traditional business intelligence and information management. The main contention here is moving data between environments to the point of processing; generally there is a focus on performance at that point of processing.

The Big Data world diverges a little here in terms of how this happens physically: it either utilises the Hadoop open source platform, or some other form of massively parallel processing (MPP). Conceptually though, the focus is not on bringing data to processing power, but rather on taking processing power to the data at its source, and plenty of it! Scalability is a huge factor with Big Data processing: the more hardware you throw at it, the more processing power there is to churn through data. Whilst well planned EDWs can offer serious grunt as well, generally speaking there is a limit to what can be achieved, especially when constrained by time taken to deliver and organic growth over time. In this respect Big Data can take serious advantage of a Cloud based infrastructure. A shameless (and fully disclosed!) plug here for our sister company Kloud Solutions, who are already taking advantage of those synergies.

Technology alert!

Hadoop is the word in terms of open source. The Hadoop project is maintained by the Apache Software Foundation. It’s not an acronym, but rather a creative name for a series of subprojects that provide a technical solution specifically around the concepts that Big Data seeks to address. Hadoop features a yellow elephant as its mascot, and many of the subprojects feature creative names and depictions that also had a mention in my witty introduction. Most commercial Big Data products borrow in some way from the fundamental offerings of the project, but seek to make a more intuitive, manageable, supportable solution.

To quote the Apache Hadoop project:

“The Apache™ Hadoop™ project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures”

Notable Hadoop projects for the acquisition of data include:

  • HDFS (Hadoop Distributed File System) ties distributed hardware together into a single file system, putting both its data and its computational power on the Big Data radar.
  • MapReduce is responsible for picking out specific data elements across the HDFS, typically from flat, text based files. A Java based executable is pushed to the nodes that hold the data, where in effect a “query” is run; the results are then brought together and assembled.
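
To make the MapReduce idea a little more concrete, here is a minimal sketch in plain Python. It is not the Hadoop API and it runs in a single process; the `SPLITS` data and the function names are hypothetical, chosen only to illustrate the pattern of running a small "query" against each chunk of data and then assembling the results.

```python
from collections import defaultdict

# Hypothetical text "splits", standing in for blocks of a flat file
# stored on three different nodes of a cluster.
SPLITS = [
    ["error disk full", "info job started"],
    ["error network down", "info job finished"],
    ["error disk full"],
]


def map_phase(lines):
    """The 'query' that runs where the data lives: emit (word, 1) pairs."""
    return [(word, 1) for line in lines for word in line.split()]


def shuffle(mapped_outputs):
    """Bring the pairs from every split together, grouped by key."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Assemble the final result: one total per key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(splits):
    # In a real cluster each map_phase call would run on a separate node.
    return reduce_phase(shuffle(map_phase(split) for split in splits))


counts = word_count(SPLITS)
print(counts["error"], counts["disk"])
```

The point of the design is that `map_phase` only ever needs to see its own split, which is what allows the work to be spread across as much hardware as you care to throw at it.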
Commercially there are many alternatives, some truly innovative products that tend to be more of a microcosm of the Hadoop technology than a replication. Most alternatives feature banks of disk, with specialised controllers and processing power spread across an entire appliance. They combine an operating system, application layer, disk and processing power across discrete units that work in parallel to process enormous amounts of data. Each of these nodes functions in a way similar to that of a node on a Hadoop HDFS.
They include but are not limited to:
  • Teradata
  • IBM Netezza
  • Oracle Exadata
  • EMC Greenplum
The Microsoft solution more closely resembles the Hadoop model, by combining MS SQL Server 2012 and Microsoft Azure Cloud services to distribute the footprint, and leverage massive scalability, with the flexibility of a Cloud offering.

The proof of the pudding is in the reporting

“Amounts and counts Brad, that’s all they really want, amounts and counts!” The words of one of my very first project managers still ring in my ears, and there is some truth in that: simplicity is often key in reporting. In keeping with the comparison between some more traditional uses of information and the Big Data approach, the key differences lie in the maturity of reporting in general.

Some people may take exception to the over-generalisation found here, but it is in keeping with our focus on simplicity. Traditional warehousing and modelling techniques generally emphasize relationships between entities over a period of time that are well defined and mature in terms of organisational IP, and reporting typically reflects that. What I mean by that is, data has generally gone through a process of vetting by SMEs within a business, and consolidation within a framework of understood business rules and usages that make sense on a whole of network level. Organisations look for well understood business patterns.

Then in walks Big Data with its pistols a’blazin! Firstly, let’s deal with the “Unstructured” misnomer sometimes used to describe Big Data. In essence, read “not in the conformed structure of a traditional EDW”, because the data we are talking about is generally very structured. In truth Big Data is not mature in the enterprise sense, but from an operational point of view it is extremely optimised as a source, and as such has strong interpretable formatting. At a base level what we are really talking about are definable name-value pairs: a text based search result, and the number of its occurrences in a data set, that can be aggregated/transformed in layer upon layer of processing. Think of Twitter logs that might mention your business, and then the correlation of those tags with sales of specific products online at specific times; this might inform marketing campaigns. The value here lies in true data mining and analytical exploration of the unknown. Spotting a new, unforeseen trend that sets your organisation apart from others is the goal. Measuring the state of your organisation in a managerial sense is something that is better placed in an EDW.
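
As a rough illustration of the name-value pair idea, the following Python sketch counts mentions of a tag per hour and correlates those counts with sales in the same hours. Everything in it is an assumption made for the example: the `MENTIONS` log, the `SALES` figures and the `#acme` tag are invented, and a real analysis would need far more data before reading anything into the correlation.

```python
from collections import Counter
from math import sqrt

# Hypothetical log entries as (hour of day, message text) pairs.
MENTIONS = [
    (9, "loving the #acme kettle"),
    (9, "#acme delivery was quick"),
    (10, "new #acme kettle arrived"),
    (12, "#acme sale is on"),
    (12, "bought an #acme toaster"),
    (12, "#acme again!"),
]

# Hypothetical online sales (units) keyed by hour of day.
SALES = {9: 18, 10: 12, 11: 3, 12: 27}


def mentions_per_hour(log, tag):
    """Aggregate the raw log into name-value pairs: hour -> mention count."""
    return Counter(hour for hour, text in log if tag in text.lower())


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the divisor is zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


counts = mentions_per_hour(MENTIONS, "#acme")
hours = sorted(SALES)
mention_series = [counts[h] for h in hours]  # hours with no mentions count as 0
sales_series = [SALES[h] for h in hours]
r = pearson(mention_series, sales_series)
print(f"correlation between mentions and sales: {r:.2f}")
```

A strong correlation here would be a prompt for further exploration rather than proof of cause and effect, which fits the exploratory spirit of the approach.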

Technology alert!

Notable Hadoop projects for the analysis of data include:

  • Hive, a data warehouse infrastructure for data summarization and ad hoc querying.
  • Mahout, a scalable machine learning and data mining library.
Again we see our traditional commercial incumbents making use of the more analytical components of a business intelligence solution, notably SAS, SPSS and SSAS. Theoretically though, the data layer is transparent to most reporting products, which just see the Big Data solution as another data source.

The Strategic Coalition

So we have looked at how conceptually Big Data is different to a traditional EDW, but to be fair we aren’t comparing apples with apples, we are however comparing a couple of fruits that mix well, maybe more of an orange and mango? Apple and Guava? Ok, you get the point. This is where the human factor can intervene to determine the appropriate use for these software and hardware based solutions.

I alluded to someone in a long white coat earlier on, which was a reference to the term “Data Scientist”. The definitions for this term are many and varied, and not always associated with Big Data, but in short, a data scientist is someone who understands the traditional practices of business and data analysis. What sets them apart is an innovative approach to sourcing and delivering information in a way that influences how an organization approaches a business challenge. Well, in a world where terms come and go at the speed of light, I’d like to coin one I prefer: “Data Pioneer”. I think it’s more fitting given the explorative nature of Big Data, not only in assessment, but in picking which trail to follow strategically to arrive at real value.

If we look at the benefits of both schools of thought we find a more holistic solution. We can supplement the inherent benefits of conformed, well defined data that suits a more established way of measuring our business’s success with a more dynamic, explorative assessment of a volatile marketplace, with unsurpassed speed of delivery.

Who says pioneering went out with the wild west! Just watch out for the cowboys!

Business Accelerated BI

Want to increase business adoption of Business Intelligence, save a bunch of cash and build BI solutions quickly? Believe it or not, good technology partners want you to as well!

In this article I’m going to demonstrate 3 simple ways you can accelerate your BI initiative and drive more value from BI Vendors/IT departments today, not tomorrow.

It’s time to kick start your BI Heart!

Those of you who read my last post on Enterprise BI Strategy will have a better picture of where I see the responsibilities of the technology piece sitting: firmly on the shoulders of a CIO or equivalent function. But how can the business take an active role in a BI initiative, and help to accelerate its delivery and uptake?

Dream it and it will come

More often than not, when business users are taken through a traditional requirements gathering phase, the focus tends to be on what would support their current role using data available today. It’s an interesting challenge. With BI supporting strategic decision making for the future, the balance between being relevant today and being relevant tomorrow poses architectural challenges for IT folk, which can be mitigated to some extent with a little bit of forethought.

Business people by nature are an efficient lot, motivated to adopt processes that provide the most value for a manageable amount of effort on a recurring basis, which is why sometimes it’s very hard to take a step back and look at alternatives to the way their business is conducted today, or at how a changing marketplace may force the way it’s conducted in the future. From my experience, an open dialogue around some of the business’s expectations and challenges for the future, along with the type of BI tools that can provide clarity around those opportunities, is extremely helpful when designing and building a BI solution.

Often, when pressed, business users find it hard to articulate their exact requirements, and I believe that comes about for a couple of reasons. Firstly, there is a reluctance to lock those requirements down too tightly because of the dynamic nature of businesses in general; keep in mind though that broad requirements don’t lead to focused outcomes. Secondly, business users don’t have vast amounts of the type of BI experience that would help them make long standing architectural decisions.

The value here though is in the types of initiatives, ideas, forecasts, that can help provide an architectural direction and strategy for future BI development.

  • Lock down what you need today.
  • Think about what you need tomorrow.
  • BI is an evolutionary process, you don’t have to get it all right, all the time.
  • Your insight as a business stakeholder is vital to BI, speculative or otherwise.

Understand the Jargon

Client facing BAs, technical PMs and developers have, as an inherent part of their role, made it a point to understand business terminology and to be able to translate technical concepts into business terms for a number of different audiences. There is a general resistance from a number of business areas to reciprocate the knowledge sharing process, which is fine! A perfectly plausible explanation is utilisation: best to let IT departments or vendors deal with providing systems and solutions, allowing the business to do what it does best (selling, service provision and so on). But it comes at a cost in terms of time and dollars.

“Incomprehensible jargon is the hallmark of a profession.” – Kingman Brewster Jr

“Communication Transition Points” take a generally rich and established business language and translate it to an equally rich and established technical language. These CTPs are like international films or political forums, that take seemingly obvious concepts and ideas and attempt to translate them in misfitting ways, generally running longer than we’d like, and costing more than they should! These CTP gateway activities occur throughout a BI initiative, but feature prominently in requirement gathering activities.

Should vendors take time to express themselves in a way that targets your organisation? Absolutely. Does this add an overhead to the project? Absolutely. A commitment from the business to understand even basic terms such as facts, dimensions and metrics, and concepts such as dimensional modelling or presentation layer vs data layer, makes a huge difference to the development process, and to the ability of the user to convey very clear and succinct requirements.
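
For readers who would like to see those basic terms in the flesh, here is a small sketch using Python’s built-in `sqlite3` module. The table names, columns and figures are all hypothetical: a fact table records the events being measured, dimension tables describe the "by what" (product, date), and metrics are the numbers you aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions: the descriptive context you slice and filter by.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_date TEXT, month TEXT);

-- Fact: one row per sale, holding keys to the dimensions plus the measures.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    quantity INTEGER,
    amount REAL
);

INSERT INTO dim_product VALUES
    (1, 'Kettle', 'Kitchen'), (2, 'Toaster', 'Kitchen'), (3, 'Lamp', 'Lighting');
INSERT INTO dim_date VALUES
    (20120105, '2012-01-05', '2012-01'), (20120210, '2012-02-10', '2012-02');
INSERT INTO fact_sales VALUES
    (1, 20120105, 2, 60.0), (2, 20120105, 1, 25.0),
    (3, 20120210, 4, 80.0), (1, 20120210, 1, 30.0);
""")

# Metrics: "amounts and counts", grouped by attributes of the dimensions.
rows = conn.execute("""
    SELECT p.category, d.month, SUM(f.amount) AS total_amount, COUNT(*) AS sale_count
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()

for category, month, total_amount, sale_count in rows:
    print(category, month, total_amount, sale_count)
```

A requirement phrased as "sales amount by product category by month" maps directly onto a query like this one, which is why even a little shared vocabulary can shorten the conversation.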

  • Miscommunication ultimately impacts business outcomes.
  • A little bit of jargon goes a long way.
  • CTPs are expensive and time consuming, so cut them down.
  • Clear communications give focused results.

Build a team that works

So put up your hand if you’ve dreamt the dream of BI success, and you’ve schooled yourself in another language! BI wants you! To build an effective team to deliver a true BI business outcome you need a range of technical and business resources who can share a respectful, open dialogue. Often the structuring of the team can play an extremely important part. It’s not two teams of disparate individuals fighting their way to different outcomes, it’s one cohesive group working through various technical and business challenges to produce a platform that supports real strategic advantage, so structure it as such.

A collaborative team of dedicated technical and business resources can cycle through iterations of BI deliverables in a way that adds value to your organisation today, and provides clarity over the abilities of the platform for the future. Sure, fundamentally the solution is about the provision of information, and is constrained by inputs throughout the organisation, but a data rich experience without a user base is just an expensive hobby. Get involved as often and for as long as you possibly can. At the end of the day the business stands to benefit most from successful implementations of BI, so invest in a common vocabulary, and invest your time in thinking about the future.

Build a team that partners for success. Good vendors look for ways to help you achieve real business outcomes, and whatever commercial agreement you reach should be structured in a way that facilitates that commitment, but you also need to do your part!

  • Foster a culture of collaboration between technical and business resources.
  • Create team structures that reinforce that relationship.
  • Get involved often and for as long as possible.
  • Focus on partnerships that provide outcomes. 

Enterprise BI Strategy

More often than not, when an Enterprise considers a new BI Strategy, that strategy starts, and sometimes ends, with a reporting tool selection process. Based primarily on new and improved reporting tools, with all the bells and whistles, Senior Executives and managers tend to converge on products that offer responsive high level metrics on the go.

There is, of course, another exercise in providing the foundation that supports the pointy end of the iceberg. The underlying data modelling, collection, management and processing often form the stodgy, unappetising portion of the project pie that’s hard to sell to business stakeholders. After all, what’s appealing about re-visiting business processes and modelling product hierarchies? Or even worse, working through abstract concepts like date and time dimensionality with the IT department? The coup de grâce, though, lies in the costs associated with those questions, often incurred before there’s a dashboard, cross-tab, or even proof of concept in sight that offers some tangible benefit to the business.

Dear reader, at this point let me unreservedly throw in the towel and separate the combatants! I should quickly point out that in no way was it my intention to set the Business in the blue corner and IT in the red, and imply that never the twain shall meet. On the contrary, in BI projects, more than any other, it is essential that the veil between business stakeholders and technology enablers is whisper thin. Almost by a process of osmosis, the very best business stakeholders make it their business to deliver robust and articulate requirements to the BI project team, whilst the best BI practitioners strive to understand standard and industry specific business concepts.

A change is as good as a holiday… for a week or two…

When business and BI specialists fail to align, the result is generally a number of segmented solutions. Small silos of data, held in Access databases and Excel spreadsheets by technically savvy business users, become the norm rather than the exception, and under that scenario de facto reporting super-users within the business can become the first port of call when decision making requires support. So what’s wrong with that?! Well to be honest, this type of behaviour is often born out of necessity.

The time taken to turn a business question into an answer can often be unwieldy, and it’s generally the impromptu questions that require an expedient response. The issue is that we never have a consolidated version of the truth, or a solid foundation for future growth. I have seen organisations (generally smaller ones) that manage to negotiate these spot fires relatively well. There tends to be a general cycle of smaller projects that more or less fail or succeed, encourage little or no excitement around new reporting opportunities, and entrench biases towards particular reporting/database/ETL products. These cycles, though, can do irreparable damage to the perception, uptake and iterative improvement of business intelligence in general.

Put your hand up if you’ve ever heard the term “Business Intelligence” and “Oxymoron” uttered in the same sentence by an end user! Well let’s start to work on a suitable retort!

Enter the Enterprise BI Strategy

So who can drive synergy between the Business, their data and the toolset that makes it useful? More often than not, this can be a group of people, but principally this sits with a CIO. BI could want for no greater champion than the CIO. Someone typically responsible for the provision of IT platforms to solve business problems (the microcosm of which conveniently looks a lot like an enterprise wide business intelligence system), is perfectly placed to drive a unified platform for disseminating information on which to base business decisions. The greatest visibility of underlying technology platforms, and the greatest responsibility to the CEO, and greater business community, generally combine in the greatest Business Intelligence interest!

“A perfection of means, and confusion of aims, seems to be our main problem.” – Albert Einstein

So now we know who, let’s get to the what! I read an article recently that looked at some of the challenges faced by a CIO in perfecting a BI Strategy.

These days there’s very little resistance to implementing some form of BI, so with all this good will, how come the most desired results are so elusive? Are the right people in your enterprise receiving the right information in a timely fashion? In my experience, without an adequate BI strategy the answer is uniformly no.

Sure, weigh the eggs, paint the basket, but don’t put them all in just yet…

So far in this article I’ve generally referred to “The Project”, and for ease, we always have to start somewhere, but the reality is that a BI strategy is unlike a project: it’s not a once-off activity. Both the BI environment and its governing strategy need to evolve and grow with the business and its requirements. The challenge here is developing a strategy that accommodates growth.

To get some predictive metrics on growth we can look at existing BI usage, but that’s only part of the picture. It makes sense from a process point of view, to tie system development and its potential requirement for additional reporting back to the BI environment.

Enterprise BI Strategies, by and large, accommodate the existing business units, and specifically target their growth. This is a pretty fair assumption. Finance, Marketing, Sales, HR and so on are a fairly safe bet in most organisations and we can make some rational decisions about the depth of growth along this structural tree, but don’t discount growth across its breadth as well. Special projects can come along and organisational restructures are common, so it’s worth giving this some thought. More importantly, this is where CIO visionary planning becomes critical.

  • What does your organisation look like today?
  • What will it look like in a year?
  • What will it look like in 10 years?

Solid architectural principles tell us that building a modular, iterative BI environment is a way to cater to all those questions, but what does that actually mean today? Traditionally, this meant conforming as many dimensions as possible, catering for different levels of aggregation to boost reporting performance, and providing as much business self-service as you can cater for. Some of the newer advancements in technology, like in-memory processing for example, challenge some of these tenets, making it increasingly hard to pin down technology, the toolsets and their importance. Either way, for me, specific technologies have no place in an Enterprise BI Strategy; it is more important to keep them on the periphery, to constantly challenge and drive value in delivery performance.
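
As a small illustration of what "catering for different levels of aggregation" can mean in practice, the Python sketch below pre-computes a monthly summary from daily detail, so that a monthly report reads a handful of summary rows instead of scanning every transaction. The `DAILY_FACTS` data and the function names are hypothetical, and this is only one simple way of expressing the idea.

```python
from collections import defaultdict

# Hypothetical daily facts: (ISO date, business unit, amount).
DAILY_FACTS = [
    ("2012-01-03", "Sales", 100.0),
    ("2012-01-17", "Sales", 50.0),
    ("2012-01-20", "Marketing", 20.0),
    ("2012-02-02", "Sales", 75.0),
]


def build_monthly_aggregate(facts):
    """Roll daily detail up to (month, business unit) once, ahead of reporting."""
    monthly = defaultdict(float)
    for date, unit, amount in facts:
        monthly[(date[:7], unit)] += amount  # "YYYY-MM" is the month level
    return dict(monthly)


def total_from_detail(facts, month, unit):
    """The slow path: scan every detail row at report time."""
    return sum(a for d, u, a in facts if d.startswith(month) and u == unit)


MONTHLY = build_monthly_aggregate(DAILY_FACTS)

# Both paths give the same answer; the aggregate is a single lookup.
print(MONTHLY[("2012-01", "Sales")], total_from_detail(DAILY_FACTS, "2012-01", "Sales"))
```

The trade-off is the one described above: an aggregate has to be designed and maintained in advance, whereas in-memory processing can make scanning the detail fast enough that some of these summaries are no longer needed.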

“I was a peripheral visionary. I could see the future, but only way off to the side. ” – Steven Wright

IBM’s 2011 Global CIO Study shows that the number one visionary planning activity, among CIO respondents from over 70 countries and 18 industries, is Business Intelligence and analytics; at number four is a rapid uptake of cloud planning. If we look at these things in isolation I think we do ourselves a disservice. Whether these two things go hand in hand at the moment is a question beyond the scope of this article, but it highlights a couple of things:

  • There is a need to provide a broader and more responsive array of meaningful business analytics, and this comes out of an Enterprise BI Strategy that caters for growth.
  • Cloud computing has been identified as a way of creating scalable solutions ideal for growth, in terms of infrastructure/platform as a service.

Let’s not hinge our entire Strategy on Cloud based services just yet, but there are clearly some exciting synergies between the two, only enhanced by the advancements in in-memory processing, and thin reporting clients that are scalable over a number of “on the go” distribution methods.

In Summary

  • Developing an all-encompassing Enterprise BI Strategy is a way of:
    • Providing the right people in your organisation with the right information at the right time.
    • Catering for growth.
  • CIO engagement is critical for success.
  • BI Strategy is a living, evolutionary process.
  • Technology is purely an enabler for your strategy that can continue to drive its value.
