Rarefied talent in data science, data technology, and analytics

What is Big Data?

» Posted by Frank Lo

IBM defines big data in fairly simple terms: managing huge amounts of data and being able to process it quickly. But we don't think this tells the whole story. We hope to provide a more complete picture, covering the different elements of big data, why it matters to business, and how it is transforming the industry.

"Without data, all you have is an opinion"

A misconception is that the "big data revolution" is about the size of the data. That misses the point. Really, it is about the application of data to reach deep insight, leveraging the possibilities that come from markedly improved data accessibility, analysis, and action.

Yes, "big data" is simply a buzz term coined to represent all of this, but we'll run with it because the term is so widely used. And yes, "big data" often involves warehousing data at a massive scale. But the true motivation – why enterprises invest so heavily in all of this – is not data collection. It is all about learning from that data.

Big data is a revolution in how business is done

What is the real change? Transforming the flow of information in a way that enables businesses to be more intelligent.

The Elements of Big Data

Big data comprises a few critical pieces that work together to bring value to the business:

Data Warehousing at massive scale

As the world has become increasingly digitized, the volume of data that we bring into our systems has grown dramatically; it is now common for large businesses to work with petabyte-scale information stores. To keep up with these needs, a myriad of innovative technologies have been developed that provide the infrastructure to manage such enormous volumes of data.

It is important to note that the technological challenge behind the infrastructure is not in storing all the data. It is in finding ways to be nimble with it, processing it efficiently enough to make the data actionable. Some of the technologies commonly in use include relational database systems, NoSQL database systems, and software ecosystems for distributed computing (e.g. Hadoop/MapReduce). Beyond the technologies themselves, these complex systems require talented platform developers and database administrators (DBAs) to build and maintain a company's data infrastructure.
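To give a rough feel for the MapReduce idea mentioned above, here is a minimal single-machine sketch in Python. It is not Hadoop code: the function names and the sample records are hypothetical, and a real cluster would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical input: each string stands in for a chunk of data held on a different node.
records = ["big data big insight", "data drives insight"]
counts = word_count(records)
print(counts)
```

The point of the pattern is that each map call only sees its own record and each reduce call only sees one key, so the work can be spread across as many machines as the data requires.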

For more details on this topic, read our in-depth page: About Big Data Technology

Business Intelligence (BI) and self-serve analytics

"Business Intelligence" (commonly abbreviated "BI") has come to mean the capability of connecting data with the rest of the company. Specifically, it is an important link between the data warehouse and business leaders and analysts, giving them a transparent, detailed view of what is going on in the business.

The BI group at a company accomplishes this by developing and maintaining a variety of tools that present the data to end-users in a digestible form. These tools and capabilities may include dashboards and OLAP tools.

The idea that everyone has access to the data and can try to understand it is known as self-serve analytics. It is powerful because it makes data central to all types of decision making, performance management, and business process management, so that departments are not just moving forward based on a hunch.

In particular, OLAP (online analytical processing) tools allow users to navigate through data very easily. They accomplish this with a framework that allows flexible "querying" of data with rapid execution time. The end result is simplicity: business users can get custom views of the data. For example, a marketing manager can easily get full visibility into week-over-week sales of a specific type of product coming through a specific acquisition channel, just by clicking a few buttons in an OLAP tool.
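Behind those few clicks, the marketing manager's question amounts to a slice on two dimensions followed by a roll-up by week. The sketch below illustrates that in plain Python; the table, its field names, and its values are all made up for illustration, and a real OLAP tool would run the equivalent query against pre-aggregated data rather than looping over rows.

```python
from collections import defaultdict

# Hypothetical fact table: one row per sale. Field names and values are illustrative only.
sales = [
    {"week": 1, "product_type": "shoes", "channel": "email", "amount": 100.0},
    {"week": 1, "product_type": "shoes", "channel": "search", "amount": 250.0},
    {"week": 1, "product_type": "hats", "channel": "email", "amount": 40.0},
    {"week": 2, "product_type": "shoes", "channel": "email", "amount": 150.0},
    {"week": 2, "product_type": "shoes", "channel": "email", "amount": 30.0},
    {"week": 3, "product_type": "shoes", "channel": "email", "amount": 90.0},
]


def weekly_sales(rows, product_type, channel):
    """Slice on product type and channel, then total the sales amount per week."""
    totals = defaultdict(float)
    for row in rows:
        if row["product_type"] == product_type and row["channel"] == channel:
            totals[row["week"]] += row["amount"]
    return dict(sorted(totals.items()))


def week_over_week(totals):
    """Fractional change from each week to the next (assumes nonzero weekly totals)."""
    weeks = sorted(totals)
    return {
        cur: (totals[cur] - totals[prev]) / totals[prev]
        for prev, cur in zip(weeks, weeks[1:])
    }


totals = weekly_sales(sales, product_type="shoes", channel="email")
changes = week_over_week(totals)
print(totals)
print(changes)
```

Changing the two filter arguments gives a different custom view of the same data, which is the flexibility the OLAP framework is meant to provide.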

This dashboarding/OLAP framework also makes answering data questions more straightforward for many types of analysts (e.g. marketing analysts, operations analysts, financial analysts). With these tools, analysts can dive in and slice and dice data at deep granularity to understand business elements at a nuanced level, without being burdened by the technical challenges of working with raw, unstructured information in a data warehouse.

Overall, Business Intelligence is a critical capability that liberates the data, allowing it to be used by everyone. It is a major step towards a company having an analytical culture with evidence-based decision making.

Data Science brings advanced learning capabilities

Data science takes knowledge discovery to the next level – it is all about learning deeply from data, using advanced techniques that include predictive modeling, causal inference, and pattern recognition through machine learning. It is an expert capability that requires a heavy dose of mathematical mastery, technical savvy, and business intuition. Its practitioners are known as data scientists, and they are generally considered to be very high-value employees in any company with big data ambitions.

Data science centers around asking tough questions and solving some of the most analytically challenging problems around business and data. It is reading between the lines and deriving deep inference from data – mining out key insight that is buried behind the noise, as well as developing powerful data-driven capabilities. At the end of the day, the goal of data science is to provide value through discovery by turning information into gold.

For more details on this topic, read our in-depth page: What is Data Science?

The Future of Big Data

The demand for big data talent and technology is exploding – investment in big data solutions in 2013 is projected to have tripled over the previous 2 years, and there is general industry consensus that big data in its current state is still very much emerging. As our world continues to become more information-driven year over year, some industry analysts predict that the big data market will easily expand by another 10x within the next decade.
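To put those two growth figures on a common footing, they can be converted into implied annual growth rates. This is only a back-of-the-envelope sketch: it assumes steady compound growth and treats "the next decade" as exactly 10 years, neither of which is stated in the projections themselves.

```python
def implied_annual_growth(multiple, years):
    """Constant yearly growth rate that turns 1x into `multiple` after `years` years."""
    return multiple ** (1.0 / years) - 1.0


# Tripling over 2 years (the recent investment figure quoted above).
recent = implied_annual_growth(3, 2)

# Growing 10x over an assumed 10-year horizon (the analyst prediction quoted above).
forecast = implied_annual_growth(10, 10)

print(f"recent: {recent:.1%} per year, forecast: {forecast:.1%} per year")
```

Under these assumptions, tripling in 2 years works out to roughly 73% growth per year, while 10x in a decade works out to roughly 26% per year. In other words, the 10x prediction assumes a considerably slower pace than the recent one.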

State of Big Data

Obviously, this visual is only the roughest of estimates of where big data is now on the maturity curve. But all signs point towards the next 5 to 10 years being an exciting time of growth for this space. Big data is already proving its value, allowing companies to operate at a new level of intelligence and sophistication, and this is only the beginning.