Big Data: The 4 Layers Everyone Must Know

Big Data still causes a lot of confusion in people's heads: What really is it? What is new and what is old wine in new bottles?

In order to bring a little more clarity to the concept it might help to describe the 4 key layers of a big data system - i.e. the different stages the data itself has to pass through on its journey from raw statistic or snippet of unstructured data (for example, social media post) to actionable insight.

The whole point of a big data strategy is to develop a system which moves data along this path. In this post, I will attempt to define the basic layers you will need to have in place in order to get any big data project off the ground.

Although people have come up with different names for these layers, as we’re charting a brave new world where little is set in stone, I think this is the simplest and most accurate breakdown:

Data sources layer

This is where the data is arrives at your organization. It includes everything from your sales records, customer database, feedback, social media channels, marketing list, email archives and any data gleaned from monitoring or measuring aspects of your operations. One of the first steps in setting up a data strategy is assessing what you have here, and measuring it against what you need to answer the critical questions you want help with. You might have everything you need already, or you might need to establish new sources.

Data storage layer

This is where your Big Data lives, once it is gathered from your sources. As the volume of data generated and stored by companies has started to explode, sophisticated but accessible systems and tools have been developed – such as Apache Hadoop DFS (distributed file system), which I cover in this article – or Google File System, to help with this task.

A computer with a big hard disk might be all that is needed for smaller data sets, but when you start to deal with storing (and analyzing) truly big data, a more sophisticated, distributed system is called for. As well as a system for storing data that your computer system will understand (the file system) you will need a system for organizing and categorizing it in a way that people will understand – the database. Hadoop has its own, known as HBase, but others including Amazon’s DynamoDB, MongoDB and Cassandra (used by Facebook), all based on the NoSQL architecture, are popular too. This is where you might find the Government taking an interest in your activities – depending on the sort of data you are storing, there may well be security and privacy regulations to follow.

Data processing/ analysis layer

When you want to use the data you have stored to find out something useful, you will need to process, and analyze it. A common method is by using a MapReduce tool (which I also explain in a bit more depth in my article on Hadoop). Essentially, this is used to select the elements of the data that you want to analyze, and putting it into a format from which insights can be gleaned. If you are a large organisation, which has invested in its own data analytics team, they will form a part of this layer, too. They will employ tools such as Apache PIG or HIVE to query the data, and might use automated pattern recognition tools to determine trends, as well as drawing their conclusions from manual analysis.

Data output layer

This is how the insights gleaned through the analysis is passed on to the people who can take action to benefit from them. Clear and concise communication (particularly if your decision-makers don’t have a background in statistics) is essential, and this output can take the form of reports, charts, figures and key recommendations.

Ultimately, your Big Data system’s main task is to show, at this stage of the process, how measurable improvement in at least one KPI that can be achieved by taking action based on the analysis you have carried out.

Data Science Central

« Cybersecurity Training, Military Style
Why Executives Need to Prioritise Cybersecurity »

Infosecurity Europe
CyberSecurity Jobsite
Perimeter 81

Directory of Suppliers

CYRIN

CYRIN

CYRIN® Cyber Range. Real Tools, Real Attacks, Real Scenarios. See why leading educational institutions and companies in the U.S. have begun to adopt the CYRIN® system.

TÜV SÜD Academy UK

TÜV SÜD Academy UK

TÜV SÜD offers expert-led cybersecurity training to help organisations safeguard their operations and data.

Infosecurity Europe, 3-5 June 2025, ExCel London

Infosecurity Europe, 3-5 June 2025, ExCel London

This year, Infosecurity Europe marks 30 years of bringing the global cybersecurity community together to further our joint mission of Building a Safer Cyber World.

Jooble

Jooble

Jooble is a job search aggregator operating in 71 countries worldwide. We simplify the job search process by displaying active job ads from major job boards and career sites across the internet.

MIRACL

MIRACL

MIRACL provides the world’s only single step Multi-Factor Authentication (MFA) which can replace passwords on 100% of mobiles, desktops or even Smart TVs.

Protegrity

Protegrity

Protegrity is an enterprise and cloud data security software for data-centric encryption and tokenization to protect sensitive data while maintaining usability.

Cyber Security Academy - University of Southampton

Cyber Security Academy - University of Southampton

An industry/University partnership established to advance cyber security through world class research, teaching excellence, industrial expertise and training capacity.

FinalCode

FinalCode

FinalCode offers a file encryption and file-based enterprise digital rights management (eDRM) platform.

Nuvias Group

Nuvias Group

Nuvias Group is a specialist value-addedd IT distribution company offering a service-led and solution-rich proposition ready for the new world of technology supply.

Rafael

Rafael

Rafael has more than 15 years of proven experience in the cyber arena providing solutions for national security as well as commercial applications.

Thinkst Applied Research

Thinkst Applied Research

Thinkst is an Applied Research company with a deep focus on information security.

Business Continuity

Business Continuity

Business Continuity delivers integrated IT solutions for cybersecurity, virtualization, cloud platforms and operational security solutions.

Zaviant Consulting

Zaviant Consulting

Zaviant Consulting is a leading data security and privacy consulting firm assisting organizations comply with constantly evolving security frameworks and privacy regulations.

Endure Secure

Endure Secure

Endure Secure is a managed cyber security & information security consultancy. Our passion for IS and our understanding of the threat landscape is reflected in the services that we provide.

Datapac

Datapac

Datapac is one of Ireland’s largest and most successful ICT solutions and services providers. We have been at the forefront of technology innovation in Ireland for the past three decades.

TrafficGuard

TrafficGuard

TrafficGuard is an award-winning digital ad verification and fraud prevention platform.

ConvergePoint

ConvergePoint

ConvergePoint is the leading compliance software provider on the Microsoft Office 365 SharePoint platform.

Contextal

Contextal

Contextal develops cutting-edge open-source cybersecurity solutions, designed to connect the dots and detect complex threats, which slip through the existing protections.

Information Security Society of Africa – Nigeria (ISSAN)

Information Security Society of Africa – Nigeria (ISSAN)

The Information Security Society of Africa – Nigeria (ISSAN) is a not-for-profit organization dedicated to the protection of Nigeria’s cyberspace.

Q-Bird

Q-Bird

Q*Bird's mission is to provide equipment for the current, and future European quantum internet.

TerraZone

TerraZone

TerraZone is a global cyber security and privacy solutions provider to governments and enterprises.