Big Data: The 4 Layers Everyone Must Know

Big Data still causes a lot of confusion in people's heads: What really is it? What is new and what is old wine in new bottles?

In order to bring a little more clarity to the concept it might help to describe the 4 key layers of a big data system - i.e. the different stages the data itself has to pass through on its journey from raw statistic or snippet of unstructured data (for example, social media post) to actionable insight.

The whole point of a big data strategy is to develop a system which moves data along this path. In this post, I will attempt to define the basic layers you will need to have in place in order to get any big data project off the ground.

Although people have come up with different names for these layers, as we’re charting a brave new world where little is set in stone, I think this is the simplest and most accurate breakdown:

Data sources layer

This is where the data is arrives at your organization. It includes everything from your sales records, customer database, feedback, social media channels, marketing list, email archives and any data gleaned from monitoring or measuring aspects of your operations. One of the first steps in setting up a data strategy is assessing what you have here, and measuring it against what you need to answer the critical questions you want help with. You might have everything you need already, or you might need to establish new sources.

Data storage layer

This is where your Big Data lives, once it is gathered from your sources. As the volume of data generated and stored by companies has started to explode, sophisticated but accessible systems and tools have been developed – such as Apache Hadoop DFS (distributed file system), which I cover in this article – or Google File System, to help with this task.

A computer with a big hard disk might be all that is needed for smaller data sets, but when you start to deal with storing (and analyzing) truly big data, a more sophisticated, distributed system is called for. As well as a system for storing data that your computer system will understand (the file system) you will need a system for organizing and categorizing it in a way that people will understand – the database. Hadoop has its own, known as HBase, but others including Amazon’s DynamoDB, MongoDB and Cassandra (used by Facebook), all based on the NoSQL architecture, are popular too. This is where you might find the Government taking an interest in your activities – depending on the sort of data you are storing, there may well be security and privacy regulations to follow.

Data processing/ analysis layer

When you want to use the data you have stored to find out something useful, you will need to process, and analyze it. A common method is by using a MapReduce tool (which I also explain in a bit more depth in my article on Hadoop). Essentially, this is used to select the elements of the data that you want to analyze, and putting it into a format from which insights can be gleaned. If you are a large organisation, which has invested in its own data analytics team, they will form a part of this layer, too. They will employ tools such as Apache PIG or HIVE to query the data, and might use automated pattern recognition tools to determine trends, as well as drawing their conclusions from manual analysis.

Data output layer

This is how the insights gleaned through the analysis is passed on to the people who can take action to benefit from them. Clear and concise communication (particularly if your decision-makers don’t have a background in statistics) is essential, and this output can take the form of reports, charts, figures and key recommendations.

Ultimately, your Big Data system’s main task is to show, at this stage of the process, how measurable improvement in at least one KPI that can be achieved by taking action based on the analysis you have carried out.

Data Science Central

« Cybersecurity Training, Military Style
Why Executives Need to Prioritise Cybersecurity »

CyberSecurity Jobsite
Check Point

Directory of Suppliers

ManageEngine

ManageEngine

As the IT management division of Zoho Corporation, ManageEngine prioritizes flexible solutions that work for all businesses, regardless of size or budget.

TÜV SÜD Academy UK

TÜV SÜD Academy UK

TÜV SÜD offers expert-led cybersecurity training to help organisations safeguard their operations and data.

BackupVault

BackupVault

BackupVault is a leading provider of automatic cloud backup and critical data protection against ransomware, insider attacks and hackers for businesses and organisations worldwide.

Clayden Law

Clayden Law

Clayden Law advise global businesses that buy and sell technology products and services. We are experts in information technology, data privacy and cybersecurity law.

NordLayer

NordLayer

NordLayer is an adaptive network access security solution for modern businesses — from the world’s most trusted cybersecurity brand, Nord Security. 

Axiomtek

Axiomtek

Axiomtek is a leading design and manufacturing company in the industrial computer and embedded field.

Sigma IT

Sigma IT

SIGMA IT is one of the largest IT services organizations in EMEA region providing a full range of solutions and services including cybersecurity, data protection and business continuity.

Cloud GRC

Cloud GRC

Cloud GRC is an innovative cybersecurity company with solutions and expertise in Cybersecurity Strategies & Frameworks, Threat & Risk Assessment, Cloud Security, and Regulatory Compliance Requirements

Scythe

Scythe

SCYTHE is a next generation red team platform for continuous and realistic enterprise risk assessments.

Salt Cybersecurity

Salt Cybersecurity

Salt Cybersecurity offer a four-pronged approach to information security that includes Custom Security Policy, Vulnerability Assessment, Threat Detection, and Security Awareness Training.

Carson McDowell

Carson McDowell

Carson McDowell are one of Northern Ireland's leading law firms. We are the law firm of choice for many of Northern Ireland's Top 100 companies as well as international companies doing business here.

Finnish Security & Intelligence Service (SUPO)

Finnish Security & Intelligence Service (SUPO)

The Finnish Security and Intelligence Service is a government agency tasked with combating serious threats to national security in Finland.

Oasis Technology

Oasis Technology

Oasis Technology are experts in cyber security. In addition to pioneering the game-changing TITAN anti-hacking device, we provide extensive cyber security consulting services.

Ping Identity

Ping Identity

At Ping Identity, we believe in making digital experiences both secure and seamless for all users, without compromise. That’s digital freedom.

Three Wire Systems

Three Wire Systems

Three Wire is a leader in innovative and efficient technology solutions for government agencies and large enterprise corporations.

Trustaira

Trustaira

Trustaira is the first deep tech solution and service company in Bangladesh.

Sasken Technologies

Sasken Technologies

Sasken’s Cybersecurity Services enables enterprises to develop, maintain, and take digital products to the market with security postures that empower operational excellence.

CorePLUS Technologies

CorePLUS Technologies

CorePlus solutions are designed to empower organizations with the tools they need to ensure the utmost protection for their assets, people, and information.

The Aerospace Corporation

The Aerospace Corporation

The Aerospace Corporation is playing a key role in advancing space cybersecurity through innovative prototypes that can quickly detect and mitigate cyber threats.

Aikido Security

Aikido Security

Aikido is the no-nonsense security platform for developers. Secure your code, cloud, and runtime in one central system. Find and fix vulnerabilities automatically.

Tactic Lab

Tactic Lab

Tactic Lab is a group of cybersecurity experts and managed security services provider focused on offensive and defensive security.