Big Data: The 4 Layers Everyone Must Know

Big Data still causes a lot of confusion in people's heads: What really is it? What is new and what is old wine in new bottles?

In order to bring a little more clarity to the concept it might help to describe the 4 key layers of a big data system - i.e. the different stages the data itself has to pass through on its journey from raw statistic or snippet of unstructured data (for example, social media post) to actionable insight.

The whole point of a big data strategy is to develop a system which moves data along this path. In this post, I will attempt to define the basic layers you will need to have in place in order to get any big data project off the ground.

Although people have come up with different names for these layers, as we’re charting a brave new world where little is set in stone, I think this is the simplest and most accurate breakdown:

Data sources layer

This is where the data is arrives at your organization. It includes everything from your sales records, customer database, feedback, social media channels, marketing list, email archives and any data gleaned from monitoring or measuring aspects of your operations. One of the first steps in setting up a data strategy is assessing what you have here, and measuring it against what you need to answer the critical questions you want help with. You might have everything you need already, or you might need to establish new sources.

Data storage layer

This is where your Big Data lives, once it is gathered from your sources. As the volume of data generated and stored by companies has started to explode, sophisticated but accessible systems and tools have been developed – such as Apache Hadoop DFS (distributed file system), which I cover in this article – or Google File System, to help with this task.

A computer with a big hard disk might be all that is needed for smaller data sets, but when you start to deal with storing (and analyzing) truly big data, a more sophisticated, distributed system is called for. As well as a system for storing data that your computer system will understand (the file system) you will need a system for organizing and categorizing it in a way that people will understand – the database. Hadoop has its own, known as HBase, but others including Amazon’s DynamoDB, MongoDB and Cassandra (used by Facebook), all based on the NoSQL architecture, are popular too. This is where you might find the Government taking an interest in your activities – depending on the sort of data you are storing, there may well be security and privacy regulations to follow.

Data processing/ analysis layer

When you want to use the data you have stored to find out something useful, you will need to process, and analyze it. A common method is by using a MapReduce tool (which I also explain in a bit more depth in my article on Hadoop). Essentially, this is used to select the elements of the data that you want to analyze, and putting it into a format from which insights can be gleaned. If you are a large organisation, which has invested in its own data analytics team, they will form a part of this layer, too. They will employ tools such as Apache PIG or HIVE to query the data, and might use automated pattern recognition tools to determine trends, as well as drawing their conclusions from manual analysis.

Data output layer

This is how the insights gleaned through the analysis is passed on to the people who can take action to benefit from them. Clear and concise communication (particularly if your decision-makers don’t have a background in statistics) is essential, and this output can take the form of reports, charts, figures and key recommendations.

Ultimately, your Big Data system’s main task is to show, at this stage of the process, how measurable improvement in at least one KPI that can be achieved by taking action based on the analysis you have carried out.

Data Science Central

« Cybersecurity Training, Military Style
Why Executives Need to Prioritise Cybersecurity »

CyberSecurity Jobsite
Perimeter 81

Directory of Suppliers

DigitalStakeout

DigitalStakeout

DigitalStakeout enables cyber security professionals to reduce cyber risk to their organization with proactive security solutions, providing immediate improvement in security posture and ROI.

Authentic8

Authentic8

Authentic8 transforms how organizations secure and control the use of the web with Silo, its patented cloud browser.

Alvacomm

Alvacomm

Alvacomm offers holistic VIP cybersecurity services, providing comprehensive protection against cyber threats. Our solutions include risk assessment, threat detection, incident response.

Resecurity

Resecurity

Resecurity is a cybersecurity company that delivers a unified platform for endpoint protection, risk management, and cyber threat intelligence.

North Infosec Testing (North IT)

North Infosec Testing (North IT)

North IT (North Infosec Testing) are an award-winning provider of web, software, and application penetration testing.

Latham & Watkins LLP

Latham & Watkins LLP

Latham & Watkins is an international law firm. Practice areas include Data Privacy, Security and Cybercrime.

FIRST Conference

FIRST Conference

Annual conference organised by the Forum of Incident Response and Security Teams (FIRST), a recognized global leader in computer incident response.

Ionic Security

Ionic Security

Ionic provide a high-assurance data protection and control platform built on strong encryption, fine-grain control and contextual analytics.

ClearDATA

ClearDATA

The ClearDATA Managed Cloud protects sensitive healthcare data using purpose-built DevOps automation, compliance and security safeguards, and healthcare expertise.

Datacom Systems

Datacom Systems

Datacom Systems is a leading manufacturer of network visibility solutions.

Terranova Security

Terranova Security

Terranova is dedicated to providing information security awareness programs customized to your internal policies and procedures.

Intelligent Waves

Intelligent Waves

Intelligent Waves holds and manages contracts to provide an array of intelligence, operational, communications and IT support to the USG in austere, forward-deployed, hazardous duty environments.

Neurosoft

Neurosoft

Neursoft is a fully integrated ICT company with Software Development, System Integration and Information Technology Security capabilities.

01 Communique Laboratory

01 Communique Laboratory

01 Communique Laboratory is an innovation leader in the new realm of Post-Quantum Cyber Security.

AArete

AArete

AArete is a global management and technology consulting firm specializing in strategic profitability improvement, digital transformation, and advisory services.

Silent Push

Silent Push

Silent Push maps all internet-facing infrastructure with searchable, advanced attributes, generating early indicators of potential threats that are tailored to your environment.

Aardwolf Security

Aardwolf Security

Aardwolf Security specialise in penetration testing to the highest standards set out by OWASP. We ensure complete client satisfaction and aftercare.

SecureAck

SecureAck

From our A-Op SaaS automation platform to Managed Automation-as-a-Service (MAaaS), SecureAck offer powerful security automation the way that best suits your organisation's needs.

nodeQ

nodeQ

At nodeQ, we are pioneering the future of computer networks, leveraging our deep expertise in quantum communication, artificial intelligence, and software-defined networking.

SecuCenter

SecuCenter

Secucenter is a trusted partner for SOC services, offering security expertise in a cost-effective way.

Potech

Potech

Potech provides masterful services in Information & Technology and Cybersecurity to multiple markets across the world.