The Future Of Big Data Science

Apache Spark: An open source tool is opening up new possibilities for Data Science

Apache Spark is the go-to tool for Data Science at scale. It is an open source, distributed computer platform which is the first tool in the Data Science toolbox which is built specifically with Data Science in mind. 

We all know that data volumes are growing at an alarming rate and in order to get the best value out of these datasets business need to be able to analyse the full breadth and depth of this data. Traditionally this has been achieved with the various NoSQL data-stores like Hadoop, MongoDb, ElasticSearch and countless others. What has been lacking is the ability to process this data for analytics. 

Analytics has either been achieved by writing complex MapReduce jobs or by picking particular aspects to analyse with Python or R. This works well in a lot of use cases, and typically a machine learning application only need be trained on a small part of the data or the feature engineering and population work means this happens naturally. However, when the need does arise to work with big datasets, (and this is only likely to grow), data science has been at a bit of a loss. That is no longer true with Apache Spark.

Spark is different from the myriad other solutions to this problem because it allows Data Scientists to develop simple code to perform distributed computing, and the functionality available in Spark is growing at an incredible rate. 

Much has been made in the Data Science community around Spark’s ability to train Machine Learning models at scale, and this is a key benefit, but the real value comes from being able to put an entire analytics pipeline into spark, right from the data ingestion and ETL processes, through the data wrangling and feature engineering processes through to training and execution of models. What's more with spark streaming and graphx spark can provide a much more complete analytics solution.

Spark 2.0 is already available as a preview and a full release is imminent and this will represent a real step forward with the unification of datasets and data-frames, everything you want to do analytically with data-frames becomes much faster. And this is also true for spark streaming with the "unending data-frame".

Information-Management

« Google Wants Your Medical Records
Cyber-Attack Takes Down Pokémon Go »

CyberSecurity Jobsite
Perimeter 81

Directory of Suppliers

ON-DEMAND WEBINAR: What Is A Next-Generation Firewall (and why does it matter)?

ON-DEMAND WEBINAR: What Is A Next-Generation Firewall (and why does it matter)?

Watch this webinar to hear security experts from Amazon Web Services (AWS) and SANS break down the myths and realities of what an NGFW is, how to use one, and what it can do for your security posture.

Authentic8

Authentic8

Authentic8 transforms how organizations secure and control the use of the web with Silo, its patented cloud browser.

North Infosec Testing (North IT)

North Infosec Testing (North IT)

North IT (North Infosec Testing) are an award-winning provider of web, software, and application penetration testing.

ManageEngine

ManageEngine

As the IT management division of Zoho Corporation, ManageEngine prioritizes flexible solutions that work for all businesses, regardless of size or budget.

DigitalStakeout

DigitalStakeout

DigitalStakeout enables cyber security professionals to reduce cyber risk to their organization with proactive security solutions, providing immediate improvement in security posture and ROI.

Leonardo

Leonardo

Leonardo (formerly Finmeccanica) is a global high-tech company in Aerospace, Defence, Security & Information Systems including Cybersecurity & ICT solutions.

baramundi software

baramundi software

baramundi software AG provides companies and organizations with efficient, secure, and cross-platform management of workstation environments.

CyberPrism

CyberPrism

CyberPrism provides SaaS solutions using proprietary technology, underpinned by industry-leading technical practitioners to protect OT within Government, Maritime and Industrial markets.

GreyCampus

GreyCampus

GreyCampus is a leading provider of training for working professionals in the areas of Project Management, Big Data, Data Science, Service Management, Quality Management and Information Security.

NanoVMs

NanoVMs

NanoVMs is the industry's only unikernel platform available today. NanoVMs runs your applications as secure, isolated virtual machines faster than bare metal installs.

CyGlass

CyGlass

CyGlass simply and effectively identifies, detects, and responds to threats to your network without requiring any additional hardware, software, or people.

RevBits

RevBits

RevBits provides high-performance cybersecurity solutions including email security, endpoint security, deception technology and PAM solution to enterprise companies and public sector organizations.

VIRTIS

VIRTIS

VIRTIS' mission is to provide today's leading organizations peace of mind that their entire digital network perimeter is safe from hackers and data breach.

AB Handshake

AB Handshake

AB Handshake offers a game-changing solution for telecom service providers that eliminates fraud on inbound and outbound voice traffic.

ZINAD IT

ZINAD IT

ZINAD is an information security company offering state-of-the-art cybersecurity awareness products, solutions and services.

VMware

VMware

VMware is a leading provider of multi-cloud services for all apps, enabling digital innovation with enterprise control.

FusionAuth

FusionAuth

FusionAuth is the customer authentication and authorization platform that makes developers' lives awesome.

American Binary

American Binary

American Binary is a Quantum Safe Networking (TM) and post-quantum encryption company.

Telenor Cyberdefence

Telenor Cyberdefence

Telenor Cyberdefence is a newly established (2024) cloud-born Managed Security Service Provider focused on the Nordic markets.

Linx Security

Linx Security

The Linx Identity Security platform enables identity, security, and IT ops teams to finally control the whole identity lifecycle.

DefectDojo

DefectDojo

DefectDojo is a DevSecOps and vulnerability management tool.