The Future Of Big Data Science

Apache Spark: An open source tool is opening up new possibilities for Data Science

Apache Spark is the go-to tool for Data Science at scale. It is an open source, distributed computing platform, and the first tool in the Data Science toolbox to be built specifically with Data Science in mind.

We all know that data volumes are growing at an alarming rate, and to get the best value out of these datasets businesses need to be able to analyse the full breadth and depth of their data. Traditionally, storing data at this scale has been handled by big data platforms and NoSQL data stores such as Hadoop, MongoDB, Elasticsearch and countless others. What has been lacking is the ability to process this data for analytics.

Analytics has been achieved either by writing complex MapReduce jobs or by picking particular aspects of the data to analyse with Python or R. This works well in a lot of use cases: typically a machine learning application only needs to be trained on a small part of the data, or the feature engineering and population work means this reduction happens naturally. However, when the need does arise to work with big datasets (and this is only likely to grow), data science has been at a bit of a loss. That is no longer true with Apache Spark.
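To give a feel for why hand-written MapReduce jobs can seem heavy for everyday analytics, here is a minimal single-machine sketch of the map, shuffle and reduce phases in plain Python. It is not Hadoop code and nothing here is distributed; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample lines are purely illustrative.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group emitted values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values collected for each key into a single count."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in grouped.items()}


def word_count(lines):
    """Chain the three phases together for a simple word count."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["big data big value", "data science at scale"]
    print(word_count(sample))
```

Even for a word count, the analyst has to think in terms of key/value pairs and explicit phases, which is the overhead the paragraph above refers to.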

Spark is different from the myriad other solutions to this problem because it allows Data Scientists to develop simple code to perform distributed computing, and the functionality available in Spark is growing at an incredible rate. 

Much has been made in the Data Science community of Spark's ability to train Machine Learning models at scale, and this is a key benefit. The real value, though, comes from being able to put an entire analytics pipeline into Spark: from data ingestion and ETL, through data wrangling and feature engineering, to the training and execution of models. What's more, with Spark Streaming and GraphX, Spark can provide a much more complete analytics solution.

Spark 2.0 is already available as a preview and a full release is imminent. It will represent a real step forward: with the unification of Datasets and DataFrames, everything you want to do analytically with DataFrames becomes much faster. The same is true for Spark Streaming, with the "unending DataFrame".
