The Future Of Big Data Science

Apache Spark: An open source tool is opening up new possibilities for Data Science

Apache Spark is the go-to tool for Data Science at scale. It is an open source, distributed computer platform which is the first tool in the Data Science toolbox which is built specifically with Data Science in mind. 

We all know that data volumes are growing at an alarming rate and in order to get the best value out of these datasets business need to be able to analyse the full breadth and depth of this data. Traditionally this has been achieved with the various NoSQL data-stores like Hadoop, MongoDb, ElasticSearch and countless others. What has been lacking is the ability to process this data for analytics. 

Analytics has either been achieved by writing complex MapReduce jobs or by picking particular aspects to analyse with Python or R. This works well in a lot of use cases, and typically a machine learning application only need be trained on a small part of the data or the feature engineering and population work means this happens naturally. However, when the need does arise to work with big datasets, (and this is only likely to grow), data science has been at a bit of a loss. That is no longer true with Apache Spark.

Spark is different from the myriad other solutions to this problem because it allows Data Scientists to develop simple code to perform distributed computing, and the functionality available in Spark is growing at an incredible rate. 

Much has been made in the Data Science community around Spark’s ability to train Machine Learning models at scale, and this is a key benefit, but the real value comes from being able to put an entire analytics pipeline into spark, right from the data ingestion and ETL processes, through the data wrangling and feature engineering processes through to training and execution of models. What's more with spark streaming and graphx spark can provide a much more complete analytics solution.

Spark 2.0 is already available as a preview and a full release is imminent and this will represent a real step forward with the unification of datasets and data-frames, everything you want to do analytically with data-frames becomes much faster. And this is also true for spark streaming with the "unending data-frame".

Information-Management

« Google Wants Your Medical Records
Cyber-Attack Takes Down Pokémon Go »

Infosecurity Europe
CyberSecurity Jobsite
Perimeter 81

Directory of Suppliers

CYRIN

CYRIN

CYRIN® Cyber Range. Real Tools, Real Attacks, Real Scenarios. See why leading educational institutions and companies in the U.S. have begun to adopt the CYRIN® system.

IT Governance

IT Governance

IT Governance is a leading global provider of information security solutions. Download our free guide and find out how ISO 27001 can help protect your organisation's information.

TÜV SÜD Academy UK

TÜV SÜD Academy UK

TÜV SÜD offers expert-led cybersecurity training to help organisations safeguard their operations and data.

XYPRO Technology

XYPRO Technology

XYPRO is the market leader in HPE Non-Stop Security, Risk Management and Compliance.

North Infosec Testing (North IT)

North Infosec Testing (North IT)

North IT (North Infosec Testing) are an award-winning provider of web, software, and application penetration testing.

Finjan Holdings

Finjan Holdings

Finjan solutions are aimed at keeping the web, networks, and endpoints safe from malicious code and security threats.

Visual Guard

Visual Guard

Visual Guard is a modular solution covering most application security requirements, from application-level security systems to Corporate Identity and Access Management Solutions.

SKKU Security Lab (seclab)

SKKU Security Lab (seclab)

SKKU Security Lab supports research and education in information security engineering. The lab is a part of the College of Software, Sungkyunkwan University.

California Cybersecurity Institute (CCI) - Cal poly

California Cybersecurity Institute (CCI) - Cal poly

The CCI provides a hands-on research and learning environment to explore new cyber technologies and train and test tactics alongside law enforcement and cyberforensics experts.

Mitre ATT&CK

Mitre ATT&CK

MITRE ATT&CK™ is a globally-accessible knowledge base of adversary tactics and techniques based on real-world observations.

Sixgill

Sixgill

Sixgill, an IoT sensor platform company, builds the universal data service and smart process automation software allowing any organization to effectively govern its IoE assets.

CyberSecurity Non-Profit (CSNP)

CyberSecurity Non-Profit (CSNP)

CyberSecurity Non-Profit (CSNP) is a 501(c)(3) non-profit organization dedicated to promoting cybersecurity awareness and education.

Adyta

Adyta

Adyta specializes in cybersecurity solutions adapted to the needs of sovereign institutions, business groups and other organizations that handle information and sensitive or classified data.

ConnectWise

ConnectWise

The Unified ConnectWise Platform offers intelligent software and expert services to easily run your business, deliver your services, secure your clients, and build your staff.

South West Cyber Resilience Centre (SWCRC)

South West Cyber Resilience Centre (SWCRC)

The South West Cyber Resilience Centre (SWCRC) is led by serving police officers, as part of a not-for-profit partnership with business and academia.

SRG Security Resource Group

SRG Security Resource Group

SRG Security Resource Group is a Canadian company dedicated to providing world-class Physical and Cyber Security services.

Cryptr

Cryptr

Cryptr provides plug and play authentication to manage all your authentication strategies in one place with just a few lines of code.

CAT Labs

CAT Labs

CAT Labs is building digital asset recovery and cybersecurity tools to enable governments to fight crypto crime and to protect investors from hacks, fraud and scams.

Agile Defense

Agile Defense

Agile Defense is an Information Technology services provider, delivering leading-edge Digital Transformation solutions to the Federal Government.

Getvisibility

Getvisibility

Getvisibility enables customers to detect, classify and protect sensitive information increasing data security, governance, compliance and lowering the risk of losing valuable data.

Luxembourg House of Cybersecurity (LHC)

Luxembourg House of Cybersecurity (LHC)

Luxembourg House of Cybersecurity (formerly SecurityMadeIn.lu) is the backbone of leading-edge cyber resilience in Luxembourg.