The Future Of Big Data Science

Apache Spark: An open source tool is opening up new possibilities for Data Science

Apache Spark is the go-to tool for Data Science at scale. It is an open source, distributed computer platform which is the first tool in the Data Science toolbox which is built specifically with Data Science in mind. 

We all know that data volumes are growing at an alarming rate and in order to get the best value out of these datasets business need to be able to analyse the full breadth and depth of this data. Traditionally this has been achieved with the various NoSQL data-stores like Hadoop, MongoDb, ElasticSearch and countless others. What has been lacking is the ability to process this data for analytics. 

Analytics has either been achieved by writing complex MapReduce jobs or by picking particular aspects to analyse with Python or R. This works well in a lot of use cases, and typically a machine learning application only need be trained on a small part of the data or the feature engineering and population work means this happens naturally. However, when the need does arise to work with big datasets, (and this is only likely to grow), data science has been at a bit of a loss. That is no longer true with Apache Spark.

Spark is different from the myriad other solutions to this problem because it allows Data Scientists to develop simple code to perform distributed computing, and the functionality available in Spark is growing at an incredible rate. 

Much has been made in the Data Science community around Spark’s ability to train Machine Learning models at scale, and this is a key benefit, but the real value comes from being able to put an entire analytics pipeline into spark, right from the data ingestion and ETL processes, through the data wrangling and feature engineering processes through to training and execution of models. What's more with spark streaming and graphx spark can provide a much more complete analytics solution.

Spark 2.0 is already available as a preview and a full release is imminent and this will represent a real step forward with the unification of datasets and data-frames, everything you want to do analytically with data-frames becomes much faster. And this is also true for spark streaming with the "unending data-frame".

Information-Management

« Google Wants Your Medical Records
Cyber-Attack Takes Down Pokémon Go »

CyberSecurity Jobsite
Perimeter 81

Directory of Suppliers

ON-DEMAND WEBINAR: What Is A Next-Generation Firewall (and why does it matter)?

ON-DEMAND WEBINAR: What Is A Next-Generation Firewall (and why does it matter)?

Watch this webinar to hear security experts from Amazon Web Services (AWS) and SANS break down the myths and realities of what an NGFW is, how to use one, and what it can do for your security posture.

Cyber Security Supplier Directory

Cyber Security Supplier Directory

Our Supplier Directory lists 6,000+ specialist cyber security service providers in 128 countries worldwide. IS YOUR ORGANISATION LISTED?

XYPRO Technology

XYPRO Technology

XYPRO is the market leader in HPE Non-Stop Security, Risk Management and Compliance.

BackupVault

BackupVault

BackupVault is a leading provider of automatic cloud backup and critical data protection against ransomware, insider attacks and hackers for businesses and organisations worldwide.

Authentic8

Authentic8

Authentic8 transforms how organizations secure and control the use of the web with Silo, its patented cloud browser.

Mitol PerfectBackup

Mitol PerfectBackup

Mitol PerfectBackup provide Enterprise Online Backup, Disaster Recovery and Cloud Computing Services.

Jumpsec

Jumpsec

Jumpsec provides penetration testing, security assessments, social engineering testing, cyber incident response, training and consultancy services.

PrivateCore

PrivateCore

We protect data-in-use from hackers trying to steal data such as encryption keys, certificates, intellectual property.

NSIDE Attack Logic

NSIDE Attack Logic

NSIDE Attack Logic simulates real-world cyber attacks to detect vulnerabilities in corporate networks and systems.

Jumio

Jumio

Jumio’s end-to-end identity verification and authentication solutions fight fraud, maintain compliance and onboard good customers faster.

International Accreditation Forum (IAF)

International Accreditation Forum (IAF)

The IAF is the world association of Conformity Assessment Accreditation Bodies. Its primary function is to develop a single worldwide programme of conformity assessment.

Genius Guard

Genius Guard

Genius Guard specializes in DDoS Protection, DDoS Protected Webhosting, HYIP Hosting, Bitcoin Hosting, Cryptocurrency Hosting.

Cythereal

Cythereal

Cythereal is the leader in predicting and preventing advanced malware attacks. Security Automation for the Overwhelmed Administrator.

ActZero

ActZero

ActZero’s security platform leverages proprietary AI-based systems and full-stack visibility to detect, analyze, contain, and disrupt threats.

Innovex Global

Innovex Global

Innovex is a full-service executive search and advisory business that engages with early-stage startups, scale-ups, and established businesses in the Fintech, Cybersecurity and Technology industries.

Policy Monitor

Policy Monitor

Policy Monitor is a cyber security company founded by experts with extensive experience in operational and risk management.

Techsolidity

Techsolidity

Techsolidity is an emerging e-learning platform that offers a wide range of upskilling programs worldwide in areas including cybersecurity.

Tracebit

Tracebit

Tracebit uses decoys to detect and respond to cloud intrusions in minutes.

MiDO Technologies

MiDO Technologies

MiDO Technologies has a mission to change the narrative around digital enabling tools on the continent of Africa and prepare African youth.

Barrier Networks

Barrier Networks

Barrier Networks are a Cyber Security Managed Service Provider that specialises in Network and Application security.

Blackwell Security

Blackwell Security

Blackwell is a driving force in healthcare cybersecurity, transforming how security operations are conducted within this critical sector.