The Future Of Big Data Science

Apache Spark: An open source tool is opening up new possibilities for Data Science

Apache Spark is the go-to tool for Data Science at scale. It is an open source, distributed computing platform, and the first tool in the Data Science toolbox built specifically with Data Science in mind.

We all know that data volumes are growing at an alarming rate, and to get the best value out of these datasets businesses need to be able to analyse the full breadth and depth of the data. Traditionally, this data has been held in distributed and NoSQL data stores such as Hadoop, MongoDB, Elasticsearch and countless others. What has been lacking is the ability to process this data for analytics.

Analytics has been achieved either by writing complex MapReduce jobs or by picking particular aspects of the data to analyse with Python or R. This works well in a lot of use cases: typically a machine learning application need only be trained on a small part of the data, or the feature engineering and population work means this reduction happens naturally. However, when the need does arise to work with big datasets (and this need is only likely to grow), data science has been at a bit of a loss. That is no longer true with Apache Spark.
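To make concrete what a MapReduce job involves, here is a minimal single-process sketch of the three phases (map, shuffle, reduce) applied to a word count. It is plain Python and distributes nothing; the function names and sample records are invented for illustration. In a real job each phase runs across many machines and the framework handles the shuffle.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(records):
    """Run the three phases in sequence on an in-memory list of records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = ["big data needs big tools", "Spark is one of those tools"]
    print(word_count(sample))
```

Even for a task this small, the logic has to be split into separate map and reduce steps, which is part of why anything more involved quickly becomes complex.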

Spark is different from the myriad other solutions to this problem because it allows Data Scientists to develop simple code to perform distributed computing, and the functionality available in Spark is growing at an incredible rate. 

Much has been made in the Data Science community of Spark's ability to train Machine Learning models at scale, and this is a key benefit. The real value, however, comes from being able to put an entire analytics pipeline into Spark: from data ingestion and ETL, through data wrangling and feature engineering, to the training and execution of models. What's more, with Spark Streaming and GraphX, Spark can provide a much more complete analytics solution.
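As a rough illustration of what putting the whole pipeline into Spark can look like, the sketch below uses PySpark's DataFrame and ML Pipeline APIs (it needs the `pyspark` package and Spark 2.0 or later). The column names, the CSV input and the choice of logistic regression are assumptions made for the example, not details from any particular project.

```python
# Hypothetical column names, chosen only for this illustration.
FEATURE_COLUMNS = ["age", "income"]
LABEL_COLUMN = "label"


def build_pipeline():
    """Chain feature engineering and model training into one Spark ML Pipeline."""
    # Imported here so the sketch can be loaded without Spark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import StandardScaler, VectorAssembler

    # Feature engineering: pack the numeric columns into one vector, then scale it.
    assembler = VectorAssembler(inputCols=FEATURE_COLUMNS, outputCol="raw_features")
    scaler = StandardScaler(inputCol="raw_features", outputCol="features")
    # Model: logistic regression is an arbitrary choice for the sketch.
    model = LogisticRegression(featuresCol="features", labelCol=LABEL_COLUMN)
    return Pipeline(stages=[assembler, scaler, model])


def run_pipeline(input_path):
    """Ingest a CSV file, clean it, train the pipeline and score held-out rows."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("pipeline-sketch").getOrCreate()

    # Ingestion: read the raw file into a DataFrame.
    raw = spark.read.csv(input_path, header=True, inferSchema=True)

    # ETL / wrangling: drop rows missing any column the model needs.
    clean = raw.dropna(subset=FEATURE_COLUMNS + [LABEL_COLUMN])

    # Training and execution: fit on one split, predict on the other.
    train, test = clean.randomSplit([0.8, 0.2], seed=42)
    fitted = build_pipeline().fit(train)
    predictions = fitted.transform(test)
    return predictions.select(*FEATURE_COLUMNS, LABEL_COLUMN, "prediction")
```

Every stage here operates on Spark DataFrames, so the same code can run on a laptop or on a cluster without the data leaving Spark between steps.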

Spark 2.0 is already available as a preview and a full release is imminent. It will represent a real step forward: with the unification of Datasets and DataFrames, everything you want to do analytically with DataFrames becomes much faster. This is also true for Spark Streaming, with the "unending data-frame".
