Transforming Your Database

What is a database? Once upon a time, it was simple. The database was a modern Bob Cratchit putting data in tables made up of very straight columns filled with one row per entry. Long, endless rectangles of information stretching on into the future.

The relational database has been the bedrock of modern computing. The vast majority of websites are just a bunch of CSS (Cascading Style Sheets – elements displayed on screen) or lipstick painted on top of SQL (Structured Query Language – used to communicate with a database).

Everything that makes us special is just another row in the big table of life.

The love affair with the big matrix of bits is slowly fading as developers are realizing that not everything fits into a simple table. And because developers are smart and obsessive about finding solutions for every need, they’ve started creating new and better places to store the information. The last few years have brought an explosion in other mechanisms for squirreling away our data.

Are these wonderful new options still databases? Does the data have to fit into some big matrix to be a database? Some like to use the term “data store” to differentiate the modern mechanisms because the word “database” is too tightly linked in our minds to the old tabular structure. We’ll leave that up to the philosophers. Data goes in and answers come out.

Here are eight ways that the database is being reinvented in new shapes and forms.

GPU Computing

Once upon a time, video cards were built to draw elaborate scenes for kids’ games, but now the so-called graphics processing units are doing plenty of non-graphical processing. Searching through data is one of the best non-graphical operations for them to tackle. And why not? Plowing through endless piles of data looking for a match is an inherently parallel operation made up of lots of rudimentary jobs (testing equality) repeated millions of times. So it is pretty simple to turn the job over to the thousands of processors in the GPU.

The biggest wins aren’t in answering each query (though each one can be many times faster) but in the preparation work, because there is little need for preprocessing. Many databases save time by maintaining an index, which is effectively a precomputed lookup structure for the searches it is built to support.

If this index is corrupted or destroyed, rebuilding it can take hours, days, or maybe even months. If the data can fit inside a GPU’s memory, though, you can usually get by without the index. If the data is changing quickly and most of the index is never used, then skipping the preprocessing can be quite effective.
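The tradeoff between scanning and indexing can be sketched in a few lines of plain Python. This is only an illustration on the CPU, not GPU code: the rows, the city column, and the function names are all made up for the example. The point is that the scan needs no preparation and every equality test is independent of the others, which is what makes it easy to spread across thousands of GPU cores, while the index must be built first and kept up to date.

```python
from collections import defaultdict

# Hypothetical sample rows: (id, city). Not from any real data set.
rows = [(1, "Oslo"), (2, "Lima"), (3, "Oslo"), (4, "Kyoto")]

def scan(rows, city):
    """Full scan: one independent equality test per row, no preparation."""
    return [row_id for row_id, row_city in rows if row_city == city]

def build_index(rows):
    """Precompute a city -> ids lookup. It must be rebuilt or updated when rows change."""
    index = defaultdict(list)
    for row_id, row_city in rows:
        index[row_city].append(row_id)
    return index

index = build_index(rows)             # the preprocessing step a scan avoids
scan_result = scan(rows, "Oslo")      # answers straight from the raw rows
index_result = index.get("Oslo", [])  # answers from the precomputed lookup
```

Both approaches return the same ids. The difference is where the work happens: up front for the index, at query time for the scan.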

Non-Volatile Memory (NVRAM)

Programmers who cut their teeth 50 years ago had it easy. They didn’t have to juggle data between the RAM and the disk with elaborate protocols for ensuring consistency. That’s because the memory back then was magnetic core and wasn’t erased when the power was turned off. Those good times may be back again soon because chip manufacturers are talking about replacing RAM with NVRAM, or non-volatile memory.

This is a big game changer for database programmers because one of their biggest challenges (and even their greatest reason for living) is disappearing. Some suggest that the databases can get much faster because the transaction semantics can be simpler. Others float the idea of building the recovery log after the data is written to the media, not before.

No one knows how the dust will settle. Will people still use a database at all if they don’t need a permanent record? Or will the searching and indexing keep them coming back? All of the algorithms and all of the architectures are up for rethinking. We’ll know the best way to use NVRAM in a decade or so.

Scale-out SQL

When the NoSQL movement began, one of the big features was the ability to spread your data storage across multiple nodes. NoSQL databases like Cassandra and MongoDB made it seem like getting all of the nice features of large-scale storage meant abandoning the comfortable world of SQL.

In reality, there doesn’t need to be a tradeoff. While the earliest experiments in large-scale databases were easier to create because they left behind all of the SQL baggage, there’s no reason why SQL can’t work well across multiple machines running at huge scale. Indeed, companies like Oracle have been doing it for years.

The newest large-scale databases let you use all of your SQL knowledge and convenience with a set of data spread out across a big cluster. CockroachDB, for instance, offers a standard SQL query engine that accesses data replicated in multiple nodes, all with ACID guarantees. Yes, you’ll pay for some of this belt-and-suspenders support for data consistency, but perhaps less than you expected.

If guaranteed consistency is important to your work, start by checking out stacks like CockroachDB, Google Cloud Spanner, Clustrix, Azure SQL, and NuoDB.

Geospatial Databases

Traditional databases are built for one-dimensional data sets, not the two-dimensional coordinates from geography. You can fake it and use a standard database to accomplish basic tasks with geographic coordinates. If you stick latitude and longitude in separate columns, it’s not hard to search for rows that fall within a box defined by a range of latitudes and longitudes. But once you want to go beyond this basic box, standard SQL queries just don’t cut it.
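The basic box search described above can be sketched with Python’s built-in sqlite3 module. The table name, column names, and sample cities are assumptions chosen for the illustration. The sketch also ignores boxes that cross the 180-degree meridian.

```python
import sqlite3

# In-memory database with a hypothetical "places" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (name TEXT, lat REAL, lon REAL)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?, ?)",
    [
        ("Paris", 48.8566, 2.3522),
        ("Brussels", 50.8503, 4.3517),
        ("Madrid", 40.4168, -3.7038),
    ],
)

def places_in_box(conn, lat_min, lat_max, lon_min, lon_max):
    """Return names of rows whose coordinates fall inside the box."""
    cur = conn.execute(
        "SELECT name FROM places "
        "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ? "
        "ORDER BY name",
        (lat_min, lat_max, lon_min, lon_max),
    )
    return [name for (name,) in cur]

in_box = places_in_box(conn, 45.0, 52.0, 0.0, 5.0)
```

Two range tests are all it takes for a rectangle. A circle, a polygon, or a "nearest ten" query has no such simple form in plain SQL, which is where the geospatial extensions come in.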

Geospatial databases add a few extra functions that make searching, sorting, and intersecting much easier in two-dimensional space. Spatial indices, for instance, usually work by adding a grid on top of the coordinate space to make it much faster to search for rows that are adjacent in two-dimensional and three-dimensional worlds.
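A grid index of this kind can be sketched as a dictionary keyed by cell. This is a toy version under stated assumptions: the one-degree cell size, the point names, and the helper functions are invented for the example, and production systems often use more elaborate structures.

```python
from collections import defaultdict

CELL = 1.0  # cell size in degrees; an arbitrary choice for this sketch

def cell_of(lat, lon):
    """Map a coordinate to the (row, column) of the grid cell containing it."""
    return (int(lat // CELL), int(lon // CELL))

def build_grid(points):
    """Group point names by the grid cell they fall in."""
    grid = defaultdict(list)
    for name, lat, lon in points:
        grid[cell_of(lat, lon)].append(name)
    return grid

def nearby(grid, lat, lon):
    """Names in the cell containing (lat, lon) and its eight neighbors."""
    row, col = cell_of(lat, lon)
    found = []
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            found.extend(grid.get((row + d_row, col + d_col), []))
    return sorted(found)

# Hypothetical points: (name, latitude, longitude)
points = [("A", 10.2, 20.7), ("B", 10.9, 21.1), ("C", 40.0, -3.7)]
grid = build_grid(points)
close = nearby(grid, 10.5, 20.5)
```

The search touches only nine cells no matter how many points are stored elsewhere, which is why adjacent rows come back quickly.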

These indices make it possible to write queries with operations like “contain,” “overlap,” and even “touch” with sets that are defined by polygons. All of this makes reasoning about the real world that much more efficient.

Check out Neo4j Spatial, GeoMesa, MapD, and PostGIS for some good places to begin.

Graph Databases

Tables are a good repository for many data structures but they don’t do a great job of modeling one big, emerging data structure that has powered the last 10 years of Internet evolution: the network. As the so-called “social graph” explodes, we’re filling our computers with more and more nodes with links between them.

And the connections between the nodes are often more important than the data in them. Sure, storing and retrieving one link between one pair of nodes is easy to do in a classic relational database, but more complicated queries quickly become impractical. Is Bob two or three hops away from Chris in the friendship network? Is Mary dating the ex of one of her friends?

Graph databases make queries like this easier to run. There is no endless fetching from tables because the query knows how to look in the neighborhood specified by the links. Tools like Neo4j, OrientDB, and DataStax are just a few of the options that now can barely be counted on two hands and two feet. They have their own query languages too.
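The hop-count question about Bob and Chris comes down to a breadth-first search that follows links outward from one node. Here is a minimal sketch in Python. The friendship data and the hops function are invented for the illustration, and this is not how any particular graph database implements its queries.

```python
from collections import deque

# Hypothetical friendship graph stored as an adjacency list.
friends = {
    "Bob": ["Alice", "Dan"],
    "Alice": ["Bob", "Chris"],
    "Dan": ["Bob"],
    "Chris": ["Alice"],
    "Mary": [],
}

def hops(graph, start, goal):
    """Return the fewest links from start to goal, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, distance = queue.popleft()
        if person == goal:
            return distance
        # Only the immediate neighborhood is examined at each step.
        for friend in graph.get(person, []):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, distance + 1))
    return None

bob_to_chris = hops(friends, "Bob", "Chris")
```

In a relational table of friendships, each extra hop would typically mean another self-join. Here each step simply follows the links stored with the node.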

Cloud Databases

One of the biggest changes lies in how we buy database software. In the past, we bought our own machines and signed licensing deals to run the software on our machines. Now the cloud companies are offering services that store blobs of data off somewhere that we can’t see or touch. They just say the data will be there when we want it.

The advantages are obvious. There is no need to maintain the server or the room holding it. There is no need to worry about licensing or configuration or installing patches. Someone else deals with all of those headaches. The solution is often cheaper too — especially if you don’t have a ton of data to store. The services usually charge by the byte.

But the dangers, if there are any, are lying in the shadows. Does someone else have access to the data? Is the server protected from power surges, lightning storms, or floods? Is the data backed up to a trustworthy offsite location? You’ve got to trust the cloud provider on everything.

Major cloud service providers Google, Microsoft, and Amazon offer a long list of database services. These days Oracle, MongoDB, and DataStax also make their databases available in the cloud.

Artificial Intelligence (AI)

Some say that artificial intelligence is just a term for the latest generation of research that is rolling out of the labs and into production. If so, there are a number of new products and solutions adorned with buzzwords like “machine learning” or “neural networks” or “deep learning.” They may not seem like a database, but you fill them with data and ask them questions. Why not?

The good news from artificial intelligence solutions is that you don’t need to know what you’re looking for. You can just wave your hand and ask for something nebulous like “most interesting” or “closest.” There is no need for the right key, the infernal reference number that the customer service folks are always asking you to write down.
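A query for “closest” can be sketched as a nearest-neighbor search: each item is described by a list of numbers, and the answer is whichever item sits nearest to the query. The item names and two-number feature vectors below are made up for the illustration. Real systems derive such vectors from a trained model and use far more dimensions.

```python
# Hypothetical items described by made-up two-number feature vectors.
items = {
    "post_a": (0.9, 0.1),
    "post_b": (0.2, 0.8),
    "post_c": (0.6, 0.5),
}

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def closest(items, query):
    """Return the name of the item whose vector is nearest to the query."""
    return min(items, key=lambda name: squared_distance(items[name], query))

best = closest(items, (0.25, 0.75))
```

No key or reference number is involved. The query is just another point, and something always comes back, whether or not it is what you had in mind.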

The bad news is that you won’t know if you’ve gotten the right answer because you didn’t specify the question with any precision. Is that blog post really the most interesting? The biggest secret for Google’s success is that there is no absolute right answer. If you’re in the ballpark, no one can complain.
 
The list of machine learning toolkits is almost too long to contemplate. You can always ask your favorite search engine for the “most interesting” AI.

Blockchain

The word blockchain may be tangled up with the complicated economics and politics of Bitcoin, but underneath all of the talk about currency is an extremely stable and practical distributed data store. Everyone has a chance to update the data in the long table, and everyone sees the same result. That shared, agreed-upon record is the big excitement. It’s perfect for businesses that are frenemies.

Some developers take this a bit further and talk about “smart contracts,” which is another way of saying that the bits in the database are trustworthy enough for people to base legal issues like ownership upon them. You can’t do that with a regular database, which can be tweaked by anyone with administrative privileges.
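One reason the bits are harder to tweak quietly is that each block carries a hash of the block before it. The sketch below shows only that hash-chaining idea with Python’s standard hashlib. The block layout and function names are assumptions for the illustration, and it leaves out signatures, consensus, and distribution across many machines.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(index, data, prev_hash):
    """SHA-256 over the block's contents, including the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_block(chain, data):
    """Add a block that is linked to the current end of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    index = len(chain)
    chain.append({
        "index": index,
        "data": data,
        "prev": prev_hash,
        "hash": block_hash(index, data, prev_hash),
    })

def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    prev_hash = GENESIS
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev"]):
            return False
        prev_hash = block["hash"]
    return True

chain = []
append_block(chain, "Alice pays Bob 5")
append_block(chain, "Bob pays Carol 2")
valid_before = is_valid(chain)

chain[0]["data"] = "Alice pays Bob 500"  # a quiet edit by an "administrator"
valid_after = is_valid(chain)
```

The edit is detected because the stored hash no longer matches the altered contents. A single holder could recompute every hash after the change, so in practice the protection also depends on many parties keeping their own copies.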

There are weak points, though. Each user must maintain a private key because all transactions must be digitally signed. If that key gets lost or forgotten, the data in those rows is frozen forever. If that key gets stolen, well, all bets are off. The blockchain isn’t perfect, in other words, but it’s much more reliable than the standard model.

R3, Ripple, and IBM are just three of the many competitors exploring the space. Many of the leading banks have their own internal projects. And then there are the Bitcoin and altcoin companies themselves, which are also big parts of the ecosystem.

Infoworld
