Graph Databases: Not Your Father’s Big Data

Facebook is one of the most widely used web sites, and should be one of the most-studied, but some developers are still unaware of one of the site’s key technology components.

This component surfaced in 2013 when Facebook described “Social Graph,” an internally developed project based on graph theory. For those who never got a brief introduction in a discrete mathematics class, graph theory, in oversimplified terms, models relationships between pairs of objects mathematically. Each object is represented as a node (or vertex), and the lines linking nodes are called edges. Graph databases use nodes, edges and properties to organize and access information. A connection between you (a node) and a friend (another node) can be thought of as an edge.
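To make the node, edge and property vocabulary concrete, here is a minimal in-memory sketch in Python. The `PropertyGraph` class, its method names and the sample data are all hypothetical and written for this illustration; they are not Facebook's implementation or any particular database's API.

```python
from collections import defaultdict


class PropertyGraph:
    """Tiny in-memory property graph: nodes and edges both carry properties."""

    def __init__(self):
        self.nodes = {}                 # node id -> properties dict
        self.edges = defaultdict(list)  # node id -> [(label, target id, properties)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, label, target, **props):
        # Assumption: the relationship is mutual, so store it in both directions.
        self.edges[source].append((label, target, props))
        self.edges[target].append((label, source, props))

    def neighbors(self, node_id, label):
        # Follow only edges that carry the requested label.
        return sorted(t for (l, t, _) in self.edges[node_id] if l == label)


graph = PropertyGraph()
graph.add_node("you", name="You")
graph.add_node("alice", name="Alice")
graph.add_edge("you", "FRIEND", "alice", since=2013)

print(graph.neighbors("you", "FRIEND"))  # ['alice']
```

A real graph database adds persistence, indexing and a query language on top of this idea, but the data model is the same: you and your friend are nodes, and the friendship is an edge with its own properties.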

Some Graph Apps

Some of the applications where graph databases are being used are familiar, even if this flavor of NoSQL database is less frequently associated with Big Data trends. A network administrator typically draws a chart representing workstations and servers in her network. When the network grows to encompass thousands of devices, software must be used to represent the network. The infrastructure behind Sprint or Verizon’s wireless network of towers and computers is no back-of-the-envelope affair.

A short list of other graph database uses might include:

  • Logistics routing
  • Financial transaction event-actor tracing for fraud detection
  • Linguistics
  • Molecular structure models in chemistry
  • Animal breeding or migration patterns
  • Transportation networks
  • Friendship / friend-of-friend analytics
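
The friend-of-friend case in the list above is a two-hop traversal: follow every friendship edge from a person, then follow the edges of each friend. A rough sketch, assuming the graph is held as a plain dictionary of sets (the `friends_of_friends` helper and the names in the data are made up for this example):

```python
def friends_of_friends(friends, person):
    """Return people exactly two hops away who are not already direct friends."""
    direct = friends.get(person, set())
    result = set()
    for friend in direct:
        for candidate in friends.get(friend, set()):
            # Skip the starting person and anyone who is already a direct friend.
            if candidate != person and candidate not in direct:
                result.add(candidate)
    return result


# Hypothetical friendship graph: each key maps to that person's direct friends.
friends = {
    "ann": {"bob", "cara"},
    "bob": {"ann", "dev"},
    "cara": {"ann", "dev", "eli"},
    "dev": {"bob", "cara"},
    "eli": {"cara"},
}

print(sorted(friends_of_friends(friends, "ann")))  # ['dev', 'eli']
```

In a relational schema the same question usually needs a self-join on a friendship table; in a graph model it is just a walk along edges.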

Triple Threat

One of the most important applications for graph databases is for ontologies and the semantic web. As most commonly implemented, a database of known facts and relationships can be represented using the Resource Description Framework (RDF). In RDF, the basic relationship is described using a subject-predicate-object expression (comparable to subject-verb-object, or SVO, in linguistics). Each such expression is called a triple, hence the shorthand “triple store” for a database that holds them. While Big Data RDF applications do not require graph databases, there are some operations that are more naturally performed with them.
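
As a rough illustration of the triple idea, the sketch below stores subject-predicate-object tuples in a Python set and matches them against a pattern with wildcards. The `TripleStore` class and the sample facts are invented for this example; a production system would use an RDF library and a SPARQL engine rather than anything this simple.

```python
class TripleStore:
    """Minimal triple store: a set of (subject, predicate, object) tuples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


# Made-up facts, purely for illustration.
store = TripleStore()
store.add("alice", "knows", "bob")
store.add("bob", "knows", "carol")
store.add("alice", "worksFor", "acme")

print(store.match(predicate="knows"))
# [('alice', 'knows', 'bob'), ('bob', 'knows', 'carol')]
```

Each triple is also an edge: the subject and object are nodes and the predicate is the edge label, which is why graph databases are a natural fit for this kind of data.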

Social networks are candidates for graph databases. Credit: Anders Sandberg | Flickr

One of the best-known uses of RDF is in IBM Watson, which used RDF “extensively.” Other IBM projects access so-called linked data, albeit not always through graph databases. In a 2012 W3C interview, Arnaud Le Hors explained:

“Watson uses a triple-store but also ontologies and inference. Watson downloads data from the Web (e.g., from dbpedia) that is curated and added to the triple store. Watson reasons over the data, using Semantic Web technology in a major way. We also have products from Tivoli (for help desk tickets) and Information Management (DB2) using linked data. . . [And] IBM just released DB2 version 10 which provides support for RDF with a SPARQL engine on top of DB2.”

Engine Options

Neo4j may be the best-funded open source graph database, but InfiniteGraph, InfoGrid, OrientDB, BigData, DEX, HyperGraphDB, OQGraph and ArangoDB are other options that come up. If you prefer Python to Java, consider the Bulbs project.

Wikipedia maintains a list identifying recent versions of these and other graph databases.

Work is not limited to open source communities. Big Data creator-vendors Microsoft (“Trinity”), Twitter (“Interest Graph”) and Google (“Knowledge Graph”) have graph database implementations in various states of readiness.

Gain an Edge: Start Tinkering

The barriers to tinkering with graph databases are surmountable. Java developers already working with Hadoop or Storm, for example, may want to start with Neo Technology’s Neo4j, used by Cisco, Adobe, Deutsche Telekom and Intuit. As with Hortonworks or Cloudera, build something useful that needs vendor support, then consider a supported commercial version. Begin with a conventional ETL-to-SQL scenario using a Syncsort-supported target, then convert the result to graph form using a query language such as Neo4j’s Cypher.
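
One way to picture the SQL-to-graph step is the sketch below, which reads rows from a small SQLite database and emits Cypher `CREATE` statements as text. The `person` and `friendship` tables, the `to_cypher` helper and the sample rows are all assumptions made for this illustration; a real migration would more likely go through a Neo4j driver or import tool with parameterized queries rather than hand-built strings.

```python
import sqlite3

# Hypothetical relational source: people plus a join table of friendships.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE friendship (person_id INTEGER, friend_id INTEGER);
    INSERT INTO person VALUES (1, 'Ann');
    INSERT INTO person VALUES (2, 'Bob');
    INSERT INTO friendship VALUES (1, 2);
""")


def to_cypher(conn):
    """Turn person rows into nodes and friendship rows into relationships."""
    statements = []
    for pid, name in conn.execute("SELECT id, name FROM person ORDER BY id"):
        # Basic escaping only; prefer query parameters in real code.
        safe_name = name.replace("\\", "\\\\").replace("'", "\\'")
        statements.append(f"CREATE (:Person {{id: {pid}, name: '{safe_name}'}})")
    rows = conn.execute(
        "SELECT person_id, friend_id FROM friendship ORDER BY person_id, friend_id"
    )
    for a, b in rows:
        statements.append(
            f"MATCH (a:Person {{id: {a}}}), (b:Person {{id: {b}}}) "
            f"CREATE (a)-[:FRIEND]->(b)"
        )
    return statements


statements = to_cypher(conn)
for statement in statements:
    print(statement)
```

The pattern to notice is that entity tables become nodes and join tables become relationships, which is usually the first mapping to try when moving a relational model into a graph.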

Tinkertoy models have been compared to graph databases. Credit: Mike Mozart | Flickr

Who knows? A graph database could be a friend you didn’t know you had.