A lot of grandiose claims have been made promising that graph databases allow easy ingest of all manner of disparate data, make sense of it, and uncover hidden relationships and meaning. This is, in fact, possible, but there are a few considerations you need to account for to make your database useful to an analyst charged with making sense of the information. There simply is no free lunch; where time and effort are saved in one place, they must be expended (at least partially) elsewhere.
Let’s take a look at the fundamental difference between graph databases and relational databases from which these claims stem. Rather than storing data in rows and columns, a graph database stores data in a format that describes a series of simple relationships: two entities (i.e., nodes or vertices) connected by a directed edge to form a triple of subject, predicate and object. Since the only requirement for inserting data into a graph database is that each entry contain a subject, a predicate and an object, it stands to reason that ingesting disparate data sources is a much simpler affair than having to design a schema and potentially multiple tables to store that data.
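To make that concrete, here is a minimal sketch of what "schema-free" insertion looks like in practice. It assumes the Python rdflib library is available; the namespace, entity names and predicates are purely hypothetical.

```python
# Minimal sketch of triple insertion, assuming rdflib is installed (pip install rdflib).
# All names below (example.org, alice, worksFor, etc.) are hypothetical.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")

g = Graph()
# One triple: subject (a person), predicate (worksFor), object (a company).
g.add((EX.alice, EX.worksFor, EX.acme_corp))
# A completely unrelated fact can be added with no schema change at all.
g.add((EX.acme_corp, EX.headquarteredIn, EX.chippewa_falls))

print(g.serialize(format="turtle"))
```

Note that nothing forces the two facts to share vocabulary or structure; that flexibility is exactly the double-edged sword discussed next.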
Herein lies the rub: Though it is true that literally any data can be tossed into a graph database without any care, in the famous words of Mr. T, “I pity the fool!” who has to actually sit down and make sense of the data that results from such a careless ingestion method. Further, I would hope that, at the very least, the poor soul responsible for analyzing the resulting hodgepodge would be provided with a lifetime supply of aspirin.
To standardize data prior to ingest, one needs to develop a mechanism that maps raw data to a particular ontology or taxonomy. To accomplish this, one can implement a rule-based system or even use machine learning to get the job done. The machine learning approach is in use to some degree for DBpedia, which employs a complex data-processing workflow to map Wikipedia data to the DBpedia ontology. This approach is certainly feasible, but it takes us right back to where we started: the whole point was to benefit from ingesting data with minimal preprocessing.
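As an illustration of the rule-based flavor of this idea, the sketch below maps raw records to triples using a lookup table of field rules. It again assumes rdflib, and the ontology namespace, field names and record shown are all hypothetical; a production mapping pipeline would be far more involved.

```python
# Minimal sketch of a rule-based mapping from raw records to ontology-aligned triples.
# The namespaces, field names and rules are hypothetical, for illustration only.
from rdflib import Graph, Literal, Namespace

ONT = Namespace("http://example.org/ontology/")
DATA = Namespace("http://example.org/data/")

# Each rule maps a raw source field to a predicate in the target ontology.
FIELD_RULES = {
    "emp_name": ONT.hasName,
    "emp_dept": ONT.memberOfDepartment,
    "hire_dt":  ONT.hireDate,
}

def map_record(graph, record_id, raw_record):
    """Apply the field rules to one raw record, emitting standardized triples."""
    subject = DATA[record_id]
    for field, value in raw_record.items():
        predicate = FIELD_RULES.get(field)
        if predicate is not None:  # unmapped fields are skipped rather than guessed
            graph.add((subject, predicate, Literal(value)))

g = Graph()
map_record(g, "employee-001",
           {"emp_name": "Alice", "emp_dept": "Analytics", "hire_dt": "2015-06-01"})
print(g.serialize(format="turtle"))
```

Writing and maintaining those rules for every source is precisely the preprocessing effort the easy-ingest claim was supposed to spare us.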
Given the goal of living up to the claim that one can simply toss data in, the more contextually appropriate method is to ingest the data largely as is and then sort it out once it has been ingested. Doing this with a very small graph, as seen in many examples, is trivial; doing it at scale is not. What about when we are looking at dozens or even hundreds of data sources? With enough knowledge and understanding of our source data this would be feasible, but real datasets usually start off as relative mysteries to us. It is here that a graph database can truly shine.
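As a small taste of what "sorting it out" after ingest can look like (and only a taste; it is not the set of tactics covered in the follow-up posts), one commonplace first step is to profile which predicates appear in the graph and how often, so an unknown dataset starts to reveal its shape. The sketch below again assumes rdflib, and the file name is hypothetical.

```python
# Minimal sketch of post-ingest exploration: profile the predicates in a graph
# of unknown structure. Assumes rdflib; "mystery_dataset.ttl" is a hypothetical file.
from rdflib import Graph

g = Graph()
g.parse("mystery_dataset.ttl", format="turtle")

profile = g.query("""
    SELECT ?p (COUNT(*) AS ?n)
    WHERE { ?s ?p ?o }
    GROUP BY ?p
    ORDER BY DESC(?n)
""")

for predicate, count in profile:
    print(f"{predicate}\t{count}")
```

On a laptop-sized graph this runs in moments; against tens of billions of triples, queries like this are where the scale of the underlying engine starts to matter.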
Sorting this out at scale will be the topic of my next two posts; I’ll provide a few tactics to get you going on a dataset of any size and then dive deeper into the graph. As a solution architect for Cray, I’m biased toward our customers’ uses of graph analytics, so I encourage you to question and test my assertions and then come to your own conclusions.