Guided Tutorial — Build a Knowledge Graph in 30 Minutes

This tutorial picks up where the Hello World walkthrough ends. You will build a bibliographic knowledge graph with papers, authors, institutions, and citations — then validate it, reason over it, and export it as JSON-LD.

The tutorial is organized in four independent segments. Each takes under ten minutes and leaves you with a working, progressively richer knowledge graph. You can stop after any segment.

Note

This tutorial uses an academic bibliographic dataset. The patterns — entity relationships, typed literals, named graphs, inference, validation — apply equally to product catalogs, supply chains, organizational hierarchies, or any domain with interconnected data.

Prerequisites

pg_ripple is installed and you are connected to a PostgreSQL database with the extension created. See Installation.

Segment 1: Load and Explore (10 min)

Register prefixes

SELECT pg_ripple.register_prefix('bib', 'http://example.org/bib/');
SELECT pg_ripple.register_prefix('foaf', 'http://xmlns.com/foaf/0.1/');
SELECT pg_ripple.register_prefix('dc', 'http://purl.org/dc/elements/1.1/');
SELECT pg_ripple.register_prefix('dcterms', 'http://purl.org/dc/terms/');
SELECT pg_ripple.register_prefix('schema', 'http://schema.org/');
SELECT pg_ripple.register_prefix('skos', 'http://www.w3.org/2004/02/skos/core#');

Load the bibliographic dataset

SELECT pg_ripple.load_turtle('
@prefix bib:     <http://example.org/bib/> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
@prefix dc:      <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix schema:  <http://schema.org/> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .

bib:mit       a schema:Organization ; schema:name "MIT" .
bib:stanford  a schema:Organization ; schema:name "Stanford University" .
bib:oxford    a schema:Organization ; schema:name "University of Oxford" .

bib:alice     a foaf:Person ; foaf:name "Alice Chen" ;
              schema:affiliation bib:mit .
bib:bob       a foaf:Person ; foaf:name "Bob Smith" ;
              schema:affiliation bib:stanford .
bib:carol     a foaf:Person ; foaf:name "Carol Martinez" ;
              schema:affiliation bib:oxford .

bib:paper1    a schema:ScholarlyArticle ;
              dc:title "Knowledge Graphs in Practice" ;
              dc:creator bib:alice ; dc:creator bib:bob ;
              dcterms:issued "2024-01-15"^^xsd:date ;
              schema:about <http://example.org/bib/kg> .

bib:paper2    a schema:ScholarlyArticle ;
              dc:title "Efficient SPARQL Query Processing" ;
              dc:creator bib:bob ; dc:creator bib:carol ;
              dcterms:issued "2024-03-22"^^xsd:date .

bib:paper3    a schema:ScholarlyArticle ;
              dc:title "Graph-Enhanced Retrieval for LLMs" ;
              dc:creator bib:alice ;
              dcterms:issued "2024-06-10"^^xsd:date .

bib:paper2    dcterms:references bib:paper1 .
bib:paper3    dcterms:references bib:paper1 .
bib:paper3    dcterms:references bib:paper2 .

bib:alice foaf:knows bib:bob .
bib:bob   foaf:knows bib:carol .
');

Explore: find all papers by Alice

SELECT * FROM pg_ripple.sparql('
  PREFIX dc: <http://purl.org/dc/elements/1.1/>
  PREFIX bib: <http://example.org/bib/>
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  SELECT ?title WHERE {
    ?paper dc:creator bib:alice .
    ?paper dc:title ?title .
  }
');

Explore: citation chains

Find papers that cite papers Alice authored:

SELECT * FROM pg_ripple.sparql('
  PREFIX dc: <http://purl.org/dc/elements/1.1/>
  PREFIX dcterms: <http://purl.org/dc/terms/>
  PREFIX bib: <http://example.org/bib/>
  SELECT ?citingTitle ?citedTitle WHERE {
    ?citing dcterms:references ?cited .
    ?cited dc:creator bib:alice .
    ?citing dc:title ?citingTitle .
    ?cited dc:title ?citedTitle .
  }
');

Explore: count papers per author

SELECT * FROM pg_ripple.sparql('
  PREFIX dc: <http://purl.org/dc/elements/1.1/>
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  SELECT ?name (COUNT(?paper) AS ?papers) WHERE {
    ?paper dc:creator ?author .
    ?author foaf:name ?name .
  }
  GROUP BY ?name
  ORDER BY DESC(?papers)
');

Segment 2: Validate (10 min)

SHACL (Shapes Constraint Language) lets you define data quality rules. You will create a shape that requires every ScholarlyArticle to have a title and at least one creator.

Load a SHACL shape

SELECT pg_ripple.load_shacl('
@prefix sh:     <http://www.w3.org/ns/shacl#> .
@prefix schema: <http://schema.org/> .
@prefix dc:     <http://purl.org/dc/elements/1.1/> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/shapes/ArticleShape>
  a sh:NodeShape ;
  sh:targetClass schema:ScholarlyArticle ;
  sh:property [
    sh:path dc:title ;
    sh:minCount 1 ;
    sh:maxCount 1 ;
    sh:datatype xsd:string ;
    sh:message "Every article must have exactly one title" ;
  ] ;
  sh:property [
    sh:path dc:creator ;
    sh:minCount 1 ;
    sh:message "Every article must have at least one creator" ;
  ] .
');

Validate the dataset

SELECT pg_ripple.validate();

The result is a JSONB validation report. If all articles conform, the report shows zero violations. Now insert a bad article to see validation catch it:

SELECT pg_ripple.insert_triple(
  'http://example.org/bib/bad_paper',
  'http://www.w3.org/1999/02/22-rdf-syntax-ns#type',
  'http://schema.org/ScholarlyArticle'
);

SELECT pg_ripple.validate();

The report now shows a violation: the article has no title and no creator.

Segment 3: Reason (10 min)

Datalog rules let you derive new facts. You will write a rule that infers transitive co-authorship: if Alice co-authored a paper with Bob, and Bob co-authored with Carol, then Alice and Carol are indirectly connected.

Write and load a rule

SELECT pg_ripple.load_rules('
  coauthor(?a, ?b) :- <http://purl.org/dc/elements/1.1/creator>(?paper, ?a),
                      <http://purl.org/dc/elements/1.1/creator>(?paper, ?b),
                      ?a != ?b.
  connected(?a, ?b) :- coauthor(?a, ?b).
  connected(?a, ?b) :- connected(?a, ?c), coauthor(?c, ?b), ?a != ?b.
', 'coauthorship');

Run inference

SELECT pg_ripple.infer('coauthorship');

This returns the number of new facts derived.

Query the derived facts

SELECT * FROM pg_ripple.sparql('
  PREFIX bib: <http://example.org/bib/>
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  SELECT ?name WHERE {
    bib:alice <http://example.org/bib/connected> ?person .
    ?person foaf:name ?name .
  }
');

Alice is now connected to Bob (direct co-author on paper1), Carol (through Bob on paper2), and potentially others through the transitive chain.

Segment 4: Export (10 min)

Export your knowledge graph as JSON-LD, shaped for an API using a frame template.

Export as Turtle

SELECT pg_ripple.export_turtle();

This returns all triples in human-readable Turtle format.

Export as JSON-LD with framing

SELECT pg_ripple.sparql_construct_jsonld('
  PREFIX dc: <http://purl.org/dc/elements/1.1/>
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  PREFIX schema: <http://schema.org/>
  CONSTRUCT {
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?author foaf:name ?name .
    ?author schema:affiliation ?org .
    ?org schema:name ?orgName .
  }
  WHERE {
    ?paper a schema:ScholarlyArticle .
    ?paper dc:title ?title .
    ?paper dc:creator ?author .
    ?author foaf:name ?name .
    OPTIONAL {
      ?author schema:affiliation ?org .
      ?org schema:name ?orgName .
    }
  }
');

The result is a nested JSON-LD document with papers, their authors, and institutional affiliations — ready to serve from a REST API.

What you built

In 30 minutes, you created a knowledge graph with:

Structured data — papers, authors, institutions, and citations as RDF triples
Quality rules — SHACL shapes that catch incomplete articles
Derived knowledge — Datalog rules that infer transitive co-authorship
API-ready export — JSON-LD output shaped for downstream consumers

Next steps

Storing Knowledge — data modeling deep dive
Querying with SPARQL — the full query language
Validating Data Quality — advanced SHACL patterns
Reasoning and Inference — Datalog, RDFS, OWL RL