Import relational data to Neo4j using Apache Hop Neo4j Output


Last updated: February 12, 2026

This article explains how to export data from a relational database (PostgreSQL) and import it into a Neo4j graph database using the Neo4j Output transform in Apache Hop.

Prerequisites: Basic understanding of the property graph model, a working Apache Hop installation, and access to a PostgreSQL instance and a Neo4j instance.

Related article: Importing Relational Data to Neo4j using Apache Hop — Graph Output, which uses the Neo4j Graph Output plugin with a metadata-defined graph model. Graph Output loads only nodes that have at least one relationship, while Neo4j Output loads all nodes.

Source code: Available in the how-to-apache-hop public repository.

Overview

The Neo4j Output approach uses separate pipelines for nodes and relationships, orchestrated by workflows. The process has four steps:

  1. Design the graph model — Translate the relational schema into a graph data model.

  2. Implement node pipelines — One pipeline per node label.

  3. Implement relationship pipelines — One pipeline per relationship type.

  4. Implement workflows — Orchestrate execution in the correct order (nodes first, then relationships).

Sample database

This tutorial uses the dvdrental sample PostgreSQL database, which models a DVD rental store with tables for films, actors, categories, languages, and staff.

Only four entity tables (actor, film, category, language) and the two join tables (film_actor, film_category) are used, to keep the graph focused on the most relevant relationships.

Step 1: Design the graph model

When converting a relational model to a graph model, follow three standard rules:

| Relational concept | Graph concept |
| --- | --- |
| Row | Node |
| Table name | Label |
| Join / foreign key | Relationship |

Nodes and labels

| Relational table | Graph label |
| --- | --- |
| Actor | Actor |
| Film | Film |
| Category | Category |
| Language | Language |

Relationships

| Join | Relationship name | Source → Target |
| --- | --- | --- |
| Actor ↔ Film (via film_actor) | ACTS_IN | Actor → Film |
| Film ↔ Category (via film_category) | BELONGS_TO | Film → Category |
| Film ↔ Language | IN | Film → Language |
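The three conversion rules can be sketched in a few lines of Python. This is an illustration only and not part of the Hop pipelines: the helper names rows_to_nodes and rows_to_relationships and the sample rows are hypothetical, and only the table, column, label, and relationship names come from the model above.

```python
def rows_to_nodes(label, key_column, rows):
    """Rules 1 and 2: each row becomes a node, the table name becomes its label."""
    return [
        {"label": label, "key": row[key_column], "properties": dict(row)}
        for row in rows
    ]


def rows_to_relationships(rel_type, from_column, to_column, rows):
    """Rule 3: each foreign-key pair becomes a relationship."""
    return [
        {"type": rel_type, "from": row[from_column], "to": row[to_column]}
        for row in rows
    ]


# Hypothetical sample rows, shaped like the dvdrental tables.
film_rows = [{"film_id": 1, "title": "Sample Film", "language_id": 1}]
actor_rows = [{"actor_id": 10, "first_name": "Sample"}]
film_actor_rows = [{"actor_id": 10, "film_id": 1}]

films = rows_to_nodes("Film", "film_id", film_rows)
actors = rows_to_nodes("Actor", "actor_id", actor_rows)

# A join table yields one relationship per row.
acts_in = rows_to_relationships("ACTS_IN", "actor_id", "film_id", film_actor_rows)

# The Film -> Language relationship comes from a foreign key on film itself.
film_in_language = rows_to_relationships("IN", "film_id", "language_id", film_rows)
```

Note that the Film to Language relationship needs no join table: the language_id foreign key on each film row already identifies both ends.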

The resulting graph model: (Actor)-[:ACTS_IN]->(Film), (Film)-[:BELONGS_TO]->(Category), and (Film)-[:IN]->(Language).

Pipeline and workflow structure

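Unlike Graph Output (which can load nodes and relationships in a single pipeline), Neo4j Output requires separate pipelines organized into workflows:

main-workflow
  nodes workflow
    write-films.hpl
    write-actors.hpl
    write-categories.hpl
    write-languages.hpl
  relationships workflow
    write-acts-in.hpl
    write-belongs-to.hpl
    write-in.hpl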

Nodes must be loaded before relationships, since relationships reference existing nodes by their primary keys.
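A simplified in-memory model shows why the order matters. It assumes, as stated above, that a relationship load only matches nodes that already exist and never creates missing ones. The function name and the sample keys are hypothetical.

```python
def create_relationships(existing_nodes, pairs):
    """Keep only the (actor_id, film_id) pairs whose two end nodes exist."""
    return [
        (actor_id, film_id)
        for actor_id, film_id in pairs
        if ("Actor", actor_id) in existing_nodes
        and ("Film", film_id) in existing_nodes
    ]


pairs = [(10, 1), (11, 1)]  # hypothetical film_actor rows

# Relationships loaded before any node exists: nothing can be matched.
too_early = create_relationships(set(), pairs)

# Nodes loaded first: every pair finds both of its end nodes.
nodes = {("Actor", 10), ("Actor", 11), ("Film", 1)}
in_order = create_relationships(nodes, pairs)
```

In this model the out-of-order run does not fail; it just creates nothing, which is why a count check after loading is worthwhile.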

Step 2: Implement node pipelines

Each node pipeline follows the same pattern: Table Input → Neo4j Output.

Example: Film nodes

Table Input SQL:

sql

SELECT *
FROM public.film;

Neo4j Output configuration:

| Setting | Value |
| --- | --- |
| Transform name | write-films |
| Neo4j Connection | neo4j-connection |
| Batch size | 1000 |
| Create indexes | Yes |
| Use CREATE instead of MERGE | Yes |

The label is set to Film and all fields from the Table Input are mapped as node properties. The film_id field is marked as the primary key.

Key settings explained:

  • Batch size: Aggregates 1,000 records per transaction for better performance.

  • Create indexes: Generates unique constraints for all primary key properties.

  • Use CREATE instead of MERGE: Bypasses lookup for faster loading. Use this for initial loads when you are certain no duplicates exist.
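The batch size and CREATE versus MERGE settings can be illustrated with a small Python sketch. It is a conceptual model under stated assumptions: the helper names batches and node_statement are hypothetical, and the Cypher strings show the general CREATE and MERGE patterns, not the exact statements Hop generates internally.

```python
from itertools import islice


def batches(records, batch_size=1000):
    """Yield lists of at most batch_size records, one list per transaction."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def node_statement(label, key, use_create):
    """Build a parameterized Cypher statement for one batch of rows."""
    if use_create:
        # CREATE never looks up existing nodes, so it cannot detect
        # rows that were already loaded.
        return f"UNWIND $rows AS row CREATE (n:{label}) SET n = row"
    # MERGE first looks up a node by its primary key, then updates it.
    return (
        f"UNWIND $rows AS row MERGE (n:{label} {{{key}: row.{key}}}) "
        "SET n += row"
    )


rows = [{"film_id": i} for i in range(1, 2501)]  # hypothetical input rows
batch_sizes = [len(batch) for batch in batches(rows)]
create_cypher = node_statement("Film", "film_id", use_create=True)
merge_cypher = node_statement("Film", "film_id", use_create=False)
```

With 2,500 input rows and a batch size of 1,000, the sketch produces three batches, the last one partially filled.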

Result: 1,000 Film nodes created.

Repeat for remaining node types

Create equivalent pipelines for the other three node labels. The table below lists all four node pipelines:

| Pipeline | SQL source | Label | Primary key | Nodes created |
| --- | --- | --- | --- | --- |
| write-films.hpl | public.film | Film | film_id | 1,000 |
| write-actors.hpl | public.actor | Actor | actor_id | 200 |
| write-categories.hpl | public.category | Category | category_id | 16 |
| write-languages.hpl | public.language | Language | language_id | 6 |

Verification queries

cypher

MATCH (n:Film) RETURN n LIMIT 10;
MATCH (n:Actor) RETURN n LIMIT 10;
MATCH (n:Category) RETURN n LIMIT 10;
MATCH (n:Language) RETURN n;
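The queries above return sample nodes. Comparing a count per label against the source row counts is a stricter check, sketched here in Python. The names count_query, find_mismatches, and fake_run_query are hypothetical. A real run_query would execute the Cypher through a Neo4j driver session and return the single count, whereas the stand-in below returns canned values so the sketch runs without a database.

```python
def count_query(label):
    """Build a Cypher query that counts all nodes with the given label."""
    return f"MATCH (n:{label}) RETURN count(n) AS total"


def find_mismatches(expected, run_query):
    """Return {label: (expected, actual)} for every label whose count differs."""
    mismatches = {}
    for label, want in expected.items():
        got = run_query(count_query(label))
        if got != want:
            mismatches[label] = (want, got)
    return mismatches


# Stand-in for a database call, with one deliberately wrong count.
fake_counts = {"Film": 1000, "Actor": 199}


def fake_run_query(cypher):
    label = cypher.split(":")[1].split(")")[0]  # extract the label
    return fake_counts[label]


expected = {"Film": 1000, "Actor": 200}  # row counts of the source tables
mismatches = find_mismatches(expected, fake_run_query)
```

An empty result means every label matches its source table. Here the Actor label is reported because the stand-in returns 199 instead of 200.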

Step 3: Implement relationship pipelines

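Each relationship pipeline also uses Table Input → Neo4j Output, but with the Only create relationships option enabled. The Neo4j Output transform is configured with three tabs: From Node, To Node, and Relationship. Because the relationship load looks up existing nodes by label and primary key, the property names on the From Node and To Node tabs must match the primary key property names written by the node pipelines.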

Example: ACTS_IN relationship

Table Input SQL:

sql

SELECT *
FROM public.film_actor;

The film_actor table is the join table that represents the many-to-many relationship between actors and films. It contains actor_id, film_id, and last_update.

Neo4j Output configuration:

| Setting | Value |
| --- | --- |
| Transform name | write-acts-in |
| Neo4j Connection | neo4j-connection |
| Batch size | 1000 |
| Only create relationships | Yes |

From Node tab:

| Setting | Value |
| --- | --- |
| Label | Actor |
| Property field | actor_id |
| Property name | actorId |
| Primary | Yes |

To Node tab:

| Setting | Value |
| --- | --- |
| Label | Film |
| Property field | film_id |
| Property name | filmId |
| Primary | Yes |

Relationship tab:

| Setting | Value |
| --- | --- |
| Relationship value | ACTS_IN |
| Relationship property field | last_update |
| Relationship property name | lastUpdate |
| Property type | LocalDateTime |

Result: 5,462 ACTS_IN relationships created.
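The From Node, To Node, and Relationship settings correspond roughly to a Cypher statement that matches both end nodes and then writes the relationship. The Python sketch below assembles such a statement from the values in the three tabs. The function name relationship_statement is hypothetical, the use of MERGE is an assumption, and the output is a conceptual equivalent rather than the statement Hop generates.

```python
def relationship_statement(from_node, to_node, rel_type, rel_props):
    """Build Cypher that links two existing nodes for each input row.

    from_node and to_node are (label, property name, input field) tuples.
    rel_props maps relationship property names to input fields.
    """
    from_label, from_prop, from_field = from_node
    to_label, to_prop, to_field = to_node
    statement = (
        "UNWIND $rows AS row "
        f"MATCH (a:{from_label} {{{from_prop}: row.{from_field}}}) "
        f"MATCH (b:{to_label} {{{to_prop}: row.{to_field}}}) "
        f"MERGE (a)-[r:{rel_type}]->(b)"
    )
    assignments = ", ".join(
        f"r.{name} = row.{field}" for name, field in rel_props.items()
    )
    return f"{statement} SET {assignments}" if assignments else statement


acts_in_cypher = relationship_statement(
    from_node=("Actor", "actorId", "actor_id"),
    to_node=("Film", "filmId", "film_id"),
    rel_type="ACTS_IN",
    rel_props={"lastUpdate": "last_update"},
)
```

Both MATCH clauses must find a node for a row to produce a relationship, so a property name that does not exist on the loaded nodes results in zero relationships without raising an error.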

Repeat for remaining relationship types

| Pipeline | SQL source | Relationship | From → To | Created |
| --- | --- | --- | --- | --- |
| write-acts-in.hpl | public.film_actor | ACTS_IN | Actor → Film | 5,462 |
| write-belongs-to.hpl | public.film_category | BELONGS_TO | Film → Category | 1,000 |
| write-in.hpl | public.film | IN | Film → Language | 1,000 |

Verification queries

cypher

MATCH (a:Actor)-[:ACTS_IN]->(f:Film) RETURN a, f LIMIT 10;
MATCH (f:Film)-[:BELONGS_TO]->(c:Category) RETURN f, c LIMIT 10;
MATCH (f:Film)-[:`IN`]->(l:Language) RETURN f, l LIMIT 10;

Step 4: Implement the workflows

Nodes workflow

Create a workflow that chains the four node pipelines sequentially: write-films.hpl, write-actors.hpl, write-categories.hpl, and write-languages.hpl.

For each action, browse to the pipeline file. Enable Wait for the pipeline to complete.

Relationships workflow

Create a workflow that chains the three relationship pipelines: write-acts-in.hpl, write-belongs-to.hpl, and write-in.hpl.

Main workflow

Create a main workflow that runs the two sub-workflows in order: the nodes workflow first, then the relationships workflow.

This ensures nodes and indexes exist before relationships are created.
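The orchestration logic can be modeled in Python as nested sequential runners, where each step must succeed before the next one starts. This is a conceptual sketch, not Hop code: run_in_order, pipeline, and workflow are hypothetical helpers, the sub-workflow names "nodes" and "relationships" are placeholders, and each pipeline is simulated by a function that records its name.

```python
def run_in_order(steps):
    """Run (name, action) pairs sequentially.

    Returns the name of the first failed step, or None if all succeed.
    """
    for name, action in steps:
        if not action():
            return name
    return None


executed = []  # records the order in which simulated pipelines run


def pipeline(name, succeeds=True):
    """Simulate a pipeline action that logs its name and reports success."""
    def action():
        executed.append(name)
        return succeeds
    return action


def workflow(steps):
    """Wrap a list of steps as a single action that succeeds if all steps do."""
    return lambda: run_in_order(steps) is None


nodes = workflow([
    (name, pipeline(name))
    for name in ("write-films", "write-actors", "write-categories", "write-languages")
])
relationships = workflow([
    (name, pipeline(name))
    for name in ("write-acts-in", "write-belongs-to", "write-in")
])

# Main workflow: nodes first, then relationships.
failed_step = run_in_order([("nodes", nodes), ("relationships", relationships)])
```

If any node pipeline fails in this model, the relationships workflow never starts, which mirrors the purpose of waiting for each pipeline to complete.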

Verifying the result

After the main workflow completes, verify the graph in Neo4j.

View the graph schema:

cypher

CALL db.schema.visualization;

View sample data with relationships:

cypher

MATCH (n)-[r]->(m) RETURN n, r, m LIMIT 15;

Neo4j Output vs. Graph Output

| Aspect | Neo4j Output | Graph Output |
| --- | --- | --- |
| Graph model metadata | Not required | Required (metadata object) |
| Pipeline structure | Separate pipelines for nodes and relationships | Single pipeline for everything |
| Node loading | Loads all nodes, including those without relationships | Loads only nodes that appear in the join result |
| Workflow orchestration | Required (nodes before relationships) | Not required |
| Flexibility | More granular control per node/relationship type | More compact, single-pass approach |
| Best for | Full data loads, complex graphs with many node types | Simpler models, faster setup |

Choose Neo4j Output when you need to load all nodes regardless of relationships, or when you want granular control over each node type and relationship pipeline. Choose Graph Output when you prefer a single-pipeline approach with metadata-driven mapping.