Mathias Hauser

My opinions and thoughts on various things and experiences of software development

My start with Neo4j and what you should consider when starting with it

For a university lecture I had to develop an application that collects real-time data from Twitter and stores it in a graph database. I took it a bit more seriously than necessary in order to dive deeper into this technology and to find the limitations of my approach. The project I submitted went far beyond what was required, but I knew that it couldn’t run in a production environment. The reasons were that I didn’t have enough time to rewrite the whole database model at the end, and that Neo4j was simply the wrong technology for the problem definition.

I start with a project setup that uses Spring Boot and Spring Data Neo4j. After that I describe what, in my opinion, you should consider when starting on your own.

I recommend downloading the source of this example before you start. It is available in the repository blog_neo4j-spring-boot, which also offers a direct download of the source as a zip.

Content of this post

  • Project setup with Spring Boot, Spring Data Neo4j and Spring Data JPA
  • What you should consider – in my opinion
  • Further reading and references

Project setup with Spring Boot, Spring Data Neo4j and Spring Data JPA

The project setup is based on the setup from my post Bootstrap an application with Spring Boot – Part2 Web application, so I will only highlight the Neo4j-specific changes and extensions.

Additional dependencies

We need two additional dependencies in our Gradle configuration to use Neo4j: Spring Data Neo4j and Hibernate Validator. The required changes in the following build.gradle file are the springDataNeo4jVersion property and the spring-data-neo4j and hibernate-validator entries in the dependencies block.

buildscript {
    repositories {
        mavenLocal()
        mavenCentral()
        maven { url "http://repo.spring.io/libs-snapshot" }
    }
    dependencies {
        classpath("org.springframework.boot:spring-boot-gradle-plugin:1.0.0.RC1")
    }
}

apply plugin: 'java'
apply plugin: 'war'
apply plugin: 'eclipse-wtp'
apply plugin: 'idea'
apply plugin: 'spring-boot'

version = '1.0'
group = 'mh.dev.blog'
description = 'neo4j-spring-boot'

war {
    baseName = 'hello'
    version =  '0.1.0'
}

project.ext {
	springBootVersion = '1.0.0.RC1'
	springDataNeo4jVersion = '3.0.0.RC1'
}

repositories {
    mavenLocal()
    mavenCentral()
    maven { url "http://repo.spring.io/libs-snapshot" }
}

dependencies {
    compile("org.springframework.boot:spring-boot-starter-web:$springBootVersion")
    compile("org.springframework.boot:spring-boot-starter-data-jpa:$springBootVersion")
    compile("org.springframework.data:spring-data-neo4j:$project.ext.springDataNeo4jVersion")
    compile("org.hibernate:hibernate-validator:5.0.3.Final")
    compile("org.apache.commons:commons-lang3:3.2.1")
    compile("com.google.guava:guava:16.0.1")
    compile("org.yaml:snakeyaml:1.13")
    compile("mysql:mysql-connector-java:5.1.28")
	
    testCompile("org.springframework.boot:spring-boot-starter-test:$springBootVersion")
    testCompile("org.hsqldb:hsqldb:2.3.1")
    testCompile("junit:junit:4.11")
}

task wrapper(type: Wrapper) {
    gradleVersion = '1.10'
}

Changes for the JPA configuration

There is only one change in the Application.java file, and it is needed because this project uses both JPA and Neo4j. Spring Data JPA has to be told where the JPA repositories are located; otherwise it also picks up the Neo4j repositories, which ends in “not mapped model class” exceptions. In my case the JPA repositories are in the package mh.dev.blog.repository.jpa, and the repositories for Neo4j are located in mh.dev.blog.repository.graph. This might not be the best separation, but it’s suitable for this example project.

package mh.dev.blog;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.EnableAutoConfiguration;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.jpa.repository.config.EnableJpaRepositories;

@Configuration
@ComponentScan
@EnableJpaRepositories(basePackages = "mh.dev.blog.repository.jpa")
@EnableAutoConfiguration
public class Application {

	public static void main(String[] args) {
		SpringApplication.run(Application.class, args);
	}
}

Neo4j specific configuration

We need one additional configuration class for the Neo4j related configuration. It’s possible to place this class anywhere on your class path; I decided to place it in the mh.dev.blog.config package. The @Configuration annotation marks this class as part of the application configuration. The @EnableNeo4jRepositories annotation defines the base package for the Neo4j related repository classes.
This project uses an embedded Neo4j database. Another possibility would be to use the REST API of Neo4j, which is slower but required for horizontal scaling. The embedded database is configured in the graphDatabaseService() bean, which creates the folder “graph.db” in the root of your execution directory. This means that the location can be surprising in some cases, especially when you deploy your application to a Tomcat from within Eclipse: in that case the location is the folder of your Eclipse instance.
The third part is the configuration of the transaction management in the neo4jTransactionManager(...) bean. It ensures that JPA and Neo4j can be used together with correct transaction support.

package mh.dev.blog.config;

import javax.persistence.EntityManagerFactory;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.neo4j.config.EnableNeo4jRepositories;
import org.springframework.data.neo4j.config.JtaTransactionManagerFactoryBean;
import org.springframework.data.neo4j.config.Neo4jConfiguration;
import org.springframework.data.transaction.ChainedTransactionManager;
import org.springframework.orm.jpa.JpaTransactionManager;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
@EnableNeo4jRepositories(basePackages = "mh.dev.blog.repository.graph")
public class Neo4jConfig extends Neo4jConfiguration {

	@Bean
	public GraphDatabaseService graphDatabaseService() {
		return new GraphDatabaseFactory().newEmbeddedDatabase("graph.db");
	}

	@Autowired
	@Bean(name = "transactionManager")
	public PlatformTransactionManager neo4jTransactionManager(EntityManagerFactory entityManagerFactory, GraphDatabaseService graphDatabaseService)
			throws Exception {
		return new ChainedTransactionManager(new JpaTransactionManager(entityManagerFactory),
				new JtaTransactionManagerFactoryBean(graphDatabaseService).getObject());
	}

}
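
Because “graph.db” is a relative path, it is resolved against the working directory of the JVM, which explains the surprising location under Eclipse. The following is a minimal sketch in plain Java, not part of the example project, that makes the location explicit by turning the configured directory into an absolute path. The class name StoreLocation and the system property graph.store.dir are assumptions made up for this illustration; neither is provided by Neo4j or Spring.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch only: resolves the directory of the embedded store to an absolute path.
final class StoreLocation {

    // Hypothetical property name, chosen for this example (not a Neo4j or Spring setting).
    static final String STORE_DIR_PROPERTY = "graph.store.dir";

    // Same relative folder name the configuration class above uses.
    static final String DEFAULT_STORE_DIR = "graph.db";

    // Falls back to the default folder when nothing is configured.
    // Relative paths are resolved against the JVM's working directory.
    static Path resolve(String configured) {
        String dir = (configured == null || configured.trim().isEmpty())
                ? DEFAULT_STORE_DIR
                : configured.trim();
        return Paths.get(dir).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        System.out.println("working directory: " + System.getProperty("user.dir"));
        System.out.println("default store:     " + resolve(null));
        System.out.println("configured store:  " + resolve(System.getProperty(STORE_DIR_PROPERTY)));
    }
}
```

The resolved path could then be passed to the factory, for example newEmbeddedDatabase(StoreLocation.resolve(System.getProperty(StoreLocation.STORE_DIR_PROPERTY)).toString()). Starting the application with -Dgraph.store.dir=/some/absolute/path would pin the location, and logging the resolved path shows where the store ends up in either case.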

Neo4j Entity and Repository

Let’s take a look at an example Neo4j entity and repository class. This entity is a minimal approach without additional indexes to keep things simple. The important parts are the three annotations:

  • @NodeEntity – Marks this class as a Neo4j entity
  • @TypeAlias("GraphWord") – Creates a label for this entity in the graph. This makes it easier to write queries for your application
  • @GraphId – Defines the id attribute for this class

package mh.dev.blog.model.graph;

import org.springframework.data.annotation.TypeAlias;
import org.springframework.data.neo4j.annotation.GraphId;
import org.springframework.data.neo4j.annotation.NodeEntity;

@NodeEntity
@TypeAlias("GraphWord")
public class GraphWord {

	@GraphId
	private Long id;

	private String text;

	public Long getId() {
		return id;
	}

	public void setId(Long id) {
		this.id = id;
	}

	public String getText() {
		return text;
	}

	public void setText(String text) {
		this.text = text;
	}

	@Override
	public String toString() {
		return "GraphWord [id=" + id + ", text=" + text + "]";
	}
}

Now let’s take a look at a simple repository for our GraphWord class. We extend the interface CRUDRepository to get some utility methods like save or delete. Take care to import the correct one (org.springframework.data.neo4j.repository.CRUDRepository), because there is another interface with the name CrudRepository, which is the one you use for JPA entities. I don’t explain the query language in this post, but I mention some additional resources at the end of this post. What I want to highlight is that the GraphWord in the query relates to the @TypeAlias annotation.

package mh.dev.blog.repository.graph;

import java.util.List;

import mh.dev.blog.model.graph.GraphWord;

import org.springframework.data.neo4j.annotation.Query;
import org.springframework.data.neo4j.repository.CRUDRepository;

public interface GraphWordRepository extends CRUDRepository<GraphWord> {

	// Returns every GraphWord node whose text property equals the given value
	@Query("Match (word:GraphWord) Where word.text = {0} return word")
	public List<GraphWord> text(String text);

}

Example test

Nothing special here; you can write tests in the same fashion as with JPA entities.

package mh.dev.blog.test.controller;

import mh.dev.blog.Application;
import mh.dev.blog.model.graph.GraphWord;
import mh.dev.blog.repository.graph.GraphWordRepository;

import org.junit.Assert;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.SpringApplicationConfiguration;
import org.springframework.test.context.ActiveProfiles;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;
import org.springframework.test.context.transaction.TransactionConfiguration;
import org.springframework.test.context.web.WebAppConfiguration;
import org.springframework.transaction.annotation.Transactional;

@WebAppConfiguration
@ActiveProfiles(value = "test")
@Transactional
@TransactionConfiguration(defaultRollback = true)
@RunWith(SpringJUnit4ClassRunner.class)
@SpringApplicationConfiguration(classes = { Application.class })
public class GraphTest {

	@Autowired
	private GraphWordRepository repository;

	@Test
	public void graphTest() {
		GraphWord graphWord1 = new GraphWord();
		graphWord1.setText("word1");
		repository.save(graphWord1);

		GraphWord graphWord2 = new GraphWord();
		graphWord2.setText("word2");
		repository.save(graphWord2);

		Assert.assertEquals(1, repository.text("word1").size());
		Assert.assertEquals(0, repository.text("word3").size());
	}
}

What you should consider – in my opinion

First of all, if you plan to start developing a Neo4j application with Spring Data Neo4j, use version 3. At the moment there is only a release candidate, but it’s based on Neo4j 2.0 instead of 1.9. This makes writing queries easier, because classes get labels (via @TypeAlias) and the START clause is not necessary anymore.
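
To illustrate the difference, here is a small sketch that builds the same lookup in both styles. The Neo4j 2.0 form corresponds to the query used in GraphWordRepository above. The 1.9 form assumes the type index that Spring Data Neo4j 2.x maintained by default (an index named __types__ with the key className); treat that detail as an assumption and check it against the version you use. The class CypherStyles and its methods are hypothetical helpers for this illustration only.

```java
// Sketch only: the same "find word by text" lookup for Neo4j 1.9 and Neo4j 2.0.
final class CypherStyles {

    // Neo4j 1.9 style: a query needs a START clause with an index lookup.
    // Assumption: the default type index of Spring Data Neo4j 2.x ("__types__", key "className").
    static String legacyQuery(String className) {
        return "START word=node:__types__(className=\"" + className + "\") "
                + "WHERE word.text = {0} RETURN word";
    }

    // Neo4j 2.0 style: the label (taken from @TypeAlias) replaces the START clause.
    static String labelQuery(String label) {
        return "MATCH (word:" + label + ") WHERE word.text = {0} RETURN word";
    }

    public static void main(String[] args) {
        System.out.println(legacyQuery("mh.dev.blog.model.graph.GraphWord"));
        System.out.println(labelQuery("GraphWord"));
    }
}
```

The label-based query no longer depends on the fully qualified class name, so renaming or moving the entity class does not break it as long as the @TypeAlias value stays the same.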

  • Neo4j tries to keep the graph in memory, which means it may need a lot of memory (the state itself is always persisted on the hard drive). This makes it really hard to analyse the whole graph, because Neo4j has to load the whole graph into memory to execute the query. You know that you don’t have enough memory when the database is garbage collecting all the time. I personally would say that if you want to query something which requires the whole graph, Neo4j might not be the right technology for you.
  • The transaction behaviour of relationship entities can cause headaches. The reason is that Neo4j locks both related entities, which can lead to a deadlock. On the positive side, Neo4j detects these situations and you get an exception. Two situations caused me a lot of trouble because of this behaviour. One was Justin Bieber: I had a relation to his node for each retweet. That looks harmless, but this guy triggers so many retweets that deadlocks were certain. The other was that I wanted to use exactly one node for each word, so that two tweets which contain the same word both have a relation to this node. This is impossible due to this transactional behaviour. So keep this in mind when you need a lot of insert performance.
  • Relations are fast. I can’t say at the moment how big the difference between Neo4j and a relational database is, but I plan to create some sort of benchmark to get a better understanding. Besides that, don’t use Neo4j to query something like “how old is the oldest guy in my database” or “what was the first post”. You can do it, but I think there are better ways for that. This leads to the next point.
  • Don’t use Neo4j to solve everything related to data persistence. There is a reason why my setup also contains a JPA configuration: relational databases are really good for many tasks, so use them.
  • Plan enough time for creating your domain model, because it can have a high impact in later stages.
  • One important question for myself: “Would I use Neo4j in another project?” I would say yes, but always supported by other database types, because Neo4j is not a solution for everything.
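
Since Neo4j reports a detected deadlock as an exception, one common way to cope with occasional deadlocks is to run the unit of work again after a short pause. This is not what the project described here did, and it does not remove the contention on a heavily connected node; it is only a minimal sketch of the idea in plain Java (Java 8 for the lambda). DeadlockException is a hypothetical stand-in for the exception Neo4j throws (org.neo4j.kernel.DeadlockDetectedException in the 2.0 line, to my knowledge), and the retry helper is made up for this illustration.

```java
import java.util.function.Supplier;

// Sketch only: retries a unit of work when a (simulated) deadlock is reported.
final class DeadlockRetry {

    // Hypothetical stand-in for the deadlock exception thrown by the database.
    static class DeadlockException extends RuntimeException {
        DeadlockException(String message) {
            super(message);
        }
    }

    // Runs the unit of work up to maxAttempts times, pausing a little longer after each failure.
    // Any other exception, or a deadlock on the last attempt, is passed on to the caller.
    static <T> T retry(int maxAttempts, long backoffMillis, Supplier<T> unitOfWork) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1;; attempt++) {
            try {
                return unitOfWork.get();
            } catch (DeadlockException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                try {
                    Thread.sleep(backoffMillis * attempt);
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while backing off", interrupted);
                }
            }
        }
    }

    public static void main(String[] args) {
        final int[] calls = { 0 };
        // Simulates two deadlocks before the save succeeds.
        String result = retry(5, 10, () -> {
            if (++calls[0] < 3) {
                throw new DeadlockException("simulated deadlock");
            }
            return "saved";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Each attempt has to run in its own transaction, so a helper like this would need to wrap the transactional call (for example a @Transactional service method) from the outside rather than sit inside it.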

Further reading and references
