20.12.2017
Bulk and Batch imports with Spring Boot
This article describes how to implement bulk and batch inserts with Spring Boot and Hibernate using the EntityManager directly.
To do this, the batch size must be configured so that Hibernate combines the SQL inserts into JDBC batches of batch_size statements each.
The following properties must be set in the application.properties file of Spring Boot to achieve that:
spring.jpa.properties.hibernate.jdbc.batch_size=5
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
It is important that the prefix spring.jpa.properties is used. This ensures that Spring passes the values through to Hibernate.
Many thanks to Michael Simons @rotnroll666 and Vlad Mihalcea @vlad_mihalcea for the helpful tips!
In addition, it is recommended that you set the following property in the file application.properties to see the Hibernate statistics and to be able to check whether the SQL inserts were really executed in a batch.
spring.jpa.properties.hibernate.generate_statistics=true
The following sections show two ways to perform a bulk import within a Spring Boot application.
Repository
One possibility is to create your own repository.
package org.hameister.bulk.data;

import org.springframework.data.jpa.repository.support.SimpleJpaRepository;
import org.springframework.stereotype.Repository;
import org.springframework.transaction.annotation.Transactional;

import javax.persistence.EntityManager;
import java.util.List;

/**
 * Created by hameister on 19.12.17.
 */
@Repository
public class BulkImporterRepository extends SimpleJpaRepository<Item, String> {

    private EntityManager entityManager;

    public BulkImporterRepository(EntityManager entityManager) {
        super(Item.class, entityManager);
        this.entityManager = entityManager;
    }

    @Transactional
    public List<Item> save(List<Item> items) {
        items.forEach(item -> entityManager.persist(item));
        return items;
    }
}
This class extends SimpleJpaRepository and receives an EntityManager in its constructor. In the save method, the entity manager is used to store the Item objects with persist. The @Transactional annotation is important here because it ensures that Spring handles the transaction.
Note that this example does not use the SimpleJpaRepository.save(Iterable<S> entities) method.
The reason is that the example needs to make sure that persist() is called and not merge().
Why merge() causes problems here and prevents the bulk import is described in the article Bulk and Batch imports with Spring Boot and the CrudRepository. There you will find two approaches to prevent the merge() call.
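The persist-versus-merge decision can be illustrated with a small sketch. It assumes that a repository save treats an entity as new only when its identifier is still null, which is the usual default for entities without a version attribute or a custom Persistable implementation. The class, enum, and method names below are hypothetical stand-ins and not Spring API.

```java
import java.util.Arrays;
import java.util.List;

class SaveDecisionSketch {

    /** The two EntityManager operations a repository save can end up calling. */
    enum Operation { PERSIST, MERGE }

    /**
     * Hypothetical stand-in for the "is this entity new?" check:
     * an entity counts as new only if its identifier is still null.
     */
    static Operation chooseOperation(String id) {
        return id == null ? Operation.PERSIST : Operation.MERGE;
    }

    public static void main(String[] args) {
        // The Item entity in this article has a manually assigned String id,
        // so the id is usually set before the first save.
        List<String> ids = Arrays.asList("item-1", "item-2", null);
        for (String id : ids) {
            System.out.println(id + " -> " + chooseOperation(id));
        }
    }
}
```

Under this assumption, an Item with a pre-assigned id would be routed to merge(), which is why the repository above calls persist() explicitly.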
Item
package org.hameister.bulk.data;

import javax.persistence.*;

/**
 * Created by hameister on 01.12.17.
 */
@Entity
@Table(name = "Item")
public class Item {

    @Id
    String id;

    @Column(name = "description")
    private String description;

    @Column(name = "location")
    private String location;

    public Item() {
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getDescription() {
        return description;
    }

    public void setDescription(String description) {
        this.description = description;
    }

    public String getLocation() {
        return location;
    }

    public void setLocation(String location) {
        this.location = location;
    }
}
Service
Another way to perform a bulk import is to create your own service.
package org.hameister.bulk.service;

import org.hameister.bulk.data.BulkImporterRepository;
import org.hameister.bulk.data.Item;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.util.Assert;

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import java.util.List;

@Service
public class BulkImporterService {

    private EntityManagerFactory emf;

    @Autowired
    public BulkImporterService(EntityManagerFactory emf) {
        Assert.notNull(emf, "EntityManagerFactory must not be null");
        this.emf = emf;
    }

    public List<Item> bulkWithEntityManager(List<Item> items) {
        // Create a dedicated EntityManager and manage the transaction manually.
        EntityManager entityManager = emf.createEntityManager();
        entityManager.getTransaction().begin();
        items.forEach(item -> entityManager.persist(item));
        // The commit flushes the pending inserts to the database.
        entityManager.getTransaction().commit();
        entityManager.close();
        return items;
    }
}
This solution uses constructor injection to obtain an EntityManagerFactory.
The method bulkWithEntityManager uses it to create an EntityManager.
With this EntityManager a transaction is started, and then all items are stored by calling persist.
After that the transaction is committed so that the data is written to the database.
Finally, the EntityManager should be closed with close().
As you can see, you have to deal with the transaction handling yourself in this variant.
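For larger imports, a commonly used refinement of this manual variant is to flush and clear the persistence context after every batch_size entities, so that pending inserts are sent and the context does not keep growing. The following is only a minimal sketch of that loop. To keep it runnable without JPA on the classpath, it uses a hypothetical PersistenceOps interface as a stand-in for the persist, flush, and clear methods of the EntityManager. The helper persistInBatches and the RecordingOps fake are likewise assumptions for illustration and are not part of the original project.

```java
import java.util.ArrayList;
import java.util.List;

class BatchFlushSketch {

    /** Hypothetical stand-in for the three EntityManager methods used here. */
    interface PersistenceOps {
        void persist(Object entity);
        void flush();
        void clear();
    }

    /**
     * Persists all items and flushes/clears after every batchSize items,
     * plus once more for a trailing partial batch.
     * Returns the number of flushes performed.
     */
    static <T> int persistInBatches(PersistenceOps ops, List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int flushes = 0;
        int pending = 0;
        for (T item : items) {
            ops.persist(item);
            pending++;
            if (pending == batchSize) {
                ops.flush();
                ops.clear();
                flushes++;
                pending = 0;
            }
        }
        if (pending > 0) {
            ops.flush();
            ops.clear();
            flushes++;
        }
        return flushes;
    }

    /** Recording fake, used only to make the sketch runnable without JPA. */
    static class RecordingOps implements PersistenceOps {
        final List<String> calls = new ArrayList<>();
        @Override public void persist(Object entity) { calls.add("persist"); }
        @Override public void flush() { calls.add("flush"); }
        @Override public void clear() { calls.add("clear"); }
    }

    public static void main(String[] args) {
        RecordingOps ops = new RecordingOps();
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            items.add(i);
        }
        int flushes = persistInBatches(ops, items, 5);
        System.out.println(flushes + " flushes for " + items.size() + " items");
    }
}
```

In the service, the same loop would sit between begin() and commit() and call the real EntityManager instead of the stand-in.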
The complete source code can be found on GitHub (SpringBootBulkImport) as a Maven project.
The example also contains a Spring Boot application with a REST controller to test the import. If you call the endpoint http://localhost:8080/repositoryimport in a browser after starting the application, you should see output similar to the following in the console if batch_size=5.
5015192 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
442437 nanoseconds spent preparing 1 JDBC statements;
0 nanoseconds spent executing 0 JDBC statements;
25708379 nanoseconds spent executing 2 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
110442364 nanoseconds spent executing 1 flushes (flushing a total of 10 entities and 0 collections);
0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
You can see that the 10 Item entities are imported in two JDBC batches using a single prepared SQL statement.
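The relationship between the number of entities, batch_size, and the number of JDBC batches in the statistics can be checked with a little arithmetic. The sketch below assumes that all inserts go to the same table, so that they can share one prepared statement, and that a partially filled last batch counts as a batch. The class and method names are hypothetical.

```java
class BatchCountSketch {

    /**
     * Expected number of JDBC batches for entityCount inserts into one table:
     * the ceiling of entityCount / batchSize.
     */
    static int expectedBatches(int entityCount, int batchSize) {
        if (entityCount < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("entityCount must be >= 0 and batchSize > 0");
        }
        return (entityCount + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        // The setup from this article: 10 items with batch_size=5.
        System.out.println(expectedBatches(10, 5));
        // A count that is not a multiple of the batch size needs one extra batch.
        System.out.println(expectedBatches(12, 5));
    }
}
```

If the statistics report a different number of batches than this estimate, that may be a hint that the inserts were not batched as configured.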
Further information and explanations concerning Hibernate can be found in Vlad Mihalcea's blog post The best way to do batch processing with JPA and Hibernate.
The article Spring Boot Bulk and Batch imports with the CrudRepository describes two approaches that do not use the EntityManager directly.