Diving into bulkCreate method in Sequelize

July 8, 2023

Background

During my recent development work on a data synchronization feature using bulkCreate method in Sequelize, I encountered an issue with incorrect returning IDs when using the updateOnDuplicate: true option. This led me to suspect that Sequelize generates the returning IDs through inference. To confirm my hypothesis, I decided to investigate the source code of Sequelize.

How Sequelize generate returning IDs in `bulkCreate`

Despite the extensive codebase of Sequelize, locating the bulkCreate method was straightforward. A global search within the codebase led me directly to the src/model.js file, where I found the relevant comment confirming my initial assumption.

/*
  * The success handler is passed an array of instances, but please notice that these may not completely represent the state of the rows in the DB. This is because MySQL
  * and SQLite do not make it easy to obtain back automatically generated IDs and other default values in a way that can be mapped to multiple records.
  * To obtain Instances for the newly created values, you will need to query for them again.
  */

Next, I proceeded to investigate this method by following the code snippets below:

// src/model.js
const results = await model.queryInterface.bulkInsert(model.getTableName(options), records, options, fieldMappedAttributes);

static get queryInterface() {
  return this.sequelize.getQueryInterface();
}

// src/sequelize.js
this.queryInterface = this.dialect.queryInterface;

// src/dialects/sqlite/index.js
// Just take SQLite as an example.
this.queryInterface = new SQLiteQueryInterface(
  sequelize,
  this.queryGenerator
);

// src/dialects/sqlite/query-interface.js
class SQLiteQueryInterface extends QueryInterface { 
  // ...
}

// src/dialects/abstract/query-generator.js
async bulkInsert(tableName, records, options, attributes) {
  options = { ...options };
  options.type = QueryTypes.INSERT;

  const results = await this.sequelize.query(
    this.queryGenerator.bulkInsertQuery(tableName, records, options, attributes),
    options
  );

  return results[0];
}

// src/sequelize.js
async query(sql, options) {
  // ...
  const query = new this.dialect.Query(connection, this, options);
  // ...
  return await query.run(sql, bindParameters);
  // ...
}

// src/dialects/sqlite/query.js
async run(sql, parameters) {
  // ...
  resolve(query._handleQueryResponse(this, columnTypes, executionError, results, errForStack.stack));
  // ...
}

And finally I got what I was aiming for:

// src/dialects/sqlite/query.js
_handleQueryResponse(metaData, columnTypes, err, results, errStack) {
  // ...
  // add the inserted row id to the instance
  if (this.isInsertQuery(results, metaData) || this.isUpsertQuery()) {
    this.handleInsertQuery(results, metaData);
    if (!this.instance) {
      // handle bulkCreate AI primary key
      if (
        metaData.constructor.name === 'Statement'
        && this.model
        && this.model.autoIncrementAttribute
        && this.model.autoIncrementAttribute === this.model.primaryKeyAttribute
        && this.model.rawAttributes[this.model.primaryKeyAttribute]
      ) {
        const startId = metaData[this.getInsertIdField()] - metaData.changes + 1;
        result = [];
        for (let i = startId; i < startId + metaData.changes; i++) {
          result.push({ [this.model.rawAttributes[this.model.primaryKeyAttribute].field]: i });
        }
      } else {
        result = metaData[this.getInsertIdField()];
      }
    }
  }
  // ...
}

Upon the _handleQueryResponse method, it became apparent that Sequelize utilizes the last inserted ID and infers the remaining IDs when using bulkCreate. Consequently, when using updateOnDuplicate: true option, there is a possibility of receiving incorrect IDs in the return.

A handy tip for obtaining accurate IDs
Enhance the data table by introducing a batch field, assigning a batch number to each set of data being inserted. Subsequently, you can query the inserted IDs by using the corresponding batch number.

Risk of using `updateOnDuplicate: true`

While utilizing the updateOnDuplicate: true option, I observed that even though duplicate rows are skipped, the autoIncrement key continues to increment. From the documentation of MySQL, it said:

The effects are not quite identical: For an InnoDB table where a is an auto-increment column, the INSERT statement increases the auto-increment value but the UPDATE does not.

The database follows a process of attempting to insert, incrementing the ID, and then the duplicate is detected. However, once the ID is auto-incremented, it cannot be rolled back.

This situation presents a potential risk when utilizing a master-slave database architecture. As the ID is auto-incremented but duplicate is detected, the operation do not leave bin logs, there is a chance for key conficts to arise when the master goes down and the slave is promoted as the new master.

Background

How Sequelize generate returning IDs in bulkCreate

Risk of using updateOnDuplicate: true

How Sequelize generate returning IDs in `bulkCreate`

Risk of using `updateOnDuplicate: true`