Diving into bulkCreate method in Sequelize
Background
During my recent development work on a data synchronization feature using bulkCreate
method in Sequelize, I encountered an issue with incorrect returning IDs when using the updateOnDuplicate: true
option. This led me to suspect that Sequelize generates the returning IDs through inference. To confirm my hypothesis, I decided to investigate the source code of Sequelize.
How Sequelize generate returning IDs in bulkCreate
Despite the extensive codebase of Sequelize, locating the bulkCreate
method was straightforward. A global search within the codebase led me directly to the src/model.js
file, where I found the relevant comment confirming my initial assumption.
/*
* The success handler is passed an array of instances, but please notice that these may not completely represent the state of the rows in the DB. This is because MySQL
* and SQLite do not make it easy to obtain back automatically generated IDs and other default values in a way that can be mapped to multiple records.
* To obtain Instances for the newly created values, you will need to query for them again.
*/
Next, I proceeded to investigate this method by following the code snippets below:
// src/model.js
const results = await model.queryInterface.bulkInsert(model.getTableName(options), records, options, fieldMappedAttributes);
static get queryInterface() {
return this.sequelize.getQueryInterface();
}
// src/sequelize.js
this.queryInterface = this.dialect.queryInterface;
// src/dialects/sqlite/index.js
// Just take SQLite as an example.
this.queryInterface = new SQLiteQueryInterface(
sequelize,
this.queryGenerator
);
// src/dialects/sqlite/query-interface.js
class SQLiteQueryInterface extends QueryInterface {
// ...
}
// src/dialects/abstract/query-generator.js
async bulkInsert(tableName, records, options, attributes) {
options = { ...options };
options.type = QueryTypes.INSERT;
const results = await this.sequelize.query(
this.queryGenerator.bulkInsertQuery(tableName, records, options, attributes),
options
);
return results[0];
}
// src/sequelize.js
async query(sql, options) {
// ...
const query = new this.dialect.Query(connection, this, options);
// ...
return await query.run(sql, bindParameters);
// ...
}
// src/dialects/sqlite/query.js
async run(sql, parameters) {
// ...
resolve(query._handleQueryResponse(this, columnTypes, executionError, results, errForStack.stack));
// ...
}
And finally I got what I was aiming for:
// src/dialects/sqlite/query.js
_handleQueryResponse(metaData, columnTypes, err, results, errStack) {
// ...
// add the inserted row id to the instance
if (this.isInsertQuery(results, metaData) || this.isUpsertQuery()) {
this.handleInsertQuery(results, metaData);
if (!this.instance) {
// handle bulkCreate AI primary key
if (
metaData.constructor.name === 'Statement'
&& this.model
&& this.model.autoIncrementAttribute
&& this.model.autoIncrementAttribute === this.model.primaryKeyAttribute
&& this.model.rawAttributes[this.model.primaryKeyAttribute]
) {
const startId = metaData[this.getInsertIdField()] - metaData.changes + 1;
result = [];
for (let i = startId; i < startId + metaData.changes; i++) {
result.push({ [this.model.rawAttributes[this.model.primaryKeyAttribute].field]: i });
}
} else {
result = metaData[this.getInsertIdField()];
}
}
}
// ...
}
Upon the _handleQueryResponse
method, it became apparent that Sequelize utilizes the last inserted ID and infers the remaining IDs when using bulkCreate
. Consequently, when using updateOnDuplicate: true
option, there is a possibility of receiving incorrect IDs in the return.
A handy tip for obtaining accurate IDs
Enhance the data table by introducing abatch
field, assigning a batch number to each set of data being inserted. Subsequently, you can query the inserted IDs by using the corresponding batch number.
Risk of using updateOnDuplicate: true
While utilizing the updateOnDuplicate: true
option, I observed that even though duplicate rows are skipped, the autoIncrement
key continues to increment. From the documentation of MySQL, it said:
The effects are not quite identical: For an InnoDB table where a is an auto-increment column, the INSERT statement increases the auto-increment value but the UPDATE does not.
The database follows a process of attempting to insert, incrementing the ID, and then the duplicate is detected. However, once the ID is auto-incremented, it cannot be rolled back.
This situation presents a potential risk when utilizing a master-slave database architecture. As the ID is auto-incremented but duplicate is detected, the operation do not leave bin logs, there is a chance for key conficts to arise when the master goes down and the slave is promoted as the new master.