Home > mysql, Programming, Tip > “Bulk” update in mysql with the use of temporary table

“Bulk” update in mysql with the use of temporary table

You need to insert or update a large number of rows (at least hundreds of thousands) from an external data source (either from other database or file) into an existing mysql table but you are not allowed to truncate the table first because data source contains less number of columns than the existing table. The external source has a column that will be checked against the table’s primary key to determine if a record is new or not. For new records, using bulk insertion is pretty easy. Updating existing rows is quite a different story as there’s not yet an official way to do “bulk” updating. Prepare/Execute method might help a bit but still the performance is not satisfactory.

The solution:
Through a project I completed recently, I’ve learned a new way to solve the above problem, and the basic idea is to use bulk insertion to accomplish the goal of bulk updates, with the help of mysql temporary table. Let’s assume the table we are going to insert/update has the following structure:

tbl_A (
     userid char(20) not null,
     email varchar(250) not null,
     url varchar(250) ,
     flag_a tiny int,
     primary key(userid)

The basic steps:
1) Before entering the loop, create a temporary table tbl_A_tmp

create temporary table if not exists tbl_A_tmp (
   userid char(20) not null,
     email varchar(250) not null,
     url varchar(250) ,
     flag_a tiny int

Please note this table doesn’t have to have exactly the same data structure as tbl_A, as long as it holds the columns that will be used in step 5 and it’s also a good idea to remove any indexes to speed up insertion (to the temporary table).

2) Use delete or truncate statement to delete any existing rows in tbl_A_tmp

3) Inside the loop, save new records to an array list_new and existing records to an array list_existing, when num_of_elements(list_new) >=BATCHCOUNT, do batch insertion to tbl_A directly, when num_of_elements(list_existing) >=BATCHCOUNT, do batch insertion to tbl_A_tmp, clear list_new or list_existing after each batch insertion

4) When looping is done, Check remaining number of elements in both list_new and list_existing, perform batch insertions for any remaining records to the corresponding tables

4) Add unique or primary key to tbl_A_tmp, without this the next step will be very slow

5) Perform cross-table update like the following

    update tbl_A dest, tbl_A_tmp src set
    where dest_A.userid=src.userid

The reason why temporary table is used is because of
a) speed, a temporary table is usually created in the memory and will be converted to use disk storage if the temporary table becomes too large (hence data loss is prevented)
b) convenience, when the db link is disconnected,temporary tables are dropped automatically

How much performance gain do I get by using this method? 45 minutes (bulk insertion for new records + row-by-row updates) vs. less than 2 minutes (bulk insertion for new records + temporary table method for existing records).

Categories: mysql, Programming, Tip
  1. HG
    October 15, 2012 at 8:35 am

    Perfect, thank you!

    • October 15, 2012 at 10:39 am

      Glad to know it’s helpful to you.

  2. Tim
    May 7, 2013 at 8:54 pm

    Nice option – did not think of this, but I believe in truth it will be much faster. I am not updating hundreds of thousands, but certainly 10’s of thousands.

  3. February 6, 2015 at 1:08 pm

    Perfectly work for me…
    update tbl_A dest, tbl_A_tmp src set
    where dest_A.userid=src.userid

    but please help me to update non blank tmp columns into tbl_A

    • February 9, 2015 at 2:19 am

      All you need to do is to add non-empty dest.field_name checks into your where clause:

      where dest_A.userid=src.userid and dest_A.email'' /* and add more checks if you need to */

  4. Rahul
    January 20, 2016 at 10:32 am

    Thanks a lot.. It worked for me..

  5. April 17, 2017 at 4:53 pm

    Thanks a lot for the post!

    I was searching for solutions to insert large number of rows and I found this post very interesting. I just have a question and I hope you could share you opinion/experience ..

    In my researches I found the method “INSERT … ON DUPLICATE KEY UPDATE” and according to what I read, it’s possible to use with bulk insert.

    So, what do you think about the performance with which method?

    • April 18, 2017 at 10:34 pm

      I didn’t know about the “ON DUPLICATE KEY UPDATE” bulk insert method when I was working on the project that requires bulk update. The temp-table method has been serving me well. I will definitely try the other method when the time comes.

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: