Archive

Archive for the ‘mysql’ Category

MySQL: select data whose certain column contains the specified values only

December 23, 2011 2 comments

This problem could be best described by an example as below:

select * from mytbl;
+------+------+
| k    | v    |
+------+------+
| A    | 1    |
| A    | 2    |
| A    | 3    |
| B    | 1    |
| B    | 3    |
| C    | 1    |
| C    | 3    |
| C    | 4    |
| E    | 2    |
| E    | 1    |
| F    | 5    |
+------+------+

k, v are both char(10), not null.

I want to find out records whose v column is either 1 or 3 [and only 1 or 3, therefore in the above example the only qualified records are (‘B’, ‘1’) and (‘B’, ‘3’)]. This is just a simplified version of the problem I faced at work recently and I did find someone posting the similar question on Internet but I forgot to bookmark the url and I didn’t find its solutions interesting hence I pulled my hair a bit and came up with the following solution using group_concat and regexp:

select k, group_concat(distinct v order by v) as g from mytbl group by k having g regexp '^(1,?)?(3)?$';
+------+------+
| k    | g    |
+------+------+
| B    | 1,3  |
+------+------+

If the requirement becomes v is either 1, 3 or 4 (again, 1, 3 or 4 only),

select k, group_concat(distinct v order by v) as g from mytbl group by k having g regexp '^(1,?)?(3,?)?(4)?$';
+------+-------+
| k    | g     |
+------+-------+
| B    | 1,3   |
| C    | 1,3,4 |
+------+-------+
Categories: mysql, Programming

Play with randomness in MySQL

December 2, 2011 1 comment

Some of you might know about the rand() function in MySQL — it randomly generates float point number between 0 and 1. Most of the time we use it to generate results that we want to be in random order. For example,

We have a people table that has the following records:

select * from people;
+-----------+
| name      |
+-----------+
| Bob       |
| Alice     |
| Kim       |
| Tom       |
| Jerry     |
| Linda     |
| Fransisco |
| Zack      |
| Peter     |
+-----------+

We are going to pick 3 persons out of the list randomly, so we do a

mysql> select * from people order by rand() limit 3;
+------+
| name |
+------+
| Zack |
| Bob  |
| Kim  |
+------+

The above query should return (most likely) different result every time. Next I am going to make it a little more interesting. Let’s say I have another table named prize which contains a list of prizes:

select * from prize;
+-----------------+
| name            |
+-----------------+
| Pencil          |
| Coffee grinder  |
| iPad            |
| GPS watch       |
| Yoga mat        |
| 2 movie tickets |
+-----------------+

What we want to do is to assign a randomly picked prize to each person in the people table (assuming the same prize can be assigned to more than one person as there are fewer number of prizes than number of people), here’s the query

select o.*, (select name from prize order by rand() limit 1) as prize from people o;

This would return something like the following (again the result will be most like different if you try it):

+-----------+-----------------+
| name      | prize           |
+-----------+-----------------+
| Bob       | iPad            |
| Alice     | Yoga mat        |
| Kim       | GPS watch       |
| Tom       | GPS watch       |
| Jerry     | iPad            |
| Linda     | GPS watch       |
| Fransisco | Pencil          |
| Zack      | 2 movie tickets |
| Peter     | GPS watch       |
+-----------+-----------------+

For your convenience you can use the following sql statements to generate the tables with data populated:

CREATE TABLE `people` (
  `name` varchar(30) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `people` VALUES ('Bob'),('Alice'),('Kim'),('Tom'),('Jerry'),('Linda'),('Fransisco'),('Zack'),('Peter');

CREATE TABLE `prize` (
  `name` varchar(50) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `prize` VALUES ('Pencil'),('Coffee grinder'),('iPad'),('GPS watch'),('Yoga mat'),('2 movie tickets');
Categories: mysql, Programming

Node.js: improve mysql query performance with mysql-pool and cluster

November 1, 2011 1 comment

Prior testing:

npm install mysql -g
npm install mysql-pool -g
npm install express -g
npm install cluster -g

Code that archives best result:

require.paths.push('/usr/local/lib/node_modules');
var express=require('express');
var cluster=require('cluster');
var MySQLPool = require("mysql-pool").MySQLPool;
var pool = new MySQLPool({
  poolSize: 10,
  user:     'node',
  password: 'node',
  database: 'testdb'
});

function symsearch(q, res) {
	var query=pool.query(
		   "select symbol,company_name from symbols where symbol like ? limit 10",
			['%'+q+'%'],
			function selectcb(err, results, fields) {
				if(err) {throw err;}
				console.log("Searching Result for "+q);
				res.send( results );
			}
	);

}

var app=express.createServer();
app.get('/sym/:q', function( req, res) {
	var q=req.params.q || '';
	if(q.length>0) symsearch(q, res);
});

cluster(app).listen(3000);

Benchmark command:

ab -n 10000 -c 250 http://localhost:3000/sym/<pattern>

Results:
With the js code above, I am able to achieve 1300/sec on a dual-core 2.9GHz Virtualbox vm (ubuntu 10.10 32bit). When the same test is conducted on Apache2 + php5, I am getting about 800/sec. If I opt-for the single connection mode (use ‘mysql’ instead of ‘mysql-pool’), the result is only a little better than apache + php combo. The use of node module cluster is to take advantage of the dual-core cpu (node would run with in single-core mode due to its single-threaded nature).

Categories: benchmarking, mysql, node.js

Mysql count with case statement

July 1, 2011 Leave a comment

In one of my recent perl scripts, I need to get a row count of total records from a mysql table, as well as the row count of some rows with some criteria. By combining the technique from http://stackoverflow.com/questions/5045124/count-case-and-when-statement-in-mysql and the atomic row fetch trick I mentioned a while back, I came up with the following script (for illustration only hence this is not the actual script that I wrote)

#!/usr/bin/perl -w
use strict;
use DBI;

# make db connection and stuff ...
my $dbh=DBI->connect( 'DBI:mysql:db:localhost:3306', 'username', 'password' ) or die( $DBI::errstr );

my $sql=qq{ select 
    count(feedback_id) as total_count, 
    count( case when feedback_type in ('suggestion', 'bug_report') then 1 else null end ) as special_count
from customer_feedbacks};

my ($total_count, $special_count)=$dbh->selectrow_array( $sql );
# do something with $total_count and $special_count

$dbh->disconnect() if $dbh;

The benefit of crafting the mysql statement this way is that I only need to do one mysql query instead of two, combined with atomic fetch, the resulting code becomes very clean.

Categories: mysql, perl, Programming

Fetch single row of result from php (atomic fetch)

June 24, 2011 Leave a comment

A while back I wrote about doing mysql atomic fetch using Perl [ here ], actually the similar method can be used on PHP, a couple of examples
Example #1

<?php
    $conn=new mysqli('localhost', 'user', 'pass', 'db');
    $res=$conn->query('select count(customer_id) from customer_info');
    list($cnt)= $res->fetch_row();
    echo "There are $cnt customers.\n";
    $conn->close();

Example #2

<?php
    $conn=new mysqli('localhost', 'user', 'pass', 'db');
    $res=$conn->query("select zip_code, email from customer_info where customer_id='1234'");
    list($zip_code, $email)= $res->fetch_row();
    echo "zip_code=$zip_code, email=$email.\n";
    $conn->close();

P.S. I learned a new trick to embed source codes inside html from [ http://en.support.wordpress.com/code/posting-source-code/ ]

Categories: mysql, php, Programming

“Bulk” update in mysql with the use of temporary table

June 21, 2011 8 comments

Problem:
You need to insert or update a large number of rows (at least hundreds of thousands) from an external data source (either from other database or file) into an existing mysql table but you are not allowed to truncate the table first because data source contains less number of columns than the existing table. The external source has a column that will be checked against the table’s primary key to determine if a record is new or not. For new records, using bulk insertion is pretty easy. Updating existing rows is quite a different story as there’s not yet an official way to do “bulk” updating. Prepare/Execute method might help a bit but still the performance is not satisfactory.

The solution:
Through a project I completed recently, I’ve learned a new way to solve the above problem, and the basic idea is to use bulk insertion to accomplish the goal of bulk updates, with the help of mysql temporary table. Let’s assume the table we are going to insert/update has the following structure:

tbl_A (
     userid char(20) not null,
     email varchar(250) not null,
     url varchar(250) ,
     flag_a tiny int,
     primary key(userid)
)

The basic steps:
1) Before entering the loop, create a temporary table tbl_A_tmp

create temporary table if not exists tbl_A_tmp (
   userid char(20) not null,
     email varchar(250) not null,
     url varchar(250) ,
     flag_a tiny int
)

Please note this table doesn’t have to have exactly the same data structure as tbl_A, as long as it holds the columns that will be used in step 5 and it’s also a good idea to remove any indexes to speed up insertion (to the temporary table).

2) Use delete or truncate statement to delete any existing rows in tbl_A_tmp

3) Inside the loop, save new records to an array list_new and existing records to an array list_existing, when num_of_elements(list_new) >=BATCHCOUNT, do batch insertion to tbl_A directly, when num_of_elements(list_existing) >=BATCHCOUNT, do batch insertion to tbl_A_tmp, clear list_new or list_existing after each batch insertion

4) When looping is done, Check remaining number of elements in both list_new and list_existing, perform batch insertions for any remaining records to the corresponding tables

4) Add unique or primary key to tbl_A_tmp, without this the next step will be very slow

5) Perform cross-table update like the following

    update tbl_A dest, tbl_A_tmp src set
        dest.email=src.email,
        dest.url=src.url,
        dest.flag_a=src.flag_a
    where dest_A.userid=src.userid

The reason why temporary table is used is because of
a) speed, a temporary table is usually created in the memory and will be converted to use disk storage if the temporary table becomes too large (hence data loss is prevented)
b) convenience, when the db link is disconnected,temporary tables are dropped automatically

How much performance gain do I get by using this method? 45 minutes (bulk insertion for new records + row-by-row updates) vs. less than 2 minutes (bulk insertion for new records + temporary table method for existing records).

Categories: mysql, Programming, Tip

Comparing two mysql tables containing huge number of records each

June 15, 2011 Leave a comment

Usually I use the following method to compare two mysql tables with identical structure (or one contains the subset of columns of the other):

select * from db_A.tbl_data
where (col1, col2, col3, ...) not in
(select col1, col2, col3, ... from db_B.tbl_data);

select * from db_B.tbl_data
where (col1, col2, col3, ...) not in
(select col1, col2, col3, ... from db_A.tbl_data)

[ change * to count(*) if only the number of differences is needed ]

This method works pretty well in terms of performance when
1) number of columns to be compared is not too big, usually around 10
2) there are indexes defined in each table, this is very important when number of rows to be compared in each table is huge, I’lve done comparisons between 2M rows agains 2M rows, it can take minutes to finish, even with indexes.

But I ran into performance problem the other day when I tried to compare two identical-structured tables (comparing all 15 columns) with almost 3M rows each, and each table has only 1 index defined. Queries using the first method ran very slowly and I quickly used up patience ( I waited for more than half an hour and even the first query couldn’t got finished). Without other choices, I came up with the following solution:

a) log in to mysql server
b) select * from db_A.tbl_data into outfile '/tmp/outfile_A';
c) select * from db_B.tbl_data into outfile '/tmp/outfile_B';
d) exit mysql console
e) cd /tmp; diff outfile_A outfile_B > diff
f) wc -l diff

if the result from f) is small, I can open up file diff with an editor to see what the differences are, otherwise I can use head diff and/or tail diff to take a glance at the differences. With this method, I got the results in less than 3 minutes, which is a huge improvement compared to the first method.

Categories: howto, mysql, Tip