Archive

Archive for June, 2011

Fetch single row of result from php (atomic fetch)

June 24, 2011 Leave a comment

A while back I wrote about doing mysql atomic fetch using Perl [ here ], actually the similar method can be used on PHP, a couple of examples
Example #1

<?php
    $conn=new mysqli('localhost', 'user', 'pass', 'db');
    $res=$conn->query('select count(customer_id) from customer_info');
    list($cnt)= $res->fetch_row();
    echo "There are $cnt customers.\n";
    $conn->close();

Example #2

<?php
    $conn=new mysqli('localhost', 'user', 'pass', 'db');
    $res=$conn->query("select zip_code, email from customer_info where customer_id='1234'");
    list($zip_code, $email)= $res->fetch_row();
    echo "zip_code=$zip_code, email=$email.\n";
    $conn->close();

P.S. I learned a new trick to embed source codes inside html from [ http://en.support.wordpress.com/code/posting-source-code/ ]

Categories: mysql, php, Programming

3 Substring methods benchmark

June 23, 2011 Leave a comment
#!/usr/bin/perl
use strict;
use warnings;

my $number='12345678';
my ($first,$second);
for my $i (1..10_000_000) {
    if( $number =~ /^(\d{3}(\d{5}))$/ ) {
        $first=$1;
        $second=$2;
    }
}
time perl reg.pl

real	0m14.066s
user	0m13.974s
sys	0m0.012s
============================================================
#!/usr/bin/perl
use strict;
use warnings;

my $number='12345678';
my ($first,$second);
for my $i (1..10_000_000) {
    ($first, $second)=unpack('a3a5', $number);
}

time perl unpack.pl

real	0m9.427s
user	0m9.321s
sys	0m0.016s
============================================================
#!/usr/bin/perl
use strict;
use warnings;

my $number='12345678';
my ($first,$second);
for my $i (1..10_000_000) {
    $first=substr($number, 0, 3);
    $second=substr($number, 3, 5);
}

time perl substr.pl

real	0m5.945s
user	0m5.846s
sys	0m0.017s

Seems the winner is the good old substr routine.

Check apache config syntax without restarting

June 22, 2011 Leave a comment

Command:
apachectl configtest

Found from http://httpd.apache.org/docs/1.3/configuring.html [ first section ]

Categories: apache, howto

“Bulk” update in mysql with the use of temporary table

June 21, 2011 8 comments

Problem:
You need to insert or update a large number of rows (at least hundreds of thousands) from an external data source (either from other database or file) into an existing mysql table but you are not allowed to truncate the table first because data source contains less number of columns than the existing table. The external source has a column that will be checked against the table’s primary key to determine if a record is new or not. For new records, using bulk insertion is pretty easy. Updating existing rows is quite a different story as there’s not yet an official way to do “bulk” updating. Prepare/Execute method might help a bit but still the performance is not satisfactory.

The solution:
Through a project I completed recently, I’ve learned a new way to solve the above problem, and the basic idea is to use bulk insertion to accomplish the goal of bulk updates, with the help of mysql temporary table. Let’s assume the table we are going to insert/update has the following structure:

tbl_A (
     userid char(20) not null,
     email varchar(250) not null,
     url varchar(250) ,
     flag_a tiny int,
     primary key(userid)
)

The basic steps:
1) Before entering the loop, create a temporary table tbl_A_tmp

create temporary table if not exists tbl_A_tmp (
   userid char(20) not null,
     email varchar(250) not null,
     url varchar(250) ,
     flag_a tiny int
)

Please note this table doesn’t have to have exactly the same data structure as tbl_A, as long as it holds the columns that will be used in step 5 and it’s also a good idea to remove any indexes to speed up insertion (to the temporary table).

2) Use delete or truncate statement to delete any existing rows in tbl_A_tmp

3) Inside the loop, save new records to an array list_new and existing records to an array list_existing, when num_of_elements(list_new) >=BATCHCOUNT, do batch insertion to tbl_A directly, when num_of_elements(list_existing) >=BATCHCOUNT, do batch insertion to tbl_A_tmp, clear list_new or list_existing after each batch insertion

4) When looping is done, Check remaining number of elements in both list_new and list_existing, perform batch insertions for any remaining records to the corresponding tables

4) Add unique or primary key to tbl_A_tmp, without this the next step will be very slow

5) Perform cross-table update like the following

    update tbl_A dest, tbl_A_tmp src set
        dest.email=src.email,
        dest.url=src.url,
        dest.flag_a=src.flag_a
    where dest_A.userid=src.userid

The reason why temporary table is used is because of
a) speed, a temporary table is usually created in the memory and will be converted to use disk storage if the temporary table becomes too large (hence data loss is prevented)
b) convenience, when the db link is disconnected,temporary tables are dropped automatically

How much performance gain do I get by using this method? 45 minutes (bulk insertion for new records + row-by-row updates) vs. less than 2 minutes (bulk insertion for new records + temporary table method for existing records).

Categories: mysql, Programming, Tip

Comparing two mysql tables containing huge number of records each

June 15, 2011 Leave a comment

Usually I use the following method to compare two mysql tables with identical structure (or one contains the subset of columns of the other):

select * from db_A.tbl_data
where (col1, col2, col3, ...) not in
(select col1, col2, col3, ... from db_B.tbl_data);

select * from db_B.tbl_data
where (col1, col2, col3, ...) not in
(select col1, col2, col3, ... from db_A.tbl_data)

[ change * to count(*) if only the number of differences is needed ]

This method works pretty well in terms of performance when
1) number of columns to be compared is not too big, usually around 10
2) there are indexes defined in each table, this is very important when number of rows to be compared in each table is huge, I’lve done comparisons between 2M rows agains 2M rows, it can take minutes to finish, even with indexes.

But I ran into performance problem the other day when I tried to compare two identical-structured tables (comparing all 15 columns) with almost 3M rows each, and each table has only 1 index defined. Queries using the first method ran very slowly and I quickly used up patience ( I waited for more than half an hour and even the first query couldn’t got finished). Without other choices, I came up with the following solution:

a) log in to mysql server
b) select * from db_A.tbl_data into outfile '/tmp/outfile_A';
c) select * from db_B.tbl_data into outfile '/tmp/outfile_B';
d) exit mysql console
e) cd /tmp; diff outfile_A outfile_B > diff
f) wc -l diff

if the result from f) is small, I can open up file diff with an editor to see what the differences are, otherwise I can use head diff and/or tail diff to take a glance at the differences. With this method, I got the results in less than 3 minutes, which is a huge improvement compared to the first method.

Categories: howto, mysql, Tip

Checking scripts syntax without actual running

June 15, 2011 Leave a comment

1) In Perl,
perl -c script.pl

2) In PHP, as I mentioned here,
php -l script.php

3) In Bash,
bash -n script.sh

Categories: debugging, Programming

How to get out from tmux session inside a tmux window

June 8, 2011 7 comments

Here’s how I got stuck:
1) Remote access to a linux serverA
2) Launch tmux, Ctrl-b c to create a new window
3) From the new window, remote access to serverB, then run tmux att under shell to attach an existing tmux session (on serverB)
4) Now if I use the shortcut key Ctrl-b d, I can only detach from the session on serverA, because the tmux session on server B also uses Ctrl-b as the prefix

The fix:
a) Press Ctrl-b n until the window containing the ssh session to server B becomes the current window
b) Press Ctrl-b :
c) Enter tmux command set -g prefix C-a
d) Press Ctrl-b d, tmux session on serverB is now detached, type exit to close the new window created in step 2, this step is optional though as the “nested” tmux session is detached from the ssh shell
e) Press Ctrl-a :
f) Enter tmux command set -g prefix C-b

Step c changes prefix to Ctrl-a so it won’t conflict with the tmux shortcut prefix from serverB. Step f changes the prefix back to Ctrl-b for serverA.

Categories: howto, Tip, tmux