Archive

Archive for the ‘perl’ Category

Sending message with attachements using Mail::Sender in Perl

September 8, 2011 Leave a comment

Due to the security setting of a network, not every server is allowed to send out email directly, that’s the reason why I wrote this script. The following Perl packages are required: Mail::Sender and Getopt::Long. Feature that would be nice to have but hasn’t been supported yet: smtp authentication. Therefore in order to be able to run this script, the external smtp server should add the ip of server where mail.pl runs to the allow-to-send list, assuming the smtp server is not configured to be a open relay.

cat mail.pl

#!/usr/bin/perl
# purpose of this script is to address the followings:
# 1) support simple sending through external smtp server that doesn't require authentication
# 2) attach file(s) while sending message
# run the script without parameters will print the usage
# example usage:
# echo "message sent using mail.pl" | perl mail.pl -f sender@email.address -s "sending test" -a file1.txt -a info.log john@email.address,smith@email.address
use strict;
use warnings;
use Mail::Sender;
use Getopt::Long;

my $subject='[NO SUBJECT]';
my @attach=();
my $from='unspecified_sender@company.ltd'; # change this
my $through='localhost';
my $help=0;

usage() if ( @ARGV < 1 or
    ! GetOptions(
        'attach|a=s@'=>\@attach,
        'subjec|st=s'=>\$subject,
        'from|f=s'=>\$from,
        'through|t=s'=>\$through,
        'help|h'=>\$help,
    ) or $help );
 
sub usage
{
  print "Unknown option: @_\n" if ( @_ );
  print "usage: program [--subject|-s SUBJECT] [--from|-f FROM] [--through|-t SMTP] [--attach|-a FILE]  [--help|-h] recipient1@domain1[,recipient2@domain2...]\n";
  print "Suject will become [NO SUBJECT] if not provided.\n";
  print "Multiple --attach can be supplied to attach more than 1 file\n";
  exit;
}

my $to=$ARGV[0];

# check if files exist
map {
    die("File $_ doesn't exist!\n") if ! -f $_;
} @attach;

my $msg='';
while(<STDIN>) {
    $msg.=$_;
}
if($msg eq '') {
    $msg='[BLANK]';
}

my $sender=new Mail::Sender( {
    smtp=>$through,
    from=>$from,
});

$sender->OpenMultipart( {
    to=>$to,
    subject=>$subject,
});

$sender->Body();
$sender->SendLine($msg);
$sender->SendFile( {
    description=>'Raw data File',
    encoding=>'7BIT',
    file=>\@attach
});
$sender->Close();
Categories: emailing, perl

Mysql count with case statement

July 1, 2011 Leave a comment

In one of my recent perl scripts, I need to get a row count of total records from a mysql table, as well as the row count of some rows with some criteria. By combining the technique from http://stackoverflow.com/questions/5045124/count-case-and-when-statement-in-mysql and the atomic row fetch trick I mentioned a while back, I came up with the following script (for illustration only hence this is not the actual script that I wrote)

#!/usr/bin/perl -w
use strict;
use DBI;

# make db connection and stuff ...
my $dbh=DBI->connect( 'DBI:mysql:db:localhost:3306', 'username', 'password' ) or die( $DBI::errstr );

my $sql=qq{ select 
    count(feedback_id) as total_count, 
    count( case when feedback_type in ('suggestion', 'bug_report') then 1 else null end ) as special_count
from customer_feedbacks};

my ($total_count, $special_count)=$dbh->selectrow_array( $sql );
# do something with $total_count and $special_count

$dbh->disconnect() if $dbh;

The benefit of crafting the mysql statement this way is that I only need to do one mysql query instead of two, combined with atomic fetch, the resulting code becomes very clean.

Categories: mysql, perl, Programming

Yet another way to convert strings to numbers in Perl

April 27, 2011 3 comments

OK I know what you are gonna say when you see the title of this post — it’s not necessary to convert strings to numbers, strings are numbers if they look like numbers and Perl will treat them like numbers when you want it to. But there are some situations where data cleaning is required, for example, importing data from a csv file into database and you want to make sure numeric columns contains only numbers, strings such as ‘a345’, ‘ ‘, will be converted to 0 before they are being inserted into the database. For the past I’ve been using the following home-cooked method to do the conversion

sub tonum {
	return $_[0] =~ m/^\s*([+-]?\d+(\.\d+)?)\s*$/?$1:0;
}

This functions converts strings like ' +123.00', '78 ', '-333.456', '+abc' into '+123.00', '78', '-333.456', and '0' respectively. However I found yet another way to perform the same task and it’s much faster:

sub tonum2 {
	no warnings;
	return $_[0]+0;
}

How much faster can this method beat the previous one? I wrote a couple of testing scripts:

cat a.pl

#!/usr/bin/perl
use warnings;
use strict;

sub tonum {
	return $_[0] =~ m/^\s*([+-]?\d+(\.\d+)?)\s*$/?$1:0;
}
my $v;
for my $j (0..1_000_000) {
	for my $i (' 23.45', '-7  ', 'a34.56') {
		$v= tonum($i);
	}
}

———————————
time perl a.pl

real 0m31.250s
user 0m31.202s
sys 0m0.012s

cat b.pl

#!/usr/bin/perl
use warnings;
use strict;

sub tonum2 {
	no warnings;
	return $_[0]+0;
}
my $v;
for my $j (0..1_000_000) {
	for my $i (' 23.45', '-7  ', 'a34.56') {
		$v= tonum2($i);
	}
}

———————————
time perl b.pl

real 0m8.633s
user 0m8.617s
sys 0m0.008s

We now have our clear winner. The tricky part is tonum2 takes advantage of “no warnings” statement which suppresses the “Argument blahblah isn’t numeric” warnings. It’s kind of like the @ operator in PHP.

Categories: perl, Programming, Tip

Interesting results from benchmarking trim “function” in Perl

April 27, 2011 Leave a comment

I’ve been wondering if writing a customer trim function in perl will come with some performance penalty. To satisfy my itchiness, I ran the following three scripts one at a time and use handy command time to get the running times.

cat a.pl

#!/usr/bin/perl
#!/usr/bin/perl
use warnings;
use strict;

sub trim {
	my $str=shift;
	$str =~ s/^\s+|\s+$//g;
	return $str;
}

my $test_str= "  this is a test   ";

my $result;
for my $i (0..1_000_000) {
	my $copy=$test_str;
	$result=trim($copy);
}

——————————
time perl a.pl

real 0m18.451s
user 0m18.417s
sys 0m0.012s

cat b.pl


#!/usr/bin/perl
use warnings;
use strict;

my $test_str= "  this is a test   ";

my ($result,$copy);
for my $i (0..1_000_000) {
	$copy=$test_str;
	$copy =~ s/^\s+|\s+$//g;
	$result=$copy;
}

——————————
time perl b.pl

real 0m13.330s
user 0m13.313s
sys 0m0.000s

By using the regex directly saves the running time by a few seconds. And I continue to break the regex into two as follows:

cat c.pl

#!/usr/bin/perl
use warnings;
use strict;

my $test_str= "  this is a test   ";

my ($result, $copy);
for my $i (0..1_000_000) {
	$copy=$test_str;
	$copy =~ s/^\s+//g;
	$copy =~ s/\s+$//g;
	$result=$copy;
}

——————————
time perl c.pl

real 0m6.534s
user 0m6.524s
sys 0m0.004s

I thought the results from b.pl and c.pl should be pretty close but I was so wrong – it takes only a half of the time to run c.pl when compared to b.pl.

What I learned from the above bench mark tests:
1) Avoid trim “functions” as much as possible
2) Use the double regex method as shown in c.pl instead of one regex
3) Running time can be cut further if it’s known beforehand whether there are trailing spaces (use s/\s+$//g) or leading spaces (use s/^\s+//g) only

Categories: perl, Programming

Perl DBI bind_columns vs. fetchrow_hashref bench marking

April 17, 2011 Leave a comment

Have been wondering the real difference between bind_columns and fetchrow_hashref methods while fetching data using perl DBI and here’s the findings I’ve come up with:

[04/17/2011 10:35:53] Using bind_columns method starts …
[04/17/2011 10:36:39] Using bind_columns method ends …
[04/17/2011 10:36:39] Using fetchrow_hashref method starts …
[04/17/2011 10:38:36] Using fetchrow_hashref method ends …

The clear winner is bind_columns (47 seconds, which is much less than the time taken for fetchrow_hashref method, 1 minute and 57 seconds) The table being tested has 150,000 records and it’s structure is as follows:

describe tbl_userinfo;
+----------+------------+------+-----+---------+----------------+
| Field    | Type       | Null | Key | Default | Extra          |
+----------+------------+------+-----+---------+----------------+
| id       | int(11)    | NO   | PRI | NULL    | auto_increment |
| password | char(32)   | NO   |     | NULL    |                |
| active   | tinyint(4) | NO   |     | 1       |                |
+----------+------------+------+-----+---------+----------------+

The test is done by looping each method 10 times through the script below:

#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use Time::Local;
use POSIX qw/strftime/;

sub d {
	my $msg=shift;
	print strftime("[%m/%d/%Y %H:%M:%S] ", localtime(time)).$msg."\n";
}

sub get_records_bind {
	my ($sth, $ref_records)=@_;
	$sth->execute();
	my($id, $password, $active);
	$sth->bind_columns(undef, \$id, \$password, \$active);
	while($sth->fetch()) {
		$ref_records->{$id}={
			password=>$password,
			active=>$active,
		};
	}
}

sub get_records_hash {
	my ($sth, $ref_records)=@_;
	$sth->execute();
	while(my $ref=$sth->fetchrow_hashref()) {
		$ref_records->{$ref->{id}}={
			password=>$ref->{password},
			active=>$ref->{active}
		};
	}
}	

my $dbh=DBI->connect( "DBI:mysql:database=testdb:host=localhost", 'username', 'password') or die "Can't connect $DBI::errstr";

my $sth=$dbh->prepare("select id,password,active from tbl_userinfo");

d("Using bind_columns method starts ...");
my $r={};
for my $i (1..10) {
	%$r=();
	get_records_bind($sth, $r)
}
d("Using bind_columns method ends ...");

d("Using fetchrow_hashref method starts ...");
for my $i (1..10) {
	%$r=();
	get_records_hash($sth, $r)
}
d("Using fetchrow_hashref method ends ...");

$dbh->disconnect or warn "Disconnection failed" if $dbh;

Therefore I think it’s better to use the bind_columns method whenever possible, especially when the dataset is huge.

Categories: perl, Programming

Perl study note: why using references in functions is the preferred way (mostly)

February 26, 2011 Leave a comment

Using references in Perl has a big advantage over the traditional passing-value way in performance. The reason is simple – no copying is performed when parameters are passed as references to functions. Below is the code I used to test on my Acer AOD 255 netbook:

#!/usr/bin/perl
use Time::Local;
use POSIX qw/strftime/;

sub d {
	my $msg=shift;
	print strftime("[%m/%d/%Y %H:%M:%S] ", localtime(time)).$msg."\n";
}

sub trim_ref {
	my $ref=shift;
	$$ref =~ s/(^\s+|\s+$)//g;
}

sub trim {
	my $str=shift;
	$str =~ s/(^\s+|\s+$)//g;
	return $str;
}

print "Purpose of this program is to illustrate the advantage of using
references in perl functions. Please note the traditional way might take a while
to finish.\n";


d('running traditional way');

my $str="   this   ";
my $dest;
for my $i (1..1_000_000) {
	$dest=trim($str);
}
d('traditional way done');

my $str2="   that   ";
d('using reference method');
for my $j (1..1_000_000) {
	trim_ref(\$str2);
}
d('ref. done');


Result:
...
[02/26/2011 18:37:43] running traditional way
[02/26/2011 18:37:59] traditional way done
[02/26/2011 18:37:59] using reference method
[02/26/2011 18:38:06] ref. done

The reference way (7 seconds) clearly beat the value-passed way (16 seconds). But this conclusion doesn’t mean passing by reference is always the best way especially when modifying the arguments passed is not desired.

Simple logging (to screen) in Perl

February 26, 2011 Leave a comment

Very Simple but useful, esp. for simple scripts:

#!/usr/bin/perl
use Time::Local;
use POSIX qw/strftime/;

sub d {
	my $msg=shift;
	print strftime("[%m/%d/%Y %H:%M:%S] ", localtime(time)).$msg."\n";
}

print "this is my first program written in  perl!\n";
d('some test message');

Categories: perl, Programming