Archive for the ‘perl’ Category

Instant mysql connection with perl.

Monday, September 29th, 2008

I've been reading about the exciting concept of code speed vs programmer time. That is, balancing what it costs to make a machine run slow code against what it costs to let a coder work easier.

Convenience.

So.. I noticed I was often coding stuff that uses a mysql connection, and I store the connection variables, the credentials, in a YAML configuration file. It’s good practice. YAML files are simple, and you can store them anywhere. This helps keep sensitive data, or the data that makes the same app act differently, apart from the executable application.

I was helping a co-worker learn some perl, and he wanted to connect to a db. He was sort of impatient- because the details of establishing a connection with a new language are a little bit… beneath him? This is someone highly trained and skilled in other languages and systems. And he wanted to just.. *have* a connection handle to a database.

So… YAML.. DBH…. I wrote and released YAML::DBH.

As simple as it gets. You write a config file:


password: super
user: myself
database: superstuff

In your script:

use YAML::DBH 'yaml_dbh';

my $dbh = yaml_dbh('./credentials.yml');
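
For the curious, here is roughly what a helper like this has to do under the hood. This is only a sketch and not the actual YAML::DBH source: read_credentials, build_dsn and connect_from_file are made-up names, the parser only copes with flat "key: value" lines (a real YAML module handles far more), and the optional host key with its localhost default is an assumption too.

```perl
use strict;
use warnings;

# Hypothetical helper: read flat "key: value" lines into a hash.
# Assumption: no nesting, no quoting, one pair per line.
sub read_credentials {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "cannot open $path: $!";
    my %conf;
    while ( my $line = <$fh> ) {
        next if $line =~ /^\s*#/ or $line !~ /\S/;    # comments, blank lines
        if ( $line =~ /^\s*([\w-]+)\s*:\s*(.*?)\s*$/ ) {
            $conf{$1} = $2;
        }
    }
    close $fh;
    return \%conf;
}

# Hypothetical helper: build a DBD::mysql data source name.
sub build_dsn {
    my ($conf) = @_;
    my $host = $conf->{host} || 'localhost';    # 'host' key is an assumption
    return "DBI:mysql:database=$conf->{database};host=$host";
}

# Connect and hand back the database handle.
sub connect_from_file {
    my ($path) = @_;
    require DBI;    # loaded lazily; needs DBI and DBD::mysql installed
    my $conf = read_credentials($path);
    return DBI->connect( build_dsn($conf), $conf->{user}, $conf->{password},
        { RaiseError => 1 } );
}
```

With that in place, my $dbh = connect_from_file('./credentials.yml'); would give you a handle, provided the server is up and the credentials are good.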

gnu coreutils md5sum 3 times faster than Digest::MD5::File

Saturday, July 26th, 2008

I use md5 sums for identifying files under FileArchiveIndexer.
One of the things that needs improvement is the speed of updating the files list.

Now, with getting md5 sums of a few files, there is no issue. Even if we have a few files of large size.
When we are dealing with a few gigs, this becomes more important.

In running benchmarks of FileArchiveIndexer::Update , I notice that the cpu and memory consumption hover at about 10% and 12% or so- This is really not making good use of the machine.

I set about making some tests that would benchmark different ways of getting the md5 sum for files.
There are the following ways that I’ve looked at..

WAYS TESTED TO GET MD5 SUMS

  • The most basic that seems to make sense here is Digest::MD5::File. This module provides various means of getting digests with a simple path argument to the file.
  • Another method is to read in the file data, all of it at once (watch out.. this eats memory)- and use Digest::MD5 to get the md5_hex() sum.
  • I also considered a lazy approach to reading in only the first 25k or so of a file, and getting a digest from that- as a sort of.. lazy and dangerous way of getting sums. There are reasons, or situations in which this can actually be useful.
  • The fourth way- which I thought would be slow- but I wanted to test it anyhow- is using gnu coreutils md5sum via the command line. That is- making a system-ish call to md5sum.
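
To make the last three approaches concrete, here is a rough sketch of each. These are not the subs from the benchmark suite; md5_slurp, md5_lazy and md5_cli are names picked for illustration, and the 25k default is just the figure mentioned above. The first approach needs no sub of its own, since Digest::MD5::File already provides file_md5_hex($path).

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Slurp the whole file into memory, then digest it.
sub md5_slurp {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "cannot open $path: $!";
    binmode $fh;
    local $/;    # undef record separator: read everything at once
    my $data = <$fh>;
    close $fh;
    return md5_hex( defined $data ? $data : '' );
}

# Lazy variant: digest only the first $bytes of the file.
# Two files that share the same leading bytes get the same sum.
sub md5_lazy {
    my ( $path, $bytes ) = @_;
    $bytes ||= 25 * 1024;
    open( my $fh, '<', $path ) or die "cannot open $path: $!";
    binmode $fh;
    my $data = '';
    read( $fh, $data, $bytes );
    close $fh;
    return md5_hex($data);
}

# Ask coreutils md5sum; the list form of open skips the shell entirely.
sub md5_cli {
    my ($path) = @_;
    open( my $ph, '-|', 'md5sum', '--', $path )
        or die "cannot run md5sum: $!";
    my $line = <$ph>;
    close $ph;
    defined $line && $line =~ /^\\?([0-9a-f]{32})\s/
        or die "unexpected md5sum output for $path";
    return $1;
}
```

All three take a path and return a 32 character hex string, so they can be swapped in and out of a benchmark loop without touching the caller.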

THE RESULTS

    [leo@localhost LEOCHARRE-MD5-Benchmark]$ dprofpp -I ./tmon.out
    Total Elapsed Time = 89.05905 Seconds
      User+System Time = 22.93905 Seconds
    Inclusive Times
    %Time ExclSec CumulS #Calls sec/call Csec/c  Name
     100.   0.020 23.021      5   0.0040 4.6042  main::bench_a_dir
     99.5   0.288 22.826     20   0.0144 1.1413  main::get_file_md5s
     66.2   0.134 15.200   5225   0.0000 0.0029  main::_md5_Digest_MD5_File
     65.6   9.682 15.066   5225   0.0019 0.0029  Digest::MD5::File::file_md5_hex
     20.6   4.737  4.737   5225   0.0009 0.0009  Digest::MD5::add
     19.8   1.634  4.551   5225   0.0003 0.0009  main::_md5_Digest_MD5
     15.7   3.614  3.614  10450   0.0003 0.0003  Digest::MD5::md5_hex
     7.79   1.787  1.787   5225   0.0003 0.0003  main::_md5_cli
     4.36   0.304  1.001   5225   0.0001 0.0002  main::_md5_lazy
     2.25   0.517  0.517   5225   0.0001 0.0001  Digest::MD5::File::__ANON__
     0.78   0.020  0.178      9   0.0022 0.0198  main::BEGIN
     0.65   0.150  0.150      5   0.0300 0.0300  main::ls
     0.44   0.020  0.100      5   0.0040 0.0199  Digest::MD5::File::BEGIN
     0.38   0.087  0.087   5225   0.0000 0.0000  Digest::MD5::hexdigest
     0.35   0.060  0.080     10   0.0060 0.0080  LWP::UserAgent::BEGIN
    

    Now, what interests us here are the following ..

    [leo@localhost LEOCHARRE-MD5-Benchmark]$ dprofpp -I ./tmon.out  | grep '::_md5'
     66.2   0.134 15.200   5225   0.0000 0.0029  main::_md5_Digest_MD5_File
     19.8   1.634  4.551   5225   0.0003 0.0009  main::_md5_Digest_MD5
     7.79   1.787  1.787   5225   0.0003 0.0003  main::_md5_cli
     4.36   0.304  1.001   5225   0.0001 0.0002  main::_md5_lazy
    

    The total amount of data processed is about 250 megs in this test.
    As it turns out, making a call to gnu coreutils md5sum, is about… THREE TIMES FASTER than using Digest::MD5::File- which is what I was using before.
    This means the files/locations update on a few gigs of data that normally takes 30 minutes can take 10 if I make calls to cli md5sum.

    WHY DOES THIS MATTER

    FileArchiveIndexer takes care of using OCR on a massive amount of scanned in hard copy documents. These documents may change location, be renamed, enter the system- leave the system- be copied.. etc. The faster we can register their existence- the better we can track user changes.

    You are welcome to download my test suite: leocharre-md5-benchmark-01tar.gz

LEOCHARRE::Dir released

Tuesday, July 15th, 2008

I’m not quite tired of rewriting opendir and list contents and which are files and directories, etc.
I just think it was time I did this for myself.

It’s a module with some subs to do things like, get all abs paths to files in dir x- simply by coding lsfa('./');
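
To give an idea of the boilerplate this saves, here is the kind of thing you otherwise end up retyping. It is a sketch of the idea only, not the module's code: list_files_abs is a made-up name, and sorting the result and not recursing into subdirectories are my assumptions here.

```perl
use strict;
use warnings;
use Cwd qw(abs_path);
use File::Spec;

# Hypothetical stand-in for the idea behind lsfa(): return the absolute
# paths of the plain files directly inside $dir, sorted, no recursion.
sub list_files_abs {
    my ($dir) = @_;
    my $abs_dir = abs_path($dir);
    defined $abs_dir or die "cannot resolve $dir";
    opendir( my $dh, $abs_dir ) or die "cannot open dir $abs_dir: $!";
    my @files =
        grep { -f $_ }                                   # plain files only
        map  { File::Spec->catfile( $abs_dir, $_ ) }     # make paths absolute
        grep { $_ ne '.' && $_ ne '..' }                 # skip dot entries
        readdir $dh;
    closedir $dh;
    @files = sort @files;
    return @files;
}
```

Usage would be something like print "$_\n" for list_files_abs('./');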

Here’s the manual to LEOCHARRE::Dir..
(more…)

automatically generate change file from cvs

Wednesday, June 25th, 2008

So I had Mark Stosberg note that HTML::Template::Default had no Changes file.

I keep everything in cvs because .. Because otherwise I think I would be in a mental institution.

So, cvs records changes if you tell it something- as you commit changes.

There is a specific gnu format for a changes file. You will see changefiles in distros like CGI, CGI::Application.. most good distros. And it makes sense, to keep track of stuff. People might want to know what’s up and why. And often I do add comments to my commits.
But, do I have to write this tedious file by hand? Maybe there’s an automatic way to generate this…
(more…)

Stop getting windows cpan fail reports

Sunday, June 8th, 2008

I’ve noticed I’ve been getting more fail reports from CPAN on some basic modules.
Things like File::PathInfo and LEOCHARRE::CLI, things I clearly intended for POSIX only portability. You see- most of these fail reports are from mswin32 platforms. M$ land.
I have zero interest in porting anything to windows. I resent that perl was ported.. or whatever did happen.. to m$ platforms.

I have gone back and made sure to place some hacks to stop my modules from even installing on mswin32 platforms.
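
A guard of that kind can be as small as the sketch below, which would sit near the top of a Makefile.PL. It is not necessarily the exact hack that went into these modules, and os_unsupported is a made-up helper name. The one fixed part is the message: dying with "OS unsupported" is the convention the cpan testers tools look for.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the platform name is one we refuse.
sub os_unsupported {
    my ($os) = @_;
    return $os eq 'MSWin32' ? 1 : 0;
}

# $^O holds the name of the operating system perl was built for.
# Dying with "OS unsupported" before the makefile gets written means
# the tester files the report as NA instead of FAIL.
die "OS unsupported\n" if os_unsupported($^O);

# ... the usual WriteMakefile( ... ) call follows here.
```

The check runs before anything is built, so nothing gets installed and no tests are attempted on that platform.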
(more…)

cli pdfmerge

Friday, June 6th, 2008

I have a working release of pdfmerge.

My goodness! Why another pdf merging thingie!!!!

This one counts how many pages are in all docs, then compares to output, etc.. Anyhow, it’s somewhat safe.
And it’s a simple call. Instead of looking up how to use xpdf or something to merge pdfs, you can use pdfmerge via the command line. Yes, there *is* another pdfmerge out there. In fact like 2. One requires extra moolah for the ‘full’ version.. pleeezzz…
(more…)

How to add runmodes to a CGI::Application

Tuesday, March 25th, 2008

There really are a ton of messy ways to tinker with adding runmodes to CGI::Application.

I tried a lot of different things, successfully, strangely, even ‘cleverly’.

If you have a plugin that is being ‘use’d, you should have an import function.

(more…)

posting to wordpress via the command line

Sunday, January 20th, 2008

I tackled this problem via XMLRPC and perl.
I ended up packaging this under WordPress::Post on cpan.

Usage examples:

wppost -t 'title of the post' -c stuff,fruit,cake -i 'This is what I think of bla.'

You can also write a text file, the name of the text file is the title.

wppost -i ./path/to/title_of_the_post.txt

This program is in its infancy. But dammit, it works. It’s how I posted this content via the command line. Also, you can include images in your posts. Multiple images. Via the command line.
Anything not in the argument is considered posting content..
So what if your post is about a party you went to, and you have 5 pictures..

First write your halloween_party.txt file, then have your photos lined up somewhere.. then..

wppost -i halloween_party.txt ./images/halloween_photos_post/*jpg

Done. Linux rocks.

rename images by exif date

Tuesday, January 15th, 2008

I have a digital cam. I take pictures.
I don’t like a million DSCIM or DSCF or whatever files.
It’s inconvenient with so damn many.

It would be nice to rename all the images according to the date, as recorded inside the image exif data, that is, the date stamp put in by your camera.

I hacked together this script to do that..
Use at your own peril. This is a hack.
Eventually I will release a refined version on cpan.

This script requires modules Image::ExifTool, LEOCHARRE::CLI ..

#!/usr/bin/perl -w
use strict;
use File::Copy;
use Cwd;
use LEOCHARRE::CLI;

# requires Image::ExifTool
#
my $o = gopts('m');

#yn('rename all jpg files by date in '.cwd().'?') or exit;

my $files = argv_aspaths();

(defined $files and scalar @$files )or die("no file arguments provided");

if ($o->{m}){
   -d './noout' or mkdir './noout';
}

for (@$files){
   $_=~/(.+)\/([^\/]+)$/ or next;
   my $abs = $1;
   my $filename = $2;
   $filename=~/\.jpg$/i or next;
   print STDERR "$abs   - $filename \n" if DEBUG;

   if ($filename=~/\d{4}[_\: ]+\d{2}[_\: ]+\d{2}/){
      print STDERR "file $filename already named?\n" if DEBUG;
      next;
   }

   my $out =  `exiftool -DateTimeOriginal '$abs/$filename'`;
   chomp $out;

   unless( $out ){
      print STDERR " no out? $filename\n" if DEBUG;
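      if ($o->{m}){
         # make sure the noout dir exists next to the image, not just in the cwd
         -d "$abs/noout" or mkdir "$abs/noout";
         File::Copy::move("$abs/$filename", "$abs/noout/$filename");
      }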
      next;

   }
   $out=~s/^Date\/Time Original[\:\s]*//i;
   $out=~s/:| /_/g;

   unless( $out=~/^[\d_]+$/ ){
      print STDERR "dont like [$out]\n" if DEBUG;
      next;
   }   

   print STDERR "$filename : [$out]\n" if DEBUG;

   rename("$abs/$filename", "$abs/$out\_$filename");
   print STDERR "moved to $abs/$out\_$filename\n" if DEBUG;  

}

=head1 OPTION FLAGS

   -m move to noout dir if cant get date
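
The part of the script that matters is turning a line of exiftool output into a file name prefix. Pulled out on its own it looks something like this. dated_name is a made-up helper, not something the script defines, and it assumes exiftool prints the usual "Date/Time Original : YYYY:MM:DD HH:MM:SS" line.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the two substitutions in the script:
# takes one line of `exiftool -DateTimeOriginal` output and the old
# file name, returns the new name, or undef if no usable date is found.
sub dated_name {
    my ( $exif_line, $filename ) = @_;
    return undef unless defined $exif_line;
    ( my $stamp = $exif_line ) =~ s/^Date\/Time Original[:\s]*//i;
    $stamp =~ s/\s+$//;        # drop the trailing newline or spaces
    $stamp =~ s/[: ]/_/g;      # colons and spaces become underscores
    return undef unless $stamp =~ /^[\d_]+$/;
    return "${stamp}_$filename";
}
```

Anything that is not purely digits and underscores after the cleanup is rejected, the same as the "dont like" branch in the script.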

problems installing DBD::mysql

Saturday, January 12th, 2008

So I was doing a fresh install of my customized database api package for perl. LEOCHARRE::Database.
Goodly enough, my perl Makefile.PL let me know that I was missing DBD::mysql. Great.

I fire up cpan install DBD::mysql, and alas.. No go! How come??

Turns out you need to install mysql-client and mysql-devel.
I’m on a fedora core gui, so I use yum..

yum -y install mysql-client mysql-devel

Great.
Now let’s try cpan again..
cpan install DBD::mysql

It works better.. but oops.. still ..
2 tests skipped.
Failed 31/34 test scripts, 8.82% okay. 473/478 subtests failed, 1.05% okay.
make: *** [test_dynamic] Error 255
/usr/bin/make test -- NOT OK
Running make install
make test had returned bad status, won't install without force

What’s up?
I think the mysql server’s not running on this machine, so we need to install and start it for the tests to pass via cpan.

yum -y install mysql-server

And then..
[root@localhost LEOCHARRE-Database]# /etc/init.d/mysqld status
mysqld is stopped
[root@localhost LEOCHARRE-Database]# /etc/init.d/mysqld start
Initializing MySQL database: Installing MySQL system tables...
OK
Filling help tables...
Ok

Great. Let’s try that cpan again..

cpan install DBD::mysql

Haha! It works! :-)
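
If you want to double check afterwards that the modules really load, a few lines of perl will do. This is just a sketch, and module_installed is a made-up helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the named module can be loaded.
sub module_installed {
    my ($module) = @_;
    $module =~ /^\w+(?:::\w+)*$/ or die "bad module name: $module";
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Report on the two modules this post is about.
for my $module (qw(DBI DBD::mysql)) {
    printf "%-12s %s\n", $module,
        module_installed($module) ? 'ok' : 'MISSING';
}
```

This only tells you the module loads. It does not tell you the server is running, which is what the cpan tests needed.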