Tag Archives: unix

newbie picking away at awk

I’ve been looking into awk off and on.
This is kind of a weird thing to do, coming from perl.
I’m not a perl genius, but, I’m intermediate. Which is saying a lot.
Perl is a beast. It’s a madman in the forest tripping on lsd- who commands power over small countries and speaks to aliens. If you can talk to perl, it can do everything from giving your wife an orgasm to putting the baby to sleep.

So, awk- it would seem silly to be interested in it. Since perl could do all of this already. Why learn it?

Well, I’ve been learning it. Because if there’s anything cooler than perl, it’s unix.
And awk is .. well.. unixy.

So.. awk. I’m a total awk noob, please.. keep in mind.

Awk seems to be cool for parsing line output, for one.
I often do ls -lha for listing sizes of things. And I may not be interested in, you know.. permissions. Because maybe I just want to know the size of things..


Playing with ls output

Regular ls

$ ls -lh
total 76K
-rw-rw-r-- 1 leo leo 3.9K 2010-07-13 08:17 build-meta-refresh-substitutes.pl
-rw-rw-r-- 1 leo leo 138 2010-07-02 03:27 convert-old-new.urls-to-htaccess-entries.sh
-rw-r--r-- 1 leo leo 4.1K 2010-07-08 12:31 htaccess
-rw-r--r-- 1 leo leo 3.7K 2010-07-08 12:31 htaccess2
-rw-r--r-- 1 leo leo 3.5K 2010-07-08 12:30 htaccess3
-rw-r--r-- 1 leo leo 4.0K 2010-07-08 12:35 htaccess4
drwxrwxr-x 2 leo leo 4.0K 2010-07-13 08:18 meta-refresh-versions
-rw-rw-r-- 1 leo leo 5.0K 2010-07-02 02:44 old-new.urls
-rw-r--r-- 1 leo leo 20K 2010-07-13 08:18 refreshpages.zip
-rw-rw-r-- 1 leo leo 6.0K 2010-07-02 02:27 sitemap.new
-rw-rw-r-- 1 leo leo 5.0K 2010-07-02 02:43 sitemap.old

Yes, but what would jesus awk say?

$ ls -lh | awk '{ printf "%s %s\n", $5, $8 }'

3.9K build-meta-refresh-substitutes.pl
138 convert-old-new.urls-to-htaccess-entries.sh
4.1K htaccess
3.7K htaccess2
3.5K htaccess3
4.0K htaccess4
4.0K meta-refresh-versions
5.0K old-new.urls
20K refreshpages.zip
6.0K sitemap.new
5.0K sitemap.old

Get it? No?
The lines are treated one by one. Each field on a line is $1, $2, $3, etc.
The delimiter is, by default, whitespace (tabs and spaces), much like the way the shell splits its arguments.
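
To poke at the field splitting without a real directory listing, here is a small sketch. The sample line is copied from the ls output above; the passwd-style record is made up for illustration.

```shell
# Sample line in the same shape as the ls -lh output above.
line='-rw-r--r-- 1 leo leo 4.1K 2010-07-08 12:31 htaccess'

# Default splitting: any run of spaces or tabs separates fields.
# NF holds the number of fields on the current line.
echo "$line" | awk '{ print NF ": size=" $5 " name=" $8 }'

# -F picks a different delimiter; here ":" splits a passwd-style record.
echo 'leo:x:500:500::/home/leo:/bin/bash' | awk -F: '{ print $1, $6 }'

# A pattern in front of the braces filters lines; NR > 1 skips the "total" line.
printf 'total 76K\n%s\n' "$line" | awk 'NR > 1 { print $5, $8 }'
```

One caveat: a file name containing spaces gets split across several fields, so $8 only catches its first word.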


looking for text in files and editing in vim

I often need to find something in code or text. Maybe I’m messing with wordpress stuff, and need to find a php function.

For example, finding a php function..

I want to look for a function called get_author* in the php files around here..

html $ find ~/public_html/ -name "*php" | xargs grep 'function get_author'
/home/leocharre/public_html/wp-includes/link-template.php:function get_author_feed_link( $author_id, $feed = '' ) {
/home/leocharre/public_html/wp-includes/rewrite.php: function get_author_permastruct() {
/home/leocharre/public_html/wp-includes/author-template.php:function get_author_posts_url($author_id, $author_nicename = '') {
/home/leocharre/public_html/wp-includes/theme.php:function get_author_template() {

Yes, but what would awk say?

Automating this somewhat..
The cool thing would be to automatically go there, or at least print the commands so I can call up vim by cut and paste.

Ok.. not the easiest thing as it turns out… making use of this..

html $ find ~/public_html/ -name "*php" | xargs grep -s 'function get_author' | sed 's/:\s\+/:/' | sed "s/'.\+//" | grep2vim
vim '/home/leocharre/public_html/wp-includes/link-template.php' +/'function get_author_feed_link( $author_id, $feed = '
vim '/home/leocharre/public_html/wp-includes/rewrite.php' +/'function get_author_permastruct() {'
vim '/home/leocharre/public_html/wp-includes/author-template.php' +/'function get_author_posts_url($author_id, $author_nicename = '
vim '/home/leocharre/public_html/wp-includes/theme.php' +/'function get_author_template() {'

Where grep2vim is an awk script inside my bin dir..

html $ cat ~/bin/grep2vim
#!/bin/awk -f
BEGIN { FS=":" }
{ printf "vim '%s' +/'%s'\n", $1, $2 }

Oy.

The output is pretty cool: it’s ready to cut and paste, and vim gets the command to search for that string. That’s what the +/ fuss is all about.

Okkkaaaaay…. Putting it all together..

html $ cat ~/bin/findphpfunction2vim
#!/bin/sh

BASEDIR=$1
if [ -z "$BASEDIR" ]; then
 echo "$0 missing DIR path"
 exit 1
fi

FUNCTIONNAME=$2
if [ -z "$FUNCTIONNAME" ]; then
 echo "$0 missing function name"
 exit 1
fi

find "$BASEDIR" -name "*.php" | xargs grep -s "function $FUNCTIONNAME" | sed 's/:\s\+/:/' | sed "s/'.\+//" | grep2vim

Example usage:

html $ findphpfunction2vim ./ is_user
vim './wp-includes/ms-functions.php' +/'function is_user_member_of_blog( $user_id, $blog_id = 0 ) {'
vim './wp-includes/ms-functions.php' +/'function is_user_spammy( $username = 0 ) {'
vim './wp-includes/ms-functions.php' +/'function is_user_option_local( $key, $user_id = 0, $blog_id = 0 ) {'
vim './wp-includes/pluggable.php' +/'function is_user_logged_in() {'
vim './wp-admin/includes/class-wp-importer.php' +/'function is_user_over_quota() {'

Great. In my terminal emulator I can just double click and middle click to copy and paste, and the command runs automatically because the selection includes the trailing newline.
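
A variation on the same idea, as a rough sketch rather than a replacement: grep -n reports the line number of each match, and vim accepts +N to open a file at line N, so the matched text never has to be quoted at all. The function name php2vim is made up for this example, and it assumes file names contain no colons or single quotes.

```shell
# Hypothetical helper: print "vim +LINE 'FILE'" for every PHP function
# whose name starts with the given prefix.
#   $1 = directory to search, $2 = start of the function name
php2vim() {
  # -H always prints the file name, -n adds the line number, so each
  # match looks like FILE:LINE:TEXT and awk can split it on ":".
  # \047 is the octal escape for a single quote inside an awk string.
  find "$1" -name '*.php' -exec grep -Hn "function $2" {} + |
    awk -F: '{ printf "vim +%s \047%s\047\n", $2, $1 }'
}

# Usage, mirroring the example above:
#   php2vim ./ is_user
```

Jumping by line number goes stale as soon as the file is edited, so the search form is sturdier if you keep the printed commands around for a while.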

text to html

I’ve been trying out text2html. It’s pretty cool.
You point it to a text file (or stdin), and it spits out html.
There are some fucked up things about it though..

For one, by default, it does nothing.
Nah, I’m not kidding. It does nothing- spits out same shit that came in. Can you imagine if you ate an apple and shat an apple out, unchanged? That would not be cool. Ok ok.. so it does html entities.. fuck me..
Second- there’s no help option.
Every unix command must have a -h or --help option. Because it’s expected. The empirical *I* fucking expect it.
Instead you have to use $ man ‘text2html’. Oh- but.. wait.. what’s this? Not the complete manual? You have to read $ man HTML::FromText for the full options.
Shit.
This is enough to piss me the fuck off.

It does cool shit, but you have to wine and dine it before it’ll suck your dick. It won’t just take your fifty bucks to do it. And you know the unix way.. By default, this program should suck your dick, no surprises, and shut up.
It should eat my apple and shit out a shit.

Workaround..

Create an alias in your ~/.bashrc file. Add this line:

alias text2html='text2html --blockcode --bold --bullets --email --numbers --paras --tables --underline --urls'

Note that it won’t work until you start another shell session (or run source ~/.bashrc in the current one).

Now all options are on, and the thing behaves closer to what is expected.
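
One catch with the alias: bash does not expand aliases in non-interactive scripts by default, so it only helps at the prompt. A shell function is a possible alternative. This is a sketch, t2h is a made-up name, and the switches are simply the ones from the alias above.

```shell
# Hypothetical wrapper: call text2html with every switch from the alias
# turned on, passing any file arguments (or stdin) straight through.
t2h() {
  text2html --blockcode --bold --bullets --email --numbers \
            --paras --tables --underline --urls "$@"
}

# Usage:
#   t2h notes.txt > notes.html
```

Put it in ~/.bashrc next to the alias, or in a file that your scripts source.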

To install this fine mess of a program… Well.. the application is awesome- the api is a fucking cracked out microsoft whore.
As root..

# cpan HTML::FromText
# man text2html

Here’s some example input and output…

Original Text:

I AM TEXT THAT WILL BE TRANSFORMED TO HTML

Let's see and try what happens here.. shall we.

And hopefully what we expect to happen will. And hopefully what we expect to happen will. And hopefully what we expect to happen will. And hopefully what we expect to happen will. And hopefully what we expect to happen will. And hopefully what we expect to happen will.

And hopefully what we expect to happen will. And hopefully what we expect to happen will. And hopefully what we expect to happen will. And hopefully what we expect to happen will. And hopefully what we expect to happen will. And hopefully what we expect to happen will. This is to test that the next word
will be wrapped up, and there won't be a break between the 'word' and 'will'.

   - thank you
   - i am also thanked of course
   - you think?

ALSO WHAT ABOUT

   A definition.
      I am goint to think so very much

   And what about this on?
      I will also think that. Thank you.

Great. What about a link? http://leocharre.com

Done.

Html output..

I AM TEXT THAT WILL BE TRANSFORMED TO HTML

Let's see and try what happens here.. shall we.

And hopefully what we expect to happen will. And hopefully what we expect to happen will. And hopefully what we expect to happen will. And hopefully what we expect to happen will. And hopefully what we expect to happen will. And hopefully what we expect to happen will.

And hopefully what we expect to happen will. And hopefully what we expect to happen will. And hopefully what we expect to happen will. And hopefully what we expect to happen will. And hopefully what we expect to happen will. And hopefully what we expect to happen will. This is to test that the next word
will be wrapped up, and there won't be a break between the 'word' and 'will'.

  • thank you
  • i am also thanked of course
  • you think?

ALSO WHAT ABOUT

A definition.
   I am goint to think so very much

And what about this on?
   I will also think that. Thank you.

Great. What about a link? http://leocharre.com

Done.


You may want to look at the source of this page for the little details. It’s pretty clean, very nice job. (No, it’s not being transformed by freaking wordpress.. See wordpress raw html plugin )

help with installing tesseract ocr

# INSTALL.tesseract
# =================
#
# Installing tesseract can be tricky.
#
#
# 1) Some dependencies..
#
#
#
# You may need gcc-c++, automake (gnu automake), and svn (subversion).
# You can check if you have these using the ‘which’ command..
# which svn
# which automake
#
# If the command is not present, which prints no path (on some systems it
# prints an error message instead).
#
# If you have ‘yum’ (fedora/redhat) or ‘apt-get’ (debian/ubuntu), you may want
# to simply try:
#
# apt-get install automake
# apt-get install subversion
#
# yum -y install subversion
# yum -y install automake
#
# If this does not work, you need to download the source packages and manually
# install them.
#
# You can get gnu automake from:
# http://www.gnu.org/software/automake/
#
# And subversion from:
# http://subversion.apache.org/
#
# As for gcc-c++, it is likely already installed on your system.
# If you’re missing gcc-c++, try using yum or apt-get.
# Here is where to read more about gcc
# http://gcc.gnu.org/
#
# 2) Get the source for tesseract..
#
# You may be able to simply install the SVN version of Tesseract by
# using these commands..

svn checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr
cd tesseract-ocr
./runautoconf
mkdir build-directory
cd build-directory
../configure
make
make install

#
# for more info, see google project on ocr, they use tesseract
#
# you can also try to run these commands as a script ( lines starting
# with a pound sign are comments and are ignored by bash/sh ).
# save this text as something like ‘INSTALL.tesseract’ and then run..
# sh ./INSTALL.tesseract
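
The dependency check from step 1 can also be done in one go. A minimal sketch, assuming the compiler from the gcc-c++ package is on the path as g++; check_tools is a made-up helper name.

```shell
# Hypothetical helper: report which of the named tools are on PATH.
# Returns non-zero if at least one is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    # command -v is the POSIX way to ask "is this command available?"
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

check_tools g++ automake svn || echo "install the missing tools before building"
```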

browsing your partition with tree via the command line

These tools are useful for development.

Most of the time you work on the terminal, and to find your way around a project you use things like find, ls, and tab completion.
If you need more of a bird’s eye view, you may fire up a gui browser like konqueror. But that’s a gui, and guis are for users.

Another option is tree. Here is example output of tree:

[leo@localhost devel]$ tree
.
`-- WordPress
    |-- bin
    |   `-- wppost
    |-- lib
    |   `-- WordPress
    |       |-- Base.pm
    |       `-- Post.pm
    |-- t
    |-- wp-content
    |   `-- plugins
    |       |-- akismet
    |       |   |-- akismet.gif
    |       |   `-- akismet.php
    |       |-- hello.php
    |       |-- pictpress.php
    |       |-- pm_admin_menu.php
    |       |-- postmaster
    |       |   `-- readme.txt
    |       |-- postmaster.php
    |       `-- wp-db-backup.php
    |-- wp-mail.php
    `-- xmlrpc.php

9 directories, 13 files
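
On a bigger project the full listing gets long. tree has a -d switch (directories only) and -L N (stop N levels down). A small sketch that uses them, falling back to find when tree is not installed; dirs_overview is a made-up name.

```shell
# Hypothetical helper: directories only, at most two levels below $1.
dirs_overview() {
  if command -v tree >/dev/null 2>&1; then
    tree -d -L 2 "$1"
  else
    # Fallback when tree is not installed: a flat list instead of a drawing.
    find "$1" -maxdepth 2 -type d | sort
  fi
}

# Usage:
#   dirs_overview ~/devel
```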


editing images in the command line with convert and mogrify

One of the dumbest things I used to do in making web pages was to resize images and make thumbnails in ‘photoshop’.

The next less dumb thing I did was to script thumbnailing. To allow a server to make the thumbnails instantly.
Then I got comfortable with things like convert and mogrify.

Both of these are interfaces to ImageMagick..