2015.05.10 – Most Unix systems (including OS X) provide a large number of fantastic tools for manipulating data right out of the box. If you have been writing small Python/Ruby/Node scripts to perform transformations or just to manage your data, you’ll probably find that there are already tools written to do what you want. Let me start with the conclusion. The next time you have to transform or manipulate your data, look around first for what Unix tools already exist. It might take you a little longer to figure out all of the flags and parameters you need, and you’ll have to dig through some unfriendly documentation, but you’ll have a new, far more flexible tool in your toolbox the next time around. This is part 1 of a series on Unix tools.
Assumptions
I’ll assume you know the very basics here – what a man page is, how to create an executable bash script, how to open a terminal window, and how to use basic utilities. If you don’t know any of those, you should start with one of the many intros to the command line available. This intro by Zed Shaw is a good place to start.
The Shell
Bash is the default shell on most systems these days, but what we’re covering here will mostly work for zsh or other shells – though some syntax elements will be different. First off, Bash is a powerful tool by itself. Even with no additional packages added, you get variables, loops, expansions & regular expressions, and much more. Here’s a good guide with more information on using bash. I’ll assume you know the basics from here on out, and show you what you can do with them.
Advanced Paths
If you want to work with several directory paths in a row that are very similar, you can pass a list to the shell using curly braces {} and it’ll expand that list automagically. Let’s say I wanted to set up a few directories for a new project’s test suite. Rather than running a lot of duplicated commands, I could pass a few lists instead.
mkdir -p ./test/{unit,fixtures}
> Creates ./test/unit and ./test/fixtures
mkdir -p ./test/unit/{controllers,models}
> Creates ./test/unit/controllers and ./test/unit/models
Note the -p flag to mkdir, which makes it create all of the directories up the chain, even ./test here.
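A quick way to get comfortable with brace expansion is to preview it with echo before running the real command. The sketch below assumes Bash and works in a throwaway directory from mktemp; the demo_root name is only for illustration. It also shows that brace lists can nest, so the two mkdir calls above can be collapsed into one.

```shell
#!/usr/bin/env bash
# Preview the expansion first: echo just prints the words the shell generated.
echo mkdir -p ./test/{unit,fixtures}

# Work in a scratch directory (demo_root is an illustrative name).
demo_root=$(mktemp -d)

# Nested braces: one command builds the whole tree from the two examples above.
mkdir -p "$demo_root"/test/{fixtures,unit/{controllers,models}}

# List what was created, relative to the scratch directory.
(cd "$demo_root" && find test -type d | sort)
```

Because the shell expands the braces before mkdir ever runs, the same trick works with any command, not just mkdir.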
We can also use pattern matching with brackets []. For instance, if you’ve got a lot of files that you want to separate alphabetically, you can use a letter pattern:
mv ./[A-H]* /Volumes/Library/A-H/
mv ./[I-O]* /Volumes/Library/I-O/
mv ./[P-Z]* /Volumes/Library/P-Z/
mv ./A[a-k]* /Volumes/Library/Aa-Ak/
mv ./A[l-z]* /Volumes/Library/Al-Az/
To make these patterns match regardless of case, set shopt -s nocaseglob. (In zsh this would be unsetopt CASE_GLOB.) If you just run that in your shell, it’ll stick for the current session until you unset it with shopt -u nocaseglob. You might even want to add it to your .bash_profile. Bash, however, also lets us set a shell option for just one group of commands by wrapping them in parentheses, which runs them in a subshell:
(shopt -s nocaseglob ; mv ./[A-H]* /Volumes/Library/A-H/)
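To see that the option really stays inside the parentheses, here is a small sketch, assuming Bash. It forces the C locale (an assumption made so that [A-H] means exactly the uppercase letters; in some other locales a range can also match lowercase letters), and the file names and the count_matches helper are made up for the demo.

```shell
#!/usr/bin/env bash
# Assumption: force the C locale so [A-H] means exactly the uppercase letters
# A through H; in some other locales a range may also match lowercase letters.
export LC_ALL=C

# Scratch directory with mixed-case file names (all names are illustrative).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch apple.txt Banana.txt cherry.txt Zebra.txt

# Hypothetical helper: print how many names in the current directory match [A-H]*.
count_matches() {
  local matches=( [A-H]* )
  # An unmatched glob stays as the literal pattern, so check that a file exists.
  [ -e "${matches[0]}" ] || { echo 0; return; }
  echo "${#matches[@]}"
}

before=$(count_matches)                            # option off
inside=$( (shopt -s nocaseglob; count_matches) )   # option on, inside a subshell
after=$(count_matches)                             # option off again

echo "before=$before inside=$inside after=$after"
```

The count should rise only for the command run inside the parentheses; once the subshell exits, the parent shell's globbing is unchanged.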
Loops
Bash allows you to make use of some rather powerful for loops. I frequently use loops to automate boring manual work, like converting a bunch of RAW image files into web-friendly JPEGs of appropriate size:
for i in *.CR2; do
  dcraw -c -a -w -v "$i" | convert - -resize '1000x1000' -colorspace gray "$i.jpg"
done
This loops over all of the .CR2 files in the current directory, passing each one to dcraw to decode the RAW data, then <a href="http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO-4.html">piping</a> the output to ImageMagick, which shrinks it to a web-friendly size of no more than 1000 pixels on a side, makes everything black and white (which is extra-artsy), and writes the result as a JPEG.
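One detail of the loop above is that $i.jpg keeps the original extension, producing names like IMG_0001.CR2.jpg. The following is a minimal dry-run sketch, assuming Bash, of stripping the extension and quoting the names so that files with spaces survive. The jpg_name helper and the sample file names are hypothetical, and the loop only prints what would run rather than calling dcraw or convert.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a RAW file name to the JPEG name we want.
jpg_name() {
  printf '%s\n' "${1%.CR2}.jpg"   # drop a trailing .CR2, then add .jpg
}

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch "IMG_0001.CR2" "beach day.CR2"   # illustrative file names

for raw in *.CR2; do
  [ -e "$raw" ] || continue   # skip the literal pattern when nothing matches
  # Dry run: show the pipeline instead of executing it.
  printf 'dcraw -c -a -w -v "%s" | convert - -resize 1000x1000 -colorspace gray "%s"\n' \
    "$raw" "$(jpg_name "$raw")"
done
```

Printing the commands first is a cheap way to check the generated file names before letting a loop loose on a directory full of originals.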
I use a similar command in our legal docs repo to convert our source Markdown files into a variety of formats, using pandoc:
for myfile in $( ls ./markdown ); do
  echo Converting $myfile;
  for fmt in html docx pdf; do
    filetrim=${myfile%???};
    pandoc -o "./"$fmt"/"$filetrim"."$fmt -f markdown "./markdown/"$myfile;
  done;
done
for myfile in $( ls ./markdown ); do
Loop over each file in the ./markdown folder, using the variable $myfile to store the current file’s name.
for fmt in html docx pdf; do
Loop over the list of output formats (html, docx, and pdf), storing the current format in the variable $fmt.
filetrim=${myfile%???};
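Trim the last three characters (%???) from the string, which is the extension (.md). Another valid pattern, which removes everything from the last dot onward whatever the extension’s length, would be: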
filetrim=${myfile%.*};
pandoc -o "./"$fmt"/"$filetrim"."$fmt -f markdown "./markdown/"$myfile;
done;
done
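The same loop can be written without parsing ls output, which breaks on file names containing spaces. This is a sketch under stated assumptions, not the script from the repo: it assumes Bash, globs ./markdown/*.md directly, and uses a hypothetical output_path helper and made-up file names. It prints the pandoc commands instead of running them, so it works without pandoc installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build ./<fmt>/<name>.<fmt> from a source path and a format.
output_path() {
  local src=$1 fmt=$2
  local base=${src##*/}   # strip the directory: ./markdown/terms.md -> terms.md
  local stem=${base%.*}   # strip the last extension: terms.md -> terms
  printf './%s/%s.%s\n' "$fmt" "$stem" "$fmt"
}

demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir markdown
touch "markdown/terms.md" "markdown/privacy policy.md"   # illustrative files

for src in ./markdown/*.md; do
  [ -e "$src" ] || continue   # nothing matched: skip the literal pattern
  echo "Converting $src"
  for fmt in html docx pdf; do
    # Dry run: print the command; remove the echo to actually run pandoc
    # (the ./html, ./docx, and ./pdf directories would need to exist).
    echo pandoc -o "$(output_path "$src" "$fmt")" -f markdown "$src"
  done
done
```

Keeping the path logic in one small function makes it easy to check on its own, for example with output_path "./markdown/terms.md" html.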
Wrapping Up
You can also use builtin utilities to do simple tasks, like appending content to files:
echo "name=core1" >> ./solr/core.properties
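The difference between >> and > is easy to get wrong, so here is a small sketch, assuming Bash and a scratch file created with mktemp. The props_file name and the demo_key line are made up for the demo. The point is that >> adds to the end of a file, while > replaces its contents.

```shell
#!/usr/bin/env bash
props_file=$(mktemp)   # scratch file standing in for core.properties

echo "name=core1" >> "$props_file"            # append: the file now has one line
echo "demo_key=demo_value" >> "$props_file"   # append: the file now has two lines
appended_lines=$(( $(wc -l < "$props_file") ))

echo "name=core2" > "$props_file"             # overwrite: the earlier lines are gone
overwritten=$(cat "$props_file")

echo "lines after appending: $appended_lines"
echo "contents after overwriting: $overwritten"
```

When a script is meant to add to an existing config file, double-checking for a stray single > is worth the few seconds it takes.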