Software Carpentry
Sunil Mohan Adapa
sunil at medhas dot org
Some content derived from Software Carpentry Lecture Material http://software-carpentry.org/license/
This work and the original are under the Creative Commons Attribution 3.0 License
http://creativecommons.org/licenses/by/3.0/
About the Tutorial
Introductory
Hands on
Interactive
Software Carpentry for academics and
research in any discipline
Makes software work easier
Enables new kinds of work
Gets work done faster
Summary
The Unix Shell
Regular Expressions
Make
Version Control
Python
Unix Shell
About Shell
Why use the command line when we have a GUI?
Typical shell: bash
Terminal programs: gnome-terminal, Konsole,
xterm, putty
Example Use Cases
Mr. A wishes to retrieve all files modified last
week and replace the phrase "this week" with
"next week" in those files.
Every day, Mr. B wishes to automatically retrieve
all files modified on that day and back them up to
a different location.
Mr. C wishes to rename files so that their
extensions are removed.
Mr. D wishes to merge five sets of
user lists into a single one.
File system
ls to list the files in the directory
ls -l to list files with extra information
pwd to show the current directory
cd <path> to switch to a directory
cd to switch to home directory
/ is the top most directory. It is also the path separator
. is the current directory
.. is the parent directory. /home/user/work/.. is the same
as /home/user
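The path rules above can be checked without touching the disk. This is a small sketch in Python 3 (the slides themselves use the shell), using the standard posixpath module; the example paths are the ones from the slide plus one made up for illustration.

```python
import posixpath

# normpath collapses "." and ".." components purely as text.
parent = posixpath.normpath("/home/user/work/..")
current = posixpath.normpath("/home/user/./work")
# join glues components together with the / separator.
joined = posixpath.join("/home/user", "work")

print(parent)   # /home/user
print(current)  # /home/user/work
print(joined)   # /home/user/work
```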
File System Structure
/root and /home store user data
/bin, /usr/bin, /sbin and /usr/sbin store executable commands
/usr stores files related to user applications
/usr/local contains applications compiled by the user
/var contains (variable) files that usually grow over time
/lib contains libraries
/tmp contains temporary files
/proc is a virtual file system containing kernel information
/mnt and /media contain file system mounts
Manipulating Files
cp copies one file to another file or directory
mv renames a file or moves it to another
directory, overwriting any existing file of the same name
rm deletes files
rm -rf deletes files and directories recursively, without asking
mkdir creates a directory
rmdir removes an empty directory
File Permissions
ls -l shows ownership and permissions of a file
chmod changes the permissions
chown changes the ownership
su switches the current user by launching a
new shell
Redirection
ls > out stores the output of ls in the file out
cat concatenates files and input given to it
cat < out reads the contents of the file out and
provides them as input to cat
sort < out > sorted sorts the contents of the file out
and stores the result in the file sorted
| (a pipe) redirects the output of one command
to another: ls | sort > sorted
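To make the data flow of ls | sort > sorted concrete, here is a rough Python 3 sketch of the same three stages: produce a listing, sort it, write it to a file. The function name list_sorted_to_file and the sample file names are made up for illustration; the real commands are separate programs connected by the shell.

```python
import os
import tempfile

def list_sorted_to_file(directory, out_path):
    """Rough equivalent of `ls | sort > sorted` for one directory."""
    # Stage 1 (ls): list the names, skipping hidden files as ls does by default.
    names = [n for n in os.listdir(directory) if not n.startswith(".")]
    # Stage 2 (sort): order the lines.
    names.sort()
    # Stage 3 (> sorted): write the result to a file, one name per line.
    with open(out_path, "w") as out:
        for name in names:
            out.write(name + "\n")
    return names

# Hypothetical sample data in a throwaway directory.
workdir = tempfile.mkdtemp()
for name in ["b.txt", "a.txt", "c.txt"]:
    open(os.path.join(workdir, name), "w").close()
out_file = os.path.join(tempfile.mkdtemp(), "sorted")

result = list_sorted_to_file(workdir, out_file)
print(result)  # ['a.txt', 'b.txt', 'c.txt']
```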
Some More Commands
du to find the disk space occupied by a file
less and more for paginated display
find to recursively find files matching complex criteria
xargs to convert input into arguments
grep to match a pattern/regular expression in a file
head and tail to see part of a file
sort to sort data in a file
uniq to find unique items after sorting
wc counts the number of chars, words and lines in a file
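As an illustration of what wc reports, the following Python 3 sketch computes the same three counts for a string. The function name wc and the sample text are assumptions for this example; the real wc counts bytes by default, which equals the character count only for plain ASCII text.

```python
def wc(text):
    """Return (lines, words, chars) for a piece of text."""
    lines = text.count("\n")   # wc counts newline characters
    words = len(text.split())  # words are separated by whitespace
    chars = len(text)
    return lines, words, chars

sample = "one two\nthree\n"
counts = wc(sample)
print(counts)  # (2, 3, 14)
```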
Jobs
Control-C terminates a program
Backgrounding a program
Control-Z and bg
& at the end of the command
jobs list current jobs
fg foregrounds a program
ps lists processes
kill kills a process
References
Bash Manual Page: man bash
http://linux.die.net/man/1/bash
GNU/Linux Man Pages: man
Learning the Shell:
http://linuxcommand.org/learning_the_shell.php
Regular Expressions
What are Regular Expressions?
A concise and flexible means for matching
strings of text
Like *.txt means all files with .txt extension
Parts of matches can be extracted
Matched text can be replaced
Example Use Cases
Mr. A has a list of 1000 phrases in a text file. He
would like to add a full stop at the end of each
line.
Mr. B has a list of percentages of various
categories in Wikipedia and their growth in X
(Y) format. He would like to convert it to X/Y format.
Mr. C would like to find all words in a file
containing 3 to 5 letters.
Example Use Cases (contd.)
Mr. D would like to list all hexadecimal numbers
in a file.
Mr. E would like to convert all American
formatted dates in a file to ISO date format.
Mr. F would like to retrieve all the sentences
starting with 'Which' from a file.
Mr. G would like to retrieve all words in a
document containing two Hindi consonants
joined by a halant.
Where are RegExps Used?
Editors: Vim, Emacs, Eclipse, Notepad++ etc.
Programming Languages
Built in: Perl, Ruby, JavaScript etc.
As a library: C, C++, Java, PHP, Python etc.
Unix command line: rename, grep, sed, perl etc.
Lots more:
Configuring Apache Web Server
Syntax Highlighting in editors
Even Google Search (well... not really. Just code search)
Basics
A normal alphanumeric character in a regex
matches that character in the target string
hello matches the text hello
. matches any character
* repeats the previous expression zero or more
times
Example Applications
Unix command line: grep
Editor: Vim
Programming: Perl
Metacharacters
. matches any character
a. matches as, ab etc.
^ matches the beginning of a line
$ matches the end of a line
| alternation
H|h matches h or H
() grouping
H|hello matches H or hello
(H|h)ello matches Hello or hello
\ escapes any metacharacter
Mr. matches Mr. and Mrs
Mr\. matches Mr. and not Mrs
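The metacharacter examples above can be tried out with Python 3's re module, shown here as a sketch (the slides demonstrate grep, Vim and Perl; the test strings are made up). re.search looks for a match anywhere in the string, while re.fullmatch requires the whole string to match.

```python
import re

# An unescaped dot matches any character, so "Mr." also matches inside "Mrs".
dot_any = [bool(re.search(r"Mr.", s)) for s in ["Mr. A", "Mrs B"]]
# Escaping the dot restricts the match to a literal full stop.
dot_literal = [bool(re.search(r"Mr\.", s)) for s in ["Mr. A", "Mrs B"]]
# Grouping limits the alternation to the first letter.
grouped = [bool(re.fullmatch(r"(H|h)ello", s)) for s in ["Hello", "hello", "H"]]
# Without grouping, the two alternatives are "H" and "hello".
ungrouped = [bool(re.fullmatch(r"H|hello", s)) for s in ["Hello", "hello", "H"]]
# ^ and $ anchor the pattern to the start and end of the line.
anchored = [bool(re.search(r"^hello$", s)) for s in ["hello", "hello world"]]

print(dot_any)      # [True, True]
print(dot_literal)  # [True, False]
print(grouped)      # [True, True, False]
print(ungrouped)    # [False, True, True]
print(anchored)     # [True, False]
```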
Character Classes
[Hh] means (h|H)
[0-9] means (0|1|2|3|4|5|6|7|8|9)
[0-9a-z] means ([0-9]|[a-z])
[^ab] matches any character except a or b
\x{0915} matches the Devanagari letter क (U+0915)
\n matches a new line
\r matches a carriage return
\t matches a tab
Character Classes (Perl)
\w matches a word character
\W matches a non-word character
\s matches a whitespace character
\S matches a non-whitespace character
\d matches a digit
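A short Python 3 sketch of the character classes listed above, with made-up sample strings. One difference from Perl is worth noting: Python writes a code point as \u0915 rather than \x{0915}.

```python
import re

# A bracketed class matches exactly one character from the set.
greetings = re.findall(r"[Hh]ello", "Hello hello jello")
# A leading ^ inside the brackets negates the class.
not_ab = re.findall(r"[^ab]", "abcab")
# Perl-style shorthand classes.
digits = re.findall(r"\d", "Room 42")
fields = re.split(r"\s", "a b\tc")
word_chars = re.findall(r"\w", "a-b")
# The three-letter word below is written with escapes: U+0915 U+092E U+0932.
ka = re.findall("\u0915", "\u0915\u092e\u0932")

print(greetings)   # ['Hello', 'hello']
print(not_ab)      # ['c']
print(digits)      # ['4', '2']
print(fields)      # ['a', 'b', 'c']
print(word_chars)  # ['a', 'b']
```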
Quantifiers
* matches 0 or more times
+ matches 1 or more times
? matches 0 or 1 time
{7} matches 7 times
{5,} matches at least 5 times
{2,5} matches at least 2 times but no more than
5 times
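Quantifiers are enough to sketch two of the earlier use cases in Python 3. The patterns below are one possible reading of those tasks, not the only one: \b (a word boundary, not covered on these slides) is used to delimit words, and hexadecimal numbers are assumed to carry a 0x prefix.

```python
import re

text = "a quick brown fox jumped over lazy dogs"
# Words made of 3 to 5 letters: {3,5} bounds the repetition.
short_words = re.findall(r"\b[A-Za-z]{3,5}\b", text)

# Hexadecimal numbers, assuming a 0x prefix: + means one or more digits.
hex_numbers = re.findall(r"0x[0-9A-Fa-f]+", "values 0x1F, 0xff and 42")

# ? makes the preceding character optional.
spellings = re.findall(r"colou?r", "color colour")

print(short_words)  # ['quick', 'brown', 'fox', 'over', 'lazy', 'dogs']
print(hex_numbers)  # ['0x1F', '0xff']
print(spellings)    # ['color', 'colour']
```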
Greedy vs. Stingy
In text "XYZ" to "PQR"
".*" will match "XYZ" to "PQR"
".*?" will match "XYZ"
? applies to all other quantifiers also
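The greedy and stingy behaviour described above, reproduced as a Python 3 sketch on the same text. re.findall returns every non-overlapping match, so the stingy pattern finds both quoted strings.

```python
import re

text = '"XYZ" to "PQR"'

# Greedy: .* grabs as much as possible, up to the last quote.
greedy = re.findall(r'".*"', text)
# Stingy: .*? stops at the first closing quote it can.
stingy = re.findall(r'".*?"', text)
first = re.search(r'".*?"', text).group(0)

print(greedy)  # ['"XYZ" to "PQR"']
print(stingy)  # ['"XYZ"', '"PQR"']
print(first)   # "XYZ"
```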
Substitutions
s/hello/Hello/ will substitute hello with Hello
s/(H|h)ello/Hi/ will substitute Hello or hello with
Hi
() will capture part of a match
\1, \2 etc. hold the values of the captured groups
s/([0-9])([0-9])/\2\1/ matches two digits and
reverses them
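The substitutions above are written in Perl/sed notation. As a sketch, the Python 3 equivalent is re.sub, which replaces every match unless count is given. The input formats for the percentage and date use cases are assumptions: "X (Y)" with whole numbers, and zero-padded MM/DD/YYYY dates.

```python
import re

# s/([0-9])([0-9])/\2\1/ : swap the first pair of digits only.
swapped = re.sub(r"([0-9])([0-9])", r"\2\1", "12 34", count=1)

# s/(H|h)ello/Hi/
greeting = re.sub(r"(H|h)ello", "Hi", "hello there")

# X (Y) format to X/Y format (assumed: whole numbers).
ratio = re.sub(r"(\d+) \((\d+)\)", r"\1/\2", "40 (12)")

# American date to ISO date (assumed: zero-padded MM/DD/YYYY).
iso_date = re.sub(r"(\d{2})/(\d{2})/(\d{4})", r"\3-\1-\2", "07/04/2011")

print(swapped)   # 21 34
print(greeting)  # Hi there
print(ratio)     # 40/12
print(iso_date)  # 2011-07-04
```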
Modifiers
i means case-insensitive match
/Hello/i will match hello, Hello or HELLO
g means global matching
m means multi-line string
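In Python 3 the same modifiers appear as flags or arguments rather than trailing letters; the sketch below uses made-up sample strings. re.I corresponds to i, re.M to m, and "global" is the default for re.sub and re.findall (count=1 gives the non-global behaviour).

```python
import re

# i: case-insensitive matching.
any_case = re.findall(r"hello", "hello Hello HELLO", flags=re.I)

# g: re.sub is global by default; count=1 replaces only the first match.
first_only = re.sub(r"l", "L", "hello", count=1)
every_match = re.sub(r"l", "L", "hello")

# m: with re.M, $ matches at the end of every line, which is enough
# to add a full stop to each line of a list of phrases.
with_stops = re.sub(r"$", ".", "first phrase\nsecond phrase", flags=re.M)

print(any_case)     # ['hello', 'Hello', 'HELLO']
print(first_only)   # heLlo
print(every_match)  # heLLo
print(with_stops)
```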
References
Perl Regular Expressions: man perlre
http://perldoc.perl.org/perlre.html
Build Tools
Building a Project
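[Figure: dependency graph linking the sources file1.c, file2.c, file3.c, file4.c and main.c, the object files file1.o, file2.o, file3.o, file4.o and main.o, the libraries library1.so and library2.so, and the final program]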
Make
Needs a dependency graph
Operates on files and time stamps
Executes shell commands
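The time stamp comparison at the heart of Make can be sketched in a few lines of Python 3. This is only an illustration of the idea, not how Make is implemented; the function name needs_rebuild is made up, and the time stamps are set explicitly so the outcome does not depend on timing.

```python
import os
import tempfile

def needs_rebuild(target, prerequisites):
    """True if the target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_time for p in prerequisites)

workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "hello.c")
obj = os.path.join(workdir, "hello.o")

open(source, "w").close()
missing = needs_rebuild(obj, [source])     # hello.o does not exist yet

open(obj, "w").close()
os.utime(source, (1000, 1000))             # source older than object
os.utime(obj, (2000, 2000))
up_to_date = needs_rebuild(obj, [source])

os.utime(source, (3000, 3000))             # source edited after the build
stale = needs_rebuild(obj, [source])

print(missing, up_to_date, stale)  # True False True
```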
Other uses
Any set of tasks with dependency graphs
Automated testing
Building documentation
Even booting an operating system!
Writing Makefiles
hello: hello.o
	gcc hello.o -o hello
hello.o: hello.c
	gcc hello.c -c -o hello.o
clean:
	rm -f hello.o hello
Using Make
$ make
$ make clean
Basics
Each rule names a target, its prerequisites and the commands that build it:
target: prerequisites
	commands
hello: hello.o
	gcc hello.o -o hello
hello.o: hello.c
	gcc hello.c -c -o hello.o
clean:
	rm -f hello.o hello
Bigger Project
hello: main.o file1.o file2.o
	gcc main.o file1.o file2.o -o hello
main.o: main.c file1.h file2.h
	gcc main.c -c -o main.o
file1.o: file1.c file1.h
	gcc file1.c -c -o file1.o
file2.o: file2.c file2.h
	gcc file2.c -c -o file2.o
clean:
	rm -f hello main.o file1.o file2.o
Improving: Step 1
hello: main.o file1.o file2.o
	gcc $^ -o $@
main.o: file1.h file2.h
main.o: main.c
	gcc $< -c -o $@
file1.o: file1.h
file1.o: file1.c
	gcc $< -c -o $@
file2.o: file2.h
file2.o: file2.c
	gcc $< -c -o $@
clean:
	rm -f hello main.o file1.o file2.o
Improving: Step 2
hello: main.o file1.o file2.o
	gcc $^ -o $@
main.o: file1.h file2.h
file1.o: file1.h
file2.o: file2.h
%.o: %.c
	gcc $< -c -o $@
clean:
	rm -f hello main.o file1.o file2.o
Improving: Step 3
TARGET = hello
OBJECTS = main.o file1.o file2.o
main.o: file1.h file2.h
file1.o: file1.h
file2.o: file2.h
$(TARGET): $(OBJECTS)
	gcc $^ -o $@
%.o: %.c
	gcc $< -c -o $@
clean:
	rm -f $(TARGET) $(OBJECTS)
Improving: Step 4
TARGET = hello
OBJECTS = main.o file1.o file2.o
main.o: file1.h file2.h
file1.o: file1.h
file2.o: file2.h
$(TARGET): $(OBJECTS)
	gcc $^ -o $@
$(OBJECTS): %.o: %.c
	gcc $< -c -o $@
clean:
	rm -f $(TARGET) $(OBJECTS)
Phony Targets
Try this:
$ touch clean
$ make clean
What happened and why?
Declaring a target as phony addresses the
problem
.PHONY: clean
Improving: Step 5
TARGET = hello
OBJECTS = main.o file1.o file2.o
main.o: file1.h file2.h
file1.o: file1.h
file2.o: file2.h
$(TARGET): $(OBJECTS)
	gcc $^ -o $@
$(OBJECTS): %.o: %.c
	gcc $< -c -o $@
.PHONY: clean
clean:
	rm -f $(TARGET) $(OBJECTS)
Even Better Build System
Autoconf
Detect system environment and build accordingly
M4
Write macros for Autoconf
Automake
Automatically generate makefiles
Libtool
Automatically handle different library formats in
different OSes
References
GNU Make Manual: info make
http://www.gnu.org/software/make/manual/make.html
GNU Automake Manual: info automake
http://www.gnu.org/software/automake/manual
GNU Autoconf Manual: info autoconf
http://www.gnu.org/software/autoconf/manual
Version Control
Why?
Keep track of changes
Release management
Work as a group
Identify regressions easily
Maintain personal changes to code elsewhere
Revisions
Initial Version
Added feature 1
Added feature 2
Fixed bug 1
Latest version
Release Management
Initial Version
Added feature 1
Fixed bug 1
Added feature 2
Version 1.1
Fixed bug 1
Version 2.0
Work as a Group
Initial Version
Added feature 1
B's Feature
A's Feature
Merge
Latest Version
Identify Regressions
Bug free version
Bug introduced
Latest version contains a bug
Personal Changes
Free Software Project on the Internet: Version 1.0, Version 2.0, Version 3.0, Version 4.0
My research work: Idea 1, Idea 2, Idea 3
Getting Started with Git
Basic configuration:
$ git config --global user.name "Your Name Comes Here"
$ git config --global user.email you@yourdomain.example.com
Creating a repository:
$ git init
Adding files to the repository:
$ git add file1.c
Committing the changes
$ git commit
Editing
Edit your file
$ nano file1.c
Mark for commit
$ git add file1.c
Commit the changes
$ git commit
Reviewing Changes
Edit and review changes
$ nano file1.c
$ git diff
Current status
$ git status
Reviewing Changes (contd.)
Changes between two revisions
$ git diff r1..r2
History of changes
$ git log
Exchanging Patches
The diff format
Patch file
Producing a patch file
$ git diff r1..r2 > my_feature.patch
Applying a patch
$ patch -p1 < my_feature.patch
Better ways
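To show what the diff format in a patch file looks like, this Python 3 sketch produces a unified diff with the standard difflib module. The file contents are invented for illustration; the a/ and b/ prefixes mimic the ones git diff writes, which are what patch -p1 strips off.

```python
import difflib

# Two hypothetical versions of file1.c, as lists of lines.
old = ["int main() {\n", "    return 0;\n", "}\n"]
new = ["int main() {\n", '    puts("hi");\n', "    return 0;\n", "}\n"]

patch = list(difflib.unified_diff(old, new,
                                  fromfile="a/file1.c",
                                  tofile="b/file1.c"))

# Lines starting with + were added, lines starting with - were removed,
# and lines starting with a space are unchanged context.
print("".join(patch))
```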
Tagging
What are tags?
Creating a tag
$ git tag VERSION_1
Deleting a tag
$ git tag -d VERSION_1
Retrieving older versions
$ git checkout VERSION_1
More Topics of Interest
Branching and Merging
Pushing and Pulling from repositories
Rebasing
Bisecting
Stashing changes
Graphical Tools
References
Git: http://git-scm.com
Official Git Tutorial:
http://www.kernel.org/pub/software/scm/git/docs/gittutorial.html
ProGit Book: http://progit.org
Git Manual Pages: man git
Python
Content derived from Official Python Tutorial http://docs.python.org/tutorial/
Why Python?
Easy for beginners
Yet powerful
Rapid development
Scalable for large and complex projects
Object oriented
Cross platform
Large set of libraries for performing various
tasks
First Python Program
$ python
>>> 2 + 3
5
>>>
Hello, World!
$ python
>>> print "Hello, World!"
Hello, World!
>>>
Hello, World! in a File
#!/usr/bin/python
print "Hello, World!"
Python as Calculator
>>> 2+2
4
>>> (50-5*6)/4
5
Variables
>>> a = 2
>>> b = 3
>>> print a * b
6
Strings
>>> hello = "Hello"
>>> world = "World"
>>> print hello
Hello
>>> print world
World
>>> print hello + world
HelloWorld
>>> print hello + ", " + world + "!"
Hello, World!
Lists
>>> a = ['spam', 'eggs', 100, 1234]
>>> a
['spam', 'eggs', 100, 1234]
>>> a[0]
'spam'
>>> a[3]
1234
>>> a[-2]
100
>>> a[1:-1]
['eggs', 100]
>>> a[2] = a[2] + 23
>>> a
['spam', 'eggs', 123, 1234]
More on Lists
>>> a = [66.25, 333, 333, 1, 1234.5]
>>> print a.count(333), a.count(66.25), a.count('x')
2 1 0
>>> a.insert(2, -1)
>>> a.append(333)
>>> a
[66.25, 333, -1, 333, 1, 1234.5, 333]
>>> a.index(333)
1
>>> a.remove(333)
>>> a
[66.25, -1, 333, 1, 1234.5, 333]
>>> a.reverse()
>>> a
[333, 1234.5, 1, 333, -1, 66.25]
>>> a.sort()
>>> a
[-1, 1, 66.25, 333, 333, 1234.5]
More on Lists
>>> mat = [
...     [1, 2, 3],
...     [4, 5, 6],
...     [7, 8, 9],
... ]
Tuples
>>> t = 12345, 54321, 'hello!'
>>> t[0]
12345
>>> t
(12345, 54321, 'hello!')
>>> # Tuples may be nested:
... u = t, (1, 2, 3, 4, 5)
>>> u
((12345, 54321, 'hello!'), (1, 2, 3, 4, 5))
>>> t = 12345, 54321, 'hello!'
>>> x, y, z = t
Dictionaries
>>> tel = {'jack': 4098, 'sape': 4139}
>>> tel['guido'] = 4127
>>> tel
{'sape': 4139, 'guido': 4127, 'jack': 4098}
>>> tel['jack']
4098
>>> del tel['sape']
>>> tel['irv'] = 4127
>>> tel
{'guido': 4127, 'irv': 4127, 'jack': 4098}
>>> tel.keys()
['guido', 'irv', 'jack']
>>> 'guido' in tel
True
If .. else
>>> x = int(raw_input("Please enter an integer: "))
Please enter an integer: 42
>>> if x < 0:
...     x = 0
...     print 'Negative changed to zero'
... elif x == 0:
...     print 'Zero'
... elif x == 1:
...     print 'Single'
... else:
...     print 'More'
...
More
For
>>> # Measure some strings:
... a = ['cat', 'window', 'defenestrate']
>>> for x in a:
...     print x, len(x)
...
cat 3
window 6
defenestrate 12
For
>>> range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> a = ['Mary', 'had', 'a', 'little', 'lamb']
>>> for i in range(len(a)):
...     print i, a[i]
Break
>>> for i in range(10):
...     if i > 5:
...         break
...     print i
...
0
1
2
3
4
5
Continue
>>> for i in range(10):
...     if i == 5:
...         continue
...     print i
...
0
1
2
3
4
6
7
8
9
Comments
>>> # This is a single-line comment
>>> """ This is a
... multiline
... comment"""
Functions
>>> def fib(n):    # print Fibonacci series
...     """Print a Fibonacci series up to n."""
...     a, b = 0, 1
...     while a < n:
...         print a,
...         a, b = b, a+b
...
>>> # Now call the function we just defined:
... fib(1000)
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
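A common variation on the function above returns the numbers instead of printing them, so the caller can use the result. The sketch below uses the hypothetical name fib_list and Python 3 syntax (print with parentheses), unlike the Python 2 snippets on the slides.

```python
def fib_list(n):
    """Return a list of the Fibonacci numbers below n."""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a + b
    return result

below_100 = fib_list(100)
print(below_100)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```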
References
Python Programming Language Official
Website: http://python.org
The Python Tutorial:
http://docs.python.org/tutorial
The Python Standard Library:
http://docs.python.org/library
The Python Language Reference:
http://docs.python.org/reference
Feedback & Further Assistance:
sunil at medhas dot org
Thank you