
Lab1 - ELF

This document provides instructions for Lab 1 of an Introduction to Operating Systems course. The lab explores the ELF file format used for executable files in Unix systems. Students will write a C program to read and analyze the ELF header of executable files. The lab is due on September 17th and students should submit their work by committing files to their personal Git repository for the course. The document provides background on the ELF file format and instructions for using Unix tools like hexdump and readelf to examine the binary contents and ELF header of sample executable files.

Uploaded by

Joseph
Copyright © All Rights Reserved

9/16/21, 11:02 AM https://learn.macewan.ca/bbcswebdav/pid-3405656-dt-content-rid-41235220_1/courses/004951-01-2219-1-AS01-90655/lab1.html

CMPT 360: Introduction to Operating Systems Fall 2021, MacEwan University


Lab instructor: Cam Macdonell

Lab 1: Exploring ELF executable programs


Overview

In this lab we will explore the ELF file format, which is the format for executable files in most Unix-based operating systems.

Please read through this lab beforehand and come with questions. We will work through the pre-lab portion together.

Due date and Submission Requirements

Due on Friday, 17 September before 23:59 MDT.

Your personal git repository for this class is stored on the students.cs server. It is located at /var/git/cmpt360f21/<your username>.
Unless explicitly stated otherwise, you will be submitting your work for CMPT 360 labs by committing your deliverables to this repository.

You should be comfortable with the basics of Git from previous classes. I have added a review tutorial for Git which you can work
through if you are feeling rusty; additionally, the first half of Chapter 2 of the freely available book "Pro Git" is a very good resource.

You should clone your personal CMPT 360 repository and make a small test commit before proceeding to work on the rest of the lab.
You will submit the work for this lab inside a directory called L1 at the root of your repository. As always, we will be looking for files in
specific locations in your repository so please do not deviate from specified naming conventions!

You will be required to write a C program in this lab, and it may have been some time since you have done so. Make sure your program
is robust to invalid input: your program should safely handle invalid or missing arguments and report errors appropriately and
descriptively. Some standard library functions, variables, and identifiers that you may find useful in this lab are:

- FILE *fopen(const char *path, const char *mode);
- size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
- int fseek(FILE *stream, long offset, int whence);
- char *strerror(int errnum);
- int errno;
- int memcmp(const void *s1, const void *s2, size_t n);
- SEEK_SET

To find the function signatures for functions such as these, and the appropriate header file to include, you can check the manual page
for each function by running, e.g., man fopen at the Unix terminal. Occasionally there are multiple valid manual pages and man will ask
you which one you want: generally speaking, we are interested in section 2 (system calls) or section 3 (library functions) of the manual.

Prelab: The ELF file format

You should recall from CMPT 201 that files on a Unix system are sequences of bytes laid out on disk. (When we get to discussing file
systems, we will have much to say about how exactly those bytes are laid out physically. But, for the time being, you can think of a file,
irrespective of what the file contains, as an "array of bytes".)

If we are interested in viewing a file as a sequence of raw bytes, we can use the hexdump program, installed on the students.cs
server, to read and print every byte that the file contains. Here is how we show the first 64 bytes of a particular text
file, addressbook.txt:

macdonellc4@students:~> hexdump -C -n 64 addressbook.txt
00000000 43 61 6d 20 4d 61 63 64 6f 6e 65 6c 6c 0a 31 32 |Cam Macdonell.12|
00000010 33 34 20 57 61 6c 6c 20 53 74 2e 2c 20 41 70 74 |34 Wall St., Apt|
00000020 20 35 43 0a 4e 65 77 20 59 6f 72 6b 2c 20 4e 59 | 5C.New York, NY|
00000030 20 31 30 30 32 31 0a 32 31 32 2d 35 35 35 2d 34 | 10021.212-555-4|

The program output shows three columns: the first one is the offset into the file (in hex), then sixteen bytes as hex values, and then
those same sixteen bytes again as ASCII characters (if a byte is not a printable ASCII character, such as the newline byte 0x0a, the
rightmost column represents it simply as the '.' character).

Concretely: if I opened this file for binary reading and called getc(), I would be given back the hex value 0x43 as the first byte.

Just like how an array of bytes in C can be cast to a struct, revealing some internal structure and layout, files can be thought of as
having "internal structure" to the bytes that they contain. This structure is referred to as the "file format" of a given file. JPEG images
and MP3 audio files are two well-known file formats. Arguably, a text file, containing one or more lines of ASCII text, is a very simple
file format too!
When the C compiler outputs an executable program, that program is in a particular file format, too. That format is called the
Executable and Linking Format (ELF). Examples of data that are stored in ELF files include:

- The machine code that the processor will execute
- Metadata such as debug information (if you have optionally compiled it in)
- Program data such as string literals and initialised global variables

You can see from the above Wikipedia article, as well as this diagram, how different regions of the file, called sections, correspond to
different kinds of important data needed to execute a program.

Let's play the same game again: let's hexdump the first 64 bytes of an executable program. Here is a sample program in a file called
hello.c.

#include <stdio.h>

int course_num = 360;

int main(int argc, char** argv) {
    printf("Hello CMPT %d!\n", course_num);
    return 0;
}

macdonellc4@students:~> gcc -Wall -g hello.c -o hello
macdonellc4@students:~> ./hello
Hello CMPT 360!
macdonellc4@students:~> hexdump -C -n 64 hello
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 02 00 3e 00 01 00 00 00 50 04 40 00 00 00 00 00 |..>.....P.@.....|
00000020 40 00 00 00 00 00 00 00 70 28 00 00 00 00 00 00 |@.......p(......|
00000030 00 00 00 00 40 00 38 00 09 00 40 00 27 00 26 00 |....@.8...@.'.&.|

Since this is a binary file, we do not see "human-readable" output like ASCII characters (notice most of the rightmost column
representations are '.' characters). But, this doesn't mean we can't understand how this file is laid out with the help of some Unix
utility programs and a bit of programming!

In this lab, we are going to write a simple C program to understand the ELF header of an executable file. A file format's header is often
the first structure laid out in a binary file, and acts as a "table of contents" for the rest of the file.


There is a standard Unix utility called readelf (part of GNU binutils) that parses the binary file it is given and reports its contents
textually. The -h flag reports the ELF header of the supplied file, so to print the ELF header for the hello program above,
we would run:

macdonellc4@students:~> readelf -h hello

Try this with a compiled C program. You can use hello.c, or the mem.c and cpu.c programs from the textbook; they are located in
the directory /tmp/cmpt360/Cprograms on the students.cs server. When you do, you should see a series of facts about the
executable printed to your terminal's standard output, such as the processor architecture that the program is compiled for. As the course
proceeds, we will learn what many of these other fields mean. Now, try this with a non-executable file, such as a plain old text file.
What error do you get?

Your first task

The first four bytes of an ELF header (and therefore the first four bytes of an ELF file) are called magic bytes; this term should
sound familiar to you, based on the error you saw when readelf was given a non-ELF file to read. Magic bytes provide a
simple check that a supplied file is structured in the expected binary file format. The four magic bytes of an ELF file are, as we saw in the
hexdump output above: 0x7f, 0x45, 0x4c, 0x46. (Notice that the second, third, and fourth bytes are the ASCII characters E, L, F. Cute!) If a file
does not begin with these magic bytes, we know it cannot be a file in the ELF file format. (Pause and ponder: if a file does begin with these four
bytes, do we know for sure that it's an executable program in the ELF format?)

Your task for today is to write a short C program named exactly elf_verify.c that verifies whether the first 4 bytes of a supplied file
match the ELF format's magic byte sequence. Print the supplied file path, followed by "is an ELF executable!" if the file begins with
these bytes, or "is not an ELF executable!" if it does not.

Sample output:

macdonellc4@students:~/cmpt360/lab_sols/lab1> make
gcc -Wall elf_verify.c -o elf_verify
macdonellc4@students:~/cmpt360/lab_sols/lab1> ./elf_verify
Usage: ./elf_verify <file>
macdonellc4@students:~/cmpt360/lab_sols/lab1> ./elf_verify ~/hello
/home/faculty/macdonellc4/hello is an ELF executable!
macdonellc4@students:~/cmpt360/lab_sols/lab1> ./elf_verify ~/addressbook.txt
/home/faculty/macdonellc4/addressbook.txt is not an ELF executable!

Your second task

In this task, you will write short-answer responses to questions. Please answer these questions in a file called answers.txt, which you
will also check into your git repository, in the L1 directory.

When looking at the output of readelf, you will see an entry for "Entry point address". This is the address of the first CPU instruction
that the program will execute.

Question 1: Is the entry point address the same for all executables? Compare the entry point address of at least a few different
executable programs. Record your response, as well as a hypothesis as to why this is the case, in your answers file.

Now, let's consider another way of making a binary file human-readable. objdump is another Unix tool that can disassemble a binary
executable into human-readable assembly language. As we shall see later in the course (and to a greater degree when you take the
compilers course), the "text section" (.text) holds the program's executable instructions.

macdonellc4@students:~/cmpt360/lab_sols/lab1> objdump -d -j ".text" ./elf_verify | less


If you are unfamiliar with Unix shell pipelines: less is a pager program, a handy tool for exploring text output that is too big to fit on a
screen. The | pipe character sends the output of the program on the left to the program on the right.

Question 2: Explore the disassembled binary's text section to find the main function. In answers.txt, answer the following: Does the
address of the first instruction in the main function correspond to the entry point of the executable?

Also: objdump is a program with many command line arguments. To learn about them, you can open the manual page for the
objdump program (and most other Unix utility programs) by running man objdump at your terminal.

Your third task

Extend your elf_verify program to print out the entry point address for files that are proper ELF executables. Recall that this is
another field in the ELF header, which we saw with readelf -h, and lives at some offset into the executable file. Use the ELF format
description link given above to determine where in the header this address resides, read it from the file, and print it out.

Sample output:

macdonellc4@students:~/cmpt360/lab_sols/lab1> readelf -h ~/hello


...
Entry point address: 0x400450
...
macdonellc4@students:~/cmpt360/lab_sols/lab1> ./elf_verify ~/hello
/home/faculty/macdonellc4/hello is an ELF executable!
Its entry point address is 0x400450

Submission

Submit your answers.txt file, your C source file, and your Makefile by committing them to your git repository. Make sure to push them to
the remote repository using git push. Write a useful commit message, and ensure the directory you added is named exactly
as specified (L1), not "mylab1", "lab1", "l1", or anything else. You can run git status to ensure that no files have been forgotten.

Do not check in your executable or any object files. I can build your executable from your C file, so it is not necessary to commit an
executable. Generally speaking, only submit source code, documentation, and build scripts to repositories. A core tenet of
reproducible builds is: if it can be generated from those files, it shouldn't be checked in. As above, git status can confirm
what will be committed.

Rendered on Wednesday, 15 September 2021 09:11PM.

Make sure you run git push from your own directory on students.cs.

To verify that your work has been pushed to origin, check git log: if you see origin/master next to your latest commit, you've
pushed it. If it only says HEAD, the commit has been made but not yet pushed to origin.

I copied addressbook.txt into my own directory with cp; giving . as the destination means you are copying the file to the current path.
2^n tells you how many values n bits can represent: e.g. with 4 bits there are 2^4 = 16 values, so 15 is the maximum value.

The idea: write a C program to understand the ELF header of an executable file.

A file format's header is often the first structure laid out in a binary file, and acts as the table of contents.

1. elf_verify.c
Goal: verify that the first 4 bytes of an executable contain the values of the ELF format's magic byte sequence.
Output: the supplied program path, followed by "is an ELF executable!" or "is not an ELF executable!"
2. answers.txt
Provide answers to the questions:
a) Is the entry point address the same for all executables?
b) Does the address of the first instruction in the main function correspond to the entry point of the executable?

3. Extending elf_verify.c
Extend it to print out the entry point address for files that are proper ELF executables. Note: this is just another field in the
ELF header and lives at some offset into the executable file.
Output: Its entry point address is 0x400450
Submission: answers.txt, elf_verify.c, and the Makefile
In the lab, beware of little-endian systems: if we print the entry point out byte by byte (in file order), it will appear reversed.
However, if we extract the bytes, store them in an integer variable, and print that, it comes out in the correct order, because the
value is printed all at once.
Understanding the ELF header: it begins with the magic bytes, followed by the class, data, and version bytes; the entry point
address appears later in the header.

Magic: always the same in every ELF file.


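Class: 01 indicates a 32-bit architecture; 02 indicates a 64-bit architecture.
Note that this tells you whether the file was built for a 32-bit or 64-bit machine, which in turn affects the entry point address:
it is 4 bytes wide in 32-bit files and 8 bytes wide in 64-bit files (it starts at offset 0x18 in both).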

Data: 01 indicates LSB (least significant byte first, i.e. little-endian); 02 indicates MSB (most significant byte first, i.e. big-endian).
Version: 01 is currently the only version.

Machine type: 0x3E (62) indicates AMD x86-64, the architecture that Intel's 64-bit processors also use. (Note: Apple's own chips use ARM now.)
