Intro to C

Uploaded by

navid.panah1
Intro to C and assembly

Why C?
Compared to other high-level languages
■ Maps almost directly into hardware instructions making code
potentially more efficient
● Provides minimal set of abstractions compared to other HLLs
● HLLs often make programming simpler at the expense of
efficiency

Compared to assembly programming


■ Abstracts out hardware (i.e. registers, memory addresses) to
make code portable and easier to write
■ Provides variables, functions, arrays, complex arithmetic
and boolean expressions
Why C?
Used prevalently in critical applications
■ Operating systems (e.g. Windows, Linux, FreeBSD/OS X)
■ Web servers (apache, Google’s front-end)
■ Web browsers (firefox, chrome)
■ Mail servers (sendmail, postfix)
■ DNS servers (bind)
■ Video games (any FPS)
■ Office apps (Word, Excel, PowerPoint, Adobe)
■ Graphics card programming (OpenCL GPGPU programming)
Example
2/2014
Example
Heartbleed (4/2014)
Why assembly?
Learn how programs map onto underlying hardware
■ Allows programmers to write efficient code
■ Allows one to identify security problems caused by
programming language and CPU architecture

Enable platform-specific tasks


■ Access and manipulate hardware-specific registers
■ Utilize latest CPU instructions
■ Interface with hardware devices

Reverse-engineer unknown binary code


■ Identify what viruses, spyware, rootkits, and other malware
are doing
■ Understand how cheating in on-line games works
Example
• Meltdown and Spectre
Example
FBI Tor Exploit (Playpen) (8/2013)

8/2014
Example
Example
Shellshock
C
The C Programming Language
One of many programming languages
C is an imperative, procedural programming language
Imperative
■ Computation consisting of statements that change program state
■ Language makes explicit references to state (i.e. variables)
Procedural
■ Computation broken into modular components (“procedures” or
“functions”) that can be called from any point
Contrast to declarative programming languages
■ Describes what something is like, rather than how to create it (e.g.
HTML, SQL)
■ Implementation left to other components (e.g. a web browser or
database server)
The C Programming Language
Simpler than higher-level languages such as C++, C#, Java
■ No support for
● Objects
● Managed memory (e.g. garbage collection)
● Array bounds checking
● Name spaces
■ Simple support for
● Typing
● Structures
■ Basic utility functions supplied by libraries linked in at
compile-time or run-time
● libc, libpthread, libm
■ Low-level, direct access to machine memory (pointers)
■ Easier to write bugs, harder to write safe programs, typically faster
Language features updated with ANSI-C standard
The C Programming Language
Compilation down to machine code as in C++
■ Compiled, assembled, linked via gcc

Compared to interpreted languages…


■ Python/JavaScript
● Commands executed by run-time interpreter
● Interpreter runs natively
■ Java
● Compilation to virtual machine “byte code”
● Byte code interpreted by virtual machine software
● Virtual machine runs natively
C variables
Named using letters, digits, and underscores; a name cannot begin with a digit
■ By convention, not all capitals
Must be declared before use
■ C is statically typed
■ Contrast to typical dynamically typed scripting languages
(Python, PHP, JavaScript)
Variable declaration format
<type> <variable_name>;
■ Optional initialization using assignment operator (=)
C statements end with ‘;’
Examples
int foo = 34;
float ff = 34.99;
Integer data types and sizes
char – single byte integer
■ 8-bit character, hence the name
■ Strings implemented as arrays of char and referenced via a
pointer to the first char of the array
short – short integer
■ 16-bit (2 bytes) not used much
int – integer
■ 32-bit (4 bytes) used in IA32
long – long integer
■ 64-bit (8 bytes) on x86-64 Linux and macOS; 32-bit (4 bytes) on IA32 and on 64-bit Windows
Floating point types and sizes
float – single precision floating point
■ 32-bit (4 bytes)
double – double precision floating point
■ 64 bit (8 bytes)
Data Type Ranges for x86-64
Type     Size (bytes)   Range
char     1              -128 to 127
short    2              -32,768 to 32,767
int      4              -2,147,483,648 to 2,147,483,647
long     8              -2^63 to 2^63-1
                        (-9,223,372,036,854,775,808 to …)
float    4              3.4E+/-38
double   8              1.7E+/-308
Constants
Integer literals (constants)
■ Decimal constants directly expressed (1234, 512)
■ Hexadecimal constants preceded by '0x' (0xFE , 0xab78)

Character constants
■ Single quotes to denote ( 'a' )
■ Corresponds to ASCII numeric value of character 'a'

String Literals
■ Double quotes to denote ("I am a string")
■ "" is the empty string
Arrays
char foo[80];
■ An array of 80 characters (stored contiguously in memory)
sizeof(foo) = 80 × sizeof(char)= 80 × 1 = 80 bytes

int bar[40];
■ An array of 40 integers (stored contiguously in memory)
sizeof(bar)= 40 × sizeof(int)= 40 × 4 = 160 bytes
Structures
Aggregate and organize related data of arbitrary types
typedef unsigned long time_t;
typedef unsigned long suseconds_t;

struct timeval {
    time_t tv_sec;
    suseconds_t tv_usec;
}; /* <== DO NOT FORGET the semicolon */

struct timeval curtime;

curtime.tv_sec = 100;
curtime.tv_usec = 0;
C operators
Logical relation operators (return 0 or 1)
< > <= >= == != && || !
Bit-wise boolean operators
& | ~ ^

Arithmetic operators
+ - * / % (modulus)
int foo = 30;
int bar = 20;
foo = foo + bar;
■ Equivalent shortened form
foo += bar;
Increment and Decrement
Comes in prefix and postfix flavors
■ i++ ++i
■ i-- --i

Makes a difference in evaluating complex statements
■ A major source of bugs
■ Prefix: the variable is incremented first, and the expression uses the new value
■ Postfix: the expression uses the old value, and the variable is incremented afterward

It is important to know when the actual increment/decrement occurs
■ What are the values of these expressions for i = 3 ?
i++*2
++i*2
C control flow
Conditional execution
Expression delineated by ( )
if (x == 4)
y = 3; /* sets y to 3 if x is 4 */

Code blocks delineated by curly braces { }


■ For blocks consisting of more than one C statement
if ( ) { } else { }

while ( ) { }

do { } while ( );

for(i=1; i <= 100; i++) { }

switch ( ) {case 1: … }
Other control-flow statements
Keywords and their semantics
■ continue; control passed to next iteration of do/for/while
■ break; passes control out of the innermost enclosing loop or switch
■ return; exits function it is used in immediately and returns
value specified
Function calls (static)
Function or its prototype must be declared before its
use
Calls to functions typically static (resolved at
compile-time)

#include <stdio.h>

void print_ints(int a, int b) {
    printf("%d %d\n", a, b);
}

int main(int argc, char* argv[]) {
    int i = 3;
    int j = 4;
    print_ints(i, j);
    return 0;
}
Example 1
#include <stdio.h>

int main(int argc, char* argv[])
{
    /* print a greeting */
    printf("Hello world!\n");
    return 0;
}

$ gcc -o hello hello.c
$ ./hello
Hello world!
$
Breaking down the code
#include <stdio.h>
■ Include the contents of the file stdio.h
● Case sensitive – lower case only
■ No semicolon at the end of line
■ Required in order to supply the function declaration (prototype) for the printf call; the definition itself lives in libc

int main(…)
■ The C runtime startup code, launched by the OS, calls this function when the program starts running.

printf(format_string, arg1, …)
■ Call function from libc library
■ Prints out a string, specified by the format string and the
arguments.
Passing arguments
main has two arguments from the command line
int main(int argc, char* argv[])
argc
■ Number of arguments (including program name)
argv
■ Pointer to an array of string pointers
argv[0] = program name
argv[1] = first argument
argv[argc-1] = last argument

● Example: find . -print
– argc = 3
– argv[0] = "find"
– argv[1] = "."
– argv[2] = "-print"
Example 2
#include <stdio.h>

int main(int argc, char* argv[])
{
    int i;
    printf("%d arguments\n", argc);
    for (i = 0; i < argc; i++)
        printf(" %d: %s\n", i, argv[i]);
    return 0;
}
Example 2
$ ./cmdline The Class That Gives CS Its Zip
8 arguments
0: ./cmdline
1: The
2: Class
3: That
4: Gives
5: CS
6: Its
7: Zip
$
C quirks
Pointers
Central to C, and hidden or absent in most higher-level languages
■ Variable that holds an address in memory.
■ Address in memory contains another variable.
■ All pointers are 8 bytes (64-bits) for x86-64

Every pointer has a type


■ Type of data at the address (char, int, long, float,
double)
Pointer operators
Declared via the * operator in C variable declarations
Assigned via the & operator
■ Valid on all lvalues
■ Anything that can appear on the left-hand side of an
assignment
Pointer operators
Dereferenced via the * operator in C statements
■ Returns the data that is stored in the memory location
specified by the pointer
■ Type of pointer determines what is returned when
"dereferenced"
■ Example
int x = 1, y = 2;
int *ip = &x;
y = *ip; // y is now 1
*ip = 0; // x is now 0
Dereferencing uninitialized pointers:
■ What happens?
int *ip;
*ip = 3;
■ Undefined behavior, typically a segmentation fault
■ Pointers must always point to allocated memory before they are dereferenced!
Using Pointers
long i; /* data variable */
long *i_addr; /* pointer variable */

Variable:  i        i_addr
Value:     ?        ?
Address:   0x4300   0x4308

i_addr = &i; /* & = address operator */

Variable:  i        i_addr
Value:     ?        0x4300
Address:   0x4300   0x4308
Using Pointers
*i_addr = 32; /* dereference operator */

Variable:  i        i_addr
Value:     32       0x4300
Address:   0x4300   0x4308

long j = *i_addr; /* dereference: j is now 32 */

Variable:  i        i_addr   j
Value:     32       0x4300   32
Address:   0x4300   0x4308   0x4310
Using Pointers
i = 13; /* but j is still 32 */

Variable:  i        i_addr   j
Value:     13       0x4300   32
Address:   0x4300   0x4308   0x4310
Pointers and arrays in C
Assume array z[10]
■ z[i] returns ith element of array z
■ &z[i] returns the address of the ith element of array z
■ z alone evaluates to the address at which the array begins, i.e. the
address of the 0th element of array z (&z[0])
int* ip;
int z[10];
ip = z; /* equivalent to ip = &z[0]; */
Pointer arithmetic
Done based on type of pointer
char* cp1;
int* ip1;
cp1++; // Increments address by 1
ip1++; // Increments address by 4

Often used to step through arrays

int* ip;
int z[10];
ip = z;
ip += 3;
*ip = 100;

How much larger is ip than z? 12 bytes (3 × sizeof(int))

Which element of z is set to 100? z[3]
Function call parameters
Function arguments are passed “by value”.
What is “pass by value”?
■ The called function (callee) is given a copy of the
arguments.

What does this imply?


■ The callee can’t alter a variable in the caller function, only its
copy given through arguments.
Example 1: swap_1

void swap_1(int a, int b)
{
    int temp;
    temp = a;
    a = b;
    b = temp;
}

If x=3, y=4, then after swap_1(x,y);
A1: x=4; y=3;
A2: x=3; y=4;
Example 2: swap_2

void swap_2(int *a, int *b)
{
    int temp;
    temp = *a;
    *a = *b;
    *b = temp;
}

If x=3, y=4, then after swap_2(&x,&y);
A1: x=3; y=4;
A2: x=4; y=3;
Call by value vs. reference in C
Call by reference implemented via pointer passing
void swap(int *px, int *py) {
    int tmp;
    tmp = *px;
    *px = *py;
    *py = tmp;
}
■ Swaps the values of the variables x and y if px is &x and py is &y
■ Uses integer pointers instead of integers
Otherwise, call by value...
void swap(int x, int y) {
    int tmp;
    tmp = x;
    x = y;
    y = tmp;
}
Assignments and expressions
In C, assignment is an expression
■ x = 4 has the value 4

if (x == 4)
y = 3; /* sets y to 3 if x is 4 */

if (x = 4)
y = 3; /* always sets y to 3 */

while ((c=getchar()) != EOF)


Tricky expressions

https://freedom-to-tinker.com/blog/felten/the-linux-backdoor-attempt-of-2003/
JPL’s Power of Ten rules
C is difficult to code securely in
For safety-critical, embedded applications (IoT, driverless cars,
etc), use these rules to improve safety of code
Rule #1
■ Restrict all code to very simple control flow constructs
■ Avoid using goto statements, setjmp or longjmp constructs, and
recursion
■ Easier to verify, better code clarity, acyclic call graph for integrity
checks
Rule #2
■ Ensure loops have a fixed upper bound that can be statically
proven to prevent runaway code
Rule #3
■ Do not use dynamic memory allocation after initialization
■ Predictability in performance, avoidance of memory leaks and
memory corruption bugs in allocation/deallocation code
JPL’s Power of Ten rules
Rule #4
■ No function should be longer than what can be printed on a single
sheet of paper (60 lines)
■ Each function should be a logical unit that is understandable and
verifiable as a unit
Rule #5
■ Code should average a minimum of two assertions per function to
check for anomalous conditions that should never happen
■ Verify pre- and post-conditions of functions, parameters, return
values and loop invariants
■ Assertion failures should return error conditions to caller
■ Typical defect rate is about 1 per 10-100 lines of code; assertions help
catch these defects
JPL’s Power of Ten rules
Rule #6
■ Data objects should be declared at the smallest possible level of
scope
■ Data-hiding keeps values from being referenced or corrupted from
elsewhere
■ Allows easier diagnosis of errors
Rule #7
■ Return value of non-void functions should be checked by calling
function
■ Validity of parameters should be checked inside each function
■ Common to ignore return value checks on many C library calls
■ Exceptions should be justified
JPL’s Power of Ten rules
Rule #8
■ Limit preprocessor use to inclusion of header files and simple
macros. Macros should expand into complete syntactic units
■ Limit the use of conditional compilation directives to one or two to
limit number of versions of code that need to be tested
■ Reduce obfuscated code
Rule #9
■ Restrict the use of pointers to no more than one level of
dereferencing. Function pointers should be avoided
■ Pointers are easily misused and are hard for static analyzers to
follow and analyze
■ Must have clear flow of control and function call hierarchies to
reason about safety of code
Rule #10
■ Compile all code with all compiler warnings enabled (e.g. gcc -Wall -Wextra -Wpedantic)
■ Code should pass with no warnings
■ Modern, static source code analyzers can catch a large percentage
of errors with few false positives
Constant pointers
Used for static arrays
■ Square brackets used to denote arrays
■ Symbol that points to a fixed location in memory
char amsg[] = "This is a test";   /* stored as: This is a test\0 */

■ Can change characters in string ( amsg[3] = 'x'; )

■ Cannot reassign amsg to point elsewhere (e.g. amsg = p; does not compile)
