Intro to C
Why C?
Compared to other high-level languages
■ Maps almost directly into hardware instructions making code
potentially more efficient
● Provides minimal set of abstractions compared to other HLLs
● HLLs often make programming simpler at the expense of
efficiency
8/2014
Example
Shellshock
C
The C Programming Language
One of many programming languages
C is an imperative, procedural programming language
Imperative
■ Computation consisting of statements that change program state
■ Language makes explicit references to state (i.e. variables)
Procedural
■ Computation broken into modular components (“procedures” or
“functions”) that can be called from any point
Contrast to declarative programming languages
■ Describes what something is like, rather than how to create it (e.g.
HTML, SQL)
■ Implementation left to other components (e.g. a web browser or
database server)
Simpler than higher-level languages such as C++, C#, Java
■ No support for
● Objects
● Managed memory (e.g. garbage collection)
● Array bounds checking
● Name spaces
■ Simple support for
● Typing
● Structures
■ Basic utility functions supplied by libraries linked in at
compile-time or run-time
● libc, libpthread, libm
■ Low-level, direct access to machine memory (pointers)
■ Easier to write bugs, harder to write safe programs, typically faster
Language features updated with ANSI-C standard
Compilation down to machine code as in C++
■ Compiled, assembled, linked via gcc
Type     Size (bytes)   Range
float    4              3.4E+/-38
double   8              1.7E+/-308
Constants
Integer literals (constants)
■ Decimal constants directly expressed (1234, 512)
■ Hexadecimal constants preceded by '0x' (0xFE , 0xab78)
Character constants
■ Single quotes to denote ( 'a' )
■ Corresponds to ASCII numeric value of character 'a'
String Literals
■ Double quotes to denote ("I am a string")
■ "" is the empty string
Arrays
char foo[80];
■ An array of 80 characters (stored contiguously in memory)
sizeof(foo) = 80 × sizeof(char) = 80 × 1 = 80 bytes
int bar[40];
■ An array of 40 integers (stored contiguously in memory)
sizeof(bar) = 40 × sizeof(int) = 40 × 4 = 160 bytes (assuming a 4-byte int)
Structures
Aggregate and organize related data of arbitrary types
typedef unsigned long time_t;
typedef unsigned long suseconds_t;
struct timeval {
    time_t tv_sec;
    suseconds_t tv_usec;
}; /* <== DO NOT FORGET the semicolon */
Arithmetic operators
+ - * / % (modulus)
int foo = 30;
int bar = 20;
foo = foo + bar;
■ Equivalent shortened form
foo += bar;
Increment and Decrement
Comes in prefix and postfix flavors
■ i++ ++i
■ i-- --i
while ( ) { }
do { } while ( );
switch ( ) {case 1: … }
Other control-flow statements
Keywords and their semantics
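■ continue; skips the rest of the current iteration and passes control
to the next iteration of the enclosing do/for/while loop
■ break; passes control out of the nearest enclosing loop or switch
■ return; exits the current function immediately and returns the
value specified (if any) to the caller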
Function calls (static)
Function or its prototype must be declared before its
use
Calls to functions typically static (resolved at
compile-time)
int main(…)
■ Execution starts here: the C runtime startup code calls main when the OS launches the program.
printf(format_string, arg1, …)
■ Call function from libc library
■ Prints out a string, specified by the format string and the
arguments.
Passing arguments
main receives two parameters that describe the command line
int main(int argc, char* argv[])
argc
■ Number of arguments (including program name)
argv
■ Array of pointers to strings (passed as a pointer to its first element)
argv[0]: program name
argv[1]: first argument
argv[argc-1]: last argument
Variable   i        i_addr
Value      ?        ?
Address    0x4300   0x4308

Variable   i        i_addr
Value      ?        0x4300
Address    0x4300   0x4308
Using Pointers
*i_addr = 32; /* dereference operator */
Variable   i        i_addr
Value      32       0x4300
Address    0x4300   0x4308

Variable   i        i_addr   j
Value      32       0x4300   32
Address    0x4300   0x4308   0x4310
i = 13; /* but j is still 32 */
Variable   i        i_addr   j
Value      13       0x4300   32
Address    0x4300   0x4308   0x4310
Pointers and arrays in C
Assume array z[10]
■ z[i] returns ith element of array z
■ &z[i] returns the address of the ith element of array z
■ z alone evaluates to the address where the array begins, i.e. the
address of the 0th element of array z (&z[0])
int* ip;
int z[10];
ip = z; /* equivalent to ip = &z[0]; */
Pointer arithmetic
Done based on type of pointer
char* cp1;
int* ip1;
cp1++; // Increments address by sizeof(char) = 1
ip1++; // Increments address by sizeof(int), typically 4
if (x == 4)
y = 3; /* sets y to 3 if x is 4 */
if (x = 4)
y = 3; /* always sets y to 3 */
https://freedom-to-tinker.com/blog/felten/the-linux-backdoor-attempt-of-2003/
JPL’s Power of Ten rules
Writing secure code in C is difficult
For safety-critical, embedded applications (IoT, driverless cars,
etc.), use these rules to improve code safety
Rule #1
■ Restrict all code to very simple control flow constructs
■ Avoid using goto statements, setjmp or longjmp constructs, and
recursion
■ Easier to verify, better code clarity, acyclic call graph for integrity
checks
Rule #2
■ Ensure loops have a fixed upper bound that can be statically
proven to prevent runaway code
Rule #3
■ Do not use dynamic memory allocation after initialization
■ Predictability in performance, avoidance of memory leaks and
memory corruption bugs in allocation/deallocation code
Rule #4
■ No function should be longer than what can be printed on a single
sheet of paper (60 lines)
■ Each function should be a logical unit that is understandable and
verifiable as a unit
Rule #5
■ Code should average a minimum of two assertions per function to
check for anomalous conditions that should never happen
■ Verify pre- and post-conditions of functions, parameters, return
values and loop invariants
■ Assertion failures should return error conditions to the caller
■ Expect a defect rate of about 1 per 10-100 lines of code; assertions
help catch these defects
Rule #6
■ Data objects should be declared at the smallest possible level of
scope
■ Data-hiding keeps values from being referenced or corrupted from
elsewhere
■ Allows easier diagnosis of errors
Rule #7
■ Return value of non-void functions should be checked by calling
function
■ Validity of parameters should be checked inside each function
■ Common to ignore return value checks on many C library calls
■ Exceptions should be justified
Rule #8
■ Limit preprocessor use to inclusion of header files and simple
macros. Macros should expand into complete syntactic units
■ Limit the use of conditional compilation directives to one or two to
limit number of versions of code that need to be tested
■ Reduce obfuscated code
Rule #9
■ Restrict pointer use to no more than one level of
dereferencing. Function pointers should be avoided
■ Pointers are easily misused and are hard for both people and static
analyzers to follow
■ Must have clear flow of control and function call hierarchies to
reason about safety of code
Rule #10
■ Compile all code with all compiler warnings enabled (e.g. gcc -Wall -Wextra -Wpedantic)
■ Code should pass with no warnings
■ Modern, static source code analyzers can catch a large percentage
of errors with few false positives
Constant pointers
Used for static arrays
■ Square brackets used to denote arrays
■ Symbol that points to a fixed location in memory
char amsg[] = "This is a test";   /* stored as: This is a test\0 */