

Buffer Overflows in C
Vulnerabilities, Attacks, and Mitigations


Prof. David Bernstein
James Madison University

Computer Science Department
bernstdh@jmu.edu


Getting Started
  • Definition:
    • A buffer overflow (or overrun) is a situation in which a program uses locations adjacent to a buffer (i.e., beyond one or both of the boundaries of a buffer).
  • An Issue:
    • People frequently limit the definition of a buffer overflow to situations in which data is written to locations adjacent to the buffer
    • We will include both reading and writing, since reading beyond the boundary can also lead to harm (e.g., to confidentiality and availability)
  • Common (in Memory) Buffer Overflows in C:
    • When explicitly using an array
    • When using a string (i.e., implicitly using an array)
Vulnerabilities when Using Arrays - Length Faults
  • A Common Idiom:
    • Pass an array and its length as different parameters
      (e.g., main(int argc, char* argv[]))
  • The Problem:
    • A length that is too large might be passed in (or the length might be increased locally)
Vulnerabilities when Using Arrays - Length Faults (cont.)
  • Another Common Idiom:
    • void operation(int array[])
      {
          int length = sizeof(array) / sizeof(array[0]);
          for (int i=0; i<length; i++)
          {
              // Operate on array[i]
          }
      }
  • The Problem:
    • array is a parameter and, so, is a pointer type
    • Hence, sizeof(array) is the size of an int *, not the size of the array
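One way to avoid this fault, sketched here under the assumption that the caller still has the true array in scope, is to compute the length at the call site and pass it explicitly. The names ARRAY_LENGTH and sum_elements are hypothetical and are used only for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: only meaningful where `a` is a true array,
   not a pointer parameter. */
#define ARRAY_LENGTH(a) (sizeof(a) / sizeof((a)[0]))

/* The callee cannot recover the length from the pointer, so the
   caller supplies it. */
long sum_elements(const int array[], size_t length)
{
    long total = 0;
    for (size_t i = 0; i < length; i++) {
        total += array[i];
    }
    return total;
}
```

A caller that declares int data[] = {1, 2, 3, 4, 5}; would call sum_elements(data, ARRAY_LENGTH(data)). This still relies on the caller passing a correct length (the length fault described earlier), but it removes the silent sizeof error.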
Vulnerabilities when Using Arrays - Sentinel Faults
  • A Common Idiom:
    • Use a special value to indicate the last element of the array (e.g., -1 in an array of non-negative integers)
  • The Problem:
    • The sentinel can be omitted or misspecified
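A possible defense, assuming the capacity of the array is known to the code doing the search, is to bound the sentinel search by that capacity. The function name count_before_sentinel is hypothetical.

```c
#include <stddef.h>

/* Returns the number of elements that precede the sentinel, or
   `capacity` if no sentinel is found within the known capacity. */
size_t count_before_sentinel(const int *array, size_t capacity, int sentinel)
{
    size_t i = 0;
    while (i < capacity && array[i] != sentinel) {
        i++;
    }
    return i;
}
```

A return value equal to capacity tells the caller that the sentinel was omitted or misspecified, so the search never runs past the end of the array.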
Vulnerabilities when Using Strings
  • Recall:
    • C has character types char and wchar_t
    • A string in C is a contiguous sequence of characters terminated by (and including) a sentinel character (the null character '\0')
  • Data Structure:
    • Strings are stored in arrays
    • The length of the string (not counting the null character) is (i.e., should be) at most one less than the length of the array
Vulnerabilities when Using Strings - gets()
  • A Common Practice:
    • Use gets() to read characters (until it reaches a newline or end-of-file)
  • An Example:
    • char line[81]; gets(line);
  • The Problem:
    • An attacker can enter an indeterminate number of characters, making it impossible to ensure that the buffer is large enough
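A minimal sketch of the usual replacement, fgets(), which takes the size of the buffer. The wrapper name read_line and its return convention are assumptions made for illustration.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper: reads at most size-1 characters of one line
   from stream into buffer and strips the trailing newline, if any.
   Returns 0 on success and -1 on end-of-file or error. */
int read_line(FILE *stream, char *buffer, size_t size)
{
    /* fgets() takes an int, so clamp very large sizes. */
    int limit = (size > INT_MAX) ? INT_MAX : (int)size;

    if (limit == 0 || fgets(buffer, limit, stream) == NULL) {
        return -1;
    }
    buffer[strcspn(buffer, "\n")] = '\0';
    return 0;
}
```

With char line[81]; the call read_line(stdin, line, sizeof line) stores at most 80 characters no matter how many the attacker enters. Any remaining characters stay in the stream, so the caller still has to decide how to handle over-long lines.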
Vulnerabilities when Using Strings - strcpy()
  • A Common Practice:
    • Copy a source string into a target/destination string using strcpy()
  • An Example:
    • int main(int argc, char* argv[])
      {
          char file_name[65];
          char *temp;

          temp = argv[1] ? argv[1] : "";
          strcpy(file_name, temp);
      }
  • The Problem:
    • The attacker can supply a string that is too long (i.e., a file name that is longer than 64 characters in this example)
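One way to close this hole, sketched under the assumption that rejecting an over-long string is acceptable, is to check the length of the source before copying. The function name copy_checked is hypothetical.

```c
#include <string.h>

/* Copies src into dest only if it fits (including the null character).
   Returns 0 on success and -1 (leaving dest empty) if src is too long. */
int copy_checked(char *dest, size_t dest_size, const char *src)
{
    size_t needed = strlen(src) + 1;

    if (dest_size == 0) {
        return -1;
    }
    if (needed > dest_size) {
        dest[0] = '\0';
        return -1;
    }
    memcpy(dest, src, needed);
    return 0;
}
```

In the example above, strcpy(file_name, temp) would become copy_checked(file_name, sizeof file_name, temp), with the program refusing to continue when the result is -1.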
Vulnerabilities when Using Strings - strcat()
  • A Common Practice:
    • Use strcat() to append a source string to a target string
  • The Problem:
    • The target might not be long enough to hold the result
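A sketch of a bounded append, assuming the caller knows the total size of the target array. The function name append_checked is hypothetical.

```c
#include <string.h>

/* Appends src to the string in dest only if the result (including the
   null character) fits in dest_size bytes.
   Returns 0 on success and -1 (leaving dest unchanged) otherwise. */
int append_checked(char *dest, size_t dest_size, const char *src)
{
    /* Find the current terminator without reading past the array. */
    const char *end = memchr(dest, '\0', dest_size);
    if (end == NULL) {
        return -1;                      /* dest is not null-terminated */
    }

    size_t used = (size_t)(end - dest);
    size_t extra = strlen(src);
    if (extra > dest_size - used - 1) {
        return -1;                      /* result would not fit */
    }
    memcpy(dest + used, src, extra + 1);
    return 0;
}
```

Using memchr() rather than strlen() on the target means that a target that is missing its terminator is reported instead of being read past its end.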
Vulnerabilities when Using Strings - sprintf()
  • A Common Practice:
    • Use sprintf() to create a (formatted) string
  • An Example:
    • char line[81]; sprintf(line, "%2d: %s\n", i, user_input);
  • The Problem:
    • sprintf() assumes that the buffer it is passed is large enough, and the attacker might provide a string that is too long (i.e., longer than 80 - 2 - 2 - 1 = 75 characters in this example, assuming that i is printed using only two characters)
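A minimal sketch of the same formatting done with snprintf() (C99), which takes the size of the buffer and reports how many characters the complete result would have needed. The function name format_line is hypothetical.

```c
#include <stdio.h>

/* Formats "%2d: %s\n" into line, never writing more than size bytes.
   Returns 0 if the whole result fit and -1 if it was truncated or an
   encoding error occurred. */
int format_line(char *line, size_t size, int i, const char *user_input)
{
    int written = snprintf(line, size, "%2d: %s\n", i, user_input);

    if (written < 0 || (size_t)written >= size) {
        return -1;
    }
    return 0;
}
```

When the result is truncated, the buffer still holds a null-terminated prefix (provided size is not zero), so the caller can choose between rejecting the input and using the shortened line.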
Vulnerabilities when Using Strings - Null Termination
  • Some Common Practices:
    • Not allocating memory for the null character (i.e., an off-by-one error when calculating the size)
    • Forgetting to include the null character (though there is space for it)
    • Using strncpy() to copy the first n characters from the source to the target when the source contains n or more characters (which results in a non-null terminated string)
  • Some Problems That Arise:
    • strlen() uses the null character to determine the length of the string so any function that uses strlen() will have problems
    • Other functions (e.g., strcpy()) iterate until a null character is encountered so will have problems
Vulnerabilities when Using Strings - Null Termination (cont.)

An Example

char a[10], b[10];
strncpy(a, "0123456789", 10); // a will not be null-terminated
strcpy(b, a);                 // b will probably overflow
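A sketch of one common repair, assuming that silently truncating the source is acceptable: copy at most one character fewer than the size of the target and always write the null character. The function name copy_truncating is hypothetical.

```c
#include <string.h>

/* Copies as much of src as fits in dest and always null-terminates. */
void copy_truncating(char *dest, size_t dest_size, const char *src)
{
    if (dest_size == 0) {
        return;
    }
    strncpy(dest, src, dest_size - 1);  /* leave room for the terminator */
    dest[dest_size - 1] = '\0';         /* guarantee termination */
}
```

With this helper, copy_truncating(a, sizeof a, "0123456789") leaves a holding a properly terminated nine-character string, so later calls that scan for the null character stay inside the array.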
Threats - Memory Corruption
  • The Issue:
    • By overflowing a buffer the attacker can corrupt memory that is being used to store information of various kinds
  • A Common Attack Vector:
    • User input
Threats - Arbitrary Memory Writes
  • The Issue:
    • By overflowing a buffer the attacker can corrupt the value of a pointer and, if the pointer is subsequently used in an assignment statement, write to the arbitrary address it now points to
  • Sketch of an Example:
    • ...
      char buffer[BUFFER_SIZE];
      long value = ...;
      long* p = ...;

      strncpy(buffer, argv[1], length); // Potential overflow into p
      *p = value;                       // Assign value to the address pointed to by p
      ...
Threats - Corrupted Function Pointers
  • The Issue:
    • By overflowing a buffer the attacker can corrupt the value of a function pointer and, if the function pointer is used subsequently, transfer control to arbitrary code
  • Sketch of an Example:
    • ...
      static int value = ...;
      static char buffer[BUFFER_SIZE];
      static void (*f)(int i);

      f = &some_function;
      strncpy(buffer, argv[1], length); // Potential overflow into f
      (void)(*f)(value);                // Execute the code pointed to by f
      ...
Threats - Stack Smashing
  • The Issue:
    • By overflowing a buffer the attacker can overwrite data in the memory allocated to the execution stack
  • Ramifications:
    • The values of automatic variables can be modified
    • Program execution can be terminated
    • Arbitrary code can be executed
Attacks - Data Integrity
  • A Memory Corruption Example:
    • cexamples/bufferoverflow/windows/string_overflow_data.c
       
  • A Sample Windows/Intelx86 Execution:
    • Address of eid: 0x0804a030
    • Address of grade: 0x0804a024
Attacks - Data Integrity (cont.)
  • Notes:
    • Though eid only requires 9 bytes, space actually exists for 12 (for alignment reasons)
    • Intelx86 is little-endian
  • Memory (Before and After):
Attacks - Program Termination/Availability
An Example
cexamples/bufferoverflow/smash.c
 
Attacks - Program Termination/Availability (cont.)
A Partial Result of Executing the Program (MS Visual C/Intelx86)
foo: 00401000
bar: 00401045
Stack (Before):
00000000
00000000
7FFDF000
0012FF80
0040108A
00410EDE
  
Interpreting the Stack

If you compile using gcc -S smash.c, a file containing the assembly language code (named smash.s) will be generated.

If you compile using gcc -o smash smash.c and then run gdb smash, you can use the command disassemble main to see the assembly code that will be executed (i.e., with resolved addresses).

Attacks - Program Termination/Availability (cont.)
Passing "Hello"
  Output              Interpretation

foo: 0x00401000
bar: 0x00401045
Stack (Before):
0x00000000
0x00000000
0x7ffdf000
0x0012ff80
0x0040108a            The return address for foo()
0x00410ede

Stack (After):
0x6c6c6548                lleH
0x0000006f                   o
0x7ffdf000
0x0012ff80
0x0040108a            The return address for foo()
0x00410ede
  
Attacks - Program Termination/Availability (cont.)
Passing "AAAAAAAAAAAAAAAAAAAAAAA"
  Output              Interpretation

foo: 0x00401000
bar: 0x00401045
Stack (Before):
0x00000000
0x00000000
0x7ffdf000
0x0012ff80
0x0040108a            The return address for foo()
0x00410ede

Stack (After):
0x41414141                AAAA
0x41414141                AAAA
0x41414141                AAAA
0x41414141                AAAA
0x41414141                AAAA (And the return address for foo()!)
0x41414141                AAAA
  
Attacks - Program Termination/Availability (cont.)
  • The Result:
    • The program terminates abnormally
  • Possible Reasons for an Abnormal Termination:
    • The new address is invalid
    • Memory at that address does not contain a valid CPU instruction
    • The CPU registers are not set up for the proper execution of the instruction
    • Memory at that address is not executable
Attacks - Execution Integrity
  • An Observation:
    • In the attack above, "random" characters were used to overflow the buffer
  • A Question:
    • Can the buffer overflow be used to change the return to elsewhere in the program?
  • The Answer:
    • Yes, if I can pass untypable characters (which I can using another program).
Attacks - Execution Integrity (cont.)
Passing "StaticOverrun ABCDEFGHIJKLMNOP\x45\x10\x40"
  Output              Interpretation

foo: 0x00401000
bar: 0x00401045
Stack (Before):
0x77fb80db
0x77f94e68
0x7ffdf000
0x0012ff80
0x0040108a            The return address for foo()
0x00410eca

Stack (After):
0x44434241                DCBA
0x48474645                HGFE
0x4c4b4a49                LKJI
0x504f4e4d                PONM
0x00401045            The address of bar() (and the return address for foo()!)
0x00410eca            
  
Attacks - Arc Injection
  • Question:
    • Can control flow changes like those above be used to cause other kinds of harm?
  • The Answer:
    • Yes, any code that exists in process memory can be executed (e.g., system() or exec()).
Attacks - Code Injection
  • A Question:
    • Is it possible to execute arbitrary code (i.e., code that isn't in process memory)?
  • The Answer:
    • Yes, by using the buffer overflow to put instructions (often called shellcode for historical reasons) on the stack.
Attacks - Gadget Injection
  • A Question:
    • Isn't code injection difficult?
  • An Answer:
    • It requires time, effort, and a knowledge of assembly language.
  • Another Question:
    • Is there another way?
  • The Answer:
    • Unfortunately, yes, using gadgets (or return-oriented programming).
Attacks - Gadget Injection (cont.)
  • What Are Gadgets?
    • A gadget is a useful sequence of instructions.
  • Are They Powerful?
    • A Turing-complete set of gadgets has been written for the Intelx86 architecture (i.e., any arbitrary program can be written using gadgets).
  • How Are They Used?
    • A compiler can be used to create the gadgets from a higher-level language.
Mitigation - Array Handling
  • Use a Single, Consistent Design:
    • Caller allocates, caller frees (C99, C11 Annex K)
    • Callee allocates, caller frees
    • Callee allocates, callee frees (C++ std::basic_string)
  • Copying:
    • Avoid memcpy() and use memcpy_s() instead (C11 Annex K, which is optional) -- specifies the size of the destination and reports violations with the return value; reports NULL destinations; reports the copying of overlapping segments
  • Moving:
    • Avoid memmove() and use memmove_s() instead (C11 Annex K, which is optional) -- specifies the size of the destination and reports violations with the return value; reports NULL destinations
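Because Annex K is optional, memcpy_s() is not available in every C library. The following sketch is NOT the Annex K function; it is a hypothetical stand-in (named checked_memcpy, with made-up error codes) that illustrates the three kinds of checks listed above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration of the checks described above.
   Returns 0 on success, 1 for a NULL pointer, 2 if count exceeds the
   size of the destination, and 3 if the two regions overlap. */
int checked_memcpy(void *dest, size_t dest_size, const void *src, size_t count)
{
    if (dest == NULL || src == NULL) {
        return 1;
    }
    if (count > dest_size) {
        return 2;
    }

    /* Compare the addresses as integers to detect overlap. */
    uintptr_t d = (uintptr_t)dest;
    uintptr_t s = (uintptr_t)src;
    if ((d <= s && s - d < count) || (s < d && d - s < count)) {
        return 3;
    }

    memcpy(dest, src, count);
    return 0;
}
```

The important design point is that the size of the destination travels with the call, so the function (rather than every caller) is responsible for refusing a copy that would not fit.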
Mitigation - String Handling
  • Use a Single, Consistent Design:
    • Caller allocates, caller frees (C99, C11 Annex K)
    • Callee allocates, caller frees
    • Callee allocates, callee frees (C++ std::basic_string)
  • Use the Right Functions (that Include the Maximum Size):
    • Never use gets() (which was removed from the language in C11); use fgets() and/or getchar() (which are still vulnerable, but better) or gets_s() (C11 Annex K, which is optional)
    • Avoid strcpy() and strncpy() (which have different vulnerabilities) and use strcpy_s() instead (C11 Annex K)
    • Avoid strcat() and strncat() (which have different vulnerabilities) and use strcat_s() instead (C11 Annex K)
Runtime Mitigation - Validation/Sanitization
  • An Observation:
    • It is awkward to validate data in every function (and, hence, is unlikely to be done in practice)
  • A Reasonable Approach:
    • Any data that crosses a trust boundary should be validated
  • Example Sources:
    • Command-line arguments
    • Environment variables
    • Sockets
    • Files
    • Pipes
    • Signals
    • Shared Memory
    • Devices
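As one illustration of validating data at a trust boundary, the sketch below checks a numeric command-line argument before the rest of the program uses it. The function name parse_int_in_range and its return convention are assumptions made for this example.

```c
#include <errno.h>
#include <stdlib.h>

/* Parses text as a decimal integer and accepts it only if the entire
   string was consumed and the value lies in [min, max].
   Returns 0 and stores the value in *result on success, -1 otherwise. */
int parse_int_in_range(const char *text, long min, long max, long *result)
{
    char *end;

    if (text == NULL || *text == '\0') {
        return -1;
    }

    errno = 0;
    long value = strtol(text, &end, 10);
    if (errno == ERANGE || *end != '\0' || value < min || value > max) {
        return -1;
    }

    *result = value;
    return 0;
}
```

A program might call parse_int_in_range(argv[1], 0, 80, &length) once, in main(), so that the functions behind the trust boundary can assume that length is in range.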
Runtime Mitigation - Size Checking
  • An Observation:
    • The size of some entities can be determined reliably
  • Obtaining this Information:
    • size_t __builtin_object_size(const void* ptr, int type) (a GCC built-in function)
  • Error Codes:
    • Returns (size_t) -1 (if type is in {0,1}) or 0 (if type is in {2,3}) if the size can't be determined
Runtime Mitigation - Size Checking (cont.)
  • Some Details:
    • The pointer needn't point to the start of the entity, it can point to a member
    • The value of type determines what is considered the "end of" the entity
  • An Example:
    • struct Account {char name[10]; int id; char branch[20];} a; void* p = &a.id;
  • Behavior:
    • If type is 0 or 2 then __builtin_object_size(p, type) returns sizeof(int) plus 20 (i.e., the size to the end of a; the maximum remaining size)
    • If type is 1 or 3 then __builtin_object_size(p, type) returns sizeof(int) (i.e., the size to the end of a.id; the minimum remaining size)
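A rough sketch of how this built-in can guard a copy, assuming a compiler that provides it (e.g., GCC or Clang). The macro name GUARDED_COPY is hypothetical. Whether the compiler can determine the size depends on the expression and on the optimization level, so the guard is best-effort: when the size is unknown, type 0 yields (size_t) -1 and the copy is allowed.

```c
#include <stddef.h>
#include <string.h>

struct Account { char name[10]; int id; char branch[20]; };

/* Hypothetical guard (GCC/Clang only): evaluates to -1 without copying
   if the compiler can tell that fewer than count bytes remain at dest,
   and otherwise copies and evaluates to 0.
   Note that count is evaluated more than once. */
#define GUARDED_COPY(dest, src, count)                   \
    (((count) > __builtin_object_size((dest), 0))        \
         ? -1                                            \
         : (memcpy((dest), (src), (count)), 0))
```

For example, given struct Account a; and int id = 42; the expression GUARDED_COPY(&a.id, &id, sizeof id) copies the value, because sizeof(int) bytes fit whether or not the compiler can determine the remaining size.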
Runtime Mitigation - Size Checking (cont.)
  • Using Size Checking "By Hand":
    • Many functions that operate on arrays can make use of size checking to provide some protection
  • Using Size Checking More Widely:
    • In GCC, many string functions will use size checking if the symbol _FORTIFY_SOURCE is defined (and optimization is enabled)
Runtime Mitigation - Stack Canaries
  • The Goal:
    • Protect the return address on the stack from being written to
  • The Approach:
    • Write a value that is difficult to insert/spoof to an address "before" that of the memory being protected
  • The Analogy:
    • Like a canary in a coal mine, the special value would be "killed" by any attempt to write to the memory being protected
Runtime Mitigation - Stack Canaries (cont.)
  • Values that are Difficult to Insert:
    • CR, LF, NUL (i.e., characters that terminate many string and input operations)
  • Values that are Difficult to Spoof:
    • 32-bit random number
    • 32-bit random number XORed with the return address
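The idea can be simulated in ordinary C. The sketch below is only an illustration of the mechanism, not how a compiler implements it: a structure places a canary immediately after a buffer, and an "overflow" is simulated by filling the bytes of the whole structure. All of the names and the constant CANARY_VALUE are hypothetical; a real implementation would choose the value randomly when the program starts.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed value, used here only so the example is repeatable. */
#define CANARY_VALUE 0x5AFEC0DEu

struct guarded_buffer {
    char buffer[16];
    unsigned int canary;  /* an overflow of buffer reaches this first */
};

void guard_init(struct guarded_buffer *g)
{
    memset(g->buffer, 0, sizeof g->buffer);
    g->canary = CANARY_VALUE;
}

/* Returns nonzero if the canary is still "alive". */
int guard_intact(const struct guarded_buffer *g)
{
    return g->canary == CANARY_VALUE;
}

/* Simulates a write of count bytes that starts at the buffer. The count
   is clamped to the size of the structure so that the simulation itself
   stays within bounds. */
void simulate_write(struct guarded_buffer *g, int byte, size_t count)
{
    if (count > sizeof *g) {
        count = sizeof *g;
    }
    memset((unsigned char *)g, byte, count);
}
```

Writing sizeof g.buffer bytes leaves guard_intact() true, while writing sizeof g bytes "kills" the canary. A compiler-generated check does the analogous comparison just before the function returns and aborts the program if the value has changed.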
Runtime Mitigation - Stack Canaries (cont.)
  • Stack Canaries in GCC (SSP/ProPolice):
    • Use the -fstack-protector-all flag
    • Note: This tool also changes the organization of the stack (i.e., places the canary "after" arrays)
  • Stack Canaries in Visual C:
    • Use the /GS flag
Environment-Based Mitigation - Address Space Layout Randomization
  • Purpose:
    • Prevent the execution of arbitrary code
  • How It Works:
    • Randomizing the addresses of stack pages (e.g., using complete randomization or a randomly-sized gap) makes it more difficult for attackers to predict the addresses (e.g., of shellcode, system functions, and/or gadgets) that they want to return to
  • Examples:
    • Linux (Debian since 2007, Ubuntu since 2008): See sysctl -w kernel.randomize_va_space
    • MS Windows (Since Vista): See the /DYNAMICBASE linker option
Environment-Based Mitigation - Nonexecutable Stack
  • Purpose:
    • Prevent the execution of malicious code from the stack
  • Limitations:
    • Doesn't prevent the execution of malicious code from the heap
    • Doesn't prevent the execution of malicious code from data segments
Environment-Based Mitigation - W^X
  • Purpose:
    • Prevent the execution of malicious code
  • How it Works:
    • Parts of the process memory space are marked as writable (W) or executable (X), but not both (i.e., W xor X)
  • Hardware Support:
    • Many processors include the ability to mark memory pages as data only, disabling the execution of code in those pages
Environment-Based Mitigation - W^X (cont.)
  • Hardware Support (cont.):
    • AMD - No-eXecute (NX) bit
    • Intel - eXecute Disable (XD) bit
    • ARM - eXecute Never (XN) bit
  • Operating System Support:
    • MS Windows - Data Execution Prevention (DEP) using /NXCOMPAT
    • Linux - PaX
Environment-Based Mitigation - Other Things
  • Oracle - Silicon Secured Memory (SSM):
    • 64-bit pointers use some of the bits for a "color" and the chip checks to ensure that it points to memory of the appropriate "color"
    • Sometimes called Application Data Integrity (ADI)
Experimenting with Buffer Overflows
  • An Observation:
    • There are now many compile-time and run-time mitigation strategies in place to help protect against buffer overflow vulnerabilities
  • One Implication:
    • You may need to temporarily disable some of these mitigation strategies while experimenting
  • Examples:
    • gcc: -fno-stack-protector
    • gcc: -fno-defer-pop
    • shell (as root): echo 0 > /proc/sys/kernel/randomize_va_space (it normally contains the value 2)
    • shell: execstack -s
    • Visual C: Don't use /GS (i.e., disable it with /GS-)
There's Always More to Learn