Buffer Overflows in C: Vulnerabilities, Attacks, and Mitigations
Prof. David Bernstein
James Madison University
Computer Science Department
bernstdh@jmu.edu

Getting Started
• Definition:
• A buffer overflow (or overrun) is a situation in which a program uses locations adjacent to a buffer (i.e., beyond one or both of the boundaries of a buffer).
• An Issue:
• People frequently limit the definition of a buffer overflow to situations in which data is written to locations adjacent to the buffer
• We will include both reading and writing since reading beyond the boundary can lead to harm (e.g., confidentiality and availability)
• Common (in Memory) Buffer Overflows in C:
• When explicitly using an array
• When using a string (i.e., implicitly using an array)
Vulnerabilities when Using Arrays - Length Faults
• A Common Idiom:
• Pass an array and its length as different parameters
(e.g., main(int argc, char* argv[]))
• The Problem:
• A length that is too large might be passed in (or the length might be increased locally)
Vulnerabilities when Using Arrays - Length Faults (cont.)
• Another Common Idiom:
• void operation(int array[])
  {
      int length = sizeof(array) / sizeof(array[0]);
      for (int i=0; i<length; i++)
      {
          // Operate on array[i]
      }
  }

• The Problem:
• array is a parameter and, so, is a pointer type
• Hence, sizeof(array) is the size of an int *, not the size of the array
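One way to avoid this fault is to compute the length where the array itself is still in scope and pass it explicitly. The following is a minimal sketch; the names ARRAY_LENGTH and sum are hypothetical and are not part of the original example.

```c
#include <stddef.h>

/* Number of elements in a true array. This is only correct where the array
   itself (not a pointer to it) is in scope. ARRAY_LENGTH is a hypothetical name. */
#define ARRAY_LENGTH(a) (sizeof(a) / sizeof((a)[0]))

/* The callee receives the length explicitly instead of applying sizeof
   to a parameter (which would yield the size of a pointer). */
long sum(const int array[], size_t length)
{
    long total = 0;
    for (size_t i = 0; i < length; i++)
    {
        total += array[i];
    }
    return total;
}
```

A caller that declares int data[4] would call sum(data, ARRAY_LENGTH(data)). This only moves the responsibility to the caller, so the length fault described earlier (passing a length that is too large) is still possible.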
Vulnerabilities when Using Arrays - Sentinel Faults
• A Common Idiom:
• Use a special value to indicate the last element of the array (e.g., -1 in an array of non-negative integers)
• The Problem:
• The sentinel can be omitted or misspecified
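A common defence is to bound the search for the sentinel by the known capacity of the array, so that a missing sentinel cannot carry the loop past the end. This is a sketch only; count_until_sentinel is a hypothetical helper.

```c
#include <stddef.h>

/* Counts the elements that precede the sentinel, examining at most
   capacity elements. Returns capacity if no sentinel was found, which
   the caller should treat as an error. */
size_t count_until_sentinel(const int array[], size_t capacity, int sentinel)
{
    size_t i = 0;
    while (i < capacity && array[i] != sentinel)
    {
        i++;
    }
    return i;
}
```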
Vulnerabilities when Using Strings
• Recall:
• C has character types char and wchar_t
• A string in C is a contiguous sequence of characters terminated by (and including) a sentinel character (the null character '\0')
• Data Structure:
• Strings are stored in arrays
• The length of the string is (i.e., should be) at least one less than the length of the array
Vulnerabilities when Using Strings - gets()
• A Common Practice:
• Use gets() to read characters (until it reaches a newline character or the end of the stream)
• An Example:
• char     line[81];

gets(line);

• The Problem:
• An attacker can enter an indeterminate number of characters making it impossible to ensure that the buffer is large enough
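By contrast, fgets() takes the size of the buffer and never writes more than that. The sketch below wraps it in a hypothetical helper, read_line, that also reports whether the line was cut short; the return-value convention is an assumption made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Reads at most size - 1 characters of one line from in into line.
   Returns  1 if a newline was read (and stripped) or the stream ended,
            0 if the buffer filled before a newline was seen (the rest of
              the line remains unread in the stream), and
           -1 on end-of-file with nothing read, on error, or if size is 0. */
int read_line(FILE *in, char *line, size_t size)
{
    if (size == 0 || fgets(line, (int)size, in) == NULL)
    {
        return -1;
    }

    size_t length = strlen(line);
    if (length > 0 && line[length - 1] == '\n')
    {
        line[length - 1] = '\0';   /* Strip the newline */
        return 1;
    }

    /* No newline: either the stream ended or the buffer filled up */
    return feof(in) ? 1 : 0;
}
```

With the buffer from the example above, the call would be read_line(stdin, line, sizeof line).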
Vulnerabilities when Using Strings - strcpy()
• A Common Practice:
• Copy a source string into a target/destination string using strcpy()
• An Example:
• int main(int argc, char* argv[])
  {
      char    file_name[65];
      char   *temp;

      temp = argv[1] ? argv[1] : "";
      strcpy(file_name, temp);
  }

• The Problem:
• The attacker can supply a string that is too long (i.e., a file name that is longer than 64 characters in this example)
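One possible fix is to check the length of the source against the size of the destination before copying. The helper below, copy_string, is hypothetical; it rejects a source that does not fit rather than truncating it.

```c
#include <string.h>

/* Copies src into dst only if it fits, including the null character.
   Returns 0 on success and -1 otherwise (leaving dst an empty string
   whenever dst is usable). */
int copy_string(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || dst_size == 0)
    {
        return -1;
    }
    if (src == NULL || strlen(src) >= dst_size)
    {
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, strlen(src) + 1);   /* +1 copies the null character */
    return 0;
}
```

In the example above, strcpy(file_name, temp) would become copy_string(file_name, sizeof file_name, temp), and the program would refuse a name longer than 64 characters.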
Vulnerabilities when Using Strings - strcat()
• A Common Practice:
• Use strcat() to append a source string to a target string
• The Problem:
• The target might not be long enough to hold the result
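The same kind of check applies when appending: the space already used by the target must be counted. The following sketch uses a hypothetical helper, append_string, and leaves the target unchanged when the result would not fit.

```c
#include <string.h>

/* Appends src to the string in dst if the result (including the null
   character) fits in dst_size bytes. Returns 0 on success and -1
   otherwise, in which case dst is left unchanged. */
int append_string(char *dst, size_t dst_size, const char *src)
{
    /* Look for the terminator inside the buffer only */
    const char *end = memchr(dst, '\0', dst_size);
    if (end == NULL)
    {
        return -1;                 /* dst is not null-terminated */
    }

    size_t used  = (size_t)(end - dst);
    size_t extra = strlen(src);
    if (extra >= dst_size - used)
    {
        return -1;                 /* Not enough room for src and '\0' */
    }

    memcpy(dst + used, src, extra + 1);
    return 0;
}
```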
Vulnerabilities when Using Strings - sprintf()
• A Common Practice:
• Use sprintf() to create a (formatted) string
• An Example:
• char     line[81];

sprintf(line, "%2d: %s\n", i, user_input);

• The Problem:
• sprintf() assumes that the buffer it is passed is large enough and the attacker might provide a string that is too long (i.e., longer than 80 - 2 - 2 - 1 = 75 characters in this example, assuming that i is printed using only two characters)
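The standard function snprintf() (C99) takes the size of the buffer and returns the number of characters that the complete result would have needed, which makes truncation detectable. A minimal sketch using the same format string follows; format_line is a hypothetical wrapper.

```c
#include <stdio.h>

/* Formats i and user_input into line without writing past size bytes.
   Returns 0 if the whole result fit and -1 on truncation or error. */
int format_line(char *line, size_t size, int i, const char *user_input)
{
    int needed = snprintf(line, size, "%2d: %s\n", i, user_input);
    if (needed < 0 || (size_t)needed >= size)
    {
        return -1;   /* Encoding error, or the output was truncated */
    }
    return 0;
}
```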
Vulnerabilities when Using Strings - Null Termination
• Some Common Practices:
• Not allocating memory for the null character (i.e., an off-by-one error when calculating the size)
• Forgetting to include the null character (though there is space for it)
• Using strncpy() to copy the first n characters from the source to the target when the source contains n or more characters (which results in a non-null terminated string)
• Some Problems That Arise:
• strlen() uses the null character to determine the length of the string so any function that uses strlen() will have problems
• Other functions (e.g., strcpy()) iterate until a null character is encountered so will have problems
Vulnerabilities when Using Strings - Null Termination (cont.)

An Example

char     a[10], b[10];

strncpy(a, "0123456789", 10); // a will not be null-terminated

strcpy(b, a);                 // b will probably overflow
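If strncpy() is used anyway, a common repair is to copy at most one character fewer than the size of the target and to write the null character explicitly. The sketch below does this in a hypothetical helper, copy_truncating, which silently truncates and reports that it did so.

```c
#include <string.h>

/* Copies as much of src as fits in dst and always null-terminates dst
   (when dst_size is at least 1). Returns 1 if src was truncated and
   0 otherwise. */
int copy_truncating(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
    {
        return 1;
    }
    strncpy(dst, src, dst_size - 1);   /* Leave room for the terminator */
    dst[dst_size - 1] = '\0';          /* strncpy() may not have added one */
    return strlen(src) >= dst_size;
}
```

Applied to the example above, a would hold the nine characters "012345678" followed by the null character, so the subsequent copy into b would stay within bounds.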

Threats - Memory Corruption
• The Issue:
• By overflowing a buffer the attacker can corrupt memory that is being used to store information of various kinds
• A Common Attack Vector:
• User input
Threats - Arbitrary Memory Writes
• The Issue:
• By overflowing a buffer the attacker can corrupt the value of a pointer and, if the pointer is then used in an assignment statement, write to the arbitrary address it now points to
• Sketch of an Example:
• ...
char  buffer[BUFFER_SIZE];
long  value = ...;
long* p     = ...;
strncpy(buffer, argv[1], length); // Potential overflow into p
*p = value;                       // Assign value to the address pointed to by p
...

Threats - Corrupted Function Pointers
• The Issue:
• By overflowing a buffer the attacker can corrupt the value of a function pointer and, if the function pointer is used subsequently, transfer control to arbitrary code
• Sketch of an Example:
• ...
static int   value = ...;
static char  buffer[BUFFER_SIZE];
static void  (*f)(int i);
f = &some_function;
strncpy(buffer, argv[1], length); // Potential overflow into f
(void)(*f)(value);                // Execute the code pointed to by f
...

Threats - Stack Smashing
• The Issue:
• By overflowing a buffer the attacker can overwrite data in the memory allocated to the execution stack
• Ramifications:
• The values of automatic variables can be modified
• Program execution can be terminated
• Arbitrary code can be executed
Attacks - Data Integrity
• A Memory Corruption Example:
• cexamples/bufferoverflow/windows/string_overflow_data.c
#include <stdio.h>
#include <string.h>

char    eid[9];    // 8 characters plus '\0'

int main(int argc, char* argv[])
{
    // Initialize
    strcpy(eid, "bernstdh");

    // Copy user input into the eid
    strcpy(eid, "bernstdh  \x08\x08");     // 0x08 is ASCII backspace
}

• A Sample Windows/Intelx86 Execution:
• Address of eid: 0x0804a030
• Address of grade: 0x0804a024
Attacks - Data Integrity (cont.)
• Notes:
• Though eid only requires 9 bytes, space actually exists for 12 (for alignment reasons)
• Intelx86 is little-endian
• Memory (Before and After):
Attacks - Program Termination/Availability
An Example
cexamples/bufferoverflow/smash.c

Attacks - Program Termination/Availability (cont.)
A Partial Result of Executing the Program (MS Visual C/Intelx86)
foo: 00401000
bar: 00401045
Stack (Before):
00000000
00000000
7FFDF000
0012FF80
0040108A
00410EDE

Interpreting the Stack

If you compile using gcc -S smash.c a file containing the assembly language code (named smash.s) will be generated.

If you compile using gcc -o smash smash.c and then run gdb smash you can use the command disassemble main to see the assembly code that will be executed (i.e., with resolved addresses).

Attacks - Program Termination/Availability (cont.)
Passing "Hello"
  Output              Interpretation

foo: 0x00401000
bar: 0x00401045
Stack (Before):
0x00000000
0x00000000
0x7ffdf000
0x0012ff80
0x0040108a            The return address for foo()
0x00410ede

Stack (After):
0x6c6c6548                lleH
0x0000006f                   o
0x7ffdf000
0x0012ff80
0x0040108a            The return address for foo()
0x00410ede
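The reversed text in the Interpretation column follows from the little-endian byte order: the first character in memory is the least significant byte of the 32-bit word. The sketch below reproduces the two words shown for "Hello"; word_from_bytes is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Interprets up to four bytes as a little-endian 32-bit word.
   Bytes beyond count are treated as zero. */
uint32_t word_from_bytes(const unsigned char *bytes, size_t count)
{
    uint32_t word = 0;
    for (size_t i = 0; i < count && i < 4; i++)
    {
        /* Byte i in memory becomes bits 8*i through 8*i+7 of the word */
        word |= (uint32_t)bytes[i] << (8 * i);
    }
    return word;
}
```

The bytes 'H' (0x48), 'e' (0x65), 'l' (0x6c), 'l' (0x6c) give 0x6c6c6548, and 'o' (0x6f) followed by the null character gives 0x0000006f.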

Attacks - Program Termination/Availability (cont.)
Passing "AAAAAAAAAAAAAAAAAAAAAAA"
  Output              Interpretation

foo: 0x00401000
bar: 0x00401045
Stack (Before):
0x00000000
0x00000000
0x7ffdf000
0x0012ff80
0x0040108a            The return address for foo()
0x00410ede

Stack (After):
0x41414141                AAAA
0x41414141                AAAA
0x41414141                AAAA
0x41414141                AAAA
0x41414141                AAAA (And the return address for foo()!)
0x41414141                AAAA

Attacks - Program Termination/Availability (cont.)
• The Result:
• The program terminates abnormally
• Possible Reasons for an Abnormal Termination:
• The new address is invalid
• Memory at that address does not contain a valid CPU instruction
• The CPU registers are not set up for the proper execution of the instruction
• Memory at that address is not executable
Attacks - Execution Integrity
• An Observation:
• In the attack above, "random" characters were used to overflow the buffer
• A Question:
• Can the buffer overflow be used to change the return to elsewhere in the program?
• Yes, if I can pass untypable characters (which I can using another program).
Attacks - Execution Integrity (cont.)
Passing "StaticOverrun ABCDEFGHIJKLMNOP\x45\x10\x40"
  Output              Interpretation

foo: 0x00401000
bar: 0x00401045
Stack (Before):
0x77fb80db
0x77f94e68
0x7ffdf000
0x0012ff80
0x0040108a            The return address for foo()
0x00410eca

Stack (After):
0x44434241                DCBA
0x48474645                HGFE
0x4c4b4a49                LKJI
0x504f4e4d                PONM
0x00401045            The return address is now the address of bar()
0x00410eca

Attacks - Arc Injection
• Question:
• Can control flow changes like those above be used to cause other kinds of harm?
• Yes, any code that exists in process memory can be executed (e.g., system() or exec()).
Attacks - Code Injection
• A Question:
• Is it possible to execute arbitrary code (i.e., code that isn't in process memory)?
• Yes, by using the buffer overflow to put instructions (often called shellcode for historical reasons) on the stack.
• A Question:
• Isn't code injection difficult?
• It requires time, effort, and a knowledge of assembly language.
• Another Question:
• Is there another way?
• Unfortunately, yes, using gadgets (or return-oriented programming).
• A gadget is a useful sequence of instructions.
• Are They Powerful?
• A Turing-complete set of gadgets has been written for the Intelx86 architecture (i.e., any arbitrary program can be written using gadgets).
• How Are They Used?
• A compiler can be used to create the gadgets from a higher-level language.
Mitigation - Array Handling
• Use a Single, Consistent Design:
• Caller allocates, caller frees (C99, C11 Annex K)
• Callee allocates, caller frees
• Callee allocates, callee frees (C++ std::basic_string)
• Copying:
• Avoid memcpy() and use memcpy_s() instead (C11 Annex K, which is optional and is not provided by every implementation): it specifies the size of the destination and reports violations with the return value; reports a NULL destination; reports the copying of overlapping segments
• Moving:
• Avoid memmove() and use memmove_s() instead (C11 Annex K, which is optional and is not provided by every implementation): it specifies the size of the destination and reports violations with the return value; reports NULL destinations
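Where the Annex K functions are not available, the same idea can be approximated with a small wrapper that takes the size of the destination. This is only a sketch in the spirit of memcpy_s(), not a reimplementation of it; copy_bytes is a hypothetical name, and it uses memmove() so that overlapping segments are handled rather than reported.

```c
#include <stddef.h>
#include <string.h>

/* Copies count bytes from src to dst if they fit in dst_size bytes.
   Returns 0 on success and -1 (copying nothing) if a pointer is NULL
   or count exceeds dst_size. */
int copy_bytes(void *dst, size_t dst_size, const void *src, size_t count)
{
    if (dst == NULL || src == NULL || count > dst_size)
    {
        return -1;
    }
    memmove(dst, src, count);   /* Safe even if the segments overlap */
    return 0;
}
```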
Mitigation - String Handling
• Use a Single, Consistent Design:
• Caller allocates, caller frees (C99, C11 Annex K)
• Callee allocates, caller frees
• Callee allocates, callee frees (C++ std::basic_string)
• Use the Right Functions (that Include the Maximum Size):
• Never use gets(), use fgets() and/or getchar() (which are still vulnerable, but better) or gets_s() (C11)
• Avoid strcpy() and strncpy() (which have different vulnerabilities) and use strcpy_s() instead (C11)
• Avoid strcat() and strncat() (which have different vulnerabilities) and use strcat_s() instead (C11)
Runtime Mitigation - Validation/Sanitization
• An Observation:
• It is awkward to validate data in every function (and, hence, is unlikely to be done in practice)
• A Reasonable Approach:
• Any data that crosses a trust boundary should be validated
• Example Sources:
• Command-line arguments
• Environment variables
• Sockets
• Files
• Pipes
• Signals
• Shared Memory
• Devices
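As an illustration of validating data at a trust boundary, the sketch below checks a command-line argument that is supposed to be an integer in a given range before it is used (e.g., as a length or an index). The helper parse_int_in_range and its return convention are assumptions made for this example.

```c
#include <errno.h>
#include <stdlib.h>

/* Parses text as a decimal integer in [min, max].
   Returns 0 and stores the value in *result on success; returns -1
   (leaving *result untouched) if text is empty, contains trailing
   characters, overflows a long, or is out of range. */
int parse_int_in_range(const char *text, long min, long max, long *result)
{
    char *end;
    long  value;

    if (text == NULL || *text == '\0')
    {
        return -1;
    }

    errno = 0;
    value = strtol(text, &end, 10);
    if (errno == ERANGE || *end != '\0')
    {
        return -1;
    }
    if (value < min || value > max)
    {
        return -1;
    }

    *result = value;
    return 0;
}
```

A program might call parse_int_in_range(argv[1], 0, 80, &length) once, where the argument enters the program, so that the functions it later calls can rely on the value.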
Runtime Mitigation - Size Checking
• An Observation:
• The size of some entities can be determined reliably
• Obtaining this Information:
• size_t __builtin_object_size(void* ptr, int type)
• Error Codes:
• Returns (size_t)-1 (if type is in {0,1}) or 0 (if type is in {2,3}) if the size can't be determined
Runtime Mitigation - Size Checking (cont.)
• Some Details:
• The pointer needn't point to the start of the entity, it can point to a member
• The value of type determines what is considered the "end of" the entity
• An Example:
• struct Account {char name[10]; int id; char branch[20];} a;
void* p = &a.id;

• Behavior:
• If type is 0 or 2 then __builtin_object_size(p, type) returns sizeof(int) plus 20 (i.e., the size to the end of a, the whole object)
• If type is 1 or 3 then __builtin_object_size(p, type) returns sizeof(int) (i.e., the size to the end of a.id, the closest enclosing subobject)
• When the pointer could refer to more than one object, types 0 and 1 report the maximum remaining size and types 2 and 3 report the minimum
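The behavior described above can be checked with a short sketch. It assumes a compiler that provides the GCC builtin (GCC or Clang); the function names are hypothetical. The address is passed to the builtin directly because, without optimization, a size that has to be tracked through a pointer variable may be reported as unknown.

```c
#include <stddef.h>

struct Account {char name[10]; int id; char branch[20];};

static struct Account a;

/* type 0: bytes from a.id to the end of the whole object a */
size_t remaining_in_object(void)
{
    return __builtin_object_size(&a.id, 0);
}

/* type 1: bytes from a.id to the end of the subobject a.id */
size_t remaining_in_member(void)
{
    return __builtin_object_size(&a.id, 1);
}
```

When the size can be determined, the first function is expected to return sizeof(struct Account) - offsetof(struct Account, id) and the second sizeof(int); otherwise both return (size_t)-1.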
Runtime Mitigation - Size Checking (cont.)
• Using Size Checking "By Hand":
• Many functions that operate on arrays can make use of size checking to provide some protection
• Using Size Checking More Widely:
• In GCC, many string functions will use size checking if the symbol _FORTIFY_SOURCE is defined (e.g., using -D_FORTIFY_SOURCE=2) and optimization is enabled
Runtime Mitigation - Stack Canaries
• The Goal:
• Protect the return address on the stack from being written to
• The Approach:
• Write a value that is difficult to insert/spoof to an address "before" that of the memory being protected
• The Analogy:
• Like a canary in a coal mine, the special value would be "killed" by any attempt to write to the memory being protected
Runtime Mitigation - Stack Canaries (cont.)
• Values that are Difficult to Insert:
• CR, LF, NUL (the null character)
• Values that are Difficult to Spoof:
• 32-bit random number
• 32-bit random number XORed with the return address
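The mechanism can be simulated in portable C. In the sketch below the "frame" is a single byte array that holds a buffer followed by a canary, so that writing past the buffer is still well-defined; a real implementation is generated by the compiler and would use a random value. All of the names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define BUFFER_SIZE 8

/* A simulated frame: a buffer immediately followed by a canary, both
   stored in one array, plus a reference copy of the canary. */
struct frame
{
    unsigned char bytes[BUFFER_SIZE + sizeof(uint32_t)];
    uint32_t      expected;
};

/* Clears the buffer and places the canary just after it */
void frame_init(struct frame *f, uint32_t canary)
{
    memset(f->bytes, 0, BUFFER_SIZE);
    memcpy(f->bytes + BUFFER_SIZE, &canary, sizeof canary);
    f->expected = canary;
}

/* Deliberately not checked against BUFFER_SIZE (the simulated
   vulnerability); count is only limited to the simulated frame. */
void frame_write(struct frame *f, const void *data, size_t count)
{
    if (count > sizeof f->bytes)
    {
        count = sizeof f->bytes;
    }
    memcpy(f->bytes, data, count);
}

/* Returns 1 if the canary still has its original value and 0 otherwise */
int frame_canary_intact(const struct frame *f)
{
    return memcmp(f->bytes + BUFFER_SIZE, &f->expected, sizeof f->expected) == 0;
}
```

A write of 6 bytes leaves the canary intact, while a write of 12 bytes overwrites it; a protected function would check the canary just before returning and terminate the program if it had changed.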
Runtime Mitigation - Stack Canaries (cont.)
• Stack Canaries in GCC (SSP/ProPolice):
• Use the -fstack-protector-all flag
• Note: This tool also changes the organization of the stack (i.e., places the canary "after" arrays)
• Stack Canaries in Visual C:
• Use the /GS flag
Environment-Based Mitigation - Address Space Layout Randomization
• Purpose:
• Prevent the execution of arbitrary code
• How It Works:
• Randomizing the addresses of stack pages (e.g., using complete randomization or a randomly-sized gap) makes it more difficult for attackers to predict the addresses (e.g., of shellcode, system functions, and/or gadgets) that they want to return to
• Examples:
• Linux (Debian since 2007, Ubuntu since 2008): See sysctl -w kernel.randomize_va_space
• MS Windows (Since Vista): See the /DYNAMICBASE linker option
Environment-Based Mitigation - Nonexecutable Stack
• Purpose:
• Prevent the execution of malicious code from the stack
• Limitations:
• Doesn't prevent the execution of malicious code from the heap
• Doesn't prevent the execution of malicious code from data segments
Environment-Based Mitigation - W^X
• Purpose:
• Prevent the execution of malicious code
• How it Works:
• Parts of the process memory space are marked as writable (W) or executable (X), but not both (i.e., W xor X)
• Hardware Support:
• Include the ability to mark memory pages as data only, disabling the execution of code in those pages
Environment-Based Mitigation - W^X (cont.)
• Hardware Support (cont.):
• AMD - No eXecute (NX) bit
• Intel - eXecute Disable (XD) bit
• ARM - eXecute Never (XN) bit
• Operating System Support:
• MS Windows - Data Execution Prevention (DEP) using /NXCOMPAT
• Linux - PaX
Environment-Based Mitigation - Other Things
• Oracle - Silicon Secured Memory (SSM):
• 64-bit pointers use some of the bits for a "color" and the chip checks to ensure that it points to memory of the appropriate "color"
• Sometimes called Application Data Integrity (ADI)
Experimenting with Buffer Overflows
• An Observation:
• There are now many compile-time and run-time mitigation strategies in place to help protect against buffer overflow vulnerabilities
• One Implication:
• You may need to temporarily disable some of these mitigation strategies while experimenting
• Examples:
• gcc: -fno-stack-protector
• gcc: -fno-defer-pop
• shell (as root): echo 0 > /proc/sys/kernel/randomize_va_space (it normally contains the value 2)
• shell: execstack -s
• Visual C: Don't use /GS