Buffer Overflows in C
Vulnerabilities, Attacks, and Mitigations
Prof. David Bernstein
James Madison University
Computer Science Department
bernstdh@jmu.edu
Getting Started
- Definition:
- A buffer overflow (or overrun) is a situation
in which a program uses locations adjacent to a
buffer (i.e., beyond
one or both of the boundaries of a buffer).
- An Issue:
- People frequently limit the definition of a buffer
overflow to situations in which data is written
to locations adjacent to the buffer
- We will include both reading and writing
since reading beyond the
boundary can lead to harm (e.g., confidentiality and
availability)
- Common (in Memory) Buffer Overflows in C:
- When explicitly using an array
- When using a string (i.e., implicitly using an array)
Vulnerabilities when Using Arrays - Length Faults
- A Common Idiom:
- Pass an array and its length as different parameters
(e.g., main(int argc, char* argv[]))
- The Problem:
- A length that is too large might be passed in (or the length might
be increased locally)
Vulnerabilities when Using Arrays - Length Faults (cont.)
- Another Common Idiom:
- Calculate the length of an array parameter inside the function
using sizeof(array)
- The Problem:
- array is a parameter and, so, has a pointer type
- Hence, sizeof(array) is the size of an int *, not the size of the array
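The pointer-versus-array distinction can be sketched as follows. The function and macro names are hypothetical, and the parameter is declared as int * because that is what int array[] means in a parameter list.

```c
#include <stddef.h>

/* Hypothetical callee: the parameter is a pointer, so sizeof reports
   the size of the pointer, not the size of the caller's array. */
size_t bytes_seen_by_callee(int *array)
{
    return sizeof(array);
}

/* The length has to be computed where the array type is still visible. */
#define ARRAY_LENGTH(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical caller: here data still has the type int[10]. */
size_t length_seen_by_caller(void)
{
    int data[10] = {0};
    return ARRAY_LENGTH(data);
}
```

One design consequence is that the caller should compute the length and pass it along with the array.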
Vulnerabilities when Using Arrays - Sentinel Faults
- A Common Idiom:
- Use a special value to indicate the last element of the
array (e.g., -1 in an array of non-negative integers)
- The Problem:
- The sentinel can be omitted or misspecified
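One way to limit the damage of a missing sentinel is to bound the search as well. This is a minimal sketch; the function name, the SENTINEL macro, and the capacity parameter are assumptions made for illustration.

```c
#include <stddef.h>

#define SENTINEL (-1)   /* assumed sentinel for non-negative data */

/* Counts the elements that precede the sentinel, but never reads more
   than capacity elements. Returns capacity if no sentinel is found. */
size_t count_until_sentinel(const int *values, size_t capacity)
{
    size_t i = 0;
    while (i < capacity && values[i] != SENTINEL) {
        i++;
    }
    return i;
}
```

A return value equal to capacity tells the caller that the sentinel was omitted or misspecified.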
Vulnerabilities when Using Strings
- Recall:
- C has character types char and wchar_t
- A string in C is a contiguous sequence of characters
terminated by (and including) a sentinel character
(the null character '\0')
- Data Structure:
- Strings are stored in arrays
- The length of the string is (i.e., should be) at least one less
than the length of the array
Vulnerabilities when Using Strings - gets()
- A Common Practice:
- Use gets() to read characters (until it reaches a
newline or the end of the stream)
- An Example:
- The Problem:
- An attacker can enter an indeterminate number of characters
making it impossible to ensure that the buffer is large enough
Vulnerabilities when Using Strings - strcpy()
- A Common Practice:
- Copy a source string into a target/destination string
using strcpy()
- An Example:
- The Problem:
- The attacker can supply a string that is too long (i.e.,
a file name that is longer than 64 characters in this
example)
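A length check before the copy avoids this problem. The sketch below is not the example from the slide; the function name and the NAME_SIZE value are assumptions, and NAME_SIZE counts the null character.

```c
#include <stddef.h>
#include <string.h>

#define NAME_SIZE 64   /* assumed array size, including the null character */

/* Copies src into dst only if it fits (terminator included).
   Returns 0 on success and -1 if src is too long. */
int copy_file_name(char dst[NAME_SIZE], const char *src)
{
    size_t needed = strlen(src) + 1;   /* characters plus '\0' */
    if (needed > NAME_SIZE) {
        return -1;                     /* reject instead of overflowing */
    }
    memcpy(dst, src, needed);
    return 0;
}
```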
Vulnerabilities when Using Strings - strcat()
- A Common Practice:
- Use strcat() to append a source string to
a target string
- The Problem:
- The target might not be long enough to hold the result
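The same kind of check works for appending. This is a sketch under the assumption that the caller knows the size of the whole target array; append_checked is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Appends src to the string in dst, where dst_size is the size of the
   whole array. Returns 0 on success and -1 if dst is not terminated
   within dst_size or the result would not fit. */
int append_checked(char *dst, size_t dst_size, const char *src)
{
    const char *end = memchr(dst, '\0', dst_size);
    if (end == NULL) {
        return -1;                          /* dst is not a valid string */
    }
    size_t used = (size_t)(end - dst);
    size_t extra = strlen(src);
    if (extra >= dst_size - used) {         /* need used + extra + 1 <= dst_size */
        return -1;
    }
    memcpy(dst + used, src, extra + 1);     /* copies the terminator too */
    return 0;
}
```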
Vulnerabilities when Using Strings - sprintf()
- A Common Practice:
- Use sprintf() to create a (formatted) string
- An Example:
- The Problem:
- sprintf() assumes that the buffer it is passed
is large enough and the attacker might provide a string that
is too long (i.e., longer than 80 - 2 - 2 - 1 = 75
characters in this example)
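The arithmetic above can be checked with snprintf(), which takes the buffer size and returns the length it needed. The format string below is an assumption (two fixed characters on each side of the string), chosen only to match the 80 - 2 - 2 - 1 calculation; format_greeting is a hypothetical name.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats name into out. Returns 0 on success and -1 if the result
   was truncated or an encoding error occurred. */
int format_greeting(char *out, size_t out_size, const char *name)
{
    int written = snprintf(out, out_size, "<<%s>>", name);
    if (written < 0 || (size_t)written >= out_size) {
        return -1;   /* the result needed more than out_size - 1 characters */
    }
    return 0;
}
```

With an 80-byte buffer, a 75-character name is accepted and a 76-character name is rejected.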
Vulnerabilities when Using Strings - Null Termination
- Some Common Practices:
- Not allocating memory for the null character (i.e., an off-by-one
error when calculating the size)
- Forgetting to include the null character (though there is
space for it)
- Using strncpy() to copy the first n characters
from the source to the target when the source contains
n or more characters (which results in a non-null
terminated string)
- Some Problems That Arise:
- strlen() uses the null character to determine the
length of the string so any function that uses
strlen() will have problems
- Other functions (e.g., strcpy()) iterate until
a null character is encountered so will have problems
Vulnerabilities when Using Strings - Null Termination (cont.)
An Example
char a[10], b[10];
strncpy(a, "0123456789", 10); // a will not be null-terminated
strcpy(b, a); // b will probably overflow
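A common repair is to copy one character fewer than the array holds and to write the terminator explicitly. This is a minimal sketch; copy_terminated is a hypothetical helper, and it assumes dst_size is greater than 0.

```c
#include <stddef.h>
#include <string.h>

/* Copies at most dst_size - 1 characters and always terminates dst.
   Returns 1 if src was truncated and 0 otherwise. */
int copy_terminated(char *dst, size_t dst_size, const char *src)
{
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';          /* strncpy() may not have done this */
    return strlen(src) >= dst_size;
}
```

Applied to the example, a receives the nine characters "012345678" plus the null character, so the later copy into b stays in bounds.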
Threats - Memory Corruption
- The Issue:
- By overflowing a buffer the attacker can corrupt
memory that is being used to store information of
various kinds
- A Common Attack Vector:
Threats - Arbitrary Memory Writes
- The Issue:
- By overflowing a buffer the attacker can corrupt
the value of a pointer and, if the pointer is used in an
assignment statement, corrupt the memory at the arbitrary
address it now points to
- Sketch of an Example:
Threats - Corrupted Function Pointers
- The Issue:
- By overflowing a buffer the attacker can corrupt
the value of a function pointer and, if the function pointer
is used subsequently, transfer control to arbitrary code
- Sketch of an Example:
-
...
static int value = ...;
static char buffer[BUFFER_SIZE];
static void (*f)(int i);
f = &some_function;
strncpy(buffer, argv[1], length); // Potential overflow into f
(void)(*f)(value); // Execute the code pointed to by f
...
Threats - Stack Smashing
- The Issue:
- By overflowing a buffer the attacker can overwrite data in the
memory allocated to the execution stack
- Ramifications:
- The values of automatic variables can be modified
- Program execution can be terminated
- Arbitrary code can be executed
Attacks - Data Integrity
- A Memory Corruption Example:
-
cexamples/bufferoverflow/windows/string_overflow_data.c
- A Sample Windows/Intelx86 Execution:
- Address of eid: 0x0804a030
- Address of grade: 0x0804a024
Attacks - Data Integrity (cont.)
- Notes:
- Though eid only requires 9 bytes, space
actually exists for 12 (for alignment reasons)
- Intel x86 is little-endian
- Memory (Before and After):
Attacks - Program Termination/Availability
An Example
cexamples/bufferoverflow/smash.c
Attacks - Program Termination/Availability (cont.)
A Partial Result of Executing the Program (MS Visual C/Intelx86)
foo: 00401000
bar: 00401045
Stack (Before):
00000000
00000000
7FFDF000
0012FF80
0040108A
00410EDE
Interpreting the Stack
If you compile using gcc -S smash.c a file containing
the assembly language code (named smash.s) will be generated.
If you compile using gcc -o smash smash.c and then
run gdb smash you can use the command disassemble main
to see the assembly code that will be
executed (i.e., with resolved addresses).
Attacks - Program Termination/Availability (cont.)
Passing "Hello"
Output Interpretation
foo: 0x00401000
bar: 0x00401045
Stack (Before):
0x00000000
0x00000000
0x7ffdf000
0x0012ff80
0x0040108a The return address for foo()
0x00410ede
Stack (After):
0x6c6c6548 lleH
0x0000006f o
0x7ffdf000
0x0012ff80
0x0040108a The return address for foo()
0x00410ede
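The reversed text in the interpretation column follows from little-endian byte order. The sketch below assumes ASCII and uses a hypothetical helper to show how four consecutive bytes in memory read back as one 32-bit word.

```c
#include <stdint.h>

/* Packs four bytes into a 32-bit word the way a little-endian CPU reads
   them: the first byte in memory is the least significant. */
uint32_t word_from_bytes_le(const unsigned char bytes[4])
{
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}
```

The bytes 'H', 'e', 'l', 'l' give 0x6c6c6548, and 'o' followed by zero bytes gives 0x0000006f, which matches the first two words of the stack after the copy.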
Attacks - Program Termination/Availability (cont.)
Passing "AAAAAAAAAAAAAAAAAAAAAAA"
Output Interpretation
foo: 0x00401000
bar: 0x00401045
Stack (Before):
0x00000000
0x00000000
0x7ffdf000
0x0012ff80
0x0040108a The return address for foo()
0x00410ede
Stack (After):
0x41414141 AAAA
0x41414141 AAAA
0x41414141 AAAA
0x41414141 AAAA
0x41414141 AAAA (And the return address for foo()!)
0x41414141 AAAA
Attacks - Program Termination/Availability (cont.)
- The Result:
- The program terminates abnormally
- Possible Reasons for an Abnormal Termination:
- The new address is invalid
- Memory at that address does not contain a valid CPU instruction
- The CPU registers are not set up for the proper
execution of the instruction
- Memory at that address is not executable
Attacks - Execution Integrity
- An Observation:
- In the attack above, "random" characters were used to
overflow the buffer
- A Question:
- Can the buffer overflow be used to change the
return to elsewhere in the program?
- The Answer:
- Yes, if I can pass untypable characters (which I can using
another program).
Attacks - Execution Integrity (cont.)
Passing "StaticOverrun ABCDEFGHIJKLMNOP\x45\x10\x40"
Output Interpretation
foo: 0x00401000
bar: 0x00401045
Stack (Before):
0x77fb80db
0x77f94e68
0x7ffdf000
0x0012ff80
0x0040108a The return address for foo()
0x00410eca
Stack (After):
0x44434241 DCBA
0x48474645 HGFE
0x4c4b4a49 LKJI
0x504f4e4d PONM
0x00401045 The address of bar() (and the return address for foo()!)
0x00410eca
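The trailing bytes of the argument are the address of bar() in little-endian order. The sketch below uses a hypothetical helper to show the encoding. The fourth byte (0x00) does not appear in the argument, presumably because the string's null character supplies it.

```c
#include <stdint.h>

/* Splits a 32-bit address into the byte sequence that a little-endian
   machine stores in memory (least significant byte first). */
void address_to_bytes_le(uint32_t address, unsigned char out[4])
{
    out[0] = (unsigned char)(address & 0xffu);
    out[1] = (unsigned char)((address >> 8) & 0xffu);
    out[2] = (unsigned char)((address >> 16) & 0xffu);
    out[3] = (unsigned char)((address >> 24) & 0xffu);
}
```

For 0x00401045 the result is 0x45, 0x10, 0x40, 0x00.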
Attacks - Arc Injection
- Question:
- Can control flow changes like those above be used to cause
other kinds of harm?
- The Answer:
- Yes, any code that exists in process memory can be executed
(e.g., system() or one of the exec() functions).
Attacks - Code Injection
- A Question:
- Is it possible to execute arbitrary code (i.e., code that
isn't in process memory)?
- The Answer:
- Yes, by using the buffer overflow to put
instructions (often called shellcode for
historical reasons) on the stack.
Attacks - Gadget Injection
- A Question:
- Isn't code injection difficult?
- An Answer:
- It requires time, effort, and a knowledge of assembly
language.
- Another Question:
- The Answer:
- Unfortunately, yes, using gadgets (or return-oriented
programming).
Attacks - Gadget Injection (cont.)
- What Are Gadgets?
- A gadget is a useful sequence of instructions.
- Are They Powerful?
- A Turing-complete set of gadgets has been written for
the Intel x86 architecture (i.e., any arbitrary program can be
written using gadgets).
- How Are They Used?
- A compiler can be used to create the gadgets from a
higher-level language.
Mitigation - Array Handling
- Use a Single, Consistent Design:
- Caller allocates, caller frees
(C99, C11 Annex K)
- Callee allocates, caller frees
- Callee allocates, callee frees
(C++ std::basic_string)
- Copying:
- Avoid memcpy() and use memcpy_s() instead (C11 Annex K, which is
optional) -- specifies the size and
reports violations with the return value; reports NULL
destination; reports the copying of overlapping segments
- Moving:
- Avoid memmove() and use memmove_s() instead (C11 Annex K, which is
optional) -- specifies the size and
reports violations with the return value; reports NULL
destinations
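Annex K is optional, and some C libraries (glibc, for example) do not provide it. The sketch below is a hand-written stand-in that performs the same kinds of checks. The name checked_copy is hypothetical, this is not the Annex K function, and it does not replicate every behavior of memcpy_s().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies count bytes from src to dst. Returns 0 on success and -1 if a
   pointer is NULL, count exceeds dst_size, or the two regions overlap. */
int checked_copy(void *dst, size_t dst_size, const void *src, size_t count)
{
    if (dst == NULL || src == NULL) {
        return -1;
    }
    if (count > dst_size) {
        return -1;
    }
    /* Compare the addresses as integers to detect overlapping regions */
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;
    if ((d <= s && s - d < count) || (s < d && d - s < count)) {
        return -1;
    }
    memcpy(dst, src, count);
    return 0;
}
```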
Mitigation - String Handling
- Use a Single, Consistent Design:
- Caller allocates, caller frees
(C99, C11 Annex K)
- Callee allocates, caller frees
- Callee allocates, callee frees
(C++ std::basic_string)
- Use the Right Functions (that Include the Maximum Size):
- Never use gets(), use fgets() and/or getchar()
(which are still vulnerable, but better) or gets_s() (C11)
- Avoid strcpy() and strncpy()
(which have different vulnerabilities) and use
strcpy_s() instead (C11)
- Avoid strcat() and strncat()
(which have different vulnerabilities) and use
strcat_s() instead (C11)
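A bounded replacement for gets() can be built on fgets(). This is a minimal sketch; read_line is a hypothetical helper, and it assumes size is at least 2 and fits in an int.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line into buffer (at most size - 1 characters) and strips
   the newline. Returns 0 on success, 1 if the line was too long (the
   rest of the line is discarded), and -1 on end-of-file or error. */
int read_line(FILE *stream, char *buffer, size_t size)
{
    if (fgets(buffer, (int)size, stream) == NULL) {
        return -1;
    }
    size_t length = strlen(buffer);
    if (length > 0 && buffer[length - 1] == '\n') {
        buffer[length - 1] = '\0';
        return 0;
    }
    /* No newline was stored: either the stream ended or the line did
       not fit, so consume whatever is left of the line */
    int c;
    int truncated = 0;
    while ((c = fgetc(stream)) != EOF && c != '\n') {
        truncated = 1;
    }
    return truncated;
}
```

Discarding the remainder of a long line keeps the next call from treating it as a new line.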
Runtime Mitigation - Validation/Sanitization
- An Observation:
- It is awkward to validate data in every function (and,
hence, is unlikely to be done in practice)
- A Reasonable Approach:
- Any data that crosses a trust boundary
should be validated
- Example Sources:
- Command-line arguments
- Environment variables
- Sockets
- Files
- Pipes
- Signals
- Shared Memory
- Devices
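Validation at a trust boundary might look like the following sketch for a numeric command-line argument. The function name and the accepted range are assumptions made for illustration.

```c
#include <errno.h>
#include <stdlib.h>

/* Parses text as a base-10 integer in [min, max]. Returns 0 and stores
   the value in *out on success. Returns -1 (leaving *out unchanged) if
   the text is empty, has trailing characters, or is out of range. */
int parse_bounded_int(const char *text, long min, long max, long *out)
{
    char *end = NULL;
    if (text == NULL || *text == '\0') {
        return -1;
    }
    errno = 0;
    long value = strtol(text, &end, 10);
    if (errno == ERANGE || *end != '\0') {
        return -1;
    }
    if (value < min || value > max) {
        return -1;
    }
    *out = value;
    return 0;
}
```

A caller would apply this to argv[1] once, at the boundary, and pass only the validated value to the rest of the program.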
Runtime Mitigation - Size Checking
- An Observation:
- The size of some entities can be determined reliably
- Obtaining this Information:
- size_t __builtin_object_size(const void* ptr, int type)
(a GCC built-in)
- Error Codes:
- Returns (size_t)-1 (if type is in {0,1}) or 0 (if type is in {2,3})
if the size can't be determined
Runtime Mitigation - Size Checking (cont.)
- Some Details:
- The pointer needn't point to the start of the entity,
it can point to a member
- The value of type determines what is considered
the "end of" the entity
- An Example:
- Behavior:
- If type is 0 or 2 then __builtin_object_size(p, type)
returns sizeof(int) plus 20 (i.e., the size to the end
of a, the whole object)
- If type is 1 or 3 then __builtin_object_size(p, type)
returns sizeof(int) (i.e., the size to the end
of a.id, the closest surrounding subobject)
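The behavior described above can be tried with a sketch like the following. The struct layout is an assumption (an int member followed by a 20-byte array, chosen to match the "plus 20"), the function names are hypothetical, and the built-in requires GCC or a compatible compiler.

```c
#include <stddef.h>

/* Hypothetical record: an int followed by 20 bytes */
struct record {
    int  id;
    char name[20];
};

static struct record a;

/* Bytes from a.id to the end of the whole object a (type 0) */
size_t size_to_end_of_object(void)
{
    return __builtin_object_size(&a.id, 0);
}

/* Bytes from a.id to the end of the member a.id (type 1) */
size_t size_to_end_of_member(void)
{
    return __builtin_object_size(&a.id, 1);
}
```

If the compiler cannot determine the size, both functions return (size_t)-1 instead.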
Runtime Mitigation - Size Checking (cont.)
- Using Size Checking "By Hand":
- Many functions that operate on arrays can make use of size
checking to provide some protection
- Using Size Checking More Widely:
- In GCC, many string functions will use size checking if the
symbol _FORTIFY_SOURCE is defined (and optimization is enabled)
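Size checking "by hand" might be sketched as follows. The names are hypothetical, and refusing to copy when the size is unknown is a conservative choice made for this sketch; it is not how _FORTIFY_SOURCE itself behaves. The built-in is wrapped in a macro so that it is evaluated where the compiler can still see the destination object.

```c
#include <stddef.h>
#include <string.h>

/* Returns 0 on success, -1 if src does not fit in dst_size bytes, and
   -2 if the size of the destination could not be determined. */
int copy_with_known_size(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == (size_t)-1) {
        return -2;                     /* size unknown: refuse to copy */
    }
    if (strlen(src) >= dst_size) {
        return -1;                     /* no room for src plus '\0' */
    }
    strcpy(dst, src);
    return 0;
}

/* Evaluates the built-in at the call site (GCC or compatible) */
#define CHECKED_STRCPY(dst, src) \
    copy_with_known_size((dst), __builtin_object_size((dst), 0), (src))
```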
Runtime Mitigation - Stack Canaries
- The Goal:
- Protect the return address on the stack from being written to
- The Approach:
- Write a value that is difficult to insert/spoof to an
address "before" that of the memory being protected
- The Analogy:
- Like a canary in a coal mine, the special value would
be "killed" by any attempt to write to the memory
being protected
Runtime Mitigation - Stack Canaries (cont.)
- Values that are Difficult to Insert:
- Values that are Difficult to Spoof:
- 32-bit random number
- 32-bit random number XORed with the return address
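The idea can be simulated without touching a real stack. In the sketch below a byte array stands in for a frame (buffer, then canary, then "return address"). The layout, the names, and the sizes are assumptions made for illustration, and real compilers insert and check the canary automatically.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    BUFFER_BYTES = 16,
    CANARY_BYTES = 4,
    RETURN_BYTES = 4,
    FRAME_BYTES  = BUFFER_BYTES + CANARY_BYTES + RETURN_BYTES
};

/* Lays out the simulated frame: buffer, canary, "return address" */
void frame_init(unsigned char frame[FRAME_BYTES],
                uint32_t canary, uint32_t return_address)
{
    memset(frame, 0, BUFFER_BYTES);
    memcpy(frame + BUFFER_BYTES, &canary, CANARY_BYTES);
    memcpy(frame + BUFFER_BYTES + CANARY_BYTES, &return_address, RETURN_BYTES);
}

/* Simulates an unchecked copy into the buffer (clamped to the frame so
   that the simulation itself stays well defined) */
void frame_write(unsigned char frame[FRAME_BYTES],
                 const void *data, size_t length)
{
    if (length > FRAME_BYTES) {
        length = FRAME_BYTES;
    }
    memcpy(frame, data, length);
}

/* The check performed before "returning": is the canary still alive? */
int canary_intact(const unsigned char frame[FRAME_BYTES], uint32_t canary)
{
    return memcmp(frame + BUFFER_BYTES, &canary, CANARY_BYTES) == 0;
}
```

A write that stays inside the buffer leaves the canary intact. A write long enough to reach the "return address" must pass over the canary first, so the check fails.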
Runtime Mitigation - Stack Canaries (cont.)
- Stack Canaries in GCC (SSP/ProPolice):
- Use the -fstack-protector-all flag
- Note: This tool also changes the organization of the stack
(i.e., places the canary "after" arrays)
- Stack Canaries in Visual C:
- Use the /GS flag
Environment-Based Mitigation - Address Space Layout Randomization
- Purpose:
- Prevent the execution of arbitrary code
- How It Works:
- Randomizing the addresses of stack pages (e.g., using complete
randomization or a randomly-sized gap) makes it more
difficult for attackers to predict the addresses (e.g., of
shellcode, system functions, and/or gadgets)
that they want to return to
- Examples:
- Linux (Debian since 2007, Ubuntu since 2008): See
sysctl -w kernel.randomize_va_space
- MS Windows (Since Vista): See the /DYNAMICBASE linker option
Environment-Based Mitigation - Nonexecutable Stack
- Purpose:
- Prevent the execution of malicious code from the stack
- Limitations:
- Doesn't prevent the execution of malicious code from
the heap
- Doesn't prevent the execution of malicious code from
data segments
Environment-Based Mitigation - W^X
- Purpose:
- Prevent the execution of malicious code
- How it Works:
- Parts of the process memory space are marked as writable (W)
or executable (X), but not both (i.e., W xor X)
- Hardware Support:
- Include the ability to mark memory pages as data only,
disabling the execution of code in those pages
Environment-Based Mitigation - W^X (cont.)
- Hardware Support (cont.):
- AMD - NoeXecute (NX) bit
- Intel - eXecute Disable (XD) bit
- ARM - eXecute Never (XN) bit
- Operating System Support:
- MS Windows - Data Execution Prevention (DEP) using /NXCOMPAT
- Linux - PaX
Environment-Based Mitigation - Other Things
- Oracle - Silicon Secured Memory (SSM):
- 64-bit pointers use some of the bits for a "color"
and the chip checks to ensure that it points to memory
of the appropriate "color"
- Sometimes called Application Data Integrity (ADI)
Experimenting with Buffer Overflows
- An Observation:
- There are now many compile-time and run-time mitigation
strategies in place to help protect against buffer
overflow vulnerabilities
- One Implication:
- You may need to temporarily disable some of these
mitigation strategies while experimenting
- Examples:
- gcc: -fno-stack-protector
- gcc: -fno-defer-pop
- shell: sudo sh -c 'echo 0 > /proc/sys/kernel/randomize_va_space'
(it normally contains the value 2)
- shell: execstack -s
- Visual C: Don't use /GS (i.e., use /GS-)
There's Always More to Learn