A reminder and exploration of low-level software basics: building, debugging, exploring and (a very little bit of) tampering.


1. Preparation

Complete the following steps before coming to the lab on Thursday. For this week, you do not need to submit your pre-lab work, but you are very welcome to ask questions if anything doesn’t make sense to you.

  1. [optional] Identify a lab partner[1].

  2. Ensure that you are able to log into a LabNet Linux image.

  3. Download, compile and run some of the C and C++ examples from Lecture 2. Ensure that you understand their outputs.

  4. Explore some of the available text editors on our LabNet machines. You’re probably quite used to a full-featured IDE such as Visual Studio Code, but it’s good to also be familiar with command-line tools that tend to be available in more restricted environments than their graphical counterparts.

    Vim / gVim

    This is the classic modal editor that I use in class.

    If you haven’t used Vim before, you might like to learn a bit about it by running vimtutor at the command line (a great place to start!), via the OpenVim interactive tutorial or perhaps by a less interactive tutorial.

    Vim is very configurable via commands and startup scripts. Once you get into it, you’ll be able to customize things to your liking. For example, here’s my Vim config: my .vimrc file for all Vim instances and my .gvimrc file for graphical instances.

    Emacs

    This is another classic, popular editor among programmers. You can run emacs at the command line or use a graphical version. Options include:

    Visual Studio Code

    A cross-platform editor from Microsoft. It should be installed and ready for use with at least Python, as we use it in ENGI 1020. You may need to install additional extensions for C, LLDB, etc.

    gedit

    This is a general-purpose text editor for open-source platforms. It’s included with the GNOME desktop environment (the default graphical environment for Ubuntu) and runs on FreeBSD, Linux and other Unix-like platforms.

    Others

    There may be other editors installed on our LabNet image, too… perhaps Kate or Atom or others?

2. Procedure

Complete the following procedure, recording all commands that you execute and their outputs.

2.1. Compilation

  1. Download product.cpp and compile it using the following command:

    $ c++ -g -c product.cpp

    What file is generated?

  2. Inspect this file by passing it to the nm program. Find the symbol for the product function within the nm output (hint: you may find the grep utility helpful; run man grep for more information about it). What is this symbol called?

  3. Read the DESCRIPTION section of the manual page for the c++filt program (run man c++filt). Use c++filt to demangle this C++ name into a function signature. What is this signature?

  4. What is the symbol name for the main function? Even within a C++ file, main is named according to C symbol name rules. How does this differ from the C++ symbol for the product function?

  5. Download Makefile to the same directory, ensuring that its name is Makefile after it is saved (not Makefile.txt, just plain Makefile). Run make product to build an executable binary called product from your product.o object file. What commands does make execute for you?

  6. Use the command objdump -S to disassemble the object file product.o. Pipe the result through c++filt and save it to a file product.o.dump (hint: the > character can be used to redirect output from a command into a file). Within this file, find the disassembly of the product function. What is its numeric offset within the object file? What else do you observe about the function?

  7. Re-generate the dump file above, this time using the option --disassemble=SYMBOL, where SYMBOL is the C++-mangled symbol name for the product function. How does this dump file differ from the previous dump file?

  8. Use objdump to disassemble the product function within the product binary and save the (C++-demangled) output to a file product.dump. What is different about the dumps of the object file and the full binary?

    Details

    There are a few ways that you can compare these two dump files:

    1. Visual inspection in side-by-side text editors (sounds like a lot of work!)

    2. The diff command, i.e., diff product.o.dump product.dump. Personally I like to use the "unified diff" format (-u option), but YMMV.

    3. Fancy word-granular coloured diffs at the command line:

      $ wdiff -n product.o.dump product.dump | colordiff

    4. A GUI tool like Meld (which is available in our LabNet Ubuntu image)

  9. Run the product program with no command-line arguments. What do you observe?
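
If you'd like to see what c++filt is doing in steps 2 to 4, here is a minimal sketch (not part of the lab files) that calls the demangler directly. It assumes a GCC or Clang toolchain that follows the Itanium C++ ABI, as on typical Linux machines. The helper name demangle and the example symbol _Z3addii are made up for illustration and have nothing to do with product.cpp.

```cpp
#include <cassert>
#include <cstdlib>    // std::free
#include <cxxabi.h>   // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <string>

// Hypothetical helper: demangle one symbol name, roughly as c++filt would.
// If the demangler rejects the name, the input is returned unchanged.
std::string demangle(const char* symbol)
{
    assert(symbol != nullptr);

    int status = 0;
    char* text = abi::__cxa_demangle(symbol, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return symbol;   // not something the demangler recognizes
    }

    std::string result(text);
    std::free(text);     // __cxa_demangle allocates the result with malloc
    return result;
}
```

Calling demangle("_Z3addii") from a small main yields add(int, int). Try passing it the symbol that you found with nm and compare the result against c++filt.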

2.2. Debugging with symbols

  1. Use the product program to compute the product of the numbers 1, 2 and 3. What do you observe?

  2. Run product under the LLDB debugger by executing lldb ./product, then executing the run command within LLDB. What do you observe? Exit using exit.

  3. Run product under the debugger with command-line arguments:

    $ lldb -- ./product 1 2 3

    What do you observe when you run the program?

  4. Run help break set to explore the syntax of setting a breakpoint. How can you set a breakpoint at the beginning of the product function?

  5. Set a breakpoint at the beginning of the product function and re-run the program from the start (with run). Use the bt command to get a backtrace of the current call stack. What is the next instruction to execute? Verify this using the register read command.

  6. Use the frame variable command to print all local variables. Where does the numbers pointer point to?

  7. Use the command expr &numbers to get the address of the numbers pointer (i.e., the address of the variable containing the pointer to the array). What is this address (of the numbers pointer)?

  8. Use the memory read command to output the pointer value held in the numbers variable, and then the contents of the array pointed at by numbers. For example, if you wanted to output 64 B of memory starting at address 0x00007ff7bfeff000, you would run:

    (lldb) mem read 0x00007ff7bfeff000 0x00007ff7bfeff000+64

  9. Using these addresses and memory contents, draw a diagram showing how numbers points to a location in memory, and how that memory contains the integer values derived from the program’s command-line arguments. Include all relevant memory addresses in your diagram.

  10. Run the continue command three times to pause the program at the fourth invocation of product. What is the address of each instruction in the call stack? Find these instructions in product.dump and include them in your report.

  11. What address does numbers point to now? What is the address of the numbers pointer?

  12. Use the memory read command to print the contents of 384 B (0x180 B) of stack memory, starting at the address of numbers.

    1. What value is contained at the address of numbers?

    2. Identify all parameters and return addresses within the portion of stack memory that you printed.
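
Steps 6 to 9 rely on the difference between a pointer's value, the pointer's own address and the bytes that it points at. The following sketch is independent of product.cpp: the helper names hex_bytes and pointer_is_separate_object are hypothetical, and the byte order described below assumes a little-endian machine with 4-byte int values, which holds on x86-64 Linux.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: format the raw bytes of an int array as
// space-separated hex pairs, in increasing address order (similar to the
// byte columns that a debugger's memory dump shows).
std::string hex_bytes(const int* numbers, std::size_t count)
{
    assert(numbers != nullptr);

    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(numbers);
    std::string out;
    char pair[4];
    for (std::size_t i = 0; i < count * sizeof(int); ++i) {
        std::snprintf(pair, sizeof pair, "%02x", bytes[i]);
        if (!out.empty()) {
            out += ' ';
        }
        out += pair;
    }
    return out;
}

// Hypothetical helper: the parameter `numbers` is itself a variable with its
// own address (&numbers), which is distinct from the address it holds.
bool pointer_is_separate_object(const int* numbers)
{
    return static_cast<const void*>(&numbers) != static_cast<const void*>(numbers);
}
```

Under those assumptions, an int array holding 1, 2 and 3 is rendered by hex_bytes as 01 00 00 00 02 00 00 00 03 00 00 00. If the program that you are debugging stores its values with a different type, the pattern in your memory read output will differ accordingly.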

2.3. Debugging in hard mode

  1. Edit Makefile to remove the -g flag from CPPFLAGS and LDFLAGS. Clean the build directory (run make clean), then run make to re-build everything. How does the size of the new product.o compare with the previous one?

  2. Download and save the executable binary game.

    1. How does its nm output compare to that of the C++ program above?

    2. How does its objdump output compare to that of the C++ program above?

  3. Use the strings program to inspect all of the string literals in game. What is the secret word?

  4. Run the game program and make a guess at the secret number. What do you observe?

  5. Using your objdump output, set a breakpoint at the beginning of the main function. Use run to execute the program until you hit the breakpoint. When the game prompts you, enter the guess 4660 (0x1234).

  6. Run the disas command to disassemble the current function. Locate the address of the cmp instruction just before the call to the play_game function (this comparison is part of the conditional check that will either allow you to play the game or not, depending on whether or not you guessed the secret number). Set a breakpoint at this address and continue execution.

    1. When the game pauses at your breakpoint, what is the value of rbp?

    2. What is the value of eax?

    3. What is the value of rbp minus the offset shown in the current instruction (e.g., the operand -0x10(%rbp) refers to the memory at address "rbp minus 0x10")?

  7. Use the mem read command to read the four bytes of memory at the indicated offset from rbp. What are these four bytes (in hexadecimal representation)? What is the decimal representation of this integer?

  8. Use the mem write command (see help mem write for information) to modify the secret value in memory to match the guess that you have already entered (4660, i.e., 0x1234). Demonstrate that you have bypassed the "secret number" check.

  9. Having learned the program’s "secret value" above, re-run the program and input the correct guess. Demonstrate that this works.

  10. Bonus: without modifying any source code, modify the product program from the previous section to fix its bug.
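
Step 7 asks you to turn four raw bytes into an integer. As a sketch of that arithmetic (assuming the little-endian byte order used by x86-64; decode_le32 is a hypothetical helper, not something found in game), the bytes are combined from the lowest address to the highest:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: interpret four bytes, given in increasing address
// order, as a little-endian 32-bit unsigned integer.
std::uint32_t decode_le32(const unsigned char* bytes)
{
    assert(bytes != nullptr);

    return  static_cast<std::uint32_t>(bytes[0])          // lowest address: least significant byte
         | (static_cast<std::uint32_t>(bytes[1]) << 8)
         | (static_cast<std::uint32_t>(bytes[2]) << 16)
         | (static_cast<std::uint32_t>(bytes[3]) << 24);  // highest address: most significant byte
}
```

For instance, the bytes 34 12 00 00 decode to 0x1234, which is 4660 in decimal, the guess entered in step 5. The same byte order applies in reverse when you write a value back with mem write.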

2.4. Stack smashing

Graduate students (ENGI 9807) are expected to complete this section of the lab. Undergraduate students (ECE 7420) may complete it for extra credit but are not required to do so.

Use the techniques we covered in lecture 2 to smash the stack of the demo program that was presented in that lecture. You may find it helpful to start a shell with ASLR disabled by running the setarch command:

$ setarch -R sh
$ ./foo    # whatever you run now will have ASLR disabled
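
One way to check whether ASLR is really disabled in your new shell is to print a stack address on several runs. The sketch below assumes nothing about the demo program from lecture; stack_probe is a hypothetical helper that reports where one of its own local variables lives.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: return the address of a stack-allocated local
// variable as an integer. The address is only reported, never dereferenced
// after the function returns.
std::uintptr_t stack_probe()
{
    volatile int local = 0;   // volatile keeps the variable in memory
    return reinterpret_cast<std::uintptr_t>(&local);
}
```

If you print the result of stack_probe() from main and run the program a few times, the value typically changes between runs in a normal shell and stays the same in a shell started with setarch -R, provided that the environment and arguments are unchanged.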

[1] You are free to work with a different lab partner every week, or the same partner, whichever you prefer.