Tutorials
Organisation
mmushell/architectures/
: various architectures parsers and a generic onemmushell/mmushell.py
: main script allowing to reconstruct virtual address spaces from a memory dump, more instructions belowmmushell/exporter.py
: this is a POC showing the possible use of techniques to perform a preliminary analysis of a dump by exporting each virtual address space as a self-contained ELF Core dump file. See section TOWARDS OS AGNOSTIC MEMORY FORENSICS.converter.py
: export dump to be used in Fossil. It adds CPU registers and convert the kernel physical address space in virtual address space one. Note: you can ignore this script, is not part of mmushellqemu/
: contains scripts and patch necessary to get ground truth registers values from an emulated system
Quick installation
On a standard Linux distribution :
$ python -m venv --system-site-packages --symlinks venv
$ venv/bin/pip install -r requirements.txt
On Nix/NixOS :
$ nix develop
# or with direnv
$ direnv allow .
Usage
Dataset
Here part of the dataset containing the memory dumps of the OSs used in the paper (only the open-source ones, due to license restrictions).
In each archive there are a minimum of 4 files and require at least 4GB of free space (decompressed):
-
XXX.regs
: contains the values of the registers collected by QEMU during the execution (the ground truth), pickle format, to be used (optionally) with --gtruth option of mmushell -
XXX.yaml
: contains the hardware configuration of the machine which has run the OS, YAML file, to be used as argument of mmushell -
XXX.dump.Y
: chunk of the RAM dump of the machine -
XXX.lzma
: an mmushell session file, it contains the output of mmushell, pickle lzma format, to be used (optionally) with--session
option of mmushell
The use of XXX.lzma
allows to avoid reexecuting the parsing and data structure reconstructing phase, gaining time!
CLI
MMUShell must be run in the folder containing dump/configuration files as all the paths are relatives.
Warning
Some OSs require a minimum of 32GB of RAM to be parsed (the Intel 32bit ones, in particular HaikuOS) or a minimum of 1 hour of execution (independently by the number of the CPU cores, Intel 32/PAE/IA64 OSs)
Consider using the session file for them to gain time.
Help :
$ mmushell.py
usage: mmushell.py [-h] [--gtruth GTRUTH] [--session SESSION] [--debug] MACHINE_CONFIG
mmushell.py: error: the following arguments are required: MACHINE_CONFIG
- Dump all the RAM areas of the machine that you want to analyze in raw format, one file per physical memory area.
-
Create a YAML file describing the hardware configuration of the machine (see the examples available in the dataset) The format is the following :
cpu: # Architecture type architecture: (aarch64|arm|intel|mips|ppc|riscv) # Endianness level endianness: (big|little) # Bits used by architecture bits: (32|64) mmu: # MMU mode varying from architectures mode: (ia64|ia32|pae|sv32|sv39|sv48|ppc32|mips32|Short|Long) # any class that inherits from MMU # ^^^^^^^^^ ^^^^^^^^^^ ^^ ^^^ ^^^^^^ # intel riscv ppc mips arm memspace: # Physical RAM space region ram: - start: 0x0000000080000000 # ram start address end: 0x000000017fffffff # ram end address dumpfile: linux.dump # Physical memory regions that are not RAM # Example: reserved regions for MMIO, ROM, ... See https://en.wikipedia.org/wiki/Memory-mapped_I/O_and_port-mapped_I/O#Examples # Those portions are needed because page tables also maps these special physical addresses, so the CPU can use these associated # virtual addresses to write or read from them. We need to distinguish them otherwise we can misinterpret some page tables as data pages. not_ram: - start: 0x0000000000000000 end: 0x0000000000011fff - start: 0x0000000000100000 end: 0x0000000000101023 # ...
-
Launch mmushell with the configuration file. Example with the provided RISC-V SV39 memory dump :
Use the interactive shell to find MMU registers, Radix-Trees, Hash tables etc. and explore them. The$ mmushell.py dataset/riscv/sv39/linux/linux.yaml MMUShell. Type help or ? to list commands. [MMUShell riscv]# ? Documented commands (type help <topic>): ======================================== exit help parse_memory show_radix_trees find_radix_trees ipython save_data show_table
help
command lists all the possible actions available for the selected CPU architecture.
Ground truth
XXXX_gtruth
commands are available only if you load a XXX.regs
file as they compare found results with the ground truth.
These commands have an equivalent command which show only the results found by MMUShell without comparing them with the ground truth.
Note
The folder qemu/
contains the patch for QEMU 5.0.0 in order to collect the ground truth values of the MMU registers during OSs execution. Please read the concerned README in qemu/README.md.
Notes and procedures
As mmushell available commands differs from one architecture to another, here are different steps needed to be performed in order.
Note
Steps prefixed with "*" are necessary only if you don't use session files (e.g: XXX.lzma
).
RISC-V
- *
parse_memory
: find MMU tables - *
find_radix_trees
: reconstruct radix trees show_radix_trees_gtruth
: compare radix trees found with the ground truth
MIPS
- *
parse_memory
-> find MMU opcodes - *
find_registers_values
-> perform dataflow analysis and retrieve registers values show_registers_gtruth
-> compare registers found with the ground truth
PowerPC
- *
parse_memory
-> find MMU opcodes and hash tables - *
find_registers_values
-> perform dataflow analisys and retrieve registers values show_hashtables_gtruth
-> compare the hash table found with the ground truthshow_registers_gtruth
-> compare the registers found with the ground truth
Note
For Linux Debian, Mac OS X, Morphos : show_hashtables_gtruth
shows another Hash Table not retrieved by MMUShell, but it is a table used during startup (as shown by the timestamp) and we ignore it because it does not used during normal OS operation.
For Mac OS X : ignore BAT registers values in show_registers_gtruth
as it uses different values for each process (as shown by the ground truth), the falses and positives results are purely a coincidence.
Intel
- *
parse_memory
-> look for tables and IDT - *
find_radix_trees
-> reconstruct radix trees (could be slow) - *
show_idtrs_gtruth
-> show a comparization between true and found IDTs (note: some OSs define multiple IDTs, one per core). We deliberately ignore IDT table used during the boot phase (see PowerPC notes) show_radix_trees_gtruth XXXX
-> where XXX must be the PHYSICAL address of the last IDT used by the system shown by show_idtrs_gtruth". Shows a comparization between true and found radix trees which resolve the IDT PHYSICAL ADDRESS XXXX (obtained by the previous command, for our statistics we always use a true positive one)
Note
For BarrellfishOS, OmniOS : those allocate a different IDT for every single CPU core. Some processes are core specific and are able to resolve only the IDT of the same core. For each IDT found, MMUShell shows different proccesses as FN. They are not real false negatives because are the per-core processes which are not able to resolve that specific IDT but are found by MMUShell using one among the other IDT.
For RCore : if you enter the physical address of the real IDT used by the system in show_radix_trees_gtruth
, MMUShell does not show any entry because it has found a different IDT (a FP) and has no valid radix trees for the real IDT. Please use show_idtrs
to show the IDT found (YYY) and show_radix_trees XXXX
to show the radix trees associated (all FP).
ARM
- *
find_registers_values
-> perform dataflow analysis to recover TTBCR value - *
show_registers_gtruth
-> compare the values retrieved with the ground truth - *
set_ttbcr XXX
-> use REAL value of the TTBCR shown by the previous command - *
find_tables
-> find MMU tables - *
find_radix_trees
-> reconstruct radix trees show_radix_trees_gtruth
-> compare radix trees found with the ground truth
AArch64
- *
find_registers_values
-> perform dataflow analysis to recover TCR value - *
show_registers_gtruth
-> compare the values retrieved with the ground truth - *
set_tcr XXX
-> use REAL value of the TCR shown by the previous command - *
find_tables
-> find MMU tables - *
find_radix_trees
-> reconstruct radix trees show_radix_trees_gtruth
-> compare radix trees found with the ground truth