Using Every Part of the Buffalo in Windows Memory Analysis∗
Jesse D. Kornblum
Principal Computer Forensics Engineer, ManTech SMA
[Link]@[Link]
Abstract
All Windows memory analysis techniques depend on the examiner's ability to translate the virtual addresses used by programs and operating system components into the true locations of data in a memory image. In some memory images up to 20% of all the virtual addresses in use point to so-called "invalid" pages that cannot be found using a naive method for address translation. This paper explains virtual address translation, enumerates the different states of invalid memory pages, and presents a more robust strategy for address translation. This new method incorporates invalid pages and even the paging file to greatly increase the completeness of the analysis. By using every available page, every part of the buffalo as it were, the examiner can more accurately recreate the state of the machine as it existed at the time of imaging.
Keywords: Windows, memory analysis, forensics, invalid pages, prototype, pagefile

1 Introduction
Memory analysis is a relatively new area of computer forensics in which an examiner attempts to gather information from the contents of a computer's memory as captured in a memory image. Information gleaned from memory images can include which processes were running, when they were started and by whom, what specific activities those processes were doing, and the state of active network connections.
An integral part of memory analysis is the examiner's ability to translate the virtual addresses that programs and operating system components use into the true locations of data in a memory image. Virtual addresses are an abstraction mechanism used by many operating systems to simplify the memory management system.
Until now the virtual address translation process relied on addresses pointing to data that was in main memory, used by only one program, not in transition, and unmodified. Memory is divided into pages or frames of 0x1000 bytes each¹. When a page did not meet the above conditions, it was said to be "invalid" as it could not be used immediately by a program. Despite the name, these pages were still accessible to the operating system, and thus ignoring them is a naive method for performing memory analysis. Incorporating these invalid pages creates a more complete picture and, to borrow a phrase, is like using every part of the buffalo [10]: taking full advantage of the available resources.
This paper demonstrates the methods for translating virtual addresses into physical locations even when they point to invalid pages. These pages can be located in a memory image and used during analysis. The paper starts with an introduction to virtual-to-physical address translation and describes the results of the translation process. Then the six kinds of invalid entries are described, followed by a demonstration of how much more data can be retrieved from a memory image when the examiner considers both valid and invalid pages. Finally, some suggestions for future research are discussed.

∗ This is the author's version of a work that was accepted for publication in Digital Investigation. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version will be subsequently published in Digital Investigation at [Link]
¹ Each 0x1000 bytes of data constitute a 'page' when in memory and a 'frame' when on the disk.
2 Related Work
The modern era of Windows memory analysis began in 2005 with the DFRWS Memory Analysis Challenge [6]. The challenge presented two Windows 2000 memory images and asked researchers to answer a set of specific questions regarding malicious software and illicit activity on the system. Chris Betz [2] and, separately, George Garner and Robert-Jan Mora [7] published detailed responses, but neither paper discussed its address translation methodology.
Betz published his tool the following year [3] and was soon followed by Mariusz Burdach [4], Harlan Carvey [5], Andreas Schuster [14], Joe Stewart [15], and others [1, 16]. Unfortunately, all of them used a naive method for address translation: either an address was valid and the resulting data were used by the tool, or the address was invalid and ignored. In most cases, when data were unavailable the result was padded with zeros.
The FATKit framework [8] was the first to mention using the pagefile as a further source of data for memory analysis. That paper did not, however, mention the other invalid memory states described in this paper. Nicholas Maclean's thesis [9] discussed the invalid states and described a method to parse some of them correctly, but still ignored prototype entries.

3 Address Translation
Windows uses virtual addresses to abstract the memory storage system from the rest of the operating system and other programs. The operating system presents each program with a large private virtual address space. Each time a program references a virtual address, the operating system translates that virtual address into a physical location and accesses the requested data. The data could be in main memory or on the disk, but the operating system must find it and load it into memory before a program can use it. If necessary, the operating system loads data from the disk, resolves inconsistencies, and ensures the integrity of the system during these accesses. During memory analysis the examiner must accomplish this same translation process, but without the operating system's help.
The address translation process is slightly different between 32-bit and 64-bit operating systems, and depends on whether Physical Address Extension (PAE) or Address Windowing Extensions (AWE) are enabled. These processes are detailed in [11] and are not addressed in this paper. For simplicity, this paper focuses entirely on 32-bit, or x86, operating systems where PAE and AWE are not enabled.
Address translation is generally a three-stage procedure. Every process on a Windows system maintains a DirectoryTableBase variable. On x86 systems this value is stored in the CR3 register when the process is running. This value contains the base address of the table of Page Directory Entries (PDE) for that process. For each virtual address being translated, a PDE is selected using the ten most significant bits (bits 22-31) of the virtual address. The PDE is used to find the base address of a page of Page Table Entries (PTE). The specific PTE is selected using this base address and the next ten bits (bits 12-21) of the virtual address. The PTE in turn points to the base address of the page in physical memory where the data is stored. The final address in physical memory is the base address of this page plus the remaining twelve bits (bits 0-11) of the virtual address.
The least significant bit in a PDE or PTE is the Valid or V bit. When this bit is one the entry is said to be 'valid' and bits 12-31 of the entry contain the Page Frame Number (PFN) used in the next part of the address translation process. In a PDE, the PFN points to the page containing the PTE. In a PTE, the PFN points to the page containing the memory indicated in the original virtual address. See Figure 1 for an example.
Figure 1: Valid PDE or PTE
On the other hand, when the V bit is zero the entry is said to be 'invalid' and a different set of rules must be used to find the data in question. In this paper we are concerned with bit 10, the Prototype or P bit, and bit 11, the Transition or T bit. These bits are shown in Figure 2. The other bits in these entries are documented in [11] but are beyond the scope of this paper.
Figure 2: PDE and PTE bits relevant to address translation

4 Invalid PDE and PTE Values
Just because an entry is invalid doesn't mean that the data it references is inaccessible. After all, the original operating system had a method to access these data! The examiner can follow the same rules as the operating system to access the data in question. It is possible that the data
had never been loaded into memory and are thus inaccessible to the examiner. That state, however, is provable and will be described in Section 4.5. Regardless, each invalid PDE or PTE fits into one of six categories: Pagefile, Demand Zero, Transition, Prototype, Zero, or Unknown.

4.1 Pagefile
When Windows runs out of physical memory it stores pages in a paging file on the disk. If both the P and T bits in an invalid PTE or PDE are zero, the entry points to a frame in one of the paging files [9, 11]. The format for a Pagefile entry is shown in Figure 3. Windows can support up to 16 paging files, so the page file number, PageFileNumber, is given in bits 1-4. Note that [11] and others sometimes refer to the PageFileNumber as the PFN, creating confusion with the Page Frame Number in valid PDEs and PTEs. In this paper the abbreviation PFN only refers to the Page Frame Number.
Figure 3: Pagefile Page Table Entry
The offset of the desired frame in the pagefile, PageFileOffset, is in bits 12-31 of the invalid entry. The true offset in the paging file is the value of bits 12-31 from the entry plus some bits from the original virtual address. Note that both PDEs and PTEs can point into the paging file and that the methods for finding the frame in question are different. For a PDE Pagefile entry, PageFileOffset uses bits 12-21, shifted right 12 places, from the original virtual address being referenced. For a Pagefile PTE entry, PageFileOffset uses bits 0-11 from the original virtual address. These equations are shown in Figure 4.
PTE PageFileOffset = (pde value & 0xfffff000) + ((virtual address & 0x3ff000) >> 12)
Frame PageFileOffset = (pte value & 0xfffff000) + (virtual address & 0xfff)
Figure 4: Pagefile Offset Calculations

4.2 Demand Zero
Like a pagefile entry, Demand Zero entries have zeros in the T and P bits. But when the PageFileNumber and PageFileOffset are both zero, the operating system has marked the requested page as Demand Zero and Windows would return any request for it with a page of zeros [11]. It is thus safe for the examiner to treat the requested page as containing nothing but zeros.

4.3 Transition
When the T bit in an entry is one and the P bit is zero, the page is said to be in Transition. This means that the page has been modified but not yet written back to the disk. It is currently on either the system's standby, modified, or modified-no-write lists [11]. (Note that although the description on page 441 of [11] is correct, the diagram is not.) The format for a Transition entry is shown in Figure 5. The examiner must be careful to also consider that large
memory pages² can be in transition too! Even though a page was in transition, the page was still in active memory and can therefore be retrieved by an examiner. Just like a valid entry, the page frame number is given in bits 12-31 and can be used to continue the address translation process.
² On non-PAE systems, a regular memory page is 4KB. A large memory page is 4MB.

4.4 Prototype
In a PTE, when the P bit is one the entry is a pointer to a prototype page table entry. Note that when P is one the value of the T bit is part of the prototype's index number and has no bearing on the PTE's type. The P bit has no bearing on a PDE's type. The format for Prototype PTEs is shown in Figure 6. The entry contains an index number that can be used to compute the virtual address of the prototype PTE.
Prototype PTEs are used when more than one process is using the same page in memory. Prototypes are created when the operating system needs to invalidate the page in question. The operating system authors wanted to avoid having to update all of the processes using the page each time the page is moved. Instead, they direct each process using the page to point to the same prototype. The prototype then points to the page's true location. When the page in question is moved, the operating system only has to update the one prototype. The PTE stored by each process acts like a shortcut or symbolic link to the true PTE.
When a prototype PTE is encountered, the kernel calls the function MiResolveProtoPteFault to resolve the page fault. By reverse engineering that function, the author determined that Windows stores Prototype PTEs in the system's paged area beginning at 0xe1000000. The reader can verify this by examining the relation of pointers as described on page 453 of [11]. The SectionObject structure in the EPROCESS structure (on Windows XP and above) points to a Segment structure that contains the address of the prototype PTEs for that object, namely the executable itself. The reader will see that these values always begin above 0xe1000000.
To find the virtual address of a Prototype PTE, the examiner should multiply the index number given in the invalid PTE by the size of a PTE, four bytes on an x86 system without PAE, and then add the result to the start of the system's paged area. This formula is shown in Figure 7. Note that the examiner may have to do another address translation to determine the true location of the prototype PTE itself! Care should be taken so that the analysis does not fall into an infinite loop of resolving prototype PTEs during this lookup process.
The author was also able to determine how the flags in a prototype PTE are handled. Each Prototype PTE should
be in one of the six states listed below. The flags in the Prototype PTE are generally the same as in a regular PTE except for the P bit. Obviously a Prototype PTE shouldn't refer to another Prototype PTE. Instead, the P bit is used to denote that the Prototype PTE points to a mapped file.
1. Active: The V bit is one. The page was in memory and can be accessed using the Page Frame Number in the prototype PTE.
2. Transition: The V bit is zero and the T bit is one. The page was in Transition, but can be accessed in the memory image using the Page Frame Number in the prototype PTE.
3. Modified No-Write: Like a transition prototype PTE, but the Dirty bit, bit six, is also one. The page can be accessed in the memory image using the Page Frame Number in the prototype PTE.
4. Page File: The V, T, and P bits are all zero. The data are stored in the page file. See Section 4.1.
5. Demand Zero: The V, T, and P bits are zero along with the PageFileNumber and PageFileOffset. The page should be satisfied with all zeros.
6. Mapped File: The P bit is one. The operating system would retrieve the requested data from the original file on the disk. The author does not know how to use the value from these prototype PTEs. See Section 6.
Figure 5: PDE or PTE in transition
Figure 6: Prototype PTE entry

4.5 Zero
If the entry is zero, there is no information available for the page in question. Specifically, the page has been committed, but has not yet been accessed [11]. That is, the operating system has allocated a page of physical memory for this page but has not read from or written to it. When encountering this situation, it is safe for the examiner to assume that the entire page is zeros. If a process has allocated memory but never accessed it, it would only contain the zeros that the operating system provided.

4.6 Unknown
There are still some values that do not fit the rules above. For example, a PTE may appear to point to a pagefile entry but have an invalid PageFileNumber (e.g. a PageFileNumber of 8 on a system with only one page file).
PrototypePteAddress = 0xe1000000 + (PrototypeIndex << 2)
Figure 7: Prototype PTE Address Calculation
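The Figure 7 calculation and the six prototype states listed in Section 4.4 can be sketched the same way. This is a hedged illustration, not the author's implementation. It assumes the paged area starts at 0xe1000000, as reported in Section 4.4. The prototype index is taken as already extracted from the invalid PTE, because its bit layout is given only in Figure 6. The sketch also relies on a hypothetical caller-supplied callback, read_kernel_dword, that translates a kernel virtual address and returns the 32-bit value stored there. The in_progress set guards against the infinite loop that Section 4.4 warns about. If resolving a prototype PTE leads back to the same prototype PTE, the lookup raises an error instead of recursing forever.

```python
from typing import Callable, Optional, Set, Tuple

PAGED_AREA_START = 0xE1000000  # start of the system paged area (Section 4.4)
PTE_SIZE = 4                   # bytes per PTE on x86 without PAE

V_BIT = 1 << 0       # Valid
DIRTY_BIT = 1 << 6   # Dirty
P_BIT = 1 << 10      # in a prototype PTE: the data come from a mapped file
T_BIT = 1 << 11      # Transition


def prototype_pte_address(prototype_index: int) -> int:
    """Figure 7: virtual address of the prototype PTE with this index."""
    return PAGED_AREA_START + prototype_index * PTE_SIZE


def prototype_state(proto: int) -> str:
    """Sort a prototype PTE into one of the six states of Section 4.4."""
    if proto & V_BIT:
        return "active"
    if proto & P_BIT:
        return "mapped_file"
    if proto & T_BIT:
        return "modified_no_write" if proto & DIRTY_BIT else "transition"
    if ((proto >> 1) & 0xF) == 0 and (proto & 0xFFFFF000) == 0:
        return "demand_zero"
    return "page_file"


def lookup_prototype(
    prototype_index: int,
    read_kernel_dword: Callable[[int, Set[int]], int],
    in_progress: Optional[Set[int]] = None,
) -> Tuple[str, int]:
    """Fetch and classify a prototype PTE, refusing to chase a lookup loop.

    read_kernel_dword(address, in_progress) is a hypothetical callback that
    translates a kernel virtual address (possibly calling lookup_prototype
    again) and returns the 32-bit value stored there.
    """
    if in_progress is None:
        in_progress = set()
    address = prototype_pte_address(prototype_index)
    if address in in_progress:
        raise RecursionError(f"prototype PTE lookup loops at {address:#x}")
    in_progress.add(address)
    try:
        proto = read_kernel_dword(address, in_progress)
    finally:
        in_progress.discard(address)
    return prototype_state(proto), proto
```

The sketch only labels mapped-file prototypes. Section 6.2 notes that the author had not determined how the operating system retrieves their data.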
5 Using Invalid Pages
For this paper the author created a tool that collected information similar to the writeups for the DFRWS Memory Analysis Challenge [2, 7]. The tool also recovered executables, DLLs, and drivers, using the method described in [13, 18]. The tool also allowed the author to use two different methods of address translation. The naive method of address translation identified valid PDE and PTE entries and zeroed entries, and classified all other entries as unknown. Although zeroed entries are technically invalid, the naive method of address translation still supplies the examiner with the same result as a more robust method: a page full of zeros. As such, the author included zeroed entries in the naive mode as they are technically handled correctly.
The tool had a second, more robust configuration that used the information in invalid PDEs and PTEs to locate data in the memory images. Prototype PTE entries were examined and, if valid, used. Prototypes that referred to mapped files were noted but, as described above, could not be parsed. Entries marked as in transition were assumed to still be in main memory and were used. Entries that referred to the pagefile were noted and, if the page file was available, used.
The author ran the tool against the two memory images distributed with the DFRWS Challenge and recorded the results of the address translation process. One value was recorded for resolving the PDE and one value was recorded for resolving the PTE. Note that it is possible to have an odd number of values if the PDE could be resolved but the PTE could not (e.g. if the PDE referred to a frame in the paging file which was not available).
The data from the naive translation method are given in Table 1 and the data using the robust method are given in Table 2. Neither image had any Demand Zero entries, so they are not shown. In the second table, the Prototype column refers to prototype PTEs that pointed to active pages in memory.

Table 1: PTE and PDE entries using Naive Address Translation
Image      Valid    Zero     Unknown   Total
DFRWS-1    63,136   34,431   8,799     106,366
DFRWS-2    86,756   47,547   6,739     141,042

Table 2: PTE and PDE entries using Invalid Values in Address Translation
Image      Valid    Zero     Prototype   Mapped File   Pagefile   Transition   Unknown   Total
DFRWS-1    72,295   36,005   1,956       1,525         1,995      1,016        3,952     118,744
DFRWS-2    98,852   51,877   1,347       1,606         413        591          4,733     159,419

By summing the Valid, Prototype, and Transition columns in Table 2, we can compute the total number of entries recoverable by the examiner for analysis. For the DFRWS-1 image, there were 75,267 recoverable entries, or 12,131 more entries than were found using the naive translation. For the DFRWS-2 image, the recovered total was 100,790 entries, or 14,034 more entries than were found using the naive translation. Using the robust method gave the examiner 19.21% and 16.18% more recoverable entries in the DFRWS-1 and DFRWS-2 images, respectively.

5.1 Analysis
The author was surprised by the increase in the number of Valid pages when using the robust address translation procedure. This gain probably came from valid PTEs whose PDEs had been marked as being in transition. Although these PTEs could not be read using the naive method, they could be processed with the more robust rule set.
It should be noted that some of those entries are surely repeated. For example, most processes use a number of basic system libraries like [Link]. Because that same DLL is included in the recovery procedure for each process running on the system, it will be recovered multiple times. So the actual gain from using the robust translation method may not be as much as 19%, but it is still considerable.

6 Future Work
This paper has attempted to expand the amount of information available to an examiner conducting Windows memory analysis. Although it demonstrates that more information is available when using robust address translation, there are still many opportunities to increase the amount of recoverable data in a memory image.

6.1 Using the Pagefile
Even more pages could have been recovered if the examiner had access to the pagefile for each system. Frames
stored in the pagefile would immediately be accessible. In addition, using the pagefile might also allow the examiner to use more information already present in the memory image. A virtual address may reference a PDE that points to a PTE in the paging file even though the physical page in question is in main memory. Such a page would always be inaccessible under naive translation, and would remain inaccessible under robust translation unless the pagefile was also available.
Furthermore, some information crucial to module recovery is located in the first page of the module. The number of sections, their locations, and their offsets are all in this first page. Without the first page, the examiner can recover a number of bytes equal to the total size of the module, but cannot reconstruct the module correctly. If the module's first page was in the paging file, which happened a few times in the DFRWS images, recovering the rest of the module properly was not possible.
In order to use the pagefile effectively, however, it must be acquired at the same time as physical memory. By default, Windows uses only one paging file, %SystemDrive%\[Link], but the true locations and filenames for paging files should be found using the registry [12]. Capturing these pagefiles is difficult on a live system, as traditional file copying utilities cannot open them. The files are in use by the operating system and may not be opened by another process. To copy a paging file the examiner can use a program to parse the raw file system. It would be beneficial for first responders to have a program to capture both physical memory and the paging file in one step.
When working with a virtualized environment like VMWare [17], however, the examiner can suspend the guest operating system and capture both memory and the paging file at her leisure. The contents of physical memory are usually written to a file (e.g. a .vmem file under VMWare). To acquire the pagefile, the examiner can mount the drive from the system in question in another guest operating system and copy the pagefile. The examiner must be careful to mount the drive in non-persistent mode so that no changes are made to the source drive. Making changes on the source drive makes it difficult to restart the suspended virtual machine.
At this time the author does not know if the amount of additional information gathered from the pagefile would be worth modifying incident response procedures to gather the required data. Further, the inevitable delay, no matter how slight, between capturing a memory image and capturing the paging file may create inconsistencies between the two that frustrate an analysis. That is, data in main memory might refer to items in the pagefile that were no longer in the pagefile when it was captured.

6.2 Other Issues
The author has not determined how the operating system retrieves data from prototype PTE entries that refer to mapped files. Reverse engineering the function MiResolveMappedFileFault may yield some information, but further work may be needed to determine if
an examiner can use this information, in conjunction with the appropriate filesystem, to recover the data in question.
The thousands of still unknown PDE and PTE values found in the DFRWS memory images are troubling. The author did not determine if these values represent evidence of the malware present on the system or are a normal part of the operating system. It is possible that these values contain some meaningful information, but the author has not researched this question.
Finally, this paper has focused entirely on 32-bit operating systems without PAE or AWE enabled. Although a similar technique for using invalid PDEs and PTEs can easily be applied to PAE systems, the performance gain for doing so is unknown. Similarly, the author does not know if the analysis of AWE or 64-bit systems would benefit from using invalid entries. Additional work will be required to apply this new technique to those operating systems.

7 Conclusion
This paper has demonstrated that the completeness of Windows memory analysis is significantly improved when using robust address translation. Naive address translation methods have worked to date, but are not adequate for a rigorous analysis. Furthermore, robust address translation allows examiners to make use of the paging file for the first time. Using the techniques described in this paper, examiners should be able to recover more data from each memory image and create a more complete picture during their analyses.

8 Acknowledgments
The author will forever be indebted to Mark Russinovich and David Solomon for their excellent book Microsoft Windows Internals. Not only was the information invaluable, but the figures in this paper were derived from those found in chapter seven. This paper would not have been possible without in-depth discussions with Harlan Carvey, Andreas Schuster, and Mariusz Burdach. Additional technical information was provided by Eugene Libster, E–, Nicole Donnely and Robert Hansen. Proofreading was assisted by Adrienne Hollister. Special thanks to S–. Thanks also to H– and ManTech SMA for providing the time and resources to conduct this research.

9 About the Author
Jesse D. Kornblum is a Principal Computer Forensics Engineer for ManTech SMA's Computer Forensics and Intrusion Analysis Division. Based in the Washington DC area, his research focuses on computer forensics and computer security. He has authored a number of computer forensics tools including ssdeep, the context triggered piecewise hashing program, and md5deep, the widely used suite of cryptographic hashing programs. Mr. Kornblum believes that his dog Zoey is smarter than your dog. You can send him mail at [Link]@mantech.com.
References
[1] Agile Risk Management. Nigilant32 for first responders: Active memory imaging, 2006. [Link] [Link]/publications [Link].
[2] Chris Betz. DFRWS 2005 challenge report. In DFRWS Memory Analysis Challenge. Digital Forensic Research Workshop, 2005. [Link]
[3] Chris Betz. Memparser, 1.0 edition, July 2006. [Link]
[4] Mariusz Burdach. An introduction to the Windows memory forensic, 2005. [Link] net/pdf/introduction to windows memory [Link].
[5] Harlan Carvey. LiSt process image, July 2006. [Link] [Link].
[6] Digital Forensic Research Workshop. DFRWS Memory Analysis Challenge, 2005. [Link] challenge/[Link].
[7] George Garner Jr. and Robert-Jan Mora. Preliminary analysis of 2005 DFRWS forensic challenge. In DFRWS Memory Analysis Challenge. Digital Forensic Research Workshop, 2005. [Link] challenge/[Link].
[8] Nick Petroni Jr., AAron Walters, Timothy Fraser, and William Arbaugh. FATKit: A framework for the extraction and analysis of digital forensic data from volatile system memory. Digital Investigation, 3(4):197–210, December 2006.
[9] Nicholas P. Maclean. Acquisition and analysis of Windows memory. Master's thesis, University of Strathclyde, 2006. [Link] [Link].
[10] Thomas E. Mails. The Mystic Warriors of the Plains: The Culture, Arts, Crafts and Religion of the Plains Indians. Marlowe & Company, 1972.
[11] Mark Russinovich and David Solomon. Microsoft Windows Internals. Microsoft Press, Redmond, Washington, fourth edition, 2005.
[12] Paul Sanna. Windows 2000 Registry. Prentice Hall, 2001. Online at [Link] prodtechnol/windows2000serv/maintain/featusability/[Link].
[13] Andreas Schuster. Reconstructing a binary, April 2006. [Link] 04/reconstructing a [Link].
[14] Andreas Schuster. Searching for processes and threads in Microsoft Windows memory dumps. Digital Investigation, 3(S):10–16, August 2006. [Link]
[15] Joe Stewart. Truman - the reusable unknown malware analysis net, 2006. [Link]
[16] Tim Vidas. Forensic analysis of volatile memory stores. NEbraskaCERT Conference, August 2006. http://[Link]/presentations/2006/files/[Link].
[17] VMWare. Vmware virtualization products. [Link]
[18] AAron Walters. FATKit: Detecting malicious library injection and upping the "anti". Technical report, 4TΦ Research Laboratories, July 2006. [Link] dll [Link].