Those who have “reversed” .NET libraries would probably argue that you’re not really reversing a binary if it contains full type information. Commercial software developers and malware authors would make a similar claim and are therefore compelled to go to great lengths to obfuscate .NET binaries in order to make analysis difficult. One of the most common ways to hamper analysis is string obfuscation. In .NET assemblies, string obfuscation is most often implemented by passing a series of nonsensical arguments to a deobfuscation method, which then returns the deobfuscated string to the caller. In practice, it has to work like that since built-in .NET methods are unable to accept strings in an obfuscated form. In this post, you will learn some quick and dirty methods of using reflection to automate string deobfuscation in .NET assemblies. I’ll be using a randomly selected malware sample as an example.
If you’ve decided to follow along with this exercise, it is imperative that you perform your analysis in an environment designated for malware analysis. I performed all of my analysis in a clean VM in host-only mode. Additionally, as you will see, when possible, I will only perform analysis of a .NET malware sample in a “reflection only” (non-executable) context.
As with any malware sample, let’s start by poking around in order to determine the level of effort required for analysis. This malware sample employs the following techniques to make analysis more difficult:
- It applies the SuppressIldasm attribute to the assembly. This simply prevents ildasm (Microsoft’s IL disassembler) from disassembling the sample. This is easy to overcome since there are plenty of other disassemblers such as ILSpy, dotPeek, .NET Reflector, Get-ILDisassembly, IDA Pro, and one of my favorite tools – Cerbero Profiler.
- The majority of strings passed to other methods or assigned to variables are obfuscated.
- Most types (i.e. classes) and members (i.e. methods, fields, events, etc.) were renamed to unprintable Unicode characters. This makes programmatic navigation of methods we want to target difficult since you cannot reference them by name. In this case, I will often refer to methods by their metadata token.
Here’s an example of some of the decompiled code from the malware sample in ILSpy:
Now, even without the ability to read Arabic or to make sense of the nonsensical characters, you should be able to infer that this is a class constructor. Also, you can see that many of the members are being initialized to the result of a method call that takes an unprintable string and a number as arguments. This could very well be the string deobfuscation method we’re seeking. Clicking on the method brings up its implementation:
This method calls another method if and only if the date is before or equal to Sunday, June 30, 2013 12:28:04 AM. Following the next method call, we find what appears to be our deobfuscation function.
Basically, this method takes the obfuscated string and XOR decodes each character using the integer argument as an index into an array containing XOR keys. The last piece of information required to fully understand the string deobfuscator method is to find where the array is initialized. In ILSpy, this can be done by clicking on the array, then listing its cross-references and clicking on “Assigned By.” That way, we’ll know who initialized the array in the first place. After performing these steps, we arrive at the following method:
If you follow the reference to what GetManifestResourceStream is calling, you’ll see that the resource named “43b772467d174e01a4eaebd9d95b31a2” is being read in as a byte array, the contents of which are used as XOR keys. Here are the contents of the embedded resource:
If you follow the cross-references to the deobfuscation method, you will find that it is called quite a few times. It would be a waste of time to manually deobfuscate each string. In order to automate the process, it would be ideal if we could programmatically find all cross-references to this function, pull out the arguments passed during each call, and then invoke the deobfuscation method for each set of arguments. Well, that is exactly what we will do using reflection.
If you are not familiar with reflection, it is what allows you to perform type introspection and/or code generation/modification at runtime. For those interested in learning more about reflection as it applies to .NET, it is highly recommended that you become familiar with the classes in the System.Reflection and System.Reflection.Emit namespaces. It is also necessary to have a basic understanding of CIL (Common Intermediate Language) – the byte code used by the .NET framework.
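As a harmless illustration of type introspection, the sketch below reflects on a framework method rather than on the malware sample. The choice of `String.IsNullOrEmpty` is only an assumption made for demonstration purposes; any managed method with a body would do.

```powershell
# Reflect on a harmless framework method rather than on the malware sample
$Method = [String].GetMethod('IsNullOrEmpty')

# Type introspection: signature details come straight from metadata
$Method.ReturnType.FullName                                              # System.Boolean
$Method.GetParameters() | ForEach-Object { $_.ParameterType.FullName }   # System.String

# The method body exposes the raw CIL byte code
$ILBytes = $Method.GetMethodBody().GetILAsByteArray()

# Print the byte code as hex; the exact bytes depend on the runtime version
($ILBytes | ForEach-Object { $_.ToString('X2') }) -join ' '
```

The same `GetMethodBody().GetILAsByteArray()` pair is what the script later in this post relies on to search for calls to the deobfuscation method.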
Continuing with our analysis, I am going to use PowerShell to programmatically deobfuscate strings since it allows for easy scripting of the methods we’ll need to use in the System.Reflection namespace. Generally speaking, there are two ways to automate string deobfuscation – via static analysis or via dynamic analysis. Static analysis involves reflecting on the assembly in a “reflection only” context, meaning you cannot invoke any of the methods you reflect on. Dynamic analysis allows you to invoke the methods in question. In the case of our Syrian malware sample, since the deobfuscation method is relatively simple, I will implement its logic myself and stick with pure static analysis. I usually attempt static analysis first in order to understand as much of the sample as I can without executing it.
What follows is the PowerShell script that will deobfuscate all strings in the malware sample. I will break down its relevant components individually after presenting the script in its entirety.
$MalwarePath = 'C:\infected\اسماً لرجال ونساء سوريين مطلوبين لأفرع المخابرات السورية.exe'
#region 1) Resolve assembly dependencies at runtime
# Automatically loads assembly dependencies in a reflection only context
$ResolveEventHandler = {
    Param (
        [Object] $Sender,
        [ResolveEventArgs] $ResolveArgs
    )
    [Reflection.Assembly]::ReflectionOnlyLoad($ResolveArgs.Name)
}
# Normally, this would be performed with Register-ObjectEvent but unfortunately,
# it cannot handle event handlers that have return values.
$ReflectionOnlyAssemblyResolveField = [AppDomain].GetField(`
    'ReflectionOnlyAssemblyResolve', 'NonPublic, Instance')
$ReflectionOnlyAssemblyResolveField.SetValue([AppDomain]::CurrentDomain, `
    ($ResolveEventHandler -as [System.ResolveEventHandler]))
#endregion
#region 2) Load the assembly and pull out all of its types
# Load the malware sample in a reflection only context
$EvilAssembly = [Reflection.Assembly]::ReflectionOnlyLoadFrom($MalwarePath)
$EvilModule = $EvilAssembly.GetModules()[0]
try
{
    $Types = $EvilModule.GetTypes()
}
catch [Reflection.ReflectionTypeLoadException]
{
    $Types = $Error[0].Exception.InnerException.Types
}
# Throw out null entries
$Types = $Types | ? {$_}
#endregion
#region 3) Implement the deobfuscation logic from the malware sample
# Read in the XOR key array from the embedded executable resource
$ResourceName = '43b772467d174e01a4eaebd9d95b31a2'
$ResourceStream = $EvilAssembly.GetManifestResourceStream($ResourceName)
$XORKeyArray = New-Object Byte[](256)
$ResourceStream.Read($XORKeyArray, 0, $XORKeyArray.Length) | Out-Null
# PowerShell implementation of the deobfuscation function
# This is pretty much a line-by-line copy of the decompiled method
Function Deobfuscate-String( [String] $ObfuscatedString,
                             [Int] $XORKeyArrayIndex )
{
    $num = $ObfuscatedString.Length
    $num2 = $XORKeyArrayIndex -band 255
    $array = $ObfuscatedString.ToCharArray()
    foreach ($i in 0..($num - 1))
    {
        $array[$i] = [Char] (([Int] $array[$i]) -bxor `
            (([Int] $XORKeyArray[$num2]) -bor $XORKeyArrayIndex))
    }
    $DeobfuscatedString = New-Object String([Char[]] $array, 0,
        $ObfuscatedString.Length)
    Write-Output $DeobfuscatedString
}
#endregion
#region 4) Get cross-refs to deobfuscation method, pull out arguments,
# and call our deobfuscation function
# A call to the deobfuscation method looks like this:
# /* 0000029D 72 01 00 00 70 */ ldstr ""
# /* 000002A2 20 C3 F6 00 00 */ ldc_i4 63171
# /* 000002A7 28 53 00 00 06 */ call (string , int ) // returns string
# 0x28 is the [Reflection.Emit.Opcodes]::Call opcode
# 0x06000053 is the deobfuscation method metadata token
$TargetCall = [Byte[]] @(0x28, 0x53, 0x00, 0x00, 0x06)
$CallSequenceLength = 15
$Results = New-Object PSObject[](0)
# Iterate over each method in the module looking for calls to the deobfuscation
# method. If a call is found, extract its arguments and pass them to
# Deobfuscate-String
$Types | % { $_.GetMethods('NonPublic, Public, Static, Instance') } |
    % { $MethodInfo = $_; $_.GetMethodBody() } | % {
    try
    {
        # Extract the IL bytecode
        $ILByteArray = $_.GetILAsByteArray()
        # Validate that the method contains IL bytecode and that it is large enough
        # to hold the bytecodes that set up and call the deobfuscation method
        if ($ILByteArray -and ($ILByteArray.Length -ge $CallSequenceLength))
        {
            # Start where a full call instruction still fits and stop at offset 10,
            # the lowest offset that leaves room for the preceding ldstr and ldc_i4.
            # This also avoids negative indexes, which PowerShell wraps around.
            for ($i = $ILByteArray.Length - $TargetCall.Length; $i -ge 10; $i--)
            {
                # Have we reached a call to the deobfuscation method?
                if ((Compare-Object $TargetCall ($ILByteArray[$i..($i+$TargetCall.Length-1)])) -eq $null)
                {
                    # Check to see if the previous instructions are ldc_i4 and ldstr
                    if (($ILByteArray[$i-5] -eq 0x20) -and ($ILByteArray[$i-10] -eq 0x72))
                    {
                        # We have found a call to the deobfuscation method
                        # Extract the XOR index key argument
                        $Index = [System.BitConverter]::ToInt32($ILByteArray[($i-4)..($i-1)], 0)
                        # Resolve the string argument based on its metadata token
                        $StringMetadataToken = [System.BitConverter]::ToInt32($ILByteArray[($i-9)..($i-6)], 0)
                        $Obfuscated = $EvilModule.ResolveString($StringMetadataToken)
                        # Wrap everything up into a nice custom object
                        $Result = @{
                            XORIndex = $Index
                            ObfuscatedString = $Obfuscated
                            DeobfuscatedString = Deobfuscate-String $Obfuscated $Index
                            CallerMethod = $MethodInfo
                        }
                        $Results += New-Object PSObject -Property $Result
                    }
                }
            }
        }
    } catch {}
}
# Display the deobfuscated strings and their respective arguments,
# and caller methods
$Results | Sort-Object XORIndex -Unique |
    Format-Table XORIndex, ObfuscatedString, DeobfuscatedString -AutoSize
#endregion
Region #1 sets up an event handler to load assembly dependencies. This step is required because dependencies are not automatically resolved in a reflection-only context. Normally, this step would be achieved by calling the Register-ObjectEvent cmdlet. Unfortunately, you cannot register event handlers with return values like the one that ResolveEventHandler uses.
Region #2 loads the malware sample in a reflection-only context, which means that you can only perform type introspection.
Region #3 is a PowerShell implementation of the deobfuscation method described earlier. It is essentially a line-by-line translation of the decompiled C# code.
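Because XOR is its own inverse, a routine of this shape can be sanity-checked with a round trip on harmless data before pointing it at the sample. The following is a minimal sketch under stated assumptions: the key table is made up (the real one comes from the embedded resource), and `Invoke-XorTransform` is a hypothetical helper name that mirrors the key selection used in Deobfuscate-String.

```powershell
# Hypothetical 256-byte key table; the real sample reads its table from a resource
$KeyTable = [Byte[]] (0..255 | ForEach-Object { ($_ * 7 + 3) % 256 })

function Invoke-XorTransform([String] $Text, [Int] $KeyIndex)
{
    # Select one key byte with the low 8 bits of the index, then OR in the index
    $Key = ([Int] $KeyTable[$KeyIndex -band 255]) -bor $KeyIndex
    $Chars = $Text.ToCharArray()
    for ($i = 0; $i -lt $Chars.Length; $i++)
    {
        # XOR every UTF-16 code unit with the same key value
        $Chars[$i] = [Char] (([Int] $Chars[$i]) -bxor $Key)
    }
    (-join $Chars)
}

# Applying the transform twice with the same index restores the input
$Obfuscated = Invoke-XorTransform 'cmd.exe' 63171
$RoundTrip  = Invoke-XorTransform $Obfuscated 63171
$RoundTrip   # cmd.exe
```

Note that this sketch assumes the index fits in 16 bits, as 63171 does. A larger index would produce a value that cannot be cast back to a Char.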
Region #4 finds all cross-references to the deobfuscation method, pulls out its arguments, and calls our PowerShell-based deobfuscation function, thus automating the deobfuscation process for each string in the malware sample. For starters, cross references are found using the following call sequence as a reference:
/* 0000029D 72 01 00 00 70 */ ldstr "??????????????????????"
/* 000002A2 20 C3 F6 00 00 */ ldc_i4 63171
/* 000002A7 28 53 00 00 06 */ call (string ?, int ?)
This is an example of a call to the deobfuscation function. The first instruction pushes a string metadata token onto the stack. Strings are always pushed “by reference” in this fashion. The second instruction pushes the XOR index value onto the stack as an immediate value. Finally, the deobfuscation method is called using its metadata token – 0x06000053. Metadata tokens for methods defined in the module can be easily recognized because their most significant byte is always 0x06, which appears last in the little-endian byte code.
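To make the byte layout concrete, the sketch below decodes the 15 bytes of the example call sequence by hand. It assumes a little-endian host (which `[BitConverter]` follows); the variable names are illustrative only.

```powershell
# The three instructions from the example call sequence
$Ldstr = [Byte[]] @(0x72, 0x01, 0x00, 0x00, 0x70)   # ldstr  <string token>
$LdcI4 = [Byte[]] @(0x20, 0xC3, 0xF6, 0x00, 0x00)   # ldc_i4 <int32 immediate>
$Call  = [Byte[]] @(0x28, 0x53, 0x00, 0x00, 0x06)   # call   <method token>
$Sequence = [Byte[]] ($Ldstr + $LdcI4 + $Call)

# Each instruction is a 1-byte opcode followed by a 4-byte little-endian operand
$StringToken = [BitConverter]::ToInt32($Sequence, 1)
$XorIndex    = [BitConverter]::ToInt32($Sequence, 6)
$MethodToken = [BitConverter]::ToInt32($Sequence, 11)

'ldstr  token: 0x{0:X8}' -f $StringToken   # 0x70000001
'ldc_i4 value: {0}' -f $XorIndex           # 63171
'call   token: 0x{0:X8}' -f $MethodToken   # 0x06000053

# The leading bytes match the opcode values exposed by Reflection.Emit
$OpcodesMatch = ([Reflection.Emit.OpCodes]::Ldstr.Value  -eq $Sequence[0]) -and
                ([Reflection.Emit.OpCodes]::Ldc_I4.Value -eq $Sequence[5]) -and
                ([Reflection.Emit.OpCodes]::Call.Value   -eq $Sequence[10])
$OpcodesMatch                              # True
```

Three instructions of five bytes each also explain why the script sets $CallSequenceLength to 15.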
It is worth noting that all metadata tokens are scoped per-module. This can be useful in situations where member names have been renamed to unprintable Unicode characters, because a Reflection.Module object can resolve members by metadata token, as can be seen in the above PowerShell script.
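Token-based resolution can be tried safely on a framework assembly first. The sketch below uses `String.IsNullOrEmpty` purely as a stand-in for an obfuscated member. Against the sample, the equivalent call would presumably be `$EvilModule.ResolveMethod(0x06000053)`, though that is an assumption I have not shown output for here.

```powershell
# Use a harmless framework method as a stand-in for an obfuscated member
$Method = [String].GetMethod('IsNullOrEmpty')
$Token = $Method.MetadataToken

# The most significant byte identifies the metadata table (0x06 = method definition)
$TableId = $Token -shr 24

# Resolve the token back to a method without ever using the member name
$Resolved = $Method.Module.ResolveMethod($Token)

'Token 0x{0:X8} (table 0x{1:X2}) resolves to {2}' -f $Token, $TableId, $Resolved.Name
```

The token value itself varies between runtime versions, which is why it has to be read from the specific module under analysis rather than hard-coded across samples.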
After running the PowerShell script, you should see the following output, allowing you to infer some additional functionality of the sample:
DeobfuscatedString
------------------
Arguments
™
Info
ReceiveBufferSize
Read
.exe
1
IconLocation
Contains
,
OSFullName
DeleteSubKey
@
us
Connected
" "
ReceiveTimeout
Replace
SendTimeout
CreateInstance
TargetPath
/c start
GetValue
U0VFX01BU0tfTk9aT05FQ0hFQ0tT
Available
"
Users
cmd.exe
ToArray
GetValueNames
ENABLE
MainWindowTitle
Disconnect
Receive
Dispose
&explorer /root,"%CD%
FullName
®
...
Save
Win
Write
vn
Software
ClassesRoot
netsh firewall add allowedprogram "
LocalMachine
Name
&exit
WorkingDirectory
!
Windows
CreateSubKey
.
Data
.lnk
WScript.Shell
OpenSubKey
CurrentUser
" & exit
Win
Kill
Length
Connect
SendBufferSize
Client
Microsoft
Position
SystemDrive
US
CreateShortcut
Registry
Send
Poll
SoftwareClasses
SetValue
length
cmd.exe /k ping 0 & del "
DefaultIcon
...
DeleteValue
Directory
_
,0
ProcessName
Now, I did say that this was a quick and dirty method of pulling out obfuscated strings. Obviously, this process can be automated further. For example, it would be possible to specify only the metadata token of a deobfuscation method, extract its parameters automatically, find its cross-references, and emulate stack operations in order to extract its arguments as they are pushed onto the stack. Up to this point though, I haven’t needed to jump through those hoops. Regardless, I hope this post has shed some light on one possible way to use reflection to pull data out of obfuscated .NET assemblies.
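As a small first step in that direction, the search pattern could be derived from the metadata token instead of being hard-coded. This is only a sketch: `Get-CallPattern` is a hypothetical helper, it assumes a little-endian host, and it covers just the call opcode (not callvirt or other call forms).

```powershell
function Get-CallPattern([Int] $MethodToken)
{
    # The call opcode (0x28) followed by the token in little-endian byte order
    $OpCode = [Byte] [Reflection.Emit.OpCodes]::Call.Value
    # The unary comma keeps the Byte[] intact when it is written to the pipeline
    ,([Byte[]] (@($OpCode) + [BitConverter]::GetBytes($MethodToken)))
}

$Pattern = Get-CallPattern 0x06000053
($Pattern | ForEach-Object { '0x{0:X2}' -f $_ }) -join ', '
# 0x28, 0x53, 0x00, 0x00, 0x06
```

The result matches the $TargetCall array used in the script, so it could be dropped in as a replacement when analyzing a sample with a different deobfuscation method token.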