Monday, October 13, 2008

Obfuscation in .NET

Obfuscation is one of the means by which you can protect your intellectual property in .NET. It does not stop other people from decompiling your code, but it makes the decompiled result opaque enough that it is difficult to understand.


Obfuscation techniques make it harder for customers to copy your algorithms and for crackers to modify your code.

Why Obfuscation?

Java and the .NET languages are compiled to intermediate languages, bytecode in the case of Java and Microsoft Intermediate Language (MSIL) in the case of .NET, that are easy to disassemble or decompile back into source code. Unlike native code, the intermediate code carries the names of types, methods, and fields in its metadata (and local variable names when debug information is available), so decompilation produces something very close to the original source. Code shipped in this form is therefore not secure.

In practice, there is only one absolutely certain way to make it impossible for others to decompile your code and see how it works: keep your assemblies on a private server and have the code execute only on that server. In most cases this means setting your application up as a web service or as an ASP.NET application, although you might alternatively have some architecture in which an application sits on a server, and uses .NET Remoting to communicate with clients. The .NET Framework does make it considerably more convenient than previously to architect applications in this way.

If your assemblies are to be installed or downloaded onto client machines, to which people outside your organization will have access, then your code is insecure, because anyone can read your IL using the ILDASM tool. In principle, this is not really any different from the situation that existed before .NET: even if you ship native code, other people can use a disassembler to examine the native instruction sequence and, if they choose, use a decompiler to generate corresponding source code.

Moreover, the tools that hackers or even curious users need to reverse engineer code are widely available. Microsoft offers its own MSIL disassembler, ILDASM, free of charge. The Anakrino tool is an open-source decompiler for .NET (http://www.saurik.com/net/exemplar/), and various other companies offer equivalent tools on a commercial basis. We should therefore take steps to secure our code.

This is where obfuscation comes into the picture.

Protecting Code

You can protect code from reverse engineering through either legal or technical means, but it is economically difficult for a small company to enforce the law against a larger competitor. The alternative is to make reverse engineering so technically difficult that it becomes, at the very least, economically unviable.

One such approach is never to give users access to the application itself, but to provide its services remotely, charging a small fee for each use. Users never gain access to the application and so cannot reverse engineer it. However, because of limits on network capacity, the application will perform worse than if it ran locally.

A second approach is to encrypt the code, but this works only when the entire decryption takes place in hardware. If the code is executed by a virtual machine (such as the JVM or CLR), it is always possible for users to intercept and decompile the decrypted code.

For code that must ship to client machines, the most effective way to protect it from these forms of reverse engineering and snooping is to obfuscate it.

How to obfuscate your code?
There are many ways to obfuscate code. Most focus on making variable names meaningless, encrypting strings and literals, and inserting misleading instructions so that the disassembled code cannot be recompiled.
The basic idea is to run the application through an obfuscator, a program that transforms the application into one that is functionally identical to the original but much more difficult for a malicious user to understand. Obfuscators can mangle symbol names into forms that are legal in the underlying runtime but very difficult for humans to follow. Code obfuscation can never completely protect your application from malicious reverse engineering: given enough time, a determined user will be able to dissect the application and extract important algorithms, possibly with the help of a deobfuscator utility.
Hence the level of security depends on several factors:
1. The sophistication of the transformations employed.
2. The power of the available deobfuscation algorithms.
3. The resources, such as time, available to the deobfuscator.

Obfuscators make use of several low level techniques to hide sections of the metadata and IL from prying eyes.
[Fig 1: Software protection through Obfuscation]
Obfuscation is applied to compiled .NET assemblies, not to source code. The output of the obfuscator is another set of assemblies that is functionally equivalent to the originals but transformed in ways that hinder reverse engineering.
Techniques of Obfuscation
1. Layout Obfuscation technique
Layout obfuscation alters the layout of the compiled file rather than its logic. It involves removing debug information and changing the names of elements such as classes, member variables, and local variables. The two parts of this technique, removal of non-essential metadata and renaming, are described below:
Removal of Non Essential Metadata:
Obfuscators remove debug information and non-essential metadata from a file as they process it. This reduces the size of the application and also improves protection. When code that contains debugging information is decompiled, local variable names are preserved, and any proprietary algorithms contained in the code can be easily reverse engineered.
To remove debug information from the executable:
csc /debug- somefile.cs
Not all metadata is used by the runtime; some of it exists for other tools such as debuggers, IDEs, and designers. For example, if a property "Age" is defined on a type in C#, the compiler emits metadata for the property "Age" and associates that name with the methods that implement the get and set operations, "get_Age" and "set_Age". When the age is retrieved, the compiler generates a call to "get_Age" and never refers to the property by name.
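The relationship between a property and its accessor methods can be observed with reflection. The following is a minimal sketch; the Person type and the AccessorDemo helper are hypothetical names introduced here only for illustration.

```csharp
using System;
using System.Reflection;

// Hypothetical type used only for this illustration.
public class Person
{
    public int Age { get; set; }
}

public static class AccessorDemo
{
    // Returns the names of the public get and set methods
    // that the compiler generated for the named property.
    public static string[] AccessorNames(Type type, string propertyName)
    {
        PropertyInfo property = type.GetProperty(propertyName);
        if (property == null)
            throw new ArgumentException("No public property named " + propertyName);

        MethodInfo getter = property.GetGetMethod();
        MethodInfo setter = property.GetSetMethod();
        return new[]
        {
            getter != null ? getter.Name : null,
            setter != null ? setter.Name : null
        };
    }
}
```

Calling AccessorDemo.AccessorNames(typeof(Person), "Age") returns the two method names "get_Age" and "set_Age". Callers of the property are compiled into calls to these methods, which is why the property name itself is metadata that the runtime does not need.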
If the application is meant to be consumed only by the runtime and not by other tools, then any metadata that the CLR does not need can be removed. Even event names and method parameter names can be removed.
Renaming Technique:
This technique replaces all meaningful names with meaningless ones so that it is difficult for a hacker to understand the code. Typically the new name is a short string such as a single character. As the obfuscator processes the code, it simply selects the next unused identifier for each substitution. This has a big advantage over hashing the names: the new name carries no information about the old one, so the renaming cannot be reversed. After obfuscation, the hacker faces identifiers such as a and a(b) instead of amount and invoice(amount). The names become meaningless, but the program logic is preserved, so the logic itself can still be reverse engineered.
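The "next identifier" substitution described above can be sketched as follows. This is only an illustration of the idea, not how any particular obfuscator is implemented; the SequentialRenamer class and its members are hypothetical, and a real tool rewrites names inside the assembly metadata rather than in a dictionary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of sequential renaming with a name map.
public sealed class SequentialRenamer
{
    private readonly Dictionary<string, string> map = new Dictionary<string, string>();
    private int next;

    // Converts 0, 1, 2, ... into a, b, ..., z, aa, ab, ...
    // Some results (for example "as" or "if") are C# keywords; they are
    // still usable as names at the IL level, which is all the runtime needs.
    public static string NameFor(int index)
    {
        if (index < 0) throw new ArgumentOutOfRangeException("index");
        string name = "";
        int n = index;
        do
        {
            name = (char)('a' + n % 26) + name;
            n = n / 26 - 1;
        } while (n >= 0);
        return name;
    }

    // Returns the obfuscated name for an identifier, assigning the
    // next unused short name the first time the identifier is seen.
    public string Rename(string original)
    {
        string renamed;
        if (!map.TryGetValue(original, out renamed))
        {
            renamed = NameFor(next++);
            map[original] = renamed;
        }
        return renamed;
    }

    // Original-to-obfuscated pairs, suitable for writing to a mapping file.
    public IDictionary<string, string> Map
    {
        get { return map; }
    }
}
```

With a fresh renamer, Rename("amount") yields "a", Rename("invoice") yields "b", and a second Rename("amount") yields "a" again. Nothing in "a" allows the original name to be recovered without the map.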
2. Overload Induction Technique
This technique provides a deeper level of obfuscation. It is a patented algorithm devised by PreEmptive Solutions, Inc. On top of the trivial renaming described above, it overloads methods to a large extent: overload induction renames as many methods as possible to the same name. With this level of obfuscation the logic is preserved, but reverse engineering becomes very difficult.
Example:
Code for the SayHello() method before obfuscation:
public void Friendly::SayHello() {
    Console.WriteLine( "Hello, my name is {0}", myName );
}
The obfuscated code for SayHello():
public void a::a() {
    Console.WriteLine("Hello, my name is {0}", a);
}
Second example:
private void CalcPayroll(SpecialList employeeGroup) {
    while (employeeGroup.HasMore()) {
        employee = employeeGroup.GetNext(true);
        employee.UpdateSalary();
        DistributeCheck(employee);
    }
}
After overload induction:
private void a(a b) {
    while (b.a()) {
        a = b.a(true);
        a.a();
        a(a);
    }
}
As the examples above show, the obfuscated code is more compact, and the renaming also reduces the size of the file. For example, a name that is 50 characters long is reduced to 1 character, which saves a lot of space. Typically, an overload-induced project will have up to 35% of its methods renamed to a().
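Sharing one name between many methods works because a method is identified by its full signature, not by its name alone. The sketch below shows three unrelated methods that all carry the name a; the class B and its methods are hypothetical, and this is only the language-level idea, not the patented algorithm itself.

```csharp
using System;

// Hypothetical class: three unrelated methods that share one name.
// They remain distinct because their parameter lists differ.
public class B
{
    // Might originally have been called Describe().
    public string a()
    {
        return "no arguments";
    }

    // Might originally have been called FormatAmount(int amount).
    public string a(int b)
    {
        return "int " + b;
    }

    // Might originally have been called Greet(string name).
    public string a(string b)
    {
        return "string " + b;
    }
}
```

Each call site still binds to exactly one method: new B().a() selects the first, new B().a(5) the second, and new B().a("x") the third. A reader of the decompiled code, however, sees only the name a everywhere.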
3. Control-Flow Obfuscation
One of the more advanced obfuscation techniques available today is control-flow obfuscation. This process synthesizes branching, conditional, and iterative constructs that leave the program's behavior intact when it runs, but that cause a decompiler to produce tangled or even invalid source when decompilation is attempted.
Example:
public int CompareTo( object o ) {
    Frequency f = o as Frequency;
    if ( f == null )
        return -1;
    if ( m_Comparer == null )
        return m_Letter.CompareTo(f.Letter);
    return m_Comparer.Compare(this,o);
}
After applying the control-flow obfuscation technique:
public virtual int a(object A_0) {
    g local0;
    int local1;

    local0 = A_0 as g;
    if (local0 != null)
        goto i1;
    goto i2;
    while (true) {
        local1 = this.a.CompareTo(local0.c());
        goto i3;
        i1: if (g.c != null)
            goto i4;
    }
    i2: local1 = -1;
    goto i3;
    i4: local1 = g.c.Compare(this, A_0);
    i3: return local1;
}
4. Incremental Obfuscation
An advanced feature called incremental obfuscation is of particular interest to enterprise development teams maintaining an integrated application environment. By generating name-mapping records during an obfuscation run, obfuscated API names can be reapplied and preserved in successive runs. A partial build can be done with the full expectation that its access points will be renamed the same as a prior build. As a result, the distributed patch files integrate into the previously deployed system without a hitch.
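The idea of reapplying names from a previous run can be sketched as a renamer that is seeded with the earlier name map. This is an assumption about how such a feature could work in principle, not a description of any vendor's implementation; the IncrementalRenamer class and the "m0", "m1" naming scheme are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: reuse names from an earlier obfuscation run.
public sealed class IncrementalRenamer
{
    private readonly Dictionary<string, string> map;
    private readonly HashSet<string> used;
    private int next;

    // previousMap holds original-to-obfuscated names from the prior build.
    public IncrementalRenamer(IDictionary<string, string> previousMap)
    {
        map = new Dictionary<string, string>(previousMap);
        used = new HashSet<string>(map.Values);
    }

    public string Rename(string original)
    {
        string renamed;
        // An identifier seen in the prior build keeps its old obfuscated name.
        if (map.TryGetValue(original, out renamed))
            return renamed;

        // A new identifier gets a fresh name that no prior entry already uses.
        do
        {
            renamed = "m" + next++;
        } while (used.Contains(renamed));

        used.Add(renamed);
        map[original] = renamed;
        return renamed;
    }

    // The updated map, to be saved for the next run.
    public IDictionary<string, string> Map
    {
        get { return map; }
    }
}
```

If the previous map contains CalcPayroll mapped to "m0", then Rename("CalcPayroll") returns "m0" again, while a method added since the last build, such as DistributeCheck, receives "m1". Code compiled against the earlier obfuscated names therefore keeps working with the patched assembly.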
Minimum functionality required of an obfuscator:
At a minimum, an obfuscator should be able to:
• Remove debug information
• Rename identifiers to be meaningless
• Offer configurable renaming, so that you can choose which identifiers are renamed and which are preserved
• Generate a mapping file so you can map original names to obfuscated names
CONCLUSION
While no software is safe from reverse engineering given enough time, patience, and persistence on the part of the reverse engineer, MSIL code is especially susceptible. Because MSIL code is architecture-neutral and the assembly carries a rich set of metadata, decompiling MSIL can very nearly yield the original source. An MSIL obfuscator, however, can rename classes, member variables, and methods, making their names meaningless. A sophisticated obfuscator can even alter control flow, making decompiled code even harder to read.

