A Guide to Malware Analysis: Day 2

On day two of Malware Analysis, fairycn unveils program static analysis, string checking, PE structure, and linking methods, including tools and techniques for shelling, obfuscation, and anti-obfuscation.


Note: This article is an English translation of an article written in another language. If you are having trouble reading it, I will improve the translation.


On day two of malware analysis, I will cover the following topics:
• Program static analysis
• String checking
• Coding, encoding, and decoding
• PE checking, PE structure, and PE analysis tools
• Linking libraries and functions
• Dynamic linking (load-time, run-time)
• Hash function and hash value
• Shelling and shell removal
• Obfuscation and anti-obfuscation

Continue reading to learn what these topics mean and how they are used in malware analysis, including the tools and techniques involved.

Program Static Analysis

Program static analysis is a code analysis technique that examines program code without executing it, using methods such as the following:

• Lexical analysis
• Parsing
• Control flow
• Data-flow analysis


It checks whether the code meets given standards for security, reliability, maintainability, and other indicators, all without executing the code. The same technique can be used to judge whether software contains a virus, usually by examining it from the aspects listed above.

String Checking

When a program is not running, tools can be used to extract its strings and look for suspicious information that helps determine whether it contains a virus. The principle and analysis are as follows:

A string is a sequence of characters such as digits, letters, and underscores. In programs it carries names, descriptions, messages, and similar text. A string is stored much like an array of characters, so each individual character can be extracted.

A string is commonly used for:

• Output information
• URL addresses
• File names
• Path information


Because computers can only work with binary digits (0 and 1), characters must be encoded as numbers before a string can be stored or processed.

Coding, Encoding, and Decoding

Coding, or programming, is the process of writing instructions for a computer to perform tasks in a specific programming language. Coding also refers to using a pre-defined method to represent text, numbers, or other objects as numeric codes, or to convert information or data into defined electrical pulse signals.

Encoding is the process of converting information from one form or format to another. It is widely used in computers, television, remote control, and communications. Decoding is the reverse process: converting encoded data back to its original form.

Four common character encodings are ASCII, Unicode, GB 2312, and GBK.

1) ASCII: A computer coding system based on the Latin alphabet, mainly used to display modern English and other Western European languages.

2) Unicode: An industry standard in computing that includes character sets and encoding schemes. Unicode was created to address the limitations of traditional character encoding schemes by assigning a uniform and unique binary code to every character in every language, which meets the needs of cross-language and cross-platform text conversion and processing.

3) GB 2312: It is used for the exchange of information between Chinese character processing and Chinese character communication systems and is commonly used in mainland China; it is also used in Singapore and other places. Almost all Chinese systems and internationalised software in mainland China support GB 2312.

4) GBK: The GBK encoding standard is compatible with GB 2312. It contains 21,003 Chinese characters and 883 symbols, provides 1,894 user-defined code points, and combines simplified and traditional characters in a single character set.
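To make the four encodings concrete, here is a small Python sketch. Python and the sample words are my own choice for illustration and are not part of the original article. It encodes an English word and a Chinese word, prints the resulting bytes, and shows that decoding restores the original text.

```python
# Encode sample text with the encodings discussed above, then decode it again.
text_en = "virus"
text_zh = "病毒"  # the Chinese word for "virus" (sample text chosen for this sketch)

ascii_bytes = text_en.encode("ascii")       # one byte per character
utf16_bytes = text_en.encode("utf-16-le")   # a Unicode encoding; two bytes per character here
gb2312_bytes = text_zh.encode("gb2312")     # two bytes per Chinese character
gbk_bytes = text_zh.encode("gbk")           # GBK keeps the GB 2312 codes for these characters

for label, raw in [("ASCII", ascii_bytes), ("UTF-16LE", utf16_bytes),
                   ("GB 2312", gb2312_bytes), ("GBK", gbk_bytes)]:
    print(f"{label:9}", raw.hex(" "))

# Decoding is the reverse of encoding.
decoded = gbk_bytes.decode("gbk")

# ASCII has no codes for Chinese characters, so this encoding attempt fails.
try:
    text_zh.encode("ascii")
    ascii_can_encode_chinese = True
except UnicodeEncodeError:
    ascii_can_encode_chinese = False
```

The same bytes mean different things under different encodings, which is why an analyst has to guess or know the encoding before reading strings out of a binary.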

Tools for string extraction

To extract strings from a binary file, you can use the following tool:

Strings official website: https://docs.microsoft.com/zh-cn/sysinternals/downloads/strings
Function: Finds printable strings in object files or binary files.
Drawback: Keep in mind that it ignores context and formatting. The bytes it reports as a string may actually be a memory address, a sequence of CPU instructions, a piece of data, etc.
Limitations: It only reports printable strings of three or more consecutive ASCII or Unicode characters ending in a zero terminator.
Countermeasure: Computer viruses can exploit this restriction to keep Strings from finding useful strings (e.g. by splitting every string into two-character pieces and stitching them together at run time).
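The behaviour described above can be sketched in a few lines of Python. This is a simplified imitation written for this article, not the actual implementation of the Strings tool. The function name `extract_strings`, the sample bytes, and the example URL are all assumptions made for illustration.

```python
import re

def extract_strings(data: bytes, min_len: int = 3) -> list[str]:
    """Return printable ASCII and UTF-16LE runs of at least min_len characters."""
    ascii_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # In UTF-16LE, each printable ASCII character is followed by a zero byte.
    wide_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_pat.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_pat.finditer(data)]
    return found

# A made-up binary blob holding one ASCII string and one Unicode string.
sample = (b"\x01\x02http://example.test/payload\x00\xff\xfe"
          + "cmd.exe".encode("utf-16-le") + b"\x00\x00\xff")
visible = extract_strings(sample)

# The countermeasure: store the URL as two-character pieces, each zero-terminated.
# No run reaches three characters, so nothing is reported.
hidden = b"ht\x00tp\x00:/\x00/e\x00vi\x00l\x00"
missed = extract_strings(hidden)

# At run time the virus simply stitches the pieces back together.
stitched = "".join(piece.decode("ascii") for piece in hidden.split(b"\x00") if piece)

print(visible)
print(missed)
print(stitched)
```

Because the extractor looks at raw bytes only, it has no way to know that the short pieces belong together, which is exactly the restriction the countermeasure exploits.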

Tip: Change the file extension before searching so that the sample cannot be run by accident.

For example:

Be careful when looking up URLs or connecting to unknown IP addresses found in a sample, as they may host viruses. When visiting such websites, make sure you have adequate protection, such as real-time access scanning, so that your device is not infected by a virus or malware, for instance through a drive-by download (a web page with a planted trojan).

PE Checking

Most file-infecting viruses infect PE files, which lets them run their own code whenever the PE file runs. The virus can then go on to infect other normal files and spread itself. From an antivirus point of view, you should therefore first determine whether a file has a PE structure, and then decide which method to use to scan it.

Here's how you can determine whether a file is PE structured or not, beginning with the concept of PE:

PE concept: PE (Portable Executable) is the generic term for executable files under Windows, commonly seen as DLL, EXE, OCX, SYS, etc.
Scope: Windows executable programs and dynamic link libraries.
Contents: Crucial information on how Windows loads the file from the hard disk into memory for execution.

Whether a file is a PE file has nothing to do with its extension: PE files can have any extension.

So how does Windows distinguish between executable and non-executable files?

We call LoadLibrary and pass a filename.

How does the system determine that this file is a legitimate dynamic library?

This is where the PE file structure comes in.

The PE structure

This specification describes the structure of executable (image) files and object files under the Windows family of operating systems. These files are referred to as Portable Executable (PE) and Common Object File Format (COFF) files, respectively.
https://learn.microsoft.com/en-us/windows/win32/debug/pe-format
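As a minimal sketch of how a file can be recognised as PE regardless of its extension, the Python function below checks the two signatures defined in the specification linked above: the "MZ" bytes at the start of the file and the "PE\0\0" signature found at the offset stored at 0x3C. The function name `is_pe` and the hand-built test bytes are my own and only serve as an illustration. A real loader performs many more checks.

```python
import struct

def is_pe(data: bytes) -> bool:
    """Return True if data has an MZ header that points to a PE signature."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # The 4-byte little-endian value at 0x3C is the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"

# Build a tiny fake header by hand (demonstration data, not a working program).
fake = bytearray(0x80)
fake[:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x40)
fake[0x40:0x44] = b"PE\x00\x00"

print(is_pe(bytes(fake)))          # header with both signatures
print(is_pe(b"%PDF-1.7 ..."))      # some other file type
```

To check a file on disk, read it with `open(path, "rb").read()` and pass the bytes to `is_pe`. The file name plays no part in the decision.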

Commonly used PE analysis tools

PE-bear: https://github.com/hasherezade/pe-bear
PEview (roguekillerpe): https://www.adlice.com/download/roguekillerpe/
PPEE: https://mzrst.com/
CFF Explorer: https://ntcore.com/?page_id=388
PE Explorer: http://www.heaventools.com/overview.htm

Checking and killing techniques:

Identifying Abnormal Program Entry Points in PE Files
After infecting a Portable Executable (PE) file, many viruses add a portion of their code to the file and then change AddressOfEntryPoint in the PE header so that it points to the inserted code. Whenever the file is run, the virus code then runs first.

In general, many viruses append the inserted code to the end of the PE file and finish it with a jump back to the real entry point of the original program. This lets the virus code run without the user noticing.

Anti-virus software can determine whether a file is suspected of being infected by a virus based on whether the entry point of the PE file is abnormal.

If the entry point of a PE file points somewhere unusual, such as code placed at the end of the file, then the file is suspected of being infected by a virus. Of course, this judgement is not always accurate, but it can serve as one basis for a decision. The heuristic scan mentioned in the previous article uses such features to help identify unknown viruses.

To evade this kind of detection, some viruses change the program flow without modifying the entry point. For example, they overwrite the code at the original entry point so that it jumps to the virus body.
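The entry point check can be sketched as follows. The article does not spell out what counts as a normal entry point, so this sketch assumes three rules of my own: the entry point should lie in a section that is executable, that is not writable, and that is not the last section of the file. The function names and the hand-built header are hypothetical and for demonstration only. Field offsets follow the Microsoft PE format specification.

```python
import struct

IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_WRITE = 0x80000000

def entry_point_findings(data: bytes) -> list[str]:
    """Return heuristic warnings about the section AddressOfEntryPoint falls in."""
    (pe_off,) = struct.unpack_from("<I", data, 0x3C)
    if data[:2] != b"MZ" or data[pe_off:pe_off + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")
    (num_sections,) = struct.unpack_from("<H", data, pe_off + 6)
    (opt_size,) = struct.unpack_from("<H", data, pe_off + 20)
    # AddressOfEntryPoint sits 16 bytes into the optional header.
    (entry_rva,) = struct.unpack_from("<I", data, pe_off + 24 + 16)
    table = pe_off + 24 + opt_size  # the section table follows the optional header
    for index in range(num_sections):
        base = table + 40 * index
        name, vsize, vaddr, rsize = struct.unpack_from("<8sIII", data, base)
        (flags,) = struct.unpack_from("<I", data, base + 36)
        if vaddr <= entry_rva < vaddr + max(vsize, rsize):
            label = name.rstrip(b"\x00").decode("ascii", "replace")
            findings = []
            if not flags & IMAGE_SCN_MEM_EXECUTE:
                findings.append(f"entry point is in non-executable section {label}")
            if flags & IMAGE_SCN_MEM_WRITE:
                findings.append(f"entry point is in writable section {label}")
            if num_sections > 1 and index == num_sections - 1:
                findings.append(f"entry point is in the last section {label}")
            return findings
    return ["entry point is outside every section"]

def build_fake_pe(entry_rva: int) -> bytes:
    """Build a two-section PE-like header for demonstration (hypothetical helper)."""
    pe_off, opt_size = 0x40, 0xE0
    buf = bytearray(pe_off + 24 + opt_size + 2 * 40)
    buf[:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, pe_off)
    buf[pe_off:pe_off + 4] = b"PE\x00\x00"
    struct.pack_into("<H", buf, pe_off + 6, 2)           # NumberOfSections
    struct.pack_into("<H", buf, pe_off + 20, opt_size)   # SizeOfOptionalHeader
    struct.pack_into("<I", buf, pe_off + 24 + 16, entry_rva)
    table = pe_off + 24 + opt_size
    struct.pack_into("<8sIIII", buf, table, b".text", 0x1000, 0x1000, 0x1000, 0x400)
    struct.pack_into("<I", buf, table + 36, 0x60000020)  # code, execute, read
    struct.pack_into("<8sIIII", buf, table + 40, b".data", 0x1000, 0x2000, 0x1000, 0x1400)
    struct.pack_into("<I", buf, table + 76, 0xC0000040)  # data, read, write
    return bytes(buf)

normal = entry_point_findings(build_fake_pe(0x1010))      # inside .text
suspicious = entry_point_findings(build_fake_pe(0x2010))  # inside .data
print(normal)
print(suspicious)
```

As the text says, a warning here is only a hint. Packed but harmless programs can trigger the same rules, and a virus that patches the original entry point code will pass them.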

Extracting feature codes based on PE structure
Feature codes (signatures) are extracted by dividing the file into equal parts and taking a fixed length of content from each part as a feature code. The problem with this method is that many files share similar content: a large part of the beginning of many PE files is identical, so dividing the file into equal parts is not an ideal way to extract features.

Instead, we can use the PE structure: extract a certain amount of content from each section as feature codes, or use key points in the structure as references and look for feature codes near them. This largely avoids the drawbacks of the equal-division method described above and makes the feature codes of different viruses more distinct. For example, to detect the CIH virus, features near the PE header and near the entry point are examined.

Identification of CIH virus
There are three characteristics:
1) If the first byte of the PE header is non-zero, the file is likely to be infected. CIH itself uses this byte to decide whether a file is already infected. However, this feature is not always reliable, as files not infected with CIH may also have a non-zero value here for various reasons, so two additional code features are added.
2) The CIH virus changes the code entry point in the PE header to point to itself. This ensures that the virus code is executed whenever the infected file is run.
3) In addition, CIH exhibits distinctive behaviours: besides changing the entry point offset, it executes the sidt instruction and installs a file system hook. By identifying these features together, we can reliably detect the CIH virus.

Of course, all three features are concentrated at the head of the virus. To be more reliable and avoid false positives between variants of the same family, we can also add feature codes taken from code further back in the virus body.

Linking libraries and functions

Linked libraries and functions provide a great deal of useful information for virus analysis. So why do computer viruses use them, and how?

To answer this question, know the following:
Why viruses use them: A virus uses the import table in the PE structure to bring the link libraries and functions it needs into memory, and then calls functions in those dynamic link libraries to carry out its work.

What linking is: Linking is the process of combining various pieces of code and data into a single file that can be loaded (copied) into memory and executed. The problem that linking solves is integrating our own code with a library written by someone else.

In the following sections, I will cover various linking methods.

Linking Methods

Static linking: It is the least common method of linking code libraries on Windows platforms, but is more common in UNIX and Linux programs.

  1. What: The binary code for all required functions is included in the executable file when it is generated (link time). The linker therefore needs to know which functions each object file taking part in the link requires, and which functions each object file provides, so that it can tell whether every required function can be linked correctly.

    If a function required by one object file is not found in any of the participating object files, the linker reports an error.

    Two important structures in an object file provide this information: the symbol table and the relocation table.
    When a library is statically linked to an executable, the code in this library is copied into the executable.

  2. Advantage: The program has no library dependencies at release time, i.e. no libraries need to be shipped with it, and the application can run on its own.
  3. Disadvantages: The PE file header contains no information about the linked library. The executable is larger and takes up more memory, and if the static library is updated, every executable must be re-linked to use the new version. Computer viruses normally avoid this linking method in order to keep the virus small.
  4. Link time: When the executable is generated (linking is done as part of the build).


Dynamic Linking

Dynamic linking: Dynamic linking is the most common method and the one malicious code analysts should care about most. Dynamic linking information is written in the import table, and when a library is dynamically linked, the host operating system searches for the required library when the program is loaded.

  1. Features: Instead of copying the library's executable code into the program at build time, the program records a series of symbols and parameters, which are passed to the operating system when the program is loaded or run. The operating system loads the required dynamic libraries into memory, and when the program reaches the relevant code it executes the shared library code already loaded in memory, which achieves linking at run time.
  2. Advantage: Multiple programs can share the same piece of code without storing multiple copies on disk.
  3. Disadvantage: Because the library is loaded when the program starts or runs, it may slow program start-up.
  4. Link time: When the program is loaded or running.

Load-time dynamic linking

With load-time dynamic linking, the DLLs and functions a program needs are recorded in its import table. When the program is loaded, the system locates each DLL using the DLL search order, maps it into the process's virtual address space, and resolves the imported functions before the program starts running.

Run-time linking

With run-time linking, the program loads a DLL itself by calling the LoadLibrary or LoadLibraryEx function. The system tries to locate the DLL using the DLL search order; if found, the system maps the DLL module into the process's virtual address space and increases the reference count. If the code of the specified DLL is already mapped to the virtual address space of the calling process, the function returns only the handle to the DLL and increases the DLL reference count.
Note: Two DLLs with the same filename and extension but not in the same directory are not considered to be the same DLL.

Although run-time linking is not popular in legitimate programs, it is commonly used in malicious code, especially when the malicious code is shelled (packed) or obfuscated. Shelling or obfuscation destroys the virus's import table, and without it Windows will not do the linking work for the virus. The virus therefore has to use run-time linking to load the libraries and functions it needs into memory while it runs.

  1. Features: Links only when needed.
  2. Advantage: Executable programs using run-time linking only link to a library when one of its functions is needed, rather than at program startup as in load-time dynamic linking.
  3. Disadvantage: The program has to call the relevant functions itself to do the linking.
  4. Link time: When a function call is encountered.

Link-based analysis:
The PE file header lists all dynamic link libraries and functions required by the computer virus code. Dynamic link library and function names can be used to analyse the function of a computer virus.

Commonly used analytical tools

  • Dependency Walker: Included in some versions of Visual Studio and other Microsoft development packages. It lists the dynamically linked libraries and functions of an executable file.

Common functions in viruses:

  1. LoadLibrary: Loads a dynamic link library from the hard disk into the memory space of the calling process (here, the virus).
  2. GetProcAddress: Finds the address of a given function in a loaded DLL.
  3. URLDownloadToFile: Downloads a file from the Internet.

    Import functions
    The PE file header also records the specific functions the executable uses. The import table only shows function names, so to understand a function's parameters, purpose and usage, look it up in Microsoft's MSDN documentation or with a search engine.

    Export functions
    Like import functions, the export functions of DLLs and EXEs are used to interact with other programs and code. Usually a DLL implements one or more functions and exports them so that other programs can import and use them. The PE file also records which functions a file exports.
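Once the imported function names have been listed with one of the tools above, they can be compared against a watchlist. The Python sketch below uses only the three functions named under "Common functions in viruses". The names `WATCHLIST`, `flag_imports`, and `hints_at_runtime_linking`, as well as the sample import list, are made up for this illustration. Win32 functions are often imported with an A (ANSI) or W (Unicode) suffix, and sometimes an Ex variant, so the sketch accepts those endings.

```python
# Watchlist taken from the three functions described in this article.
WATCHLIST = {
    "LoadLibrary": "loads a DLL at run time",
    "GetProcAddress": "finds the address of a function in a loaded DLL",
    "URLDownloadToFile": "downloads a file from the Internet",
}
SUFFIXES = {"", "A", "W", "Ex", "ExA", "ExW"}

def flag_imports(imports: list[str]) -> dict[str, str]:
    """Map each watched import name to a short note on why it is interesting."""
    hits = {}
    for name in imports:
        for api, reason in WATCHLIST.items():
            if name.startswith(api) and name[len(api):] in SUFFIXES:
                hits[name] = reason
    return hits

def hints_at_runtime_linking(imports: list[str]) -> bool:
    """True when both LoadLibrary* and GetProcAddress are imported."""
    hits = flag_imports(imports)
    has_loader = any(name.startswith("LoadLibrary") for name in hits)
    return has_loader and "GetProcAddress" in hits

# Hypothetical import list, as an analysis tool might show it.
sample_imports = ["CreateFileW", "LoadLibraryExW", "GetProcAddress",
                  "URLDownloadToFileA", "ExitProcess"]
flagged = flag_imports(sample_imports)
for name, reason in flagged.items():
    print(f"{name}: {reason}")
print(hints_at_runtime_linking(sample_imports))
```

A hit is not proof of malice, since legitimate programs call these functions too. It only tells the analyst where to look first.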

Auxiliary detection and removal

Anti-virus software, malware checking platforms and malware analysis platforms are commonly used to assist in the checking and killing process, and they have the following advantages:

  • A virus signature database: A database containing the distinguishing features ("signatures") of known viruses. Software whose characteristics match an entry is identified as a virus. This mainly works for known viruses.
    Virus countermeasures: Virus writers can easily modify their code to change these characteristics, often using the following techniques to avoid detection by anti-virus software:
      • Polymorphic techniques: semantics unchanged, syntax obfuscated, making reverse analysis more difficult.
      • Morphing techniques: function unchanged, semantics obfuscated, making reverse analysis more difficult.
      • One-way execution techniques: values that cannot be recovered without guessing, such as hashes, making reverse analysis more difficult.
      • Junk instructions: large numbers of instructions that are useless for analysis, making reverse analysis more difficult.
  • Heuristic rules: The signature database does not contain every virus, so signature matching alone misses unknown viruses. Heuristic rules are summed up from the experience of analysing known viruses and are used to judge whether software is a virus. This mainly works for unknown viruses.
    Virus countermeasures: Developing new types of viruses whose characteristics and behaviour are not yet known to anti-virus software, thereby avoiding detection.

Hash value and hash function
When no local antivirus software is available and data traffic is restricted, you can calculate the hash value of a file and use it to check (identify) and kill (remove) malware, or simply to look the file up on certain websites.

The principle and common query platform are as follows:

A hash function is an algorithm that calculates a hash value. A hash value acts as a practically unique identifier of a file's content: it is computed from the bytes of the file, so any change to the content changes the hash, while the file name and creation date do not affect it. By calculating the hash value, you can determine whether a file has been corrupted or modified, and you can also use the value to look up existing analysis results on a query platform.
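Besides the tools listed below, a hash can be computed with a few lines of Python using the standard `hashlib` module. This is a minimal sketch. The function name `file_hashes`, the choice of MD5 and SHA-256, and the temporary sample files are assumptions made for the demonstration.

```python
import hashlib
import tempfile
from pathlib import Path

def file_hashes(path, chunk_size: int = 65536) -> dict[str, str]:
    """Return the MD5 and SHA-256 hex digests of a file, read in chunks."""
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}

# Demonstration with throwaway files: same content under two names, plus a modified copy.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "sample.exe_"
    renamed = Path(tmp) / "renamed.bin"
    modified = Path(tmp) / "modified.bin"
    original.write_bytes(b"hello")
    renamed.write_bytes(b"hello")
    modified.write_bytes(b"hello!")
    original_hashes = file_hashes(original)
    same_after_rename = original_hashes == file_hashes(renamed)
    differs_after_change = original_hashes != file_hashes(modified)

print(original_hashes["sha256"])
print(same_after_rename, differs_after_change)
```

The resulting hex string can be pasted into the search box of a query platform, which avoids uploading the file itself.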

Calculation tools:
Hasher Pro: http://www.den4b.com/
HashOnClick: https://www.2brightsparks.com
Hash Generator Pro: http://insili.co.uk/
MD5 File Hasher Pro: http://www.digital-tronic.com/md5-file-hasher/
Advanced Hash Calculator: http://www.filesweb.com/
VirusTotal (query platform): https://www.virustotal.com/gui/home/search


As virtue rises one foot, the devil rises ten

In the following sections, I will discuss the shelling and obfuscation techniques used by advanced viruses.

Read on to learn more.

Advanced viruses

Advanced viruses often use shelling (packing) and obfuscation techniques to avoid being analysed with static analysis techniques.

The purpose of shelling and obfuscation is to avoid detection by antivirus software and to make virus analysis more difficult.

Obfuscation refers to the process of concealing something important, valuable, or critical. Obfuscation hides information about computer virus programs.
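As a toy illustration of hiding information and recovering it again, the Python sketch below obfuscates a string with a single-byte XOR key and then undoes it by trying every key. XOR is my own choice of a simple stand-in technique. The article does not say which methods real viruses or the tools listed here use, and the key, the sample URL, and the function names are all hypothetical.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with the same one-byte key (applying it twice restores the data)."""
    return bytes(value ^ key for value in data)

def brute_force_xor(data: bytes, marker: bytes = b"http"):
    """Try all 256 keys and return (key, plaintext) for the first result containing marker."""
    for key in range(256):
        candidate = xor_bytes(data, key)
        if marker in candidate:
            return key, candidate
    return None

secret = b"http://example.test/payload"   # made-up string a virus might want to hide
KEY = 0x5A                                # arbitrary key chosen for this sketch

# Obfuscation: the readable URL no longer appears in the stored bytes.
obfuscated = xor_bytes(secret, KEY)

# Anti-obfuscation: an analyst who guesses the scheme can recover the string.
recovered = brute_force_xor(obfuscated)

print(obfuscated)
print(recovered)
```

Real obfuscators are far more elaborate, but the principle is the same: the information is still present and is restored at run time, which is what anti-obfuscation tools take advantage of.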

Obfuscation tools:
DotFuscator: https://www.preemptive.com/
DashO Pro: https://www.preemptive.com/
ProGuard: https://www.guardsquare.com/en/proguard
Virbox Protector: https://shell.virbox.com/
Code Virtualizer: https://www.oreans.com/
Skater .NET obfuscator: http://www.rustemsoft.com/

Shelling

Virus shelling: Compressing computer virus files to reduce their size and protecting the core code of the virus with encryption techniques. Shelling is also known as packing.

Virus shelling tools:
UPXShell: http://upxshell.sourceforge.net/download.html
DRMsoft EncryptEXE: http://www.drmsoft.com/
VMProtect: https://vmpsoft.com/

To protect against such viruses, analysts often rely on shell removal and anti-obfuscation techniques.

Shell Removal: Shell removal is the removal of a software shell, also known as unpacking. It is the process of extracting the core code from shell-protected software, and it is usually used in the analysis of malware. Shell removal can be manual or automatic.

Shell removal tools:
QuickUnpack: http://qunpack.ahteam.org/?p=458#more-458
frida-unpack: https://github.com/WeiEast/frida-unpack
de4dot: https://github.com/0xd4d/de4dot
drizzleDumper: https://github.com/DrizzleRisk/drizzleDumper
de4js: https://github.com/lelinhtinh/de4js
wxappUnpacker: https://github.com/gzh4213/wxappUnpacker
Android_unpacker: https://github.com/CheckPointSW/android_unpacker
unpacker: https://github.com/malwaremusings/unpacker

Anti-obfuscation

Anti-obfuscation: restoring obfuscated code to a clean, highly readable state.

Anti-obfuscation tools (commonly used)
simplify: https://github.com/CalebFenton/simplify
de4dot: https://github.com/0xd4d/de4dot
flare-floss: https://github.com/fireeye/flare-floss
Tigress_protection: https://github.com/JonathanSalwan/Tigress_protection
VTIL-Core: https://github.com/vtil-project/VTIL-Core
dex-oracle: https://github.com/CalebFenton/dex-oracle
malware-jail: https://github.com/HynekPetrak/malware-jail
de4js: https://github.com/lelinhtinh/de4js
dnpatch: https://github.com/ioncodes/dnpatch
etacsufbo: https://github.com/ChiChou/etacsufbo
samsung-firmware-magic: https://github.com/chrivers/samsung-firmware-magic
JRemapper: https://github.com/Col-E/JRemapper