home *** CD-ROM | disk | FTP | other *** search
- Date: Fri, 30 Apr 1993 19:43:34 -0400
- From: tyetiser@umbc.edu (Mr. Tarkan Yetiser)
- Subject: Polymorphic Viruses
-
- Polymorphic Viruses: Implementation, Detection, and Protection
-
- Copyright (c) 1993
-
- by
-
- VDS Advanced Research Group
- P.O. Box 9393
- Baltimore, MD 21228, U.S.A.
-
- prepared by
-
- Tarkan Yetiser
-
- e-mail: tyetiser@umbc5.umbc.edu
-
- Jan 24, 1993
- PA, U.S.A.
-
-
- Summary
-
- This paper discusses the subject of polymorphic engines and viruses. It looks
- at general characteristics of polymorphism as currently implemented. It tries
- to maintain a practical presentation of the subject matter rather than an
- academic and abstract approach that would confuse many people. Basic
- knowledge of the Intel 80x86 instruction set will be highly useful in
- understanding the material presented. A very detailed discussion is avoided
- not to have the side effect of "teaching" how to create polymorphic engines
- or viruses. The purpose is to help computer professionals understand this
- trend of virus development and the threats it poses. It should serve as a
- starting point for individuals who would like to get an idea about the
- polymorphic viruses and how they are implemented. Long gone are the days of
- innocence, when any schoolboy could write a virus scanner using a few
- signatures extracted from captured virus samples.
-
- The subject of polymorphism can be extended to other areas such as
- anti-reverse-engineering or anti-direct-attacks, and it can be argued to be
- useful in that context. This paper only looks at the use of polymorphism in
- PC viruses to avoid simple detection techniques.
-
-
-
- 1. Introduction
-
- In the Spring of 1992, we analyzed a polymorphic engine called MtE, and
- provided a report to satisfy the curiosity of the public. We also provided a
- freeware program called CatchMtE. At the end of 1992, we received reports
- of a new polymorphic engine, called TridenT Polymorphic Engine, or TPE for
- short. It was released in Europe by someone who calls himself Masud Khafir.
-
- Both MtE and TPE are distributed as object modules that can be linked to
- programs allowing them to create different-looking decryptors. One notorious
- use of such a module is for writing so-called "polymorphic" viruses. Prior
- to the appearance of TPE, several viruses using MtE (Mutation Engine) have
- been seen. The claimed author of TPE pays tribute to the author of MtE in the
- documentation that comes with TPE.
-
- MtE is about 2.4 kilobytes in size. It can generate decyptors using based
- or indexed addressing modes with word-size displacement. The decryptor steps
- through the code a word at a time. It uses 4 variations of the based or
- indexed addressing modes. The structure of the decryptors is constant.
-
- TPE is about 1.5 kilobytes in size. It can generate decryptors using based
- or indexed addressing modes with or without displacement. Unlike MtE, TPE
- can also create byte-at-a-time decryptors, as well as word-at-a-time. It also
- uses more addressing modes available on the 80x86; 6 variations of the based
- or indexed addressing modes are used. Its more general nature makes TPE less
- predictable, and complicates the task of recognizing TPE-based viruses. Many
- encryptive viruses can be considered a subset of TPE-based decryptors, and
- may be flagged as such. To overcome this problem, one has to check for other
- viruses before performing check for the presence of a TPE decryptor.
-
- 2. Polymorphism and Its Common Use
-
- We will now present some preliminary information on polymorphism in
- general, and discuss certain features of the Intel 80x86 instruction set.
-
- Although polymorphism is independent of encryption, it is easier to use
- encryption to hide the main body of the virus and implement a polymorphic
- decryptor. Viruses aim to keep their size as small as possible and it is
- impractical to make the main virus body polymorphic. One could attempt to
- rearrange the instructions in the main virus body or even use different
- instructions (to defy recognition techniques based on checksumming). Such an
- effort would not be as helpful or as easy as the common approach of using
- encryption in combination with polymorphism. In summary, current polymorphic
- viruses keep the main virus body encrypted, and implement a polymorphic
- decryption routine in plaintext. Since the decryptor is comparatively small,
- and performs one specific task, the amount of time and effort needed to
- craft a polymorphic virus is significantly reduced. Pictorially, a generic
- polymorphic virus would be structured as follows:
-
-
- virus entry point:
- Polymorphic Decryptor
- *****************************
- ** encrypted ****************
- ** main virus body **********
- *****************************
- *****************************
-
-
- 3. Implementation of Polymorphism on the Intel 80x86
-
- In almost every case we have examined, the polymorphic engine exploits the
- fact that certain computations can be performed using different registers and
- instructions. To step through encrypted portion of code, for example, one
- can use DI, SI, or BX registers. To increment or to decrement the index value,
- one can ADD to the index register, INC it, or use an implicit instruction that
- increments it (CMPSB is used in TPE for example).
-
- Polymorphic engines can also rely on the availability of instructions that
- are coded using the same opcodes. On the 80x86, there are 11 opcodes used for
- several different instructions: 80, 81, 83, D0, D1, D2, D3, F6, F7, FE, and
- FF. When it is necessary to encode information about one operand, the middle
- three bits of the ModRM byte are used to distinguish operations. The ModRM
- byte follows some opcodes and can either extend the opcode or contain
- information about the operand and addressing modes of an instruction. It
- contains three fields: Mod (2 bits), Reg (3 bits), and R/M (3 bits). For
- example, for the opcode 80, the middle three bits of the ModRM byte, which
- make up the Reg field, have the following meanings:
-
- ADD 000
- OR 001
- ADC 010
- SBB 011
- AND 100
- SUB 101
- XOR 110
- CMP 111
-
- The second operand of each instruction with an opcode of 80 is an
- immediate byte, so the ModRM fields in the second byte of machine code encode
- the first register or memory operand.
-
- By flipping a few bits, it is possible to generate code achieving many
- different operations easily. Combined with a rich set of addressing modes,
- and a good random number generator, one can create very different-looking
- decryptors.
-
- 4. Common Characteristics of Polymorphic Viruses
-
- Polymorphic viruses of varying degrees of complexity have appeared in the
- past. In the anti-virus community, polymorphism is still an active area of
- discussion and much disagreement. Most researchers would agree that certain
- viruses are simply encryptive with a variable key. We do not consider such
- beasts polymorphic since they can be recognized using simple wild card scan
- strings. The number of such viruses significantly increased over the past few
- years as a reaction to the worldwide availability of good signature-based
- virus scanners. Similarly, scanners also evolved to handle such beasts using
- more flexible signatures that can include wild card or don't-care values to
- accommodate the variable parts of the decryptors.
-
- There are other viruses that exhibit some polymorphic behavior, though
- they remain unsophisticated. Classification of such beasts is a gray area.
- For example, some viruses can generate a limited number of different-looking
- decryptors. These do not implement a polymorphic code generation engine, but
- rather pick from a pre-computed set of code fragments that they carry along.
- Writing detection routines for such viruses is not too difficult.
-
- To aid in implementing polymorphism, a random-number generator is often
- used. A random-number generator provides selection of a part of the decryptor.
- This selection could be for "noise" instructions that are inserted in the
- decryptor to render signature-based scanning ineffective. It could also be
- used to select a certain addressing mode and appropriate registers. The
- obvious use of random number generators is to get a seed value for encryption.
-
- Another aspect of a polymorphic engine is the choice of instructions that
- actually perform the decryption. XOR is a favorite choice. The chosen
- operation modifies the code to be decrypted using an addressing mode that
- allows external memory access. By external, we mean outside the processor
- registers. By using several instructions and many different addressing modes,
- a polymorphic engine can achieve a large number of combinations of decryptors.
-
- Usually, a loop construct is set up to step through the encrypted code. On
- the 80x86, many instructions can be used to accomplish looping. The loop
- counter can be modified implicitly using LOOP instruction. Or it could be
- more explicit using a DEC CX, JNZ ?? combination. Several such possibilities
- exist. Again, availability of different approaches increases the number of
- appearances a decryptor can have. In the MtE, the loop construct was very
- predictable not only because it used a specific instruction but also because
- it had a characteristic that made it very simple to recognize MtE-based
- decryptors. Although "appearance-based" analysis on the MtE left many
- anti-virus developers in despair, a structural analysis proved to be very
- effective. We doubt even the developer of MtE noticed how predictable his
- polymorphic engine is.
-
- The goal of a polymorphic engine is not to leave any predictable sequence
- of instructions that a virus scanner can use to simplify or optimize an
- appropriate detection algorithm. For example, using the same instruction to
- increment the index register is too predictable. A mixture of instructions
- that achieve the same result explicitly or implicitly (as in TPE) would make
- detection a lot harder.
-
- 5. Detection of Polymorphic Viruses
-
- There are several problems that must be addressed to develop a reliable
- detection routine for a given polymorphic engine. The most challenging one
- appears to be avoiding false positives. Many programs include a run-time
- decompressor to reduce space requirements. When loaded in memory, the
- decompressor takes charge and produces an expanded copy of the code that can
- be run on the CPU. Furthermore, some programs are actually encrypted on disk,
- and they are decrypted on the fly when they are loaded. The purpose of using
- such a scheme is to make reverse-engineering efforts difficult. Such programs
- are more likely to trigger a false positive.
-
- To reduce false positive rate, one can limit the search to a small area of
- the suspect file. This would be helpful in many cases; however, the compressed
- and encrypted files carry their decompressor/decryptor at the entry point of
- the program, just where many viruses plant their code. Another trick a few
- viruses employ is to hide the virus entry point where the decryptor is
- located. Of course, scattering the virus code while keeping the host
- functional complicates the virus.
-
- A structural analysis of TPE reveals some information that can be used to
- recognize a TPE-based decryptor, although it is not as useful as it is in the
- case of MtE. Note that structural analysis does not rely on presence of
- specific byte sequences, and therefore offers a powerful tool that can be
- used in developing recognition routine for many polymorphic engines.
-
- 6. Protection Mechanisms and Solutions
-
- Anti-virus field is full of solutions to almost every existing virus
- problem, and sometimes solutions to non-existing problems as well. Among
- other things, one solution prevailed as the most popular: scanning. The
- inherent flaw in this solution is that it cannot cope with viruses that it
- does not have signatures for. Therefore ugrades are issued frequently.
- Deployment of such upgrades in large installations requires tremendous
- effort, and quickly translates into having a very old version of a scanner
- installed; even then not necessarily used. Aside from the false sense of
- security scanning gives, some companies make less-than-honest claims about
- the capabilities of their scanners to promote their products. Testing such
- claims is beyond the resources of most organizations. Those who could conduct
- such tests are not always impartial.
-
- Scanning has its place, and it is very useful when used properly. The best
- protection against computer viruses is user awareness and education.
- Unfortunately, a well-written virus can be so transparent that even the most
- observant user may not notice a thing. Even worse, many users lack the
- technical knowledge to understand how their computers work. Most people simply
- do not have the time or desire to learn more than how to use a mouse. The
- proof of this is the proliferation of products that make excessive hardware
- demands without offering improved functionality. The bulk of the code takes
- care of the user interface. This trend is unlikely to change.
-
- More powerful techniques such as integrity checking can deal with viruses
- of different kinds, including polymorphic viruses. Even a simple integrity
- checker should be able to find if a polymorphic virus is spreading all over
- your disk. In other words, solution to polymorphic viral spread has been
- available for quite some time. Note that there are some viruses that target
- integrity-based products and they must be dealt with accordingly. A detailed
- discussion of such viral methods can be found in a paper written by anti-virus
- researcher Mr. Vesselin Bontchev of Virus Test Center at the University of
- Hamburg. His paper is titled Attacks Against Integrity Checkers, and it is
- available via anonymous FTP. Interested individuals are encouraged to read
- his excellent paper.
-
- The point is that an integrity-based anti-viral solution should be made
- part of your arsenal against viruses. Such a solution could easily provide
- you with early detection of viruses before they spread. Once the viral spread
- is controlled, viruses become nothing more than another "computer glitch" that
- can haunt only those without timely backups.
-
- We believe that neither harsh legislation nor emphasis on responsible
- computing can stop virus development, although they may slow it down. It is
- necessary to take matters into your own hands and protect your computers
- adequately.
-