-
- so. An exception is the saving and restoring of registers at
- entrance to and exit from a subroutine; here, if the subroutine is
- long, you should probably PUSH everything which the caller may need
- saved, whether you will use the register or not, and POP it in
- reverse order at the end.
- Be aware that CALL and INT push return address information on the
- stack and RET and IRET pop it off. It is a good idea to become
- familiar with the structure of the stack.
- c. In practice, to invoke system services you will use the INT
- instruction. It is quite possible to use this instruction effec-
- tively in a cookbook fashion without knowing precisely how it
- works.
- d. The transfer of control instructions (CALL, RET, JMP) deserve care-
- ful study to avoid confusion. You will learn that these can be
- classified as follows:
- 1) all three have the capability of being either NEAR (CS register
- unchanged) or FAR (CS register changed)
- 2) JMPs and CALLs can be DIRECT (target is assembled into instruc-
- tion) or INDIRECT (target fetched from memory or register)
- 3) if NEAR and DIRECT, a JMP can be SHORT (less than 128 bytes
- away) or LONG
- In general, the third issue is not worth worrying about. On a for-
- ward jump which is clearly VERY short, you can tell the assembler
- it is short and save one byte of code:
- JMP SHORT CLOSEBY
- On a backward jump, the assembler can figure it out for you. On a
- forward jump of dubious length, let the assembler default to a LONG
- form; at worst you waste one byte.
- Also leave the assembler to worry about how the target address is
- to be represented, in absolute form or relative form.
- e. The conditional jump set is rather confusing when studied apart
- from the assembler, but you do need to get a feeling for it. The
- interactions of the sign, carry, and overflow flags can get your
- mind stuttering pretty fast if you worry about it too much. What
- it boils down to, though, is
- JZ            means what it says
- JNZ           means what it says
- JG (Greater)  means "if the SIGNED difference is positive"
- JA (Above)    means "if the UNSIGNED difference is positive"
- JL (Less)     means "if the SIGNED difference is negative"
- JB (Below)    means "if the UNSIGNED difference is negative"
- JC (Carry)    assembles the same as JB; it's an aesthetic choice
-
- IBM PC Assembly Language Tutorial 10
-
-
- You should understand that all conditional jumps are inherently
- DIRECT, NEAR, and "short"; the "short" part means that they can't
- go more than 128 bytes in either direction. Again, this is some-
- thing you could easily imagine to be more of a problem than it is.
- I follow this simple approach:
- 1) When taking an abnormal exit from a block of code, I always use
- an unconditional jump. Who knows how far you are going to end
- up jumping by the time the program is finished. For example, I
- wouldn't code this:
- TEST AL,IDIBIT ;Is the idiot bit on?
- JNZ OYVEY ;Yes. Go to general cleanup
- Rather, I would probably code this:
- TEST AL,IDIBIT ;Is the idiot bit on?
- JZ NOIDIOCY ;No. I am saved.
- JMP OYVEY ;Yes. What can we say...
- NOIDIOCY:
- The latter, of course, is a jump around a jump. Some would say
- it is evil, but I submit it is hard to avoid in this language.
- 2) Otherwise, within a block of code, I use conditional jumps
- freely. If the block eventually grows so long that the assem-
- bler starts complaining that my conditional jumps are too long
- I
- a) consider reorganizing the block but
- b) also consider changing some conditional jumps to their
- opposite and use the "jump around a jump" approach as shown
- above.
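- To make the signed versus unsigned distinction concrete, here is a small
- sketch. The labels UNSBIG, SGNBIG and SGNSML are made up for the
- illustration. The byte 0FFH is 255 if you read it as unsigned and -1 if
- you read it as signed, so the same CMP sends JA and JG different ways:
-         MOV  AL,0FFH      ;255 unsigned, -1 signed
-         MOV  BL,1
-         CMP  AL,BL        ;sets flags from AL minus BL; AL is unchanged
-         JA   UNSBIG       ;taken: unsigned, 255 is above 1
-         ...
- UNSBIG: CMP  AL,BL        ;same comparison, same flags
-         JG   SGNBIG       ;not taken: signed, -1 is not greater than 1
-         JL   SGNSML       ;taken: signed, -1 is less than 1
- The only decision you make is which mnemonic to write after the CMP; the
- flags themselves are set the same way in both cases.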
- Enough about specific instructions!
- 6. Finally, in order to use the assembler effectively, you need to know
- the default rules for which segment registers are used to complete
- addresses in which situations.
- a. CS is used to complete an address which is the target of a NEAR
- DIRECT jump. On a NEAR INDIRECT jump, DS is used to fetch the
- address from memory but then CS is used to complete the address
- thus fetched. On FAR jumps, of course, CS is itself altered. The
- instruction counter is always implicitly pointing in the code seg-
- ment.
- b. SS is used to complete an address if BP is used in its formation.
- Otherwise, DS is always used to complete a data address.
- c. On the string instructions, the target is always formed from ES and
- DI. The source is normally formed from DS and SI. If there is a
- segment prefix, it overrides the source not the target.
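- The default rules above can be sketched in a few instructions. This is
- only an illustration; the particular registers and displacements are
- arbitrary choices, not anything required:
-         MOV  AX,[BX]          ;no BP in the address: completed with DS
-         MOV  AX,[BP+4]        ;BP in the address: completed with SS
-         MOV  AX,ES:[BX]       ;explicit segment prefix: completed with ES
-         JMP  WORD PTR [BX]    ;NEAR INDIRECT: target fetched via DS,
-                               ;then completed with CS
-         CLD                   ;make string operations run upward
-         MOVSB                 ;one byte copied from DS:SI to ES:DI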
-
- Learning about DOS
- __________________
- I think the best way to learn about DOS internals is to read the technical
- appendices in the manual. These are not as complete as we might wish, but
- they really aren't bad; I certainly have learned a lot from them. What you
- don't learn from them you might eventually learn via judicious disassembly
- of parts of DOS, but that shouldn't really be necessary.
- From reading the technical appendices, you learn that interrupts 20H
- through 27H are used to communicate with DOS. Mostly, you will use inter-
- rupt 21H, the DOS function manager.
- The function manager implements a great many services. You request the
- individual services by means of a function code in the AH register. For
- example, by putting a nine in the AH register and issuing interrupt 21H you
- tell DOS to print a message on the console screen.
- Usually, but by no means always, the DX register is used to pass data for
- the service being requested. For example, on the print message service
- just mentioned, you would put the 16 bit address of the message in the DX
- register. The DS register is also implicitly part of this argument, in
- keeping with the universal segmentation rules.
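- As a small sketch of the calling pattern, here is the write screen
- character service (function 02), which takes its one byte of data in DL,
- the low half of DX. The choice of the letter A is arbitrary:
-         MOV  DL,'A'       ;character to write goes in DL
-         MOV  AH,2         ;DOS function 02: write screen character
-         INT  21H          ;invoke the DOS function manager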
- In understanding DOS functions, it is useful to understand some history and
- also some of the philosophy of MS-DOS with regard to portability. General-
- ly, you will find, once you read the technical information on DOS and also
- the IBM technical reference, you will know more than one way to do almost
- anything. Which is best? For example, to do asynch adapter I/O, you can
- use the DOS calls (pretty incomplete), you can use BIOS, or you can go
- directly to the hardware. The same thing is true for most of the other
- primitive I/O (keyboard or screen) although DOS is more likely to give you
- added value in these areas. When it comes to file I/O, DOS itself offers
- more than one interface. For example, there are four calls which read data
- from a file.
- The way to decide rationally among these alternatives is by understanding
- the tradeoffs of functionality versus portability. Three kinds of porta-
- bility need to be considered: machine portability, operating system porta-
- bility (for example, the ability to assemble and run code under CP/M 86)
- and DOS version portability (the ability for a program to run under older
- versions of DOS).
- Most of the functions originally offered in DOS 1.0 were direct descendants
- of CP/M functions; there is even a compatibility interface so that programs
- which have been translated instruction for instruction from 8080 assembler
- to 8086 assembler might have a reasonable chance of running if they use
- only the core CP/M function set. Among the most generally useful in this
- original compatibility set are
- 09 -- print a full message on the screen
- 0A -- get a console input line with full DOS editing
- 0F -- open a file
- 10 -- close a file (really needed only when writing)
- 11 -- find first file matching a pattern
- 12 -- find next file matching a pattern
- 13 -- erase a file
- 16 -- create a file
- 17 -- rename a file
- 1A -- set disk transfer address
- The next set provides no function beyond what you can get with BIOS calls or
- more specialized DOS calls. However, they are preferable to BIOS calls
- when portability is an issue.
- 00 -- terminate execution
- 01 -- read keyboard character
- 02 -- write screen character
- 03 -- read COM port character
- 04 -- write COM port character
- 05 -- print a character
- 06 -- read keyboard or write screen with no editing
- The standard file I/O calls are inferior to the specialized DOS calls but
- have the advantage of making the program easier to port to CP/M style sys-
- tems. Thus they are worth mentioning:
- 14 -- sequential read from file
- 15 -- sequential write to file
- 21 -- random read from file
- 22 -- random write to file
- 23 -- determine file size
- 24 -- set random record
- In addition to the CP/M compatible services, DOS also offers some special-
- ized services which have been available in all releases of DOS. These
- include
- 27 -- multi-record random read.
- 28 -- multi-record random write.
- 29 -- parse filename
- 2A-2D -- get and set date and time
- All of the calls mentioned above which have anything to do with files make
- use of a data area called the "FILE CONTROL BLOCK" (FCB). The FCB is any-
- where from 33 to 37 bytes long depending on how it is used. You are
- responsible for creating an FCB and filling in the first 12 bytes, which
- contain a drive code, a file name, and an extension.
- When you open the FCB, the system fills in the next 20 bytes, which
- include a logical record length. The initial lrecl is always 128 bytes,
- to achieve CP/M compatibility. The system also provides other useful
- information such as the file size.
- After you have opened the FCB, you can change the logical record length.
- If you do this, your program is no longer CP/M compatible, but that doesn't
- make it a bad thing to do. DOS documentation suggests you use a logical
- record length of one for maximum flexibility. This is usually a good
- recommendation.
- To perform actual I/O to a file, you eventually need to fill in byte 33 or
- possibly bytes 34-37 of the FCB. Here you supply information about the
- record you are interested in reading or writing. For the most part, this
- part of the interface is compatible with CP/M.
- In general, you do not need to (and should not) modify other parts of the
- FCB.
- The FCB is pretty well described in appendix E of the DOS manual.
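- A minimal sketch of building and opening an FCB might look like the
- following. The names MYFCB, CURREC, RANREC and NOFILE and the file name
- MYFILE.DAT are invented for the illustration, and the byte positions in
- the comments count from one, as in the text above. The offset of the
- logical record length (14 bytes from the start of the FCB) is taken from
- the standard FCB layout; check it against appendix E before relying on it:
- MYFCB   DB   0            ;byte 1: drive code, 0 means default drive
-         DB   'MYFILE  '   ;bytes 2-9: file name, blank padded to 8
-         DB   'DAT'        ;bytes 10-12: extension, blank padded to 3
-         DB   20 DUP (0)   ;bytes 13-32: filled in by DOS at open
- CURREC  DB   0            ;byte 33: current record for sequential I/O
- RANREC  DD   0            ;bytes 34-37: random record number
-         ...
-         MOV  DX,OFFSET MYFCB
-         MOV  AH,0FH       ;DOS function 0F: open a file
-         INT  21H
-         CMP  AL,0FFH      ;AL = 0FFH means the file was not found
-         JE   NOFILE
-         MOV  WORD PTR MYFCB+14,1  ;change logical record length to one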
- Beginning with DOS 2.0, there is a whole new system of calls for managing
- files which don't require that you build an FCB at all. These calls are
- quite incompatible with CP/M and also mean that your program cannot run
- under older releases of DOS. However, these calls are very nice and easy
- to use. They have these characteristics
- 1. To open, create, delete, or rename a file, you need only a character
- string representing its name.
- 2. The open and create calls return a 16 bit value which is simply placed
- in the BX register on subsequent calls to refer to the file.
- 3. There is not a separate call required to specify the data buffer.
- 4. Any number of bytes can be transferred on a single call; no data area
- must be manipulated to do this.
- The "new" DOS calls also include comprehensive functions to manipulate the
- new chained directory structure and to allocate and free memory.
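- Here is a sketch of the new style of file access: open a file, read up
- to 512 bytes, and close it. The function codes used (3D open, 3F read,
- 3E close) and the rule that the name string ends with a zero byte come
- from the DOS 2.0 technical reference rather than from the discussion
- above. The names FNAME, HANDLE, BUFFER, OPENERR and READERR are made up:
- FNAME   DB   'MYFILE.DAT',0   ;file name, ended by a zero byte
- HANDLE  DW   ?                ;16 bit value returned by open
- BUFFER  DB   512 DUP (?)
-         ...
-         MOV  DX,OFFSET FNAME
-         MOV  AX,3D00H     ;function 3D: open; AL = 0 asks for read access
-         INT  21H
-         JC   OPENERR      ;carry flag set means the open failed
-         MOV  HANDLE,AX    ;save the 16 bit value for later calls
-         MOV  BX,AX        ;it goes in BX to refer to the file
-         MOV  CX,512       ;number of bytes wanted
-         MOV  DX,OFFSET BUFFER
-         MOV  AH,3FH       ;function 3F: read
-         INT  21H          ;on return AX = bytes actually read
-         JC   READERR
-         MOV  BX,HANDLE
-         MOV  AH,3EH       ;function 3E: close
-         INT  21H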
-
- Learning the assembler
- ______________________
- It is my feeling that many people can teach themselves to use the assembler
- by reading the MACRO Assembler manual if
- 1. You have read and understood a book like Morse and thus have a feeling
- for the instruction set
- 2. You know something about DOS services and so can communicate with the
- keyboard and screen and do something marginally useful with files. In
- the absence of this kind of knowledge, you can't write meaningful prac-
- tice programs and so will not progress.
- 3. You have access to some good examples (the ones supplied with the
- assembler are not good, in my opinion; I will try to supply you with
- some more relevant ones).
- 4. You ignore the things which are most confusing and least useful. Some
- of the most confusing aspects of the assembler include the facilities
- for combining segments. But, you can avoid using all but the simplest of
- these facilities in many cases, even while writing quite substantial
- applications.
- 5. The easiest kind of assembler program to write is a COM program. COM
- programs might seem harder, at first, than EXE programs because there
- is an extra step involved in creating the executable file, but they
- are structurally very much simpler.
- At this point, it is necessary to talk about COM programs and EXE programs.
- As you probably know, DOS supports two kinds of executable files. EXE pro-
- grams are much more general, can contain many segments, and are generally
- built by compilers and sometimes by the assembler. If you follow the lead
- given by the samples distributed with the assembler, you will end up with
- EXE programs. A COM program, in contrast, always contains just one
- segment, and receives control with all four segment registers containing
- the same value. A COM program, thus, executes in a simplified environment,
- a 64K address space. You can go outside this address space simply by tem-
- porarily changing one segment register, but you don't have to, and that is
- the thing which makes COM programs nice and simple. Let's look at a very
- simple one.
- The classic text on writing programs for the C language says that the first
- thing you should write is a program which says
- HELLO, WORLD.
- when invoked. What's sauce for C is sauce for assembler, so let's start
- with a HELLO program of our own. My first presentation of this will be
- bare bones, not stylistically complete, but just an illustration of what an
- assembler program absolutely has to have:
- HELLO   SEGMENT                  ;Set up HELLO code and data section
-         ASSUME CS:HELLO,DS:HELLO ;Tell assembler about conditions at entry
-         ORG 100H                 ;A .COM program begins with 100H byte prefix
- MAIN:   JMP BEGIN                ;Control must start here
- MSG     DB 'Hello, world.$'      ;But it is generally useful to put data first
- BEGIN:  MOV DX,OFFSET MSG        ;Let DX --> message.
-         MOV AH,9                 ;Set DOS function code for printing a message
-         INT 21H                  ;Invoke DOS
-         RET                      ;Return to system
- HELLO   ENDS                     ;End of code and data section
-         END MAIN                 ;Terminate assembler and specify entry point
- First, let's attend to some obvious points. The macro assembler uses the
- general form
- name opcode operands
- Unlike the 370 assembler, though, comments are NOT set off from operands by
- blanks. The syntax uses blanks as delimiters within the operand field (see
- line 6 of the example) and so all comments must be set off by semi-colons.
- Line comments are frequently set off with a semi-colon in column 1. I use
- this approach for block comments too, although there is a COMMENT statement
- which can be used to introduce a block comment.
- Being an old 370 type, I like to see assembler code in upper case, although
- my comments are mixed case. Actually, the assembler is quite happy with
- mixed case anywhere.
- As with any assembler, the core of the opcode set consists of opcodes which
- generate machine instructions but there are also opcodes which generate
- data and ones which function as instructions to the assembler itself, some-
- times called pseudo-ops. In the example, there are five lines which gener-
- ate machine code (JMP, MOV, MOV, INT, RET), one line which generates data
- (DB) and five pseudo-ops (SEGMENT, ASSUME, ORG, ENDS, and END).
- We will discuss all of them.
- Now, about labels. You will see that some labels in the example end in a
- colon and some don't. This is just a bit confusing at first, but no real
- mystery. If a label is attached to a piece of code (as opposed to data),
- then the assembler needs to know what to do when you JMP to or CALL that
- label. By convention, if the label ends in a colon, the assembler will use
- the NEAR form of JMP or CALL. If the label does not end in a colon, it
- will use the FAR form. In practice, you will always use the colon on any
- label you are jumping to inside your program because such jumps are always
- NEAR; there is no reason to use a FAR jump within a single code section. I
- mention this, though, because leaving off the colon isn't usually trapped
- as a syntax error; it will generally cause something more abstruse to go
- wrong.
- On the other hand, a label attached to a piece of data or a pseudo-op never
- ends in a colon.
- Machine instructions will generally take zero, one or two operands. Where
- there are two operands, the one which receives the result goes on the left
- as in 370 assembler.
- I tried to explain this before; now maybe it will be even clearer: there
- are many more 8086 machine opcodes than there are assembler opcodes to
- represent them. For example, there are five kinds of JMP, four kinds of CALL,
- two kinds of RET, and at least five kinds of MOV depending on how you count
- them. The macro assembler makes a lot of decisions for you based on the
- form taken by the operands or on attributes assigned to symbols elsewhere
- in your program. In the example above, the assembler will generate the
- NEAR DIRECT form of JMP because the target label BEGIN labels a piece of
- code instead of a piece of data (this makes the JMP DIRECT) and ends in a
- colon (this makes the JMP NEAR). The assembler will generate the immediate
- forms of MOV because the form OFFSET MSG refers to immediate data and
- because 9 is a constant. The assembler will generate the NEAR form of RET
- because that is the default and you have not told it otherwise.
- The DB (define byte) pseudo-op is an easy one: it is used to put one or
- more bytes of data into storage. There is also a DW (define word)
- pseudo-op and a DD (define doubleword) pseudo-op; in the PC MACRO assem-
- bler, the fact that a label refers to a byte of storage, a word of storage,
- or a doubleword of storage can be very significant in ways which we will
- see presently.
- About that OFFSET operator: I guess the best way to make the point about
- how the assembler decides what instruction to assemble is an analogy
- with 370 assembler:
- PLACE DC ......
- ...
- LA R1,PLACE
- L R1,PLACE
- In 370 assembler, the first instruction puts the address of label PLACE in
- register 1, the second instruction puts the contents of storage at label
- PLACE in register 1. Notice that two different opcodes are used. In the
- PC assembler, the analogous instructions would be
- PLACE DW ......
- ...
- MOV DX,OFFSET PLACE
- MOV DX,PLACE
- If PLACE is the label of a word of storage, then the second instruction
- will be understood as a desire to fetch that data into DX. If X is a
- label, then "OFFSET X" means "the ordinary number which represents X's off-
- set from the start of the segment." And, if the assembler sees an ordinary
- number, as opposed to a label, it uses the instruction which is equivalent
- to LA.
- If PLACE were the label of a DB pseudo-op, instead of a DW, then
- MOV DX,PLACE
- would be illegal. The assembler worries about length attributes of its
- operands.
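- A short sketch of how the length attribute plays out follows. BYTEVAR
- and WORDVAR are invented names. The PTR operator in the last line is the
- assembler's way of letting you override a label's length attribute when
- you really do mean it:
- BYTEVAR DB   5
- WORDVAR DW   1234H
-         ...
-         MOV  DL,BYTEVAR   ;byte label, byte register: accepted
-         MOV  DX,WORDVAR   ;word label, word register: accepted
-         MOV  DX,WORD PTR BYTEVAR  ;PTR overrides the length attribute;
-                                   ;fetches BYTEVAR and the byte after it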
- Next, numbers and constants in general. The assembler's default radix is
- decimal. You can change this, but I don't recommend it. If you want to
- represent numbers in other forms of notation such as hex or bit, you gener-
- ally use a trailing letter. For example,
- 21H
- is hexadecimal 21,
- 00010000B
- is the eight bit binary number pictured.
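- For instance, the first three of the following lines all load the same
- value into AL. The fourth shows one rule worth knowing early: a hex
- number has to begin with a digit, so a value like FF is written with a
- leading zero; otherwise the assembler takes it for a label:
-         MOV  AL,16        ;decimal, the default radix
-         MOV  AL,10H       ;hex 10, the same value
-         MOV  AL,00010000B ;binary, the same value again
-         MOV  AL,0FFH      ;hex FF, written with a leading zero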
- The next elements we should point to are the SEGMENT...ENDS pair and the
- END instruction. Every assembler program has to have these elements.
- SEGMENT tells the assembler you are starting a section of contiguous mate-
- rial (code and/or data). The symmetrically named ENDS statement tells the
- assembler you are finished with a section of contiguous material. I wish
- they didn't use the word SEGMENT in this context. To me, a "segment" is a
- hardware construct: it is the 64K of real storage which becomes address-
- able by virtue of having a particular value in a segment register. Now, it
- is true that the "segments" you make with the assembler often correspond to
- real hardware "segments" at execution time. But, if you look at things
- like the GROUP and CLASS options supported by the linker, you will discover
- that this correspondence is by no means exact. So, at risk of maybe con-
- fusing you even more, I am going to use the more informal term "section" to
- refer to the area set off by means of the SEGMENT and ENDS instructions.
- The sections delimited by SEGMENT...ENDS pairs are really a lot like CSECTs
- and DSECTs in the 370 world.
- I strongly recommend that you be selective in your study of the SEGMENT
- pseudo-op as described in the manual. Let me just touch on it here.
- name SEGMENT
- name SEGMENT PUBLIC
- name SEGMENT AT nnn
- Basically, you can get away with just the three forms given above. The
- first form is what you use when you are writing a single section of assem-
- bler code which will not be combined with other pieces of code at link
- time. The second form says that this assembly only contains part of the
- section; other parts might be assembled separately and combined later by
- the linker.
- I have found that one can construct reasonably large modular applications
- in assembler by simply making every assembly use the same segment name and
- declaring the name to be PUBLIC each time. If you read the assembler and
- linker documentation, you will also be bombarded by information about more
- complex options such as the GROUP statement and the use of other "combine
- types" and "classes." I don't recommend getting into any of that. I will
- talk more about the linker and modular construction of programs a little
- later. The assembler manual also implies that a STACK segment is required.
- This is not really true. There are numerous ways to assure that you have a
- valid stack at execution time.
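- As a rough sketch of the same-name PUBLIC approach, two separately
- assembled files might look like this. The names CODE, MAIN and SUBR are
- made up, and the EXTRN and PUBLIC statements that tie the label SUBR
- together across the two assemblies are assembler facilities not
- discussed above. The first file:
- CODE    SEGMENT PUBLIC
-         ASSUME CS:CODE,DS:CODE
-         EXTRN SUBR:NEAR   ;SUBR is defined in another assembly
-         ORG  100H
- MAIN:   CALL SUBR
-         RET
- CODE    ENDS
-         END  MAIN
- The second file:
- CODE    SEGMENT PUBLIC
-         ASSUME CS:CODE,DS:CODE
-         PUBLIC SUBR       ;make SUBR visible to other assemblies
- SUBR    PROC NEAR
-         RET
- SUBR    ENDP
- CODE    ENDS
-         END
- Because both sections carry the same name and are PUBLIC, the linker
- joins them into one section, so the CALL can be NEAR.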
- Of course, if you plan to write applications in assembler which are more
- than 64K in size, you will need more than what I have told you; but who is
- really going to do that? Any application that large is likely to be coded
- in a higher level language.
- The third form of the SEGMENT statement makes the delineated section into
- something like a "DSECT;" that is, it doesn't generate any code, it just
- describes what is present somewhere already in the computer's memory.
- Sometimes the AT value you give is meaningful. For example, the BIOS work
- area is located at segment 40 hex (absolute address 400 hex). So, you might see
- BIOSAREA SEGMENT AT 40H ;Map BIOS work area
- ORG BIOSAREA+10H
- EQUIP DB ? ;Location of equipment flags, first byte
- BIOSAREA ENDS
- in a program which was interested in mucking around in the BIOS work area.
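- To actually get at a byte mapped this way, the program has to point a
- segment register at the section and tell the assembler it has done so.
- A minimal sketch, assuming ES is free to use for the purpose:
-         MOV  AX,BIOSAREA  ;the segment value given by AT, here 40H
-         MOV  ES,AX
-         ASSUME ES:BIOSAREA ;tell the assembler what ES now addresses
-         MOV  AL,EQUIP     ;assembler supplies the ES prefix itself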
- At other times, the AT value you give may be arbitrary, as when you are
- mapping a repeated control block:
-