Introduction N.1
================

Hi! Welcome to "The PC Assembler Helper" and "The PC Assembler Tutor". Both the program and the tutorial are designed to help those who are just starting to learn assembler language as well as those who know some assembler instructions but want to have a firmer grasp of the complete instruction set for the 8086.

There are two significant problems to learning assembler language. First, it is difficult to do either input or output at the assembler level. Imagine trying to learn BASIC if you were not allowed the following two instructions:

     PRINT RESULT
     INPUT NEW.DATA

Without PRINT and INPUT, you might be able to write a program but you would not be able to see the results. You would never be sure that the results were what you wanted. Also, you would not be able to vary the data. It would have to be coded into the program.

"The PC Assembler Helper" has taken care of this problem. It provides input and output of all standard 8086/8087 integral data types. These include 1 byte, 2 byte, 4 byte and 8 byte signed and unsigned numbers along with 1 byte and 2 byte hex, ASCII and binary data. Lastly, there is i/o for 10 byte BCD numbers. The interface has been designed so that beginners can use it with a minimum of trouble.

The second major problem is that most assembler books regard the 8086 as a black box. There is no way of seeing the workings of the chip itself. What exactly happens when you add two numbers? What about multiplication?

Once again, "The PC Assembler Helper" has come up with the solution. It allows you to view all the registers and flags at will. These registers can be independently formatted. If one register holds ASCII data while another has binary information and yet a third has a signed number, then each register can be set to display the appropriate type of data. This, you will find, is invaluable.

A third, though less important problem, is assembler overhead.
There is a certain structure that must be followed to get the program to assemble and run correctly. This has been provided in template files. All you need to do is copy the appropriate template file and put the code in a predefined location to get your program to run correctly. For simple programs this can cut your work in half. It also minimizes the number of typos.

"The PC Assembler Tutor" is built around the Helper. It systematically goes through the 8086 instruction set, having you write small programs to illustrate how each instruction works. At the end you should have a feeling for all the instructions except a few which involve the 8087 or peripheral hardware. These will be mentioned, but not used.

You will be a better programmer if you know how all the instructions work. There are times when one specific instruction is just what you want. If you don't know that it exists or how it works, you won't use it. This way, if you run across a situation where you think that a certain instruction might be useful to you, you can go back to the Tutor to refresh your memory and be able to put the instruction to use almost immediately.

If you are a beginner, I feel confident that you will learn faster and more thoroughly than with any other method. If you know some assembler but would like to know more, I'm sure that there is lots that would interest you.

     AAD     SBB     XLAT     REPNE     SCAS

Do you know what these are? What about segment overrides? Do you know when to use them and when to avoid them? Do you know ALL the allowable addressing modes? What actually is an ASSUME statement?

In order to let you see if you find the material interesting, you may go through chapters 0 - 4 without any obligation. You may also make an archival copy of the disks (you are urged to do so). If you continue after the fourth chapter then please register by sending $9.95 (or $10.60 for Californians) to Nelsoft.
The registration form is at the end of this introduction. If I followed the pricing structure of other people I would be charging several times as much. My goal is different. I want everyone who can benefit from the program and tutorial to use them, and I want everyone who uses them to do so legally. Therefore, I have priced them so that everyone can pay for them without any inconvenience. If you use them, you can certainly afford my minimal price.

The material is sequential. Chapter 0 should be read before starting on the other chapters and the chapters should be read in order. Appendix 1 contains all the subroutine calls in "The Assembler Helper" and how to access them. Appendix 2 is an alphabetical list of all the 8086 instructions, telling what they do and showing all allowable syntaxes. Appendix 3 gives the speed of all instructions along with a list of which flags are affected (you will learn what this means in the Tutor). It is to your benefit to start at the beginning and work your way through. Chapter 0 contains material that you need to know, so you must read it.

The text has been broken up into sections so that no printout is longer than 10 pages or so. All text files have a file extension .DOC. If a chapter is much longer than that, it will be divided into parts, indicated by -1, -2, -3 after the chapter number. These files should be printable with the DOS 'print' command. The only embedded printer command code is form feed for the next page. The text runs about 2500 characters a page, so you can estimate the size of the printout from the size of the text file.

Curly brackets in the text denote a footnote.{1} Some of the footnotes are technical and will be understood by only a quarter of the people. If you are one of that quarter, fine. If not, the important thing is not that you understand the outline of the proof, but that you believe that what is being proved is true.
The assembler level is for those who have some degree of intelligence. You have an unparalleled opportunity to screw things up at this level. If you got Cs and Ds in high school algebra because you didn't quite understand what was going on, then you probably shouldn't do assembler programming.{2}

In addition, I assume that you have done a lot of programming, preferably in either Pascal or C. BASIC is a nice language, but it is missing a certain type of structure which is vital for creating robust code in assembler language. If BASIC is all you know, I would recommend that you learn C first and then come back to assembler. You will be a better programmer for it.{3}

Finally, "The Assembler Helper" assumes that it has control of the screen. If you are hooked up to a debugger, there may be a conflict. There is a subroutine in the Helper called "set_timer" which may help minimize this conflict. You need to be in chapter 5 or so before you will be able to use it. See \APPENDIX\APP1.DOC for details.

If you are ready to go, please look at the following two pages and then read INTRO2.DOC. It will explain a little about what an assembler is. I hope you enjoy using the Helper and the Tutor as much as I enjoyed writing them.

                                        Chuck Nelson

____________________

   1. Like this one.

   2. If you got Cs and Ds because you were too busy reading "Tales from the Crypt" and Isaac Asimov, that's something entirely different.

   3. On re-reading this I decided that it is true, but pretentious. If you like BASIC and program well in BASIC, then you should learn assembler and continue using BASIC. There are certain inherent difficulties with BASIC, so before you start you should read BAS1.DOC. This is on DISK2.


The PC Assembler Tutor - Copyright (C) 1989 Chuck Nelson
All rights reserved

Microsoft (R) Macro Assembler and Microsoft (R) Overlay Linker are registered trademarks of Microsoft Corporation.
This manual contains screen output of the Macro Assembler and the Overlay Linker. Screen shots (C) 1981-1988 Microsoft Corporation. It also contains excerpts from Macro Assembler .LST files and Overlay Linker .MAP files. Portions of these files are Copyright (C) 1981-1988 Microsoft Corporation. Used with permission of Microsoft Corporation.

TRADEMARK ACKNOWLEDGEMENT

IBM is a registered trademark of International Business Machines Inc.
Intel is a registered trademark of Intel Corporation.
Macintosh is a registered trademark of Apple Computer, Inc.
Microsoft is a registered trademark of Microsoft Corporation.
Motorola is a registered trademark of Motorola, Inc.
8086 is a trademark of Intel Corporation.
Codeview is a registered trademark of Microsoft Corporation.
QuickC is a registered trademark of Microsoft Corporation.
Turbo Pascal, Turbo Assembler and Turbo Debugger are registered trademarks of Borland International.

The PC Assembler Helper was designed as a learning tool. It is meant to be used in conjunction with simple assembler programs to display the results of individual assembler instructions. It should not be used with high-level languages nor with programs that modify the screen. HELPMEM.COM, the memory resident version, uses the same interrupts as a debugger. Therefore, if there is a debugger attached to any program that is being used, HELPMEM.COM should not be loaded into memory.

WARRANTY

THIS PROGRAM, INSTRUCTION MANUAL, AND REFERENCE MATERIALS ARE SOLD "AS IS", WITHOUT WARRANTY AS TO THEIR PERFORMANCE, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE RESULTS AND PERFORMANCE OF THESE PROGRAMS IS ASSUMED BY YOU.

*****************************************************************

REGISTRATION

Hey, Chuck, I'm no chump! I'm using your programs/manual, and I want to pay my fair share. Please make me a registered user of "The PC Assembler Tutor" and "The PC Assembler Helper".
Enclosed is a check for $9.95 (or $10.60, which includes 6.5% tax, for California residents). Say, that's cheaper than a large pizza!

Name_________________________________________________________
    Last                    First                    Initial

Address______________________________________________________
       Street Address

_______________________________________________________
City, State, and Zip Code

I got my copy from ___________________________________________

Make checks payable to NELSOFT and send your registration to:

     NELSOFT
     P.O. Box 21389
     Oakland, CA 94620

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

REGISTRATION BENEFITS

As a registered user of "The PC Assembler Helper" and "The PC Assembler Tutor" you are entitled to:

     1) Use asmhelp.obj and helpmem.com for personal use.
     2) Make 1 (one) printer copy of "The PC Assembler Tutor".
     3) Use all programs in "The PC Assembler Tutor" for personal use.
     4) Make an archival copy of the disks.
     5) Distribute UNALTERED disks to friends for their perusal.
     6) Use any updates to either "The PC Assembler Helper" or "The PC Assembler Tutor" under the same registration conditions.

Though copies of the disk may be given away if there is no charge, it is illegal to charge for redistribution of the disk or its contents without permission of the author. Under no circumstances may you distribute printed copies of "The PC Assembler Tutor". If you intend to charge for distributing the disk or its information, please read and sign the following distribution agreement.

*****************************************************************

DISTRIBUTION LICENSING AGREEMENT
FOR THE PC ASSEMBLER HELPER AND THE PC ASSEMBLER TUTOR

Anyone wishing to charge people a fee for giving them a copy of The PC Assembler Helper and/or The PC Assembler Tutor must have the written authorization of the author, without which the distributor is guilty of copyright violation.
To receive such authorization, send this completed application, along with a copy of your software library's order form to:

     NELSOFT
     P.O. Box 21389
     Oakland, CA 94620

If you want a distribution disk with the latest copy of these programs, please include $7.00 to cover the cost of the disks, mailing and handling. (This offer is for bona fide user groups and shareware distributors only).

NAME OF ORGANIZATION ___________________________________________

YOUR NAME ______________________________________________________

ADDRESS ________________________________________________________

CITY, STATE ____________________________________________________

TERMS OF DISTRIBUTION

1. The fee charged for each disk may not exceed $7.00. On high-density disks, the fee may not be over $10.00.

2. Your library's catalog or listing must state that this material is not free, but is copyrighted material that is provided to allow the user to evaluate it before paying.

3. The offering and sale of disks containing The PC Assembler Helper and The PC Assembler Tutor will be stopped at any time the author so requests.

4. The Tutor and the Helper must be distributed together. The compressed files and the information document must remain in the subdirectory \PCTUTOR. There may be no additional files in this subdirectory. Both the name and the contents of \PCREADME.DOC must remain unaltered.

5. Problems or complaints will be reported to the author.

In return for the right to charge a fee for the distribution of The PC Assembler Helper and The PC Assembler Tutor, I agree to comply with the above terms of distribution.

Signed,

__________________________________________    __________________
Your Signature                                Date


Introduction N.2
================

WHAT'S AN ASSEMBLER?

What is the difference between a compiler and an assembler? A compiler is a program that takes the source code you have written and turns it into machine language instructions that are usable by the computer.
A machine language instruction is a binary number that tells the computer to do one specific thing. This is something very specific like: add 1 to a specific variable.

The compiler does this in two steps. First, the compiler takes each line of source code and turns the expression on the line into a number of simple tasks to accomplish what is desired, generating a number of assembler TEXT instructions and data definitions. When the compiler is through with the source code, it then takes these TEXT instructions and assembles them to form an object module. An object module is a file that can be linked with other files to form a larger program.

Why two steps instead of one? There are several reasons. This allows a compiler writer to write a first part for any language like Pascal, C, BASIC, etc., and then use the same assembler part. This saves development time in a company that has more than one type of compiler. You can insert a new assembler part without worrying about the text generation part, or you can insert a new text generation part without worrying about the assembler part.

Secondly, though an assembler is not a simple program, the compiler's text generator is even more complicated. Putting the two together is like trying to juggle eight balls instead of four balls.

This leads to the most vital reason. Not only would a unified text-generator/assembler be more error prone, it would be almost impossible to debug. If you are getting an error due to one type of Pascal instruction, is this because it is being misunderstood by the compiler or because the compiler is giving it the wrong machine codes? In the two part system, the compiler writer can look at the intermediate text code and isolate the problem into one of the two halves.
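
To make the first of those two steps a little more concrete, here is a rough sketch of the kind of assembler text a compiler might generate for the single Pascal line "total := price + tax". This is only an illustration in MASM syntax. The names price, tax, total, STACKSEG and CODESTUFF are invented for the example, the data values are arbitrary, and the program ends through DOS function 4Ch rather than through the Tutor's template files. A real compiler's output will look different. Don't worry about the segment lines; the three instructions in the middle are the point.

; - - - - - - - - - - - - - - - - - - - -
; Hypothetical compiler output for:   total := price + tax
; All names and values are assumptions made up for this sketch.

STACKSEG  SEGMENT STACK 'STACK'
          dw   64 dup (?)          ; a small stack for the program
STACKSEG  ENDS

DATASTUFF SEGMENT PUBLIC 'DATA'
price     dw   100                 ; first operand (one word)
tax       dw   7                   ; second operand (one word)
total     dw   ?                   ; receives the sum
DATASTUFF ENDS

CODESTUFF SEGMENT PUBLIC 'CODE'
          ASSUME cs:CODESTUFF, ds:DATASTUFF

start:    mov  ax, DATASTUFF       ; point ds at the data segment
          mov  ds, ax

          mov  ax, price           ; simple task 1: fetch price into ax
          add  ax, tax             ; simple task 2: add tax to it
          mov  total, ax           ; simple task 3: store the sum in total

          mov  ax, 4C00h           ; return to DOS (int 21h, function 4Ch)
          int  21h
CODESTUFF ENDS
          END  start
; - - - - - - - - - - - - - - - - - - - -

One line of Pascal has become three assembler instructions plus three data definitions. It is text of this kind that the second step assembles into an object module.
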
An assembler is a program that takes a TEXT file where each line corresponds to a specific machine instruction or type of data, calculates the address in memory where each piece of data or machine instruction will be, translates each instruction and piece of data into machine readable form, and inserts the addresses of data and labels into machine instructions where appropriate.{1}

____________________

   1. A label is just a name which marks a certain spot in the assembler code.

The text name for a machine instruction is called a mnemonic.{2} It indicates what is being done by the instruction. Which would you rather use for multiplication, 'MUL' or '11110111xx100xxx'? These 'x's stand for binary digits whose values depend on where the number to be multiplied is located (in a register or in memory). For each mnemonic there is a single machine instruction which performs the operation. This means that the assembler's task is relatively simple. It only needs to allocate space for all the variables and instructions, to translate each mnemonic and data value into its corresponding machine code, and finally put it into a machine usable file.

Here you need to know what the different kinds of files are.

1) An executable (.EXE) file contains certain information for the operating system when the program is started. This allows the program to be as large as is wanted.

2) A .COM file contains no information for the operating system. When the operating system starts a .COM file it simply puts it in memory and starts it. Files with a .COM extension are limited to a length of 64k bytes.

3) Binary files are files which must be loaded into a .COM or .EXE program before being run. They cannot be used by themselves. They are archaic. They are a crutch for those compilers which don't support .OBJ files, and are disappearing.

4) An object (.OBJ) file is a section of a program.
It contains code and variables, but also contains information that can be used to combine it with other object files into a larger program. A linker can convert one or more object files into an executable file.

Things have moved along in the past few years. Turbo Pascal 3.0 generated .COM files. This wasn't because .COM files were superior but because it was too difficult to generate the extra information needed to produce an .OBJ file. Interpreted BASIC required binary files because it did not have the ability to use .OBJ files. The situation now is:

     If you want to link with the current Turbo Pascal, you should use an .OBJ file.
     If you want to link with QuickC, you need an .OBJ file.
     If you want to combine an external subprogram with QuickBASIC, you need an .OBJ file.

Get the picture? No assembler makes .EXE files. If you have a single file that you want made into a stand-alone .EXE file, you first make an .OBJ file and then use that single file with LINK.

____________________

   2. You don't pronounce that first 'm'.

When making a .COM file, the normal route is to make .OBJ files, link them together into an .EXE file, and then convert it to a .COM file with a program called EXE2BIN. This allows you to divide the problem into a number of subproblems and put them all together at the end.

As you can see from the above, the job for the assembler is to take a text file and convert it into an .OBJ file. The three assemblers that you are most likely to have are MASM, A86, and Turbo Assembler. They all produce object files. Since assemblers simply supply the machine code for each instruction, they will all produce the exact same code.{3}

This is one of the differences between assemblers and compilers. The text generation phase of the compiler requires creativity. It is the compiler writer's idea of how to solve a certain problem in a specific language.
This generated text is copyrighted, and you need a license to distribute a program that includes this generated text. An assembler, on the other hand, is just a drudge. If you had a book with the machine codes and had enough time, you could produce the same file byte for byte that the assembler would produce for a .COM file. There is no creativity involved in the generated code, and there is no license involved in its distribution.

There is some difference in the speed of those three assemblers, but I'll have a comment about that after I give you the numbers. These numbers are from the time of hitting the ENTER key to the return of the DOS prompt ('>'). They were measured on a low speed machine, so your numbers should be no worse and will probably be better.

                            A86     TurboASM     MASM

     one page of code       3.2        8.8       10.3
     20 pages of code       7.3       12.7       20.4
     60 pages of code      12.5       22.9       43.3

All these numbers are in seconds. A86 is the fastest. It loads faster because it itself is a .COM file, and it works faster. But even the slowest (MASM), is fast enough. How long does it take to write 60 pages of code? Probably a week or two. Writing assembler code is normally slower than writing code for a high-level language. Even the slowest finishes the assembly in well under a minute. Think of the time it would take to compile 60 pages of Pascal code. In fact, the normal length of a file will be from 10 to 20 pages, so these are the numbers you need to think of. These will be the smaller .OBJ files which are linked together by the linker.

____________________

   3. Or functionally the same code. Sometimes there are two different instructions that do the same thing, just as 6+1, 5+2 and 4+3 all produce 7.

If you have one of these assemblers you don't need anything different. If you need to get one, you can use this information to help you in your selection. The following prices are as of mid-1990. A86 is available through shareware.
It costs $50.00 for the assembler alone, $80.00 with the debugger. Add another $10.00 if you want a printed manual. Both MASM and Turbo Assembler come with assembler, debugger, a number of utility programs and several manuals. They both retail for $150.00, but even a quick glance at BYTE magazine will find you a place that is selling them for $105.00 - $110.00. They come bundled, so you cannot buy the assembler without the debugger (The Microsoft debugger is Codeview and the Borland debugger is Turbo Debugger).

Speaking of debuggers, you may be thinking, "Well, I have DEBUG, so why do I want another debugger?" DEBUG has been outdated for several years now. It has been supplanted by symbolic debuggers which associate code with specific lines in the original text file. You give the symbolic debugger the .EXE file along with the text files that produced it, and it shows you your source code as you go along. Here's a section of code we will meet later in the Tutor:

; - - - - - - - - - - - - - - - - - - - -
start:        push ds                   ; set up for return
              sub  ax,ax
              push ax

              mov  ax, DATASTUFF        ; load ds
              mov  ds,ax

outer_loop:
              lea  ax, multiplicand     ; load multiplicand
              call get_unsigned_8byte
              call print_unsigned_8byte
              call get_unsigned         ; unsigned word to multiplier
              mov  multiplier, ax

              lea  si, multiplicand     ; load pointers
              lea  bx, result
; - - - - - - - - - - - - - - - - - - - -

Don't worry about what these instructions do. You'll learn that later.
Here is DEBUG's idea of what is going on:

******************** DEBUG SCREEN SHOT ************************
-r
AX=0000  BX=0000  CX=2749  DX=0000  SP=0A00  BP=0000  SI=0000  DI=0000
DS=0D7E  ES=0D7E  SS=0D8E  CS=0E7F   NV UP DI PL NZ NA PO NC
0E7F:0000 1E            PUSH    DS
-u0E7F:0000 2A
0E7F:0000 1E            PUSH    DS
0E7F:0001 2BC0          SUB     AX,AX
0E7F:0003 50            PUSH    AX
0E7F:0004 B82E0E        MOV     AX,0E2E
0E7F:0007 8ED8          MOV     DS,AX
0E7F:0009 8D060800      LEA     AX,[0008]
0E7F:000D E80C14        CALL    141C
0E7F:0010 E84B11        CALL    115E
0E7F:0013 E8F505        CALL    060B
0E7F:0016 A31000        MOV     [0010],AX
0E7F:0019 8D360800      LEA     SI,[0008]
0E7F:001D 8D1E1200      LEA     BX,[0012]
********************** END DEBUG *******************************

Part of this is understandable, but a lot of it is confusing, and you have lost the concept of what you are trying to do. To see what a symbolic debugger does with the same code, here is the Turbo Debugger's idea of what is happening:

************************* TURBO SCREEN SHOT {4} *****************************
  File  View  Run  Breakpoints  Data  Window  Options                   READY
.Module: debugtst File: debugtst.asm 74....................................1
.
.  start:      push ds                 ; set up for return .Registers......3.
.              sub ax,ax                                   . ax 5C94    .c=0.
.              push ax                                     . bx 0000    .z=1.
.                                                          . cx 0000    .s=0.
.              mov ax, DATASTUFF            ; load ds      . dx 0000    .o=0.
.              mov ds,ax                                   . si 0000    .p=1.
.                                                          . di 0000    .a=0.
.  outer_loop:                                             . bp 0000    .i=1.
.              lea ax, multiplicand         ; load multipli. sp 09FA    .d=0.
.              call get_unsigned_8byte                     . ds 4AD6    .   .
.              call print_unsigned_8byte                   . es 4A26    .   .
.              call get_unsigned            ; unsigned word. ss 4A36    .   .
.              mov multiplier, ax                          . cs 4B27    .   .
.                                                          . ip 0019    .   .
.                                                          ..................
.              lea si, multiplicand         ; load pointers
.............................................................................
.Watches....................................................................2
.multiplier,d                 23700
.multiplicand           qword 00000042E843515D
.............................................................................
 F1-Help F2-Bkpt F3-Close F4-Here F5-Zoom F6-Next F7-Trace F8-Step F9-Run
*****************************************************************************

DEBUG really doesn't meet our needs. Of course, those of you who still use EDLIN to write 20 page documents should feel free to use DEBUG. Modern language compilers have their own debuggers in their environments, so we only need a debugger for the assembler. Turbo Debugger and Codeview do the job very well. D86 is better than DEBUG but it has some problems.

____________________

   4. Turbo Debugger is (C) Copyright 1988-1989 Borland International.

If you actually want a debugger that will symbolically debug everything that supports symbolic debugging, you might want to take a look at the Turbo Debugger. It has more power than you are ever likely to need.

"The Assembler Helper", the program which comes with the Tutor, is NOT a debugger. Debuggers are designed to show you what is happening with your code; the Helper is designed to show you what is happening with the 8086. It's a fundamental difference in outlook.

If you want to use a debugger while you are doing this tutorial it is possible but the results are not guaranteed. Please see DEBUGGER.DOC in \COMMENTS for some information about the different debuggers and how to use ASMHELP with a debugger. You should not try to do this before you are in chapter 5 or 6.

You may have noticed that CHASM, an assembler distributed through shareware, was not listed. There is a reason for this. It can't produce .OBJ files, so it cannot produce the standard files for use with current compilers, including QuickBASIC. CHASM is also unusable with this tutorial because it cannot produce files to link with ASMHELP.OBJ, the i/o interface program. If you are getting a shareware assembler, get A86. It's a quality assembler.

This tutorial was originally written for those using MASM.
In order to allow those using the Turbo Assembler and A86 to follow along, there is a document for each one in \COMMENTS which explains any differences between what is in the chapters (which use MASM as an example) and what the respective assemblers do. There aren't that many differences. The pathnames are \COMMENTS\TURBO.DOC and \COMMENTS\A86.DOC.


TABLE OF CONTENTS

Chapter 0.1 - Numbers And Arithmetic
     Base 10 Machine
     Negative Numbers
     10's Complement
     Addition
     Subtraction
     Modular Math
     Sign Extension
     Overflow
     Multiplication
     Division

Chapter 0.2 - Bases 2 And 16
     Base Conversion
     Binary Math
     2's Complement
     Sign Extension

Chapter 0.3 - Logic
     AND
     OR
     XOR
     NOT

Chapter 0.4 - Memory
     Segmentation
     Numbers In Memory

Chapter 0.5 - Style

Chapter 1 - Some Simple Programs
     Label
     CALL
     JMP

Chapter 2 - Data
     DB, DW, DD, DQ, DT, DF
     Definition of Constants

Chapter 3 - Asmhelp
     Registers
     Show_regs
     MOV

Chapter 4 - Show_regs
     Show_reg Codes

Chapter 5 - Addition and Subtraction
     LOOP
     OF, ZF, SF, CF
     ADD
     PUSH
     POP
     SUB
     JC, JNC, JO, JNO
     INTO
     INTO.COM

Chapter 6 - Multiplication and Division
     MUL
     IMUL
     DIV
     IDIV

Chapter 7 - Logic
     AND
     TEST
     OR
     XOR
     NEG
     NOT
     Masks

Chapter 8 - Shift and Rotate
     SAL, SHL
     INC, DEC
     SHR
     SAR
     ROL, ROR
     RCL, RCR

Chapter 9 - Jumps
     CMP
     Signed and Unsigned Conditional Jumps
     Flag Conditional Jumps
     JCXZ

Chapter 10 - Templates
     .LST File
     SEGMENTS
     PUBLIC (SEGMENTS)
     CLASS
     ENDS
     ASSUME
     Segment Overrides
     Subroutines
     END
     RET
     EXTRN
     STACK

Chapter 11 - Addressing Modes
     EQU
     All Addressing Modes
     OFFSET
     SEG
     LEA

Chapter 12 - Multiple Word Arithmetic I
     ADC
     CLC
     SBB

Chapter 13 - Multiple Word Arithmetic II
     Unsigned Multiplication
     Unsigned Division

Chapter 14 - Zoom

Chapter 15 - Subroutines
     PUSHREGS.MAC
     EXTRN in Subroutines
     Passing Data
     Near and Far Procedures
     The Stack
     Types of Returns
     PUSHREGS
     POPREGS
     LDS
     LES
     Towers of Hanoi
     Summary

Chapter 16 - Long Signed Multiplication And Division
     Long Negation

Chapter 17 - Interrupts
     INT
     NMI
     IEF
     STI
     CLI
     INT 3

Chapter 18 - Ports
     IN
     OUT
     Parity

Chapter 19 - Strings
     SCAS
     DF . . .
. . . . . . . . . . . . . . . . . . . . . . 191 REP/REPE/REPNE . . . . . . . . . . . . . . . . . . . 195 STOS . . . . . . . . . . . . . . . . . . . . . . . . 196 LODS . . . . . . . . . . . . . . . . . . . . . . . . 198 MOVS . . . . . . . . . . . . . . . . . . . . . . . . 199 CMPS . . . . . . . . . . . . . . . . . . . . . . . . 204 Segment Overrides . . . . . . . . . . . . . . . . . . 207 REP and Overrides . . . . . . . . . . . . . . . . . . 209 Chapter 20 - Control Structures . . . . . . . . . . . . . 212 IF . . . . . . . . . . . . . . . . . . . . . . . . . 212 WHILE . . . . . . . . . . . . . . . . . . . . . . . . 214 The PC Assembler Tutor 4 ______________________ DO-WHILE . . . . . . . . . . . . . . . . . . . . . . 215 BREAK . . . . . . . . . . . . . . . . . . . . . . . . 215 CONTINUE . . . . . . . . . . . . . . . . . . . . . . 215 FOR . . . . . . . . . . . . . . . . . . . . . . . . . 216 SWITCH . . . . . . . . . . . . . . . . . . . . . . . 217 Chapter 21 - .COM Files . . . . . . . . . . . . . . . . . 219 .COM Template . . . . . . . . . . . . . . . . . . . . 219 PSP . . . . . . . . . . . . . . . . . . . . . . . . . 220 ASSUME . . . . . . . . . . . . . . . . . . . . . . . 221 Phase Errors . . . . . . . . . . . . . . . . . . . . 221 Chapter 22 - BCD Numbers . . . . . . . . . . . . . . . . 229 Unpacked BCD . . . . . . . . . . . . . . . . . . . . 229 Packed BCD . . . . . . . . . . . . . . . . . . . . . 230 DAA . . . . . . . . . . . . . . . . . . . . . . . . . 232 DAS . . . . . . . . . . . . . . . . . . . . . . . . . 233 AAA . . . . . . . . . . . . . . . . . . . . . . . . . 240 AAS . . . . . . . . . . . . . . . . . . . . . . . . . 242 AAM . . . . . . . . . . . . . . . . . . . . . . . . . 242 AAD . . . . . . . . . . . . . . . . . . . . . . . . . 243 Unpacking . . . . . . . . . . . . . . . . . . . . . . 245 Packing . . . . . . . . . . . . . . . . . . . . . . . 246 Chapter 23 - XLAT . . . . . . . . . . . . . . . . . . . . 253 EBCDIC Numbers . . . . . . . . . . . . . . . 
. . . . 253 XLAT . . . . . . . . . . . . . . . . . . . . . . . . 253 Translation Table . . . . . . . . . . . . . . . . . . 253 Chapter 24 - Miscellaneous Instructions . . . . . . . . . 264 XCHG . . . . . . . . . . . . . . . . . . . . . . . . 264 ESC . . . . . . . . . . . . . . . . . . . . . . . . . 264 WAIT . . . . . . . . . . . . . . . . . . . . . . . . 264 FWAIT . . . . . . . . . . . . . . . . . . . . . . . . 264 LOCK . . . . . . . . . . . . . . . . . . . . . . . . 265 LOOPE/LOOPNE . . . . . . . . . . . . . . . . . . . . 265 HALT . . . . . . . . . . . . . . . . . . . . . . . . 266 CMC . . . . . . . . . . . . . . . . . . . . . . . . . 266 LAHF . . . . . . . . . . . . . . . . . . . . . . . . 266 SAHF . . . . . . . . . . . . . . . . . . . . . . . . 267 NOP . . . . . . . . . . . . . . . . . . . . . . . . . 267 Chapter 25 - What Does It All Mean? . . . . . . . . . . . 268 Interrupts . . . . . . . . . . . . . . . . . . . . . 269 Data Bus . . . . . . . . . . . . . . . . . . . . . . 273 Alignment Type . . . . . . . . . . . . . . . . . . . 274 Chapter 26 - Simplifying The Template . . . . . . . . . . 276 INT 21h Function 4Ch . . . . . . . . . . . . . . . . 276 Exit Code . . . . . . . . . . . . . . . . . . . . . . 276 Standardized Segments . . . . . . . . . . . . . . . . 277 _DATA . . . . . . . . . . . . . . . . . . . . . . . . 279 _BSS . . . . . . . . . . . . . . . . . . . . . . . . 279 CONST . . . . . . . . . . . . . . . . . . . . . . . . 280 Literals . . . . . . . . . . . . . . . . . . . . . . 280 STACK . . . . . . . . . . . . . . . . . . . . . . . . 280 Groups . . . . . . . . . . . . . . . . . . . . . . . 280 Table Of Contents 5 _________________ DGROUP . . . . . . . . . . . . . . . . . . . . . . . 281 Groups and OFFSET . . . . . . . . . . . . . . . . . . 284 Standardized Segment Names . . . . . . . . . . . . . 287 Standardized Segment Directives . . . . . . . . . . . 288 .MODEL Names . . . . . . . . . . . . . . . . . . . . 291 Summary . . . . . . . . . . . . . . . . . 
. . . . . . 293 APPENDIX Appendix I - The PC Assembler Helper . . . . . . . . . . . i Appendix II - The 8086 Instruction Set . . . . . . . . . xiii Appendix III - Instruction Speed And Flags . . . . . . xxvii ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ ANCILLARY MATERIAL DEBUGGERS (DEBUGGER.DOC) TASM (TASM.DOC) A86 (A86.DOC) BASIC (BASIC1.DOC, BASIC2-1.DOC, BASIC2-2.DOC) Using Basic . . . . . . . . . . . . . . . . . . . . . . . mmi Variable Typing . . . . . . . . . . . . . . . . . . . mmi Default Type . . . . . . . . . . . . . . . . . . . . mmii Interfacing Basic With Assembler . . . . . . . . . . . mmvii Memory Allocation . . . . . . . . . . . . . . . . . mmvii String Allocation . . . . . . . . . . . . . . . . . mmviii Data Output . . . . . . . . . . . . . . . . . . . . . mmx FIELD . . . . . . . . . . . . . . . . . . . . . . . mmxii LSET/MID$/RSET . . . . . . . . . . . . . . . . . . mmxiii MKI$, MKL$, MKS$, MKD$ . . . . . . . . . . . . . . mmxiii CVI, CVL, CVS, CVD . . . . . . . . . . . . . . . . mmxiii STR$, VAL . . . . . . . . . . . . . . . . . . . . . mmxiv VARPTR . . . . . . . . . . . . . . . . . . . . . . mmxvi PTR86 (Basic 3.0) . . . . . . . . . . . . . . . . . mmxvii BUILDLIB.EXE . . . . . . . . . . . . . . . . . . . mmxvii VARSEG (Basic 4.0) . . . . . . . . . . . . . . . mmxviii Basic Calling Conventions . . . . . . . . . . . . . mmxix INT86 . . . . . . . . . . . . . . . . . . . . . . mxxviii SADD . . . . . . . . . . . . . . . . . . . . . . . mmxxix BLOAD . . . . . . . . . . . . . . . . . . . . . . . mmxxx Summary . . . . . . . . . . . . . . . . . . . . . . mmxxxi Interfacing Basic With Assembler . . . . . . . . . mmvii Ancillary Programs (MISHMASH.DOC) General Block Move . . . . . . . . . . . . . . . . . . 1 Block Multiplication . . . . . . . . . . . . . . . . . 3 Binary Multiplication . . . . . . . . . . . . . . . . . 6 Binary Division . . . . . . . . . . . . . . . . . . . . 
9

Chapter 0.1 - Numbers and Arithmetic
====================================

You don't habitually use the base two system to balance your checkbook, so it would be counterproductive to teach you machine arithmetic on a base two system. What number systems have you had a lot of experience with? The base 10 system springs to mind. I'm going to show you what happens on a base 10 system so you will understand the structure of what happens with computer arithmetic.

BASE 10 MACHINE

Each place inside the microprocessor that can hold a number is called a REGISTER. Normally there are a dozen or so of these. Our base 10 machine has 4 digit registers. They can represent any number from 0000 to 9999. They are exactly like industrial counters or the counters on your tape machines.{1} If you add 27 to a register, the microprocessor counts forward 27; if you subtract 153 from a register, the microprocessor counts backwards 153. Every time you add 1 to a register, it increments by 1 - that is 0245, 0246, 0247, 0248. Every time you subtract 1 from a register, it decrements by 1 - that is 3480, 3479, 3478, 3477.

Let's do some more incrementing. 9997, 9998, 9999, 0000, 0001, 0002. Whoops! That's a problem. When the register reaches 9999 and we add 1, it changes to 0000, not 10,000. How can we tell the difference between 0000 and 10,000? We can't without a little help from the CPU.{2} Immediately after an arithmetical operation, the CPU knows whether you have gone through 10,000 (9999->0000). The CPU has something called a carry flag. It is internal to the CPU and can have the value 0 or 1. After each arithmetical operation, the CPU sets the CARRY FLAG to 1 if you went through the 9999/0000 boundary, and sets the carry flag to 0 if you didn't.{3} Here are some examples, showing addition, the result, and the carry flag. The carry flag is normally abbreviated by CF.

    number 1    number 2    result    CF
      0289        4782       5071      0
      4398        2964       7362      0
      8177        5826       4003      1
      6744        4208       0952      1

____________________
1. Exactly like industrial counters that have several hundred thousand parts, that is.
2. The CPU (central processing unit) is the chip(s) that does all the arithmetic. In the case of the PC, it is the 8086.
3. When you set a flag to 0, it is called CLEARING the flag.

Note that you must check the carry flag immediately after the arithmetical operation. If you wait, the CPU will reset it after the next arithmetical operation.

Now let's do some decrementing. 0003, 0002, 0001, 0000, 9999, 9998. Golly gosh! Another problem. When we got to 0000, rather than getting -1, -2, we got 9999, 9998. Apparently 9999 stands for -1, 9998 stands for -2. Yes, that's the system on this machine, on the 8086, and on all computers. (Back to that in a moment.) How do we tell that the number went through 0, i.e. 0000->9999? The carry flag comes to the rescue again. If the number goes through the 9999/0000 boundary in either direction, the CPU sets the CF to 1; if it doesn't, the CPU sets the CF to 0. Here's some subtraction, with the result and the carry flag.

    number 1    number 2    result    CF
      8473        2752       5721      0
      2836        4583       8253      1
      0654        9281       1373      1
      9281        0654       8627      0

Look at examples 3 and 4. The numbers are reversed. The two results, 1373 and 8627, add up to 10,000; as the next section shows, that means they stand for the same absolute value with different signs. That is as it should be. When you reverse the order in a subtraction, you get the same absolute value, only a different sign (15 - 7 = 8 but 7 - 15 = -8). Remember, the CF is reliable only immediately after the operation.

NEGATIVE NUMBERS

The negative numbers go 9999=-1, 9998=-2, 9997=-3, 9996=-4, 9995=-5 etc. A more negative number is denoted by a smaller number in the register; -5 = 10,000 - 5 = 9995; -498 = 10,000 - 498 = 9502, and in general, -x = 10,000 - x. Here are some negative numbers and their representations on our machine.

    number    machine no        number    machine no
      -27        9973            -4652       5348
    -8916        1084            -6155       3845

As you will notice, these numbers look exactly the same as the unsigned numbers. They ARE exactly the same as the unsigned numbers. The machine has no way of knowing whether a number in a register is signed or unsigned. Unlike BASIC or PASCAL which will complain whenever you try to use a number in an incorrect way, the machine will let you do it. This is the power and the curse of machine language. You are in complete control. It is your responsibility to keep track of whether a number is signed or unsigned.

Which signed numbers should be positive and which negative? This has already been decided for you by the computer, but let's think out what a reasonable solution might be. We could have from 0000 to 8000 positive and from 9999 to 8001 negative, but that would give us 8001 positive numbers and 1999 negative numbers. That seems unbalanced. More importantly, if we take -(3279) the machine will give us 6721, which is a POSITIVE number. We don't want that. For reasons of symmetry, the positive numbers are 0000-4999 and the negative numbers are 9999-5000.{4} Our most negative number is -5000 = 10,000 - 5000 = 5000.

10'S COMPLEMENT

It's time for a digression. If we are going to be using negative numbers like -(473), changing from an external number to an internal number is going to be a bother: i.e. -473 -> 9527. Going the other way is going to be a pain too: i.e. 9527 -> -473. Well, it would be a problem except that we have some help.

    0000 = 10,000 =     9999 +1
                       - 473
              result    9526 +1 = 9527

Let's work this through carefully. On our machine, 0000 and 10000 (9999+1) are the same thing, so 0 - 473 is the same as 9999+1-473 which is the same as 9999-473+1. But when we have all 9s, this is a cinch. We never have to borrow - all we have to do is subtract each digit from 9 and then add 1 to the total.
We may have to carry at the end, but that is a lot better than all those borrows. We'll do a few examples:

    (-4276)    0000 = 10,000 =     9999 +1
                                  -4276
                         result    5723 +1 = 5724

    (-3982)    0000 = 10,000 =     9999 +1
                                  -3982
                         result    6017 +1 = 6018

    (-2400)    0000 = 10,000 =     9999 +1
                                  -2400
                         result    7599 +1 = 7600

    (-1989)    0000 = 10,000 =     9999 +1
                                  -1989
                         result    8010 +1 = 8011

____________________
4. That way, if we tell the machine that we are working with signed numbers, all it has to do is look at the left digit. If the digit is 5-9, we have a negative number, if it is 0-4, we have a positive number. Note that 0000 is considered to be positive. This is true on all computers.

This is called 10s complement. Subtract each digit from 9, then add 1 to the total. One thing we should check is whether we get the same number back if we negate the negative result; i.e. does -(-1989) = 1989? From the last example, we see that -1989 = 8011, so:

    (-8011)    0000 = 10,000 =     9999 +1
                                  -8011
                         result    1988 +1 = 1989

It seems to work. In fact, it always works. See the footnote for the proof.{5} You are going to use this from time to time, so you might as well practice some. Here are 10 numbers to put into 10s complement form. The answers are in the footnote. (1) -628, (2) -4194, (3) -9983, (4) -1288, (5) -4058, (6) -6952, (7) -162, (8) -9, (9) -2744, (10) -5000.{6}

The computer keeps track of whether a number is positive or negative. After an arithmetical operation, it sets a flag to tell whether the result is positive or negative. This flag has no meaning if you are using unsigned numbers. The computer is saying, "If the last arithmetical operation was with signed numbers, then this is the sign of the result." The flag is called the sign flag (SF). It is 0 if the number is positive and 1 if the number is negative. Let's decrement again and look at both the sign flag and carry flag.

    NUMBER    SIGN    CARRY
         3      0       0
         2      0       0
         1      0       0
         0      0       0
      9999      1       1
      9998      1       0
      9997      1       0
      9996      1       0

____________________
5. Let x be any number. Then:
       -x = ( 10,000 - x ) = ( 9999 + 1 - x ) ;
       -(-x) = ( 10,000 - (-x) ) = ( 9999 + 1 - (-x) )
             = ( 9999 + 1 - ( 9999 + 1 - x ) )
             = ( 9999 + 1 - 9999 - 1 + x ) = x
6. (1) -628 = 9372, (2) -4194 = 5806, (3) -9983 = 0017, (4) -1288 = 8712, (5) -4058 = 5942, (6) -6952 = 3048, (7) -162 = 9838, (8) -9 = 9991, (9) -2744 = 7256, (10) -5000 = 5000. This last one is a little strange. It changes 5000 into itself. In our system, 5000 is a negative number and it winds up as a negative number. This happens on all computers. If you take the maximum negative number and take its negative, you get the same number back.

That worked pretty well. The sign flag changed from 0 to 1 when we went from 0 to 9999 and the carry flag was set to 1 for that one operation so we could see that we had gone through the 9999/0000 boundary. Let's do some more decrementing.

    NUMBER    SIGN    CARRY
      5003      1       0
      5002      1       0
      5001      1       0
      5000      1       0
      4999      0       0
      4998      0       0
      4997      0       0
      4996      0       0

This one didn't work too well. 5000 is our most negative number (-5000) and 4999 is our most positive number; when we crossed the 4999/5000 boundary, the sign changed but there was nothing to tell us that the sign had changed. We need to make another flag. This one is called the overflow flag. We check the carry flag (CF) for the 0000/9999 boundary and we check the overflow flag for the 5000/4999 boundary. The last decrementing example with the overflow flag:

    NUMBER    SIGN    CARRY    OVERFLOW
      5003      1       0         0
      5002      1       0         0
      5001      1       0         0
      5000      1       0         0
      4999      0       0         1
      4998      0       0         0
      4997      0       0         0
      4996      0       0         0

This time we can find out that we have gone through the boundary. We'll come back to how the computer sets the overflow flag later, but let's do some addition and subtraction now.

UNSIGNED ADDITION AND SUBTRACTION

Unsigned addition is done the same way as normally. The computer adds the two numbers.
If the result is over 9999, it sets the carry flag and drops the left digit (i.e. 14625 -> 4625, CF = 1; 19137 -> 9137, CF = 1; 10000 -> 0000, CF = 1). The largest possible addition is 9999 + 9999 = 19998. This still has a 1 in the left digit. If the carry flag is set after an addition, the result must be between 10000 and 19998.

Since this is unsigned addition, we won't worry about the sign flag or the overflow flag for the moment. Here are some examples of unsigned addition.

    NUMBER 1    NUMBER 2    RESULT    CF
      5147        2834       7981      0
      6421        8888       5309      1
      2910        6544       9454      0
      6200        6321       2521      1

Directly after the addition, the computer has complete information about the number. If the carry flag is set, that means that there is an extra 10,000, so the result of the second example is 15309 and the result of the fourth example is 12521. There is no way to store all that information in 4 digits in memory so that extra information will be lost if it is not used immediately.

Subtraction is similar. The machine subtracts, and if the answer is below 0000, it sets the carry flag, borrows 10000 and adds it to the result. -3158 -> -3158 + 10000 -> 6842, CF = 1; -8197 -> -8197 + 10000 -> 1803, CF = 1. After a subtraction, if the carry flag is set, you know the number is 10000 too big. Once again, the carry flag information must be used immediately or it will be lost. Here are some examples:

    NUMBER 1    NUMBER 2    RESULT    CF
      3872        2655       1217      0
      9826        5967       3859      0
      4561        7143       7418      1
      2341        4907       7434      1

If the carry flag is set, the computer borrowed 10000, so example 3 is 7418 - 10000 = -2582 and example 4 is 7434 - 10000 = -2566.

MODULAR ARITHMETIC

What the computer is doing is modular arithmetic. Modular arithmetic is like a clock. If it is 11 o'clock and you go forward 1 hour it's now 12 o'clock; if it's 11 and you go backwards 1 hour it's now 10. If it's 11 and you go forward 4 hours it's not 15, it's 3. If it's 11 and you go backward 15 hours it's not -4, it's 8.
The clock is doing mod 12 arithmetic.{7}

    (A+B) mod 12
    (A-B) mod 12

From the clock's viewpoint, 11 o'clock today, 11 o'clock yesterday and 11 o'clock, June 8, 1754 are all the same thing. If you go forward 200 hours (that's 12X16 + 8) you will have the same result as going forward 8 hours. If you go backwards 200 hours (that's -(12X16 + 8) = -(12X16) - 8) you get the same result as going backwards 8 hours. If you go forward 4 hours from 11, (11+4) mod 12 = 3, you get the same result as going backwards 8 hours, (11-8) mod 12 = 3. In fact, these come in pairs. If A + B = 12, then going forward A hours gives the same result as going backwards B hours. Forwards 9 = backwards 3; forwards 7 = backwards 5; forwards 11 = backwards 1.

____________________
7. To be a perfect analogy 12 o'clock should be 0 o'clock.

In the mod 12 system, the following things are equivalent:

    (+72 + 4)    (+72 - 8)
    (+60 + 4)    (+60 - 8)
    (+48 + 4)    (+48 - 8)
    (+36 + 4)    (+36 - 8)
    (+24 + 4)    (+24 - 8)
    (+12 + 4)    (+12 - 8)
    (  0 + 4)    (  0 - 8)
    (-12 + 4)    (-12 - 8)
    (-24 + 4)    (-24 - 8)
    (-36 + 4)    (-36 - 8)
    (-48 + 4)    (-48 - 8)
    (-60 + 4)    (-60 - 8)

They form what is known as an equivalence class mod 12. If you use any one of them for addition or subtraction, you will get the same result (mod 12) as with any other one. Here's some addition:{8}

    (+48 + 4) + 7 = ( 48 + 11) mod 12 = 11
    (-48 - 8) + 7 = (-48 - 1 ) mod 12 = 11
    (  0 - 8) + 7 = (  0 - 1 ) mod 12 = 11
    (-60 + 4) + 7 = (-60 + 11) mod 12 = 11

And some subtraction:

    (+48 + 4) - 2 = ( 48 + 2 ) mod 12 = 2
    (-48 - 8) - 2 = (-48 - 10) mod 12 = 2
    (  0 - 8) - 2 = (  0 - 10) mod 12 = 2
    (-60 + 4) - 2 = (-60 + 2 ) mod 12 = 2

Our pretend computer doesn't cycle every 12 numbers, it cycles every 10,000 numbers - it is a mod 10,000 machine.
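
The clock examples above can be checked with a few lines of Python. This is only a sketch: the helper names `clock` and `wrap` are made up for illustration, and it relies on Python's `%` operator returning a non-negative result for a positive modulus.

```python
def clock(hour, delta):
    """Move `delta` hours from `hour` on a 12-hour dial.

    The dial shows 12 rather than 0, so shift down by one before
    taking the remainder and shift back up afterwards.
    """
    return (hour + delta - 1) % 12 + 1


def wrap(value, modulus=10_000):
    """Reduce any integer to the register range 0..modulus-1."""
    return value % modulus


print(clock(11, 1))    # 12
print(clock(11, 4))    # 3
print(clock(11, -15))  # 8

# Every member of an equivalence class mod 12 behaves the same way.
members = [48 + 4, -48 - 8, 0 - 8, -60 + 4]
print([(m + 7) % 12 for m in members])  # [11, 11, 11, 11]
print([(m - 2) % 12 for m in members])  # [2, 2, 2, 2]
```

The same `wrap` helper with its default modulus of 10,000 models the four digit registers of the pretend machine.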
On our machine, the number 6453 has the following equivalence class:

    (+30000 + 6453)    (+30000 - 3547)
    (+20000 + 6453)    (+20000 - 3547)
    (+10000 + 6453)    (+10000 - 3547)
    (     0 + 6453)    (     0 - 3547)
    (-10000 + 6453)    (-10000 - 3547)
    (-20000 + 6453)    (-20000 - 3547)
    (-30000 + 6453)    (-30000 - 3547)

____________________
8. (-10) mod 12 = 2 ; (-11) mod 12 = 1

Any one of these will act the same as any other one. Notice that 10000 - 3547 is the subtraction that we did to get the representation of -3547 on the machine.

    -3547 =     9999 + 1
              - 3547
                6452 + 1 = 6453

6453 and -3547 act EXACTLY the same on this machine. What this means is that there is no difference in adding signed or unsigned numbers on the machine. The result will be correct if interpreted as an unsigned number; it will also be correct if interpreted as a signed number.

    6821 + 3179 = 10000   so   -3179 = 6821   and   3179 = -6821
    5429 + 4571 = 10000   so   -4571 = 5429   and   4571 = -5429

Since -3179 and 6821 act the same on our machine and since -4571 and 5429 act the same, let's do some addition.
Take your time so you understand why the signed and unsigned numbers are giving the same results mod 10000:

----------------------------------------------------------
    6821 + 497 = 7318
    -3179 + 497 = (10000 - 3179) + 497 = 10000 - 2682 = -2682
    7318 + 2682 = 10000   so   -2682 = 7318
----------------------------------------------------------
    5429 + 876 = 6305
    -4571 + 876 = (10000 - 4571) + 876 = 10000 - 3695 = -3695
    6305 + 3695 = 10000   so   -3695 = 6305
----------------------------------------------------------

Here's some subtraction:

----------------------------------------------------------
    6821 - 507 = 6314
    -3179 - 507 = (10000 - 3179) - 507 = 10000 - 3686 = -3686
    6314 + 3686 = 10000   so   -3686 = 6314
----------------------------------------------------------
    5429 - 178 = 5251
    -4571 - 178 = (10000 - 4571) - 178 = 10000 - 4749 = -4749
    5251 + 4749 = 10000   so   -4749 = 5251
----------------------------------------------------------

It is the same addition or subtraction. Interpreted one way it is signed addition or subtraction; interpreted another way it is unsigned addition or subtraction. The machine could have one operation for signed addition and another operation for unsigned addition, but this would be a waste of computer resources. These operations are exactly the same. This machine, like all computers, has only one integer addition operation and one integer subtraction operation. For each operation, it sets the flags of importance for both signed and unsigned arithmetic.

For unsigned addition and subtraction, CF, the carry flag tells whether the 0000/9999 boundary has been crossed.

For signed addition and subtraction, SF, the sign flag tells the sign of the result and OF, the overflow flag tells whether the result was too negative or too positive.

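The single add and subtract operations of the pretend base 10 machine, together with the three flags just described, can be modelled in Python. This is a sketch under stated assumptions: the names `MOD`, `HALF`, `to_signed`, `add` and `sub` are invented for illustration, and the overflow flag is computed by comparing against the true signed result, which is a convenient model and not a description of how any real chip does it.

```python
MOD = 10_000        # a register holds four decimal digits
HALF = MOD // 2     # 5000-9999 are read as negative when signed


def to_signed(reg):
    """Read a register value (0..9999) as a signed number (-5000..4999)."""
    return reg - MOD if reg >= HALF else reg


def _flags(raw, true_signed):
    """Return (register result, CF, SF, OF) for one operation."""
    result = raw % MOD                       # what the register keeps
    cf = int(raw < 0 or raw >= MOD)          # crossed the 9999/0000 boundary
    sf = int(result >= HALF)                 # left digit is 5-9
    of = int(not (-HALF <= true_signed < HALF))  # signed result out of range
    return result, cf, sf, of


def add(a, b):
    """One addition; the same operation serves signed and unsigned use."""
    return _flags(a + b, to_signed(a) + to_signed(b))


def sub(a, b):
    """One subtraction; the same operation serves signed and unsigned use."""
    return _flags(a - b, to_signed(a) - to_signed(b))


print(add(6821, 497))   # (7318, 0, 1, 0)  unsigned 7318, signed -2682
print(sub(0, 1))        # (9999, 1, 1, 0)  crossed 0000/9999
print(sub(5000, 1))     # (4999, 0, 0, 1)  crossed 5000/4999
```

Which flags matter depends only on how the program chooses to read the numbers: CF for unsigned work, SF and OF for signed work.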
SIGN EXTENSION

Although our base 10 machine is set up for 4 digit numbers, it is possible to use it for numbers of any size by writing the appropriate software. We'll use 12 digit numbers as an example, though they could be of any length. The first problem is converting 4 digit numbers into 12 digit numbers. If the number is an unsigned number, this is no problem (we'll write the number in groups of 4 digits to keep it readable):

    4816 -> 0000 0000 4816
    9842 -> 0000 0000 9842
     127 -> 0000 0000 0127

What if it is a signed number? The first thing we need to know about signed numbers is, what is positive and what is negative? Once again, for reasons of symmetry, we choose positive to be 0000 0000 0000 to 4999 9999 9999 and negative to be 5000 0000 0000 to 9999 9999 9999.{9} This longer number system cycles from 9999 9999 9999 to 0000 0000 0000. Therefore, for longer numbers, 0000 0000 0000 = 1 0000 0000 0000. They are equivalent. 0000 0000 0000 = 9999 9999 9999 + 1.

If it is a positive signed number, it is still no problem (recall that in our 4 digit system, a positive number is between 0000 and 4999, a negative signed number is between 5000 and 9999). Here are some positive signed numbers and their conversions:

    1974 -> 0000 0000 1974
       1 -> 0000 0000 0001
    3909 -> 0000 0000 3909

____________________
9. Once again, the sign will be decided by the left hand digit. If it is 0-4 it is a positive number; if it is 5-9 it is a negative number.

If it is a negative number, where did its representation come from in our 4 digit system? -x -> 9999 + 1 - x = 9999 - x + 1. This time it won't be 9999 + 1 but 9999 9999 9999 + 1. Let's have some examples.

              4 DIGIT SYSTEM        12 DIGIT SYSTEM

    -1964       9999 + 1            9999 9999 9999 + 1
               -1964                          -1964
                8035 -> 8036        9999 9999 8035 + 1 -> 9999 9999 8036

    -2867       9999 + 1            9999 9999 9999 + 1
               -2867                          -2867
                7132 -> 7133        9999 9999 7132 + 1 -> 9999 9999 7133

    -182        9999 + 1            9999 9999 9999 + 1
                -182                           -182
                9817 -> 9818        9999 9999 9817 + 1 -> 9999 9999 9818

As you can see, all you need to do to sign extend a negative number is to put 9s to the left.

Can't those 9s on the left become 0s when we add that 1 at the end? No. In order for that to happen, the right four digits must be 9999. But that can only happen if the number to be negated is 0000:

    9999 9999 9999 + 1
             -0000
    9999 9999 9999 + 1 -> 0000 0000 0000

In all other cases, adding 1 does not carry anything out of the right four digits.

It is impossible to truncate one of these 12 digit numbers to a 4 digit number without making the results unreliable. Here are two examples:

    (number)          0000 0168 7451 -> 7451    (now a negative number)
    (actual value)         +168 7451    -2549

    (number)          9999 9643 2170 -> 2170    (now a positive number)
    (actual value)         -356 7830    +2170

We now have 12 digit numbers. Is it possible to add them and subtract them? Yes but only 4 digits at a time. When you add with pencil and paper you carry left from each digit. The computer can carry left from each group of 4 digits. We'll do the following addition:

      0138 6715 6037
    + 2514 2759 7784

Do this with pencil and paper and write down all the carries. The computer is going to do this in 3 parts:

    1) 6037 + 7784
    2) 6715 + 2759 + carry (if any)
    3) 0138 + 2514 + carry (if any)

The first addition is our regular addition. It will set the carry flag if the 0000/9999 boundary was crossed (i.e. the result was larger than 9999). In our case CF = 1 since the result is 13821. The register holds 3821. We store 3821. Next, we need to add three things: 6715 + 2759 + CF (=1). There is an instruction like this on all computers.
It adds two numbers plus the value of the carry flag. Our first addition was ADD (add two numbers). This time the machine instruction is ADC (add two numbers and the carry). The result of our second addition is 9475. The register holds 9475 and CF = 0. We store 9475. Finally, we need to add three more things: 0138 + 2514 + CF (=0). Once again we use ADC. The result is 2652, CF = 0. We store the 2652. That is the whole result:

    2652 9475 3821

If CF = 1 at this point, the number has crossed the 9999,9999,9999/0000,0000,0000 boundary.

This will work for signed numbers also. The only difference is that at the very end we don't check CF, we check OF to see if the 4999,9999,9999/5000,0000,0000 boundary has been crossed.

Just to give you one more example we'll do a subtraction using the same numbers:

      0138 6715 6037
    - 2514 2759 7784

Notice that in order for you to do this with pencil and paper you'll have to put the larger number on top before you subtract. With the machine this is unnecessary. Go ahead and do the subtraction with pencil and paper.

The machine can do this 4 digits at a time, so this is a three step process:

    1) 6037 - 7784
    2) 6715 - 2759 - borrow (if any)
    3) 0138 - 2514 - borrow (if any)

The first one is a regular subtraction and since the bottom number is larger, the result is 8253, CF = 1. (Perhaps you are puzzled because that's not the result that you got. Don't worry, it all comes out in the wash.) Step two subtracts but also subtracts any borrow (we had a borrow because CF = 1). There is a special instruction called SBB (subtract with borrow) that does just that. 6715 - 2759 - 1 = 3955, CF = 0. We store the 3955 and go on to the third part. This also is SBB, but since we had no borrow, we have 0138 - 2514 - 0 = 7624, CF = 1. We store 7624. This is the end result, and since CF = 1, we have crossed the 9999,9999,9999/0000,0000,0000 boundary.
This is going to be the representation of a negative number mod 1,0000,0000,0000.

With pencil and paper, your result was:

    -2375 6044 1747

The machine result was:

    7624 3955 8253

But CF was 1 at the end, so this represents a negative number. What number does it represent? Let's take its negative to get a positive number with the same absolute value:

      9999 9999 9999 + 1
    - 7624 3955 8253
      2375 6044 1746 + 1 = 2375 6044 1747

This is the same thing you got with pencil and paper. The reason it looked weird is that a negative number is always stored as its modular equivalent. If you want to read a negative number, you need to take its negative to get a positive number with the same absolute value.

If we had been working with signed numbers, we wouldn't have checked CF at the very end, we would have checked OF to see if the 4999,9999,9999/5000,0000,0000 boundary had been crossed. If OF = 1 at the end, then the result was either too negative or too positive.

OVERFLOW

How does the machine decide that overflow has occurred? First, what exactly is overflow and when is it possible for overflow to occur?

Overflow is when the result of a signed addition or subtraction is either larger than the largest positive number or more negative than the most negative number. In the case of the 4 digit machine, larger than +4999 or more negative than -5000.

If one number is negative and the other is positive, it is not possible for overflow to occur. Take +32 and -4791 as examples. If we start with the positive number (+32) and add the negative number (-4791), the result can't possibly be too positive. Similarly, if we start with the negative number (-4791) and add the positive number (+32), the result can't be too negative. Therefore, the result can be neither too positive nor too negative. Make sure you understand this before going on.

What if both are positive? Then overflow is possible.
Here are some examples:

    (+3500) + (+4500) = 8000 = -2000
    (+2872) + (+2872) = 5744 = -4256
    (+1799) + (+4157) = 5956 = -4044

In each case, two positive numbers give a negative result. How about two negative numbers?

                      (7154) + (6000) = 3154 = +3154
    (actual value)     -2846    -4000

                      (5387) + (5826) = 1213 = +1213
    (actual value)     -4613    -4174

                      (8053) + (6191) = 4244 = +4244
    (actual value)     -1947    -3809

The numbers underneath are the negative numbers that the numbers above them represent. In these cases, adding two negative numbers gives a positive result.

This is what the machine checks for. Before the addition, it checks the signs of the numbers. If the signs are the same, then the result must also be the same sign or overflow has occurred.{10} Thus + and + must have a + result; - and - must have a - result. If not, OF (the overflow flag) is set (OF = 1). Otherwise OF is cleared (OF = 0).

MULTIPLICATION

Unsigned multiplication is easy. The machine simply multiplies the two numbers. Since the result can be up to 8 digits (the maximum result is 9999 X 9999 = 9998 0001) the machine uses two registers to hold the result. We'll call them R1 and R2.

    5436 X 174      R1  0094
                    R2  5864

    2641 X 2003     R1  0528
                    R2  9923

You need to know which register holds which half of the result, but besides that, everything is straightforward. On this machine R1 holds the left four digits and R2 holds the right four digits.

Notice that our machine has changed the modular base from N to N*N (from 1 0000 to 1 0000 0000). What this means is that two things which are modularly equivalent under addition and subtraction are not necessarily equivalent under multiplication and division. 6281 and -3719 will not work the same.

____________________

10. The machine checks something considerably more obscure because it is easier to implement in semiconductor logic, but what it is actually doing is checking to see if the two numbers being added have the same sign. If they do, the result must be the same sign or overflow has occurred.

The machine can't do signed multiplication. What it actually does is convert the numbers to positive numbers (if necessary), perform unsigned multiplication, and then do sign adjustment of the results (if necessary). It uses 2 registers for the result.

    SIGNED MULTIPLICATION               REGS       RESULT

    (number)        (5372) X (3195)     R1 8521    = -1478 6460
    (actual value)   -4628 X  +3195     R2 3540

    (number)        (9164) X (8746)     R1 0104    = +104 8344
    (actual value)    -836 X  -1254     R2 8344

    (number)        (9927) X (0013)     R1 9999    = -949
    (actual value)     -73 X    +13     R2 9051

Looking at the last example, if we performed unsigned multiplication on those two numbers, we would have 9927 X 0013 = 0012 9051, a completely different answer from the one we got. Therefore, whenever you do multiplication, you have to tell the machine whether you want unsigned or signed multiplication.

DIVISION

Unsigned division is easy too. The machine divides one number by the other, puts the quotient in one register and the remainder in another. Once again, the only problem is remembering which register has the quotient and which register has the remainder. For us, the quotient is R1 and the remainder is R2.

    6190 / 372      R1  0016      16 remainder 238
                    R2  0238

    9845 / 11       R1  0895      895 remainder 0
                    R2  0000

As with multiplication, signed division is handled by the machine changing all numbers to positive numbers, performing unsigned division, then putting back the appropriate signs.

    SIGNED DIVISION                     REGS       RESULT

    (number)        (7192) / (9164)     R1 0003    +3 rem. -300
    (actual value)   -2808 /   -836     R2 9700

    (number)        (3753) / (9115)     R1 9996    -4 rem. +213
    (actual value)   +3753 /   -885     R2 0213

Looking at the last example, 3753 / 9115, if that were unsigned division the answer would be 0 remainder 3753, a completely different answer from the signed division. Every time you do a division, you have to state whether you want unsigned or signed division.


Chapter 0.2 - Bases 2 and 16
============================

I'm making the assumption that if you are along for the ride you already know something about binary and hex numbers. This is a review only.

BASE 2 AND BASE 16

Base 2 (binary) allows only 0s and 1s. Base 16 (hexadecimal) allows 0 - 9, and then makes up the next six numbers by using the letters A - F. A = 10, B = 11, C = 12, D = 13, E = 14 and F = 15. You can directly translate a hex number to a binary number and a binary number to a hex number. A group of four digits in binary is the same as a single digit in hex. We'll get to that in a moment.

The binary digits (BITS) are the powers of 2. The values of the digits (in increasing order) are 1, 2, 4, 8, 16, 32, 64, 128, 256 and so on. 1 + 2 + 4 + 8 = 15, so the first four digits can represent a hex number. This repeats itself every four binary digits. Here are some numbers in binary, hex, and decimal:

    BINARY    HEX    DECIMAL
     0100      4        4
     1111      F       15
     1010      A       10
     0011      3        3

Let's go from binary to hex. Here's a binary number.

    0110011010101101

To go from binary to hex, first divide the binary number up into groups of four starting from the right.

    0110 0110 1010 1101

Now simply change each group into a hex number.

    0110 -> 4 + 2     -> 6
    0110 -> 4 + 2     -> 6
    1010 -> 8 + 2     -> A
    1101 -> 8 + 4 + 1 -> D

and we have 66AD as the result.
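If you happen to have Python around, the grouping procedure just described can be sketched in a few lines. This is only an illustration, not part of "The PC Assembler Helper"; the function name binary_to_hex is made up for this example.

```python
def binary_to_hex(bits):
    """Convert a string of 0s and 1s to hex by grouping four bits at a time."""
    bits = bits.replace(" ", "")
    # Groups are counted from the right, so any padding goes on the left.
    padding = (-len(bits)) % 4
    bits = "0" * padding + bits
    digits = "0123456789ABCDEF"
    result = ""
    for start in range(0, len(bits), 4):
        group = bits[start:start + 4]
        # The four binary digits in a group are worth 8, 4, 2 and 1.
        value = 8 * int(group[0]) + 4 * int(group[1]) + 2 * int(group[2]) + int(group[3])
        result += digits[value]
    return result


print(binary_to_hex("0110011010101101"))   # the number worked by hand above
```

The function accepts the number with or without a space every four digits, since both spellings are used in this chapter.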
Similarly, to go from hex to binary:

    D39F

change each hex digit into a set of four binary digits:

    D = 13 -> 8 + 4 + 1     -> 1101
    3      -> 2 + 1         -> 0011
    9      -> 8 + 1         -> 1001
    F = 15 -> 8 + 4 + 2 + 1 -> 1111

and then put them all together:

    1101001110011111

Of course, having 16 digits strung out like that makes it totally unreadable, so in this book, if we are talking about a binary number, it will always be separated every 4 digits for clarity.{1}

All computers operate on binary data, so why do we use hex numbers? Take a test. Copy these two binary numbers:

    1011 1000 0110 1010 1001 0101 0111 1010
    0111 1100 0100 1100 0101 0110 1111 0011

Now copy these two hex numbers:

    B86A957A
    7C4C56F3

As you can see, you recognize hex numbers faster and you make fewer mistakes in transcription with hex numbers.

ADDITION AND SUBTRACTION

The rules for binary addition are easy:

    0 + 0 = 0
    0 + 1 = 1
    1 + 0 = 1
    1 + 1 = 0    (carry 1 to the next digit left)

Similarly for binary subtraction:

    0 - 0 = 0
    0 - 1 = 1    (borrow 1 from the next digit left)
    1 - 0 = 1
    1 - 1 = 0

On the 8086, you can have a 16 bit (binary digit) number represent a number from 0 - 65535. 65535 + 1 = 0 (65536). For binary numbers, the boundary is 65535/0. You count up or down through that boundary. The 8086 is a mod 65536 machine. That means the things that are equivalent to 35631 mod 65536 are:{2}

    ( 3*65536 + 35631)    ( 3*65536 - 29905)
    ( 2*65536 + 35631)    ( 2*65536 - 29905)
    ( 1*65536 + 35631)    ( 1*65536 - 29905)
    (       0 + 35631)    (       0 - 29905)
    (-1*65536 + 35631)    (-1*65536 - 29905)
    (-2*65536 + 35631)    (-2*65536 - 29905)
    (-3*65536 + 35631)    (-3*65536 - 29905)

____________________

1. This will not be true of the actual assembler code, since the assembler demands an unseparated number.

2. 35631 + 29905 = 65536. -29905 = 35631 (mod 65536)

______________________
The PC Assembler Tutor - Copyright (C) 1989 Chuck Nelson

The unsigned number 35631 and the signed number -29905 look the same. They ARE the same. In all addition, they will operate in the same fashion. The unsigned number will use CF (the carry flag) and the signed number will use OF (the overflow flag).

For signed 16 bit numbers, 0 - 32767 is positive and 32768 - 65535 is negative. Here's 32767 and 32768.

    32767    0111 1111 1111 1111
    32768    1000 0000 0000 0000

32768 and all numbers above it have the left bit 1. 32767 and all numbers below it have the left bit 0. This is how to tell the sign of a signed number. If the left bit is 0 it's positive and if the left bit is 1 it's negative.

TWO'S COMPLEMENT

In base 10 we had 10s complement to help us with negative numbers. In base 2, we have 2s complement.
0 = 65536 = 65535 + 1 so we have:

    1 0000 0000 0000 0000 = 1111 1111 1111 1111 + 1

To get the negative of a number, we subtract:

    -49 = 0 - 49 = 65536 - 49 = 65535 - 49 + 1

    (65536)    1111 1111 1111 1111 + 1
    (49)       0000 0000 0011 0001
    result     1111 1111 1100 1110 + 1  ->  1111 1111 1100 1111 (-49)

    ; - - - - -

    -21847
    (65536)    1111 1111 1111 1111 + 1
    (21847)    0101 0101 0101 0111
    result     1010 1010 1010 1000 + 1  ->  1010 1010 1010 1001 (-21847)

    ; - - - - -

    -11628
    (65536)    1111 1111 1111 1111 + 1
    (11628)    0010 1101 0110 1100
    result     1101 0010 1001 0011 + 1  ->  1101 0010 1001 0100 (-11628)

    ; - - - - -

    -1764
    (65536)    1111 1111 1111 1111 + 1
    (1764)     0000 0110 1110 0100
    result     1111 1001 0001 1011 + 1  ->  1111 1001 0001 1100 (-1764)

    ; - - - - -

Notice that since:

    1 - 0 = 1
    1 - 1 = 0

when you subtract from 1, you are simply switching the value of the subtrahend (that's the number that you subtract).

    1 -> 0
    0 -> 1

1 becomes 0 and 0 becomes 1. You don't even have to think about it. Just switch the 1s to 0s and switch the 0s to 1s, and then add 1 at the end. We'll do one more:

    -348
    (65536)    1111 1111 1111 1111 + 1
    (348)      0000 0001 0101 1100
    result     1111 1110 1010 0011 + 1  ->  1111 1110 1010 0100 (-348)

Now two more, this time without the crutch of having the top number visible. Remember, even though you are subtracting, all you really need to do is switch 1s to 0s and switch 0s to 1s, and then add 1 at the end.

    -658
    (658)      0000 0010 1001 0010
    result     1111 1101 0110 1101 + 1  ->  1111 1101 0110 1110 (-658)

    ; - - - - -

    -31303
    (31303)    0111 1010 0100 0111
    result     1000 0101 1011 1000 + 1  ->  1000 0101 1011 1001 (-31303)

SIGN EXTENSION

If you want to use larger numbers, it is possible to use multiple words to represent them.{3} The arithmetic will be done 16 bits at a time, but by using the method described in Chapter 0.1, it is possible to add and subtract numbers of any length. One normal length is 32 bits. How do you convert a 16 bit to a 32 bit number?
If it is unsigned, simply put 0s to the left:

    0100 1100 1010 0111 -> 0000 0000 0000 0000 0100 1100 1010 0111

____________________

3. On the 8086, a word is 16 bits.

What if it is a signed number? The first thing we need to know about signed numbers is what is positive and what is negative. Once again, for reasons of symmetry, we choose positive to be from

    0000 0000 0000 0000 0000 0000 0000 0000  to
    0111 1111 1111 1111 1111 1111 1111 1111

(hex 00000000 to 7FFFFFFF) and we choose negative to be from

    1000 0000 0000 0000 0000 0000 0000 0000  to
    1111 1111 1111 1111 1111 1111 1111 1111

(hex 80000000 to FFFFFFFF).{4} This longer number system cycles from

    1111 1111 1111 1111 1111 1111 1111 1111  to
    0000 0000 0000 0000 0000 0000 0000 0000

(hex FFFFFFFF to 00000000). Notice that by using binary numbers we are inundating ourselves with 1s and 0s.

If it is a positive signed number, it is still no problem (recall that in our 16 bit system, a positive number is between 0000 0000 0000 0000 and 0111 1111 1111 1111, a negative signed number is between 1000 0000 0000 0000 and 1111 1111 1111 1111). Just put 0s to the left. Here are some positive signed numbers and their conversions:

    (1974)  0000 0111 1011 0110 -> 0000 0000 0000 0000 0000 0111 1011 0110
    (1)     0000 0000 0000 0001 -> 0000 0000 0000 0000 0000 0000 0000 0001
    (3909)  0000 1111 0100 0101 -> 0000 0000 0000 0000 0000 1111 0100 0101

If it is a negative number, where did its representation come from in our 16 bit system?

    -x  ->  (1111 1111 1111 1111 + 1) - x  =  1111 1111 1111 1111 - x + 1

This time it won't be FFFFh + 1 but FFFFFFFFh + 1. Let's have some examples. (Here we have 8 bits to the group because there is not enough space on the line to accommodate 4 bits to the group).
    16 BIT SYSTEM               32 BIT SYSTEM

    -1964
    11111111 11111111 + 1       11111111 11111111 11111111 11111111 + 1
    00000111 10101100           00000000 00000000 00000111 10101100
    11111000 01010011 + 1       11111111 11111111 11111000 01010011 + 1
    11111000 01010100           11111111 11111111 11111000 01010100

    -2867
    11111111 11111111 + 1       11111111 11111111 11111111 11111111 + 1
    00001011 00110011           00000000 00000000 00001011 00110011
    11110100 11001100 + 1       11111111 11111111 11110100 11001100 + 1
    11110100 11001101           11111111 11111111 11110100 11001101

    -182
    11111111 11111111 + 1       11111111 11111111 11111111 11111111 + 1
    00000000 10110110           00000000 00000000 00000000 10110110
    11111111 01001001 + 1       11111111 11111111 11111111 01001001 + 1
    11111111 01001010           11111111 11111111 11111111 01001010

____________________

4. Once again, the sign will be decided by the left hand digit. If it is 0 it is a positive number; if it is 1 it is a negative number.

As you can see, all you need to do to sign extend a negative number is to put 1s to the left. Can't those 1s on the left become 0s when we add that 1 at the end? No. In order for that to happen, the right 16 bits must be 1111 1111 1111 1111. But that can only happen if the number to be negated is 0:

     1111 1111 1111 1111 1111 1111 1111 1111 + 1
    -                    0000 0000 0000 0000
     1111 1111 1111 1111 1111 1111 1111 1111 + 1  ->  0000 0000 0000 0000 0000 0000 0000 0000

In all other cases, adding 1 does not carry anything out of the right 16 bits.

It is impossible to truncate one of these 32 bit numbers to a 16 bit number without making the results unreliable. Here are two examples:

    +1,687,451   00000000 00011001 10111111 10011011  ->  10111111 10011011 (-16485)
    -3,524,830   11111111 11001010 00110111 00100010  ->  00110111 00100010 (+14114)

Truncating has changed both the sign and the absolute value of the number.


Chapter 0.3 - LOGIC
===================

Programs use numbers a lot. But they also ask questions that require a yes/no answer.
Is there an 8087 chip in the computer? Is there a color monitor; how about a monochrome monitor? Is there keyboard input waiting to be processed? Are you going to get lucky on your date on Friday? Or, since you are a computer programmer, are you going to have a date this month? Did the file open correctly? Have we reached end of file?

In order to combine these logical questions to our heart's content, we need a few operations: AND, OR, XOR (exclusive or), and NOT.

AND

If we have two sentences "A" and "B", then ("A" AND "B") is true if (and only if) both "A" and "B" are true. "It is raining and I am wet" is true only if both "It is raining" and "I am wet" are true. If one or both are false, then 'A and B' is false. A shortcut for writing this is to use a truth table. A truth table tells under what conditions an expression is true or false. All we need to know is whether each component expression is true or false. T stands for true, F for false.

    "A"    "B"    "A" AND "B"
     T      T         T
     T      F         F
     F      T         F
     F      F         F

Notice that the truth table does NOT say anything about whether the expression makes sense. The sentence:

    "It's hot and I am sweating."

is a reasonable expression which may or may not be true. It will be true if both "It is hot" and "I am sweating" are true. But the sentence:

    "The trees are green and Quito is the Capital of Ecuador."

is pure garbage. It does not satisfy our idea of what a sensible expression should be, and should NEVER be evaluated by means of a truth table. The warranty on a truth table is, if the expression makes sense, then the truth table will tell you under what conditions it is true or false. If the expression does not make sense, then the truth table tells you nothing.

Fortunately, this problem really belongs to philosophical logic. When you use logical operators in your program, there will be a well defined reason for each use.
If you start doing screwy things, your program probably won't run.

OR

There are two different types of OR alive and kicking in the English language - the exclusive OR (A or B but not both) and the inclusive OR (A or B or both).

A mother tells her child "You can have a piece of cake or a piece of candy." Does this mean that he can have both if he wants? Of course not. He can have one or the other, but not both. This is XOR, the exclusive or. The truth table for this is:

    "A"    "B"    "A" XOR "B"
     T      T         F
     T      F         T
     F      T         T
     F      F         F

'A XOR B' is true if exactly one of them is true. If they both are true 'A XOR B' is false. If neither is true, 'A XOR B' is false. Examples of XOR are:

    1) We will either go to Italy or to Japan on our vacation.
    2) I'll either have a tuna salad or a chef's salad.
    3) He'll either buy a Lamborghini or a BMW.

Consider this sentence: "To go to Harvard, you need to have connections or to be very smart." Do we want this to mean that if you have connections and are very smart, you are automatically excluded from going to Harvard? No. We want this to mean one or the other or perhaps both. Sometimes you write this as 'and/or' if you want to be absolutely clear. This is the inclusive OR. The truth table for OR is:

    "A"    "B"    "A" OR "B"
     T      T         T
     T      F         T
     F      T         T
     F      F         F

'A OR B' is true if one or both of them are true. If both are false, then it is false. Examples of OR are:

    1) They'll either go to Italy or to Austria on their vacation.
    2) I'll have either steak or shrimp at The Sizzler.
    3) He'll buy either a paisley tie or a rep tie.

The three sentences for XOR and OR mimic each other on purpose. In the English language, you know which type of OR is being used by what is being talked about. You know intuitively which one applies. If someone buys two different ties you are not surprised.
If someone buys two expensive cars at the same time you are quite surprised.{1} With very few exceptions, if you confuse the two you are doing it on purpose. If your father says "You can have the car on Friday night or on Saturday night." and you don't understand which OR applies, it's not his fault.

NOT

The final logical operation is NOT. The sentence: "It is not raining." is false if it is raining, and true otherwise. The truth table is:

    "A"    NOT "A"
     T        F
     F        T

This is self-explanatory.

Amazingly enough, this is all you need to know about logic to be a quality programmer. Trying to make very complex combinations of these logical operations may be fun for philosophy, but it is death to a program. KISS is the operative word in programming.{2}

____________________

1. Especially if his job is working the cash register at Sears.

2. Which, if you don't know, is Keep It Simple, Stupid.


Chapter 0.4 - MEMORY
====================

The basic unit of memory on 8086 machines is the byte.{1} This means that in every memory cell we can store a number from 0 to 255. Each memory cell has an address. The first one in memory has address 0, then address 1, then 2, then 3, etc.

The registers on the 8086 are one word (two bytes) long. This means that any register can store and operate on a number from 0 to 65,535. (It also has some registers which can operate on bytes and can store and operate on numbers from 0 to 255.) Memory is physically external to the 8086. Registers are physically internal to the 8086; they are actually on the chip.

One of the ways that we can access memory on the 8086 is by putting the address of a memory cell in a register and then telling the 8086 that we want to use the data at that memory address. Since each byte has its own address, and since we can't have a number larger than 65,535 in any one register, it is impossible to address more than 65,536 bytes with a single register.
Back in the dark ages, that might have been enough memory, but it sure isn't enough nowadays. Intel solved the problem by creating SEGMENTS. Each segment is 65,536 bytes long, going from address 0 to address 65,535.{2} You tell the 8086 where you want to go in memory by telling it which segment you are in and what the address is within that segment.

Segments are numbered from 0 to 65,535. That is, there are 65,536 of them. As a design decision, Intel decided that a segment should start every 16 bytes. This decision was made in the late 70's and is cast in stone. On the 8086, a segment starts every 16 bytes. Here is a list of some segments, with the segment number and the segment starting address in both decimal and hex.

    SEGMENT NUMBER           STARTING ADDRESS

         0d       0h               0d        00h
         1d       1h              16d        10h
         2d       2h              32d        20h
         3d       3h              48d        30h
         4d       4h              64d        40h
       200d      C8h           3,200d       C80h
     21694d    54BEh         347,104d     54BE0h
     51377d    C8B1h         822,032d     C8B10h

____________________

1. As it is on nearly all modern computers.

2. The last segments are actually less than 65,536 bytes long, as will be explained later.

One thing you should note is that in hex, the segment number is the same as the starting address, except that the starting address has an extra 0 digit on the right.

These segments overlap. A new segment starts every 16 bytes, and they are all 65,536 bytes long. This means that with the exception of the first 16 bytes, any address in memory is in more than one segment. Here are some addresses. The word "offset" means the number of bytes from the beginning of the segment. (It is possible for a memory cell to have an offset 0). The offset is shown in both decimal (d) and hex (h):

    memory address 55 (37h)

    Seg #    Offset    Offset
      0       55d       37h
      1       39d       27h
      2       23d       17h
      3        7d        7h

Thus the address 55 is in 4 different segments, and can be addressed relative to any one of them.
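If you would like to check a table like this one by machine, here is a rough sketch in Python rather than assembler. It assumes only the rules stated above: a segment starts every 16 bytes, offsets run from 0 to 65,535, and segments are numbered 0 to 65,535. The names segment_start and segments_containing are invented for this illustration.

```python
SEGMENT_SPACING = 16      # a new segment starts every 16 bytes
SEGMENT_LENGTH = 65536    # offsets run from 0 to 65,535
MAX_SEGMENT = 65535       # segments are numbered 0 to 65,535


def segment_start(segment):
    """Starting address of a segment: the segment number times 16."""
    return segment * SEGMENT_SPACING


def segments_containing(address):
    """Return every (segment, offset) pair that reaches the given address."""
    pairs = []
    # No segment that starts above the address can contain it.
    highest = min(address // SEGMENT_SPACING, MAX_SEGMENT)
    for segment in range(highest + 1):
        offset = address - segment_start(segment)
        if offset < SEGMENT_LENGTH:
            pairs.append((segment, offset))
    return pairs


for segment, offset in segments_containing(55):
    print(f"{segment:5d} {offset:6d}d {offset:5X}h")
```

Each time the segment number goes up by one, the starting address goes up by 16, so the offset needed to reach the same memory cell goes down by 16.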
    memory address 17,946 (461Ah)

    Seg #     Offset      Offset
       0      17,946d     461Ah
       1      17,930d     460Ah
       2      17,914d     45FAh
     ...        ...        ...
    1120          26d       1Ah
    1121          10d       0Ah

The address 17,946 is in 1122 different segments, and can be addressed relative to any of them. Notice that as the segment number goes up one segment, the offset address goes down 16 bytes (10h). Above the address 65,519, every memory cell can be addressed by 4,096 different segments.

Because of the way the addresses are generated on the 8086, the maximum memory address possible is 1,048,575 (FFFFF hex.) This means that the final segments are less than 65,536 bytes long. In this table "Address" is the starting address of the segment. All the following numbers are decimal:

    Segment #     Address        Max Offset
     65,535d      1,048,560d         15d
     65,534d      1,048,544d         31d
     65,533d      1,048,528d         47d
       ...           ...             ...
     64,000d      1,024,000d     24,575d

Let's look at these same numbers in hex:

    Segment #     Address        Max Offset
     FFFFh        FFFF0h            Fh
     FFFEh        FFFE0h           1Fh
     FFFDh        FFFD0h           2Fh
      ...          ...             ...
     FA00h        FA000h         5FFFh

The maximum addressable number is FFFFFh, which is why these segments are cut short. There are 4,095 segments at the top which are less than 65,536 bytes long. Don't worry, though, because this top section of memory belongs to the operating system, and your programs will never be put there. It will never affect you.

Back in the late 70s, a million bytes of memory seemed like a lot. In the 60s, large mainframe computers had only a half-million bytes of memory. In the 70s memory was still exorbitantly expensive. Nowadays, however, you practically need a megabyte just to blow your nose. But this segmentation system is cast in stone, so there is no way to get more memory on the 8086.{3}

In the beginning, when we make a program, we will use one segment for the machine code, one segment for permanent data, and one segment for temporary data. If we need it, this gives us 196,608 bytes of usable memory right off the bat.
As you will see by the time we are finished, ALL memory is addressable - we just need to do more work to get to it all.

All this talk about segments and offsets may have you concerned. If you have to keep track of all these offsets, programming is going to be very difficult.{4} Not to worry. It is the assembler's job to keep track of the offsets of every variable and every label in your program.{5} Which segments your program uses is decided by the operating system when it loads your program into memory. It puts some of this information into the 8086. At the start of the program, you put the rest of the information into the 8086, and then you can forget about segments.

____________________

3. This megabyte rule is unalterable. EMS (expanded memory) is actually a memory swapping program. On the 80286 and 80386 you can have more than one megabyte of memory but the program can only access one megabyte. The program reserves a section of its one megabyte for a transfer area. It then calls EMS which transfers the data from this extended memory to the transfer area. It is in effect a RAM disk. It is like using a hard disk but is much faster. If Intel had bitten the bullet with the 80286 and said that a segment would start every 256 bytes instead of every 16 bytes, we would have 16 megabytes of directly accessible memory instead of 1 megabyte. Hindsight is such a wonderful thing.

4. Remember, an offset is just how many bytes a memory cell is from the beginning of the segment.

NUMBERS IN MEMORY

The largest number you can store in a single byte is 255. If you are calculating the distance from the sun to Alpha Centauri in inches, obviously one byte is not enough. Unfortunately, the 8086 can't really handle large numbers like that.{6} It can handle numbers which are 16 bits (2 bytes) long. However, with subprograms supplied with all compilers, we can handle large numbers, though rather slowly if we don't use an 8087.
All these different programs need a standard way to write numbers in memory, and this standard is supplied by Intel. The standard is:

    (1) Integers can be 1, 2, or 4 bytes long. This corresponds to -128 to +127, -32,768 to +32,767, and -2,147,483,648 to +2,147,483,647.

    (2) Scientific floating point numbers which have an exponent and can be very large. They come as 4 byte and 8 byte numbers. We will not deal with them at all, but we need to know how they are stored.

    (3) Commercial or BCD numbers which occupy 1/2 byte per digit. Since some of the 8086 instructions are concerned with these we will cover them, but if you are not curious about them, you can skip that section. The standard is a 10 byte number.

Let's look at a number. For the rest of this section, all numbers will be hex, and if a number is longer than one byte, we will display it with a blank space between each byte. If it is a one byte number - e.g. 3C, we know exactly where we are going to put it. But what if a number is 4 bytes long - e.g. 2D F5 33 0A - and we want to put it in memory starting at offset 264? We have two choices:

    2D F5 33 0A

    Address    Choice 1    Choice 2
      267         2D          0A
      266         F5          33
      265         33          F5
      264         0A          2D

____________________

5. As in other languages, a label is a name that marks a place in the code. In BASIC, labels are actually numbers (such as 500 in the instruction GOTO 500). Labels are frowned on in Pascal and C, but are the lifeblood of assembler language.

6. But fortunately, the 8087 can.

Neither choice is better than the other. Choice 1 puts the right-most byte in low memory, choice 2 puts the right-most byte in high memory.{7} The right-most byte is called the LEAST SIGNIFICANT BYTE because it has the least effect on a number, while the left-most byte is called the MOST SIGNIFICANT BYTE because it has the most effect on a number.
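For the curious, here is a small Python sketch (not assembler, and not part of this disk) that lays out the same four byte number both ways. Python calls the two orders "little" and "big"; they are commonly known as little-endian and big-endian. The helper name show and the variable names are made up for this illustration.

```python
number = 0x2DF5330A
start = 264   # lowest address used, as in the table above


def show(label, stored_bytes):
    print(label)
    # Print the highest address first so the listing reads like a
    # vertical picture of memory.
    for position in reversed(range(len(stored_bytes))):
        print(f"  {start + position}  {stored_bytes[position]:02X}")


choice_1 = number.to_bytes(4, "little")   # least significant byte at the lowest address
choice_2 = number.to_bytes(4, "big")      # most significant byte at the lowest address

show("Choice 1", choice_1)
show("Choice 2", choice_2)
```

Reading the bytes back in the same order they were written recovers the original number; reading them back in the other order gives a completely different number.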
In fact, Intel picked choice #1 for the 8086 (which has the least significant byte in low memory), and Motorola picked choice #2 for the 68000 (which has the most significant byte in low memory). This is consistent for both the 8086 and the 8087: THE LEAST SIGNIFICANT BYTE IS ALWAYS IN LOW MEMORY. EACH NUMBER IN MEMORY STARTS WITH THE LEAST SIGNIFICANT BYTE. Remember that, and you'll save yourself some trouble.

One problem you will run up against is that when we draw pictures of memory, we often draw from left to right, that is:

    ADDRESS    264    265    266    267

When we do that, things start looking weird. For 2D F5 33 0A we have:

    ADDRESS    264    265    266    267
    VALUE       0A     33     F5     2D

This is exactly backwards. Remember, memory doesn't go from left to right, it goes UP from 0, and THE LEAST SIGNIFICANT BYTE IS ALWAYS IN LOW MEMORY. You will certainly make some mistakes till you get used to being consistent about this. The right hand digit of a number is always in low memory. If you think of memory as being VERTICAL:

    1E A3 07 B5

    Value    Address
     1E       4782
     A3       4781
     07       4780
     B5       4779

rather than being LEFT TO RIGHT:

    Address    4779    4780    4781    4782
    Value       B5      07      A3      1E

you will be much better off.

____________________

7. Low memory always means the low addresses and high memory always means the high addresses.


Chapter 0.5 - STYLE
===================

Finally, it is time to say a word about style. Assembler code is by its nature difficult to read. It divides any concept into a number of sequential steps. If you have the BASIC statement:

    MINUTES = DAYS * 1440

You get the idea because you can scan the line to see what is wanted. The assembler code for the above line is: {1}

    mov  ax, days
    mov  bx, 1440
    mul  bx
    mov  minutes, ax

In BASIC, the concept was embedded in the expression. In assembler it was lost. This means two things. First, you must be religious about documenting every step.
If you come back to something two or three months later and you haven't documented what you are doing, it may take you longer to figure out what you did than it would to completely rewrite what you did.

Secondly, if you are a person who likes code like this:

    x = (y + k) / (z - 4)

you are headed for big trouble. At the assembler level it is CRITICAL that you give every variable a name that signifies exactly what it is. If you are counting the number of correct answers, then the variable should be:

    correct_answers

not 'K'. If you are looking at the remainder from a division, then the variable should be:

    remainder

not 'R'. If you try to use short names like 'x', 'k' and 'y', I will guarantee that for every minute you save by not having to type in long variable names, you will lose 10 minutes by not being able to figure out what is going on when you reread the code. In this tutorial we will use multiple words connected by underscores:

    first_positive_prime
    median_age
    oldest_student

____________________

1. Don't worry about what this code does. You will learn soon enough.

The Microsoft assembler allows 31 significant characters in a name. Even though there are several other characters allowed, we will use only letters, the underscore, and numbers (where appropriate):

    approximation1
    approximation2
    approximation3

This should make your code similar to well written C or Pascal code, and greatly increase the readability of the code.


Chapter 1 - Some Simple Programs
================================

It is now time to start writing assembler code. One of the problems with writing in assembler is that there is no way to get input into the program or output from the program until you are very far along with learning assembler language. This is a Catch-22 situation.
You can't learn assembler easily without access to input and output, and you can't write i/o routines till you know assembler.{1} Help is at hand. Included on this disk is a file called asmhelp.obj. It is actually a series of programs that will allow you to get input from the keyboard and print output to the screen. It has some other features which will be explained later. The second problem at the start is that every assembler program has a lot of overhead. These are standard instructions and formats that you need to get the program to work AT ALL. This disk contains templates that contain all the overhead, so to write a program you just make a copy of the template and enter the code and data at the appropriate place. By the end of this sequence of lessons, you will know how to make templates yourself and know the meaning of each word in the template. For now, you have to have faith that what is written is necessary, and that you will learn the meaning of everything later. Let's start. At the end of this chapter is the template we will use for now - temp1.asm. These templates are in the subdirectory \template. Let's call the first program prog1.asm (very original). All programs in assembler must have the file extension .asm so make a copy by giving the command: >copy \template\temp1.asm prog1.asm You are now ready to enter code. Open up prog1.asm with your editor, and take a look at it. It should look the same as temp1.asm. Where it says "put name of program here" - that is for your personal use so you can see the program name while in the editor. The assembler ignores everything after a semicolon. All the lines that start with a semicolon are there for visual separation or for comments. The lines with asterisks separate segments. Yes, the assembler is going to make this program into three segments. You should put all code between the line labeled "START CODE BELOW THIS LINE" and the line labeled "END CODE ABOVE THIS LINE". 
____________________

1. Just to give you an idea of how contradictory the situation is, asmhelp.obj was written in assembler language and consists of about 3500 lines (that's about 50 pages), yet you need to be using it from day one.

Later you will get more flexibility.

Also notice the lines starting with the word EXTRN. Those lines tell the assembler that the subroutines (such as print_num) are in a different file and must be found when this program is linked. The assembler enters each EXTRN name in a list and records each place that an EXTRN subroutine is requested. It is possible that one of these subroutines is never requested. That's fine. However, every one of these subroutines must be present at link time or you will get a link error. This is true even if the subroutine is on the list but never requested.

Since the overhead is so long, and I am so lazy, the template will never be included in the description of the program. What you will get is the name of the template, then any data you need, then the code. It will look like this:

    TEMP1.ASM
    ; - - - - - START DATA BELOW THIS LINE
          the data is written here
    ; - - - - - END DATA ABOVE THIS LINE

    ; - - - - - START CODE BELOW THIS LINE
          the code is written here
    ; - - - - - END CODE ABOVE THIS LINE

If there is no data, no data section will be included. If there is data, it should be written in the segment named DATASTUFF between the lines "START DATA..." and "END DATA ...". The code should be written between the lines that say "START CODE ..." and "END CODE ...".

For our first program, the description will look like this:

    TEMP1.ASM
    ; - - - - - START CODE BELOW THIS LINE
    first_label:
            call  get_num
            call  print_num
            jmp   first_label
    ; - - - - - END CODE ABOVE THIS LINE

That's all we need.
If we needed to write all the overhead (starting with the line "main proc far") we would have:

    main    proc far
    start:  push ds
            sub  ax, ax
            push ax
            mov  ax, DATASTUFF
            mov  ds, ax
    ; + + + + + + + + + + +
    first_label:
            call get_num
            call print_num
            jmp  first_label
    ; + + + + + + + + + + +
            ret
    main    endp

etc. You can see that a simple four line program has blossomed into a monster. I'm assuming some intelligence on your part. Until further notice, the code goes between the "START CODE" and the "END CODE" lines and the data goes in the DATASTUFF segment between the "START DATA" and the "END DATA" lines.

It's time to type in the program listed above. Be careful when you type. When you are done I'll explain it.

In assembler we need a way to label different spots in the code. We use labels. A LABEL is a name (at the beginning of a line) which is immediately followed by a colon. A label doesn't generate any code. The assembler merely keeps track of where the label is for future use. The label we are using is named first_label.

The CALL instruction tells the assembler to call the subroutine listed after the call.{2} We are calling two subroutines; first get_num which gets a number, then print_num, which prints a number in a variety of styles. Finally, JMP tells the assembler that you want to jump to the label listed after it. It is the same as GOTO in BASIC.

If you look at the program, you will notice that we have an infinite loop. It was designed that way. It takes a fair amount of code to exit gracefully, so we will always exit ungracefully. When you are tired of the program, simply press Control-C. That should get you out. That way you can try out something an indefinite number of times, and when you have finished you can press CTRL-C to quit the program.

One warning about machine language before we start. There is no safety net, so before you start a machine language program, make sure all files are closed (i.e.
that you have no other programs in memory). We will NEVER open a file in one of our programs.

____________________

   2 I am using the words subroutine, routine, program and procedure (the technical word) interchangeably throughout the book. A program is actually a group of one or more procedures, but I'm not going to be too strict about it. Context should tell whether we are talking about a single procedure or a whole program.

______________________

Your programs are almost certain to wind up in zombie space from time to time. If that happens, your choices are (1) hit CTRL-C. If that doesn't work, then (2) hit CTRL-ALT-DEL. As a last resort, you can (3) hit a reset button or shut the machine down. For that reason, memorize this mantra:

    BACK UP YOUR PROGRAMS AND BACK UP YOUR BACKUPS.

Double check that you have typed in the assembler code correctly. Now it's time to assemble it. I am assuming that you have the Microsoft assembler.{3} Type:

    >masm prog1 ;

The first thing you will see is the copyright notice:

    Microsoft (R) Macro Assembler Version 5.10
    Copyright (C) Microsoft Corp 1981, 1988. All rights reserved.

The file extension .asm is unnecessary; it's understood. The semicolon is to speed things up. If you don't use it, the assembler will ask you if you want to change any of the default choices. If you type just masm:

    >masm

then you need to give the name of the assembler text file on the first line:

    Source filename [.ASM]: prog1
    Object filename [prog1.OBJ]:
    Source listing [NUL.LST]:
    Cross-reference [NUL.CRF]:

but press ENTER for the other options. If you type masm prog1:

    >masm prog1

you don't want to change any of the default settings:

    Object filename [prog1.OBJ]:
    Source listing [NUL.LST]:
    Cross-reference [NUL.CRF]:

When the assembler asks you about options, hit the ENTER key.

If you have made any errors, the assembler will tell you which line they are on and give you a description of the problem.
Make a hard copy of them on the printer, then use your editor and find the line. Unfortunately, at this stage of the game, it will be very difficult for you to figure out what the problem is. You will have to struggle through the first 4 or 5 programs before things start getting easier. All the programs on these disks have been assembled with a Microsoft v5.1 assembler. They have assembled. They have been run, and they work. Don't tamper with the template and copy the code exactly and everything should work.

____________________

   3 If you are using A86, then consult A86.APP. If you are using Turbo Assembler, then consult TASM.APP. They are both located in the \APPENDIX subdirectory on disk 3.

______________________

If you haven't made any errors, the assembler will say:

    0 Warning Errors
    0 Severe Errors

LINKING

The assembler has given you back another file named prog1.obj - the same name with the extension .obj. I am assuming that you have all used the linker with compiled programs. If you haven't, you may be getting in over your head by using machine language.

All the extra subroutines are in a file called asmhelp.obj. Its pathname is \asmhelp\asmhelp.obj. You want to put it in the root directory of your current drive. In the whole book, we will assume that its pathname is:

    \asmhelp.obj

If you put it somewhere else, you will have to modify the pathname whenever it appears. Link the two modules by writing:

    >link prog1+\asmhelp ;

The copyright notice will appear:

    Microsoft (R) Overlay Linker Version 3.61
    Copyright (C) Microsoft Corp 1983-1987. All rights reserved.

This time the file extensions are understood to be .obj. The semicolon is to avoid having to make default choices. If you type:

    >link

then you need to put the module names after the first prompt:

    Object Modules [.OBJ]: prog1+\asmhelp
    Run File [PROG1.EXE]:
    List File [NUL.MAP]:
    Libraries [.LIB]:

but press ENTER for the other choices.
If you type:

    >link prog1+\asmhelp

you need to do nothing extra:

    Run File [PROG1.EXE]:
    List File [NUL.MAP]:
    Libraries [.LIB]:

When the linker asks for choices, simply press the ENTER key.

The linker gives the executable file the name of the first object file on the line, so you should always put your program first and asmhelp.obj second. If there are no errors, you are ready to go. If there are errors, once again, they will be very difficult to trace. Go back and check everything from the beginning.

You are now ready to run the program. Type:

    >prog1

The program will start. The first thing you will see is a copyright notice.

    The PC Assembler Helper Version 1.0
    Copyright (C) 1989 Chuck Nelson All rights reserved.

It appears the first time you call a subprogram in the module asmhelp.obj. The program will request a number. Give it any legal signed or unsigned number. It should be no longer than 5 digits. Press ENTER, and it will display the possible ways that that number can be thought of by the computer.

    Enter any decimal number  4410
      HEX     SIGNED   UNSIGNED    CHAR      BINARY
     113AH    +04410     04410    11 : **    0001000100111010

    Enter any decimal number  30486
      HEX     SIGNED   UNSIGNED    CHAR      BINARY
     7716H    +30486     30486    w 16 **    0111011100010110

If the signed or unsigned number doesn't look the same as what you entered, then the number you entered is too big for a 16 bit computer. For signed numbers, the limits are +32767 to -32768 and for unsigned numbers, the limits are 0 to 65535.

    Enter any decimal number  -64661
      HEX     SIGNED   UNSIGNED    CHAR      BINARY
     036BH    +00875     00875    03 k **    0000001101101011

    Enter any decimal number  94547
      HEX     SIGNED   UNSIGNED    CHAR      BINARY
     7153H    +29011     29011    q S *      0111000101010011

Let's look at the numbers. Each type of output is labeled. After a hex number, there is an 'H' and after the characters, there is a '*'. This is always true.
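The two out-of-range entries above can be checked by hand. This is only a sketch, and it assumes that get_num simply keeps the low 16 bits of whatever is typed, which is what the displayed results suggest. Under that assumption the stored value is the entry reduced modulo 65536 (that is, 2 to the 16th power):

```latex
\begin{align*}
-64661 + 65536 &= 875   \quad (\text{displayed as } \mathtt{036BH},\ +00875) \\
 94547 - 65536 &= 29011 \quad (\text{displayed as } \mathtt{7153H},\ +29011)
\end{align*}
```

Both results fall inside the signed range, which would explain why the SIGNED and UNSIGNED columns agree with each other but not with what was typed.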
Every time you print a hex number, there will be an 'H', and every time you print a character, there will be a '*'. This is so you will always know what is being printed. Also notice that a signed integer ALWAYS has a sign and an unsigned integer NEVER has a sign.

Not all characters are visible. Ascii 0 - 32 are invisible (32 is a blank). On the PC, ascii 33-255 are visible, but ascii 127 and ascii 255 are problematic. Therefore, if the ascii code is 0-32, 127 or 255, that character will be printed as a hex number, not a character, and print_num will signal the event by printing a double asterisk '**' instead of a single one. This has happened in the first two examples: ( 11 : ** ) and ( w 16 ** ). The first one is the hex number 11 followed by the character ':' and the second one is the character 'w' followed by the hex number 16. Both are signalled by the double asterisk '**' instead of the single asterisk '*'.

Do a few examples. When you are done looking at the numbers, press CTRL-C and you will exit the program.

    Enter any decimal number  ^C

PROGRAM 2

The second program is almost the same as the first one. The program takes input from the keyboard and displays it in a variety of styles. This time, however, it is going to ask for different inputs: ascii, hex, binary and decimal. If you make an error in the input, the subroutine will prompt you again for the input. Here's the program:

    TEMP1.ASM
    ;+ + + + + + + + + + START CODE BELOW THIS LINE

    first_label:
            call get_num        ; 1 to 5 digit signed or unsigned
            call print_num
            call get_ascii      ; 1 or 2 characters
            call print_num
            call get_binary     ; a 1 to 16 bit binary number
            call print_num
            call get_hex        ; a 1 to 4 digit hex number
            call print_num
            jmp  first_label

    ;+ + + + + + + + + + END CODE ABOVE THIS LINE

The things to the right of the semicolons are comments. You do not need to type them in if you don't want to. Once again, assemble the program.
(There should be no warning or severe errors. If something is wrong, it is most likely a typing error.) Then link it with asmhelp.obj. Remember - your program should be the first one listed.{4} If all is well, run the program. It will ask you for a number (that is a signed or unsigned number), ascii characters, a binary number, and a 4 digit hex number (0-9,A-F).

    Enter any decimal number  27959
      HEX     SIGNED   UNSIGNED    CHAR      BINARY
     6D37H    +27959     27959    m 7 *      0110110100110111

    Enter one or two ascii characters  $%
      HEX     SIGNED   UNSIGNED    CHAR      BINARY
     2425H    +09253     09253    $ % *      0010010000100101

    Enter a two byte binary number  0101111001100010
      HEX     SIGNED   UNSIGNED    CHAR      BINARY
     5E62H    +24162     24162    ^ b *      0101111001100010

    Enter a two byte hex number  784d
      HEX     SIGNED   UNSIGNED    CHAR      BINARY
     784DH    +30797     30797    x M *      0111100001001101

Once again, this is an infinite loop, so in order to quit, you need to hit CTRL-C.

The purpose of these first two programs is to remind you that the computer doesn't care whether you think you are storing binary numbers, characters, hex numbers, signed numbers or unsigned numbers. They all wind up in the computer as a series of 1s and 0s, and you can use these 1s and 0s any way you like. It's up to you to keep track of them.

If you feel comfortable with the way we are writing, assembling and linking programs, you are ready to start looking at the 8086 itself.

____________________

   4 For your convenience, there is a batch file on the disk called asmlink.bat. Its pathname is \asmhelp\asmlink.bat. It is one line long and looks like this:

    link %1+\asmhelp ;

If you use this batch file, you will never have an order problem. If your file is named myfile.asm, then type:

    >asmlink myfile

______________________

    ; TEMP1.ASM   The first assembler template
    ; put name of program here
    ; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    STACKSEG   SEGMENT  STACK  'STACK'
            dw  100 dup (?)
    STACKSEG   ENDS
    ; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    DATASTUFF  SEGMENT  PUBLIC  'DATA'
    ; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE

    ; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE
    DATASTUFF  ENDS
    ; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    CODESTUFF  SEGMENT  PUBLIC  'CODE'

            EXTRN  print_num:NEAR , get_num:NEAR
            EXTRN  get_ascii:NEAR , get_hex:NEAR , get_binary:NEAR

            ASSUME cs:CODESTUFF, ds:DATASTUFF

    main    proc far
    start:  push ds                 ; set up for return
            sub  ax,ax
            push ax
            mov  ax, DATASTUFF
            mov  ds,ax
    ; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE

    ; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE
            ret
    main    endp
    CODESTUFF  ENDS
    ; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
            END  start

SUMMARY

CALL
    CALL calls a subroutine.

        call get_num

JMP
    JMP jumps to the indicated label.

        JMP label47

LABEL
    A label is a name at the beginning of a line which is followed by a colon. It is used to mark a spot in the program.

        label47:

There are three different things which will be mentioned from time to time, so it's best to define them now.

    ASSEMBLER INSTRUCTIONS (CODE) is the text that you type in and give to the assembler.

    MACHINE CODE is the code that the assembler generates. After some adjustment by the linker, it is readable by the 8086. It is the actual code that controls the program.

    MICROCODE is the code that is embedded in the 8086 itself. Each instruction has its own set of mini instructions within the 8086. This is the MICROCODE.


Chapter 2 - Data
================

Before you start using data, you need to know what data looks like. It is not necessary for the data to have a name. For instance, the following definition is perfectly legal:

    db "Mary had a little lamb."

Unfortunately, the assembler has no way to find it. The normal thing is to start the line with a name, and then give the definition of the data.
The assembler processes the data line by line, so a definition on one line does not carry over to another line. We can have:

    poem  db  "Mary had a little lamb,"

Notice that names for data don't have colons after them. What if we wanted to continue the poem? It isn't going to fit all on one line. No problem. All we need to do is define the following lines without a name.

    poem  db  "Mary had a little lamb,"
          db  "Its fleas were white as snow,"
          db  "And everywhere that Mary went,"
          db  "She scratched and scratched and scratched."

The assembler still can't find lines 2-4, but starting at the first byte of "poem", it can go all the way through the poem one byte after the other. By the way, there are no carriage returns in the poem right now. They will come later.

So we have the name part, the db part, and the data part. What is that db anyway? It stands for Define Byte. Whenever you give the name "poem" to the assembler, it knows that you want to deal with the data one byte at a time. If you try working a word at a time, you will get an assembler error. The legal definitions are:

    DB   define byte         [ 1 byte ]
    DW   define word         [ 2 bytes ]
    DD   define doubleword   [ 2X2 bytes = 4 bytes ]
    DQ   define quadword     [ 4X2 bytes = 8 bytes ]
    DT   define ten-byte     [ 10 bytes ]
    DF   define farword      [ 6 bytes - used for 80386 only ]

Every time you use one of these directives, the assembler allocates the number of bytes in brackets for EACH variable. For instance in:

    db  "Mary had a little lamb,"

each character inside the quotes is a variable. That's 23 variables X 1 byte = 23 bytes. In:

    dq  0, 1, 2, 3, 4

each number is a variable. 5 variables X 8 bytes = 40 bytes. Notice from these examples that you can have more than one variable on a line but they all share the same defining type.

What do you do if you have an uninitialized variable, i.e. you don't know its starting value?
Easy as pie. Here's a four byte variable:

    some_data  dd  ?

The question mark lets the assembler know that you didn't forget the number but rather you didn't know the number.

The commas are separators. When you write a comma, the assembler expects another piece of data on the line. If it doesn't get the number, it is an error. That means there can be no commas inside a number.

    dw  32,421

is two variables: 32 and 421.

What if you want to make an array? The assembler has a directive for that too:

    dw  150 dup ( 400 )

The 'dup' is for duplicate. This makes 150 two byte copies and puts the number 400 in each one.

    db  273 dup ( 'c' )

This makes 273 one byte copies and puts the letter 'c' in each one.

    dd  459 dup ( 1, 2, 3, 4, 5 )

This makes 459 copies of what is inside the parentheses. That means ( 5 variables X 4 bytes ) X 459 for a total of 9180 bytes. Starting from the beginning of the array, we will have the sequence: 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3,...

    dq  20000 dup ( 455 )

This makes 20000 eight byte copies and causes an assembler error because there is a limit of 65,536 bytes for the data and you have used 160,000 bytes (20,000 X 8).

    db  7 dup ( 'Mary had a little lamb,')

This makes 7 copies of 'Mary had ..' which is 23 bytes, for a total of 161 bytes.

    dw  39 dup ( 28 dup ( 0 ) )

The assembler even supports nesting, so you can make a multi-dimensional array. This is a 39 X 28 array initialized to zero. 39 copies of 28 two byte numbers is 2184 bytes.

The standard form for arrays is (1) first define the data type, (2) then say how long the array is followed by the keyword "dup" and (3) put the initial value inside the parentheses. What if you don't know the initial value? Simple:

    dw  347 dup ( ? )

The question mark lets the assembler know that you don't know.

DEFINING NUMBERS

What kinds of data can you have?

1. A single character inside single or double quotes:

    'a' , "&" , '|'

2. A string inside single or double quotes:

    "Mary had a little lamb,"
    'Mary had a little lamb,'

Each character is stored as a byte, and the bytes are stored consecutively. If the array starts at address 2743, 2743 = 'M', 2744 = 'a', 2745 = 'r', 2746 = 'y', 2747 = ' ', etc. As usual in these instances, if you want a double quote inside a double quoted string or a single quote inside a single quoted string, you need to use a pair:

    "Mary asked her fleas ""Why don't you join the circus?"""
    'Mary asked her fleas "Why don''t you join the circus?"'

3. A decimal number. Decimal is the default:

    27, 44, 641, 89

4. A hex number. A hex number must start with a number, so if the highest digit is A - F, there must be a 0 in front.{1} b77h is illegal, 0b77h is legal. All hex numbers must be followed by an 'h':

    0a162H , 0329H , 0DDDh , 7h

5. An octal (base 8) number. An octal is followed either by the letter q or the letter o:

    641q , 2345o , 1472o

6. A binary number. A binary number is followed by a b:

    0100100b , 1b , 01001000111010b

____________________

   1 When the assembler looks at something it needs to know whether it is a name or a number. Is 'A7' a name or a hex number? Is '3D' a name or a number? To solve this problem, all assemblers and all compilers insist that -> if the first character is a number, it's a number; if the first character is not a number, it is not a number. That is why you can't start a variable name with a number.

______________________

Any of these types can be mixed on a line. For instance:

    db  "Mary had a little lamb," , 13 , 10

13 followed by 10 is CRLF (carriage return, line feed), the PC signal for starting a new line. A string in the C language ends with the number 0. If we wanted a C string with CRLF, we would have:

    db  "Mary had a little lamb," , 13 , 10 , 0

Another mixed example:

    dw  7 , 010010b , 0FFFFh , 037q

is dopey but legal.

You can also have an equation, as long as it resolves to a number.
This calculation is done by the assembler, so the values of variables are not allowed:

    dw  ( ( 19 * 7 * 25 ) + 6 ) / ( 9 + 7 )

is legal, but:

    data1  dw  25
    data2  dw  7
           dw  ( ( 19 * 7 * data1 ) + 6 ) / ( 9 + data2 )

is illegal. Everything must be a constant. Remember that when the assembler starts calculating it might truncate the partial answers, so don't get too fancy.

SUMMARY

The assembler works one line at a time. Each line with data must start with a data type declaration (after an optional name.)

DATA TYPES

    DB   define byte         ( 1 byte )
    DW   define word         ( 2 bytes )
    DD   define doubleword   ( 2X2 bytes = 4 bytes )
    DQ   define quadword     ( 4X2 bytes = 8 bytes )
    DT   define ten-byte     ( 10 bytes )
    DF   define farword      ( 6 bytes - used for 80386 only )

COMMON INTEGER TYPES

    TYPE          MAX SIGNED                  MAX UNSIGNED
    byte          -128/+127                   255
    word          -32768/+32767               65535
    doubleword    -2147483648/+2147483647     4294967295

Note that the max. negative integer is, in absolute value, 1 larger than the max. positive integer.

POSSIBLE BASES FOR CONSTANTS

    b     binary data
    o,q   octal data
    d     decimal data (default)
    h     hex data (must start with a number 0 - 9)

ARRAY DEFINITIONS

    d*  num1 dup ( data1 )

Using the d* data type (db, dw, dd, dq, etc.) make num1 copies of data1 (data1 may be either a single piece of data or a group of data.)

MULTIPLE DATA ON ONE LINE

Different data elements on the same line are separated by commas. All elements on the same line have the same data type.


Chapter 3 - ASMHELP
===================

We are now going to introduce both the 8086 registers and a program for looking at them. You are going to get information flying at you at a rapid pace, so read both this and the next chapter carefully and slowly.

REGISTERS

The 8086 has a number of registers. Remember, registers are places for storing data that are internal to the 8086 chip. They are much faster, but there are very few of them.
There are 6 registers that you can use for addition and subtraction of word (2 byte) sized numbers, as well as logical operations on word (2 byte) numbers or data. These registers are AX, BX, CX, DX, SI, DI.

In addition, there is a register which works the same way, but has a special function in all high-level languages (Basic, Pascal, C, etc.). This is BP, the base pointer.

There is one more register that performs the same operations as the above seven, but it is RESERVED for special use and should never be used for anything. It is called SP (the stack pointer).

There are 4 registers that tell the 8086 which memory segments you are in. They just sit there and help the 8086 find things in memory. You will learn how they work later. They are CS, DS, SS, ES. (That is Code Segment, Data Segment, Stack Segment and Extra Segment respectively).

There is the flags register which contains all the information the 8086 needs to evaluate its state. We will learn about this later. The flags register has no name, and there are machine instructions for manipulating individual flags in the register.

Finally, there is IP, the instruction pointer, which points to the machine instructions. You have no direct access to this, which is good because you would screw it up for certain. The 8086 handles the IP automatically and correctly.

One word (two bytes or 16 bits) is the largest piece of data that the 8086 can handle naturally. It is possible to handle larger things, but we do it through software (which is slower), not hardware (which is faster).

Sometimes we want to handle things one byte at a time as when we work with characters. The 8086 gives us this possibility by letting us divide the AX, BX, CX, and DX registers into an upper half and a lower half. For any or all of these registers, we can replace one 2 byte register by two 1 byte registers. The data in the full register stays the same, but we can look at each half.
The two parts of AX are called AH (for A high) and AL (for A low).

    |------AX------|
    0000000000000000
    |--AH--||--AL--|

This is the 16 bit binary number 0 in the AX register. Using AX allows us to manipulate all 16 bits. Using AH allows us to manipulate the upper 8 bits (without affecting the lower 8 bits), and using AL allows us to manipulate the lower 8 bits without affecting the upper 8 bits. Similarly, for BX we have BH, BL, for CX we have CH, CL, and for DX we have DH, DL.

SHOW_REGS

We have named all the registers, now let's take a look at them. Included in the module asmhelp.obj is a program called show_regs. It shows all the above registers on the top 10 lines of your screen and allows you to enter data normally underneath. When you call show_regs, it puts the current value of all the registers on the screen. Those values stay on the screen until the next call - i.e. the program does not change what is on the screen even though the registers may be changing value. You need to call show_regs every time that you want to see the current values of the registers. The first time you call show_regs, it clears the screen so you should call it right at the beginning of the program in order to initialize the screen.

This time we want temp2.asm for a template; we will call this program prog3.asm, so make a copy of temp2.asm and give it the new name. Let's take a look at it. The only differences are (1) there are a lot more programs in the EXTRN statements and (2) in the data segment DATASTUFF there are these definitions:

    ax_byte  db  2
    bx_byte  db  2
    cx_byte  db  2
    etc.

These will be used for show_regs later, but you need to learn a few assembler instructions first.
Here's our next program:

    TEMP2.ASM
    ;+ + + + + + + + START CODE BELOW THIS LINE

            call show_regs

    label_one:
            call get_hex
            call show_regs
            call get_num
            call show_regs
            jmp  label_one

    ;+ + + + + + + + END CODE ABOVE THIS LINE

That's all the program does. It asks for a number, then calls show_regs to show us what is in the registers. Note that one of the numbers is hex while the other number is decimal. Assemble this, and link it with

    >link prog3+\asmhelp ;

and we're ready to go. The program reads information in the computer to find out what kind of monitor you have and where the screen output goes. It then puts the register information on the top lines. If it doesn't appear there, we have a screwup somewhere. The text should appear in black and white, but if you have a color monitor you can make it a blue background with white letters.{1}

*********************** SCREEN SHOT ***************************

    AX  19825                SI  00000
    BX  00000                DI  00256
    CX  00255                BP  17113
    DX  02596                SP  00508

    CS  0AA4H   DS  0A54H   ES  0A24H   SS  0A34H   IP  0018H

    OF DF IEF TF SF ZF AF PF CF
    + x + x E                                   COUNT 00003
-----------------------------------------------------------------
The PC Assembler Helper Version 1.0
Copyright (C) 1989 Chuck Nelson All rights reserved.
Enter a two byte hex number  4df9
Enter any decimal number  +19825
Enter a two byte hex number
*****************************************************************

____________________

   1 To make a blue background and white letters, insert the code "call set_blue" before the FIRST "call show_regs". i.e.:

            call set_blue
            call show_regs
    label_one:
            etc.

This only works if it is allowed by the video board.

______________________

This is how the screen looks after entering first a hex number, then a decimal number. The numbers in the registers will probably be different for you. Note that AX contains the last number that was entered. On the left side you will see the AX, BX, CX and DX registers.
For the time being, these registers will display unsigned numbers. On the right are the SI, DI, BP and SP registers. They are also unsigned numbers for the moment. Below that are the segment registers CS, DS, ES, and SS and the instruction pointer (IP). These are hex numbers and will always be hex numbers. The bottom line has OF, DF, IEF, etc. These are the flags, and the marking underneath them (either a blank or some character) tells how they are set. Finally we have COUNT. This has nothing to do with the 8086. It is a counter that is incremented each time you call show_regs.

Keep entering numbers and watch the registers. You will notice that three things are changing - AX, IP and COUNT. AX has the last number you entered and IP keeps changing. Write down the value of IP each time it changes. It goes back and forth between two numbers. That is because you call show_regs in two different places in the loop,{2} and those are two different places in memory where the 8086 is reading the machine code.

Why is AX changing? You may have wondered in prog1.asm and prog2.asm how that information was going back and forth between your program and asmhelp.obj. The answer is that in all the programs in asmhelp.obj, if you need to pass information, it is passed via register AX. This is not the normal way to pass information. The normal way is more elegant but more complicated. We will cover that much later. The counter, of course, increases by 1 each time you call show_regs.

Try entering a few more numbers and then it's time to go on to the next program.

MOVING DATA

Obviously, we want to move data from memory to the 8086, from the 8086 to memory, and between registers. We have the following possibilities:

    (1) move from a register to memory
    (2) move from memory to a register
    (3) move a constant to memory
    (4) move a constant to a register
    (5) move from one register to another register

That's it.
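As a rough sketch only, here are the five possibilities above written out in the TEMP2.ASM style, using the mov instruction that is introduced in the following paragraphs. The variable name sample_word is hypothetical and is not on the disk; it would go in the DATASTUFF segment, and the instructions would go between the START CODE and END CODE lines.

```asm
; - - - in the DATASTUFF segment (sample_word is a made-up name)
sample_word  dw  ?

; - - - in the code section
        mov  cx, 99             ; (4) constant to register      CX = 99
        mov  dx, cx             ; (5) register to register      DX = 99
        mov  sample_word, dx    ; (1) register to memory        sample_word = 99
        mov  bx, sample_word    ; (2) memory to register        BX = 99
        mov  sample_word, 2986  ; (3) constant to memory        sample_word = 2986
        mov  ax, sample_word    ; (2) again                     AX = 2986
        call show_regs          ; look at AX, BX, CX and DX on the screen
```

The order is chosen so that every value can be followed by eye from one instruction to the next; nothing in it depends on keyboard input.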
There is no form of the move instruction that moves a single word or a byte directly from one place in memory to another. The move mnemonic for the 8086 is "mov".{3} We need some sample data:

____________________

   2 IP actually has three different values, since you call show_regs once before you enter the loop.

   3 A mnemonic is a name for a machine instruction, which sounds like what the instruction is supposed to do - MOV for move, SUB for subtract, IMUL for integer multiplication, etc.

______________________

EXAMPLE DATA

    this_year    dw  1989
    total        dw  ?
    average      dw  ?
    time         dw  7
    age          db  ?        ; I hope you aren't older than 255
    poem         db  "In 1492 Columbus played a mean kazoo."
    secret_code  db  3Bh
    character    db  ?

Here is some sample code. To move from register to memory, we have:

    mov  total, ax
    mov  time, si
    mov  age, cl
    mov  character, bh

The first thing that strikes the eye is that the destination is on the left and the source is on the right.{4} This is standard for the 8086 instruction set, and it's going to take some getting used to. DESTINATION ON LEFT, SOURCE ON RIGHT. You are going to blow this one from time to time, so always double check to make sure that they are in the right order.

Also note that the 1 byte registers are matched up with 1 byte variables, and 2 byte registers are matched up with 2 byte data. If the sizes don't match, the assembler will complain.{5} Thus:

    mov  age, ax

is an illegal instruction. For examples of memory to register moves, we have:

    mov  ch, secret_code
    mov  di, this_year
    mov  dl, poem           ; moves 'I' to dl
    mov  bx, time

____________________

   4 The computer community likes the words "destination" and "source". "Source" means FROM, and "destination" means TO. The 8086 instruction set is designed:

    MOV  TO, FROM

which is exactly opposite to the way you would say it in an English sentence. For the 8086, it is always destination on the left, source on the right.
   5 Half register and full register operations have different machine codes, and the assembler needs to know which code to use, so the two things must be the same number of bytes.

______________________

Once again: (1) DESTINATION ON LEFT, SOURCE ON RIGHT, and (2) the sizes of the two objects must match.

For constant to memory we have:

    mov  total, 2986
    mov  age, 36h
    mov  secret_code, 0110100b
    mov  average, (27/5) + 3

The arithmetic is done by the assembler and anything that is made up totally of constants is legal. Thus:

    mov  average, (((64+27)*51 )/(196-82))

is legal but:

    mov  average, this_year/time

is illegal. The assembler makes either a one byte or a two byte constant to match the size of the destination. The constants for "total" and "average" are two byte constants while those for "age" and "secret_code" are one byte constants. The constants must be within range, that is -129 is too negative for a byte, 256 is too positive for a byte, -32769 is too negative for a word, 65536 is too positive for a word. The assembler will give an error if the constant is out of range.

You can also move a constant to a register:

    mov  al, 'c'
    mov  ax, 'c'
    mov  di, 46280
    mov  bl, 99

The same rules apply. The constant must be within range and the assembler will make a constant the same size as the destination register (one or two bytes). These constants are actually embedded in the machine code by the assembler, and are unchangeable.

Lastly, you can move data from one register to another:

    mov  ax, cx       ; from cx to ax
    mov  ah, bl       ; from bl to ah
    mov  dl, dh       ; from dh to dl
    mov  di, dx       ; from dx to di

All this is summarized at the end of the chapter.

It's time for program #4. All this program is going to do is get input and then move it to different registers. We are still using temp2.asm. Here's the program:

    TEMP2.ASM
    ;+ + + + + + + + + + START DATA BELOW THIS LINE
    byte_data  db  ?
    word_data  dw  ?
   ;+ + + + + + + + + + END DATA ABOVE THIS LINE

   ;+ + + + + + + + + + START CODE BELOW THIS LINE
      call  show_regs

   This_is_a_very_long_label_name:
      call  get_hex                    ; (1)
      call  show_regs_and_wait
      mov   dx, ax                     ; (2)
      call  show_regs_and_wait
      mov   byte_data, al              ; (3)
      mov   ch, byte_data
      call  show_regs_and_wait
      mov   word_data, ax              ; (4)
      mov   di, word_data
      call  show_regs
      jmp   This_is_a_very_long_label_name
   ;+ + + + + + + + + + END CODE ABOVE THIS LINE

There is a data section in this one, so copy those variables into your data section. Here is what the program does: (1) it gets a hex number from the keyboard, (2) it moves the number in ax to dx, (3) it moves one byte from al to ch via the variable byte_data, and (4) it moves two bytes from ax to di via word_data.

There are two different subprograms - show_regs and show_regs_and_wait. They do the same thing except that show_regs_and_wait waits for you to hit the ENTER key before continuing. The computer works so fast that we wouldn't be able to see the changes in the screen if we didn't have a way of pausing. You can use COUNT on the screen to keep track of exactly where you are in the loop.

Assemble program 4, link it to asmhelp.obj, and watch it work. There are two things to notice here. First, we are entering a hex number, but AX is displaying an unsigned number. It is not self-evident that the unsigned number in AX is the same as the hex number that you are entering. Secondly, though CH is changing, there doesn't seem to be any relationship between the number in AX and the number in CX. We will solve both problems in the next chapter.


SUMMARY

MOVING DATA

   You can:

      (1) move from a register to memory
      (2) move from memory to a register
      (3) move a constant to memory
      (4) move a constant to a register
      (5) move from one register to another register

   The constants are actual constants which are embedded in the machine code.

REGISTERS

   The normal registers are AX, BX, CX, DX, SI, DI and BP.
   AX, BX, CX, and DX can be divided into AH-AL, BH-BL, CH-CL and DH-DL. The 'H' is for high and the 'L' is for low.

   SP is committed and may not be used.

   The segment registers are CS, DS, ES, SS.

   The instruction pointer (IP) is not available to you.

   The register that holds the flags is manipulated by special machine instructions.

ASMHELP.OBJ

   Call show_regs to see what is in the 8086 registers. Count increments by one every time you call show_regs.

   show_regs_and_wait is the same as show_regs except that it waits for you to hit the ENTER key to allow you time to look at the screen.

   Call set_blue at the outset if you have a color card and a color monitor and you want to have a blue background.

   get_num gets a signed or unsigned number from the keyboard (1-5 digits). It does no range checking to see whether the number is too big. All other input routines check to see if a number is too large for its data size.

   get_hex gets a hex number from the keyboard (1-4 digits).

   get_ascii gets characters from the keyboard (1 or 2 characters).

   get_binary gets a binary number from the keyboard (1-16 digits).

   print_num prints a number as hex, signed, unsigned, char, and binary.

   All data transfer to or from ASMHELP.OBJ is via the AX register.


Chapter 4 - SHOW_REGS
=====================

We got started using the program show_regs in the last chapter, but we have already run into problems. The hex number doesn't look the same once we put it in the register - that's because what we are seeing in the arithmetic registers is an unsigned number. Also, when we moved a byte from AL to CH, it was clear that something had moved, but it wasn't clear what the number was. There are two problems here:

   (1) We want to use data in hex, binary, ascii, unsigned and signed format depending on what we are doing in the program.

   (2) Some of the registers can be used as half registers, so we want a whole register when we need it and a half register when we need it.

Nooooooo problem.
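To see why the same register can look so different, here is a small sketch in Python (not 8086 code, and not part of ASMHELP.OBJ). It takes one 16-bit pattern and reads it the five ways discussed above. The function names are made up for this illustration, and the trailing "H" on the hex form only imitates how show_regs prints hex values.

```python
# Model only: one 16-bit register value read in several display formats.
# All names here are hypothetical helpers, not ASMHELP.OBJ routines.

def as_unsigned(word):
    """Read the 16 bits as an unsigned number (0 to 65535)."""
    return word & 0xFFFF

def as_signed(word):
    """Read the 16 bits as a two's complement number (-32768 to +32767)."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word

def as_hex(word):
    """Four hex digits followed by H."""
    return f"{word & 0xFFFF:04X}H"

def as_binary(word):
    """Sixteen binary digits."""
    return f"{word & 0xFFFF:016b}"

def halves(word):
    """Split a full register into its (high, low) half registers."""
    word &= 0xFFFF
    return word >> 8, word & 0xFF

ax = 0x0A0B                 # typed in as the hex number 0A0B
print(as_unsigned(ax))      # 2571
print(as_signed(ax))        # 2571
print(as_hex(ax))           # 0A0BH
print(as_binary(ax))        # 0000101000001011
print(halves(ax))           # (10, 11), that is AH = 0Ah and AL = 0Bh
print(as_signed(0xFFFF))    # -1, while as_unsigned(0xFFFF) is 65535
```

The bits never change; only the reading does. That is all the format codes in this chapter control.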
There are eight registers whose formats we want to be able to change: AX, BX, CX, DX, SI, DI, BP and SP. We need to give each one a code to tell it what to display. The code is the following:

   signed number      1
   unsigned number    2
   binary number      3
   hex number         4
   ascii              5

Also, we need to know whether AX, BX, CX and DX are half or full registers. The code for that is:

   half register      128 (80 hex)
   full register      0

We will need to do two things - set up the codes, then tell show_regs about the code. We'll begin by setting up the codes.

First let's start with SI, DI, BP and SP. They must be full registers, so the half register information is irrelevant.{1} In the data section is a set of data starting ax_byte, bx_byte ... sp_byte. That is where you need to put the code. Don't change the order of these variables. Just put the correct formatting code in the appropriate byte.

   mov   si_byte, 3

will display SI as a binary number.

   mov   bp_byte, 4

will display BP as a hex number.

   mov   di_byte, 1

will display DI as a signed number. That's pretty easy. If you are using AX, BX, CX, or DX as a full register, they are exactly the same.

   mov   dx_byte, 5      ; full register, character format
   mov   ax_byte, 2      ; full register, unsigned format
   mov   bx_byte, 3      ; full register, binary

____________________

   1 show_regs is very forgiving. It only recognizes half registers where appropriate, and if you screw up on the format code, it just makes it an unsigned number.

______________________

The PC Assembler Tutor - Copyright (C) 1989 Chuck Nelson

If we use half registers we need to pass more information. show_regs needs to know that it is half registers, not full registers; it also needs to know what the left format is and what the right format is. This is easier than it sounds but will take a little getting used to. The right nibble (half byte) gets the right format and the left nibble (half byte) gets the left format. Then you add 128 to the total.
This works easily in hex. 13h means that the left half register is (1 = signed) and the right register is (3 = binary). Then we add (128 = 80h) for a total of 93h. Here are some more examples. Remember, 8 + 1 = 9, 8 + 2 = A, 8 + 3 = B, 8 + 4 = C, 8 + 5 = D.

   C4h   ; 80h + 44h   left and right are hex
   A5h   ; 80h + 25h   left is unsigned, right is ascii
   B1h   ; 80h + 31h   left is binary, right is signed
   D2h   ; 80h + 52h   left is ascii, right is unsigned
   94h   ; 80h + 14h   left is signed, right is hex

There is a summary at the end of the chapter giving all the commands and codes for show_regs. It is important to take some time here and learn to make the registers look the way we want, because later on we have machine instructions for signed numbers, for ascii characters, for binary numbers, and you need to see what the registers look like in the appropriate formats. Spending a little time right now will save you a lot of time later on.

If you don't like using hex numbers you can use decimal numbers:

   code = type_of_register + left code + right code

where full register = 0, half register = 128d and:

   NUMBER FORMAT     LEFT CODE     RIGHT CODE
   signed               16d            1d
   unsigned             32d            2d
   binary               48d            3d
   hex                  64d            4d
   ascii                80d            5d

Therefore, left binary, right hex = 128 + 48 + 4 = 180d. Some more examples:

   left ascii, right signed    = 128 + 80 + 1 = 209d
   left signed, right binary   = 128 + 16 + 3 = 147d
   left hex, right unsigned    = 128 + 64 + 2 = 194d

We can put in the code the same way as before.

   mov   ax_byte, 194d
   mov   dx_byte, 147d
   mov   cx_byte, 0D2h
   mov   bx_byte, 94h

We now have the codes in our program, but show_regs doesn't know about them. In order to give the information to show_regs, we call set_reg_style. set_reg_style makes a copy of your information for use by show_regs. The next time that you call show_regs, it will use the new register formats.

There is a small problem, however. The information we have is eight bytes long, and ax is only two bytes long.
How do we pass the information? The answer is: we don't pass the information. Instead, we pass the address of the information. If you look in the data segment of temp2.asm, you will see that ax_byte is the first byte of this data. We pass the address of ax_byte (the first byte) and set_reg_style knows that that address plus the following 7 addresses have the information that it needs. There is a special machine instruction for putting an address in a register - it is LEA or load effective address. It looks like this:

   lea   ax, ax_byte

This instruction says: put the address of ax_byte in the AX register. Combined with the call, we have:

   lea   ax, ax_byte
   call  set_reg_style

Before we start writing programs with set_reg_style, we will run a pre-existing program called SETREGS.EXE. Its pathname is \ASMHELP\SETREGS.EXE. It puts the same (pseudo) random number in all arithmetic registers except SP, then requests a formatting code for each register. After cycling through all the registers, it asks you to press ENTER. It then puts a new random number in the registers and starts the cycle again. The hex codes are displayed on the screen before each request. As usual, use Control-C to exit the program.

Here is what the screen might look like after the first cycle. The pseudo random number 2571 will be the same, but your formatting might be different:

*********************** SCREEN SHOT ***************************

   AX  +02571                     SI  +02571
   BH  00001010   BL  0B **       DI  02571
   CX  0A 0B **                   BP  0000101000001011
   DH  0A **      DL  0B **       SP  00C4H

   CS 0AA4H   DS 0A42H   ES 0A25H   SS 0A35H   IP 0115H

   OF DF IEF TF SF ZF AF PF CF
   x + x + O x                    COUNT 00009
----------------------------------------------------------------
hex = C0h or 4h; ascii = D0h or 5h
Enter a code for sp.
Enter a one byte hex number 4
Press ENTER to continue

*****************************************************************

The formats I have used are:

   AX   full register (signed)
   BX   half registers (binary, ascii)
   CX   full register (ascii)
   DX   half registers (ascii, ascii)
   SI   full register (signed)
   DI   full register (unsigned)
   BP   full register (binary)
   SP   full register (hex)

Cycle through the registers a couple of times. If you make them binary, they get longer; if you make them hex or ascii, they get shorter; a sign appears if they are signed; and you can change from full to half registers for AX, BX, CX and DX.

You will always be able to tell what kind of number show_regs is printing because (1) a signed number always has a + or - in front of it, (2) a hex number always has an h after it, (3) a binary number is 8 digits long for a half register or 16 digits long for a full register, and (4) an ascii character has an asterisk after it. Just as with print_num, if the ascii character has one of the values 0-32, 127 or 255, it will print a hex number and show a double asterisk '**' to signal the event. (5) If none of the above is true, then it is an unsigned decimal number.

If you have a feel for what's happening, it is time to take a mini-test. This is an untimed test, so just make sure that it is correct. I'll give you a particular style, and you figure out the code for that style. The answers are at the bottom of the page. You don't have to memorize the codes. You should be using the summary at the end of the chapter for this quiz.

   1. full register, binary
   2. half register, left ascii, right hex
   3. half register, left signed, right unsigned
   4. full register, ascii
   5. half register, left binary, right ascii
   6. half register, left hex, right signed
   7. half register, left unsigned, right binary
   8. full register, hex
   9. full register, signed
   10. half register, left signed, right binary.
If you feel comfortable with what's going on and are able to set the registers with the help of the summary, we are ready to move on. Here are the answers.{2}

____________________

   2 Here are the answers, both in hex and decimal.

      PROBLEM     HEX     DECIMAL
         1.        3h        3d
         2.       D4h      212d
         3.       92h      146d
         4.        5h        5d
         5.       B5h      181d
         6.       C1h      193d
         7.       A3h      163d
         8.        4h        4d
         9.        1h        1d
        10.       93h      147d

   These things are slow to calculate. It took me about a minute per problem to do both the hex and decimal.


SUMMARY

The registers may be displayed in signed, unsigned, binary, hex, and ascii formats. The basic codes for this are:

   signed      1
   unsigned    2
   binary      3
   hex         4
   ascii       5

In addition you need to add the register type. They are:

   full register    0
   half register    128d or 80h

For the left half register, we have:

   FORMAT       LEFT HEX     LEFT DECIMAL
   signed         10h            16d
   unsigned       20h            32d
   binary         30h            48d
   hex            40h            64d
   ascii          50h            80d

Since the left code is of interest only when the half register type is being used, we simply add 80h and come up with:

   FORMAT       LEFT CODE    RIGHT CODE
   signed         90h            1h
   unsigned       A0h            2h
   binary         B0h            3h
   hex            C0h            4h
   ascii          D0h            5h

Or we add 128d and have:

   FORMAT       LEFT CODE    RIGHT CODE
   signed         144d           1d
   unsigned       160d           2d
   binary         176d           3d
   hex            192d           4d
   ascii          208d           5d

SETTING THE FORMATS

Formats are set by calling set_reg_style. The address of ax_byte must be in AX. The standard assembler instructions for this are:

   lea   ax, ax_byte
   call  set_reg_style

set_reg_style makes a copy of your format data. It changes nothing on the screen. The next time that you call show_regs, it will use the new formatting data.

The correct order for the data in the data segment is: ax_byte, bx_byte, cx_byte, dx_byte, si_byte, di_byte, bp_byte, sp_byte. They are, of course, all byte sized data.
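The code arithmetic in this summary is easy to get wrong by hand, so here is a minimal sketch in Python (not 8086 code) that builds a format byte from the tables above. The names FORMATS, full_code and half_code are invented for this illustration; the numbers themselves come straight from the tables.

```python
# Sketch of the show_regs format-code arithmetic described in this chapter.
# FORMATS, full_code and half_code are hypothetical names, not ASMHELP.OBJ calls.

FORMATS = {"signed": 1, "unsigned": 2, "binary": 3, "hex": 4, "ascii": 5}
HALF_REGISTER = 0x80   # 128d; a full register adds 0

def full_code(fmt):
    """Format byte for a full register: just the basic code."""
    return FORMATS[fmt]

def half_code(left, right):
    """Format byte for half registers: 80h + left nibble + right nibble."""
    return HALF_REGISTER + (FORMATS[left] << 4) + FORMATS[right]

print(hex(half_code("signed", "binary")))   # 0x93
print(half_code("binary", "hex"))           # 180
print(half_code("ascii", "signed"))         # 209
print(full_code("binary"))                  # 3
```

Shifting the left code by four bits is the same as multiplying it by 16, which is why signed becomes 10h (16d) on the left and stays 1 on the right.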
Chapter 5 - Addition and Subtraction
====================================

The first arithmetic operations we will look at are addition and subtraction, but before we do that, we need to look at one instruction that controls program flow.

LOOP

We already have JMP which sends you to a label:

   jmp   label3

sends the program to label3, wherever that is in the code. Sometimes we want to repeat a section of code a specific number of times and then go on. For this, we have LOOP. LOOP decrements the CX register by 1. If CX is not zero after being decremented, LOOP jumps to the label indicated. If CX is zero after being decremented, LOOP falls through.

The 8086 does not have general purpose registers. A general purpose register is a register that can be used for ALL instructions. There are a number of instructions on the 8086 which must be done with specific registers, and LOOP is the first one we meet. LOOP always looks at the CX register.

This first program lets you enter a number and then loops that many times so you can watch the CX register. As usual, you exit the program by hitting Control-C. We use temp2.asm.

temp2.asm

   ; - - - - START CODE BELOW THIS LINE
      call  show_regs              ; initialize

   outer_loop:
      call  get_unsigned
      mov   cx, ax                 ; number to cx

   inner_loop:
      call  show_regs_and_wait
      loop  inner_loop

      jmp   outer_loop
   ; - - - - END CODE ABOVE THIS LINE

A very simple program. As always, link it with asmhelp.obj. Get_unsigned gets a two byte number (less than 65536) and puts it in AX. We put that number in CX, and then watch the program loop. Make sure you use show_regs_and_wait, or everything will happen too fast for you to see.

Try entering 0. On the first pass, loop will decrement CX from 0 to 65535. If CX is 0 when you enter, you have to repeat the loop 65536 times before you exit the loop. Hit Control-C now to exit.
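If the 65536 figure seems surprising, the following sketch in Python (a model, not 8086 code) counts the passes. The function name loop_passes is hypothetical; the only rule it encodes is the one stated above: decrement CX first, then test for zero.

```python
# Model of the LOOP instruction: decrement CX, then jump if CX is not zero.
# loop_passes is a made-up helper for illustration only.

def loop_passes(cx):
    """Return how many times the loop body runs for a given starting CX."""
    cx &= 0xFFFF                    # CX is a 16-bit register
    passes = 0
    while True:
        passes += 1                 # the loop body (show_regs_and_wait in the text)
        cx = (cx - 1) & 0xFFFF      # LOOP decrements CX; 0 wraps around to 65535
        if cx == 0:                 # zero after the decrement: fall through
            return passes

print(loop_passes(3))   # 3
print(loop_passes(1))   # 1
print(loop_passes(0))   # 65536
```

Because the test comes after the decrement, a starting value of 0 is the largest possible count, not the smallest.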
Throughout the book, I will use label names that end in '_loop' to indicate that they are the destination of a jump or loop instruction. The single word "loop" is a reserved word and may not be used as a label - it can only be used as an instruction.

The addition program will have 4 sections and LOOP will give us the ability to do each section a limited number of times before going on to the next section.

ADDITION

If you read the introductory section on numbers carefully, you know that it is the same instruction for both signed and unsigned addition. The 8086 sets the flags correctly for both signed and unsigned addition.

For signed addition, the following flags are set:

   OF   the overflow flag is set (1) if the result is too negative or too positive, that is, if the result in the register does not show the correct result of the addition. It is cleared (0) otherwise.

   ZF   the zero flag is 1 if the result is zero, and is cleared (0) if the result is non-zero.

   SF   the sign flag is set (1) if the result is NEGATIVE and is cleared (0) if the result is POSITIVE. Zero is considered a positive number.

For unsigned addition, the following flags are set:

   CF   the carry flag is set (1) if the result is too large (over 255 for byte and over 65535 for word operations). It is cleared (0) otherwise.

   ZF   the zero flag is the same as above.

In addition, there are two more flags (PF the parity flag and AF the auxiliary flag) which will be set or cleared; we will learn about them later.

Show_regs shows all the flags. The setting for each flag is underneath its name. For the flags OF, ZF and CF, there is an 'X' if the flag is set and a blank if the flag is cleared. SF, the sign flag, is '-' if the flag is set and '+' if the flag is cleared.

The addition program is fairly long because there are four things to look at - unsigned word addition, unsigned byte addition, signed word addition and signed byte addition. For that reason, it has already been typed in for you.
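The flag rules above can be modelled in a few lines. This is a sketch in Python rather than 8086 code; the function name add and the dictionary of flags are assumptions made for the illustration, and PF and AF are left out. It shows that one addition produces one bit pattern, and the flags simply record what happened from both the signed and the unsigned point of view.

```python
# Model of how ADD sets CF, ZF, SF and OF for byte (bits=8) or word (bits=16)
# operands. The function and its return shape are hypothetical.

def add(a, b, bits=16):
    mask = (1 << bits) - 1          # 0FFh or 0FFFFh
    sign = 1 << (bits - 1)          # the leftmost (sign) bit
    a &= mask
    b &= mask
    total = a + b                   # the true sum, possibly too wide
    result = total & mask           # what actually lands in the register
    flags = {
        "CF": total > mask,                              # unsigned result too large
        "ZF": result == 0,                               # result is zero
        "SF": bool(result & sign),                       # sign bit of the result
        "OF": bool((a ^ result) & (b ^ result) & sign),  # signed result out of range
    }
    return result, flags

print(add(17428, 19755))        # word: fits unsigned, but too positive for signed
print(add(0xFFFF, 1))           # word: unsigned carry; as signed it is -1 + 1 = 0
print(add(200, 100, bits=8))    # byte: unsigned carry, no signed overflow
print(add(100, 100, bits=8))    # byte: signed overflow, no unsigned carry
```

The overflow test says: if both operands have the same sign and the result has the other sign, the signed answer is wrong. Byte and word, signed and unsigned: those are the same four cases covered by the addition program.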
It is called ADD1.ASM and its pathname is \XTRAFILE\ADD1.ASM. Print out a copy of it.

There are four blocks of code which are almost identical except the calls are a little different and two blocks refer to whole registers while the other two refer to half registers. At the head of each block is code to set the appropriate register styles for show_regs. SI, DI, and BP are not used and are set to 0 to make the screen easier to read. Here is the first block of code, which is typical.

   ; - - - CODE
      ; UNSIGNED WORD ADDITION
      mov   ax_byte, 2             ; ax, bx, dx unsigned
      mov   bx_byte, 2
      mov   dx_byte, 2
      lea   ax, ax_byte            ; call set_reg_style
      call  set_reg_style

      mov   cx, 3                  ; 3 iterations

   unsigned_loop:
      mov   ax, 0                  ; clear the registers for visibility
      mov   bx, 0
      mov   dx, 0
      call  show_regs
      call  get_unsigned           ; first number to ax
      call  show_regs
      push  ax                     ; temporarily save ax
      call  get_unsigned           ; second number to bx
      mov   bx, ax
      pop   ax                     ; get ax back
      mov   dx, ax                 ; copy of ax to dx
      add   dx, bx                 ; dx (=ax) + bx
      call  show_regs_and_wait
      loop  unsigned_loop
   ; - - - CODE

First, we set AX, BX, and DX for the appropriate register style. Here it is unsigned full register. We then put 3 in CX so we can have 3 iterations with loop. Upon entering the loop, AX, BX, and DX are cleared for reasons of visibility. We don't want the screen cluttered up with numbers.

Get_unsigned gets a two byte unsigned number and returns it in AX. We want the first number to be visually on the top (which is AX), but there is a problem here. In order to get the second number we need to call get_unsigned again, and it is going to put another number in AX. We need to temporarily store the first number while we bring in the second number and transfer it to bx. There is a special 8086 instruction to do this; it is called PUSH. Push temporarily stores a word. The word can be either a full register or a word (two bytes) in memory.
You can have either:

   variable1   dw   10000

   push  ax
   push  variable1

These are stored in a special place called the stack which we will talk about much later. When you want it back, you use the instruction POP. POP gets back the LAST thing that you pushed onto the stack. Things come off the stack in REVERSE order of how they were put on.

   push  variable1
   push  variable2
   push  variable3
   push  variable4
   pop   variable4
   pop   variable3
   pop   variable2
   pop   variable1

is the correct order. This is used for temporary storage only, and the only thing which is accessible is the last thing which you PUSHed on the stack.

We push AX to store it temporarily, call get_unsigned again and transfer the number to BX. We then pop AX to get the number back. The situation now is: the first number is in AX, the second number is in BX.

For the actual addition, we transfer AX to DX and then add DX and BX. AX and BX contain the two numbers, and DX contains the result. Then you must press ENTER to continue. LOOP will jump to 'unsigned_loop' two times. The third time it will fall through to the next section of code.

This program illustrates a hallmark of assembler code. It normally takes scads of code just to do something simple. Assemble add1.asm and link it with asmhelp.obj. Run it:

******************** SCREEN SHOT ******************************

   AX  17428          SI  00000
   BX  19755          DI  00000
   CX  00003          BP  00000
   DX  37183          SP  00508

   CS 0AA5H   DS 0A55H   ES 0A25H   SS 0A35H   IP 004DH

   OF DF IEF TF SF ZF AF PF CF
   x + x - E                      COUNT 00004
----------------------------------------------------------------
The PC Assembler Helper Version 1.0
Copyright (C) 1989 Chuck Nelson All rights reserved.
Enter a number from 0 to 65535 17428
Enter a number from 0 to 65535 19755
Press ENTER to continue

*****************************************************************

This is the screen after the first addition. I have added 17428 (AX) and 19755 (BX). The result 37183 is in DX.
CX is still 3 because it hasn't LOOPed yet.

Notice that even though it is the same assembler instruction:

   add

in all four blocks of code, it is doing both signed and unsigned addition correctly. When you are doing signed addition, you want to look at OF, the overflow flag, SF, the sign flag, and ZF, the zero flag after each addition to see how they are set. When you do unsigned addition, you want to look at CF, the carry flag, and ZF, the zero flag to see how they are set. Play around with this for a while, and then it is time for the next program.

As in all 8086 instructions, the order is:

   add   destination, source

We add both numbers, and put the result in the destination, the thing on the left. There are five different types of addition you can do (just as there are five different types of moves). They are:

   1. add two registers
   2. add a register to a variable (memory)
   3. add a variable (memory) to a register
   4. add a constant to a variable (memory)
   5. add a constant to a register

Here's a program that does all 5 things. Use template.asm to make this program. template.asm is almost the same as the other ones we have used. It has a few changes. First, it now lists ALL the subroutines you can call in asmhelp.obj.{1} Appendix 1 (\APPENDIX\APP1.DOC) contains a description of all the subroutines, what they do, and how they are called. Second, the size of STACKSEG is larger. We don't need a stack this large now; it is for later. Finally, there is a section:

   ; + + + + + + + + + + PUT SUBROUTINES BELOW THIS LINE
   ; + + + + + + + + + + PUT SUBROUTINES ABOVE THIS LINE

for subroutines. Ignore this. This is for later. From now on, we will always use template.asm unless it is explicitly stated that something else is being used.
____________________

   1 This does not change the size of the .EXE file by even one byte, but it adds a lot of information to the .OBJ file, so they are much larger.

Here's the program:

template.asm

   ; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE
   variable1   dw   ?
   variable2   dw   ?
   ; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE

   ; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE
      call  show_regs

   outer_loop:
      call  get_unsigned           ; first number to ax
      push  ax                     ; store ax
      call  get_unsigned           ; second number to bx
      mov   bx, ax
      pop   ax                     ; restore ax
      mov   variable1, ax          ; first number to variable1
      mov   variable2, bx          ; second number to variable2

      ; add 2 registers
      mov   cx, ax                 ; cx + bx
      add   cx, bx

      ; add register to memory
      add   variable1, bx
      mov   dx, variable1          ; put in dx for display

      ; add memory to register
      mov   si, ax
      add   si, variable2

      ; add a constant to memory
      add   variable2, 25
      mov   di, variable2          ; put in di for display

      ; add a constant to a register
      mov   bp, bx
      add   bp, 25

      call  show_regs
      jmp   outer_loop
   ; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE

The program puts the first number in AX and the second number in BX. It then proceeds to do the same addition (first number plus second number) three times. These are:

   1. CX = add two registers (CX + BX)
   2. DX = add a register to memory (variable1 + BX)
   3. SI = add memory to a register (SI + variable2)

Finally it adds a constant (second number + 25). These are:

   4. DI = add a constant to memory (variable2 + 25)
   5. BP = add a constant to a register (BP + 25)

On the 8086, it is not possible to add two things in memory. That is:

   add   variable1, variable2

is an illegal instruction. Instead, you need to write:

   mov   ax, variable2
   add   variable1, ax

SUBTRACTION

It is now time to do some subtraction.
The instruction is SUB:

   sub   destination, source

subtracts source from destination and stores it in destination, the thing on the left.

   sub   ax, cx      ; (ax - cx) -> ax

In order to do subtraction we are going to modify add1.asm, so make a copy and call it sub1.asm:

   >copy add1.asm sub1.asm

How many instructions do we need to change to modify the program? Four.

   add   dx, bx   ->   sub   dx, bx
   add   dl, bl   ->   sub   dl, bl

Each of these is changed twice, and we are ready to roll. Assemble it, link it, and run it. Once again we want to look at the flags at the end of each subtraction. For unsigned subtraction, look at ZF, the zero flag, and CF, the carry flag. This time, CF will be set if the result is below zero. For signed subtraction, look at OF, the overflow flag, SF, the sign flag, and ZF, the zero flag.

As with addition, subtraction changes PF, the parity flag, and AF, the auxiliary flag. They don't concern us.

As with addition, there are five possibilities for subtraction. They are:

   1. subtract one register from another
   2. subtract a register from a variable (memory)
   3. subtract a variable (memory) from a register
   4. subtract a constant from a variable (memory)
   5. subtract a constant from a register

The code for these is:

   sub   cx, bx           ; (cx - bx) -> cx
   sub   variable1, bx    ; (variable1 - bx) -> variable1
   sub   si, variable2    ; (si - variable2) -> si
   sub   variable2, 25    ; (variable2 - 25) -> variable2
   sub   bp, 25           ; (bp - 25) -> bp

You can copy add2.asm to sub2.asm if you want and change the five ADD instructions to SUB instructions. It will then do those five types of subtraction.

SIGNED AND UNSIGNED NUMBERS

What should you do if you are doing unsigned addition or subtraction and the carry flag gets set? It depends. Sometimes it makes a difference, sometimes it doesn't. If you have an error handling routine, then you can call it with the following code:

      add   ax, bx
      jnc   go_on
      call  error_handler
   go_on:

JC and JNC are conditional jump instructions.
JC (jump on carry) jumps if the carry flag is set (1) and JNC (jump on not carry) jumps if the carry flag is not set (0). Using reverse logic here, we skip the error handler if everything is ok.

For signed numbers, it is certainly an error if there is overflow. You are making mathematical calculations and you now have invalid data. One possibility is to do the same as above but with the overflow flag.

      add   ax, bx
      jno   go_on
      call  error_handler
   go_on:

JO and JNO are two more conditional jump instructions. JO (jump on overflow) jumps if the overflow flag is set (1) and JNO (jump on not overflow) jumps if the overflow flag is not set (0). We use the same logic here.

However, there is one special instruction for signed numbers, and that is INTO (interrupt on overflow). It is possible to have an error handler external to your program. It sits permanently in memory. When you make a signed arithmetic error, INTO interrupts your program and goes to the external error handler. The code looks like this:

   add   ax, bx
   into

You probably don't have an error handler in your computer right now. In that case, INTO simply goes looking for it and returns when it can't find it. Let's find out if you have an error handler installed. Once again, use template.asm

   ; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE
      mov   ax_byte, 1             ; signed register style
      mov   bx_byte, 1
      mov   cx_byte, 1
      lea   ax, ax_byte
      call  set_reg_style
      call  show_regs

   outer_loop:
      call  get_signed
      push  ax
      call  get_signed
      mov   bx, ax
      pop   ax
      mov   cx, ax
      add   cx, bx
      into
      call  show_regs
      jmp   outer_loop
   ; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE

This is basically the same thing as before, but using AX, BX, and CX. They are set for signed style, and then we get two signed numbers and add them. The result is in CX. Right after the addition instruction is INTO.
If the result is too positive or too negative, OF will be set and INTO will look for the error handler.

Assemble this program and link it with asmhelp.obj. Try both numbers that do not set the overflow flag and numbers that do set the overflow flag. Did anything different happen when the overflow flag was set? If nothing different happened, you don't have an error handler for INTO.

Included on the disks is an error handler. It is called INTO.COM, and its pathname is \XTRAFILE\INTO.COM. When you run it:

   >into

it will install itself and then return to the command prompt:

   >

INTO.COM will stay in memory until you reboot or shut off the machine. INTO.COM provides the type of sophisticated error handling that you might want to use in a real program. Install (run) INTO.COM, and then try the previous program again, both with numbers that cause an overflow and numbers that don't cause an overflow.


SUMMARY

ADD performs both signed and unsigned addition. It can:

   1. add two registers
   2. add a register to a variable (memory)
   3. add a variable (memory) to a register
   4. add a constant to a variable (memory)
   5. add a constant to a register

SUB performs both signed and unsigned subtraction. It can:

   1. subtract one register from another
   2. subtract a register from a variable (memory)
   3. subtract a variable (memory) from a register
   4. subtract a constant from a variable (memory)
   5. subtract a constant from a register

The flags affected by both ADD and SUB are:

   CF   the carry flag (for unsigned). Set if the 0/65535 (0/255) border was crossed.
   ZF   the zero flag (for signed and unsigned). Set if the result is 0.
   SF   the sign flag (for signed). Set if the result is negative.
   OF   the overflow flag (for signed). Set if the result was too negative or too positive.
   PF   the parity flag, and AF, the auxiliary flag.

The following jump instructions are conditional on the setting of the flags:

   JC    jump on carry          JNC   jump on not carry
   JO    jump on overflow       JNO   jump on not overflow

LOOP

   LOOP decrements cx by 1. If cx is then not zero, it jumps to the named label. If cx is zero, it falls through to the next instruction.

INTO

   If the overflow flag is set, INTO (interrupt on overflow) interrupts the program and goes to an external error handler if one exists. It returns immediately if one doesn't exist.

PUSH and POP

   PUSH stores either a register or a word (in memory) in a temporary storage area. POP retrieves the last word PUSHed.


Chapter 6 - Multiplication and Division
=======================================

Unlike addition and subtraction, where the result can be in either memory or in one of the registers, the multiplication and division instructions have a rigid format.

MULTIPLICATION

You can multiply a one byte number by a one byte number and get a two byte result, or you can multiply a one word number by a one word number and get a two word result. The first number MUST be in AL for the byte operation or in AX for the word operation. The second number may be a register or a memory location (but not a constant). The result is in AH:AL for the byte operation and DX:AX for the word operation. Our possibilities are:

   AL X (one byte register or memory) -> AH:AL
   AX X (one word register or memory) -> DX:AX

Is there a difference between signed and unsigned numbers? Yes, a very big difference. For the byte operation FFh = -1 signed but FFh = 255 unsigned. -1 X -1 = 1 = 0001h. 255 X 255 = 65025 = FE01h. These are two completely different answers. You need to tell the 8086 whether you want signed multiplication or unsigned multiplication. The 8086 does the rest.

Let's look at both signed and unsigned multiplication. We'll do byte multiplication for unsigned numbers and word multiplication for signed numbers.

The instruction for unsigned multiplication is MUL. The instruction for signed multiplication is IMUL. AX or AL is understood to be the register, so it is not in the code. The instructions are:

   variable1   db   ?
    variable2 dw ?

    mul  bx           ; unsigned word from a register
    mul  variable1    ; unsigned byte from memory
    imul ch           ; signed byte from a register
    imul variable2    ; signed word from memory

No AX or AL. It's understood. Here's our program:

template.asm

    ; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE
    answer1  dw  ?
    answer2  dw  ?
    ; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE

    ; - - - - - START CODE BELOW THIS LINE
         mov  cx, 0                 ; clear cx for visual effect

    outer_loop:
         ; unsigned byte multiplication
         mov  ax_byte, 0A2h         ; half regs, unsigned
         mov  bx_byte, 0A2h         ; half regs, unsigned
         lea  ax, ax_byte
         call set_reg_style
         mov  ax, 0                 ; clear regs
         mov  bx, 0
         mov  dx, 0
         call show_regs

         call get_unsigned_byte     ; get two unsigned bytes
         call show_regs
         push ax                    ; save the first number
         call get_unsigned_byte
         mov  bl, al
         pop  ax                    ; restore the first number
         call show_regs_and_wait
         mul  bl                    ; unsigned multiplication
         call print_unsigned        ; display the result (ax)
         call show_regs_and_wait

         ; signed word multiplication
         mov  ax_byte, 01h          ; full reg, signed
         mov  bx_byte, 01h
         mov  dx_byte, 01h
         lea  ax, ax_byte
         call set_reg_style
         mov  ax, 0                 ; clear regs
         mov  bx, 0
         call show_regs

         call get_signed            ; get two numbers
         call show_regs
         push ax                    ; save the first number
         call get_signed
         mov  bx, ax
         pop  ax                    ; restore the first number
         call show_regs_and_wait
         imul bx                    ; signed multiplication
         push ax                    ; save result
         mov  answer1, ax           ; display 4 byte result
         mov  answer2, dx
         lea  ax, answer1
         call print_signed_4byte
         pop  ax                    ; restore result
         call show_regs_and_wait

         jmp  outer_loop
    ; - - - - - END CODE ABOVE THIS LINE

If the answer for the unsigned byte multiplication is greater than 255, it will be difficult to read the answer from the half registers, so we print out the whole AX register.
If the answer for the signed word multiplication is greater than +32767 or is less than -32768, the answer will be unreadable in the DX:AX registers. We move the answer to memory, and then call print_signed_4byte. As with set_reg_style, the data is too long to be put in AX, so we pass the address of the first byte of data with:

         lea  ax, answer1

and then call print_signed_4byte. Everything from PUSH AX to POP AX is designed to do that.

Do MUL and IMUL set any flags? Yes. For byte multiplication, if AL contains the total answer, the 8086 clears the OF and CF flags. If part of the answer is in AH, then the 8086 sets both the OF and CF flags. For word multiplication, if AX contains the total answer, the 8086 clears the OF and CF flags. If part of the answer is in DX, then the 8086 sets both the OF and CF flags.

What do we mean by the total answer? This is simple for unsigned multiplication. If AH (or DX for word) is 0, then the total answer is in AL (or AX for word). It is more complicated for signed multiplication. Consider word multiplication. +30000 X +2 = +60000. But that's less than 65536, so it is completely contained in AX, right? WRONG. The leftmost bit of AX contains the sign. If the signed result is out of the range -32768 to +32767, information about the absolute value of the number is corrupting information about the sign of the number. AX will have the wrong number and the wrong sign. Only by combining AX with DX will you get the correct answer. Similarly for byte multiplication with AL, if the result is not in the range -128 to +127, the leftmost (sign) bit will be corrupted, and only by looking at AH:AL will you be able to get the correct result.

If CF and OF are set, you need to look at both registers to evaluate the number. You might want to do error handling, so once again, you can have:

         mul  bx
         jnc  go_on
         call error_handler
    go_on:

using the same reverse logic as before (if nothing is wrong, skip the error handler).
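If you would like to check the CF/OF rule for MUL and IMUL away from the assembler, here is a rough model in Python. It is not 8086 code, and the names mul8, imul8, imul16 and to_signed are made up for this sketch; they only imitate the results described above.

```python
def to_signed(value, bits):
    """Read a bit pattern as a two's complement signed number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def mul8(al, operand):
    """Model of MUL with bytes: unsigned AL * operand -> AX.
    Returns (AX, flag) where flag stands for both CF and OF."""
    ax = ((al & 0xFF) * (operand & 0xFF)) & 0xFFFF
    flag = (ax >> 8) != 0            # part of the answer is in AH
    return ax, flag


def imul8(al, operand):
    """Model of IMUL with bytes: signed AL * operand -> AX."""
    product = to_signed(al, 8) * to_signed(operand, 8)
    flag = not (-128 <= product <= 127)    # AL alone cannot hold it
    return product & 0xFFFF, flag


def imul16(ax, operand):
    """Model of IMUL with words: signed AX * operand -> DX:AX.
    Returns (DX, AX, flag)."""
    product = to_signed(ax, 16) * to_signed(operand, 16)
    flag = not (-32768 <= product <= 32767)
    pattern = product & 0xFFFFFFFF
    return pattern >> 16, pattern & 0xFFFF, flag


if __name__ == "__main__":
    print(mul8(0xFF, 0xFF))      # 255 X 255, unsigned
    print(imul8(0xFF, 0xFF))     # -1 X -1, signed
    print(imul16(30000, 2))      # +30000 X +2, signed
```

The same bit pattern FFh gives two different products depending on which function (which instruction) you pick, and +30000 X +2 raises the flag even though DX comes back as 0.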
We can also use:

         mul  bx
         into

if there is an INTO error handler.

DIVISION

Division operates in the same way as multiplication. Word division operates on the DX:AX pair and byte division operates on the AH:AL pair. There are two instructions, DIV for unsigned division and IDIV for signed division. After the division:

    byte    AL = quotient, AH = remainder
    word    AX = quotient, DX = remainder

Both DIV and IDIV operate on BOTH registers. For bytes, they consider AH:AL a single number. This means that AH must be set correctly before the division or you will get an incorrect answer. For words, they consider DX:AX a single number. This means that DX must be set correctly before the division, or the result will be incorrect.

Why did Intel include AH and DX in the division? Wouldn't it have been easier to use just AL (or AX for word division) and put the quotient and remainder in the same place? These instructions are actually designed for dividing a long number (4 or 8 bytes). How it works is pretty slick; you'll find out about it later in the book.

How do you set AH and DX correctly? For unsigned numbers, that's easy. Make them 0:

         mov  al, variable
         mov  ah, 0
         div  cl              ; unsigned byte division

For signed division, set AH or DX to 0 (0000h) if it is a positive number and set them to -1 (FFFFh) if the number is negative. This is just standard sign extension that was covered in the chapter on numbers. Fortunately for us, Intel has provided instructions which do the sign extension for us. CBW (convert byte to word) correctly extends the signed number in AL through AH:AL. CWD (convert word to double) correctly extends the signed number in AX through DX:AX. The code is

         mov  ax, variable5
         cwd
         idiv bx              ; signed word division

Of course with these two instructions you can convert a byte to a double word.

         mov  al, variable6
         cbw
         cwd
         idiv bx              ; signed word division

first converting to a word, then to a double word.
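To see why the upper register matters, here is a small Python model of CBW, CWD and word IDIV. It is a sketch, not 8086 code: the function names are invented for the illustration, the quotient check assumes the full two's complement range of a word, and the signed division is assumed to truncate toward zero with the remainder taking the sign of the dividend.

```python
def to_signed(value, bits):
    """Read a bit pattern as a two's complement signed number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def cbw(al):
    """CBW: extend the sign bit of AL through AH. Returns AX."""
    al &= 0xFF
    return al | 0xFF00 if al & 0x80 else al


def cwd(ax):
    """CWD: extend the sign bit of AX through DX. Returns (DX, AX)."""
    ax &= 0xFFFF
    return (0xFFFF if ax & 0x8000 else 0x0000), ax


def idiv16(dx, ax, divisor):
    """Model of word IDIV: DX:AX / divisor -> (AX quotient, DX remainder)."""
    dividend = to_signed((dx << 16) | ax, 32)
    d = to_signed(divisor, 16)
    if d == 0:
        raise ZeroDivisionError("zero divide interrupt")
    quotient = abs(dividend) // abs(d)
    if (dividend < 0) != (d < 0):
        quotient = -quotient              # truncate toward zero (assumed)
    remainder = dividend - quotient * d
    if not -32768 <= quotient <= 32767:   # assumed range for the sketch
        raise ZeroDivisionError("zero divide interrupt")
    return quotient & 0xFFFF, remainder & 0xFFFF


if __name__ == "__main__":
    dx, ax = cwd(0xFFF9)                  # -7, correctly extended
    print(idiv16(dx, ax, 2))              # quotient and remainder patterns
    print(idiv16(0x0000, 0xFFF9, 2))      # same AX, but DX wrongly zeroed
```

With DX set by cwd, -7 divided by 2 comes out as the patterns for -3 and -1. With DX left at 0, the same AX is treated as 65529 and you get a completely different quotient.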
For the division program, we are going to use the multiplication program and make some small changes. Make a copy of your multiplication program:

    >copy mult.asm div.asm

and then make the following changes:

    MULTIPLICATION                  DIVISION

    ; unsigned byte                 ; unsigned byte
    pop  ax                         pop  ax
                                    mov  ah, 0
    call show_regs_and_wait         call show_regs_and_wait
    mul  bl                         div  bl

    ; signed word                   ; signed word
    pop  ax                         pop  ax
                                    cwd
    call show_regs_and_wait         call show_regs_and_wait
    imul bx                         idiv bx

The calls to print_unsigned and print_signed_4byte are irrelevant, so you may either delete them or ignore the output. All we did was change the multiplication instruction to division and prepare the upper register correctly (AH for byte, DX for word). That's all. Assemble, link, and run it. Try out both positive and negative numbers and see what the remainder looks like. Also notice the sign extension just before the division. Remember, for division, the results are in the following places:

    byte    AL = quotient, AH = remainder
    word    AX = quotient, DX = remainder

Now divide by 0. Ka-pow! You should have exited the program and gotten an error message. Unlike the other arithmetical errors where you have the option of ignoring them or making an error handler for them, the 8086 considers division by 0 a major no-no. When the 8086 detects division by zero,{1} it interrupts the program and goes to the zero-divide handler (which is external to the program). Normally, this just exits the program since the data is now worthless.

____________________

   1 What it actually detects is that the quotient is too large to fit in the lower register (AL for byte or AX for word). As long as the upper register is correctly sign extended, the only time this can happen is when you divide by 0. If the upper register is NOT sign extended correctly, you can have zero divide errors all over the place, even though you aren't dividing by 0.
As an example, if AH:AL contain 3275 and bl contains 10, then:

         div  bl

will give a quotient of 327 ( > 255) and will generate a zero divide error.

SUMMARY

MUL and IMUL
    MUL performs unsigned multiplication and IMUL performs signed multiplication. For bytes, the multiplicand is in AL and the result is in the AH:AL pair. For words, the multiplicand is in AX and the result is in the DX:AX pair. If the total result is contained in the lower register, CF and OF are cleared (0). If part of the result is in the upper register, CF and OF are set (1). The multiplier may be either a register or a variable in memory.

         variable1 db ?
         variable2 dw ?

         mul  variable1       ; unsigned byte
         mul  cx              ; unsigned word
         imul bl              ; signed byte
         imul variable2       ; signed word

DIV and IDIV
    DIV performs unsigned division. IDIV performs signed division. For bytes, the dividend is the AH:AL pair. For words, the dividend is the DX:AX pair. In byte division, AH must be correctly prepared before the division. For word division, DX must be correctly prepared before the division. The divisor may be either a register or a variable in memory.

         variable1 db ?
         variable2 dw ?

         div  variable1       ; unsigned byte
         div  cx              ; unsigned word
         idiv bl              ; signed byte
         idiv variable2       ; signed word

    The quotient and remainder are as follows:

         byte    AL = quotient, AH = remainder
         word    AX = quotient, DX = remainder

    No flags are set in a useful way (their contents are undefined after a division). If the quotient is too large for the lower register, or if you divide by zero, a zero divide program interrupt occurs.

CORRECT SIGN EXTENSION
    To prepare for division, you must correctly sign extend the lower register into the upper register. For unsigned division, zero the upper register (AH = 0 or DX = 0). For signed division, use CBW and CWD. CBW (convert byte to word) extends a signed number in AL through AH:AL.
    CWD (convert word to double) extends a signed number in AX through DX:AX.


Chapter 7 - LOGIC
=================

There are a number of operations which work on individual bits of a byte or word. Before we start working on them, it is necessary for you to learn the Intel method of numbering bits. Intel starts with the low order bit, which is #0, and numbers to the left. If you look at a byte:

    7 6 5 4 3 2 1 0

that will be the ordering. If you look at a word:

    15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0

that is the ordering. The overwhelming advantage of this is that if you extend a number, the numbering system stays the same. That means that if you take the number 45:

    7 6 5 4 3 2 1 0
    0 0 1 0 1 1 0 1     (45d)

and sign extend it:

    15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
     0  0  0  0  0  0  0  0  0  0  1  0  1  1  0  1

each of the bits keeps its previous numbering. The same is true for negative numbers. Here's -73:

    7 6 5 4 3 2 1 0
    1 0 1 1 0 1 1 1     (-73d)

    15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
     1  1  1  1  1  1  1  1  1  0  1  1  0  1  1  1     (-73d)

In addition, the bit-position number denotes the power of 2 that it represents. Bit 7 = 2 ** 7 = 128, bit 5 = 2 ** 5 = 32, bit 0 = 2 ** 0 = 1.{1} Whenever a bit is mentioned by number, e.g. bit 5, this is what is being talked about.

____________________

   1 I'm using the Fortran convention for showing exponents. That is, 2 ** 7 is 2 to the 7th, 3 ** 19 is 3 to the 19th.

AND

We will use AND as the prototype. There are five different ways you can AND two numbers:

    1. AND two registers
    2. AND a register with a variable
    3. AND a variable with a register
    4. AND a register with a constant
    5. AND a variable with a constant

That is:

    variable1 db ?
    variable2 dw ?

    and  cl, dh
    and  al, variable1
    and  variable2, si
    and  dl, 0C2h
    and  variable1, 01001011b

You will notice that this time the constants are expressed in hex and binary.
These are the only two reasonable alternatives. These instructions work bit by bit, and hex and binary are the only two ways of displaying a number bitwise (bit by bit). Of course, with hex you must still convert a hex digit into four binary digits.

The table of bitwise actions for AND is:

    1  1  ->  1
    1  0  ->  0
    0  1  ->  0
    0  0  ->  0

That is, a bit in the result will be set if and only if that bit is set in both the source and the destination. What is this used for? Several things. First, if you AND a register with itself, you can check for zero.

         and  cx, cx

If any bit is set, then there will be a bit set in the result and the zero flag will be cleared. If no bit is set, there will be no bit set in the result, and the zero flag will be set. No bit will be altered, and CX will be unchanged. This is the standard way of checking for zero. You can't AND a variable that way:

         and  variable1, variable1

is an illegal instruction. But you can AND it with a constant with all the bits set:

         and  variable1, 11111111b

If the bit is set in variable1, then it will be set in the result. If it is not set in variable1, then it won't be set in the result. This also sets the zero flag without changing the variable.

AND is also used in masks, which will be covered at the end of the chapter.

Finally, there is a variant of AND called TEST. TEST does exactly the same thing as AND but throws away the results when it is done. It does not change the destination. This means that it can check for specific things without altering the data. It has the same possibilities as AND:

    variable1 db ?
    variable2 dw ?

    test cl, dh
    test al, variable1
    test variable2, si
    test dl, 0C2h
    test variable1, 01001011b

will set the flags exactly the same as the similar AND instructions but will not change the destination. We need a concrete example, and for that we'll turn to your video card. In text mode, your screen is 80 X 25. That is 2000 cells.
Each cell has a character byte and an attribute byte. The character byte has the actual ASCII number of the character. The attribute byte says what color the character is, what color the background is, whether the character is high or low intensity and whether it blinks. An attribute byte looks like this:

    7 6 5 4 3 2 1 0
    X R G B I R G B

Bits 0, 1 and 2 are the foreground (character) color. 0 is blue, 1 is green, and 2 is red. Bits 4, 5, and 6 are the background color. 4 is blue, 5 is green, and 6 is red. Bit 3 is high intensity, and bit 7 is blinking. If the bit is set (1), that particular component is activated; if the bit is cleared (0), that component is deactivated.

The first thing to notice is how much memory we have saved by putting all this information together. It would have been possible to use a byte for each one of these characteristics, but that would have required 8 X 2000 bytes = 16000 bytes. If you add the 2000 bytes for the characters themselves, that would be 18000 bytes. As it is, we get away with 4000 bytes, a savings of over 75%. Since there are four different screens (pages) on a color card, that is 18000 X 4 = 72000 bytes compared to 4000 X 4 = 16000. That is a huge savings.

We don't have the tools to access these bytes yet, but let's pretend that we have moved an attribute byte into dl. We can find out if any particular bit is set. TEST dl with a specific bit pattern. If the zero flag is cleared, the result is not zero so the bit was on. If the zero flag is set, the result is zero so that bit was off.

         test dl, 10000000b    ; is it blinking?
         test dl, 00010000b    ; is there blue in the background?
         test dl, 00000100b    ; is there red in the foreground?

If we look at the zero flag, this will tell us if that component is on. It won't tell us if the background is blue, because maybe the green or the red is on too. Remember, test alters neither the source nor the destination.
Its purpose is to set the flags, and the results go into the Great Bit Bucket in the Sky.

OR

The table for OR is:

    1  1  ->  1
    1  0  ->  1
    0  1  ->  1
    0  0  ->  0

If either the source or the destination bit is set, then the result bit is set. If both are zero then the result is zero. OR is used to turn on a specific bit.

         or   dl, 10000000b    ; turn on blinking
         or   dl, 00000001b    ; turn on blue foreground

After this operation, those bits will be on whether or not they were on before. It changes none of the bits where there is a 0. They stay the same as before.

XOR

The table for XOR is:

    1  1  ->  0
    1  0  ->  1
    0  1  ->  1
    0  0  ->  0

That is, if both are on or if both are off, then the result is zero. If only one bit is on, then the result is 1. This is used to toggle a bit off and on.

         xor  dl, 10000000b    ; toggle blinking
         xor  dl, 00000001b    ; toggle blue foreground

Where there is a 1, it will reverse the setting. Where there is a 0, the setting will stay the same. This leads to one of the favorite pieces of code for programmers.

         xor  ax, ax

zeros the ax register. There are three ways to zero the ax register:

         mov  ax, 0
         sub  ax, ax
         xor  ax, ax

The first one is very clear, but slightly slower. For the second one, if you subtract a number from itself, you always get zero. This is slightly faster and fairly clear.{2} For the third one, any bit that is 1 will become 0, and any bit that is 0 will stay 0. It zeros the register as a side effect of the XOR instruction. You'll never guess which one many programmers prefer. That's right, XOR. Many programmers prefer the third because it helps make the code more obscure and unreadable. That gives a certain aura of technical complexity to the code.

NEG and NOT

NOT is a logical operation and NEG is an arithmetical operation. We'll do both here so you can see the difference. NOT toggles the value of each individual bit:

    1  ->  0
    0  ->  1

NEG negates the value of the register or variable (a signed operation).
NEG performs (0 - number) so:

         neg  ax
         neg  variable1

are equivalent to (0 - AX) and (0 - variable1) respectively. NEG sets the flags in the same way as (0 - number).

    ; negvsnot.asm
    ; compares the operations NEG and NOT
    ; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE
         mov  ax_byte, 1       ; signed
         mov  bx_byte, 1       ; signed
         mov  cx_byte, 1       ; signed
         mov  si_byte, 3       ; binary
         mov  di_byte, 3       ; binary
         mov  bp_byte, 3       ; binary
         mov  dx, 0            ; not used, so clear
         lea  ax, ax_byte
         call set_reg_style
         call show_regs

    outer_loop:
         call get_signed       ; get number
         mov  bx, ax           ; move it to all registers
         mov  cx, ax
         mov  si, ax
         mov  di, ax
         mov  bp, ax

         not  bx               ; NOT the second row down
         not  di
         neg  cx               ; NEG the third row down
         neg  bp
         call show_regs
         jmp  outer_loop
    ; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE

____________________

   2 This is one of the first instructions in the template files.

This is set up so the left registers are signed and the right registers are binary. The top registers retain the original value, the second registers down will do NOT and the third registers down will do NEG. Put a number in. On the right side, you will see that NOT (in DI) reverses the bit pattern, while on the left, NEG (in CX) negates the number. Do a few more. This is always true. But they seem to be related. In fact, BX will always be 1 too negative. Why? Remember that in the introduction on numbers, when we changed signs, we had:

    negative of number = one's complement + 1

but the one's complement is exactly NOT. It switches the value of each bit. To get the values in the third row (CX and BP), simply add 1 to the values in the second row (BX and DI). Remember, NOT is a logical operation, NEG is an arithmetic operation.

MASKS

To explain masks, we'll need some data, and we'll use the attribute byte for the monitor. Here it is again:

    7 6 5 4 3 2 1 0
    X R G B I R G B

Bits 0, 1 and 2 are the foreground (character) color. 0 is blue, 1 is green, and 2 is red.
Bits 4, 5, and 6 are the background color. 4 is blue, 5 is green, and 6 is red. Bit 3 is high intensity, and bit 7 is blinking.

What we want to do is turn certain bits on and off without affecting other bits. What if we want to make the background black without changing anything else? We use an AND mask.

         and  video_byte, 10001111b

Bits 0, 1, 2, 3 and 7 will remain unchanged, while bits 4, 5 and 6 will be zeroed. This will make the background black. What if we wanted to make the background blue? This is a two step process. First we make the background black, then set the blue background bit. This involves first the AND mask, then an OR mask.

         and  video_byte, 10001111b
         or   video_byte, 00010000b

The first instruction shuts off certain bits without changing others. The second turns on certain bits without affecting others. The binary constant that we are using is called a mask. You may write this constant as a binary or a hex number. You should never write it as a signed or unsigned number (unless you are one of those people who just adores making code unreadable).

If you want to turn off certain bits in a piece of data, use an AND mask. The bits that you want left alone should be set to 1, the bits that you want zeroed should be set to 0. Then AND the mask with the data.

If you want to turn on certain bits in a piece of data, use an OR mask. The bits that you want left alone should be set to 0. The bits that you want turned on should be set to 1. Then OR the mask with the data.

Go back to AND and OR to make sure you believe that this is what will happen.

SUMMARY

For AND, TEST, OR, and XOR you can have the following combinations:

    1. two registers
    2. a register with a variable
    3. a variable with a register
    4. a register with a constant
    5. a variable with a constant

AND is a bitwise logical operation.
    1  1  ->  1
    1  0  ->  0
    0  1  ->  0
    0  0  ->  0

TEST does the same thing as AND except that the result is discarded. It is used for setting the flags without altering the data.

OR is a bitwise logical operation.

    1  1  ->  1
    1  0  ->  1
    0  1  ->  1
    0  0  ->  0

XOR is a bitwise logical operation.

    1  1  ->  0
    1  0  ->  1
    0  1  ->  1
    0  0  ->  0

You can use NOT and NEG with either a register or a variable in memory.

NOT is a bitwise logical operation.

    0  ->  1
    1  ->  0

NEG is an arithmetic operation.

    NUMBER -> - NUMBER

It gives the negative of a signed number.

MASKS
    If you want to turn off certain bits of a piece of data, use an AND mask. The bits that you want left alone should be set to 1, the bits that you want zeroed should be set to 0. Then AND the mask with the data.

    If you want to turn on certain bits of a piece of data, use an OR mask. The bits that you want left alone should be set to 0. The bits that you want turned on should be set to 1. Then OR the mask with the data.


Chapter 8 - Shift and Rotate
============================

There are seven instructions that move the individual bits of a byte or word either left or right. Each instruction works slightly differently. We'll make a standard program and then substitute each instruction into that program.

SAL - SHL

The instructions SHL (shift logical left) and SAL (shift arithmetic left) are exactly the same. They have the same machine code. They shift each bit to the left. How far? That depends. There are two (and only two) forms of this instruction. All other shift and rotate instructions have these two (and only these two) forms as well. The first form is:

         shl  al, 1

which shifts each bit to the left one bit. The number MUST be 1. No other number is possible. The other form is:

         shl  al, cl

which shifts the bits in AL to the left by the number in CL. If CL = 3, it shifts left by 3. If CL = 7, it shifts left by 7. The count register MUST be CL (not CX).
The bits on the left are shifted out of the register into the bit bucket, and zeros are inserted on the right.

The easy way to understand this is to fire up the standard program. Remember, from now on we always use template.asm.

    ;sal.asm
    ; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE
         mov  ax_byte, 0A3h    ; half reg, low reg binary
         mov  bx_byte, 0A4h    ; half reg, low reg hex
         mov  cx_byte, 0A1h    ; half reg, low reg signed
         mov  dx_byte, 0A2h    ; half reg, low reg unsigned
         lea  ax, ax_byte
         call set_reg_style
         mov  ax, 0            ; clear registers
         mov  bx, 0
         mov  cx, 0
         mov  dx, 0
         mov  di, 0
         mov  bp, 0
         call show_regs

    outer_loop:
         call get_hex_byte     ; get number and put in registers
         mov  bl, al
         mov  cl, al
         mov  dl, al
         mov  si, 8            ; 8 iterations of the loop
         and  al, al           ; set the flags
         call show_regs_and_wait

    shift_loop:
         sal  al, 1
         sal  bl, 1
         sal  cl, 1
         sal  dl, 1
         call show_regs_and_wait
         dec  si
         jnz  shift_loop

         jmp  outer_loop
    ; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE

This standard program is with bytes, not words. This is because if we had used words we would have performed 16 individual shifts and that would have been time consuming and boring. First we set the style to half registers. Notice that one is binary, one is hex, one is signed and one is unsigned. That covers all bases. All the registers are then cleared.

It would be nice to use the loop instruction, but CX is committed, so we make our own loop instruction. We move 8 into SI. The loop instructions are:

         dec  si
         jnz  shift_loop

DEC decrements a register or a variable by 1. Its counterpart INC increments a register or variable by 1. JNZ (jump if not zero) jumps to 'shift_loop' if SI is not zero.

We get a hex byte in AL and put the same byte in BL, CL, and DL. This way we will be able to see what is happening in binary, hex, signed and unsigned.
Before starting, we have:

         and  al, al

This is there to set the flags correctly before starting. All four are shifted left one bit each time, and then we look at the result.

Assemble, link and run it. Enter the number 7. In binary, that is (0000 0111). Take a look at the flags before starting. It is a positive number so SF shows '+'. ZF is not set. PF shows 'O'. O stands for odd. Every time you perform an arithmetic or logical operation, the 8086 checks parity. Parity is whether the number contains an even or odd number of 1 bits. This contains 3 1 bits, so the parity is odd. The possible settings are 'E' for even and 'O' for odd.{1} SAL checks for parity (though some of the other instructions don't).

____________________

   1 This is for use by communications programs.

Now press ENTER. It will shift left 1 and you will have (0000 1110). What does the unsigned number say now? 14. Press ENTER again. (0001 1100) What does the unsigned number say? 28. Again (0011 1000) 56. Again (0111 0000) 112. Notice that the signed number reads +112. Look at the CF and OF. They are both cleared. Things are going to change now.

Press ENTER again. (1110 0000). SF is now '-'. OF, the overflow flag, is set because you changed the number from positive to negative (from +112 to -32). What is the unsigned number now? 224. CF is cleared. PF is 'O'. Shift again. (1100 0000) OF is cleared because you didn't change signs. (Remember, the leftmost bit is the sign bit for a signed number). PF is now 'E' because you have two 1 bits, and two is even. CF is set because you shifted a 1 bit off the left end. Keep pressing ENTER and watch SF, OF, CF, and PF.

Let's look at the unsigned numbers we had until we started shifting 1 bits off the left end. We started with 7, then had 14, 28, 56, 112, 224. This instruction is multiplying by 2. That's right, and it is MUCH faster than multiplication (about 50 times faster).
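If you want to double-check the walk-through on paper, the following Python sketch imitates a one-bit SAL on a byte and the flags discussed above. It is only a model: the name sal8 and the dictionary of flags are inventions for this example, and the flag rules are the ones stated in the text (CF is the bit shifted out, OF signals a change of the sign bit).

```python
def sal8(value):
    """Model of 'sal reg8, 1'. Returns (result, flags)."""
    value &= 0xFF
    carry = value >> 7                 # the bit pushed off the left end
    result = (value << 1) & 0xFF       # a zero comes in on the right
    sign = result >> 7
    ones = bin(result).count("1")
    flags = {
        "CF": carry,
        "OF": carry ^ sign,            # 1 if the sign bit changed
        "SF": "-" if sign else "+",
        "ZF": 1 if result == 0 else 0,
        "PF": "E" if ones % 2 == 0 else "O",
    }
    return result, flags


if __name__ == "__main__":
    number = 7
    for _ in range(8):                 # 8 iterations, like the SI loop
        number, flags = sal8(number)
        print(format(number, "08b"), number, flags)
```

Starting from 7, the unsigned values run 14, 28, 56, 112, 224, and then bits start falling off the left end.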
Far and away the fastest way to multiply a register by 2, 4 or 8 is to use sal.

    ; by 2          ; by 4          ; by 8
    sal  di, 1      sal  di, 1      sal  di, 1
                    sal  di, 1      sal  di, 1
                                    sal  di, 1

For a register, it is faster to use a series of 1 shifts than to load cl. For a variable in memory, anything over 1 shift is faster if you load cl.

Do a few more numbers to see what is happening both with the number and the flags. CF always signals when a 1 bit has been shifted off the end.

SAR and SHR

Unlike the left shift instruction, there are two completely different right shift instructions. SHR (shift logical right) shifts the bits to the right, setting CF if a 1 bit is pushed off the right end. It puts 0s in the leftmost bit. Make a copy of SAL.ASM and replace the four instructions:

         sal  al, 1
         sal  bl, 1
         sal  cl, 1
         sal  dl, 1

with SHR. We'll call the new program SHR.ASM. Run this one too. Instead of 7, use E0h (1110 0000) which is 224d. The first time you shift (0111 0000) the OF flag will be set because the sign changed. Keep shifting, noting the flags and the unsigned number. This time we have 224, 112, 56, 28, 14, 7, 3, 1. It is dividing by two and is once again MUCH faster than division. For a single shift, the remainder is in CF. For a shift of more than one bit, you lose the remainder, but there is a way around this which we will discuss in a moment. Do some more numbers till you are comfortable with the flags and the operation.

If you want to divide by 16, you will shift right four times, so you'll lose those 4 bits. But those bits are exactly the value of the remainder. All we need to do is:

         mov  dx, ax                    ; copy of number to dx
         and  dx, 0000000000001111b     ; remainder in dx
         mov  cl, 4                     ; shift right 4 bits
         shr  ax, cl                    ; quotient in ax

Using a mask, we keep only the right four bits, which is the remainder.

SAR

SAR (shift arithmetic right) is different. It shifts right like SHR, but the leftmost bit always stays the same.
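The difference between the two right shifts can be sketched in a few lines of Python. This is a model of one-bit shifts on a byte, not 8086 code, and the names shr8, sar8 and as_signed are made up for the illustration.

```python
def shr8(value):
    """Model of 'shr reg8, 1'. Returns (result, CF)."""
    value &= 0xFF
    return value >> 1, value & 1          # a 0 comes in on the left


def sar8(value):
    """Model of 'sar reg8, 1'. Returns (result, CF)."""
    value &= 0xFF
    return (value >> 1) | (value & 0x80), value & 1   # sign bit is kept


def as_signed(value):
    """Read a byte as a signed number."""
    return value - 256 if value & 0x80 else value


if __name__ == "__main__":
    logical = arithmetic = 0x8C           # the same starting byte for both
    for _ in range(8):
        logical, _cf = shr8(logical)
        arithmetic, _cf = sar8(arithmetic)
        print(format(logical, "08b"), logical,
              format(arithmetic, "08b"), as_signed(arithmetic))
```

Run it with a few starting bytes and compare the two columns: SHR makes sense for the unsigned reading, SAR for the signed one.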
This will make more sense when you run the program. Make another copy, call it SAR.ASM, and change the four instructions to SAR. The flags operate the same as for SHR and SHL. The overflow flag will never change since the left bit will always stay the same.

First enter 74h (+116). We will be looking at the signed numbers only. Copy down the signed numbers as you go along. They should be: 116, 58, 29, 14, 7, 3, 1, 0, 0. Now try 8Ch (-116). The numbers you should get are: -116, -58, -29, -15, -8, -4, -2, -1, -1. They started out the same, then they got off by one. The negative numbers are one too negative. Try 39h (+57). The numbers here are: 57, 28, 14, 7, 3, 1, 0, 0, 0. Just as it should be for division by 2. Now try C7h (-57). Here the numbers are: -57, -29, -15, -8, -4, -2, -1, -1, -1. This time it went screwy right off the bat. Once again, the negative numbers are one too negative. SAR is an instruction for doing signed division by 2 (sort of). It is, however, an incomplete instruction. The rule for SAR is:

    SAR gives the correct answer if the number is positive. It gives the correct answer if the number is negative and the remainder is zero. If the number is negative but there is a remainder, then the answer is one too negative.

The reason for this is a little complex, but we need to add some code if we want to do signed division.{2} For SHR, the remainder part was optional. Here it is not. We need to know whether the remainder is zero or not. For this example we will do a word shift right by 6. That's dividing by 64.

____________________

   2 Both the code and the reasons will be explained (but not proved) in the summary.

    remainder_mask  dw  003Fh          ; 63

         call get_signed               ; number in ax
         mov  bx, ax                   ; copy in bx
         and  bx, remainder_mask       ; the remainder
         mov  cl, 6                    ; shift right 6 bits
         sar  ax, cl
         jns  continue                 ; is it positive?
         and  bx, bx                   ; is the remainder zero?
         jz   continue
         inc  ax
    continue:

We get the remainder, then shift right 6 bits. Upon finishing SAR, the sign flag will be set correctly. Here is yet another jump. This one is JNS (jump on not sign), which jumps if the sign flag is NOT set, that is if the number is positive. If it is positive, then everything is ok so we skip ahead. If the number is negative, then we check to see if there was a remainder. If there wasn't, everything is ok, so we go ahead. If there was a remainder, then we INC (add 1) ax.

Is the remainder correct? If the number was positive, the remainder is correct, but if the number was negative, then we need to do one more thing. After INC, but before 'continue' we have a SUB instruction:

         inc  ax
         sub  bx, 64                   ; correct the remainder
    continue:

Why that is the correct number will be explained in the summary. What a lot of work when we could simply write:

         mov  cx, 64
         call get_signed
         cwd                           ; sign extend
         idiv cx                       ; signed division

Is there any advantage to this instruction? Not really. Remember that the more you shift, the longer it takes. If you shift 2, then it's about 1/3 faster than division. If you shift 14, then it is only 15% faster than division. Considering that even a slow PC can do 25000 divisions a second, you must be in serious need of speed to use this. In any case, you will never or almost never use SAR for signed division, while you will find lots of opportunity to use SHR and SHL for unsigned multiplication and division.

ROR and ROL

ROR (rotate right) and ROL (rotate left) rotate the bits around the register. We will just do one program since they operate the same way, only in opposite directions. Make another copy of SAL.ASM and put in ROR in the appropriate spots. Enter a number. This time you will notice that the bits, rather than disappearing off the end, reappear on the other side. They rotate around the register. The only flags that are defined are OF and CF.
OF is set if the high bit changes, and CF is set if a 1 bit moves off the end of the register to the other side. Do a few more, and we'll go on to the last two instructions.


RCR and RCL

RCR (rotate through carry right) and RCL (rotate through carry left) rotate the same as the above instructions except that the carry flag is involved. Rotating right, the low bit moves to CF (the carry flag) and CF moves to the high bit. Rotating left, the high bit moves to CF and CF moves to the low bit. There are 9 bits (or 17 bits for a word) involved in the rotation. Make yet another copy of the program, and change those 4 instructions to RCR. Also, since we have 9 bits instead of 8, change the loop count to 9 from 8:

     mov   si, 9

Enter a number and watch it move. Before you start moving, look at CF and see if there is anything in it. There are only two flags defined, OF and CF. Obviously, CF is set if there is something in it. OF looks weird at first. In RCL (the opposite instruction to the one we are using), OF operates normally, signalling a change in the top (sign) bit. In RCR, OF is set when the bit coming in from CF differs from the old high bit. Since CF is what moves into the high bit, that is again a change in the sign bit. You really have no need for the OF flag anyway, so this is unimportant.

Well, those are the seven instructions, but what can you do with them besides multiply and divide?

First, you can work with multiple bit data. The 8087 has a word length register called the status register. Looking at the upper byte:

     15   14   13   12   11   10    9    8
                X    X    X

bits 11, 12 and 13 contain a number from 0 to 7. The data in this register is not directly accessible. You need to move the register into memory, then into an 8086 register. If you want to find what this number is, what do you do?

     mov   bx, status_register_data
     mov   cl, 3
     ror   bx, cl
     and   bh, 00000111b

We rotate right 3 and then mask off everything else. The number is now in BH. We could have used SHR if we wanted. Another 8087 register is the control register. In the upper byte it has:

     15   14   13   12   11   10    9    8
                          X    X

a number from 0 to 3 in bits 10 and 11. If we want the information, we do the same thing:

     mov   bx, control_register_data
     mov   cl, 2
     ror   bx, cl
     and   bh, 00000011b

and the number is in BH.

You are now going to write a program that inputs an unsigned number and prints out its hex representation. Here it is:

; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE
     mov   ax_byte, 0A5h       ; half regs, right ascii
     mov   bx_byte, 4          ; hex
     mov   dx_byte, 4          ; hex
     lea   ax, ax_byte
     call  set_reg_style
     call  show_regs

outer_loop:
     call  get_unsigned
     mov   bx, ax
     mov   dx, ax

     mov   cx, 4
inner_loop:
     push  cx                  ; save cx

     mov   cl, 4
     rol   bx, cl              ; rotate left 1/2 byte
     mov   al, bl              ; copy to al
     and   al, 0Fh             ; mask off upper 1/2 byte
     cmp   al, 10              ; < 10, 0 - 9
                               ; > 9   A - F
     jae   use_letters
     add   al, '0'             ; change to ascii
     jmp   print_it
use_letters:
     add   al, 'A' - 10        ; 10 = 'A'
print_it:
     call  print_ascii_byte
     call  show_regs_and_wait

     pop   cx
     loop  inner_loop

     jmp   outer_loop
; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE

AL will be shown in ascii while BX and DX will be in hex. We save the original number in DX. Since the first thing we want to print is the left hex character, we rotate left, not right. We move the low byte to AL, mask off everything but the low hex digit and then convert to an ascii character. If it is 0 - 9, we add '0' (the character, not the number). If it is > 9, we add "'A' - 10" and get a letter (if the number is 10, we get 'A'). JAE means jump if above or equal, and is an unsigned comparison.{3}

____________________

   3  You are getting inundated with conditional jump instructions. Don't worry. As long as you understand each one when you run across it, you don't have to remember it. All jump instructions will be covered soon.
Finally, we print the ascii character that is in AL.{4}

____________________

   4  Any subroutine in ASMHELP.OBJ that involves a one byte input or output has the data in AL.

Another thing to notice is that just inside the loop we push CX. That is because we use CL for the ROL instruction. It is then POPped just before the loop instruction. This is typical. CX is the only register that can be used for counting in indexed instructions. It is common for indexing instructions to be nested, so you temporarily store the old value of CX while you are using CX for something different.

     push  cx                  ; typical code for a shift
     mov   cl, 7
     shr   si, cl
     pop   cx

Finally, let's multiply large numbers by 2. Here's the code:

; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE
byte1          db   ?
byte2          db   ?
byte3          db   ?
byte4          db   ?
error_message  db   "Result is too large.", 0
; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE

; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE
outer_loop:
     lea   ax, byte1           ; get 4 byte number
     call  get_unsigned_4byte

     shl   byte1, 1
     rcl   byte2, 1
     rcl   byte3, 1
     rcl   byte4, 1
     jnc   go_on
     lea   ax, error_message
     call  print_string
go_on:
     lea   ax, byte1
     call  print_unsigned_4byte
     jmp   outer_loop
; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE

This will require some explanation. Get_unsigned_4byte gets a number from 0 to about four billion. We put it in memory. Normally, the following instructions would be done word by word. We are doing them byte by byte so you can see the mechanics of the situation. The low byte is shifted left 1 bit. This doubles it, but may shift a 1 bit from the high bit into CF. If it does, then it will be present when we rotate byte2. That moves CF into the low bit and moves the high bit into CF. We do it again. And again. If there is an unsigned overflow, it will be signalled by CF being set after:

     rcl   byte4, 1

JNC (jump on not carry) will skip the error message if everything is ok. Print_string prints a zero terminated string, that is, a C string which is terminated by the number (not the character) 0. Finally, we print the number.

A word about large numbers in ASMHELP.OBJ. It is assumed that you would like to use commas if you could. Any data type over 1 word long allows commas. The following are considered the same by ASMHELP.OBJ in its input routines:

     23546787
     2,3,5,4,6,7,8,7
     23,,5,46,,78,7
     23,546787
     23,546,787

It always prints commas correctly in the print routines.


SUMMARY

All shift and rotate instructions operate on either a register or on memory. They can be either 1 bit shifts:

     sal   cx, 1
     ror   variable1, 1
     shr   bl, 1

or shifts indexed by CL (it must be CL):

     rcl   variable2, cl
     sar   si, cl
     rol   ah, cl

SHL and SAL

     SHL (shift logical left) and SAL (shift arithmetic left) are exactly the same instruction. They move bits left. 0s are placed in the low bit. Bits are shoved off the register (or memory data) on the left side, and CF indicates whether the last bit shoved off was a 1 or a 0. It is used for multiplying an unsigned number by powers of 2.

SHR

     SHR (shift logical right) does the same thing as SHL but in the opposite direction. Bits are shifted right. 0s are placed in the high bit. Bits are shoved off the register (or memory data) on the right side and CF indicates whether the last bit shoved off was a 0 or a 1. It is used for dividing an unsigned number by powers of 2.

SAR

     SAR (shift arithmetic right) shifts bits right. The high (sign) bit stays the same throughout the operation. Bits are shoved off the register (or memory data) on the right side. CF indicates whether the last bit shoved off was a 1 or a 0. It is used (with difficulty) for dividing a signed number by powers of 2.
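The SAR rule is easy to check away from the machine. What follows is a minimal sketch in Python, not 8086 code. It models SAR on a raw bit pattern and then applies the same correction the chapter uses (INC the quotient and subtract the divisor from the remainder when the number is negative and the remainder is not zero). The function names to_signed, sar and sar_divide are made up for this illustration and are not part of ASMHELP.OBJ.

```python
def to_signed(value, bits=8):
    """Interpret a raw bit pattern as a two's-complement signed number."""
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def sar(value, count, bits=8):
    """Shift right 'count' bits, copying the sign bit in from the left."""
    mask = (1 << bits) - 1
    # Python's >> on a negative integer keeps the sign, just like SAR.
    return (to_signed(value & mask, bits) >> count) & mask


def sar_divide(value, count, bits=16):
    """Signed division by 2**count using SAR plus the correction step.

    Returns (quotient, remainder) as signed integers, truncating toward
    zero the way IDIV does.
    """
    divisor = 1 << count
    remainder = value & (divisor - 1)                      # and bx, mask
    quotient = to_signed(sar(value, count, bits), bits)    # sar ax, cl
    if quotient < 0 and remainder != 0:                    # jns / jz
        quotient += 1                                      # inc ax
        remainder -= divisor                               # sub bx, divisor
    return quotient, remainder


if __name__ == "__main__":
    # Repeat the 8Ch (-116) experiment: one starting value, eight shifts.
    pattern = 0x8C
    history = [to_signed(pattern)]
    for _ in range(8):
        pattern = sar(pattern, 1)
        history.append(to_signed(pattern))
    print(history)
    # -100 divided by 64 with the correction applied.
    print(sar_divide(0xFF9C, 6))
```

The uncorrected shifts reproduce the "one too negative" sequence from the chapter, while sar_divide agrees with what CWD followed by IDIV would give.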
ROR and ROL

     ROR (rotate right) and ROL (rotate left) rotate the bits of a register (or memory data) right and left respectively. The bit which is shoved off one end is moved to the other end. CF indicates whether the last bit moved from one end to the other was a 1 or a 0.

RCR and RCL

     RCR (rotate through carry right) and RCL (rotate through carry left) rotate the bits of a register (or of memory data) right and left respectively. The bit which is shoved off the register (or data) is placed in CF and the old CF is placed on the other side of the register (or data).

INC

     INC increments a register or a variable by 1.

          inc   ax
          inc   variable1

DEC

     DEC decrements a register or a variable by 1.

          dec   ax
          dec   variable1


The following is fairly technical. It is only for those willing to wade their way through a turgid explanation. If you don't understand it, forget it.

CODE FOR SHR

If you are shifting an UNSIGNED number right by 'X' bits, it is the same as dividing by (2 ** X): 1 bit = (2**1 = 2), 2 bits = (2**2 = 4), 7 bits = (2**7 = 128). This is the same as dividing by a number which is all 0s except the Xth bit which is 1 (for 0 we have 0000 0001, for 1 we have 0000 0010, for 3 we have 0000 1000, for 7 we have 1000 0000). The remainder mask will be this number minus 1 (for 0 we have 0000 0000, for 1 we have 0000 0001, for 3 we have 0000 0111, for 7 we have 0111 1111).

CODE FOR SAR

The order of numbers is important for SAR. If you start with 0 and add 1 each time, the actual sequence of signed numbers that you get (from the bottom up) is:

          -1
          -2
           .
           .
      -32767
      -32768
      +32767
      +32766
           .
           .
           3
           2
           1
           0

The positive numbers are increasing in absolute value while the negative numbers are decreasing in absolute value. If you divide by shifting and there is no remainder, then the quotient is exact. If there is a remainder, the quotient will truncate towards 0 IN THE ABOVE DIAGRAM.
This means that positive numbers will truncate down, while the negative numbers will truncate towards -32768, and will be one too negative. If the number was positive, the remainder will be positive and will be exactly the same as for SHR. If the number was negative, then things are more complicated. We'll take division by 32 as an example. If we divide by 32 (0010 0000) the remainder mask will be 31 (0001 1111). If the number is negative, then what we get when we AND the mask: and ax, 00011111b is not the remainder but (remainder + 32). In order to get the actual negative remainder, we need to subtract 32. This gives us (remainder + 32 - 32). remainder mask = divisor - 1 negative remainder correction = NEG divisor. Chapter 9 - Jumps ================= 68 So far we have done almost exclusively sequential programming - the order of execution has been from one instruction to the next until the very end, where the jump instruction has brought us back to the top. This is not the normal way programs work. In order to have things like DO loops, FOR loops, WHILE loops, CASE constructions and IF-THEN-ELSE constructs, we need to make decisions and to be able to go to different blocks of code depending on our decisions. Intel has provided a wealth of conditional jumps to answer all our needs. All of them are listed in a summary at the end of this chapter. The thing we will do most often is compare the size of two numbers. In BASIC code: IF A < B THEN we need to see if A < B . If that is true, we do one thing, if it is false, we do something else. One thing that we need to watch out for is whether A and B are signed or unsigned numbers. Let A = F523h (signed -2781; unsigned 62755) and B = 59E0h (signed +23008; unsigned 23008). If A and B are signed numbers, then A < B. However, if they are unsigned numbers, then A > B. In C and PASCAL, the compiler takes care of whether you want signed or unsigned numbers (BASIC assumes that it is always signed numbers). 
You are now on the machine level, and must take care of this yourself. To compare two numbers, you subtract one from the other, and then evaluate the result (less than 0, 0, or more than 0). To compare A and B, you do A minus B. To compare AX and BX, you can: sub ax, bx and then evaluate it. Unfortunately, if you do that, you will destroy the information in AX. It will have been changed in the process. Intel has solved this problem with the CMP (compare) instruction. cmp ax, bx subtracts BX from AX, sets the flags, and throws the answer away. CMP is exactly the same as SUB except that AX remains unchanged. We probably don't want to save the result anyway. If we do, we can always use: sub ax, bx We have subtracted BX from AX. There are now three possibilities. (1) AX > BX so the answer is positive, (2) AX = BX so the answer is zero, or (3) AX < BX so the answer is negative. But are these ______________________ The PC Assembler Tutor - Copyright (C) 1989 Chuck Nelson Chapter 9 - Jumps 69 _________________ signed or unsigned numbers? The 8086 uses the same subtract (or CMP) instruction for both signed or unsigned numbers. You have to tell the 8086 in the following instruction whether you were talking about signed or unsigned numbers. cmp ax, bx ja some_label asks the machine to compare AX and BX. It then says that AX and BX were UNSIGNED numbers and the machine should jump to "some_label" if AX was above BX. cmp ax, bx jg some_label asks the machine to compare AX and BX. It then says that AX and BX were SIGNED numbers and the machine should jump to "some_label" if AX was greater than BX. The 8086 makes the decision by looking at the flags. They give it complete information about the result. (The decision making rules are in the summary). In our example on the previous page, we had A = -2781 (unsigned 62755) and B = +23008 (unsigned 23008). If we put A in AX and B in BX, then the instruction JA will execute the jump, while the instruction JG will not execute the jump. 
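The A = F523h, B = 59E0h example can be worked through by hand. Here is a small sketch in Python (a model, not 8086 code) that computes the four flags CMP would set for word operands and then applies the decision rules from the chapter summary: JA needs CF = 0 and ZF = 0, JG needs SF = OF and ZF = 0. The names cmp_flags, ja and jg are invented for this illustration.

```python
def cmp_flags(a, b, bits=16):
    """Return the flags 'cmp a, b' would set (a - b, result thrown away)."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "CF": int(a < b),                    # unsigned borrow
        "ZF": int(result == 0),
        "SF": int(bool(result & sign_bit)),
        # Signed overflow: the operands have different signs and the
        # result's sign differs from the sign of the first operand.
        "OF": int(bool((a ^ b) & (a ^ result) & sign_bit)),
    }


def ja(flags):
    """Unsigned 'above': CF = 0 and ZF = 0."""
    return flags["CF"] == 0 and flags["ZF"] == 0


def jg(flags):
    """Signed 'greater': SF = OF and ZF = 0."""
    return flags["SF"] == flags["OF"] and flags["ZF"] == 0


if __name__ == "__main__":
    flags = cmp_flags(0xF523, 0x59E0)
    print(flags)
    print("JA jumps:", ja(flags))
    print("JG jumps:", jg(flags))
```

The same subtraction and the same flags serve both cases. Only the jump that follows decides whether the operands are read as signed or unsigned.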
What happens if the 8086 doesn't execute the jump? It goes on to the next instruction.

As I said before, the 8086 has LOTS of conditional jumps. I am going to give you a list of them now, but be forewarned that the mnemonics for these suckers are confusing. Rather than doing something sensible like JUG for 'jump unsigned greater' and JSG for 'jump signed greater' - things that would be easy to distinguish, they have JA for 'jump above' (unsigned) and JG for 'jump greater' (signed).{1} Therefore, use the summary at the end of the chapter. With enough use, you will remember which is which.

The arithmetic jump instructions have two forms, a positive one and a negative one. For instance:

     ja      ; jump above
     jnbe    ; jump not (below or equal)

are actually the same machine instruction. You use whichever one makes the program clearer. We will have examples later to illustrate both positive and negative mnemonics.

Here are the signed and unsigned instructions and their meaning. The equivalent mnemonics will appear in pairs.

____________________

   1  This weird use of the words above and greater, below and less, is so confusing that my copy of the Microsoft assembler manual v5.1 has it reversed on one page. It calls the signed jumps unsigned and the unsigned jumps signed. And that's the one place where it SHOULD be right.

THESE ARE THE SIGNED JUMP INSTRUCTIONS

     jg      ; jump if greater
     jnle    ; jump if not (less or equal){2}

     jl      ; jump if less
     jnge    ; jump if not (greater or equal)

     je      ; jump if equal
     jz      ; jump if zero

     jge     ; jump if greater or equal
     jnl     ; jump if not less

     jle     ; jump if less or equal
     jng     ; jump if not greater

     jne     ; jump if not equal
     jnz     ; jump if not zero

These are self-explanatory, keeping in mind that these apply only to signed numbers.
THESE ARE THE UNSIGNED JUMP INSTRUCTIONS

     ja      ; jump if above
     jnbe    ; jump if not (below or equal)

     jb      ; jump if below
     jnae    ; jump if not (above or equal)

     je      ; jump if equal
     jz      ; jump if zero

     jae     ; jump if above or equal
     jnb     ; jump if not below

     jbe     ; jump if below or equal
     jna     ; jump if not above

     jne     ; jump if not equal
     jnz     ; jump if not zero

These apply to unsigned numbers, and should be clear.

JZ, JNZ, JE and JNE work for both signed and unsigned numbers. After all, zero is zero.

____________________

   2  I was trying to decide whether or not to put in the parentheses. If there are two things after the "not" the "not" applies to both of them. By the rules of logic, not (less or equal) == not less AND not equal. If you don't understand this, try to find someone who can explain it to you.

Before we get going, there is one more thing you need to know. The unconditional jump:

     jmp   some_label

can jump anywhere in the current code segment. If XXXX is the current number in CS, then jmp can go from XXXX offset 0 to XXXX offset 65535.

Conditional jumps are something else entirely. ALL conditional jumps (including the loop instruction) are limited range jumps. They are from -128 to +127. That is, they can go backward up to 128 bytes and they can go forward up to 127 bytes.{3} You will find that you will get assembler errors because the conditional jumps are too far away, but don't worry because we can ALWAYS fix them. You will find out later how to deal with them.

As in the other arithmetic instructions, there are five forms of the CMP instruction.

     1. compare two registers
     2. compare a register with a variable
     3. compare a variable with a register
     4. compare a register with a constant
     5. compare a variable with a constant

These look like:

     cmp   ah, dl
     cmp   si, memory_data
     cmp   memory_data, ax
     cmp   ch, 127
     cmp   memory_data, 14938

Here are some decisions we might have to make in programs.

(1) We are writing a program to print hex numbers on the screen.
We have one routine for 0 - 9 and a different one for A - F. Sound familiar? cmp al, 10 jb decimal_routine ; here we are past the jump, so we can start the A - F ; routine. (2) We want to fire everyone over the age of 55 (this is an unsigned number since there are no negative ages): ____________________ 3 But they don't jump from the beginning of the machine instruction, they jump from the END of the machine instruction (which is two bytes long). This means that they have an effective range of from -126 bytes to +129 bytes from the BEGINNING of the instruction. The PC Assembler Tutor 72 ______________________ cmp employee_age, 55 ja find_a_reason_for_termination ; start of routine for younger employees here. (3) You want to know if you need to go to a loanshark: mov di, bank_balance cmp unpaid_bills, di jg gotta_go_see_Vinnie ; continue normal routine here (4) Notice that the last one could have also been written: mov di, bank_balance cmp di, unpaid_bills jl gotta_go_see_Vinnie ; continue normal routine though to my eye the first one seems clearer. (5) You have the results from two calculations, the first one in DI and the second one in CX. You need to go to different routines depending on which is larger. If they are the same, you exit: cmp di, cx jg routine_one ; di is greater jl routine_two ; cx is greater jmp exit_label ; they are equal We had two conditional jumps in a row and both of them were able to look at the results of the CMP instruction because the first jump (JG) did not change any of the flags. This is true of all jumps - conditional jumps, the unconditional jump, and LOOP. They never change any of the flags, so you can have two jumps in a row and be certain that the flags will not be altered. (6) Your dad has promised to buy you a Corvette if you don't get suspended from school more than three times this semester. 
Here's his decision code: cmp number_of_suspensions, 3 jng buy_him_the_corvette ; better luck next time JUMP OUT OF RANGE If the code you are jumping to is more than -128 or +127 bytes from the end of a conditional jump, you will get an assembler error that the jump is out of range. This can always be fixed. Take this code: Chapter 9 - Jumps 73 _________________ cmp ax, bx jg destination_label further_code: ; continue the program If 'destination_label' is out of range, what you need to do is jump to 'further_code' with a conditional jump (you are within range of 'further_code') and use JMP (which can go anywhere in the segment) to go to 'destination_label'. To switch the jump, you simply negate the jump condition: jg -> jng je -> jne jne -> je jbe -> jnbe We use reverse logic. Originally, if the condition was met we jumped. If the condition was not met we continued. Now, if the condition is NOT met, we jump, and if the condition is NOT not met (yes, there are two NOTs) which means it was met, we continue, and this sends us to the JMP instruction. Make sure you believe this works correctly before going on. The code then looks like this: cmp ax, bx jng further_code jmp destination_label further_code: ; continue the program This is the standard way of handling the situation. The PC Assembler Tutor 74 ______________________ SUMMARY CMP CMP performs the same operation as subtraction, but it does not change the registers or variables, it only sets the flags. It is the cousin of TEST. As usual, there are five possibilities: 1. compare two registers 2. compare a register with a variable 3. compare a variable with a register 4. compare a register with a constant 5. 
compare a variable with a constant THESE ARE THE SIGNED JUMP INSTRUCTIONS jg ; jump if greater jnle ; jump if not (less or equal) jl ; jump if less jnge ; jump if not (greater or equal) je ; jump if equal jz ; jump if zero jge ; jump if greater or equal jnl ; jump if not less jle ; jump if less or equal jng ; jump if not greater jne ; jump if not equal jnz ; jump if not zero THESE ARE THE UNSIGNED JUMP INSTRUCTIONS ja ; jump if above jnbe ; jump if not (below or equal) jb ; jump if below jnae ; jump if not (above or equal) je ; jump if equal jz ; jump if zero jae ; jump if above or equal jnb ; jump if not below jbe ; jump if below or equal jna ; jump if not above jne ; jump if not equal jnz ; jump if not zero Chapter 9 - Jumps 75 _________________ THESE JUMPS CHECK A SINGLE FLAG These come in opposite pairs jc ; jump if the carry flag is set jnc ; jump if the carry flag is not set jo ; jump if the overflow flag is set jno ; jump if the overflow flag is not set jp or jpe ; jump if parity flag is set (parity is even) jnp or jpo ;jump if parity flag is not set (parity is odd) js ; jump if the sign flag is set (negative ) jns ; jump if the sign flag is not set (positive or 0) THIS CHECKS THE CX REGISTER jcxz ; jump if cx is zero Why do we have this instruction? Remember, the loop instruction decrements CX and then checks for 0. If you enter a loop with CX set to zero, the loop will repeat 65536 times. Therefore, if you don't know what the value of CX will be when you enter the loop, you use this instruction just before the loop to skip the loop if CX is zero: jcxz after_the_loop loop_start: . . . . loop loop_start after_the_loop: INFORMATION ABOUT JUMPS The unconditional jump (JMP) can go anywhere in the code segment. All other jumps, which are conditional, can only go from -128 to +127 bytes from the END of the jump instruction (that's from -126 to +129 bytes from the beginning of the instruction). Jumps have no effect on the 8086 flags. 
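The reason for guarding a loop with JCXZ can be checked with a short model. This is a hedged sketch in Python, not 8086 code. It assumes only what the summary states: LOOP decrements the 16-bit CX first and jumps back while the result is not zero. The function names loop_iterations and guarded_loop_iterations are hypothetical.

```python
def loop_iterations(cx):
    """Count how many times a LOOP body runs for a given starting CX.

    The body runs, then LOOP decrements CX (wrapping 0 to FFFFh) and
    jumps back only if CX is not zero.
    """
    cx &= 0xFFFF
    count = 0
    while True:
        count += 1                  # the loop body executes
        cx = (cx - 1) & 0xFFFF      # LOOP: decrement the 16-bit CX
        if cx == 0:                 # fall through when CX reaches zero
            return count


def guarded_loop_iterations(cx):
    """The same loop with 'jcxz after_the_loop' placed in front of it."""
    if cx & 0xFFFF == 0:            # jcxz skips the loop entirely
        return 0
    return loop_iterations(cx)


if __name__ == "__main__":
    print(loop_iterations(5))
    print(loop_iterations(0))
    print(guarded_loop_iterations(0))
```

With CX = 0 the unguarded loop wraps around and runs 65536 times, while the guarded version does not run at all.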
How does the 8086 decide whether something is more, less, or the same? The rules for unsigned numbers are easy. If you subtract a larger number from a smaller, the subtraction will pass through zero and will set the carry flag (CF). If they are the same, the result will be zero and it will set the zero flag (ZF). If the first number is larger, the machine will clear the carry flag and the zero flag will be cleared (ZF = 0). Therefore, for unsigned numbers, we have:

     First number is:

          above          CF = 0     ZF = 0
          equal          CF = 0     ZF = 1
          not equal      CF = ?     ZF = 0
          below          CF = 1     ZF = 0

All other unsigned compares are combinations of those three things:

          jae = ja OR je      (CF = 0 and ZF = 0) or ZF = 1
          jbe = jb OR je      CF = 1 or ZF = 1

When you have a negative condition, it is much easier to look at its equivalent positive condition to figure out what is going on:

          jnae is the same as jb      CF = 1
          jnbe is the same as ja      CF = 0   ZF = 0


SIGNED NUMBERS

This section is not for the fainthearted. It is not necessary, so if you find yourself getting confused, just remember that if you see documentation talking about a jump where the overflow flag equals the sign flag or the overflow flag doesn't equal the sign flag, it is talking about SIGNED numbers.

Zero is zero, so we won't concern ourselves with it here. It is exactly the same. If A and B are two signed word sized numbers and we have:

     cmp   A, B

then we can have four different cases:

     1) If A is just a little greater than B [0 < (A - B) <= +32767], then the result will be a small positive number, and there will be no overflow. SF = 0, OF = 0.

     2) If A is much greater than B [+32767 < (A - B)], then the result will be too positive and it will overflow from positive to negative. This will set both the sign flag (it is now negative) and the overflow flag. SF = 1, OF = 1.

     3) If A is a little less than B [-32768 <= (A - B) < 0], that is, if it is only a little negative, then the result is a small negative number, and there is no overflow. SF = 1, OF = 0.

     4) If A is much less than B [(A - B) < -32768], then the result is a large negative number. It is too negative and overflows into a positive number. SF = 0, OF = 1.

Recapping, for a positive result:

     1) SF = 0, OF = 0
     2) SF = 1, OF = 1

and for a negative result:

     3) SF = 1, OF = 0
     4) SF = 0, OF = 1

For positive results (and zero), SF = OF. For negative results, SF is not equal to OF. This, in fact, is how the 8086 decides a signed jump. If SF = OF, it's positive, if SF is not equal to OF, it's negative. If ZF = 1, then obviously they are equal. Here is the list:

          greater        SF = OF     ZF = 0
          equal          SF = OF     ZF = 1
          not equal                  ZF = 0
          less           SF is not equal to OF

As with the unsigned numbers, if you have a negative condition, it is easier to change it into its equivalent positive condition and then figure out the requirements. For instance:

          jnge same as jl       SF is not equal to OF
          jnl  same as jge      ( SF = OF and ZF = 0 ) or ( ZF = 1 )

If you think about it, this OF = SF stuff does make sense. We are subtracting two numbers. If the first one is greater, then the answer will be positive. It can either be a little positive as in (cmp 0, -1) = 1 or it can be very positive, as in (cmp 32767, -32768) = 65,535 (same as -1). If it is just a little positive, there is no overflow and it has a positive sign (SF = 0, OF = 0). If the difference gets large, then the number overflows from + to -. At that point OF = 1, but it now has a negative sign, so OF = SF. The flags MUST match.

In the opposite case where the second number is greater, the answer is negative. It can either be a little negative as in (cmp 12, 13) = -1, or it can be very negative, as in (cmp -32768, 32767) = -65,535 (same as +1). If it is a small difference, the sign is negative, but there is no overflow (SF = 1, OF = 0).
As the difference gets larger, the number overflows from negative to positive so the sign flag is now positive, but the overflow flag is set (SF = 0, OF = 1). Those flags CAN'T match. Chapter 10 - Templates ====================== 78 Do you remember when you were younger and you needed to look up a word in the dictionary? It would define the word in terms of a second word which you didn't know so you would look that up too. Most likely that second word was either defined in terms of a third word you didn't know or it referred you back to the first word. This chapter is something like that. The items in the template file are interdependent. If you're lucky, everything will be clear by the time you have finished the chapter. If not, you'll have to reread it. There are four different things which operate on the assembler instructions which you write - the ASSEMBLER, the LINKER, the LOADER and the 8086. 1) The ASSEMBLER takes your text and turns it into the machine code that is used by the 8086. It is complete except that the addresses of data and subroutines might change during linking and loading. The assembler generates information called HEADER files which give the LINKER and LOADER the information they need to update these addresses in the machine code. This means that you can move the code anywhere in memory. 2) If your program is made up of more than one file, the LINKER links them together. It then makes it ready for running. If there is only one file, the linker makes it ready for running. It does this by updating the addresses of anything it has moved. It still leaves the HEADER files which contain the segment addresses. 3) At run time, the LOADER, which is part of the operating system, decides where to put your program in memory. It loads the program, and adjusts any segment addresses in the program to reflect where the program actually is in memory. It then gives control to the program. 4) The code is fixed at the time the 8086 takes over. 
Any addresses are constants and are unchangable. Keep this in mind as we work through the template file. THE .LST FILE The first thing we need to look at is segments. Let's look at a slightly modified version of the template file called segs.asm. Here it is. ;*********************************** ; segs.asm ______________________ The PC Assembler Tutor - Copyright (C) 1989 Chuck Nelson Chapter 10 - Templates 79 ______________________ ; - - - - - - - - - - - - - STACKSEG SEGMENT STACK 'STACK' variable4 dw 4444h dw 100h dup (?) STACKSEG ENDS ; - - - - - - - - - - - - - MORESTUFF SEGMENT PUBLIC 'HOHUM' variable2 dw 2222h MORESTUFF ENDS ; - - - - - - - - - - - - - DATASTUFF SEGMENT PUBLIC 'DATA' variable1 dw 1111h DATASTUFF ENDS ; - - - - - - - - - - - - - CODESTUFF SEGMENT PUBLIC 'CODE' EXTRN print_num:NEAR , get_num:NEAR ASSUME cs:CODESTUFF,ds:DATASTUFF ASSUME es:MORESTUFF,ss:STACKSEG variable3 dw 3333h main proc far start: push ds sub ax,ax push ax mov ax, DATASTUFF mov ds,ax mov ax, MORESTUFF mov es,ax mov cx, variable1 mov variable1, cx ret main endp CODESTUFF ENDS ; - - - - - - - - - - - - END start ;*************************** There is an extra segment put in that has the definition MORESTUFF SEGMENT PUBLIC 'HOHUM' The PC Assembler Tutor 80 ______________________ There is a variable defined in each segment including the stack segment. These variables all have numbers in them, and the numbers are in hex so they will be easy to read. There are only two external subroutines (neither of which is called). It is time to take a look at an assembler listing. ----- THIS IS FROM THE SCREEN ----- C>masm segs Microsoft (R) Macro Assembler Version 5.10 Copyright (C) Microsoft Corp 1981, 1988. All rights reserved. Object filename [segs.OBJ]: Source listing [NUL.LST]: segs Cross-reference [NUL.CRF]: ----------- If you don't put a semicolon after the filename with masm, you get some prompts. 
The first asks you if you want the object file name to be different from the asm file name. You may change either the name or the name and the extension. If you don't want to change either, just press ENTER. The second asks if you want a listing. Normally you don't, so you just press ENTER. This time we do, so we give it the same name as the assembler file. The assembler will generate a file SEGS.LST. Finally, it asks if you want the information needed to create a cross-reference file. We won't cover that. Once again, press ENTER. The assembler generates an object file and a listing. Here's the complete listing. ********************** Microsoft (R) Macro Assembler Version 5.10 9/2/89 09:50:54 Page 1-1 ; segs.asm ; - - - - - - - - - - - - - 0000 STACKSEG SEGMENT STACK 'STACK' 0000 4444 variable4 dw 4444h 0002 0100[ dw 100h dup (?) ???? ] 0202 STACKSEG ENDS ; - - - - - - - - - - - - - 0000 MORESTUFF SEGMENT PUBLIC 'HOHUM' 0000 2222 variable2 dw 2222h Chapter 10 - Templates 81 ______________________ 0002 MORESTUFF ENDS ; - - - - - - - - - - - - - 0000 DATASTUFF SEGMENT PUBLIC 'DATA' 0000 1111 variable1 dw 1111h 0002 DATASTUFF ENDS ; - - - - - - - - - - - - - 0000 CODESTUFF SEGMENT PUBLIC 'CODE' EXTRN print_num:NEAR , get_num:NEAR ASSUME cs:CODESTUFF,ds:DATASTUFF ASSUME es:MORESTUFF,ss:STACKSEG 0000 3333 variable3 dw 3333h 0002 main proc far 0002 1E start: push ds 0003 2B C0 sub ax,ax 0005 50 push ax 0006 B8 ---- R mov ax, DATASTUFF 0009 8E D8 mov ds,ax 000B B8 ---- R mov ax, MORESTUFF 000E 8E C0 mov es,ax 0010 8B 0E 0000 R mov cx, variable1 0014 89 0E 0000 R mov variable1, cx 0018 CB ret 0019 main endp 0019 CODESTUFF ENDS Microsoft (R) Macro Assembler Version 5.10 9/2/89 09:50:54 Page 1-2 ; - - - - - - - - - - - - END start Microsoft (R) Macro Assembler Version 5.10 9/2/89 09:50:54 Symbols-1 Segments and Groups: The PC Assembler Tutor 82 ______________________ N a m e Length Align Combine Class CODESTUFF . . . . . . . . . . . 0019 PARA PUBLIC 'CODE' DATASTUFF . . . . 
. . . . . . . 0002 PARA PUBLIC 'DATA'
MORESTUFF  . . . . . . . . . . .    0002    PARA    PUBLIC  'HOHUM'
STACKSEG . . . . . . . . . . . .    0202    PARA    STACK   'STACK'

Symbols:

                N a m e             Type     Value   Attr

GET_NUM  . . . . . . . . . . . .    L NEAR   0000    CODESTUFF   External
MAIN . . . . . . . . . . . . . .    F PROC   0002    CODESTUFF   Length = 0017
PRINT_NUM  . . . . . . . . . . .    L NEAR   0000    CODESTUFF   External
START  . . . . . . . . . . . . .    L NEAR   0002    CODESTUFF
VARIABLE1  . . . . . . . . . . .    L WORD   0000    DATASTUFF
VARIABLE2  . . . . . . . . . . .    L WORD   0000    MORESTUFF
VARIABLE3  . . . . . . . . . . .    L WORD   0000    CODESTUFF
VARIABLE4  . . . . . . . . . . .    L WORD   0000    STACKSEG

@CPU . . . . . . . . . . . . . .    TEXT  0101h
@FILENAME  . . . . . . . . . . .    TEXT  segs
@VERSION . . . . . . . . . . . .    TEXT  510

     54 Source  Lines
     54 Total   Lines
     21 Symbols

  48006 + 428261 Bytes symbol space free

      0 Warning Errors
      0 Severe  Errors
**********************

As you can see, the listing, even for a short program, is very long. Let's take it apart section by section. The first large section is a copy of the text file except that there is information on the left. The number on the far left tells the offset address (in hex) from the beginning of the segment for each label, variable or instruction. In this section:

     0000  3333                variable3  dw   3333h
     0002                      main       proc far
     0002  1E                  start:     push ds
     0003  2B C0                          sub  ax,ax
     0005  50                             push ax
     0006  B8 ---- R                      mov  ax, DATASTUFF
     0009  8E D8                          mov  ds,ax
     000B  B8 ---- R                      mov  ax, MORESTUFF
     000E  8E C0                          mov  es,ax
     0010  8B 0E 0000 R                   mov  cx, variable1
     0014  89 0E 0000 R                   mov  variable1, cx
     0018  CB                             ret
     0019                      main       endp

"start" is at 0002h, "mov cx, variable1" is at 0010h and "ret" is at 0018h. The second set of numbers is the actual machine instructions in hex. These are what the 8086 operates on. "push ds" is 1E, "mov ds, ax" is 8E D8, and "ret" is CB. The instructions can be from 1 to 6 bytes long. Notice the "R" after some of the instructions. The "R" stands for relocatable.
This means that it is an address that might be changed by either the linker or the loader. We'll talk about that later. In any case, the object file keeps track of these so they can be changed if necessary. Also, go back to the complete listing and look at the four variables; you will see that the values have been put in the object code; that is, 1111h, 2222h, 3333h and 4444h. If we had had an error, the assembler would have placed an error message at the spot of the error in this part of the file. The next part of the .LST file is the segment listing. It tells how the segments are defined. N a m e Length Align Combine Class CODESTUFF . . . . . . . . 0019 PARA PUBLIC 'CODE' DATASTUFF . . . . . . . . 0002 PARA PUBLIC 'DATA' MORESTUFF . . . . . . . . 0002 PARA PUBLIC 'HOHUM' STACKSEG . . . . . . . . . 0202 PARA STACK 'STACK' We have the segment name, length, and some other information we'll talk about later. Notice that 'HOHUM' which is an artificial class, is dutifully listed with no complaints. Then comes the listing of all labels, variables, and procedure names. Symbols: N a m e Type Value Attr GET_NUM . . . . L NEAR 0000 CODESTUFF External MAIN . . . . . F PROC 0002 CODESTUFF Length = 0017 PRINT_NUM . . L NEAR 0000 CODESTUFF External START . . . . . L NEAR 0002 CODESTUFF VARIABLE1 . . . L WORD 0000 DATASTUFF The PC Assembler Tutor 84 ______________________ VARIABLE2 . . . L WORD 0000 MORESTUFF VARIABLE3 . . . L WORD 0000 CODESTUFF VARIABLE4 . . . L WORD 0000 STACKSEG It shows the segment and offset, whether they are bytes, words, processes etc. The "L" stands for label. The variables and procedures which are in an external file are so marked. Neither print_num nor get_num was called, but the assembler maintains a listing for them. Finally, some internal info for the assembler. @CPU . . . . . . . . . . . . . . TEXT 0101h @FILENAME . . . . . . . . . . . TEXT segs @VERSION . . . . . . . . . . . . 
TEXT 510 We will come back to parts of the .LST file, so make yourself comfortable with it. SEGMENTS It is now time for the nitty-gritty. We need to know what all those statements in the template file mean. Remember that there are four players in the game - (1) MASM, the Microsoft assembler, (2) LINK, the Microsoft linker, (3) the program loader and (4) the 8086 chip itself. Who does what to whom is the subject of this chapter. You will notice that there are three segments in all the template files, one for data, one for code, and one for the stack. How many segments can a program have? An unlimited number for code, an unlimited number for data, and one for the stack.{1} Although you can have an unlimited number of segments, you can use only four at any one time - two for regular data (referenced by the DS and ES registers), one for code (referenced by the CS register), and one for temporary data (referenced by the SS register). You don't have direct control over CS. You should NEVER change the value in SS. This means that you can only change which segments that ES and DS refer to. How do you do that? The 8086 does not allow you to move a constant into a segment register. Therefore it is a two step process. Put the constant into an arithmetic register (AX, BX, CX, DX, SI, DI or BP) and from there to the segment register. Suppose we have 327 different data segments in our file (named SEG1, SEG2, SEG3 ... SEG327) and we wanted to reference data in SEG27. The code would be: mov ax, SEG27 mov ds, ax ____________________ 1 Although if you REALLY need more space for a stack it is possible, if a little arcane. Chapter 10 - Templates 85 ______________________ This is the standard way to do it, and this is the same as the fourth and fifth instructions in the code segment of the template files where we are putting the address of DATASTUFF in ds. What is that SEG27 in the instruction (mov ax, SEG27)? It is a constant. 
When the assembler assembles the program, it makes note of the fact that you want to have the starting address of SEG27 in that instruction (you saw the "R" in the listing for the instruction 'mov ax, DATASTUFF'). Later the linker makes sure there is a SEG27 segment in the complete program, gives it a temporary segment address, and puts this temporary address in every place that references that segment address. This address is guaranteed to be adjusted. You will see why when we look at the linker .MAP file. Finally, the loader (which is the program that puts your program into memory) puts the segment where it wants and updates all references to the segment address to reflect where it now is. Thus, the program is complete only when this information is put in at run time. Each time you run the program SEG27 might be in a different place, but the loader will always update the references correctly. We named the segments SEG1, SEG2, etc. Does SEG have to be part of the segment name? Not on your life. Here are three perfectly acceptable segment definitions: CURLY SEGMENT LARRY SEGMENT MOE SEGMENT It is good practice to have 'SEG' as part of the segment name to remind you that these are segments, not variables, but this is a practice only, it is not a law. Any name you could use for a variable or a label you could use as a segment name. The reserved word SEGMENT after the name tells the assembler that this is the beginning of a segment with that name. You tell the assembler that you are starting a segment with 'SEGMENT' CURLY SEGMENT and you tell the assembler that you are finished with that segment with the reserved word ENDS (END [of] Segment): CURLY ENDS You need to put the name of the segment before the ENDS directive. In the template file, the data segment definition reads: DATASTUFF SEGMENT PUBLIC 'DATA' DATASTUFF is the segment name, but what are PUBLIC and 'DATA' there for? To understand this, we need to look at the linker. 
First, let's assemble temp1.asm (our first template file) just the way it is.

---------- FROM THE SCREEN ----------

C>masm temp1.asm
Microsoft (R) Macro Assembler Version 5.10
Copyright (C) Microsoft Corp 1981, 1988. All rights reserved.

Object filename [temp1.OBJ]:
Source listing [NUL.LST]: temp1
Cross-reference [NUL.CRF]:

----------

We have made the listing file so let's look at the segment information.

                N a m e             Length  Align   Combine Class

CODESTUFF  . . . . . . . . . . .    000A    PARA    PUBLIC  'CODE'
DATASTUFF  . . . . . . . . . . .    0000    PARA    PUBLIC  'DATA'
STACKSEG . . . . . . . . . . . .    00C8    PARA    STACK   'STACK'

You will see that CODESTUFF is Ah (10d) bytes long, DATASTUFF has no data and is 0 bytes long, and STACKSEG is C8h (200d) bytes long. Now let's link temp1.obj and asmhelp.obj.

---------- FROM THE SCREEN ----------

C>link temp1+asmhelp
Microsoft (R) Overlay Linker Version 3.61
Copyright (C) Microsoft Corp 1983-1987. All rights reserved.

Run File [TEMP1.EXE]:
List File [NUL.MAP]: temp
Libraries [.LIB]:

----------

This time we have made a listing file for the link process. It is called TEMP.MAP. Let's look at it.

     Start  Stop   Length Name                   Class
     00000H 000C7H 000C8H STACKSEG               STACK
     000D0H 00540H 00471H DATASTUFF              DATA
     00550H 01944H 013F5H CODESTUFF              CODE

     Program entry point at 0055:0000

This is what the map file looks like. There are still only three segments in the final executable file, STACKSEG, DATASTUFF and CODESTUFF. You will notice that the class name is still there, but the PUBLIC is missing. Its job is finished. "Start" says where the segment starts in the executable file, "Stop" says where the segment ends in the executable file, and "Length" says the length in bytes of the segment. These numbers are 5 digit hex numbers instead of 4. That means that they are showing the total address. The segment number is the left 4 digits of 'Start'.
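The relationship between the 5 digit addresses in the map file and the segment numbers can be checked with a little arithmetic. The sketch below uses Python purely as a calculator (it is not part of the MASM/LINK toolchain, and the function names are invented for illustration). It assumes what the map file shows: every segment starts on a 16 byte boundary, so dropping the rightmost hex digit of 'Start' leaves the segment number.

```python
# Illustrative helpers only; these names do not come from MASM or LINK.

def segment_number(start):
    """Segment number of a segment that starts on a 16 byte boundary."""
    return start >> 4                  # drop the rightmost hex digit

def total_address(segment, offset):
    """20 bit address formed from a segment:offset pair."""
    return ((segment << 4) + offset) & 0xFFFFF

def next_boundary(address):
    """First 16 byte boundary at or after 'address'."""
    return (address + 0xF) & ~0xF

# (name, start, stop) copied from the TEMP.MAP listing
temp_map = [
    ("STACKSEG",  0x00000, 0x000C7),
    ("DATASTUFF", 0x000D0, 0x00540),
    ("CODESTUFF", 0x00550, 0x01944),
]

for name, start, stop in temp_map:
    length = stop - start + 1          # Stop is the last byte used
    print("%-10s segment %04Xh  length %05Xh"
          % (name, segment_number(start), length))
```

Under these assumptions the entry point 0055:0000 is simply the first byte of CODESTUFF, and each segment begins at the first boundary after the previous segment's last byte.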
STACKSEG is C8h (200d) bytes long like before. Although DATASTUFF had no data, it is now 471h (1137d) bytes long, and CODESTUFF was Ah (10d) bytes long before but now it is a whopping 13F5h (5109d) bytes long. What happened? The linker did its work. One of the things the linker does is combine things that we want to be in the same segment. It took the DATASTUFF segment from temp1.obj and appended the DATASTUFF segment from asmhelp.obj, combining them into one larger segment.{2} It took the CODESTUFF segment from temp1.obj and appended the CODESTUFF segment from asmhelp.obj, making them one large segment. Why did it do that? Because we put the word "PUBLIC" in the segment definition. When the assembler sees "PUBLIC" in the segment definition, it passes that information along to the linker in a header file.{3} When the linker has a segment which is "PUBLIC", it will append any other segment which (1) is "PUBLIC", (2) has the same name (i.e. CODESTUFF or DATASTUFF or CURLY etc.), and (3) has the same class name{4}. All three things must be true for the linker to combine them. We will actually check this out a little later to make sure you believe it. One other thing to notice is that the linker is allocating only as much space as is needed. It could allocate 65536 bytes for each segment defined, but it uses only as much as the program needs and then starts the next segment at the next segment starting address. This is efficient management of memory. What is the advantage of combining the smaller segments into one larger segment? For code, there is no big advantage. But for data, remember that every time we want to access data, we need to have the starting address of that particular segment in register ds. We do this by using: mov ax, DATASTUFF mov ds, ax If we have a number of data segments, every time we access data ____________________ 2 The linker always works from left to right. 
For each different type of segment, it starts with the first one it finds and then appends each succeeding one it finds.

3 A header is information for the linker or loader which is put in front of the machine code in an object file or an executable file. There are typically a number of headers in front of the machine code.

4 Remember that class names are somewhat arbitrary. I use 'CODE', 'DATA' and 'STACK' for clarity and because they are the standard Microsoft class names, but if you are not linking with anyone else's programs, you can use any class name you want.

we need to (1) make sure that ds contains the address of the correct data segment, and (2) if not, we need to write the code to change ds. This entails using a lot of code, can be confusing and is certainly error prone. With one data segment, you simply load ds with the correct address at the beginning of the program and then forget about it. This should be a rule for you. Unless you have truly humongous amounts of data (over 65535 bytes), ALWAYS put all your data in the same segment.

Do you remember those dashes '----' in the assembler listing? That was because the assembler didn't have a segment address to put there.

     0004  B8 ---- R           mov  ax, DATASTUFF
     0007  8E D8               mov  ds,ax
     0009  8B 0E 0000 R        mov  cx, variable1
     000D  89 0E 0000 R        mov  variable1, cx

The linker now has a temporary address for the start of DATASTUFF (000D0h) so it will put the segment address (the left four hex digits) in this spot. This is temporary, but will be updated by the loader. If variable1 has been moved, it will update that too.

Why am I sure that these temporary segments will be moved? The segment address of STACKSEG is 0000h. The segment address of DATASTUFF is 000Dh (13d) and the segment address of CODESTUFF is 0055h (85d). But the operating system owns the first several THOUSAND segments. The loader will load your program in much higher memory. They must move.
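The adjustment made at load time can be pictured as a single addition. The sketch below (Python used only as a calculator) assumes the loader adds the segment address at which it placed the program to each temporary segment address left by the linker. The value 2000h is a made-up load address, since the real one is chosen by the operating system each time the program is run.

```python
LOAD_SEGMENT = 0x2000     # hypothetical; the real value is picked at load time

# temporary segment addresses taken from the linker's map file
temporary = {"STACKSEG": 0x0000, "DATASTUFF": 0x000D, "CODESTUFF": 0x0055}

def relocate(temp_segment, load_segment):
    """Final segment address after the program has been placed in memory."""
    return (temp_segment + load_segment) & 0xFFFF

final = {name: relocate(seg, LOAD_SEGMENT) for name, seg in temporary.items()}

for name in ("STACKSEG", "DATASTUFF", "CODESTUFF"):
    print("%-10s %04Xh -> %04Xh" % (name, temporary[name], final[name]))
```

Whatever the load address turns out to be, the distance between the segments stays the same, which is why the offsets inside each segment never need to change at load time.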
So the linker combines all the segments we want to combine, and then it looks at the machine code and modifies every reference to the segments and to the variables which have been moved. That is a lot of work. For instance, when the linker appends asmhelp.obj, there are a hundred or so variables which it moves and a thousand or so references to those variables which it modifies. The linker does that every time you link a file with ASMHELP.OBJ. That's not too shabby. Do all three conditions need to be met for the linker to combine segments into one segment? 1) They have the same name 2) They have the same class name 3) They are both defined PUBLIC Joe Bob says check it out. Here are two .ASM files which contain a number of segments. Here's the first file: ;file1.asm ;- - - - - - - - - - - - - - - - - - - - STACKSEG SEGMENT STACK 'STACK' dw 100 dup (?) STACKSEG ENDS ;- - - - - - - - - - - - - - - - - - - - MORESTUFFA SEGMENT PUBLIC variable21 dw ? MORESTUFFA ENDS ;- - - - - - - - - - - - - - - - - - - - DATASTUFF SEGMENT PUBLIC 'DATA' variable1 dw ? DATASTUFF ENDS ;- - - - - - - - - - - - - - - - - - - - MORESTUFF SEGMENT PUBLIC 'DATA' variable2 dw ? MORESTUFF ENDS ;- - - - - - - - - - - - - - - - - - - - EVENMORESTUFF SEGMENT PUBLIC 'DATA' variable3 dw ? EVENMORESTUFF ENDS ;- - - - - - - - - - - - - - - - - - - - CODESTUFF SEGMENT PUBLIC 'CODE' ASSUME cs:CODESTUFF, ds:DATASTUFF ASSUME ds:MORESTUFF, es:MORESTUFFA, ds:EVENMORESTUFF main proc far start: push ds sub ax,ax push ax ret main endp CODESTUFF ENDS ;- - - - - - - - - - - - - - - - - - - - END start Here's the other file: ;file2.asm ; - - - - - - - - - - - - - - - - - - - - STACKSEG SEGMENT STACK 'STACK' dw 100 dup (?) STACKSEG ENDS ; - - - - - - - - - - - - - - - - - - - - NOTDATASTUFF SEGMENT PUBLIC 'DATA' variable4 dw ? NOTDATASTUFF ENDS ; - - - - - - - - - - - - - - - - - - - - The PC Assembler Tutor 90 ______________________ DATASTUFF SEGMENT PUBLIC 'DATA' variable5 dw ? 
DATASTUFF ENDS ; - - - - - - - - - - - - - - - - - - - - MORESTUFFA SEGMENT PUBLIC variable61 dw ? MORESTUFFA ENDS ; - - - - - - - - - - - - - - - - - - - - MORESTUFF SEGMENT PUBLIC 'CLASSOF68' variable6 dw ? MORESTUFF ENDS ; - - - - - - - - - - - - - - - - - - - - EVENMORESTUFF SEGMENT 'DATA' variable7 dw ? EVENMORESTUFF ENDS ; - - - - - - - - - - - - - - - - - - - - CODESTUFF SEGMENT PUBLIC 'CODE' ASSUME cs:CODESTUFF, ds:DATASTUFF, ds:NOTDATASTUFF ASSUME ds:MORESTUFF,ds:MORESTUFFA, ds:EVENMORESTUFF subroutine proc far ret subroutine endp CODESTUFF ENDS ; - - - - - - - - - - - - - - - - - - - - END You will notice that the two CODESTUFFs, the two DATASTUFFs, the two MORESTUFFAs and the two STACKSEGs each have the same definitions, but that (1) NOTDATASTUFF has a different name than DATASTUFF, (2) one MORESTUFF has a different class name from the other, (3) one EVENMORESTUFF is PUBLIC and the other is not, and (4) the two MORESTUFFAs have NO class name. Here's the segment information from file1.lst N a m e Length Align Combine Class CODESTUFF . . . . . . . . . . 0005 PARA PUBLIC 'CODE' DATASTUFF . . . . . . . . . . 0002 PARA PUBLIC 'DATA' EVENMORESTUFF . . . . . . . . 0002 PARA PUBLIC 'DATA' MORESTUFF . . . . . . . . . . 0002 PARA PUBLIC 'DATA' MORESTUFFA . . . . . . . . . . 0002 PARA PUBLIC STACKSEG . . . . . . . . . . . 00C8 PARA STACK 'STACK' and from file2.lst N a m e Length Align Combine Class CODESTUFF . . . . . . . 0001 PARA PUBLIC 'CODE' Chapter 10 - Templates 91 ______________________ DATASTUFF . . . . . . . 0002 PARA PUBLIC 'DATA' EVENMORESTUFF . . . . . 0002 PARA NONE 'DATA' MORESTUFF . . . . . . . 0002 PARA PUBLIC 'CLASSOF68' MORESTUFFA . . . . . . . 0002 PARA PUBLIC NOTDATASTUFF . . . . . . 0002 PARA PUBLIC 'DATA' STACKSEG . . . . . . . . 00C8 PARA STACK 'STACK' These are in alphabetical order. Before we link them together, let's think about what should happen if all three conditions must be met. 
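The prediction can be written down before running the linker. The sketch below is a Python model of the three conditions as stated in this chapter, not of LINK itself. The Segment tuple and the will_combine function are invented names, and the STACK combine type and segments with no class name are deliberately left out of it.

```python
from collections import namedtuple

# name, combine type and class name as they appear in the .LST segment tables
Segment = namedtuple("Segment", "name combine cls")

def will_combine(a, b):
    """True if both segments meet all three conditions from the text."""
    return (a.name == b.name              # 1) same name
            and a.cls == b.cls            # 2) same class name
            and a.combine == "PUBLIC"     # 3) both declared PUBLIC
            and b.combine == "PUBLIC")

file1 = {
    "DATASTUFF":     Segment("DATASTUFF", "PUBLIC", "DATA"),
    "MORESTUFF":     Segment("MORESTUFF", "PUBLIC", "DATA"),
    "EVENMORESTUFF": Segment("EVENMORESTUFF", "PUBLIC", "DATA"),
    "CODESTUFF":     Segment("CODESTUFF", "PUBLIC", "CODE"),
}
file2 = {
    "DATASTUFF":     Segment("DATASTUFF", "PUBLIC", "DATA"),
    "MORESTUFF":     Segment("MORESTUFF", "PUBLIC", "CLASSOF68"),
    "EVENMORESTUFF": Segment("EVENMORESTUFF", "NONE", "DATA"),
    "CODESTUFF":     Segment("CODESTUFF", "PUBLIC", "CODE"),
}

for name in file1:
    verdict = "merge" if will_combine(file1[name], file2[name]) else "stay separate"
    print("%-14s %s" % (name, verdict))
```

If the linker really requires all three conditions, its map file should agree with what this model prints.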
Both CODESTUFF segments are PUBLIC with the same class name, so they should merge. Both DATASTUFF segments are PUBLIC with the same class name so they should merge. EVENMORESTUFF is PUBLIC in one file but not public in the other, so they should not merge. MORESTUFF is PUBLIC in both files, but they have different class names, so they should not merge. What about STACKSEG? The STACK combine type is similar to PUBLIC{1}, and they have the same class name, so they should merge. Finally, there are the MORESTUFFAs. They have the same name and are PUBLIC, but they have no class name. Will they combine? Let's see what happens. Here is the .MAP file from the command C> link file1+file2 Start Stop Length Name Class 00000H 0018FH 00190H STACKSEG STACK 00190H 001A1H 00012H MORESTUFFA 001B0H 001C1H 00012H DATASTUFF DATA 001D0H 001D1H 00002H MORESTUFF DATA 001E0H 001E1H 00002H EVENMORESTUFF DATA 001F0H 001F1H 00002H NOTDATASTUFF DATA 00200H 00201H 00002H EVENMORESTUFF DATA 00210H 00220H 00011H CODESTUFF CODE 00230H 00231H 00002H MORESTUFF CLASSOF68 Program entry point at 0021:0000 STACKSEG, DATASTUFF and CODESTUFF combined. MORESTUFFA combined. The others are separate. Doesn't this confuse the linker if it has more than one segment with the same name? No. The linker knows which variables are in which segments, and the names of the segments are not relevant. If you look at the class information from the linker listing, you will notice that all things in the same class are grouped together. The linker works from left to right on the command line, so for the above, it read file1.obj first and then read ____________________ 1 STACK tells the linker to combine any other segments which have STACK and the class type 'STACK' and it tells the loader to set the SS register to that segment, and set the SP register to point to the end of that segment. The PC Assembler Tutor 92 ______________________ file2.obj. 
It orders things (1) first by class (in the order encountered), and then (2) by segment (in the order encountered). For the linker ordering, a segment is like a subclass. Look through the assembler files to check that if you link in the order file1+file2, the order of encountering classes is 'STACK', empty, 'DATA', 'CODE', and 'CLASSOF68'. Check the segment ordering also. What if we link the opposite way?

     > link file2+file1

Here's the listing:

     Start  Stop   Length Name                   Class
     00000H 0018FH 00190H STACKSEG               STACK
     00190H 00191H 00002H NOTDATASTUFF           DATA
     001A0H 001B1H 00012H DATASTUFF              DATA
     001C0H 001C1H 00002H EVENMORESTUFF          DATA
     001D0H 001D1H 00002H MORESTUFF              DATA
     001E0H 001E1H 00002H EVENMORESTUFF          DATA
     001F0H 00201H 00012H MORESTUFFA
     00210H 00211H 00002H MORESTUFF              CLASSOF68
     00220H 00234H 00015H CODESTUFF              CODE

     Program entry point at 0022:0010

Assure yourself that this is the order the classes are encountered for file2+file1.

Before we go on, let's summarize what we have so far.

     1) In an .asm file, each segment starts with a name followed by the word SEGMENT.
     2) Each segment ends with the name followed by the word ENDS (end of segment).

This is the minimal segment definition:

     ; - - - - -
     SEG_A    SEGMENT

     SEG_A    ENDS
     ; - - - - -

In addition, if you want to combine a segment with segments from other files in order to make one large segment, then all the segments to be combined must:

     1) have the same name.
     2) have the same class name (type)
     3) be declared PUBLIC


ASSUME

The next thing from the template file is the word ASSUME. Who is assuming what?

     ASSUME cs:CODESTUFF, ds:DATASTUFF

This is for the assembler. It says that whenever you are working in the CODESTUFF segment, CS will be set to the segment address of the CODESTUFF segment. Whenever you are working in the DATASTUFF segment, DS will be set to the segment address of the DATASTUFF segment.
The CS register takes care of itself, but it is your responsibility to make sure that DS actually points to the proper segment. If you just move a word from memory to a register:

     mov  cx, variable1

the 8086 automatically thinks that it is in the DS segment. But it doesn't have to be that way. The 8086 has something called segment overrides. Here is the list:

     SEGMENT     HEX VALUE
       CS           2E
       DS           3E
       ES           26
       SS           36

An override is a 1 byte prefix in the machine code that tells the 8086 that for the next instruction, the memory location will not reference the natural segment register; what it will reference is the segment register named in the override - CS if it is 2Eh, DS if it is 3Eh, ES if it is 26h, and SS if it is 36h. We could plug these in ourselves, but that is a lot of work. Fortunately, the assembler takes care of this for us. Let's look at the code from the very beginning of the chapter.

;***********************************
; segs.asm
; - - - - - - - - - - - - -
STACKSEG   SEGMENT   STACK  'STACK'
variable4  dw  4444h
           dw  100h dup (?)
STACKSEG   ENDS
; - - - - - - - - - - - - -
MORESTUFF  SEGMENT   PUBLIC  'HOHUM'
variable2  dw  2222h
MORESTUFF  ENDS
; - - - - - - - - - - - - -
DATASTUFF  SEGMENT   PUBLIC  'DATA'
variable1  dw  1111h
DATASTUFF  ENDS
; - - - - - - - - - - - - -
CODESTUFF  SEGMENT   PUBLIC  'CODE'

     EXTRN  print_num:NEAR , get_num:NEAR

     ASSUME cs:CODESTUFF,ds:DATASTUFF
     ASSUME es:MORESTUFF,ss:STACKSEG

variable3  dw  3333h

main   proc far
start: push ds
       sub  ax,ax
       push ax

       mov  ax, DATASTUFF
       mov  ds,ax
       mov  ax, MORESTUFF
       mov  es, ax

       mov  cx, variable1
       mov  variable1, cx

       ret

main   endp

CODESTUFF  ENDS
; - - - - - - - - - - - -
     END  start
;***************************

For the ASSUME statement we have:

     ASSUME cs:CODESTUFF,ds:DATASTUFF
     ASSUME es:MORESTUFF,ss:STACKSEG

What we want to look at is this section of code:

     mov  cx, variable1
     mov  variable1, cx

Here is the listing of the offset address and machine code:

     000E  8E C0                    mov  es,ax
     0010  8B 0E 0000 R             mov  cx, variable1
     0014  89 0E 0000 R             mov  variable1, cx
     0018  CB                       ret

Variable1 is in DATASTUFF (ASSUME ds:DATASTUFF), and DS is the natural segment for variables. Now let's change the code to:

     mov  cx, variable2
     mov  variable2, cx

This is the ONLY change in the file. Variable2 is in MORESTUFF and we have - ASSUME es:MORESTUFF. Here's the listing when we assemble the modified file.

     000E  8E C0                    mov  es,ax
     0010  26: 8B 0E 0000 R         mov  cx, variable2
     0015  26: 89 0E 0000 R         mov  variable2, cx
     001A  CB                       ret

The assembler has put 26h as a segment override. When the 8086 looks at the machine code, it knows that those two instructions reference the es, not the ds, segment register. Also note that the code is now two bytes longer - one byte for each segment override. The "ret" instruction is at 1Ah (26d) instead of 18h (24d). Let's try it with:

     mov  cx, variable3
     mov  variable3, cx

Variable3 is in CODESTUFF and we have - ASSUME cs:CODESTUFF.
Here's the listing:

     000E  8E C0                    mov  es,ax
     0010  2E: 8B 0E 0000 R         mov  cx, variable3
     0015  2E: 89 0E 0000 R         mov  variable3, cx
     001A  CB                       ret

The assembler put in the CS segment override. Now the 8086 knows that variable3 is in the CS segment. Finally:

     mov  cx, variable4
     mov  variable4, cx

Variable4 is in STACKSEG and we have - ASSUME ss:STACKSEG. Here's the listing:

     000E  8E C0                    mov  es,ax
     0010  36: 8B 0E 0000 R         mov  cx, variable4
     0015  36: 89 0E 0000 R         mov  variable4, cx
     001A  CB                       ret

Once again, the assembler put in a segment override. This time it was the SS override.

That's nifty. We simply tell the assembler which segment register we will use for each segment and it does all the work. We will do more with segment overrides in the chapter on addressing modes. Remember, though, that it is your responsibility to see that at the time this code is used, the segment register actually contains the appropriate segment address.

Is this ASSUME definition unique? That is, must there be a one to one correspondence between segments and registers, with each segment having its own register? No, not at all. Here are two ASSUME statements, both of which are legal:

     ASSUME cs:COMSEG, ds:COMSEG, es:COMSEG, ss:COMSEG

All four registers contain the address of the same segment. In fact, we will meet this statement when we talk about COM files. This is the only appropriate statement for a .COM file.

     ASSUME ds:SEG_A, ds:SEG_B, es:SEG_C, es:SEG_D, es:SEG_A

Four different segments, two of which are referenced by DS and three of which are referenced by ES. Remember, ASSUME tells the assembler that whenever you access something in that segment, the named register will be set to the starting segment address. What exactly does this mean to the assembler? Let's rearrange this a little:

     SEG_A     ds, es
     SEG_B     ds
     SEG_C     es
     SEG_D     es

This is the list from the assembler's viewpoint. Suppose it has a variable that is in SEG_C. Does it need an override?
Yes, it needs an ES override. Suppose it has a variable in SEG_A. Does it need an override? No, because DS is set to that segment. SUBROUTINES In assembler parlance, subroutines are called procedures. Why? You got me. In any case, whenever I say subroutine, process, subprogram, or anything like that, I mean a procedure. A procedure can have any name you want. You start a procedure by giving the name, using the reserved word 'proc' and then defining it as either near or far. my_procedure proc near is a near procedure with the name my_procedure. You end a procedure by giving the name and following it with the reserved word 'endp' (for end of procedure). Chapter 10 - Templates 97 ______________________ my_procedure endp What is a near procedure? It is one which is ALWAYS in the same segment as the calling program. When you call a near procedure, the value in CS stays the same, but IP (the instruction pointer) changes to the offset of the first byte of the procedure. The next instruction executed will be the first byte of the procedure. If a procedure is called even once from a different segment, then it MUST be a far procedure. my_procedure proc far my_procedure endp When you call a far procedure, the CS register is changed to the segment of the called procedure and IP (the instruction pointer) is set to the first byte of the procedure. This will be covered in the chapter on subroutines. How does the loader know where to start the program? The assembler tells the linker which tells the loader. How does the assembler know? You tell it. The last line of the file is the single word 'END'. That tells the assembler that you are done with the assembler code. If there is a word after the word 'END' (on the same line), then the assembler assumes that this word is the name of the label where the program starts. The first instruction executed will be whatever immediately follows that label. 
In the template files we have: END start so the label 'start:' indicates where the first instruction is. For an .EXE file, this can be anywhere at all, but we have it at the beginning. The label 'start:' is used for clarity, but we could just as easily have had: END zzyx4 The assembler would then look for the label 'zzyx4:' as the place to start the program. If you look at the link .MAP file from our file1+file2 example you will see: Program entry point at 0021:0000 That says that the starting address is CS = 0021h, IP = 0000h. Note that both CS and IP are different for the file2+file1 example: Program entry point at 0022:0010 where CS = 0022h and IP = 0010h. The initial offset was given to the linker by the assembler. The linker did any adjustment to the offset if it moved the code, and then it calculated the segment address itself. The PC Assembler Tutor 98 ______________________ RET When the loader loads the program, it puts the segment of the starting address in CS and the offset of the starting address in IP. This gives control to your program. When your program is done, how does it get back to the operating system? Good question. When the loader loads the program, it creates something called the PSP (program segment prefix). This is a 100h (256d) byte block of information and code. The first byte (offset 0000) of this block is an 8086 instruction for an orderly exit from a program. What we need to do is set CS to the PSP segment and set IP to 0000. Then the next instruction executed will be the orderly exit code. In talking about procedures, I said that when you call a far procedure, the 8086 puts the procedure's segment in CS and the procedure's offset in IP. 
But before that, it does two things:

     push CS      ; these are the old CS and IP
     push IP      ; this is not a real 8086 instruction {2}

When you have a RET (return) instruction in a far procedure, the 8086 does the following:

     pop  IP      ; this is not a real 8086 instruction
     pop  CS      ; put back the old CS and IP

so RET resets CS and IP to go back where it came from. That is its job.

What has been pushed on the stack before starting your program? NOTHING. That's right. That means that if you execute ret at the end of your program, the 8086 will pop two pieces of garbage into IP and CS. Fortunately, when setting up a program, the loader ALWAYS puts the segment address of the PSP in DS, the data segment register. All we need to do is PUSH DS (the PSP) and then PUSH 0 (offset 0000) and we have the address of our orderly exit code. If we then execute RET, it will POP these two items into IP and CS, sending us to our orderly exit code. That is what is at the beginning of the code section of the template file. We cannot PUSH a constant, so we manufacture a 0 with 'sub ax, ax'. The code is:

     push ds         ; PSP segment
     sub  ax, ax     ; manufacture a 0
     push ax         ; offset = 0000

and the program is set up for the return.

____________________

2 This is not actual 8086 code. You have no direct access to IP. This is, however, what the 8086 effectively does.

That's a lot of things together, so let's review. To exit a procedure we use RET, but for the starting procedure we need to return to the operating system. The PSP has the code for an orderly return at offset 0000. At load time, the loader puts the segment address of the PSP in DS. We push the PSP segment address and offset 0000 for later use by the RET instruction. We do this with:

     push ds         ; PSP segment
     sub  ax, ax     ; manufacture a 0
     push ax         ; offset = 0000

These should be the first instructions in the program. Now that you have stored the PSP, DS is free for other use.
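The effect of those three instructions followed by a far RET can be traced with a toy model of the stack. The sketch below is Python rather than assembler, a list stands in for the stack, and the PSP segment value 1234h is made up, since the loader decides the real one.

```python
PSP_SEGMENT = 0x1234      # hypothetical; the loader chooses the real value

def far_return(stack):
    """Model of a far RET: pop IP first, then CS."""
    ip = stack.pop()
    cs = stack.pop()
    return cs, ip

stack = []                # nothing has been pushed before the program starts
ds = PSP_SEGMENT          # the loader leaves the PSP segment in DS

stack.append(ds)          # push ds      ; PSP segment
ax = 0                    # sub  ax, ax  ; manufacture a 0
stack.append(ax)          # push ax      ; offset = 0000

# ... the body of the program would run here ...

cs, ip = far_return(stack)
print("RET continues at %04X:%04X" % (cs, ip))
```

Because the segment is pushed first and the offset second, the far return pops them in the right order and lands on offset 0000 of the PSP.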
You can now use DS to hold the segment address of your data. DS is used because that is the segment register that the 8086 expects unless told otherwise. You can't move a constant to a segment register, so this is a two step process: mov ax, DATASTUFF mov ds, ax EXTRN Finally, an EXTRN statement tells the assembler that the procedure or data is in another file and you did not forget it. For a procedure, you need to say whether it is NEAR (push old IP and put in new IP) or far (push old CS and IP; put in new CS and IP). Here is the assembler listing for five calls: E8 09CA R call near_routine 9A 15EE ---- R call far_routine E8 0000 E call near_external_routine 9A 0000 ---- E call far_external_routine E8 0000 E call get_unsigned The first two are in the same file, the next two are in an external file, and we have our friend 'get_unsigned'. 'R' means that the data may be changed, 'E' means that it is external, and will be done by the linker. The first four are labelled whether they are near or far. 'get_unsigned' is a near procedure. Notice that E8 is the near call while 9A is the far call. Also notice that the assembler reserves one word for the new IP in the near calls. If the call is in the same file, the assembler fills in this number, but if it is external the assembler sets it to 0. In the far calls the assembler reserves two words instead of one. The first word is again the new IP, which is either filled in or set to zero. The second word is for the segment address, and will be set by the linker. The PC Assembler Tutor 100 ______________________ Whew!!! It sure took a long time to go through all that and you still probably are unsure about some of this. Read the summary, and if you don't feel good about it, leave it for a day or two and reread it then. At the end of the book I will show you how you can simplify a lot of these things by using standardized segment names and some other standardized instructions. 
For now, you need to get used to what the structure of programs is, and we will continue using the same type of templates.{3}

____________________

   3 Just think of me as the computer equivalent of a woodshop teacher who forces you to use hand tools to make a coffee table rather than allowing you to use what you really want to be using - a chainsaw.

                              SUMMARY

SEGMENTS

Segments are defined by giving a name followed by the word SEGMENT. The end of a segment is signalled by the segment name, followed by the word ENDS (end of segment).

        ; - - - - -
        SOME_NAME   SEGMENT

        SOME_NAME   ENDS
        ; - - - - -

(As always, anything after a semicolon is a comment and is ignored by the assembler). In addition, if you want to combine a segment with other segments, then all the segments to be combined must:

        1) have the same name.
        2) have the same type (class)
        3) be declared PUBLIC

THE STACK SEGMENT

The stack segment may have any name you want, but should be declared " SEGMENT STACK 'STACK' ". This forces the loader to do certain initialization for you. If you don't declare it this way, you have to do the initialization yourself.

        ANY_NAME    SEGMENT STACK 'STACK'

EXTRN

For procedures, an EXTRN statement tells the assembler that the procedure that you want to call is in a different file, and that you didn't forget it. Procedures which are EXTRN must be declared either NEAR or FAR. The grammar is name colon NEAR or name colon FAR.

        EXTRN   procedure1:NEAR, procedure2:FAR

You may declare as many things on one line as will fit, but you need to separate them with commas. There can be no comma at the end.

ASSUME

An ASSUME statement tells the assembler that when a statement references that particular segment, the corresponding segment register will be set to that segment address.
        ASSUME  es:MORESTUFF

tells the assembler that no matter what you do in other parts of the program, every time a variable in MORESTUFF is referenced, es will have the segment address of MORESTUFF. This is for the purpose of correct coding of segment overrides.

SEGMENT OVERRIDES

Normally, when the 8086 accesses a variable in memory, it does so via the DS segment register. This can be changed with a segment override. The assembler puts the correct segment override code in front of the instruction and the 8086 will use that segment register to access the data in memory. The override codes are:

        SEGMENT     HEX VALUE
          CS           2E
          DS           3E
          ES           26
          SS           36

CS

CS is the code segment. When the 8086 processes machine code, it ALWAYS uses CS. There is no override.

IP

IP, the instruction pointer, gives the offset in CS of the next instruction to be processed. When the 8086 processes an instruction, it looks at IP, gets the next instruction and updates IP. This is totally automatic and internal to the 8086. You have no direct access to IP.

PROCEDURES

A procedure is declared by giving a name followed by the word 'proc' followed by either NEAR or FAR. A procedure is ended by giving the name, followed by 'endp' (end of procedure).

        ; - - - - -
        square_root   proc far

        square_root   endp
        ; - - - - -

The words NEAR and FAR are for the assembler and the linker so they know whether to change just IP or both IP and CS in RET, the return statement, as well as in CALL, the subroutine call.

RET

The assembler codes a near or a far return depending on whether you have declared a near or a far procedure. A NEAR return POPs IP off of the stack while a FAR return POPs IP then POPs CS. Thus, a NEAR return stays in the same segment but a FAR return gets a new segment address in CS.{4}

END

The word END signals to the assembler that you are done with code.
The assembler will ignore all following lines, whether they are blank or contain code. If the line with END has a name after the word END, then the assembler assumes that this is the name of a label where execution will begin at run time. That means that the instruction at 'label:' will be the first instruction executed in the program.

SETUP

In order to set up the program at the beginning you need to PUSH the segment address of the PSP (which is in DS), then PUSH 0 (the offset of the orderly return code). Following this you need to put the segment address of the data segment in DS. The code for all of this is:

        push    ds              ; PSP seg address is in ds
        sub     ax, ax          ; 0
        push    ax              ; push 0000 offset
        mov     ax, DATA_SEG    ; data segment address to ds
        mov     ds, ax

____________________

   4 Of course, it is possible for CS to keep the same value if the calling procedure is in the same segment.


Chapter 11 - Addressing Modes and Pointers
==========================================

In this chapter we are going to cover all possible ways of getting data to and from memory with the different addressing modes. Read this carefully, since it is likely this is the only time you will ever see ALL addressing possibilities covered.

The easiest way to move data is if the data has a name and the data is one or two bytes long. Take the following data:

        ; -----
        variable1   dw   2000
        variable2   db   -26
        variable3   dw   -589
        ; -----

We can write:

        mov     variable1, ax
        mov     cl, variable2
        mov     si, variable3

and the assembler will write the appropriate machine code for moving the data. What can we do if the data is more than two bytes long? Here is some more data:

        ; -----
        variable4   db   "This is a string of ascii data."
        variable5   dd   -291578
        variable6   dw   600 dup (-11000)
        ; -----

Variable4 is the address of the first byte of a string of ascii data. Variable5 is a single piece of data, but it won't fit into an 8086 register since it is 4 bytes long.
Variable6 is a 600 element long array, with each element having the value -11000. In order to deal with these, we need pointers. Some of you will be flummoxed at this point, while those who are used to the C language will feel right at home. A pointer is simply the address of a variable. We use one of the 8086 registers to hold the address of a variable, and then tell the 8086 that the register contains the address of the variable, not the variable itself. It "points" to a place in memory to send the data to or retrieve the data from. If this seems a little confusing, don't worry; you'll get the hang of it quickly. As I have said before, the 8086 does not have general purpose registers. Many instructions (such as LOOP, MUL, IDIV, ROL) work only with specific registers. The same is true of pointers. You may use only BX, SI, DI, and BP as pointers. The assembler will give you an error if you try using a different register as a pointer. ______________________ The PC Assembler Tutor - Copyright (C) 1989 Chuck Nelson Chapter 11 - Addressing Modes 105 _____________________________ There are two ways to put an address in a pointer. For variable4, we could write either: lea si, variable4 or: mov si, offset variable4 Both instructions will put the offset address of variable4 in SI.{1} SI now 'points' to the first byte (the letter 'T') of variable4. If we wanted to move the third byte of that array (the letter 'i') to CL, how would we do it? First, we need to have SI point to the third byte, not the first. That's easy: add si, 2 But if we now write: mov cl, si we will generate an assembler error because the assembler will think that we want to move the data in SI (a two byte number) to CL (one byte). How do we tell the assembler that we are using SI as a pointer? By enclosing SI in square brackets: mov cl, [si] since CL is one byte, the assembler assumes you want to move one byte. If you write: mov cx, [si] then the assembler assumes that you want to move a word (two bytes). 
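A short sketch of how the destination register picks the size of the move, using the same variable4 string as above. It assumes DS already holds the segment address of the data. The 8086 stores the low byte of a word at the lower address, so the word move picks up the first two characters with the first one in CL.

```asm
; Sketch: the same pointer read as a byte and then as a word.
; variable4 begins with the characters 'T' (54h) and 'h' (68h).
        lea   si, variable4     ; SI points at the 'T'
        mov   cl, [si]          ; byte move: CL = 54h ('T')
        mov   cx, [si]          ; word move: CL = 54h ('T'), CH = 68h ('h')
```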
The whole thing now is: lea si, variable4 add si, 2 mov cl, [si] This puts the third byte of the string in CL. Remember, if a register is in square brackets, then it is holding the ADDRESS of a variable, and the 8086 will use the register to calculate where the data is in memory. What if we want to put 0s in all the elements of variable6? ____________________ 1 LEA stands for load effective address. Note that with LEA, we use only the name of the variable, while with: mov si, offset variable4 we need to use the word 'offset'. The exact difference between the two will be explained later. The PC Assembler Tutor 106 ______________________ Here's the code: mov bx, offset variable6 mov ax, 0 mov cx, 600 zero_loop: mov [bx], ax add bx, 2 loop zero_loop We add 2 to BX each time since each element of variable6 is a word (two bytes) long. There is another way of writing this: mov bx, offset variable6 mov cx, 600 zero_loop: mov [bx], 0 add bx, 2 loop zero_loop Unfortunately, this will generate an assembler error. Why? If the assembler sees: mov [bx], ax it knows that you want to move what is in AX to the address in BX, and AX is one word (two bytes) long so it generates the machine code for a word move. If the assembler sees: mov [bx], al it knows that you want to move what is in AL to the address in BX, and AL is one byte long, so it generates the machine code for a byte move. If the assembler sees: mov [bx], 0 it doesn't know whether you want a byte move or a word move. The 8086 assembler has implicit sizing. It is the assembler's job to look at each instruction and decide whether you want to operate on a byte or a word. Other microprocessors do things differently. On the Motorola 68000, the assembler uses explicit sizing. 
Each instruction must explicitly state whether it is a byte or a word.{2} On the 68000 you have:

        move.b  #213, (A1)
        move.w  #213, (A1)

The first instruction says to move a byte (the number 213) to the address in register A1 while the second instruction says to move a word (the number 213) to the address in register A1.{3}

____________________

   2 Any of you who use the 68000 assembler know that this is fudging the facts a little bit.

   3 A1 is a 68000 register.

Back to the 8086. If the 8086 assembler looks at an instruction and it can't tell whether you want to move a byte or a word, it generates an error. When you use pointers with constants, you should explicitly state whether you want a byte or a word. The proper way to do this is to use the reserved words BYTE PTR or WORD PTR.

        mov     [bx], BYTE PTR 213
        mov     [bx], WORD PTR 213

These stand for byte pointer and word pointer respectively. I find this terminology exceptionally clumsy, but that's life. Whenever you are moving a constant with a pointer, you should specify either BYTE PTR or WORD PTR.

The Microsoft assembler makes some assumptions about the size of a constant. If the number is 256 or below (either positive or negative), you MUST explicitly state whether it is a byte or a word operation. If the number is 257 or above (either positive or negative), the assembler assumes that you want a word operation.

Here's the previous code rewritten correctly:

        mov     bx, offset variable6
        mov     cx, 600
zero_loop:
        mov     [bx], WORD PTR 0
        add     bx, 2
        loop    zero_loop

Let's add 435 to every element in the variable6 array:

        mov     bx, offset variable6
        mov     cx, 600
add_loop:
        add     [bx], WORD PTR 435
        add     bx, 2
        loop    add_loop

How about multiplying every element in the array by 12?

        mov     di, offset variable6
        mov     cx, 600
        mov     si, 12
mult_loop:
        mov     ax, [di]
        imul    si
        mov     [di], ax
        add     di, 2
        loop    mult_loop
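The BYTE PTR and WORD PTR examples above attach the size to the constant. In other assembler source you will more often see the same override attached to the memory operand instead. The sketch below rewrites the zeroing loop that way; the label zero_loop2 is a name invented for this illustration, and the two spellings are expected to produce the same machine code.

```asm
; Sketch only: size override written on the memory operand.
        mov   bx, offset variable6
        mov   BYTE PTR [bx], 213    ; byte store: writes D5h to one byte
        mov   WORD PTR [bx], 213    ; word store: writes D5h, then 00h

        mov   bx, offset variable6
        mov   cx, 600
zero_loop2:
        mov   WORD PTR [bx], 0      ; one word of 0 at the address in BX
        add   bx, 2                 ; next word
        loop  zero_loop2
```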
The PC Assembler Tutor 108 ______________________ None of these examples did any error checking, so if the result was too large, the overflow was ignored. This time we used DI for a change of pace. Remember, we may use BX, SI, DI or BP, but no others. You will notice that in all these examples, we started at the beginning of the array and went step by step through the array. That's fine, and that's what we normally would do, but what if we wanted to look at individual elements? Here's a sample program: ; + + + + + START DATA BELOW THIS LINE ; poem_array db "She walks in Beauty, like the night" db "Of cloudless climes and starry skies;" db "And all that's best of dark and bright" db "Meet in the aspect ratio of 1 to 3.14159" character_count db 149 ; + + + + + END DATA ABOVE THIS LINE ; + + + + + START CODE BELOW THIS LINE mov bx, offset poem_array mov dl, character_count character_loop: sub ax, ax ; clear ax call get_unsigned_byte dec al ; character #1 = array[0] cmp al, dl ; out of range? ja character_loop ; then try again mov si, ax ; move char # to pointer register mov al, [bx+si] ; character to al call print_ascii_byte jmp character_loop ; + + + + + END CODE ABOVE THIS LINE You enter a number and the program prints the corresponding character. Before starting, we put the array address in BX and the maximum character count in DL. After getting the number from get_unsigned_byte, we decrement AL since the first character is actually poem_array[0]. The character count has been reduced by 1 to reflect this fact. It also makes 0 an illegal entry. Notice that the program checks to make sure you don't go past the end of the poem. This time we use BX to mark the beginning of the array and SI to count the number of the character. Once again, there are only specific combinations of pointers that can be used. 
They are: BX with either SI or DI (but not both) BP with either SI or DI (but not both) My version of the Microsoft assembler (v5.1) recognizes the forms [bx+si], [si+bx], [bx][si], [si][bx], [si]+[bx] and [bx]+[si] as the same thing and produces the same machine code for all six. Chapter 11 - Addressing Modes 109 _____________________________ We can get even more complicated, but to show that, we need structures. In databases they are called records. In C they are called structures; in any case they are the same thing - a group of different types of data in some standard order. After the group is defined, we usually make an array with the identical structure for each element of the array.{4} Let's make a structure for an address book. last_name db 15 dup (?) first_name db 15 dup (?) age db ? tel_no db 10 dup (?) In this case, all the data is bytes, but that is not necessary. It can be anything. Each separate piece of data is called a FIELD. We have the last_name field, the first_name field, the age field, and the tel_no field. Four fields in all. The structure is 41 bytes long. What if we want to have a list of 100 names in our telephone book? We can allocate memory space with the following definition: address_book db 100 dup ( 41 dup (' ')) {5} Well, that allocates room in memory, but how do we get to anything? First, we need the array itself: mov bx, offset address_book Then we need one specific entry. Let's take entry 29 (which is address_book[28]). Each entry is 41 bytes long, so: mov ax, 28 ; entry (less 1) mov cx, 41 ; entry length mul cx mov di, ax ; move to pointer That gives us the entry, but if we want to get the age, that's not the first byte of the structure, it's the 31st byte (actually address_book[28] + 30 since the first byte is at +0). We get it by writing: mov dl, [bx+di+30] This is the most complex thing we have - two pointers plus a constant. 
The total code is then: mov bx, offset address_book mov ax, 28 ; entry (less 1) mov cx, 41 ; entry length ____________________ 4 If you don't know about structures or records, now would be a good time to stop and go to a reference book about them. They are not actually covered here. 5 Nesting of dup statements is allowed. Rather than having uninitialized data, this has blanks in all the spaces. The PC Assembler Tutor 110 ______________________ mul cx ; entry offset from array[0] mov di, ax ; move entry offset to pointer mov dl, [bx+di+30] ; total address Though the machine code has only one constant in the code, the assembler will allow you to put a number of constants in the assembler instruction. It will add them together for you and resolve them into one number.{6} Once again, there are a limited number of registers - they are the same registers as before: BX with either SI or DI (but not both) plus constant BP with either SI or DI (but not both) plus constant We can work with structures on the machine level, but it looks like it's going to be hard to keep track of where each field is. Actually, it isn't so bad because of: OUR FRIEND, THE EQU STATEMENT The assembler allows you to do substitution. If you write: somestuff EQU 37 * 44 then every place that the assembler finds the word "somestuff", it will substitute what is on the right side of the EQU. Is that a number or text? Sometimes it's a number, sometimes it's text. Here are four statements which are defined totally in terms of numbers. This is from the assembler listing. (The assembler lists how it has evaluated the EQU statement on the left after the equal sign.) = 0023 statement1 EQU 5 * 7 = 0025 statement2 EQU statement1 + 6 - 4 = 000F statement3 EQU statement2 - 22 = 001F statement4 EQU statement3 + 16 and the assembler thinks of these as numbers (these numbers are in hex). 
Now in the next set, with only a minor change: = [bp + 3] statement1 EQU [bp + 3] = [bp + 3] + 6 - 4 statement2 EQU statement1 + 6 - 4 = [bp + 3] + 6 - 4 - 22 statement3 EQU statement2 - 22 ____________________ 6 And it does it quite well. The assembler correctly evaluated the following: add ax, (-3*81)+44/8+[si+27]+6+[bx]-7+(43*96)-2 Not bad, huh? Chapter 11 - Addressing Modes 111 _____________________________ = [bp + 3] + 6 - 4 - 22 + 16 statement4 EQU statement3 + 16 the assembler thinks of it as text. Obviously, the fact that it can be either may cause you some problems along the way. Consult the assembler manual for ways to avoid the problem. Now we have a tool to deal with structures. Let's look at that structure again. last_name db 15 dup (?) first_name db 15 dup (?) age db ? tel_no db 10 dup (?) We don't actually need a data definition to make the structure, we need equates: LAST_NAME EQU 0 FIRST_NAME EQU 15 AGE EQU 30 TEL_NO EQU 31 this gives us the offset from the beginning of each record. If we again define: address_book db 100 dup ( 41 dup (' ')) then to get the age field of entry 87, we write: mov bx, offset address_book mov ax, 86 ; entry (less 1) mov cx, 41 ; entry length mul cx ; entry offset from array[0] mov di, ax ; move entry offset to pointer mov dl, [bx+di+AGE] ; total address This is a lot of work for the 8086, but that is normal with complex structures. The only thing that takes a lot of time is the multiplication, but if you need it, you need it.{7} How about a two dimensional array of integers, 60 X 40 int_array dw 40 dup ( 60 dup ( 0 )) These are initialized to 0. For our purposes, we'll assume that the first number is the row number and the second number is the column number; i.e. array [6,13] is row 6, column 13. We will have 40 rows of 60 columns. For ease of calculation, the first array element is int_array [0,0]. (If it is your array, you can ____________________ 7 You will see more of the EQU statement. 
The PC Assembler Tutor 112 ______________________ set it up any way you want {8}). Each row is 60 words (120 bytes) long. To get to int_array [23, 45] we have: mov ax, 120 ; length of one row in bytes mov cx, 23 ; row number mul cx mov bx, ax ; row offset to bx mov si, 45 ; column offset sal si, 1 ; multiply column offset by 2 (for word size) mov dx, [bx+si] ; integer to dx Using SAL instead of MUL is about 50 times faster. Since most arrays you will be working with are either byte, word, or double word (4 bytes) arrays, you can save a lot of time. Let ELEMENT_NUMBER be the array number (starting at 0) of the desired element in a one-dimensional array. For byte arrays, no multiplication is needed. For a word: mov di, ELEMENT_NUMBER sal di,1 ; multiply by 2 and for a double word (4 bytes): mov di, ELEMENT_NUMBER sal di, 1 sal di, 1 ; multiply by 4 This means that a one-dimensional array can be accessed very quickly as long as the element length is a power of 2 - either 2, 4 or 8. Since the standard 8086 data types are all 1, 2, 4, or 8 bytes long, one dimensional arrays are fast. Others are not so fast. As a quick review before going on, these are the legal ways to address a variable on the 8086: (1) by name. mov dx, variable1 It is also possible to have name + constant. mov dx, variable1 + 27 The assembler will resolve this into a single offset number and will give the appropriate information to the linker. (2) with the single pointers BX, SI, DI and BP (which are enclosed in square brackets). mov cx, [si] ____________________ 8 Bearing in mind that all compiled languages have fixed formats for arrays. If you want your array to interact with C, Fortran, Pascal or Basic, you'd better be sure you have the right format. Chapter 11 - Addressing Modes 113 _____________________________ xor al, [bx] add [di], cx sub [bp], dh (3) with the single pointers BX, SI, DI and BP (which are enclosed in square brackets) plus a constant. 
mov cx, [si+421] xor al, 18+[bx] add 93+[di]-7, cx sub (54/7)+81-3+[bp]-19, dh (4) with the double pointers [bx+si], [bx+di], [bp+si], [bp+di] (which are enclosed in square brackets). mov cx, [bx][si] xor al, [di][bx] add [bp]+[di], cx sub [di+bp], dh (5) with the double pointers [bx+si], [bx+di], [bp+si], [bp+di] (which are enclosed in square brackets) plus a constant. mov cx, [bx][si+57] xor al, 45+[di+23][bx+15]-94 add [bp]+[di]-444, cx sub [6+di+bp]-5, dh These are ALL the addressing modes allowed on the 8086. As for the constants, it is the ASSEMBLER'S job to resolve all numbers in the expression into a single constant. If your expression won't resolve into a constant, it is between you and the assembler. It has nothing to do with the 8086 chip. The PC Assembler Tutor 114 ______________________ We can consolidate all this information into the following list: All the following addressing modes can be used with or without a constant: variable_name (+constant) [bx] (+constant) [si] (+constant) [di] (+constant) [bp] (+constant) [bx+si] (+constant) [bx+di] (+constant) [bp+si] (+constant) [bp+di] (+constant) This is a complete list. Thus, you can access a variable by name or with one of the eight pointer combinations. There are no other possibilities. One thing that may confuse you about an addressing statement is all the plusses and minuses. As an example: mov cx, -45+27[bx+22]+[-195+di]+23-44 the total address is: -45+27[bx+22]+[-195+di]+23-44 When the 8086 performs this instruction, it will ADD (1) BX (2) DI and (3) a single constant. That single constant can be a positive or a negative number; the 8086 will ADD all three elements. The '+' in front of 'di' is for convenience of the assembler only; [-195-di] is illegal and the assembler will generate an error. 
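To make the "single constant" concrete, here is the arithmetic the assembler does on the example above. Adding up every number in the expression gives -45 + 27 + 22 - 195 + 23 - 44 = -212, so the instruction comes down to BX plus DI plus -212. The sketch assumes, as the text describes, that the assembler folds the constants in this way; the second line is simply the same address written out by hand.

```asm
; Sketch: both lines should assemble to the same machine code.
;   -45 + 27 + 22 - 195 + 23 - 44 = -212
        mov   cx, -45+27[bx+22]+[-195+di]+23-44
        mov   cx, [bx+di-212]
```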
If you actually want the negative of what is in one of the registers, you must negate it before calling the addressing instruction: neg di mov cx, -45+27[bx+22]+[-195+di]+23-44 once again, the only allowable forms are +[di], [di] or [+di]. Either -[di] or [-di] will generate an assembler error. If you ever see a technical description of the addressing modes, you will find a list of 24 different machine codes. The reason for this is that: [bx] [bx] + byte constant [bx] + word constant are three different machine codes. Here is a listing of the same machine instruction with the three different styles: Chapter 11 - Addressing Modes 115 _____________________________ MACHINE CODE ASSEMBLER INSTRUCTION 03 04 add ax, [si] 03 44 1B add ax, [si+27] 03 44 E5 add ax, [si-27] 03 84 5BA7 add ax, [si+23463] 03 84 A459 add ax, [si-23463] (27d = 1Bh , 23463d = 5BA7h). The first byte of code (03) is the add (word) instruction. The second byte is the addressing code, and the third and fourth bytes (if any) are the constant (in hex). Addressing code 04 is: (ax, [si]). Addressing code 44 is: (ax, [si] + byte constant). Addressing code 84 is: (ax, [si] + word constant). The fact that there are three different machine codes is of concern to the assembler, not to you. It is the assembler's job to make the machine code as efficient as possible. It is your job to write quality, robust code. SEGMENT OVERRIDES So far, we haven't talked about segment registers. You will remember from the last chapter that the 8086 assumes that a named variable is in the DS segment: mov ax, variable1 If it isn't, the Microsoft assembler puts the correct segment override in the machine code. 
The segment overrides are:

        SEGMENT OVERRIDE        MACHINE CODE (hex)
              CS                       2E
              DS                       3E
              ES                       26
              SS                       36

As an example:

        MACHINE CODE            ASSEMBLER INSTRUCTIONS
        2E: 03 06 0000 R        add     ax, variable3
        26: 2B 1E 0000 R        sub     bx, variable2
            31 36 0000 R        xor     variable1, si     ; no override
        36: 21 3E 00C8 R        and     variable4, di

when the different variables were in segments with different ASSUME statements. If you don't remember this, you should reread the section on overrides in the last chapter. Remember, the colon is in the listing only to tell you that we have a segment override. The colon is not in the machine code.

What about pointers? The natural segment for anything with [bp] is SS, the stack segment.{1} Everything else has DS as its natural segment. The natural segments are:

        (1) DS
                variable + (constant)
                [bx] + (constant)
                [si] + (constant)
                [di] + (constant)
                [bx+si] + (constant)
                [bx+di] + (constant)

        (2) SS
                [bp] + (constant)
                [bp+si] + (constant)
                [bp+di] + (constant)

where the constant is always optional. Can you use segment overrides? Yes, in all cases.{2} Here is some assembler code along with the machine code which was generated.

        MACHINE CODE            ASSEMBLER INSTRUCTIONS
        26: 03 07               add     ax, es:[bx]
        2E: 01 05               add     cs:[di], ax
        36: 2B 44 11            sub     ax, ss:[si+17]
        2E: 29 46 00            sub     cs:[bp], ax
        3E: 33 03               xor     ax, ds:[bp+di]
        26: 31 02               xor     es:[bp+si], ax
        26: 89 43 16            mov     es:[bp+di+22], ax

            03 04               add     ax, [si]
            03 44 1B            add     ax, [si+27]
            03 84 A459          add     ax, [si-23463]

        26: 03 04               add     ax, es:[si]
        26: 03 44 1B            add     ax, es:[si+27]
        26: 03 84 A459          add     ax, es:[si-23463]

(17d = 11h, 22d = 16h, 27d = 1Bh, -23463d = 0A459h). The first number (which is followed by a colon) is the segment override that the assembler has inserted in the machine code. Remember, the colon is in the listing to inform you that an override is involved; it is not in the machine code itself.

____________________

   1 We will see why when we look at subroutines. BP is called the base pointer [bp] and is used in a special way.

   2 There are some special instructions for two independent pointers which we will cover at the end of the book. These allow segment overrides but force the override to refer to the first pointer.

Unfortunately, when you use pointers you must put the override into the assembler instructions yourself. The assembler has no way of knowing that you want an override. This can cause some truly gigantic errors (if you reference a pointer seven times and forget the override once, the 8086 will access the wrong segment that one time), and those errors are extremely difficult to detect. As you can see from above, you put the override in the instructions by writing the appropriate segment (CS, DS, ES or SS) followed by a colon. As always, it is your responsibility to make sure that the segment register holds the address of the appropriate segment before using an override.

We have talked about two different types of constants in the chapter, a constant which is part of the address:

        mov     ax, [bx+17]
        add     [si+2190], dx
        and     [di-8179], cx

and a constant which is a number to be used for an arithmetical or logical operation:

        add     ax, 17
        sub     dl, 45
        add     dx, 22187

They are both part of the machine instruction, and are unchangeable (true constants). This machine code is going to be difficult to read, so just look for (1) the constant DATA and (2) the constant in the ADDRESS. All constants in the assembler instructions are in hex so that they look the same as in the listing of the machine code. Here's a listing of different combinations.

1. Pointer + constant as an address:

        MACHINE CODE            ASSEMBLER INSTRUCTIONS
        01 44 1B                add     [si+1Bh], ax
        29 85 0A04              sub     [di+0A04h], ax
        30 5C 1F                xor     [si+1Fh], bl
        20 9E 1FAB              and     [bp+1FABh], bl

2. Arithmetic instruction with a constant:

        MACHINE CODE            ASSEMBLER INSTRUCTIONS
        05 1065                 add     ax, 1065h
        2D 6771                 sub     ax, 6771h
        80 F3 37                xor     bl, 37h
        80 E3 82                and     bl, 82h

3. Pointer + constant as an address; arithmetic with a constant

        MACHINE CODE            ASSEMBLER INSTRUCTIONS
        81 44 1B 1065           add     [si+1Bh], 1065h
        81 AD 0A04 6771         sub     [di+0A04h], 6771h
        80 74 1F 37             xor     [si+1Fh], BYTE PTR 37h
        80 A6 1FAB 82           and     [bp+1FABh], BYTE PTR 82h

You will notice that the ADD instruction (as well as the other instructions) changes machine code depending on the complete format of the instruction (byte or word? to a register or from a register? what addressing mode? is AX one of the registers?). That's part of the 8086 machine language encoding, and it makes the 8086 machine code extremely difficult to decipher without a table listing all the options.

OFFSET AND SEG

There are two special operators that the assembler has - offset and seg. For any variable or label, offset gives the offset from the beginning of the segment, and seg gives the segment address. If you write:

        mov     ax, offset variable1

the assembler will calculate the offset of variable1 and put it in the machine code. It also signals the linker and loader; if the linker should change the offset during linking, it will also adjust this number. If you write:

        mov     dx, seg variable1

the assembler will signal to the linker and the loader that you want the address of the segment that variable1 is in. The linker and loader will put it in the machine code at that spot. You don't need to know the name of the segment. The linker takes care of that. We will use the seg operator later.

LEA

LEA (load effective address) is a completely different animal. It allows you to use any addressing mode to put an address in a register. One of the addressing modes covered before was for the following code:

        xor     dx, 45+[di+23][bx+15]-94

The 8086 added DI, BX and the constant to calculate the address. It then XOR'ed the variable at that address with DX. If you write:

        lea     dx, 45+[di+23][bx+15]-94

the 8086 will add DI, BX and the constant to calculate the address.
It will then put the ADDRESS in DX. LEA can use any Chapter 11 - Addressing Modes 119 _____________________________ addressing mode to calculate an address. The machine code looks almost the same: MACHINE CODE ASSEMBLER INSTRUCTIONS 33 51 F5 xor dx, 45+[di+23][bx+15]-94 8D 51 F5 lea dx, 45+[di+23][bx+15]-94 The first byte of the machine code is the instruction and the second and third byte are the addressing mode. You almost never need LEA. It is slower than: mov dx, offset variable1 However, when the addressing gets complicated (perhaps 1% of the time), it's nice to have. Remember, it will calculate ANY 8086 addressing mode. Let's run a program so we can see what actually happens with LEA ;lea.asm ; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE variable1 dw ? ; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE ; + + + + START CODE BELOW THIS LINE ; reg style mov si_byte, 1 ; signed lea ax, ax_byte call set_reg_style mov bp, 0 ; clear unused registers mov di, 0 ;lea and mov show the two ways to address variable1 lea ax, variable1 ; effective address mov bx, offset variable1 ; offset call show_regs_and_wait lea_loop: mov si, 0 ; clear registers mov dx, 0 mov cx, 0 mov bx, 0 mov ax, 0 call show_regs call get_unsigned ; unsigned for bx mov bx, ax mov ax, 0 ; blank ax call show_regs call get_signed ; signed for si mov si, ax mov ax, 0 ; blank ax The PC Assembler Tutor 120 ______________________ lea cx, [bx+si]+100 ; addresses to cx and dx lea dx, [si+bx-100] call show_regs_and_wait jmp lea_loop ; + + + + END CODE ABOVE THIS LINE The first part of the program shows that LEA and MOV give the same offset address. Then we enter the loop. It gets an unsigned number, puts it in BX, gets a signed number, puts it in SI, then uses LEA to calculate [bx+si+100] and [bx+si-100]. The plus and minus 100 is simply to show you a difference of 200 in the two results. BX and SI could also have contained (1) both signed numbers or (2) both unsigned numbers. 
It doesn't make any difference. This program has a signed and an unsigned number for variety. Of special interest should be the case when [bx+si] is within 100 of 65536 (or 0). One of the results will be > 0 while the other result will be < 65536. The address value wraps around from 65535 -> 0.

Note that with minor alteration, this program can be used to look at ANY addressing mode that uses pointers.

You should make two executable files for this. First:

        link lea+asmhelp

and the second:

        link asmhelp+lea

Give them different names and run them. Note the offset values for:

        lea  ax, variable1
        mov  bx, offset variable1

With lea+asmhelp you should have an offset of 8 for variable1 since there are 8 bytes in the array (ax_byte, bx_byte, etc.). This array appears before variable1 in the data segment. When you link it the other way (asmhelp+lea), all the data for asmhelp.obj is in front of your data and the offset should be something completely different for variable1.

SUMMARY

These are the natural (default) segments of all addressing modes:

    (1) DS
          variable + (constant)
          [bx] + (constant)
          [si] + (constant)
          [di] + (constant)
          [bx+si] + (constant)
          [bx+di] + (constant)

    (2) SS
          [bp] + (constant)
          [bp+si] + (constant)
          [bp+di] + (constant)

Where the constant is optional. Segment overrides may be used. The segment overrides are:

        SEGMENT OVERRIDE        MACHINE CODE (hex)

        CS:                     2E
        DS:                     3E
        ES:                     26
        SS:                     36

OFFSET

The reserved word 'offset' tells the assembler to calculate the offset of the variable from the beginning of the segment.

        mov  ax, offset variable2

SEG

The reserved word 'seg' tells the assembler, linker and loader to get the segment address of the segment that the variable is in.

        mov  ax, seg variable2

LEA

LEA calculates an address using any of the 8086 addressing modes, then puts the address in a register.

        lea  cx, [bp+di+27]


Chapter 12 - Multiple Word Arithmetic I
=======================================

Let's review the LOOP instruction. We often want to repeat an action a specific number of times. In a FOR loop, we write:

        FOR I = 1, 10

That means we want to repeat the code that follows ten times. The 8086 has an instruction for this, called the loop instruction. It looks like this:

        mov  cx, 10
label17:
        ... (a bunch of code) ...
        loop label17

The count MUST be in the CX register. This is the only register you can use for this. When the machine sees the loop instruction, it decrements the CX register by one, LEAVING ALL FLAGS ALONE, and if the result is not zero, it loops back to the label. If the result is zero, it falls through the loop. One problem with this instruction is that if you enter the loop with CX = 0, it is going to loop 65,536 times. Intel provided a second instruction to avoid this - JCXZ (jump if CX is zero). You put it right before the loop for insurance.

        jcxz label19
label17:
        ... (a bunch of code) ...
        loop label17
label19:

Obviously, in our first example this instruction is not necessary because we set CX to 10 just before entering the loop.

If you have seen the list of 8086 instructions, you will have noticed lots of strange looking add and subtract instructions. Why are they there? In this chapter we will look at ADC and SBB. The others will come in later chapters.

How do engineers decide what a reasonable size for a number is? When they started making 8 bit machines (the maximum unsigned number is 255) did they go out on the streets and take a poll to find out if 255 was the maximum number that people used? No, they didn't even think about what people needed. It was a question of what the technology would allow at that time.
______________________

The PC Assembler Tutor - Copyright (C) 1989 Chuck Nelson

Similarly, at the time the 8086 was engineered, 16 bits was pushing the limits of the technology. But 65,535 doesn't really cut the mustard. If those are 65,535 pennies, that isn't even your yearly food bill, let alone the cost of housing. The 8086 instructions give us the option of making integers of whatever size we want. Because it is done in software, it is slower, but if we are doing thousands or tens of thousands of additions instead of hundreds of thousands or millions, it won't be a great inconvenience.{1}

ASMHELP.OBJ is set up to input 2 word (4 byte) and 4 word (8 byte) numbers so those are what we are going to use. 4 byte numbers are up to +/- 2 billion and 8 byte numbers are up to 9 X 10^18. Those should be large enough.

The first instruction is ADC, add with carry. When you add by hand, you add everything in the right column, then carry to the next column left, repeating this over and over. We don't need to do it column by column, but it is necessary to do it word by word. In the data section we need:

variable1  dq  ?
variable2  dq  ?

and the code for a 4 word (8 byte) addition is the following:

        lea  si, variable1
        lea  di, variable2

        mov  ax, [di]           ; first addition
        add  [si], ax
        pushf                   ; save the flags

        mov  cx, 3
        add  si, 2              ; go to next higher word
        add  di, 2

long_add_loop:                  ; next three additions
        mov  ax, [di]
        popf                    ; restore the flags
        adc  [si], ax
        pushf                   ; save the new flags
        add  di, 2
        add  si, 2
        loop long_add_loop

        popf                    ; pop the flags off the stack

ADC is the same as ADD, but it looks at the carry flag - if the carry flag is 1, it adds 1 to the result; if the carry flag is zero, it does nothing to the result. This works for both signed and unsigned numbers.
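As a cross-check, the word-by-word addition can be modelled in C. This is only a sketch under stated assumptions: the function name add_words and its parameters are made up for illustration and are not part of ASMHELP.OBJ, the words are stored least significant first (as the 8086 stores a dq), and the variable carry stands in for the carry flag.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model of the ADD/ADC sequence: dest = dest + src, one 16 bit
   word at a time, least significant word first. Returns the final carry. */
unsigned add_words(uint16_t *dest, const uint16_t *src, size_t nwords)
{
    unsigned carry = 0;     /* stands in for CF; 0 makes the first pass a plain ADD */

    for (size_t i = 0; i < nwords; i++) {
        uint32_t sum = (uint32_t)dest[i] + src[i] + carry;  /* what ADC computes */
        dest[i] = (uint16_t)sum;                            /* low 16 bits back to memory */
        carry = (unsigned)((sum >> 16) & 1u);               /* bit 16 is the new carry */
    }
    return carry;
}
```

Adding 1 to the four words FFFFh FFFFh 0000h 0000h (low word first) ripples the carry through two words and leaves 0000h 0000h 0001h 0000h. Adding 1 to four words of FFFFh gives all zeros with a carry out of the top word: read as unsigned, that is an overflow; read as signed, it is simply -1 + 1 = 0.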
If you don't believe it, you should go back to the introductory chapter with the base 10 machine and look at long additions.

____________________

   1 As a benchmark, it took a moderately slow PC 6.5 seconds to do 100,000 eight byte additions. The same PC can do over a million two byte additions in 6 seconds.

Notice PUSHF and POPF. These are special instructions called push flags and pop flags. Rather than pushing an arithmetic register on the stack, pushf pushes the register containing the flags. Popf pops them back into the flags register. This is necessary because ADC looks at the carry flag and the ADD instructions in the loop:

        add  di, 2
        add  si, 2

might change the carry flag. The last POPF after the loop is to get it off the stack (anything we put on the stack we need to take off the stack). The first addition is a normal addition, the last three take the carry into account. We are moving the pointers a word at a time. Because the 8086 doesn't allow both operands to be in memory, we need to move one into a register. After the addition is performed, the result is in memory. We can discard what is in the register.

Notice that the first half of the code looks almost the same as the code inside the loop. If we could only use ADC instead of ADD, we could put the first addition inside the loop. It is possible to do this. There is another instruction, CLC, which clears the carry flag. Recall that if the carry flag is 0, ADC does nothing different from ADD. Therefore, we can have:

        lea  si, variable1
        lea  di, variable2
        mov  cx, 4              ; 4 additions in loop
        clc                     ; set cf to zero
        pushf                   ; save the flags

long_add_loop:
        mov  ax, [di]           ; word to a register
        popf                    ; restore the flags
        adc  [si], ax           ; register + memory
        pushf                   ; save the flags
        add  di, 2
        add  si, 2
        loop long_add_loop

        popf                    ; pop the flags off the stack

It's the same code.
The number of loops was increased from 3 to 4, and the carry flag was cleared to ensure that the first addition would have nothing extra added. Here is the basic program.

; - - - - - PUT DATA BELOW THIS LINE
variable1  dq  ?
variable2  dq  ?
; - - - - - PUT DATA ABOVE THIS LINE

; - - - - - PUT CODE BELOW THIS LINE
        call show_regs

outer_loop:
        lea  ax, variable1      ; get the variables
        call get_signed_8byte
        call print_signed_8byte
        lea  ax, variable2
        call get_signed_8byte
        call print_signed_8byte

        lea  si, variable1      ; set the pointers
        lea  di, variable2
        mov  cx, 4              ; loop 4 times (4 words)
        clc                     ; clear the cf
        pushf                   ; save the flags

long_add_loop:
        mov  ax, [di]           ; word to a register
        popf                    ; restore the flags
        adc  [si], ax           ; register + memory
        pushf                   ; save the flags
        add  di, 2
        add  si, 2
        loop long_add_loop

        popf                    ; restore the flags

        lea  ax, variable1
        call print_signed_8byte
        jmp  outer_loop
; - - - - - PUT CODE ABOVE THIS LINE

The calls to get_signed_8byte are followed by print_signed_8byte. This is so you can see what you have actually typed in. It will be neat and with commas, so it will be much easier to read. Everything else is the same as before.

As an aside, let's talk about commas. Though we can get along without commas if we have a 4 digit number, there is no reason to do without them when using larger numbers. I find printer output that has 15 digit floating point numbers without commas not only hard to read but also silly. It's not that the computer is incapable of putting in commas, it's that the people who wrote the programs couldn't be bothered with doing 2 hours of work to save the users lots and lots of aggravation. Therefore, for all large number input (that is, larger than 1 word) you can use commas. They don't even have to be in the right place, the computer ignores them.
All the following are the same number:

        723469746
        723,4,69746
        72,3,46974,6
        7,2,3,4,6,9,7,4,6
        723,469,746

The program strips all commas out of the line and then looks at the input. All large output has the commas in the right spot. In order to enlist you in the crusade to stamp out unreadable input and output, the summary at the end of this chapter has the C code necessary for stripping commas. This is my contribution to world culture.

If you have played with the signed addition program, all we need to do to make it unsigned is to change all the get_signed_8byte calls to get_unsigned_8byte calls. Change the print calls to unsigned also. Run the program.

Now, let's try subtraction. How much code do we have to alter to make it a subtraction program? The answer is - one word. Just change the ADC (add with carry) to SBB (subtract with borrow). SBB learns from the CF flag whether the last subtraction had to borrow, and tells the next subtraction via the CF flag whether it has had to borrow. Change it, and try it out. (Yes, really do it, don't just pretend that you are going to do it).

Why does this program work with exactly 8 bytes? Because we tell the loop via CX that it is 4 words long. If we put 2 in CX, the loop will think that the number is 2 words (4 bytes) long, and if we put 17 in CX, the loop will think that the number is 17 words (34 bytes) long. In fact, this little snippet of code can do either signed or unsigned addition or subtraction of any number of words simply by altering one number and one word of code.

The code as it stands has only one shortfall. Remember from our earlier subtraction that we might want to have an INTO (interrupt on overflow) instruction after signed addition and subtraction. It needs to be after the last addition (or subtraction). All we need to do is put it after the POPF just below the loop.
At this point the flags show the condition right after the last addition or subtraction:

        loop long_add_loop

        popf                    ; pop the flags off the stack
        into                    ; interrupt if overflow is set


SUMMARY

ADC (add with carry) is used for multiple word arithmetic. It adds two numbers along with the value in CF, the carry flag (1 if the flag is set and 0 if the flag is cleared). CLC (clear the carry flag) should be used to clear CF before the first addition. After the addition, the flags register should be pushed in order to store CF unless it is certain that CF will not be affected by the rest of the loop.

        popf
        adc  [di], ax
        pushf

SBB (subtract with borrow) is used for multiple word arithmetic. It subtracts one number from the other and subtracts the value in CF to account for any borrows from the right. CLC (clear the carry flag) should be used to clear CF before the first subtraction. After the subtraction, the flags register should be pushed in order to store CF unless it is certain that CF will not be affected by the rest of the loop.

        popf
        sbb  [di], ax
        pushf

These operations have the typical 5 possibilities:

        1) register, register
        2) register, memory
        3) memory, register
        4) memory, constant
        5) register, constant

You may manually control the value in CF with STC and CLC. STC (set the carry flag) sets the value to 1, while CLC (clear the carry flag) sets the value to 0. These are used for initial settings of multiple word operations.

In order to store the values in the flags register you use PUSHF (push the flags) until you need them again. At that time you can get them back with POPF (pop the flags).

STRIPPING COMMAS IN C

In order to get rid of commas, you need some discipline in what kind of data entry you have.
Specifically, you can't enter large numbers on the same line as text strings because text strings are likely to have commas that should be kept. This is hardly a major restriction. You can have as many numbers as you want on the same line since in C you must have whitespace between pieces of data. We will take all commas out of the line.

You can retrofit most old programs with almost no change. The only time that you need this capability is if you are getting something from the keyboard, and what you probably have in the program is:

        scanf ( "format string", variables );

The method is (1) read the input line as a single string, (2) strip the commas, and (3) use sscanf instead of scanf.

        char buffer[80] ;

        fgets (buffer, 80, stdin) ;
        strip_commas (buffer);
        sscanf (buffer, "format string", variables) ;

Both "format string" and the variables remain unchanged when you switch from scanf to sscanf. You might want to check for EOF with fgets (it returns NULL at end of file). Here's the program:

void strip_commas (char *buffer)
{
   char *ptr1, *ptr2 ;     /* ptr1 writes, ptr2 reads */

   ptr1 = ptr2 = buffer ;
   while (1)
   {
      if ( *ptr2 == ',')   /* skip commas */
      {
         ptr2++ ;
         continue ;
      }
      /* move, increment, and check for 0 */
      if (!(*ptr1++ = *ptr2++))   /* this is '=', not '==' */
         break ;
   }
   return ;
}


Chapter 13 - Multiple Word Arithmetic II
========================================

We have just done multiple word addition and subtraction, which are easy. Now we have multiplication and division. We are going to multiply and divide long numbers by a one word (2 byte) number. Multiplying multiple-word numbers by multiple-word numbers is complex and time consuming but can be done. Dividing by a multiple-word number is an entirely different ballgame.{1} We'll do unsigned numbers first, then in a later chapter add the code we need for signed numbers. The core routine is the same.

UNSIGNED MULTIPLICATION

If you multiply an n digit number by an m digit number, there is a possibility of n+m digits in the result.
863 is 3 digits, 4975 is 4 digits, 863 X 4975 = 4,293,425 is 7 digits = 4 + 3. We will be multiplying an 8 byte number by a 2 byte number, so we'll need 10 bytes for the possible maximum result. Here's the code:

; - - - - - - - - ENTER DATA BELOW THIS LINE
multiplicand  dq  ?
multiplier    dw  ?
result        db  10 dup (?)
; - - - - - - - - ENTER DATA ABOVE THIS LINE

; - - - - - - - - ENTER CODE BELOW THIS LINE
outer_loop:
        lea  ax, multiplicand   ; load multiplicand
        call get_unsigned_8byte
        call print_unsigned_8byte
        call get_unsigned       ; unsigned word to multiplier
        mov  multiplier, ax

        lea  si, multiplicand   ; load pointers
        lea  bx, result
        mov  cx, 4              ; number of words
        sub  di, di             ; clear di

mult_loop:
        mov  ax, [si]           ; multiplicand to ax
        mul  multiplier         ; {2}
        add  ax, di             ; high word from last multiplication
        jnc  store_result
        inc  dx                 ; {3}
store_result:
        mov  [bx], ax           ; store 1 word of result.
        mov  di, dx             ; save high word for next multiplication
        add  si, 2              ; increment pointers
        add  bx, 2
        loop mult_loop

        mov  [bx], di           ; move last word of result

        mov  ax, [bx]
        call print_hex
        lea  ax, result
        call print_unsigned_8byte
        jmp  outer_loop
; - - - - - - - - ENTER CODE ABOVE THIS LINE

____________________

   1 For those of you with a hankering for large multiplication and division, I have included subroutines which can multiply and divide numbers of any length in a file called MISHMASH.DOC. It is in \EXTRAFILE. You will need to finish all the chapters before looking at it, since it uses things that you don't know about yet.

There are two different input calls, an 8 byte one and a 2 byte one. Inside the loop we store the high word from the multiplication in DI and then add it to the next result. This is the same as when you multiply single digits in base 10 (9 X 7 = 63 carry the 6). Note that when you add DI, there can be a carry from AX to DX, but there can be no carry out of DX.
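The same carry-the-high-word idea can be written as a short C model. This is a sketch, not the tutor's code: the name mul_words_by_word and its parameters are invented for illustration, words are stored least significant first, and a 32 bit product stands in for the DX:AX register pair.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model of mult_loop: result = multiplicand * multiplier.
   Words are least significant first; result must hold nwords + 1 words. */
void mul_words_by_word(const uint16_t *multiplicand, size_t nwords,
                       uint16_t multiplier, uint16_t *result)
{
    uint16_t high = 0;                      /* DI: high word carried forward */

    for (size_t i = 0; i < nwords; i++) {
        /* MUL leaves the 32 bit product in DX:AX */
        uint32_t product = (uint32_t)multiplicand[i] * multiplier;
        product += high;                    /* ADD AX, DI and INC DX on carry */
        result[i] = (uint16_t)product;      /* AX: one word of the result */
        high = (uint16_t)(product >> 16);   /* DX: saved for the next pass */
    }
    result[nwords] = high;                  /* the extra top word */
}
```

The largest value product can reach is FFFFh X FFFFh + FFFFh = FFFF0000h, which still fits in 32 bits. That is the C counterpart of the observation that there can be no carry out of DX.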
After we drop out of the loop, we need to put the last word in result. We take it from DI, but we could take it from DX if we wanted.

Finally, the printing. Print_unsigned_8byte can't print the whole result, so we are printing the high word in hex form. If those top two bytes are non zero, what 'print_unsigned_8byte' prints will be incorrect because it is missing the top 2 bytes.

Note once again that the only thing constraining this program to an 8 byte number is the 4 that we put in CX - change that number and you can do any size number that you want. Run a bunch of numbers through this, including a couple that have more than a 20 digit result.

____________________

   2 It would be about 3% faster to have this in a register, but unfortunately we are out of registers.

   3 Do we need to check DX for a carry here? No. The maximum multiplication is FFFFh X FFFFh. The result is FFFE 0001h. That means that DX is a maximum FFFEh. If you add one, that's FFFFh, and no carry occurs.

UNSIGNED DIVISION

Division is done the same way in the software as it is done with pencil and paper, starting at the left and working right. On the computer, this means starting with the high order word and working down.

; - - - - - - - - ENTER DATA BELOW THIS LINE
dividend   dq  ?
divisor    dw  ?
quotient   dq  ?
remainder  dw  ?
; - - - - - - - - ENTER DATA ABOVE THIS LINE

; - - - - - - - - ENTER CODE BELOW THIS LINE
outer_loop:
        lea  ax, dividend       ; get dividend
        call get_unsigned_8byte
        call print_unsigned_8byte
        call get_unsigned       ; get divisor
        mov  divisor, ax

        lea  si, dividend + 6   ; start at the top
        lea  bx, quotient + 6
        mov  di, divisor
        mov  cx, 4              ; number of words
        sub  dx, dx             ; clear dx for first division

division_loop:
        mov  ax, [si]           ; dividend word to ax
        div  di                 ; {4}
        mov  [bx], ax           ; word of result to quotient
        sub  si, 2              ; decrement the pointers
        sub  bx, 2
        loop division_loop

        mov  remainder, dx

        mov  ax, remainder
        call print_unsigned
        lea  ax, quotient
        call print_unsigned_8byte
        jmp  outer_loop
; - - - - - - - - ENTER CODE ABOVE THIS LINE

____________________

   4 After this division, the quotient is in AX and the remainder is in DX. Say, aren't we going to do anything with the remainder? There's nothing in the code about DX until we get out of the loop. In fact, we ARE doing something with the remainder. Just like division with pencil and paper, when you have a remainder, you bring it down to the left of the next digits you are going to divide. These get divided the next time around. But we don't need to move the remainder because it's already in the right place. Pretty snappy, huh? You don't need to move anything; it all takes care of itself.

That's it? Yup. The division instruction is designed to work efficiently and simply. We start with the most significant digits, divide, put the quotient in the variable "quotient", DECREMENT the pointers, and get the next word for division. After the final division, we have the remainder left in DX, so we move it to the variable "remainder". The final instructions print the remainder and the quotient. Notice that we don't need to touch the remainder during the entire operation. The 8086 leaves it exactly where it needs to be for the next division.
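For comparison, here is the same top-down division as a C model. It is a sketch under assumptions: the name div_words_by_word and its parameters are hypothetical, words are stored least significant first, and a 32 bit value stands in for the DX:AX pair that DIV divides.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model of division_loop: quotient = dividend / divisor,
   working from the most significant word down. Returns the remainder.
   Like DIV itself, it must not be called with divisor == 0. */
uint16_t div_words_by_word(const uint16_t *dividend, size_t nwords,
                           uint16_t divisor, uint16_t *quotient)
{
    uint16_t remainder = 0;                 /* DX, cleared before the loop */

    for (size_t i = nwords; i-- > 0; ) {
        /* the old remainder sits to the left of the next word: DX:AX */
        uint32_t chunk = ((uint32_t)remainder << 16) | dividend[i];
        quotient[i] = (uint16_t)(chunk / divisor);  /* AX after DIV */
        remainder   = (uint16_t)(chunk % divisor);  /* DX after DIV */
    }
    return remainder;
}
```

Dividing 1,000,000 (words 4240h, 000Fh, low word first) by 7 runs as follows: 15 / 7 = 2 remainder 1, then (1 X 65536 + 16960) / 7 = 11785 remainder 1. The quotient words 2 and 11785 together make 142,857, with a final remainder of 1.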
Using the DX register when you have single word division seems screwy, but using DX for multiple word division is both natural and elegant. The Intel people made one instruction do the work of both.

Remember from the earlier chapter on division that you can get a zero divide error if the quotient is larger than 65535. Is it possible to get a quotient larger than 65535 in this routine? NO. It is impossible to get a zero divide on anything other than a zero.{5} Run the program and do several examples. You can even do a 0 divide if you feel like interrupting the program.

SIGNED NUMBERS

For byte or word signed multiplication and division, the 8086 changes the signed numbers into unsigned numbers, does unsigned multiplication/division, then adjusts for sign. For long numbers, we have to do these operations ourselves, so we need three sections of code: (1) change the numbers into unsigned numbers, (2) do unsigned multiplication/division and (3) adjust the signs. The routines that we have here are part two of this scheme. It will be easier to implement this once you know about subroutines, so signed division and multiplication will have to wait till later.

____________________

   5 This is technical, so if you start getting lost, don't worry about it. How do we know that it's impossible? What we are putting in DX is the remainder (R). R is always less than the divisor (D). Let Q be the number in AX the next time around. What we are dividing is:

        ((R*65536) + Q) / D  <=  ((R*65536) + 65535) / D

since Q is less than or equal to 65535. This is the maximum.

        ((R*(65535 + 1)) + 65535) / D  =  (((R+1) * 65535) + R) / D    (huh?)
                                       =  ((R+1) * 65535)/D + R/D

Let's do a few examples:

        if D = 1, R < D so R = 0     max. = ((0+1)*65535)/1 + 0/1 = 65535 rem 0

where rem = remainder.

        If D = 2, R < D so R = 1     max. = ((1+1)*65535)/2 + 1/2 = 65535 rem 1
        If D = 3, R < D so R = 2     max. = ((2+1)*65535)/3 + 2/3 = 65535 rem 2

See a pattern here? R+1 is never more than D, so the first term is at most 65535, and R/D < 1, so the quotient can never be 65536.
The maximum will always be 65535 with the remainder 1 less than the divisor. If you aren't a techie, ignore all this.


SUMMARY

For both signed and unsigned numbers, multiple word division and multiplication are based on an unsigned number routine. Signed numbers are changed into unsigned numbers, the operation is performed, and the signs of the results are adjusted.

Multiplication is done the same as for single words except that the high word from one result is saved and added to the low word of the next result, thus adding the two partial results. If this addition gives a carry, DX must be incremented.

Division operates from left to right. For the first division, DX is zeroed. After that it always contains the remainder from the last division. The quotients in AX are moved to memory one by one. At the end, the final remainder will be in DX.


Chapter 14 - Zoom
=================

There are only a few reasons for working at the assembler level. Perhaps you're curious about how the PC functions at the machine level. Maybe you want to optimize a time consuming section of code. Maybe you want to work easily with the DOS function calls and the BIOS calls (which will be introduced in a later chapter). Or maybe you want raw speed. Every time that I enter:

        >dir /w

and watch DOS meander its way down the screen at a leisurely pace, I think "Can't the computer go any faster?" Let's find out.

This is going to be a short chapter. Make a copy of template.asm, and we'll call it zoom.asm. This is going to overwrite the entire screen 200 times.{1}

Before we write to the screen, we need to know where the screen is. When IBM designed the structure of the PC, they decided to put the memory for the monochrome card in one place and the memory for the color card in another place - apparently they thought you might want to have both a color monitor and a monochrome monitor running at the same time.
The memory for the color card is in segment 0B800h at offset 0, and the memory for the monochrome card is in segment 0B000h at offset 0. We need to put the correct number into the program, so you need to know whether you have a color card or a monochrome card. If one segment number doesn't work, you can try the other.

TEMPLATE.ASM

; - - - - - - - - - - START CODE BELOW THIS LINE
        call show_regs_and_wait ; {2}
                                ; we are about to start
                                ; this marks the time

        mov  ax, 0B800h         ; color seg. 0B000h is mono seg
        mov  es, ax             ; es is at video card segment
        mov  cx, 2              ; do this whole thing twice

outer_loop:
        push cx                 ; save for outer loop instruction
        mov  cx, 100            ; 100 repeats
                                ; zoom_loop count
        mov  al, '0'            ; 100 characters starting at '0'
        mov  ah, 07h            ; black background, white letters

zoom_loop:                      ; draw the screen 100 times
        push cx
        mov  cx, 2000           ; 80 X 25 screen is 2000 words long
        mov  si, 0              ; start at offset 0000.

inner_loop:                     ; inner loop - fill the screen - 2000 words
        mov  es:[si], ax
        add  si, 2
        loop inner_loop

        inc  al                 ; next higher ASCII character
        pop  cx                 ; zoom_loop count
        loop zoom_loop

        pop  cx                 ; outer_loop count
        loop outer_loop

        call get_continue       ; finished - this is for timing
; - - - - - - - - - - END CODE ABOVE THIS LINE

____________________

   1 It is now time for you to go out and get a book about the internal structure and i/o interface of the PC. One good book is "The Peter Norton Programmer's Guide To The IBM PC", by guess who? It is clearly written and a good introduction. Another quality book is "DOS Programmer's Reference" by Terry Dettmann. It is very systematically laid out and is more techie oriented.

   2 We need to initialize the video card to make sure it is in the right place. This is the easiest way. The registers themselves mean nothing to us.

Show_regs_and_wait resets the video card, so we use it to make sure the video memory is set at offset 0000. It then waits for ENTER.
At the end, get_continue waits for ENTER.{3} This way you can mark the beginning and the end in order to time it.

We set the ES segment to the video segment. This is different depending on whether you have a monochrome card or a color card. The monochrome segment is at 0B000h and the color card is at 0B800h. You need to know which kind of card you have so you can put the right number into ES via AX. Since the normal segment for SI is the DS segment, we need to put in a segment override to use SI with the ES segment.

We start with the ASCII character '0' and then do the next 99 characters in increasing ASCII sequence. Technically, ASCII characters end at 127, but the PC extended characters go up to 255, so we will start with ASCII 48d and end with ASCII 147d. The zoom_loop changes the character 100 different times, and the inner_loop fills the screen. That 07h in AH means that we will have white characters on a black background.

The outer loop has a count of 2. This should be adjusted. If you have a medium speed machine make it 4 (400 screen fills) and if you have a high speed machine make it 8 (800 screen fills).

____________________

   3 That is its mission in life. It is there so that you can time blocks of code. If you want to find out how long some code takes, repeat it 10,000 (or whatever is appropriate) times and put get_continue in front of it and behind it. That way you can control the start and mark the finish.

When this is done, the screen will be filled with characters so you will need to use the DOS command

        > cls

to clear the screen for anything else.

When it is all assembled and linked, get a cup of coffee and a wristwatch with a second hand and we'll time it. Divide by 200 (or 400 or 800) to find out how long it takes to fill one screen. If the characters are not on your screen, you probably have the wrong segment address for your video card, so try the other one. Are you ready to time it? Then ready, set, go. Hmmm.
On my machine 200 repeats takes about 4 seconds, while the dir command takes about 1 second for one screen. That means that zoom.asm is about 50 times faster. That's the difference between someone running a 4 minute mile and someone running a 3hr. 20min. mile. This is one of the reasons people like to work at the assembler level.


Chapter 15 - Subroutines
========================

It is now time to talk about subroutines. If you have only used BASIC this may be difficult for you. It is assumed that you are familiar with subroutines and use them constantly in your programming.

You have been using subroutines since the very first program in this manual. When you wrote:

        call get_num

you called a subroutine in ASMHELP.OBJ. Now you are going to write subroutines yourself and have them call each other.

There are different template files for programs with subroutines. They are SUBTEMP1.ASM and SUBTEMP2.ASM. We will start with SUBTEMP1. It has the entry subroutine and a space for additional subroutines. The entry subroutine is the subroutine where the operating system starts the program; it does the necessary initialization and has special code for that.

You will see some additions to the normal template file. At the top is the line:

        INCLUDE \pushregs.mac

What this is will be explained later, but you must put the file PUSHREGS.MAC in the root directory of your current drive. You will find it in the \TEMPLATE subdirectory. At the end of the SUBTEMP1.ASM is:

; + + + + + + + + + + + + START SUBROUTINES BELOW THIS LINE

; + + + + + + + + + + + + END SUBROUTINES ABOVE THIS LINE

This is where you will write all the subroutines except the entry subroutine which is still the same as before. All data for all subroutines still goes in the DATASTUFF segment.

Our first program will just call subroutines which will print out messages.
Using SUBTEMP1.ASM, it looks like this:

;prog1.asm
; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE
main_message  db  "This is the entry routine.", 0
sub1_message  db  "This is subroutine1.", 0
sub2_message  db  "This is subroutine2.", 0
sub3_message  db  "This is subroutine 3.", 0
; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE

; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE
        mov  ax, offset main_message
        call print_string
        call sub1
        mov  ax, offset main_message
        call print_string
; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE

; + + + + + + + + + + + + START SUBROUTINES BELOW THIS LINE
;------------
sub1 proc near
        push ax
        mov  ax, offset sub1_message
        call print_string
        call sub2
        mov  ax, offset sub1_message
        call print_string
        pop  ax
        ret
sub1 endp
;------------
sub2 proc near
        push ax
        mov  ax, offset sub2_message
        call print_string
        call sub3
        mov  ax, offset sub2_message
        call print_string
        pop  ax
        ret
sub2 endp
;------------
sub3 proc near
        push ax
        mov  ax, offset sub3_message
        call print_string
        pop  ax
        ret
sub3 endp
; ----------
; + + + + + + + + + + + + END SUBROUTINES ABOVE THIS LINE

The data consists of messages to be printed by print_string. Print_string prints a zero terminated string (the number zero, not the character '0'), so there must be a zero after each message in the data segment.

The entry subroutine prints a message and then calls sub1, the first subroutine, which prints a message and calls sub2 which prints a message and calls sub3. Sub3 prints a message and then returns to sub2 which prints a message and returns to sub1 which prints a message and returns to the entry routine which prints a message and then exits. This program should print 7 messages in all.
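If the nesting is hard to follow, the call and return order can be traced with a small C analogue. It is only a sketch: say stands in for print_string, entry_routine is a made-up name for the entry code, and the counter exists purely so the total can be checked. The pushing and popping of AX has no C counterpart and is left out.

```c
#include <stdio.h>

static int messages_printed = 0;    /* illustration only: counts the lines */

/* Stand-in for print_string: print one message and count it. */
void say(const char *message)
{
    puts(message);
    messages_printed++;
}

void sub3(void)
{
    say("This is subroutine 3.");
}

void sub2(void)
{
    say("This is subroutine2.");
    sub3();                         /* returns here when sub3 is done */
    say("This is subroutine2.");
}

void sub1(void)
{
    say("This is subroutine1.");
    sub2();                         /* returns here when sub2 is done */
    say("This is subroutine1.");
}

/* Hypothetical stand-in for the entry code; returns the message count. */
int entry_routine(void)
{
    say("This is the entry routine.");
    sub1();
    say("This is the entry routine.");
    return messages_printed;
}
```

Calling entry_routine once prints the messages in the order entry, sub1, sub2, sub3, sub2, sub1, entry and returns 7.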
You will notice that the first thing that each subroutine does is save the value in AX, since it uses the AX register. This is the cardinal rule of robustness at the assembler level.

IF YOU USE A REGISTER, YOU MUST SAVE ITS VALUE BY PUSHING IT ON THE STACK; YOU MUST THEN RESTORE THE VALUE JUST BEFORE EXITING.

It is impossible to overstress this. The routines which call your routine might rely on the registers remaining unaltered. If you disobey this rule and alter the registers, you'll be sorry. Why doesn't the entry routine push and pop the registers it uses? Well, the operating system assumes the registers will contain trash upon return from the program, so it uses nothing in the data registers.

All the subroutines except the entry routine are near routines. We will only use near routines.

Assemble this program, link it and run it. If it works ok, it is then time for program 2, which is the same as program 1, but is in two files.

Often, we want parts of a program in different files. Perhaps parts are standard subprograms which you have already written and assembled, perhaps the total program is too large to be handled comfortably in one file, perhaps different people are writing different parts of the program. Not only must we write the programs, but we must be able to connect them.

We will put the entry routine, sub2 and the associated data in subtemp1.asm. We will put sub1, sub3, and the associated data in subtemp2.asm.

Take a look at SUBTEMP2.ASM. It is slightly different. First, it does not have the variables that you need for set_reg_style (ax_byte, bx_byte, etc.) but it does have EXTRN statements for them. This means that you can change the register style from this file. SUBTEMP1.ASM has these variables declared PUBLIC so the linker can join them correctly.{1} We will talk about the correct way to declare external data later. SUBTEMP2.ASM has no stack segment, though there could be one. There is no entry subroutine.
Therefore at the very end, you have the line:

        END

with nothing after it. In SUBTEMP1.ASM, you have

        END  start

so the assembler and linker know that the program begins at the label "start".

____________________

   1. The reason for having only one set of variables for the style is so that every time you change one of the style variables, the array is updated. If you had two different arrays you could have two different sets of information for set_reg_style.
____________________

Let's do the two programs. Here are the data, the entry code and the subroutine code from the first file.

;prog1.asm
; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE
main_message  db  "This is the entry routine.", 0
sub2_message  db  "This is subroutine2.", 0
; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE

; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE
        PUBLIC  sub2
        EXTRN   sub1:NEAR, sub3:NEAR

        mov   ax, offset main_message
        call  print_string
        call  sub1
        mov   ax, offset main_message
        call  print_string
; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE

; + + + + + + + + + + + + START SUBROUTINES BELOW THIS LINE
sub2  proc near
        push  ax
        mov   ax, offset sub2_message
        call  print_string
        call  sub3
        mov   ax, offset sub2_message
        call  print_string
        pop   ax
        ret
sub2  endp
; + + + + + + + + + + + + END SUBROUTINES ABOVE THIS LINE

Notice that sub1 and sub3 have been declared EXTRN before they were referenced, and the EXTRN statement tells the assembler that they are both near procedures. sub2 has been declared PUBLIC so the assembler will give the address of sub2 to the linker. Here's the data and code for the other file.
;prog2.asm ; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE sub1_message db "This is subroutine1.", 0 sub3_message db "This is subroutine 3.", 0 ; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE ; + + + + + + + + + + + + START SUBROUTINES BELOW THIS LINE Chapter 15 - Subroutines 141 ________________________ PUBLIC sub1, sub3 EXTRN sub2:NEAR ;------------ sub1 proc near push ax mov ax, offset sub1_message call print_string call sub2 mov ax, offset sub1_message call print_string pop ax ret sub1 endp ;------------ sub3 proc near push ax mov ax, offset sub3_message call print_string pop ax ret sub3 endp ; ---------- ; + + + + + + + + + + + + END SUBROUTINES ABOVE THIS LINE Here sub1 and sub3 have been declared PUBLIC and sub2 has been declared EXTRN. Assemble both programs and then link all three. link prog1+prog2+\asmhelp.obj assuming that asmhelp is in the root directory. Run it. You should have the same results as before. We are going to do one more thing with the same two files. Without changing any of the code, we are going to put the data for prog1 in prog2 and the data for prog2 in prog1 like this. ;prog1 ; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE sub1_message db "This is subroutine1.", 0 sub3_message db "This is subroutine 3.", 0 ; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE ;prog2 ; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE main_message db "This is the entry routine.", 0 sub2_message db "This is subroutine2.", 0 ; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE So far, so good. Obviously we are going to need some more PUBLIC The PC Assembler Tutor 142 ______________________ statements and some EXTRN statements so the linker can link the four messages, but where do they go and what do they look like? The PUBLIC statements are the easiest. Put them in the segment where the message data appears, either before or after the data declaration. The EXTRN statement is a little more complicated. 
First, all data is declared EXTRN by giving the variable name followed by a colon followed by its data type. The data types are BYTE, WORD, DWORD (4bytes), QWORD (quadword or 8 bytes), and TBYTE (10 bytes). These are the standard 8086/7 data sizes. Therefore we have: EXTRN sub1_message:BYTE, sub3_message:BYTE in prog2.asm and: EXTRN main_message:BYTE, sub2_message:BYTE in prog1.asm. Where do they go? In order to know that, we need to talk about segment overrides again. You will remember from our discussion of the ASSUME statement that every time the assembler writes code with a variable, it checks the ASSUME statements to see which segment register(s) have the address of the segment that that variable is in. If we have: ASSUME cs:SEG1, ds:SEG2, es:SEG3, ss:SEG4 then if variable1 is in SEG2, the assembler will write no override in the code since DS is the 8086 default segment. MACHINE CODE ASSEMBLER INSTRUCTION A1 0000 mov ax, variable1 If variable1 is in SEG1 or SEG3 or SEG4, the assembler will write the appropriate segment override in the code. MACHINE CODE ASSEMBLER INSTRUCTION 2E: A1 0000 mov ax, variable1 26: A1 0000 mov ax, variable1 36: A1 0000 mov ax, variable1 (By the way, those zeros just mean that the variable is at 0000 offset from the beginning of the segment). The same thing happens when you have an EXTRN statement. The assembler associates the externally declared variable with the segment it is declared in. When the variable is used, it then goes through the same actions as if the variable were actually in that segment. Let's declare variable5 external with: EXTRN variable5:WORD If we have: Chapter 15 - Subroutines 143 ________________________ ASSUME cs:SEG1, ds:SEG2, es:SEG3, ss:SEG4 then if variable5 is declared external in SEG2, the assembler will write no override in the code since DS is the 8086 default segment. 
MACHINE CODE          ASSEMBLER INSTRUCTION

A1 0000 E             mov  ax, variable5

If variable5 is declared external in SEG1 or SEG3 or SEG4, the assembler will write the appropriate segment override in the code.

MACHINE CODE          ASSEMBLER INSTRUCTION

2E: A1 0000 E         mov  ax, variable5
26: A1 0000 E         mov  ax, variable5
36: A1 0000 E         mov  ax, variable5

The "E" after the machine code means that the assembler knows that the variable is external and it will tell the linker so the linker can put the correct offset address at that point in the machine code. Remember, as always, that it is your responsibility to have the correct segment address in the segment register before using a variable.

Now we know where it goes. When you declare a variable external, you must put the EXTRN statement in a segment which uses the same segment register as the EXTRN variable is going to use. If the EXTRN variable will use DS, then the segment where the EXTRN statement is must use DS. If the variable uses ES, then the segment the EXTRN statement is in must use ES. In other words, the ASSUME statement for the segment the variable is in must match EXACTLY the ASSUME statement you would write if the variable were internal, not external.{2} Normally, this is DS, but in special circumstances you might want something else. Also, if there is no segment that exactly matches what you want, then you need to create a dummy segment:

DUMMY_SEG  SEGMENT
        EXTRN  variable7:QWORD
DUMMY_SEG  ENDS

and make the assume statement that you want:

        ASSUME  es:DUMMY_SEG

____________________

   2. This means that if the segment with the EXTRN statement has more than one segment register in the assume statement:

        ASSUME  ds:MORESTUFF, es:MORESTUFF

then both those registers must be set to the segment of the external variable when using it or your results may be unreliable.
____________________

What segment has DS in an ASSUME statement?
DATASTUFF in both files, so that is where the EXTRN declaration goes - in the DATASTUFF segment. ;prog1 ; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE PUBLIC sub1_message, sub3_message EXTRN main_message:BYTE, sub2_message:BYTE sub1_message db "This is subroutine1.", 0 sub3_message db "This is subroutine 3.", 0 ; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE ;prog2 ; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE PUBLIC main_message, sub2_message EXTRN sub1_message:BYTE, sub3_message:BYTE main_message db "This is the entry routine.", 0 sub2_message db "This is subroutine2.", 0 ; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE Change the data in the two files, assemble them again and link them again: link prog1+prog2+\asmhelp.obj You should get the same results as before. We are now through with these programs. Make sure you understand how to define PUBLIC and EXTRN procedures and PUBLIC and EXTRN data before going on, since we are not going to cover it again. Everything else in this chapter will be done with a single file in order to make life easier. PASSING DATA When you pass data to the routines in ASMHELP.OBJ, you always pass it through the AX register. The reason for this is that you needed to use these routines before you knew much about 8086 assembler language. It is solely for the convenience of beginners and is totally non-standard. In the real world, when you call a subroutine you ALWAYS pass the data on the stack, no matter which language you are using. If you have the C statement: my_procedure (variable1, variable2, variable3) ; then the C compiler will generate the following code: push variable3 push variable2 push variable1 Chapter 15 - Subroutines 145 ________________________ call my_procedure{3} The C language pushes these variables in right to left order. Before the call instruction is executed variable1 is on the top of the stack, variable2 is the next down, and variable3 is third on the stack. 
Is variable1 still on the stack top after the call instruction is executed? No. The call instruction pushes either one or two words on the stack. Before you go any farther with subroutines you need to know how the call and return instruction operate. Every time you have used show_regs, both CS the code segment address and IP the instruction pointer have been displayed. What does IP do? When the 8086 is ready to execute an instruction, it takes IP, adds it to CS to calculate the total address, and gets the instruction at that address. It then immediately figures out how long the instruction is going to be and adds that amount to IP.{4} What this means is that at any time, IP points to the NEXT instruction, not the current instruction. When you execute a call, the 8086 changes IP to point to the first byte of the called subroutine, so the next instruction executed is the first byte of the called subroutine. There are two different types of procedures, near procedures and far procedures. In a near procedure, you keep CS, the code segment register, the same. In a far procedure you change CS. So, when you call a near procedure you change one thing (IP) and in a far procedure you change two things (IP and CS). When you want to get back from the subroutine, you need to have CS with the segment of the calling routine and IP with the address of the instruction after the call. What are the mechanics of all this? Let's take a near procedure first. In a near call, the 8086 first changes the instruction pointer to point to the next instruction. It then pushes IP on the stack, and puts the address of the called subroutine (which is in bytes 2 and 3 of the call instruction) in IP. IP now points to the called subroutine. There is one more word (2 bytes) on the stack. At the end of the called subroutine, a NEAR return (ret) pops the top word off the stack into IP. IP then points to the instruction after the call instruction. 
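The near call and near return described above can be modeled in a few lines of C. This is only a rough sketch: the Cpu structure, the 64 byte stack and the function names are all assumptions made up for the illustration, and the model ignores CS and everything else the real chip does.

```c
#include <stdint.h>

/* Toy model of the parts of the 8086 that a near call touches. */
typedef struct {
    uint16_t ip;          /* instruction pointer */
    uint16_t sp;          /* stack pointer, as an offset into stack[] */
    uint8_t  stack[64];   /* a tiny stack segment (size is an assumption) */
} Cpu;

/* Push one word: SP goes down by 2, low byte stored first. */
static void push_word(Cpu *c, uint16_t w)
{
    c->sp -= 2;
    c->stack[c->sp]     = (uint8_t)(w & 0xFF);
    c->stack[c->sp + 1] = (uint8_t)(w >> 8);
}

/* Pop one word: read the word at SP, then SP goes up by 2. */
static uint16_t pop_word(Cpu *c)
{
    uint16_t w = (uint16_t)(c->stack[c->sp] | (c->stack[c->sp + 1] << 8));
    c->sp += 2;
    return w;
}

/* Near call: IP is on the call instruction when we start. */
static void near_call(Cpu *c, uint16_t target)
{
    c->ip += 3;             /* step past the 3 byte call instruction */
    push_word(c, c->ip);    /* save the address of the next instruction */
    c->ip = target;         /* continue at the called subroutine */
}

/* Near return: the top word of the stack goes back into IP. */
static void near_ret(Cpu *c)
{
    c->ip = pop_word(c);
}
```

If IP is 0100h and SP is 64 before near_call, then afterwards IP holds the target and SP is 62. After near_ret, IP is 0103h (the instruction after the call) and SP is back at 64.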
In a far call, the 8086 first changes the instruction pointer (IP) to point to the next instruction. It then pushes CS on the stack, followed by IP. It then loads the offset address of the called subroutine in IP and the segment address of the called subroutine in CS. This new IP is in bytes 2 and 3 of the call instruction and the new CS is in bytes 4 and 5 of the call instruction. IP and CS now have the address of the called subroutine. There are two more words (4 bytes) on the stack. The old IP is the stack top and the old CS is next on the stack. At the end of the subroutine, a FAR return (ret) pops the stack top into IP, then pops the next stack item into CS. Now IP and CS point to the instruction after the call instruction.

____________________

   3. You C fanatics will notice that there are some initial underscores missing. Let's not confuse the issue.

   4. Instructions can vary from one byte long to six bytes long, and the 8086 can tell from the first (or first and second) byte(s) how long the total instruction will be.
____________________

There are two different types of call and they have two different machine codes. There are two different types of return and they have two different machine codes.

MACHINE CODE          ASSEMBLER INSTRUCTIONS

                      ; a far routine
                      ;-----
                      far_routine  proc far
CB                            ret
                      far_routine  endp
                      ;-----
                      ; a near routine
                      ;-----
                      near_routine  proc near
C3                            ret
                      near_routine  endp
                      ;-----
                      ; a near and far call
E8 0A43 R                     call  near_routine
9A 015C ---- R                call  far_routine

The machine code for a near return is C3; for a far return it's CB. The machine code for a near call is E8; for a far call it's 9A. The near call has the address of the called routine (0A43h) in the following two bytes. (Strictly speaking, the finished machine code for a near call holds this as a distance from the next instruction, which the 8086 adds to IP. The effect is the same.) The far call has the address of the called routine (015Ch) in the next two bytes followed by the segment of the called routine. The segment address isn't there yet.
It will be put there by the linker and loader, but the assembler has saved the space for the address. That's why the dashes are there. Remember, the R is there because those addresses might be relocated by the linker or the loader. You tell the assembler whether to code a near return or far return by telling it whether it is a near or a far procedure. routine1 proc near routine2 proc far How does the assembler know whether to code a near or far call? If it has already seen the procedure, it knows what type it is. Chapter 15 - Subroutines 147 ________________________ If it hasn't seen it yet, it uses the default type.{5} If it is an external subroutine, the assembler knows because you have written an EXTRN statement. EXTRN routine3:NEAR, routine4:FAR This EXTRN statement should appear before the call. What if the routine appears after the call in the source file but it isn't the default type? You can override the default type. call NEAR PTR routine5 call FAR PTR routine6 This is the same cumbersome syntax that we had with pointers to data, but it's the only game in town. Normally, if the subroutine appears after the call, you don't need to do anything if it is a near call but you need to put a FAR PTR override if it is a far call. ____________________ 5. The default is near for what we are doing. However, Microsoft has something called "simplified" directives and the default changes in these cases. The PC Assembler Tutor 148 ______________________ THE STACK Up to this time we have used the stack for temporary storage. If you want to temporarily save either a register or a value in memory, you push it: push ax push variable1 and if you want to get them back you pop them: pop variable1 pop ax This is always a word (2 bytes) at a time. When you pop the stack, the 8086 gives you back the words in reverse order. 
Thus if you push the following:

        push  variable1
        push  variable2
        push  variable3
        push  variable4
        push  variable5

then in order to get the data back in the same place, you need to pop in this order:

        pop   variable5
        pop   variable4
        pop   variable3
        pop   variable2
        pop   variable1

It pops the last thing that was pushed that hasn't been popped yet.

Nothing has been said about where the stack is or how it operates. It's time to change that. When the operating system starts a program, it looks for a stack segment. If the stack segment has been properly defined, the operating system puts the stack segment's segment address in SS (the stack segment register) and sets SP (the stack pointer) to point to the first byte AFTER the end of the stack segment. Exactly where this is depends on how large you have defined your stack segment. SS and SP are set, and there is nothing on the stack. When you push something:

        push  dx

the 8086 subtracts 2 from SP (making one word of space) and puts that thing at the new address in SP. SP contains the address of the last thing pushed. This means that SP is decreasing, and the stack segment is filling up from back to front. In the topsy-turvy world of stacks, when you put things on the stack, the stack grows downward. What makes things especially confusing is that many book writers will picture a stack:

        variable1
        variable2
        ax
        dx

and not bother to tell you whether the stack is growing upwards or downwards or where the stack top is. In this book, the stack TOP will always be visually on the BOTTOM. High addresses will be visually up and low addresses will be visually down. You need to get used to SP decreasing as the stack gets larger, and this is the easiest way to do it.
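If SP going down while the stack grows still feels backwards, the push and pop mechanics described above can be tried out in a small C model. This is a sketch under stated assumptions: the 16 byte segment size and the names stack_seg, sp, push and pop are invented for the illustration, and there is no check for running off either end of the stack.

```c
#include <stdint.h>

#define STACK_BYTES 16   /* assumed size of the stack segment */

/* The stack segment, held as words since PUSH and POP move one word. */
static uint16_t stack_seg[STACK_BYTES / 2];

/* SP starts at the first byte AFTER the end of the stack segment. */
static uint16_t sp = STACK_BYTES;

/* PUSH: subtract 2 from SP, then store the word at the new SP. */
static void push(uint16_t value)
{
    sp -= 2;
    stack_seg[sp / 2] = value;
}

/* POP: fetch the word at SP, then add 2 to SP. */
static uint16_t pop(void)
{
    uint16_t value = stack_seg[sp / 2];
    sp += 2;
    return value;
}
```

Pushing 1, 2 and 3 takes sp from 16 down to 10. Three pops then hand back 3, 2 and 1, in that order, and leave sp at 16 again.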
So, if you have the instructions: push ax push variable1 push si push di after these instructions, the stack will look like this: VALUE ADDRESS ax sp + 6 variable1 sp + 4 si sp + 2 sp -> di sp + 0 When you pop a value, the 8086 moves the word (2 bytes) at SP to the appropriate location and INCREMENTS SP by 2. pop di You would now have: VALUE ADDRESS ax sp + 4 variable1 sp + 2 sp -> si sp + 0 As long as you are just using PUSH and POP, this is entirely self regulating. SS is set, and SP is modified by the 8086 without you doing anything. It is now time to get more sophisticated. In our C example: my_procedure (variable1, variable2, variable3) ; we generated the code: The PC Assembler Tutor 150 ______________________ push variable3 push variable2 push variable1 call my_procedure What will the stack look like upon entry to my_procedure? That depends on whether my_procedure is a near procedure or a far procedure. If it is a near procedure, you will have: VALUE ADDRESS variable3 sp + 6 variable2 sp + 4 variable1 sp + 2 sp -> old IP sp + 0 If it is a far procedure, you will have: VALUE ADDRESS variable3 sp + 8 variable2 sp + 6 variable1 sp + 4 old CS sp + 2 sp -> old IP sp + 0 Therefore, the variables are in different places relative to SP depending on whether it is a near or a far procedure. All examples will be with near procedures, but they are all valid for far procedures if you adjust for having the old CS on the stack. How do we access these variables? By using a pointer. We could use BX, SI or DI, but they have DS, not SS as their natural segment register. The only pointer with SS as the natural segment register is BP, the base pointer. 
Since we are going to use BP, we need to push its current value in order to save it:

        push  bp

The stack now looks like this:

                VALUE           ADDRESS

                variable3       sp + 8
                variable2       sp + 6
                variable1       sp + 4
                old IP          sp + 2
        sp ->   old bp          sp + 0

This is the standard way to do it and this is what the stack always looks like if you follow the standard method. The standard code for setting up the stack for access is:

        push  bp
        mov   bp, sp

We give BP the same value as SP, so BP also points to the top of the stack and we use BP instead of SP. We now have:

                VALUE           ADDRESS

                variable3       bp + 8
                variable2       bp + 6
                variable1       bp + 4
                old IP          bp + 2
        bp ->   old bp          bp + 0

Now, if you want to push and pop things, you can do it to your heart's content. BP will always point to the set of data that you want to work with. Let's take the average of the three variables, and print it.

        mov   ax, [bp+4]        ; add the three numbers
        add   ax, [bp+6]
        add   ax, [bp+8]
        mov   dx, 0             ; prepare dx for division
        mov   bx, 3             ; unsigned divide by 3
        div   bx
        call  print_unsigned    ; result is in ax

We are using AX, BX, and DX, so we need to push them before doing this:

        push  ax
        push  bx
        push  dx

After we are done we need to (1) pop the registers and (2) restore BP. This is also a pop.

        pop   dx
        pop   bx
        pop   ax
        pop   bp
        ret

The whole subprogram now looks like this:

;-----
my_procedure  proc near
        push  bp                ; set up base pointer
        mov   bp, sp
        push  ax                ; push registers
        push  bx
        push  dx

        mov   ax, [bp+4]        ; add the three numbers
        add   ax, [bp+6]
        add   ax, [bp+8]
        mov   dx, 0             ; prepare dx for division
        mov   bx, 3             ; unsigned divide by 3
        div   bx
        call  print_unsigned    ; result is in ax

        pop   dx                ; pop registers
        pop   bx
        pop   ax
        pop   bp                ; restore old base pointer
        ret
my_procedure  endp
;------

There is only one more improvement to make. If you look at the code, it is not clear what [bp+4] refers to. We know where it is, but what is it?
Therefore, we will always use EQU statements to give names to our stack variables. It will be clearer, and if you need to change the code, it is much easier to change the EQU definition than to change the stack references in the code. As usual, we follow the C convention and put EQU names in capital letters.

;-----
my_procedure  proc near

        VAR1  EQU  [bp+4]
        VAR2  EQU  [bp+6]
        VAR3  EQU  [bp+8]

        push  bp                ; set up base pointer
        mov   bp, sp
        push  ax                ; push registers
        push  bx
        push  dx

        mov   ax, VAR1          ; add the three numbers
        add   ax, VAR2
        add   ax, VAR3
        mov   dx, 0             ; prepare dx for division
        mov   bx, 3             ; divide by 3
        div   bx
        call  print_unsigned    ; result is in ax

        pop   dx                ; pop registers
        pop   bx
        pop   ax
        pop   bp                ; restore old base pointer
        ret
my_procedure  endp
;------

This is a simple example, so it doesn't look that important to use the EQU statements. Just wait till you have more complex subroutines. By the way, this program does no error checking. (If the sum is > 65535 it will give the wrong answer).

There is still one thing to do. When we called the subroutine, we pushed the variables on the stack:

        push  variable3
        push  variable2
        push  variable1
        call  my_procedure

We now want to take them off. Do we need to pop them? No, this is trash so they go into the Great Bit Bucket. There are two ways of doing this, and this is language dependent.{1} In C, it is the STANDARD that the calling routine takes them off, and it is done this way:

        push  variable3
        push  variable2
        push  variable1
        call  my_procedure
        add   sp, 6             ; 3 variables = 6 bytes

We simply INCREASE sp by the number of bytes that we pushed on the stack. Whoof, they're gone. If you use PASCAL or FORTRAN, then the CALLED routine must take the variables off the stack on return. How does it do that? There is yet another type of return statement:

        ret   (6)               ; 3 variables = 6 bytes {2}

causes the 8086 to increase sp by 6 as the last thing it does before returning from the subroutine.
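Both cleanup methods must leave SP exactly where it was before the first push. Here is a small sketch in C that just does the SP bookkeeping for a near call with three word-sized variables. The function names are hypothetical, and nothing is actually stored on a stack; only the arithmetic on SP is modeled.

```c
#include <stdint.h>

/* C method: the subroutine does a plain ret,
   then the CALLING routine does "add sp, 6". */
static uint16_t c_style_call(uint16_t sp)
{
    sp -= 6;    /* calling routine: three pushes          */
    sp -= 2;    /* call: pushes the old IP                */
    sp += 2;    /* called routine: ret pops the old IP    */
    sp += 6;    /* calling routine: add sp, 6             */
    return sp;
}

/* Pascal method: the CALLED routine does "ret 6". */
static uint16_t pascal_style_call(uint16_t sp)
{
    sp -= 6;        /* calling routine: three pushes      */
    sp -= 2;        /* call: pushes the old IP            */
    sp += 2 + 6;    /* called routine: ret 6 pops the old
                       IP, then adds 6 to SP              */
    return sp;
}
```

Starting from any SP, say 0100h, both functions return 0100h. The only difference between the two methods is which routine does the adding.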
Which method you use is decided by which high-level language you are using.

MACHINE CODE          ASSEMBLER INSTRUCTIONS

                      ;-----
                      far_routine  proc far
CA 001A                       ret  (26)       ; hex 1A
CB                            ret
                      far_routine  endp
                      ;-----
                      ;-----
                      near_routine  proc near
C2 002C                       ret  (44)       ; hex 2C
C3                            ret
                      near_routine  endp
                      ;-----

Here are the four different types of returns along with the machine code. Notice that the returns which increment the stack have the increment count coded in the machine code.

____________________

   1. And a major reason that it is a real pain in the keester to have multi-language programs.

   2. The parentheses are not necessary.
____________________

You may have noticed that even in this first subroutine, pushing and popping the registers takes a lot of space. It is fairly normal to use 6 registers in a subroutine. This means that you will need to write:

        push  ax
        push  bx
        push  cx
        push  dx
        push  si
        push  di

at the beginning of the subroutine and:

        pop   di
        pop   si
        pop   dx
        pop   cx
        pop   bx
        pop   ax

before returning. This is a lot of space and it gets boring. Also, you have to remember to pop in the exact reverse order or you will screw things up. Fortunately we have two macros to help us. The file PUSHREGS.MAC has two macros, one called PUSHREGS and the other called POPREGS. A macro is a set of directions for generating additional assembler code before the file is assembled. That's why it is called the Microsoft Macro Assembler. You include \pushregs.mac at the beginning of the file, and then every time the assembler sees the word PUSHREGS followed by register names it generates push instructions. Every time the assembler sees POPREGS followed by register names, it generates pop instructions. It generates actual text which is assembled later. The form for generating those push instructions above is:

        PUSHREGS  ax, bx, cx, dx, si, di

the word PUSHREGS followed by the registers separated by commas. (Make sure there is no comma after the last register). This must all be on one line.
PUSHREGS pushes the registers in left to right order. The form for generating those pop instructions above is POPREGS ax, bx, cx, dx, si, di The registers will be popped in the REVERSE order to the way they are listed on the line, that is, in RIGHT TO LEFT order. Notice that the order of registers is the same for both PUSHREGS and POPREGS. This is so that you may write the push part: PUSHREGS ax, bx, cx, dx, si, di and then use your word processor to copy the line to the end of the subroutine, changing PUSHREGS to POPREGS. This insures that Chapter 15 - Subroutines 155 ________________________ the pushes and pops will be in exact reverse order. It saves space and time, and it generates exactly the same code as if you had written all those pushes and pops in the code. Whenever we have subroutines in the future, we will always use it. MOVING A STRING As a final example, we will create a subroutine that moves a Pascal string from one place to another. We'll assume that both strings are in the current DS, so no segments need to be changed. move_string ( from_string, to_string ) ; where from_string and to_string are the ADDRESSES of the strings. The code generated by the Pascal compiler will be: mov ax, offset from_string push ax mov ax, offset to_string push ax call move_string ; this is Pascal, so the CALLED subroutine must ; get rid of the variables on the stack. Notice that Pascal pushes this data in left to right order, exactly the opposite of how C would handle it. After setting up BP, we have: from_string offset bp + 6 to_string offset bp + 4 old IP bp + 2 bp -> old bp bp + 0 Before coding this, you need to know the structure of a Pascal text string. The first byte (string[0]) is not text, but the text count. The second byte is the first piece of text. This means two things. First, the maximum string size in Pascal is 255, the largest count that will fit in one byte. Second, you need to move 'count + 1' bytes. 
'count' is how many text bytes there are, but then you need to move the count itself. If the string is empty (count = 0) you need to move 1 byte, the count byte. Here's the code:

; - - - - -
move_string  proc near

        FROM_ADDRESS  EQU  [bp + 6]
        TO_ADDRESS    EQU  [bp + 4]

        push  bp                    ; set up bp
        mov   bp, sp
        PUSHREGS  ax, cx, si, di

        mov   si, FROM_ADDRESS      ; source
        mov   di, TO_ADDRESS        ; destination
        sub   cx, cx                ; zero cx
        mov   cl, [si]              ; text byte count of source
        inc   cx                    ; add 1 for byte count itself
                                    ; (cx, not cl, so a count of 255
                                    ; becomes 256 instead of 0)
move_loop:
        mov   al, [si]              ; source to al
        mov   [di], al              ; al to destination
        inc   si                    ; move pointers to next byte
        inc   di
        loop  move_loop

        POPREGS  ax, cx, si, di
        pop   bp
        ret   (4)                   ; Pascal, so pop offsets.
move_string  endp
; - - - - - - -

We still have some more to do, and we'll do it in part three of the chapter.

What if both strings are in memory but we don't know which segments they are in? In that case, the calling subroutine needs to pass both the segment and the offset for both the from_string and the to_string. Let's do this in C.

        move_string ( from_string, to_string ) ;

In C, this will pass the addresses of the arrays, not the arrays themselves. C, once again, pushes these variables in right to left order. If the compiler is set up to move both the segment and the offset, then the code generated by the C compiler will be:

        mov   ax, seg to_string
        push  ax
        mov   ax, offset to_string
        push  ax
        mov   ax, seg from_string
        push  ax
        mov   ax, offset from_string
        push  ax
        call  move_string
        add   sp, 8                 ; 4 pushes = 8 bytes

On the 8086, the low two bytes are ALWAYS the offset and the high two bytes are ALWAYS the segment. Remember, SEG gets the segment starting address of the named variable. We will do the subroutine as a near routine.
After setting up BP, we will have:

                to_string segment       bp + 10
                to_string offset        bp + 8
                from_string segment     bp + 6
                from_string offset      bp + 4
                old IP                  bp + 2
        bp ->   old bp                  bp + 0

In the subroutine we will have to move the segment and the offset for each pointer. Luckily for us, there are two 8086 instructions for doing this: LDS (load DS) loads the first two bytes into the named register and the next two bytes into DS. LES (load ES) loads the first two bytes into the named register and the next two bytes into ES. If we write:

        LES  di, [bp + 8]

then the 8086 will load the first two bytes (bp+8 and bp+9) into DI and the next two bytes (bp+10 and bp+11) into ES. This loads the offset and the segment in the same instruction. If we write:

        LDS  si, [bp + 4]

the 8086 will load the first two bytes (bp+4 and bp+5) into SI and the next two bytes (bp+6 and bp+7) into DS, loading both the offset and segment in one instruction. LDS and LES allow you to load the offset into any full arithmetic register (AX, BX, CX, DX, SI, DI, BP or SP) but you can't use AX, CX or DX as addressing registers, so it only makes sense to load BX, SI, DI and BP for use as pointers.

The two strings will now be addressed by DS:SI and ES:DI. DS is SI's normal segment, so we don't need to do anything, but we need a segment override for ES:DI.

A C string ends with a 0 byte, that is with a byte having the numeric value 0. It can be any length, but we need to test each byte to find out if it is 0. Notice that we are using (and changing) both DS and ES this time, so we have to PUSH and POP them, just like other registers. Here is the code for a C subroutine:
; - - - - -
move_string  proc near

        FROM_POINTER  EQU  [bp+4]
        TO_POINTER    EQU  [bp+8]

        push  bp
        mov   bp, sp
        PUSHREGS  ds, es, ax, si, di

        lds   si, FROM_POINTER
        les   di, TO_POINTER

move_loop:
        mov   al, [si]              ; source to al
        mov   es:[di], al           ; al to destination
        inc   si                    ; pointers to next byte
        inc   di
        and   al, al                ; is al 0?
        jnz   move_loop

        POPREGS  ds, es, ax, si, di
        pop   bp
        ret                         ; calling routine pops variables
move_string  endp
; - - - - -

Basically, the only difference between this and the Pascal move is that (1) here we check for 0 and there we had an actual count, and (2) in Pascal we used "ret (4)" and here the calling routine does the adjustment.

DRAWING THE STACK

Each time that we have used the stack we have drawn a picture of where everything is on the stack. In case you think that this is some trivial little learning technique, I'm telling you now that at the assembler level, if you are passing variables on the stack and you don't make a diagram on a piece of paper of where everything is, you are guaranteed to consistently reference things by the wrong address. ALWAYS make a paper diagram that includes BP, ALWAYS use EQU statements, and you'll avoid a lot of mistakes.

It is now time to get more complex. Everything that follows is more advanced, so it requires some programming experience. Everything that follows is about C modules and recursion. If this gets too complicated or obscure, just skip to the summary at the end of the chapter.

I am going to give you a sample subroutine in C and then show you where all the variables go. The following is a complete C file:

/* a complete C file - - - - - - - - - - - */

int         A, B ;
static int  C, D ;
extern int  E, F ;

sample_routine ( G, H )
int  G, *H ;
{
        int         I, J ;
        static int  K, L ;

        A = I ;       /* transfer the words around */
        B = G ;
        J = E ;
        F = C ;
        D = K ;
        *H = I ;
        return ;
}
/* end of C file - - - - - - - - - - - - - */

The only thing this routine does is transfer the words around.
This is so you can see where things are stored and how they are accessed. If you don't know C, the only thing you really need to know here is that the declaration '*H' means that 'H' is the ADDRESS of an integer, not the integer itself, while '*H' in the code is the actual integer that is being addressed.

For those of you who DO know C, you need to know exactly what extern, static and static mean. extern means that it is in an external file. The first 'static' (which is outside of any subroutine) means that the data is INTERNAL to the file; it won't be shared with other files. Variables A and B, which don't have the word 'static', are GLOBAL and will be shared with other files. The other 'static' (inside the subroutine) means that its address is fixed in memory while I and J, which are not 'static', have their addresses generated every time you call the subroutine. Say what? Yes, that's correct, every time you call the subroutine they can be in a different place. That's what allows recursion, and you'll see how that is implemented in a second. Here is the equivalent program in assembler:

; - - - - - START DATA BELOW THIS LINE
        PUBLIC  A, B
        EXTRN   E:WORD, F:WORD

A       dw  ?
B       dw  ?
C       dw  ?
D       dw  ?
K       dw  ?
L       dw  ?
; - - - - - END DATA ABOVE THIS LINE

; - - - - - START SUBROUTINE BELOW THIS LINE
sample_routine  proc near

        ADDRESS_OF_H  EQU  [bp + 6]
        G             EQU  [bp + 4]
        I             EQU  [bp - 2]
        J             EQU  [bp - 4]

        push  bp
        mov   bp, sp                ; set up bp
        sub   sp, 4                 ; save space for I and J
        PUSHREGS  ax, si

        mov   ax, I                 ; A = I
        mov   A, ax
        mov   ax, G                 ; B = G
        mov   B, ax
        mov   ax, E                 ; J = E
        mov   J, ax
        mov   ax, C                 ; F = C
        mov   F, ax
        mov   ax, K                 ; D = K
        mov   D, ax
        mov   ax, I                 ; *H = I
        mov   si, ADDRESS_OF_H
        mov   [si], ax

        POPREGS  ax, si
        mov   sp, bp                ; readjust sp
        pop   bp
        ret                         ; a C routine so calling routine pops
sample_routine  endp
; - - - - - END SUBROUTINE ABOVE THIS LINE

This time the setup is a little longer.
It is:

       push  bp
       mov   bp, sp
       sub   sp, 4
       PUSHREGS  ax, si

I and J are not in the data segment, so we need to make space for
them somewhere, and we do it on the stack. We subtract 4 from sp to
provide ourselves with 4 bytes, two for I and two for J. There are
two EQU statements that say exactly where I and J will be. We also
push AX and SI because we will use them. After this setup, we have:

              address of H      bp + 6
              G                 bp + 4
              old IP            bp + 2
       bp ->  old bp            bp + 0
              I                 bp - 2
              J                 bp - 4
              old ax            bp - 6
       sp ->  old si            bp - 8

We have created space for some variables below bp. This is
temporary and will disappear when we leave the subroutine. We do
our dummy calculations,{1} and then do the end adjustment. There
are two ways to readjust sp before returning:

       add   sp, 4

just adds the amount that we subtracted, so it winds up in the same
place. But the place it winds up is ALWAYS the address of the old
BP, and bp is now pointing to that, so

       mov   sp, bp

does exactly the same thing.

____________________

   1. And that's 'dummy' in more ways than one.


ARE YOU REALLY DEMENTED?

Yes, for those of you who are truly masochistic, we have - ta-dah!
- the Towers of Hanoi in assembler language.

The Towers of Hanoi is a game with three posts and a number of
disks which have incrementally smaller diameters. At the beginning
of the game, all the disks are on post one, ordered by size with
the smallest on top and the largest on the bottom. For five disks
it looks like this:

           (1)               (2)               (3)
            X                 X                 X
            X                 X                 X
          XXXXX               X                 X
         XXXXXXX              X                 X
        XXXXXXXXX             X                 X
       XXXXXXXXXXX            X                 X
      XXXXXXXXXXXXX           X                 X
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

The object is to wind up with all the disks on post three in the
same order. The game has only one rule: You may never put a larger
disk on top of a smaller disk.
The general solution to this problem is that if you have posts A, B
and C with N disks on post A and you want to move them to post C,
first move (N-1) disks to post B, move the bottom disk from post A
to post C, then move (N-1) disks from post B to post C. This works
out to be the optimal recursive solution. Try it out with N = 1,
N = 2 and N = 3. N = 4 is already complicated. I won't discuss the
game further or how to get the solution. If you don't know about
it, you can look it up in either Doug Cooper's "Oh! Pascal" or
Robert Kruse's "Data Structures and Program Design".

The fact that I need to resort to such an unusual example to
illustrate recursion underlines the fundamental rule of recursion,
which is:

     YOU SELDOM NEED TO USE RECURSION, BUT WHEN YOU NEED IT
     YOU REALLY REALLY NEED IT.

First, here is the solution in C:

/* the C solution  - - - - - - - - - - - - - - - */
#include <stdio.h>

#define MAXIMUM  9

main ()
{
     int  count ;

     while (1)
     {
          printf ("Enter a number less than 10.\n" ) ;
          scanf ( "%d", &count ) ;
          if ( count > MAXIMUM )
               continue ;
          towers_of_hanoi ( count, 1, 3, 2 ) ;
     }
}
/* - - - - - - - - - - - - - - - - - - - - - - - - - */
towers_of_hanoi ( count, from, to, via )
int  count, from, to, via ;
{
     if (count <= 0)
          return ;
     count-- ;
     towers_of_hanoi ( count, from, via, to ) ;
     printf ("Move a disk from %1d to %1d.\n" , from, to ) ;
     towers_of_hanoi ( count, via, to, from ) ;
     return ;
}

Notice that by letting a routine call itself, we have reduced it to
just a few lines. And this is a problem that looks very complex.
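If you would rather check the moves than read them off the screen,
here is a minimal sketch (not part of the original program) of the
same recursion that stores each move in an array instead of
printing it. The names record_hanoi, struct move and MAX_MOVES are
made up for this illustration.

```c
#include <assert.h>

#define MAX_MOVES  511             /* (2 ** 9) - 1, enough for 9 disks */

struct move { int from, to ; } ;   /* hypothetical record of one move */

/* Same recursion as towers_of_hanoi, but each move is stored in
   moves[] instead of being printed. 'used' is the number of moves
   already stored; the new total is returned. */
int record_hanoi ( int count, int from, int to, int via,
                   struct move *moves, int used )
{
     if ( count <= 0 )
          return used ;

     /* first half: get count - 1 disks out of the way */
     used = record_hanoi ( count - 1, from, via, to, moves, used ) ;

     /* move the bottom disk */
     assert ( used < MAX_MOVES ) ;
     moves[used].from = from ;
     moves[used].to = to ;
     used++ ;

     /* second half: put the count - 1 disks back on top of it */
     return record_hanoi ( count - 1, via, to, from, moves, used ) ;
}
```

Calling record_hanoi ( 2, 1, 3, 2, moves, 0 ) should store three
moves: post 1 to post 2, post 1 to post 3, then post 2 to post 3.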
Here comes the assembler equivalent of the C code:

; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE
MAX_COUNT            EQU  9
enter_message        db   "Enter a number less than 10", 0
make_a_move_message  db   "Move a disk from "
from_byte            db   "X to "
to_byte              db   "X.", 0
; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE

; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE
outer_loop:
       lea   ax, enter_message     ; count message
       call  print_string
       call  get_unsigned_byte     ; count in al
       sub   ah, ah                ; zero ah for 'push ax'
       cmp   al, MAX_COUNT         ; too big?
       ja    outer_loop

       ; move from post 1 to post 3 via post 2
       mov   bx, 2                 ; post 2 = 'via'
       push  bx
       mov   bx, 3                 ; post 3 = 'to'
       push  bx
       mov   bx, 1                 ; post 1 = 'from'
       push  bx
       push  ax                    ; al = count, ah = 0
       call  towers_of_hanoi
       add   sp, 8                 ; adjust the stack

       jmp   outer_loop
; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE

; + + + + + + + + + + + + START SUBROUTINES BELOW THIS LINE
towers_of_hanoi proc near

       VIA     EQU  [bp + 10]
       TO      EQU  [bp + 8]
       FROM    EQU  [bp + 6]
       COUNT   EQU  [bp + 4]

       push  bp                    ; set up bp
       mov   bp, sp
       push  ax

       cmp   BYTE PTR COUNT, 0     ; if no disks, we are done
       jbe   exit
       dec   BYTE PTR COUNT        ; 1 less disk to move

       ; first half
       push  TO
       push  VIA
       push  FROM
       push  COUNT
       call  towers_of_hanoi
       add   sp, 8                 ; adjust the stack

       ; print the message
       mov   al, FROM              ; get 'from' number
       add   al, '0'               ; convert to ascii
       mov   from_byte, al         ; put into message
       mov   al, TO                ; get 'to' number
       add   al, '0'               ; convert to ascii
       mov   to_byte, al           ; put into message
       lea   ax, make_a_move_message
       call  print_string

       ; second half
       push  FROM
       push  TO
       push  VIA
       push  COUNT
       call  towers_of_hanoi
       add   sp, 8                 ; adjust the stack

exit:
       pop   ax
       pop   bp
       ret

towers_of_hanoi endp
; + + + + + + + + + + + + END SUBROUTINES ABOVE THIS LINE

The main routine checks that the number you enter is not too big
and then calls 'towers_of_hanoi'.
After setting up BP, the stack looks like this:

              VIA post          bp + 10
              TO post           bp + 8
              FROM post         bp + 6
              count             bp + 4
              old IP            bp + 2
       bp ->  old bp            bp + 0

We make some EQU statements to define where each variable is and
set up BP. We are using only one register, so we have a single PUSH
instead of using PUSHREGS. We then check to see if the count is 0.
If it is, we are done. If not, we decrement the count by 1, which
gives us 'count - 1', and divide the problem into three parts.

Part 1 calls 'towers_of_hanoi' and moves 'count - 1' disks from
'from' to 'via'.

Part 2 prints a message of where to move the bottom disk. It
converts the post numbers into ascii and inserts them in the string
where the 'X's are.{2}

Part 3 calls towers_of_hanoi again, this time moving the
'count - 1' disks from 'via' to 'to'.

____________________

   2. It uses the fact that for a data declaration, a variable name
has the address of the first piece of data on that line.

Assemble and link it with ASMHELP. When you run it, start with just
1 disk, then 2, then 3, etc. If you use the maximum number (9), it
will print 511 lines. For N disks, you need (2 ** N) - 1 moves, so
this gets very big very fast.

If you still feel no need to draw a picture of the stack or to use
EQU statements, why don't you try writing this subroutine using the
actual pointer values, i.e.:

       push  [bp + 6]

and so on. See how long it takes and see how easy it is to read
once it's done. Can you get it to work correctly?

By the way, how large does the stack get? Well, if you raised the
limit to 30 disks and entered 30, it would take your computer about
a year, running 24 hours a day, to complete the solution (about 1
billion moves). The maximum stack size would be
((disks+1) * stack_use_per_disk). The extra 1 is for the calling
routine. That is 31 * 14 bytes (including pushing IP, BP, and AX),
or a mere 434 bytes.
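The arithmetic above is easy to check with a few lines of C. This
is only a sketch: the names hanoi_moves, hanoi_stack_bytes and
STACK_USE_PER_CALL are made up here, and the 14 bytes per call is
the figure from the text (4 pushed words plus IP, BP and AX).

```c
#include <assert.h>

#define STACK_USE_PER_CALL  14    /* bytes: 4 pushed words + IP + BP + AX */

/* Number of moves needed for n disks: (2 ** n) - 1. */
unsigned long hanoi_moves ( int n )
{
     assert ( n >= 0 && n < 32 ) ;     /* keep the shift inside 32 bits */
     return ( 1UL << n ) - 1UL ;
}

/* Deepest stack use in bytes: one frame per disk plus one extra
   for the call from the main routine. */
unsigned long hanoi_stack_bytes ( int disks )
{
     assert ( disks >= 0 ) ;
     return ( unsigned long ) ( disks + 1 ) * STACK_USE_PER_CALL ;
}
```

hanoi_moves ( 9 ) gives the 511 lines mentioned above, and
hanoi_stack_bytes ( 30 ) gives the 434 bytes.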
SUMMARY

To call a procedure you use CALL with the procedure name:

       call  subroutine1

Procedures may be either FAR or NEAR. If there is a mismatch
between which type the assembler thinks it is and which type it
really is, there will be an error, either at the assembler level
for internal subroutines or at the linker level for external
subroutines.

You may override the default type for procedures with PTR:

       call  NEAR PTR subroutine5
       call  FAR PTR subroutine6

This is normally needed if the procedure comes after the call in
the file and is not the default type.

To allow other files to use a procedure you declare it PUBLIC
within its own segment.

       PUBLIC  subroutine1

To use a PUBLIC procedure from another file you declare it EXTRN,
stating which type it is:

       EXTRN  subroutine1:NEAR, subroutine2:FAR

You should make this declaration in the code segment and you should
make it before the procedure is referenced.

A procedure is defined by giving a name followed by the word 'proc'
(procedure) followed by either FAR or NEAR:

       subroutine1 proc near
       subroutine2 proc far

and a procedure is ended by giving the procedure name followed by
'endp' (end of procedure):

       subroutine2 endp

One procedure must be ended before another is begun.

Data is normally passed from one procedure to another on the stack:

       push  variable1
       push  variable2
       push  variable3
       call  subroutine1

If this is done, then the called procedure references this data by
using BP, the base pointer. The standard setup code is:

       push  bp               ; save old bp
       mov   bp, sp           ; set bp to current top of stack

What the stack looks like at this point depends on whether it is a
near or a far procedure.
For a near procedure, we have:

              variable1         bp + 8
              variable2         bp + 6
              variable3         bp + 4
              old IP            bp + 2
       bp ->  old BP            bp + 0

For a far procedure we have:

              variable1         bp + 10
              variable2         bp + 8
              variable3         bp + 6
              old CS            bp + 4
              old IP            bp + 2
       bp ->  old BP            bp + 0

Although it is theoretically possible to access these variables by
their pointer definition:

       mov   ax, [bp + 10]

it is much less error prone and much clearer to use EQU statements:

       VAR1  EQU  [bp + 10]
       mov   ax, VAR1

If you are writing a recursive procedure and you need temporary
variables, you can allot space on the stack for these variables:

       sub   sp, 6            ; room for 6 bytes of temp. variables

This should be done before any other pushes are done:

       push  bp
       mov   bp, sp
       sub   sp, 6
       PUSHREGS  ax, bx, cx, dx

These variables should also be named with EQU statements, and as
always, you should draw a picture of what is on the stack:

              variable1         bp + 10
              variable2         bp + 8
              variable3         bp + 6
              old CS            bp + 4
              old IP            bp + 2
       bp ->  old BP            bp + 0
              VAR4              bp - 2
              VAR5              bp - 4
              VAR6              bp - 6

       VAR4  EQU  [bp - 2]
       VAR5  EQU  [bp - 4]
       VAR6  EQU  [bp - 6]

Data which is passed to the procedure is at a positive offset to BP
while data that is created in the procedure is at a negative offset
to BP.

If you have created a data area for yourself on the stack, then you
must eliminate it before leaving the procedure. There are two ways
of doing this. One way is to add back what you have subtracted:

       POPREGS  ax, bx, cx, dx
       add   sp, 6
       pop   bp
       ret

The other way is to give SP the value in BP because this is the
place where SP will wind up anyway:

       POPREGS  ax, bx, cx, dx
       mov   sp, bp
       pop   bp
       ret

Use whichever one is clearer to you.

When you return from a procedure that has had data passed to it,
the data must be taken off the stack. There are two ways of doing
this.
The C standard is that it is the calling program's responsibility
to do this:

       push  variable1
       push  variable2
       push  variable3
       call  subroutine1
       add   sp, 6            ; 3 pushes = 6 bytes

The Pascal standard is that you take them off the stack on the
return. There is a special return instruction for that:

       ret   (6)              ; 3 pushes = 6 bytes


DATA

Data can be made available to other files and data can be accessed
from other files. To make data available, declare it PUBLIC:

       PUBLIC  variable1, variable2, variable3

To access PUBLIC data from other files, use an EXTRN statement
which includes the data type:

       EXTRN  variable7:BYTE, variable8:WORD, variable9:DWORD
       EXTRN  variable10:QWORD, variable11:TBYTE

This EXTRN statement must be in a segment which has the same ASSUME
segment register as will be used when accessing the data. Normally
this is DS, but it can be something else. For instance, if the
above EXTRN statements were in MORESTUFF and you have:

       ASSUME  es:MORESTUFF

then every time you access variable8:

       mov   dx, variable8

the assembler will code an ES segment override.


PUSHREGS and POPREGS

When writing a subroutine, you should always save any registers
that you use by pushing them.

       push  ax
       push  bx
       push  cx

They are then popped before returning:

       pop   cx
       pop   bx
       pop   ax

In order to save a lot of lines of code, there are two macros,
PUSHREGS and POPREGS. They are designed so you may use a word
processor to copy them. PUSHREGS pushes in left to right order and
POPREGS pops in right to left order:

       PUSHREGS  ax, bx, cx, dx
       POPREGS   ax, bx, cx, dx

is a matched pair.


LES and LDS

LDS (load DS) loads the first two bytes into the named register and
the next two bytes into DS. LES (load ES) loads the first two bytes
into the named register and the next two bytes into ES.
       les   si, [bp+6]
       lds   di, [bp+10]


Chapter 16 - Long Signed Multiplication and Division
====================================================
The PC Assembler Tutor - Copyright (C) 1989 Chuck Nelson

Now that you have some subroutines under your belt, it is time to
get back to multiple word arithmetic. This was put on the back
burner because we needed to negate long numbers and it is more
efficient to do that as a subroutine. First, let's negate a long
number and then put the parts together. To negate a number you
complement it, then add 1. It looks like this:

       NUMBER_LENGTH  EQU  4
       variable1      dq   ?

       mov   si, offset variable1
       mov   cx, NUMBER_LENGTH
not_loop:
       not   WORD PTR [si]
       add   si, 2
       loop  not_loop

       mov   si, offset variable1
       mov   cx, NUMBER_LENGTH
       stc                    ; set carry flag
add_loop:
       adc   WORD PTR [si], 0
       inc   si
       inc   si
       loop  add_loop

This is straightforward. First complement, then add 1. The first
add will add 1 because the carry flag is set. If there is a carry
out, it will be taken care of in the next word with ADC (we add
nothing but the carry). We can make this more compact and efficient
with:

       mov   si, offset variable1
       mov   cx, NUMBER_LENGTH
       stc                    ; set carry flag
negate_loop:
       not   WORD PTR [si]
       adc   WORD PTR [si], 0
       inc   si
       inc   si
       loop  negate_loop

Neither NOT nor INC affects CF, the carry flag, so the correct CF
value will be propagated through the whole long number.

When we do negation during our multiplication, the multiplicand
will be a 4 word negation while the result will be a 5 word
negation, so we will pass the length as a parameter.
The call in C would look like this:

       negate_it ( &number, length ) ; {1}

On entry, the stack will look like this:

              length            bp + 6
              address           bp + 4
              old IP            bp + 2
       bp ->  old BP            bp + 0

____________________

   1. For you non-C people, the '&' stands for the address.

Here is the entire subroutine:

; - - - - - START SUBROUTINES BELOW THIS LINE
negate_it proc near

       NUMBER_LENGTH    EQU  [bp+6]
       NUMBER_ADDRESS   EQU  [bp+4]

       push  bp
       mov   bp, sp
       PUSHREGS  cx, si

       mov   si, NUMBER_ADDRESS
       mov   cx, NUMBER_LENGTH
       stc                    ; set carry flag
negate_loop:
       not   WORD PTR [si]
       adc   WORD PTR [si], 0
       inc   si
       inc   si
       loop  negate_loop

       POPREGS  cx, si
       pop   bp
       ret                    ; calling routine adjusts stack

negate_it endp
; - - - - - END SUBROUTINES ABOVE THIS LINE

Well, so far we have the negation routine and the unsigned
multiplication and division routines. What else is necessary? Only
the main program, and here it is for multiplication:

; - - - - - START DATA BELOW THIS LINE
multiplicand       dq  ?
multiplier         dw  ?
result             dt  ?
result_sign_flag   db  ?
; - - - - - END DATA ABOVE THIS LINE

; - - - - - START CODE BELOW THIS LINE
outer_loop:
       mov   result_sign_flag, 0       ; assume positive
       mov   ax, offset multiplicand
       call  get_signed_8byte
       test  WORD PTR multiplicand + 6, 8000h  ; is it negative ?
       jz    get_next_number
       mov   ax, 4                     ; negate 4 word number
       push  ax
       mov   ax, offset multiplicand
       push  ax
       call  negate_it
       add   sp, 4                     ; clear 2 pushes off stack
       not   result_sign_flag          ; reverse sign of result

get_next_number:
       call  get_signed                ; get signed multiplier
       mov   multiplier, ax
       test  ax, 8000h                 ; is it negative
       jz    do_the_multiplication
       neg   multiplier                ; negate
       not   result_sign_flag          ; reverse sign of result

do_the_multiplication:
       mov   ax, offset result
       push  ax
       mov   ax, multiplier            ; the number, not the address
       push  ax
       mov   ax, offset multiplicand
       push  ax
       call  multiply_it
       add   sp, 6                     ; clear 3 pushes off stack

       ; is the result negative?
       test  result_sign_flag, 0FFh    ; 1111 1111 mask
       jz    print_it
       mov   ax, 5                     ; 5 word result
       push  ax
       mov   ax, offset result
       push  ax
       call  negate_it
       add   sp, 4                     ; clear 2 pushes off stack

print_it:
       mov   ax, WORD PTR result + 8   ; top two bytes
       call  print_hex
       mov   ax, offset result         ; the rest of result
       call  print_signed_8byte
       jmp   outer_loop
; - - - - - END CODE ABOVE THIS LINE

The driver routine gets an 8 byte signed number. If the number is
negative it negates the number (to make it positive) and switches
the sign of the result_sign_flag. The sign flag will either be 00h
for positive or FFh for negative. It then gets a two byte signed
number. If it is negative, the routine negates it and switches the
sign flag. At this point both numbers are positive, so it calls the
unsigned multiplication routine.

At the end, it checks the result_sign_flag to see if the result
should be positive or negative. If it should be negative, the
routine calls negate_it one more time. Finally, the routine prints
the number. The hex portion will be 0000 for positive or FFFF for
negative unless the value is larger than an 8 byte signed number
can hold, at which point the value of the 8 byte signed number will
be incorrect.

Here's the unsigned multiplication routine which has been turned
into a subroutine:

; - - - - -
multiply_it proc near

       RESULT_ADDRESS         EQU  [bp+8]
       MULTIPLIER_VALUE       EQU  [bp+6]
       MULTIPLICAND_ADDRESS   EQU  [bp+4]

       push  bp
       mov   bp, sp
       PUSHREGS  ax, bx, cx, dx, si, di

       mov   si, MULTIPLICAND_ADDRESS  ; load pointers
       mov   bx, RESULT_ADDRESS
       mov   cx, 4                     ; number of words
       sub   di, di                    ; clear di

mult_loop:
       mov   ax, [si]                  ; multiplicand to ax
       mul   WORD PTR MULTIPLIER_VALUE
       add   ax, di        ; add high word from last multiplication
       jnc   store_result
       inc   dx
store_result:
       mov   [bx], ax                  ; store 1 word of result.
       mov   di, dx        ; save high word for next multiplication
       add   si, 2                     ; increment pointers
       add   bx, 2
       loop  mult_loop

       mov   [bx], di                  ; move last word of result

       POPREGS  ax, bx, cx, dx, si, di
       pop   bp
       ret                    ; calling routine adjusts stack

multiply_it endp
; - - - - - - - - - - -

Draw a picture of the stack to verify that the EQU statements are
correct. The multiplication and the negation subroutines go in the
subroutine section of SUBTEMP1.ASM. The driver routine is the main
routine. If you don't remember how this multiplication routine
works, go back to the chapter on unsigned multiple word
multiplication since the code is the same.


DIVISION

Division is the same situation. We need a driver routine, but the
division itself will be the unsigned division. In division, the
remainder is the same sign as the dividend, and the sign of the
quotient is (dividend_sign XOR divisor_sign). If both signs are the
same, the quotient is positive; if the signs are different the
quotient is negative. Here's the driver routine:

; - - - - - START DATA BELOW THIS LINE
dividend              dq  ?
divisor               dw  ?
quotient              dq  ?
remainder             dw  ?
quotient_sign_flag    db  ?
remainder_sign_flag   db  ?
; - - - - - END DATA ABOVE THIS LINE

; - - - - - START CODE BELOW THIS LINE
outer_loop:
       mov   quotient_sign_flag, 0     ; assume positive
       mov   remainder_sign_flag, 0
       mov   ax, offset dividend
       call  get_signed_8byte
       test  WORD PTR (dividend + 6), 8000h  ; is it negative?
       jz    get_next_number
       mov   ax, 4                     ; negate 4 word number
       push  ax
       mov   ax, offset dividend
       push  ax
       call  negate_it
       add   sp, 4                     ; adjust stack
       not   quotient_sign_flag        ; switch sign of quotient
       mov   remainder_sign_flag, 0FFh ; remainder is negative

get_next_number:
       call  get_signed
       mov   divisor, ax
       test  ax, 8000h                 ; is it negative
       jz    do_the_division
       neg   divisor
       not   quotient_sign_flag        ; switch sign of quotient

do_the_division:
       mov   ax, offset remainder
       push  ax
       mov   ax, offset quotient
       push  ax
       mov   ax, divisor               ; the number, not the address
       push  ax
       mov   ax, offset dividend
       push  ax
       call  divide_it
       add   sp, 8                     ; clear 4 pushes off stack

       ; are the remainder and quotient negative?
       test  remainder_sign_flag, 0FFh
       jz    test_the_quotient
       neg   remainder

test_the_quotient:
       test  quotient_sign_flag, 0FFh  ; 1111 1111 mask
       jz    print_it
       mov   ax, 4                     ; 4 word result
       push  ax
       mov   ax, offset quotient
       push  ax
       call  negate_it
       add   sp, 4                     ; clear 2 pushes off stack

print_it:
       mov   ax, offset quotient
       call  print_signed_8byte
       mov   ax, remainder
       call  print_signed
       jmp   outer_loop
; - - - - - END CODE ABOVE THIS LINE

We get the dividend and check the sign. If it is negative, we (1)
negate the number, (2) switch the sign of the quotient, and (3) set
the remainder sign flag to negative. We get the divisor, check for
negative; if it is negative we negate it and switch the sign of the
quotient. We now have two unsigned numbers and do unsigned
division. After division, both the quotient and remainder are
adjusted for sign.
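For those who think better in C, here is a rough sketch of the same
sign logic. It is not a translation of the driver: the name
signed_divide is made up, it uses C's 64 bit integers in place of
the 4 word routines, and it assumes the dividend is not the most
negative 64 bit number (whose negation does not fit).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: signed division built on unsigned division,
   following the same steps as the driver routine. */
void signed_divide ( int64_t dividend, int16_t divisor,
                     int64_t *quotient, int16_t *remainder )
{
     int  quotient_negative = 0 ;       /* assume positive */
     int  remainder_negative = 0 ;
     uint64_t  dividend_size, unsigned_quotient ;
     uint32_t  divisor_size, unsigned_remainder ;

     assert ( divisor != 0 ) ;
     assert ( dividend > INT64_MIN ) ;  /* its negation must fit */

     dividend_size = ( uint64_t ) ( dividend < 0 ? -dividend : dividend ) ;
     if ( dividend < 0 )
     {
          quotient_negative = !quotient_negative ; /* switch sign */
          remainder_negative = 1 ;  /* remainder follows the dividend */
     }

     divisor_size = ( uint32_t ) ( divisor < 0 ? - ( int32_t ) divisor
                                               : ( int32_t ) divisor ) ;
     if ( divisor < 0 )
          quotient_negative = !quotient_negative ; /* switch sign */

     /* both numbers are now unsigned, so do unsigned division */
     unsigned_quotient = dividend_size / divisor_size ;
     unsigned_remainder = ( uint32_t ) ( dividend_size % divisor_size ) ;

     /* adjust both results for sign */
     *quotient = quotient_negative ? - ( int64_t ) unsigned_quotient
                                   : ( int64_t ) unsigned_quotient ;
     *remainder = ( int16_t ) ( remainder_negative
                                   ? - ( int32_t ) unsigned_remainder
                                   : ( int32_t ) unsigned_remainder ) ;
}
```

With a dividend of -7 and a divisor of 2, this should give a
quotient of -3 and a remainder of -1: the remainder takes the sign
of the dividend, and the quotient is negative because the two signs
differ.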
The division routine is the same as the unsigned routine before
except it is now a subroutine:

; - - - - - - - - ENTER SUBROUTINE BELOW THIS LINE
divide_it proc near

       REMAINDER_ADDRESS   EQU  [bp+10]
       QUOTIENT_ADDRESS    EQU  [bp+8]
       DIVISOR_VALUE       EQU  [bp+6]
       DIVIDEND_ADDRESS    EQU  [bp+4]

       push  bp
       mov   bp, sp
       PUSHREGS  ax, bx, cx, dx, si, di

       mov   si, DIVIDEND_ADDRESS
       mov   bx, QUOTIENT_ADDRESS
       add   si, 6                     ; start at the top word
       add   bx, 6
       mov   di, WORD PTR DIVISOR_VALUE
       mov   cx, 4                     ; number of words
       sub   dx, dx                    ; clear dx for first division

division_loop:
       mov   ax, [si]                  ; dividend word to ax
       div   di
       mov   [bx], ax                  ; word of result to quotient
       sub   si, 2                     ; decrement the pointers
       sub   bx, 2
       loop  division_loop

       mov   bx, REMAINDER_ADDRESS     ; store remainder
       mov   [bx], dx

       POPREGS  ax, bx, cx, dx, si, di
       pop   bp
       ret                    ; calling routine adjusts the stack

divide_it endp
; - - - - - - - - ENTER SUBROUTINE ABOVE THIS LINE

Draw a picture of the stack to verify that the EQU statements are
correct for a NEAR routine. The division and negation subroutines
go in the SUBROUTINES section of SUBTEMP1.ASM. The driver is the
main program. If you don't remember how this division works, go
back to the division chapter and look it over. Try out a few
numbers to make sure that it is working the way it should.


DATA INTEGRITY

One thing that may have been annoying some of you is that when the
programs sent us numbers for multiplication and division we
sometimes negated them, effectively changing the data in memory,
but never changed them back when we were done. In an operational
subroutine, you would have to do it differently. The logic would
be:

       NUMBER NEGATIVE?
            no        everything's o.k.
            yes       make copy
                      negate
                      reset pointer

If the number is positive we won't change it. If the number is
negative, we make a copy, negate the copy and use the copy for the
operation.
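The copy-then-negate logic might look something like this in C. It
is a sketch only: negate_words plays the part of negate_it, and
positive_copy and the scratch buffer are names invented for this
example. Numbers are stored as 16 bit words, lowest word first, the
way the assembler routines store them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Negate a number stored as 'length' 16 bit words, lowest word
   first: complement each word and add 1, carrying into the next. */
void negate_words ( uint16_t *number, int length )
{
     uint32_t  carry = 1 ;              /* plays the part of STC */
     int  i ;

     for ( i = 0 ; i < length ; i++ )
     {
          uint32_t  sum = ( uint32_t ) ( uint16_t ) ~number[i] + carry ;
          number[i] = ( uint16_t ) sum ;
          carry = sum >> 16 ;           /* carry out to the next word */
     }
}

/* Returns a pointer to a non-negative version of the number without
   changing the caller's data. 'scratch' must hold 'length' words. */
const uint16_t *positive_copy ( const uint16_t *number, int length,
                                uint16_t *scratch )
{
     assert ( length > 0 ) ;
     if ( ( number[length - 1] & 0x8000 ) == 0 )
          return number ;               /* positive: everything's o.k. */

     memcpy ( scratch, number, ( size_t ) length * sizeof number[0] ) ;
     negate_words ( scratch, length ) ;
     return scratch ;                   /* reset pointer to the copy */
}
```

The caller then does the unsigned operation through the returned
pointer, and the original number in memory is never touched.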


Chapter 17 - Interrupts
=======================

Your word processor will work on a Compaq, an IBM, an AST or any
other type of PC compatible computer. Every time it wants to read a
file from disk or write to a printer, it calls a DOS or BIOS
function that takes care of it.{1} Yet every computer has its own
versions of these subroutines, and they not only are different,
they are in different places in memory. How does the word processor
know where to find them? It uses interrupts.{2}

An interrupt is a glorified subprogram call. The first 1024 bytes
of your computer's memory (that's from 0000:0000 to 0000:03FF)
contain the addresses of all the DOS and BIOS interrupts. Which
address contains the address of which subprogram was decided by the
triumvirate of Intel/IBM/Microsoft.

We have two different sets of addresses here, so let's keep them
straight. Starting at memory address 0000 there are 4 byte
addresses called interrupt vectors. There is a different vector at
0000d, at 0004d, at 0008d, at 0012d, at 0016d, etc.; there are 256
of them in all. Each of these 256 places can hold the address of a
subprogram somewhere in memory (although not all of them are used).
When you call an interrupt, the 8086 goes to the appropriate place
in low memory (from 0000 to 3FFh) and finds the address of the
subprogram that you want, loads it into CS and IP, and goes to that
subprogram. When that subprogram is done, it goes back to the next
instruction in your program after the interrupt.

How did those addresses get into the first 1024 bytes? The computer
put them there when you started it up. It is one of the first
things that the computer did when you turned it on. These
subprograms do EVERYTHING, and you can't run the computer without
them.
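To make the layout of those 1024 bytes concrete, here is a small
sketch in C that finds a vector in a COPY of low memory held in an
ordinary array. It does not touch real memory. The names
vector_offset and read_vector are made up for this example; the
layout it assumes (IP word first, then CS word, each stored low
byte first) is the standard 8086 one.

```c
#include <assert.h>
#include <stdint.h>

#define VECTOR_TABLE_SIZE  1024    /* 256 vectors * 4 bytes each */

/* Offset in low memory of the vector for interrupt 'number'. */
unsigned vector_offset ( unsigned number )
{
     assert ( number <= 255 ) ;
     return 4 * number ;
}

/* Hypothetical helper: read one vector out of a copy of the first
   1024 bytes of memory. The IP word comes first, then the CS word,
   each stored low byte first. */
void read_vector ( const uint8_t *low_memory, unsigned number,
                   uint16_t *cs, uint16_t *ip )
{
     unsigned  at = vector_offset ( number ) ;

     *ip = ( uint16_t ) ( low_memory[at]     | ( low_memory[at + 1] << 8 ) ) ;
     *cs = ( uint16_t ) ( low_memory[at + 2] | ( low_memory[at + 3] << 8 ) ) ;
}
```

For interrupt 30, for instance, vector_offset gives 120, so the IP
word is read from bytes 120 and 121 and the CS word from bytes 122
and 123.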
If you want to scroll the screen, you put the appropriate
information in the 8086 registers and use:

       int   10h              ; decimal 16 {3}

The 8086 goes to the interrupt 16 address (4*16 = 64), gets the
address of the program that contains the video subprograms, and
goes to it.

____________________

   1. BIOS stands for Basic Input/Output System.

   2. If you haven't gotten it yet, it's time to get either "DOS
Programmer's Reference" or "The Peter Norton Programmer's Guide to
the IBM PC." I'm serious. They contain the information you need to
make use of this chapter.

   3. It is normal to use hex numbers for the interrupts, so if you
read about an interrupt make sure you know if the numbers are hex
or decimal.

______________________

The PC Assembler Tutor - Copyright (C) 1989 Chuck Nelson

If you want to write to the printer, you put the appropriate
information in the 8086 registers and call:

       int   21h              ; decimal 33

The 8086 goes to address 132 (4*33 = 132), gets the address of the
DOS program, puts it in CS and IP, and starts it. If you want to
get input from the keyboard, you put the appropriate information in
the 8086 registers and call:

       int   21h              ; decimal 33

That's right, it is the same program as the one that does the
printer. The 8086 goes to 132 (4*33), gets the program address, and
goes to it. The lowest interrupt is int 0 (address = 4*0 = 0000).
The highest interrupt is int 255 (address = 4*255 = 1020).

This is an intelligent way to handle the situation. As long as
everyone agrees which interrupt contains the address of which
subprogram, our programs will work on any PC compatible. This is
one of the things that is meant by PC compatible.

On my computer, here is a section of these addresses starting with
int 1Eh (30d). High memory is at the top, low memory is at the
bottom.
       INT #        DATA          LOCATION

                    cs  0724h       146
        36          ip  04A8h       144

                    cs  0724h       142
        35          ip  01BDh       140

                    cs  0724h       138
        34          ip  01B0h       136

                    cs  019Fh       134
        33          ip  05EBh       132

                    cs  019Fh       130
        32          ip  05E7h       128

                    cs  0000h       126
        31          ip  0000h       124

                    cs  0070h       122
        30          ip  0EB8h       120

If we call int 30d, the 8086 goes to 120 (4*30), puts 0EB8h into
the IP, and puts 0070h (from the next higher location) into CS. The
next instruction it does will be 0070:0EB8. If we call int 35d, the
8086 goes to 140 (4*35), puts 01BDh in IP, puts 0724h (from the
next higher location) in CS. The next instruction the 8086 does
will be 0724:01BD. If we call int 33d, the 8086 goes to 132 (4*33),
and puts 05EBh in IP, 019Fh (from the next higher location) in CS.
The next instruction executed will be 019F:05EB. The 0000:0000 for
int 31d indicates that there is no int 31d (address 0000:0000
contains data, [the vectors for int 0], not a program). Make sure
that you understand how this is working before you go on.

On the PC, information for the interrupts is always passed through
registers, not on the stack. Each interrupt type has a specific
register for each piece of information it needs. We will do a
couple to see how they work.


DISPLAYING A CHARACTER

We can print a character at a time on the monitor, so we'll input a
string, and then print out each character of the string
individually. We will stop the printout when we see the 0 at the
end of the string.

; - - - - - - - - - - START DATA BELOW THIS LINE
buffer   db   80 dup (?)
; - - - - - - - - - - END DATA ABOVE THIS LINE

; - - - - - - - - - - START CODE BELOW THIS LINE
       call  show_regs

outer_loop:
       mov   ax, offset buffer
       call  get_string
       mov   si, offset buffer

inner_loop:
       mov   al, [si]
       cmp   al, 0
       je    next_string
       mov   ah, 14           ; ah contains function number
       mov   bh, 0            ; where in memory
       int   10h              ; 16d
       inc   si
       jmp   inner_loop

next_string:
       call  get_continue
       jmp   outer_loop
; - - - - - - - - - - END CODE ABOVE THIS LINE

The program is simple.
It gets a string, then checks each character for 0 (end of string),
before shipping it off to the screen. I'll explain AH in a second.
There are several places this character can be displayed{4}, so we
use show_regs to force the video screen to a certain place in
memory, then put BH = 0 to tell the interrupt that the screen
memory is in that place. Finally, with all the information in
place, we do the interrupt. You will not print a carriage return,
only the data, so we use get_continue to give us a carriage return.
It's a little sloppy, but much easier.

____________________

   4. Cf. one of those two books.

We have lots and lots of subprograms for the disks, printer,
screen, etc. There are only 256 interrupts, so it was decided at
the beginning to make most of the interrupts groups of programs
instead of a single program. Int 10h (16d) contains about two dozen
different video subprograms. Each program is distinguished by a
specific number in AH.

For int 10h (16d):

       ah = 2h          Set cursor position
       ah = 6h          Scroll window up
       ah = Eh (14d)    Write character to screen

For int 21h (33d):

       ah = 1h          Keyboard input
       ah = 5h          Printer output
       ah = 17h         Rename a file
       ah = 2Ch         Get the time

As you can see, int 21h is a potpourri of subprograms.

We'll do the same program as above, but with printer output.
Everything is the same except that the inner loop should be changed
to look like this:

; - - - - -
       mov   si, offset buffer

inner_loop:
       mov   dl, [si]
       cmp   dl, 0
       je    next_string
       mov   ah, 5            ; ah contains function number
       int   21h              ; 33d
       inc   si
       jmp   inner_loop
; - - - - -

There is practically no change. We use DL instead of AL, have
"int 21h" instead of "int 10h", and change the function number in
AH to 5. Also, BH is not needed. Leave your printer off at the
beginning to see what happens.

Turn your printer on and enter a string of 10 or 20 letters.
Probably nothing happened. Enter another 20 or 30 letters. Nothing
again. Try 50 letters this time.
This time it should work. Lots of printers won't print anything
unless (1) they get a carriage return (which you haven't sent) or
(2) they have a backlog of more than 80 characters. If you ever use
the printer interrupt, you'll have to remember that.

These interrupts which are in your program are called software
interrupts. They are your interface with the peripheral devices on
your machine, doing everything from disk i/o to handling memory
allocation. They do all your housekeeping for you.{5} If you are
going to work at this level then you should buy one of those two
books. They contain all the software interrupts (and there are
about a hundred of them), tell how they work and which registers to
set for each interrupt. If you don't have access to these
interrupts it's like having your arms cut off - you're unable to do
any i/o at all.

____________________

   5. But they don't do Windows.


HARDWARE INTERRUPTS

The interrupts that we write in our programs aren't the only
interrupts there are. The hardware uses interrupts to take
temporary control of the computer. On a Macintosh, if you insert a
disk, the program stops and the operating system reads in the disk
directory. A modem that is in use will request time for doing i/o.
Also, if the 8086 detects a zero divide, it will trigger a special
interrupt. You have met int 4 already. It is the INTO instruction,
which will trigger an interrupt if the overflow flag is set. Your
interrupts are in specific places in your code, but these machine
interrupts can happen at any time.

There are two lines (wires) into the 8086. One is for serious
problems that need to be taken care of NOW, and it has non-maskable
interrupts. The other line is for interrupts that need to be taken
care of in a timely fashion, and they are maskable interrupts.

A non-maskable interrupt (NMI) is when the hardware detects that it
is in deep doo doo. It sends a signal on the NMI line that says "Hey!
I need an interrupt." The 8086 finishes the instruction it is
processing and then IMMEDIATELY gives over control. The NMI uses
the same 1024 bytes in low memory for interrupt vectors, but has
its own interrupt numbers. Normally this is for very serious
errors, so the interrupt program may decide to abort your program
and return to the operating system; if it makes sense to, it will
return to your program where your program left off.

A maskable interrupt is when a piece of hardware has some work to
do. It sends a signal to the 8086 (on the INTR line), and the 8086
takes care of it when it is ready. When the 8086 is ready depends
on you. In the flags register is the IEF, the interrupt enable
flag. It should always be set to 1 unless you are doing something
critical. Basically, the only things that are critical are
interrupts themselves and context switches. Context switches are
done by the operating system in multitasking environments, so they
don't concern you, and you are not writing interrupts, so they
don't concern you. Therefore, always keep the IEF set. Just for
your information, you set the IEF with:

       sti                    ; set interrupt flag (interrupts enabled)

and clear it with:

       cli                    ; clear interrupt flag (interrupts disabled)

Why do interrupt programs clear the interrupt flag? Because you
could have interrupts interrupting other interrupts and wind up
with scads of half finished interrupts lying around. This way, one
interrupt finishes before another can take over.