Chapter 2

Computer Architecture


Machine Language Programming

Objectives: To instill an understanding of how digital computers work from both a hardware and software perspective.



MC9S08QG8 Resource page

MC9S08QG8 Data Manual, 300 pages, 3Mb pdf

The essential components of the digital computer are shown in Figure 2.1.

The Central Processing Unit (CPU) is the brain of the computer. It performs the essential task of fetching instructions from program memory and executing these instructions. The CPU contains the Arithmetic/Logic Unit (ALU) which performs the various arithmetic and boolean operations. The design of the ALU gives each computer its distinctive flavour. This is what makes the Intel 486 processor different from the Motorola 68000.

Memory is used for storage of program instructions as well as data. A unit of storage is called a word. Computers have different word lengths; for example, 4, 8, 12, 14, 16, 32 and 64-bit words are in common use today. Small microcomputers generally use a word length of 8 bits, or a byte. Memory is organized as a linear array of words, a word being the smallest collection of bits which can be accessed by the CPU. Each word is identified by its address or location in memory. Thus a computer with 64K words of memory (65,536 locations) would require a memory address register that is 16 bits wide, since 2^16 = 65,536.
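The relationship between memory size and address width can be checked with a few lines of C. This is only a sketch; the function name address_bits is made up for illustration and is not part of any library.

```c
#include <stdint.h>

/* Smallest number of address bits needed to select one of `words` locations.
   (Hypothetical helper, for illustration only.) */
unsigned address_bits(uint32_t words)
{
    unsigned bits = 0;
    /* Keep adding address lines until 2^bits covers every word. */
    while (bits < 32 && ((uint32_t)1 << bits) < words) {
        bits++;
    }
    return bits;
}
```

Calling address_bits(65536) gives 16, which matches the 16-bit address register mentioned above.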

Semiconductor memories have virtually replaced older forms of memory such as magnetic core memory. Because they offer the highest density at the lowest cost, dynamic random-access memory (DRAM) chips are used in modern computers. These have the drawback that they must be refreshed continuously. That is, additional control circuitry is required to access banks of memory periodically; otherwise the electronic charge in each memory cell would be lost. DRAM is also volatile, i.e., all data is lost after the power is turned off. Static random-access memories (SRAM) are faster than DRAM, use less power and do not require refreshing. SRAM, like DRAM, is also volatile. There are many applications where non-volatile storage is required. Read-only memories (ROM) are used to store program code or information that never changes. To be cost effective, these must be programmed by the manufacturer in large quantities. UV-EPROM (ultra-violet erasable programmable ROM) can be erased and reprogrammed by the user using a special UV lamp and programming equipment. EEPROM (electrically erasable PROM) has the advantage of being erasable and programmable in situ. Flash memory is another form of EEPROM. Because of its higher density, block orientation and longer erase times, it is more suited as a replacement for hard disk drives or as removable storage such as CompactFlash™ cards and memory sticks.

Without Input/Output (I/O) capabilities a computer would not be of much use. I/O is any means of getting information into and out of the computer. This could be as simple as switches and lights or more traditional I/O devices such as the keyboard, mouse and video display. Modems, sound cards, analog-to-digital and digital-to-analog converters, serial and parallel ports, timers and counters are all examples of I/O devices.

Storage devices fall into another category of I/O devices. Hard disk drives, floppy disk drives and CD-ROM drives are mass storage devices for keeping large amounts of information. While these may not be perceived as part of the I/O capabilities of the computer, from the hardware perspective they are interfaced to the computer via the I/O bus.

Embedded Computers

We are familiar with the computer in its most highly visible form as the personal computer (PC) and its aliases such as desk-top or lap-top computer, engineering work-station, business computer, file server and so on. Applications of the PC cover wide areas from personal, scientific, engineering, business and commercial applications to Internet servers, Information Management Systems (e.g. patient records, airline reservations) and industrial process controllers.

And yet there are more computers installed all over the world in areas other than the ones just described. These are the computers embedded in machines, instruments and appliances such as cameras, stereos, TVs, VCRs, microwave ovens, computer printers, copiers, modems, mobile phones, wristwatches, automobiles, test and diagnostic equipment, scientific equipment, health-care instruments and the list goes on and on.

The term embedded computer covers both microprocessors and more complex computer systems applied to a specific task. In most cases, the user is unaware of the computational engine under the cover and does not care about the software or operating system being employed. Embedded processors range in size and cost from miniature 4-bit microprocessors costing under $1 to full-blown Pentium or PowerPC based systems costing in excess of $10,000.

Freescale MC9S08QG8

(In July 2004 Freescale Semiconductor was created from the semiconductor division of Motorola.)

This course uses the Freescale MC9S08QG8 as a model microcontroller unit (MCU) because its architecture and programmer's model are relatively simple. The fundamentals are universal and can be applied to any other microprocessor. Moreover, we will promote the idea that bigger does not mean better. Smallness can be beautiful. There are many obvious advantages to being small such as being compact, portable, energy efficient, simple to design and manufacture, less costly and more reliable.

The Freescale MC9S08QG8 microcontroller is a low-cost and yet very powerful 8-bit solution to many embedded processor applications. The HC08/HCS08 is a large family of processors available from Freescale in various input/output configurations based on the same CPU core.

Here are some of the highlights of the MC9S08QG8:

The MC9S08QG8/4 is available in 8-pin and 16-pin surface mount (SMD) packages, and some versions in dual in-line packages (DIP). The MC9S08QG4 comes with 4 Kbytes of FLASH and 256 bytes of RAM.


Reference: HCS08 CPU

Figure 2.2 shows the register model of the ALU of the HC08 and HCS08 CPU.

Figure 2.2 CPU Registers


The A accumulator is a general-purpose 8-bit register. In general, all mathematical and logical operations will be performed using this register. However, the Freescale family of microcontrollers can perform many operations directly on the first 256 bytes (page zero) of RAM. This is like having an additional 256 registers besides the A accumulator.

The H and X registers together constitute a 16-bit index register. The earlier Motorola 6805 family had a single 8-bit X index register. This was found to be somewhat lacking and hence the additional 8-bit H register was added to allow full 16-bit addressing capability. Note that for backward compatibility, some instructions operate only on the X register.

The program counter (PC) contains the address of the next instruction to be executed. Similarly, the stack pointer (SP) contains the address of the next free space in SRAM which is available for temporary storage. (Stacks are explained in Chapter 4). In general, the programmer can consider the handling of the PC and SP as an internal matter and may ignore the two for the time being.

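The Condition Code Register (CCR) is an 8-bit register containing five status flags (V, H, N, Z and C) and the interrupt mask bit (I). These are used to monitor the results of arithmetic/logic operations and to control certain CPU functions. The remaining two bits are unused.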

Figure 2.3 - Condition Code Register

The Zero (Z) bit is set when the result of the last arithmetic, logical, or data manipulation is zero.

The Negative (N) bit reflects the state of the MSB of the result. For 2's complement notation, the N bit is set if the result is negative.

The Overflow (V) bit is used to indicate if an arithmetic overflow has occurred as a result of the operation.

The Carry (C) bit is used to indicate if a carry from an addition or a borrow from a subtraction has occurred. The C bit is also used in shift and rotate operations.

For the time being you may ignore the functions served by H and I bits.
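The way the Z, N, V and C bits follow from an 8-bit addition can be sketched in C. This is a model of the flag definitions given above, not of the CPU hardware; the names add_flags and add8 are made up for illustration.

```c
#include <stdint.h>

/* Result of an 8-bit addition together with the four flags described above.
   (Struct and function names are illustrative only.) */
typedef struct {
    uint8_t result;
    int z, n, v, c;
} add_flags;

add_flags add8(uint8_t a, uint8_t m)
{
    add_flags f;
    unsigned sum = (unsigned)a + (unsigned)m;   /* keep the 9th bit */
    f.result = (uint8_t)sum;
    f.z = (f.result == 0);                      /* Z: result is zero */
    f.n = (f.result & 0x80) != 0;               /* N: MSB of the result */
    f.c = (sum > 0xFF);                         /* C: carry out of bit 7 */
    /* V: both operands have the same sign but the sign of the result differs */
    f.v = ((~(a ^ m) & (a ^ f.result)) & 0x80) != 0;
    return f;
}
```

For example, adding $7F and $01 gives $80 with N and V set: the result is correct as an unsigned number (128) but has overflowed as a 2's complement number.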

Machine Language vs Assembly Language

A computer program is a set of precise instructions stored in memory for the CPU to act on. Instructions on the HC08 vary in length from 1 to 4 bytes, depending on the individual instruction. This is characteristic of a Complex Instruction Set Computer (CISC) as compared to a Reduced Instruction Set Computer (RISC). For example, the Pentium is a CISC while the PowerPC is a RISC. How does the CPU of a CISC know how many bytes make up the instruction? The first byte in the sequence dictates the type of instruction and therefore the number of bytes to follow for that instruction. It therefore makes no sense to begin execution in the middle of an instruction: the CPU would interpret an operand byte as an opcode and lose synchronization with the program.

A machine cycle is the length of time it takes to perform an internal hardware operation. The MC9S08QG8/4 can operate from an internal oscillator or from an external quartz crystal. On power-on reset, the internal oscillator will default to run at 16 MHz, which is then divided by 4 to give a bus clock of 4 MHz. Thus, a machine cycle is 250 ns in duration. Fetching and executing an instruction may require several machine cycles.
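The timing arithmetic can be written out as a short C sketch. The 16 MHz oscillator and the divide-by-4 are the default values quoted above; the function names are made up for illustration.

```c
#include <stdint.h>

/* Bus clock in Hz after the default divide-by-4 described in the text. */
uint64_t bus_clock_hz(uint64_t osc_hz)
{
    return osc_hz / 4;
}

/* Duration of `cycles` machine cycles in nanoseconds.
   bus_hz must be non-zero. */
uint64_t cycles_to_ns(uint64_t cycles, uint64_t bus_hz)
{
    return (cycles * UINT64_C(1000000000)) / bus_hz;
}
```

With a 4 MHz bus clock, one machine cycle works out to 250 ns, so an instruction that needs three cycles would take 750 ns.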

A collection of instructions (i.e. the program) is stored in memory at sequential locations and the CPU fetches and executes each instruction one at a time. The flow of execution is controlled by the CPU with the use of the program counter. This register keeps track of the address of the next byte to be fetched.

Normally, the CPU will fetch instructions from memory in sequential order unless instructed to do otherwise as in the case of a jump or branch instruction. In this case, the PC will be loaded with a new memory address and on the next fetch cycle the flow of program execution will be directed to a different location in memory.

Pieces of data are stored in memory in exactly the same fashion as instructions. The CPU has no way of distinguishing instructions from data. Of course, the programmer knows which locations are used for data storage and will create the program such that data locations are never executed. In the event that this occurs, the program will behave unexpectedly and "crash".

Machine Language & Assembly Language

What is the difference between machine language and assembly language? Machine language or machine code is the native language or instructions that the computer executes. Information on a digital computer is encoded using a two-state mechanism, that is, either ON or OFF. Therefore all information is manipulated and stored using a binary number system. Machine language is inherently a binary system.

Assembly language and machine language both represent the same information. They represent the native binary codes which will be entered into the computer and constitute a coherent set of instructions which we call the program. The difference lies in the interpretation by humans.

Machine code is binary. Programming a computer using the binary system of notation is very time consuming, inefficient and prone to errors. By utilizing decimal, octal and hexadecimal notation, it is possible to reduce the amount of data entry and book-keeping and therefore reduce effort and likelihood of errors. For example, the machine code to stop the CPU is 10001110. It is easier to write this as $8E, and even better to write STOP. Using hexadecimal formats or meaningful words is simply a better way for humans to relate to the binary instruction.

Instead of using a numeric value, either written as binary or hexadecimal, to represent an instruction, it is easier to use an acronym or mnemonic. The use of a mnemonic which has a one-to-one equivalence to the machine code and its function is called assembly language programming. The computer does not understand assembly language. Assembly language is a mechanism to assist humans in creating and managing programs in machine code more effectively and efficiently. Thus, programmers write in assembly language. Computers execute machine code.

When a program is written in assembly language, it must be translated into machine code and into a form that can be entered into the memory of the computer. This process is called assembling and can be accomplished manually. However, this is prone to errors and an automated method is preferred. A computer program which translates assembly language programs into machine code is called an assembler.

Simulators and debuggers are other program development tools which assist programmers in testing and debugging assembly language programs.

Programming Example

Assembly language programming is best introduced through the use of simple examples. Let us suppose we wish to find the sum of 12 and 23. In standard algebra we could write:

RESULT = 12 + 23

In HC08 assembly language, we have to instruct the CPU at each step of the operation. Here is a program to do this:

  LDA #12
  ADD #23
  STA RESULT
  STOP

Here is a brief explanation of the mnemonics and what each line does.

  1. Load accumulator A with 12.
  2. Add 23 to A, leaving the result in A.
  3. Store the contents of accumulator A to a location called RESULT. For the moment let us assume that the address of RESULT is $00.
  4. Stop execution of the program.

The machine codes for this program in hexadecimal notation are as follows:

A6 0C AB 17 B7 00 8E
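To see how the CPU steps through these seven bytes, here is a minimal sketch of a fetch-execute loop in C. It models only the four opcodes in this program ($A6, $AB, $B7 and $8E) and ignores the condition codes; it is not a full HC08 simulator. The names tiny_cpu and run are made up for illustration, and the program is kept in a separate array for simplicity, whereas on the real CPU program and data share one address space.

```c
#include <stddef.h>
#include <stdint.h>

/* Just enough CPU state for this example. */
typedef struct {
    uint8_t a;          /* accumulator A */
    uint16_t pc;        /* program counter (index into the program) */
    uint8_t mem[256];   /* page zero only, for illustration */
} tiny_cpu;

/* Runs the program until STOP. Returns 0 on STOP, -1 on an opcode that is
   not modelled or if execution runs past the end of the program. */
int run(tiny_cpu *cpu, const uint8_t *prog, size_t len)
{
    cpu->pc = 0;
    while (cpu->pc < len) {
        uint8_t opcode = prog[cpu->pc++];      /* fetch the opcode byte */
        if (opcode == 0x8E) {                  /* STOP: one-byte instruction */
            return 0;
        }
        if (cpu->pc >= len) {                  /* operand byte is missing */
            return -1;
        }
        uint8_t operand = prog[cpu->pc++];     /* fetch the operand byte */
        switch (opcode) {
        case 0xA6: cpu->a = operand; break;                      /* LDA #imm */
        case 0xAB: cpu->a = (uint8_t)(cpu->a + operand); break;  /* ADD #imm */
        case 0xB7: cpu->mem[operand] = cpu->a; break;            /* STA dir  */
        default:   return -1;                                    /* not modelled */
        }
    }
    return -1;
}
```

Running the byte sequence above through this loop leaves 35 ($23) in accumulator A and in memory location $00.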

This program is of little use since we know the answer to be 35, a priori. Instead, it would be more useful to add two numbers whose values may not be fixed, i.e., what we call variables. Let us rephrase our example as:

RESULT = NUM1 + NUM2

The assembly language program to accomplish this may look as follows:

  LDA NUM1
  ADD NUM2
  STA RESULT
  STOP

The difference between the two examples above is the use of the immediate symbol (#) in the first two lines. In the first example, the # specifies that the numeric data is a constant value and this value is found in the byte or pair of bytes following the instruction op-code byte. This is called immediate addressing mode. In the second example, the omission of the # specifies that the symbols NUM1 and NUM2 represent addresses and the numeric data are stored at these memory addresses.

Thus, note the difference between the following two instructions:

  LDA #4    
  LDA 4    

The first instruction loads the value 4 into accumulator A. The second loads the contents of memory address 4 into accumulator A.

Misuse of the # symbol is a common source of programming error which the assembler cannot detect.

If we know the addresses of NUM1 and NUM2 to be 4 and 5 respectively, we could write the following:

  LDA 4    
  ADD 5    

In general, it is best not to use absolute addresses of variables. Instead, let the assembler/compiler assign memory addresses to symbols as it sees fit.


What is the difference between the following statements:

LDA #123

LDA #$7B

LDA #%01111011

The answer is none.

123, $7B and %01111011 are just different notations or representations for us humans to represent the same quantity or value which is One Hundred and Twenty Three. The bit pattern stored on the computer is 01111011 and is always binary.
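The equivalence of the three notations can be demonstrated with a small C helper that converts the assembler prefixes $ and % using the standard library function strtol. The function name parse_constant is made up for illustration; a real assembler accepts more forms than this sketch does.

```c
#include <stdlib.h>

/* Converts a constant written in the assembler notations shown above:
   "$7B" (hexadecimal), "%01111011" (binary) or "123" (decimal).
   Hypothetical helper, for illustration only. */
long parse_constant(const char *text)
{
    if (text[0] == '$') {
        return strtol(text + 1, NULL, 16);   /* skip the $ prefix */
    }
    if (text[0] == '%') {
        return strtol(text + 1, NULL, 2);    /* skip the % prefix */
    }
    return strtol(text, NULL, 10);
}
```

All three strings "123", "$7B" and "%01111011" convert to the same value.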


MC9S08QG8 Memory Model

Figure 2.4 MC9S08QG8 Memory Map

The memory map shows that the first 96 memory addresses ($0000 to $005F) are reserved for the most commonly used internal hardware registers. This leaves 160 locations on page zero, from address $0060 to $00FF, as RAM that can be accessed using direct addressing. Eighty high page registers from $1800 to $184F are for less often used internal hardware registers. To access these locations, extended or indexed addressing modes are required.
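The address ranges in this paragraph can be captured in a short C sketch. Only the three regions named above are modelled; anything else is reported as "other". The type and function names are made up for illustration.

```c
#include <stdint.h>

typedef enum {
    REGION_DIRECT_REGISTERS,  /* $0000-$005F: commonly used hardware registers */
    REGION_PAGE_ZERO_RAM,     /* $0060-$00FF: RAM on page zero */
    REGION_HIGH_REGISTERS,    /* $1800-$184F: less often used hardware registers */
    REGION_OTHER              /* everything not covered by the paragraph above */
} region;

region classify(uint16_t addr)
{
    if (addr <= 0x005F) return REGION_DIRECT_REGISTERS;
    if (addr <= 0x00FF) return REGION_PAGE_ZERO_RAM;
    if (addr >= 0x1800 && addr <= 0x184F) return REGION_HIGH_REGISTERS;
    return REGION_OTHER;
}

/* Direct addressing supplies only the low address byte,
   so it can reach page zero ($0000-$00FF) and nothing else. */
int direct_addressable(uint16_t addr)
{
    return addr <= 0x00FF;
}
```

For instance, direct_addressable(0x1802) is false, which is why the high page registers need extended or indexed addressing.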

Program code is normally stored in the FLASH memory area which can be erased and reprogrammed many times.

Instruction Summary

The complete HC08 instruction set is listed here to familiarize you with the HC08 capabilities. For more information, see the HCS08 Instruction Set Summary. There are six addressing modes which refer to the different ways parameters are accessed. These modes are as follows:

Inherent Addressing

In inherent addressing mode, all of the information is contained in the instruction byte. The operands (if any) are registers and no memory reference is required. These are one or two byte instructions. Here is a list of the inherent instructions:

Mathematical Operations
  CLRA Clear A
  CLRX Clear X
  CLRH Clear H
  COMA 1's Complement A
  COMX 1's Complement X
  DAA Decimal Adjust A
  DECA Decrement A
  DECX Decrement X
  INCA Increment A
  INCX Increment X
  NEGA 2's Complement A
  NEGX 2's Complement X
  NSA Nibble Swap A
  TSTA Test A
  TSTX Test X
Shift Operations
  ASLA Arithmetic Shift Left A
  ASLX Arithmetic Shift Left X
  ASRA Arithmetic Shift Right A
  ASRX Arithmetic Shift Right X
  LSLA Logical Shift Left A (same as ASLA)
  LSLX Logical Shift Left X (same as ASLX)
  LSRA Logical Shift Right A
  LSRX Logical Shift Right X
  ROLA Rotate Left A through Carry
  ROLX Rotate Left X through Carry
  RORA Rotate Right A through Carry
  RORX Rotate Right X through Carry
Inter-Register Operations
  TAX Transfer A to X
  TXA  Transfer X to A 
  TAP  Transfer A to Condition Code Register 
  TPA  Transfer Condition Code Register to A 
  TSX  Transfer (SP) + 1 to H:X
  TXS  Transfer (H:X) -1 to SP
  PSHA  Push A onto Stack
  PSHX  Push X onto Stack
  PSHH  Push H onto Stack
  PULA  Pull A from Stack
  PULX  Pull X from Stack
  PULH  Pull H from Stack
Flag Operations
  CLC  Clear Carry
  CLI  Clear Interrupt Mask 
  SEC  Set Carry 
  SEI  Set Interrupt Mask 
Miscellaneous Operations
  BGND Enter background debug mode
  DIV Unsigned Divide, (H:A) / (X) => A, remainder => H
  MUL Unsigned Multiply, (X) × (A) => (X:A)
  NOP  No Operation
  RTI  Return from Interrupt
  RTS  Return from Subroutine
  STOP Stop Microprocessor
  SWI Software Interrupt
  WAIT Wait for Interrupt
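The register usage of MUL and DIV in the list above can be modelled in C as follows. This is a sketch of the arithmetic only. The ok field is an assumption made for this example to mark the cases where the quotient does not fit in 8 bits or the divisor is zero; consult the instruction set summary for how the CPU itself reports these cases.

```c
#include <stdint.h>

/* MUL: unsigned 8 x 8 multiply; the 16-bit product corresponds to X:A. */
uint16_t mul8(uint8_t x, uint8_t a)
{
    return (uint16_t)((unsigned)x * (unsigned)a);
}

typedef struct {
    uint8_t quotient;    /* ends up in A */
    uint8_t remainder;   /* ends up in H */
    int ok;              /* 0 if X is zero or the quotient needs more than 8 bits */
} div_result;

/* DIV: the 16-bit dividend H:A divided by the 8-bit divisor X. */
div_result div16by8(uint8_t h, uint8_t a, uint8_t x)
{
    div_result r = {0, 0, 0};
    unsigned dividend = ((unsigned)h << 8) | a;
    if (x == 0 || dividend / x > 0xFF) {
        return r;                        /* result cannot be represented */
    }
    r.quotient = (uint8_t)(dividend / x);
    r.remainder = (uint8_t)(dividend % x);
    r.ok = 1;
    return r;
}
```

For example, with H:A = $0100 (256) and X = 10, the quotient is 25 and the remainder is 6.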

Memory Operations

The HC08 is capable of performing arithmetic and logical operations on memory locations as well as the A and X registers. Thus the programming model is not restricted to just 8-bit entities. This makes implementation of multiple-precision arithmetic possible.

Some 16-bit operations are simplified using the H:X register combination. When performing 16-bit memory transfers, big-endian byte order is used. That is, bytes are stored with the high byte appearing first, followed by the low byte. This is opposite to Intel processors, which use little-endian byte order.
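Big-endian storage of a 16-bit value can be sketched in C as below. The function names are made up for illustration, and the caller is assumed to supply a buffer that covers both addr and addr + 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 16-bit value in big-endian order: high byte at addr,
   low byte at addr + 1. */
void store16_be(uint8_t *mem, size_t addr, uint16_t value)
{
    mem[addr] = (uint8_t)(value >> 8);
    mem[addr + 1] = (uint8_t)(value & 0xFF);
}

/* Load a 16-bit value stored in big-endian order. */
uint16_t load16_be(const uint8_t *mem, size_t addr)
{
    return (uint16_t)(((unsigned)mem[addr] << 8) | mem[addr + 1]);
}
```

Storing $1234 at address $10 places $12 at $10 and $34 at $11.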

Immediate Addressing

In the immediate addressing mode, the actual argument is contained in the one or two bytes immediately following the instruction byte, where the number of bytes must match the size of the register being used. Thus the actual constant value is stored as part of the sequence of bytes that make up the instruction. This mode is selected when the # symbol precedes the argument. Examples:

  LDA #23    
  LDHX #ONE    
  AND #$F0    

Direct Addressing

In the direct addressing mode (also called page zero addressing) a single byte is used to specify the least significant byte of the 16-bit memory address of the parameter to be accessed. The MSB of this effective address is assumed to be $00. Therefore only addresses $0000 to $00FF are accessible using direct addressing. Instructions using direct addressing are typically two-byte instructions, one byte shorter than their extended addressing equivalents, and therefore make more efficient use of machine cycles and memory space. Examples:

  STA PTAD    
  LSR NUM    

Extended Addressing

In the extended addressing mode, two bytes are required to specify the full 16-bit effective address of the parameter to be referenced. Hence the full range of address from $0000 to $FFFF can be specified. Examples:

  STA SOPT1    
  LDHX table    
  JSR output    

Indexed Addressing

In indexed addressing mode, the 16-bit H:X index register is used in calculating the effective address. The address contained in the index register H:X can be used as is or an 8-bit or 16-bit unsigned offset can be added to the contents of H:X to form the effective address. Indexed addressing can also be performed using the stack pointer SP as the index register.

  LDA ,X    
  ADD 3,X    
  LSL NAME,X    
  COM 4,SP    
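The effective address calculation for H:X indexed addressing can be sketched in C. An offset of 0 corresponds to the ",X" form. The function name is made up for illustration, and the sum is assumed to wrap around at 16 bits.

```c
#include <stdint.h>

/* Effective address for indexed addressing: H:X plus an unsigned offset.
   The result is truncated to 16 bits (assumed wrap-around). */
uint16_t indexed_ea(uint8_t h, uint8_t x, uint16_t offset)
{
    unsigned hx = ((unsigned)h << 8) | x;   /* combine H (high) and X (low) */
    return (uint16_t)(hx + offset);
}
```

With H:X = $0080, the operand of ADD 3,X is therefore fetched from address $0083.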

Indirect Addressing

The HC08 does not have an indirect addressing mode. This information is included to round out the discussion on memory addressing modes. Indirect addressing is one of the most powerful and sometimes confusing features available on most computers and yet the concept is fairly simple. With direct (as well as extended) addressing the effective address is specified in the instruction bytes. In indirect addressing mode, the memory location specified by the instruction bytes contains the effective address of the parameter.

In many programming operations, we are not so much concerned about the actual contents of a variable but more about the location of the variable. That is, many times our focus is on the address of a variable and how to manipulate this address. In high level languages such as Pascal and C, structures and pointers rely heavily on the use of indirect addressing. On the HC08, indexed addressing mode is used to implement indirect addressing.
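As a sketch of this idea in C, the function below walks through a table using a pointer. The pointer holds an address rather than a value, much as H:X does in indexed addressing. The comments relating each line to an HC08 instruction are a loose analogy, not the output of any particular compiler.

```c
#include <stdint.h>

/* Sums `count` bytes starting at `table`. The pointer plays the role of
   the index register H:X: it holds an address, and each access goes
   through that address. */
unsigned sum_bytes(const uint8_t *table, unsigned count)
{
    unsigned total = 0;
    while (count-- > 0) {
        total += *table;   /* like ADD ,X : operand is at the address held in the pointer */
        table++;           /* like AIX #1 : advance to the next element */
    }
    return total;
}
```

The same function works for any table because only the address passed in changes, not the code.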

Memory-Accumulator Operations
  ADC Add with Carry to A
  ADD Add Memory to A
  AND AND A with Memory
  BIT Bit Test A with Memory
  CMP Compare A with Memory
  CPHX Compare H:X with 16 bits in Memory
  CPX Compare X with Memory
  EOR Exclusive OR A with Memory
  LDA Load Accumulator A with Memory
  LDHX Load Index Register H:X with 16 bits
  LDX Load Index Register X with 8 bits
  ORA OR A with Memory
  SBC Subtract with Carry from A
  STA Store Accumulator A
  STHX Store H:X to Memory
  STX Store Index Register X
  SUB Subtract Memory from A
Memory-Only Operations
  ASL Arithmetic Shift Left
  ASR Arithmetic Shift Right
  CLR Clear Memory
  COM 1's Complement 
  DEC Decrement Memory 
  INC Increment Memory 
  LSL Logical Shift Left (same as ASL) 
  LSR Logical Shift Right 
  MOV Move from Memory to Memory
  NEG 2's Complement
  ROL Rotate Left
  ROR Rotate Right 
  TST Test for zero or minus

Relative Addressing

The relative addressing mode is used only for branch instructions. If the branch condition is true, the 8-bit signed integer following the instruction opcode is added to the current contents of the program counter (PC), which by then points to the instruction after the branch, to form the effective branch address. The branch range is therefore -128 to +127 bytes from the next instruction. If the branch is not taken, program execution continues with the next instruction. Examples:

  BNE main    
  BSR putc    
loop BRCLR bit4,flags,loop    
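The branch address calculation can be sketched in C. Here next_pc stands for the address of the instruction following the branch, and rel is the signed offset byte stored in the instruction. The function name and the addresses in the example are made up for illustration.

```c
#include <stdint.h>

/* Target of a relative branch: the address of the next instruction plus
   the sign-extended 8-bit offset. */
uint16_t branch_target(uint16_t next_pc, uint8_t rel)
{
    /* Sign-extend the offset byte: $00-$7F are 0..127, $80-$FF are -128..-1. */
    int offset = (rel & 0x80) ? (int)rel - 256 : (int)rel;
    return (uint16_t)(next_pc + offset);
}
```

For a two-byte branch located at $E000, next_pc is $E002. An offset byte of $FE (-2) then branches back to $E000, i.e. the instruction branches to itself.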

Branching - Unsigned Arithmetic
  BHI Branch if Higher
  BHS Branch if Higher or Same (same as BCC)
  BLO Branch if Lower (same as BCS)
  BLS Branch if Lower or Same
Branching - 2's Complement Signed Arithmetic
  BGE Branch if Greater than or Equal to zero
  BGT Branch if Greater Than zero
  BLE Branch if Less than or Equal to zero
  BLT Branch if Less Than zero
General Branching
  BCC Branch if Carry is Clear
  BCS Branch if Carry is Set
  BEQ Branch if EQual to zero
  BMI Branch if MInus
  BNE Branch if Not Equal to zero
  BPL Branch if PLus
  BRA BRanch Always
  BRN BRanch Never
  BVC Branch if oVerflow is Clear
  BVS Branch if oVerflow is Set
  BSR Branch to SubRoutine
Long Branch
  JMP Jump to new location (16-bit address)
  JSR Jump to SubRoutine (16-bit address)
(Technically speaking, these are not relative branch instructions but are absolute jumps using extended addressing mode. These two instructions are listed here to complete the list of branch instructions.)

Bit Operations

Another feature of the HC08 is the ability to set or clear any individual bit of RAM or I/O register using the BSET and BCLR instructions. Program branching can also take place based on the value of any specified bit using the BRSET and BRCLR instructions.

Bit Set/Clear
  BSET n,dir Bit Set
  BCLR n,dir Bit Clear
Branching if Bit Set/Clear
  BRSET n,dir,rel Branch if bit n is set
  BRCLR n,dir,rel Branch if bit n is clear


  BSET 0,NUM ;set bit-0 of NUM
  BCLR 7,PTAD ;clear bit-7 of PORTA
  BRSET 2,NUM,MAIN ;branch if bit-2 of NUM is set
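In C the same effect is obtained with bit masks. The sketch below shows equivalents of BSET, BCLR and the test performed by BRSET/BRCLR on one byte; the function names are made up for illustration and n is the bit number, 0 to 7.

```c
#include <stdint.h>

/* Set bit n of the byte (compare BSET n,dir). */
void bset(uint8_t *byte, unsigned n)
{
    *byte = (uint8_t)(*byte | (1u << n));
}

/* Clear bit n of the byte (compare BCLR n,dir). */
void bclr(uint8_t *byte, unsigned n)
{
    *byte = (uint8_t)(*byte & ~(1u << n));
}

/* Returns 1 if bit n is set, 0 if clear (the condition tested by BRSET/BRCLR). */
int bit_is_set(uint8_t byte, unsigned n)
{
    return (int)((byte >> n) & 1u);
}
```

On the HC08 each of these is a single instruction, whereas a C compiler for another processor may need a load, a logical operation and a store.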

HC(S)08 Additional Instructions

There are instructions added to the HC08 and HCS08 instruction set which improve on the previous HC05 instruction set. These are primarily instructions added to handle the new H register and instructions added to manage the stack and the stack pointer. Other improvements are the MOV instruction which does not require the A accumulator, compare and branch if equal, CBEQ, and decrement and branch if not zero, DBNZ.

  MOV #imm,dir Move immediate to direct location
  MOV dir,dir Move direct to direct
  MOV dir,X+ Move direct to location addressed by H:X and increment H:X
  MOV X+,dir Move byte from location addressed by H:X to direct and increment H:X
Compare and Branch if Equal
  CBEQA #imm,rel Branch if A = imm
  CBEQX #imm,rel Branch if X = imm
  CBEQ dir,rel Branch if A = byte at direct location
  CBEQ X+,rel Branch if A = byte at H:X, post increment H:X
  CBEQ disp,X+,rel Branch if A = byte at (H:X + disp), post increment H:X
  CBEQ disp,SP,rel Branch if A = byte at (SP + disp)
Decrement and Branch if Not Zero
  DBNZA rel Decrement A and branch if A is not zero
  DBNZX rel Decrement X and branch if X is not zero
  DBNZ dir,rel Decrement byte at location dir and branch if not zero
  DBNZ X,rel Decrement byte at H:X and branch if not zero
  DBNZ disp,X,rel Decrement byte at (H:X + disp) and branch if not zero
  DBNZ disp,SP,rel Decrement byte at (SP + disp) and branch if not zero
Load H:X (big-endian byte order, i.e. H first)
  LDHX #imm Load H:X with immediate 16 bits
  LDHX mem Load H with (mem) and X with (mem + 1)
  LDHX ,X Load H:X with 16 bits from memory at (H:X)
  LDHX disp,X Load H:X with 16 bits from (H:X + disp)
  LDHX disp8,SP Load H:X with 16 bits from (SP + disp8)
Store H:X (big-endian byte order, i.e. H first)
  STHX mem Store H into (mem) and X into (mem + 1)
  STHX disp8,SP Store H:X into (SP + disp8)



  AIS #imm Add 8-bit signed value to SP
  AIX #imm Add 8-bit signed value to H:X
  BGND Enter background debug mode (if ENBDM = 1 in BDC control register)
  CLRH Clear H
  DAA Decimal Adjust Accumulator
  DIV Divide (H:A) by X, result => A, remainder => H
  MUL Multiply A by X, result => X:A
  NSA Nibble Swap Accumulator
  PSHA Push A onto stack
  PSHH Push H onto stack
  PSHX Push X onto stack
  PULA Pull A from stack
  PULH Pull H from stack
  PULX Pull X from stack
  RSP Reset Stack Pointer, SP <= $FF

Instruction Opcode Map

What is an opcode map? Opcode refers to the machine code for each operation or instruction. For example, the opcode for the CLRA instruction is $4F. Since the HCS08 is based on an 8-bit opcode, theoretically, there are 256 possible instructions. The opcode or first byte defines the instruction and dictates how many additional bytes are required for the complete instruction. An opcode map is a 16 x 16 table showing all possible 256 instructions ordered according to the actual opcode in hexadecimal representation.

In practice, when there are more than 256 instructions, the designer of the MCU uses one or more codes out of the set of 256 to create an extension, escape code, or page-2 set of instructions. The HCS08 uses opcode $9E as the page-2 identifier. For efficiency reasons, extensions or page-2 instructions tend to be less often used instructions.

For your convenience, the opcode map also shows the number of bytes, the number of machine cycles and the addressing mode along with the opcode in hexadecimal representation.

HCS08 Opcode Map


Basic Syntax Rules

In assembler, instruction mnemonics are not case sensitive. However, user names and labels are case sensitive in both assembler and C. In assembler, comments may begin with a semicolon or //. In C, single-line comments begin with //. A block of code or text can be turned into a comment by enclosing it in /* */.

In C, hexadecimal notation is preceded by 0x. In assembler, hexadecimal notation is preceded by 0x or $.


  // you can embed assembler code in a C procedure using the asm { } structure
  asm {
    BSET 0,NUM ; this is a comment in assembler
    lda $E3 // instructions are not case sensitive
  }
  // this is a comment in C
    SOPT1 = 0x52;
  /*
    In C you can comment out
    a block of code or text
  */


MC9S08QG8 Getting Started

MC9S08QG8 Resources

MC9S08QG8 Instruction Set Summary

HCS08 Opcode Map

MC9S08QG8 Data Manual, 300 pages, 3Mb pdf
