## Alpha 21164 Microprocessor

### Hardware Reference Manual

Order Number: EC-QAEQA-TE

Revision/Update Information: This is a preliminary version.

Sept94. From John Edmondson: This is the EV5 internal spec, turned into the HRM by tech writers. It may have some bugs but not major ones. (JE)

Digital Equipment Corporation
Maynard, Massachusetts

#### Preliminary, September 1994

Possession, use, or copying of the software described in this publication is authorized only pursuant to a valid written license from Digital or an authorized sublicensor.

While Digital believes the information included in this publication is correct as of the date of publication, it is subject to change without notice.

Digital Equipment Corporation makes no representations that the use of its products in the manner described in this publication will not infringe on existing or future patent rights, nor do the descriptions contained in this publication imply the granting of licenses to make, use, or sell equipment or software in accordance with the description.

© Digital Equipment Corporation 1994. All Rights Reserved. Printed in U.S.A.

The following are trademarks of Digital Equipment Corporation: Alpha AXP, AlphaGeneration, AXP, DEC, DECchip, Digital, OpenVMS, VAX, VAX DOCUMENT, the AlphaGeneration design mark, and the DIGITAL logo.

Hewlett-Packard is a registered trademark of Hewlett-Packard Company. IEEE is a registered trademark of The Institute of Electrical and Electronics Engineers, Inc. OSF/1 is a registered trademark of Open Software Foundation. Prentice Hall is a registered trademark of Prentice-Hall, Inc. of Englewood Cliffs, NJ. Windows NT is a registered trademark of Microsoft Corporation.

All other trademarks and registered trademarks are the property of their respective holders.

This document was prepared using VAX DOCUMENT Version 2.1.
# Contents

| Section | Title |
|---|---|
| | **Preface** |
| **1** | **Introduction** |
| 1.1 | The Architecture |
| 1.1.1 | Addressing |
| 1.1.2 | Integer Data Types |
| 1.1.3 | Floating-Point Data Types |
| 1.2 | Alpha 21164 Microprocessor Features |
| **2** | **Internal Architecture** |
| 2.1 | Alpha 21164 Microarchitecture |
| 2.1.1 | Instruction Fetch and Decode Unit |
| 2.1.1.1 | Instruction Decode and Issue |
| 2.1.1.2 | Instruction Prefetch |
| 2.1.1.3 | Branch Execution |
| 2.1.1.4 | Instruction Translation Buffer |
| 2.1.1.5 | Interrupts |
| 2.1.2 | Integer Execution Unit |
| 2.1.3 | Floating-Point Execution Unit |
| 2.1.4 | Memory Address Translation Unit |
| 2.1.4.1 | Data Translation Buffer |
| 2.1.4.2 | |
| 2.1.4.3 | |
| 2.1.4.4 | Write Buffer |
| 2.1.5 | Cache Control and Bus Interface Unit |
| 2.1.6 | Cache Organization |
| 2.1.6.1 | Data Cache |
| 2.1.6.2 | |
| 2.1.6.3 | |
| 2.1.6.4 | |
| 2.1.7 | Serial Read-Only Memory Interface |
| 2.2 | Pipeline Organization |
| 2.2.1 | Pipeline Stages and Instruction Issue |
| 2.2.2 | Aborts and Exceptions |
| 2.2.3 | Nonissue Conditions |
| 2.3 | Scheduling and Issuing Rules |
| 2.3.1 | Instruction Class Definition and Instruction Slotting |
| 2.3.2 | Coding Guidelines |
| 2.3.3 | Instruction Latencies |
| 2.3.3.1 | Producer-Producer Latency |
| 2.3.4 | Issue Rules |
| 2.4 | Replay Traps |
| 2.5 | Miss Address File and Load-Merging Rules |
| 2.5.1 | Merging Rules |
| 2.5.2 | Read Requests to the Cbox |
| 2.5.3 | Load Instructions to Noncacheable Space |
| 2.5.4 | MAF Entries and MAF Full Conditions |
| 2.5.5 | Fill Operation |
| 2.6 | Mbox Store Instruction Execution |
| 2.7 | Write Buffer and the WMB Instruction |
| 2.7.1 | The Write Buffer |
| 2.7.2 | The WMB Instruction |
| 2.7.3 | Entry Pointer Queues |
| 2.7.4 | Write-Buffer Entry Processing |
| 2.7.5 | Ordering of Noncacheable Space Write Instructions |
| 2.8 | Performance Measurement Support-Performance Counters |
| 2.9 | Floating-Point Control Register |
| 2.10 | Design Examples |
| **3** | **Hardware Interface** |
| 3.1 | Alpha 21164 Microprocessor Logic Symbol |
| 3.2 | Alpha 21164 Signal Names and Functions |
| **4** | **Clocks, Cache, and External Interface Functional Description** |
| 4.1 | Introduction to the External Interface |
| 4.1.1 | System Interface |
| 4.1.1.1 | |
| 4.1.2 | Bcache Interface |
| 4.1.2.1 | |
| 4.2 | Clocks |
| 4.2.1 | CPU Clock |
| 4.2.2 | System Clock |
| 4.2.3 | Delayed System Clock |
| 4.2.4 | Reference Clock |
| 4.2.4.1 | Reference Clock Examples |
| 4.2.4.1. | Case 1: ref_clk_in_h Initially Sampled Low by DPLL |
| 4.2.4.1. | Case 2: ref_clk_in_h Initially Sampled High by DPLL |
| 4.3 | Physical Address Considerations |
| 4.3.1 | Physical Address Regions |
| 4.3.2 | Data Wrapping |
| 4.3.3 | Noncached Read Operations |
| 4.3.4 | Noncached Write Operations |
| 4.4 | Bcache Structure |
| 4.4.1 | Duplicate Tag Store |
| 4.4.1.1 | Full Duplicate Tag Store |
| 4.4.1.2 | Partial Duplicate Tag Store |
| 4.5 | Cache Coherency |
| 4.5.1 | Cache Coherency Basics |
| 4.5.2 | Write Invalidate Cache Coherency Protocol Systems |
| 4.5.3 | Write Invalidate Cache Coherency States |
| 4.5.3.1 | Write Invalidate Protocol State Machines |
| 4.5.4 | Flush Cache Coherency Protocol Systems |
| 4.5.5 | Flush-Based Protocol State Machines |
| 4.5.6 | Cache Coherency Transaction Conflicts |
| 4.5.6.1 | Case 1 |
| 4.5.6.2 | Case 2 |
| 4.6 | Locks Mechanisms |
| 4.7 | 21164-to-Bcache Transactions |
| 4.7.1 | Bcache Timing |
| 4.7.2 | Bcache Read Transaction (Private Read Operation) |
| 4.7.3 | Wave Pipeline |
| 4.7.4 | Bcache Write Transaction (Private Write Operation) |
| 4.7.5 | Selecting Bcache Options |
| 4.8 | 21164-Initiated System Transactions |
| 4.8.1 | READ MISS—No Bcache |
| 4.8.2 | READ MISS and FILL |
| 4.8.2.1 | READ MISS |
| 4.8.2.2 | FILL |
| 4.8.3 | READ MISS with Victim |
| 4.8.3.1 | READ MISS with Victim (Victim Buffer) |
| 4.8.3.2 | READ MISS with Victim (Without Victim Buffer) |
| 4.8.4 | WRITE BLOCK and WRITE BLOCK LOCK |
| 4.8.5 | SET DIRTY and LOCK |
| 4.8.5.1 | When to Use a SET DIRTY and LOCK |
| 4.8.6 | Memory Barrier (MB) |
| 4.8.6.1 | When to Use a MEMORY BARRIER Command |
| 4.8.7 | FETCH |
| 4.8.8 | FETCH_M |
| 4.9 | System-Initiated Transactions |
| 4.9.1 | Sending Commands to the 21164 |
| 4.9.2 | Write Invalidate Protocol Commands |
| 4.9.2.1 | 21164 Responses to Write Invalidate Protocol Commands |
| 4.9.2.2 | READ DIRTY and READ DIRTY/INVALIDATE |
| 4.9.2.3 | INVALIDATE |
| 4.9.2.4 | SET SHARED |
| 4.9.3 | Flush-Based Cache Coherency Protocol Commands |
| 4.9.3.1 | 21164 Responses to Flush-Based Protocol Commands |
| 4.9.3.2 | FLUSH |
| 4.9.3.3 | READ |
| 4.10 | Data Bus and Command/Address Bus Contention |
| 4.10.1 | Command/Address Bus |
| 4.10.2 | Read/Write Spacing—Data Bus Contention |
| 4.10.3 | Using idle_bc_h and fill_h |
| 4.10.4 | Using data_bus_req_h |
| 4.10.5 | Tristate Overlap |
| 4.10.5.1 | READ or WRITE to FILL |
| 4.10.5.2 | BCACHE VICTIM to FILL |
| 4.10.5.3 | System Bcache Command to FILL |
| 4.10.5.4 | FILL to Private Read or Write Operation |
| 4.11 | 21164 Interface Restrictions |
| 4.11.1 | FILL Operations after Other Transactions |
| 4.11.2 | Command Acknowledge for WRITE BLOCK Commands |
| 4.11.3 | Systems Without a Bcache |
| 4.11.4 | WRITE BLOCK LOCK |
| 4.12 | 21164/System Race Conditions |
| 4.12.1 | Rules for 21164 and System Use of External Interface |
| 4.12.2 | READ MISS with Victim Example |
| 4.12.3 | idle_bc_h and cack_h Race Example |
| 4.12.4 | READ MISS with idle_bc_h Asserted Example |
| 4.12.5 | READ MISS with Victim Abort Example |
| 4.12.6 | Bcache Hit Under READ MISS Example |
| 4.13 | Data Integrity, Bcache Errors, and Command/Address Errors |
| 4.13.1 | Data ECC and Parity |
| 4.13.2 | Force Correction |
| 4.13.3 | Bcache Tag Data Parity |
| 4.13.4 | Bcache Tag Control Parity |
| 4.13.5 | Address and Command Parity |
| 4.13.6 | Fill Error |
| 4.13.7 | Forcing 21164 Reset |
| 4.14 | Interrupts |
| 4.14.1 | Interrupt Signals During Initialization |
| 4.14.2 | Interrupt Signals During Normal Operation |
| 4.14.3 | Interrupt Priority Level |
| **5** | **Internal Processor Registers** |
| 5.1 | Instruction Fetch/Decode Unit and Branch Unit (Ibox) IPRs |
| 5.1.1 | Istream Translation Buffer Tag Register (ITB_TAG) |
| 5.1.2 | Instruction Translation Buffer Page Table Entry (ITB_PTE) Register |
| 5.1.3 | Instruction Translation Buffer Address Space Number (ITB_ASN) Register |
| 5.1.4 | Instruction Translation Buffer Page Table Entry Temporary (ITB_PTE_TEMP) Register |
| 5.1.5 | Instruction Translation Buffer Invalidate All Process (ITB_IAP) Register |
| 5.1.6 | Instruction Translation Buffer Invalidate All (ITB_IA) Register |
| 5.1.7 | Instruction Translation Buffer IS (ITB_IS) Register |
| 5.1.8 | Formatted Faulting Virtual Address (IFAULT_VA_FORM) Register |
| 5.1.9 | Virtual Page Table Base Register (IVPTBR) |
| 5.1.10 | Icache Parity Error Status (ICPERR_STAT) Register |
| 5.1.11 | |
| 5.1.12 | Exception Address (EXC_ADDR) Register |
| 5.1.13 | Exception Summary (EXC_SUM) Register |
| 5.1.14 | Exception Mask (EXC_MASK) Register |
| 5.1.15 | PAL Base Address (PAL_BASE) Register |
| 5.1.16 | Processor Status (PS) Register |
| 5.1.17 | Ibox Control and Status Register (ICSR) |
| 5.1.18 | Interrupt Priority Level (IPL) Register |
| 5.1.19 | Interrupt ID (INTID) Register |
| 5.1.20 | Asynchronous System Trap Request Register (ASTRR) |
| 5.1.21 | Asynchronous System Trap Enable Register (ASTER) |
| 5.1.22 | Software Interrupt Request Register (SIRR) |
| 5.1.23 | Hardware Interrupt Clear (HWINT_CLR) Register |
| 5.1.24 | Interrupt Summary Register (ISR) |
| 5.1.25 | Serial Line Transmit (SL_XMIT) Register |
| 5.1.26 | Serial Line Receive (SL_RCV) Register |
| 5.1.27 | Performance Counter (PMCTR) Register |
| 5.2 | Memory Address Translation Unit (Mbox) IPRs |
| 5.2.1 | Dstream Translation Buffer Address Space Number (DTB_ASN) Register |
| 5.2.2 | Dstream Translation Buffer Current Mode (DTB_CM) Register |
| 5.2.3 | Dstream Translation Buffer Tag (DTB_TAG) Register |
| 5.2.4 | Dstream Translation Buffer Page Table Entry (DTB_PTE) Register |
| 5.2.5 | Dstream Translation Buffer Page Table Entry Temporary (DTB_PTE_TEMP) Register |
| 5.2.6 | Dstream Memory Management Fault Status (MM_STAT) Register |
| 5.2.7 | Faulting Virtual Address (VA) Register |
| 5.2.8 | Formatted Virtual Address (VA_FORM) Register |
| 5.2.9 | Mbox Virtual Page Table Base Register (MVPTBR) |
| 5.2.10 | Dcache Parity Error Status (DC_PERR_STAT) Register |
| 5.2.11 | Dstream Translation Buffer Invalidate All Process (DTBIAP) Register |
| 5.2.12 | Dstream Translation Buffer Invalidate All (DTBIA) Register |
| 5.2.13 | Dstream Translation Buffer Invalidate Single (DTBIS) Register |
| 5.2.14 | Mbox Control Register (MCSR) |
| 5.2.15 | Dcache Mode (DC_MODE) Register |
| 5.2.16 | Miss Address File Mode (MAF_MODE) Register |
| 5.2.17 | Dcache Flush (DC_FLUSH) Register |
| 5.2.18 | Alternate Mode (ALT_MODE) Register |
| 5.2.19 | Cycle Counter (CC) Register |
| 5.2.20 | Cycle Counter Control (CC_CTL) Register |
| 5.2.21 | Dcache Test Tag Control (DC_TEST_CTL) Register |
| 5.2.22 | Dcache Test Tag (DC_TEST_TAG) Register |
| 5.2.23 | Dcache Test Tag Temporary (DC_TEST_TAG_TEMP) Register |
| 5.3 | External Interface Control (Cbox) IPRs |
| 5.3.1 | Scache Control (SC_CTL) Register |
| 5.3.2 | Scache Status (SC_STAT) Register |
| 5.3.3 | Scache Address (SC_ADDR) Register |
| 5.3.4 | Bcache Control (BC_CONTROL) Register |
| 5.3.5 | Bcache Configuration (BC_CONFIG) Register |
| 5.3.6 | Bcache Tag Address (BC_TAG_ADDR) Register |
| 5.3.7 | External Interface Status (EI_STAT) Register |
| 5.3.8 | External Interface Address (EI_ADDR) Register |
| 5.3.9 | Fill Syndrome (FILL_SYN) Register |
| 5.4 | PAL Storage Registers |
| 5.5 | PAL Storage Registers |
| 5.5.1 | Cbox IPR PAL Restrictions |
| 5.5.2 | PAL Restrictions-Instruction Definitions |
| **6** | **Privileged Architecture Library Code** |
| 6.1 | PALcode Description |
| 6.2 | PALmode Environment |
| 6.3 | Invoking PALcode |
| 6.4 | PALcode Entry Points |
| 6.4.1 | CALL_PAL Entry |
| 6.4.2 | PALcode Trap Entry Points |
| 6.5 | Required PALcode Function Codes |
| 6.6 | Alpha 21164 Implementation of the Architecturally Reserved Opcodes Instructions |
| 6.6.1 | HW_LD Instruction |
| 6.6.2 | HW_ST Instruction |
| 6.6.3 | HW_REI Instruction |
| 6.6.4 | HW_MFPR and HW_MTPR Instructions |
| **7** | **Initialization and Configuration** |
| 7.1 | Input Signals sys_reset_l and dc_ok_h and Booting |
| 7.1.1 | Power-Up Requirements |
| 7.1.2 | Pin State with dc_ok_h Not Asserted |
| 7.2 | Sysclk Ratio and Delay |
| 7.3 | Built-In Self-Test (BiSt) |
| 7.4 | |
| 7.5 | Serial Terminal Port |
| 7.6 | Cache Initialization |
| 7.6.1 | Icache Initialization |
| 7.6.2 | Flushing Dirty Blocks |
| 7.7 | |
| 7.8 | Internal Processor Register Reset State |
| 7.9 | Timeout Reset |
| 7.10 | IEEE 1149.1 Test Port Reset |
| **8** | **Error Detection and Error Handling** |
| 8.1 | Error Flows |
| 8.1.1 | Icache Data or Tag Parity Error |
| 8.1.2 | Scache Data Parity Error—Istream |
| 8.1.3 | Scache Tag Parity Error—Istream |
| 8.1.4 | Scache Data Parity Error—Dstream Read/Write, READ_DIRTY |
| 8.1.5 | Scache Tag Parity Error—Dstream or System Commands |
| 8.1.6 | Dcache Data Parity Error |
| 8.1.7 | Dcache Tag Parity Error |
| 8.1.8 | Istream Uncorrectable ECC or Data Parity Errors (Bcache or Memory) |
| 8.1.9 | Dstream Uncorrectable ECC or Data Parity Errors (Bcache or Memory) |
| 8.1.10 | Bcache Tag Parity Errors—Istream |
| 8.1.11 | Bcache Tag Parity Errors—Dstream |
| 8.1.12 | System Command/Address Parity Error |
| 8.1.13 | System Read Operations of the Bcache |
| 8.1.14 | Istream or Dstream Correctable ECC Error (Bcache or Memory) |
| 8.1.15 | Fill Timeout (FILL_ERROR_H) |
| 8.1.16 | System Machine Check |
| 8.1.17 | System Machine Check |
| 8.1.18 | cfail_h and Not cack_h |
| 8.2 | MCHK Flow |
| 8.3 | Processor-Correctable Error Interrupt Flow (IPL 31) |
| 8.4 | MCK_INTERRUPT Flow |
| 8.5 | System-Correctable Error Interrupt Flow (IPL 20) |
| **9** | **Electrical Data** |
| 9.1 | Electrical Characteristics |
| 9.2 | dc Characteristics |
| 9.2.1 | Power Supply |
| 9.2.2 | Input Signal Pins |
| 9.2.3 | Output Signal Pins |
| 9.3 | ac Characteristics |
| 9.3.1 | Clocking Scheme |
| 9.3.2 | Input Clocks |
| 9.3.2.1 | |
| 9.3.2.2 | |
| 9.3.3 | Signal Characteristics |
| 9.3.4 | Backup Cache Loop Timing |
| 9.3.4.1 | sys_clk-Based Systems |
| 9.3.4.2 | Reference Clocks |
| 9.3.4.3 | Digital Phase Locked Loop |
| 9.3.4.4 | Timing—Additional Signals |
| 9.3.5 | Clock Test Modes |
| 9.3.5.1 | Normal Mode |
| 9.3.5.2 | Chip Test Mode |
| 9.3.5.3 | Module Test Mode |
| 9.3.5.4 | Clock Test Reset Mode |
| 9.3.6 | Test Configuration |
| 9.3.7 | IEEE 1149.1 Performance |
| 9.4 | Power Supply Considerations |
| 9.4.1 | Decoupling |
| 9.4.2 | Power Supply Sequencing |
| **10** | **Thermal Management** |
| 10.1 | Thermal Specifications |
| 10.1.1 | Operating Temperature |
| 10.1.2 | Thermal Resistance |
| 10.2 | Heat Sink Specifications |
| 10.3 | Thermal Design Considerations |
| **11** | **Mechanical Data and Packaging Information** |
| 11.1 | Mechanical Specifications |
| 11.2 | Signal Descriptions and Pin Assignment |
| 11.2.1 | Signal Pin Lists |
| 11.2.2 | Pin Assignment |
| **12** | **Testability and Diagnostics** |
| 12.1 | Test Port Pins |
| 12.2 | Test Interface |
| 12.2.1 | SROM Port |
| 12.2.2 | Serial Terminal Port |
| 12.2.3 | IEEE 1149.1 Test Access Port |
| 12.2.4 | Test Status Pins |
| 12.3 | Serial Instruction Cache Load Operation |
| 12.4 | Boundary Scan Register |
| 12.5 | Timing of Test Features |
| 12.5.1 | Icache BiSt Operation Timing |
| 12.5.2 | Automatic SROM Load Timing |
| **A** | **Alpha AXP Instruction Set** |
| A.1 | Alpha AXP Instruction Summary |
| A.1.1 | Opcodes Reserved for Digital |
| A.1.2 | Opcodes Reserved for PALcode |
| A.2 | IEEE Floating-Point Instructions |
| A.3 | VAX Floating-Point Instructions |
| A.4 | Opcode Summary |
| A.5 | Required PALcode Function Codes |
| A.6 | Alpha 21164 Microprocessor IEEE Floating-Point Conformance |
| **B** | **Alpha 21164 Microprocessor Specifications** |
| **C** | **Errata Sheet** |
| **D** | **Technical Support, Ordering, and Associated Literature** |
| D.1 | Calling the Semiconductor Information Line for Information and Technical Support |
| D.2 | Ordering Digital Semiconductor Products |
| D.3 | Ordering Digital Semiconductor Sample Kits |
| D.4 | Ordering Associated Digital Semiconductor Literature |
| D.5 | Ordering Associated Third-Party Literature |
| | **Glossary** |
| | **Index** |

## Figures

| Figure | Title |
|---|---|
| 2–1 | Alpha 21164 Microprocessor Block Diagram |
| 2–2 | Instruction Pipeline Stages |
| 2–3 | Floating-Point Control Register (FPCR) Format |
| 2–4 | Typical Uniprocessor Configuration |
| 2–5 | Typical Multiprocessor Configuration |
| 2–6 | Cacheless Multiprocessor Configuration |
| 3–1 | Alpha 21164 Microprocessor Logic Symbol |
| 4–1 | Alpha 21164 System/Bcache Interface |
| 4–2 | Clock Signals and Functions |
| 4–3 | Alpha 21164 Uniprocessor Clock |
| 4–4 | Alpha 21164 Reference Clock for Multiprocessor Systems |
| 4–5 | ref_clk_in_h Initially Sampled Low |
| 4–6 | ref_clk_in_h Initially Sampled High |
| 4–7 | Full Scache Duplicate Tag Store |
| 4–8 | Duplicate Tag Store Algorithm |
| 4–9 | Partial Duplicate Tag Store |
| 4–10 | Cache Subset Hierarchy |
| 4–11 | Write Invalidate Protocol 21164 States |
| 4–12 | Write Invalidate Protocol System/Bus States |
| 4–13 | Flush-Based Protocol 21164 States |
| 4–14 | Flush-Based Protocol System/Bus States |
| 4–15 | Bcache Read Transaction |
| 4–16 | Wave Pipeline Timing Diagram |
| 4–17 | Bcache Write Transaction |
| 4–18 | READ MISS—No Bcache Timing Diagram |
| 4–19 | READ MISS Timing Diagram |
| 4–20 | READ MISS with Victim (Victim Buffer) Timing Diagram |
| 4–21 | READ MISS with Victim (without Victim Buffer) Timing Diagram |
| 4–22 | WRITE BLOCK Timing Diagram |
| 4–23 | SET DIRTY and LOCK Timing Diagram |
| 4–24 | Algorithm for System Sending Commands to the 21164 |
| 4–25 | READ DIRTY Timing Diagram (Scache Hit) |
| 4–26 | INVALIDATE Timing Diagram |
| 4–27 | SET SHARED Timing Diagram |
| 4–28 | FLUSH Timing Diagram (Scache Hit) |
| 4–29 | READ Timing Diagram (Scache Hit) |
| 4–30 | Driving the Command/Address Bus |
| 4–31 | Example of Using idle_bc_h and fill_h |
| 4–32 | Using data_bus_req_h |
| 4–33 | READ MISS Completed First—Victim Buffer |
| 4–34 | READ MISS Second—No Victim Buffer |
| 4–35 | System Command to FILL Example 1 |
| 4–36 | System Command to FILL Example 2 |
| 4–37 | FILL to Private Read or Write |
| 4–38 | READ MISS with Victim Example |
| 4–39 | idle_bc_h and cack_h Race Example |
| 4–40 | READ MISS with idle_bc_h Asserted Example |
| 4–41 | READ MISS with Victim Abort Example |
| 4–42 | Bcache Hit Under READ MISS Example |
| 4–43 | ECC Code |
| 4–44 | Alpha 21164 Interrupt Signals |
| 5–1 | Istream Translation Buffer Tag Register (ITB_TAG) |
| 5–2 | Instruction Translation Buffer Page Table Entry (ITB_PTE) Register Write Format |
| 5–3 | Instruction Translation Buffer Page Table Entry (ITB_PTE) Register Read Format |
| 5–4 | Instruction Translation Buffer Address Space Number (ITB_ASN) Register |
| 5–5 | Instruction Translation Buffer IS (ITB_IS) Register |
| 5–6 | Formatted Faulting Virtual Address (IFAULT_VA_FORM) Register (NT_Mode=0) |
| 5–7 | Formatted Faulting Virtual Address (IFAULT_VA_FORM) Register (NT_Mode=1) |
| 5–8 | Virtual Page Table Base Register (IVPTBR) (NT_Mode=0) |
| 5–9 | Virtual Page Table Base Register (IVPTBR) (NT_Mode=1) |
| 5–10 | Icache Parity Error Status (ICPERR_STAT) Register |
| 5–11 | Exception Address (EXC_ADDR) Register |
| 5–12 | Exception Summary (EXC_SUM) Register |
| 5–13 | Exception Mask (EXC_MASK) Register |
| 5–14 | PAL Base Address (PAL_BASE) Register |
| 5–15 | Processor Status (PS) Register |
| 5–16 | Ibox Control and Status Register (ICSR) |
| 5–17 | Interrupt Priority Level (IPL) Register |
| 5–18 | Interrupt ID (INTID) Register |
| 5–19 | Asynchronous System Trap Request Register (ASTRR) |
| 5–20 | Asynchronous System Trap Enable Register (ASTER) |
| 5–21 | Software Interrupt Request Register (SIRR) |
| 5–22 | Hardware Interrupt Clear (HWINT_CLR) Register |
| 5–23 | Interrupt Summary Register (ISR) |
| 5–24 | Serial Line Transmit (SL_XMIT) Register |
| 5–25 | Serial Line Receive (SL_RCV) Register |
| 5–26 | Performance Counter (PMCTR) Register |
| 5–27 | Dstream Translation Buffer Address Space Number (DTB_ASN) Register |
| 5–28 | Dstream Translation Buffer Current Mode (DTB_CM) Register |
| 5–29 | Dstream Translation Buffer Tag (DTB_TAG) Register |
| 5–30 | Dstream Translation Buffer Page Table Entry (DTB_PTE) Register—Write Format |
| 5–31 | Dstream Translation Buffer Page Table Entry Temporary (DTB_PTE_TEMP) Register |
| 5–32 | Dstream Memory Management Fault Status (MM_STAT) Register |
| 5–33 | Faulting Virtual Address (VA) Register |
| 5–34 | Formatted Virtual Address (VA_FORM) Register (NT_Mode=1) |
| 5–35 | Formatted Virtual Address (VA_FORM) Register (NT_Mode=0) |
| 5–36 | Mbox Virtual Page Table Base Register (MVPTBR) |
| 5–37 | Dcache Parity Error Status (DC_PERR_STAT) Register |
| 5–38 | Dstream Translation Buffer Invalidate Single (DTBIS) Register |
| 5–39 | Mbox Control Register (MCSR) |
| 5–40 | Dcache Mode (DC_MODE) Register |
| 5–41 | Miss Address File Mode (MAF_MODE) Register |
| 5–42 | Alternate Mode (ALT_MODE) Register |
| 5–43 | Cycle Counter (CC) Register |
| 5–44 | Cycle Counter Control (CC_CTL) Register |
| 5–45 | Dcache Test Tag Control (DC_TEST_CTL) Register |
| 5–46 | Dcache Test Tag (DC_TEST_TAG) Register |
| 5–47 | Dcache Test Tag Temporary (DC_TEST_TAG_TEMP) Register |
| 5–48 | Scache Control (SC_CTL) Register |
| 5–49 | Scache Status (SC_STAT) Register |
| 5–50 | Scache Address (SC_ADDR) Register |
| 5–51 | Bcache Control (BC_CONTROL) Register |
| 5–52 | Bcache Configuration (BC_CONFIG) Register |
| 5–53 | Bcache Tag Address (BC_TAG_ADDR) Register |
| 5–54 | External Interface Status (EI_STAT) Register |
| 5–55 | External Interface Address (EI_ADDR) Register |
| 5–56 | Fill Syndrome (FILL_SYN) Register |
| 6–1 | HW_LD Instruction Format |
| 6–2 | HW_ST Instruction Format |
| 6–3 | HW_REI Instruction Format |
| 6–4 | HW_MFPR and HW_MTPR Instruction Format |
| 9–1 | osc_clk_in_h,l Input Network and Terminations |
| 9–2 | Bcache Timing |
| 9–3 | sys_clk System Timing |
| 9–4 | ref_clk System Timing |
| 10–1 | Type #1 Heat Sink |
| 10–2 | Type #2 Heat Sink |
| 11–1 | Package Dimensions |
| 11–2 | Alpha 21164 Top View (Pin Down) |
| 11–3 | Alpha 21164 Bottom View (Pin Up) |
| 12–1 | IEEE 1149.1 Test Access Port |
| 12–2 | TAP Controller State Machine |
| 12–3 | BiSt Timing Event-Time Line |
| 12–4 | SROM Load Timing Event-Time Line |
| 12–5 | Serial ROM Load Timing |

## Tables

| Table | Title |
|---|---|
| 1 | Register Field Type Notation |
| 2 | Register Field Notation |
| 2–1 | Pipeline Examples—All Cases |
| 2–2 | Pipeline Examples—Integer Add |
| 2–3 | Pipeline Examples—Floating Add |
| 2–4 | Pipeline Examples—Load (Dcache Hit) |
| 2–5 | Pipeline Examples—Load (Dcache Miss) |
| 2–6 | Pipeline Examples—Store (Dcache Hit) |
| 2–7 | Instruction Classes and Slotting |
| 2–8 | Instruction Latencies |
| 2–9 | Floating-Point Control Register Bit Descriptions |
| 3–1 | Alpha 21164 Signal Descriptions |
| 3–2 | Alpha 21164 Signal Descriptions by Function |
| 4–1 | CPU Clock Generation Control |
| 4–2 | System Clock Divisor |
| 4–3 | System Clock Delay |
| 4–4 | Physical Memory Regions |
| 4–5 | Components for 21164 Write Invalidate Systems |
| 4–6 | Bcache States for Cache Coherency Protocols |
| 4–7 | Components for 21164 Flush Cache Protocol Systems |
| 4–8 | Bcache Options |
| 4–9 | 21164-Initiated Interface Commands |
| 4–10 | System-Initiated Interface Commands (Write Invalidate Protocol) |
| 4–11 | 21164 Responses on addr_res_h<1:0> to Write Invalidate Protocol Commands |
| 4–12 | 21164 Responses on addr_res_h<2> to 21164 Commands |
| 4–13 | 21164 Minimum Response Time to Write Invalidate Protocol Commands |
| 4–14 | System-Initiated Interface Commands (Flush Protocol) |
| 4–15 | 21164 Responses to Flush-Based Protocol Commands |
| 4–16 | 21164 Responses on addr_res_h<2> to 21164 Commands |
| 4–17 | Minimum 21164 Response Time to Write Invalidate Protocol Commands |
| 4–18 | Minimum 21164 Response Time to Flush Protocol Commands |
| 4–19 | Data Check Bit Correspondence to CBn |
| 4–20 | Interrupt Priority Level Effect |
| 5–1 | Ibox, Mbox, Dcache, and PALtemp IPR Encodings |
| 5–2 | Granularity Hint Bits in ITB_PTE_TEMP Read Format |
| 5–3 | Icache Parity Error Status Register Fields |
| 5–4 | Exception Summary Register Fields |
| 5–5 | Ibox Control and Status Register Fields |
| 5–6 | Software Interrupt Request Register Fields |
| 5–7 | Hardware Interrupt Clear Register Fields |
| 5–8 | Interrupt Summary Register Fields |
| 5–9 | Serial Line Transmit Register Fields |
| 5–10 | Serial Line Receive Register Fields |
| 5–11 | Performance Counter Register Fields |
| 5–12 | PMCTR Counter Select Options |
| 5–13 | Measurement Mode Control |
| 5–14 | Dstream Memory Management Fault Status Register Fields |
| 5–15 | Formatted Virtual Address Register Fields |
| 5–16 | Dcache Parity Error Status Register Fields |
| 5–17 | Mbox Control Register Fields |
| 5–18 | Dcache Mode Register Fields |
| 5–19 | Miss Address File Mode Register Fields |
| 5–20 | Alternate Mode Register Settings |
| 5–21 | Cycle Counter Control Register Fields |
| 5–22 | Dcache Test Tag Control Register Fields |
| 5–23 | Dcache Test Tag Register Fields |
| 5–24 | Dcache Test Tag Temporary Register Fields |
| 5–25 | Cbox Internal Processor Register Descriptions |
| 5–26 | Scache Control Register Fields |
| 5–27 | Scache Status Register Fields |
| 5–28 | SC_CMD Field Descriptions |
| 5–29 | Scache Address Register Fields |
| 5–30 | Bcache Control Register Fields |
| 5–31 | PM_MUX_SEL Register Fields |
| 5–32 | Bcache Configuration Register Fields |
| 5–33 | Bcache Tag Address Register Fields |
| 5–34 | Loading and Locking Rules for External Interface Registers |
| 5–35 | EI_STAT Register Fields |
| 5–36 | Syndromes for Single-Bit Errors |
| 5–37 | Cbox IPR PAL Restrictions |
| 5–38 | PAL Restrictions Table |
| 6–1 | PALcode Trap Entry Points |
| 6–2 | Required PALcode Function Codes |
| 6–3 | Opcodes Reserved for PALcode |
| 6–4 | HW_LD Format Description |
| 6–5 | HW_ST Format Description |
| 6–6 | HW_REI Format Description |
| 6–7 | HW_MTPR and HW_MFPR Format Description |
| 7–1 | Alpha 21164 Signal Pin Reset State |
| 7–2 | Internal Processor Register Reset State |
| 9–1 | Alpha 21164 Absolute Maximum Ratings |
| 9–2 | CMOS DC Characteristics |
| 9–3 | Input Clock Specification |
| 9–4 | Bcache Loop Timing |
| 9–5 | Output Driver Characteristics |
| 9–6 | Alpha 21164 System Clock Output Timing (sysclk=Ta) |
| 9–7 | Alpha 21164 Reference Clock Input Timing |
| 9–8 | ref_clk System Timing Stages |
| 9–9 | Input Timing for sys_clk_out- or ref_clk_in-Based Systems |
| 9–10 | Output Timing for sys_clk_out- or ref_clk_in-Based Systems |
| 9–11 | Bcache Control Signal Timing |
| 9–12 | Test Modes |
| 9–13 | IEEE 1149.1 Circuit Performance Specifications |
| 10–1 | $\theta_{\rm c}a$ at Various Airflows |
| 10–2 | Maximum $T_a$ at Various Airflows |
| 11–1 | Alphabetic Signal Pin List |
| 12–1 | Alpha 21164 Test Port Pins |
| 12–2 | Compliance Enable Inputs |
| 12–3 | Instruction Register |
| 12–4 | Boundary Scan Register Organization |
| 12–5 | BiSt Timing for Some System Clock Ratios, Port Mode=Normal (System Cycles) |
| 12–6 | BiSt Timing for Some System Clock Ratios, Port Mode=Normal (CPU Cycles) |
| 12–7 | SROM Load Timing for Some System Clock Ratios (System Cycles) |
| 12–8 | SROM Load Timing for Some System Clock Ratios (CPU Cycles) |
| A–1 | Instruction Format and Opcode Notation |
| A–2 | Architecture Instructions |
| A–3 | Opcodes Reserved for Digital |
| A–4 | Opcodes Reserved for PALcode |
| A–5 | IEEE Floating-Point Instruction Function Codes |
| A–6 | VAX Floating-Point Instruction Function Codes |
| A–7 | Opcode Summary |
| A–8 | Required PALcode Function Codes |
| B–1 | Alpha 21164 Microprocessor Specifications |
| C–1 | Document Revision History |

### **Preface**

#### **Audience**

This reference manual is for system designers and programmers who use the Alpha 21164 microprocessor.

#### **Content**

This reference manual contains the following chapters and appendixes:

- Chapter 1 introduces the 21164 and provides an overview of Alpha AXP architecture.
- Chapter 2 describes the major hardware functions and the internal chip architecture. It includes performance measurement, coding rules, and design examples.
- Chapter 3 lists and describes the external hardware interface signals.
- Chapter 4 describes the external bus functions and transactions, lists bus commands, and describes the clock functions.
- Chapter 5 lists and describes the 21164 internal processor register set.
- Chapter 6 describes the privileged architecture library code (PALcode).
- Chapter 7 describes the processes involved in, and states after, initialization and configuration.
- Chapter 8 describes error detection and error handling.
- Chapter 9 provides electrical data and describes signal integrity issues.
- Chapter 10 provides information about thermal management considerations.
- Chapter 11 provides mechanical data and packaging information, including signal pin lists.
- Chapter 12 describes chip and system testability features.
- Appendix A summarizes the Alpha AXP instruction set.
- Appendix B summarizes the 21164 specifications.
- Appendix C lists changes and revisions to this manual.
- Appendix D provides phone numbers for support and lists related Digital publications with order information.
- The Glossary lists and defines terms associated with the 21164.

The companion volume to this manual, the *Alpha Architecture Reference Manual*, contains the Alpha AXP Architecture information.

### **Terminology and Conventions**

The following sections describe the terminology and conventions used in this manual.
#### Numbering All numbers are decimal unless otherwise indicated. Where there is ambiguity, numbers other than decimal are indicated with the name of the base following the number in parentheses, for example FF (hex). #### **Security Holes** Security holes exist when unprivileged software (that is, software running outside of kernel mode) can: - Affect the operation of another process without authorization from the operating system. - Amplify its privilege without authorization from the operating system. - Communicate with another process, either overtly or covertly, without authorization from the operating system. #### UNPREDICTABLE and UNDEFINED Throughout this manual, the terms UNPREDICTABLE and UNDEFINED are used. Their meanings are quite different and must be carefully distinguished. In particular, only privileged software (that is, software running in kernel mode) can trigger UNDEFINED operations. Unprivileged software cannot trigger UNDEFINED operations. However, either privileged or unprivileged software can trigger UNPREDICTABLE results or occurrences. UNPREDICTABLE results or occurrences do not disrupt the basic operation of the processor. The processor continues to execute instructions in its normal manner. In contrast, UNDEFINED operations can halt the processor or cause it to lose information. The terms UNPREDICTABLE and UNDEFINED can be further described as follows: #### **UNPREDICTABLE** - Results or occurrences specified as UNPREDICTABLE may vary from moment to moment, implementation to implementation, and instruction to instruction within implementations. Software can never depend on results specified as UNPREDICTABLE. - An UNPREDICTABLE result may acquire an arbitrary value subject to a few constraints. Such a result may be an arbitrary function of the input operands or of any state information that is accessible to the process in its current access mode. UNPREDICTABLE results may be unchanged from their previous values. 
- Operations that produce UNPREDICTABLE results may also produce exceptions. - An occurrence specified as UNPREDICTABLE may happen or not based on an arbitrary choice function. The choice function is subject to the same constraints as are UNPREDICTABLE results and, in particular, must not constitute a security hole. - Specifically, UNPREDICTABLE results must not depend upon, or be a function of the contents of memory locations or registers that are inaccessible to the current process in the current access mode. Also, operations that may produce UNPREDICTABLE results must not: - Write or modify the contents of memory locations or registers to which the current process in the current access mode does not have access. - Halt or hang the system or any of its components. For example, a security hole would exist if some UNPREDICTABLE result depended on the value of a register in another process, on the contents of processor temporary registers left behind by some previously running process, or on a sequence of actions of different processes. #### UNDEFINED - Operations specified as UNDEFINED may vary from moment to moment, implementation to implementation, and instruction to instruction within implementations. The operation may vary in effect from nothing, to stopping system operation. - UNDEFINED operations may halt the processor or cause it to lose information. However, UNDEFINED operations must not cause the processor to hang, that is, reach an unhalted state from which there is no transition to a normal state in which the machine executes instructions. Only privileged software (that is, software running in kernel mode) may trigger UNDEFINED operations. #### **Data Field Size** The term INTnn, where nn is one of 2, 4, 8, 16, 32, or 64, refers to a data field of nn contiguous naturally aligned bytes. For example, INT4 refers to a naturally aligned longword. #### **Ranges and Extents** Ranges are specified by a pair of numbers separated by three periods (...) 
and are inclusive. For example, a range of integers 0...4 includes the integers 0, 1, 2, 3, and 4. Extents are specified by a pair of numbers in angle brackets separated by a colon (:) and are inclusive. For example, bits <7:3> specify an extent of bits including bits 7, 6, 5, 4, and 3. #### ALIGNED and UNALIGNED In this manual the terms ALIGNED and NATURALLY ALIGNED are used interchangeably to refer to data objects that are powers of two in size. An ALIGNED datum of size 2\*\*N is stored in memory at a byte address that is a multiple of 2\*\*N, that is, one that has N low-order zeros. Thus, an ALIGNED 64-byte stack frame has a memory address that is a multiple of 64. If a datum of size 2\*\*N is stored at a byte address that is not a multiple of 2\*\*N, it is called UNALIGNED. #### **Register Format Notation** This manual contains illustrations that show the format of various registers. Some registers are followed by a description of each field. The fields on the register are labeled with either a name or a mnemonic. The description of each field includes the name or mnemonic, the bit extent, and the type. The "Type" column in the field description includes both the actual type of the field, and an optional initialized value, separated from the type by a comma. The type denotes the functional operation of the field, and may be one of the values shown in Table 1. If present, the initialized value indicates that the field is initialized by hardware to the specified value at power-up. If the initialized value is not present, the field is not initialized at power-up. 
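As a rough illustration of the extent and alignment conventions described above, the following C sketch extracts an inclusive bit extent such as <7:3> and tests whether a byte address is ALIGNED for a datum of size 2\*\*N. The function names `extent` and `is_aligned` are hypothetical; they are not defined by this manual or by the Alpha AXP architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: return the inclusive bit extent <hi:lo> of value,
 * right-justified. For example, <7:3> selects bits 7, 6, 5, 4, and 3. */
uint64_t extent(uint64_t value, unsigned hi, unsigned lo)
{
    assert(hi < 64 && lo <= hi);
    unsigned width = hi - lo + 1;
    /* A full 64-bit extent needs a special case: shifting by 64 is undefined in C. */
    uint64_t mask = (width == 64) ? ~UINT64_C(0)
                                  : ((UINT64_C(1) << width) - 1);
    return (value >> lo) & mask;
}

/* Hypothetical helper: true if a datum of size 2**n bytes stored at byte
 * address addr is ALIGNED, that is, addr has n low-order zero bits. */
bool is_aligned(uint64_t addr, unsigned n)
{
    assert(n < 64);
    return (addr & ((UINT64_C(1) << n) - 1)) == 0;
}
```

For example, `extent(0xF8, 7, 3)` yields 1F (hex), and `is_aligned(128, 6)` is true because 128 is a multiple of 64, as required for the 64-byte stack frame mentioned above.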
**Table 1 Register Field Type Notation**

| Notation | Description |
|----------|-------------|
| RC | A read-to-clear field. The value is written by hardware and remains unchanged until read. The value may be read by software, at which point hardware may write a new value into the field. |
| RO | A read-only bit or field. The value may be read by software. It is written by hardware. Software write operations are ignored. |
| RW | A read-write bit or field. The value may be read and written by software. |
| W0C | A write-zero-to-clear bit. If read operations are allowed to the register, then the value may be read by software. If it is a write-only register, then a read operation by software returns an UNPREDICTABLE result. Software write operations of a 0 cause the bit to be cleared by hardware. Software write operations of a 1 do not modify the state of the bit. |
| W1C | A write-one-to-clear bit. If read operations are allowed to the register, then the value may be read by software. If it is a write-only register, then a read operation by software returns an UNPREDICTABLE result. Software write operations of a 1 cause the bit to be cleared by hardware. Software write operations of a 0 do not modify the state of the bit. |
| WA | A write-anything-to-the-register-to-clear bit. If read operations are allowed to the register, then the value may be read by software. If it is a write-only register, then a read operation by software returns an UNPREDICTABLE result. Software write operations of any value to the register cause the bit to be cleared by hardware. |
| WO | A write-only bit or field. The value may be written by software and is used by hardware. Read operations by software return an UNPREDICTABLE result. |
| WZ | A write bit or field. The value may be written by software and is used by hardware. Read operations by software return a 0. |

In addition to named fields in registers, other bits of the register may be labeled with one of the five symbols listed in Table 2. These symbols denote the type of the unnamed fields in the register.

**Table 2 Register Field Notation**

| Notation | Description |
|----------|-------------|
| IGN | Register bits specified as ignore (IGN) are ignored when written and are UNPREDICTABLE when read if not otherwise specified. |
| MBZ | Register bits specified as MBZ (must be zero) must never be filled by software with a non-zero value. If the processor encounters a non-zero value in a field specified as MBZ, an UNDEFINED operation may result. |
| RAO | Register bits specified as RAO (read as one) return a one when read. |
| RAZ | Register bits specified as RAZ (read as zero) return a zero when read. |
| SBZ | Register bits specified as SBZ (should be zero) should be filled by software with a zero value. Non-zero values in SBZ fields produce UNDEFINED results and may produce extraneous instruction-issue delays. |

### Introduction

This chapter provides a brief introduction to the Alpha AXP architecture, Digital's RISC (reduced instruction set computing) architecture designed for high performance. The chapter then summarizes the specific features of the Alpha 21164, a microprocessor that implements the Alpha AXP architecture. Appendix A provides a list of Alpha AXP instructions.
For a complete introduction to the Alpha AXP architecture, refer to the companion volume, the *Alpha Architecture Reference Manual*.

#### 1.1 The Architecture

The Alpha AXP architecture is a 64-bit load and store RISC architecture designed with particular emphasis on speed, multiple instruction issue, multiple processors, and software migration from many operating systems.

All registers are 64 bits in length and all operations are performed between 64-bit registers. All instructions are 32 bits in length. Memory operations are either load or store operations. All data manipulation is done between registers.

The Alpha AXP architecture supports the following data types:

- 8-, 16-, 32-, and 64-bit integers
- IEEE 32-bit and 64-bit floating-point formats
- VAX architecture 32-bit and 64-bit floating-point formats

In the Alpha AXP architecture, instructions interact with each other only by one instruction writing to a register or memory location and another instruction reading from that register or memory location. This use of resources makes it easy to build implementations that issue multiple instructions every CPU cycle.

The 21164 uses a set of subroutines, called privileged architecture library code (PALcode), that is specific to a particular Alpha AXP operating system implementation and hardware platform. These subroutines provide operating system primitives for context switching, interrupts, exceptions, and memory management. These subroutines can be invoked by hardware or by CALL\_PAL instructions. CALL\_PAL instructions use the function field of the instruction to vector to a specified subroutine. PALcode is written in standard machine code with some implementation-specific extensions to provide direct access to low-level hardware functions. PALcode supports optimizations for multiple operating systems, flexible memory management implementations, and multi-instruction atomic sequences.
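To make the role of the CALL\_PAL function field more concrete, the following C sketch splits a 32-bit instruction word into its opcode and function field. It assumes the PALcode instruction format defined in the *Alpha Architecture Reference Manual* (opcode in bits <31:26>, function in bits <25:0>, with CALL\_PAL using opcode 00 (hex)); that layout is not stated in this section. The helper names are hypothetical, and the mapping from a function code to a PALcode entry point is implementation specific and is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed from the Alpha architecture: CALL_PAL uses opcode 00 (hex). */
#define OPCODE_CALL_PAL 0x00u

/* Hypothetical helper: opcode field, instruction bits <31:26>. */
uint32_t insn_opcode(uint32_t insn)
{
    return insn >> 26;
}

/* Hypothetical helper: PALcode function field, instruction bits <25:0>. */
uint32_t pal_function(uint32_t insn)
{
    return insn & 0x03FFFFFFu;
}

/* True if the instruction word carries the CALL_PAL opcode. */
bool is_call_pal(uint32_t insn)
{
    return insn_opcode(insn) == OPCODE_CALL_PAL;
}
```

Under these assumptions, the instruction word 00000083 (hex) is a CALL\_PAL instruction whose function field is 83 (hex).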
The Alpha AXP architecture performs byte shifting and masking with normal 64-bit, register-to-register instructions; it does not include single-byte load and store instructions.

#### 1.1.1 Addressing

The basic addressable unit in the Alpha AXP architecture is the 8-bit byte. The 21164 supports a 43-bit virtual address.

Virtual addresses as seen by the program are translated into physical memory addresses by the memory management mechanism. The 21164 supports a 40-bit physical address.

#### 1.1.2 Integer Data Types

Alpha AXP architecture supports four integer data types:

| Data Type | Description |
|-----------|-------------|
| Byte | A byte is 8 contiguous bits that start at an addressable byte boundary. A byte is an 8-bit value. A byte is supported in Alpha AXP architecture by the EXTRACT, MASK, INSERT, and ZAP instructions. |
| Word | A word is 2 contiguous bytes that start at an arbitrary byte boundary. A word is a 16-bit value. A word is supported in Alpha AXP architecture by the EXTRACT, MASK, and INSERT instructions. |
| Longword | A longword is 4 contiguous bytes that start at an arbitrary byte boundary. A longword is a 32-bit value. A longword is supported in the Alpha AXP architecture by sign-extended load and store instructions and by longword arithmetic instructions. |
| Quadword | A quadword is 8 contiguous bytes that start at an arbitrary byte boundary. A quadword is supported in Alpha AXP architecture by load and store instructions and quadword integer operate instructions. |

**Note:** Alpha AXP implementations impose a significant performance penalty when accessing operands that are not naturally aligned. Refer to the *Alpha Architecture Reference Manual* for details.

#### 1.1.3 Floating-Point Data Types

The 21164 recognizes the following floating-point data types:

- Longword integer format in floating-point unit
- Quadword integer format in floating-point unit
- IEEE floating-point formats
  - S\_floating
  - T\_floating
- VAX floating-point formats
  - F\_floating
  - G\_floating
  - D\_floating (limited support)

### 1.2 Alpha 21164 Microprocessor Features

The 21164 microprocessor is a superscalar pipelined processor manufactured using 0.5 micron CMOS technology. It is packaged in a 499-pin IPGA carrier and has removable application-specific heat sinks.

The 21164 is designed so that maximum performance is achieved in high-performance systems while offering competitive performance. A number of configuration options allow its use in system designs ranging from extremely simple uniprocessor systems with minimum component count to high-performance multiprocessor systems with very high cache and memory bandwidth.

The 21164 can issue four Alpha AXP instructions in a single cycle, thereby minimizing the average cycles per instruction (CPI). A number of low-latency and/or high-throughput features in the instruction issue unit and the on-chip components of the memory subsystem further reduce the average CPI.

The 21164 and associated PALcode implement IEEE single and double precision and the VAX F\_floating and G\_floating data types, and support longword (32-bit) and quadword (64-bit) integers. Byte (8-bit) and word (16-bit) support is provided by byte manipulation instructions. Limited hardware support is provided for the VAX D\_floating data type. Partial hardware implementation is provided for the architecturally optional FETCH and FETCH\_M instructions.

Other 21164 features include:

- A peak instruction execution rate of four times the input clock frequency.
- The ability to issue up to four instructions during each clock cycle.
- An on-chip, demand-paged memory management unit with translation buffer, which when used with PALcode, implements a variety of page table structures and translation algorithms. The unit consists of a 64-entry data translation buffer (DTB) and a 48-entry instruction translation buffer (ITB), with each entry able to map a single 8K-byte page or a group of 8, 64, or 512 8K-byte pages. The size of each translation buffer entry's group is specified by hint bits stored in the entry. The DTB and ITB implement 7-bit address space numbers (ASNs) (MAX\_ASN=127).
- Two on-chip, high-throughput pipelined floating-point units, capable of executing both Digital and IEEE floating-point data types.
- An on-chip, 8K-byte virtual instruction cache with 7-bit ASNs (MAX\_ASN=127).
- An on-chip, dual-read-ported, 8K-byte data cache.
- An on-chip write buffer with six 32-byte entries.
- An on-chip, 96K-byte, 3-way, set-associative, write-back, second-level mixed instruction and data cache.
- A 128-bit data bus with on-chip parity and error correction code (ECC) support.
- Support for an optional external third-level cache. The size and access time of the external third-level cache are programmable.
- An internal clock generator providing a high-speed clock used by the 21164, and a pair of programmable system clocks for use by the CPU module.
- On-chip performance counters to measure and analyze CPU and system performance.
- Chip and module level test support, including an instruction cache test interface to support chip and module level testing.
- A 3.3-V power supply. (Direct connection to 5-V logic supported.)

Refer to Chapter 9 for 21164 dc and ac electrical characteristics. Refer to the *Alpha Architecture Reference Manual* for a description of address space numbers (ASNs).

### Internal Architecture

This chapter provides both an overview of the 21164 microarchitecture and a system designer's view of the 21164 implementation of Alpha AXP architecture.
The combination of the 21164 microarchitecture and privileged architecture library code (PALcode) defines the chip's implementation of the Alpha AXP architecture. If a certain piece of hardware seems to be "architecturally incomplete," the missing functionality is implemented in PALcode. Chapter 6 provides more information on PALcode.

This chapter describes the major functional hardware units and is not intended to be a detailed hardware description of the chip. It is organized as follows:

- Alpha 21164 microarchitecture
- Pipeline organization
- Scheduling and issuing rules
- Replay traps
- Miss address file (MAF) and load merging rules
- Mbox store execution
- Write buffer and the WMB instruction
- Performance measurement support
- Floating-point control register
- Design examples

### 2.1 Alpha 21164 Microarchitecture

The Alpha 21164 microprocessor is a high-performance implementation of Digital's Alpha AXP architecture. The following sections provide an overview of the chip's architecture and major functional units. Figure 2–1 is a block diagram of the 21164.

The Alpha 21164 microprocessor consists of the following internal sections (Figure 2–1):

- Clock generation logic
- Instruction fetch and decode unit (Ibox), which includes:
  - Instruction prefetcher and instruction decoder
  - Branch prediction
  - Instruction translation buffer
  - Interrupt support
- Integer execution unit (Ebox)
- Floating-point execution unit (Fbox)
- Memory address translation unit (Mbox), which includes:
  - Data translation buffer (DTB)
  - Miss address file (MAF)
  - Write buffer
  - Dcache control
- Cache control and bus interface unit (Cbox) with interface to external cache
- Data cache (Dcache)
- Instruction cache (Icache)
- Second-level cache (Scache)
- Serial read-only memory (SROM) interface

#### 2.1.1 Instruction Fetch and Decode Unit

The primary function of the instruction fetch and decode unit (Ibox) is to manage and issue instructions to the Ebox, Mbox, and Fbox.
It also manages the instruction cache. The Ibox contains:

- Prefetcher and instruction buffer
- Instruction slot and issue logic
- Program counter (PC) and branch prediction logic
- 48-entry instruction translation buffer (ITB)
- Abort logic
- Register conflict logic
- Interrupt and exception logic

#### 2.1.1.1 Instruction Decode and Issue

The Ibox decodes up to four instructions in parallel and checks that the required resources are available for each instruction. The Ibox issues only the instructions for which all required resources are available. The Ibox does not issue instructions out of order, even if the resources are available for a later instruction and not for an earlier one.

In other words:

- If resources are available, and multiple issue is possible, then all four instructions are issued.
- If resources are not available for an instruction, then only the instructions that precede it in the group are issued, even if resources are available for a later instruction.

The Ibox handles only NATURALLY ALIGNED groups of four instructions (INT16). The Ibox does not advance to a new group of four instructions until all instructions in a group are issued. If a branch to the middle of an INT16 group occurs, then the Ibox attempts to issue the instructions from the branch target to the end of the current INT16. It proceeds to the next INT16 of instructions only after all the instructions in the target INT16 are issued. Thus, achieving maximum issue rate and optimal performance requires that code be scheduled properly and that floating or integer NOP instructions be used to fill empty slots in the scheduled instruction stream.

For more information on instruction scheduling and issuing, including detailed rules governing multiple instruction issue, refer to Section 2.3.

#### 2.1.1.2 Instruction Prefetch

The Ibox contains an instruction prefetcher and a four-entry prefetch buffer called the refill buffer.
Each instruction cache (Icache) miss is checked in the refill buffer. If the refill buffer contains the instruction data, it fills the Icache and instruction buffer simultaneously. If the refill buffer does not contain the necessary data, a fetch and a number of prefetches are sent to the Mbox. If these requests are all Scache hits, it is possible for instruction data to stream into the Ibox at the rate of one INT16 (four instructions) per cycle. The Ibox can sustain up to quad-instruction issue from this Scache fill stream, filling the Icache simultaneously. The refill buffer holds all returned fill data until the data is required by the Ibox pipeline. Each fill occurs when the instruction buffer stage in the Ibox pipeline requires a new INT16. The INT16 is written into the Icache and the instruction buffer simultaneously. This can occur at a maximum rate of one Icache fill per cycle. The actual rate depends on how frequently the instruction buffer stage requires a new INT16, and on availability of data in the refill buffer. Once an Icache miss occurs, the Icache enters fill mode. When the Icache is in fill mode, the refill buffer is checked each cycle to see if it contains the next INT16 required by the instruction buffer. When the required data is not available in the refill buffer, the Icache is checked for a hit while it awaits the arrival of the data from the Scache or beyond. If there is an Icache hit at this time, the Icache returns to access mode and the prefetcher stops sending fetches to the Mbox. When a new program counter (PC) is loaded (that is, taken branches), the Icache returns to access mode until the first miss. The refill buffer receives and holds instruction data from fetches initiated before the Icache returned to access mode. #### 2.1.1.3 Branch Execution When a branch or jump instruction is fetched from the Icache, the Ibox needs one cycle to calculate the target PC before it is ready to fetch the target instruction stream. 
In the second cycle after the fetch, the Icache is accessed at the target address. Branch and PC prediction are necessary to predict and begin fetching the target instruction stream before the branch or jump instruction is issued.

The Icache records the outcome of branch instructions in a 2-bit history state provided for each instruction location in the cache. This information is used as the prediction for the next execution of the branch instruction. The history status is not initialized on Icache fill; therefore it may "remember" a branch that was evicted from the Icache and subsequently reloaded.

The 21164 does not limit the number of branch predictions outstanding to one. It predicts branches even while waiting to confirm the prediction of previously predicted branches. There can be one branch prediction pending for each of pipeline stages 3 and 4, plus up to four in pipeline stage 2. Refer to Section 2.2 for a description of pipeline stages.

When a predicted branch is issued, the Ebox or Fbox checks the prediction. The branch history table is updated accordingly. On branch mispredict, a mispredict trap occurs and the Ibox restarts execution from the correct PC.

The 21164 provides a 12-entry subroutine return stack that is controlled by decoding the opcode (BSR, HW\_REI, and JMP/JSR/RET/JSR\_COROUTINE), and DISP<15:14> in JMP/JSR/RET/JSR\_COROUTINE. The stack stores an Icache index in each entry. The stack is implemented as a circular queue that wraps around in the overflow and underflow cases.

The 21164 uses the Icache index hint in the JMP and JSR instructions to predict the target PC. The Icache index hint in the instruction's displacement field is used to access the direct-mapped Icache. The upper bits of the PC are formed from the data in the Icache tag store at that index. Later in the pipeline, the PC prediction is checked against the actual PC generated by the Ebox. A mismatch causes a PC mispredict trap and restart from the correct PC.
This is similar to branch prediction.

The RET, JSR\_COROUTINE, and HW\_REI instructions predict the next PC using the index from the subroutine return stack. The upper bits of the PC are formed from the data in the Icache tag at that index. These predictions are checked against the actual PC in exactly the same way that JMP and JSR predictions are checked.

Changes from PALmode to native mode and vice versa are predicted on all PC predictions that use the subroutine return stack. In all cases, if the PC prediction is correct, the mode prediction will also be correct.

Instruction stream (Istream) prefetching is disabled when a PC prediction is outstanding.

#### 2.1.1.4 Instruction Translation Buffer

The Ibox includes a 48-entry, fully associative instruction translation buffer (ITB). The buffer stores recently used Istream address translations and protection information for pages ranging from 8K bytes to 4M bytes and uses a not-last-used replacement algorithm.

PALcode fills and maintains the ITB. Each entry supports all four granularity hint bit combinations, permitting translation for up to 512 contiguously mapped 8K-byte pages, using any single ITB entry. The operating system, using PALcode, must ensure that virtual addresses can only be mapped through a single ITB entry or superpage mapping at one time. Multiple simultaneous mapping can cause UNDEFINED results.

While not executing in PALmode, the 43-bit virtual PC is routed to the ITB each cycle. If the page table entry (PTE) associated with the PC is cached in the ITB, the protection bits for the page that contains the PC are used by the Ibox to do the necessary access checks. If there is an Icache miss and the PC is cached in the ITB, the page frame number (PFN) and protection bits for the page that contains the PC are used by the Ibox to do the address translation and access checks.

The 21164's ITB supports 128 address space numbers (ASNs) (MAX\_ASN=127) by means of a 7-bit ASN field in each ITB entry.
PALcode, which supports write operations to the architecturally defined TBIAP register, does so by using the hardware-specific HW\_MTPR instruction to write to a specific hardware register. This has the effect of invalidating ITB entries that do not have their address space match (ASM) bit set.

The 21164 provides two optional translation extensions called superpages. Access to superpages is enabled using ICSR<SPE> and is allowed only while executing in privileged mode.

- One superpage maps virtual address bits <39:13> to physical address bits <39:13>, on a one-to-one basis, when virtual address bits <42:41> equal 2. This maps the entire physical address space four times over to the quadrant of the virtual address space.
- The other superpage maps virtual address bits <29:13> to physical address bits <29:13>, on a one-to-one basis, and forces physical address bits <39:30> to 0 when virtual address bits <42:30> equal 1FFE<sub>16</sub>. This effectively maps a 30-bit region of physical address space to a single region of the virtual address space defined by virtual address bits <42:30> = 1FFE<sub>16</sub>.

Access to either superpage mapping is allowed only while executing in kernel mode.

#### 2.1.1.5 Interrupts

The Ibox exception logic supports three sources of interrupts:

- Hardware interrupts

  There are seven level-sensitive hardware interrupt sources supplied by the following signals:

  ```
  irq_h<3:0>
  mch_hlt_irq_h
  pwr_fail_irq_h
  sys_mch_chk_irq_h
  ```

- Software interrupts

  There are 15 prioritized software interrupts sourced by the software interrupt request register (SIRR) (see Section 5.1.22).

- Asynchronous system traps (ASTs)

  There are four ASTs controlled by the asynchronous system trap request (ASTRR) and asynchronous system trap enable (ASTER) internal processor registers (IPRs) (see Section 5.1.20 and Section 5.1.21).

Most interrupts can be independently masked in on-chip enable registers.
In addition, AST interrupts are qualified by the current processor mode. Interrupts are masked by the hardware interrupt priority level (IPL) register (see Section 5.1.18). The serial line interrupt, the internally detected correctable error interrupt, the performance counter interrupts, and **irq\_h<3:0>** are all maskable by bits in the Ibox control and status register (ICSR) (see Section 5.1.17). All interrupts are disabled when the processor is executing PALcode.

# 2.1.2 Integer Execution Unit

The integer execution unit (Ebox) contains two 64-bit integer execution pipelines, E0 and E1, which include the following:

- Two adders
- Two logic boxes
- A barrel shifter
- Byte manipulation logic
- An integer multiplier

The Ebox also includes the 40-entry, 64-bit integer register file (IRF) that contains the 32 integer registers defined by the Alpha AXP architecture and 8 PAL shadow registers. The register file has four read ports and two write ports, which provide operands to both integer execution pipelines and accept results from both pipes. The register file also accepts load instruction results (memory data) on the same two write ports.

# 2.1.3 Floating-Point Execution Unit

The on-chip, pipelined floating-point unit (FPU) can execute both IEEE and VAX floating-point instructions. The 21164 supports IEEE S\_floating and T\_floating data types, and all rounding modes. It also supports VAX F\_floating and G\_floating data types, and provides limited support for the D\_floating format. The FPU contains:

- A 32-entry, 64-bit floating-point register file.
- A user-accessible control register.
- A floating-point multiply pipeline.
- A floating-point add pipeline. The floating-point divide unit is associated with the floating-point add pipeline but is not pipelined.

The FPU can accept two instructions every cycle, with the exception of floating-point divide instructions.
The result latency for nondivide, floating-point instructions is four cycles.

The floating-point register file (FRF) has five read ports and four write ports. Four of the read ports are used by the two pipelines to source operands. The remaining read port is used by floating-point stores. Two of the write ports are used to write results from the two pipelines. The other two write ports are used to write fills from floating-point loads.

# 2.1.4 Memory Address Translation Unit

The memory address translation unit (Mbox) contains three major sections:

- Data translation buffer (dual ported)
- Miss address file
- Write buffer address file

The Mbox arbitrates between floating-point loads that hit in the Dcache and floating-point fills from the Cbox, making certain that only one register is written per fill port in each cycle. Floating-point loads that conflict with Cbox fills for use of these write ports are forced to miss in the Dcache so that the Cbox fill can occur.

The Mbox receives up to two virtual addresses every cycle from the Ebox. The translation buffer generates the corresponding physical addresses and access control information for each virtual address. The 21164 implements a 43-bit virtual address and a 40-bit physical address.

#### 2.1.4.1 Data Translation Buffer

The 64-entry, fully associative, dual-read-ported data translation buffer (DTB) stores recently used data stream (Dstream) page table entries (PTEs). Each entry supports all four granularity hint-bit combinations, which permits translation for up to 512 contiguously mapped, 8K-byte pages, using a single DTB entry. The translation buffer uses a not-last-used replacement algorithm.

For load and store instructions, and other Mbox instructions requiring address translation, the effective 43-bit virtual address is presented to the DTB.
If the PTE of the supplied virtual address is cached in the DTB, the page frame number (PFN) and protection bits for the page that contains the address are used by the Mbox to complete the address translation and access checks.

The DTB also supports the register-enabled superpage extensions. The DTB superpage maps provide virtual-to-physical address translation for two regions of the virtual address space.

PALcode fills and maintains the DTB. The operating system, using PALcode, must ensure that virtual addresses be mapped either through a single DTB entry or through superpage mapping. Multiple simultaneous mapping can cause UNDEFINED results. The only exception to this rule is that one virtual page may be mapped twice with identical data in two different DTB entries. This occurs in operating systems, such as OpenVMS, which utilize virtually accessible page tables. If the level 1 page table is accessed virtually, PALcode loads the translation information twice: once in the double-miss handler, and once in the primary handler. The PTE mapping the level 1 page table must remain constant during accesses to this page to meet this requirement.

#### 2.1.4.2 Load Instruction and the Miss Address File

The Mbox begins the execution of each load instruction by translating the virtual address and by accessing the data cache (Dcache). Translation and Dcache tag read operations occur in parallel. If the addressed location is found in the Dcache (a hit), then the data from the Dcache is formatted and written to either the integer register file (IRF) or floating-point register file (FRF). The formatting required depends on the particular load instruction executed. If the data is not found in the Dcache (a miss), then the address, target register number, and formatting information are entered in the miss address file (MAF).

The MAF performs a load-merging function.
When a load miss occurs, each MAF entry is checked to see if it contains a load miss that addresses the same Dcache (32-byte) block. If it does, and certain merging rules are satisfied, then the new load miss is merged with an existing MAF entry. This allows the Mbox to service two or more load misses with one data fill from the Cbox.

There are six MAF entries for load misses and four more for Ibox instruction fetches and prefetches. Load misses are usually the highest Mbox priority.

Refer to Section 2.5 for additional information on load-merging rules.

#### 2.1.4.3 Store Execution

The Dcache follows a write-through protocol. During store execution, the Mbox checks to see if data is in the Dcache. If there is data in the cache, then the Dcache is updated. Regardless of the Dcache state, the Mbox forwards the data to the cache control and bus interface unit (BIU).

A load instruction that is issued one cycle after a store instruction in the pipeline creates a conflict if both the load and store operations access the same memory location. (The store instruction has not yet updated the location when the load instruction reads it.) This conflict is handled by forcing the load instruction to replay trap; that is, the Ibox flushes the pipeline and restarts execution from the load instruction. By the time the load instruction arrives at the Dcache the second time, the conflicting store instruction has written the Dcache and the load instruction is executed normally.

Replay traps can be avoided by scheduling the load instruction to issue three cycles after the store instruction. If the load instruction is scheduled to issue two cycles after the store instruction, then it will be issue-stalled for one cycle.

#### 2.1.4.4 Write Buffer

The Mbox contains a write buffer that has six 32-byte entries. The write buffer provides a finite, high-bandwidth resource for receiving store data to minimize the number of CPU stall cycles.
The write buffer and associated WMB instruction are described in Section 2.7.

## 2.1.5 Cache Control and Bus Interface Unit

The cache control and bus interface unit (Cbox) processes all accesses sent by the Mbox and implements all memory-related external interface functions, particularly the coherence protocol functions for write-back caching. It controls the second-level cache (Scache) and the optional board-level backup cache (Bcache). The Cbox handles all instruction and primary Dcache read misses, performs the function of writing data from the write buffer into the shared coherent memory subsystem, and has a major role in executing the Alpha AXP memory barrier instruction. The Cbox also controls the 128-bit bidirectional data bus, address bus, and I/O control. Chapter 4 describes the external interface.

## 2.1.6 Cache Organization

The 21164 has three on-chip caches: a primary data cache (Dcache), a primary instruction cache (Icache), and a second-level data and instruction cache (Scache). All memory cells in the on-chip caches are fully static, 6-transistor, CMOS structures.

The 21164 also provides control for an optional board-level, external cache (Bcache).

#### 2.1.6.1 Data Cache

The data cache (Dcache) is a dual-read-ported, single-write-ported, 8K-byte cache. It is a write-through, read-allocate, direct-mapped, physical cache with 32-byte blocks.

#### 2.1.6.2 Instruction Cache

The instruction cache (Icache) is an 8K-byte, virtual, direct-mapped cache. Each block tag contains:

- A 7-bit address space number (ASN) field as defined by the Alpha AXP architecture
- A 1-bit address space match (ASM) field as defined by the Alpha AXP architecture
- A 1-bit PALcode (physically addressed) indicator

Software, rather than Icache hardware, maintains Icache coherence with memory.

#### 2.1.6.3 Second-Level Cache

The second-level cache (Scache) is a 96K-byte, 3-way set associative, physical, write-back, write-allocate cache with 32- or 64-byte blocks.
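
As a quick sanity check on the cache sizes quoted in this section, the following sketch derives the number of blocks in each set from the capacity, associativity, and block size. It is an illustration rather than anything taken from the manual; the helper name `blocks_per_set` is hypothetical, and "set" is used the way this manual uses it (one of the three associative ways of the Scache).

```python
def blocks_per_set(cache_bytes: int, sets: int, block_bytes: int) -> int:
    """Number of blocks in each set of a cache with the given geometry."""
    if cache_bytes % (sets * block_bytes) != 0:
        raise ValueError("capacity is not a whole number of blocks per set")
    return cache_bytes // sets // block_bytes


KB = 1024

# Dcache: 8K bytes, direct-mapped (one set), 32-byte blocks.
dcache_blocks = blocks_per_set(8 * KB, sets=1, block_bytes=32)

# Scache: 96K bytes, 3-way set associative, 64-byte blocks.
scache_blocks = blocks_per_set(96 * KB, sets=3, block_bytes=64)

print(dcache_blocks)   # 256
print(scache_blocks)   # 512
```

The Scache result agrees with the organization of three sets of 512 blocks given in this section.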
It is a mixed data and instruction cache. The Scache is fully pipelined; it processes read and write operations at the rate of one INT16 per CPU cycle and can alternate between read and write accesses without bubble cycles.

If configured to 32 bytes, the Scache is organized as three sets of 512 blocks, with each block divided into two 32-byte subblocks. Otherwise the Scache is three sets of 512 64-byte blocks.

#### 2.1.6.4 External Cache

The Cbox implements control for an optional, external, direct-mapped, physical, write-back, write-allocate cache with 32- or 64-byte blocks. The 21164 supports board-level cache sizes of 1, 2, 4, 8, 16, 32, and 64 megabytes.

## 2.1.7 Serial Read-Only Memory Interface

The serial read-only memory (SROM) interface provides the initialization data load path from a system SROM to the Icache. Chapter 7 provides information about the SROM interface.

# 2.2 Pipeline Organization

The 21164 has a 7-stage (or 7-cycle) pipeline for integer operate and memory reference instructions, and a 9-stage pipeline for floating-point operate instructions. The Ibox maintains state for all pipeline stages to track outstanding register write operations.

Figure 2-2 shows the integer operate, memory reference, and floating-point operate pipelines for the Ibox, FPU, Ebox, and Mbox. The first four stages are executed in the Ibox. Remaining stages are executed by the Ebox, Fbox, Mbox, and Cbox. There are bypass paths that allow the result of one instruction to be used as a source operand of a following instruction before it is written to the register file.

Tables 2-1, 2-2, 2-3, 2-4, 2-5, and 2-6 provide examples of events at various stages of pipelining during instruction execution.
Table 2-1 Pipeline Examples: All Cases

| Pipeline Stage | Events |
|----------------|--------|
| 0 | Access Icache tag and data. |
| 1 | Buffer four instructions, check for branches, calculate branch displacements, and check for Icache hit. |
| 2 | Slot-swap instructions around so they are headed for pipelines capable of executing them. Stall preceding stages if all instructions in this stage cannot issue simultaneously because of function unit conflicts. |
| 3 | Check the operands of each instruction to see that the source is valid and available and that no write-write hazards exist. Read the IRF. Stall preceding stages if any instruction cannot be issued. All source operands must be available at the end of this stage for the instruction to issue. |

Table 2-2 Pipeline Examples: Integer Add

| Pipeline Stage | Events |
|----------------|--------|
| 4 | Perform the add operation. |
| 5 | Result is available for use by an operate function in this cycle. |
| 6 | Write the IRF. Result is available for use by an operate function in this cycle. |

Figure 2-2 Instruction Pipeline Stages

[Figure not reproduced. It shows three pipelines that share the first four stages: IC (instruction cache read), IB (instruction buffer, branch decode, determine next PC), SL (slot by function unit), and AC (register file access checks, integer register file access).]

- Integer operate pipeline: first integer operate stage; second integer operate stage, if needed; write integer register file. Arithmetic, logical, shift, and compare instructions complete in pipeline stage 4 (1-cycle latency). CMOV completes in stage 5 (2-cycle latency). IMULL has an 8- or 9-cycle latency. A dependent CMOV or BR can issue in parallel (0-cycle latency) with a CMP or logical instruction.
- Floating-point pipeline: floating-point register file access; first floating-point operate stage; last floating-point operate stage; write floating-point register file.
- Memory reference pipeline: Dcache read operation begins; Dcache read operation ends; use Dcache data, write store data to Dcache; Scache tag access begins; Scache data access begins; Scache data access ends; fill Dcache; use Scache data.

Table 2-3 Pipeline Examples: Floating Add

| Pipeline Stage | Events |
|----------------|--------|
| 4 | Read the FRF. |
| 5 | First stage of Fbox add pipeline. |
| 6 | Second stage of Fbox add pipeline. |
| 7 | Third stage of Fbox add pipeline. |
| 8 | Fourth stage of Fbox add pipeline. Write the FRF. |
| 9 | Result is available for use by an operate function in this cycle. For instance, Pipeline Stage 5 of the user instruction can coincide with Pipeline Stage 9 of the producer (latency of 4). |

Table 2-4 Pipeline Examples: Load (Dcache Hit)

| Pipeline Stage | Events |
|----------------|--------|
| 4 | Calculate the effective address. Begin the Dcache data and tag store access. |
| 5 | Finish the Dcache data and tag store access. Detect Dcache hit. Format the data as required. Scache arbitration defaults to pipe E0 in anticipation of a possible miss. |
| 6 | Write the IRF or FRF. Data is available for use by an operate function in this cycle. |

Table 2-5 Pipeline Examples: Load (Dcache Miss)

| Pipeline Stage | Events |
|----------------|--------|
| 4 | Calculate the effective address. Begin the Dcache data and tag store access. |
| 5 | Finish the Dcache data and tag store access. Detect Dcache miss. Scache arbitration defaults to pipe E0 in anticipation of a possible miss. A load in pipe E1 would be delayed at least one more cycle because default arbitration speculatively selects E0. |
| 6 | Begin Scache tag read. |
| 7 | Finish Scache tag read. Begin detecting Scache hit. |
| 8 | Finish detecting Scache hit. Begin accessing the correct Scache data bank. (Bcache index at interface; Bcache access begins.) |
| 9 | Finish the Scache data bank access. Begin sending fill data from the Scache. |
| 10 | Finish sending fill data from the Scache. Begin Dcache fill. Format the data as required. |
| 11 | Finish the Dcache fill. Write the integer or floating register file. |
| 12 | Data is available for use by an operate function in this cycle. |

Table 2-6 Pipeline Examples: Store (Dcache Hit)

| Pipeline Stage | Events |
|----------------|--------|
| 4 | Calculate the effective address. Begin the Dcache tag store access. |
| 5 | Finish the Dcache tag store access. Detect Dcache hit. Send store to the write buffer simultaneously. |
| 6 | Write the Dcache data store if hit (write begins this cycle). |

# 2.2.1 Pipeline Stages and Instruction Issue

The 21164 pipeline divides instruction processing into four static and a number of dynamic stages of execution. The first four stages consist of the instruction fetch, buffer and decode, slotting, and issue check logic.
These stages are static in that instructions may remain valid in the same pipeline stage for multiple cycles while waiting for a resource or stalling for other reasons. Dynamic stages (Ebox and Fbox) always advance state and are unaffected by any stall in the pipeline. A pipeline stall may occur while zero instructions issue, or while some instructions of a set of four issue and the others are held at the issue stage. A pipeline stall implies that a valid instruction is (or instructions are) presented to be issued but cannot proceed.

Upon satisfying all issue requirements, instructions are issued into their slotted pipeline. After issuing, instructions cannot stall in a subsequent pipeline stage. The issue stage is responsible for ensuring that all resource conflicts are resolved before an instruction is allowed to continue. The only means of stopping instructions after the issue stage is an abort condition. (The term abort as used here is different from its use in the Alpha Architecture Reference Manual.)

# 2.2.2 Aborts and Exceptions

Aborts may result from a number of causes. In general, they may be grouped into two classes, exceptions (including interrupts) and nonexceptions. The difference between the two is that exceptions require that the pipeline be drained of all outstanding instructions before restarting the pipeline at a redirected address. In either case, the pipeline must be flushed of all instructions that were fetched subsequent to the instruction that caused the abort condition (arithmetic exceptions are an exception to this rule). This includes aborting some instructions of a multiple-issued set in the case of an abort condition on one instruction in the set.

The nonexception case does not need to drain the pipeline of all outstanding instructions ahead of the aborting instruction. The pipeline can be restarted immediately at a redirected address.
Examples of nonexception abort conditions are branch mispredictions, subroutine call/return mispredictions, and replay traps. Data cache misses can cause aborts or issue stalls depending on the cycle-by-cycle timing.

In the event of an exception other than an arithmetic exception, the processor aborts all instructions issued after the exceptional instruction as described in the preceding paragraphs. Due to the nature of some exception conditions, this may occur as late as the integer register file (IRF) write cycle. In the case of an arithmetic exception, the processor may execute instructions issued after the exceptional instruction.

After aborting, the address of the exceptional instruction or the immediately subsequent instruction is latched in the EXC\_ADDR internal processor register (IPR). In the case of an arithmetic exception, EXC\_ADDR contains the address of the instruction immediately after the last instruction executed. (Every instruction prior to the last instruction executed was also executed.) For machine checks and interrupts, EXC\_ADDR points to the instruction immediately following the last instruction executed. For the remaining cases, EXC\_ADDR points to the exceptional instruction. In all cases, EXC\_ADDR is the address at which execution should naturally restart.

When the pipeline is fully drained, the processor begins instruction execution at the address given by the PALcode dispatch. The pipeline is drained when all outstanding write operations to both the IRF and FRF have completed and all outstanding instructions have passed the point in the pipeline such that they are guaranteed to complete without an exception in the absence of a machine check.

Replay traps are aborts that occur when an instruction requires a resource that is not available at some point in the pipeline. These are usually Mbox resources whose availability could not be anticipated accurately at issue time (refer to Section 2.4).
If the necessary resource is not available when the instruction requires it, the instruction is aborted and the Ibox begins fetching at exactly that instruction, thereby replaying the instruction in the pipeline. A slight variation on this is the load-miss-and-use replay trap in which an operate instruction is issued just as a Dcache hit is being evaluated to determine if one of the instruction's operands is valid. If the result is a Dcache miss, then the operate instruction is aborted and replayed.

#### 2.2.3 Nonissue Conditions

There are two reasons for nonissue conditions. The first is a pipeline stall wherein a valid instruction or set of instructions is prepared to issue but cannot due to a resource conflict (register conflict or function unit conflict). These types of nonissue cycles can be minimized through code scheduling.

The second type of nonissue condition consists of pipeline bubbles where there is no valid instruction in the pipeline to issue. Pipeline bubbles result from the abort conditions described in the previous section. In addition, a single pipeline bubble is produced whenever a branch type instruction is predicted to be taken, including subroutine calls and returns.

Pipeline bubbles are reduced directly by the instruction buffer hardware and through bubble squashing, but can also be effectively minimized through careful coding practices. Bubble squashing involves the ability of the first four pipeline stages to advance whenever a bubble or buffer slot is detected in the pipeline stage immediately ahead of it while the pipeline is otherwise stalled.

# 2.3 Scheduling and Issuing Rules

The following sections define the classes of instructions and provide rules for instruction slotting, instruction issuing, and latency.
# 2.3.1 Instruction Class Definition and Instruction Slotting

The scheduling and multiple issue rules presented here are performance related only; that is, there are no functional dependencies related to scheduling or multiple issuing. The rules are defined in terms of instruction classes. Table 2-7 specifies all of the instruction classes and the pipeline that executes the particular class. With a few additional rules, the table provides the information necessary to determine the functional resource conflicts that determine which instructions can issue in a given cycle.

Table 2-7 Instruction Classes and Slotting

| Class Name | Pipeline | Instruction List |
|------------|----------|------------------|
| LD | E0<sup>1</sup> or E1<sup>2</sup> | All loads except LDx_L |
| ST | E0 | All stores except STx_C |
| MBX | E0 | LDx_L, MB, WMB, STx_C, HW_LD-lock, HW_ST-cond, FETCH |
| RX | E0 | RS, RC |
| MXPR | E0 or E1 (depends on the IPR) | HW_MFPR, HW_MTPR |
| IBR | E1 | Integer conditional branches |
| FBR | FA<sup>3</sup> | Floating-point conditional branches |
| JSR | E1 | Jump-to-subroutine instructions: JMP, JSR, RET, or JSR_COROUTINE, BSR, BR, HW_REI, CALLPAL |
| IADD | E0 or E1 | ADDL, ADDL/V, ADDQ, ADDQ/V, SUBL, SUBL/V, SUBQ, SUBQ/V, S4ADDL, S4ADDQ, S8ADDL, S8ADDQ, S4SUBL, S4SUBQ, S8SUBL, S8SUBQ, LDA, LDAH |
| ILOG | E0 or E1 | AND, BIS, XOR, BIC, ORNOT, EQV |
| SHIFT | E0 | SLL, SRL, SRA, EXTQL, EXTLL, EXTWL, EXTBL, EXTQH, EXTLH, EXTWH, MSKQL, MSKLL, MSKWL, MSKBL, MSKQH, MSKLH, MSKWH, INSQL, INSLL, INSWL, INSBL, INSQH, INSLH, INSWH, ZAP, ZAPNOT |
| CMOV | E0 or E1 | CMOVEQ, CMOVNE, CMOVLT, CMOVLE, CMOVGT, CMOVGE, CMOVLBS, CMOVLBC |
| ICMP | E0 or E1 | CMPEQ, CMPLT, CMPLE, CMPULT, CMPULE, CMPBGE |
| IMULL | E0 | MULL, MULL/V |
| IMULQ | E0 | MULQ, MULQ/V |
| IMULH | E0 | UMULH |
| FADD | FA | Floating-point operates, including CPYSN and CPYSE, except multiply, divide, and CPYS |
| FDIV | FA | Floating-point divide |
| FMUL | FM<sup>4</sup> | Floating-point multiply |
| FCPYS | FM or FA | CPYS, not including CPYSN or CPYSE |
| MISC | E0 | RPCC, TRAPB |
| UNOP | none | UNOP<sup>5</sup> |

<sup>1</sup>Ebox pipeline 0.

<sup>2</sup>Ebox pipeline 1.

<sup>3</sup>Fbox add pipeline.

<sup>4</sup>Fbox multiply pipeline.

<sup>5</sup>UNOP is LDQ\_U R31,0(Rx).

#### Slotting

The slotting function in the Ibox determines which instructions will be sent forward to attempt to issue. The slotting function detects and removes all static functional resource conflicts. The set of instructions output by the slotting function will issue if no register or other dynamic resource conflict is detected in stage 3 of the pipeline. The slotting algorithm follows:

Starting from the first (lowest addressed) valid instruction in the INT16 in stage 2 of the 21164 Ibox pipeline, attempt to assign that instruction to one of the four pipelines (E0, E1, FA, FM). If it is an instruction that can issue in either E0 or E1, assign it to E0. However, if one of the following is true, assign it to E1:

- E0 is not free and E1 is free.
- The next integer instruction in this INT16 can issue only in E0. (In this context, an integer instruction is one that can issue in one or both of E0 or E1, but not FA or FM.)

If the current instruction is one that can issue in either FA or FM, assign it to FA unless FA is not free. If it is an FA-only instruction, it must be assigned to FA.
If it is an FM-only instruction, it must be assigned to FM. Mark the pipeline selected by this process as taken and resume with the next sequential instruction. Stop when an instruction cannot be allocated in an execution pipeline because any pipeline it can use is already taken.

The slotting logic does not send instructions forward out of logical instruction order because the 21164 always issues instructions in order. The slotting logic also enforces the special rules in the following list, stopping the slotting process when a rule would be violated by allocating the next instruction an execution pipeline:

- An instruction of class LD cannot be simultaneously issued with an instruction of class ST.
- All instructions are discarded at the slotting stage after a predicted-taken IBR or FBR class instruction, or a JSR class instruction.
- After a predicted not-taken IBR or FBR, no other IBR, FBR, or JSR class can be slotted together.
- The following cases are detected by the slotting logic:
  - From lowest address to highest within an INT16, with the following arrangement:

    I-instruction, F-instruction, I-instruction, I-instruction

    I-instruction is any instruction that can issue in one or both of E0 or E1. F-instruction is any instruction that can issue in one or both of FA or FM.
  - From lowest address to highest within an INT16, with the following arrangement:

    F-instruction, I-instruction, I-instruction, I-instruction

  When this type of case is detected, the first two instructions are forwarded to the issue point in one cycle. The second two are sent only when the first two have both issued, provided no other slotting rule would prevent the second two from being slotted in the same cycle.

## 2.3.2 Coding Guidelines

Code should be scheduled according to latency and function unit availability. This is good practice in most RISC architectures. Code alignment and the effects of split-issue<sup>1</sup> should be considered.
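
To make the slotting algorithm of Section 2.3.1 concrete, here is a minimal sketch of the basic pipeline-assignment loop. It is an illustration under stated assumptions, not the hardware's logic: the special rules (LD with ST, branches, and the I/F arrangements) are not modeled, only a subset of the classes in Table 2-7 is included, the names `CLASS_PIPES` and `slot` are hypothetical, and falling back to the other pipeline when the preferred one is taken is an assumption the text does not spell out.

```python
from typing import List, Tuple

# Pipelines each instruction class may use (subset of Table 2-7).
CLASS_PIPES = {
    "LD": {"E0", "E1"},
    "ST": {"E0"},
    "IADD": {"E0", "E1"},
    "ILOG": {"E0", "E1"},
    "SHIFT": {"E0"},
    "IBR": {"E1"},
    "FADD": {"FA"},
    "FMUL": {"FM"},
    "FCPYS": {"FA", "FM"},
}
INT_PIPES = {"E0", "E1"}


def is_integer(cls: str) -> bool:
    """An integer instruction can issue in E0 and/or E1, but not FA or FM."""
    return CLASS_PIPES[cls] <= INT_PIPES


def slot(int16: List[str]) -> List[Tuple[str, str]]:
    """Assign pipelines in program order; stop at the first unslottable one."""
    taken = set()
    slotted = []
    for i, cls in enumerate(int16):
        pipes = CLASS_PIPES[cls]
        if pipes == {"E0", "E1"}:
            # Prefer E0, unless the next integer instruction needs E0.
            later_int = [c for c in int16[i + 1:] if is_integer(c)]
            next_needs_e0 = bool(later_int) and CLASS_PIPES[later_int[0]] == {"E0"}
            order = ["E1", "E0"] if next_needs_e0 else ["E0", "E1"]
        elif pipes == {"FA", "FM"}:
            order = ["FA", "FM"]        # prefer FA unless FA is not free
        else:
            order = list(pipes)         # single-pipeline class
        free = [p for p in order if p not in taken]
        if not free:
            break                       # in-order issue: stop slotting here
        taken.add(free[0])
        slotted.append((cls, free[0]))
    return slotted


print(slot(["LD", "IADD", "IADD", "IADD"]))
print(slot(["IADD", "SHIFT", "FADD", "FMUL"]))
```

With these assumptions, a load followed by three adds slots only the first two instructions (E0 and E1), while an add, shift, floating add, and floating multiply can all be slotted together because the add is steered to E1 to leave E0 for the shift.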
Instructions [a] and [b] in the following example are slotted together, but [b] stalls (split-issue), thus preventing [c] and [d] from advancing to the issue stage:

Code example showing bad ordering

| Cycle | Label | Instruction |
|-------|-------|-------------|
| (1) | [a] | LDL R2,0(R1) |
| (3) | [b] | ADDL R2,R3,R4 |
| (4) | [c] | ADDL R2,R5,R6 |
| (5) | [d] | ADDL R4,R8,R9 |

Code example showing good ordering

| Cycle | Label | Instruction |
|-------|-------|-------------|
| (1) | [e] | LDL R2,0(R1) |
| (1) | [f] | NOP |
| (3) | [g] | ADDL R2,R3,R4 |
| (3) | [h] | ADDL R2,R5,R6 |
| (4) | [i] | ADDL R4,R8,R9 |

NOTES: The instruction examples are assumed to begin on an INT16 alignment. (n) = Expected execute cycle.

Eventually [b] issues when the result of [a] is returned from a presumed Dcache hit. Instruction [c] is delayed because it cannot advance to the issue stage until [b] issues.

Instructions [c] and [d] advance together to the issue stage, but [d] stalls due to another split issue. In the improved code order example, a NOP (or independent) instruction prevents the split-issue cases and the sequence executes in one less cycle.

#### 2.3.3 Instruction Latencies

After slotting, instruction issue is governed by the availability of registers for read or write operations, and the availability of the floating divide unit and the integer multiply unit. There are producer-consumer dependencies, producer-producer dependencies (also known as write-after-write conflicts), and dynamic function unit availability dependencies (integer multiply and floating divide). The Ibox logic in stage 3 of the 21164 pipeline detects all these conflicts.

The latency to produce a valid result for most instructions is fixed. The exceptions are loads that miss, floating-point divides, and integer multiplies. Table 2-8 gives the latencies for each instruction class. A latency of 1 means that the result may be used by an instruction issued one cycle after the producing instruction. Most latencies are only a property of the producer. An exception is integer multiply latencies.
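
The integer multiply exception can be illustrated with the numbers that Table 2-8 and its first footnote give for an IMULL fed by an ADDL. The sketch below is an assumption-laden reading of that footnote, not a statement of the hardware's exact behavior: it assumes the added latency shrinks by one cycle for every cycle the multiply issues later than the earliest possible point, and the function and parameter names are hypothetical.

```python
def multiply_latency(base_latency: int, producer_latency: int,
                     extra_before_multiplier: int, issue_distance: int) -> int:
    """Effective latency of an integer multiply with a recent producer.

    base_latency            -- multiply latency from Table 2-8 (8 for IMULL)
    producer_latency        -- latency of the operand's producer (1 for ADDL)
    extra_before_multiplier -- "additional time" column of Table 2-8
    issue_distance          -- cycles between producer issue and multiply issue
    """
    if issue_distance < producer_latency:
        raise ValueError("multiply cannot issue before the producer's latency")
    # Assumption: each cycle of slack hides one cycle of the additional time.
    slack = issue_distance - producer_latency
    return base_latency + max(0, extra_before_multiplier - slack)


# IMULL (latency 8) consuming an ADDL result (latency 1, 2 additional cycles).
print(multiply_latency(8, 1, 2, issue_distance=1))  # 10, the 8 + 2 case
print(multiply_latency(8, 1, 2, issue_distance=2))  # 9, the 8 + 1 case
print(multiply_latency(8, 1, 2, issue_distance=3))  # 8
```

The first two results reproduce the two cases worked in the footnote; the third follows only from the assumed linear rule.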
There are no variations in latency due to which particular unit produces a given result relative to the particular unit that consumes it. In the case of integer multiply, the instruction is issued at Split-issue is the situation in which not all instructions sent from the slotting stage to the issue stage issue. One or more stalls result. the time determined by the standard latency numbers. The multiply's latency is dependent on which previous instructions produced its operands and when they executed. Table 2-8 Instruction Latencies | Class | Latency | Additional Time Before<br>Result Available to Integer<br>Multiply Unit <sup>1</sup> | |-------|----------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------| | LD | Deache hits, latency=2. Deache miss/Seache hit, latency=8 or longer. <sup>2</sup> | 1 cycle | | ST | Store operations produce no result. | | | MBX | LDx_L always Dcache misses, latency depends on memory subsystem state. STx_C, latency depends on memory subsystem state. MB, WMB, and FETCH produce no result. | | | RX | RS, RC, latency=1. | 2 cycles | | MXPR | HW_MFPR, latency=1, 2, or longer, depending on<br>the IPR.<br>HW_MTPR, produces no result. | 1 or 2 cycles | | IBR | Produces no result. (Taken branch issue latency minimum = 1 cycle, branch mispredict penalty = 5 cycles.) | <del>-</del> | | FBR | Produces no result. (Taken branch issue latency minimum = 1 cycle, branch mispredict penalty = 5 cycles.) | <u> </u> | | JSR | All but HW_REI, latency=1. HW_REI produces no result. (Issue latency—minimum 1 cycle.) | 2 cycles | | IADD | Latency≢1. | 2 cycles | <sup>&</sup>lt;sup>1</sup>The multiplier is unable to receive data from Ebox bypass paths. 
The instruction issues at the expected time, but its latency is increased by the time it takes for the input data to become available to the multiplier. For example, an IMULL instruction issued one cycle later than an ADDL instruction, which produced one of its operands, has a latency of 10 (8 + 2). If the IMULL instruction is issued two cycles later than the ADDL instruction, the latency is 9 (8 + 1).

<sup>2</sup>When idle, Scache arbitration predicts a load miss in E0. If a load actually does miss in E0, it is sent to the Scache immediately. If it hits, and no other event in the Cbox affects the operation, the requested data is available for use in eight cycles. Otherwise, the request takes longer (possibly much longer depending on the state of the Scache and Cbox). It should be possible to schedule some unrolled code loops for Scache by using a data access pattern that takes advantage of the Mbox load-merging function, achieving high throughput with large data sets.

Table 2-8 (Cont.) Instruction Latencies

| Class | Latency | Additional Time Before Result Available to Integer Multiply Unit<sup>1</sup> |
|-------|---------|------|
| ILOG | Latency=1.<sup>4</sup> | 2 cycles |
| SHIFT | Latency=1. | 2 cycles |
| CMOV | Latency=2. | 1 cycle |
| ICMP | Latency=1.<sup>4</sup> | 2 cycles |
| IMULL | Latency=8, plus up to 2 cycles of added latency depending on the source of the data.<sup>1</sup> Latency until next IMULL, IMULQ, or IMULH instruction can issue (if there are no data dependencies) is 4 cycles plus the number of cycles added to the latency. | 1 cycle |
| IMULQ | Latency=12, plus up to 2 cycles of added latency depending on the source of the data.
Latency until next IMULL, IMULQ, or IMULH instruction can issue (if there are no data dependencies) is 8 cycles plus the number of cycles added to the latency. | 1 cycle |
| IMULH | Latency=14, plus up to 2 cycles of added latency depending on the source of the data. Latency until next IMULL, IMULQ, or IMULH instruction can issue (if there are no data dependencies) is 8 cycles plus the number of cycles added to the latency. | 1 cycle |
| FADD | Latency=4. | |
| FDIV | Data-dependent latency: 15 to 31 single precision, 22 to 60 double precision. The next floating divide can be issued in the same cycle that the result of the previous divide is available, regardless of data dependencies. | |
| FMUL | Latency=4. | — |

<sup>4</sup>A special bypass provides an effective latency of 0 (zero) cycles for an ICMP or ILOG instruction producing the test operand of an IBR or CMOV instruction. This is true only when the IBR or CMOV instruction issues in the same cycle as the ICMP or ILOG instruction that produced the test operand of the IBR or CMOV instruction. In all other cases the effective latency of ICMP and ILOG instructions is one cycle.

Table 2-8 (Cont.) Instruction Latencies

| Class | Latency | Additional Time Before Result Available to Integer Multiply Unit<sup>1</sup> |
|-------|---------|------|
| FCPYS | Latency=4.
| — |
| MISC | RPCC, latency=2. TRAPB produces no result. | 1 cycle |
| UNOP | UNOP produces no result. | |

#### 2.3.3.1 Producer-Producer Latency

Producer-producer latency, also known as a write-after-write conflict, causes issue stalls to preserve write order. If two instructions write the same register, they are forced to do so in different cycles by the Ibox. This is necessary to ensure that the correct result is left in the register file after both instructions have executed. For most instructions, the order in which they write the register file is dictated by issue order. However, IMUL, FDIV, and LD instructions may require more time than other instructions to complete. Subsequent instructions that write the same destination register are issue-stalled to preserve write ordering at the register file.

Conditions that involve an intervening producer-consumer conflict can occur commonly in a multiple-issue situation when a register is reused. In these cases, producer-consumer latencies are equal to or greater than the required producer-producer latency as determined by write ordering and therefore dictate the overall latency.
An example of this case is shown in the following code:

```
LDQ   R2,0(R0)    ; R2 destination
ADDQ  R2,R3,R4    ; wr-rd conflict stalls execution waiting for R2
LDQ   R2,D(R1)    ; wr-wr conflict may dual issue when ADDQ issues
```

Producer-producer latency is generally determined by applying the rule that register file write operations must occur in the correct order (enforced by Ibox hardware). Two IADD or ILOG class instructions that write the same register issue at least one cycle apart. The same is true of a pair of CMOV class instructions, even though their latency is 2. For IMUL, FDIV, and LD instructions, a producer-producer conflict with any subsequent instruction results in the second instruction being issue-stalled until the IMUL, FDIV, or LD instruction is about to complete. The second instruction is issued as soon as it is guaranteed to write the register file at least one cycle after the IMUL, FDIV, or LD instruction.

If a load writes a register, and within two cycles a subsequent instruction writes the same register, the subsequent instruction is issued speculatively assuming the load hits. If the load misses, a load-miss-and-use trap is generated. This causes the second instruction to be replayed by the Ibox. When the second instruction again reaches the issue point, it is issue-stalled until the load fill occurs.

#### 2.3.4 Issue Rules

The following is a list of conditions that prevent the 21164 from issuing an instruction:

- No instruction can be issued until all of its source and destination registers are clean; that is, all outstanding write operations to the destination register are guaranteed to complete in issue order and there are no outstanding write operations to the source registers, or those write operations can be bypassed.

  Technically, load-miss-and-use replay traps are an exception to this rule.
  The consumer of the load's result issues, and is aborted, because a load was predicted to hit and was discovered to miss just as the consumer instruction issued. In practice, the only difference is that the latency of the consumer may be longer than it would have been had the issue logic "known" the load would miss in time to prevent issue.
- An instruction of class LD cannot be issued in the second cycle after an instruction of class ST is issued.
- No LD, ST, MXPR (to an Mbox register), or MBX class instructions can be issued after an MB instruction has been issued until the MB has been acknowledged by the Cbox.
- No LD, ST, MXPR (to an Mbox register), or MBX class instructions can be issued after a STx\_C (or HW\_ST-cond) instruction has been issued until the Mbox writes the success/failure result of the STx\_C (HW\_ST-cond) in its destination register.
- No IMUL instructions can be issued if the integer multiplier is busy.
- No floating-point divide instructions can be issued if the floating-point divider is busy.
- No instruction can be issued to pipe E0 exactly two cycles before an integer multiplication completes.
- No instruction can be issued to pipe FA exactly five cycles before a floating-point divide completes.
- No instruction can be issued to pipe E0 or E1 exactly two cycles before an integer register fill is requested (speculatively) by the Cbox, except IMULL, IMULQ, and IMULH instructions and instructions that do not produce any result.
- No LD, ST, or MBX class instructions can be issued to pipe E0 or E1 exactly one cycle before an integer register fill is requested (speculatively) by the Cbox.
- No instruction issues after a TRAPB instruction until all previously issued instructions are guaranteed to finish without generating a trap other than a machine check.

All instructions sent to the issue stage (stage 3) by the slotting logic (stage 2) are issued subject to the above rules.
If issue is prevented for a given instruction at the issue stage, all logically subsequent instructions at that stage are automatically prevented from issuing. The 21164 issues instructions only in order.

# 2.4 Replay Traps

There are no stalls after the instruction issue point in the pipeline. In some situations, an Mbox instruction cannot be executed because of insufficient resources (or some other reason). These instructions trap and the Ibox restarts their execution from the beginning of the pipeline. This is called a replay trap. Replay traps occur in the following cases:

- The write buffer is full when a store instruction is executed; that is, six write buffer entries are already allocated. The trap occurs even if the entry would have merged in the write buffer.
- A load instruction is issued in pipe E0 when all six MAF entries are valid (not available), or a load instruction is issued in pipe E1 when five of the six MAF entries are valid. The trap occurs even if the load instruction would have hit in the Dcache or merged with an MAF entry.
- Alpha AXP shared memory model order trap (Litmus test 1 trap): If a load instruction issues whose address matches any miss in the MAF, the load instruction is aborted through a replay trap regardless of whether the newly issued load instruction hits or misses in the Dcache. The address match is precise except that it includes the case in which a longword access matches within a quadword access. This ensures that the two loads execute in issue order.
- Load-after-store trap: A replay trap occurs if a load instruction is issued in the cycle immediately following a store instruction that hits in the Dcache, and both access the same location. The address match is exact with respect to low-order bits of the address, but ignores address bits <42:13>.
- When a load instruction is followed, within one cycle, by any instruction that uses the result of that load, and the load misses in the Dcache, the consumer instruction traps and is restarted from the beginning of the pipeline. This occurs because the consumer instruction is issued speculatively while the Dcache hit is being evaluated. If the load misses in the Dcache, the speculative issue of the consumer instruction was incorrect. The replay trap generally brings the consumer instruction to the issue point before or simultaneously with the availability of fill data.

# 2.5 Miss Address File and Load-Merging Rules

The following sections describe the miss address file (MAF) and its load-merging function, and the load-merging rules that apply after a load miss.

# 2.5.1 Merging Rules

When a load miss occurs, each MAF entry is checked to see if it contains a load miss that addresses the same 32-byte Dcache block. If it does, and certain merging rules are satisfied, then the new load miss is merged with an existing MAF entry. This allows the Mbox to service two or more load misses with one data fill from the Cbox. The merging rules for an individual MAF entry are as follows:

- Merging only occurs if the new load miss addresses a different INT8 from all loads previously entered or merged to that MAF entry.
- Merging only occurs if the new load miss is the same access size as the load instructions previously entered in that MAF entry. That is, quadword load instructions merge only with other quadword load instructions and longword load instructions merge only with other longword load instructions.
- In the case of longword load instructions, both <02> address bits must be the same. That is, longword load instructions with even addresses merge only with other even longword load instructions, and longword load instructions with odd addresses merge only with other odd longword load instructions.
- The MAF does not merge floating-point and integer load misses in the same entry.

Merging is prevented for the MAF entry a certain number of cycles after the Scache access corresponding to the MAF entry begins. Merging is prevented for that entry only if the Scache access hits. The minimum number of cycles of merging is three: the cycle in which the first load is issued and the two subsequent cycles. This corresponds to the most optimistic case of a load miss being forwarded to the Scache without delay (accounting for the cycle saved by the bypass that sends new load misses directly to the Scache when there is nothing else pending).

# 2.5.2 Read Requests to the Cbox

When merging does not occur, a new MAF entry is allocated for the new load miss. Two load instructions that issue simultaneously and both miss are merged, in effect, as if they had been issued sequentially with the load from Ebox pipe E0 first. The Mbox sends a read request to the Cbox for each MAF entry allocated.

A bypass is provided so that if the load instruction issues in Ebox pipe E0, and no MAF requests are pending, the load instruction's read request is sent to the Cbox immediately. Similarly, if a load instruction from Ebox pipe E1 misses, and there was no load instruction in pipe E0 to begin with, the E1 load miss is sent to the Cbox immediately. In either case, the bypassed read request is aborted if the load hits in the Dcache or merges in the MAF.

# 2.5.3 Load Instructions to Noncacheable Space

Merging is normally allowed for load instructions to noncacheable space (physical address bit <39> = 1). It is prevented when MAF\_MODE<03>=1. At the external interface, these read instructions tell the system environment which INT32 is addressed and which of the INT8s within the INT32 are actually accessed. Merging stops for a load instruction to noncacheable space as soon as the Cbox accepts the reference.
This permits the system environment to access only those INT8s that are actually requested by load instructions.

For memory-mapped INT4 registers, the system environment must return the result of reading each register within the INT8. This occurs because the 21164 only indicates those INT8s that are accessed, not the exact length and offset of the access within each INT8. Systems implementing memory-mapped registers with side effects from read instructions should place each such register in a separate INT8 in memory.

#### 2.5.4 MAF Entries and MAF Full Conditions

There are six MAF entries for load misses and four for Ibox instruction fetches and prefetches. Load misses are usually the highest Mbox priority request.

If the MAF is full and a load instruction issues in pipe E0, or if five of the six MAF entries are valid and a load instruction issues in pipe E1, an MAF full trap occurs, causing the Ibox to restart execution with the load instruction that caused the MAF overflow. When the load instruction arrives at the MAF the second time, an MAF entry may have become available. If not, the MAF full trap occurs again.

# 2.5.5 Fill Operation

Eventually, the Cbox provides the data requested for a given MAF entry (a fill). If the fill is integer data and not floating-point data, the Cbox requests that the Ibox allocate two consecutive "bubble" cycles in the Ebox pipelines. The first bubble prevents any instruction from issuing. The second bubble prevents only Mbox instructions (particularly load and store instructions) from issuing. The fill uses the first bubble cycle as it progresses down the Ebox/Mbox pipelines to format the data and load the register file. It uses the second bubble cycle to fill the Dcache.

An instruction typically writes the register file in pipeline stage 6 (see Figure 2-2). Because there is only one register file write port per integer pipeline, a no-instruction bubble cycle is required to reserve a register file write port for the fill.
A load or store instruction accesses the Dcache in the second half of stage 4 and the first half of stage 5. The fill operation writes the Dcache, making it unavailable for other accesses at that time. Relative to the register file write operation, the Dcache (write) access for a fill occurs a cycle later than the Dcache access for a load hit. Only load and store instructions use the Dcache in the pipeline. Therefore, the second bubble reserved for a fill is a no-Mbox-instruction bubble.

The second bubble is a subset of the first bubble. When two fills are in consecutive cycles, as in an Scache hit, then three total bubbles are allocated: two no-instruction bubbles, followed by one no-Mbox-instruction bubble. The bubbles are requested speculatively before it is known whether the Scache or the optional external Bcache will hit.

For fills from the Cbox to floating-point registers, no cycle is allocated. Load instructions that conflict with the fill in the pipeline are forced to miss. Store instructions that conflict in the pipeline force the fill to be aborted in order to keep the Dcache available to the store operation. In all cases, the floating-point registers are filled as dictated by the associated MAF entry. The Fbox has separate write ports for fill data, as is necessary for this fill scheme.

Up to two floating or integer registers may be written for each Cbox fill cycle. Fills deliver 32 bytes in two cycles: two INT8s per cycle. The MAF merging rules ensure that there is no more than one register to write for each INT8, so that there is a register file write port available for each INT8. After appropriate formatting, data from each INT8 is written into the IRF or FRF provided there is a miss recorded for that INT8.

Load misses are all checked against the write buffer contents for conflicts between new load instructions and previously issued store instructions. Refer to Section 2.7 for more information on write operations.
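
The merging rules of Section 2.5.1, and the resulting property that each INT8 of a fill has at most one register to write, can be illustrated with a small software model. The Python sketch below is not part of the 21164 design: the names `LoadMiss`, `MafEntry`, and `record_miss` are hypothetical, and the model assumes only the four address and data-type rules listed in Section 2.5.1. It deliberately ignores the merge time window, replay traps, the six-entry limit, and LDx\_L behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, List

BLOCK_SHIFT = 5  # 32-byte Dcache block
INT8_SHIFT = 3   # 8-byte INT8; four INT8s per block


@dataclass(frozen=True)
class LoadMiss:
    """Hypothetical record of one load miss (not a hardware structure)."""
    addr: int               # byte address of the load
    size: int               # 4 = longword, 8 = quadword
    is_float: bool = False  # True for a floating-point load


def int8_index(addr: int) -> int:
    """Index (0..3) of the INT8 that addr falls in within its 32-byte block."""
    return (addr >> INT8_SHIFT) & 3


@dataclass
class MafEntry:
    """Model of one MAF entry: the first miss plus any merged misses."""
    loads: List[LoadMiss] = field(default_factory=list)

    def can_merge(self, new: LoadMiss) -> bool:
        first = self.loads[0]
        # Must address the same 32-byte Dcache block.
        if (new.addr >> BLOCK_SHIFT) != (first.addr >> BLOCK_SHIFT):
            return False
        # Same access size: quadword with quadword, longword with longword.
        if new.size != first.size:
            return False
        # Floating-point and integer misses never share an entry.
        if new.is_float != first.is_float:
            return False
        # Longword loads must agree in address bit <02>.
        if new.size == 4 and ((new.addr >> 2) & 1) != ((first.addr >> 2) & 1):
            return False
        # Must address a different INT8 from every load already in the entry.
        used = {int8_index(ld.addr) for ld in self.loads}
        return int8_index(new.addr) not in used

    def registers_per_int8(self) -> Dict[int, int]:
        """Number of destination registers recorded for each INT8."""
        counts: Dict[int, int] = {}
        for ld in self.loads:
            counts[int8_index(ld.addr)] = counts.get(int8_index(ld.addr), 0) + 1
        return counts


def record_miss(maf: List[MafEntry], new: LoadMiss) -> str:
    """Merge the miss into the first eligible entry, else allocate a new one."""
    for entry in maf:
        if entry.can_merge(new):
            entry.loads.append(new)
            return "merged"
    maf.append(MafEntry([new]))
    return "allocated"
```

In this model, two quadword misses at offsets 0 and 8 of the same block share one entry, while a longword miss to that block allocates its own entry because the access size differs. Whatever sequence of misses is recorded, `registers_per_int8` never reports more than one register for an INT8, which is the property the fill logic relies on.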
LDL\_L and LDQ\_L instructions always allocate a new MAF entry. No load instructions that follow an LDL\_L or LDQ\_L instruction are allowed to merge with it. After an LDL\_L or LDQ\_L instruction is issued, the Ibox does not issue any more Mbox instructions until the Mbox has successfully sent the LDL\_L or LDQ\_L instruction to the Cbox. This guarantees correct ordering between an LDL\_L or LDQ\_L instruction and a subsequent STL\_C or STQ\_C instruction even if they access different addresses.

# 2.6 Mbox Store Instruction Execution

Store instructions execute in the Mbox by:

- Reading the Dcache tag store in the pipeline stage in which a load instruction would read the Dcache
- Checking for a hit in the next stage
- Writing the Dcache data store, if there is a hit, in the second (following) pipeline stage.

Load instructions are not allowed to issue in the second cycle after a store instruction (one bubble cycle). Other instructions can be issued in that cycle. Store instructions can issue at the rate of one per cycle because store instructions in the Dstream do not conflict in their use of resources. The Dcache tag store and Dcache data store are the principal resources. However, a load instruction uses the Dcache data store in the same early stage that it uses the Dcache tag store. Therefore, a load instruction would conflict with a store instruction if it were issued in the second cycle after any store instruction. Refer to Section 2.2 for more information on store instruction execution in the pipeline.

A load instruction that is issued one cycle after a store instruction in the pipeline creates a conflict if both access exactly the same memory location. This occurs because the store instruction has not yet updated the location when the load instruction reads it. This conflict is handled by forcing the load instruction to replay trap. The Ibox flushes the pipeline and restarts execution from the load instruction.
By the time the load instruction arrives at the Dcache the second time, the conflicting store instruction has written the Dcache and the load instruction is executed normally.

Software should not load data immediately after storing it. The replay trap that is incurred "costs" seven cycles. The best solution is to schedule the load instruction to issue three cycles after the store. No issue stalls or replay traps will occur in that case. If the load instruction is scheduled to issue two cycles after the store instruction, it will be issue-stalled for one cycle. This is not an optimal solution, but is preferred over incurring a replay trap on the load instruction.

For three cycles during store instruction execution, fills from the Cbox are not placed in the Dcache. Register fills are unaffected. There are conflicts that make it impossible to fill the Dcache in each of these cycles. Fills are prevented in cycles in which a store instruction is in pipeline stage 4, 5, or 6. This always applies to fills of floating-point data. Fills of integer data allocate bubble cycles, such that an integer fill never conflicts with a store instruction in pipeline stages 4 or 5. Instead, a store instruction that would have conflicted in stage 4 or 5 is issue-stalled. However, an integer fill will conflict with a store instruction in pipeline stage 6.

If a store instruction is stalled at the issue point for any reason, it interferes with fills just as if it had been issued. This applies only to fills of floating-point data.

For each store instruction, a search of the MAF is done to detect load-before-store hazards. If a store instruction is executed, and a load of the same address is present in the MAF, two things happen:

1. Bits are set in each conflicting MAF entry to prevent its fill from being placed in the Dcache when it arrives, and to prevent subsequent load instructions from merging with that MAF entry.
2.
   Conflict bits are set with the store instruction in the write buffer to prevent the store instruction from being issued until all conflicting load instructions have been issued to the Cbox.

This ensures proper results from the load instructions and prevents incorrect data from being cached in the Dcache.

A check is performed for each new store against store instructions in the write buffer that have already been sent to the Cbox but have not been completed. Section 2.7 describes this process.

## 2.7 Write Buffer and the WMB Instruction

The following sections describe the write buffer and the WMB instruction.

#### 2.7.1 The Write Buffer

The write buffer contains six fully associative 32-byte entries. The purpose of the write buffer is to minimize the number of CPU stall cycles by providing a finite, high-bandwidth resource for receiving store data. This is required because the 21164 can generate store data at the peak rate of one INT8 every CPU cycle. This is greater than the average rate at which the Scache can accept the data if Scache misses occur.

In addition to HW\_ST and other store instructions, the STQ\_C, STL\_C, FETCH, and FETCH\_M instructions are also written into the write buffer and sent off-chip. However, unlike store instructions, these write-buffer-directed instructions are never merged into a write-buffer entry with other instructions. A write-buffer entry is invalid if it does not contain one of these commands.

#### 2.7.2 The WMB Instruction

The WMB instruction has a special effect on the write buffer. When it is executed, a bit is set in every write-buffer entry containing valid store data that will prevent future store instructions from merging with any of the entries. Also, the next entry to be allocated is marked with a WMB flag. At this point, the entry marked with the WMB flag does not yet have valid data in it.
When an entry marked with a WMB flag is ready to issue to the Cbox, the entry is not issued until every previously issued write instruction is complete. This ensures correct ordering between store instructions issued before the WMB instruction and store instructions issued after it.

Each write-buffer entry contains a content-addressable memory (CAM) for holding physical address bits <39:05>, 32 bytes of data, eight INT4 mask bits (that indicate which of the eight INT4s in the entry contain valid data), and miscellaneous control bits. Among the control bits are the WMB flag, and a no-merge bit, which indicates that the entry is closed to further merging.

# 2.7.3 Entry Pointer Queues

Two entry pointer queues are associated with the write buffer: a free-entry queue and a pending-request queue. The free-entry queue contains pointers to available invalid write-buffer entries. The pending-request queue contains pointers to valid write-buffer entries that have not yet been issued to the Cbox. The pending-request queue is ordered in allocation order.

Each time the write buffer is presented with a store instruction, the physical address generated by the instruction is compared to the address in each valid write-buffer entry that is open for merging. If the address is in the same INT32 as an address in a valid write-buffer entry (that also contains a store instruction), and the entry is open for merging, then the new store data is merged into that entry and the entry's INT4 mask bits are updated. If no matching address is found, or all entries are closed to merging, then the store data is written into the entry at the top of the free-entry queue. This entry is validated, and a pointer to the entry is moved from the free-entry queue to the pending-request queue.

# 2.7.4 Write-Buffer Entry Processing

When two or more entries are in the pending-request queue, the Mbox requests that the Cbox process the write-buffer entry at the head of the pending-request queue.
Then the Mbox removes the entry from the pending-request queue without placing it in the free-entry queue. When the Cbox has completely processed the write-buffer entry, it notifies the Mbox, and the now invalid write-buffer entry is placed in the free-entry queue. The Mbox may request that a second write-buffer entry be processed while waiting for the Cbox to finish the first. The write-buffer entries are invalidated and placed in the free-entry queue in the order that the requests complete. This order may be different from the order in which the requests were made.

The Mbox requests that a write-buffer entry be processed every 64 cycles, even if there is only one valid entry. This ensures that write instructions do not wait forever to be written to memory. (This is triggered by a free-running timer.)

When an LDL\_L or LDQ\_L instruction is processed by the Mbox, the Mbox requests processing of the next pending write-buffer request. This increases the chances of the write buffer being empty when an STL\_C or STQ\_C instruction is issued.

The Mbox continues to request that write-buffer entries be processed as long as one of the following occurs:

- One buffer contains an STQ\_C, STL\_C, FETCH, or FETCH\_M instruction
- One buffer is marked by a WMB flag
- An MB instruction is being executed by the Mbox.

This ensures that these instructions complete as quickly as possible.

Every store instruction that does not merge in the write buffer is checked against every valid entry. If any entry is an address match, then the WMB flag is set on the newly allocated write-buffer entry. This prevents the Mbox from concurrently sending two write instructions to exactly the same block in the Cbox.

Load misses are checked in the write buffer for conflicts. The granularity of this check is an INT32. Any load instruction matching any write-buffer entry's address is considered a hit even if it does not access an INT4 marked for update in that write-buffer entry.
If a load hits in the write buffer, a conflict bit is set in the load instruction's MAF entry, which prevents the load instruction from being issued to the Cbox before the conflicting write-buffer entry has been issued and completed. At the same time, the no-merge bit is set in every write-buffer entry with which the load hit. A write-buffer flush flag is also set. The Mbox continues to request that write-buffer entries be processed until all the entries that were ahead of, and including, the conflicting write instructions at the time of the load hit have been processed.

Some write instructions cannot be processed in the Scache without external environment involvement. To support this, the Mbox retransmits a write instruction at the Cbox's request. This situation arises when the Scache block is not dirty when the write instruction is issued, or when the access misses in the Scache.

# 2.7.5 Ordering of Noncacheable Space Write Instructions

Special logic ensures that write instructions to noncacheable space are sent off-chip in the order in which their corresponding buffers were allocated (placed in the pending-request queue).

# 2.8 Performance Measurement Support-Performance Counters

The 21164 contains a performance recording feature. The implementation of this feature provides a mechanism to count various hardware events and to cause an interrupt upon counter overflow. Interrupts are triggered six cycles after the event, and therefore the exception PC may not reflect the exact instruction causing counter overflow. Three counters are provided to allow accurate comparison of two variables under a potentially nonrepeatable experimental condition. Counter inputs include:

- Issues
- Nonissues
- Total cycles
- Pipe dry
- Pipe freeze
- Mispredicts and cache misses
- Counts for various instruction classifications

In addition, the 21164 provides one signal-pin input (**perf\_mon\_h**) to measure external events at a rate determined by the selected system clock speed.
For information about counter control, refer to the following IPR descriptions:

- Hardware interrupt clear (HWINT\_CLR) register (see Section 5.1.23)
- Interrupt summary register (ISR) (see Section 5.1.24)
- Performance counter (PMCTR) register (see Section 5.1.27)
- Bcache control (BC\_CONTROL) register (see Section 5.3.4) bits <24:19> and Table 5–31

# 2.9 Floating-Point Control Register

Figure 2–3 shows the format of the floating-point control register (FPCR) and Table 2–9 describes the fields.

Figure 2-3 Floating-Point Control Register (FPCR) Format

[Figure 2-3: FPCR bit layout, with the low-order bits marked RAZ/IGN]

Table 2-9 Floating-Point Control Register Bit Descriptions

| Bit | Description (Meaning When Set) |
|-----|--------------------------------|
| <63> | Summary bit (SUM). Records bitwise OR of FPCR exception bits. Equal to FPCR<57 \| 56 \| 55 \| 54 \| 53 \| 52>. |
| <62> | Inexact disable (INED). Suppress INE trap and place correct IEEE nontrapping result in the destination register if the 21164 is capable of producing correct IEEE nontrapping result. |

Table 2–9 (Cont.)
Floating-Point Control Register Bit Descriptions | ss UNF trap if UNDZ<br>on.<br>UNFD, on underflow,<br>the destination register<br>EEE standard.<br>Ig mode to be used by<br>instruction's function<br>are: | |-----------------------------------------------------------------------------------------------------------------------------------------------------------| | the destination register<br>SEE standard.<br>Ig mode to be used by<br>instruction's function | | instruction's function | | 1001/00000000 | | <u> </u> | | | | | | | | | | tion or a conversion recision. | | sion operation gave a esult. | | on operation | | operation overflowed | | rform a floating divide | | erform a floating<br>d one or more of the | | | | | | | | | | | # 2.10 Design Examples The 21164 can be designed into many different uniprocessor and multiprocessor system configurations. Figures 2-4, 2-5, and 2-6 illustrate three possible configurations. These configurations employ additional system/memory controller chipsets. Figure 2-4 shows a typical uniprocessor system with a board level cache. This system configuration could be used in standalone or networked workstations. Figure 2-4 Typical Uniprocessor Configuration Figure 2-5 shows a typical multiprocessor system, each processor with a board-level cache. Each interface controller must employ a duplicate tag store to maintain cache coherency. This system configuration could be used in a networked database server application. Figure 2–5 Typical Multiprocessor Configuration Figure 2-6 shows a cacheless multiprocessor system. This system configuration could be used in high-bandwith dedicated server applications. # Hardware Interface This chapter contains the Alpha 21164 microprocessor logic symbol and provides a list of signal names and their functions. # 3.1 Alpha 21164 Microprocessor Logic Symbol Figure 3-1 shows the logic symbol for the 21164 chip. 
Figure 3-1 Alpha 21164 Microprocessor Logic Symbol

# 3.2 Alpha 21164 Signal Names and Functions

The 21164 is contained in a 499-pin IPGA package. Of these pins, 291 are used for functional signals. There are three spare (unused) signal pins. The remaining pins are used for power (104) and ground (101).

The following table defines the 21164 signal types referred to in this section:

| Signal Type | Definition |
|---|---|
| B | Bidirectional |
| I | Input only |
| O | Output only |

The remaining two tables describe the function of each 21164 external signal. Table 3-1 lists all signals in alphanumeric order and provides full signal descriptions. Table 3-2 lists signals by function and provides an abbreviated description.

Table 3-1 Alpha 21164 Signal Descriptions

| Signal | Type | Count | Description |
|---|---|---|---|
| addr_h<39:4> | B | 36 | Address bus. These bidirectional signals provide the address of the requested data or operation between the 21164 and the system. If bit 39 is asserted, then the reference is to noncached, I/O memory space. |
| addr_bus_req_h | I | 1 | Address bus request. The system interface uses this signal to gain control of the **addr_h<39:4>**, **addr_cmd_par_h**, and **cmd_h<3:0>** pins. |
| addr_cmd_par_h | B | 1 | Address command parity. This is the odd parity bit on the current command and address buses. The 21164 takes a machine check if a parity error is detected. The system should do the same if it detects an error. |
| addr_res_h<1:0> | O | 2 | Address response bits <1> and <0>. For system commands, the 21164 uses these pins to indicate the state of the block in the Scache.<br>00 = NOP: Nothing<br>01 = NOACK: Data not found or clean<br>10 = ACK/Scache: Data from Scache<br>11 = ACK/Bcache: Data from Bcache |
| addr_res_h<2> | O | 1 | Address response bit <2>. For system commands, the 21164 uses this pin to indicate if the command hits in the Scache or onboard lock register. |
| cack_h | I | 1 | Command acknowledge. The system interface uses this signal to acknowledge any one of the commands driven by the 21164. |
| cfail_h | I | 1 | Command fail. This signal has two uses. It can be asserted during a cack cycle of a WRITE BLOCK LOCK command to indicate that the write operation is not successful. In this case, both **cack_h** and **cfail_h** are asserted together. It can also be asserted instead of **cack_h** to force an instruction fetch/decode unit (Ibox) timeout event. This causes the 21164 to do a partial reset and trap to the machine check (MCHK) PALcode entry point, which indicates a serious hardware error. |
| clk_mode_h<1:0> | I | 2 | Clock test mode. These signals specify a relationship between **osc_clk_in** and the CPU cycle time. These signals should be deasserted in normal operation mode. |
| cmd_h<3:0> | B | 4 | Command bus. These signals drive and receive the commands from the command bus. The following tables define the commands that can be driven on the **cmd_h<3:0>** bus by the 21164 or the system. For additional information, refer to Section 4.1.1.1. |

21164 Commands to System:

| cmd_h<3:0> | Command | Meaning |
|---|---|---|
| 0000 | NOP | Nothing. |
| 0001 | LOCK | Lock register address. |
| 0010 | FETCH | The 21164 passes a FETCH instruction to the system. |
| 0011 | FETCH_M | The 21164 passes a FETCH_M instruction to the system. |
| 0100 | MEMORY BARRIER | MB instruction. |
| 0101 | SET DIRTY | Dirty bit set if shared bit is clear. |
| 0110 | WRITE BLOCK | Request to write a block. |
| 0111 | WRITE BLOCK LOCK | Request to write a block with lock. |
| 1000 | READ MISS0 | Request for data. |
| 1001 | READ MISS1 | Request for data. |
| 1010 | READ MISS MOD0 | Request for data; modify intent. |
| 1011 | READ MISS MOD1 | Request for data; modify intent. |
| 1100 | BCACHE VICTIM | Bcache victim should be removed. |
| 1101 | | Reserved. |
| 1110 | READ MISS MOD STC0 | Request for data, STx_C data. |
| 1111 | READ MISS MOD STC1 | Request for data, STx_C data. |

System Commands to 21164:

| cmd_h<3:0> | Command | Meaning |
|---|---|---|
| 0000 | NOP | Nothing. |
| 0001 | FLUSH | Remove block from caches; return dirty data. |
| 0010 | INVALIDATE | Invalidate the block from caches. |
| 0011 | SET SHARED | Block goes to the shared state. |
| 0100 | READ | Read a block. |
| 0101 | READ DIRTY | Read a block; set shared. |
| 0111 | READ DIRTY/INV | Read a block; invalidate. |

| Signal | Type | Count | Description |
|---|---|---|---|
| cpu_clk_out_h | O | 1 | CPU clock output. This signal is used for test purposes. |
| dack_h | I | 1 | Data acknowledge. The system interface uses this signal to control data transfer between the 21164 and the system. |
| data_h<127:0> | B | 128 | Data bus. These signals are used to move data between the 21164, the system, and the Bcache. |
| data_bus_req_h | I | 1 | Data bus request. … the data bus in sysclk *n*, then … *n*+2. Before asserting this signal, the system should assert **idle_bc_h** for the correct number of cycles. If this signal is deasserted in sysclk …, … drives the … data bus in sysclk *n*+… |
| data_check_h<15:0> | B | 16 | Data check. These signals set even byte parity or INT8 ECC for the current data cycle. Refer to Section 4.13.1 for information on the purpose of each **data_check_h** bit. |
| data_ram_oe_h | O | 1 | Data RAM output enable. This signal is asserted for Bcache reads. |
| data_ram_we_h | O | 1 | Data RAM write enable. This signal is asserted for any Bcache write operation. Refer to Section 5.3.5 for timing details. |
| dc_ok_h | I | 1 | dc voltage OK. Must be deasserted until dc voltage reaches proper operating level. After that, **dc_ok_h** is asserted. |
| fill_h | I | 1 | Fill warning. If this signal is asserted at the rising edge of sysclk *n*, then the 21164 provides the address indicated by **fill_id_h** to the Bcache in sysclk *n*+2. The Bcache begins to write in that sysclk. At the end of the rising edge of sysclk *n*+1, the 21164 waits for the next sysclk and then begins the write again if **dack_h** is not asserted. |
| fill_error_h | I | 1 | Fill error. If this signal is asserted during a fill from memory, it indicates to the 21164 that the system has detected an invalid address or hard error. The system still provides an apparently normal read sequence with correct ECC/parity though the data is not valid. The 21164 traps to the machine check (MCHK) PALcode entry point and indicates a serious hardware error. **fill_error_h** should be asserted when the data is returned. Each assertion produces a MCHK trap. |
| fill_id_h | I | 1 | Fill identification. Asserted with **fill_h** to indicate which register is used. The 21164 supports two outstanding load instructions. If this signal is asserted in sysclk *n*, the 21164 provides the address from miss register 1. If it is deasserted, then the address in miss register 0 is used for the read operation. |
| fill_nocheck_h | I | 1 | Fill checking off. If this signal is asserted, then the 21164 does not check the parity or ECC for the current data cycle on a fill. |
| idle_bc_h | I | 1 | Idle Bcache. When asserted, the 21164 finishes the current Bcache read or write operation but does not start a new read or write operation until the signal is deasserted. Systems must assert this signal in time to idle the Bcache before fill data arrives. |
| index_h<25:4> | O | 22 | Index. These signals index the Bcache. |
| int4_valid_h<3:0> | O | 4 | INT4 data valid. During write operations, these signals are used to indicate which INT4 bytes of data are valid. This is useful for noncached write operations that have been merged in the write buffer.<br>xxx1 = **data_h<31:0>** valid<br>xx1x = **data_h<63:32>** valid<br>x1xx = **data_h<95:64>** valid<br>1xxx = **data_h<127:96>** valid<br>During read operations, these signals indicate which INT8 bytes of a 32-byte block need to be read and returned to the processor. This is useful for read operations to noncached memory.<br>xxx1 = **data_h<63:0>** valid<br>xx1x = **data_h<127:64>** valid<br>x1xx = **data_h<191:128>** valid<br>1xxx = **data_h<255:192>** valid<br>Note: For both read and write operations, multiple **int4_valid_h<3:0>** bits can be set simultaneously. |
| irq_h<3:0> | I | 4 | System interrupt requests. These signals have multiple modes of operation. During normal operation, these level-sensitive signals are used to signal interrupt requests. During initialization, these signals are used to set up the clock divisor for **sys_clk_out** as shown in the following table. |

| irq_h<3> | irq_h<2> | irq_h<1> | irq_h<0> | Ratio |
|---|---|---|---|---|
| Low | Low | High | High | 3 |
| Low | High | Low | Low | 4 |
| Low | High | Low | High | 5 |
| Low | High | High | Low | 6 |
| Low | High | High | High | 7 |
| High | Low | Low | Low | 8 |
| High | Low | Low | High | 9 |
| High | Low | High | Low | 10 |
| High | Low | High | High | 11 |
| High | High | Low | Low | 12 |
| High | High | Low | High | 13 |
| High | High | High | Low | 14 |
| High | High | High | High | 15 |

| Signal | Type | Count | Description |
|---|---|---|---|
| mch_hlt_irq_h | I | 1 | Machine halt interrupt request. This signal has multiple modes of operation. During initialization, this signal is used to set up **sys_clk_out2_h,l** delay (see Table 4-3). During normal operation, it is used to signal a halt request. |
| osc_clk_in_h<br>osc_clk_in_l | I<br>I | 1<br>1 | Oscillator clock inputs. These signals provide the differential clock input that is the fundamental timing of the 21164. These signals are driven at twice the desired internal clock frequency. (Under … operating conditions …) |
| perf_mon_h | I | 1 | Performance monitor. This signal provides input to the 21164 internal performance monitoring hardware from off-chip events (such as bus activity). |
| port_mode_h<1:0> | I | 2 | Select test port interface modes (normal, manufacturing, and debug). For normal test mode, both signals must be set low. |
| pwr_fail_irq_h | I | 1 | Power failure interrupt request. This signal has multiple modes of operation. During initialization, this signal is used to set up **sys_clk_out2_h,l** delay (see Table 4-3). During normal operation, this signal is used to signal a power failure. |
| ref_clk_in_h | I | 1 | Reference clock input. Optional. Used to synchronize the timing of multiple microprocessors to a single reference clock. |
| scache_set_h<1:0> | O | 2 | Secondary cache set. During a read miss request, these signals indicate the Scache set number that will be filled when the data is returned. This information can be used by the system to maintain a duplicate copy of the Scache tag store. |
| shared_h | I | 1 | Keep block status shared. For systems without a Bcache, when a WRITE BLOCK/NO VICTIM PENDING or WRITE BLOCK LOCK command is acknowledged, this pin can be used to keep the block status shared or private in the Scache. |
| srom_clk_h | O | 1 | Serial ROM clock. Supplies the clock that causes the SROM to advance to the next bit. The cycle time of this clock is 128 times the cycle time of the CPU clock. |
| srom_data_h | I | 1 | Serial ROM data. Input for the SROM. |
| srom_oe_l | O | 1 | Serial ROM output enable. Supplies the output enable to the SROM. |
| srom_present_l | B | 1 | Serial ROM present. Indicates that SROM is present and ready to load the Icache. |
| sys_clk_out1_h<br>sys_clk_out1_l | O<br>O | 1<br>1 | System clock outputs. Programmable system clock (**cpu_clk_out_h** divided by a value of 3 to 15) is used for board-level cache and system logic. |
| sys_clk_out2_h<br>sys_clk_out2_l | O<br>O | 1<br>1 | System clock outputs. The value of **sys_clk_out1_h,l** is delayed by a programmable amount from 0 to 7 CPU cycles. |
| sys_mch_chk_irq_h | I | 1 | System machine check interrupt request. This signal has multiple modes of operation. During initialization, it is used to set up **sys_clk_out2_h,l** delay (see Table 4-3). During normal operation, it is used to signal a machine interrupt check request. |
| sys_reset_l | I | 1 | System reset. This signal protects the 21164 from damage during initial power-up. It must be asserted until **dc_ok_h** is asserted. After that, it is deasserted and the 21164 begins a reset sequence. |
| system_lock_flag_h | I | 1 | System lock flag. During fills, the 21164 logically ANDs the value of the system copy with its own copy to produce the true value of the lock flag. |
| tag_ctl_par_h | B | 1 | Tag control parity. This signal indicates odd parity for **tag_valid_h**, **tag_shared_h**, and **tag_dirty_h**. During fills, the system should drive the correct parity based on the state of the valid, shared, and dirty bits. |
| tag_data_h<38:20> | B | 19 | Bcache tag data bits. This bit range supports 1- to 64-megabyte Bcaches. |
| tag_data_par_h | B | 1 | Tag data parity bit. This signal indicates odd parity for **tag_data_h<38:20>**. |
| tag_dirty_h | B | 1 | Tag dirty state bit. During fills, the system should assert this signal if the 21164 request is a READ MISS MOD, and the shared bit is not asserted. Refer to Table 4-6 for information about Bcache protocol. |
| tag_ram_oe_h | O | 1 | Tag RAM output enable. This signal is asserted during any Bcache read operation. |
| tag_ram_we_h | O | 1 | Tag RAM write enable. This signal is asserted for any tag write operation. During the first CPU cycle of a write operation, the write pulse is deasserted. In the second and following CPU cycles of a write operation, the write pulse is asserted if the corresponding bit in the write pulse register is asserted. Bits BC_WE_CTL<8:0> control the shape of the pulse (see Section 5.3.5). |
| tag_shared_h | B | 1 | Tag shared bit. During fills, the system should drive this signal with the correct value to mark the cache block as shared. See Table 4-6 for information about Bcache protocol. |
| tag_valid_h | B | 1 | Tag valid bit. During fills, this signal is asserted to indicate that the block has valid data. See Table 4-6 for information about Bcache protocol. |
| tck_h | I | 1 | JTAG boundary scan clock. |
| tdi_h | I | 1 | JTAG serial boundary scan data in signal. |
| tdo_h | O | 1 | JTAG serial boundary scan data out signal. |
| temp_sense_h | I | 1 | Temperature sense. This signal is used to measure the die temperature and is for manufacturing use only. |
| test_status_h<1:0> | O | 2 | Icache test status. These signals are used to extract Icache test status information from the chip. **test_status_h<0>** is asserted if ICSR<39> is true, on Ibox timeout, or remains asserted if the Icache built-in self-test (BiSt) fails. Also, **test_status_h<0>** outputs the value written by PALcode to **test_status_h<1>** through IPR access. For additional information, refer to Section 12.2.4. |
| tms_h | B | 1 | JTAG test mode select signal. |
| trst_l | B | 1 | JTAG test access port (TAP) reset signal. |
| victim_pending_h | O | 1 | Victim pending. When asserted, this signal indicates that the current read miss has generated a victim. Systems can delay requesting the command or address bus until the victim is removed. |

Table 3-2 Alpha 21164 Signal Descriptions by Function

| Signal | Type | Count | Description |
|---|---|---|---|
| **Clocks** | | | |
| clk_mode_h<1:0> | I | 2 | Clock test mode. |
| cpu_clk_out_h | O | 1 | CPU clock output. |
| osc_clk_in_h,l | I | 2 | Oscillator clock inputs. |
| ref_clk_in_h | I | 1 | Reference clock input. |
| sys_clk_out1_h,l | O | 2 | System clock outputs. |
| sys_clk_out2_h,l | O | 2 | System clock outputs. |
| sys_reset_l | I | 1 | System reset. |
| **Bcache** | | | |
| data_h<127:0> | B | 128 | Data bus. |
| data_check_h<15:0> | B | 16 | Data check. |
| data_ram_oe_h | O | 1 | Data RAM output enable. |
| data_ram_we_h | O | 1 | Data RAM write enable. |
| index_h<25:4> | O | 22 | Index. |
| tag_ctl_par_h | B | 1 | Tag control parity. |
| tag_data_h<38:20> | B | 19 | Bcache tag data bits. |
| tag_data_par_h | B | 1 | Tag data parity bit. |
| tag_dirty_h | B | 1 | Tag dirty state bit. |
| tag_ram_oe_h | O | 1 | Tag RAM output enable. |
| tag_ram_we_h | O | 1 | Tag RAM write enable. |
| tag_shared_h | B | 1 | Tag shared bit. |
| tag_valid_h | B | 1 | Tag valid bit. |
| **System Interface** | | | |
| addr_h<39:4> | B | 36 | Address bus. |
| addr_bus_req_h | I | 1 | Address bus request. |
| addr_cmd_par_h | B | 1 | Address command parity. |
| addr_res_h<2:0> | O | 3 | Address response. |
| cack_h | I | 1 | Command acknowledge. |
| cfail_h | I | 1 | Command fail. |
| cmd_h<3:0> | B | 4 | Command bus. |
| dack_h | I | 1 | Data acknowledge. |
| data_bus_req_h | I | 1 | Data bus request. |
| fill_h | I | 1 | Fill warning. |
| fill_error_h | I | 1 | Fill error. |
| fill_id_h | I | 1 | Fill identification. |
| fill_nocheck_h | I | 1 | Fill checking off. |
| idle_bc_h | I | 1 | Idle Bcache. |
| int4_valid_h<3:0> | O | 4 | INT4 data valid. |
| scache_set_h<1:0> | O | 2 | Secondary cache set. |
| shared_h | I | 1 | Keep block status shared. |
| system_lock_flag_h | I | 1 | System lock flag. |
| victim_pending_h | O | 1 | Victim pending. |
| **Interrupts** | | | |
| irq_h<3:0> | I | 4 | System interrupt requests. |
| mch_hlt_irq_h | I | 1 | Machine halt interrupt request. |
| pwr_fail_irq_h | I | 1 | Power failure interrupt request. |
| sys_mch_chk_irq_h | I | 1 | System machine check interrupt request. |
| **Test Modes and Miscellaneous** | | | |
| dc_ok_h | I | 1 | dc voltage OK. |
| perf_mon_h | I | 1 | Performance monitor. |
| port_mode_h<1:0> | I | 2 | Select test port interface modes (normal, manufacturing, and debug). |
| srom_clk_h | O | 1 | Serial ROM clock. |
| srom_data_h | I | 1 | Serial ROM data. |
| srom_oe_l | O | 1 | Serial ROM output enable. |
| srom_present_l | B | 1 | Serial ROM present. |
| tck_h | I | 1 | JTAG boundary scan clock. |
| tdi_h | I | 1 | JTAG serial boundary scan data in. |
| tdo_h | O | 1 | JTAG serial boundary scan data out. |
| temp_sense_h | I | 1 | Temperature sense. |
| test_status_h<1:0> | O | 2 | Icache test status. |
| tms_h | B | 1 | JTAG test mode select. |
| trst_l | B | 1 | JTAG test access port (TAP) reset. |

# 4 Clocks, Cache, and External Interface Functional Description

This chapter describes the Alpha 21164 microprocessor external interface, which includes the backup cache (Bcache) and system interfaces. It also describes the clock circuitry, locks, interrupt signals, and ECC/parity generation. It is organized as follows:

- Introduction to the external interface
- Clocks
- Physical address considerations
- Bcache structure and operation
- Cache coherency
- Lock mechanisms
- 21164-to-Bcache transactions
- 21164-initiated system transactions
- System-initiated transactions
- Data bus and command/address bus contention
- 21164 interface restrictions
- 21164/system race conditions
- Data integrity, Bcache errors, and command/address errors
- Interrupts

Chapter 3 lists and defines all 21164 hardware interface signal pins. Chapter 9 describes the 21164 hardware interface electrical requirements.

## 4.1 Introduction to the External Interface

A 21164-based system can be divided into three major sections:

- Alpha 21164 microprocessor
- Optional external Bcache
- System interface logic
  - Optional duplicate tag store
  - Optional lock register
  - Optional victim buffers

The 21164 external interface is flexible and mandates few design rules, allowing a wide range of prospective systems. The interface includes a 128-bit bidirectional data bus, a 36-bit bidirectional address bus, and several control signals.
Read and write speeds of the optional Bcache array can be programmed by means of register bits. Read and write speeds are independent of each other and of the system interface clock frequency. The cache system supports a selectable 32-byte or 64-byte block size.

Figure 4-1 shows a simplified view of the external interface. The function and purpose of each signal is described in Chapter 3.

# 4.1.1 System Interface

This section describes the system or external bus interface. The system interface is made up of bidirectional address and command buses, a data bus that is shared with the Bcache interface, and several control signals. The system interface is under the control of the bus interface unit (BIU) in the Cbox.

The system interface data bus is a 128-bit bidirectional bus. The cycle time of the system interface is programmable to speeds of 3 to 15 times the CPU cycle time. All system interface signals are driven or sampled by the 21164 on the rising edge of signal **sys_clk_out1_h**. In this chapter, this edge is sometimes referred to as "sysclk."

#### 4.1.1.1 Commands and Addresses

The 21164 can take up to two commands from the system at a time. The Scache or Bcache or both are probed to determine what must be done with the command.

- If nothing is to be done, the 21164 acknowledges receiving the command.
- If a Bcache read, set shared, or invalidate operation is required, the 21164 performs the task as soon as the Bcache becomes free. The 21164 acknowledges receiving the command at the start of the Bcache transaction.

There are two miss and two victim buffers in the BIU. They can hold one or two miss addresses and one or two Scache victim addresses or up to two shared write operations at a time.

- A miss occurs when the 21164 searches its caches but does not find the addressed block. The 21164 can queue two misses to the system.
- An Scache victim occurs when the 21164 deallocates a dirty block from the Scache.
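As an illustration of the command traffic described in this section, the following Python sketch decodes the system-to-21164 encodings of **cmd_h<3:0>** and the **addr_res_h<1:0>** responses listed in Table 3-1. The class and function names are hypothetical and chosen only for this example. The encodings are taken from Table 3-1, and encodings that the table does not list are rejected rather than guessed.

```python
from enum import IntEnum


class SystemCommand(IntEnum):
    """Commands the system can drive on cmd_h<3:0> (Table 3-1)."""
    NOP = 0b0000
    FLUSH = 0b0001
    INVALIDATE = 0b0010
    SET_SHARED = 0b0011
    READ = 0b0100
    READ_DIRTY = 0b0101
    READ_DIRTY_INV = 0b0111


class AddressResponse(IntEnum):
    """Responses the 21164 drives on addr_res_h<1:0> (Table 3-1)."""
    NOP = 0b00
    NOACK = 0b01
    ACK_SCACHE = 0b10
    ACK_BCACHE = 0b11


def decode_system_command(cmd_h: int) -> SystemCommand:
    """Map a sampled cmd_h<3:0> value to a system command.

    Hypothetical helper: raises ValueError for any encoding that
    Table 3-1 does not list for system-to-21164 commands.
    """
    try:
        return SystemCommand(cmd_h)
    except ValueError:
        raise ValueError(
            f"cmd_h<3:0> = {cmd_h:04b} is not listed in Table 3-1"
        ) from None
```

For example, `decode_system_command(0b0010)` yields `SystemCommand.INVALIDATE`, while `decode_system_command(0b0110)` raises `ValueError` because Table 3-1 lists no system command with that encoding.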
#### 4.1.2 Bcache Interface

The 21164 includes an interface and control for an optional backup cache (Bcache). The Bcache interface is made up of the following:

- A 128-bit data bus (which it shares with the system interface)
- Index address bits (**index_h<25:4>**)
- Tag and state bits for determining hit and coherence
- SRAM output and write control signals

#### 4.1.2.1 Bcache Victim Buffers

A Bcache victim is generated when the 21164 deallocates a dirty block from the Bcache. Each time a Bcache victim is produced, the 21164 stops reading the Bcache until the system takes the current victim. Then Bcache transactions resume.

External logic may help improve system performance by implementing any number of victim buffers. The victim buffers hold cache victims and enable the cache location to be filled with data from the desired address. Data in the victim buffers is written to memory at a later time. This action reduces the time that the 21164 is waiting for data.

# 4.2 Clocks

The 21164 develops three clock signals that are available at output pins:

| Signal | Description |
|---|---|
| cpu_clk_out_h | A 21164 internal clock that may or may not drive the system clock. |
| sys_clk_out1_h,l | A clock of programmable speed supplied to the external interface. |
| sys_clk_out2_h,l | A delayed copy of **sys_clk_out1_h,l**. The delay is programmable and is an integer number of **cpu_clk_out_h** periods. |

The 21164 may use **ref_clk_in_h** as a reference clock when generating **sys_clk_out1_h,l** and **sys_clk_out2_h,l**.

# 4.2.1 CPU Clock

The 21164 uses the differential input clock lines **osc_clk_in_h,l** as a source to generate its CPU clock. The input signals **clk_mode_h<1:0>** control generation of the CPU clock as listed in Table 4-1 and as shown in Figure 4-2.
Table 4-1 CPU Clock Generation Control

| Mode | clk_mode_h<1:0> | Divisor | Description |
|---|---|---|---|
| Normal | 0 0 | 2 | Usual operation. CPU clock frequency is ½ the input frequency. |
| Chip test | 0 1 | 1 | CPU clock frequency is the same as the input clock frequency to accommodate chip testers. |
| Module test | 1 0 | 4 | CPU clock frequency is ¼ the input frequency to accommodate module testers. |
| Reset | 1 1 | | Initializes CPU clock, allowing system clock to be synchronized to a stable reference clock. |

**Caution:** A clock source should always be provided on **osc_clk_in_h,l** when signal **dc_ok_h** is asserted.

Figure 4-2 Clock Signals and Functions (block diagram: CPU clock divider of /1, /2, or /4; digital PLL; system clock divider of /3 through /15; system clock delay of 0 through 7)

# 4.2.2 System Clock

The CPU clock is the source clock used to generate the system clock **sys_clk_out1_h,l**. The system clock divisor controls the frequency of **sys_clk_out1_h,l**. The divisor, 3 to 15, is obtained from the four interrupt lines **irq_h<3:0>** at power-up as listed in Table 4-2. The system clock frequency is determined by dividing the CPU clock frequency by the ratio.
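The clock derivation in Sections 4.2.1 and 4.2.2 can be sketched in a few lines of Python. This is a minimal illustration, not part of the 21164 specification: the function names are hypothetical, **clk_mode_h<1:0>** and **irq_h<3:0>** are assumed to be packed into integers with High = 1 and bit 0 as the least significant bit, and the sketch relies on the observation that every ratio in Table 4-2 equals the binary value on **irq_h<3:0>**.

```python
# clk_mode_h<1:0> -> CPU clock divisor, from Table 4-1.
# Reset mode (0b11) has no divisor and is rejected below.
CPU_CLOCK_DIVISOR = {0b00: 2, 0b01: 1, 0b10: 4}


def cpu_clock_hz(osc_hz: float, clk_mode: int) -> float:
    """CPU clock frequency derived from the osc_clk_in_h,l frequency."""
    if clk_mode not in CPU_CLOCK_DIVISOR:
        raise ValueError(f"clk_mode_h<1:0> = {clk_mode:02b} has no divisor")
    return osc_hz / CPU_CLOCK_DIVISOR[clk_mode]


def sysclk_ratio(irq_h: int) -> int:
    """System clock divisor sampled from irq_h<3:0> at power-up.

    Assumption: High = 1. Table 4-2 lists only the values 3 to 15,
    each equal to the binary value on the pins.
    """
    if not 3 <= irq_h <= 15:
        raise ValueError(f"irq_h<3:0> = {irq_h:04b} is not in Table 4-2")
    return irq_h


def sys_clock_hz(osc_hz: float, clk_mode: int, irq_h: int) -> float:
    """Frequency of sys_clk_out1_h,l for the given pin settings."""
    return cpu_clock_hz(osc_hz, clk_mode) / sysclk_ratio(irq_h)
```

For example, with a 600-MHz oscillator input in normal mode, the CPU clock runs at 300 MHz, and driving **irq_h<3:0>** to Low, High, High, Low (ratio 6) at power-up gives a 50-MHz system clock: `sys_clock_hz(600e6, 0b00, 0b0110)`.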
Table 4-2 System Clock Divisor

| irq_h<3> | irq_h<2> | irq_h<1> | irq_h<0> | Ratio |
|---|---|---|---|---|
| Low | Low | High | High | 3 |
| Low | High | Low | Low | 4 |
| Low | High | Low | High | 5 |
| Low | High | High | Low | 6 |
| Low | High | High | High | 7 |
| High | Low | Low | Low | 8 |
| High | Low | Low | High | 9 |
| High | Low | High | Low | 10 |
| High | Low | High | High | 11 |
| High | High | Low | Low | 12 |
| High | High | Low | High | 13 |
| High | High | High | Low | 14 |
| High | High | High | High | 15 |

Figure 4-3 shows the 21164 driving the system clock on a uniprocessor system.

Figure 4-3 Alpha 21164 Uniprocessor Clock

## 4.2.3 Delayed System Clock

The system clock **sys_clk_out1_h,l** is the source clock for the delayed system clock **sys_clk_out2_h,l**. These clock signals provide flexible timing for system use. The delay, 0 to 7 CPU clock cycles, is obtained from the three interrupt signals **mch_hlt_irq_h**, **pwr_fail_irq_h**, and **sys_mch_chk_irq_h** at power-up as listed in Table 4-3. The output of this programmable divider is symmetric if the divisor is even. The output is asymmetric if the divisor is odd.

Table 4-3 System Clock Delay

| sys_mch_chk_irq_h | pwr_fail_irq_h | mch_hlt_irq_h | Delay Cycles |
|---|---|---|---|
| Low | Low | Low | 0 |
| Low | Low | High | 1 |
| Low | High | Low | 2 |
| Low | High | High | 3 |
| High | Low | Low | 4 |
| High | Low | High | 5 |
| High | High | Low | 6 |
| High | High | High | 7 |

#### 4.2.4 Reference Clock

The 21164 provides a reference clock input so that other CPUs and system devices can be synchronized in multiprocessor systems. If a clock is supplied on signal **ref_clk_in_h**, then the **sys_clk_out1_h,l** signals are synchronized to that reference clock.
The reference clock input should be connected to Vdd if the input is not to be used.

The 21164 synchronizes the sys\_clk\_out1\_h frequency with the ref\_clk\_in\_h signal by means of a digital phase-locked loop (DPLL). The DPLL does not lock the two frequencies, but rather creates a window. To accomplish this, the frequency of signal sys\_clk\_out1 must be slightly higher, but no more than 0.35% higher, than that of signal ref\_clk\_in\_h. This causes the rising edge of sys\_clk\_out1 to drift back toward the rising edge of ref\_clk\_in\_h. The 21164 detects when the edges meet and stalls the internal clock generator for one osc\_clk\_in cycle. This moves the rising edge of sys\_clk\_out1 back in front of ref\_clk\_in\_h.

Figure 4-4 shows a multiprocessor 21164 system synchronized to a reference clock.

Figure 4-4 Alpha 21164 Reference Clock for Multiprocessor Systems

### 4.2.4.1 Reference Clock Examples

This section contains example calculations of settling time in systems using the DPLL for synchronization.

After sys\_clk\_out1\_h,l has stabilized (20 cycles after irq\_h<3:0> have settled) there will be a delay before sys\_clk\_out1\_h,l comes into lock with ref\_clk\_in\_h. The two cases for this event are described in Section 4.2.4.1.1 and Section 4.2.4.1.2.

4.2.4.1.1 Case 1: ref\_clk\_in\_h Initially Sampled Low by DPLL

When the DPLL initially samples ref\_clk\_in\_h in the low state, as shown in Figure 4-5, it slips its internal cycle repeatedly until it samples ref\_clk\_in\_h in the high state. After it samples ref\_clk\_in\_h in the high state, the DPLL stays in lock mode.

Figure 4-5 ref\_clk\_in\_h Initially Sampled Low

The worst case (slowest) rate at which the DPLL will slip its internal cycle (the frequency of phase slips) is calculated from the lock range specification of 0.35%.
In effect, an average of 0.35% of a period is added to each sys\_clk\_out1\_h,l period until lock mode is reached. Assuming the worst case **ref\_clk\_in\_h** duty cycle is 60/40 to 40/60:

$$\text{SettlingTime} = \frac{0.6 \times \text{RefClockPeriod}}{0.0035} \approx 171 \times \text{RefClockPeriod}$$

Depending upon the **sys\_clk\_out1\_h,l** ratio, the DPLL may come into lock much more quickly. The DPLL may insert phase slips more frequently at smaller **sys\_clk\_out1\_h,l** ratios.

4.2.4.1.2 Case 2: ref\_clk\_in\_h Initially Sampled High by DPLL

When the DPLL initially samples ref\_clk\_in\_h in the high state, as shown in Figure 4-6, it will not slip its internal cycle until it samples ref\_clk\_in\_h in the low state. After it samples ref\_clk\_in\_h in the low state, the DPLL stays in lock mode.

Figure 4-6 ref\_clk\_in\_h Initially Sampled High

The rate at which sys\_clk\_out1\_h,l gains on ref\_clk\_in\_h depends on the difference in frequency of the two signals. Assuming that:

- ref\_clk\_in\_h is nominally selected to run 0.175% slower than sys\_clk\_out1\_h,l (in the center of the specified lock range), and
- the worst case deviation from the specified frequency is 200 ppm for both ref\_clk\_in\_h and osc\_clk\_in\_h,l,

then the worst case (smallest) frequency difference is calculated to be:

$$0.00175 - 200\,\text{ppm} - 200\,\text{ppm} = 0.00135 = 0.135\%$$

$$\text{SettlingTime} = \frac{\text{RefClockHighRatio} \times \text{RefClockPeriod}}{0.00135}$$

Note

The reference clock high ratio equals the portion of the **ref\_clk\_in\_h** period that **ref\_clk\_in\_h** is high.

Assuming the worst case ref\_clk\_in\_h duty cycle is 60/40 to 40/60:

$$\text{SettlingTime} = \frac{0.6 \times \text{RefClockPeriod}}{0.00135} \approx 444 \times \text{RefClockPeriod}$$

# 4.3 Physical Address Considerations

This section lists and describes the physical address regions. Cache and data wrapping characteristics of physical addresses are also described.
# 4.3.1 Physical Address Regions

Physical memory of the 21164 is divided into three regions:

1. The first region is the first half of the physical address space. It is treated by the 21164 as memory-like.
2. The second region is the second half of the physical address space except for a 1M-byte region reserved for Cbox IPRs. It is treated by the 21164 as noncacheable.
3. The third region is the 1M-byte region reserved for Cbox IPRs.

In the first region, write invalidate caching, write merging, and load merging are all permitted. All 21164 accesses in this region are 32- or 64-byte depending on the programmable block size.

The 21164 does not cache data accessed in the second and third regions of the physical address space; 21164 read accesses in these regions are always 32-byte requests. Load merging is permitted, but the request includes a mask to tell the system environment which INT8s are accessed. Write merging is permitted. Write accesses are 32-byte requests with a mask indicating which INT4s are actually modified. The 21164 never writes more than 32 bytes at a time in noncached space.

The 21164 does not broadcast accesses to the Cbox IPR region if they map to a Cbox IPR. Accesses in this region that are not to a defined Cbox IPR produce UNDEFINED results. The system should not probe this region.

Table 4-4 shows the 21164 physical memory regions.

**Table 4-4 Physical Memory Regions**

| Region | Address Range | Description |
|--------------|---------------------------------------------|----------------------------------------------------------------------------------------------------------------------|
| Memory-like | 00 0000 0000–<br>7F FFFF FFFF<sub>16</sub> | Write invalidate cached; load and store merging allowed. |
| Noncacheable | 80 0000 0000–<br>FF FFEF FFFF<sub>16</sub> | Not cached, load merging limited. |
| IPR region | FF FFF0 0000–<br>FF FFFF FFFF<sub>16</sub> | Accesses do not appear on the interface unless an undefined location is accessed (which produces UNDEFINED results). |

# 4.3.2 Data Wrapping

The 21164 requires that wrapped read operations be performed on INT16 boundaries. READ, READ DIRTY, and FLUSH commands are all wrapped on INT16 boundaries as described here. The valid wrap orders for 64-byte blocks are selected by addr\_h<5:4>. They are:

- 0, 1, 2, 3
- 1, 0, 3, 2
- 2, 3, 0, 1
- 3, 2, 1, 0

For 32-byte blocks, the valid wrap orders are selected by addr\_h<4>. They are:

- 0, 1
- 1, 0

WRITE BLOCK and WRITE BLOCK LOCK commands from the 21164 are not wrapped. They always write INT16 0, 1, 2, and 3. BCACHE VICTIM commands provide the data with the same wrap order as the read miss that produced them.

# 4.3.3 Noncached Read Operations

Read operations to physical addresses that have addr\_h<39> asserted are not cached in the Dcache, Scache, or Bcache. They are merged like other read operations in the miss address file (MAF). To prevent several read operations to noncached memory from being merged into a single 32-byte bus request, software must insert memory barrier (MB) instructions or set MAF\_MODE IPR bit [IO\_NMERGE].

The MAF merges as many Dstream read operations together as it can and sends the request to the BIU through the Scache. Rather than merging two 32-byte requests into a single 64-byte request, the BIU requests a READ MISS from the system. Signals int4\_valid\_h<3:0> indicate which of the four quadwords are being requested by software. The system should return the fill data to the 21164 as usual. The 21164 does not write the Dcache, Scache, or Bcache with the fill data. The requested data is written in the register file or Icache.
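The wrap orders listed in Section 4.3.2 follow a regular pattern: each order is the starting INT16 index exclusive-ORed with 0, 1, 2, 3 (or 0, 1 for 32-byte blocks). The sketch below illustrates that observation only; the function name `wrap_order` is hypothetical and the code is not a description of the hardware implementation.

```python
def wrap_order(addr, block_bytes=64):
    """Return the INT16 wrap order for a wrapped read of the block holding addr.

    The starting INT16 is taken from addr_h<5:4> for 64-byte blocks and
    from addr_h<4> for 32-byte blocks, as stated in Section 4.3.2.
    """
    if block_bytes not in (32, 64):
        raise ValueError("block size must be 32 or 64 bytes")
    count = block_bytes // 16          # number of INT16s per block
    start = (addr >> 4) & (count - 1)  # addr_h<5:4> or addr_h<4>
    # XOR with the beat number reproduces every order in the lists above.
    return [start ^ beat for beat in range(count)]


for start_addr in (0x00, 0x10, 0x20, 0x30):
    print(hex(start_addr), wrap_order(start_addr))
```

For a 64-byte block the loop reproduces the four orders given in the text, and `wrap_order(0x10, 32)` gives the second 32-byte order.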
Note

A special case using int4\_valid\_h<3:0> occurs during an Icache fill. In this case the entire returned block is valid although int4\_valid\_h<3:0> indicates zero.

# 4.3.4 Noncached Write Operations

Write operations to physical addresses that have addr\_h<39> asserted are not written to any of the caches. These write operations are merged in the write buffer before being sent to the system. If software does not want write operations to merge, it must insert MB or WMB instructions between them.

When the write buffer decides to write data to noncached memory, the BIU requests a WRITE BLOCK. During each data cycle, int4\_valid\_h<3:0> indicates which INT4s within the INT16 were actually written.

# 4.4 Bcache Structure

The 21164 supports a 1M-byte, 2M-byte, ..., 32M-byte and 64M-byte Bcache. The size is under program control and is specified by BC\_CONFIG<2:0> [BC\_SIZE<2:0>].

The Bcache block size may consist of 32-byte or 64-byte blocks. The Scache also supports either 32-byte or 64-byte blocks. The block size must be the same for both and is selected using SC\_CTL<12> [SC\_BLK\_SIZE].

Off-the-shelf static RAMs (SRAMs) may be connected to the 21164 without many extra components, although fanout buffers may be required for the index lines. The SRAMs are directly controlled by the 21164, and the Bcache data lines are connected to the 21164 data bus.

The 21164 partitions the physical address (addr\_h<39:5>) into an index field and a tag field. The 21164 presents index\_h<25:4> and tag\_data\_h<38:20> to the Bcache interface. The system designer uses the signal lines needed for a particular size Bcache.
For example, the smallest Bcache (1 MB) needs index\_h<19:4> to address the cache block while the tag field would be tag\_data\_h<38:20>. Only those bits that are actually needed for the amount of cached system main memory need to be stored in the Bcache tag, although the 21164 uses all the relevant tag address bits for that Bcache size on its tag compare. A larger Bcache uses more index bits and fewer tag address bits.

The CPU data bus is 16 bytes wide (128 bits) and thus each Bcache transaction requires two data cycles for a 32-byte block or four data cycles for a 64-byte block.

# 4.4.1 Duplicate Tag Store

In systems that have a Bcache, it is possible to build a full copy of the Bcache tag store. This data can then be used to filter requests coming off the system bus to the 21164.

In systems without a Bcache it is possible to build a full or partial copy of the Scache tag store and to model the contents of the Scache victim buffers.

#### 4.4.1.1 Full Duplicate Tag Store

The complete Bcache duplicate tag store would contain an entry for each Bcache block and each victim buffer. Each entry would contain state bits for the VALID, SHARED, and DIRTY status bits along with part or all of addr\_h<38:20> for a Bcache block. The part of addr\_h<38:20> stored in an entry depends upon the size of the Bcache.

In a system without a Bcache a full Scache duplicate tag store may be maintained. The full Scache duplicate tag store should contain three sets of 512 entries, one for each of the three Scache sets. It should also have two entries for the two Scache victim buffers. Figure 4-7 is a simplified diagram showing the signal lines of interest.

Figure 4-7 signal lines: scache\_set\_h<1:0>, addr\_h<14:6> (index), tag\_shared\_h, tag\_dirty\_h, tag\_valid\_h, addr\_h<39:15> (tag data), victim\_pending\_h

The system should use the algorithm shown in Figure 4-8 to maintain the duplicate tag store.
Figure 4-8 Duplicate Tag Store Algorithm

#### 4.4.1.2 Partial Duplicate Tag Store

System designers may also choose to build a partial duplicate tag store such as that shown in Figure 4-9. This store contains one or more bits of tag data for each block in the Scache, and for the two victim buffers inside the 21164. If a system bus transaction hits in the partial duplicate tag store, then the block may be in the Scache. If a system bus transaction misses in the partial duplicate tag store, then the block is not in the Scache.

Figure 4-9 Partial Duplicate Tag Store

# 4.5 Cache Coherency

Cache coherency is a concern for single and multiprocessor 21164-based systems as there may be several caches on a processor module and several more in multiprocessor systems.

The system hardware designer need not be concerned about Icache and Dcache coherency. Coherency of the Icache is a software concern: it is flushed with an IMB (PALcode) instruction. The 21164 maintains coherency between the Dcache and the Scache.

If the system does not have a Bcache, the system designer must create mechanisms in the system interface logic to support cache coherency between the Scache, main memory, and other caches in the system.

If the system has a Bcache, the 21164 maintains cache coherency between the Scache and the Bcache. The Scache is a subset of the Bcache. In this case the designer must create mechanisms in the system interface logic to support cache coherency between the Bcache, main memory, and other caches in the system.

# 4.5.1 Cache Coherency Basics

Alpha 21164 systems maintain the cache coherency and hierarchy shown in Figure 4-10.

Figure 4-10 Cache Subset Hierarchy

Tasks that must be performed to maintain cache coherency follow:

- The Cbox in the 21164 maintains coherency in the Dcache and keeps it as a subset of the Scache.
- If an optional Bcache is present, then the 21164 maintains the Scache as a subset of the Bcache. The Scache is set associative but is kept a subset of the larger externally implemented direct mapped Bcache.
- System logic must help the 21164 to keep the Bcache coherent with main memory and other caches in the system.
- The Icache is not a subset of any cache and is not kept coherent with the memory system.

The 21164 requires the system to allow only one change to a block at a time. This means that if the 21164 gains the bus to read or write a block, no other node on the bus should be allowed to access that block until the data has been moved.

The 21164 includes hardware mechanisms to support several cache coherency protocols. The protocols can be separated into two classes: write invalidate cache coherency protocol and flush cache coherency protocol.

## Write Invalidate Cache Coherency Protocol

The write invalidate cache coherency protocol is best suited for shared memory multiprocessors.

The write invalidate protocol allows for shared data in the cache. If a Bcache (optional) is used then a duplicate tag store is required. If a Bcache is not used the duplicate tag store is not required but the module designer may include an Scache duplicate tag store.

Requiring the duplicate tag store if there is a Bcache allows the 21164 to process system commands in the Bcache without probing to see if the block is present (system logic knows the block is present). This results in higher performance for these transactions. If a Bcache is not used the module designer may include an Scache duplicate tag store to improve system performance.

### Flush Cache Coherency Protocol

This protocol is best suited for low-cost single-processor systems. Flush protocol does not allow shared data in the cache. Flush protocol does not require a duplicate tag store.

Because the duplicate tag store is optional for this protocol, the Bcache is probed for each transaction to determine if the block is present.
If the block is present, the requested action is taken; if the block is not present, the command is ignored.

Section 4.5.2 and Section 4.5.3 describe the write invalidate cache coherency protocol in more detail while Section 4.5.4 and Section 4.5.5 provide a more detailed description of flush cache coherency protocol.

# 4.5.2 Write Invalidate Cache Coherency Protocol Systems

All 21164-based systems that implement the write invalidate cache protocol must have the combinations of components listed in Table 4-5. For example, a system such as that listed in write invalidate (3), having an Scache and Bcache, is required to have a Bcache duplicate tag store and a lock register.

Table 4-5 Components for 21164 Write Invalidate Systems

| Cache Protocol | Scache | Scache<br>Duplicate<br>Tag | Bcache | Bcache<br>Duplicate<br>Tag | Lock<br>Register |
|----------------------|--------|----------------------------|--------|----------------------------|------------------|
| Write invalidate (1) | Yes | No | No | No | No |
| Write invalidate (2) | Yes | Yes | No | No | Required |
| Write invalidate (3) | Yes | No | Yes | Required | Required |

### Write invalidate 1

This system has no external cache, duplicate tag store, or lock register. The 21164 must be made aware of all memory data transactions which occur on the system bus. System logic uses an INVALIDATE, READ DIRTY or READ DIRTY/INVALIDATE transaction to the 21164 to maintain cache coherency and to support the lock mechanism.

### Write invalidate 2

This system has an external Scache duplicate tag store and lock register. System logic uses the duplicate Scache tag store and lock register to filter out unneeded transactions to the 21164. System logic only initiates transactions which affect Scache coherency and maintains the lock mechanism status.

### Write invalidate 3

This system has an external Bcache duplicate tag store and lock register. An Scache duplicate tag store is not needed because the Scache is a subset of the Bcache.
This system operates similarly to the write invalidate 2 system, except that the cache is larger.

# 4.5.3 Write Invalidate Cache Coherency States

Each processor in the system must be able to read and write data as if all transactions were going onto the system bus to memory or I/O modules. Therefore, the system bus is the point at which cache coherency must be maintained. Table 4-6 describes the Bcache states that determine cache coherency protocol for 21164 systems.

**Table 4-6 Bcache States for Cache Coherency Protocols**

| Valid<sup>1</sup> | Shared<sup>1</sup> | Dirty<sup>1</sup> | State of Cache Line |
|-------------------|--------------------|-------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 0 | X | X | Not valid. |
| 1 | 0 | 0 | Valid for read or write operations. This cache line contains the only cached copy of the block and the copy in memory is identical to this line. |
| 1 | 0 | 1 | Valid for read or write operations. This cache line contains the only cached copy of the block. The contents of the block have been modified more recently than the copy in memory. |
| 1 | 1 | 0 | Valid for read or write operations. This block may be in another CPU's cache. |
| 1 | 1 | 1 | Valid for read or write operations. This block may be in another CPU's cache. The contents of the block have been modified more recently than the copy in memory. |

<sup>1</sup>The tag\_valid\_h, tag\_shared\_h, and tag\_dirty\_h signals are described in Table 3-1.

Note

Unlike some other systems, the 21164 will not take an update to a shared block but instead will invalidate the block.

## 4.5.3.1 Write Invalidate Protocol State Machines

Figure 4-11 shows the 21164 cache states that can occur as a result of 21164 transactions to the system.
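As background for the state diagrams, the encodings in Table 4-6 can be read as a small decoder. The sketch below is illustrative only: the function names and the "private"/"clean" labels are assumptions chosen for this example, not terms defined by the table.

```python
def describe_line(valid, shared, dirty):
    """Summarize a Bcache line from its VALID, SHARED, and DIRTY tag bits.

    Mirrors Table 4-6: when VALID is clear the other two bits are
    don't-cares; otherwise SHARED and DIRTY are reported independently.
    """
    if not valid:
        return "not valid"
    sharing = "shared" if shared else "private"  # label is an assumption
    state = "dirty" if dirty else "clean"        # label is an assumption
    return sharing + ", " + state


def memory_is_stale(valid, dirty):
    """True when the line was modified more recently than the memory copy."""
    return bool(valid and dirty)


for bits in [(0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]:
    print(bits, describe_line(*bits))
```

The loop walks the five rows of the table in order, using (0, 1, 1) as one representative of the "0 X X" row.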
Figure 4-11 Write Invalidate Protocol 21164 States

\* Optionally this transition can be configured to occur without a SET DIRTY command being issued.

\*\* Only allowed in no-Bcache systems.

Note

The abbreviations "I,S,D" indicate the INVALID, SHARED, and DIRTY states.

Figure 4-12 shows the 21164 cache state changes maintained by the 21164 as a result of transactions by other nodes on the system bus.

Figure 4-12 Write Invalidate Protocol System/Bus States

# 4.5.4 Flush Cache Coherency Protocol Systems

All 21164-based systems that implement the flush cache protocol must have the combinations of components listed in Table 4-7. For example, a system such as that listed in flush (3), having a Bcache and a Bcache duplicate tag store, is required to have a lock register.

Table 4-7 Components for 21164 Flush Cache Protocol Systems

| Cache Protocol | Scache | Scache<br>Duplicate<br>Tag | Bcache | Bcache<br>Duplicate<br>Tag | Lock<br>Register |
|--------------------|--------|----------------------------|--------|----------------------------|------------------|
| Flush protocol (1) | Yes | No | No | No | No |
| Flush protocol (2) | Yes | No | Yes | No | No |
| Flush protocol (3) | Yes | No | Yes | Yes | Required |

### Flush-based 1

This system has no external cache, duplicate tag store, or lock register. System logic notifies the 21164 of all memory data read operations that occur on the system bus using the interface READ command. The 21164 returns data if the block is dirty.

System logic notifies the 21164 of all memory data write operations that occur on the system bus using the interface FLUSH command. The 21164 provides dirty data, then invalidates the block in cache, and updates the lock mechanism status.
### Flush-based 2

This system has an external cache but no duplicate tag store or lock register. System logic and 21164 operation is identical to operation for the flush-based 1 system.

### Flush-based 3

This system has an external cache, a Bcache duplicate tag store, and lock register. System logic notifies the 21164 of all memory data read operations that occur on the system bus to addresses that are valid in the Bcache duplicate tag store. System logic uses the READ command and the 21164 returns data if the block is dirty.

System logic uses the FLUSH command to notify the 21164 of all memory data write transactions that occur on the system bus to addresses that are valid in the Bcache duplicate tag store. If the block is dirty, the 21164 provides the block data; in any case, it invalidates the block in cache. System logic updates its lock mechanism status.

## 4.5.5 Flush-Based Protocol State Machines

Figure 4-13 shows the 21164 cache states that can occur as a result of transactions with the system.

Figure 4-13 Flush-Based Protocol 21164 States

\* Optionally this transition can be configured to occur without a SET DIRTY command being issued externally.

Figure 4-14 shows the 21164 cache state changes maintained by the 21164 as a result of transactions by other nodes on the system bus.

Figure 4-14 Flush-Based Protocol System/Bus States

# 4.5.6 Cache Coherency Transaction Conflicts

Cache coherency conflicts that can occur during system operation are described here. Systems should be designed to avoid these conflicts.

#### 4.5.6.1 Case 1

If the 21164 requests a READ MISS MOD transaction, it expects the block to be returned not SHARED, DIRTY. However, if the system returns the data SHARED, not DIRTY, the 21164 follows with a WRITE BLOCK command. This might cause a multiprocessor system to have live-lock problems, a condition that can cause long delays in writing from the 21164 to memory.
#### 4.5.6.2 Case 2

If the 21164 attempts to write a clean/private block of memory, it sends a SET DIRTY command to the system. The system could be sending a SET SHARED or INVALIDATE command to the 21164 at the same time for the same block. The bus is the coherence point in the system; therefore, if the bus has already changed the state of the block to shared, setting the dirty bit is incorrect. The 21164 will not resend the SET DIRTY command when the ownership of the ADDRESS/CMD bus is returned. The write will be restarted and will use the new tag state to generate a new system request.

Another possibility is for the system to send an INVALIDATE command at the same time the 21164 is attempting to do a WRITE BLOCK transaction to the same block. In this case the 21164 aborts the WRITE BLOCK transaction, services the INVALIDATE command, then restarts the write transaction, which produces a READ MISS command.

In both of these cases, if the SET DIRTY or WRITE BLOCK transaction is started by the 21164 and then interrupted by the system, the 21164 resumes the same transaction unless the system request was to the same block as the request the 21164 had started. In this case, the 21164 request is restarted internally by the CPU and it is UNPREDICTABLE what transaction the 21164 presents next to the system.

## 4.6 Lock Mechanisms

The LDx\_L instruction is forced to miss in the Dcache. When the Scache is read, the BIU's lock IPR is loaded with the physical address and the lock flag is set. The BIU sends a LOCK command to the system so that it can load its own lock register. The system lock register is used only if the locked block is displaced from the cache system. The lock flag is cleared if any of the following events occur:

- Any write operation from the bus addresses the locked block (FLUSH, INVALIDATE, or READ DIRTY/INV).
- An STx\_C is executed by the processor.
- The locked block is refilled from memory and system\_lock\_flag\_h is cleared.
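The three clearing events listed above can be sketched as a toy software model. Everything here is an illustrative assumption: the class `LockModel`, its method names, the 64-byte default block size, and the simplification that an STx\_C succeeds exactly when the flag is still set. It is not a description of the BIU logic.

```python
class LockModel:
    """Toy model of the processor lock flag; not the actual BIU logic."""

    # Bus commands the text lists as writes that clear the lock flag.
    BUS_WRITES = {"FLUSH", "INVALIDATE", "READ DIRTY/INV"}

    def __init__(self, block_bytes=64):
        self.block_bytes = block_bytes  # assumed block size for this sketch
        self.locked_block = None
        self.lock_flag = False

    def _block(self, addr):
        return addr // self.block_bytes

    def ldx_l(self, addr):
        """LDx_L: record the locked block and set the lock flag."""
        self.locked_block = self._block(addr)
        self.lock_flag = True

    def bus_command(self, command, addr):
        """A bus write to the locked block clears the flag."""
        if command in self.BUS_WRITES and self._block(addr) == self.locked_block:
            self.lock_flag = False

    def fill(self, addr, system_lock_flag_h):
        """Refill of the locked block with system_lock_flag_h clear clears the flag."""
        if self._block(addr) == self.locked_block and not system_lock_flag_h:
            self.lock_flag = False

    def stx_c(self):
        """STx_C: report success (simplified) and clear the flag."""
        succeeded = self.lock_flag
        self.lock_flag = False
        return succeeded


model = LockModel()
model.ldx_l(0x1000)
model.bus_command("INVALIDATE", 0x1020)  # same 64-byte block as 0x1000
print(model.stx_c())
```

In this run the INVALIDATE hits the locked block, so the modeled STx\_C reports failure.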
The system copy of the lock register is required on systems that have a duplicate tag store to filter write traffic.

The direct mapped Icache, Dcache, and Bcache, along with the subsetting rules, branch prediction, and Istream prefetching, can cause a lock to always fail because of constant Scache thrashing of the locked block. Each time a block is loaded into the Scache, the value of the lock register is logically ANDed with the value of signal system\_lock\_flag\_h. If the locked block is displaced from the cache system, the 21164 does not "see" bus write operations to the locked block. In this case, the system's copy of the lock register corrects the processor copy of the lock flag when the block is filled into the cache, using signal system\_lock\_flag\_h.

Systems that do not have duplicate tag stores, and send all probe traffic to the 21164, are not required to implement a lock register or lock flag. Such systems should tie signal system\_lock\_flag\_h permanently true.

When the STx\_C instruction is issued, the Ibox stops issuing memory-type instructions. The store updates the Dcache in the usual way, and places itself in the write buffer. It is not merged with other pending write operations. The write buffer is flushed.

When the write buffer arrives at an STx\_C instruction in cached memory, it probes the Scache to check the block state. When the STx\_C passes through the Scache, an INVALIDATE command is sent to the Dcache. If the lock flag is clear the STx\_C fails. If the block is not SHARED and is DIRTY, the write buffer writes the STx\_C data into the Scache. Success is written to the register file and the Ibox begins issuing memory instructions again. If the block is in the shared state, the BIU requests a WRITE BLOCK transaction. If the system CACKs the WRITE BLOCK transaction, the Scache is written and the Ibox starts as previously stated.

When the write buffer arrives at an STx\_C instruction in noncached memory, it probes the Scache to check the block state.
The Scache misses, the state of the lock flag is ignored, and the BIU requests a WRITE BLOCK LOCK transaction. If the system CACKs the WRITE BLOCK LOCK transaction, the Ibox starts as stated previously. If cfail\_h is asserted along with cack\_h, then the STx\_C fails.

## 4.7 21164-to-Bcache Transactions

When initiating an Istream or Dstream data transaction, the 21164 first tries the Icache or Dcache, respectively. If that access is unsuccessful, then the Scache will be tried next. If that fails, then the 21164 tries the Bcache.

The 21164 interface to the system and Bcache is in the Cbox. The Cbox provides address and control signals for transactions to and from the Bcache and the system interface logic. The Cbox also transfers data across the 128-bit bidirectional data bus.

The 21164 controls all Bcache transactions and will often be able to read and write to the Bcache with no assistance from the system. When system logic reads or writes to the Bcache, it supplies or takes data from the Bcache but only under the direct control of the 21164.

# 4.7.1 Bcache Timing

Bcache cycle time may be faster than, identical to, or slower than that of the sysclk. If the system is involved in a Bcache transaction, each read or write operation starts on a sysclk edge. It is the responsibility of the system to control the rate of Bcache transactions by using the dack\_h signal.

Read and write operations that are private to the 21164 and Bcache may start on any CPU clock. There is no relation between sysclk and private Bcache accesses.

Bcache timing is under control of the user through the BC\_CONFIG and BC\_CONTROL internal processor registers (IPRs). Section 5.3.5 and Section 5.3.4 show the layout of these registers. These registers are normally configured by 21164 initialization code.

Bcache read and write timing are programmable. Read speed is selected using BC\_CONFIG<7:4> [BC\_RD\_SPD<3:0>]. Write speed is selected using BC\_CONFIG<11:8> [BC\_WR\_SPD<3:0>].
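The field positions quoted above can be pulled out of a register value with ordinary shift-and-mask arithmetic. The sketch below uses only the bit ranges stated in this chapter (bits <2:0>, <7:4>, and <11:8>); the helper names and the sample register value are hypothetical.

```python
def field(value, high, low):
    """Extract bits <high:low> of value as an unsigned integer."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


def decode_bc_config(bc_config):
    """Decode the BC_CONFIG fields whose positions are given in the text."""
    return {
        "BC_SIZE": field(bc_config, 2, 0),
        "BC_RD_SPD": field(bc_config, 7, 4),
        "BC_WR_SPD": field(bc_config, 11, 8),
    }


# Hypothetical register value: size code 1, read speed 4, write speed 5.
sample = (5 << 8) | (4 << 4) | 1
print(decode_bc_config(sample))
```

A table of (name, high, low) tuples would serve equally well; a dictionary is used here only to keep the sketch short.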
## 4.7.2 Bcache Read Transaction (Private Read Operation)

Figure 4-15 shows an example of the timing for a private read operation to Bcache by the 21164. The read speed is 4 because BC\_CONFIG [BC\_RD\_SPD] is set to 4, defaulting to the minimum read speed of 4 CPU cycles.

Figure 4-15 Bcache Read Transaction

The index increments through four 16-byte addresses, each being asserted for four CPU cycles. The Bcache logic delays one CPU clock cycle before returning the data associated with each index.

The 21164 always delays one cycle before asserting the tag\_ram\_oe\_h and data\_ram\_oe\_h lines. The lines are deasserted after the fourth index address is deasserted.

# 4.7.3 Wave Pipeline

The wave pipeline is implemented to improve performance for systems that use 64-byte block size. It is not supported for systems with 32-byte block size.

The wave pipeline is controlled using BC\_CONFIG<7:4> [BC\_RD\_SPD<3:0>] and BC\_CTL<18:17> [BC\_WAVE<1:0>].

BC\_CONFIG<7:4> [BC\_RD\_SPD<3:0>] is set to the latency of the Bcache read transaction. BC\_CTL<18:17> [BC\_WAVE<1:0>] is set to the number of cycles to subtract from [BC\_RD\_SPD] to get the Bcache repetition rate. For example, if BC\_RD\_SPD is set to 7 and BC\_WAVE<1:0> is set to 2, it takes 7 cycles for valid data to arrive at the pins, but a new read starts every 5 cycles.

The read repetition rate must be greater than 3. For example, it is not permitted to set BC\_RD\_SPD to 5 and BC\_WAVE<1:0> to 2.

The example shown in Figure 4-16 has BC\_RD\_SPD=6, BC\_WAVE<1:0>=2.

Figure 4-16 Wave Pipeline Timing Diagram

# 4.7.4 Bcache Write Transaction (Private Write Operation)

Figure 4-17 shows an example of the timing for a private write operation to Bcache by the 21164. The write speed is 4 because BC\_CONFIG [BC\_WR\_SPD] is set to 4, defaulting to the minimum write speed of 4 CPU cycles.

Figure 4-17 Bcache Write Transaction

The index increments through four 16-byte addresses, each being asserted for four cycles.
The 21164 always delays one cycle then drives the data associated with each index.

Signals tag\_ram\_we\_h and data\_ram\_we\_h are asserted high for two cycles because BC\_CONFIG<28:20> [BC\_WE\_CTL<8:0>] is set to 6. BC\_CONFIG<22:21> being set causes the write-enable lines to be asserted during the second and third CPU cycles. BC\_CONFIG<20,23> being clear causes the write-enable lines to not be asserted during the first and fourth CPU cycles.

The Bcache maximum read or write speed is 15 cycles. The minimum read or write speed is 4 cycles, except that in 32-byte mode the minimum read speed is 5 cycles. So the index and data can be asserted from 4 to 15 cycles. The write-enable signals can be asserted from 0 to 9 cycles. If BC\_CONFIG [BC\_WE\_CTL] is set to 0, the write-enable signals will not be asserted. If the 9-bit field is set to 1FF<sub>16</sub>, then the write-enable signals will be asserted for 9 CPU cycles.

# 4.7.5 Selecting Bcache Options

Table 4-8 lists the variables to consider when designing and implementing a Bcache.

**Table 4-8 Bcache Options**

| Parameter | Selection |
|-------------------------------------------|------------|
| Sysclk ratio (3-15) | CPU cycles |
| Cache protocol, write invalidate or flush | |
| Cache block size 64/32 | byte block |
| ECC or byte parity | |
| Bcache present? | |
| Bcache size (1 to 64M byte) | M byte |
| Bcache read speed (4-15) | CPU cycles |
| Bcache wave pipelining (0-3) | CPU cycles |
| Bcache victim buffer? | |
| Bcache write speed (4-15) | |
| Bcache read to write spacing (1-7) | |
| Bcache fill write pulse offset (1-7) | |
| Bcache write pulse (bit mask 9-0) | |
| Enable LOCK and SET DIRTY commands? | |
| Enable memory barrier (MB) commands? | |

# 4.8 21164-Initiated System Transactions

This section describes how commands are used to move data in and out of the 21164 and its cache system. The 21164 starts an external transaction when:

- It encounters a "miss".
- A LOCK command is invoked.
- A WRITE command is directed at a shared block.
- A WRITE command is directed at a clean block in Scache.
- The CPU addresses a noncached region of memory.
- The 21164 executes a FETCH, FETCH\_M or MB instruction.

For example, the sequence for a 21164-initiated transaction caused by a Bcache miss is listed and described here.

- At the start of a Bcache transaction, the 21164 checks the tag and tag control status of the target block.
- When checking the Bcache shows a need for system help, the 21164 starts an external READ MISS transaction that tells the system logic to access and return data.
- System logic acknowledges acceptance of the command from the 21164 by asserting cack\_h.
- If the transaction is a read operation, requiring a FILL transaction, the transaction is broken (pended) while system logic obtains the FILL data.
- At a later time the system asserts fill\_h.
- The 21164 will assert the tag and tag control bits, and will control the write action during the FILL transaction.
- The system logic provides the data during cycles in which dack\_h is asserted.

Interface commands from the 21164 to the system are driven on the cmd\_h<3:0> signals. Table 4-9 lists and describes the set of interface commands.
Table 4-9 21164-Initiated Interface Commands

| Command | cmd_h<br><3:0> | Description |
|---------|----------------|-------------|
| NOP | 0000 | The NOP command is driven by the owner of the cmd_h bus when it has no tasks queued. |
| LOCK | 0001 | The LOCK command is used to load the system lock register with a new lock register address. The state of the system lock register flag is used on each fill to update the 21164's copy of the lock flag. Refer to Section 4.6 for more information. |
| FETCH | 0010 | The 21164 passes a FETCH instruction to the system when the FETCH instruction is executed. |
| FETCH_M | 0011 | The 21164 passes a FETCH_M instruction to the system when the FETCH_M instruction is executed. |
| MEMORY<br>BARRIER | 0100 | The 21164 issues the MEMORY BARRIER command when an MB instruction is executed. This command synchronizes read and write accesses with other processors in the system. The 21164 stops issuing memory reference instructions and waits for the command to be acknowledged before continuing. |
| SET DIRTY | 0101 | Dirty bit set if shared bit is clear. The 21164 uses the SET DIRTY command when it wants to write a clean, private block in its Scache and it wants the dirty bit set in the duplicate tag store. The 21164 does not proceed with the write until a CACK response is received from the system. When the CACK is received, the 21164 attempts to set the dirty bit. If the shared bit is still clear, the dirty bit is set and the write operation is completed. If the shared bit is set, the dirty bit is not set and the 21164 requests a WRITE BLOCK transaction. The copy of the dirty bit in the Bcache is not updated until the block is removed from the Scache. |
| WRITE BLOCK | 0110 | Request to write a block. When the 21164 wants to write a block of data back to memory, it drives the command, address, and first INT16 of data on a sysclk edge. The 21164 outputs the next INT16 of data when dack_h is received. When the system asserts cack_h, the 21164 removes the command and address from the bus and begins the write of the Scache. Signal cack_h can be asserted before all the data is removed. |
| WRITE BLOCK<br>LOCK | 0111 | Request to write a block with lock. This command is identical to a WRITE BLOCK command except that the **cfail_h** signal may be asserted by the system, indicating that the data cannot be written. This command is only used for STx_C in noncached space. |
| READ MISS0 | 1000 | Request for data. This command indicates that the 21164 has probed its caches and that the addressed block is not present. |
| READ MISS1 | 1001 | Request for data. This command indicates that the 21164 has probed its caches and that the addressed block is not present. |
| READ MISS<br>MOD0 | 1010 | Request for data; modify intent. This command indicates that the 21164 plans to write to the returned cache block. Normally, the dirty bit should be set when the tag status is returned to the 21164. |
| READ MISS<br>MOD1 | 1011 | Request for data; modify intent. This command indicates that the 21164 plans to write to the returned cache block. Normally, the dirty bit should be set when the tag status is returned to the 21164. |
| BCACHE<br>VICTIM | 1100 | Bcache victim should be removed. If there is a victim buffer in the system, this command is used to pass the address of the victim to the system. The READ MISS command that produced the victim precedes the BCACHE VICTIM command. Signal victim_pending_h is asserted during the READ MISS command to indicate that a BCACHE VICTIM command is waiting, and that the Bcache is starting the read of the victim data.<br><br>If the system does not have a victim buffer, the BCACHE VICTIM command precedes the READ MISS commands. The BCACHE VICTIM command is driven, along with the address of the victim. At the same time, the Bcache is read to provide the victim data.<br><br>If the system does have a victim buffer, and it asserts signal dack_h any time before the BCACHE VICTIM command is driven, then address bits addr_h<5:4> of the address sent with the BCACHE VICTIM command are UNPREDICTABLE. The system must use the values of addr_h<5:4> that were sent with the READ MISS command that produced the victim. |
| | 1101 | Spare. |
| READ MISS<br>MOD STC0 | 1110 | Request for data, STx_C data. |
| READ MISS<br>MOD STC1 | 1111 | Request for data, STx_C data. |

## 4.8.1 READ MISS—No Bcache

A read operation to the Dcache misses, causing a read operation to the Scache, which also misses. After the Scache miss there is no Bcache probe; the 21164 sends a READ MISS command to the system. The system acknowledges receipt of the READ MISS by immediately asserting cack\_h as shown in Figure 4–18.

Figure 4-18 READ MISS—No Bcache Timing Diagram

## 4.8.2 READ MISS and FILL

The 21164 issues a READ MISS command if it encounters a cache miss, as described in Section 4.8.2.1. The system acknowledges receipt of the command. Later the system asserts fill\_h and drives data\_h<127:0> on the proper cycles and in the proper sequence, as described in Section 4.8.2.2.

### 4.8.2.1 READ MISS

The 21164 starts a Bcache read operation on any CPU clock. The index is asserted to the RAM for a programmable number of CPU cycles in the range of 4 to 15. The tag is accessed at the same time. At the end of the first read, the 21164 latches the data and tag information and begins the read operation of the next 16 bytes of data. The tag is checked for a hit. If there is a miss, a READ MISS or READ MISS MOD command, along with the address, is queued to the cmd\_h<3:0> bus. It appears on the interface at the next sysclk edge. Figure 4–19 shows the timing of a Bcache read and the resulting READ MISS MOD request.

Figure 4-19 shows the READ MISS MOD command being acknowledged on cack\_h as soon as it is sent. This allows the 21164 to make additional READ MISS requests. It is also possible for the system to defer assertion of cack\_h until the fill data is returned. This allows the system to use cmd\_h<0> for the value of fill\_id\_h. The assertion of cack\_h should arrive no later than the last fill dack\_h.
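The relationship between the READ MISS command encodings and fill\_id\_h can be sketched in a few lines of Python. This is only an illustrative model, not part of the 21164 interface: the helper names are hypothetical, and treating every encoding in the 1000 to 1011 and 1110 to 1111 range as a member of the READ MISS group is an assumption drawn from Table 4-9.

```python
# Illustrative sketch only; helper names are hypothetical.
# Assumption: the READ MISS family occupies cmd_h<3:0> encodings
# 1000-1011 and 1110-1111, as listed in Table 4-9.
READ_MISS_GROUP = {0b1000, 0b1001, 0b1010, 0b1011, 0b1110, 0b1111}


def is_read_miss(cmd):
    """Return True if the 4-bit cmd_h value is one of the READ MISS commands."""
    return cmd in READ_MISS_GROUP


def fill_id_for(cmd):
    """Return the fill_id_h value a system could reuse from cmd_h<0>."""
    if not is_read_miss(cmd):
        raise ValueError("not a READ MISS command: {:04b}".format(cmd))
    return cmd & 1


def cack_deadline_ok(cack_sysclk, last_fill_dack_sysclk):
    """cack_h should arrive no later than the last fill dack_h."""
    return cack_sysclk <= last_fill_dack_sysclk
```

A system that defers cack\_h until the fill could, under these assumptions, call `fill_id_for` on the pending command to decide what to drive on fill\_id\_h.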
**Note**

A READ MISS command with **int4_valid_h<3:0>** of zero is a request for Istream data, while **int4_valid_h<3:0>** of non-zero is a request for Dstream data.

Figure 4–19 READ MISS Timing Diagram

### 4.8.2.2 FILL

Signals fill\_h, fill\_id\_h, and fill\_error\_h are used to control the return of fill data to the 21164 and the Bcache, if it is present. Signal idle\_bc\_h must be used to stop CPU requests in the Bcache in such a way that the Bcache will be idle when the fill data arrives (it does not need to be idle when the FILL command arrives). Signal fill\_h should be asserted at least two sysclk periods before the fill data arrives. Signal fill\_id\_h should be asserted at the same time to indicate whether the FILL is for a READ MISS0 or READ MISS1 operation. The 21164 uses this information to select the correct fill address.

Figure 4-19 shows the timing of a FILL command. If signals fill\_h and fill\_id\_h are asserted at the rising edge of sysclk n, then the 21164 asserts the Bcache index and begins a Bcache write at the rising edge of sysclk n+1. The system should drive the data onto the data bus and assert dack\_h before the end of the sysclk cycle. At the end of the write time, the 21164 waits for the next sysclk edge. If dack\_h has not been asserted, the Bcache write operation starts again at the same index. If dack\_h is asserted, the index advances to the next part of the fill and the write begins again.

The system must provide the data and dack\_h signal at the correct sysclk edges to complete the fill correctly. For example, if the Bcache requires 17 ns to write, and the sysclk is 12 ns, then two sysclk cycles are required for each write.
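The arithmetic in the example above, and the way the fill index repeats until dack\_h is seen, can be sketched as follows. This is a simplified model under stated assumptions: the function names are hypothetical, one list element stands for the dack\_h value sampled at the end of each write time, and a 64-byte block is assumed to consist of four 16-byte parts.

```python
import math


def sysclks_per_fill_write(bcache_write_ns, sysclk_period_ns):
    """Number of whole sysclk cycles needed to cover one Bcache write."""
    return math.ceil(bcache_write_ns / sysclk_period_ns)


def fill_index_sequence(dack_samples, parts=4):
    """Return the fill part written in each write slot.

    dack_samples[i] is the dack_h value sampled after write slot i
    (a modelling assumption). The same part is rewritten until dack_h
    is asserted, then the index advances to the next part.
    """
    index = 0
    history = []
    for dack in dack_samples:
        history.append(index)
        if dack:
            index += 1
            if index == parts:
                break  # all parts of the block have been accepted
    return history
```

For the figures in the text, `sysclks_per_fill_write(17, 12)` gives 2, so the system would assert dack\_h on every second sysclk edge.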
The 21164 calculates and asserts tag\_valid\_h and writes the Bcache tag store with each INT16 of data. The system is required to drive signals tag\_shared\_h, tag\_dirty\_h, and tag\_data\_par\_h with the correct value for the entire FILL transaction.

At the end of the FILL transaction, the 21164 will not assert data\_ram\_oe\_h or begin to drive the data bus until the fifth CPU cycle after the sysclk that loads the last DACK. If systems require more time to turn off their drivers, they must use idle\_bc\_h in combination with data\_bus\_req\_h to stop 21164 requests, and not send any system requests.

## 4.8.3 READ MISS with Victim

The 21164 supports two models for removing displaced dirty blocks from the Bcache. The first assumes that the system does not contain a victim buffer. In this case, the victim must be read from the Bcache before the new block can be requested. In the second case, if the system has a victim buffer, the 21164 requests the new block from memory while it starts to read the victim from the Bcache. The BCACHE VICTIM command and address follow the miss request.

In either case, the 21164 treats a miss/victim as a single transaction. If the assertion of addr\_bus\_req\_h or idle\_bc\_h causes the BIU sequencer to reset, both the READ MISS and BCACHE VICTIM transactions are restarted from the beginning. For example, suppose the 21164 is operating in victim-first mode and sends a BCACHE VICTIM command to the system, and the system then sends an INVALIDATE request to the 21164. The 21164 processes the INVALIDATE request, then restarts the READ operation and resends the BCACHE VICTIM command and data, then processes the READ MISS.

Sections 4.8.3.1 and 4.8.3.2 describe each of these methods of victim processing.
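The command ordering for the two victim models, and the restart behavior after a BIU sequencer reset, can be summarized with a small sketch. The function names and the `reset_after` parameter are hypothetical and exist only for illustration; the model tracks command order and nothing else.

```python
def miss_with_victim_sequence(victim_buffer):
    """Order in which the 21164 drives commands for a miss that has a victim."""
    if victim_buffer:
        # READ MISS goes first, with victim_pending_h asserted.
        return ["READ MISS", "BCACHE VICTIM"]
    # Victim-first mode: the victim is removed before the new block is requested.
    return ["BCACHE VICTIM", "READ MISS"]


def commands_sent(victim_buffer, reset_after=None):
    """Commands seen by the system, optionally with one BIU sequencer reset.

    reset_after is the number of commands already driven when the reset
    occurs (illustrative assumption). After a reset the whole miss/victim
    transaction restarts from the beginning.
    """
    sequence = miss_with_victim_sequence(victim_buffer)
    if reset_after is None:
        return sequence
    return sequence[:reset_after] + sequence
```

With `commands_sent(False, reset_after=1)` the model reproduces the example in the text: BCACHE VICTIM, then (after the INVALIDATE is processed) BCACHE VICTIM again, then READ MISS.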
### 4.8.3.1 READ MISS with Victim (Victim Buffer)

When the miss is detected, if the system has a victim buffer, the 21164 waits for the next sysclk, then asserts a READ MISS command, the read miss address, and the victim\_pending\_h signal, and indexes the Bcache to begin the read operation of the victim. When the system asserts cack\_h, the 21164 sends out the BCACHE VICTIM command and the victim address. Each assertion of dack\_h causes the Bcache index to advance to the next part of the block. Figure 4-20 shows the timing of a READ MISS command with a victim.

Figure 4-20 READ MISS with Victim (Victim Buffer) Timing Diagram

### 4.8.3.2 READ MISS with Victim (Without Victim Buffer)

If the system does not contain a victim buffer, the 21164 stops reading the Bcache as soon as the miss is detected. This occurs while the second INT16 of data is on data\_h<127:0>, as shown in Figure 4-21. A BCACHE VICTIM command is asserted at the next sysclk along with the victim address. A Bcache read operation of the victim is also started at the sysclk edge. When dack\_h is received for the first INT16 of the victim, the 21164 begins reading the next INT16 of the victim. Signal cack\_h can be sent any time before the last dack\_h is asserted or with the last dack\_h assertion. The 21164 sends the READ MISS command during the sysclk after cack\_h is received.

Figure 4-21 shows the timing of a victim being removed. Notice the data wrap sequence of this transaction: D2, D3, D0, and D1.

Figure 4-21 READ MISS with Victim (without Victim Buffer) Timing Diagram

## 4.8.4 WRITE BLOCK and WRITE BLOCK LOCK

The WRITE BLOCK command is used to complete writes to shared data, to remove Scache victims in systems without a Bcache, and to complete write operations to noncached memory.

The WRITE BLOCK LOCK command follows the same protocol. The LOCK qualifier allows the system to be more "conservative" on interlocked write operations to noncached memory space.
The WRITE BLOCK command to cached memory regions that source data from the Scache sends data to the system and also causes the data to be written in the Bcache.

The 21164 asserts the WRITE BLOCK command, along with the address and the first 16 bytes of data, at the start of a sysclk. If the system removes ownership of the cmd\_h<3:0> bus, the 21164 retains the WRITE command and waits for bus ownership to be returned. If the block in question is invalidated, the 21164 restarts the write operation. This results in a READ MISS MOD request instead.

When the system takes the first part of the data, it asserts dack\_h. This causes the 21164 to drive the next 16 bytes of data on the same sysclk edge. If the system asserts cack\_h, the 21164 outputs the next command in the next sysclk. Receipt of signal cack\_h indicates to the 21164 that the write operation will be taken, and that it is safe to update the Scache with the new version of the block.

During each cycle, the int4\_valid\_h<3:0> signals indicate which INT4 parts of the write operation are really being written by the processor. For write operations to cached memory, all of the data is valid. For write operations to noncached memory, only those INT4 with the int4\_valid\_h<n> signal asserted are valid. See the definition for int4\_valid\_h<n> in Table 3-1.

Figure 4-22 shows the timing of a WRITE BLOCK command.

Figure 4–22 WRITE BLOCK Timing Diagram

## 4.8.5 SET DIRTY and LOCK

Figure 4-23 shows the timing of a SET DIRTY and a LOCK operation. The 21164 uses the SET DIRTY transaction to inform a duplicate tag store that a cached block is changing from the NOT SHARED, NOT DIRTY state to the NOT SHARED, DIRTY state. When cack\_h is received from the system, the 21164 sets the dirty bit. If a SET SHARED or INVALIDATE command is received for the same block, the 21164 responds with a WRITE BLOCK or READ MISS MOD command.
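The decision the 21164 makes once cack\_h arrives for a SET DIRTY command can be sketched as a small function. The function name and return strings are hypothetical. Pairing an intervening INVALIDATE with READ MISS MOD and an intervening SET SHARED with WRITE BLOCK is an inference from the text above and from the SET DIRTY and WRITE BLOCK descriptions, not something the manual states as a table.

```python
def set_dirty_outcome(valid, shared):
    """What the 21164 does when cack_h arrives for a SET DIRTY command.

    valid/shared describe the block's state at that moment, which may have
    changed if an INVALIDATE or SET SHARED arrived in the meantime.
    """
    if not valid:
        # Assumption: the block was invalidated, so it must be fetched
        # again with modify intent.
        return "READ MISS MOD"
    if shared:
        # Shared bit now set: the dirty bit is not set and the write is
        # sent to the system instead.
        return "WRITE BLOCK"
    # Shared bit still clear: set the dirty bit and complete the write.
    return "DIRTY BIT SET"
```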
### 4.8.5.1 When to Use a SET DIRTY and LOCK

The 21164 uses the LOCK command to pass the address of a LDx\_L to the system. A system lock register is required in any system that filters write traffic with a duplicate tag store. If the locked block is displaced from the 21164 caches, the 21164 uses the value of the system lock register to determine if the LDx\_L/STx\_C sequence should pass or fail.

The system may use BC\_CONTROL<2> [EI\_CMD\_GRP2] to modify operation for these commands.

- If BC\_CONTROL [EI\_CMD\_GRP2] is set, the 21164 is allowed to issue SET DIRTY and LOCK commands to the system interface. The system logic acknowledges receipt of these commands.
- If BC\_CONTROL [EI\_CMD\_GRP2] is clear, it is UNPREDICTABLE whether the SET DIRTY and LOCK commands will be driven to the interface command pins. However, the system should never assert cack\_h for the command when BC\_CONTROL [EI\_CMD\_GRP2] is clear.

Figure 4–23 SET DIRTY and LOCK Timing Diagram

## 4.8.6 Memory Barrier (MB)

The 21164 may encounter a memory barrier (MB) instruction when executing the instruction stream. The action taken by the 21164 depends upon the state of BC\_CONTROL<3> [EI\_CMD\_GRP3].

- If BC\_CONTROL [EI\_CMD\_GRP3] is set, the 21164 drains its pipeline and buffers, then issues an MB command to the system interface. The system logic must empty its buffers and complete all pending transactions before acknowledging receipt of the MB command.
- If BC\_CONTROL [EI\_CMD\_GRP3] is clear, it is UNPREDICTABLE whether the MB command will be driven to the interface command pins. However, the system should never assert cack\_h for the command when BC\_CONTROL [EI\_CMD\_GRP3] is clear.

### 4.8.6.1 When to Use a MEMORY BARRIER Command

If the system interface buffers INVALIDATE commands between the duplicate tag store and the 21164, then the system interface must enable the MB command and drain all invalidates before asserting **cack\_h** in response to an MB command.
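The gating of the optional command groups by BC\_CONTROL can be expressed as a short check that system logic might apply before asserting cack\_h. This is a sketch under assumptions: the constant and function names are hypothetical, the bit positions come from the text (BC\_CONTROL<2> and BC\_CONTROL<3>), and commands outside the two groups are treated as ungated.

```python
# Bit positions taken from the text; the Python names are illustrative.
EI_CMD_GRP2 = 1 << 2  # gates SET DIRTY and LOCK
EI_CMD_GRP3 = 1 << 3  # gates MEMORY BARRIER

GROUP_BIT = {
    "SET DIRTY": EI_CMD_GRP2,
    "LOCK": EI_CMD_GRP2,
    "MEMORY BARRIER": EI_CMD_GRP3,
}


def system_may_cack(bc_control, command):
    """Return True if the system is allowed to assert cack_h for command.

    When the controlling group bit is clear, the command may still appear
    on the pins (UNPREDICTABLE), but the system should never acknowledge it.
    """
    bit = GROUP_BIT.get(command)
    if bit is None:
        return True  # assumption: other commands are not gated by these bits
    return bool(bc_control & bit)
```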
## 4.8.7 FETCH

The 21164 passes a FETCH command to the system when it executes a FETCH instruction.

## 4.8.8 FETCH\_M

The 21164 passes a FETCH\_M (fetch with modify intent) command to the system when it executes a FETCH\_M instruction.

# 4.9 System-Initiated Transactions

System commands to the 21164 are driven on the **cmd\_h<3:0>** signal lines. The algorithm used by the 21164 for accepting system commands to be processed in parallel by the 21164 is presented in Section 4.9.1.

System-initiated commands may be separated into two protocol groups. The group of commands used by write invalidate protocol systems is listed and described in Section 4.9.2. The group of commands used by flush-based protocol systems is listed and described in Section 4.9.3.

## 4.9.1 Sending Commands to the 21164

The rules used by the Cbox BIU to process commands sent by the system to the 21164 are listed in Section 4.12.1. The algorithm used by the system to send commands to the 21164 without overflowing the two Cbox BIU command buffers is shown in Figure 4-24.

## 4.9.2 Write Invalidate Protocol Commands

The 21164 expects systems that use the write invalidate protocol to use the READ DIRTY, READ DIRTY/INVALIDATE, INVALIDATE, and SET SHARED commands to keep the state of each block up to date. These commands are defined in Table 4–10.

Table 4–10 System-Initiated Interface Commands (Write Invalidate Protocol)

| Command | cmd_h<br><3:0> | Description |
|---------|----------------|-------------|
| NOP | 0000 | The NOP command is driven by the owner of the cmd_h bus when it has no tasks queued. |
| INVALIDATE | 0010 | Remove the block. When the system issues the INVALIDATE command, the 21164 probes its Scache. If the block is found, the 21164 responds with ACK/Scache and invalidates the block. If the block is not found, and the system does not contain a Bcache, the 21164 responds with a NOACK.<br><br>If the system contains a Bcache, the block is assumed to be in the Bcache. The 21164 responds with ACK/Bcache, and the block is changed to the invalid state without probing. |
| SET SHARED | 0011 | Block goes to the shared state. The SET SHARED command is used by the system to change the state of a block in the cache system to shared. The shared bit in the Scache is set if the block is present. The Bcache tag is written to the shared not dirty state. The 21164 assumes that this action is correct, because the system would have sent a READ DIRTY command if the dirty bit were set.<br><br>If the block is found in the Scache, the 21164 responds with ACK/Scache. Otherwise, if the system contains a Bcache, the block is assumed to be in the Bcache, and the 21164 responds with ACK/Bcache. If the system does not contain a Bcache, and the block is not found in the Scache, the 21164 responds with NOACK. |
| READ DIRTY | 0101 | Read a block; set shared. The READ DIRTY command probes the Scache to see if the requested block is present and dirty. If the block is not found, or if the block is clean, and the system does not contain a Bcache, the 21164 responds with NOACK. If the block is found and dirty in the Scache, the 21164 responds with ACK/Scache and drives the data on the data_h bus. If the block is not found in the Scache, and the system contains a Bcache, the block is assumed to be in the Bcache. The 21164 responds with ACK/Bcache, indexes the Bcache to read the block, and changes the block status to the shared dirty state. |
| READ DIRTY<br>/INVALIDATE | 0111 | Read a block; invalidate. This command is identical to the READ DIRTY command except that if the block is present in the caches, it will be invalidated from the caches. |

### 4.9.2.1 21164 Responses to Write Invalidate Protocol Commands

The 21164 responses on addr\_res\_h<1:0> to write invalidate protocol commands are listed in Table 4-11.

Table 4–11 21164 Responses on addr\_res\_h<1:0> to Write Invalidate Protocol Commands

INVALIDATE and SET SHARED Commands

| Bcache Status | Scache Status | 21164 Response |
|-----------------|-----------------|----------------|
| No Bcache | Scache_Miss | NOACK |
| No Bcache | Scache_Hit | ACK/Scache |
| Bcache_Hit/Miss | Scache_Hit/Miss | ACK/Bcache |

READ DIRTY and READ DIRTY/INVALIDATE Commands

| Bcache Status | Scache Status | 21164 Response |
|---------------|----------------------|----------------|
| No Bcache | Scache_Miss | NOACK |
| No Bcache | Scache_Hit,Not Dirty | NOACK |
| No Bcache | Scache_Hit,Dirty | ACK/Scache |
| Bcache | Scache_Hit,Dirty | ACK/Scache |
| Bcache | Scache_Miss | ACK/Bcache |

The purpose of addr\_res\_h<2> is to allow a system without a duplicate tag store to determine if a block is present in the Scache or lock register. The system logic could then use this information to correctly assert tag\_shared\_h in a multiprocessor system. The 21164 responds to the READ, FLUSH, READ DIRTY, SET SHARED, and READ DIRTY/INVALIDATE commands on addr\_res\_h<2> as listed in Table 4–12.
Table 4-12 21164 Responses on addr\_res\_h<2> to 21164 Commands

| Scache | Lock Register | addr_res_h<2> |
|--------|---------------|---------------|
| Miss | Miss | 0 |
| Miss | Hit | 1 |
| Hit | Miss | 1 |
| Hit | Hit | 1 |

Table 4-13 presents the 21164 best-case response time to system commands in a write invalidate protocol system.

Table 4–13 21164 Minimum Response Time to Write Invalidate Protocol Commands

| Cache Status | Response | Number of sys_clk_out1_h,l Cycles |
|--------------|----------------------------------|----------------------------------------------------------|
| No Bcache | NOACK | 8 CPU cycles rounded up to next sys_clk_out1_h,l cycle |
| No Bcache | ACK/Scache | 12 CPU cycles rounded up to next sys_clk_out1_h,l cycle |
| Bcache | NOACK, ACK/Scache,<br>ACK/Bcache | 10 CPU cycles rounded up to next sys_clk_out1_h,l cycle |

### 4.9.2.2 READ DIRTY and READ DIRTY/INVALIDATE

The READ DIRTY command is used to read modified data from the cache system. The block status changes from DIRTY, NOT SHARED to DIRTY, SHARED. Figure 4-25 shows the timing of a READ DIRTY transaction. The Scache is probed, the data read (if it is found), and the state is set to SHARED. If the data is not found in the Scache, it is assumed to be in the Bcache. The 21164 starts the Bcache read and writes the tag to DIRTY, SHARED.

The READ DIRTY/INVALIDATE command is identical to the READ DIRTY command except that the block is changed to NOT VALID (invalid) rather than to SHARED.
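Tables 4-12 and 4-13 reduce to two small calculations, sketched below. The names are hypothetical, and reading "rounded up to next sys\_clk\_out1 cycle" as a ceiling division of CPU cycles by the sysclk ratio is an assumption rather than a statement from the manual.

```python
# Minimum response times from Table 4-13, in CPU cycles, keyed by
# (bcache_present, response). Names are illustrative.
MIN_RESPONSE_CPU_CYCLES = {
    (False, "NOACK"): 8,
    (False, "ACK/Scache"): 12,
    (True, "NOACK"): 10,
    (True, "ACK/Scache"): 10,
    (True, "ACK/Bcache"): 10,
}


def min_response_sysclks(bcache_present, response, sysclk_ratio):
    """Best-case response time in whole sysclk cycles (assumed ceiling)."""
    cpu_cycles = MIN_RESPONSE_CPU_CYCLES[(bcache_present, response)]
    return -(-cpu_cycles // sysclk_ratio)  # integer ceiling division


def addr_res_bit2(scache_hit, lock_register_hit):
    """addr_res_h<2> per Table 4-12: set if either the Scache or lock register hits."""
    return int(scache_hit or lock_register_hit)
```

For instance, with a sysclk ratio of 4 and a Bcache present, the 10 CPU cycles would round up to 3 sysclk cycles under this reading.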
Figure 4–25 READ DIRTY Timing Diagram (Scache Hit)

(Signals shown: sys\_clk\_out1\_h, addr\_bus\_req\_h, cmd\_h<3:0>, victim\_pending\_h, addr\_h<39:4>, cack\_h, addr\_res\_h<2:0>, idle\_bc\_h, index\_h<25:4>, data\_h<127:0>, dack\_h, data\_ram\_oe\_h, data\_ram\_we\_h, tag\_ram\_oe\_h, tag\_ram\_we\_h, tag\_data\_h<38:20>, tag\_dirty\_h, tag\_shared\_h, tag\_valid\_h.)

### 4.9.2.3 INVALIDATE

The INVALIDATE command can be used to remove a block from the cache system. Unlike the FLUSH command, any modified data will not be read. The Scache is probed and invalidated if the block is found. The Bcache is invalidated without probing. Figure 4–26 shows the timing of an INVALIDATE transaction.

Figure 4–26 INVALIDATE Timing Diagram

### 4.9.2.4 SET SHARED

When the 21164 receives a SET SHARED command, it probes the Scache and changes the state of the block to SHARED if it is found. The 21164 "assumes" that the block is in the Bcache and writes the state of the tag to SHARED, NOT DIRTY. Figure 4-27 shows the timing of a SET SHARED command.
Figure 4-27 SET SHARED Timing Diagram

## 4.9.3 Flush-Based Cache Coherency Protocol Commands

A flush-based design using the 21164 "assumes" the system will use the READ and FLUSH commands defined in Table 4-14 to maintain cache coherency.

Table 4–14 System-Initiated Interface Commands (Flush Protocol)

| Command | cmd_h<br><3:0> | Description |
|---------|----------------|-------------|
| NOP | 0000 | The NOP command is driven by the owner of the cmd_h bus when it has no tasks queued. |
| FLUSH | 0001 | Remove block from caches; return dirty data. The FLUSH command causes a block to be removed from the 21164 cache system. If the block is not found, the 21164 responds with NOACK. If the block is found and the block is clean, the 21164 responds with NOACK. The block is invalidated in the Dcache, Scache, and Bcache. If the block is found and is dirty, the 21164 responds with ACK/Scache or ACK/Bcache. If the data is found dirty in the Scache, it is driven at the interface in the same sysclk as the ACK/Scache. If the data is found dirty in the Bcache, the Bcache read starts on the same sysclk as ACK. The block is invalidated in the Dcache, Scache, and Bcache. |
| READ | 0100 | Read a block. The READ command probes the Scache and Bcache to see if the requested block is present. If the block is present, the 21164 responds with ACK/Scache or ACK/Bcache. If the data is in Scache, the data is driven on the data_h bus in the same sysclk as the ACK. If the data is in the Bcache, a Bcache read operation begins in the same sysclk as the ACK. If the block is not present in either cache, the 21164 asserts NOACK on addr_res_h<1:0>. |

### 4.9.3.1 21164 Responses to Flush-Based Protocol Commands

The 21164 responds to flush-based protocol commands on addr\_res\_h<1:0> as shown in Table 4-15.

Table 4–15 21164 Responses to Flush-Based Protocol Commands

READ and FLUSH Commands

| Bcache Status | Scache Status | 21164 Response |
|--------------------------|----------------------------|----------------|
| No Bcache | Scache_Miss | NOACK |
| No Bcache | Scache_Hit,Not Dirty | NOACK |
| No Bcache | Scache_Hit,Dirty | ACK/Scache |
| Bcache_Miss | Scache_Miss | NOACK |
| Bcache_Hit | Scache_Hit,Dirty | ACK/Scache |
| Bcache_Hit,<br>Not Dirty | Scache_Miss/Hit, Not Dirty | NOACK |
| Bcache_Hit,Dirty | Scache_Miss | ACK/Bcache |

The purpose of addr\_res\_h<2> is to allow a system without a duplicate tag store to determine if a block is present in the Scache or lock register. The system logic could then use this information to correctly assert tag\_shared\_h in a multiprocessor system. The 21164 responds to the READ, FLUSH, READ DIRTY, SET SHARED, and READ DIRTY/INVALIDATE commands on addr\_res\_h<2> as listed in Table 4–16.

Table 4-16 21164 Responses on addr\_res\_h<2> to 21164 Commands

| Scache | Lock Register | addr_res_h<2> |
|--------|---------------|---------------|
| Miss | Miss | 0 |
| Miss | Hit | 1 |
| Hit | Miss | 1 |
| Hit | Hit | 1 |

Table 4-17 presents the 21164 best-case response time to system commands in a write invalidate protocol system.
Table 4–17 Minimum 21164 Response Time to Write Invalidate Protocol Commands

| Cache Status | Response | Number of sys_clk_out1_h,l Cycles |
|--------------|----------------------------------|----------------------------------------------------------|
| No Bcache | NOACK | 8 CPU cycles rounded up to next sys_clk_out1_h,l cycle |
| No Bcache | ACK/Scache | 12 CPU cycles rounded up to next sys_clk_out1_h,l cycle |
| Bcache | NOACK, ACK/Scache,<br>ACK/Bcache | 10 CPU cycles rounded up to next sys_clk_out1_h,l cycle |

Table 4–18 presents the 21164 best-case response time to system commands in a flush protocol system.

Table 4–18 Minimum 21164 Response Time to Flush Protocol Commands

| Cache Status | Response | Number of sys_clk_out1_h,l Cycles |
|--------------|-------------------------------------|---------------------------------------------------------------------------|
| No Bcache | NOACK | 8 CPU cycles rounded up to next sys_clk_out1_h,l cycle |
| No Bcache | ACK/Scache | 12 CPU cycles rounded up to next sys_clk_out1_h,l cycle |
| Bcache | NOACK,<br>ACK/Scache,<br>ACK/Bcache | 10 CPU cycles plus [BC_RD_SPD] rounded up to next sys_clk_out1_h,l cycle |

### 4.9.3.2 FLUSH

The FLUSH command can be used to remove blocks from the 21164 cache system. Figure 4-28 shows the timing of a FLUSH transaction. If the block is DIRTY, the block will be read from the cache and written to memory. In the timing diagram shown in Figure 4-28, the cache block state changes from VALID to NOT VALID. When the block state changes to NOT VALID, the states of SHARED and DIRTY do not matter.

Figure 4–28 FLUSH Timing Diagram (Scache Hit)

### 4.9.3.3 READ

The READ command is used by the system to read DIRTY data from the 21164. The tag control status does not change. Figure 4-29 shows the timing and tag control status of a READ transaction.
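The rows of Table 4-15 can be condensed into a priority check: dirty data in the Scache wins, then dirty data in the Bcache, otherwise NOACK. The sketch below is one way to encode that reading. The function and parameter names are hypothetical, and its behavior for combinations the table does not list (for example, a clean Scache hit together with a dirty Bcache hit) is an assumption.

```python
def flush_protocol_response(scache_hit, scache_dirty,
                            bcache_present=False,
                            bcache_hit=False, bcache_dirty=False):
    """Response on addr_res_h<1:0> to a READ or FLUSH, following Table 4-15."""
    if scache_hit and scache_dirty:
        # Dirty data in the Scache is driven in the same sysclk as the ACK.
        return "ACK/Scache"
    if bcache_present and bcache_hit and bcache_dirty:
        # The Bcache read starts in the same sysclk as the ACK.
        return "ACK/Bcache"
    # Block absent, or present but clean.
    return "NOACK"
```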
Figure 4–29 READ Timing Diagram (Scache Hit)

# 4.10 Data Bus and Command/Address Bus Contention

The data bus is composed of data\_h<127:0> and data\_check\_h<15:0>. The command/address bus is composed of cmd\_h<3:0>, addr\_h<39:4>, and addr\_cmd\_par\_h. The following sections describe situations that have contention for use of the data bus or contention for use of the command/address bus.

#### 4.10.1 Command/Address Bus

Figure 4–30 shows the 21164 and the system alternately driving the command/address bus. If signal addr\_bus\_req\_h is asserted at the end of sysclk 0, the next cycle on the command/address bus belongs to the system. The 21164 turns off its drivers at the start of sysclk 1. While the system must turn on its drivers during sysclk 1, it must ensure that the drivers do not turn on before the 21164 drivers turn off. The 21164 samples the state of the command/address bus at the end of sysclk 1. If addr\_bus\_req\_h remains asserted, the system should continue to drive the command/address bus.

Figure 4–30 Driving the Command/Address Bus

To pass control of the command/address bus back to the 21164, the system should turn off its drivers during a sysclk and deassert addr\_bus\_req\_h. The 21164 does not sample the state of the bus if addr\_bus\_req\_h is deasserted. The 21164 drives the command/address bus at the next sysclk edge.

# 4.10.2 Read/Write Spacing—Data Bus Contention

The data bus, data\_h<127:0>, can be driven by the 21164, the Bcache array, or the system. In the case of private Bcache write operations followed by private Bcache read operations, the 21164 stops driving the data bus well in advance of the Bcache turning on. For private Bcache read operations followed by private Bcache write operations, the 21164 inserts a programmable number of CPU cycles between the read and the write operation. This allows time for the Bcache drivers to turn off before the 21164 data drivers are turned on.
Note: This rule also applies to WRITE BLOCK, WRITE BLOCK LOCK, READ, READ DIRTY, READ DIRTY/INV, and FLUSH commands.

# 4.10.3 Using idle\_bc\_h and fill\_h

The 21164 uses the idle\_bc\_h and fill\_h signals to fill data into the Bcache. The system asserts the idle\_bc\_h signal early enough to ensure that the 21164 completes any Bcache transaction it might have started while waiting for the fill data. Signal fill\_h is asserted a fixed number of sysclk cycles before the fill data to start the fill transaction in the Bcache. At the end of the fill, the 21164 waits five CPU cycles before starting a read or write operation. This time should allow the system to turn off its drivers. If, in practice, this is not enough time, the system may assert data\_bus\_req\_h to gain additional cycles.

## Calculating Time to Assert idle\_bc\_h

The equations for calculating the length of time to assert idle\_bc\_h, in CPU cycles, are:

```
read_hit_idle  = 2 + (block_size/16) * BC_RD_SPD + tristate_ram_turn_off - 3 * wave_pipelining;
read_miss_idle = 6 + BC_RD_SPD + sysclk_ratio + tristate_ram_turn_off;
write_idle     = 4 + (block_size/16) * BC_WRT_SPD + tristate_21164_turn_off;
```

Take the largest of the three times and then round up to the next sysclk boundary. When determining the tristate turn-off times, if the system will not turn on its drivers for some number of nanoseconds after the 21164 starts driving Bcache index\_h<25:4>, this time can be used to reduce the tristate\_turn\_off time.

For example, if the sysclk ratio is 6, the block size is 64 bytes, the Bcache read and write speeds are both 5, there is no wave pipelining, and the tristate turn-off times are 2 cycles for reads and 0 cycles for writes, then the equations work out to:

```
read_hit_idle  = 2 + (64/16) * 5 + 2 - 3 * 0 = 24
read_miss_idle = 6 + 5 + 6 + 2               = 19
write_idle     = 4 + (64/16) * 5 + 0         = 24
Maximum of (24/6), (19/6), (24/6) = 4
```

If the 21164 receives asserted **idle\_bc\_h** at sysclk edge N, the FILL command can be received at sysclk edge N+3.
The 21164 drives **index\_h<25:4>** to fill the Bcache on sysclk edge N+4.

Figure 4–31 Example of Using idle\_bc\_h and fill\_h

#### Minimum idle\_bc\_h Time

If the system contains a Bcache, and the write speed of the Bcache is greater than or equal to twice the sysclk ratio, then the minimum idle\_bc\_h assertion time is two sysclk cycles. For example, if the Bcache write speed is 10, and the sysclk ratio is 4, then any assertion of idle\_bc\_h must be for two or more sysclk cycles.

# 4.10.4 Using data\_bus\_req\_h

The signal data\_bus\_req\_h can be used along with the idle\_bc\_h signal to prevent the 21164 and the Bcache from driving the data bus. In general, the system should not need to use this feature, but it is useful if the system places other devices on the data bus.

To gain control of the data bus, the system must ensure that the Bcache is idle by asserting idle\_bc\_h for the required time. It can then assert data\_bus\_req\_h. If data\_bus\_req\_h is received asserted at the rising edge of sysclk N, the 21164 stops driving the bus on the rising edge of sysclk N+1. To return the bus to the 21164, the system should deassert data\_bus\_req\_h and then deassert idle\_bc\_h on the next sysclk.

Figure 4–32 Using data\_bus\_req\_h

## 4.10.5 Tristate Overlap

The addr\_h<39:4>, cmd\_h<3:0>, data\_h<127:0>, and tag\_data\_h<38:20> buses must be operated in such a way that no more than one driver drives a bus at a time. This section describes the 21164 features that can be used to prevent tristate overlap. The "owner" of each bus must drive the bus to some value for each cycle.

Tristate drivers in the 21164 turn on and off very fast (in the 0.5 ns to 1.0 ns range). At the other end of the range, SRAM memory devices turn on and off slowly (in the 7.0 ns to 10.0 ns range). Generally, system drivers fall somewhere in the middle.
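The idle\_bc\_h calculation from Section 4.10.3, which is where these tristate turn-off times are accounted for, can be sketched in a few lines of Python. This is a minimal illustration, not part of the 21164 specification: the function and parameter names are hypothetical, all times are assumed to be in CPU cycles, and the read-miss term follows the worked example in Section 4.10.3, which adds the sysclk ratio and the RAM turn-off time (6 + 5 + 6 + 2).

```python
def idle_times_cpu_cycles(block_size, bc_rd_spd, bc_wrt_spd, sysclk_ratio,
                          ram_turn_off, cpu_turn_off, wave_pipelining=0):
    """Return (read_hit_idle, read_miss_idle, write_idle) in CPU cycles.

    Parameter names are illustrative; they mirror the terms of the
    idle_bc_h equations in Section 4.10.3.
    """
    transfers = block_size // 16  # each data bus cycle moves one INT16 (16 bytes)
    read_hit_idle = (2 + transfers * bc_rd_spd + ram_turn_off
                     - 3 * wave_pipelining)
    # Assumption: terms are summed, as in the worked example (6 + 5 + 6 + 2).
    read_miss_idle = 6 + bc_rd_spd + sysclk_ratio + ram_turn_off
    write_idle = 4 + transfers * bc_wrt_spd + cpu_turn_off
    return read_hit_idle, read_miss_idle, write_idle


def idle_bc_sysclks(block_size, bc_rd_spd, bc_wrt_spd, sysclk_ratio,
                    ram_turn_off, cpu_turn_off, wave_pipelining=0):
    """Largest of the three idle times, rounded up to whole sysclk cycles."""
    worst = max(idle_times_cpu_cycles(block_size, bc_rd_spd, bc_wrt_spd,
                                      sysclk_ratio, ram_turn_off,
                                      cpu_turn_off, wave_pipelining))
    return -(-worst // sysclk_ratio)  # integer ceiling division


# Worked example from Section 4.10.3: 64-byte block, read/write speed 5,
# sysclk ratio 6, 2 cycles RAM turn-off, 0 cycles 21164 turn-off.
print(idle_times_cpu_cycles(64, 5, 5, 6, 2, 0))  # (24, 19, 24)
print(idle_bc_sysclks(64, 5, 5, 6, 2, 0))        # 4
```

A system designer could rerun the same calculation with a smaller `ram_turn_off` value when the system drivers are known to turn on late, as the text in Section 4.10.3 allows.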
#### 4.10.5.1 READ or WRITE to FILL

The time required to tristate the 21164 drivers at the end of a WRITE command, or the Bcache drivers at the end of a READ command, is part of the idle\_bc\_h equation.

#### 4.10.5.2 BCACHE VICTIM to FILL

The time to turn off the Bcache drivers at the end of a BCACHE VICTIM is fixed by the 21164 design. The system must allow for this time before starting a FILL. There are two READ MISS with victim cases to consider. In one case, the READ MISS operation is completed first because the system logic contains a victim buffer. In the other case, the READ MISS operation is completed second because the system logic does not have a victim buffer.

#### **READ MISS Completed First—Victim Buffer**

The final dack\_h will be sampled by the 21164 on the rising edge of sysclk. If the corresponding rising CPU clock edge is labeled N, then data\_ram\_oe\_h will deassert at the rising edge of CPU clock N+4.

Figure 4–33 READ MISS Completed First—Victim Buffer (signals shown: sys\_clk\_out1\_h, dack\_h, index\_h<25:4>, data\_h<127:0>, data\_ram\_oe\_h; CPU clock cycles N through N+4)

## **READ MISS Second—No Victim Buffer**

The final dack\_h will be sampled by the 21164 on the rising edge of sysclk. If the corresponding rising CPU clock edge is labeled N, then the READ MISS command will arrive on the next sysclk edge, and data\_ram\_oe\_h will deassert at the rising edge of CPU clock N+S+1, where S is the sysclk ratio. If the sysclk ratio is 3, it will take an extra sysclk to send the READ MISS command, so data\_ram\_oe\_h will deassert at N+2S+1.

Figure 4–34 READ MISS Second—No Victim Buffer (signals shown: sys\_clk\_out1\_h, cmd\_h<3:0>, dack\_h, index\_h<25:4>, data\_h<127:0>)

#### 4.10.5.3 System Bcache Command to FILL

At the end of a system command that uses the Bcache, the system must provide enough time for the Bcache drivers to turn off before returning any fill data.
The final dack\_h will be sampled by the 21164 on the rising edge of sysclk. If the corresponding rising CPU clock edge is labeled N, data\_ram\_oe\_h will deassert at the rising edge of CPU clock N+5.

Figure 4–35 System Command to FILL Example 1

A side effect of this is a limit on the earliest assertion of fill\_h after a system command. The system must allow time for data\_ram\_oe\_h to turn off and the RAMs to stop driving the bus before the system drives the fill data.

If the system command was a SET SHARED or an INVALIDATE command, the system must allow time for the 21164 to complete the Bcache tag write operation and then for the drivers to turn off before driving the tag\_shared\_h, tag\_dirty\_h, and tag\_ctl\_par\_h lines. The 21164 begins the tag write operation one CPU cycle after the response is sent to the system. The write transaction takes BC\_WRT\_SPD cycles to complete. During the write transaction, data\_ram\_oe\_h will be asserted but not tag\_ram\_oe\_h. At the end of the write transaction, tag\_ram\_oe\_h will pulse for one CPU cycle, then both will go off. Refer to Figure 4–36. If the response is driven at the rising edge of CPU clock N, then data\_ram\_oe\_h will fall at N+2+BC\_WRT\_SPD, or N+6 for a 4-cycle write speed.

Figure 4–36 System Command to FILL Example 2

#### 4.10.5.4 FILL to Private Read or Write Operation

At the end of the fill, the 21164 does not begin to drive the data bus until the fifth CPU cycle after the sysclk that loads the last dack\_h. Likewise, the 21164 does not assert data\_ram\_oe\_h until the fifth cycle after the sysclk that loads the last dack\_h. Systems requiring more time to turn off their drivers must not send any more requests and must use idle\_bc\_h and data\_bus\_req\_h at the end of the fill to stop 21164 requests.

Figure 4–37 FILL to Private Read or Write

## 4.11 21164 Interface Restrictions

This section lists restrictions on the use of 21164 interface features.
# 4.11.1 FILL Operations after Other Transactions

If the system has removed data from the 21164 with any of the system commands, or removed a Bcache victim from the Bcache, and wants to follow either of these transactions with a FILL, then the earliest point at which the system can assert the fill\_h signal is the sysclk after the last assertion of dack\_h.

A FILL followed by another FILL is a special case. FILLs can be pipelined back-to-back so that 100% of the data bus bandwidth can be used.

# 4.11.2 Command Acknowledge for WRITE BLOCK Commands

When the 21164 requests a WRITE BLOCK or WRITE BLOCK LOCK operation, the system can acknowledge the data by asserting dack\_h before asserting cack\_h. The system must assert cack\_h no later than the last assertion of dack\_h.

# 4.11.3 Systems Without a Bcache

Systems without a Bcache must set a 64-byte block size. If systems without a Bcache have an Scache duplicate tag store, they are required to maintain tags for the two blocks in the 21164 Scache victim buffer.

#### 4.11.4 WRITE BLOCK LOCK

A WRITE BLOCK LOCK transaction is caused by a store conditional instruction to I/O space. Two octawords of data are provided by the 21164, each requiring the system to assert dack\_h.

If the system asserts dack\_h for the first octaword, asserts cack\_h and cfail\_h together, and the sysclk ratio is three, the 21164 hangs. If dack\_h, cack\_h, and cfail\_h are asserted for the second INT16 of data, the write operation will be failed correctly. If **cack\_h** and **cfail\_h** are asserted at any time without asserting **dack\_h**, the write operation will be failed correctly.

If the sysclk ratio is anything other than three, any legal combination of dack\_h, cack\_h, and cfail\_h causes the write operation to fail correctly.

# 4.12 21164/System Race Conditions

When certain sequences of transactions occur on the interface between the 21164, the Bcache, and the system, race conditions may occur.
The rules for use of the interface by the 21164 and the system are listed in Section 4.12.1. Examples of race conditions to be avoided are described and illustrated in Section 4.12.2 through Section 4.12.6.

# 4.12.1 Rules for 21164 and System Use of External Interface

This section goes over the rules for determining the order in which 21164 and system requests are allowed by the Cbox BIU. In general, the order allowed is determined by use of cmd\_h<3:0>, idle\_bc\_h, and fill\_h.

1. If idle\_bc\_h is not asserted and there are no valid requests in the BIU command buffer, then the BIU is free to perform any 21164 request.
2. If a FILL transaction is pending, the BIU only produces another READ MISS command, with a possible BCACHE VICTIM command. The BIU will not attempt any other command.
3. The assertion of idle\_bc\_h, or the sending of a system command other than NOP to the 21164, causes the BIU to idle. If the BIU has a command loaded in the pad ring, it removes the command and replaces it with a NOP command. The state of cmd\_h<3:0> is unpredictable until the idle condition ends.
4. The idle condition ends when the 21164 receives a deasserted idle\_bc\_h, and the 21164 has responded to all the system commands that were sent.
5. The system must not assert cack\_h during the idle condition.
6. There is one exception to rules 3, 4, and 5. If idle\_bc\_h or a system command arrives while the 21164 is reading the Bcache, and that read transaction turns into a READ MISS transaction, and it does not produce a victim, then the 21164 loads the miss into the pad ring. The system may assert cack\_h for this READ MISS request at any time.
7. If cack\_h is asserted at the same time as idle\_bc\_h or a valid system request, cack\_h wins and the command is taken by the system. Signal cack\_h should not be asserted if idle\_bc\_h has been asserted or a valid system command is under way.
8. A READ MISS with a BCACHE VICTIM transaction is treated as an atomic pair.
   The command order, READ MISS then BCACHE VICTIM or BCACHE VICTIM then READ MISS, is programmable. Either way, if the first command is acknowledged with cack\_h, then both commands must be acknowledged with cack\_h and all the data acknowledged with dack\_h, before the 21164 responds to any other request.
9. The cack\_h acknowledgment for a WRITE BLOCK or BCACHE VICTIM transaction must be received by the 21164 with or before the last dack\_h acknowledgment of the data. For WRITE BLOCK and BCACHE VICTIM transactions, it is possible to acknowledge all but the last data, and then decide to do something else.
10. For a READ MISS transaction, cack\_h must be received with or before the last data acknowledgment (dack\_h) for the requested FILL operation.
11. If a 21164 request is interrupted by an idle condition, the 21164 restarts the same command unless:
    - a. A system request is received that changes the state of the block made by the original 21164 request. For example, if the 21164 is requesting a WRITE BLOCK and the system sends an INVALIDATE command to the same block, then the WRITE BLOCK command will not be restarted.
    - b. If the system does not have a Bcache, and a WRITE BLOCK command to write an Scache victim back is interrupted, then the WRITE BLOCK command will not be restarted if a higher priority request arrives in the BIU.

# 4.12.2 READ MISS with Victim Example

In this example, the 21164 asserts a READ MISS command with a victim. The system asserts **dack\_h** for two data cycles received from the Bcache and then asserts **idle\_bc\_h**. This causes the 21164 to remove the READ MISS command with victim pending. The 21164 reasserts the READ MISS and BCACHE VICTIM commands, if needed, at a later time.
Figure 4–38 READ MISS with Victim Example (signals shown: sys\_clk\_out1\_h, cmd\_h<3:0>, addr\_h, victim\_pending\_h, addr\_bus\_req\_h, idle\_bc\_h, cack\_h, dack\_h, index\_h<25:4>, data\_h<127:0>, data\_ram\_oe\_h)

# 4.12.3 idle\_bc\_h and cack\_h Race Example

In this example, idle\_bc\_h and cack\_h are asserted in the same sysclk. The system takes the READ MISS and BCACHE VICTIM commands before doing anything else. The last dack\_h meets the requirement that the cack\_h arrive before or with the last dack\_h.

# 4.12.4 READ MISS with idle\_bc\_h Asserted Example

In this example, the 21164 has started a Bcache read operation that misses. The signal idle\_bc\_h is asserted, but no victim was created, so the READ MISS request is loaded into the pad ring. The system then takes the request.

Figure 4–40 READ MISS with idle\_bc\_h Asserted Example

# 4.12.5 READ MISS with Victim Abort Example

In this example, the 21164 produces a READ MISS command with a victim and is waiting for the system to take it when the system takes the bus and requests a READ DIRTY transaction. The 21164 drives the READ MISS request for one more cycle after it gets command of the bus and then removes the request. The 21164 then responds to the READ DIRTY command and drives index\_h<25:4> to read the Bcache.

The timing diagram does not show the 21164 restarting the Bcache read operation and requesting the READ MISS with victim again. If the victim block was invalidated by the system request, the 21164 produces a clean READ MISS transaction.

Figure 4–41 READ MISS with Victim Abort Example

# 4.12.6 Bcache Hit Under READ MISS Example

In this example, the 21164 produces a READ MISS transaction and requests a fill from the system. A Bcache hit to index j takes place while waiting for the fill. The system then returns the requested data in two bursts, asserting cack\_h at the same time as the last assertion of dack\_h.
Figure 4–42 Bcache Hit Under READ MISS Example

# 4.13 Data Integrity, Bcache Errors, and Command/Address Errors

This section describes the mechanisms that protect the integrity of data received by the 21164 from the Bcache, the system, or both. Tag data and tag control errors are described. Command/address bus parity protection is also described.

### 4.13.1 Data ECC and Parity

The 21164 supports INT8 error correction code (ECC) for the external Bcache and memory system. ECC is generated by the CPU for each INT8 that is written into the Bcache. FILL data from the Bcache to the system is not checked for errors. The receiving node detects any ECC errors.

Uncorrected data from the Bcache or system is sent to the Dcache and register files. If a correctable error (single-bit error) is detected, the machine traps and the fill is replayed with corrected data. Double-bit errors are detected.

If the system indicates that the data should not be checked, then no checking or correcting is performed.

Each data bus cycle delivers one INT16 worth of data. ECC is calculated as ECC(data<063:000>) and ECC(data<127:064>).

Figure 4–43 shows the code. Two IDT49C460 or AMD29C660 chips can be cascaded to produce this ECC code. A single IDT49C466 chip also supports this ECC code. The code provides single-bit correct, double-bit detect, and all 1s and all 0s detect.

If the 21164 is in parity mode, it generates byte parity and places it on data\_check\_h<15:0> for write operations. Parity is checked for read operations. Parity for data\_h<7:0> is driven on signal data\_check\_h<0> and so on.

Figure 4–43 ECC Code (check-bit matrix for data bits 0 through 63 and check bits CB0 through CB7)

CB2 and CB3 are calculated for ODD parity (an odd number of 1s counting the CB). CB0, CB1, CB4, CB5, CB6, and CB7 are calculated for EVEN parity (an even number of 1s counting the CB).

The correspondence of data check bits to CBn is shown in Table 4–19.

Table 4–19 Data Check Bit Correspondence to CBn

| CBn | data_check_h, Upper 64 bits | data_check_h, Lower 64 bits |
|---|---|---|
| CB0 | <8> | <0> |
| CB1 | <9> | <1> |
| CB2 | <10> | <2> |
| CB3 | <11> | <3> |
| CB4 | <12> | <4> |
| CB5 | <13> | <5> |
| CB6 | <14> | <6> |
| CB7 | <15> | <7> |

For x4 RAMs, the following bit arrangement detects nibble errors:

| CB0 | CB1 | CB5 | CB6 |
|-----|-----|-----|-----|
| CB2 | D0 | D4 | D5 |
| CB3 | CB4 | D7 | D8 |
| CB7 | D2 | D3 | D11 |
| D1 | D6 | D10 | D13 |
| D9 | D14 | D18 | D21 |
| D12 | D16 | D17 | D22 |
| D15 | D19 | D20 | D23 |
| D24 | D25 | D27 | D30 |
| D26 | D28 | D29 | D31 |
| D32 | D34 | D35 | D37 |
| D33 | D36 | D38 | D40 |
| D39 | D41 | D43 | D46 |
| D42 | D44 | D45 | D47 |
| D48 | D50 | D51 | D53 |
| D49 | D52 | D54 | D56 |
| D55 | D57 | D59 | D62 |
| D58 | D60 | D61 | D63 |

#### 4.13.2 Force Correction

Setting BC\_CTL<4>, [CORR\_FILL\_DAT], forces the 21164 to route fill data from the Bcache or
memory through the error correction logic before it is driven to the Scache or Dcache. If the error is correctable, it is transparent to the 21164.

#### 4.13.3 Bcache Tag Data Parity

The signal line **tag\_data\_par\_h** is used to maintain parity over **tag\_data\_h<38:20>**. A Bcache tag data parity error is usually not recoverable. A Bcache hit is determined based on the tag alone, not the tag parity bit. The Cbox records the Bcache probe address and the tag value read from the Bcache. A tag data parity error causes a trap to privileged architecture library code (PALcode), which handles the error condition.

#### 4.13.4 Bcache Tag Control Parity

The signal line tag\_ctl\_par\_h is used to maintain parity over tag\_shared\_h, tag\_valid\_h, and tag\_dirty\_h. A Bcache tag control parity error is usually not recoverable. A Bcache victim is processed according to the tag control status alone, not the tag control parity bit. The Cbox records the Bcache probe address and the tag control value read from the Bcache. A tag control parity error causes a trap to PALcode, which handles the error condition.

#### 4.13.5 Address and Command Parity

The signal line addr\_cmd\_par\_h is used to maintain odd parity over addr\_h<39:04> and cmd\_h<3:0>.

#### 4.13.6 Fill Error

The signal fill\_error\_h is asserted by the system to notify the 21164 that a fill error has occurred. In systems in which a fill error timeout is not expected, such as a small system with fixed access time, the 21164 internal Ibox timeout logic would likely detect a stall if the system fails to complete a fill transaction. Systems in which a fill error timeout could occur should contain logic to detect fill timeouts and cleanly terminate the transaction with the 21164. To properly terminate a fill in an error case, the fill\_error\_h line is asserted for one cycle and the normal fill sequence involving lines fill\_h, fill\_id\_h, and dack\_h is generated by the system.
Asserting fill\_error\_h forces a trap to the PALcode at the MCHK entry point but has no other effect.

#### 4.13.7 Forcing 21164 Reset

Assertion of cfail\_h in a sysclk cycle in which cack\_h is not asserted causes the 21164 to execute a partial internal reset and then trap to the MCHK entry point in PALcode. This mechanism is used by the 21164 to restore itself and the system to a consistent state after a command or address parity error or a timeout error.

### 4.14 Interrupts

The 21164 has seven interrupt signals that have different uses during initialization and normal operation. Figure 4–44 shows the 21164 interrupt signals.

Figure 4–44 Alpha 21164 Interrupt Signals

#### 4.14.1 Interrupt Signals During Initialization

The 21164 interrupt signals work in tandem with the sys\_reset\_l signal to set the values for many of the user-selectable clock ratios, clock delays, and interface timing parameters. During initialization, the 21164 reads system clock configuration parameters from the interrupt pins. Section 4.2.2 and Section 4.2.3 describe how the interrupt signals are used to set system clock values when the system is initialized.

## 4.14.2 Interrupt Signals During Normal Operation

During normal operation, interrupt signals indicate interrupt requests from external devices such as the real-time clock and I/O controllers.

### 4.14.3 Interrupt Priority Level

Table 4–20 shows which interrupts are enabled for a given interrupt priority level (IPL). An interrupt is enabled if the current IPL is less than the target IPL of the interrupt.
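The enabling rule above can be sketched as a simple comparison. The following Python fragment is an illustration only, not a description of the 21164 hardware: the dictionary keys are informal labels chosen here, the target IPL values are the ones listed in Table 4–20, and the halt interrupt is left out because the table describes it as masked only by executing in PALmode rather than by IPL.

```python
# Target IPLs as listed in Table 4-20. Keys are illustrative labels
# (pin names where the table gives one), not register or API names.
TARGET_IPL = {
    "ast_pending": 2,
    "irq_h<0>": 20,
    "irq_h<1>": 21,
    "irq_h<2>": 22,
    "irq_h<3>": 23,
    "performance_counter": 29,
    "pwr_fail_irq_h": 30,
    "sys_mch_chk_irq_h": 31,
}
# Software interrupt requests 1 through 15 target IPLs 1 through 15.
TARGET_IPL.update({f"software_{n}": n for n in range(1, 16)})


def is_enabled(current_ipl, target_ipl):
    """An interrupt is enabled if the current IPL is below its target IPL."""
    return current_ipl < target_ipl


def enabled_sources(current_ipl):
    """Return the labels of all sources enabled at the given current IPL."""
    return sorted(name for name, target in TARGET_IPL.items()
                  if is_enabled(current_ipl, target))


print(enabled_sources(29))  # ['pwr_fail_irq_h', 'sys_mch_chk_irq_h']
```

Note that a source whose target IPL equals the current IPL is not enabled, because the comparison is strictly less than.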
Table 4–20 Interrupt Priority Level Effect

| Interrupt Source | Target IPL<sub>10</sub> | Source |
|---|---|---|
| Software Interrupt Request 1 | 1 | Internal |
| Software Interrupt Request 2 | 2 | Internal |
| Software Interrupt Request 3 | 3 | Internal |
| Software Interrupt Request 4 | 4 | Internal |
| Software Interrupt Request 5 | 5 | Internal |
| Software Interrupt Request 6 | 6 | Internal |
| Software Interrupt Request 7 | 7 | Internal |
| Software Interrupt Request 8 | 8 | Internal |
| Software Interrupt Request 9 | 9 | Internal |
| Software Interrupt Request 10 | 10 | Internal |
| Software Interrupt Request 11 | 11 | Internal |
| Software Interrupt Request 12 | 12 | Internal |
| Software Interrupt Request 13 | 13 | Internal |
| Software Interrupt Request 14 | 14 | Internal |
| Software Interrupt Request 15 | 15 | Internal |
| Asynchronous system trap (AST) pending (for current or more privileged mode) | 2 | Internal |
| Performance counter interrupt | 29 | Internal |
| Power fail interrupt<sup>1</sup> | 30 | pwr_fail_irq_h |
| System machine check interrupt<sup>1</sup>, internally detected correctable error interrupt pending | 31 | sys_mch_chk_irq_h and internal |
| External interrupt 20<sup>1</sup> (I/O interrupt at IPL 20, corrected system error interrupt) | 20 | irq_h<0> |
| External interrupt 21<sup>1</sup> (I/O interrupt at IPL 21) | 21 | irq_h<1> |
| External interrupt 22<sup>1</sup> (I/O interrupt at IPL 22, interprocessor interrupt, timer interrupt) | 22 | irq_h<2> |
| External interrupt 23<sup>1</sup> (I/O interrupt at IPL 23) | 23 | irq_h<3> |
| Halt<sup>1</sup> | Masked only by executing in PALmode. | mch_hlt_irq_h |

<sup>1</sup>These interrupts are from external sources. In some cases, the system environment provides the logic-OR of multiple interrupt sources at the same IPL to a particular pin.

When the processor receives an interrupt request and that request is enabled, an interrupt is reported or delivered to the exception logic if the processor is not currently executing PALcode. Before vectoring to the interrupt service PAL dispatch address, the pipeline is completely drained to the point that instructions issued before entering the PALcode cannot trap (implied TRAPB).

The restart address is saved in the exception address (EXC\_ADDR) IPR and the processor enters PALmode. The cause of the interrupt can be determined by examining the state of the INTID and ISR registers.

Hardware interrupt requests are level sensitive and therefore may be removed before an interrupt is serviced. PALcode must verify that the interrupt actually indicated in INTID is to be serviced at an IPL higher than the current IPL. If it is not, PALcode should ignore the spurious interrupt.

## Internal Processor Registers

This chapter describes the 21164 microprocessor internal processor registers (IPRs). It is organized as follows:

- Instruction fetch/decode unit and branch unit (Ibox) IPRs
- Memory address translation unit (Mbox) IPRs
- Cache control and bus interface unit (Cbox) IPRs
- PAL storage registers
- Restrictions

Ibox, Mbox, data cache (Dcache), and PALtemp IPRs are accessible to PALcode by means of the HW\_MTPR and HW\_MFPR instructions. Table 5–1 lists the IPR numbers for these instructions.
Cbox, second-level cache (Scache), and backup cache (Bcache) IPRs are accessible in the physical address region FF FFF0 0000 to FF FFFF FFFF. Table 5–25 summarizes the Cbox, Scache, and Bcache IPRs. Table 5–38 lists restrictions on the IPRs. | | Note | |--------------------------|----------------------------------------------| | Unless explicitly stated | IPRs are not cleared or set by hardware on | | chip or timeout reset. | 22 200 dro not cloured or set by hardware on | Table 5-1 Ibox, Mbox, Dcache, and PALtemp IPR Encodings | IPR Mnemonic | Access | Index <sub>16</sub> | ibox Slots to Pipe | |----------------|---------|---------------------|--------------------| | Ibox IPRs | | | | | ISR | R | 100 | E1 | | ITB_TAG | W | 101 | E1 | | ITB_PTE | R/W | 102 | E1 | | ITB_ASN | R/W | 103 | E1 | | ITB_PTE_TEMP | R | 104 | E1 | | ITB_IA | W | 105 | E1 | | ITB_IAP | W | 106 | E1 | | ITB_IS | W | 107 | E1 | | SIRR | R/W | 108 | E1 | | ASTRR . | R/W | 109 | El | | ASTER | R/W | 10A | E1 | | EXC_ADDR | R/W | 10B | El | | EXC_SUM | R/W0C | 10C | E1 | | EXC_MASK | R | 10D | <b>E</b> 1 | | PAL_BASE | R/W | 10 <b>E</b> | <b>E</b> 1 | | PS | R/W | 10F | <b>E</b> 1 | | IPL | R/W | 110 | <b>E</b> 1 | | INTID | R | 1111 | E1 | | IFAULT_VA_FORM | ${f R}$ | 112 | E1 | | IVPTBR | R/W | 113 | E1 | | HWINT_CLR | W | 115 | <b>E</b> 1 | | SL_XMIT | W | 116 | E1 | | SL_RCV | R | 117 | E1 | | ICSR | R/W | 118 | E1 | | IC_FLUSH_CTL | W | 119 | E1 | | ICPERR_STAT | R/W1C | 11A | E1 | | PMCTR | R/W | 11C | E1 | (continued on next page) Table 5-1 (Cont.) 
Ibox, Mbox, Dcache, and PALtemp IPR Encodings

| IPR Mnemonic | Access | Index<sub>16</sub> | Ibox Slots to Pipe |
|---|---|---|---|
| PALtemp IPRs | | | |
| PALtemp0 | R/W | 140 | E1 |
| PALtemp1 | R/W | 141 | E1 |
| PALtemp2 | R/W | 142 | E1 |
| PALtemp3 | R/W | 143 | E1 |
| PALtemp4 | R/W | 144 | E1 |
| PALtemp5 | R/W | 145 | E1 |
| PALtemp6 | R/W | 146 | E1 |
| PALtemp7 | R/W | 147 | E1 |
| PALtemp8 | R/W | 148 | E1 |
| PALtemp9 | R/W | 149 | E1 |
| PALtemp10 | R/W | 14A | E1 |
| PALtemp11 | R/W | 14B | E1 |
| PALtemp12 | R/W | 14C | E1 |
| PALtemp13 | R/W | 14D | E1 |
| PALtemp14 | R/W | 14E | E1 |
| PALtemp15 | R/W | 14F | E1 |
| PALtemp16 | R/W | 150 | E1 |
| PALtemp17 | R/W | 151 | E1 |
| PALtemp18 | R/W | 152 | E1 |
| PALtemp19 | R/W | 153 | E1 |
| PALtemp20 | R/W | 154 | E1 |
| PALtemp21 | R/W | 155 | E1 |
| PALtemp22 | R/W | 156 | E1 |
| PALtemp23 | R/W | 157 | E1 |
| Mbox IPRs | | | |
| DTB_ASN | W | 200 | E0 |
| DTB_CM | W | 201 | E0 |

Table 5–1 (Cont.)
Ibox, Mbox, Dcache, and PALtemp IPR Encodings

| IPR Mnemonic | Access | Index<sub>16</sub> | Ibox Slots to Pipe |
|---|---|---|---|
| DTB_TAG | W | 202 | E0 |
| DTB_PTE | R/W | 203 | E0 |
| DTB_PTE_TEMP | R | 204 | E0 |
| MM_STAT | R | 205 | E0 |
| VA | R | 206 | E0 |
| VA_FORM | R | 207 | E0 |
| MVPTBR | W | 208 | E0 |
| DTBIAP | W | 209 | E0 |
| DTBIA | W | 20A | E0 |
| DTBIS | W | 20B | E0 |
| ALT_MODE | W | 20C | E0 |
| CC | W | 20D | E0 |
| CC_CTL | W | 20E | E0 |
| MCSR | R/W | 20F | E0 |
| DC_FLUSH | W | 210 | E0 |
| DC_PERR_STAT | R/W1C | 212 | E0 |
| DC_TEST_CTL | R/W | 213 | E0 |
| DC_TEST_TAG | R/W | 214 | E0 |
| DC_TEST_TAG_TEMP | R/W | 215 | E0 |
| DC_MODE | R/W | 216 | E0 |
| MAF_MODE | R/W | 217 | E0 |

# 5.1 Instruction Fetch/Decode Unit and Branch Unit (Ibox) IPRs

The Ibox internal processor registers (IPRs) are described in Section 5.1.1 through Section 5.1.27.

#### 5.1.1 Istream Translation Buffer Tag Register (ITB\_TAG)

ITB\_TAG is a write-only register written by hardware on an ITBMISS/IACCVIO, with the tag field of the faulting virtual address. To ensure the integrity of the instruction translation buffer (ITB), the TAG and page table entry (PTE) fields of an ITB entry are updated simultaneously by a write operation to the ITB\_PTE register. This write operation causes the contents of the ITB\_TAG register to be written into the tag field of the ITB location, which is determined by a not-last-used replacement algorithm. The PTE field is obtained from the HW\_MTPR ITB\_PTE instruction. Figure 5-1 shows the ITB\_TAG register format.

Figure 5-1 Istream Translation Buffer Tag Register (ITB\_TAG)

#### 5.1.2 Instruction Translation Buffer Page Table Entry (ITB\_PTE) Register

ITB\_PTE is a read/write register.
#### **Write Format**

A write operation to this register writes both the PTE and TAG fields of an ITB location determined by a not-last-used replacement algorithm. The TAG and PTE fields are updated simultaneously to ensure the integrity of the ITB. A write operation to the ITB\_PTE register increments the not-last-used (NLU) pointer, which allows for writing the entire set of ITB PTE and TAG entries. If the HW\_MTPR ITB\_PTE instruction falls in the shadow of a trapping instruction, the NLU pointer may be incremented multiple times. The TAG field of the ITB location is determined by the contents of the ITB\_TAG register. The PTE field is provided by the HW\_MTPR ITB\_PTE instruction. Write operations to this register use the memory format bits as described in the *Alpha Architecture Reference Manual*. Figure 5-2 shows the ITB\_PTE register write format.

Figure 5-2 Instruction Translation Buffer Page Table Entry (ITB\_PTE) Register Write Format

#### **Read Format**

A read of the ITB\_PTE requires two instructions. A read of the ITB\_PTE register returns the PTE pointed to by the NLU pointer to the ITB\_PTE\_TEMP register and increments the NLU pointer. If the HW\_MFPR ITB\_PTE instruction falls in the shadow of a trapping instruction, the NLU pointer may be incremented multiple times. A zero value is returned to the integer register file. A subsequent read of the ITB\_PTE\_TEMP register returns the PTE to the general purpose integer register file (IRF). Figure 5–3 shows the ITB\_PTE register read format.

Figure 5–3 Instruction Translation Buffer Page Table Entry (ITB\_PTE) Register Read Format

#### 5.1.3 Instruction Translation Buffer Address Space Number (ITB\_ASN) Register

ITB\_ASN is a read/write register that contains the address space number (ASN) of the current process. Figure 5-4 shows the ITB\_ASN register format.
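The ITB\_PTE write sequence and the two-instruction read sequence of Section 5.1.2 can be illustrated with a small software model. The following Python sketch is a toy model under stated assumptions, not a description of the hardware: the class name `ItbModel`, the four-entry size, and the use of a simple wrapping pointer in place of the real not-last-used algorithm are all hypothetical choices made for illustration.

```python
class ItbModel:
    """Toy model of the ITB_TAG / ITB_PTE / ITB_PTE_TEMP access sequence.

    Assumptions (not from the manual): the entry count, and a simple
    wrapping pointer standing in for the not-last-used (NLU) algorithm.
    """

    def __init__(self, entries=4):
        self.tags = [0] * entries
        self.ptes = [0] * entries
        self.nlu = 0        # stand-in for the NLU pointer
        self.itb_tag = 0    # ITB_TAG register (written by hardware on a miss)
        self.pte_temp = 0   # ITB_PTE_TEMP holding register

    def _advance(self):
        self.nlu = (self.nlu + 1) % len(self.ptes)

    def load_itb_tag(self, tag):
        # Models the hardware write of ITB_TAG on an ITBMISS/IACCVIO.
        self.itb_tag = tag

    def write_itb_pte(self, pte):
        # TAG and PTE fields are updated together, then the pointer advances.
        self.tags[self.nlu] = self.itb_tag
        self.ptes[self.nlu] = pte
        self._advance()

    def read_itb_pte(self):
        # First instruction: latch the PTE into ITB_PTE_TEMP and return zero.
        self.pte_temp = self.ptes[self.nlu]
        self._advance()
        return 0

    def read_itb_pte_temp(self):
        # Second instruction: return the latched PTE.
        return self.pte_temp


def dump_ptes(itb):
    """Read every PTE using the two-instruction sequence."""
    out = []
    for _ in range(len(itb.ptes)):
        itb.read_itb_pte()
        out.append(itb.read_itb_pte_temp())
    return out


itb = ItbModel(entries=4)
for i in range(4):
    itb.load_itb_tag(0x100 + i)
    itb.write_itb_pte(0xA0 + i)
ptes = dump_ptes(itb)
```

Because each write and each first-step read advances the pointer, filling all entries and then issuing one read pair per entry walks the whole buffer. The model deliberately ignores the trap-shadow case in which the pointer may advance more than once.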
Figure 5-4 Instruction Translation Buffer Address Space Number (ITB\_ASN) Register

#### 5.1.4 Instruction Translation Buffer Page Table Entry Temporary (ITB\_PTE\_TEMP) Register

ITB\_PTE\_TEMP is a read-only holding register for ITB\_PTE read data. A read of the ITB\_PTE register returns data to this register. A subsequent read of the ITB\_PTE\_TEMP register returns data to the general purpose integer register file (IRF). Figure 5–3 shows the ITB\_PTE register format. Table 5-2 shows the GHD settings for the ITB\_PTE\_TEMP register.

Table 5–2 Granularity Hint Bits in ITB\_PTE\_TEMP Read Format

| Name | Extent | Type | Description |
|---|---|---|---|
| GHD | <29> | RO | Set if granularity hint equals 01, 10, or 11. |
| GHD | <30> | RO | Set if granularity hint equals 10 or 11. |
| GHD | <31> | RO | Set if granularity hint equals 11. |

#### 5.1.5 Instruction Translation Buffer Invalidate All Process (ITB\_IAP) Register

ITB\_IAP is a write-only register. Any write operation to this register invalidates all ITB entries that have an address space match (ASM) bit that equals zero.

#### 5.1.6 Instruction Translation Buffer Invalidate All (ITB\_IA) Register

ITB\_IA is a write-only register. A write operation to this register invalidates all ITB entries, and resets the ITB not-last-used (NLU) pointer to its initial state. RESET PALcode must execute an HW\_MTPR ITB\_IA instruction in order to initialize the NLU pointer.

#### 5.1.7 Instruction Translation Buffer IS (ITB\_IS) Register

ITB\_IS is a write-only register. Writing a virtual address to this register invalidates the ITB entry that meets either of the following criteria:

- An ITB entry whose virtual address (VA) field matches ITB\_IS<42:13> and whose ASN field matches ITB\_ASN<10:04>.
- An ITB entry whose VA field matches ITB\_IS<42:13> and whose ASM bit is set.

Figure 5-5 shows the ITB\_IS register format.
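Returning to Table 5–2: the three GHD bits amount to a thermometer code of the 2-bit granularity hint. The Python sketch below shows that encoding and its inverse. The function names are hypothetical, and only bits <31:29> of the ITB\_PTE\_TEMP read format are modeled.

```python
GHD_SHIFT = 29  # GHD occupies bits <31:29> of the ITB_PTE_TEMP read format


def ghd_bits(hint):
    """Return the GHD<31:29> bit pattern for a 2-bit granularity hint."""
    if not 0 <= hint <= 3:
        raise ValueError("granularity hint is a 2-bit field")
    # <29> is set for hint >= 1, <30> for hint >= 2, <31> for hint == 3.
    return ((1 << hint) - 1) << GHD_SHIFT


def granularity_hint(itb_pte_temp):
    """Recover the hint by counting how many GHD bits are set."""
    ghd = (itb_pte_temp >> GHD_SHIFT) & 0b111
    return bin(ghd).count("1")
```

For example, a hint of binary 10 produces bits <29> and <30> set with bit <31> clear, matching the second and first rows of the table.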
Figure 5-5 Instruction Translation Buffer IS (ITB\_IS) Register

#### 5.1.8 Formatted Faulting Virtual Address (IFAULT\_VA\_FORM) Register

IFAULT\_VA\_FORM is a read-only register containing the formatted faulting virtual address on an ITBMISS/IACCVIO (except on IACCVIOs generated by sign-check errors). The formatted faulting address generated depends on whether NT superpage mapping is enabled through ICSR bit SPE<0>. Figure 5-6 shows the IFAULT\_VA\_FORM register format in non-NT mode.

Figure 5–6 Formatted Faulting Virtual Address (IFAULT\_VA\_FORM) Register (NT\_Mode=0)

Figure 5-7 shows the IFAULT\_VA\_FORM register format in NT mode.

Figure 5–7 Formatted Faulting Virtual Address (IFAULT\_VA\_FORM) Register (NT\_Mode=1)

#### 5.1.9 Virtual Page Table Base Register (IVPTBR)

IVPTBR is a read/write register. Bits <32:30> are UNDEFINED on a read of this register in non-NT mode. Figure 5-8 shows the IVPTBR format in non-NT mode.

Figure 5-8 Virtual Page Table Base Register (IVPTBR) (NT\_Mode=0)

Figure 5-9 shows the IVPTBR format in NT mode.

Figure 5–9 Virtual Page Table Base Register (IVPTBR) (NT\_Mode=1)

#### 5.1.10 Icache Parity Error Status (ICPERR\_STAT) Register

ICPERR\_STAT is a read/write register. The Icache parity error status bits may be cleared by writing a 1 to the appropriate bits. Figure 5–10 and Table 5–3 describe the ICPERR\_STAT register format.

Figure 5-10 Icache Parity Error Status (ICPERR\_STAT) Register

Table 5-3 Icache Parity Error Status Register Fields

| Name | Extent | Type | Description |
|---|---|---|---|
| DPE | <11> | W1C | Data parity error |
| TPE | <12> | W1C | Tag parity error |
| TMR | <13> | W1C | Timeout reset error or **cfail_h**/no **cack_h** error |

#### 5.1.11 Icache Flush Control (IC\_FLUSH\_CTL) Register

IC\_FLUSH\_CTL is a write-only register. Writing any value to this register flushes the entire Icache.
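As an illustration of the write-one-to-clear (W1C) fields of ICPERR\_STAT in Table 5–3, the following Python sketch decodes a register value into field names and builds the value that would clear selected bits. The helper names are hypothetical; the bit positions are taken from the table, and the sketch only computes values rather than accessing any register.

```python
# Bit positions from Table 5-3.
ICPERR_STAT_FIELDS = {"DPE": 11, "TPE": 12, "TMR": 13}


def decode_icperr_stat(value):
    """Return the names of the error bits set in an ICPERR_STAT value."""
    return [name for name, bit in ICPERR_STAT_FIELDS.items()
            if (value >> bit) & 1]


def icperr_clear_mask(names):
    """Build the value to write so that only the named W1C bits are cleared."""
    mask = 0
    for name in names:
        mask |= 1 << ICPERR_STAT_FIELDS[name]
    return mask


errors = decode_icperr_stat(0x1800)   # bits <11> and <12> set
clear_value = icperr_clear_mask(errors)
```

Writing back the same bits that were read clears exactly the errors that were observed, which is the usual way to handle W1C status fields.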
#### 5.1.12 Exception Address (EXC\_ADDR) Register

EXC\_ADDR is a read/write register used to restart the system after exceptions or interrupts. The HW\_REI instruction causes a return to the instruction pointed to by the EXC\_ADDR register. This register can be written both by hardware and software. Hardware write operations occur as a result of exceptions/interrupts and CALL\_PAL instructions. Hardware write operations that occur as a result of exceptions/interrupts take precedence over all other write operations.

In case of an exception/interrupt, hardware writes a program counter (PC) to this register. In case of precise exceptions, this is the PC value of the instruction that caused the exception. In case of imprecise exceptions/interrupts, this is the PC value of the next instruction that would have issued if the exception/interrupt had not been reported.

In case of a CALL\_PAL instruction, the PC value of the next instruction after the CALL\_PAL is written to EXC\_ADDR.

Bit <00> of this register is used to indicate PALmode. On an HW\_REI instruction, the mode of the system is determined by bit <00> of EXC\_ADDR. Figure 5-11 shows the EXC\_ADDR register format.

Figure 5-11 Exception Address (EXC\_ADDR) Register

#### 5.1.13 Exception Summary (EXC\_SUM) Register

EXC\_SUM is a read/write register that records the different arithmetic traps that occur between EXC\_SUM write operations. Any write operation to this register clears bits <16:10>. Figure 5–12 and Table 5–4 describe the EXC\_SUM register format.
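The role of EXC\_ADDR bit <00> described in Section 5.1.12 can be sketched as a pair of helper functions. This Python example is illustrative only: the function names are hypothetical, and treating the low two bits as non-address bits is an assumption based on Alpha instructions being 4-byte aligned (the register figure is not reproduced here).

```python
PALMODE_BIT = 0x1   # EXC_ADDR<00> indicates PALmode


def split_exc_addr(exc_addr):
    """Split an EXC_ADDR value into (restart PC, PALmode flag)."""
    palmode = bool(exc_addr & PALMODE_BIT)
    # Assumption: the PC is 4-byte aligned, so the low two bits are masked off.
    pc = exc_addr & ~0x3
    return pc, palmode


def make_exc_addr(pc, palmode):
    """Combine an aligned PC and a PALmode flag into an EXC_ADDR value."""
    if pc & 0x3:
        raise ValueError("PC must be 4-byte aligned")
    return pc | (PALMODE_BIT if palmode else 0)
```

PALcode that wants HW\_REI to return to native mode would, under these assumptions, write a value whose bit <00> is clear.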
Figure 5-12 Exception Summary (EXC\_SUM) Register

Table 5-4 Exception Summary Register Fields

| Name | Extent | Type | Description |
|---|---|---|---|
| SWC | <10> | WA | Indicates software completion possible. This bit is set after a floating-point instruction containing the /S modifier completes with an arithmetic trap and if all previous floating-point instructions that trapped since the last HW_MTPR EXC_SUM instruction also contained the /S modifier.<br><br>The SWC bit is cleared whenever a floating-point instruction without the /S modifier completes with an arithmetic trap. The bit remains cleared regardless of additional arithmetic traps until the register is written by an HW_MTPR instruction. The bit is always cleared upon any HW_MTPR write operation to the EXC_SUM register. |
| INV | <11> | WA | Indicates invalid operation. |
| DZE | <12> | WA | Indicates divide by zero. |
| FOV | <13> | WA | Indicates floating-point overflow. |
| UNF | <14> | WA | Indicates floating-point underflow. |
| INE | <15> | WA | Indicates floating inexact error. |
| IOV | <16> | WA | Indicates floating-point execution unit (Fbox) convert to integer overflow or integer arithmetic overflow. |
#### 5.1.14 Exception Mask (EXC\_MASK) Register

EXC\_MASK is a read/write register that records the destinations of instructions that have caused an arithmetic trap between EXC\_MASK write operations. The destination is recorded as a single bit mask in the 64-bit IPR representing F0-F31 and I0-I31. A write operation to EXC\_SUM clears the EXC\_MASK register. Figure 5-13 shows the EXC\_MASK register format.

Figure 5-13 Exception Mask (EXC\_MASK) Register

#### 5.1.15 PAL Base Address (PAL\_BASE) Register

PAL\_BASE is a read/write register containing the base address for PALcode. The register is cleared by hardware on reset. Figure 5-14 shows the PAL\_BASE register format.

Figure 5-14 PAL Base Address (PAL\_BASE) Register

#### 5.1.16 Processor Status (PS) Register

PS is a read/write register containing the current mode bits of the architecturally defined processor status as described in the *Alpha Architecture Reference Manual*. Figure 5–15 shows the PS register format.

Figure 5-15 Processor Status (PS) Register

#### 5.1.17 Ibox Control and Status Register (ICSR)

ICSR is a read/write register containing Ibox-related control and status information. Figure 5–16 and Table 5–5 describe the ICSR format.

Figure 5-16 Ibox Control and Status Register (ICSR)

Table 5-5 Ibox Control and Status Register Fields

| Name | Extent | Type | Description |
|---|---|---|---|
| PME<1:0> | | RW,0 | Performance counter master enable bits. If both PME<1> and PME<0> are clear, all performance counters in the PMCTR IPR are disabled. If either PME<1> or PME<0> is set, the counter is enabled according to the settings of the PMCTR CTL fields. |
| IMSK<3:0> | | | If set, each IMSK<3:0> signal disables the corresponding IRQ_H<3:0> interrupt. |
| TMM | <24> | RW,0 | If set, the timeout counter counts 5 thousand cycles before asserting timeout reset. If clear, the timeout counter counts 1 billion cycles before asserting timeout reset. |
| TMD | <25> | RW,0 | If set, disables the Ibox timeout counter. Does not affect **cfail_h**/no **cack_h** error. |
| FPE | <26> | RW,0 | If set, floating-point instructions may be issued. If clear, floating-point instructions cause FEN exceptions. |
| HWE | <27> | RW,0 | If set, allows PALRES instructions to be issued in kernel mode. |
| SPE<1:0> | <29:28> | RW,0 | If SPE<1> is set, it enables superpage mapping of Istream virtual address VA<39:13> directly to physical address PA<39:13> assuming VA<42:41> = 10. Virtual address bit VA<40> is ignored in this translation. Access is allowed only in kernel mode.<br><br>If SPE<0> is set (NT mode), it enables superpage mapping of Istream virtual addresses VA<42:30> = 1FFE<sub>16</sub> directly to physical address PA<39:30> = 0<sub>16</sub>. VA<30:13> is mapped directly to PA<30:13>. Access is allowed only in kernel mode. |
| SDE | <30> | RW,0 | If set, enables PAL shadow registers. |
| CRDE | <32> | RW,0 | If set, enables correctable error interrupts. |
| SLE | <33> | RW,0 | If set, enables serial line interrupts. |
| FMS | <34> | RW,0 | If set, forces miss on Icache references. MBZ in normal operation. |
| FBT | <35> | RW,0 | If set, forces bad Icache tag parity. MBZ in normal operation. |
| FBD | <36> | RW,0 | If set, forces bad Icache data parity. MBZ in normal operation. |
| Reserved | <37> | RW,1 | Reserved to Digital. Must be one. |
| ISTA | <38> | RO | Reading this bit indicates ICACHE BIST status. If set, ICACHE BIST was successful. |
| TST | <39> | RW,0 | Writing a 1 to this bit asserts the **test_status_h<1>** signal. |

#### 5.1.18 Interrupt Priority Level (IPL) Register

IPL is a read/write register containing the value of the interrupt priority level (IPL). Whenever hardware detects an interrupt whose target IPL level is greater than the value in IPL<04:00>, an interrupt is taken. Figure 5-17 shows the IPL register format.

Figure 5-17 Interrupt Priority Level (IPL) Register

#### 5.1.19 Interrupt ID (INTID) Register

INTID is a read-only register that is written by hardware with the target interrupt priority level of the highest priority pending interrupt. The hardware recognizes an interrupt if the IPL being read is greater than the IPL given by IPL<04:00>.

Interrupt service routines may use the value of this register to determine the cause of the interrupt. PALcode, for the interrupt service, must ensure that the IPL level in INTID is greater than the IPL level specified by the IPL register. This restriction is required because a level-sensitive hardware interrupt may disappear before the interrupt service routine is entered (passive release).

The contents of INTID are not correct on a HALT interrupt because this particular interrupt does not have a target IPL at which it can be masked. When a HALT interrupt occurs, INTID indicates the next highest priority pending interrupt. PALcode for interrupt service must check the interrupt summary register (ISR) to determine if a HALT interrupt has occurred. Figure 5–18 shows the INTID register format.

Figure 5-18 Interrupt ID (INTID) Register

#### 5.1.20 Asynchronous System Trap Request Register (ASTRR)

ASTRR is a read/write register containing bits to request asynchronous system trap (AST) interrupts in each of the four processor modes (U,S,E,K).
In order to generate an AST interrupt, the corresponding enable bit in the ASTER must be set and the current processor mode given in the PS<04:03> should be equal to or higher than the mode associated with the AST request. Figure 5–19 shows the ASTRR format.

Figure 5-19 Asynchronous System Trap Request Register (ASTRR)

#### 5.1.21 Asynchronous System Trap Enable Register (ASTER)

ASTER is a read/write register containing bits to enable corresponding asynchronous system trap (AST) interrupt requests. Figure 5-20 shows the ASTER format.

Figure 5-20 Asynchronous System Trap Enable Register (ASTER)

#### 5.1.22 Software Interrupt Request Register (SIRR)

SIRR is a read/write register used to control software interrupt requests. A software interrupt at a particular IPL may be requested by setting the appropriate bit in SIRR<15:01>. Figure 5-21 and Table 5-6 describe the SIRR format.

Figure 5-21 Software Interrupt Request Register (SIRR)

Table 5-6 Software Interrupt Request Register Fields

| Name | Extent | Type | Description |
|---|---|---|---|
| SIRR<15:1> | <18:04> | RW | Request software interrupts. |

#### 5.1.23 Hardware Interrupt Clear (HWINT\_CLR) Register

HWINT\_CLR is a write-only register used to clear edge-sensitive hardware interrupt requests. Figure 5-22 and Table 5-7 describe the HWINT\_CLR register format.
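Returning to the SIRR layout in Table 5–6: because SIRR<15:1> occupies register bits <18:04>, the request bit for software IPL *n* sits at register bit *n* + 3. The Python sketch below shows that mapping in both directions. The function names are hypothetical and the code only computes register values.

```python
SIRR_SHIFT = 3  # SIRR<n> is at register bit n + 3 (SIRR<15:1> -> bits <18:04>)


def sirr_request_mask(ipls):
    """Return the SIRR value that requests software interrupts at the given IPLs."""
    mask = 0
    for ipl in ipls:
        if not 1 <= ipl <= 15:
            raise ValueError("software interrupt IPLs range from 1 to 15")
        mask |= 1 << (ipl + SIRR_SHIFT)
    return mask


def sirr_pending_ipls(value):
    """List the software IPLs whose request bits are set in a SIRR value."""
    return [ipl for ipl in range(1, 16) if (value >> (ipl + SIRR_SHIFT)) & 1]
```

For example, requesting IPL 1 sets register bit <04>, and requesting IPL 15 sets register bit <18>.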
Figure 5-22 Hardware Interrupt Clear (HWINT\_CLR) Register

Table 5-7 Hardware Interrupt Clear Register Fields

| Name | Extent | Type | Description |
|---|---|---|---|
| PC0C | <27> | W1C | Clears performance counter 0 interrupt requests |
| PC1C | <28> | W1C | Clears performance counter 1 interrupt requests |
| PC2C | <29> | W1C | Clears performance counter 2 interrupt requests |
| CRDC | <32> | W1C | Clears correctable read data interrupt requests |
| SLC | <33> | W1C | Clears serial line interrupt requests |

#### 5.1.24 Interrupt Summary Register (ISR)

ISR is a read-only register containing information about all pending hardware, software, and asynchronous system trap (AST) interrupt requests. Figure 5–23 and Table 5–8 describe the ISR format.

Figure 5-23 Interrupt Summary Register (ISR)

Table 5–8 Interrupt Summary Register Fields

| Name | Extent | Type | Description |
|---|---|---|---|
| ASTRR<3:0> and ASTER<3:0> | <03:00> | RO | Enabled AST requests 3 through 0 (U,S,E,K) at IPL 2 |
| SISR<15:1> | <18:04> | RO,0 | Software interrupt requests 15 through 1 corresponding to IPL 15 through 1 |
| ATR | <19> | RO | Set if any AST request and corresponding enable bit is set and if the processor mode is equal to or higher than the AST request mode |
| I20 | <20> | RO | External hardware interrupt at IPL 20 |
| I21 | <21> | RO | External hardware interrupt at IPL 21 |
| I22 | <22> | RO | External hardware interrupt at IPL 22 |
| I23 | <23> | RO | External hardware interrupt at IPL 23 |
| PC0 | <27> | RO | External hardware interrupt: performance counter 0 (IPL 29) |
| PC1 | <28> | RO | External hardware interrupt: performance counter 1 (IPL 29) |
| PC2 | <29> | RO | External hardware interrupt: performance counter 2 (IPL 29) |
| PFL | <30> | RO | External
hardware interrupt: power failure (IPL 30) |
| MCK | <31> | RO | External hardware interrupt: system machine check (IPL 31) |
| CRD | <32> | RO | Correctable ECC errors (IPL 31) |
| SLI | <33> | RO | Serial line interrupt |
| HLT | <34> | RO | External hardware interrupt: halt |

#### 5.1.25 Serial Line Transmit (SL\_XMIT) Register

SL\_XMIT is a write-only register used to transmit bit-serial data out of the microprocessor chip under the control of a software timing loop. The value of the TMT bit is transmitted off chip on the **srom\_clk\_h** signal. In normal operation mode (not in debug mode), the **srom\_clk\_h** signal is overloaded and serves both the serial line transmission and the Icache serial ROM interface. Figure 5–24 and Table 5–9 describe the SL\_XMIT register format.

Figure 5-24 Serial Line Transmit (SL\_XMIT) Register

Table 5-9 Serial Line Transmit Register Fields

| Name | Extent | Type | Description |
|---|---|---|---|
| TMT | <07> | WO,1 | Serial line transmit data |

#### 5.1.26 Serial Line Receive (SL\_RCV) Register

SL\_RCV is a read-only register used to receive bit-serial data under the control of a software timing loop. The RCV bit in the SL\_RCV register is functionally connected to the **srom\_data\_h** signal. A serial line interrupt is requested whenever a transition is detected on the **srom\_data\_h** signal and the SLE bit in the ICSR is set. During normal operations (not in test mode), the **srom\_data\_h** signal is overloaded and serves both the serial line reception and the Icache serial ROM (SROM) interface. Figure 5-25 and Table 5-10 describe the SL\_RCV register format.
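The serial line registers carry one data bit each: TMT at SL\_XMIT<07> and RCV at SL\_RCV<06>. The Python sketch below shows how a software timing loop might form the values it writes and extract the bit it reads. The function names are hypothetical, and the least-significant-bit-first ordering is an assumption chosen for illustration; framing (start and stop bits) and the delay between bit times are left to the software and are not modeled here.

```python
TMT_BIT = 7  # SL_XMIT<07>: serial line transmit data
RCV_BIT = 6  # SL_RCV<06>: serial line receive data


def sl_xmit_value(bit):
    """Return the SL_XMIT value that drives the given data bit."""
    return (bit & 1) << TMT_BIT


def sl_rcv_bit(value):
    """Extract the received data bit from an SL_RCV value."""
    return (value >> RCV_BIT) & 1


def byte_to_sl_xmit_writes(byte):
    """List the SL_XMIT values for one byte, one per bit time.

    Assumption: bits are sent least significant first.
    """
    return [sl_xmit_value(byte >> i) for i in range(8)]


writes = byte_to_sl_xmit_writes(0x05)
```

A real transmit routine would issue one HW\_MTPR SL\_XMIT per list element, separated by a calibrated delay.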
Figure 5-25 Serial Line Receive (SL\_RCV) Register

Table 5-10 Serial Line Receive Register Fields

| Name | Extent | Type | Description |
|---|---|---|---|
| RCV | <06> | RO | Serial line receive data |

#### 5.1.27 Performance Counter (PMCTR) Register

PMCTR is a read/write register that controls the three on-chip performance counters. Figure 5–26 and Table 5–11 describe the PMCTR format. Performance counter interrupt requests are summarized in Section 5.1.24. Cbox inputs to the counter select options are described in Table 5–31. Section 2.8 describes the performance measurement support features.

Figure 5-26 Performance Counter (PMCTR) Register

Table 5–11 Performance Counter Register Fields

| Name | Extent | Type | Description |
|---|---|---|---|
| CTR0<15:0> | <63:48> | RW | A 16-bit counter of events selected by SEL0 and enabled by CTL0<1:0>. |
| CTR1<15:0> | <47:32> | RW | A 16-bit counter. |
| SEL0 | <31> | RW | Counter0 Select: refer to Table 5–12. |
| Ku | <30> | RW | Kill user mode: disables all counters in user mode (refer to Table 5–13). |
| CTR2<13:0> | <29:16> | RW | A 14-bit counter. |
| CTL0<1:0> | <15:14> | RW,0 | CTR0 counter control:<br>00 counter disable, interrupt disable<br>01 counter enable, interrupt disable<br>10 counter enable, interrupt at count<br>11 counter enable, interrupt at count<br>(Refer to Section 5.1.23 and Section 5.1.24.) |
| CTL1<1:0> | <13:12> | RW,0 | CTR1 counter control:<br>00 counter disable, interrupt disable<br>01 counter enable, interrupt disable<br>10 counter enable, interrupt at count<br>11 counter enable, interrupt at count |
| CTL2<1:0> | <11:10> | RW,0 | CTR2 counter control:<br>00 counter disable, interrupt disable<br>01 counter enable, interrupt disable<br>10 counter enable, interrupt at count<br>11 counter enable, interrupt at count |
| Kp | <09> | RW | Kill PALmode: disables all counters in PALmode (refer to Table 5–13). |
| Kk | <08> | RW | Kill kernel, executive, supervisor mode: disables all counters in kernel, executive, and supervisor modes (refer to Table 5–13). Ku=1, Kp=1, and Kk=1 enables counters in executive and supervisor modes only. |
| SEL1<3:0> | <07:04> | RW | Counter1 Select: refer to Table 5–12. |
| SEL2<3:0> | <03:00> | RW | Counter2 Select: refer to Table 5–12. |

Table 5-12 shows the PMCTR counter select options.

Table 5-12 PMCTR Counter Select Options

| Counter0 SEL0<0> | Counter1 SEL1<3:0> | Counter2 SEL2<3:0> |
|---|---|---|
| 0:Cycles | 0x0: non-issue cycles<br>Valid instruction in S3 but none issued. | 0x0: long (>15 cycle) stalls |
| | 0x1: split-issue cycles<br>Some, but not all, instructions at S3 issued. | 0x1: reserved |
| | 0x2: pipe-dry cycles<br>No valid instruction at S3. | |
| | 0x3: replay trap<br>A replay trap occurred. | |
| | 0x4: single-issue cycles<br>Exactly one instruction issued. | |
| | 0x5: dual-issue cycles<br>Exactly two instructions issued. | |
| | 0x6: triple-issue cycles<br>Exactly three instructions issued. | |
| | 0x7: quad-issue cycles<br>Exactly four instructions issued. | |
| 1:Instructions | 0x8: jsr-ret if sel2=PC-M<br>Instruction issued if sel2 is PC-M.
| 0x2: PC-mispredicts |
| | 0x8: cond-branch if sel2=BR-M<br>Instruction issued if sel2 is BR-M | 0x3: BR-mispredicts |
| | 0x8: all flow-change instructions if sel2=! (PC-M or BR-M) | |
| | 0x9: IntOps issued | 0x4: Icache/RFB misses |
| | 0xA: FPOps issued | 0x5: ITB misses |
| | 0xB: loads issued | 0x6: Dcache LD misses |
| | 0xC: stores issued | 0x7: DTB misses |
| | 0xD: Icache issued | 0x8: LDs merged in MAF |
| | 0xE: Dcache accesses | 0x9: LDU replay traps |
| | | 0xA: WB/MAF full replay traps |
| | | 0xB: external perf_mon_h input (counts in CPU cycles, but input is sampled in sysclk cycles) |
| | 0xF: pick CBOX input 1 | 0xC: CPU cycles |
| | | 0xD: MB stall cycles |
| | | 0xE: LDxL instructions issued |
| | | 0xF: pick CBOX input 2 |

Table 5-13 Measurement Mode Control

| Measurement Mode Desired | Ku | Kp | Kk |
|---|---|---|---|
| Program | 0 | 0 | 0 |
| PAL only | 1 | 0 | 1 |
| OS only (kernel, executive, supervisor) | 1 | 1 | 0 |
| User only | 0 | 1 | 1 |
| All except PAL | 0 | 1 | 0 |
| OS + PAL (not user) | 1 | 0 | 0 |
| User + PAL (not kernel, executive, and supervisor) | 0 | 0 | 1 |
| Executive and supervisor only<sup>1</sup> | 1 | 1 | 1 |

<sup>1</sup>In this instance, Kk means kill kernel only. The combination Ku=1, Kp=1, and Kk=1 is used to focus on the executive and supervisor modes only.

Note: Both the user and the operating system can make PAL subroutine calls that put the machine in PALmode.
The "OS only," "user only," and "executive and supervisor only" modes do not measure the events during the PAL subroutine calls made by the OS or user. The "OS + PAL" and "user + PAL" modes should be used carefully. "OS + PAL" mode measures the events during the PAL calls made by the user, whereas "user + PAL" mode measures the events during the PAL calls made by the OS.

# 5.2 Memory Address Translation Unit (Mbox) IPRs

The Mbox internal processor registers (IPRs) are described in Section 5.2.1 through Section 5.2.23.

Note: Traps are factored into Mbox IPR write operations unless specified otherwise.

#### 5.2.1 Dstream Translation Buffer Address Space Number (DTB\_ASN) Register

DTB\_ASN is a write-only register that must be written with an exact duplicate of the ITB\_ASN register ASN field. Figure 5-27 shows the DTB\_ASN register format.

Figure 5–27 Dstream Translation Buffer Address Space Number (DTB\_ASN) Register

#### 5.2.2 Dstream Translation Buffer Current Mode (DTB\_CM) Register

DTB\_CM is a write-only register that must be written with an exact duplicate of the Ibox processor status (PS) register CM field. These bits indicate the current mode of the machine as described in the *Alpha Architecture Reference Manual*. Figure 5–28 shows the DTB\_CM register format.

Figure 5-28 Dstream Translation Buffer Current Mode (DTB\_CM) Register

#### 5.2.3 Dstream Translation Buffer Tag (DTB\_TAG) Register

DTB\_TAG is a write-only register that writes the DTB tag and the contents of the DTB\_PTE register to the DTB.
To ensure the integrity of the DTBs, the DTB's PTE array is updated simultaneously from the internal DTB\_PTE register when the DTB\_TAG register is written. The entry to be written is chosen at the time of the DTB\_TAG write operation by a not-last-used replacement algorithm implemented in hardware. A write operation to the DTB\_TAG register increments the translation buffer (TB) entry pointer of the DTB, which allows writing the entire set of DTB PTE and TAG entries. The TB entry pointer is initialized to entry zero and the TB valid bits are cleared on chip reset but not on timeout reset. Figure 5-29 shows the DTB\_TAG register format.

Figure 5-29 Dstream Translation Buffer Tag (DTB\_TAG) Register

#### 5.2.4 Dstream Translation Buffer Page Table Entry (DTB\_PTE) Register

DTB\_PTE is a read/write register representing the 64-entry DTB page table entries (PTEs). The entry to be written is chosen by a not-last-used replacement algorithm implemented in hardware. Write operations to DTB\_PTE use the memory format bit positions as described in the *Alpha Architecture Reference Manual* with the exception that some fields are ignored. In particular, the page frame number (PFN) valid bit is not stored in the DTB.

To ensure the integrity of the DTB, the PTE is actually written to a temporary register and not transferred to the DTB until the DTB\_TAG register is written. As a result, writing the DTB\_PTE and then reading without an intervening DTB\_TAG write operation does not return the data previously written to the DTB\_PTE register.

Read operations of the DTB\_PTE require two instructions. First, a read from the DTB\_PTE sends the PTE data to the DTB\_PTE\_TEMP register. A zero value is returned to the integer register file (IRF) on a DTB\_PTE read operation. A second instruction reading from the DTB\_PTE\_TEMP register returns the PTE entry to the register file.
Reading the DTB\_PTE register increments the TB entry pointer of the DTB, which allows reading the entire set of DTB PTE entries. Figure 5-30 shows the DTB\_PTE register format.

Note: The *Alpha Architecture Reference Manual* provides descriptions of the fields of the PTE.

Figure 5-30 Dstream Translation Buffer Page Table Entry (DTB\_PTE) Register (Write Format)

### 5.2.5 Dstream Translation Buffer Page Table Entry Temporary (DTB\_PTE\_TEMP) Register

DTB\_PTE\_TEMP is a read-only holding register used for DTB\_PTE data. Read operations of the DTB\_PTE require two instructions to return the PTE data to the register file. The first reads the DTB\_PTE register to the DTB\_PTE\_TEMP register and returns zero to the register file. The second returns the DTB\_PTE\_TEMP register to the integer register file (IRF). Figure 5-31 shows the DTB\_PTE\_TEMP register format.

Figure 5-31 Dstream Translation Buffer Page Table Entry Temporary (DTB\_PTE\_TEMP) Register

### 5.2.6 Dstream Memory Management Fault Status (MM\_STAT) Register

MM\_STAT is a read-only register that stores information on Dstream faults and Dcache parity errors. The VA, VA\_FORM, and MM\_STAT registers are locked against further updates until software reads the VA register. The MM\_STAT bits are only modified by hardware when the register is not locked and a memory management error, DTB miss, or Dcache parity error occurs. The MM\_STAT register is not unlocked or cleared on reset. Figure 5-32 and Table 5-14 describe the MM\_STAT register format.

Figure 5-32 Dstream Memory Management Fault Status (MM\_STAT) Register

Table 5-14 Dstream Memory Management Fault Status Register Fields

| Name | Extent | Type | Description |
|----------|---------|------|-------------|
| WR | <00> | RO | Set if reference that caused error was a write operation. |
| ACV | <01> | RO | Set if reference caused an access violation. Includes bad virtual address. |
| FOR | <02> | RO | Set if reference was a read operation and the PTE FOR bit was set. |
| FOW | <03> | RO | Set if reference was a write operation and the PTE FOW bit was set. |
| DTB_MISS | <04> | RO | Set if reference resulted in a DTB miss. |
| BAD_VA | <05> | RO | Set if reference had a bad virtual address. |
| RA | <10:06> | RO | RA field of the faulting instruction. |
| OPCODE | <16:11> | RO | Opcode field of the faulting instruction. |

### 5.2.7 Faulting Virtual Address (VA) Register

VA is a read-only register. When Dstream faults, DTB misses, or Dcache parity errors occur, the effective virtual address associated with the fault, miss, or error is latched in the VA register. The VA, VA\_FORM, and MM\_STAT registers are locked against further updates until software reads the VA register. The VA register is not unlocked on reset. Figure 5-33 shows the VA register format.

Figure 5-33 Faulting Virtual Address (VA) Register

### 5.2.8 Formatted Virtual Address (VA\_FORM) Register

VA\_FORM is a read-only register containing the virtual page table entry (PTE) address calculated as a function of the faulting virtual address and the virtual page table base (VA and MVPTBR registers). This is done as a performance enhancement to the Dstream TBmiss PAL flow. The virtual address is formatted as a 32-bit PTE when the NT\_Mode bit (MCSR<01>) is set (see Figure 5-34). VA\_FORM is locked on any Dstream fault, DTB miss, or Dcache parity error. The VA, VA\_FORM, and MM\_STAT registers are locked against further updates until software reads the VA register. The VA\_FORM register is not unlocked on reset.
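As a rough illustration of the formatting described above, the following Python sketch assembles a VA\_FORM value from a faulting virtual address and an MVPTBR value, using the field extents listed in Table 5-15. The function name `va_form` and its arguments are hypothetical, and the sketch assumes that any bits not covered by a field in Table 5-15 read as zero. It models the arithmetic only, not the hardware or the register locking.

```python
MASK64 = (1 << 64) - 1


def va_form(va: int, mvptbr: int, nt_mode: bool = False) -> int:
    """Model a VA_FORM value (hypothetical helper, not a hardware interface).

    Assumption: bits outside the VPTB and VA fields of Table 5-15 read as zero.
    """
    if nt_mode:
        # NT_Mode=1: VPTB occupies <63:30>, VA<31:13> occupies <21:03>.
        vptb = mvptbr & ~((1 << 30) - 1)
        page = (va >> 13) & ((1 << 19) - 1)
    else:
        # NT_Mode=0: VPTB occupies <63:33>, VA<42:13> occupies <32:03>.
        vptb = mvptbr & ~((1 << 33) - 1)
        page = (va >> 13) & ((1 << 30) - 1)
    return (vptb | (page << 3)) & MASK64
```

In this model, a PAL routine would read the result once and use it directly as the address of the PTE to load, rather than computing it from VA and MVPTBR itself.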
Figure 5-35 shows the VA\_FORM register format when MCSR<01> is clear.

Figure 5-34 Formatted Virtual Address (VA\_FORM) Register (NT\_Mode=1)

Figure 5-35 Formatted Virtual Address (VA\_FORM) Register (NT\_Mode=0)

Table 5-15 describes the VA\_FORM register fields.

Table 5-15 Formatted Virtual Address Register Fields

| Name | Extent | Type | Description |
|-----------|---------|------|-------------|
| NT_Mode=0 | | | |
| VPTB | <63:33> | RO | Virtual page table base address as stored in MVPTBR |
| VA<42:13> | <32:03> | RO | Subset of the original faulting virtual address |
| NT_Mode=1 | | | |
| VPTB | <63:30> | RO | Virtual page table base address as stored in MVPTBR |
| VA<31:13> | <21:03> | RO | Subset of the original faulting virtual address |

### 5.2.9 Mbox Virtual Page Table Base Register (MVPTBR)

MVPTBR is a write-only register containing the virtual address of the base of the page table structure. It is stored in the Mbox to be used in calculating the VA\_FORM value for the Dstream TBmiss PAL flow. Unlike the VA register, the MVPTBR is not locked against further updates when a Dstream fault, DTB miss, or Dcache parity error occurs. Figure 5-36 shows the MVPTBR format.

Figure 5-36 Mbox Virtual Page Table Base Register (MVPTBR)

### 5.2.10 Dcache Parity Error Status (DC\_PERR\_STAT) Register

DC\_PERR\_STAT is a read/write register that locks and stores Dcache parity error status. The VA, VA\_FORM, and MM\_STAT registers are locked against further updates until software reads the VA register. If a Dcache parity error is detected while the Dcache parity error status register is unlocked, the error status is loaded into DC\_PERR\_STAT<05:02>. The LOCK bit is set and the register is locked against further updates (except for the SEO bit) until software writes a 1 to clear the LOCK bit.
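The following Python sketch shows one way an error handler might decode a DC\_PERR\_STAT value and build the value to write back, using the bit positions given in Table 5-16. The function names and the dictionary layout are hypothetical; only the bit positions and the write-one-to-clear (W1C) behavior of the SEO and LOCK bits are taken from this section.

```python
# Bit positions from Table 5-16.
DC_PERR_STAT_BITS = {"SEO": 0, "LOCK": 1, "DP0": 2, "DP1": 3, "TP0": 4, "TP1": 5}


def decode_dc_perr_stat(value: int) -> dict:
    """Return each named status bit of a DC_PERR_STAT value as a bool."""
    return {name: bool((value >> bit) & 1)
            for name, bit in DC_PERR_STAT_BITS.items()}


def dc_perr_stat_clear_value(value: int) -> int:
    """Value to write back so that the W1C bits that are set get cleared.

    Only SEO and LOCK are W1C; the DP/TP bits are read-only and are
    cleared by hardware when LOCK is cleared.
    """
    w1c_mask = (1 << DC_PERR_STAT_BITS["SEO"]) | (1 << DC_PERR_STAT_BITS["LOCK"])
    return value & w1c_mask
```

For example, a value with LOCK, DP0, and TP0 set decodes to a data parity error and a tag parity error in bank 0, and the corresponding write-back value has only the LOCK bit set.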
The SEO bit is set when a Dcache parity error occurs while the Dcache parity error status register is locked. Once the SEO bit is set, it is locked against further updates until the software writes a 1 to DC\_PERR\_STAT<00> to unlock and clear the bit. The SEO bit is not set when Dcache parity errors are detected on both pipes within the same cycle. In this particular situation, the pipe0/pipe1 Dcache parity error status bits indicate the existence of a second parity error. The DC\_PERR\_STAT register is not unlocked or cleared on reset. Figure 5-37 and Table 5-16 describe the DC\_PERR\_STAT register format.

Figure 5-37 Dcache Parity Error Status (DC\_PERR\_STAT) Register

Table 5-16 Dcache Parity Error Status Register Fields

| Name | Extent | Type | Description |
|------|--------|------|-------------|
| SEO | <00> | W1C | Set if second Dcache parity error occurred in a cycle after the register was locked. The SEO bit is not set as a result of a second parity error that occurs within the same cycle as the first. |
| LOCK | <01> | W1C | Set if parity error detected in Dcache. Bits <05:02> are locked against further updates when this bit is set. Bits <05:02> are cleared when the LOCK bit is cleared. |
| DP0 | <02> | RO | Set on data parity error in Dcache bank 0. |
| DP1 | <03> | RO | Set on data parity error in Dcache bank 1. |
| TP0 | <04> | RO | Set on tag parity error in Dcache bank 0. |
| TP1 | <05> | RO | Set on tag parity error in Dcache bank 1. |

### 5.2.11 Dstream Translation Buffer Invalidate All Process (DTBIAP) Register

DTBIAP is a write-only register. Any write operation to this register invalidates all data translation buffer (DTB) entries in which the address space match (ASM) bit is equal to zero.
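The effect of a DTBIAP write, together with the DTBIA and DTBIS writes described in Sections 5.2.12 and 5.2.13, can be sketched as a small software model. The `DtbEntry` class and the function names are hypothetical. The model stores VA<42:13> as a plain page number and the ASN as an integer, and it ignores the not-last-used pointer and the trap-abort cases. It is meant only to show which entries each write selects.

```python
from dataclasses import dataclass


@dataclass
class DtbEntry:
    vpage: int          # VA<42:13> of the mapping
    asn: int            # address space number of the mapping
    asm: bool           # address space match bit
    valid: bool = True


def dtbiap(dtb):
    """DTBIAP write: invalidate every entry whose ASM bit is zero."""
    for entry in dtb:
        if not entry.asm:
            entry.valid = False


def dtbia(dtb):
    """DTBIA write: invalidate every entry."""
    for entry in dtb:
        entry.valid = False


def dtbis(dtb, va, current_asn):
    """DTBIS write: invalidate entries matching VA<42:13> whose ASN
    matches the current ASN or whose ASM bit is set."""
    vpage = (va >> 13) & ((1 << 30) - 1)
    for entry in dtb:
        if entry.vpage == vpage and (entry.asm or entry.asn == current_asn):
            entry.valid = False
```

The model makes the difference visible: DTBIAP leaves mappings marked ASM (typically shared across processes) in place, while DTBIA removes everything.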
### 5.2.12 Dstream Translation Buffer Invalidate All (DTBIA) Register

DTBIA is a write-only register. Any write operation to this register invalidates all 64 DTB entries, and resets the DTB not-last-used (NLU) pointer to its initial state.

### 5.2.13 Dstream Translation Buffer Invalidate Single (DTBIS) Register

DTBIS is a write-only register. Writing a virtual address to this register invalidates the DTB entry that meets either of the following criteria:

- A DTB entry whose VA field matches DTBIS<42:13> and whose ASN field matches DTB\_ASN<63:57>.
- A DTB entry whose VA field matches DTBIS<42:13> and whose ASM bit is set.

Figure 5-38 shows the DTBIS register format.

Figure 5-38 Dstream Translation Buffer Invalidate Single (DTBIS) Register

The DTBIS register is written before the normal Ibox trap point. The DTB invalidate single operation is aborted by the Ibox only for the following trap conditions:

- ITB miss
- PC mispredict
- When the HW\_MTPR DTBIS is executed in user mode

### 5.2.14 Mbox Control Register (MCSR)

MCSR is a read/write register that controls features and records status in the Mbox. This register is cleared on chip reset but not on timeout reset. Figure 5-39 and Table 5-17 describe the MCSR format.

Figure 5-39 Mbox Control Register (MCSR)

Table 5-17 Mbox Control Register Fields

| Name | Extent | Type | Description |
|--------------|---------|------|-------------|
| M_BIG_ENDIAN | <00> | RW,0 | Mbox Big Endian mode enable. When set, bit 2 of the physical address is inverted for all longword Dstream references. |
| SP<1:0> | <02:01> | RW,0 | Superpage mode enables. Note: Superpage access is only allowed in kernel mode. |
| | | | SP<1> enables superpage mapping when VA<42:41> = In this mode, virtual addresses VA<39:13> are mapped directly to physical addresses PA<39:13>. Virtual address bit VA<40> is ignored in this translation. |
| | | | SP<0> enables one-to-one superpage mapping of Dstream virtual addresses with VA<42:30> = 1FFE<sub>16</sub>. In this mode, virtual addresses VA<29:13> are mapped directly to physical addresses PA<29:13>, with bits <39:30> of physical address set to 0. SP<0> is the NT_Mode bit that is used to control virtual address formatting on a read operation from the VA_FORM register. |
| Reserved | <03> | RW,0 | Reserved to Digital. Must be zero (MBZ). |
| E_BIG_ENDIAN | <04> | RW,0 | Ebox Big Endian mode enable. This bit is sent to the Ebox to enable Big Endian support for the EXTxx, MSKxx and INSxx byte instructions. This bit causes the shift amount to be inverted (ones-complemented) prior to the shifter operation. |
| Reserved | <05> | RW,0 | Reserved to Digital. Must be zero (MBZ). |

### 5.2.15 Dcache Mode (DC\_MODE) Register

DC\_MODE is a read/write register that controls diagnostic and test modes in the Dcache. This register is cleared on chip reset but not on timeout reset. Figure 5-40 and Table 5-18 describe the DC\_MODE register format.
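As a hedged illustration, the sketch below checks a DC\_MODE value against the settings this section requires for normal operation (DC\_ENA = 1; DC\_FHIT, DC\_BAD\_PARITY, and DC\_PERR\_DISABLE = 0), using the bit positions in Table 5-18. The constant and function names are hypothetical; the check is a software convenience, not something the hardware performs.

```python
# Bit positions from Table 5-18.
DC_ENA = 1 << 0
DC_FHIT = 1 << 1
DC_BAD_PARITY = 1 << 2
DC_PERR_DISABLE = 1 << 3


def dc_mode_is_normal(value: int) -> bool:
    """True if a DC_MODE value has the settings required for normal operation."""
    must_be_zero = DC_FHIT | DC_BAD_PARITY | DC_PERR_DISABLE
    return bool(value & DC_ENA) and not (value & must_be_zero)
```

Because the register is cleared on chip reset, DC\_ENA is zero after reset, so this check fails until software enables the Dcache.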
Note: The following bit settings are required for normal operation:

- DC\_ENA = 1
- DC\_FHIT = 0
- DC\_BAD\_PARITY = 0
- DC\_PERR\_DISABLE = 0

Figure 5-40 Dcache Mode (DC\_MODE) Register

Table 5-18 Dcache Mode Register Fields

| Name | Extent | Type | Description |
|-----------------|--------|------|-------------|
| DC_ENA | <00> | RW,0 | Software Dcache enable. The DC_ENA bit enables the Dcache unless the Dcache has been disabled in hardware (DC_DOA is set). (The Dcache is enabled if DC_ENA=1 and DC_DOA=0). When clear, the Dcache command is not updated by ST or FILL operations, and all LD operations are forced to miss in the Dcache. Must be one (MBO) in normal operation. |
| DC_FHIT | <01> | RW,0 | Must be zero (MBZ) in normal operation. |
| DC_BAD_PARITY | <02> | RW,0 | When set, the DC_BAD_PARITY bit inverts the data parity inputs to the Dcache on integer stores. This has the effect of putting bad data parity into the Dcache on integer stores that hit in the Dcache. This bit has no effect on the tag parity written to the Dcache during FILL operations, or the data parity written to the Cbox write data buffer on integer store instructions. |
| | | | Floating-point store instructions should *not* be issued when this bit is set because it may result in bad parity being written to the Cbox write data buffer. Must be zero (MBZ) in normal operation. |
| DC_PERR_DISABLE | <03> | RW,0 | When set, the DC_PERR_DISABLE bit disables Dcache parity error reporting. When clear, this bit enables all Dcache tag and data parity errors. Parity error reporting is enabled during all other Dcache test modes unless this bit is explicitly set. Must be zero (MBZ) in normal operation. |

### 5.2.16 Miss Address File Mode (MAF\_MODE) Register

MAF\_MODE is a read/write register that controls diagnostic and test modes in the Mbox miss address file (MAF). This register is cleared on chip reset. MAF\_MODE<05> is also cleared on timeout reset.
Figure 5-41 and Table 5-19 describe the MAF\_MODE register format.

The following bit settings are required for normal operation:

- DREAD\_NOMERGE = 0
- WB\_FLUSH\_ALWAYS = 0
- WB\_NOMERGE = 0
- MAF\_ARB\_DISABLE = 0
- WB\_CNT\_DISABLE = 0

Figure 5-41 Miss Address File Mode (MAF\_MODE) Register

Table 5-19 Miss Address File Mode Register Fields

| Name | Extent | Type | Description |
|-----------------|--------|------|-------------|
| DREAD_NOMERGE | <00> | RW,0 | Miss address file (MAF) DREAD Merge Disable. When set, this bit disables all merging in the DREAD portion of the MAF. Any load instruction that is issued when DREAD_NOMERGE is set is forced to allocate a new entry. Subsequent merging to that entry is not allowed (even if DREAD_NOMERGE is cleared). Must be zero (MBZ) in normal operation. |
| WB_FLUSH_ALWAYS | <01> | RW,0 | When set, this bit forces the write buffer to flush whenever there is a valid WB entry. Must be zero (MBZ) in normal operation. |
| WB_NOMERGE | <02> | RW,0 | When set, this bit disables all merging in the write buffer. Any store instruction that is issued when WB_NOMERGE is set is forced to allocate a new entry. Subsequent merging to that entry is not allowed (even if WB_NOMERGE is cleared). Must be zero (MBZ) in normal operation. |
| IO_NMERGE | <03> | RW,0 | When set, this bit prevents loads from I/O space (address bit <39>=1) from merging in the MAF. Should be zero (SBZ) in typical operation. |
| WB_CNT_DISABLE | <04> | RW,0 | When set, this bit disables the 64-cycle WB counter in the MAF arbiter. The top entry of the WB arbitrates at low priority only when a LDx_L instruction is issued or a second WB entry is made. Must be zero (MBZ) in normal operation. |
| MAF_ARB_DISABLE | <05> | RW,0 | When set, this bit disables all DREAD and WB requests in the MAF arbiter. WB_Reissue, Replay, Iref and MB requests are not blocked from arbitrating for the Scache. This bit is cleared on both timeout and chip reset. Must be zero (MBZ) in normal operation. |
| DREAD_PENDING | <06> | R,0 | Indicates the status of the MAF DREAD file. When set, there are one or more outstanding DREAD requests in the MAF file. When clear, there are no outstanding DREAD requests. |
| WB_PENDING | <07> | R,0 | This bit indicates the status of the MAF WB file. When set, there are one or more outstanding WB requests in the MAF file. When clear, there are no outstanding WB requests. |

### 5.2.17 Dcache Flush (DC\_FLUSH) Register

DC\_FLUSH is a write-only register. A write operation to this register clears all the valid bits in both banks of the Dcache.

### 5.2.18 Alternate Mode (ALT\_MODE) Register

ALT\_MODE is a write-only register that specifies the alternate processor mode used by some HW\_LD and HW\_ST instructions. Figure 5-42 and Table 5-20 describe the ALT\_MODE register format.

Figure 5-42 Alternate Mode (ALT\_MODE) Register

Table 5-20 Alternate Mode Register Settings

| ALT_MODE<04:03> | Mode |
|-----------------|------------|
| 00 | Kernel |
| 01 | Executive |
| 10 | Supervisor |
| 11 | User |

### 5.2.19 Cycle Counter (CC) Register

CC is a read/write register. The 21164 supports it as described in the *Alpha Architecture Reference Manual*. The low half of the counter, when enabled, increments once each CPU cycle. The upper half of the CC register is the counter offset. A HW\_MTPR to the CC register writes CC<63:32>. Bits <31:00> are unchanged. CC\_CTL<32> is used to enable or disable the cycle counter. CC<31:00> is written by a HW\_MTPR instruction to the CC\_CTL register.
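The write behavior described here and in Section 5.2.20 can be sketched as a small software model. The `CycleCounter` class and its method names are hypothetical. The model assumes that the counter starts at zero, that the low half wraps modulo 2^32 without carrying into the offset half, and it ignores the three-cycle delay between enabling the counter and the first increment.

```python
MASK32 = 0xFFFF_FFFF


class CycleCounter:
    """Illustrative model of the CC register; not a hardware interface."""

    def __init__(self):
        self.value = 0        # 64-bit CC contents (initial value is an assumption)
        self.enabled = False  # the counter is disabled on chip reset

    def mtpr_cc(self, data: int) -> None:
        # HW_MTPR to CC: replaces CC<63:32>, leaves CC<31:00> unchanged.
        self.value = (data & (MASK32 << 32)) | (self.value & MASK32)

    def mtpr_cc_ctl(self, data: int) -> None:
        # HW_MTPR to CC_CTL: CC<31:04> <- data<31:04>, CC<03:00> <- 0,
        # CC<63:32> unchanged, and data<32> enables or disables counting.
        low = data & MASK32 & ~0xF
        self.value = (self.value & (MASK32 << 32)) | low
        self.enabled = bool((data >> 32) & 1)

    def tick(self, cycles: int = 1) -> None:
        # Only the low half counts; it wraps without touching the offset.
        if self.enabled:
            low = (self.value + cycles) & MASK32
            self.value = (self.value & (MASK32 << 32)) | low

    def rpcc(self) -> int:
        # RPCC returns the full 64-bit value.
        return self.value
```

Keeping the offset and the count in separate halves lets software rewrite either half without disturbing the other.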
The CC register is read by the RPCC instruction as defined in the *Alpha Architecture Reference Manual*. The RPCC instruction returns a 64-bit value. The cycle counter begins incrementing three cycles after a HW\_MTPR CC\_CTL instruction (with CC\_CTL<32> set) is issued. This means that an RPCC instruction issued four cycles after an HW\_MTPR CC\_CTL instruction that enables the counter reads a value that is one greater than the initial count.

The CC register is disabled on chip reset. Figure 5-43 shows the CC register format.

Figure 5-43 Cycle Counter (CC) Register

### 5.2.20 Cycle Counter Control (CC\_CTL) Register

CC\_CTL is a write-only register that writes the low 32 bits of the cycle counter and enables or disables the counter. Bits CC<31:04> are written with the value in CC\_CTL<31:04> on a HW\_MTPR instruction to the CC\_CTL register. Bits CC<03:00> are written with zero. Bits CC<63:32> are not changed. If CC\_CTL<32> is set, the counter is enabled; otherwise, the counter is disabled. Figure 5-44 and Table 5-21 describe the CC\_CTL register format.

Figure 5-44 Cycle Counter Control (CC\_CTL) Register

Table 5-21 Cycle Counter Control Register Fields

| Name | Extent | Type | Description |
|--------------|---------|------|-------------|
| COUNT<31:04> | <31:04> | WO | Cycle count. This value is loaded into CC<31:04>. |
| CC_ENA | <32> | WO | Cycle Counter enable. When set, this bit enables the CC register to begin incrementing 3 cycles later. An RPCC issued 4 cycles after CC_CTL<32> is written "sees" the initial count incremented by 1. |

### 5.2.21 Dcache Test Tag Control (DC\_TEST\_CTL) Register

DC\_TEST\_CTL is a read/write register used exclusively for testing and diagnostics. An address written to this register is used to index into the Dcache array when reading or writing to the DC\_TEST\_TAG register.
Figure 5-45 and Table 5-22 describe the DC\_TEST\_CTL register format. Section 5.2.22 describes how this register is used.

Figure 5-45 Dcache Test Tag Control (DC\_TEST\_CTL) Register

Table 5-22 Dcache Test Tag Control Register Fields

| Name | Extent | Type | Description |
|-------------|---------|------|-------------|
| BANK0 | <00> | RW | Dcache Bank0 enable. When set, reads from DC_TEST_TAG return the tag from Dcache bank0, writes to DC_TEST_TAG write to Dcache bank0. When clear, reads from DC_TEST_TAG return the tag from Dcache bank1. |
| BANK1 | <01> | RW | Dcache Bank1 enable. When set, writes to DC_TEST_TAG write to Dcache bank1. This bit has no effect on reads. |
| INDEX<12:3> | <12:03> | RW | Dcache tag index. This field is used on reads from and writes to the DC_TEST_TAG register to index into the Dcache tag array. |

### 5.2.22 Dcache Test Tag (DC\_TEST\_TAG) Register

DC\_TEST\_TAG is a read/write register used exclusively for testing and diagnostics. When DC\_TEST\_TAG is read, the value in the DC\_TEST\_CTL register is used to index into the Dcache. The value in the tag, tag parity, valid and data parity bits for that index are read out of the Dcache and loaded into the DC\_TEST\_TAG\_TEMP register. A zero value is returned to the integer register file (IRF). If BANK0 is set, the read operation is from Dcache bank0. Otherwise, the read operation is from Dcache bank1.

When DC\_TEST\_TAG is written, the value written to DC\_TEST\_TAG is written to the Dcache index referenced by the value in the DC\_TEST\_CTL register. The tag, tag parity, and valid bits are affected by this write operation. Data parity bits are not affected by this write operation (use DC\_MODE<02> and force hit modes). If BANK0 is set, the write operation is to Dcache bank0.
If BANK1 is set, the write operation is to Dcache bank1. If both are set, both banks are written.

Figure 5-46 and Table 5-23 describe the DC\_TEST\_TAG register format.

Figure 5-46 Dcache Test Tag (DC\_TEST\_TAG) Register

Table 5-23 Dcache Test Tag Register Fields

| Name | Extent | Type | Description |
|------------|---------|------|-------------|
| TAG_PARITY | <02> | WO | Tag parity. This bit refers to the Dcache tag parity bit that covers tag bits 38 through 13 (valid bits not covered). |
| OW0_VALID | <11> | WO | Octaword valid bit 0. This bit refers to the Dcache valid bit for the low-order octaword within a Dcache 32-byte block. |
| OW1_VALID | <12> | WO | Octaword valid bit 1. This bit refers to the Dcache valid bit for the high-order octaword within a Dcache 32-byte block. |
| TAG<38:13> | <38:13> | WO | TAG<38:13>. These bits refer to the tag field in the Dcache array. Note: Bit 39 is not stored in the array. |

### 5.2.23 Dcache Test Tag Temporary (DC\_TEST\_TAG\_TEMP) Register

DC\_TEST\_TAG\_TEMP is a read-only register used exclusively for testing and diagnostics. Reading the Dcache tag array requires a two-step read process:

1. The first read operation from DC\_TEST\_TAG reads the tag array and data parity bits and loads them into the DC\_TEST\_TAG\_TEMP register. An UNDEFINED value is returned to the integer register file (IRF).
2. The second read operation of the DC\_TEST\_TAG\_TEMP register returns the Dcache test data to the integer register file (IRF).

Figure 5-47 and Table 5-24 describe the DC\_TEST\_TAG\_TEMP register format.
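To make the diagnostic register layouts concrete, the sketch below builds a DC\_TEST\_CTL value from an index address and bank selects (Table 5-22) and unpacks a value read from DC\_TEST\_TAG\_TEMP into the fields listed in Table 5-24. The function names and the returned dictionary are hypothetical; the sketch covers only the bit packing, not the two-step read sequence itself.

```python
def dc_test_ctl(address: int, bank0: bool = False, bank1: bool = False) -> int:
    """Build a DC_TEST_CTL value: INDEX<12:3> in <12:03>, BANK1 in <01>, BANK0 in <00>."""
    index_field = address & 0x1FF8          # keep address bits <12:03>
    return index_field | (int(bank1) << 1) | int(bank0)


def decode_dc_test_tag_temp(value: int) -> dict:
    """Split a DC_TEST_TAG_TEMP value into the fields of Table 5-24."""
    def bit(n: int) -> int:
        return (value >> n) & 1

    return {
        "TAG_PARITY": bit(2),
        "DATA_PAR0": (bit(3), bit(4)),      # bank 0: (lower, upper) longword parity
        "DATA_PAR1": (bit(5), bit(6)),      # bank 1: (lower, upper) longword parity
        "OW0_VALID": bit(11),
        "OW1_VALID": bit(12),
        "TAG": (value >> 13) & ((1 << 26) - 1),   # TAG<38:13>
    }
```

A diagnostic routine would write the `dc_test_ctl` result to DC\_TEST\_CTL, perform the two reads described above, and pass the second read's result to `decode_dc_test_tag_temp`.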
Figure 5-47 Dcache Test Tag Temporary (DC\_TEST\_TAG\_TEMP) Register

Table 5-24 Dcache Test Tag Temporary Register Fields

| Name | Extent | Type | Description |
|--------------|---------|------|-------------|
| TAG_PARITY | <02> | RO | Tag parity. This bit refers to the Dcache tag parity bit that covers tag bits 38 through 13 (valid bits not covered). |
| DATA_PAR0<0> | <03> | RO | Data parity. This bit refers to the Bank 0 Dcache data parity bit that covers the lower longword of data indexed by DC_TEST_CTL<12:03>. |
| DATA_PAR0<1> | <04> | RO | Data parity. This bit refers to the Bank 0 Dcache data parity bit that covers the upper longword of data indexed by DC_TEST_CTL<12:03>. |
| DATA_PAR1<0> | <05> | RO | Data parity. This bit refers to the Bank 1 Dcache data parity bit that covers the lower longword of data indexed by DC_TEST_CTL<12:03>. |
| DATA_PAR1<1> | <06> | RO | Data parity. This bit refers to the Bank 1 Dcache data parity bit that covers the upper longword of data indexed by DC_TEST_CTL<12:03>. |
| OW0_VALID | <11> | RO | Octaword valid bit 0. This bit refers to the Dcache valid bit for the low-order octaword within a Dcache 32-byte block. |
| OW1_VALID | <12> | RO | Octaword valid bit 1. This bit refers to the Dcache valid bit for the high-order octaword within a Dcache 32-byte block. |
| TAG<38:13> | <38:13> | RO | TAG<38:13>. These bits refer to the tag field in the Dcache array. Note: Bit 39 is not stored in the array. |

## 5.3 External Interface Control (Cbox) IPRs

Table 5-25 lists specific IPRs for controlling Scache, Bcache, system configuration, and logging error information. These IPRs cannot be read or written from the system. They are placed in the 1 MB region of 21164-specific I/O address space ranging from FF FFF0 0000 to FF FFFF FFFF.
Any read or write operation to an undefined IPR in this address space produces UNDEFINED behavior. The operating system should not map any address in this region as writable in any mode.

The Cbox internal processor registers are described in Section 5.3.1 through Section 5.3.9.

Table 5-25 Cbox Internal Processor Register Descriptions

| Register | Address | Type<sup>1</sup> | Description |
|-------------|--------------|------|-------------|
| SC_CTL | FF FFF0 00A8 | RW | Controls Scache behavior. |
| SC_STAT | FF FFF0 00E8 | R | Logs Scache-related errors. |
| SC_ADDR | FF FFF0 0188 | R | Contains the address for Scache-related errors. |
| BC_CONTROL | FF FFF0 0128 | W | Controls Bcache/system interface and Bcache testing. |
| BC_CONFIG | FF FFF0 01C8 | W | Contains Bcache configuration parameters. |
| BC_TAG_ADDR | FF FFF0 0108 | R | Contains tag and control bits for FILLs from Bcache. |
| EI_STAT | FF FFF0 0168 | R | Logs Bcache/system-related errors. |
| EI_ADDR | FF FFF0 0148 | R | Contains the address for Bcache/system-related errors. |
| FILL_SYN | FF FFF0 0068 | R | Contains fill syndrome or parity bits for FILLs from Bcache or main memory. |

<sup>1</sup>BC\_CONTROL<01> must be 0 when reading any IPR in this table.

### 5.3.1 Scache Control (SC\_CTL) Register

SC\_CTL is a read/write register that controls Scache activity. Figure 5-48 and Table 5-26 describe the SC\_CTL register format. The bits in this register are initialized to the value indicated in Table 5-26 on reset, but not on timeout reset.
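The following sketch packs an SC\_CTL value from its fields and rejects set-enable combinations that Table 5-26 rules out. The function name `make_sc_ctl` and its parameters are hypothetical, and the defaults are chosen to reproduce the reset values listed in the Type column of Table 5-26 (SC\_BLK\_SIZE = 1, SC\_SET\_EN = 7, all other fields 0). Treating an all-zero SC\_SET\_EN as invalid is an assumption drawn from the statement that only one or all three sets may be enabled.

```python
def make_sc_ctl(set_en: int = 0b111, blk_size_64: bool = True,
                fhit: bool = False, flush: bool = False,
                tag_stat: int = 0, fb_dp: int = 0) -> int:
    """Pack SC_CTL fields per Table 5-26 (hypothetical helper)."""
    if not 0 <= set_en <= 0b111:
        raise ValueError("SC_SET_EN is a 3-bit field")
    enabled_sets = bin(set_en).count("1")
    if enabled_sets not in (1, 3):
        # Enabling two sets is UNPREDICTABLE; zero sets is assumed invalid.
        raise ValueError("enable exactly one set or all three sets")
    if fhit and enabled_sets != 1:
        raise ValueError("force hit mode allows only one enabled set")
    if not 0 <= tag_stat < (1 << 6):
        raise ValueError("SC_TAG_STAT is a 6-bit field")
    if not 0 <= fb_dp < (1 << 4):
        raise ValueError("SC_FB_DP is a 4-bit field")
    return (int(fhit) << 0        # SC_FHIT      <00>
            | int(flush) << 1     # SC_FLUSH     <01>
            | tag_stat << 2       # SC_TAG_STAT  <07:02>
            | fb_dp << 8          # SC_FB_DP     <11:08>
            | int(blk_size_64) << 12   # SC_BLK_SIZE  <12>
            | set_en << 13)       # SC_SET_EN    <15:13>
```

With the defaults, the function returns the value implied by the reset column; selecting a 32-byte block size with a single set enabled changes only bits <15:12>.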
Figure 5-48 Scache Control (SC\_CTL) Register LJ-03520-TI0 Table 5-26 Scache Control Register Fields | Field | Extent | Type | Description | |----------------------|---------|------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | SC_FHIT | <00> | RW,0 | When set, this bit forces cacheable load and store instructions to hit in the Scache, irrespective of the tag status bits. Noncacheable references are not forced to hit in the Scache and will be driven off-chip | | | | | In this mode, only one Scache set may be enabled. The Scache tag and data parity checking are disabled. | | | | | For store instructions, the value of the tag status and parity bits are specified by the SC_TAG_STAT<5:0> field. The tag is written with the address provided to the Scache with the store instruction. | | SC_FLUSH | <01> | RW,0 | When set, all the tag valid bits in the Scach are cleared every time SC_CTL is written. | | SC_TAG_<br>STAT<5:0> | <07:02> | RW,0 | This field is used only in the SC_FHIT mod<br>to write any combination of tag status and<br>parity bits in the Scache. The parity bit can<br>be used to write bad tag parity. The correct<br>value of tag parity is even. | | | | | The following bits must be zero for normal | operation: | Scache Tag<br>Status<5:0> | Description | | | |---------------------------|-----------------------------------------------------------------------------|--|--| | SC_TAG_<br>STAT<5:2> | Tag parity, valid,<br>shared, dirty;<br>bits 7, 6, 5, and 4<br>respectively | | | | SC_TAG_<br>STAT<1:0> | Octaword modified bits | | | Table 5-26 (Cont.) 
Scache Control Register Fields | Field | Extent | Туре | Description | |----------------|-----------------|-------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | SC_FB_DP<3:0> | <11:08> | RW,0 | Force bad parity—This field is used to write bad data parity for the selected longwords within the octaword when writing the Scache. If any one of these bits is set to one, then the corresponding longword's computed parity value is inverted when writing the Scache. For Scache write transactions, the | | | | · · | Cbox allocates two consecutive cycles to write up to two octawords based on the longword valid bits received from the Mbox. Therefore, the same longword parity control bits are used for writing both octawords. For example, SC_FB_DP<0> corresponds to LW0 and LW4. This bit field must be zero during normal operation. | | SC_BLK_SIZE | <12> | RW,1 | This bit selects the Scache and Bcache block size to be either 64 bytes or 32 bytes. The Scache and Bcache always have identical block sizes. All the Bcache and main memory FILLs or write transactions are of the selected block size. At power-up time, this bit is set and the default block size is 64 bytes. When clear, the block size is 32 bytes. This bit must be set to the desired value to reflect the correct Scache/Bcache block size before the 21164 does the first cacheable read or write transaction from Bcache or system. | | SC_SET_EN<2:0> | <b>≤15:13</b> > | <b>RW,7</b> | This field is used to enable the Scache sets. 
Only <i>one</i> or <i>all three</i> sets may be enabled at a time. Enabling any combination of <i>two</i> sets at a time results in UNPREDICTABLE behavior. | | Reserved | <18:16> | RW,0 | Reserved to Digital. Must be zero (MBZ). | ### 5.3.2 Scache Status (SC\_STAT) Register SC\_STAT is a read-only register. It is not cleared or unlocked by reset. Any PALcode read of this register unlocks SC\_ADDR and SC\_STAT and clears SC\_STAT. If an Scache tag or data parity error is detected during an Scache lookup, the SC\_STAT register is locked against further updates from subsequent transactions. Figure 5-49 and Table 5-27 describe the SC\_STAT register format. Figure 5-49 Scache Status (SC\_STAT) Register Table 5-27 Scache Status Register Fields | Field | Extent | Туре | Description | |---------------|---------|------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | SC_TPERR<2:0> | <02:00> | RO | When set, these bits indicate that an Scache tag lookup resulted in a tag parity error and identify the set that had the tag parity error. | | SC_DPERR<7:0> | <10:03> | RO | When set, these bits indicate that an Scache read transaction resulted in a data parity error and indicate which longword within the two octawords had the data parity error. These bits are loaded if any longword within two octawords read from the Scache during lookup had a data parity error. If SC_FHIT (SC_CTL<00>) is set, this field is used for loading the longword parity bits read out from the Scache. | | SC_CMD<4:0> | <15:11> | RO | This field indicates the Scache transaction that resulted in a Scache tag or data parity error. 
This field is written at the time the actual Scache error bit is written. The Scache transaction may be a DREAD, IREAD, or WRITE command from the Mbox, an Scache victim command, or the system command being serviced. Refer to Table 5-28 for field encoding. |
| SC_SCND_ERR | <16> | RO | When set, this bit indicates that an Scache transaction resulted in a parity error while the SC_TPERR or SC_DPERR bit was already set from the earlier transaction. This bit is not set for two errors in different octawords of the same transaction. |

Table 5-28 SC\_CMD Field Descriptions

| SC_CMD Source<15:14> | SC_CMD Encoding | Description |
|---|---|---|
| 1x | 110 | Set shared from system |
| | 101 | Read dirty from system |
| | 100 | Invalidate from system |
| | 001 | Scache victim |
| 00 | 001 | Scache IREAD |
| 01 | 001 | Scache DREAD |
| | 011 | Scache DWRITE |

### 5.3.3 Scache Address (SC\_ADDR) Register

SC\_ADDR is a read-only register. It is not cleared or unlocked by reset. The address is loaded into this register every time the Scache is accessed, as long as no error bit in the SC\_STAT register is set. If an Scache tag or data parity error is detected, then this register is locked, preventing further updates. This register is unlocked whenever SC\_STAT is read.

For Scache read transactions, address bits <39:04> are valid to identify the address being driven to the Scache. Address bit <04> identifies which octaword was accessed first. For each Scache lookup, there is one tag access and two data access cycles. If there is a hit, two octawords are read out in consecutive CPU cycles. A tag parity error is detected only while reading the first octaword. However, a data parity error can be detected on either of the two octawords. SC\_ADDR<39> is always zero.
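The SC\_STAT extents in Table 5-27 and the normal-mode SC\_ADDR extent can be illustrated with a short field-extraction sketch. It is not PALcode and does not show how the IPRs are read; it assumes the raw register values have already been obtained as integers, and the function names `decode_sc_stat` and `sc_addr_normal` are hypothetical names chosen for this illustration.

```python
def decode_sc_stat(value: int) -> dict:
    """Split a raw SC_STAT value into the fields of Table 5-27.

    Hypothetical helper: only the bit extents come from the table.
    """
    return {
        "sc_tperr": value & 0x7,             # <02:00> tag parity error bits
        "sc_dperr": (value >> 3) & 0xFF,     # <10:03> data parity error bits
        "sc_cmd": (value >> 11) & 0x1F,      # <15:11> raw command field (Table 5-28)
        "sc_scnd_err": (value >> 16) & 0x1,  # <16> second error bit
    }


def sc_addr_normal(value: int) -> int:
    """Keep only address bits <38:04> of a raw SC_ADDR value (normal mode)."""
    mask = ((1 << 35) - 1) << 4  # 35 bits wide, starting at bit 4
    return value & mask


if __name__ == "__main__":
    raw = (1 << 16) | (0b01001 << 11) | (0x81 << 3) | 0b010
    print(decode_sc_stat(raw))
    print(hex(sc_addr_normal(0xFFFF_FFFF_FFFF_FFFF)))
```

Because any read of SC\_STAT unlocks and clears the error state, a handler following this pattern would presumably capture SC\_ADDR first and decode both values afterward.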
If SC\_CTL<00> is set (force hit mode), SC\_ADDR is used for storing the Scache tag and status bits. For each tag in the Scache, there are unique valid, shared, and dirty bits for a 32-byte subblock, and modify bits for each octaword (16 bytes). There is a single tag and a parity bit for two consecutive 32-byte subblocks.

In force hit mode, only reads and probes load tag and status into the SC\_ADDR register. In this mode, tag and data parity checking are disabled and the SC\_ADDR and SC\_STAT registers are not locked on an error.

In force hit mode, to write the Scache and read back the same block and corresponding tag status bits, a minimum of 5-cycle spacing is required between the Scache write and read of the SC\_ADDR or SC\_STAT.

Figure 5-50 and Table 5-29 describe the SC\_ADDR register format.

Figure 5-50 Scache Address (SC\_ADDR) Register

Table 5-29 Scache Address Register Fields

| Name | Extent | Type | Description |
|---|---|---|---|
| Normal Mode | | | |
| SC_ADDR<38:04> | <38:04> | RO | Scache address. |
| Force Hit Mode | | | |
| TP | <04> | RO | Scache tag parity bit. |
| V0 | <05> | RO | Subblock0 tag valid bit. |
| S0 | <06> | RO | Subblock0 tag shared bit. |
| D0 | <07> | RO | Subblock0 tag dirty bit. |
| V1 | <08> | RO | Subblock1 tag valid bit. |
| S1 | <09> | RO | Subblock1 tag shared bit. |
| D1 | <10> | RO | Subblock1 tag dirty bit. |
| M0 | <12,11> | RO | Octawords modified for subblock0. |
| M1 | <14,13> | RO | Octawords modified for subblock1. |
| TAG<38:15> | <38:15> | RO | Scache tag. |

### 5.3.4 Bcache Control (BC\_CONTROL) Register

BC\_CONTROL is a write-only register. It is used to enable and control the external Bcache. Figure 5-51 and Table 5-30 describe the BC\_CONTROL register format. The bits in this register are initialized to the value indicated in Table 7-2 on reset, but not on timeout reset.
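As a hedged illustration of the BC\_CONTROL layout given in Table 5-30, the sketch below packs named field values into a single register image and checks the wave-pipelining rule stated for BC\_WAVE (the read repetition rate, BC\_RD\_SPD minus BC\_WAVE, must be greater than 3). The helper names `pack_bc_control` and `wave_pipelining_ok` are hypothetical; only the field names and bit extents are taken from the table, and the sketch says nothing about how PALcode actually writes the IPR.

```python
# Field name -> (lowest bit, width), taken from the extents in Table 5-30.
BC_CONTROL_FIELDS = {
    "BC_ENABLED": (0, 1),
    "ALLOC_CYC": (1, 1),
    "EI_CMD_GRP2": (2, 1),
    "EI_CMD_GRP3": (3, 1),
    "CORR_FILL_DAT": (4, 1),
    "VTM_FIRST": (5, 1),
    "EI_ECC_OR_PARITY": (6, 1),
    "BC_FHIT": (7, 1),
    "BC_TAG_STAT": (8, 5),
    "BC_BAD_DAT": (13, 2),
    "EI_DIS_ERR": (15, 1),
    "PIPE_LATCH": (16, 1),
    "BC_WAVE": (17, 2),
    "PM_MUX_SEL": (19, 6),
    "FLUSH_SC_VTM": (26, 1),
    "DIS_SYS_PAR": (28, 1),
}


def pack_bc_control(**fields: int) -> int:
    """Combine field values into one BC_CONTROL image (hypothetical helper)."""
    value = 0
    for name, field_value in fields.items():
        low, width = BC_CONTROL_FIELDS[name]
        if not 0 <= field_value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bit(s)")
        value |= field_value << low
    return value


def wave_pipelining_ok(bc_rd_spd: int, bc_wave: int) -> bool:
    """Read repetition rate (BC_RD_SPD - BC_WAVE) must be greater than 3."""
    return (bc_rd_spd - bc_wave) > 3


if __name__ == "__main__":
    image = pack_bc_control(BC_ENABLED=1, CORR_FILL_DAT=1, VTM_FIRST=1,
                            EI_ECC_OR_PARITY=1, EI_DIS_ERR=1)
    print(hex(image))
    print(wave_pipelining_ok(7, 2), wave_pipelining_ok(5, 2))
```

The two calls to `wave_pipelining_ok` mirror the examples in the BC\_WAVE description: a read speed of 7 with 2 cycles of wave pipelining gives a repetition rate of 5 and is allowed, while a read speed of 5 with 2 cycles gives 3 and is not.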
Figure 5-51 Bcache Control (BC\_CONTROL) Register

Table 5-30 Bcache Control Register Fields

| Field | Extent | Type | Description |
|---|---|---|---|
| BC_ENABLED | <00> | WO,0 | When set, the external Bcache is enabled. When clear, the Bcache is disabled. When the Bcache is disabled, the BIU does not perform external cache read or write transactions. |
| ALLOC_CYC | <01> | WO,0 | When set, the issue unit does not allocate a cycle for noncacheable fill data. When clear, the instruction issue unit allocates a cycle for returning noncacheable fill data to be written to the Dcache. In either case, a cycle is always allocated for cacheable integer fill data. Note: This bit must be clear before reading any Cbox IPR. It can be set when reading all other IPRs and noncacheable LDs. |
| EI_CMD_GRP2 | <02> | WO,0 | When set, the optional commands LOCK and SET DIRTY are driven to the 21164 external interface command pins to be acknowledged by the system interface. When clear, the SET DIRTY command is not driven to the command pins. It is UNPREDICTABLE whether the LOCK command is driven to the pins. However, the system should never CACK the LOCK command if this bit is clear. |
| EI_CMD_GRP3 | <03> | WO,0 | When set, the MB command is driven to the 21164 external interface command pins to be acknowledged by the system interface. When clear, the MB command is not driven to the command pins. |
| CORR_FILL_DAT | <04> | WO,1 | Correct fill data from Bcache or main memory, in ECC mode.
When set, fill data from Bcache or main memory first goes through error correction logic before being driven to the Scache or Dcache. If the error is correctable, it is transparent to the system. When clear, fill data from Bcache or main memory is driven directly to the Dcache before an ECC error is detected. If the error is correctable, corrected data is returned again, Dcache is invalidated, and an error trap is taken. |
| VTM_FIRST | <05> | WO,1 | This bit is set for systems without a victim buffer. On a Bcache miss, the 21164 first drives out the victimized block's address on the system address bus, followed by the read miss address and command. This bit is cleared for systems with a victim buffer. On a Bcache miss with victim, the 21164 first drives out the read miss followed by the victim address and command. |
| EI_ECC_OR_PARITY | <06> | WO,1 | When set, the 21164 generates or expects quadword ECC on the data check pins. When clear, the 21164 generates or expects even-byte parity on the data check pins. |
| BC_FHIT | <07> | WO,0 | Bcache force hit. When set, and the Bcache is enabled, all references in cached space are forced to hit in the Bcache. A FILL to the Scache is forced to be private. Software should turn off BC_CONTROL<02> to allow clean to private transitions without going to the system. For write transactions, the values of tag status and parity bits are specified by the BC_TAG_STAT field.
Bcache tag and index are the address received by the BIU. The Bcache tag RAMs are written with the address minus the Bcache index. This bit must be zero during normal operation. |
| BC_TAG_STAT<4:0> | <12:08> | WO | This bit field is used only in BC_FHIT=1 mode to write any combination of tag status and parity bits in the Bcache. The parity bit can be used to write bad tag parity. These bits are UNDEFINED on reset. This bit field must be zero during normal operation. The field encoding is as follows:<br>BC_TAG_STAT<4>: Parity for Bcache tag<br>BC_TAG_STAT<3>: Parity for Bcache tag status bits<br>BC_TAG_STAT<2>: Bcache tag valid bit<br>BC_TAG_STAT<1>: Bcache tag shared bit<br>BC_TAG_STAT<0>: Bcache tag dirty bit |
| BC_BAD_DAT | <14:13> | WO,0 | This field can be used to write bad data with correctable or uncorrectable errors in ECC mode. When bit <13> is set, data bit <0> and <64> are inverted. When bit <14> is set, data bit <1> and <65> are inverted. When the same octaword is read from the Bcache, the 21164 detects a correctable/uncorrectable ECC error on both the quadwords based on the value of bits <14:13> used to write. This bit field must be zero during normal operation. |
| EI_DIS_ERR | <15> | WO,1 | When set, this bit causes the 21164 to ignore any ECC (parity) error on fill data received from the Bcache or main memory; or Bcache tag or control parity error. It also ignores a system command/address parity error. No machine check is taken when this bit is set. |
| PIPE_LATCH | <16> | WO,0 | When set, this bit causes the 21164 to pipe the system control pins (addr_bus_req_h, cack_h, and dack_h) for one system clock. |
| BC_WAVE<1:0> | <18:17> | WO,0 | The bits in this field determine the number of cycles of wave pipelining that should be used during private read transactions of the Bcache. Wave pipelining cannot be used in 32-byte block systems. To enable wave pipelining, BC_CONFIG<07:04> should be set to the latency of the Bcache read. BC_CONTROL<18:17> should be set to the number of cycles to subtract from BC_CONFIG<07:04> to obtain the Bcache repetition rate. For example, if BC_CONFIG<07:04>=7 and BC_CONTROL<18:17>=2, it takes seven cycles for valid data to arrive at the interface pins, but a new read will start every five cycles. The read repetition rate must be greater than 3. For example, it is not permitted to set BC_CONFIG<07:04>=5 and BC_CONTROL<18:17>=2. The value of BC_CONTROL<18:17> should be added to the normal value of BC_CONFIG<14:12> to increase the time between read and write transactions. This prevents a write transaction from starting before the last data of a read transaction is received. |
| PM_MUX_SEL<5:0> | <24:19> | WO,0 | The bits in this field are used for selecting the BIU parameters to be driven to the two performance monitoring counters in the Ibox. Refer to Table 5-31 for the field encoding. |
| Reserved | <25> | WO,0 | Reserved; must be zero (MBZ). |
| FLUSH_SC_VTM | <26> | WO,0 | Flush Scache victim buffer. For systems without a Bcache, when this bit is clear, the 21164 flushes the on-chip victim buffer if it has to write back an entry from the victim buffer. When this bit is set, the 21164 writes only one entry back from the victim buffer as needed. For systems with a Bcache, this bit must always be clear. At power-up, this bit is initialized to a value of 0. |
| Reserved | <27> | WO,0 | Reserved; must be zero (MBZ). |
| DIS_SYS_PAR | <28> | WO,0 | When set, the 21164 does not check parity on the system command/address bus. However, correct parity will still be generated. |

Table 5-31 describes the PM\_MUX\_SEL fields.

Table 5-31 PM\_MUX\_SEL Register Fields

| PM_MUX_SEL<21:19> | Counter 1 |
|---|---|
| 0x0 | Scache accesses |
| 0x1 | Scache read operations |
| 0x2 | Scache write operations |
| 0x3 | Scache victims |
| 0x4 | Undefined |
| 0x5 | Bcache accesses |
| 0x6 | Bcache victims |
| 0x7 | System command requests |

| PM_MUX_SEL<24:22> | Counter 2 |
|---|---|
| 0x0 | Scache misses |
| 0x1 | Scache read misses |
| 0x2 | Scache write misses |
| 0x3 | Scache shared write operations |
| 0x4 | Scache write operations |

### 5.3.5 Bcache Configuration (BC\_CONFIG) Register

BC\_CONFIG is a write-only register used to configure the size and speed of the external Bcache array. The bits in this register are initialized to the values indicated in Table 5-32 on reset, but not on timeout reset. Figure 5-52 and Table 5-32 describe the BC\_CONFIG register format.

Figure 5-52 Bcache Configuration (BC\_CONFIG) Register

Table 5-32 Bcache Configuration Register Fields

| Field | Extent | Type | Description |
|---|---|---|---|
| BC_SIZE<2:0> | <02:00> | WO,1 | The bits in this field are used to indicate the size of the Bcache. At power-on, this field is initialized to a value representing a 1M-byte Bcache. The field encoding is as follows:<br>000: Invalid Bcache size<br>001: 1 MB<br>010: 2 MB<br>011: 4 MB<br>100: 8 MB<br>101: 16 MB<br>110: 32 MB<br>111: 64 MB |
| Reserved | <03> | WO,0 | Must be zero (MBZ). |
| BC_RD_SPD<3:0> | <07:04> | WO,4 | The bits in this field are used to indicate to the BIU the read access time of the Bcache, measured in CPU cycles, from the start of a read transaction until data is valid at the input pins. The Bcache read speed must be within 4 to 10 CPU cycles. At power-up, this field is initialized to a value of four CPU cycles. For systems without a Bcache, the read speed must be equal to the sysclk to CPU clock ratio. The Bcache read and write speeds must be within three cycles of each other (absolute value (BC_RD_SPD - BC_WR_SPD) < 4). |
| BC_WR_SPD<3:0> | <11:08> | WO,4 | The bits in this field are used to indicate to the BIU the write time of the Bcache, measured in CPU cycles. The Bcache write speed must be within 4 to 10 CPU cycles. At power-up, this field is initialized to a value of four CPU cycles. For systems without a Bcache, the write speed must be equal to the sysclk to CPU clock ratio. |
| BC_RD_WR_SPC<2:0> | <14:12> | WO,7 | The bits in this field are used to indicate to the BIU the number of CPU cycles to wait when switching from a private read to a private write Bcache transaction. For other data movement commands, such as READ DIRTY or FILL from main memory, it is up to the system to direct systemwide data movement in a way that is safe. The minimum value for this field is 1. The BIU always inserts three CPU cycles between private Bcache read and private Bcache write transactions, in addition to the number of CPU cycles specified by this field. The maximum value (BC_RD_WR_SPC+3) should not be greater than the Bcache READ speed when Bcache is enabled.
At power-up, this field is initialized to a read/write spacing of seven CPU cycles. |
| Reserved | <15> | WO,0 | Must be zero (MBZ). |
| FILL_WE_OFFSET<2:0> | <18:16> | WO,1 | Bcache write-enable pulse offset, from the sys_clk_outn_x edge, for FILL transactions from the system. This field does not affect private write transactions to Bcache. It is used during FILLs from the system when writing the Bcache to determine the number of CPU cycles to wait before shifting out the contents of the write pulse field. This field is programmed with a value in the range of one to seven CPU cycles. It must never exceed the sysclk ratio. For example, if the sysclk ratio is 3, this field must not be larger than 3. At power-up, this field is initialized to a write offset value of one CPU cycle. |
| Reserved | <19> | WO,0 | Must be zero (MBZ). |
| BC_WE_CTL<8:0> | <28:20> | WO,0 | Bcache write-enable control. This field is used to control the timing of the write-enable during a write or FILL transaction. If the bit is set, the write pulse is asserted. If the bit is clear, the write pulse is not asserted. Each bit corresponds to a CPU cycle. For private Bcache write and shared-write transactions, this field is used to assert the write pulse without any offset.
For FILLs to the Bcache, the FILL_WE_OFFSET<18:16> field determines the number of CPU cycles to wait before asserting the write pulse as programmed in this field. At power-up, all bits in this field are cleared. |
| Reserved | <63:29> | WO | Ignored. |

### 5.3.6 Bcache Tag Address (BC\_TAG\_ADDR) Register

BC\_TAG\_ADDR is a read-only register. Unless locked, the BC\_TAG\_ADDR register is loaded with the results of every Bcache tag read. When a tag or tag control parity error occurs, this register is locked against further updates. Software may read this register by using the 21164-specific I/O space address instruction. This register is unlocked whenever the EI\_STAT register is read, or the user enters BC\_FHIT mode. It is not unlocked by reset.

Unused tag bits in the TAG field of this register are always zero, based on the size of the Bcache as determined by the BC\_SIZE field of the BC\_CONFIG register.

Figure 5-53 and Table 5-33 describe the BC\_TAG\_ADDR register format.

Figure 5-53 Bcache Tag Address (BC\_TAG\_ADDR) Register

Table 5-33 Bcache Tag Address Register Fields

| Field | Extent | Type | Description |
|---|---|---|---|
| HIT | <12> | RO | If set, Bcache access resulted in a hit in the Bcache. |
| TAGCTL_P | <13> | RO | Value of the parity bit for the Bcache tag status bits. |
| TAGCTL_D | <14> | RO | Value of the Bcache TAG dirty bit. |
| TAGCTL_S | <15> | RO | Value of the Bcache TAG shared bit. |
| TAGCTL_V | <16> | RO | Value of the Bcache TAG valid bit. |
| TAG_P | <17> | RO | Value of the tag parity bit. |
| BC_TAG<38:20> | <38:20> | RO | Bcache tag bits as read from the Bcache. Unused bits are read as zero. |

### 5.3.7 External Interface Status (EI\_STAT) Register

EI\_STAT is a read-only register. Any PALcode read access of this register unlocks and clears it.
A read access of EI\_STAT also unlocks the EI\_ADDR, BC\_TAG\_ADDR, and FILL\_SYN registers subject to some restrictions. The EI\_STAT register is not unlocked or cleared by reset.

Fill data from Bcache or main memory could have correctable (c) or uncorrectable (u) errors in ECC mode. In parity mode, fill data parity errors are treated as uncorrectable hard errors. System address/command parity errors are always treated as uncorrectable hard errors irrespective of the mode. The sequence for reading, unlocking, and clearing EI\_ADDR, BC\_TAG\_ADDR, FILL\_SYN, and EI\_STAT is as follows:

1. Read EI\_ADDR, BC\_TAG\_ADDR, and FILL\_SYN in any order. This does not unlock or clear any register.
2. Read the EI\_STAT register. Reading this register unlocks the EI\_ADDR, BC\_TAG\_ADDR, and FILL\_SYN registers. EI\_STAT is also unlocked and cleared when read, subject to conditions described in Table 5-34.

Loading and locking rules for external interface registers are defined in Table 5-34.

If the first error is correctable, the registers are loaded but not locked. On the second correctable error, registers are neither loaded nor locked.

Registers are locked on the first uncorrectable error except the second hard error bit. The second hard error bit is set only for an uncorrectable error followed by an uncorrectable error. If a correctable error follows an uncorrectable error, it is not logged as a second error. Bcache tag parity errors are uncorrectable in this context.

Table 5-34 Loading and Locking Rules for External Interface Registers

| Correctable Error | Uncorrectable Error | Second Hard Error | Load Register | Lock Register | Action when EI_STAT is read |
|---|---|---|---|---|---|
| 0 | 0 | Not possible | No | No | Clears and unlocks everything. |
| 1 | 0 | Not possible | Yes | No | Clears and unlocks everything. |
| 0 | 1 | 0 | Yes | Yes | Clears and unlocks everything. |
| 1<sup>1</sup> | 1 | 0 | Yes | Yes | Clear (c) bit does not unlock. Transition to (0,1,0) state. |
| 0 | 1 | 1 | No | Already locked | Clears and unlocks everything. |
| 1<sup>1</sup> | 1 | 1 | No | Already locked | Clear (c) bit does not unlock. Transition to (0,1,1) state. |

<sup>1</sup>These are special cases. It is possible that when EI\_ADDR is read, only the correctable error bit is set and the registers are not locked. By the time EI\_STAT is read, an uncorrectable error is detected and the registers are loaded again and locked. The value of EI\_ADDR read earlier is no longer valid. Therefore, for the (1,1,x) case, when EI\_STAT is read, the correctable error bit is cleared and the registers are not unlocked or cleared. Software must reexecute the IPR read sequence. On the second read operation, error bits are in (0,1,x) state, all the related IPRs are unlocked, and EI\_STAT is cleared.

The EI\_STAT register is a read-only register used to control external interface registers. Figure 5-54 and Table 5-35 describe the EI\_STAT register format.

Figure 5-54 External Interface Status (EI\_STAT) Register

Table 5-35 EI\_STAT Register Fields

| Field | Extent | Type | Description |
|---|---|---|---|
| CHIP_ID<3:0> | <27:24> | RO | Read as "2." Future update revisions to the chip will return new unique values. |
| BC_TPERR | <28> | RO | Indicates that a Bcache read transaction encountered bad parity in the tag address RAM. |
| BC_TC_PERR | <29> | RO | Indicates that a Bcache read transaction encountered bad parity in the tag control RAM. |
| EI_ES | <30> | RO | When set, this bit indicates that the error source is fill data from main memory or a system address/command parity error.
When clear, the error source is fill data from the Bcache. This bit is only meaningful when COR_ECC_ERR, UNC_ECC_ERR, or EI_PAR_ERR is set. This bit is not defined for a Bcache tag error (BC_TPERR) or a Bcache tag control parity error (BC_TC_PERR). |
| COR_ECC_ERR | <31> | RO | Correctable ECC error. This bit indicates that fill data received from outside the CPU contained a correctable ECC error. |
| UNC_ECC_ERR | <32> | RO | Uncorrectable ECC error. This bit indicates that fill data received from outside the CPU contained an uncorrectable ECC error. In parity mode, it indicates a data parity error. |
| EI_PAR_ERR | <33> | RO | External interface command/address parity error. This bit indicates that an address and command received by the CPU has a parity error. |
| FIL_IRD | <34> | RO | This bit has meaning only when one of the ECC or parity error bits is set. It is set to indicate that the error occurred during an I-ref FILL and clear to indicate that the error occurred during a D-ref FILL. This bit is not defined for a Bcache tag error (BC_TPERR) or a Bcache tag control parity error (BC_TC_PERR). |
| SEO_HRD_ERR | <35> | RO | Second external interface hard error. This bit indicates that a FILL from Bcache or main memory, or a system address/command received by the CPU, has a hard error while one of the hard error bits in the EI_STAT register is already set. |

### 5.3.8 External Interface Address (EI\_ADDR) Register

EI\_ADDR is a read-only register that contains the physical address associated with errors reported by the EI\_STAT register. Its content is meaningful only when one of the error bits is set. A read of EI\_STAT unlocks the EI\_ADDR register. Figure 5-55 shows the EI\_ADDR register format.

Figure 5-55 External Interface Address (EI\_ADDR) Register

### 5.3.9 Fill Syndrome (FILL\_SYN) Register

FILL\_SYN is a 16-bit read-only register.
It is loaded but not locked on a correctable ECC error, so that another correctable error does not reload it. It is loaded and locked if an uncorrectable ECC error or parity error is recognized during a FILL from Bcache or main memory, as shown in Table 5-34. The FILL\_SYN register is unlocked when the EI\_STAT register is read. This register is not unlocked by reset.

If the 21164 is in ECC mode and an ECC error is recognized during a cache fill transaction, the syndrome bits associated with the bad quadword are loaded in the FILL\_SYN register. FILL\_SYN<07:00> contains the syndrome associated with the lower quadword of the octaword. FILL\_SYN<15:08> contains the syndrome associated with the higher quadword of the octaword. A syndrome value of 0 means that no errors were found in the associated quadword.

If the 21164 is in parity mode and a parity error is recognized during a cache fill transaction, the FILL\_SYN register indicates which of the bytes in the octaword has bad parity. FILL\_SYN<07:00> is set appropriately to indicate the bytes within the lower quadword that were corrupted. Likewise, FILL\_SYN<15:08> is set to indicate the corrupted bytes within the upper quadword. Figure 5-56 shows the FILL\_SYN register format.

Figure 5-56 Fill Syndrome (FILL\_SYN) Register

Table 5-36 lists the syndromes associated with correctable single-bit errors.

Table 5-36 Syndromes for Single-Bit Errors

| Data Bit | Syndrome<sub>16</sub> | Check Bit | Syndrome<sub>16</sub> |
|---|---|---|---|
| 00 | CE | 00 | 01 |
| 01 | CB | 01 | 02 |
| 02 | D3 | 02 | 04 |
| 03 | D5 | 03 | 08 |
| 04 | D6 | 04 | 10 |
| 05 | D9 | 05 | 20 |
| 06 | DA | 06 | 40 |
| 07 | DC | 07 | 80 |
| 08 | 23 | | |
| 09 | 25 | | |
| 10 | 26 | | |
| 11 | 29 | | |
| 12 | 2A | | |
| 13 | 2C | | |
| 14 | 31 | | |
| 15 | 34 | | |
| 16 | 0E | | |
| 17 | 0B | | |
| 18 | 13 | | |
| 19 | 15 | | |
| 20 | 16 | | |
| 21 | 19 | | |
| 22 | 1A | | |
| 23 | 1C | | |
| 24 | E3 | | |
| 25 | E5 | | |
| 26 | E6 | | |
| 27 | E9 | | |
| 28 | EA | | |
| 29 | EC | | |
| 30 | F1 | | |
| 31 | F4 | | |
| 32 | 4F | | |
| 33 | 4A | | |
| 34 | 52 | | |
| 35 | 54 | | |
| 36 | 57 | | |
| 37 | 58 | | |
| 38 | 5B | | |
| 39 | 5D | | |
| 40 | A2 | | |
| 41 | A4 | | |
| 42 | A7 | | |
| 43 | A8 | | |
| 44 | AB | | |
| 45 | AD | | |
| 46 | B0 | | |
| 47 | B5 | | |
| 48 | 8F | | |
| 49 | 8A | | |
| 50 | 92 | | |
| 51 | 94 | | |
| 52 | 97 | | |
| 53 | 98 | | |
| 54 | 9B | | |
| 55 | 9D | | |
| 56 | 62 | | |
| 57 | 64 | | |
| 58 | 67 | | |
| 59 | 68 | | |
| 60 | 6B | | |
| 61 | 6D | | |
| 62 | 70 | | |
| 63 | 75 | | |

## 5.4 PAL Storage Registers

The 21164 Ebox register file has eight extra registers that are called the PALshadow registers. The PALshadow registers overlay R8 through R14 and R25 when the CPU is in PALmode and ICSR<SDE> is set. Thus, PALcode can consider R8 through R14 and R25 as local scratch. PALshadow registers cannot be written in the last two cycles of a PALcode flow. The normal state of the CPU is ICSR<SDE> = ON. PALcode disables SDE for the unaligned trap and for error flows.

The Ibox holds a bank of 24 PALtemp registers. The PALtemp registers are accessed with the HW\_MTPR and HW\_MFPR instructions.
The latency from a PALtemp read operation to availability is one cycle.

## 5.5 Restrictions

The following sections list all known register access restrictions.

### 5.5.1 Cbox IPR PAL Restrictions

Table 5-37 describes the Cbox IPR PAL restrictions.

Table 5-37 Cbox IPR PAL Restrictions

| Condition | Restriction |
|---|---|
| Store to SC_CTL, BC_CONTROL, BC_CONFIG except if no bit is changed other than BC_CONTROL<ALLOC_CYC>, BC_CONTROL<PM_MUX_SEL>, or BC_CONTROL<DBG_MUX_SEL>. | Must be preceded by MB, must be followed by MB, must have no concurrent cacheable Istream references or concurrent system commands. |
| Store to BC_CONTROL that only changes bits BC_CONTROL<ALLOC_CYC>, BC_CONTROL<PM_MUX_SEL>, or BC_CONTROL<DBG_MUX_SEL>. | Must be preceded by MB and must be followed by MB. |
| Load from SC_STAT. | Unlocks SC_ADDR and SC_STAT. |
| Load from EI_STAT. | Unlocks EI_ADDR, EI_STAT, FILL_SYN, and BC_TAG_ADDR. |
| Any Cbox IPR address. | No LDx_L or STx_C. |
| Any undefined Cbox IPR address. | No store instructions. |
| Scache or Bcache in force hit mode. | No STx_C to cacheable space. |
| Clearing of SC_FHIT in SC_CTL. | Must be followed by MB, read operation of SC_STAT, then MB prior to subsequent store. |
| Clearing of BC_FHIT in BC_CONTROL. | Must be followed by MB, read operation of EI_STAT, then MB prior to subsequent store. |
| Load from any Cbox IPR. | BC_CONTROL<01> (ALLOC_CYC) must be clear. |

### 5.5.2 PAL Restrictions-Instruction Definitions

Mbox instructions are: LDx, LDQ\_U, LDx\_L, HW\_LD, STx, STQ\_U, STx\_C, HW\_ST, and FETCHx.
Virtual Mbox instructions are: LDx, LDQ\_U, LDx\_L, HW\_LD (virtual), STx, STQ\_U, STx\_C, HW\_ST (virtual), and FETCHx.

Load instructions are: LDx, LDQ\_U, LDx\_L, and HW\_LD.

Store instructions are: STx, STQ\_U, STx\_C, and HW\_ST.

Table 5-38 lists PALcode restrictions.

Table 5-38 PAL Restrictions Table

| The following in cycle 0: | Restrictions (Note: Numbers refer to cycle number): | Y if checked by PVC<sup>1</sup> |
|---------------------------|-----------------------------------------------------|---------------------------------|
| CALL_PAL entry | No HW_REI or HW_REI_STALL in cycle 0.<br>No HW_MFPR EXC_ADDR in cycle 0,1. | Y |
| PALshadow write instruction | No HW_REI or HW_REI_STALL in 0,1. | Y |
| HW_LD, lock bit set | PAL must slot to E0.<br>No other Mbox instruction in 0. | |
| HW_LD, VPTE bit set | No other virtual reference in 0. | |
| Any load instruction | No Mbox HW_MTPR or HW_MFPR in 0. No HW_MFPR MAF_MODE in 1,2 (DREAD_PENDING may not be updated). No HW_MFPR DC_PERR_STAT in 1,2. No HW_MFPR DC_TEST_TAG slotted in 0. | Y<br>Y<br>Y |
| Any store instruction | No HW_MFPR DC_PERR_STAT in 1,2. No HW_MFPR MAF_MODE in 1,2 (WB_PENDING may not be updated). | Y<br>Y |
| Any virtual Mbox instruction | No HW_MTPR DTBIS in 1. | Y |
| Any Mbox instruction or WMB, if it traps | HW_MTPR any Ibox IPR not aborted in 0,1 (except that EXC_ADDR is updated with correct faulting PC). HW_MTPR DTBIS not aborted in 0,1. | Y<br>Y |
| Any Ibox trap except PC-mispredict, ITBMISS, or OPCDEC due to user mode | HW_MTPR DTBIS not aborted in 0,1. | |
| HW_REI_STALL | Only one HW_REI_STALL in an aligned block of four instructions. | |
| HW_MTPR any undefined IPR number | Illegal in any cycle. | |
| ARITH trap entry | No HW_MFPR EXC_SUM or EXC_MASK in cycle 0,1. | Y |
| Machine check trap entry | No register file read or write access in 0,1,2,3,4,5,6,7. No HW_MFPR EXC_SUM or EXC_MASK in cycle 0,1. | Y |
| HW_MTPR any Ibox IPR (including PALtemp registers) | No HW_MFPR same IPR in cycle 1,2. No floating-point conditional branch in 0. No FEN or OPCDEC instruction in 0. | Y |
| HW_MTPR ASTRR, ASTER | No HW_MFPR INTID in 0,1,2,3,4,5.<br>No HW_REI in 0,1. | Y<br>Y |
| HW_MTPR SIRR | No HW_MFPR INTID in 0,1,2,3,4. | Y |
| HW_MTPR EXC_ADDR | No HW_REI in cycle 0,1. | Y |
| HW_MTPR IC_FLUSH_CTL | Must be followed by 44 inline PALcode instructions. | |
| HW_MTPR ICSR: HWE | No HW_REI in 0,1,2,3. | Y |
| HW_MTPR ICSR: FPE | No floating-point instructions in 0,1,2,3.<br>No HW_REI in 0,1,2. | |
| HW_MTPR ICSR: SPE, FMS | If HW_REI_STALL, then no HW_REI_STALL in 0,1. If HW_REI, then no HW_REI in 0,1,2,3,4. | Y<br>Y |
| HW_MTPR ICSR: SPE | Must flush Icache. | |
| HW_MTPR ICSR: SDE | No PALshadow read/write access in 0,1,2,3.<br>No HW_REI in 0,1,2. | Y |
| HW_MTPR ITB_ASN | Must be followed by HW_REI_STALL. No HW_REI_STALL in cycle 0,1,2,3,4. No HW_MTPR ITB_IS in 0,1,2,3. | Y<br>Y |
| HW_MTPR ITB_PTE | Must be followed by HW_REI_STALL. | |
| HW_MTPR ITB_IAP, ITB_IS, ITB_IA | Must be followed by HW_REI_STALL. | |
| HW_MTPR ITB_IS | HW_REI_STALL must be in the same Istream octaword. | |
| HW_MTPR IVPTBR | No HW_MFPR IFAULT_VA_FORM in 0,1,2. | Y |
| HW_MTPR PAL_BASE | No CALL_PAL in 0,1,2,3,4,5,6,7.<br>No HW_REI in 0,1,2,3,4,5,6. | Y<br>Y |
| HW_MTPR PS | No HW_REI in 0,1,2.<br>No private CALL_PAL in 0,1,2,3. | Y |
| HW_MTPR CC, CC_CTL | No RPCC in 0,1,2.<br>No HW_REI in 0,1. | Y<br>Y |
| HW_MTPR DC_FLUSH | No Mbox instructions in 1,2. No outstanding fills in 0. No HW_REI in 0,1. | Y<br>Y |
| HW_MTPR DC_MODE | No Mbox instructions in 1,2,3,4. No HW_MFPR DC_MODE in 1,2. No outstanding fills in 0. No HW_REI in 0,1,2,3. No HW_REI_STALL in 0,1. | Y<br>Y<br>Y<br>Y |
| HW_MTPR DC_PERR_STAT | No load or store instructions in 1. No HW_MFPR DC_PERR_STAT in 1,2. | Y<br>Y |
| HW_MTPR DC_TEST_CTL | No HW_MFPR DC_TEST_TAG in 1,2,3. No HW_MFPR DC_TEST_CTL issued or slotted in 1,2. | Y |
| HW_MTPR DC_TEST_TAG | No outstanding DC fills in 0.<br>No HW_MFPR DC_TEST_TAG in 1,2,3. | Y |
| HW_MTPR DTB_ASN | No virtual Mbox instructions in 1,2,3. No HW_REI in 0,1,2. | Y<br>Y |
| HW_MTPR DTB_CM, ALT_MODE | No virtual Mbox instructions in 1,2. No HW_REI in 0,1. | Y<br>Y |
| HW_MTPR DTB_PTE | No virtual Mbox instructions in 2. No HW_MTPR DTB_ASN, DTB_CM, ALT_MODE, MCSR, MAF_MODE, DC_MODE, DC_PERR_STAT, DC_TEST_CTL, DC_TEST_TAG in 2. | Y<br>Y |
| HW_MTPR DTB_TAG | No virtual Mbox instructions in 1,2,3. No HW_MTPR DTB_TAG in 1. No HW_MFPR DTB_PTE in 1,2. No HW_MTPR DTBIS in 1,2. No HW_REI in 0,1,2. | Y<br>Y<br>Y<br>Y |
| HW_MTPR DTBIAP, DTBIA | No virtual Mbox instructions in 1,2,3. No HW_MTPR DTBIS in 0,1,2. No HW_REI in 0,1,2. | Y<br>Y<br>Y |
| HW_MTPR DTBIA | No HW_MFPR DTB_PTE in 1. | Y |
| HW_MTPR MAF_MODE | No Mbox instructions in 1,2,3. No WMB in 1,2,3. No HW_MFPR MAF_MODE in 1,2. No HW_REI in 0,1,2. | Y<br>Y<br>Y<br>Y |
| HW_MTPR MCSR | No virtual Mbox instructions in 0,1,2,3,4. No HW_MFPR MCSR in 1,2. No HW_MFPR VA_FORM in 1,2,3. No HW_REI in 0,1,2,3. No HW_REI_STALL in 0,1. | Y<br>Y<br>Y<br>Y<br>Y |
| HW_MTPR MVPTBR | No HW_MFPR VA_FORM in 1,2. | Y |
| HW_MFPR ITB_PTE | No HW_MFPR ITB_PTE_TEMP in 1,2,3. | Y |
| HW_MFPR DC_TEST_TAG | No outstanding DC fills in 0. No HW_MFPR DC_TEST_TAG_TEMP issued or slotted in 1. No LDx instructions slotted in 0. No HW_MTPR DC_TEST_CTL between HW_MFPR DC_TEST_TAG_TEMP. | |
| HW_MFPR DTB_PTE | No Mbox instructions in 0,1. No HW_MTPR DC_TEST_CTL, DC_TEST_TAG in 0,1. No HW_MFPR DTB_PTE_TEMP issued or slotted in 1,2,3. | Y<br>Y<br>Y |
| | No HW_MFPR DTB_PTE in 1. No virtual Mbox instructions in 0,1,2. | Y |
| HW_MFPR VA | Must be done in ARITH, MACHINE CHECK, DTBMISS_SINGLE, UNALIGN, DFAULT traps and ITBMISS flow after the VPTE load. | |

<sup>1</sup>PALcode verification checker

## Privileged Architecture Library Code

This chapter describes the 21164 privileged architecture library code (PALcode). The chapter is organized as follows:

- PALcode description
- PALmode environment
- Invoking PALcode
- PALcode entry points
- Required PALcode function codes
- Alpha 21164 implementation of the architecturally reserved opcodes instructions

## 6.1 PALcode Description

Privileged architecture library code (PALcode) is macrocode that provides an architecturally defined operating-system-specific programming interface that is common across all Alpha microprocessors.
The actual implementation of PALcode differs for each operating system.

PALcode runs with privileges enabled, instruction stream mapping disabled, and interrupts disabled. PALcode has privilege to use five special opcodes that allow functions such as physical data stream references and internal processor register (IPR) manipulation.

PALcode can be invoked by the following events:

- Reset
- System hardware exceptions (MCHK, ARITH)
- Memory-management exceptions
- Interrupts
- CALL\_PAL instructions

PALcode has characteristics that make it appear to be a combination of microcode, ROM BIOS, and system service routines, though the analogy to any of these other items is not exact. PALcode exists for several major reasons:

- There are some necessary support functions that are too complex to implement directly in a processor chip's hardware, but that cannot be handled by a normal operating system software routine. Routines to fill the translation buffer (TB), acknowledge interrupts, and dispatch exceptions are some examples. In some architectures, these functions are handled by microcode, but the Alpha AXP architecture is careful not to mandate the use of microcode so as to allow reasonable chip implementations.
- There are functions that must run atomically, yet involve long sequences of instructions that may need complete access to all the underlying computer hardware. An example of this is the sequence that returns from an exception or interrupt.
- There are some instructions that are necessary for backward compatibility or ease of programming; however, these are not used often enough to dedicate them to hardware, or are so complex that they would jeopardize the overall performance of the computer. For example, an instruction that does a VAX style interlocked memory access might be familiar to someone used to programming on a CISC machine, but is not included in the Alpha AXP architecture.
  Another example is the emulation of an instruction that has no direct hardware support in a particular chip implementation.

In each of these cases, PALcode routines are used to provide the function. The routines are nothing more than programs invoked at specified times, and read in as Istream code in the same way that all other Alpha AXP code is read. Once invoked, however, PALcode runs in a special mode called PALmode.

## 6.2 PALmode Environment

PALcode runs in a special environment called PALmode, defined as follows:

- Istream memory mapping is disabled. Because the PALcode is used to implement translation buffer fill routines, Istream mapping clearly cannot be enabled. Dstream mapping is still enabled.
- The program has privileged access to all the computer hardware. Most of the functions handled by PALcode are privileged and need control of the lowest levels of the system.
- Interrupts are disabled. If a long sequence of instructions needs to be executed atomically, interrupts cannot be allowed.

An important aspect of PALcode is that it uses normal Alpha AXP instructions for most of its operations; that is, the same instruction set that nonprivileged Alpha AXP programmers use. There are a few extra instructions that are only available in PALmode, and will cause a dispatch to the OPCDEC PALcode entry point if attempted while not in PALmode. The Alpha AXP architecture allows some flexibility in what these special PALmode instructions do. In the 21164, the special PALmode-only instructions perform the following functions:

- Read or write internal processor registers (HW\_MFPR, HW\_MTPR).
- Perform memory load or store operations without invoking the normal memory-management routines (HW\_LD, HW\_ST).
- Return from an exception or interrupt (HW\_REI).

When executing in PALmode, there are certain restrictions for using the privileged instructions because PALmode gives the programmer complete access to many of the internal details of the 21164.
Refer to Section 6.6 for information on these special PALmode instructions.

#### Caution

It is possible to cause unintended side effects by writing what appears to be perfectly acceptable PALcode. As such, PALcode is not something that many users will want to change.

## 6.3 Invoking PALcode

PALcode is invoked at specific entry points, under certain well-defined conditions. These entry points provide access to a series of callable routines, with each routine indexed as an offset from a base address. The base address of the PALcode is programmable (stored in the PAL\_BASE IPR), and is normally set by the system reset code. Refer to Section 6.4 for additional information on PALcode entry points.

PC<00> is used as the PALmode flag both to the hardware and to PALcode itself. When the CPU enters a PALflow, the Ibox sets PC<00>. This bit remains set as instructions are executed in the PAL Istream. The Ibox hardware ignores this bit and behaves as if the PC were still longword aligned for the purposes of Istream fetch and execute. On HW\_REI, the new state of PALmode is copied from EXC\_ADDR<00>.

When an event occurs that needs to invoke PALcode, the 21164 first drains the pipeline. The current PC is loaded into the EXC\_ADDR IPR, and the appropriate PALcode routine is dispatched. These operations occur under direct control of the chip hardware, and the machine is now in PALmode. When the HW\_REI instruction is executed at the end of the PALcode routine, the hardware executes a jump to the address contained in the EXC\_ADDR IPR. The LSB is used to indicate PALmode to the hardware. Generally, the LSB is clear upon return from a PALcode routine, in which case the hardware loads the new PC, enables interrupts, enables memory mapping, and dispatches back to the user.

The most basic use of PALcode is to handle complex hardware events, and it is called automatically when the particular hardware event is sensed. This use of PALcode is similar to other architectures' use of microcode.
There are several major categories of hardware-initiated invocations of PALcode:

- When the 21164 is reset, it enters PALmode and executes the RESET PALcode. The system will remain in PALmode until a HW\_REI instruction is executed and EXC\_ADDR<00> is cleared. It then continues execution in non-PALmode (native mode), as just described. It is during this initial RESET PALcode execution that the rest of the low-level system initialization is performed, including any modification to the PALcode base register.
- When a system hardware error is detected by the 21164, it invokes one of several PALcode routines, depending upon the type of error. Errors such as machine checks, arithmetic exceptions, reserved or privileged instruction decode, and data fetch errors are handled in this manner.
- When the 21164 senses an interrupt, it dispatches the acknowledgment of the interrupt to a PALcode routine that does the necessary information gathering, then handles the situation appropriately for the given interrupt.
- When a Dstream or Istream translation buffer miss occurs, one of several PALcode routines is called to perform the TB fill.

The 21164 Ebox register file has eight extra registers that are called the PALshadow registers. The PALshadow registers overlay R8, R9, R10, R11, R12, R13, R14, and R25 when the CPU is in PALmode and ICSR<SDE> is asserted. For additional PAL scratch, the Ibox has a register bank of 24 PALtemp registers, which are accessible via HW\_MTPR and HW\_MFPR instructions.

## 6.4 PALcode Entry Points

PALcode is invoked at specific entry points. The 21164 has two types of PALcode entry points: CALL\_PAL and traps.

### 6.4.1 CALL\_PAL Entry

CALL\_PAL entry points are used whenever the Ibox encounters a CALL\_PAL instruction in the instruction stream (Istream). CALL\_PAL instructions start at the following offsets:

- Privileged CALL\_PAL instructions start at offset 2000<sub>16</sub>.
- Nonprivileged CALL\_PAL instructions start at offset 3000<sub>16</sub>.
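
These two offsets follow from the way the entry PC is formed from the CALL\_PAL function field, as detailed in the remainder of this section. The following is a minimal Python sketch of that calculation and of the OPCDEC check, not a description of the hardware implementation. The function names are hypothetical, and the encoding of kernel mode in PS<CUR\_MOD> as 0 is an assumption that is not stated in this chapter.

```python
# Sketch of the CALL_PAL entry rules described in Section 6.4.1.
# Names are illustrative; KERNEL_MODE = 0 is an assumed encoding.

KERNEL_MODE = 0  # assumption: PS<CUR_MOD> value meaning kernel mode


def call_pal_is_opcdec(func, cur_mode, kernel_mode=KERNEL_MODE):
    """Return True if this CALL_PAL function field starts the OPCDEC flow."""
    if 0x40 <= func <= 0x7F:
        return True                      # reserved range 40..7F (hex)
    if func > 0xBF:
        return True                      # anything above BF (hex)
    if 0x00 <= func <= 0x3F and cur_mode != kernel_mode:
        return True                      # privileged call outside kernel mode
    return False


def call_pal_entry_pc(pal_base, func):
    """Compute the PC of the first PALcode instruction for a CALL_PAL."""
    pc = pal_base & 0xFFFF_FFFF_FFFF_C000   # PC<63:14> = PAL_BASE<63:14>
    pc |= 1 << 13                           # PC<13> = 1
    pc |= ((func >> 7) & 1) << 12           # PC<12> = function field<7>
    pc |= (func & 0x3F) << 6                # PC<11:06> = function field<5:0>
    pc |= 1                                 # PC<00> = 1 (PALmode); PC<05:01> = 0
    return pc


if __name__ == "__main__":
    # Privileged function 00 and unprivileged function 80 (hex), PAL_BASE = 0.
    print(hex(call_pal_entry_pc(0, 0x00)))
    print(hex(call_pal_entry_pc(0, 0x80)))
```

With PAL\_BASE equal to 0, function 00<sub>16</sub> lands at offset 2000<sub>16</sub> and function 80<sub>16</sub> at offset 3000<sub>16</sub> (plus the PALmode bit), which matches the two offsets listed above. Each function code gets a 64-byte slot because the low six function bits are placed at PC<11:06>.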
The CALL\_PAL itself is issued into pipe E1 and the Ibox stalls for the minimum number of cycles necessary to perform an implicit TRAPB. The PC of the instruction immediately following the CALL\_PAL is loaded into EXC\_ADDR and is pushed onto the return prediction stack.

The Ibox contains special hardware to minimize the number of cycles in the TRAPB at the start of a CALL\_PAL. Software can benefit from this by scheduling CALL\_PALs such that they do not fall in the shadow of:

- IMUL
- Any floating-point operate, especially FDIV

Each CALL\_PAL instruction includes a function field that will be used in the calculation of the next PC. The PAL OPCDEC flow will be started if the CALL\_PAL function field is:

- In the range 40<sub>16</sub> to 7F<sub>16</sub> inclusive.
- Greater than BF<sub>16</sub>.
- Between 00<sub>16</sub> and 3F<sub>16</sub> inclusive, and PS<CUR\_MOD> is not equal to kernel.

If no OPCDEC is detected on the CALL\_PAL function, then the PC of the instruction to execute after the CALL\_PAL is calculated as follows:

- PC<63:14> = PAL\_BASE IPR<63:14>
- PC<13> = 1
- PC<12> = CALL\_PAL function field<7>
- PC<11:06> = CALL\_PAL function field<5:0>
- PC<05:01> = 0
- PC<00> = 1 (PALmode)

The minimum number of cycles for a CALL\_PAL execution is 4:

| Number of Cycles | Description |
|------------------|-------------|
| 1 | Minimum TRAPB for empty pipe. Typically this will be four cycles. |
| 1 | Issue the CALL_PAL instruction. |
| 2 | The minimum length of a PAL flow. However, in most cases there will be more than two cycles of work for the CALL_PAL. |

### 6.4.2 PALcode Trap Entry Points

Chip-specific trap entry points start PALcode. (No PALcode assist is required for replay and mispredict type traps.) EXC\_ADDR is loaded with the return PC and the Ibox performs a TRAPB in the shadow of the trap.
The return prediction stack is pushed with the PC of the trapping instruction for precise traps, and with some later PC for imprecise traps.

Table 6-1 shows the PALcode trap entry points and their offset from the PAL\_BASE IPR. Entry points are listed from highest to lowest priority. (Prioritization among the Dstream traps works because DTBMISS is suppressed when there is a sign check error. The priority of ITBMISS and interrupt is reversed if there is an Icache miss.)

Table 6-1 PALcode Trap Entry Points

| Entry Name | Offset<sub>16</sub> | Description |
|------------|--------------------|-------------|
| RESET | 0000 | Reset |
| IACCVIO | 0080 | Istream access violation or sign check error on PC |
| INTERRUPT | 0100 | Interrupt: hardware, software, and AST |
| ITBMISS | 0180 | Istream TBMISS |
| DTBMISS_SINGLE | 0200 | Dstream TBMISS |
| DTBMISS_DOUBLE | 0280 | Dstream TBMISS during virtual page table entry (PTE) fetch |
| UNALIGN | 0300 | Dstream unaligned reference |
| DFAULT | 0380 | Dstream fault or sign check error on virtual address |
| MCHK | 0400 | Uncorrected hardware error |
| OPCDEC | 0480 | Illegal opcode |
| ARITH | 0500 | Arithmetic exception |
| FEN | 0580 | Floating-point operation attempted with:<br>• Floating-point instructions (LD, ST, and operates) disabled through FPE bit in the ICSR IPR<br>• Floating-point IEEE operation with data type other than S, T, or Q |

## 6.5 Required PALcode Function Codes

Table 6-2 lists opcodes required for all Alpha AXP implementations.
The notation used is oo.ffff, where oo is the hexadecimal 6-bit opcode and ffff is the hexadecimal 26-bit function code.

Table 6-2 Required PALcode Function Codes

| Mnemonic | Type | Function Code |
|----------|------|---------------|
| DRAINA | Privileged | 00.0002 |
| HALT | Privileged | 00.0000 |
| IMB | Unprivileged | 00.0086 |

## 6.6 Alpha 21164 Implementation of the Architecturally Reserved Opcodes Instructions

PALcode uses the Alpha AXP instruction set for most of its operations. Table 6-3 lists the opcodes reserved by the Alpha AXP architecture for implementation-specific use. These opcodes are privileged and are only available in PALmode.

Table 6-3 Opcodes Reserved for PALcode

| 21164 Mnemonic | Opcode | Architecture Mnemonic | Function |
|----------------|--------|-----------------------|----------|
| HW_LD | 1B | PAL1B | Performs Dstream load instructions. |
| HW_ST | 1F | PAL1F | Performs Dstream store instructions. |
| HW_REI | 1E | PAL1E | Returns instruction flow to the program counter (PC) pointed to by EXC_ADDR IPR. |
| HW_MFPR | 19 | PAL19 | Accesses the Ibox, Mbox, and Dcache internal processor registers (IPRs). |
| HW_MTPR | 1D | PAL1D | Accesses the Ibox, Mbox, and Dcache IPRs. |

These instructions produce an OPCDEC exception if executed while not in the PALmode environment. If ICSR<HWE> is set, these instructions can be executed in kernel mode. Any software executing with ICSR<HWE> set must use extreme care to obey all restrictions listed in this chapter and Chapter 5.

Register checking and bypassing logic is provided for PALcode instructions as it is for non-PALcode instructions, when using general purpose registers (GPRs).
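
Table 6-3 and the OPCDEC rule above can be sketched as a small lookup. This is a minimal illustration in Python, assuming the 6-bit opcode occupies instruction bits <31:26> (the position shown in the instruction format figures later in this section). The function names are hypothetical and the sketch models only the rule stated in the text, not the actual decode logic.

```python
# Sketch of Table 6-3: the five opcodes reserved for PALcode.
# Assumption: the opcode is instruction bits <31:26>.

RESERVED_OPCODES = {
    0x1B: "HW_LD",
    0x1F: "HW_ST",
    0x1E: "HW_REI",
    0x19: "HW_MFPR",
    0x1D: "HW_MTPR",
}


def decode_reserved(instr):
    """Return the 21164 mnemonic for a reserved opcode, or None otherwise."""
    opcode = (instr >> 26) & 0x3F
    return RESERVED_OPCODES.get(opcode)


def raises_opcdec(instr, palmode, hwe, kernel):
    """Apply the rule in the text to one of the reserved opcodes.

    The instruction is allowed in PALmode, or in kernel mode when
    ICSR<HWE> is set; otherwise it produces an OPCDEC exception.
    Instructions that are not reserved opcodes are outside this sketch.
    """
    if decode_reserved(instr) is None:
        return False
    return not (palmode or (hwe and kernel))


if __name__ == "__main__":
    word = 0x1E << 26  # an instruction word whose opcode field is 1E (hex)
    print(decode_reserved(word), raises_opcdec(word, False, False, True))
```

A word with opcode 1E<sub>16</sub> decodes as HW\_REI and, under this rule, would produce OPCDEC in kernel mode unless ICSR<HWE> is set.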
#### Note

Explicit software timing is required for accessing the hardware-specific IPRs and the PALtemp registers. These constraints are described in Table 5-38.

### 6.6.1 HW\_LD Instruction

PALcode uses the HW\_LD instruction to access memory outside of the realm of normal Alpha AXP memory management and to do special forms of Dstream loads. Figure 6-1 and Table 6-4 describe the format and fields of the HW\_LD instruction. Data alignment traps are inhibited for HW\_LD instructions.

Figure 6-1 HW\_LD Instruction Format

Table 6-4 HW\_LD Format Description

| Field | Value | Description |
|-------|-------|-------------|
| OPCODE | 1B<sub>16</sub> | The OPCODE field contains 1B<sub>16</sub>. |
| RA | | Destination register number. |
| RB | | Base register for memory address. |
| PHYS | 0 | The effective address for the HW_LD is virtual. |
| | 1 | The effective address for the HW_LD is physical. Translation and memory-management access checks are inhibited. |
| ALT | 0 | Memory-management checks use Mbox IPR DTB_CM for access checks. |
| | 1 | Memory-management checks use Mbox IPR ALT_MODE for access checks. |
| WRTCK | 0 | Memory-management checks FOR and read access violations. |
| | 1 | Memory-management checks FOR, FOW, read, and write access violations. |
| QUAD | 0 | Length is longword. |
| | 1 | Length is quadword. |
| VPTE | 1 | Flags a virtual PTE fetch. Used by trap logic to distinguish single TBMISS from double TBMISS. Access checks are performed in kernel mode. |
| LOCK | 1 | Load lock version of HW_LD. PAL must slot to E0 pipe. |
| DISP | | Holds a 10-bit signed byte displacement. |

### 6.6.2 HW\_ST Instruction

PALcode uses the HW\_ST instruction to access memory outside of the realm of normal Alpha AXP memory management and to do special forms of Dstream store instructions. Figure 6-2 and Table 6-5 describe the format and fields of the HW\_ST instruction. Data alignment traps are inhibited for HW\_ST instructions. The Ibox logic will always slot HW\_ST to pipe E0.

Figure 6-2 HW\_ST Instruction Format

Table 6-5 HW\_ST Format Description

| Field | Value | Description |
|-------|-------|-------------|
| OPCODE | 1F<sub>16</sub> | The OPCODE field contains 1F<sub>16</sub>. |
| RA | | Write data register number. |
| RB | | Base register for memory address. |
| PHYS | 0 | The effective address for the HW_ST is virtual. |
| | 1 | The effective address for the HW_ST is physical. Translation and memory-management access checks are inhibited. |
| ALT | 0 | Memory-management checks use Mbox IPR DTB_CM for access checks. |
| | 1 | Memory-management checks use Mbox IPR ALT_MODE for access checks. |
| QUAD | 0 | Length is longword. |
| | 1 | Length is quadword. |
| COND | 1 | Store_conditional version of HW_ST. In this case, RA is written with the value of LOCK_FLAG. |
| DISP | | Holds a 10-bit signed byte displacement. |
| MBZ | | HW_ST<13,11> must be zero. |

### 6.6.3 HW\_REI Instruction

The HW\_REI instruction is used to return instruction flow to the PC pointed to by the EXC\_ADDR IPR. The value in EXC\_ADDR<0> will be used as the new value of PALmode after the HW\_REI instruction.

The Ibox uses the return prediction stack to speed the execution of HW\_REI. There are two different types of HW\_REI:

- Prefetch: In this case, the Ibox begins fetching the new Istream as soon as possible.
  This is the version of HW\_REI that is normally used.
- Stall prefetch: This encoding of HW\_REI inhibits Istream fetch until the HW\_REI itself is issued. Thus, this is the method used to synchronize Ibox changes (such as ITB write instructions) with the HW\_REI. There is a rule that PALcode can have only one such HW\_REI in an aligned block of four instructions.

Figure 6-3 and Table 6-6 describe the format and fields of the HW\_REI instruction. The Ibox logic will slot HW\_REI to pipe E1.

Figure 6-3 HW\_REI Instruction Format

Table 6-6 HW\_REI Format Description

| Field | Value | Description |
|-------|-------|-------------|
| OPCODE | 1E<sub>16</sub> | The OPCODE field contains 1E<sub>16</sub>. |
| RA/RB | | Register numbers; should be R31 to avoid unnecessary stalls. |
| TYP | 10 | Normal version. |
| | 11 | Stall version. |
| MBZ | 0 | HW_REI<13:00> must be zero. |

### 6.6.4 HW\_MFPR and HW\_MTPR Instructions

The HW\_MFPR and HW\_MTPR instructions are used to access internal state from the Ibox, Mbox, and Dcache. The HW\_MFPR from Ibox IPRs has a latency of one cycle (HW\_MFPR in cycle n results in data available to the using instruction in cycle n+1). HW\_MFPR from Mbox and Dcache IPRs has a latency of two cycles. Ibox hardware slots each type of MxPR to the correct Ebox pipe (refer to Table 5-1).

Figure 6-4 and Table 6-7 describe the format and fields of the HW\_MFPR and HW\_MTPR instructions.
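
As an illustration of the format and the latency rule, the following Python sketch assembles a HW\_MFPR or HW\_MTPR instruction word and computes the cycle in which HW\_MFPR data becomes available. It assumes the field positions suggested by the bit numbering of Figure 6-4 (OPCODE <31:26>, RA <25:21>, RB <20:16>, Index <15:0>). The function names are hypothetical, and the index value in the usage line is a placeholder, not a real IPR number; actual index encodings are given in Chapter 5.

```python
# Sketch of HW_MFPR / HW_MTPR encoding and HW_MFPR read latency.
# Assumptions: field positions as read from Figure 6-4; names are illustrative.

HW_MFPR_OPCODE = 0x19
HW_MTPR_OPCODE = 0x1D

# Latency in cycles from HW_MFPR issue to data availability, per the text.
MFPR_LATENCY = {"ibox": 1, "mbox": 2, "dcache": 2}


def encode_mxpr(write, reg, index):
    """Build the 32-bit instruction word; RA and RB must name the same register."""
    if not 0 <= reg <= 31:
        raise ValueError("register number must be 0..31")
    if not 0 <= index <= 0xFFFF:
        raise ValueError("index must fit in 16 bits")
    opcode = HW_MTPR_OPCODE if write else HW_MFPR_OPCODE
    return (opcode << 26) | (reg << 21) | (reg << 16) | index


def mfpr_data_ready_cycle(issue_cycle, unit):
    """Cycle in which a consumer of the HW_MFPR result can first use the data."""
    return issue_cycle + MFPR_LATENCY[unit]


if __name__ == "__main__":
    PLACEHOLDER_INDEX = 0x0123  # hypothetical value, not a documented IPR
    print(hex(encode_mxpr(False, 1, PLACEHOLDER_INDEX)))
    print(mfpr_data_ready_cycle(5, "mbox"))
```

Passing one register number and writing it to both RA and RB enforces the "must be the same" requirement from Table 6-7 by construction.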
Figure 6-4 HW\_MFPR and HW\_MTPR Instruction Format (fields, from most significant bit to least significant bit: OPCODE <31:26>, RA <25:21>, RB <20:16>, Index <15:0>)

Table 6-7 HW\_MTPR and HW\_MFPR Format Description

| Field | Value | Description |
|-------|-------|-------------|
| OPCODE | 19<sub>16</sub> | The OPCODE field contains 19<sub>16</sub> for HW_MFPR. |
| | 1D<sub>16</sub> | The OPCODE field contains 1D<sub>16</sub> for HW_MTPR. |
| RA/RB | | Must be the same; source register for HW_MTPR and destination register for HW_MFPR. |
| Index | | Specifies the IPR. Refer to Chapter 5 for field encoding and for details about specific IPRs. |

## Initialization and Configuration

This chapter provides information on 21164-specific microprocessor/system initialization and configuration. It is organized as follows:

- Input signals sys\_reset\_l and dc\_ok\_h and booting
- Sysclk ratio and delay
- Built-in self-test (BiSt)
- Serial read-only memory (SROM) interface port
- Serial terminal port
- Cache initialization
- External interface initialization
- Internal processor register (IPR) reset state
- Timeout reset
- IEEE 1149.1 test port reset

## 7.1 Input Signals sys\_reset\_l and dc\_ok\_h and Booting

The 21164 reset sequence uses two input signals: sys\_reset\_l and dc\_ok\_h. When transitioning from a powered-down state to a powered-up state, signal dc\_ok\_h must be deasserted, and signal sys\_reset\_l must be asserted until power has reached the proper operating point. After power has reached the proper operating point, signal dc\_ok\_h must be asserted. Then, signal sys\_reset\_l must be deasserted. At this point, the 21164 recognizes a powered-on state. If signal dc\_ok\_h is not asserted, signal sys\_reset\_l is forced asserted internally.
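
The ordering rules above can be sketched as a small checker over sampled signal values. This is a minimal Python illustration of the sequencing described in the text, not a timing model: the function names and the sample format are hypothetical, and real designs must also meet the minimum reset duration and electrical requirements given elsewhere in this chapter.

```python
# Sketch of the power-up ordering for dc_ok_h and sys_reset_l.
# Each sample is (power_ok, dc_ok_h, sys_reset_l), where:
#   power_ok    - True once the supply has reached its operating point
#   dc_ok_h     - True when the active-high signal is asserted
#   sys_reset_l - pin level; False (low) means reset is ASSERTED


def internal_reset(dc_ok_h, sys_reset_l):
    """Reset as seen inside the chip: forced while dc_ok_h is not asserted."""
    return (not dc_ok_h) or (not sys_reset_l)


def power_up_violations(samples):
    """Return (sample index, message) pairs for ordering violations."""
    problems = []
    for i, (power_ok, dc_ok_h, sys_reset_l) in enumerate(samples):
        if not power_ok and dc_ok_h:
            problems.append((i, "dc_ok_h asserted before power is stable"))
        if not power_ok and sys_reset_l:
            problems.append((i, "sys_reset_l deasserted before power is stable"))
        if power_ok and sys_reset_l and not dc_ok_h:
            problems.append((i, "sys_reset_l deasserted before dc_ok_h asserted"))
    return problems


if __name__ == "__main__":
    good = [
        (False, False, False),  # powering up: dc_ok_h low, reset asserted
        (True, False, False),   # power stable
        (True, True, False),    # dc_ok_h asserted, reset still asserted
        (True, True, True),     # reset deasserted: powered-on state
    ]
    print(power_up_violations(good))
```

The `internal_reset` helper reflects the last sentence of the paragraph above: whatever the level on the sys\_reset\_l pin, the chip stays in reset until dc\_ok\_h is asserted.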
After sys\_reset\_l is deasserted, the 21164 begins the following sequence of operations:

1. Icache built-in self-test (BiSt).
2. An optional automatic Icache initialization, using an external serial ROM (SROM) interface.
3. Dispatch to the reset PALcode trap entry point (physical location 0).
   a. If step 2 initialized the Icache using the SROM interface, the cache should contain code that appears to be at location 0, that is, the cache should be initialized such that it hits on the dispatch. Typically the code in the Icache should configure the 21164's IPRs as necessary before causing any off-chip read or write commands. This allows the 21164 to be configured to match the external system implementation.
   b. If step 2 did not initialize the Icache, the Icache has been flushed by reset. The reset PALcode trap dispatch misses in the Icache and Scache (also flushed by reset) and produces an off-chip read command. The external system implementation must be compatible with the 21164's default configuration after reset (refer to Section 7.8). The code that is executed at this point should complete the 21164 configuration as necessary.
4. After configuring the 21164, control can be transferred to code anywhere in memory, including the noncacheable regions. If the SROM interface was used to initialize the Icache, the Icache can be flushed by a write operation to IC\_FLUSH\_CTL after control is transferred. This transfer of control should be to addresses not loaded in the Icache by the SROM interface or the Icache may provide unexpected instructions.

   Typically, PALbase and any state required by PALcode are initialized and the console is started (switching out of PALmode and into native mode). The console code initializes and configures the system and boots an operating system from an I/O device such as a disk or the network.

Signal sys\_reset\_l forces the CPU into a known state. Section 7.8 lists the reset state of each IPR.
Table 7-1 provides the reset state of each external signal pin.

Table 7-1 Alpha 21164 Signal Pin Reset State

| Signal | Reset State |
|--------|-------------|
| **Clocks** | |
| clk_mode_h<1:0> | NA (input). |
| cpu_clk_out_h | Clock output. |
| dc_ok_h | NA (input). |
| osc_clk_in_h,l | Must be clocking. |
| ref_clk_in_h | NA (input). |
| sys_clk_out1_h,l | Clock output. |
| sys_clk_out2_h,l | Clock output. |
| sys_reset_l | NA (input). |
| **Bcache** | |
| data_h<127:0> | Tristated. |
| data_check_h<15:0> | Tristated. |
| data_ram_oe_h | Deasserted. |
| data_ram_we_h | Deasserted. |
| index_h<25:4> | Unspecified. |
| tag_ctl_par_h | Tristated. |
| tag_data_h<38:20> | Tristated. |
| tag_data_par_h | Tristated. |
| tag_dirty_h | Tristated. |
| tag_ram_oe_h | Deasserted. |
| tag_ram_we_h | Deasserted. |
| tag_shared_h | Tristated. |
| tag_valid_h | Tristated. |
| **System Interface** | |
| addr_h<39:4> | Driven or tristated depending upon addr_bus_req_h at most recent sysclk edge. If driven, the value is unspecified. |
| addr_bus_req_h | NA (input). |
| addr_cmd_par_h | Driven or tristated depending upon addr_bus_req_h at most recent sysclk edge. If driven, the command is NOP. |
| addr_res_h<2:0> | NOP. |
| cack_h | Must be deasserted. |
| cfail_h | Must be deasserted. |
| cmd_h<3:0> | Driven or tristated depending upon addr_bus_req_h at most recent sysclk edge. If driven, the command is NOP. |
| dack_h | Must be deasserted. |
| data_bus_req_h | NA (input). |
| fill_h | Must be deasserted. |
| fill_error_h | Must be deasserted. |
| fill_id_h | Must be deasserted. |
| fill_nocheck_h | Must be deasserted. |
| idle_bc_h | Must be deasserted. |
| int4_valid_h<3:0> | Unspecified. |
| scache_set_h<1:0> | Unspecified. |
| shared_h | NA (input). |
| system_lock_flag_h | Must be deasserted. |
| victim_pending_h | Unspecified. |
| **Interrupts** | |
| irq_h<3:0> | Sysclk divisor ratio input. |
| mch_hlt_irq_h | Sysclk delay input. |
| pwr_fail_irq_h | Sysclk delay input. |
| sys_mch_chk_irq_h | Sysclk delay input. |
| | |
| | NA (input). |
| | Deasserted. |
| | NA (input). |
| | Deasserted. |
| | NA (input). |
| | NA (input). |
| | NA (input). |
| | NA (input). |
| | NA (input). |
| | Deasserted. |
| | NA (input). |
| | Must be asserted. |
| | |
| | NA (input). |
| | NA |

While signal dc\_ok\_h is deasserted, the 21164 provides its own internal clock source from an on-chip ring oscillator. When dc\_ok\_h is asserted, the 21164 clock source is the differential clock input pins osc\_clk\_in\_h,l.

**Caution:** A clock source should always be provided when signal dc\_ok\_h is asserted.

Signal sys\_reset\_l must remain asserted while signal dc\_ok\_h is deasserted, and for some period of time after dc\_ok\_h assertion. It should remain asserted for at least 400 internal CPU cycles. Then, signal sys\_reset\_l may be deasserted.
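As a rough worked example of the 400-cycle requirement, the sketch below converts a cycle count into time for a given internal CPU clock frequency. The function name `min_reset_hold_ns` and the 300 MHz figure are illustrative assumptions, not values taken from this manual; substitute the actual CPU clock of the system being designed.

```python
def min_reset_hold_ns(cpu_hz, cycles=400):
    """Return the time, in nanoseconds, spanned by `cycles` CPU cycles at `cpu_hz`."""
    if cpu_hz <= 0:
        raise ValueError("cpu_hz must be positive")
    return cycles / cpu_hz * 1e9


# Hypothetical 300 MHz internal clock (an assumed figure):
# 400 cycles is about 1.33 microseconds.
print(f"{min_reset_hold_ns(300e6):.1f} ns")
```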
Signal sys\_reset\_l deassertion need not be synchronous with respect to sysclk.

When the 21164 is free-running from the internal ring oscillator, the internal clock frequency is in the range TBD.

The sysclk divisor and sys\_clk\_out2\_x delay are determined by input pins while signal sys\_reset\_l remains asserted. Refer to Section 4.2.2 and Section 4.2.3 for ratio and delay values.

## 7.1.1 Power-Up Requirements

The 21164 chip uses a 3.3-V dc power supply. This 3.3-V power supply must be stable before any input or bidirectional pin rises above 4 V.

## 7.1.2 Pin State with dc\_ok\_h Not Asserted

While dc\_ok\_h is deasserted and sys\_reset\_l is asserted, every output and bidirectional 21164 pin is tristated and pulled weakly to ground by a small pull-down transistor.

## 7.2 Sysclk Ratio and Delay

While in reset, the 21164 reads sysclk configuration parameters from the interrupt signal pins. These inputs should be driven with the correct configuration values whenever signal sys\_reset\_l is asserted. Refer to Section 4.2.2 and Section 4.2.3 for relevant input signals and ratio/delay values.

If the signal inputs reflecting configuration parameters change while sys\_reset\_l is asserted, allow 20 internal CPU cycles before the new sysclk behavior is correct.

## 7.3 Built-In Self-Test (BiSt)

Upon deassertion of signal sys\_reset\_l, the 21164 automatically executes the Icache built-in self-test (BiSt). The Icache is automatically tested and the result is made available in the ICSR IPR and on signal test\_status\_h<0>. Internally, the CPU reset continues to be asserted throughout the BiSt process. For additional information, refer to Section 12.5.1.

## 7.4 Serial Read-Only Memory Interface Port

The serial read-only memory (SROM) interface provides the initialization data load path from a system SROM to the instruction cache (Icache). Following initialization, this interface can function as a diagnostic port using privileged architecture library code (PALcode).
The following signals make up the SROM interface:

- srom\_present\_l
- srom\_data\_h
- srom\_oe\_l
- srom\_clk\_h

During system reset, the 21164 samples the **srom\_present\_l** signal for the presence of SROM. If **srom\_present\_l** is deasserted, the SROM load is disabled and the reset sequence clears the Icache valid bits. This causes the first instruction fetch to miss the Icache and read instructions from off-chip memory.

If **srom\_present\_l** is asserted during setup, then the system performs an SROM load as follows:

1. The srom\_oe\_l signal supplies the output enable to the SROM.
2. The srom\_clk\_h signal supplies the clock to the ROM that causes it to advance to the next bit. The cycle time of this clock is 126 times the CPU clock period.
3. The srom\_data\_h signal inputs the SROM data.

Every data and tag bit in the Icache is loaded by this sequence. The format of the Icache data and SROM load timing is described in Chapter 12.

## 7.5 Serial Terminal Port

After the SROM data is loaded into the Icache, the three SROM load signals become parallel I/O pins that can drive a diagnostic terminal using an interface such as RS422.

## 7.6 Cache Initialization

Regardless of whether the Icache BiSt is executed, the Icache is flushed during the reset sequence prior to the SROM load. If the SROM load is bypassed, the Icache will be in the flushed state initially.

The second-level cache (Scache) is flushed and enabled by internal reset. This is required if the SROM load is bypassed. The initial Istream reference after reset is location 0. Because that is a cacheable-space reference, the Scache will be probed.

The data cache (Dcache) is disabled by reset. It is not initialized or flushed by reset. It should be initialized by PALcode before being enabled.

The external board-level Bcache is disabled by reset. It should be initialized by PALcode before being enabled.

## 7.6.1 Icache Initialization

The Icache is not kept coherent with memory.
When it is necessary to make it coherent with memory, the following procedure is used. The CALL\_PAL IMB function uses this procedure.

1. Execute an MB instruction. This forces all writes in the write buffer into memory.
   - Stall until write buffer is drained
   - Carry load or issue a HW\_MFPR from any Mbox IPR
2. Write to IC\_FLUSH\_CTL with an HW\_MTPR to flush the Icache.
3. Execute a total of 44 NOP instructions (BIS #31,r31,r31) to clear the prefetch buffers and Ibox pipeline. The 44 NOP instructions must start on an INT16 boundary. Pad with additional NOP instructions if necessary.

## 7.6.2 Flushing Dirty Blocks

During a power failure recovery, dirty blocks must be flushed out of the Scache and backup cache (Bcache), if present.

#### Systems Without a Bcache

To flush out dirty blocks from the Scache on power failure, the following sequence must be used to guarantee that all the dirty blocks have been written back to main memory.

The BC\_CONFIG<BC\_SIZE> field is used for this function in systems without a Bcache. When powering up, this field is initialized to a value representing a 1M-byte Bcache. During system configuration flow, this field must be changed to a value of 0 for normal operation. To flush out the dirty blocks from all three sets in the Scache, perform the following tasks:

1. Set BC\_CONFIG<BC\_SIZE><2:0> = 0x1; do loads at a stride of 64 bytes through 128K bytes of contiguous memory. This guarantees that all dirty blocks from set0 are flushed out.
2. Set BC\_CONFIG<BC\_SIZE><2:0> = 0x2; do loads at a stride of 64 bytes through 96K bytes of contiguous memory. This guarantees that all dirty blocks from set1 are flushed out.
3. Set BC\_CONFIG<BC\_SIZE><2:0> = 0x4; do loads at a stride of 64 bytes through 64K bytes of contiguous memory. This guarantees that all dirty blocks from set2 are flushed out.

All other values of BC\_CONFIG<BC\_SIZE><2:0> are undefined in this mode.
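The three Scache sweeps described above can be modeled as a small table of (BC\_SIZE value, span) pairs walked at a 64-byte stride. The following is only a host-side sketch for checking address ranges and load counts; the names `FLUSH_STEPS`, `flush_addresses`, and `flush_plan` are hypothetical, the base address of 0 is an assumption, and real flush code would be PALcode that writes BC\_CONFIG and issues the loads on the 21164 itself.

```python
STRIDE = 64  # bytes between successive loads, per the procedure above

# (BC_CONFIG<BC_SIZE><2:0> value, bytes of contiguous memory to sweep, set flushed)
FLUSH_STEPS = [
    (0x1, 128 * 1024, "set0"),
    (0x2, 96 * 1024, "set1"),
    (0x4, 64 * 1024, "set2"),
]


def flush_addresses(base, span, stride=STRIDE):
    """Return the load addresses for one sweep of `span` bytes starting at `base`."""
    return range(base, base + span, stride)


def flush_plan(base=0):
    """Return (bc_size, set_name, number_of_loads) for each of the three sweeps."""
    return [
        (bc_size, name, len(flush_addresses(base, span)))
        for bc_size, span, name in FLUSH_STEPS
    ]


for bc_size, name, loads in flush_plan():
    print(f"BC_SIZE={bc_size:#x}: {loads} loads flush {name}")
```

The model shows that the sweeps need 2048, 1536, and 1024 loads respectively, which can be useful when sizing the time budget of a power-fail handler.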
#### Systems with a Bcache

To flush out dirty blocks from the Scache and Bcache on power failure, the following sequence must be used to guarantee that all the dirty blocks have been written back to main memory: perform loads at a stride of the Bcache block size through a region of memory twice the size of the Bcache.

## 7.7 External Interface Initialization

After reset, the cache control and bus interface unit (Cbox) is in the default configuration dictated by the reset state of the IPR bits that select the configuration options. The Cbox response to system commands and internally generated memory accesses is determined by this default configuration. System environments that are not compatible with the default configuration must use the SROM Icache load feature to initially load and execute a PALcode program. This program configures the external interface control (Cbox) IPRs as needed.

## 7.8 Internal Processor Register Reset State

Many IPR bits are not initialized by reset. They are located in error-reporting registers and other IPR states. They must be initialized by initialization PALcode. Table 7-2 lists the state of all internal processor registers (IPRs) immediately following reset. The table also specifies which registers need to be initialized by power-up PALcode.

Table 7–2 Internal Processor Register Reset State

| IPR | Reset State | Comments |
|-----|-------------|----------|
| **Ibox Registers** | | |
| ITB_TAG | UNDEFINED | |
| ITB_PTE | UNDEFINED | |
| ITB_ASN | UNDEFINED | PALcode must initialize. |
| ITB_PTE_TEMP | UNDEFINED | |
| ITB_IAP | UNDEFINED | |
| ITB_IA | UNDEFINED | PALcode must initialize. |
| ITB_IS | UNDEFINED | |
| IFAULT_VA_FORM | UNDEFINED | |
| IVPTBR | UNDEFINED | PALcode must initialize. |
| ICPERR_STAT | UNDEFINED | PALcode must initialize. |
| IC_FLUSH_CTL | UNDEFINED | |
| EXC_ADDR | UNDEFINED | |
| EXC_SUM | UNDEFINED | PALcode must clear exception summary and exception register write mask by writing EXC_SUM. |
| EXC_MASK | UNDEFINED | |
| PAL_BASE | Cleared | Cleared on reset. |
| PS | UNDEFINED | PALcode must set processor status. |
| ICSR | See Comments | All bits are cleared on reset except ICSR<37>, which is set, and ICSR<38>, which is UNDEFINED. |
| IPL | UNDEFINED | PALcode must initialize. |
| INTID | UNDEFINED | |
| ASTRR | UNDEFINED | PALcode must initialize. |
| ASTER | UNDEFINED | PALcode must initialize. |
| SIRR | UNDEFINED | PALcode must initialize. |
| HWINT_CLR | UNDEFINED | PALcode must initialize. |
| ISR | UNDEFINED | |
| SL_XMIT | Cleared | Appears on external pin. |
| SL_RCV | UNDEFINED | |
| PMCTR | See Comments | PMCTR<15:10> are cleared on reset. All other bits are UNDEFINED. |
| **Mbox Registers** | | |
| DTB_ASN | UNDEFINED | PALcode must initialize. |
| DTB_CM | UNDEFINED | PALcode must initialize. |
| DTB_TAG | Cleared | Valid bits are cleared on chip reset but not on timeout reset. |
| DTB_PTE | UNDEFINED | |
| DTB_PTE_TEMP | UNDEFINED | |
| MM_STAT | UNDEFINED | Must be unlocked by PALcode by reading VA register. |
| VA | UNDEFINED | Must be unlocked by PALcode by reading VA register. |
| VA_FORM | UNDEFINED | Must be unlocked by PALcode by reading VA register. |
| MVPTBR | UNDEFINED | PALcode must initialize. |
| DC_PERR_STAT | UNDEFINED | PALcode must initialize. |
| DTBIAP | UNDEFINED | |
| DTBIA | UNDEFINED | |
| DTBIS | UNDEFINED | |
| MCSR | Cleared | Cleared on chip reset but not on timeout reset. |
| DC_MODE | Cleared | Cleared on chip reset but not on timeout reset. |
| MAF_MODE | Cleared | Cleared on chip reset. MAF_MODE<05> cleared on timeout reset. |
| DC_FLUSH | UNDEFINED | PALcode must write this register to clear Dcache valid bits. |
| ALT_MODE | UNDEFINED | |
| CC | UNDEFINED | CC is disabled on chip reset. |
| CC_CTL | UNDEFINED | |
| DC_TEST_CTL | UNDEFINED | |
| DC_TEST_TAG | UNDEFINED | |
| DC_TEST_TAG_TEMP | UNDEFINED | |
| **Cbox Registers** | | |
| SC_CTL | See Comments | SC_CTL<11:00> cleared on reset. SC_CTL<12> is set at power-up. |
| SC_STAT | UNDEFINED | PALcode must read to unlock. |
| SC_ADDR | UNDEFINED | |
| BC_CONTROL | See Comments | BC_CONTROL<01:00>, <07>, <14:13>, <16>, and <27:19> cleared. BC_CONTROL<06:04> and <15> set on reset but not timeout reset. All other bits are UNDEFINED and must be initialized by PALcode. |
| BC_CONFIG | See Comments | At power-up, BC_CONFIG is initialized to a value of 0000 0000 0001 7441<sub>16</sub>. |
| BC_TAG_ADDR | UNDEFINED | |
| EI_STAT | UNDEFINED | PALcode must read twice to unlock. |
| EI_ADDR | UNDEFINED | |
| FILL_SYN | UNDEFINED | |

**Note:** The Bcache parameters BC\_SIZE (size), BC\_RD\_SPD (read speed), BC\_WR\_SPD (write speed), and BC\_WE\_CTL (write-enable control) are all configured to default values on reset and must be initialized in the BC\_CONFIG register before enabling the Bcache.

## 7.9 Timeout Reset

The instruction fetch/decode unit and branch unit (Ibox) contains a timer that times out when a very long period of time passes with no instruction completing. When this timeout occurs, an internal reset event occurs. This clears sufficient internal state to allow the CPU to begin executing again. Registers, IPRs (except as noted in Table 7–2), and caches are not affected.
Dispatch to the PALcode MCHK trap entry point occurs immediately.

## 7.10 IEEE 1149.1 Test Port Reset

Signal trst\_l must be asserted when sys\_reset\_l is asserted or when dc\_ok\_h is deasserted. Continuous trst\_l assertion during normal operation is used to guarantee that the IEEE 1149.1 test port does not affect 21164 operation.

## **Error Detection and Error Handling**

This chapter provides an overview of the 21164's error handling strategy. Each internal cache (instruction cache [Icache], data cache [Dcache], and second-level cache [Scache]) implements parity protection for tag and data. Error correction code (ECC) protection is implemented for memory and backup cache (Bcache) data. (The implementation provides detection of all double-bit errors and correction of all single-bit errors.) Correctable instruction stream (Istream) and data stream (Dstream) ECC errors are corrected in hardware without privileged architecture library code (PALcode) intervention. Bcache tags are parity protected. The instruction fetch/decode unit and branch unit (Ibox) implements logic that detects when no progress has been made for a very long time and forces a machine check trap.

PALcode handles all error traps (machine checks and correctable error interrupts). Where possible, the address of affected data is latched in an IPR. Most of the Istream errors can be retried by the operating system because the machine check occurs before any part of the instruction causing the error is executed. In some other cases, the system may be able to recover from an error by terminating all processes that had access to the affected memory location.

## 8.1 Error Flows

The following flows describe the events that take place during an error, the recommended responses necessary to determine the source of the error, and the suggested actions to resolve them.
## 8.1.1 Icache Data or Tag Parity Error

- Machine check occurs before the instruction causing the parity error is executed.
- EXC\_ADDR contains either the PC of the instruction that caused the parity error or that of an earlier trapping instruction.
- ICPERR\_STAT<TPE> or <DPE> is set.

**Note:** The Icache is not flushed by hardware in this event. If an Icache parity error occurs early in the PALcode routine at the machine check entry point, an infinite loop may result. Recommendation: Flush the Icache early in the MCHK routine.

## 8.1.2 Scache Data Parity Error—Istream

- Machine check occurs before the instruction causing the parity error is executed.
- Bad data may be written to the Icache or Icache refill buffer and validated.
- Can be retried if there are no multiple errors.
- Recommendation: Flush the Icache to remove bad data. The Icache refill buffer may be flushed by executing enough instructions to fill the refill buffer with new data (32 instructions). Then flush the Icache again.
- SC\_STAT: SC\_DPERR<7:0> is set; <SC\_SCND\_ERR> is set if there are multiple errors.
- SC\_STAT: CBOX\_CMD is IRD.
- SC\_ADDR: Contains the address of the 32-byte block containing the error. (Bit 4 indicates which octaword was accessed first, but the error may be in either octaword.)

Recommendation: On data parity errors, it may be feasible for the operating system to "flush" the block of data out of the Scache by requesting a block of data with the same Bcache index, but a different tag. This may not be feasible on tag parity errors, because the tag address is suspect.
If the requested block is loaded with no problems, then the "bad data" has been replaced. If the "bad data" is marked dirty, then when the new data tries to replace the old data, another parity error may result during the write-back (this is a reason not to attempt this in PALcode, because a MCHK from PALcode is always fatal).

**Note:** If the Istream parity error occurs early in the PALcode routine at the machine check entry point, an infinite loop may result.

## 8.1.3 Scache Tag Parity Error—Istream

- Machine check occurs before the instruction causing the parity error is executed.
- Bad data may be written to the Icache or Icache refill buffer and validated.
- Cannot be retried. Probably will not be able to recover by deleting a single process because the exact address is unknown.
- Recommendation: Flush the Icache to remove bad data. The Icache refill buffer may be flushed by executing enough instructions to fill the refill buffer with new data (32 instructions). Then flush the Icache again.
- SC\_STAT: SC\_TPERR<2:0> is set; <SC\_SCND\_ERR> is set if there are multiple errors.
- SC\_STAT: CBOX\_CMD is IRD.
- SC\_ADDR: Contains the address of the 32-byte block containing the error. (Bit 4 indicates which octaword was accessed first, but the error may be in either octaword.)

**Note:** If the Istream parity error occurs early in the PALcode routine at the machine check entry point, an infinite loop may result.

## 8.1.4 Scache Data Parity Error—Dstream Read/Write, READ\_DIRTY

- Machine check occurs. Machine state may have changed.
- Cannot be retried, but may only need to delete the process if data is confined to a single process and no second error occurred.
- SC\_STAT: SC\_DPERR<7:0> is set; <SC\_SCND\_ERR> is set if there are multiple errors.
- SC\_STAT: CBOX\_CMD is DRD, DWRITE, or READ\_DIRTY.
- SC\_ADDR: Contains the address of the 32-byte block containing the error. (Bit 4 indicates which octaword was accessed first, but the error may be in either octaword.)

## 8.1.5 Scache Tag Parity Error—Dstream or System Commands

- Machine check occurs. Machine state may have changed.
- Cannot be retried. Probably will not be able to recover by deleting a single process because the exact address is unknown.
- SC\_STAT: SC\_TPERR<2:0> is set; <SC\_SCND\_ERR> is set if there are multiple errors.
- SC\_STAT: CBOX\_CMD is DRD, DWRITE, READ\_DIRTY, SET\_SHARED, or INVAL.
- SC\_ADDR: Records physical address bits <39:04> of the location with the error.

## 8.1.6 Dcache Data Parity Error

- Machine check occurs. Machine state may have changed.
- Cannot be retried, but may only need to delete the process if data is confined to a single process and no second error occurred.
- DCPERR\_STAT: <DP0> or <DP1> is set. <LOCK> is set. <SEO> is set if there are multiple errors.

**Note:** For multiple parity errors in the same cycle, the <SEO> bit is not set, but more than one error bit will be set.

- VA: Contains the virtual address of the quadword with the error.
- MM\_STAT: Locked. Contents contain information about the instruction causing the parity error.

## 8.1.7 Dcache Tag Parity Error

- Machine check occurs. Machine state may have changed.
- DCPERR\_STAT: ... <SEO> is set if there are multiple errors.

**Note:** For multiple parity errors in the same cycle, the <SEO> bit is not set, but more than one error bit will be set.

- VA: Contains the virtual address of the Dcache block (hexaword) with the error.
- MM\_STAT: Locked. Contents contain information about the instruction causing the parity error.
- Probably will not be able to recover by deleting a single process, because the exact address is unknown, and a load may have falsely hit.

## 8.1.8 Istream Uncorrectable ECC or Data Parity Errors (Bcache or Memory)

- Machine check occurs before the instruction causing the error is executed.
- Bad data may be written to the Icache or Icache refill buffer and validated.
- Can be retried if there are no multiple errors.
- Must flush Icache to remove bad data. The Icache refill buffer may be flushed by executing enough instructions to fill the refill buffer with new data (32 instructions). Then flush the Icache again.
- EI\_STAT: <UNC\_ECC\_ERR> is set; <SEO\_HRD\_ERR> is set if there are multiple errors.
- EI\_STAT: <EI\_ES> is set if source of fill data is memory/system; clear if Bcache.
- EI\_STAT: <FIL\_IRD> is set.
- EI\_ADDR: Contains the physical address bits <39:04> of the octaword associated with the error.
- FILL\_SYN: Contains syndrome bits associated with the failing octaword. This register contains byte parity error status if in parity mode.
- BC\_TAG\_ADDR: Holds results of external cache tag probe if external cache was enabled for this transaction.
**Note:** If the Istream ECC or parity error occurs early in the PALcode routine at the machine check entry point, an infinite loop may result.

Recommendation: On data ECC/parity errors, it may be feasible for the operating system to "flush" the block of data out of the Bcache by requesting a block of data with the same Bcache index, but a different tag. If the requested block is loaded with no problems, then the "bad data" has been replaced. If the "bad data" is marked dirty, then when the new data tries to replace the old data, another ECC/parity error may result during the write-back (this is a reason not to attempt this in PALcode, because a MCHK from PALcode is always fatal).

## 8.1.9 Dstream Uncorrectable ECC or Data Parity Errors (Bcache or Memory)

- Machine check occurs. Machine state may have changed.
- Cannot be retried, but may only need to delete the process if data is confined to a single process and no second error occurred.
- EI\_STAT: <UNC\_ECC\_ERR> is set; <SEO\_HRD\_ERR> is set if there are multiple errors.
- EI\_STAT: <EI\_ES> is set if source of fill data is memory/system; is clear if Bcache.
- EI\_STAT: <FIL\_IRD> is clear.
- EI\_ADDR: Contains the physical address bits <39:04> of the octaword associated with the error.
- FILL\_SYN: Contains syndrome bits associated with the failing octaword. This register contains byte parity error status if in parity mode.
- BC\_TAG\_ADDR: Holds results of external cache tag probe if external cache was enabled for this transaction.

## 8.1.10 Bcache Tag Parity Errors—Istream

- Machine check occurs before the instruction causing the error is executed.
- Bad data may be written to the Icache or Icache refill buffer and validated.
- Can be retried if there are no multiple errors.
- Must flush Icache to remove bad data.
  The Icache refill buffer may be flushed by executing enough instructions to fill the refill buffer with new data (32 instructions). Then flush the Icache again.
- EI\_STAT: <BC\_TPERR> or <BC\_TC\_PERR> is set; <SEO\_HRD\_ERR> is set if there are multiple errors.
- EI\_STAT: <EI\_ES> is clear.
- EI\_STAT: <FIL\_IRD> is set.
- EI\_ADDR: Contains the physical address bits <39:04> of the octaword associated with the error.
- BC\_TAG\_ADDR: Holds results of external cache tag probe.

**Note:** The Bcache hit is determined based on the tag alone, not the parity bit. The victim is processed according to the status bits in the tag, ignoring the control field parity. PALcode can distinguish fatal from nonfatal occurrences by checking for the case in which a potentially dirty block is replaced without the victim being properly written back and the case of false hit when the tag parity is incorrect.

## 8.1.11 Bcache Tag Parity Errors—Dstream

- Machine check occurs. Machine state may have changed.
- Cannot be retried, but may only need to delete the process if data is confined to a single process and no second error occurred.

**Note:** The Bcache hit is determined based on the tag alone, not the parity bit. The victim is processed according to the status bits in the tag, ignoring the control field parity. PALcode can distinguish fatal from nonfatal occurrences by checking for the case in which a potentially dirty block is replaced without the victim being properly written back and the case of false hit when the tag parity is incorrect.
- EI\_STAT: <BC\_TPERR> or <BC\_TC\_PERR> is set; <SEO\_HRD\_ERR> is set if there are multiple errors.
- EI\_STAT: <EI\_ES> is clear.
- EI\_STAT: <FIL\_IRD> is clear.
- EI\_ADDR: Contains the physical address bits <39:04> of the octaword associated with the error.
- BC\_TAG\_ADDR: Holds results of external cache tag probe.

## 8.1.12 System Command/Address Parity Error

- Machine check occurs. Machine state may have changed.
- EI\_STAT: <EI\_PAR\_ERR> is set; <SEO\_HRD\_ERR> is set if there are multiple errors.
- EI\_STAT: <EI\_ES> is set.
- EI\_ADDR: Contains the physical address bits <39:04> of the octaword associated with the error.
- BC\_TAG\_ADDR: Holds results of external cache tag probe if external cache was enabled for this transaction.
- When the 21164 detects a command or address parity error, the command is unconditionally NOACKed.

**Note:** For a sysclk-to-CPU clock ratio of 3, if the 21164 detects a system command/address parity error on a NOP, and immediately receives a valid command from the system, then the 21164 may not acknowledge the command. The 21164 does take the machine check.

## 8.1.13 System Read Operations of the Bcache

The 21164 does not check the ECC on outgoing Bcache data. If it is bad, the receiving processor will detect it.

## 8.1.14 Istream or Dstream Correctable ECC Error (Bcache or Memory)

- The 21164 hardware corrects the data before filling the Scache and Icache. The Dcache is completely invalidated. The data in the Bcache contains the ECC error, but is scrubbed by PALcode in the correctable error interrupt routine (using LDx\_L and STx\_C; if the STx\_C fails, the location can be assumed to be scrubbed).
- A separately maskable correctable error interrupt occurs at IPL 31 (same as machine check). It is masked by clearing ICSR<CRDE>.
- ISR: <CRD> is set.
- EI\_STAT: <COR\_ECC\_ERR> is set.
- EI\_STAT: <FIL\_IRD> is set if Istream; is clear if Dstream.
- EI\_STAT: <EI\_ES> is clear if source of error is Bcache; is set otherwise.
- EI\_ADDR: Contains the physical address bits <39:04> of the octaword associated with the error.
- FILL\_SYN: Contains syndrome bits associated with the octaword containing the ECC error.
- BC\_TAG\_ADDR: Unpredictable (not loaded on correctable errors).

**Note:** There will be performance degradation in systems when extremely high rates of correctable ECC errors are present due to the internal handling of this error (the implementation utilizes a replay trap and automatic Dcache flush to prevent use of the incorrect data).

## 8.1.15 Fill Timeout (FILL\_ERROR\_H)

For systems in which fill timeout can occur, the system environment should detect fill timeout and cleanly terminate the reference to the 21164. If the system environment expects fill timeouts to occur, it should detect them. If it does not expect them (as might be true in small systems with fixed memory access timing), it is likely that the internal Ibox timeout will eventually detect a stall if a fill fails to occur. To properly terminate a fill in an error case, the fill\_error\_h pin is asserted for one cycle and the normal fill sequence involving the fill\_h, fill\_id\_h, and dack\_h pins is generated by the system environment.

A fill\_error\_h assertion forces a PALcode trap to the MCHK entry point, but has no other effect.

**Note:** No internal status is saved to show that this happened. If necessary, systems must save this status, and include read operations of the appropriate status registers in the MCHK PALcode.

## 8.1.16 System Machine Check

- The 21164 has a maskable machine check interrupt input pin.
  It is used by system environments to signal fatal errors that are not directly connected to a read access from the 21164. It is masked at IPL 31 and anytime the 21164 is in PALmode.
- ISR: <MCK> is set.

### 8.1.17 Ibox Timeout

- When the Ibox detects a timeout, it causes a PALcode trap to the MCHK entry point.
- Simultaneously, a partial internal reset occurs: most state except IPR state is reset. Systems in which fill timeouts occur in typical use (for example, operating system or console code probing locations to determine if certain hardware is present) should not depend on this mechanism. Its purpose is to attempt to prevent a system hang so that a machine check stack frame can be written.
- ICPERR\_STAT: <TMR> is set.

### 8.1.18 cfail\_h and Not cack\_h

- Assertion of **cfail\_h** in a sysclk cycle in which **cack\_h** is not asserted causes the 21164 to immediately execute a partial internal reset.
- PALcode trap to the MCHK entry point.
- Simultaneously, a partial internal reset occurs: most state except IPR state is reset.
- ICPERR\_STAT: <TMR> is set.

This can be used to restore the 21164 and the external environment to a consistent state after the external environment detects a command or address parity error.

**Note:** There is no internal status saved to differentiate the **cfail\_h**/no **cack\_h** case from the timeout reset case. If necessary, systems must save this status, and include read operations of the appropriate status registers in the MCHK PALcode.

## 8.2 MCHK Flow

The following flow is the recommended IPR access order to determine the source of a machine check.

- Must flush Icache to remove bad data on Istream errors. The Icache refill buffer may be flushed by executing enough instructions to fill the refill buffer with new data (32 instructions). Then flush the Icache again.
- Read EXC\_ADDR.
- If EXC\_ADDR=PAL, then halt.
- Issue MB to clear out Mbox/Cbox before reading Cbox registers or issuing DC\_FLUSH.
- Flush Dcache to remove bad data on Dstream errors.
- Read ICSR.
- Read ICPERR\_STAT.
- Read DCPERR\_STAT.
- Read SC\_ADDR.
- Use register dependencies or MB to ensure read operation of SC\_ADDR finishes before subsequent read operation of SC\_STAT.
- Read SC\_STAT (unlocks SC\_ADDR).
- Read EI\_ADDR, BC\_TAG\_ADDR, and FILL\_SYN.
- Use register dependencies or MB to ensure read operations of EI\_ADDR, BC\_TAG\_ADDR, and FILL\_SYN finish before subsequent read operation of EI\_STAT.
- Read EI\_STAT and save (unlocks EI\_ADDR, BC\_TAG\_ADDR, FILL\_SYN).
- Read EI\_STAT again to be sure it is unlocked, discard result.
- Check for cases that cannot be retried. If any one of the following is true, then skip retry:
  - EI\_STAT<TPERR>
  - EI\_STAT<TC\_PERR>
  - EI\_STAT<EI\_PAR\_ERR>
  - EI\_STAT<SEO\_HRD\_ERR>
  - EI\_STAT<UNC\_ECC\_ERR> and not EI\_STAT<FIL\_IRD>
  - DCPERR\_STAT<LOCK>
  - SC\_STAT<SC\_SCND\_ERR>
  - SC\_STAT<SC\_TPERR>
  - Not (SC\_STAT<CMD> = IRD) and SC\_STAT<SC\_DPERR>
  - ICPERR\_STAT<TMR>
  - ISR<MCK>
- If none of the previous conditions are true, then there is either an IRD that can be retried or the source of the MCHK is a fill\_error\_h. Add code for query of system status.
- The case can be retried if any one or several of the following are true (and none of the previous conditions were true):
  - EI\_STAT<UNC\_ECC\_ERR> and EI\_STAT<FIL\_IRD>
  - SC\_STAT<SC\_DPERR> and (SC\_STAT<CMD> = IRD)
  - ICPERR\_STAT<TPE>
  - ICPERR\_STAT<DPE>
- Unlock the following IPRs:
  - ICPERR\_STAT (write 0x1800)
  - DCPERR\_STAT (write 0x03)
  - VA, SC\_STAT, and EI\_STAT are already unlocked.
- Check for arithmetic exceptions:
  - Read EXC\_SUM.
  - Check for arithmetic errors and handle according to operating-system-specific requirements.
  - Clear EXC\_SUM (unlocks EXC\_MASK).
- Report the processor-uncorrectable MCHK according to operating-system-specific requirements.
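
The retry decision in the flow above can be sketched in code. The following is a minimal Python illustration, not PALcode. It assumes the status bits have already been read from the IPRs in the recommended order and decoded into booleans. The `MchkStatus` type, its field names, and the three result strings are hypothetical names introduced for this example; bit positions within the IPRs are deliberately not modeled.

```python
from dataclasses import dataclass


@dataclass
class MchkStatus:
    """Decoded status bits saved by the MCHK flow (hypothetical field names)."""
    ei_tperr: bool = False        # EI_STAT<TPERR>
    ei_tc_perr: bool = False      # EI_STAT<TC_PERR>
    ei_par_err: bool = False      # EI_STAT<EI_PAR_ERR>
    ei_seo_hrd_err: bool = False  # EI_STAT<SEO_HRD_ERR>
    ei_unc_ecc_err: bool = False  # EI_STAT<UNC_ECC_ERR>
    ei_fil_ird: bool = False      # EI_STAT<FIL_IRD>
    dcperr_lock: bool = False     # DCPERR_STAT<LOCK>
    sc_scnd_err: bool = False     # SC_STAT<SC_SCND_ERR>
    sc_tperr: bool = False        # SC_STAT<SC_TPERR>
    sc_dperr: bool = False        # SC_STAT<SC_DPERR>
    sc_cmd_is_ird: bool = False   # SC_STAT<CMD> = IRD
    icperr_tmr: bool = False      # ICPERR_STAT<TMR>
    icperr_tpe: bool = False      # ICPERR_STAT<TPE>
    icperr_dpe: bool = False      # ICPERR_STAT<DPE>
    isr_mck: bool = False         # ISR<MCK>


def cannot_retry(s: MchkStatus) -> bool:
    """True if any condition from the 'cannot be retried' list holds."""
    return any((
        s.ei_tperr,
        s.ei_tc_perr,
        s.ei_par_err,
        s.ei_seo_hrd_err,
        s.ei_unc_ecc_err and not s.ei_fil_ird,
        s.dcperr_lock,
        s.sc_scnd_err,
        s.sc_tperr,
        s.sc_dperr and not s.sc_cmd_is_ird,
        s.icperr_tmr,
        s.isr_mck,
    ))


def retry_candidate(s: MchkStatus) -> bool:
    """True if any condition from the 'can be retried' list holds."""
    return any((
        s.ei_unc_ecc_err and s.ei_fil_ird,
        s.sc_dperr and s.sc_cmd_is_ird,
        s.icperr_tpe,
        s.icperr_dpe,
    ))


def classify(s: MchkStatus) -> str:
    """Apply the two lists in the order given by the flow."""
    if cannot_retry(s):
        return "no_retry"
    if retry_candidate(s):
        return "retry"
    # Nothing recorded internally: possibly a fill_error_h, so the
    # system-specific status query would go here.
    return "query_system"
```

For example, `classify(MchkStatus(ei_unc_ecc_err=True, ei_fil_ird=True))` falls into the retry case, while the same uncorrectable ECC error with `ei_fil_ird` clear does not. Evaluating the non-retryable list first mirrors the flow's requirement that a retry is attempted only when none of those conditions is true.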
## 8.3 Processor-Correctable Error Interrupt Flow (IPL 31)

The following flow is the recommended way to report correctable errors:

- Arrived here through interrupt routine because ISR<CRD> bit set.
- Read EI\_ADDR and FILL\_SYN.
- Use register dependencies or MB to ensure read operations of EI\_ADDR and FILL\_SYN finish before subsequent read operation of EI\_STAT.
- Read EI\_STAT. (Unlocks EI\_STAT, EI\_ADDR, and FILL\_SYN.)
- Scrub the memory location by using LDQ\_L/STQ\_C to one of the quadwords in each octaword of the Bcache block whose address is reported in EI\_ADDR. There is no need to scrub I/O space addresses because these are noncacheable.
- ACK the CRD interrupt by writing a "0" to HWINT\_CLR<CRDC>.
- No need to unlock any registers because conditions that would cause a lock would also cause a MCHK. VA will not be locked because DTB\_MISS and FAULT PALcode routines will never be interrupted.
- Report the processor-correctable MCHK according to operating-system-specific requirements.

**Note:** Only read EI\_STAT once in the CRD flow, and then only if ISR<CRD> is set. If an uncorrectable error were to occur just after a second read operation from EI\_STAT was issued, then there could be a race between the unlocking of the register and the loading of the new error status, potentially resulting in the loss of the error status.

## 8.4 MCK\_INTERRUPT Flow

- Arrived here through interrupt routine because ISR<MCK> bit set.
- Report the system-uncorrectable MCHK according to operating-system-specific requirements.

## 8.5 System-Correctable Error Interrupt Flow (IPL 20)

The system-correctable error interrupt is system specific.

# Electrical Data

This chapter describes the electrical characteristics of the 21164 component and its interface pins.
It is organized as follows:

- Electrical characteristics
- dc characteristics
- ac characteristics
- Power supply considerations

## 9.1 Electrical Characteristics

Table 9-1 lists the maximum ratings for the 21164.

Table 9-1 Alpha 21164 Absolute Maximum Ratings

| Characteristics | Ratings |
|---|---|
| Storage temperature | -55°C to 125°C (-67°F to 257°F) |
| Junction temperature | 15°C to 85°C (59°F to 185°F) |
| Supply voltage | Vss -0.5 V, Vdd 3.6 V |
| Input or output applied | 3.3 V to 5.5 V |
| Maximum power @Vdd=3.45 V<br>Frequency=TBD MHz | TBD W typical<br>TBD W maximum |

**Caution:** Stress beyond the absolute maximum rating can cause permanent damage to the 21164. Exposure to absolute maximum rating conditions for extended periods of time can affect the 21164 reliability.

## 9.2 dc Characteristics

The 21164 is designed to run in a CMOS/TTL environment. The 21164 is tested and characterized in a CMOS environment.

### 9.2.1 Power Supply

The Vss pins are connected to 0.0 V, and the Vdd pins are connected to 3.3 V, ±5%.

### 9.2.2 Input Signal Pins

Nearly all input signals are ordinary CMOS inputs with standard TTL levels (see Table 9-2). The exception is osc\_clk\_in\_h,l, which is described in Section 9.3.2.

### 9.2.3 Output Signal Pins

Output pins are ordinary 3.3-V CMOS outputs. Although output signals are rail-to-rail, timing is specified to standard TTL levels.

Bidirectional pins are either input or output pins depending on control timing. When functioning as output pins, they are ordinary 3.3-V CMOS outputs.

After power has been applied, input and bidirectional pins can be driven to a maximum dc voltage of 6.3 V (6.8 V for 1 ns) without harming the 21164. (It is not necessary to use static RAMs with 3.3-V outputs.)

Table 9-2 shows the CMOS dc characteristics of the input and output pins.
Table 9-2 CMOS DC Characteristics

| Symbol | Description | Min. | Max. | Units | Test Conditions |
|---|---|---|---|---|---|
| **TTL Inputs/Outputs** | | | | | |
| $V_{ih}$ | High-level input voltage | 2.0 | | V | |
| $V_{il}$ | Low-level input voltage | | 0.8 | V | |
| $V_{oh}$ | High-level output voltage | 2.4 | | V | $I_{oh} = -8.0 \text{ mA}$ |
| $V_{ol}$ | Low-level output voltage | | 0.4 | V | $I_{ol} = 12.0 \text{ mA}$ |
| **Power/Leakage** | | | | | |
| $I_{cin}$ | Clock input leakage | -50 | 50 | $\mu$A | $-0.5 \text{ V} < V_{in} < 5.5 \text{ V}$ |

Most pins have low-current pull-down devices to Vss. However, two pins have a bleeder that pulls up to Vdd. The bleeders are always enabled, even when a pin is in the high-impedance state. This means that some current will flow from the 21164 (if the pin has a pull-up bleeder) or into the 21164 (if the pin has a pull-down device) even when the pin is in the high-impedance state.

The pull-up sources 150 $\mu$A max from Vdd through the signal pin when the pin is at 2.4 V. The pull-down device sinks at least 10 $\mu$A from the signal pin to Vss when the pin is at 0.4 V.

All pins have pull-down devices, except for the pins in the following table:

| Signal Name | Notes |
|---|---|
| tms_h | Has pull-up bleeder |
| tdi_h | Has pull-up bleeder |
| osc_clk_in_h | 50 $\Omega$ to $V_{term}$ ($\approx \frac{V_{dd}}{2}$) |
| osc_clk_in_l | 50 $\Omega$ to $V_{term}$ ($\approx \frac{V_{dd}}{2}$) |
| temp_sense | 150 $\Omega$ to $V_{ss}$ |

The temp\_sense pin must be left unconnected by the user.

## 9.3 ac Characteristics

This section describes the ac timing specifications for the 21164.
Timing parameters are given for the nominal speed 21164 operating at an internal frequency of 294 MHz (3.4 ns).

### 9.3.1 Clocking Scheme

The differential input clock signals osc\_clk\_in\_h,l run at two times the internal frequency of the time base for the 21164. Input clocks are divided by two on chip to generate a 50% duty cycle clock for internal distribution. Signals osc\_clk\_in\_h,l are delayed by some propagation delay and have no relation to output signal cpu\_clk\_out\_h.

System designers have a choice of two system clocking schemes to run the 21164 synchronous to the system:

1. The 21164 generates and drives out a system clock, sys\_clk\_out1\_h,l. It runs synchronous to the internal clock at a selected ratio of the internal clock frequency. There is a small clock skew between the internal clock and sys\_clk\_out1\_h,l.
2. The 21164 synchronizes to a system clock, ref\_clk\_in\_h, supplied by the system. The ref\_clk\_in\_h clock runs at a selected ratio of the 21164 internal clock frequency. The reference clock is synchronized to the internal clock by an on-chip digital phase-locked loop (DPLL).

Refer to Section 4.2 for more information on clock functions.

### 9.3.2 Input Clocks

The differential input clocks osc\_clk\_in\_h,l provide the time base for the chip when dc\_ok\_h is asserted. These pins are self-biasing, and must be capacitively coupled to the clock source on the module, or they can be directly driven. The terminations on these signals are designed to be compatible with system oscillators of arbitrary dc bias. The oscillator must have a duty cycle of 60%/40% or tighter. Figure 9-1 shows the input network and the schematic equivalent of osc\_clk\_in\_h,l terminations.

Figure 9-1 osc\_clk\_in\_h,l Input Network and Terminations

The clock outputs follow the internal ring oscillator when the 21164 is running off the oscillator, just as they would when an external clock is applied.
The frequency of the ring oscillator varies from chip to chip within a range of 10 MHz to 100 MHz. This corresponds to an internal CPU clock frequency range of 5 MHz to 50 MHz. When signal dc\_ok\_h is deasserted, the system clock divisor is forced to 8, and the sys\_clk\_out2 delay is forced to 3.

A special on-chip circuit monitors the osc\_clk\_in pins and detects when input clocks are not present. When activated, this circuit switches the 21164 clock generator from the osc\_clk\_in pins to the internal ring oscillator. This happens independently of the state of the dc\_ok\_h pin. The dc\_ok\_h pin functions normally if clocks are present on the osc\_clk\_in pins.

#### 9.3.2.1 Clock Termination and Impedance Levels

In Figure 9-1, the clock is designed to approximate a 50-$\Omega$ termination for the purpose of impedance matching for those systems that drive input clocks across long traces. The clock input pins appear as a 50-$\Omega$ series termination resistor connected to a high impedance voltage source. The voltage source produces a nominal voltage value of Vdd/2. The source has an impedance of a few thousand ohms. This voltage is called the self-bias voltage and sources current when the applied voltage at the clock input pins is less than the self-bias voltage. It sinks current when the applied voltage exceeds the self-bias voltage.

This high impedance bias driver allows a clock source of arbitrary dc bias to be ac coupled to the 21164. The peak-to-peak amplitude of the clock source must be between 0.6 V and 3.0 V. Either a square-wave or a sinusoidal source may be used. Full-rail clocks may be driven by testers. In any case, the oscillator should be ac coupled to the osc\_clk\_in\_h,l inputs by 47 pF through 220 pF capacitors.

#### 9.3.2.2 ac Coupling

Using series coupling (blocking) capacitors renders the 21164 clock input pins insensitive to the oscillator's dc level.
When connected this way, oscillators with any dc offset relative to Vss can be used provided they can drive a signal into the osc\_clk\_in\_h,l pins with a peak-to-peak level of at least 600 mV, but no greater than 3.0 V peak to peak.

The value of the coupling capacitor is not overly critical. However, it should be sufficiently low impedance at the clock frequency so that the oscillator's output signal (when measured at the osc\_clk\_in\_h,l pins) is not attenuated below the 600 mV peak-to-peak lower limit. For sine waves or oscillators producing nearly sinusoidal (pseudo square wave) outputs, 220 pF is recommended at 250 MHz. A high quality dielectric such as NPO is required to avoid dielectric losses.

Table 9-3 shows the input clock specification.

Table 9-3 Input Clock Specification

| Signal Parameter | Nominal Bin<sup>1</sup> | Unit |
|---|---|---|
| osc_clk_in_h,l symmetry | 50 ± 10 | % |
| osc_clk_in_h,l minimum voltage | 0.6 | V (peak-to-peak) |
| osc_clk_in_h,l Z input | 50 | $\Omega$ |

<sup>1</sup>Minimum clock frequency = 10.0 MHz (if lower, then ring oscillator cuts in). Maximum clock frequency = TBD MHz.

### 9.3.3 Signal Characteristics

All 21164 input signals are TTL compatible with the exception of the osc\_clk\_in\_h,l signals (see Table 9-3). All output signals are TTL compatible.

### 9.3.4 Backup Cache Loop Timing

The 21164 can be configured to support an optional off-chip backup cache (Bcache). Private Bcache read or write (Scache victims) transactions initiated by the 21164 are independent of the system clocking scheme. Bcache loop timing must be an integer multiple of the 21164 cycle time. Table 9-4 lists the Bcache loop timing.
Table 9-4 Bcache Loop Timing

| Signal | Specification | Value | Name |
|---|---|---|---|
| data_h<127:0> | Input setup | 1.1 ns | Tdsu |
| data_h<127:0> | Input hold | 0.0 ns | Tdh |
| index_h<25:4> | Output delay | Tdd + 0.4 ns<sup>1</sup> | Tiod |
| index_h<25:4> | Output hold time | Tmdd | Tioh |
| data_h<127:0> | Output delay | Tdd + Tcycle + 0.4 ns<sup>1</sup> | Tdod |
| data_h<127:0> | Output hold | Tmdd + Tcycle | Tdoh |

<sup>1</sup>The value 0.4 ns accounts for on-chip driver delay and clock skew.

Outgoing Bcache index and data signals are driven off the internal clock edge and the incoming Bcache tag and data signals are latched on the same internal clock edge.

Table 9-5 shows the output driver characteristics.

Table 9-5 Output Driver Characteristics

| Specification | 40 pF Load | 10 pF Load | Name |
|---|---|---|---|
| Maximum driver delay | 2.6 ns | | |
| Minimum driver delay | 1.0 ns | 1.0 ns | Tmdd |

Output pin timing is specified for lumped 40-pF and 10-pF loads. In some cases the circuit may have loads higher than 40 pF. The 21164 can safely drive higher loads provided the average charging or discharging current from each pin is 10 mA or less. The following equation can be used to determine the maximum capacitance that can be safely driven by each pin:

$C_{max} = 3t$

where $C_{max}$ is in picofarads and $t$ is the waveform period (measured from rising to rising or falling to falling edge) in nanoseconds. For example, if the waveform appearing on a given I/O pin has a 20.4-ns period, it can safely drive up to and including 61 pF.

Figure 9-2 shows the Bcache read and write timing.

#### 9.3.4.1 sys\_clk-Based Systems

Table 9-6 shows 21164 system clock sys\_clk\_out1\_h,l output timing. All timing is shown in conjunction with the rising edge of the internal CPU clock.
This allows the setup and hold times to be specified independent of the relative capacitive loading of sys\_clk\_out1\_h,l, addr\_h<39:4>, data\_h<127:0>, and cmd\_h<3:0> signals. The ref\_clk\_in\_h signal must be tied to Vdd for proper operation.

Table 9-6 Alpha 21164 System Clock Output Timing (sysclk=T<sub>Ø</sub>)

| Signal | Specification | Value | Name |
|---|---|---|---|
| sys_clk_out1_h,l | Output delay | Tdd | Tsysd |
| sys_clk_out1_h,l | Minimum output delay | Tmdd | Tsysdm |
| data_bus_req_h,<br>data_h<127:0>,<br>addr_h<39:4> | Input setup | 1.1 ns | Tdsu |
| data_bus_req_h,<br>data_h<127:0>,<br>addr_h<39:4> | Input hold | 0 ns | Tdh |
| addr_h<39:4> | Output delay | Tdd + 0.4 ns<sup>1</sup> | Taod |
| addr_h<39:4> | Output hold time | Tmdd | Taoh |
| data_h<127:0> | Output delay | Tdd + Tcycle + 0.4 ns<sup>1</sup> | Tdod<sup>2</sup> |
| data_h<127:0> | Output hold time | Tmdd + Tcycle<sup>1</sup> | Tdoh<sup>2</sup> |
| **Non-Turbo Mode** | | | |
| addr_bus_req_h | Input setup | 3.8 ns | Tabrsu |
| addr_bus_req_h | Input hold | -1.0 ns | Tabrh |
| dack_h | Input setup | 3.4 ns | Tntacksu |
| cack_h | Input setup | 3.7 ns | Tntcacks |
| cack_h, dack_h | Input hold | -1.0 ns | Tntackh |
| **Turbo Mode<sup>3</sup>** | | | |
| addr_bus_req_h,<br>cack_h, dack_h | Input setup | 1.1 ns | Ttacksu |
| addr_bus_req_h,<br>cack_h, dack_h | Input hold | 0 ns | Ttackh |

<sup>1</sup>The value 0.4 ns accounts for on-chip driver delay and clock skew.

<sup>2</sup>For all write transactions initiated by the 21164, data is driven one CPU cycle later.

<sup>3</sup>In turbo mode, control signals are piped on chip for one sys\_clk\_out1\_h,l before usage.

Figure 9-3 shows sys\_clk system timing.
Figure 9-3 sys\_clk System Timing

#### 9.3.4.2 Reference Clocks

Systems that generate their own system clock expect the 21164 to synchronize its sys\_clk\_out1\_h,l outputs to their system clock. The 21164 uses a digital phase-locked loop (DPLL) to synchronize its sys\_clk\_out1 signals to the system clock that is applied to the ref\_clk\_in\_h signal. The DPLL scheme requires the internal CPU clock to run slightly faster than the clock that is applied to the ref\_clk\_in\_h signal.

Phase locking is accomplished as follows. The internal CPU clock is forced to stall for one phase whenever the rising edge of ref\_clk\_in\_h occurs just before the rising edge of the internal CPU clock that triggers the rising edge of sys\_clk\_out1\_h.

Table 9-7 shows all timing in conjunction with the rising edge of ref\_clk\_in\_h.

Table 9-7 Alpha 21164 Reference Clock Input Timing

| Signal | Specification | Value | Name |
|---|---|---|---|
| data_bus_req_h,<br>data_h<127:0>,<br>addr_h<39:4> | Input setup | 1.1 ns | Tdsu |
| data_bus_req_h,<br>data_h<127:0>,<br>addr_h<39:4> | Input hold | 0.5 × Tcycle | Tsdadh |
| addr_h<39:4> | Output delay | Tdd + 0.5 × Tcycle + 0.9 ns<sup>1</sup> | Traod |
| addr_h<39:4> | Output hold time | Tmdd | Traoh |
| data_h<127:0> | Output delay | Tdd + 1.5 × Tcycle + 0.9 ns<sup>1</sup> | Trdod<sup>2</sup> |
| data_h<127:0> | Output hold time | Tmdd + Tcycle | Trdoh<sup>2</sup> |
| **Non-Turbo Mode** | | | |
| addr_bus_req_h | Input setup | 3.8 ns | Tntrabrsu |
| addr_bus_req_h | Input hold | 0.5 × Tcycle | Tntrabrh |
| dack_h | Input setup | 3.3 ns | Tntracksu |
| cack_h | Input setup | 3.7 ns | Tntrcacksu |
| cack_h, dack_h | Input hold | 0.5 × Tcycle | Tntrackh |
| **Turbo Mode<sup>3</sup>** | | | |
| addr_bus_req_h,<br>cack_h, dack_h | Input setup | 1.1 ns | Ttracksu |
| addr_bus_req_h,<br>cack_h, dack_h | Input hold | 0.5 × Tcycle | Ttrackh |

<sup>1</sup>The value 0.9 ns accounts for on-chip skews that include 0.4 ns for driver delay and clock skew, and phase detector skews due to circuit delay (0.2 ns) and delay in ref\_clk\_in\_h due to the package (0.3 ns).

<sup>2</sup>For all write transactions initiated by the 21164, data is driven one CPU cycle later.

<sup>3</sup>In turbo mode, control signals are piped on chip for one sys\_clk\_out1\_h,l before usage.

#### 9.3.4.3 Digital Phase Locked Loop

Figure 9-4 and Table 9-8 describe the digital phase-locked loop (DPLL) stages of operation.

Figure 9-4 ref\_clk System Timing: Relationship of CPU Clock, ref\_clk\_in, and sys\_clk\_out1

Table 9-8 ref\_clk System Timing Stages

| Stage | Description |
|---|---|
| 1 | The internal CPU clock rising edge coincides with the rising edge of ref_clk_in_h. |
| 2 | The DPLL causes the internal CPU clock to stretch for one phase (1 cycle of osc_clk_in_h,l). |
| 3 | The stretch causes ref_clk_in_h to lead the internal CPU clock by one phase. |
| 4 | The CPU clock is always slightly faster than the external ref_clk_in_h and gains on ref_clk_in_h over time. Eventually the gain equals one phase and a new stretch phase follows. |

Although systems that supply a ref\_clk\_in\_h do not use sys\_clk\_out1\_h,l, a relationship between the two signals exists, just as in the sys\_clk-based systems, because the 21164 uses sys\_clk\_out1\_h,l internally to determine timing during system transactions.
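
As a worked example of the output delay expressions in Table 9-6 and Table 9-7, the sketch below evaluates them numerically. It assumes Tcycle = 3.4 ns (the nominal 294-MHz part) and takes Tdd to be the 2.6-ns maximum driver delay of Table 9-5. That mapping, and the function and parameter names, are assumptions made for illustration rather than values stated by the tables.

```python
def output_delays_ns(tdd_ns, tcycle_ns, ref_clk=False):
    """Return (address output delay, data output delay) in nanoseconds.

    ref_clk=False evaluates the sys_clk-based expressions (Taod, Tdod);
    ref_clk=True evaluates the ref_clk-based expressions (Traod, Trdod).
    """
    if ref_clk:
        # 0.9 ns = 0.4 ns driver delay and clock skew
        #        + 0.2 ns phase detector circuit delay
        #        + 0.3 ns ref_clk_in_h delay due to the package
        skew_ns = 0.4 + 0.2 + 0.3
        return (tdd_ns + 0.5 * tcycle_ns + skew_ns,
                tdd_ns + 1.5 * tcycle_ns + skew_ns)
    # sys_clk-based systems: 0.4 ns for on-chip driver delay and clock skew
    skew_ns = 0.4
    return (tdd_ns + skew_ns, tdd_ns + tcycle_ns + skew_ns)


TDD_NS = 2.6     # assumed: maximum driver delay into a 40-pF load
TCYCLE_NS = 3.4  # nominal CPU cycle time at 294 MHz

for name, ref in (("sys_clk", False), ("ref_clk", True)):
    addr_ns, data_ns = output_delays_ns(TDD_NS, TCYCLE_NS, ref)
    print(f"{name}: address {addr_ns:.1f} ns, data {data_ns:.1f} ns")
```

Under these assumptions the address and data output delays come to 3.0 ns and 6.4 ns for sys\_clk-based systems, and 5.2 ns and 8.6 ns for ref\_clk-based systems. The difference reflects the extra half cycle and the additional 0.5 ns of phase detector and package skew in the reference clock scheme.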
#### 9.3.4.4 Timing: Additional Signals

This section lists timing for all other signals.

##### Asynchronous Input Signals

The following is a list of the asynchronous input signals:

- ref\_clk\_in\_h
- sys\_reset\_l
- perf\_mon\_h
- clk\_mode\_h
- dc\_ok\_h
- irq\_h
- sys\_mch\_chk\_irq\_h
- pwr\_fail\_irq\_h
- mch\_hlt\_irq\_h

##### Miscellaneous Signals

Table 9-9 and Table 9-10 list the timing for miscellaneous input-only and output-only signals. All timing is expressed in nanoseconds.

Table 9-9 Input Timing for sys\_clk\_out- or ref\_clk\_in-Based Systems

| Signal | Specification | Value (sys_clk_out) | Value (ref_clk_in) | Name (sys_clk_out) | Name (ref_clk_in) |
|---|---|---|---|---|---|
| cfail_h, fill_h, fill_error_h, fill_id_h, fill_nocheck_h, idle_bc_h, shared_h, system_lock_flag_h<br>Testability pins: port_mode_h, srom_data_h, srom_present_l | Input setup | 1.1 ns | 1.1 ns | T<sub>dsu</sub> | T<sub>dsu</sub> |
| cfail_h, fill_h, fill_error_h, fill_id_h, fill_nocheck_h, idle_bc_h, shared_h, system_lock_flag_h<br>Testability pins: port_mode_h, srom_data_h, srom_present_l | Input hold | 0 ns | 0.5*T<sub>cycle</sub> | T<sub>dh</sub> | T<sub>sdadh</sub> |

Table 9-10 Output Timing for sys\_clk\_out- or ref\_clk\_in-Based Systems

| Signal | Specification | Value (sys_clk_out) | Value (ref_clk_in) | Name (sys_clk_out) | Name (ref_clk_in) |
|---|---|---|---|---|---|
| **Unidirectional Signals** | | | | | |
| addr_res_h,<br>int4_valid_h,<sup>1</sup><br>scache_set_h,<br>srom_clk_h,<br>srom_oe_l,<br>victim_pending_h | Output delay | T<sub>dd</sub>+0.4 ns | T<sub>dd</sub>+0.5*T<sub>cycle</sub>+0.9 ns | T<sub>aod</sub> | T<sub>raod</sub> |
| addr_res_h,<br>int4_valid_h,<sup>1</sup><br>scache_set_h,<br>srom_clk_h,<br>srom_oe_l,<br>victim_pending_h | Output hold | T<sub>mdd</sub> | T<sub>mdd</sub> | T<sub>aoh</sub> | T<sub>raoh</sub> |
| int4_valid_h<sup>2</sup> | Output delay | T<sub>dd</sub>+T<sub>cycle</sub>+0.4 ns | T<sub>dd</sub>+1.5*T<sub>cycle</sub>+0.9 ns | T<sub>dod</sub> | T<sub>rdod</sub> |
| int4_valid_h<sup>2</sup> | Output hold | T<sub>mdd</sub>+T<sub>cycle</sub> | T<sub>mdd</sub>+T<sub>cycle</sub> | T<sub>doh</sub> | T<sub>rdoh</sub> |
| **Bidirectional Signals, input mode** | | | | | |
| addr_cmd_par_h,<br>cmd_h,<br>data_check_h,<sup>1</sup><br>tag_ctl_par_h,<sup>3</sup><br>tag_dirty_h,<sup>3</sup><br>tag_shared_h<sup>3</sup> | Input setup | 1.1 ns | 1.1 ns | T<sub>dsu</sub> | T<sub>dsu</sub> |
| addr_cmd_par_h,<br>cmd_h,<br>data_check_h,<sup>1</sup><br>tag_ctl_par_h,<sup>3</sup><br>tag_dirty_h,<sup>3</sup><br>tag_shared_h<sup>3</sup> | Input hold | 0 ns | 0.5*T<sub>cycle</sub> | T<sub>dh</sub> | T<sub>sdadh</sub> |
| **Bidirectional Signals, output mode** | | | | | |
| addr_cmd_par_h,<br>cmd_h,<br>tag_ctl_par_h,<sup>4</sup><br>tag_dirty_h,<sup>4</sup><br>tag_shared_h,<sup>4</sup><br>tag_valid_h<sup>4</sup> | Output delay | T<sub>dd</sub>+0.4 ns | T<sub>dd</sub>+0.5*T<sub>cycle</sub>+0.9 ns | T<sub>aod</sub> | T<sub>raod</sub> |
| data_check_h<sup>2</sup> | Output delay | T<sub>dd</sub>+T<sub>cycle</sub>+0.4 ns | T<sub>dd</sub>+1.5*T<sub>cycle</sub>+0.9 ns | T<sub>dod</sub> | T<sub>rdod</sub> |
| addr_cmd_par_h,<br>cmd_h,<br>tag_ctl_par_h,<sup>4</sup><br>tag_dirty_h,<sup>4</sup><br>tag_shared_h,<sup>4</sup><br>tag_valid_h<sup>4</sup> | Output hold | T<sub>mdd</sub> | T<sub>mdd</sub> | T<sub>aoh</sub> | T<sub>raoh</sub> |
| data_check_h<sup>2</sup> | Output hold | T<sub>mdd</sub>+T<sub>cycle</sub> | T<sub>mdd</sub>+T<sub>cycle</sub> | T<sub>doh</sub> | T<sub>rdoh</sub> |

<sup>1</sup>Read transaction

<sup>2</sup>Write transaction

<sup>3</sup>Fills from memory

<sup>4</sup>Only for write broadcasts and system transactions

Signals in Table 9-11 are used to control Bcache data transfers. These signals are driven off the CPU clock. The choice of sys\_clk\_out or ref\_clk\_in has no impact on the timing of these signals.

Table 9-11 Bcache Control Signal Timing

| Signal | Specification | Value | Name |
|---|---|---|---|
| **Input mode** | | | |
| tag_data_h, tag_data_par_h,<br>tag_valid_h | Input setup | 1.1 ns | T<sub>dsu</sub> |
| tag_data_h, tag_data_par_h,<br>tag_valid_h | Input hold | 0 ns | T<sub>dh</sub> |
| **Output mode** | | | |
| data_ram_oe_h, data_ram_we_h,<sup>1</sup><br>tag_ram_oe_h, tag_ram_we_h<sup>1</sup> | Output delay | T<sub>dd</sub>+0.4 ns | T<sub>aod</sub> |
| tag_data_h, tag_data_par_h,<br>tag_valid_h | Output delay | T<sub>dd</sub>+0.4 ns | T<sub>aod</sub> |
| data_ram_oe_h, data_ram_we_h,<sup>1</sup><br>tag_ram_oe_h, tag_ram_we_h<sup>1</sup> | Output hold | T<sub>mdd</sub> | T<sub>aoh</sub> |
| tag_data_h, tag_data_par_h,<br>tag_valid_h | Output hold | T<sub>mdd</sub> | T<sub>aoh</sub> |

<sup>1</sup>Pulse width for this signal is controlled through the BC\_CONFIG IPR.

### 9.3.5 Clock Test Modes

This section describes the 21164 clock test modes.
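
As an overview of the four modes described in the subsections that follow, the sketch below maps the two clk\_mode\_h bits to a mode name and an input clock divider. It is a minimal Python illustration: the function names and the lookup table are hypothetical, and the clock test reset mode is given no divider because the text does not state one.

```python
# Keys are (clk_mode_h<0>, clk_mode_h<1>); values are (mode name, divider).
CLOCK_MODES = {
    (0, 0): ("Normal", 2),
    (1, 0): ("Chip test", 1),
    (0, 1): ("Module test", 4),
    (1, 1): ("Clock reset", None),  # sys_clk_out generator is reset instead
}


def clock_mode(clk_mode0, clk_mode1):
    """Return the mode name selected by clk_mode_h<0> and clk_mode_h<1>."""
    return CLOCK_MODES[(clk_mode0, clk_mode1)][0]


def internal_clock_mhz(osc_mhz, clk_mode0, clk_mode1):
    """Return the frequency sent to the chip clock driver, or None in reset mode."""
    divider = CLOCK_MODES[(clk_mode0, clk_mode1)][1]
    if divider is None:
        return None
    return osc_mhz / divider


# Normal mode: a 588-MHz oscillator yields the nominal 294-MHz internal clock.
print(clock_mode(0, 0), internal_clock_mhz(588.0, 0, 0))
# Chip test mode: a 294-MHz tester clock runs the chip at full speed.
print(clock_mode(1, 0), internal_clock_mhz(294.0, 1, 0))
```

The two calls illustrate why chip test mode exists: the tester only has to supply half the oscillator frequency that normal mode requires for the same internal clock.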
#### 9.3.5.1 Normal Mode

When the clk\_mode\_h<1:0> signals are not asserted, the osc\_clk\_in\_h,l frequency is divided by 2. This is the normal operational mode of the clock circuitry.

#### 9.3.5.2 Chip Test Mode

To lower the maximum frequency that the chip manufacturing tester is required to supply, a divide-by-1 mode has been designed into the clock generator circuitry. When the clk\_mode\_h<0> signal is asserted and clk\_mode\_h<1> is not asserted, the clock frequency that is applied to the input clock signals osc\_clk\_in\_h,l bypasses the clock divider and is sent to the chip clock driver. This allows the chip internal circuitry to be tested at full speed with a one-half frequency (up to 294 MHz) osc\_clk\_in\_h,l.

#### 9.3.5.3 Module Test Mode

When the clk\_mode\_h<0> signal is not asserted and clk\_mode\_h<1> is asserted, the clock frequency that is applied to the input clock signals osc\_clk\_in\_h,l is divided by 4 and is sent to the chip clock driver. The digital phase-locked loop (DPLL) continues to keep the on-chip sys\_clk\_out1\_h,l locked to ref\_clk\_in\_h within the normal limits if a ref\_clk\_in\_h signal is applied (0 ns to 1 osc\_clk\_in\_h,l cycle after ref\_clk\_in\_h).

#### 9.3.5.4 Clock Test Reset Mode

When both the clk\_mode\_h<0> and the clk\_mode\_h<1> signals are asserted, the sys\_clk\_out generator circuit is forced to reset to a known state. This allows the chip manufacturing tester to synchronize the chip to the tester cycle.

Table 9-12 lists the test modes.

Table 9-12 Test Modes

| Mode | clk_mode_h<0> | clk_mode_h<1> |
|---|---|---|
| Normal | 0 | 0 |
| Chip test | 1 | 0 |
| Module test | 0 | 1 |
| Clock reset | 1 | 1 |

### 9.3.6 Test Configuration

All input timing is specified in conjunction with the crossing of standard TTL input levels of 0.8 V and 2.0 V. Output timing is to the nominal CMOS switch point of Vdd/2.
Because the speed and complexity of microprocessors have increased substantially over the years, it is necessary to change the way they are tested. The traditional assumption that all loads can be lumped into some accumulation of capacitance no longer holds. Rather, the model of a transmission line with discrete loads is a much more realistic approach for current test technology.

Typically, printed circuit board (PCB) etch has a characteristic impedance of approximately 75 $\Omega$. This may vary from 60 $\Omega$ to 90 $\Omega$ with tolerances. If the line is driven in the electrical center, the load could be as low as 30 $\Omega$. Therefore, a characteristic impedance range of 30 $\Omega$ to 90 $\Omega$ could be experienced.

The 21164 output drivers are designed with typical printed circuit board applications in mind rather than trying to accommodate a 40-pF test load specification. As such, each driver "launches" a voltage step into a characteristic impedance ranging from 30 $\Omega$ to 90 $\Omega$.

To prevent signal quality problems due to overshoot or ringing, "near end" terminated transmission line design rules are used. By combining the source impedance of the driver transistors with an additional 20-$\Omega$ resistor, a source impedance of approximately 40 $\Omega$ is achieved. Additionally, a load value of 10 pF, when added to the PCB etch delays, provides a realistic estimate of actual system timing.

When employing this test configuration, the signal at the end of the line will transition cleanly through the TTL input specification range of 0.8 V to 2.0 V without plateaus or reversal into the range.

### 9.3.7 IEEE 1149.1 Performance

Table 9-13 lists the standard mandated performance specifications for the IEEE 1149.1 circuits.
Table 9-13 IEEE 1149.1 Circuit Performance Specifications

| Item | Specification |
|------|---------------|
| trst_l is asserted asynchronously and deasserted synchronously with respect to TBD. | TBD |
| Maximum acceptable tck_h clock frequency | 16.6 MHz |
| tdi_h/tms_h setup time (referenced to tck_h rising edge) | 4 ns |
| tdi_h/tms_h hold time (referenced to tck_h rising edge) | 4 ns |
| Maximum propagation delay at pin tdo_h (referenced to tck_h falling edge) | 14 ns |
| Maximum propagation delay at system output pins (referenced to tck_h falling edge) | 20 ns |

## 9.4 Power Supply Considerations

For correct operation of the 21164, all of the Vss pins must be connected to ground and all of the Vdd pins must be connected to a 3.3 V $\pm$ 5% power source. This source voltage should be guaranteed (even under transient conditions) at the 21164 pins, and not just at the PCB edge. Plus 5 V is not used in the 21164.

The voltage difference between the Vdd pins and Vss pins must never be greater than 3.6 V. If the differential exceeds this limit, the 21164 chip will be damaged.

#### 9.4.1 Decoupling

The effectiveness of decoupling capacitors depends on the amount of inductance placed in series with them. The inductance depends both on the capacitor style (construction) and on the module design. In general, the use of small, high frequency capacitors placed close to the chip package's power and ground pins with very short module etch will give best results. Depending on the user's power supply and power supply distribution system, bulk decoupling may also be required on the module.

Each individual case must be separately analyzed, but generally designers should plan to use at least 6 $\mu$F of capacitance. Typically, 40 to 60 small, high frequency 0.1 $\mu$F capacitors are placed near the chip's Vdd/Vss pins.
Placing the capacitors in the pin field itself is the best approach. Several tens of $\mu$F of bulk decoupling (composed of tantalum and ceramic capacitors) should be positioned near the 21164 chip.

Use capacitors that are as physically small as possible. Connect the capacitors directly to the 21164 Vdd and Vss pins (or to their own down by way of the power and ground plane) by short (0.64 cm [0.25 in] or less) surface etch. The small capacitors generally have better electrical characteristics than the larger units, and will more readily fit close to the IPGA pin field.

#### 9.4.2 Power Supply Sequencing

Although the 21164 uses a 3.3 V (nominal) power source, most of the other logic on the PCB probably requires a 5-V power supply. These 5-V devices can damage the 21164's I/O circuits if the 5-V power source powering the PCB logic and the Vdd supply feeding the 21164 are not sequenced correctly.

**Caution**

To avoid damaging the 21164's I/O circuits, the I/O pin voltages must not exceed 4 V until the Vdd supply is 3 V or greater.

This rule can be satisfied if the Vdd and the 5-V supplies come up together, or if the Vdd supply comes up before the 5-V supply is asserted. Bringing the lower voltage up before the higher voltage is the opposite of the way that CMOS systems with multiple power supplies of different voltages are usually sequenced, but it is required for the 21164.

A three-terminal voltage regulator can be used to make 3.3-V Vdd from the 5-V supply, provided the output of the regulator (Vdd) tracks the 5-V supply with only a small offset. The requirement is that when the 5-V supply reaches 4 V, Vdd must be 3 V or higher. While the 5-V supply is below 4 V, Vdd can be less than 3 V.
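The sequencing rule lends itself to a mechanical check against a recorded or simulated power-up ramp. The following Python sketch is illustrative only: the function names, the sample format, and the fixed-offset regulator model are assumptions made for this example and are not part of the 21164 specification. It flags every sample at which the 5-V rail exceeds 4 V while Vdd is still below 3 V, assuming the worst case in which 5-V logic drives the I/O pins to the full 5-V rail.

```python
# Illustrative check of the power sequencing rule (helper names are hypothetical).
# Voltages are integer millivolts to avoid floating-point comparison surprises.

IO_LIMIT_MV = 4000      # I/O pins must not exceed 4 V ...
VDD_MIN_MV = 3000       # ... until Vdd is at least 3 V
VDD_NOMINAL_MV = 3300   # nominal Vdd


def sequencing_violations(samples):
    """Return the (v5_mv, vdd_mv) samples that break the rule.

    Worst-case assumption: 5-V devices drive the 21164 I/O pins
    all the way to the 5-V rail.
    """
    return [(v5, vdd) for v5, vdd in samples
            if v5 > IO_LIMIT_MV and vdd < VDD_MIN_MV]


def tracking_regulator_ramp(dropout_mv, step_mv=100):
    """Model Vdd derived from the 5-V rail with a fixed offset (assumed model)."""
    ramp = []
    for v5 in range(0, 5000 + step_mv, step_mv):
        vdd = min(VDD_NOMINAL_MV, max(0, v5 - dropout_mv))
        ramp.append((v5, vdd))
    return ramp


if __name__ == "__main__":
    for dropout in (900, 1500):
        bad = sequencing_violations(tracking_regulator_ramp(dropout))
        print(f"offset {dropout} mV: {len(bad)} violating samples")
```

In this model, a regulator whose output stays within 1 V of the 5-V rail satisfies the rule, while a larger offset leaves a window in which the 5-V rail is above 4 V and Vdd is still below 3 V.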
All 5-V sources on the 21164's I/O pins should be disabled if the power supply sequencing is such that the 5-V supply will exceed 4 V before Vdd is at least 3 V. The 5-V sources should remain disabled until the Vdd power supply is equal to or greater than 3 V.

Disabling all 5-V sources can be very difficult because there are so many possible sneak paths. For example, inputs on bipolar TTL logic can be a source of current, and can put a voltage on a 21164 I/O pin that is high enough to violate the rule (no higher than 4 V until Vdd is at least 3 V). TTL outputs are specified to drive a logic one to at least 2.4 V, but usually drive voltages much higher. CMOS logic and CMOS SRAMs usually drive "full rail" signals that match the value of the 5-V power supply.

Another concern is parallel (dc) terminations or pull-ups connected between the 21164 and the 5-V supply. The Vdd supply should be used to power parallel terminations.

Disabling the non-21164 5-V outputs of PCB logic is generally possible, but raises the PCB complexity and can reduce system performance by increasing critical path timing. If the 5-V logic device has an enable pin, circuits (such as power supply supervisor chips) on the PCB can monitor the Vdd and 5-V supplies. When the supervisor circuit detects that the 5-V supply is increasing from zero while the Vdd supply is below 3 V, it produces a disable signal to force all PCB logic with 5-V outputs into the high impedance state. This technique will not prevent bipolar TTL inputs from acting as a 5-V source, but it can be used to disable sources such as cache RAM outputs.

# 10 Thermal Management

This chapter describes the 21164 thermal management and thermal design considerations.

## 10.1 Thermal Specifications

Sections 10.1.1 and 10.1.2 specify the 21164 operating temperature and thermal resistance.

#### 10.1.1 Operating Temperature

The 21164 is specified to operate when the temperature at the center of the heat sink $(T_c)$ is 82°C.
Temperature $(T_c)$ should be measured at the center of the heat sink (between the two package studs). The Grafoil pad is the interface material between the package and the heat sink.

#### 10.1.2 Thermal Resistance

The following equations define the junction-to-ambient and junction-to-heat-sink thermal resistance values:

$$
\begin{aligned}
\theta_{ja} &= \frac{T_j - T_a}{P} \\
\theta_{jhs} &= \frac{T_j - T_c}{P} \\
\theta_{ja} &= \theta_{jhs} + \theta_{hsa} \\
T_j &= T_a + P \cdot \theta_{ja}
\end{aligned}
$$

The symbols in the previous equations are defined as follows:

- $\theta_{ja}$ is the junction-to-ambient thermal resistance (°C/W).
- $\theta_{jhs}$ is the junction-to-heat-sink thermal resistance (°C/W).
- $\theta_{hsa}$ is the heat-sink-to-ambient thermal resistance (°C/W).
- $T_j$ is the maximum junction temperature (°C).
- $T_a$ is the ambient temperature (°C).
- $T_c$ is the heat-sink temperature at a predefined location (°C).
- $P$ is the power dissipation (W).

Table 10–1 lists the values for the center of heat-sink-to-ambient $(\theta_{ca})$ for the 499-pin grid array. Table 10–2 shows the allowable $T_a$ (without exceeding $T_c$) at various airflows.

**Note**

Digital recommends using the heat sink because it greatly improves the ambient temperature requirement.
Table 10–1 $\theta_{ca}$ at Various Airflows

| Airflow (ft/min) | 100 | 200 | 400 | 600 | 800 | 1000 |
|------------------|-----|-----|-----|-----|-----|------|
| $\theta_{ca}$ with heat sink #1 (°C/W) | 2.30 | 1.30 | 0.70 | 0.53 | 0.45 | 0.41 |
| $\theta_{ca}$ with heat sink #2 (°C/W) | 1.25 | 0.75 | 0.48 | 0.40 | 0.35 | 0.32 |

Frequency: 266 MHz

Table 10–2 Maximum $T_a$ at Various Airflows

| Airflow (ft/min) | 100 | 200 | 400 | 600 | 800 | 1000 |
|------------------|-----|-----|-----|-----|-----|------|
| $T_a$ with heat sink #1 (°C) | - | 23.5 | 50.5 | 58.2 | 61.8 | 63.6 |
| $T_a$ with heat sink #2 (°C) | 25.8 | 48.3 | 60.4 | 64.0 | 66.3 | 67.6 |

Frequency: 266 MHz
Power: 45 W

## 10.2 Heat Sink Specifications

Two heat sinks are specified. Heat sink type #1 mounting holes are in line with the cooling fins. Heat sink type #2 mounting holes are rotated 90° from the cooling fins. The heat sink composition is aluminum alloy 6063. Type #1 heat sink is shown in Figure 10-1, and type #2 heat sink is shown in Figure 10-2, along with their approximate dimensions.

Figure 10-1 Type #1 Heat Sink

Figure 10-2 Type #2 Heat Sink

## 10.3 Thermal Design Considerations

Follow these guidelines for printed circuit board (PCB) component placement:

- Orient the 21164 on the PCB with the heat sink fins aligned with the airflow direction.
- Avoid preheating ambient air. Place the 21164 on the PCB so that inlet air is not preheated by any other PCB components.
- Do not place other high power devices in the vicinity of the 21164.
- Do not restrict the airflow across the 21164 heat sink. Placement of other devices must allow for maximum system airflow in order to maximize the performance of the heat sink.
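The relationship between Table 10-1 and Table 10-2 can be sketched in a few lines of Python. The formula used here, $T_a(\text{max}) = T_c - P \cdot \theta_{ca}$, is an inference: it follows the form of the thermal resistance equations in Section 10.1.2 and reproduces the tabulated values, but the manual does not state it explicitly. The function and variable names are hypothetical.

```python
# Sketch: maximum ambient temperature from theta_ca, ASSUMING
# T_a(max) = T_c - P * theta_ca (inferred from the tables, not stated in the text).

T_C_MAX_C = 82.0   # heat-sink center temperature from Section 10.1.1
POWER_W = 45.0     # power listed under Table 10-2

AIRFLOWS_FT_MIN = (100, 200, 400, 600, 800, 1000)
THETA_CA = {       # degC/W, copied from Table 10-1
    "heat sink #1": (2.30, 1.30, 0.70, 0.53, 0.45, 0.41),
    "heat sink #2": (1.25, 0.75, 0.48, 0.40, 0.35, 0.32),
}


def max_ambient_c(theta_ca, power_w=POWER_W, t_c_max=T_C_MAX_C):
    """Highest ambient temperature that keeps the heat sink center at t_c_max."""
    return t_c_max - power_w * theta_ca


if __name__ == "__main__":
    for sink, thetas in THETA_CA.items():
        for airflow, theta in zip(AIRFLOWS_FT_MIN, thetas):
            print(f"{sink} at {airflow} ft/min: {max_ambient_c(theta):.2f} degC")
```

For heat sink #1 at 200 ft/min this gives 82 - 45 × 1.30 = 23.5°C, which agrees with Table 10-2. At 100 ft/min the same heat sink yields a negative ambient temperature, which is consistent with the empty entry in the table.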
# 11 Mechanical Data and Packaging Information

This chapter describes the Alpha 21164 microprocessor mechanical packaging, including chip package physical specifications and a signal/pin list. For heat sink dimensions, refer to Chapter 10.

## 11.1 Mechanical Specifications

Figure 11-1 shows the package physical dimensions without a heat sink.

## 11.2 Signal Descriptions and Pin Assignment

This section provides detailed information about the 21164 pinout. The 21164 has 499 pins aligned in an interstitial IPGA design.

#### 11.2.1 Signal Pin Lists

Table 11-1 lists the 21164 signal pins and their corresponding pin grid array (PGA) locations in alphabetic order. There are 291 functional signal pins, 3 spare (unused) signal pins, 104 power (Vdd) pins, and 101 ground (Vss) pins, for a total of 499 pins in the array.

Table 11-1 Alphabetic Signal Pin List

| Signal | PGA Location | Signal | PGA Location | Signal | PGA Location |
|--------|--------------|--------|--------------|--------|--------------|
| addr_bus_req_h | E23 | addr_cmd_par_h | B20 | addr_h<4> | BB14 |
| addr_h<5> | BC13 | addr_h<6> | BA13 | addr_h<7> | AV14 |
| addr_h<8> | AW13 | addr_h<9> | BC11 | addr_h<10> | BA11 |
| addr_h<11> | AV12 | addr_h<12> | AW11 | addr_h<13> | BC09 |
| addr_h<14> | BA09 | addr_h<15> | AV10 | addr_h<16> | AW09 |
| addr_h<17> | BC07 | addr_h<18> | BA07 | addr_h<19> | AV08 |
| addr_h<20> | AW07 | addr_h<21> | BC05 | addr_h<22> | BC39 |
| addr_h<23> | AW37 | addr_h<24> | AV36 | addr_h<25> | BA37 |
| addr_h<26> | BC37 | addr_h<27> | AW35 | addr_h<28> | AV34 |
| addr_h<29> | BA35 | addr_h<30> | BC35 | addr_h<31> | AW33 |
| addr_h<32> | AV32 | addr_h<33> | BA33 | addr_h<34> | BC33 |
| addr_h<35> | AW31 | addr_h<36> | AV30 | addr_h<37> | BA31 |
| addr_h<38> | BC31 | addr_h<39> | BB30 | addr_res_h<0> | C27 |
| addr_res_h<1> | F26 | addr_res_h<2> | E27 | cack_h | G21 |
| cfail_h | C25 | clk_mode_h<0> | AU21 | clk_mode_h<1> | BA23 |
| cmd_h<0> | F20 | cmd_h<1> | A19 | cmd_h<2> | C19 |
| cmd_h<3> | E19 | cpu_clk_out_h | BA25 | dack_h | B24 |
| data_bus_req_h | E25 | data_check_h<0> | J41 | data_check_h<1> | K38 |
| data_check_h<2> | J39 | data_check_h<3> | G43 | data_check_h<4> | G41 |
| data_check_h<5> | H38 | data_check_h<6> | G39 | data_check_h<7> | E43 |
| data_check_h<8> | J03 | data_check_h<9> | K06 | data_check_h<10> | J05 |
| data_check_h<11> | G01 | data_check_h<12> | G03 | data_check_h<13> | H06 |
| data_check_h<14> | G05 | data_check_h<15> | E01 | data_h<0> | J43 |
| data_h<1> | L39 | data_h<2> | M38 | data_h<3> | L41 |
| data_h<4> | L43 | data_h<5> | N39 | data_h<6> | P38 |
| data_h<7> | N41 | data_h<8> | N43 | data_h<9> | P42 |
| data_h<10> | R39 | data_h<11> | T38 | data_h<12> | R41 |
| data_h<13> | R43 | data_h<14> | U39 | data_h<15> | V38 |
| data_h<16> | U41 | data_h<17> | U43 | data_h<18> | W39 |
| data_h<19> | W41 | data_h<20> | W43 | data_h<21> | Y38 |
| data_h<22> | Y42 | data_h<23> | AA39 | data_h<24> | AA41 |
| data_h<25> | AA43 | data_h<26> | AB38 | data_h<27> | AC43 |
| data_h<28> | AC41 | data_h<29> | AC39 | data_h<30> | AD42 |
| data_h<31> | AD38 | data_h<32> | AE43 | data_h<33> | AE41 |
| data_h<34> | AE39 | data_h<35> | AG43 | data_h<36> | AG41 |
| data_h<37> | AF38 | data_h<38> | AG39 | data_h<39> | AJ43 |
| data_h<40> | AJ41 | data_h<41> | AH38 | data_h<42> | AJ39 |
| data_h<43> | AK42 | data_h<44> | AL43 | data_h<45> | AL41 |
| data_h<46> | AK38 | data_h<47> | AL39 | data_h<48> | AN43 |
| data_h<49> | AN41 | data_h<50> | AM38 | data_h<51> | AN39 |
| data_h<52> | AR43 | data_h<53> | AR41 | data_h<54> | AP38 |
| data_h<55> | AR39 | data_h<56> | AU43 | data_h<57> | AU41 |
| data_h<58> | AT38 | data_h<59> | AU39 | data_h<60> | AW43 |
| data_h<61> | AW41 | data_h<62> | AV38 | data_h<63> | AW39 |
| data_h<64> | J01 | data_h<65> | L05 | data_h<66> | M06 |
| data_h<67> | L03 | data_h<68> | L01 | data_h<69> | N05 |
| data_h<70> | P06 | data_h<71> | N03 | data_h<72> | N01 |
| data_h<73> | P02 | data_h<74> | R05 | data_h<75> | T06 |
| data_h<76> | R03 | data_h<77> | R01 | data_h<78> | U05 |
| data_h<79> | V06 | data_h<80> | U03 | data_h<81> | U01 |
| data_h<82> | W05 | data_h<83> | W03 | data_h<84> | W01 |
| data_h<85> | Y06 | data_h<86> | Y02 | data_h<87> | AA05 |
| data_h<88> | AA03 | data_h<89> | AA01 | data_h<90> | AB06 |
| data_h<91> | AC01 | data_h<92> | AC03 | data_h<93> | AC05 |
| data_h<94> | AD02 | data_h<95> | AD06 | data_h<96> | AE01 |
| data_h<97> | AE03 | data_h<98> | AE05 | data_h<99> | AG01 |
| data_h<100> | AG03 | data_h<101> | AF06 | data_h<102> | AG05 |
| data_h<103> | AJ01 | data_h<104> | AJ03 | data_h<105> | AH06 |
| data_h<106> | AJ05 | data_h<107> | AK02 | data_h<108> | AL01 |
| data_h<109> | AL03 | data_h<110> | AK06 | data_h<111> | AL05 |
| data_h<112> | AN01 | data_h<113> | AN03 | data_h<114> | AM06 |
| data_h<115> | AN05 | data_h<116> | AR01 | data_h<117> | AR03 |
| data_h<118> | AP06 | data_h<119> | AR05 | data_h<120> | AU01 |
| data_h<121> | AU03 | data_h<122> | AT06 | data_h<123> | AU05 |
| data_h<124> | AW01 | data_h<125> | AW03 | data_h<126> | AV06 |
| data_h<127> | AW05 | data_ram_oe_h | F22 | data_ram_we_h | A23 |
| dc_ok_h | AU23 | fill_error_h | A25 | fill_h | G23 |
| fill_id_h | F24 | fill_nocheck_h | G25 | idle_bc_h | A27 |
| index_h<4> | A29 | index_h<5> | C29 | index_h<6> | F28 |
| index_h<7> | E29 | index_h<8> | B30 | index_h<9> | A31 |
| index_h<10> | C31 | index_h<11> | F30 | index_h<12> | E31 |
| index_h<13> | A33 | index_h<14> | C33 | index_h<15> | F32 |
| index_h<16> | E33 | index_h<17> | A35 | index_h<18> | C35 |
| index_h<19> | F34 | index_h<20> | E35 | index_h<21> | A37 |
| index_h<22> | C37 | index_h<23> | F36 | index_h<24> | E37 |
| index_h<25> | A39 | int4_valid_h<0> | F38 | int4_valid_h<1> | E41 |
| int4_valid_h<2> | F06 | int4_valid_h<3> | E03 | irq_h<0> | BA29 |
| irq_h<1> | AU27 | irq_h<2> | BC29 | irq_h<3> | AW27 |
| mch_hlt_irq_h | AU25 | osc_clk_in_h | BC21 | osc_clk_in_l | BB22 |
| perf_mon_h | AW29 | port_mode_h<0> | AY20 | port_mode_h<1> | BB20 |
| pwr_fail_irq_h | AV26 | ref_clk_in_h | AW25 | scache_set_h<0> | C17 |
| scache_set_h<1> | A17 | shared_h | C23 | srom_clk_h | BA19 |
| srom_data_h | BC19 | srom_oe_l | AW19 | srom_present_l | AV20 |
| system_lock_flag_h | G27 | sys_clk_out1_h | AW23 | sys_clk_out1_l | BB24 |
| sys_clk_out2_h | AV24 | sys_clk_out2_l | BC25 | sys_mch_chk_irq_h | BA27 |
| sys_reset_l | BC27 | tag_ctl_par_h | F18 | tag_data_h<20> | A05 |
| tag_data_h<21> | E07 | tag_data_h<22> | F08 | tag_data_h<23> | C07 |
| tag_data_h<24> | A07 | tag_data_h<25> | E09 | tag_data_h<26> | F10 |
| tag_data_h<27> | C09 | tag_data_h<28> | A09 | tag_data_h<29> | E11 |
| tag_data_h<30> | F12 | tag_data_h<31> | C11 | tag_data_h<32> | A11 |
| tag_data_h<33> | E13 | tag_data_h<34> | F14 | tag_data_h<35> | C13 |
| tag_data_h<36> | A13 | tag_data_h<37> | B14 | tag_data_h<38> | E15 |
| tag_data_par_h | C15 | tag_dirty_h | E17 | tag_ram_oe_h | C21 |
| tag_ram_we_h | A21 | tag_shared_h | A15 | tag_valid_h | F16 |
| tck_h | AW17 | tdi_h | BC17 | tdo_h | BA17 |
| temp_sense | AW15 | test_status_h<0> | BA15 | test_status_h<1> | AV16 |
| tms_h | AV18 | trst_l | BC15 | victim_pending_h | E21 |
| spare_io<438> | E39 | spare_io<002> | E05 | spare_io<250> | AV28 |

| Signal | PGA Location |
|--------|--------------|
| Vss (metal planes 2<sup>1</sup> and 5<sup>2</sup>) | A03, A41, AA07, AA37, AC07, AC37, AD04, AD40, AF02, AF42, AG07, AG37, AH04, AH40, AL07, AL37, AM04, AM40, AP02, AP42, AR07, AR37, AT04, AT40, AU09, AU13, AU17, AU31, AU35, AV02, AV22, AV42, AW21, AY04, AY08, AY12, AY16, AY22, AY24, AY28, AY32, AY36, AY40, B02, B06, B10, B18, B26, B34, B38, B42, BA01, BA21, BA43, BB02, BB06, BB10, BB18, BB26, BB34, BB38, BB42, BC03, BC41, C01, C43, D04, D08, D12, D16, D20, D24, D28, D32, D36, D40, F02, F42, G09, G13, G17, G31, G35, H04, H40, J07, J37, K02, K42, M04, M40, N07, N37, T04, T40, U07, U37, V02, V42, Y04, Y40 |
| Vdd (metal planes 4 and 6) | AB02, AB04, AB40, AB42, AE07, AE37, AF04, AF40, AH02, AH42, AJ07, AJ37, AK04, AK40, AM02, AM42, AN07, AN37, AP04, AP40, AT02, AT42, AU07, AU11, AU15, AU19, AU29, AU33, AU37, AV04, AV40, AY02, AY06, AY10, AY14, AY18, AY26, AY30, AY34, AY38, AY42, B04, B08, B12, B16, B22, B28, B32, B36, B40, BA03, BA05, BA39, BA41, BB04, BB08, BB12, BB16, BB28, BB32, BB36, BB40, BC23, C03, C05, C39, C41, D02, D06, D10, D14, D18, D22, D26, D30, D34, D38, D42, F04, F40, G11, G15, G19, G29, G33, G37, H02, H42, K04, K40, L07, L37, M02, M42, P04, P40, R07, R37, T02, T42, V04, V40, W07, W37 |

<sup>1</sup>Metal plane 2: seal ring connection tied to Vss.

<sup>2</sup>Metal plane 5: heat slug braze pad connections tied to Vss.

#### 11.2.2 Pin Assignment

Figure 11-2 shows the 21164 pinout from the top view with pins facing down.

Figure 11-2 Alpha 21164 Top View (Pin Down)

Figure 11-3 shows the 21164 pinout from the bottom view with pins facing up.

Figure 11-3 Alpha 21164 Bottom View (Pin Up)

# 12 Testability and Diagnostics

The 21164 has a wide variety of user-initiated testability features. This chapter covers only those testability features that are available to the user. The 21164 has several internal testability features that are implemented for factory use only. These features are beyond the scope of this document.

## 12.1 Test Port Pins

Table 12-1 summarizes the test port pins and their function.

Table 12-1 Alpha 21164 Test Port Pins

| Pin Name | Type | Function |
|----------|------|----------|
| port_mode_h<1> | I | Must be false. |
| port_mode_h<0> | I | Must be false. |
| srom_present_l | I | Tied low if serial ROMs (SROMs) are present in system. |
| srom_data_h/Rx | I | Receives SROM or serial terminal data. |
| srom_clk_h/Tx | O | Supplies clock to SROMs or transmits serial terminal data. |
| srom_oe_l | O | SROM enable. |
| tdi_h | I | IEEE 1149.1 TDI port. |
| tdo_h | O | IEEE 1149.1 TDO port. |
| tms_h | I | IEEE 1149.1 TMS port. |
| tck_h | I | IEEE 1149.1 TCK port. |
| trst_l | I | IEEE 1149.1 optional TRST port. |
| test_status_h<0> | O | Indicates Icache BiSt status.
| test_status_h<1> | O | Outputs an IPR-written value and timeout reset. |

## 12.2 Test Interface

The 21164 test interface supports a serial ROM interface, a serial diagnostic terminal interface, and an IEEE 1149.1 test access port. These ports are available and set to normal test interface mode when port\_mode\_h<1:0> = 00. Driving these pins to any value other than 00 redefines all other test interface pins and invokes special factory test modes not covered in this document.

#### 12.2.1 SROM Port

Signal pins srom\_present\_l, srom\_data\_h, srom\_oe\_l, and srom\_clk\_h constitute the SROM interface.

If SROMs are present in the system, signal srom\_present\_l may be pulled down on the board. The 21164 samples this pin during the system reset. If the pin is pulled down during the system reset, then the 21164's reset sequence automatically loads its Icache from SROMs before executing its first instruction. If srom\_present\_l is pulled up during system reset, the SROM load is disabled. In this case the Icache valid bits are cleared by the reset sequence, causing the first instruction fetch to miss the Icache and seek the instructions from the off-chip memory.

During the SROM load:

- Signal srom\_oe\_l supplies the output enable to the SROM, serving both as an output enable and as a reset. Refer to the SROM specification for details.
- The 21164 asserts this signal low for the duration of the Icache load from SROM. Once the load is complete, the signal is deasserted.
- Output signal srom\_clk\_h supplies the clock to the ROM that causes it to advance to the next bit. The cycle time of this clock is approximately 126 times the CPU cycle time.
- The SROM data drives input signal srom\_data\_h.

The SROMs can contain enough Alpha AXP code to complete the configuration of the external interface (for example, setting the timing on the external cache RAMs, and diagnosing the path between the CPU chip and the other ROMs).
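The SROM timing figures can be turned into a quick sizing calculation when selecting a serial ROM. The Python sketch below is illustrative: the helper names are hypothetical, and the generalization of the access window to 126 plus or minus one system cycle is an assumption extrapolated from the single ratio-15 example given in Section 12.3 (111 to 141 CPU cycles). The 266 MHz frequency is the value quoted in the Chapter 10 tables and is used only as an example.

```python
# Sketch of SROM serial-load timing (helper names are hypothetical).

SROM_CLK_CPU_CYCLES = 126  # approximate srom_clk_h cycle time, in CPU cycles


def srom_access_window(sys_clk_ratio):
    """Return (min, max) CPU cycles between consecutive SROM accesses.

    ASSUMPTION: an access may shrink or stretch by exactly one system
    cycle, that is, by sys_clk_ratio CPU cycles, as in the ratio-15
    example of Section 12.3.
    """
    return (SROM_CLK_CPU_CYCLES - sys_clk_ratio,
            SROM_CLK_CPU_CYCLES + sys_clk_ratio)


def cycles_to_ns(cpu_cycles, cpu_mhz):
    """Convert a CPU cycle count to nanoseconds at the given CPU frequency."""
    return cpu_cycles * 1000.0 / cpu_mhz


if __name__ == "__main__":
    shortest, longest = srom_access_window(15)
    print(f"access window: {shortest} to {longest} CPU cycles")
    print(f"shortest access at 266 MHz: {cycles_to_ns(shortest, 266.0):.1f} ns")
```

The shortest interval is the figure that matters for SROM selection, since the device must deliver each bit within that time.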
The 21164 is in PALmode following the deassertion of system reset and the conclusion of the Icache self-test. This gives the code loaded into the Icache access to all of the visible state within the chip.

Refer to Section 12.3 for details of the Icache fill operation from SROMs.

#### 12.2.2 Serial Terminal Port

Once the data in the SROM has been loaded into the Icache, the three SROM port pins turn into simple serial I/O pins that can be used to drive a diagnostic terminal through an interface such as RS422.

When the SROM is not being read, the **srom\_oe\_l** output signal is false. The serial diagnostic terminal port is enabled if this pin is wired to the active high enable of an RS422 (or 26LS32) receiver driving onto signal **srom\_data\_h** and to the active high enable of an RS422 (or 26LS31) driver driven from signal **srom\_clk\_h**. The 21164 allows **srom\_data\_h** to be read and **srom\_clk\_h** to be written by PALcode. This supports a bit-banged serial interface. IPRs associated with this interface are described in Chapter 5.

#### 12.2.3 IEEE 1149.1 Test Access Port

Pins tdi\_h, tdo\_h, tck\_h, tms\_h, and trst\_l constitute the IEEE 1149.1 test access port. This port accesses the 21164 chip's boundary scan register and chip tristate functions for board level manufacturing test. The port also allows access to factory manufacturing features not described in this document. The port is compliant with most requirements of the IEEE 1149.1 test access port.

**Compliance Enable Inputs**

Table 12-2 shows the compliance enable inputs and the pattern that must be driven to those inputs in order to activate the 21164 IEEE 1149.1 circuits.
Table 12-2 Compliance Enable Inputs

| Input | Compliance Enable Pattern |
|-------|---------------------------|
| port_mode_h<1:0> | 00 |
| dc_ok_h | 1 |

**Exceptions to Compliance**

The 21164 is compliant with IEEE Standard 1149.1–1993 with two exceptions. Both exceptions provide enhanced value to the user.

1. trst\_l pin

   The optional trst\_l pin has an internal pull-down, instead of the pull-up required by IEEE 1149.1 (this does not comply with rule 3.6.1(b) of IEEE 1149.1–1993). The trst\_l pull-down allows the chip to automatically force reset to the IEEE 1149.1 circuits in a system in which the IEEE 1149.1 port is unconnected. This may be considered a feature for most system designs that use IEEE 1149.1 circuits solely during module manufacturing.

2. Coverage of oscillator differential input pins

   The two differential clock input pins, osc\_clk\_in\_h and osc\_clk\_in\_l, do not have any boundary scan cells associated with them (this does not comply with rule 10.4.1(b) of IEEE 1149.1–1993). Instead, there is an extra input BSR cell in the boundary scan register in bit position 33 (at pin dc\_ok\_h). This cell captures the output of a "clock sniffer" circuit. It captures a "1" when the oscillator is connected, and captures a "0" if the chip's oscillator connections are broken. This exception to the standard is made to permit a meaningful test of the oscillator input pins.

Refer to IEEE Standard 1149.1, A Test Access Port and Boundary Scan Architecture, for a full description of the specification. Figure 12-1 shows the user-visible features from this port.

Figure 12-1 IEEE 1149.1 Test Access Port

**TAP Controller**

The TAP controller contains a state machine. It interprets IEEE 1149.1 protocols received on signal tms\_h and generates appropriate clocks and control signals for the testability features under its jurisdiction.
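For readers without the figure at hand, the TAP controller's behavior can be sketched as a transition table. The Python below encodes the standard 16-state IEEE 1149.1 TAP state diagram; it is assumed, not confirmed by this text, that the 21164 implements exactly this diagram. The state labels follow the standard's naming, and `tap_walk` is a hypothetical helper for this example.

```python
# Standard IEEE 1149.1 TAP controller transitions (assumed to apply to the 21164).
# Each entry maps state -> (next state if TMS=0, next state if TMS=1).
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def tap_walk(tms_bits, state="Test-Logic-Reset"):
    """Advance the TAP state once per TCK rising edge for each sampled TMS value."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


if __name__ == "__main__":
    # TMS sequence 0,1,1,0,0 from reset moves the controller into Shift-IR.
    print(tap_walk([0, 1, 1, 0, 0]))
```

A useful property of this diagram is that holding TMS high for five TCK cycles returns the controller to Test-Logic-Reset from any state, which is how a board tester can synchronize with a TAP of unknown state.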
The state machine is shown in Figure 12–2.

Figure 12-2 TAP Controller State Machine

**Instruction Register**

The 5-bit-wide instruction register (IR) supports IEEE 1149.1 mandated public instructions (EXTEST, SAMPLE, BYPASS, HIGHZ) and a number of optional instructions for public and private factory use. Table 12–3 summarizes the public instructions and their functions.

During the capture operation, the shift register stage of IR is loaded with the value 00001. This automatic load feature is useful for testing the integrity of the IEEE 1149.1 scan chain on the module.

Table 12–3 Instruction Register

| IR<4:0> | Name | Scan Register Selected | Operation |
|---------|------|------------------------|-----------|
| 00000 | EXTEST | BSR | BSR drives pins. Interconnect test mode. |
| 00001 | SAMPLE/PRELOAD | BSR | Preloads BSR. |
| 00010 | Private | BSR | Private. |
| 00011 | Private | BSR | Private. |
| 00100 | CLAMP | BPR | BSR drives pins. |
| 00101 | HIGHZ | BPR | Tristate all output and I/O pins. |
| 00110 | Private | IDR | Private. |
| 00111 | Private | IDR | Private. |
| 01000 through 11110 | Private | BPR | Private. |
| 11111 | BYPASS | BPR | Default. |

**Bypass Register**

The bypass register is a 1-bit shift register. It provides a short single-bit scan path through the port (chip).

**Boundary Scan Register**

The 288-bit boundary scan register is accessed during SAMPLE, EXTEST, and CLAMP instructions. Refer to Section 12.4 for the organization of this register.

#### 12.2.4 Test Status Pins

Two test status pins, test\_status\_h<1:0>, are used for extracting test status information from the chip. System reset drives both test status pins low. The default operation for test\_status\_h<0> is to output the BiSt results. The default operation for test\_status\_h<1> is to output the IPR-written value.
- **During Icache BiSt operation**

  test\_status\_h<0> is forced high at the start of the Icache BiSt. If the Icache BiSt passes, the pin is deasserted at the end of the BiSt operation; otherwise it remains high.

- **IPR read and write operations to test status pins**

  PALcode can write to the test\_status\_h<1> signal pin and can read the test\_status\_h<0> signal pin through hardware IPR access. Refer to Chapter 6.

- **Timeout reset**

  The 21164 generates a timeout reset signal under two conditions:

  1. If an instruction is not retired within 1 billion cycles.
  2. If the system asserts cfail\_h when cack\_h is deasserted.

  In either of these conditions, the CPU signals the timeout reset event by outputting a 256 CPU cycle wide pulse on the test\_status\_h<1> pin. The pulse on the test\_status\_h<1> pin is clocked by sysclk and therefore appears as an approximately 256 CPU cycle pulse that rises and falls on system clock rising edges.

## 12.3 Serial Instruction Cache Load Operation

All Icache bits, including each block's tag, address space number (ASN), address space match (ASM), valid, and branch history bits, can be loaded serially from off-chip serial ROMs. Once the serial load has been invoked by the chip reset sequence, the entire cache is loaded automatically from the lowest to the highest addresses.

The automatic serial Icache fill invoked by the chip reset sequence operates internally with a cycle time of 126 CPU clock periods. However, due to the synchronization with the system clocks, consecutive access cycles to SROM may shrink or stretch by a system cycle. For example, for a system with a system clock ratio of 15, the time between two consecutive SROM accesses may be anywhere in the range 111 to 141 CPU cycles. The SROM used in the system must be able to support access times in this range.

The serial bits are received in a 200-bit-long fill scan path, from which they are written in parallel into the Icache address.
The fill scan path is organized as shown in the text following this paragraph. The farthest bit (<42>) is shifted in first and the nearest bit (BHT<0>) is shifted in last. The data and predecode bits in the data array are interleaved.

```
serial input -> srom_data_h

BHT Array         0 -> 1 -> ... ->
Data              127 -> 95 -> 126 -> 94 -> ... -> 96 -> 64 ->
Predecodes        19 -> 14 -> 18 -> ... -> 15 -> 10 ->
Data parity       1 -> 0 ->
Predecodes        9 -> 4 -> 8 -> 3 -> ... -> 0 ->
Data              63 -> 31 -> 62 -> 30 -> ... ->
Tag Parity        b ->
Tag Valids        0 -> ... ->
TAG Phy.Address   b ->
TAG ASN           0 -> ... ->
TAG ASM           b ->
TAGS              13 -> ... -> 42
```

b = Single bit signal

## 12.4 Boundary Scan Register

The 21164 boundary scan register (BSR) is 288 bits long. Table 12–4 provides the boundary scan register organization. The BSR is connected between the tdi\_h and tdo\_h pins whenever an instruction selects it (Table 12–3). The scan register runs clockwise beginning at the upper left corner of the chip.

There are seven groups of bidirectional pins, each group controlled from a group control cell. Loading a value of "1" in the control cell tristates the output drivers, and all bidirectional pins in the group are configured as input pins. The bidirectional pin groups are identified as groups gr\_1 through gr\_7 in the Control Group column in Table 12-4.

**Notes**

The following notes apply to Table 12-4:

- The direction of shift is from top to bottom, and from left to right.
- The bottommost signals appear first at the tdo\_h pin when shifting.
- Given an arrayed signal of the form signal<a:b>, signal<b> appears at the tdo\_h pin prior to signal<a>.

Table 12-4 Boundary Scan Register Organization

| Signal Name | Pin Type | BSR Count | BSR Cell Type | Control Group | Remarks |
|-------------|----------|-----------|---------------|---------------|---------|
| TR_ADL | Control | 0 | io_bcell | gr_1 | Upper left corner.
| addr_h<21:4> | B | 1:18 | io_bcell | gr_1 | - |
| temp_sense_h | O | - | None | - | Analog pin. |
| test_status_h<1:0> | O | 19:20 | io_bcell | - | - |
| trst_l | B | - | None | - | - |
| tck_h | B | - | None | - | - |
| tms_h | B | - | None | - | - |
| tdo_h | O | - | None | - | - |
| tdi_h | B | - | None | - | - |
| srom_oe_l | O | 21 | io_bcell | - | - |
| srom_clk_h | O | 22 | io_bcell | - | - |
| srom_data_h | I | 23 | in_bcell | - | - |
| srom_present_l | B | 24 | in_bcell | - | - |
| port_mode_h<0:1> | I | - | in_bcell | - | Compliance enable pins |
| clk_mode_h<0> | I | 25 | in_bcell | - | - |
| osc_clk_in_h,l | I | - | None | - | Analog pins. |
| clk_mode_h<1> | I | 26 | in_bcell | - | - |
| sys_clk_out1_h,l | O | 27:28 | io_bcell | - | - |
| sys_clk_out2_h,l | O | 29:30 | io_bcell | - | - |
| cpu_clk_out_h | O | - | None | - | For chip test. |
| ref_clk_in_h | I | 31 | in_bcell | - | - |
| sys_reset_l | I | 32 | in_bcell | - | - |
| dc_ok_h | I | - | in_bcell | - | Compliance enable pin |
| Osc_Sniffer_h | Internal | 33 | in_bcell | - | Captures 1 if osc is connected, otherwise captures 0. |
| sys_mch_chk_irq_h | I | 34 | in_bcell | - | - |
| pwr_fail_irq_h | I | 35 | in_bcell | - | - |
| mch_hlt_irq_h | I | 36 | in_bcell | - | - |
| irq_h<3:0> | I | 37:40 | in_bcell | - | - |
| SPARE_IO<250> | B | 41 | io_bcell | - | Tied off as input. |
| perf_mon_h | I | 42 | in_bcell | - | - |
| TR_ADR | Control | 43 | io_bcell | gr_2 | - |
| addr_h<39:22> | B | 44:61 | io_bcell | gr_2 | Upper right corner. |
| TR_DDR | Control | 62 | io_bcell | gr_3 | - |
| data_h<63:0> | B | 63:126 | io_bcell | gr_3 | - |
| data_check_h<0:7> | B | 127:134 | io_bcell | gr_3 | - |
| int4_valid_h<1:0> | O | 135:136 | io_bcell | - | - |
| SPARE_IO<438> | - | - | None | - | Lower right corner, unpopulated. |
| index_h<25:4> | O | 137:158 | io_bcell | - | - |
| addr_res_h<2:0> | O | 159:161 | io_bcell | - | - |
| idle_bc_h | I | 162 | in_bcell | - | - |
| system_lock_flag_h | I | 163 | in_bcell | - | - |
| data_bus_req_h | I | 164 | in_bcell | - | - |
| cfail_h | I | 165 | in_bcell | - | - |
| fill_nocheck_h | I | 166 | in_bcell | - | - |
| fill_error_h | I | 167 | in_bcell | - | - |
| fill_id_h | I | 168 | in_bcell | - | - |
| fill_h | I | 169 | in_bcell | - | - |
| dack_h | I | 170 | in_bcell | - | - |
| addr_bus_req_h | I | 171 | in_bcell | - | - |
| cack_h | I | 172 | in_bcell | - | - |
| shared_h | I | 173 | in_bcell | - | - |
| data_ram_we_h | O | 174 | io_bcell | - | - |
| data_ram_oe_h | O | 175 | io_bcell | - | - |
| tag_ram_we_h | O | 176 | io_bcell | - | - |
| tag_ram_oe_h | O | 177 | io_bcell | - | - |
| victim_pending_h | O | 178 | io_bcell | - | - |
| TMIS1 | Control | 179 | io_bcell | gr_4 | - |
| addr_cmd_par_h | B | 180 | io_bcell | gr_4 | - |
| cmd_h<0:3> | B | 181:184 | io_bcell | gr_4 | - |
| scache_set_h<1:0> | O | 185:186 | io_bcell | - | - |
| TTAG1 | Control | 187 | io_bcell | gr_5 | - |
| tag_ctl_par_h | B | 188 | io_bcell | gr_5 | - |
| tag_dirty_h | B | 189 | io_bcell | gr_5 | - |
| tag_shared_h | B | 190 | io_bcell | gr_5 | - |
| TTAG2 | Control | 191 | io_bcell | gr_6 | - |
| tag_data_par_h | B | 192 | io_bcell | gr_6 | - |
| tag_valid_h | B | 193 | io_bcell | gr_6 | - |
| tag_data_h<38:20> | B | 194:212 | io_bcell | gr_6 | - |
| SPARE_IO<002> | - | - | None | - | Lower left corner, unpopulated. |
| int4_valid_h<2:3> | O | 213:214 | io_bcell | - | - |
| TR_DDL | Control | 215 | io_bcell | gr_7 | - |
| data_check_h<15:8> | B | 216:223 | io_bcell | gr_7 | - |
| data_h<64:127> | B | 224:287 | io_bcell | gr_7 | - |

# 12.5 Timing of Test Features

Timing of 21164 testability features depends on the system clock rate and the test port's operating mode. This section provides timing information that may be needed for most common operations.

# 12.5.1 Icache BiSt Operation Timing

The Icache BiSt is invoked by deasserting the external reset signal sys\_reset\_l. Figure 12-3 shows the timing between various events relevant to BiSt operations.

Figure 12-3 BiSt Timing Event-Time Line

In Figure 12-3 (see asterisk), timing for the deassertion of internal reset (time $t_2$) is valid only if an SROM is not present (indicated by keeping signal srom\_present\_l deasserted).
If an SROM is present, the SROM load is performed once the BiSt completes. The internal reset signal T%Z\_RESET\_B\_L is extended until the end of the SROM load (Section 12.5.2). In this case, the end of the time line shown in Figure 12-3 connects to the beginning of the time line shown in Figure 12-4.

Table 12-5 and Table 12-6 list the timing shown in Figure 12-3 for some of the system clock ratios. Time $t_1$ is measured starting from the rising edge of sysclk following the deassertion of the sys\_reset\_l signal.

Table 12-5 BiSt Timing for Some System Clock Ratios, Port Mode=Normal (System Cycles)

| Sysclk Ratio | $t_1$ | $t_2$ | $t_3$ |
|---|---|---|---|
| 3 | 8 | 22644+2½ | 22645 |
| 4 | 7 | 19721+2½ | 19722 |
| 15 | 7 | 13291+14½ | 13292 |

In Table 12-5, +n refers to an additional n CPU cycles.

Table 12-6 BiSt Timing for Some System Clock Ratios, Port Mode=Normal (CPU Cycles)

| Sysclk Ratio | $t_1$ | $t_2$ | $t_3$ |
|---|---|---|---|
| 3 | 24 | 67934½ | 67935 |
| 4 | 28 | 78886½ | 78888 |
| 15 | 105 | 199379½ | 199380 |

# 12.5.2 Automatic SROM Load Timing

The SROM load is triggered by the conclusion of BiSt if **srom\_present\_l** is asserted. The SROM load occurs at the internal cycle time of approximately 126 CPU cycles for **srom\_clk\_h**, but the behavior at the pins may shift slightly. Refer to Chapter 7 for more information on input signals, booting, and the SROM interface port. Timing events are shown in Figure 12-4 and listed in Table 12-7 and Table 12-8.
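The relationship between the system-cycle and CPU-cycle timing tables, and the SROM access range quoted in Section 12.3, can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the 21164 documentation: the function names are hypothetical, and the access window assumes that "shrink or stretch by a system cycle" means plus or minus one sysclk ratio's worth of CPU cycles, which matches the 111 to 141 example for a ratio of 15.

```python
from fractions import Fraction

# Internal SROM access period in CPU cycles, as stated in Section 12.3.
SROM_PERIOD_CPU_CYCLES = 126


def to_cpu_cycles(sys_cycles, ratio, extra_cpu_cycles=0):
    """Convert 'S+n' notation (S sysclk cycles plus n CPU cycles) to CPU cycles."""
    return sys_cycles * ratio + Fraction(extra_cpu_cycles)


def srom_access_window(ratio):
    """Assumed bounds, in CPU cycles, between consecutive SROM accesses.

    Assumption: the access period shrinks or stretches by at most one
    system cycle, that is, by `ratio` CPU cycles.
    """
    return SROM_PERIOD_CPU_CYCLES - ratio, SROM_PERIOD_CPU_CYCLES + ratio


if __name__ == "__main__":
    # Table 12-5, ratio 3: t2 = 22644 + 2 1/2 corresponds to 67934 1/2 in Table 12-6.
    print(float(to_cpu_cycles(22644, 3, Fraction(5, 2))))  # 67934.5
    # Ratio 15 reproduces the 111 to 141 CPU cycle range quoted in Section 12.3.
    print(srom_access_window(15))  # (111, 141)
```

An SROM part chosen for a given sysclk ratio would need to tolerate the lower bound returned by `srom_access_window`.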
Figure 12-4 SROM Load Timing Event-Time Line

Table 12-7 SROM Load Timing for Some System Clock Ratios (System Cycles)<sup>1</sup>

| Sysclk Ratio | $t_1$ | $t_2$ | $t_3$ | $t_4$ | $t_5$ |
|---|---|---|---|---|---|
| 3 | 4 | 22 | 4408090 | 4408216+½ | 4408217 |
| 4 | 3 | 48 | 3306099 | 3306193+2½ | 3306194 |
| 15 | 3 | 13 | 881627 | 881651+9½ | 881652 |

<sup>1</sup>Measured in sysclk cycles, where +n refers to an additional n CPU cycles.

Table 12-8 SROM Load Timing for Some System Clock Ratios (CPU Cycles)

| Sysclk Ratio | $t_1$ | $t_2$ | $t_3$ | $t_4$ | $t_5$ |
|---|---|---|---|---|---|
| 3 | 12 | 66 | 13224270 | 13224648½ | 13224651 |
| 4 | 12 | 192 | 13224396 | 13224774½ | 13224776 |
| 15 | 45 | 195 | 13224405 | 13224774½ | 13224780 |

Figure 12-5 is a timing diagram of an SROM load sequence.

Figure 12-5 Serial ROM Load Timing

The minimum srom\_clk\_h cycle = (126 - sysclk ratio) \* (CPU cycle time).

The maximum **srom\_clk\_h** to **srom\_data\_h** delay allowable (in order to meet the required setup time) = [126 - (5 \* sysclk ratio)] \* (CPU cycle time).

# Alpha AXP Instruction Set

# A.1 Alpha AXP Instruction Summary

This appendix contains a summary of all Alpha AXP architecture instructions. All values are in hexadecimal radix. Table A-1 describes the contents of the Format and Opcode columns that are in Table A-2.

Table A-1 Instruction Format and Opcode Notation

| Instruction Format | Format Symbol | Opcode Notation | Meaning |
|---|---|---|---|
| Branch | Bra | oo | oo is the 6-bit opcode field. |
| Floating-point | F-P | oo.fff | oo is the 6-bit opcode field. fff is the 11-bit function code field. |
| Memory | Mem | oo | oo is the 6-bit opcode field. |
| Memory/function code | Mfc | oo.ffff | oo is the 6-bit opcode field. ffff is the 16-bit function code in the displacement field. |
| Memory/branch | Mbr | oo.h | oo is the 6-bit opcode field. h is the high-order 2 bits of the displacement field. |
| Operate | Opr | oo.ff | oo is the 6-bit opcode field. ff is the 7-bit function code field. |
| PALcode | Pcd | oo | oo is the 6-bit opcode field; the particular PALcode instruction is specified in the 26-bit function code field. |

Qualifiers for operate instructions are shown in Table A-2. Qualifiers for IEEE and VAX floating-point instructions are shown in Tables A-5 and A-6, respectively.

Table A-2 Architecture Instructions

| Mnemonic | Format | Opcode | Description |
|---|---|---|---|
| ADDF | F-P | 15.080 | Add F_floating |
| ADDG | F-P | 15.0A0 | Add G_floating |
| ADDL | Opr | 10.00 | Add longword |
| ADDL/V | Opr | 10.40 | Add longword |
| ADDQ | Opr | 10.20 | Add quadword |
| ADDQ/V | Opr | 10.60 | Add quadword |
| ADDS | F-P | 16.080 | Add S_floating |
| ADDT | F-P | 16.0A0 | Add T_floating |
| AND | Opr | 11.00 | Logical product |
| BEQ | Bra | 39 | Branch if = zero |
| BGE | Bra | 3E | Branch if ≥ zero |
| BGT | Bra | 3F | Branch if > zero |
| BIC | Opr | 11.08 | Bit clear |
| BIS | Opr | 11.20 | Logical sum |
| BLBC | Bra | 38 | Branch if low bit clear |
| BLBS | Bra | 3C | Branch if low bit set |
| BLE | Bra | 3B | Branch if ≤ zero |
| BLT | Bra | 3A | Branch if < zero |
| BNE | Bra | 3D | Branch if ≠ zero |
| BR | Bra | 30 | Unconditional branch |
| BSR | Mbr | 34 | Branch to subroutine |
| CALL_PAL | Pcd | 00 | Trap to PALcode |
| CMOVEQ | Opr | 11.24 | CMOVE if = zero |
| CMOVGE | Opr | 11.46 | CMOVE if ≥ zero |
| CMOVGT | Opr | 11.66 | CMOVE if > zero |
| CMOVLBC | Opr | 11.16 | CMOVE if low bit clear |
| CMOVLBS | Opr | 11.14 | CMOVE if low bit set |
| CMOVLE | Opr | 11.64 | CMOVE if ≤ zero |
| CMOVLT | Opr | 11.44 | CMOVE if < zero |
| CMOVNE | Opr | 11.26 | CMOVE if ≠ zero |
| CMPBGE | Opr | 10.0F | Compare byte |
| CMPEQ | Opr | 10.2D | Compare signed quadword equal |
| CMPGEQ | F-P | 15.0A5 | Compare G_floating equal |
| CMPGLE | F-P | 15.0A7 | Compare G_floating less than or equal |
| CMPGLT | F-P | 15.0A6 | Compare G_floating less than |
| CMPLE | Opr | 10.6D | Compare signed quadword less than or equal |
| CMPLT | Opr | 10.4D | Compare signed quadword less than |
| CMPTEQ | F-P | 16.0A5 | Compare T_floating equal |
| CMPTLE | F-P | 16.0A7 | Compare T_floating less than or equal |
| CMPTLT | F-P | 16.0A6 | Compare T_floating less than |
| CMPTUN | F-P | 16.0A4 | Compare T_floating unordered |
| CMPULE | Opr | 10.3D | Compare unsigned quadword less than or equal |
| CMPULT | Opr | 10.1D | Compare unsigned quadword less than |
| CPYS | F-P | 17.020 | Copy sign |
| CPYSE | F-P | 17.022 | Copy sign and exponent |
| CPYSN | F-P | 17.021 | Copy sign negate |
| CVTDG | F-P | 15.09E | Convert D_floating to G_floating |
| CVTGD | F-P | 15.0AD | Convert G_floating to D_floating |
| CVTGF | F-P | 15.0AC | Convert G_floating to F_floating |
| CVTGQ | F-P | 15.0AF | Convert G_floating to quadword |
| CVTLQ | F-P | 17.010 | Convert longword to quadword |
| CVTQF | F-P | 15.0BC | Convert quadword to F_floating |
| CVTQG | F-P | 15.0BE | Convert quadword to G_floating |
| CVTQL | F-P | 17.030 | Convert quadword to longword |
| CVTQL/SV | F-P | 17.530 | Convert quadword to longword |
| CVTQL/V | F-P | 17.130 | Convert quadword to longword |
| CVTQS | F-P | 16.0BC | Convert quadword to S_floating |
| CVTQT | F-P | 16.0BE | Convert quadword to T_floating |
| CVTST | F-P | 16.2AC | Convert S_floating to T_floating |
| CVTTQ | F-P | 16.0AF | Convert T_floating to quadword |
| CVTTS | F-P | 16.0AC | Convert T_floating to S_floating |
| DIVF | F-P | 15.083 | Divide F_floating |
| DIVG | F-P | 15.0A3 | Divide G_floating |
| DIVS | F-P | 16.083 | Divide S_floating |
| DIVT | F-P | 16.0A3 | Divide T_floating |
| EQV | Opr | 11.48 | Logical equivalence |
| EXCB | Mfc | 18.0400 | Exception barrier |
| EXTBL | Opr | 12.06 | Extract byte low |
| EXTLH | Opr | 12.6A | Extract longword high |
| EXTLL | Opr | 12.26 | Extract longword low |
| EXTQH | Opr | 12.7A | Extract quadword high |
| EXTQL | Opr | 12.36 | Extract quadword low |
| EXTWH | Opr | 12.5A | Extract word high |
| EXTWL | Opr | 12.16 | Extract word low |
| FBEQ | Bra | 31 | Floating branch if = zero |
| FBGE | Bra | 36 | Floating branch if ≥ zero |
| FBGT | Bra | 37 | Floating branch if > zero |
| FBLE | Bra | 33 | Floating branch if ≤ zero |
| FBLT | Bra | 32 | Floating branch if < zero |
| FBNE | Bra | 35 | Floating branch if ≠ zero |
| FCMOVEQ | F-P | 17.02A | FCMOVE if = zero |
| FCMOVGE | F-P | 17.02D | FCMOVE if ≥ zero |
| FCMOVGT | F-P | 17.02F | FCMOVE if > zero |
| FCMOVLE | F-P | 17.02E | FCMOVE if ≤ zero |
| FCMOVLT | F-P | 17.02C | FCMOVE if < zero |
| FCMOVNE | F-P | 17.02B | FCMOVE if ≠ zero |
| FETCH | Mfc | 18.80 | Prefetch data |
| FETCH_M | Mfc | 18.A0 | Prefetch data, modify intent |
| INSBL | Opr | 12.0B | Insert byte low |
| INSLH | Opr | 12.67 | Insert longword high |
| INSLL | Opr | 12.2B | Insert longword low |
| INSQH | Opr | 12.77 | Insert quadword high |
| INSQL | Opr | 12.3B | Insert quadword low |
| INSWH | Opr | 12.57 | Insert word high |
| INSWL | Opr | 12.1B | Insert word low |
| JMP | Mbr | 1A.0 | Jump |
| JSR | Mbr | 1A.1 | Jump to subroutine |
| JSR_COROUTINE | Mbr | 1A.3 | Jump to subroutine return |
| LDA | Mem | 08 | Load address |
| LDAH | Mem | 09 | Load address high |
| LDF | Mem | 20 | Load F_floating |
| LDG | Mem | 21 | Load G_floating |
| LDL | Mem | 28 | Load sign-extended longword |
| LDL_L | Mem | 2A | Load sign-extended longword locked |
| LDQ | Mem | 29 | Load quadword |
| LDQ_L | Mem | 2B | Load quadword locked |
| LDQ_U | Mem | 0B | Load unaligned quadword |
| LDS | Mem | 22 | Load S_floating |
| LDT | Mem | 23 | Load T_floating |
| MB | Mfc | 18.4000 | Memory barrier |
| MF_FPCR | F-P | 17.025 | Move from FPCR |
| MSKBL | Opr | 12.02 | Mask byte low |
| MSKLH | Opr | 12.62 | Mask longword high |
| MSKLL | Opr | 12.22 | Mask longword low |
| MSKQH | Opr | 12.72 | Mask quadword high |
| MSKQL | Opr | 12.32 | Mask quadword low |
| MSKWH | Opr | 12.52 | Mask word high |
| MSKWL | Opr | 12.12 | Mask word low |
| MT_FPCR | F-P | 17.024 | Move to FPCR |
| MULF | F-P | 15.082 | Multiply F_floating |
| MULG | F-P | 15.0A2 | Multiply G_floating |
| MULL | Opr | 13.00 | Multiply longword |
| MULL/V | | 13.40 | |
| MULQ | Opr | 13.20 | Multiply quadword |
| MULQ/V | | 13.60 | |
| MULS | F-P | 16.082 | Multiply S_floating |
| MULT | F-P | 16.0A2 | Multiply T_floating |
| ORNOT | Opr | 11.28 | Logical sum with complement |
| RC | Mfc | 18.E0 | Read and clear |
| RET | Mbr | 1A.2 | Return from subroutine |
| RPCC | Mfc | 18.C0 | Read process cycle counter |
| RS | Mfc | 18.F000 | Read and set |
| S4ADDL | Opr | 10.02 | Scaled add longword by 4 |
| S4ADDQ | Opr | 10.22 | Scaled add quadword by 4 |
| S4SUBL | Opr | 10.0B | Scaled subtract longword by 4 |
| S4SUBQ | Opr | 10.2B | Scaled subtract quadword by 4 |
| S8ADDL | Opr | 10.12 | Scaled add longword by 8 |
| S8ADDQ | Opr | 10.32 | Scaled add quadword by 8 |
| S8SUBL | Opr | 10.1B | Scaled subtract longword by 8 |
| S8SUBQ | Opr | 10.3B | Scaled subtract quadword by 8 |
| SLL | Opr | 12.39 | Shift left logical |
| SRA | Opr | 12.3C | Shift right arithmetic |
| SRL | Opr | 12.34 | Shift right logical |
| STF | Mem | 24 | Store F_floating |
| STG | Mem | 25 | Store G_floating |
| STS | Mem | 26 | Store S_floating |
| STL | Mem | 2C | Store longword |
| STL_C | Mem | 2E | Store longword conditional |
| STQ | Mem | 2D | Store quadword |
| STQ_C | Mem | 2F | Store quadword conditional |
| STQ_U | Mem | 0F | Store unaligned quadword |
| STT | Mem | 27 | Store T_floating |
| SUBF | F-P | 15.081 | Subtract F_floating |
| SUBG | F-P | 15.0A1 | Subtract G_floating |
| SUBL | Opr | 10.09 | Subtract longword |
| SUBL/V | | 10.49 | |
| SUBQ | Opr | 10.29 | Subtract quadword |
| SUBQ/V | | 10.69 | |
| SUBS | F-P | 16.081 | Subtract S_floating |
| SUBT | F-P | 16.0A1 | Subtract T_floating |
| TRAPB | Mfc | 18.00 | Trap barrier |
| UMULH | Opr | 13.30 | Unsigned multiply quadword high |
| WMB | Mfc | 18.44 | Write memory barrier |
| XOR | Opr | 11.40 | Logical difference |
| ZAP | Opr | 12.30 | Zero bytes |
| ZAPNOT | Opr | 12.31 | Zero bytes not |

# A.1.1 Opcodes Reserved for Digital

Table A-3 lists opcodes reserved for Digital.
Table A-3 Opcodes Reserved for Digital

| Mnemonic | Opcode | Mnemonic | Opcode | Mnemonic | Opcode |
|---|---|---|---|---|---|
| OPC01 | 01 | OPC05 | 05 | OPC0B | 0B |
| OPC02 | 02 | OPC06 | 06 | OPC0C | 0C |
| OPC03 | 03 | OPC07 | 07 | OPC0D | 0D |
| OPC04 | 04 | OPC0A | 0A | OPC14 | 14 |

# A.1.2 Opcodes Reserved for PALcode

Table A-4 lists the 21164-specific instructions. For more information, refer to Section 6.6.

Table A-4 Opcodes Reserved for PALcode

| 21164 Mnemonic | Opcode | Architecture Mnemonic | Function |
|---|---|---|---|
| HW_LD | 1B | PAL1B | Performs Dstream loads. |
| HW_ST | 1F | PAL1F | Performs Dstream stores. |
| HW_REI | 1E | PAL1E | Returns instruction flow to the program counter (PC) pointed to by EXC_ADDR internal processor register (IPR). |
| HW_MFPR | 19 | PAL19 | Accesses the Ibox, Mbox, and Dcache IPRs. |
| HW_MTPR | 1D | PAL1D | Accesses the Ibox, Mbox, and Dcache IPRs. |

# A.2 IEEE Floating-Point Instructions

Table A-5 lists the hexadecimal value of the 11-bit function code field for the IEEE floating-point instructions, with and without qualifiers. The opcode for these instructions is $16_{16}$.
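The arithmetic function codes in Table A-5 appear to follow a regular pattern: the rounding qualifier and the trapping qualifier each contribute a fixed value, while the low six bits identify the operation. The following is a minimal sketch of that pattern. The dictionary and function names are hypothetical, and the field split is inferred from the table values rather than quoted from the architecture definition.

```python
# Field values inferred from Table A-5 (hypothetical names, not from the manual).
ROUNDING = {"": 0x080, "C": 0x000, "M": 0x040, "D": 0x0C0}
TRAPPING = {"": 0x000, "U": 0x100, "SU": 0x500, "SUI": 0x700}
IEEE_OPCODE = 0x16


def ieee_function_code(base, trapping="", rounding=""):
    """Combine an unqualified function code with trap and rounding qualifiers."""
    operation = base & 0x03F  # low six bits are unchanged by the qualifiers
    return TRAPPING[trapping] | ROUNDING[rounding] | operation


def notation(base, trapping="", rounding=""):
    """Format the result in the oo.fff notation of Table A-1."""
    return f"{IEEE_OPCODE:02X}.{ieee_function_code(base, trapping, rounding):03X}"


if __name__ == "__main__":
    print(notation(0x0A0))              # ADDT       -> 16.0A0
    print(notation(0x0A0, "SUI", "M"))  # ADDT/SUIM  -> 16.760
    print(notation(0x0AC, "U", "D"))    # CVTTS/UD   -> 16.1EC
```

The sketch does not check whether a qualifier combination is actually defined for an instruction (for example, the compare instructions have only a few entries, and CVTTQ uses /V where the arithmetic instructions use /U), so Table A-5 remains the reference for valid codes.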
Table A-5 IEEE Floating-Point Instruction Function Codes

| | None | /C | /M | /D | /U | /UC | /UM | /UD |
|---|---|---|---|---|---|---|---|---|
| ADDS | 080 | 000 | 040 | 0C0 | 180 | 100 | 140 | 1C0 |
| ADDT | 0A0 | 020 | 060 | 0E0 | 1A0 | 120 | 160 | 1E0 |
| CMPTEQ | 0A5 | | | | | | | |
| CMPTLT | 0A6 | | | | | | | |
| CMPTLE | 0A7 | | | | | | | |
| CMPTUN | 0A4 | | | | | | | |
| CVTQS | 0BC | 03C | 07C | 0FC | | | | |
| CVTQT | 0BE | 03E | 07E | 0FE | | | | |
| CVTTS | 0AC | 02C | 06C | 0EC | 1AC | 12C | 16C | 1EC |
| DIVS | 083 | 003 | 043 | 0C3 | 183 | 103 | 143 | 1C3 |
| DIVT | 0A3 | 023 | 063 | 0E3 | 1A3 | 123 | 163 | 1E3 |
| MULS | 082 | 002 | 042 | 0C2 | 182 | 102 | 142 | 1C2 |
| MULT | 0A2 | 022 | 062 | 0E2 | 1A2 | 122 | 162 | 1E2 |
| SUBS | 081 | 001 | 041 | 0C1 | 181 | 101 | 141 | 1C1 |
| SUBT | 0A1 | 021 | 061 | 0E1 | 1A1 | 121 | 161 | 1E1 |

| | /SU | /SUC | /SUM | /SUD | /SUI | /SUIC | /SUIM | /SUID |
|---|---|---|---|---|---|---|---|---|
| ADDS | 580 | 500 | 540 | 5C0 | 780 | 700 | 740 | 7C0 |
| ADDT | 5A0 | 520 | 560 | 5E0 | 7A0 | 720 | 760 | 7E0 |
| CMPTEQ | 5A5 | | | | | | | |
| CMPTLT | 5A6 | | | | | | | |
| CMPTLE | 5A7 | | | | | | | |
| CMPTUN | 5A4 | | | | | | | |
| CVTQS | | | | | 7BC | 73C | 77C | 7FC |
| CVTQT | | | | | 7BE | 73E | 77E | 7FE |
| CVTTS | 5AC | 52C | 56C | 5EC | 7AC | 72C | 76C | 7EC |
| DIVS | 583 | 503 | 543 | 5C3 | 783 | 703 | 743 | 7C3 |
| DIVT | 5A3 | 523 | 563 | 5E3 | 7A3 | 723 | 763 | 7E3 |
| MULS | 582 | 502 | 542 | 5C2 | 782 | 702 | 742 | 7C2 |
| MULT | 5A2 | 522 | 562 | 5E2 | 7A2 | 722 | 762 | 7E2 |
| SUBS | 581 | 501 | 541 | 5C1 | 781 | 701 | 741 | 7C1 |
| SUBT | 5A1 | 521 | 561 | 5E1 | 7A1 | 721 | 761 | 7E1 |

| | None | /S |
|---|---|---|
| CVTST | 2AC | 6AC |

| | None | /C | /V | /VC | /SV | /SVC | /SVI | /SVIC |
|---|---|---|---|---|---|---|---|---|
| CVTTQ | 0AF | 02F | 1AF | 12F | 5AF | 52F | 7AF | 72F |

| | /D | /VD | /SVD | /SVID | /M | /VM | /SVM | /SVIM |
|---|---|---|---|---|---|---|---|---|
| CVTTQ | 0EF | 1EF | 5EF | 7EF | 06F | 16F | 56F | 76F |

# **Programming Note**

Because underflow cannot occur for CMPTxx, there is no difference in function or performance between CMPTxx/S and CMPTxx/SU. It is intended that software generate CMPTxx/SU in place of CMPTxx/S.

# A.3 VAX Floating-Point Instructions

Table A-6 lists the hexadecimal value of the 11-bit function code field for the VAX floating-point instructions. The opcode for these instructions is $15_{16}$.

Table A-6 VAX Floating-Point Instruction Function Codes

| | None | /C | /U | /UC | /S | /SC | /SU | /SUC |
|---|---|---|---|---|---|---|---|---|
| ADDF | 080 | 000 | 180 | 100 | 480 | 400 | 580 | 500 |
| CVTDG | 09E | 01E | 19E | 11E | 49E | 41E | 59E | 51E |
| ADDG | 0A0 | 020 | 1A0 | 120 | 4A0 | 420 | 5A0 | 520 |
| CMPGEQ | 0A5 | | | | 4A5 | | | |
| CMPGLT | 0A6 | | | | 4A6 | | | |
| CMPGLE | 0A7 | | | | 4A7 | | | |
| CVTGF | 0AC | 02C | 1AC | 12C | 4AC | 42C | 5AC | 52C |
| CVTGD | 0AD | 02D | 1AD | 12D | 4AD | 42D | 5AD | 52D |
| CVTQF | 0BC | 03C | | | | | | |
| CVTQG | 0BE | 03E | | | | | | |
| DIVF | 083 | 003 | 183 | 103 | 483 | 403 | 583 | 503 |
| DIVG | 0A3 | 023 | 1A3 | 123 | 4A3 | 423 | 5A3 | 523 |
| MULF | 082 | 002 | 182 | 102 | 482 | 402 | 582 | 502 |
| MULG | 0A2 | 022 | 1A2 | 122 | 4A2 | 422 | 5A2 | 522 |
| SUBF | 081 | 001 | 181 | 101 | 481 | 401 | 581 | 501 |
| SUBG | 0A1 | 021 | 1A1 | 121 | 4A1 | 421 | 5A1 | 521 |

| | None | /C | /V | /VC | /S | /SC | /SV | /SVC |
|---|---|---|---|---|---|---|---|---|
| CVTGQ | 0AF | 02F | 1AF | 12F | 4AF | 42F | 5AF | 52F |

# A.4 Opcode Summary

Table A-7 lists all Alpha AXP opcodes from 00 (CALL\_PAL) through 3F (BGT). In the table, the column headings that appear over the instructions have a granularity of $8_{16}$. The rows beneath the Offset column supply the individual hexadecimal digit to resolve that granularity.

If an instruction column has a 0 in the right (low) hexadecimal digit, replace that 0 with the number to the left of the slash in the Offset column on the instruction's row. If an instruction column has an 8 in the right (low) hexadecimal digit, replace that 8 with the number to the right of the slash in the Offset column.

For example, the third row (2/A) under the $10_{16}$ column contains the symbol INTS\*, representing all the integer shift instructions. The opcode for those instructions is $12_{16}$ because the 0 in 10 is replaced by the 2 in the Offset column. Likewise, the third row under the $18_{16}$ column contains the symbol JSR\*, representing all jump instructions. The opcode for those instructions is $1A_{16}$ because the 8 in the heading is replaced by the number to the right of the slash in the Offset column.

The instruction format is listed under the instruction symbol.
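The lookup rule described above amounts to adding the row offset (0 through 7) to the column heading. The following is a minimal sketch of that arithmetic and its inverse; the function names are hypothetical and serve only as an illustration.

```python
COLUMN_HEADINGS = (0x00, 0x08, 0x10, 0x18, 0x20, 0x28, 0x30, 0x38)


def summary_opcode(heading, row):
    """Return the opcode for a Table A-7 cell.

    heading: column heading (0x00, 0x08, ..., 0x38)
    row:     row index 0..7 (row 2 is the one labeled 2/A)
    """
    if heading not in COLUMN_HEADINGS:
        raise ValueError("unknown column heading")
    if not 0 <= row <= 7:
        raise ValueError("row must be in 0..7")
    return heading + row


def locate(opcode):
    """Inverse lookup: return (column heading, row label) for a 6-bit opcode."""
    if not 0 <= opcode <= 0x3F:
        raise ValueError("opcode must fit in 6 bits")
    row = opcode & 0x7
    return opcode & 0x38, f"{row:X}/{row + 8:X}"


if __name__ == "__main__":
    print(f"{summary_opcode(0x10, 2):02X}")  # INTS* -> 12
    print(f"{summary_opcode(0x18, 2):02X}")  # JSR*  -> 1A
    print(locate(0x1A))                      # (24, '2/A')
```

Adding the row index works for both kinds of column because a heading ending in 8 plus a row index of 2 gives the same digit (A) as the right-hand entry of the 2/A label.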
**Table A-7 Opcode Summary**

| Offset | 00 | 08 | 10 | 18 | 20 | 28 | 30 | 38 |
|---|---|---|---|---|---|---|---|---|
| 0/8 | PAL*<br>(pal) | LDA<br>(mem) | INTA*<br>(op) | MISC*<br>(mem) | LDF<br>(mem) | LDL<br>(mem) | BR<br>(br) | BLBC<br>(br) |
| 1/9 | Res | LDAH<br>(mem) | INTL*<br>(op) | \PAL\ | LDG<br>(mem) | LDQ<br>(mem) | FBEQ<br>(br) | BEQ<br>(br) |
| 2/A | Res | Res | INTS*<br>(op) | JSR*<br>(mem) | LDS<br>(mem) | LDL_L<br>(mem) | FBLT<br>(br) | BLT<br>(br) |
| 3/B | Res | LDQ_U<br>(mem) | INTM*<br>(op) | \PAL\ | LDT<br>(mem) | LDQ_L<br>(mem) | FBLE<br>(br) | BLE<br>(br) |
| 4/C | Res | Res | Res | Res | STF<br>(mem) | STL<br>(mem) | BSR<br>(br) | BLBS<br>(br) |
| 5/D | Res | Res | FLTV*<br>(op) | \PAL\ | STG<br>(mem) | STQ<br>(mem) | FBNE<br>(br) | BNE<br>(br) |
| 6/E | Res | Res | FLTI*<br>(op) | \PAL\ | STS<br>(mem) | STL_C<br>(mem) | FBGE<br>(br) | BGE<br>(br) |
| 7/F | Res | STQ_U<br>(mem) | FLTL*<br>(op) | \PAL\ | STT<br>(mem) | STQ_C<br>(mem) | FBGT<br>(br) | BGT<br>(br) |

| Symbol | Meaning |
|---|---|
| FLTI* | IEEE floating-point instruction opcodes |
| FLTL* | Floating-point operate instruction opcodes |
| FLTV* | VAX floating-point instruction opcodes |
| INTA* | Integer arithmetic instruction opcodes |
| INTL* | Integer logical instruction opcodes |
| INTM* | Integer multiply instruction opcodes |
| INTS* | Integer shift instruction opcodes |
| JSR* | Jump instruction opcodes |
| MISC* | Miscellaneous instruction opcodes |
| PAL* | PALcode instruction (CALL_PAL) opcodes |
| \PAL\ | Reserved for PALcode |
| Res | Reserved for Digital |

# A.5 Required PALcode Function Codes

The opcodes listed in Table A-8 are required for all Alpha AXP implementations.
The notation used is oo.ffff, where oo is the hexadecimal 6-bit opcode and ffff is the hexadecimal 26-bit function code.

Table A-8 Required PALcode Function Codes

| Mnemonic | Type | Function Code |
|---|---|---|
| DRAINA | Privileged | 00.0002 |
| HALT | Privileged | 00.0000 |
| IMB | Unprivileged | 00.0086 |

# A.6 Alpha 21164 Microprocessor IEEE Floating-Point Conformance

The 21164 supports the IEEE floating-point operations as defined by the Alpha AXP architecture. Support for a complete implementation of the IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Standard 754-1985) is provided by a combination of hardware and software as described in the Alpha Architecture Reference Manual.

Additional information about writing code to support precise exception handling (necessary for complete conformance to the standard) is in the Alpha Architecture Reference Manual.

The following information is specific to the 21164:

### Invalid operation (INV)

The invalid operation trap is always enabled. If the trap occurs, then the destination register is UNPREDICTABLE. This exception is signaled if any VAX architecture operand is non-finite (reserved operand or dirty zero) and the operation can take an exception (that is, certain instructions, such as CPYS, never take an exception). This exception is signaled if any IEEE operand is non-finite (NaN, INF, denorm) and the operation can take an exception. This trap is also signaled for an IEEE format divide of +/-0 divided by +/-0. If the exception occurs, then FPCR[INV] is set and the trap is signaled to the Ibox.

### Divide-by-zero (DZE)

The divide-by-zero trap is always enabled. If the trap occurs, then the destination register is UNPREDICTABLE. For VAX architecture format, this exception is signaled whenever the numerator is valid and the denominator is zero.
For IEEE format, this exception is signaled whenever the numerator is valid and non-zero, with a denominator of +/-0. If the exception occurs, then FPCR[DZE] is set and the trap is signaled to the Ibox. For IEEE format divides, 0/0 signals INV, not DZE.

### Floating overflow (OVF)

The floating overflow trap is always enabled. If the trap occurs, then the destination register is UNPREDICTABLE. The exception is signaled if the rounded result exceeds in magnitude the largest finite number that can be represented by the destination format. This applies only to operations whose destination is a floating-point data type. If the exception occurs, then FPCR[OVF] is set and the trap is signaled to the Ibox.

### Underflow (UNF)

The underflow trap can be disabled. If underflow occurs, then the destination register is forced to a true zero, consisting of a full 64 bits of zero. This is done even if the proper IEEE result would have been -0. The exception is signaled if the rounded result is smaller in magnitude than the smallest finite number that can be represented by the destination format. If the exception occurs, then FPCR[UNF] is set. If the trap is enabled, then the trap is signaled to the Ibox. The 21164 never produces a denormal number; underflow occurs instead.

### Inexact (INE)

The inexact trap can be disabled. The destination register always contains the properly rounded result, whether or not the trap is enabled. The exception is signaled if the rounded result is different from what would have been produced if infinite precision (infinitely wide data) were available. For floating-point results, this requires both an infinite precision exponent and fraction. For integer results, this requires an infinite precision integer and an integral result. If the exception occurs, then FPCR[INE] is set. If the trap is enabled, then the trap is signaled to the Ibox.

The IEEE-754 specification allows INE to occur concurrently with either OVF or UNF.
Whenever OVF is signaled (if the inexact trap is enabled), INE is also signaled. Whenever UNF is signaled (if the inexact trap is enabled), INE is also signaled. The inexact trap also occurs concurrently with integer overflow. All valid opcodes that enable INE also enable both overflow and underflow.

If a CVTQL results in an integer overflow (IOV), then FPCR[INE] is automatically set. (The INE trap is never signaled to the Ibox because there is no CVTQL opcode that enables the inexact trap.)

### Integer overflow (IOV)

The integer overflow trap can be disabled. The destination register always contains the low-order bits (<64> or <32>) of the true result (not the truncated bits). Integer overflow can occur with CVTTQ, CVTGQ, or CVTQL. In conversions from floating to quadword integer or longword integer, an integer overflow occurs if the rounded result is outside the range $-2^{63}$ ... $2^{63}-1$. In conversions from quadword integer to longword integer, an integer overflow occurs if the result is outside the range $-2^{31}$ ... $2^{31}-1$. If the exception occurs, then the appropriate bit in the FPCR is set. If the trap is enabled, then the trap is signaled to the Ibox.

### Software completion (SWC)

The software completion signal is not recorded in the FPCR. The state of this signal is always sent to the Ibox. If the Ibox detects the assertion of any of the listed exceptions concurrent with the assertion of the SWC signal, then it sets EXC\_SUM[SWC].

Input exceptions always take priority over output exceptions. If both exception types occur, then only the input exception is recorded in the FPCR and only the input exception is signaled to the Ibox.

# Alpha 21164 Microprocessor Specifications

Table B-1 lists specifications for the 21164.
Table B-1 Alpha 21164 Microprocessor Specifications

| Feature | Description |
|---|---|
| Cycle time range | 4.4 ns to 3.2 ns. |
| Process technology | 0.5-micron CMOS. |
| Die size | 664 × 732 mils. |
| Package | 499-pin IPGA (interstitial pin grid array). |
| Number of signal pins | 291. |
| Maximum power dissipation (typ) | 45 W @ 3.75 ns cycle time (266 MHz), Vdd=3.45 V<sup>1</sup>. |
| Clocking input | Two times the internal clock speed (for example, 571.4 MHz at a 3.5-ns cycle time). |
| Virtual address size | 43 bits. |
| Physical address size | 40 bits. |
| Page size | 8K bytes. |
| Issue rate | 4 instructions per cycle. |
| Integer instruction pipeline | 7-stage. |
| Floating instruction pipeline | 9-stage. |
| On-chip Dcache | 8K-byte, physical, direct-mapped, write-through, 32-byte block, 32-byte fill. |
| On-chip Icache | 8K-byte, virtual, direct-mapped, 32-byte block, 32-byte fill, 128 address space numbers (ASNs) (MAX_ASN=127). |
| On-chip Scache | 96K-byte, physical, 3-way set-associative, write-back, 32- or 64-byte block, 32- or 64-byte fill. |
| On-chip data translation buffer | 64-entry, fully associative, not-last-used replacement, 8K-byte pages, 128 ASNs (MAX_ASN=127), full granularity hint support. |
| On-chip instruction translation buffer | 48-entry, fully associative, not-last-used replacement, 128 ASNs (MAX_ASN=127), full granularity hint support. |
| Floating-point unit | On-chip FPU supports both IEEE and Digital floating point. |
| Bus | Separate 128-bit data and address bus. |
| Serial ROM interface | Allows microprocessor to access a serial ROM. |

<sup>1</sup>Power consumption scales linearly with frequency over the frequency range 225 MHz to 312 MHz.

# Errata Sheet

Table C-1 lists the revision history for this document.
Table C-1 Document Revision History

| Date | Revision |
|---|---|
| July 20, 1994 | First Preliminary version. |
| September 12, 1994 | Second Preliminary version. |
| | First edition. |

# Technical Support, Ordering, and Associated Literature

This appendix describes how to:

- Obtain Digital semiconductor information and technical support
- Order Digital semiconductor products and associated literature

# D.1 Calling the Semiconductor Information Line for Information and Technical Support

Call the Semiconductor Information Line for information and technical support:

- United States and Canada: 1-800-332-2717
- Outside North America: +1-508-568-6868

# **D.2 Ordering Digital Semiconductor Products**

To order the Alpha 21164 microprocessor and evaluation boards, contact your local Digital sales office. When working with your sales representative, you may be able to take advantage of discounts and volume pricing.

You can order the following semiconductor products from Digital:

| Product | Order Number |
|---|---|
| Alpha 21164-xxx Microprocessor | 21-40658-0x |
| Alpha 21164 Microprocessor Evaluation Board 266 MHz Kit (Supports OSF/1 and Windows NT operating systems.) | 21A01-xx |
| Alpha 21164 Microprocessor Evaluation Board Design Package | EB164-xx |
| Heat Sink Assembly Type 1 | xxxxx-xx |
| Heat Sink Assembly Type 2 | xxxxx-xx |

# **D.3 Ordering Digital Semiconductor Sample Kits**

To order an Alpha 21164 Microprocessor Sample Kit, which contains one Alpha 21164 microprocessor, one heat sink, and supporting documentation, call 1-800-DIGITAL. You will need a purchase order number or credit card to order the following products.
| Product | Order Number |
|---|---|
| Alpha 21164-xxx Sample Kit | 21164-xx |
| Alpha 21164-xxx Sample Kit | 21164-xx |
| Alpha 21164-xxx Sample Kit | 21164-xx |

# D.4 Ordering Associated Digital Semiconductor Literature

The following table lists some of the Alpha AXP literature that is available. For a complete list, and for information about ordering, contact the Semiconductor Information Line.

| Title | Order Number |
|---|---|
| Alpha Architecture Reference Manual<sup>1</sup> | EY-L520E-DP-YCH |
| Alpha 21164 Microprocessor Product Brief | EC-QAENA-TE |
| Alpha 21164 Microprocessor Data Sheet | EC-QAEPA-TE |
| Alpha 21164 Microprocessor Hardware Reference Manual | EC-QAEQA-TE |
| Alpha 21164 PALcode System Design Guide | EC-QAExx-TE |
| DECchip Preprocessor for Hewlett-Packard Logic Analyzer | EC-X2454-72 |

<sup>1</sup>To order and purchase the Alpha Architecture Reference Manual, call 1-800-DIGITAL from the U.S. or Canada, or contact your local Digital office, or a technical or reference bookstore where Digital Press books are distributed by Prentice Hall.

# **D.5 Ordering Associated Third-Party Literature**

You can order the following third-party literature directly from the vendor:

| Title | Vendor |
|---|---|
| PCI System Design Guide | PCI Special Interest Group<br>M/S HF3-15A<br>5200 N.E. Elam Young Pkwy<br>Hillsboro, Oregon 97124-6497<br>1-503-696-2000 |

# Glossary

The glossary provides definitions for specific terms and acronyms associated with the Alpha 21164 microprocessor and chips in general.

# abort

The unit stops the operation it is performing, without saving status, to perform some other operation.

# **ABT**

Advanced bipolar/CMOS technology.
# address space number (ASN)

An optionally implemented register used to reduce the need for invalidation of cached address translations for process-specific addresses when a context switch occurs. ASNs are processor-specific; the hardware makes no attempt to maintain coherency across multiple processors.

# address translation

The process of mapping addresses from one address space to another.

# **ALIGNED**

A datum of size 2\*\*N is stored in memory at a byte address that is a multiple of 2\*\*N (that is, one that has N low-order zeros).

# ALU

Arithmetic logic unit.

# ANSI

American National Standards Institute. An organization that develops and publishes standards for the computer industry.

# **ASIC**

Application-specific integrated circuit.

# **ASN**

See address space number.

# assert

To cause a signal to change to its logical true state.

# **AST**

See asynchronous system trap.

# asynchronous system trap (AST)

A software-simulated interrupt to a user-defined routine. ASTs enable a user process to be notified asynchronously, with respect to that process, of the occurrence of a specific event. If a user process has defined an AST routine for an event, the system interrupts the process and executes the AST routine when that event occurs. When the AST routine exits, the system resumes execution of the process at the point where it was interrupted.

# backmap

A memory unit that is used to note addresses of valid entries within a cache.

# bandwidth

Bandwidth is often used to express "high rate of data transfer" in a bus or an I/O channel. This usage assumes that a wide bandwidth may contain a high frequency, which can accommodate a high rate of data transfer.

# **Bcache**

See external cache.

# barrier transaction

A transaction on the external interface as a result of an MB (memory barrier) instruction.

# **BCT**

Bipolar/CMOS technology.

# **BICMOS**

Bipolar/CMOS. The combination of bipolar and MOSFET transistors in a common integrated circuit.
# bidirectional

Flowing in two directions. The buses are bidirectional; they carry both input and output signals.

# **BISR**

Built-in self-repair.

# **BIST**

Built-in self-test.

# bit

Binary digit. The smallest unit of data in a binary notation system, designated as 0 or 1.

# BIU

Bus interface unit. See Cbox.

# block exchange

Memory feature that improves bus bandwidth by paralleling a cache victim write-back with a cache miss fill.

# board-level cache

See external cache.

# boot

Short for bootstrap. Loading an operating system into memory is called booting.

# **BSR**

Boundary scan register.

# buffer

An internal memory area used for temporary storage of data records during input or output operations.

# bugcheck

A software condition, usually the response to software's detection of an "internal inconsistency," which results in the execution of the system bugcheck code.

# bus

A group of signals that consists of many transmission lines or wires. It interconnects computer system components to provide communications paths for addresses, data, and control information.

# byte

Eight contiguous bits starting on an addressable byte boundary. The bits are numbered right to left, 0 through 7.

# byte granularity

Memory systems are said to have byte granularity if adjacent bytes can be written concurrently and independently by different processes or processors.

# cache

See cache memory.

# cache block

The smallest unit of storage that can be allocated or manipulated in a cache. Also known as a cache line.

# cache coherence

Maintaining cache coherence requires that when a processor accesses data cached in another processor, it must not receive incorrect data, and when cached data is modified, all other processors that access that data receive modified data. Schemes for maintaining consistency can be implemented in hardware or software. Also called cache consistency.

# cache fill

An operation that loads an entire cache block by using multiple read cycles from main memory.
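As a rough illustration of why a cache fill takes multiple read cycles, the sketch below divides a block size by the width of the data bus. It assumes the 128-bit data bus and the 32- and 64-byte block sizes listed in Table B-1; the function name `transfers_per_fill` is illustrative only, and the count ignores all protocol and timing detail of the real external interface.

```python
def transfers_per_fill(block_bytes: int, bus_bits: int = 128) -> int:
    """Number of bus-wide data transfers needed to move one cache block.

    Simplified model: one transfer moves bus_bits // 8 bytes and nothing else
    is accounted for (no command cycles, no wait states).
    """
    bus_bytes = bus_bits // 8
    # Ceiling division, in case the block is not a whole number of bus widths.
    return -(-block_bytes // bus_bytes)


if __name__ == "__main__":
    for block in (32, 64):
        print(block, "byte block:", transfers_per_fill(block), "transfers")
```

Under these assumptions a 32-byte block needs two 16-byte transfers and a 64-byte block needs four.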
# cache flush

An operation that marks all cache blocks as invalid.

# cache hit

The status returned when a logic unit probes a cache memory and finds a valid cache entry at the probed address.

# cache interference

The result of an operation that adversely affects the mechanisms and procedures used to keep frequently used items in a cache. Such interference may cause frequently used items to be removed from a cache or incur significant overhead operations to ensure correct results. Either action hampers performance.

# cache line

See cache block.

# cache line buffer

A buffer used to store a block of cache memory.

# cache memory

A small, high-speed memory placed between slower main memory and the processor. A cache increases effective memory transfer rates and processor speed. It contains copies of data recently used by the processor and fetches several bytes of data from memory in anticipation that the processor will access the next sequential series of bytes. The Alpha 21164 microprocessor contains three on-chip internal caches. See also write-through cache and write-back cache.

# cache miss

The status returned when cache memory is probed with no valid cache entry at the probed address.

# **CALL\_PAL instructions**

Special instructions used to invoke PALcode.

# Cbox

The external interface control logic unit. Provides the 21164 microprocessor with an interface to the external data bus, board-level Bcache, and the on-chip Scache.

# central processing unit (CPU)

The unit of the computer that is responsible for interpreting and executing instructions.

# CISC

Complex instruction set computer. An instruction set consisting of a large number of complex instructions that are managed by microcode. *Contrast with* RISC.

# clean

In the cache of a system bus node, refers to a cache line that is valid but has not been written.

# clock

A signal used to synchronize the circuits in a computer.

# **CMOS**

Complementary metal-oxide-semiconductor.
A silicon device formed by a process that combines PMOS and NMOS semiconductor material.

# conditional branch instructions

Instructions that test a register for positive/negative or for zero/non-zero. They can also test integer registers for even/odd.

# control and status register (CSR)

A device or controller register that resides in the processor's I/O space. The CSR initiates device activity and records its status.

# **CPLD**

Complex programmable logic device.

# **CPU**

See central processing unit.

# **CSR**

See control and status register.

# cycle

One clock interval.

# data bus

The bus used to carry data between the 21164 and external devices. Also called the pin bus.

# **Dcache**

Data cache. A cache reserved for storage of data. The Dcache does not contain instructions.

# DIP

Dual inline package.

# direct-mapping cache

A cache organization in which only one address comparison is needed to locate any data in the cache, because any block of main memory data can be placed in only one possible position in the cache.

# direct memory access (DMA)

Access to memory by an I/O device that does not require processor intervention.

# dirty

One status item for a cache block. The cache block is valid and has been written so that it may differ from the copy in system main memory.

# dirty victim

Used in reference to a cache block in the cache of a system bus node. The cache block is valid but is about to be replaced due to a cache block resource conflict. The data must therefore be written to memory.

# DRAM

Dynamic random-access memory. Read/write memory that must be refreshed (read from or written to) periodically to maintain the storage of information.

# DTL

Diode-transistor logic.

# dual issue

Two instructions are issued, in parallel, during the same microprocessor cycle. The instructions use different resources and so do not conflict.

# **EB164**

An evaluation board.
A hardware/software applications development platform for the Alpha AXP program and a debug platform for the Alpha 21164 microprocessor.

# **Ebox**

The Ebox contains the 64-bit integer execution data path.

# **ECC**

Error correction code. Code and algorithms used by logic to facilitate error detection and correction. See also ECC error.

# **ECC error**

An error detected by ECC logic, to indicate that data (or the protected "entity") has been corrupted. The error may be correctable (soft error) or uncorrectable (hard error).

# **ECL**

Emitter-coupled logic.

# **EEPROM**

Electrically erasable programmable read-only memory. A memory device that can be byte-erased, written to, and read from. Contrast with FEPROM.

# **EPLD**

Erasable programmable logic device.

# external cache

A cache memory provided outside of the microprocessor chip, usually located on the same module. Also called board-level or module-level cache.

# **Fbox**

The unit within the 21164 microprocessor that performs floating-point calculations.

# **FEPROM**

Flash-erasable programmable read-only memory. FEPROMs can be bank- or bulk-erased. Contrast with EEPROM.

# FET

Field-effect transistor.

# firmware

Machine instructions stored in hardware.

# floating point

A number system in which the position of the radix point is indicated by the exponent part and another part represents the significant digits or fractional part.

# flush

See cache flush.

# **FPGA**

Field-programmable gate array.

# **FPLA**

Field-programmable logic array.

# granularity

A characteristic of storage systems that defines the amount of data that can be read and/or written with a single instruction, or read and/or written independently. VAX systems have byte or multibyte granularities, whereas disk systems typically have 512-byte or greater granularities. For a given storage device, a higher granularity generally yields a greater throughput.

# hardware interrupt request (HIR)

An interrupt generated by a peripheral device.
# high-impedance state

An electrical state of high resistance to current flow, which makes the device appear not physically connected to the circuit.

# hit

See cache hit.

# Ibox

A logic unit within the 21164 microprocessor that fetches, decodes, and issues instructions. It also controls the microprocessor pipeline.

# **Icache**

Instruction cache. A cache reserved for storage of instructions. One of the three areas of primary cache (located on the 21164) used to store instructions. The Icache contains 8K bytes of memory space. It is a direct-mapped cache. Icache blocks, or lines, contain 32 bytes of instruction stream data with associated tag as well as a 6-bit ASM field and an 8-bit branch history field per block. Icache does not contain hardware for maintaining cache coherency with memory and is unaffected by the invalidate bus.

# **IEEE Standard 754**

A set of formats and operations that apply to floating-point numbers. The formats cover 32-, 64-, and 80-bit operand sizes.

# IEEE Standard 1149.1

A standard for the Test Access Port and Boundary Scan Architecture used in board-level manufacturing test procedures.

# **INTnn**

The term INTnn, where nn is one of 2, 4, 8, 16, 32, or 64, refers to a data field size of nn contiguous NATURALLY ALIGNED bytes. For example, INT4 refers to a NATURALLY ALIGNED longword.

# internal processor register (IPR)

One of many registers internal to the Alpha 21164 microprocessor.

# **IPGA**

Interstitial pin grid array.

# **JFET**

Junction field-effect transistor.

# latency

The amount of time it takes the system to respond to an event.

# LCC

Leadless chip carrier.

# LFSR

Linear feedback shift register.

# load/store architecture

A characteristic of a machine architecture where data items are first loaded into a processor register, operated on, and then stored back to memory. No operations on memory other than load and store are provided by the instruction set.

# longword

Four contiguous bytes starting on an arbitrary byte boundary.
The bits are numbered from right to left, 0 through 31.

# **LSB**

Least significant bit.

# machine check

An operating system action triggered by certain system hardware-detected errors that can be fatal to system operation. Once triggered, machine check handler software analyzes the error.

# MAF

Miss address file.

# main memory

The large memory, external to the microprocessor, used for holding most instruction code and data. Usually built from cost-effective DRAM memory chips. May be used in connection with the microprocessor's internal caches and an optional external cache.

# masked write

A write cycle that only updates a subset of a nominal data block.

# **MBO**

See must be one.

# Mbox

This section of the processor unit performs address translation, interfaces to the Dcache, and performs several other functions.

# MBZ

See must be zero.

# **MESI protocol**

A cache consistency protocol with full support for multiprocessing. The MESI protocol consists of four states that define whether a block is modified (M), exclusive (E), shared (S), or invalid (I).

# **MIPS**

Millions of instructions per second.

# miss

See cache miss.

# module

A board on which logic devices (such as transistors, resistors, and memory chips) are mounted and connected to perform a specific system function.

# module-level cache

See external cache.

# MOS

Metal-oxide-semiconductor.

# MOSFET

Metal-oxide-semiconductor field-effect transistor.

# MSI

Medium-scale integration.

# multiprocessing

A processing method that replicates the sequential computer and interconnects the collection so that each processor can execute the same or a different program at the same time.

# Must be one (MBO)

A field that must be supplied as one.

# Must be zero (MBZ)

A field that is reserved and must be supplied as zero. If examined, it must be assumed to be UNDEFINED.

# **NATURALLY ALIGNED**

See ALIGNED.
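The ALIGNED entry defines a datum of size 2\*\*N as aligned when its byte address is a multiple of 2\*\*N, that is, when the address has N low-order zeros. A minimal sketch of that test follows; the function name `is_naturally_aligned` is illustrative and is not taken from any Alpha toolchain.

```python
def is_naturally_aligned(address: int, size: int) -> bool:
    """Return True if `address` is a multiple of `size`.

    `size` is the datum size in bytes and must be a power of two
    (2 for a word, 4 for a longword, 8 for a quadword, 16 for an octaword).
    """
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a positive power of two")
    # A multiple of 2**N has N low-order zero bits, so masking with
    # (size - 1) leaves zero exactly when the address is aligned.
    return address & (size - 1) == 0


if __name__ == "__main__":
    print(is_naturally_aligned(0x1000, 8))  # quadword at 0x1000
    print(is_naturally_aligned(0x1004, 8))  # quadword at 0x1004
    print(is_naturally_aligned(0x1004, 4))  # longword at 0x1004
```

The mask form is used instead of the modulo operator only because it mirrors the "N low-order zeros" wording of the definition; both give the same answer for power-of-two sizes.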
# **NATURALLY ALIGNED data**

Data stored in memory such that the address of the data is evenly divisible by the size of the data in bytes. For example, an ALIGNED longword is stored such that the address of the longword is evenly divisible by 4.

# **NMOS**

N-type metal-oxide-semiconductor.

# **NVRAM**

Nonvolatile random-access memory.

# **OBL**

Observability linear feedback shift register.

# octaword

Sixteen contiguous bytes starting on an arbitrary byte boundary. The bits are numbered from right to left, 0 through 127.

# OpenVMS AXP operating system

Digital's open version of the VMS operating system, which runs on Alpha AXP machines.

# operand

The data or register upon which an operation is performed.

# PAL

Privileged architecture library. See PALcode. Also programmable array logic (hardware): a device that can be programmed by a process that blows individual fuses to create a circuit.

# **PALcode**

Alpha AXP privileged architecture library code, written to support Alpha microprocessors. PALcode implements architecturally defined behavior.

# **PALmode**

A special environment for running PALcode routines.

# parameter

A variable that is given a specific value that is passed to a program before execution.

# parity

A method for checking the accuracy of data by calculating the sum of the number of ones in a piece of binary data. Even parity requires the correct sum to be an even number; odd parity requires the correct sum to be an odd number.

# **PGA**

Pin grid array.

# pipeline

A CPU design technique whereby multiple instructions are simultaneously overlapped in execution.

# **PLA**

Programmable logic array.

# PLCC

Plastic leadless chip carrier or plastic leaded chip carrier.

# **PLD**

Programmable logic device.

# PLL

Phase-locked loop.

# **PMOS**

P-type metal-oxide-semiconductor.

# **PQFP**

Plastic quad flat pack.

# primary cache

The cache that is the fastest and closest to the processor.
The first-level caches, located on the CPU chip, composed of the Dcache, Icache, and Scache.

# program counter

That portion of the CPU that contains the virtual address of the next instruction to be executed. Most current CPUs implement the program counter (PC) as a register. This register may be visible to the programmer through the instruction set.

# **PROM**

Programmable read-only memory.

# pull-down resistor

A resistor placed between a signal line and a negative voltage.

# pull-up resistor

A resistor placed between a signal line and a positive voltage.

# quad issue

Four instructions are issued, in parallel, during the same microprocessor cycle. The instructions use different resources and so do not conflict.

# quadword

Eight contiguous bytes starting on an arbitrary byte boundary. The bits are numbered from right to left, 0 through 63.

# **RAM**

Random-access memory.

# **READ BLOCK**

A transaction where the 21164 requests that an external logic unit fetch read data.

# read data wrapping

System feature that reduces apparent memory latency by allowing read data cycles to differ from the usual low-to-high sequence. Requires cooperation between the 21164 and external hardware.

# read stream buffers

Arrangement whereby each memory module independently prefetches DRAM data prior to an actual read request for that data. Reduces average memory latency while improving total memory bandwidth.

# register

A temporary storage or control location in hardware logic.

# reliability

The probability a device or system will not fail to perform its intended functions during a specified time interval when operated under stated conditions.

# reset

An action that causes a logic unit to interrupt the task it is performing and go to its initialized state.

# RISC

Reduced instruction set computer. A computer with an instruction set that is pared down and reduced in complexity so that most instructions can be performed in a single processor cycle.
High-level compilers synthesize the more complex, least frequently used instructions by breaking them down into simpler instructions. This approach allows the RISC architecture to implement a small, hardware-assisted instruction set, thus eliminating the need for microcode.

# **ROM**

Read-only memory.

# **RTL**

Register-transfer logic.

# SAM

Serial access memory.

# SBO

Should be one.

# **SBZ**

Should be zero.

# Scache

Secondary cache. A three-way set-associative, second-level cache located on the Alpha 21164 microprocessor.

# scheduling

The process of ordering instruction execution to obtain optimum performance.

# set-associative

A form of cache organization in which the location of a data block in main memory constrains, but does not completely determine, its location in the cache. Set-associative organization is a compromise between direct-mapped organization, in which data from a given address in main memory has only one possible cache location, and fully associative organization, in which data from anywhere in main memory can be put anywhere in the cache. An "n-way set-associative" cache allows data from a given address in main memory to be cached in any of n locations. The Scache in the 21164 microprocessor has a three-way set-associative organization.

# SIMM

Single inline memory module.

# SIP

Single inline package.

# SIPP

Single inline pin package.

# **SMD**

Surface mount device.

# **SRAM**

Static random-access memory.

# **SROM**

Serial read-only memory.

# SSI

Small-scale integration.

# **SSRAM**

Synchronous static random-access memory.

# stack

An area of memory set aside for temporary data storage or for procedure and interrupt service linkages. A stack uses the last-in/first-out concept. As items are added to (pushed on) the stack, the stack pointer decrements. As items are retrieved from (popped off) the stack, the stack pointer increments.

# **STRAM**

Self-timed random-access memory.
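The set-associative entry can be made concrete with a textbook address split. The sketch below assumes the sizes listed in Table B-1 (a 96K-byte, 3-way Scache with 64-byte blocks, and an 8K-byte direct-mapped Dcache with 32-byte blocks) and a simple modulo indexing scheme. The function name `split_address` and the indexing scheme are illustrative assumptions; the actual 21164 indexing logic is described elsewhere in this manual and may differ.

```python
def split_address(addr: int, cache_bytes: int, ways: int, block_bytes: int):
    """Split an address into (tag, set_index, block_offset).

    Textbook model: the cache holds cache_bytes / (ways * block_bytes) sets,
    and a block may live in any of the `ways` locations of exactly one set.
    ways == 1 gives a direct-mapped cache.
    """
    sets = cache_bytes // (ways * block_bytes)
    block_offset = addr % block_bytes          # byte within the block
    block_number = addr // block_bytes         # which memory block
    set_index = block_number % sets            # the one set it can occupy
    tag = block_number // sets                 # distinguishes blocks in a set
    return tag, set_index, block_offset


if __name__ == "__main__":
    # Scache figures from Table B-1, using the 64-byte block option.
    print(split_address(0x12345, cache_bytes=96 * 1024, ways=3, block_bytes=64))
    # Dcache figures from Table B-1 (direct-mapped, 32-byte blocks).
    print(split_address(0x12345, cache_bytes=8 * 1024, ways=1, block_bytes=32))
```

With these assumed parameters the Scache model has 512 sets of three blocks each, so an address selects one set and may hit in any of its three ways; the Dcache model has 256 sets of one block each, so each address has exactly one possible location.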
# superpipelined

Describes a pipelined machine that has a larger number of pipe stages and more complex scheduling and control. See also pipeline.

# superscalar

Describes a machine architecture that allows multiple independent instructions to be issued in parallel during a given clock cycle.

# tag

The part of a cache block that holds the address information used to determine if a memory operation is a hit or a miss on that cache block.

# TB

Translation buffer.

# tristate

Refers to a bused line that has three states: high, low, and high-impedance.

# TTL

Transistor-transistor logic.

# **UART**

Universal asynchronous receiver-transmitter.

# **UNALIGNED**

A datum of size 2\*\*N stored at a byte address that is not a multiple of 2\*\*N.

# unconditional branch instructions

Instructions that write a return address into a register.

# UNDEFINED

An operation that may halt the processor or cause it to lose information. Only privileged software (that is, software running in kernel mode) can trigger an UNDEFINED operation.

# **UNPREDICTABLE**

Results or occurrences that do not disrupt the basic operation of the processor; the processor continues to execute instructions in its normal manner. Privileged or unprivileged software can trigger UNPREDICTABLE results or occurrences.

# **UVPROM**

Ultraviolet (erasable) programmable read-only memory.

# valid

Allocated. Valid cache blocks have been loaded with data and may return cache hits when accessed.

# victim

Used in reference to a cache block in the cache of a system bus node. The cache block is valid but is about to be replaced due to a cache block resource conflict.

# virtual cache

A cache that is addressed with virtual addresses. The tag of the cache is a virtual address. This allows direct addressing of the cache without having to go through the translation buffer, making cache hit times faster.

# **VHSIC**

Very-high-speed integrated circuit.

# VLSI

Very-large-scale integration.

# **VRAM**

Video random-access memory.
# word

Two contiguous bytes (16 bits) starting on an arbitrary byte boundary. The bits are numbered from right to left, 0 through 15.

# write data wrapping

System feature that reduces apparent memory latency by allowing write data cycles to differ from the usual low-to-high sequence. Requires cooperation between the 21164 and external hardware.

# write-back

A cache management technique in which write operation data is written into cache but is not written into main memory in the same operation. This may result in temporary differences between cache data and main memory data. Some logic unit must maintain coherency between cache and main memory.

# write-back cache

Copies are kept of any data in the region; read and write operations may use the copies, and write operations use additional state to determine whether there are other copies to invalidate or update.

# write-through

A cache management technique in which a write operation to cache also causes the same data to be written in main memory during the same operation.

# write-through cache

Copies are kept of any data in the region; read operations may use the copies, but write operations update the actual data location and either update or invalidate all copies.

# WRITE\_BLOCK

A transaction where the 21164 requests that an external logic unit process write data.
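The difference between the write-through and write-back entries can be shown with a toy model. The class `TinyCache` below is a hypothetical illustration, not a description of the 21164 hardware: it tracks single addresses instead of blocks, has no capacity limit, and models only when main memory is updated.

```python
class TinyCache:
    """Toy single-level cache that models only the write policy."""

    def __init__(self, memory: dict, write_back: bool):
        self.memory = memory          # main memory: address -> value
        self.lines = {}               # cached copies: address -> value
        self.dirty = set()            # addresses written but not yet in memory
        self.write_back = write_back

    def write(self, addr: int, value: int) -> None:
        self.lines[addr] = value
        if self.write_back:
            # Write-back: memory is left stale; remember the line is dirty.
            self.dirty.add(addr)
        else:
            # Write-through: memory is updated in the same operation.
            self.memory[addr] = value

    def evict(self, addr: int) -> None:
        # A dirty victim must be written to memory before it is dropped.
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)
        self.lines.pop(addr, None)


if __name__ == "__main__":
    mem_wt, mem_wb = {0: 1}, {0: 1}
    wt, wb = TinyCache(mem_wt, write_back=False), TinyCache(mem_wb, write_back=True)
    wt.write(0, 2)
    wb.write(0, 2)
    print(mem_wt[0], mem_wb[0])   # memory contents right after the write
    wb.evict(0)
    print(mem_wb[0])              # memory contents after the victim write
```

In this model the write-through cache updates memory immediately, while the write-back cache leaves memory holding the old value until the dirty line is evicted, which is the "temporary difference" the write-back entry refers to.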
# Index

# A

- Aborts, 2–18
- Absolute Maximum Rating, 9–1
- ac coupling, 9–5
- Addressing, 1–2
- Address regions, physical, 4–12
- Address translation, 2–10
- Alpha AXP documentation, D–2
- ALT\_MODE register, 5–60
- Architecture, 1–1 to 1–3
- Associated literature, D–2
- AST, 2–8
- ASTER register, 5–26
- ASTRR register, 5–25
- Asynchronous system trap
  - See AST

# B

- Bcache, 2–13
  - block size, 4–15
  - hit under READ MISS example, 4–87
  - interface, 4–4
  - introduction, 4–2 to 4–4
  - structure, 4–14
  - systems without, 4–79
  - timing, 4–29
  - victim buffers, 4–4
- BCACHE VICTIM command, 4–36
- BC\_CONFIG register, 5–84
- BC\_CONTROL register, 5–78
- BC\_TAG\_ADDR register, 5–88
- BIU, 4–2
  - See also Cbox
  - buffer, 4–4
- Block diagram, 21164, 2–2
- Boundaries
  - data wrap order, 4–13
- Branch prediction, 2–5, 2–19
- Bubble squashing, 2–19
- Bus contention
  - command/address bus, 4–68 to 4–78
  - data bus, 4–68 to 4–78
- Bus interface unit
  - See BIU

# C

- Cache coherency, 4–18 to 4–27
  - basics, 4–19
  - flush protocol, 4–20
  - flush protocol state machines, 4–26
  - flush protocol systems, 4–24
  - transaction conflicts, 4–27
  - write invalidate protocol, 4–20
  - write invalidate protocol state machines, 4–23
  - write invalidate protocol states, 4–22
  - write invalidate protocol systems, 4–21
- Cache control and bus interface unit
  - See Cbox
- Cache organization, 2–12
- Cbox, 2–12
  - IPR PAL restrictions, 5–99
  - IPRs, 5–68 to 5–97
  - read requests, 2–30
  - write buffer data store, 2–34
  - write ordering, 2–36
- CC register, 5–61
- CC\_CTL register, 5–62
- Clocks, 4–5 to 4–12
  - CPU, 4–5
  - reference, 4–8, 4–9
  - system, 4–6
- Commands
  - 21164 initiated, 4–34
  - BCACHE VICTIM, 4–36
  - FETCH, 4–35
  - FETCH\_M, 4–35
  - FLUSH, 4–62
  - INVALIDATE, 4–53
  - LOCK, 4–35
  - MEMORY BARRIER, 4–35
  - NOP, 4–35, 4–53, 4–62
  - READ, 4–62
  - READ DIRTY, 4–53
  - READ DIRTY/INVALIDATE, 4–54
  - READ MISS0, 4–36
  - READ MISS1, 4–36
  - READ MISS MOD0, 4–36
  - READ MISS MOD1, 4–36
  - READ MISS MOD STC0, 4–37
  - READ MISS MOD STC1, 4–37
  - SET DIRTY, 4–35
  - SET SHARED, 4–53
  - WRITE BLOCK, 4–35
  - WRITE BLOCK LOCK, 4–35
- Conventions, xxii to xxvii
- CPU
  - microarchitecture, 2–2
- CPU clock, 4–5

# D

- Data cache
  - See Dcache
- Data integrity, 4–89
  - address and command parity, 4–92
  - Bcache tag control parity, 4–91
  - Bcache tag data parity, 4–91
  - ECC and parity, 4–89
  - force correction, 4–91
- Data translation buffer
  - See DTB
- Data types, 1–1
  - floating-point, 1–3, 2–9
  - integer, 1–2
- Data wrap order, 4–13
- data\_bus\_req\_h signal
  - using, 4–72
- Dcache, 2–12
- DC\_FLUSH register, 5–60
- DC\_MODE register, 5–56
- DC\_PERR\_STAT register, 5–50
- DC\_TEST\_CTL register, 5–63
- DC\_TEST\_TAG register, 5–64
- DC\_TEST\_TAG\_TEMP register, 5–66
- Decoupling, 9–20
- Design examples, 2–39
- Documentation, D–2
- DTB, 2–10
- DTBIAP register, 5–52
- DTBIA register, 5–52
- DTBIS register, 5–53
- DTB\_ASN register, 5–38
- DTB\_CM register, 5–39
- DTB\_PTE register, 5–41
- DTB\_PTE\_TEMP register, 5–43
- DTB\_TAG register, 5–40
- Duplicate tag store, 4–15
  - algorithm, 4–17
  - full, 4–15
  - partial, 4–18

# E

- Ebox, 2–9
  - registers, 2–9, 5–98
- ECC, 4–89 to 4–91
- EI\_ADDR register, 5–93
- EI\_STAT register, 5–90
- Entry pointer queues, 2–34
- Environment instructions
  - PALcode, 6–7
- Error correction code
  - See ECC
- Exceptions, 2–18
- EXC\_ADDR register, 5–14
- EXC\_MASK register, 5–17
- EXC\_SUM register, 5–15
- External cache
  - See Bcache
- External interface
  - rules for use, 4–80
- External interface introduction, 4–2 to 4–4

# F

- Features, 1–3 to 1–4
- FETCH command, 4–35
- FETCH\_M command, 4–35
- Fill, 2–31, 4–79
  - after other transactions, 4–79
- FILL error, 4–92
- FILL transaction, 4–41
- fill\_h signal
  - using, 4–70
- FILL\_SYN register, 5–94
- Floating data types, 2–9
- Floating-point unit
  - See FPU
- FLUSH command, 4–62
- FLUSH timing diagram, 4–64
- FLUSH transaction, 4–64
- FPU, 2–9
- Free-entry queue, 2–34

# H

- Heat sink, 10–3
- Hint bits, 2–10
- HWINT\_CLR register, 5–28
- HW\_LD Instruction, 6–3
- HW\_MFPR Instruction, 6–3
- HW\_MTPR Instruction, 6–3
- HW\_REI Instruction, 6–3
- HW\_ST Instruction, 6–3

# I

- Ibox, 2–4
  - branch prediction, 2–5
  - instruction
    - decode, 2–5
    - issue, 2–5
  - instruction translation buffer, 2–7
  - interrupts, 2–8
  - IPRs, 5–5 to 5–37
    - encoding, 5–2
  - slotting, 2–21
- Icache, 2–13
- ICPERR\_STAT register, 5–13
- ICSR register, 5–20
- IC\_FLUSH\_CTL register, 5–13
- idle\_bc\_h signal
  - length of assertion, 4–70
  - using, 4–70
- IEEE floating-point conforma
- IFAULT\_VA\_FORM register, 5–11
- Initialization
  - role of interrupt signals, 4–93
- Input clock
  - ac coupling, 9–5
  - impedance levels, 9–5
  - termination, 9–5
- Input clocks, 9–4
- Instruction
  - decode, 2–5
  - issue, 2–5

| Instruction cache | IPRs (cont'd) |
|---|---|
| See Icache | BC_CONFIG, 5–84 |
| Instruction fetch/decode unit and branch unit | BC_CONTROL, 5–78 |
| See Ibox | BC_TAG_ADDR, 5–88 |
| Instruction issue, 1–3, 2–17 | CC, 5–61 |
| Instructions | CC_CTL, 5–62 |
| classes, 2–20 | DC_FLUSH, 5–60 |
| issue rules, 2–27 | DC_MODE, 5–56 |
| latencies, 2–23, 2–24 | DC_PERR_STAT, 5–50 |
| MB, 2–12 | DC_TEST_CTL, 5–63 |
| slotting, 2–20, 2–21 | DC_TEST_TAG, 5–64 |
| WMB, 2–12, 2–34 | DC_TEST_TAG_TEMP, 5 |
| Instruction translation buffer, 2–7 | DTBIA, 5–52 |
| See ITB | DTBIAP, 5–52 |
| Integer execution unit | DTBIS, 5–53 |
| See Ebox | DTB_ASN, 5–38<br>DTB_CM, 5–39 |
| Integer register file | DTB_PTE, 5–41 |
| See IRF | DTB_PTE_TEMP, 5–43 |
| Interface restrictions, 4–79 | DTB_TAG, 5–40
| | Interface transactions | EI_ADDR, 5-93 | | 21164 initiated, 4–34 to 4–50 | ELSTAT, 5-90 | | system initiated, 4–51 to 4–67 | EXC_ADDR, 2-18, 5-14 | | Internal processor registers | EXC_MASK, 5-17 | | See IPRs | EXC_SUM, 5-15 | | Interrupts, 4–93 to 4–95 | FILL_SYN, 5-94 | | ASTs, 2–8 | HWINT_CLR, 5–28 | | disabling, 2–9 | ICPERR_STAT, 5-13 | | hardware, 2–8 | ICSR, 2-9, 5-20 | | initialization, 4–93 | IC_FLUSH_CTL, 5-13 | | normal operation, 4–93 | IFAULT_VA_FORM, 5-11 | | priority level, 4–93 | INTID, 5–24 | | software, 2–8 | IPL, 2-9, 5-23 | | Interrupt signals, 4–93 | ISR, 5–29 | | INTID register, 5–24 | ITB_ASN, 5-8 | | INTnn, xxiv | ITB_IA, 5–9 | | INVALIDATE command, 4-53 | ITB_IAP, 5–9 | | INVALIDATE timing diagram, 4-58 | ITB_IS, 5-10 | | INVALIDATE transaction, 4-58 | ITB_PTE, 5-6 | | IPL register, 5-23 | ITB_PTE_TEMP, 5–9 | | IPRs | ITB_TAG, 5–5 | | accessibility, 5-1 | IVPTBR, 5–12 | | ALT_MODE, 5-60 | MAF_MODE, 5–58 | | ASTER, 5-26 | MCSR, 5-54 | | ASTRR, 5-25 | MM_STAT, 5-44 | | IPRs (cont'd) | | |-----------------------------|-------------------------------------| | MVPTBR, 5-49 | | | PAL_BASE, 5-18, 6-3 | M | | PMCTR, 5-33 | MAF, 2–11, 2–29 to 2–32 | | PS, 5-19 | entries, 2–31 | | reset state, 7–9 | entry, 2-32 | | SC_ADDR, 5-75 | rules, 2–29 | | SC_CTL, 5-69 | MAF_MODE register, 5-58 | | SC_STAT, 5-72 | MB instruction, 2-12 | | SIRR, 5–27 | Mbox, 2-4, 2-10 | | SL_RCV, 5-32 | address translation, 2–10 | | SL_XMIT, 5-31 | data translation buffer, 2-10 | | VA, 5–46 | IPRs, 5-38 to 5-67 | | VA_FORM, 5-47 | encoding, 5-3 | | IRF, 2-9 | load instruction, 2-11 | | ISR register, 5–29 | miss address file, 2–11 | | Issue rules, 2–27 | store execution, 2-11, 2-32 to 2-33 | | Issuing rules, 2–19 to 2–28 | write buffer, 2–12 | | ITB, 2–7 | write buffer address file, 2-34 | | ITB_ASN register, 5–8 | MCSR register, 5-54 | | ITB_IAP register, 5–9 | Memory address translation unit | | ITB_IA register, 5–9 | See Mbox | | ITB_IS register, 5–10 | MEMORY BARRIER 
command, 4-35 | | ITB_PTE register, 5-6 | Memory regions, physical, 4–12 | | ITB_PTE_TEMP register, 5-9 | Merge | | ITB_TAG register, 5–5 | write buffer, 4–14 | | IVPTBR register, 5–12 | Merging | | | loads to noncacheable space, 2–30 | | | rules, 2–29 | | | Microarchitecture, 2–2 to 2–13 | | Latencies, 2–23, 2–24 | Miss address file | | Literature, D-2 | See MAF | | Live lock | | | cache conflict, 4-27 | MM_STAT register, 5-44 | | Load-after-store trap, 2–28 | Multiple instruction issue, 2–5 | | Load instructions | MVPTBR register, 5–49 | | noncacheable space, 2-30 | | | Load miss, 2-29 | N | | LOCK command, 4-35 | Nangachad road appretions 4 19 | | Locks, 4-28 | Noncached read operations, 4–13 | | LOCK timing diagram, 4-48 | Noncached write operations, 4–14 | | LOCK transaction, 4-48 | Nonissue conditions, 2–19 | | Logic Symbol, 3-1 | NOP command, 4-35, 4-53, 4-62 | | 0 | PMCTR register, 5–33 Power supply considerations, 9–19 | |---------------------------------------|------------------------------------------------------------------------| | Operating temperature, 10-1 | decoupling, 9-20 | | Ordering products, D-1 | sequencing, 9-20 | | | Private Bcache transactions | | | 21164 to Bcache, 4-29 to 4-33 | | Page table entry | Privileged architecture library code | | See PTE | See PALcode | | PAL | Producer-consumer dependencies, 2-23 | | restrictions, 5–100 | Producer-producer dependencies, 2-23 | | PALcode, 1–2 | Producer-producer latency, 2-26 | | environment instructions, 6–7 | PS register, 5-19 | | invoke, 6–3 | PTE, 2-7, 2-10 | | PALmode, 6–2 | | | environment, 6–2 | (A) | | PALshadow registers, 5–98 | - | | PALtemp IPRs, 5–98 | Queues | | encoding, 5–2 | entry pointer, 2-34 | | PAL_BASE IPR, 6–3 | | | PAL_BASE register, 5–18 | R | | Parity, 4–89 | Race condition | | Parts | 00000. 
| | ordering, D-1 | 21164 and system, 4–80 Race example | | Pending request queue, 2–34 | idle_bc_h and cack_h, 4–83 | | Performance counters, 2–36 | READ command, 4–62 | | Physical address considerations, 4–12 | READ Command, 4–52 READ DIRTY command, 4–53 | | Physical address regions, 4–12 | READ DIRTY/INVALIDATE command, 4–54 | | Physical memory regions, 4–12 | READ DIRTY/INVALIDATE command, 4–54 READ DIRTY/INVALIDATE transaction, | | Pipeline organization, 2–13 to 2–19 | 4–56 | | Pipelines, 2–9 | READ DIRTY timing diagram, 4-56 | | bubbles, 2–19 | READ DIRTY transaction, 4–56 | | examples, 2-14 | READ MISSO command, 4–36 | | floating add, 2-14 | READ MISSI command, 4–36 | | integer add, 2-14 | READ MISS MODO command, 4–36 | | load (Deache hit), 2-16 | READ MISS MODU command, 4–36 | | load (Deache miss), 2-16 | READ MISS MODI command, 4–30 READ MISS MOD STC0 command, 4–37 | | store (Deache hit), 2-17 | READ MISS MOD STC0 command, 4–37 READ MISS MOD STC1 command, 4–37 | | instruction issue, 2-17 | READ MISS no Beache timing diagram, | | stages, 2–14, 2–17 | 4–38 | | stall, 2-17, 2-19 | READ MISS timing diagram, 4–39 | | wave, 4-31 | READ MISS timing diagram, 4–39 READ MISS transaction, 4–39 | | | while mino maneachon, 4-00 | | READ MISS with idle_bc_h asserted | See SROM | |-----------------------------------------|-------------------------------------------| | example, 4-85 | SET DIRTY command, 4-35 | | READ MISS with victim abort example, | SET DIRTY timing diagram, 4-48 | | 4–86 | SET DIRTY transaction, 4-48 | | READ MISS with victim example, 4-81 | SET SHARED command, 4-53 | | READ MISS with victim timing diagram, | SET SHARED timing diagram, 4-60 | | 4-43, 4-44 | SET SHARED transaction, 4-60 | | READ MISS with victim transaction, 4-41 | Signal descriptions, 3-3 to 3-15 | | READ timing diagram, 4-66 | SIRR register, 5-27 | | READ transaction, 4-66 | Slotting, 2-21 | | Read/write spacing | SL_RCV register, 5-32 | | data bus contention, 4-69 | SL_XMIT register, 
5-31 | | Reference clock, 4-8, 4-9 | Specifications | | example 1, 4-10 | mechanical, 11–1 | | example 2, 4-11 | SROM, 2-13 | | examples, 4-9 | Store | | Registers | execution, 2–11, 2–32 | | See also IPRs | Superpages, 2-8 | | accessibility, 5-1 | System clock, 4-6 | | integer, 2-9 | delayed, 4-8 | | PALshadow, 2-9, 5-98 | System clock delay, 4-8 | | PALtemp, 5–98 | System interface, 4–2 | | Related documentation, D-2 | addresses, 4-4 | | Replay traps, 2–28 to 2–29 | commands, 4-4 | | as aborts, 2–19 | System interface introduction, 4-2 to 4-4 | | load instruction, 2-11, 2-32 | | | load-miss-and-use, 2-19 | T | | Reset | | | forcing, 4-92 | Technical support, D-1 | | Resource conflict, 2–19 | Temperature, 10-1 | | Restrictions | Terminology, xxii to xxvii | | interface, 4–79 | Thermal design considerations, 10-4 | | | Thermal heat sink, 10-3 | | c · | Thermal management, 10-1 | | S | Thermal operating temperature, 10-1 | | Scache, 2-13 | Thermal resistance, 10-1 | | block size, 4-15 | Thermal specifications, 10-1 | | Scheduling rules, 2-19 to 2-28 | Timing diagrams | | SC_ADDR register, 5-75 | FLUSH, 4-64 | | SC_CTL register, 5-69 | INVALIDATE, 4–58 | | SC_STAT register, 5-72 | LOCK, 4-48 | | Second-level cache | READ, 4-66 | | See Scache | READ DIRTY, 4-56 | | occ somete | READ MISS, 4-39 | | | | | | | Serial read-only memory READ MISS transaction (no Bcache), 4-38 Timing diagrams (cont'd) READ MISS-no Bcache, 4-38 READ MISS with victim, 4-43, 4-44 SET DIRTY, 4-48 SET SHARED, 4-60 WRITE BLOCK, 4-47 Transactions FILL, 4-41 FLUSH, 4-64 INVALIDATE, 4-58 LOCK, 4-48 READ, 4-66 READ DIRTY, 4-56 READ DIRTY/INVALIDATE, 4-56 READ MISS, 4-39 READ MISS (no Bcache), 4-38 READ MISS with victim, 4-41 SET DIRTY, 4-48 SET SHARED, 4-60 system initiated, 4-51 WRITE BLOCK, 4-46 WRITE BLOCK LOCK, 4-46 Traps load-after-store, 2-28 load-miss-and-use, 2-27 replay, 2-19, 2-28, 2-32 Tristate BCACHE VICTIM to fill, 4-73 FILL to private Bcache read or write, 4-78 overlap, 4–68, 4–73 READ 
or WRITE to fill, 4–73 system Bcache command to fill, 4–76 # V VA register, 5-46 VA\_FORM register, 5-47 Victim buffers, 4-4, 4-42 # W Wave pipeline, 4-31 WMB instruction, 2-12 Write-after-write conflicts See Producer-producer dependencies and latency WRITE BLOCK command, 4-35 WRITE BLOCK command acknowledge, 4-79 WRITE BLOCK LOCK command, 4-35 WRITE BLOCK LOCK restriction, 4-79 WRITE BLOCK LOCK transaction, 4-46 WRITE BLOCK timing diagram, 4-47 WRITE BLOCK transaction, 4-46 Write buffer, 2-12, 2-34 to 2-36 entry processing, 2-35 Write ordering, 2-36