Authors: David Patterson (USA), John L. Hennessy (USA)
David A. Patterson is a professor of computer science at the University of California, Berkeley, a member of the National Academy of Engineering and the National Academy of Sciences, and a Fellow of the IEEE and the ACM. For his teaching he has received the University of California's Distinguished Teaching Award, the ACM Karlstrom Award, and the IEEE Mulligan Education Medal and Undergraduate Teaching Award. For his contributions to RISC he received the IEEE Technical Achievement Award and the ACM Eckert-Mauchly Award, and his work on RAID earned him the IEEE Johnson Information Storage Award. He shared the IEEE John von Neumann Medal and the NEC C&C Prize with John L. Hennessy. Patterson is also a Fellow of the American Academy of Arts and Sciences and of the Computer History Museum, and has been elected to the Silicon Valley Engineering Hall of Fame. He served on the President's Information Technology Advisory Committee, and has been chair of the Computer Science Division of the EECS Department at UC Berkeley, chair of the Computing Research Association (CRA), and President of the ACM. This record of service earned him Distinguished Service Awards from the ACM and CRA.
John L. Hennessy is the tenth President of Stanford University, where he has been a member of the faculty since 1977 in the Departments of Electrical Engineering and Computer Science. Hennessy is a Fellow of the IEEE and the ACM; a member of the National Academy of Engineering, the National Academy of Sciences, and the American Philosophical Society; and a Fellow of the American Academy of Arts and Sciences. Among his many awards are the 2001 Eckert-Mauchly Award for his contributions to RISC technology, the 2001 Seymour Cray Computer Engineering Award, and the 2000 IEEE John von Neumann Medal, which he shared with David Patterson. He also holds seven honorary doctorates.
Contents:
Preface v
About the Authors xiii
CHAPTERS
1 Computer Abstractions and Technology 2
1.1 Introduction 3
1.2 Eight Great Ideas in Computer Architecture 11
1.3 Below Your Program 13
1.4 Under the Covers 16
1.5 Technologies for Building Processors and Memory 24
1.6 Performance 28
1.7 The Power Wall 40
1.8 The Sea Change: The Switch from Uniprocessors to Multiprocessors 43
1.9 Real Stuff: Benchmarking the Intel Core i7 46
1.10 Fallacies and Pitfalls 49
1.11 Concluding Remarks 52
1.12 Historical Perspective and Further Reading 54
1.13 Exercises 54
2 Instructions: Language of the Computer 60
2.1 Introduction 62
2.2 Operations of the Computer Hardware 63
2.3 Operands of the Computer Hardware 66
2.4 Signed and Unsigned Numbers 73
2.5 Representing Instructions in the Computer 80
2.6 Logical Operations 87
2.7 Instructions for Making Decisions 90
2.8 Supporting Procedures in Computer Hardware 96
2.9 MIPS Addressing for 32-Bit Immediates and Addresses 106
2.10 Parallelism and Instructions: Synchronization 116
2.11 Translating and Starting a Program 118
2.12 A C Sort Example to Put It All Together 126
2.13 Advanced Material: Compiling C 134
2.14 Real Stuff: ARMv7 32-bit Instructions 134
2.15 Real Stuff: x86 Instructions 138
2.16 Real Stuff: ARMv8 64-bit Instructions 147
2.17 Fallacies and Pitfalls 148
2.18 Concluding Remarks 150
2.19 Historical Perspective and Further Reading 152
2.20 Exercises 153
3 Arithmetic for Computers 164
3.1 Introduction 166
3.2 Addition and Subtraction 166
3.3 Multiplication 171
3.4 Division 177
3.5 Floating Point 184
3.6 Parallelism and Computer Arithmetic: Subword Parallelism 210
3.7 Real Stuff: Streaming SIMD Extensions and Advanced Vector Extensions in x86 212
3.8 Going Faster: Subword Parallelism and Matrix Multiply 213
3.9 Fallacies and Pitfalls 217
3.10 Concluding Remarks 220
3.11 Historical Perspective and Further Reading 224
3.12 Exercises 225
4 The Processor 230
4.1 Introduction 232
4.2 Logic Design Conventions 236
4.3 Building a Datapath 239
4.4 A Simple Implementation Scheme 247
4.5 An Overview of Pipelining 260
4.6 Pipelined Datapath and Control 274
4.7 Data Hazards: Forwarding versus Stalling 291
4.8 Control Hazards 304
4.9 Exceptions 313
4.10 Parallelism via Instructions 320
4.11 Real Stuff: The ARM Cortex-A8 and Intel Core i7 Pipelines 332
4.12 Going Faster: Instruction-Level Parallelism and Matrix Multiply 339
4.13 Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations 342
4.14 Fallacies and Pitfalls 343
4.15 Concluding Remarks 344
4.16 Historical Perspective and Further Reading 345
4.17 Exercises 345
5 Large and Fast: Exploiting Memory Hierarchy 360
5.1 Introduction 362
5.2 Memory Technologies 366
5.3 The Basics of Caches 371
5.4 Measuring and Improving Cache Performance 386
5.5 Dependable Memory Hierarchy 406
5.6 Virtual Machines 412
5.7 Virtual Memory 415
5.8 A Common Framework for Memory Hierarchy 442
5.9 Using a Finite-State Machine to Control a Simple Cache 449
5.10 Parallelism and Memory Hierarchies: Cache Coherence 454
5.11 Parallelism and Memory Hierarchy: Redundant Arrays of Inexpensive Disks 458
5.12 Advanced Material: Implementing Cache Controllers 458
5.13 Real Stuff: The ARM Cortex-A8 and Intel Core i7 Memory Hierarchies 459
5.14 Going Faster: Cache Blocking and Matrix Multiply 463
5.15 Fallacies and Pitfalls 466
5.16 Concluding Remarks 470
5.17 Historical Perspective and Further Reading 471
5.18 Exercises 471
6 Parallel Processors from Client to Cloud 488
6.1 Introduction 490
6.2 The Difficulty of Creating Parallel Processing Programs 492
6.3 SISD, MIMD, SIMD, SPMD, and Vector 497
6.4 Hardware Multithreading 504
6.5 Multicore and Other Shared Memory Multiprocessors 507
6.6 Introduction to Graphics Processing Units 512
6.7 Clusters, Warehouse Scale Computers, and Other Message-Passing Multiprocessors 519
6.8 Introduction to Multiprocessor Network Topologies 524
6.9 Communicating to the Outside World: Cluster Networking 527
6.10 Multiprocessor Benchmarks and Performance Models 528
6.11 Real Stuff: Benchmarking Intel Core i7 versus NVIDIA Tesla GPU 538
6.12 Going Faster: Multiple Processors and Matrix Multiply 543
6.13 Fallacies and Pitfalls 546
6.14 Concluding Remarks 548
6.15 Historical Perspective and Further Reading 551
6.16 Exercises 551
APPENDICES
A Assemblers, Linkers, and the SPIM Simulator A-2
A.1 Introduction A-3
A.2 Assemblers A-10
A.3 Linkers A-18
A.4 Loading A-19
A.5 Memory Usage A-20
A.6 Procedure Call Convention A-22
A.7 Exceptions and Interrupts A-33
A.8 Input and Output A-38
A.9 SPIM A-40
A.10 MIPS R2000 Assembly Language A-45
A.11 Concluding Remarks A-81
A.12 Exercises A-82
B TH-2 High Performance Computing System B-2
B.1 Introduction B-3
B.2 Compute Node B-3
B.3 The Frontend Processors B-5
B.4 The Interconnect B-6
B.5 The Software Stack B-7
B.6 LINPACK Benchmark Run (HPL) B-7
B.7 Concluding Remarks B-8
F Networks-on-Chip F-2
F.1 Introduction F-3
F.2 Communication-Centric Design F-3
F.3 The Design Space Exploration of NoCs F-5
F.4 Router Micro-architecture F-8
F.5 Performance Metric F-9
F.6 Concluding Remarks F-9
Index I-1