The CUDA Handbook: A Comprehensive Guide to GPU Programming

By Nicholas Wilt

Published by Addison-Wesley Professional

Published Date: Jun 11, 2013

Description

The CUDA Handbook begins where CUDA by Example leaves off, discussing both CUDA hardware and software in detail that will engage any CUDA developer, from the casual to the most hardcore. Newer CUDA developers will see how the hardware processes commands and the driver checks progress; hardcore CUDA developers will appreciate topics such as the driver API, context migration, and how best to structure CPU/GPU data interchange and synchronization.

The book is partly a reference and partly a cookbook. It includes careful descriptions of hardware and software abstractions, best practices, and example source code. Much of the source code appears in the form of reusable “microbenchmarks” or “microdemos” designed to expose specific hardware characteristics or highlight specific use cases. Best practices are discussed and accompanied by source code. One idea emphasized is the “EERS Principle” (Empirical Evidence Reigns Supreme): determining the fastest way to perform a given operation is best done empirically.

The book includes an extensive glossary, because it’s difficult to write about this topic without throwing word salad at the reader. 

Table of Contents


Preface

Acknowledgments

About the Author

PART I

Chapter 1: Background

1.1 Our Approach

1.2 Code

1.3 Administrative Items

1.4 Road Map

Chapter 2: Hardware Architecture

2.1 CPU Configurations

2.2 Integrated GPUs

2.3 Multiple GPUs

2.4 Address Spaces in CUDA

2.5 CPU/GPU Interactions

2.6 GPU Architecture

2.7 Further Reading

Chapter 3: Software Architecture

3.1 Software Layers

3.2 Devices and Initialization

3.3 Contexts

3.4 Modules and Functions

3.5 Kernels (Functions)

3.6 Device Memory

3.7 Streams and Events

3.8 Host Memory

3.9 CUDA Arrays and Texturing

3.10 Graphics Interoperability

3.11 The CUDA Runtime and CUDA Driver API

Chapter 4: Software Environment

4.1 nvcc–CUDA Compiler Driver

4.2 ptxas–the PTX Assembler

4.3 cuobjdump

4.4 nvidia-smi

4.5 Amazon Web Services

PART II

Chapter 5: Memory

5.1 Host Memory

5.2 Global Memory

5.3 Constant Memory

5.4 Local Memory

5.5 Texture Memory

5.6 Shared Memory

5.7 Memory Copy

Chapter 6: Streams and Events

6.1 CPU/GPU Concurrency: Covering Driver Overhead

6.2 Asynchronous Memcpy

6.3 CUDA Events: CPU/GPU Synchronization

6.4 CUDA Events: Timing

6.5 Concurrent Copying and Kernel Processing

6.6 Mapped Pinned Memory

6.7 Concurrent Kernel Processing

6.8 GPU/GPU Synchronization: cudaStreamWaitEvent()

6.9 Source Code Reference

Chapter 7: Kernel Execution

7.1 Overview

7.2 Syntax

7.3 Blocks, Threads, Warps, and Lanes

7.4 Occupancy

7.5 Dynamic Parallelism

Chapter 8: Streaming Multiprocessors

8.1 Memory

8.2 Integer Support

8.3 Floating-Point Support

8.4 Conditional Code

8.5 Textures and Surfaces

8.6 Miscellaneous Instructions

8.7 Instruction Sets

Chapter 9: Multiple GPUs

9.1 Overview

9.2 Peer-to-Peer

9.3 UVA: Inferring Device from Address

9.4 Inter-GPU Synchronization

9.5 Single-Threaded Multi-GPU

9.6 Multithreaded Multi-GPU

Chapter 10: Texturing

10.1 Overview

10.2 Texture Memory

10.3 1D Texturing

10.4 Texture as a Read Path

10.5 Texturing with Unnormalized Coordinates

10.6 Texturing with Normalized Coordinates

10.7 1D Surface Read/Write

10.8 2D Texturing

10.9 2D Texturing: Copy Avoidance

10.10 3D Texturing

10.11 Layered Textures

10.12 Optimal Block Sizing and Performance

10.13 Texturing Quick References

PART III

Chapter 11: Streaming Workloads

11.1 Device Memory

11.2 Asynchronous Memcpy

11.3 Streams

11.4 Mapped Pinned Memory

11.5 Performance and Summary

Chapter 12: Reduction

12.1 Overview

12.2 Two-Pass Reduction

12.3 Single-Pass Reduction

12.4 Reduction with Atomics

12.5 Arbitrary Block Sizes

12.6 Reduction Using Arbitrary Data Types

12.7 Predicate Reduction

12.8 Warp Reduction with Shuffle

Chapter 13: Scan

13.1 Definition and Variations

13.2 Overview

13.3 Scan and Circuit Design

13.4 CUDA Implementations

13.5 Warp Scans

13.6 Stream Compaction

13.7 References (Parallel Scan Algorithms)

13.8 Further Reading (Parallel Prefix Sum Circuits)

Chapter 14: N-Body

14.1 Introduction

14.2 Naïve Implementation

14.3 Shared Memory

14.4 Constant Memory

14.5 Warp Shuffle

14.6 Multiple GPUs and Scalability

14.7 CPU Optimizations

14.8 Conclusion

14.9 References and Further Reading

Chapter 15: Image Processing: Normalized Correlation

15.1 Overview

15.2 Naïve Texture-Texture Implementation

15.3 Template in Constant Memory

15.4 Image in Shared Memory

15.5 Further Optimizations

15.6 Source Code

15.7 Performance and Further Reading

15.8 Further Reading

Appendix A: The CUDA Handbook Library

A.1 Timing

A.2 Threading

A.3 Driver API Facilities

A.4 Shmoos

A.5 Command Line Parsing

A.6 Error Handling

Glossary / TLA Decoder

Index

Purchase Info

ISBN-10: 0-13-326152-2

ISBN-13: 978-0-13-326152-3

Format: eBook (Watermarked)

This eBook includes the following formats, accessible from your Account page after purchase:

EPUB: The open industry format known for its reflowable content and usability on supported mobile devices.

MOBI: The eBook format compatible with the Amazon Kindle and Amazon Kindle applications.

PDF: The popular standard, used most often with the free Adobe® Reader® software.

This eBook requires no passwords or activation to read. We customize your eBook by discreetly watermarking it with your name, making it uniquely yours.

Includes EPUB, MOBI, and PDF

$47.99 $38.39
