Solaris Application Programming, Safari

By Darryl Gove

Published by Prentice Hall

Published Date: Dec 28, 2007

Description

Solaris™ Application Programming is a comprehensive guide to optimizing the performance of applications running in your Solaris environment. From the fundamentals of system performance to using analysis and optimization tools to their fullest, this wide-ranging resource shows developers and software architects how to get the most from Solaris systems and applications.


Whether you’re new to performance analysis and optimization or an experienced developer searching for the most efficient ways to solve performance issues, this practical guide gives you the background information, tips, and techniques for developing, optimizing, and debugging applications on Solaris.


The text begins with a detailed overview of the components that affect system performance. This is followed by explanations of the many developer tools included with Solaris OS and the Sun Studio compiler, and then it takes you beyond the basics with practical, real-world examples. In addition, you will learn how to use the rich set of developer tools to identify performance problems, accurately interpret output from the tools, and choose the smartest, most efficient approach to correcting specific problems and achieving maximum system performance.


Coverage includes

  • A discussion of the chip multithreading (CMT) processors from Sun and how they change the way that developers need to think about performance
  • A detailed introduction to the performance analysis and optimization tools included with the Solaris OS and Sun Studio compiler
  • Practical examples for using the developer tools to their fullest, including informational tools, compilers, floating point optimizations, libraries and linking, performance profilers, and debuggers
  • Guidelines for interpreting tool analysis output
  • Optimization, including hardware performance counter metrics and source code optimizations
  • Techniques for improving application performance using multiple processes or multiple threads
  • An overview of hardware and software components that affect system performance, including coverage of SPARC and x64 processors

 

Table of Contents

Preface



Part I: Overview of the Processor


Chapter 1: The Generic Processor

1.1 Chapter Objectives

1.2 The Components of a Processor

1.3 Clock Speed

1.4 Out-of-Order Processors

1.5 Chip Multithreading

1.6 Execution Pipes

1.7 Caches

1.8 Interacting with the System

1.9 Virtual Memory

1.10 Indexing and Tagging of Memory

1.11 Instruction Set Architecture


Chapter 2: The SPARC Family

2.1 Chapter Objectives

2.2 The UltraSPARC Family

2.3 The SPARC Instruction Set

2.4 32-bit and 64-bit Code

2.5 The UltraSPARC III Family of Processors

2.6 UltraSPARC T1

2.7 UltraSPARC T2

2.8 SPARC64 VI


Chapter 3: The x64 Family of Processors

3.1 Chapter Objectives

3.2 The x64 Family of Processors

3.3 The x86 Processor: CISC and RISC

3.4 Byte Ordering

3.5 Instruction Template

3.6 Registers

3.7 Instruction Set Extensions and Floating Point

3.8 Memory Ordering


Part II: Developer Tools


Chapter 4: Informational Tools

4.1 Chapter Objectives

4.2 Tools That Report System Configuration

4.3 Tools That Report Current System Status

4.4 Process- and Processor-Specific Tools

4.5 Information about Applications


Chapter 5: Using the Compiler

5.1 Chapter Objectives

5.2 Three Sets of Compiler Options

5.3 Using -xtarget=generic on x86

5.4 Optimization

5.5 Generating Debug Information

5.6 Selecting the Target Machine Type for an Application

5.7 Code Layout Optimizations

5.8 General Compiler Optimizations

5.9 Pointer Aliasing in C and C++

5.10 Other C- and C++-Specific Compiler Optimizations

5.11 Fortran-Specific Compiler Optimizations

5.12 Compiler Pragmas

5.13 Using Pragmas in C for Finer Aliasing Control

5.14 Compatibility with GCC


Chapter 6: Floating-Point Optimization

6.1 Chapter Objectives

6.2 Floating-Point Optimization Flags

6.3 Floating-Point Multiply Accumulate Instructions

6.4 Integer Math

6.5 Floating-Point Parameter Passing with SPARC V8 Code


Chapter 7: Libraries and Linking

7.1 Introduction

7.2 Linking

7.3 Libraries of Interest

7.4 Library Calls


Chapter 8: Performance Profiling Tools

8.1 Introduction

8.2 The Sun Studio Performance Analyzer

8.3 Collecting Profiles

8.4 Compiling for the Performance Analyzer

8.5 Viewing Profiles Using the GUI

8.6 Caller-Callee Information

8.7 Using the Command-Line Tool for Performance Analysis

8.8 Interpreting Profiles

8.9 Interpreting Profiles from UltraSPARC III/IV Processors

8.10 Profiling Using Performance Counters

8.11 Interpreting Call Stacks

8.12 Generating Mapfiles

8.13 Generating Reports on Performance Using spot

8.14 Profiling Memory Access Patterns

8.15 er_kernel

8.16 Tail-Call Optimization and Debug

8.17 Gathering Profile Information Using gprof

8.18 Using tcov to Get Code Coverage Information

8.19 Using dtrace to Gather Profile and Coverage Information

8.20 Compiler Commentary


Chapter 9: Correctness and Debug

9.1 Introduction

9.2 Compile-Time Checking

9.3 Runtime Checking

9.4 Debugging Using dbx

9.5 Locating Optimization Bugs Using ATS

9.6 Debugging Using mdb


Part III: Optimization


Chapter 10: Performance Counter Metrics

10.1 Chapter Objectives

10.2 Reading the Performance Counters

10.3 UltraSPARC III and UltraSPARC IV Performance Counters

10.4 Performance Counters on the UltraSPARC IV and UltraSPARC IV+

10.5 Performance Counters on the UltraSPARC T1

10.6 UltraSPARC T2 Performance Counters

10.7 SPARC64 VI Performance Counters

10.8 Opteron Performance Counters


Chapter 11: Source Code Optimizations

11.1 Overview

11.2 Traditional Optimizations

11.3 Data Locality, Bandwidth, and Latency

11.4 Data Structures

11.5 Thrashing

11.6 Reads after Writes

11.7 Store Queue

11.8 If Statements

11.9 File-Handling in 32-bit Applications


Part IV: Threading and Throughput


Chapter 12: Multicore, Multiprocess, Multithread

12.1 Introduction

12.2 Processes, Threads, Processors, Cores, and CMT

12.3 Virtualization

12.4 Horizontal and Vertical Scaling

12.5 Parallelization

12.6 Scaling Using Multiple Processes

12.7 Multithreaded Applications

12.8 Parallelizing Applications Using OpenMP

12.9 Using OpenMP Directives to Parallelize Loops

12.10 Using the OpenMP API

12.11 Parallel Sections

12.12 Automatic Parallelization of Applications

12.13 Profiling Multithreaded Applications

12.14 Detecting Data Races in Multithreaded Applications

12.15 Debugging Multithreaded Code

12.16 Parallelizing a Serial Application


Part V: Concluding Remarks


Chapter 13: Performance Analysis

13.1 Introduction

13.2 Algorithms and Complexity

13.3 Tuning Serial Code

13.4 Exploring Parallelism

13.5 Optimizing for CMT Processors


Index

Purchase Info

ISBN-10: 0-7686-8139-1

ISBN-13: 978-0-7686-8139-0

Format: Safari PTG
