Computer Systems: A Programmer's Perspective
Notes on Computer Systems: A Programmer's Perspective (CS-APP).
Preface
This book is written from a programmer's perspective. It spans all of these aspects:
- hardware architecture
- operating systems
- compiler
- network
- cyber security
Assumptions about the reader’s background
- systems that execute x86-64 machine code
- how these machines execute C programs on Linux
- C or C++
How to read the book
- the only way to learn systems is to do systems
Overview
12 chapters:
- A Tour of Computer Systems. Introduces the major ideas and themes in computer systems.
- Representing and Manipulating Information. Covers computer arithmetic, emphasizing the properties of unsigned and two's-complement number representations that affect programmers.
- Machine-Level Representation of Programs. Teaches you how to read the x86-64 machine code generated by a C compiler.
- Processor Architecture. Covers basic combinational and sequential logic elements, and shows how these elements can be combined in a datapath that executes a simplified subset of the x86-64 instruction set.
- Optimizing Program Performance. Introduces a number of techniques for improving code performance.
- The Memory Hierarchy. The memory system is one of the most visible parts of a computer system to application programmers.
- Linking. Covers both static and dynamic linking, including the ideas of relocatable and executable object files, symbol resolution, relocation, static libraries, shared object libraries, position-independent code, and library interpositioning.
- Exceptional Control Flow.
- Virtual Memory.
- System-Level I/O. Unix I/O such as files and descriptors.
- Network Programming. Networks are interesting I/O devices to program.
- Concurrent Programming.
A Tour of Computer Systems
A computer system consists of hardware and software that work together to run application programs. All computer systems have similar hardware and software components that perform similar functions.
As we follow the lifetime of the hello program, we will briefly introduce the key concepts, terminology, and components that come into play.
Figure: The ASCII text representation of hello.c
Information Is Bits + Context
The hello program begins life as a source file (hello.c), which is a sequence of bits (0s and 1s) organized in 8-bit chunks called bytes.
Each byte represents some text character in the program.
Most computer systems represent text characters using the ASCII standard, which represents each character with a unique byte-size integer value.
hello.c is stored in a file as a sequence of bytes (35 for '#', 105 for 'i', ...).
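For reference, this is the hello.c program the chapter follows (as it appears in CS-APP); its first byte is the '#' (ASCII 35) of #include:

```c
#include <stdio.h>

int main()
{
    printf("hello, world\n");
    return 0;
}
```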
Files such as hello.c that consist exclusively of ASCII characters are known as text files. All other files are known as binary files.
The representation of hello.c illustrates a fundamental idea: all information in a system (disk files, programs in memory, user data in memory, data transferred across a network) is represented as a bunch of bits. The only thing that distinguishes different data objects is the context in which we view them.
As programmers, we need to understand machine representations of numbers, because they are not the same as mathematical integers and real numbers.
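A minimal sketch of bits + context: the same 32-bit pattern yields different values when viewed as a two's-complement int and as an IEEE single-precision float (a 32-bit int and IEEE float are assumed):

```c
#include <stdio.h>

int main(void) {
    union { int i; float f; } v;    /* one bit pattern, two contexts */
    v.i = 1078530011;
    printf("as int:   %d\n", v.i);  /* 1078530011 */
    printf("as float: %f\n", v.f);  /* the same bits read as a float: ~3.141593 */
    return 0;
}
```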
Programs Are Translated by Other Programs into Different Forms
The hello program begins as a high-level C program because it can be read and understood by humans in that form.
The individual C statements must be translated by other programs into a sequence of low-level machine-language instructions. These instructions are then packaged in a form called an executable object program and stored as a binary disk file.
On a Unix system, the translation from source file to object file is performed by a compiler driver.
Figure: The GCC compiler driver reads the source file and translates it into an executable object file.
Four phases:
- Preprocessing
- Compilation
- Assembly
- Linking
The GNU project
GCC is one of many tools developed by the GNU project.
It Pays to Understand How Compilation Systems Work
Why programmers need to understand how compilation systems work:
- Optimizing program performance
- Understanding link-time errors
- Avoiding security holes
Processors Read and Interpret Instructions Stored in Memory
If the first word of the command line does not correspond to a built-in shell command, then the shell assumes that it is the name of an executable file that it should load and run.
Hardware Organization of a System
To understand what happens to the hello program when we run it, we need to understand the hardware organization of a typical system.
- Buses: carry bytes of info back and forth between the components.
- I/O Devices: the system’s connection to the external world.
- Main Memory: a temporary storage device that holds both a program and the data it manipulates while the processor is executing the program.
- Processor/CPU: the engine that executes instructions stored in main memory.
- Load: copy a byte or a word from main memory into a register (overwrite)
- Store: copy a byte or a word from a register to a location in main memory (overwrite)
- Operate: copy the contents of two registers to the ALU, perform an arithmetic operation on the two words, and store the result in a register (overwrite)
- Jump: extract a word from the instruction itself and copy that word into the program counter (overwrite)
Running the hello Program
As we type ./hello, the shell program reads each character into a register and then stores it in memory.
When we hit the enter key, the shell knows that we have finished typing the command. The shell then loads the executable hello file by executing a sequence of instructions that copy the code and data in the hello object file from disk to main memory.
Once the code and data in the hello object file are loaded into memory, the processor begins executing the machine-language instructions in the hello program’s main routine.
Caches Matter
A system spends a lot of time moving information from one place to another.
The machine instructions in the hello program are originally stored on disk. When the program is loaded, they are copied to main memory. As the processor runs the program, instructions are copied from main memory into the processor.
Much of this copying is overhead that slows down the “real work” of the program. Thus, a major goal for system designers is to make these copy operations run as fast as possible.
The processor can read data from the register file almost 100 times faster than from memory (the processor-memory gap). It is easier and cheaper to make processors run faster than it is to make main memory run faster.
To deal with the processor-memory gap, system designers include smaller, faster storage devices called cache memories that serve as temporary staging areas for information that the processor is likely to need in the near future.
The L1 cache can be accessed nearly as fast as the register file.
A larger L2 cache is connected to the processor by a special bus. It might take 5 times longer for the processor to access the L2 cache than the L1 cache, but this is still 5 to 10 times faster than accessing main memory.
Newer and more powerful systems even have three levels of cache: L1, L2, and L3.
The L1 and L2 caches are implemented with a hardware technology known as static random access memory (SRAM).
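A rough sketch of locality in action: both loops below compute the same sum, but the row-major loop walks memory sequentially and uses the caches well, while the column-major loop strides across rows. On a typical machine the second loop is noticeably slower; the matrix size here is an arbitrary assumption.

```c
#include <stdio.h>
#include <time.h>

#define N 4096
static int a[N][N];                    /* 64 MB of zeros, far larger than any cache */

int main(void) {
    long sum = 0;
    clock_t t0 = clock();
    for (int i = 0; i < N; i++)        /* row-major: consecutive addresses */
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    clock_t t1 = clock();
    for (int j = 0; j < N; j++)        /* column-major: 16 KB stride between accesses */
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    clock_t t2 = clock();
    printf("row-major:    %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("column-major: %.3f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    return (int)(sum & 1);             /* use sum so the loops are not optimized away */
}
```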
Storage Devices Form a Hierarchy
The OS Manages the Hardware
We can think of the OS as a layer of software interposed between the application program and the hardware. All attempts by an application program to manipulate the hardware must go through the OS.
The OS has two primary purposes:
- To protect the hardware from misuse by runaway applications
- To provide applications with simple and uniform mechanisms for manipulating complicated and often wildly different low-level hardware devices.
Process
A process is the OS’s abstraction for a running program. Multiple processes can run concurrently on the same system, and each process appears to have exclusive use of the hardware.
Aside: Unix, Posix, and the Standard Unix Specification
In 1970, Brian Kernighan dubbed the new system “Unix” as a pun on the complexity of “Multics”. The kernel was rewritten in C in 1973, and Unix was announced to the outside world in 1974.
Because Bell Labs made the source code available to schools, Unix developed a large following. In the 1980s, Berkeley researchers added virtual memory and the Internet protocols in a series of releases called Unix 4.x BSD. Concurrently, Bell Labs was releasing its own versions, which became known as System V Unix.
The Posix standards cover such issues as the C language interface for Unix system calls, shell programs and utilities, threads, and network programming. More recently, a separate standardization effort, known as the "Standard Unix Specification," has joined forces with Posix to create a single, unified standard for Unix systems.
As a result of these standardization efforts, the differences between Unix versions have largely disappeared.
The OS interleaves the execution of multiple processes with a mechanism known as context switching.
The OS keeps track of all the state information that the process needs in order to run. This state, known as the context, includes information such as the current value of the PC, the register file, and the contents of main memory.
When the OS decides to transfer control from the current process to some new process, it performs a context switch by saving the context of the current process, restoring the context of the new process, and then passing control to the new process.
Example:
- Initially, the shell process is running alone, waiting for input on the command line.
- We run the hello program.
- The shell carries out our request by invoking a special function known as a system call that passes control to the OS.
- The OS saves the shell's context, creates a new hello process and its context, and passes control to the new hello process.
- After hello terminates, the OS restores the context of the shell process and passes control back to it, where it waits for the next command-line input.
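A minimal sketch of this sequence using the Unix fork/execve/waitpid system calls (error handling mostly omitted):

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    char *argv[] = { "./hello", NULL };
    char *envp[] = { NULL };
    pid_t pid = fork();                  /* system call: the OS creates a new process */
    if (pid == 0) {
        execve("./hello", argv, envp);   /* child: load and run the hello program */
        perror("execve");                /* reached only if the load fails */
        exit(1);
    }
    waitpid(pid, NULL, 0);               /* the shell's context is saved; control returns here */
    printf("hello terminated; shell waits for the next command\n");
    return 0;
}
```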
Threads
In modern systems, a process can actually consist of multiple execution units, called threads, each running in the context of the process and sharing the same code and global data.
It is easier to share data between multiple threads than between multiple processes, and threads are typically more efficient than processes.
Multi-threading is also one way to make programs run faster when multiple processors are available.
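A minimal sketch of two threads sharing the same global data within one process, assuming POSIX threads (the unsynchronized update is deliberately simplified; a real program would guard it with a mutex):

```c
#include <stdio.h>
#include <pthread.h>

int shared = 0;                 /* global data: visible to every thread in the process */

void *worker(void *arg) {
    (void)arg;
    shared += 1;                /* both threads update the same variable */
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("shared = %d\n", shared);   /* 2 here, but unsynchronized in general */
    return 0;
}
```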
Virtual Memory
Virtual memory is an abstraction that provides each process with the illusion that it has exclusive use of the main memory. Each process has the same uniform view of memory, known as its virtual address space, with the major regions listed below (see the sketch after the list).
- Program code and data: fixed address
- Heap: expands and contracts dynamically at run time
- Shared libraries
- Stack: expands and contracts dynamically at run time
- Kernel virtual memory
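A sketch that prints one address from each of these regions (the addresses are illustrative and vary from run to run, especially with address-space randomization):

```c
#include <stdio.h>
#include <stdlib.h>

int global;                            /* program data region */

int main(void) {
    int local;                         /* stack */
    int *p = malloc(sizeof(int));      /* heap */
    printf("code:  %p\n", (void *)main);   /* cast of a function pointer: non-portable but illustrative */
    printf("data:  %p\n", (void *)&global);
    printf("heap:  %p\n", (void *)p);
    printf("stack: %p\n", (void *)&local);
    free(p);
    return 0;
}
```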
Files
A file is a sequence of bytes. Every I/O device, including disks, keyboards, displays, and even networks, is modeled as a file. All input and output in the system is performed by reading and writing files.
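A minimal sketch of this uniform interface: copying standard input to standard output one byte at a time with the Unix read and write calls (descriptor 0 is stdin, 1 is stdout):

```c
#include <unistd.h>

int main(void) {
    char c;
    /* every I/O device looks like a file of bytes to read and write */
    while (read(STDIN_FILENO, &c, 1) == 1)
        write(STDOUT_FILENO, &c, 1);
    return 0;
}
```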
Systems Communicate with Other Systems Using Networks
Important Themes
A system is a collection of intertwined hardware and systems software that must cooperate in order to achieve the ultimate goal of running programs.
Amdahl’s Law
The main idea is that when we speed up one part of a system, the effect on the overall system performance depends on both how significant this part was and how much it sped up.
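Quantitatively: if a part that accounts for a fraction $\alpha$ of the system's execution time is sped up by a factor of $k$, the overall speedup is

$$S = \frac{1}{(1-\alpha) + \alpha/k}$$

For example, speeding up a part with $\alpha = 0.6$ by $k = 3$ gives $S = 1/(0.4 + 0.6/3) = 1.67$, far less than 3.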
Concurrency and Parallelism
- Concurrency refers to the general concept of a system with multiple, simultaneous activities.
- Parallelism refers to the use of concurrency to make a system run faster. Parallelism can be exploited at multiple levels of abstraction in a computer system.
Thread-Level Concurrency
Building on the process abstraction, we are able to devise systems where multiple programs execute at the same time, leading to concurrency. With threads, we can even have multiple control flows executing within a single process.
Support for concurrent execution has been found in computer systems since the advent of time-sharing.
Hyperthreading, sometimes called simultaneous multi-threading, is a technique that allows a single CPU to execute multiple flows of control.
Instruction-Level Parallelism
At a much lower level of abstraction, modern processors can execute multiple instructions at one time, a property known as instruction-level parallelism.
Single-Instruction, Multiple-Data Parallelism
At the lowest level, many modern processors have special hardware that allows a single instruction to cause multiple operations to be performed in parallel.
The Importance of Abstractions in Computer Systems
The use of abstractions is one of the most important concepts in computer science.
On the processor side, the instruction set architecture provides an abstraction of the actual processor hardware. With this abstraction, a machine-code program behaves as if it were executed on a processor that performs just one instruction at a time.
The underlying hardware is far more elaborate, executing multiple instructions in parallel, but always in a way that is consistent with the simple, sequential model. By keeping the same execution model, different processor implementations can execute the same machine code while offering a range of cost and performance.
On the OS side, we have introduced three abstractions:
- files as an abstraction of I/O devices
- virtual memory as an abstraction of program memory
- processes as an abstraction of a running program
- a new one: virtual machine providing an abstraction of the entire computer, including the OS, the processor, and the programs.
Representing and Manipulating Information
Modern computers store and process information represented as two-valued signals: the lowly binary digits, or bits.
The three most important representations of numbers:
- unsigned
- signed
- floating-point
Computer representations use a limited number of bits to encode a number, and hence some operations can overflow when the results are too large to be represented.
Information Storage
Rather than accessing individual bits in memory, most computers use blocks of 8 bits, or bytes, as the smallest addressable unit of memory.
A machine-level program views memory as a very large array of bytes, referred to as virtual memory.
Every byte of memory is identified by a unique number, known as its address, and the set of all possible addresses is known as the virtual address space.
Hexadecimal Notation
Neither binary nor decimal notation is convenient for describing bit patterns. Instead, we write bit patterns as base-16, or hexadecimal, numbers.
Hex uses the digits 0-9 and the letters A-F to represent 16 possible values.
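A small sketch using C's conversion specifiers to move between hex and decimal:

```c
#include <stdio.h>

int main(void) {
    printf("%u\n", (unsigned)0x1A2B);  /* hex literal printed in decimal: 6699 */
    printf("0x%X\n", 6699u);           /* decimal printed in hex: 0x1A2B */
    return 0;
}
```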
Data Sizes
Every computer has a word size, indicating the nominal size of pointer data; common word sizes are 32 bits and 64 bits.
A 32-bit word size limits the virtual address space to 4 gigabytes. Scaling up to 64-bit word size leads to a virtual address space of 16 exabytes.
Most 64-bit machines can also run programs compiled for use on 32-bit machines, a form of backward compatibility.
Computers and compilers support multiple data formats using different ways to encode data.
Programmers should strive to make their programs portable across different machines and compilers.
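A sketch that reports the sizes in use; the values in the comments assume a typical 64-bit Linux/x86-64 system and differ elsewhere:

```c
#include <stdio.h>

int main(void) {
    printf("char:   %zu\n", sizeof(char));    /* 1 */
    printf("int:    %zu\n", sizeof(int));     /* 4 */
    printf("long:   %zu\n", sizeof(long));    /* 8 */
    printf("void *: %zu\n", sizeof(void *));  /* 8: the nominal word size */
    return 0;
}
```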
Addressing and Byte Ordering
For program objects that span multiple bytes, we must establish two conventions:
- What the address of the object will be
- How we will order the bytes in memory
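A variant of the book's show_bytes routine makes the byte ordering visible; the output comment assumes a little-endian x86-64 machine, and a big-endian machine would print the bytes in the opposite order:

```c
#include <stdio.h>

typedef unsigned char *byte_pointer;

/* print each byte of an object, lowest address first */
void show_bytes(byte_pointer start, size_t len) {
    for (size_t i = 0; i < len; i++)
        printf(" %.2x", start[i]);
    printf("\n");
}

int main(void) {
    int x = 0x01234567;
    show_bytes((byte_pointer)&x, sizeof(x));  /* little-endian: 67 45 23 01 */
    return 0;
}
```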
Representing Strings
As a consequence, text data are more platform independent than binary data.
Aside: The Unicode standard for text encoding
The ASCII character set is suitable for encoding English-language documents, but it does not have much in the way of special characters, such as those in French, Chinese, or Greek. The Unicode Consortium has devised the most comprehensive and widely accepted standard for encoding text.
The current Unicode standard has a repertoire of over 100,000 characters.
Representing Code
Different machine types use different and incompatible instructions and encodings.
Introduction to Boolean Algebra
Integer Representations
Bits can be used to encode integers in two different ways:
- one that can only represent nonnegative numbers
- one that can represent negative, zero, and positive numbers
Integral Data Types
Unsigned Encodings
Two’s-Complement Encodings
The most common computer representation of signed numbers is known as two’s-complement form.
When the sign bit is set to 1, the represented value is negative, and when set to 0, the value is nonnegative.
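For a $w$-bit vector $\vec{x}$, the two's-complement interpretation gives the most significant bit negative weight:

$$B2T_w(\vec{x}) = -x_{w-1}2^{w-1} + \sum_{i=0}^{w-2} x_i 2^i$$

For example, the 4-bit pattern 1011 represents $-8 + 0 + 2 + 1 = -5$.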
Conversions between Signed and Unsigned
Signed versus Unsigned in C
In C, most numbers are signed by default.
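One consequence, sketched below: in a comparison that mixes the two, the signed operand is implicitly converted to unsigned, so -1 compares greater than 0u (a 32-bit int is assumed in the comment):

```c
#include <stdio.h>

int main(void) {
    if (-1 < 0u)
        printf("signed comparison\n");
    else
        printf("unsigned: -1 became %u\n", (unsigned)-1);  /* 4294967295 */
    return 0;
}
```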
Expanding the Bit Representation of a Number
To convert an unsigned number to a larger data type, we can simply add leading zeros to the representation; this operation is known as zero extension.
For converting a two’s-complement number to a larger data type, the rule is to perform a sign extension, adding copies of the most significant bit to the representation.
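A sketch of both rules, using the book's -12345 example (16-bit short and 32-bit int assumed):

```c
#include <stdio.h>

int main(void) {
    short sx = -12345;            /* bit pattern cf c7 */
    unsigned short usx = sx;      /* same bits, value 53191 */
    int x = sx;                   /* sign extension: ff ff cf c7, still -12345 */
    unsigned ux = usx;            /* zero extension: 00 00 cf c7, still 53191 */
    printf("%d %u\n", x, ux);
    return 0;
}
```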
Truncating Numbers
Advice on Signed versus Unsigned
The implicit casting of signed to unsigned leads to some nonintuitive behavior. Nonintuitive features often lead to program bugs, and ones involving the nuances of implicit casting can be especially difficult to see.
Integer Arithmetic
Many programmers are surprised to find that adding two positive numbers can yield a negative result.
Understanding the nuances of computer arithmetic can help programmers write more reliable code.
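A sketch of the surprise (signed overflow is technically undefined behavior in C; the wraparound shown is what two's-complement machines typically produce):

```c
#include <stdio.h>
#include <limits.h>

int main(void) {
    int x = INT_MAX;              /* 2147483647 */
    printf("%d\n", x + 1);        /* typically wraps to -2147483648 */
    unsigned u = UINT_MAX;        /* 4294967295 */
    printf("%u\n", u + 1);        /* well defined: wraps modulo 2^32 to 0 */
    return 0;
}
```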
Unsigned Addition
Two’s-Complement Addition
With two's-complement addition, we must decide what to do when the result is either too large (positive) or too small (negative) to represent.
In fact, most computers use the same machine instruction to perform either unsigned or signed addition.
Two’s-Complement Negation
Unsigned Multiplication
Two’s-Complement Multiplication
Multiplying by Constants
The integer multiply instruction is slow, requiring 10 or more clock cycles, whereas other operations, such as addition, subtraction, bit-level operations, and shifting, require only 1 clock cycle.
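A sketch of the rewriting a compiler might perform for x * 14, using $14 = 2^3 + 2^2 + 2^1 = 2^4 - 2^1$:

```c
#include <stdio.h>

/* multiplication by the constant 14 rewritten as shifts plus adds or subtracts */
int times14(int x)     { return (x << 3) + (x << 2) + (x << 1); }
int times14_alt(int x) { return (x << 4) - (x << 1); }

int main(void) {
    printf("%d %d %d\n", 3 * 14, times14(3), times14_alt(3));  /* 42 42 42 */
    return 0;
}
```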
Dividing by Powers of 2
Integer division on most machines is even slower than integer multiplication, requiring 30 or more clock cycles.
Dividing by a power of 2 can also be performed using shift operations, but we use a right shift rather than a left shift.
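A sketch for division by $2^4 = 16$: unsigned values shift directly, while signed values need a bias added first so the result rounds toward zero like C division (an arithmetic right shift on signed values is assumed):

```c
#include <stdio.h>

/* signed division by 2^4, rounding toward zero like C's / operator */
int div16(int x) {
    int bias = (x < 0) ? (1 << 4) - 1 : 0;  /* add 2^k - 1 before shifting negatives */
    return (x + bias) >> 4;
}

int main(void) {
    printf("%d %d\n", -33 / 16, div16(-33));  /* -2 -2 */
    printf("%d %d\n",  33 / 16, div16(33));   /*  2  2 */
    return 0;
}
```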
Final Thoughts on Integer Arithmetic
The integer arithmetic performed by computers is really a form of modular arithmetic.
Floating Point
A floating-point representation encodes rational numbers of the form $V = x \times 2^y$.
Nowadays, virtually all computers support what has become known as IEEE floating point.
Fractional Binary Numbers
$$101.11_2 = 1\times2^2 + 0\times2^1 + 1\times2^0 + 1\times2^{-1} + 1\times2^{-2} = 4 + 0 + 1 + \frac{1}{2} + \frac{1}{4} = 5\frac{3}{4} = 5.75$$
IEEE Floating-Point Representation
The previous representation would not be efficient for very large numbers.
The IEEE floating-point standard represents a number in the form $V = (-1)^s \times M \times 2^E$:
- the sign s determines whether the number is negative or positive.
- the significand M is a fractional binary number.
- the exponent E weights the value by a (possibly negative) power of 2.
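A sketch that unpacks the three fields from a single-precision float (a 32-bit unsigned and IEEE single precision are assumed; for 3.14, s = 0 and the biased exponent field holds E + 127 = 128):

```c
#include <stdio.h>

int main(void) {
    union { float f; unsigned u; } v = { .f = 3.14f };
    unsigned s    = v.u >> 31;            /* 1-bit sign */
    unsigned exp  = (v.u >> 23) & 0xFF;   /* 8-bit biased exponent: E + 127 */
    unsigned frac = v.u & 0x7FFFFF;       /* 23-bit fraction encoding M */
    printf("s=%u exp=%u frac=0x%06X\n", s, exp, frac);  /* s=0 exp=128 */
    return 0;
}
```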
Rounding
One key problem is to define the direction to round a value that is halfway between two possibilities. If we have 1.50 and want to round it to the nearest integer, should the result be 1 or 2? An alternative approach is to maintain a lower and an upper bound on the actual number.
The IEEE floating-point format defines four different rounding modes. The default method finds the closest match, while the other three can be used for computing upper and lower bounds.
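A sketch of the default mode, round-to-nearest-even, on halfway values, using rint from math.h (compile with -lm):

```c
#include <stdio.h>
#include <math.h>

int main(void) {
    /* ties go to the nearest even integer under the IEEE default mode */
    printf("%.1f %.1f %.1f %.1f\n",
           rint(1.5), rint(2.5), rint(-1.5), rint(0.5));  /* 2.0 2.0 -2.0 0.0 */
    return 0;
}
```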
Floating Point Operations
For example, with single-precision floating point the expression $(3.14+1e10)-1e10$ evaluates to 0.0: the value 3.14 is lost due to rounding. On the other hand, the expression $3.14+(1e10-1e10)$ evaluates to 3.14.
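The same example as a runnable sketch; declaring the variables float forces single-precision rounding:

```c
#include <stdio.h>

int main(void) {
    float a = (3.14f + 1e10f) - 1e10f;  /* 3.14 is absorbed by 1e10: prints 0.000000 */
    float b = 3.14f + (1e10f - 1e10f);  /* prints 3.140000 */
    printf("%f %f\n", a, b);
    return 0;
}
```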
Floating Point in C
The floating-point data types in C:
- float
- double
Summary
Computers encode information as bits, generally organized as sequences of bytes.
Different encodings are used for representing integers, real numbers, and character strings. Different models of computers use different conventions for encoding numbers and for ordering the bytes within multi-byte data.
The advantage of 64-bit programs is that they can go beyond the 4 GB address limitation of 32-bit programs.
Most machines encode signed numbers using a two’s complement representation and encode floating point numbers using IEEE Standard 754. Understanding these encodings at the bit level, as well as understanding the mathematical characteristics of the arithmetic operations, is important for writing programs that operate correctly over the full range of numeric values.
Due to the finite lengths of the encodings, computer arithmetic has properties quite different from conventional integer and real arithmetic. The finite length can cause numbers to overflow when they exceed the range of the representation. Floating-point values can also underflow, when they are so close to 0.0 that they are changed to zero.
Floating-point representations approximate real numbers by encoding numbers of the form $x \times 2^y$. IEEE Standard 754 provides for several different precisions, with the most common being single and double.
Floating point arithmetic must be used very carefully, because it has only limited range and precision, and because it does not obey common mathematical properties such as associativity.