Graphics Pipeline (GLSL)
GPGPU (GLSL)
GPU Computing (CUDA, OpenCL)
- Student Presentation
- Final Project
Goal: Prepare you for your presentation and project
A historical perspective on the graphics pipeline
- Dimensions of innovation.
- Where we are today
- Fixed-function vs programmable pipelines
A closer look at the fixed-function pipeline
We can program the fixed-function pipeline!
What constitutes data and memory, and how access affects program design.
High fragment load / low vertex load
Simultaneous rendering to multiple buffers
PCIe bus
Vertex texture fetch
Not exactly a quantum leap, but…
- Simultaneous rendering to multiple buffers
- True conditionals and loops
- Higher-precision throughput in the pipeline (64 bits end-to-end, compared to 32 bits earlier)
- PCIe bus
- More memory, program length, and texture accesses
Complete quantum leap
- Ground-up rewrite of the GPU
- Support for DirectX 10, and all it implies (more on this later)
- Geometry shader
- Support for general GPU programming
- Shared memory (NVIDIA only)
Not covered today:
- SM 5 / D3D 11 / GL 4
- Tessellation shaders
- *cough* student presentation *cough*
- Later this semester: NVIDIA Fermi
- Dual warp scheduler
- Configurable L1 / shared memory
- Double precision
- …
Released 01/04/2011
http://support.amd.com/us/kbarticles/Pages/AMDSystemMonitor.aspx
Vertices mapped from object space to world space
M = model transformation (scene)
V = view transformation (camera)
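As a rough illustration of the M and V transformations, here is a minimal CPU sketch in Python. It assumes 4x4 row-major matrices applied to homogeneous column vectors; the helper names (`mat_mul`, `transform`, `translation`) and the example values are hypothetical and not from the slides.

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as nested lists (row-major)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform(m, v):
    """Apply a 4x4 matrix to a homogeneous vertex (x, y, z, w)."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


def translation(tx, ty, tz):
    """Build a 4x4 translation matrix."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


# Assumed example: the model matrix places the object at x = 2 in the scene,
# and the view matrix shifts the world by -5 in z (camera at z = +5).
M = translation(2.0, 0.0, 0.0)
V = translation(0.0, 0.0, -5.0)
MV = mat_mul(V, M)  # M is applied first, then V

vertex = [1.0, 1.0, 0.0, 1.0]   # object-space position
world = transform(M, vertex)    # position after the model transformation
eye = transform(MV, vertex)     # position after model and view transformations
```

Concatenating V and M into one matrix up front means each vertex needs only a single matrix multiply.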
Lighting information is combined with normals and other parameters at each vertex in order to create new colors.
More matrix transformations operate on a vertex to transform it into viewport space. Note that a vertex may be eliminated from the input stream (if it is clipped). The viewport is two-dimensional; however, the vertex z-value is retained for depth testing.
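The clip-then-map step can be sketched on the CPU as follows. This assumes OpenGL-style conventions (clip-space test -w ≤ x, y, z ≤ w, a perspective divide, and a default depth range of [0, 1]); the function name `to_window` is hypothetical, and real hardware clips whole primitives rather than isolated vertices.

```python
def to_window(clip, viewport):
    """Map a clip-space vertex (x, y, z, w) to window coordinates.

    viewport is (x0, y0, width, height). Returns None when the vertex
    lies outside the view volume, i.e. it would be clipped.
    """
    x, y, z, w = clip
    if w <= 0 or not all(-w <= c <= w for c in (x, y, z)):
        return None
    # Perspective divide: clip space to normalized device coordinates.
    nx, ny, nz = x / w, y / w, z / w
    x0, y0, width, height = viewport
    # The 2D position lands in the viewport; z is kept for depth testing.
    return (x0 + (nx + 1) * width / 2,
            y0 + (ny + 1) * height / 2,
            (nz + 1) / 2)


center = to_window((0.0, 0.0, 0.0, 1.0), (0, 0, 640, 480))
outside = to_window((2.0, 0.0, 0.0, 1.0), (0, 0, 640, 480))
```

Here `center` lands in the middle of the viewport with a depth halfway through the range, and `outside` is rejected.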
All primitives are now converted to fragments.
Data type change: vertices to fragments.
The rasterizer produces a stream of fragments. Each fragment undergoes a series of tests of increasing complexity.
Stencil test: S(x, y) is the stencil buffer value for the fragment with coordinates (x, y).
- If f(S(x, y)), let the pixel pass; else kill it.
- Update S(x, y) conditionally depending on f(S(x, y)) and g(D(x, y)).
Depth test: D(x, y) is the depth buffer value.
- If g(D(x, y)), let the pixel pass; else kill it.
- Update D(x, y) conditionally.
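The two tests can be modelled on the CPU with a short Python sketch. The choices f(S) := (S == ref) and g(D) := (z < D) are assumptions made for illustration, as are the function name and the sample fragments; the stencil buffer is left unchanged here for brevity.

```python
def process_fragments(fragments, stencil, depth, color, ref=1):
    """Apply a stencil test, then a depth test, to a stream of fragments.

    fragments: iterable of (x, y, z, rgb) tuples.
    Assumed tests: f(S) is S == ref, g(D) is z < D.
    """
    for x, y, z, rgb in fragments:
        if stencil[y][x] != ref:   # stencil test f(S(x, y)) failed: kill
            continue
        if not z < depth[y][x]:    # depth test g(D(x, y)) failed: kill
            continue
        depth[y][x] = z            # conditional update of D(x, y)
        color[y][x] = rgb          # the surviving fragment writes its color


# A 2x1 render target: pixel 0 is masked out (stencil 0), pixel 1 is drawable.
stencil = [[0, 1]]
depth = [[1.0, 1.0]]
color = [[None, None]]
process_fragments(
    [(0, 0, 0.2, "red"), (1, 0, 0.6, "green"),
     (1, 0, 0.4, "blue"), (1, 0, 0.9, "grey")],
    stencil, depth, color)
```

After the call, pixel 0 is untouched because every fragment there fails the stencil test, and pixel 1 holds the nearest fragment that arrived.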
Stencil and depth tests are more general conditionals. Why? These are the only tests that can change the state of internal storage (stencil buffer, depth buffer). One of the update operations for the stencil buffer is a “count” operation. Remember this! Unfortunately, stencil and depth buffers have lower precision (8 and 24 bits, respectively).
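To make the “count” operation and its precision limit concrete, here is a small sketch of an increment-and-clamp stencil update, similar in spirit to OpenGL's `GL_INCR` operation. The function name and the fragment stream are hypothetical.

```python
STENCIL_MAX = 255  # largest value an 8-bit stencil buffer can hold


def count_fragments(fragments, width, height):
    """Count how many fragments land on each pixel using a stencil increment.

    fragments: iterable of (x, y) pixel coordinates.
    The counter saturates at STENCIL_MAX, mirroring the 8-bit limit.
    """
    stencil = [[0] * width for _ in range(height)]
    for x, y in fragments:
        stencil[y][x] = min(stencil[y][x] + 1, STENCIL_MAX)
    return stencil


# Three fragments hit pixel 0; three hundred hit pixel 1.
counts = count_fragments([(0, 0)] * 3 + [(1, 0)] * 300, width=2, height=1)
```

Pixel 0 ends with an exact count, while pixel 1 saturates at 255 even though 300 fragments arrived, which is the precision problem in miniature.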
Blending: pixels are accumulated into final framebuffer storage. If op is +, we can sum all the (say) red components of pixels that pass all tests. Problem: in generations ≤ IV, blending can only be done in 8-bit channels (the channels sent to the video card), so precision is limited.
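A minimal sketch of why 8-bit additive blending loses information, assuming each channel is stored as an integer in [0, 255] and the sum is clamped on every blend. The function names and sample values are hypothetical.

```python
def blend_add_8bit(dest, src):
    """Additive blend of one 8-bit channel; the result is clamped to 255."""
    return min(dest + src, 255)


def accumulate(values):
    """Blend a sequence of 8-bit channel values into one framebuffer channel."""
    stored = 0
    for v in values:
        stored = blend_add_8bit(stored, v)
    return stored


reds = [128, 102, 77]          # red components of three passing fragments
blended = accumulate(reds)     # what the 8-bit framebuffer ends up holding
exact = sum(reds)              # what we actually wanted to compute
```

The framebuffer saturates as soon as the running sum exceeds 255, so any sum larger than that is indistinguishable from 255.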
Color Buffers
- Front-left
- Front-right
- Back-left
- Back-right
Depth Buffer (z-buffer)
Stencil Buffer
Accumulation Buffer
Scissor Test
- If the fragment lies inside the scissor rectangle, keep it
- Else, delete it
Alpha Test: compare the fragment’s alpha value against a reference value
Stencil Test: compare the fragment against the stencil map
Depth Test: compare a fragment’s depth to the depth value already present in the depth buffer
- Never
- Always
- Less
- Less-Equal
- Greater-Equal
- Greater
- Not-Equal
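The scissor test and the comparison modes listed above can be written out as a small Python sketch. The rectangle layout (lower-left corner plus width and height) follows the OpenGL `glScissor` convention; the dictionary keys and function names are hypothetical. OpenGL also defines an "equal" comparison, which is not in the list above and is therefore left out here.

```python
import operator

# Comparison modes from the list above: fragment value vs. stored value.
COMPARE = {
    "never": lambda a, b: False,
    "always": lambda a, b: True,
    "less": operator.lt,
    "less-equal": operator.le,
    "greater-equal": operator.ge,
    "greater": operator.gt,
    "not-equal": operator.ne,
}


def scissor_test(x, y, rect):
    """Keep a fragment only if it lies inside rect = (x0, y0, width, height)."""
    x0, y0, width, height = rect
    return x0 <= x < x0 + width and y0 <= y < y0 + height


def depth_test(func, fragment_z, stored_z):
    """Compare a fragment's depth with the depth buffer value using func."""
    return COMPARE[func](fragment_z, stored_z)
```

For instance, `depth_test("less", 0.3, 0.5)` keeps the fragment because it is nearer than what is stored, while `scissor_test(10, 5, (0, 0, 10, 10))` rejects a fragment just outside a 10x10 rectangle.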
What is the output of a “computation”?
- Display on screen.
- Render to a buffer and retrieve the values (readback).
Readbacks are VERY slow!
You are given n sites (p1, p2, p3, …, pn) in the plane (think of each site as having a color). Any point p in the plane is closest to some site pj; color p with color j. Compute this colored map on the plane. In other words, compute the nearest-neighbour diagram of the sites.
In order to compute the lower envelope, we need to determine, at each pixel, the fragment having the smallest depth value. This can be done with a simple depth test.
- Allow a fragment to pass only if its depth is smaller than the current depth buffer value, and update the buffer accordingly.
The fragment that survives has the correct color.
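The depth-test approach can be simulated on the CPU as a rough sketch. It assumes each site is rendered as a surface whose depth at a pixel equals the Euclidean distance from that pixel to the site (a cone), and that the site's index stands in for its color; the function name is hypothetical.

```python
import math


def voronoi_by_depth_test(sites, width, height):
    """Return a color buffer holding, per pixel, the index of the nearest site.

    Each site is "drawn" in turn; a fragment passes only if its depth is
    smaller than the current depth buffer value (a "less" depth test).
    """
    depth = [[math.inf] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    for site_id, (sx, sy) in enumerate(sites):
        for y in range(height):
            for x in range(width):
                z = math.hypot(x - sx, y - sy)  # cone height above pixel (x, y)
                if z < depth[y][x]:
                    depth[y][x] = z             # update the depth buffer
                    color[y][x] = site_id       # surviving fragment's color
    return color


diagram = voronoi_by_depth_test([(0, 0), (3, 0)], width=4, height=1)
```

With two sites at the ends of a 4-pixel row, the left two pixels receive color 0 and the right two receive color 1. On a pixel equidistant from two sites, the site drawn first wins because the test is strict.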
The 1-median of a set of sites is a point q* that minimizes the sum of distances from all sites to itself.
q* = arg min_q Σ_p d(p, q)
Can we compute, for each pixel q, the value F(q) = Σ_p d(p, q)? We can use the cone trick from before and, instead of computing the minimum depth value, compute the sum of all depth values using blending. What’s the catch?
Using texture interpolation helps here. Instead of drawing a single cone, we draw a shaded cone with an appropriately constructed texture map. Then a fragment having depth z has color component 1.0 * z. Now we can blend the colors. OpenGL has an aggregation operator that will return the overall min. Warning: we are ignoring issues of precision.
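A CPU sketch of the blend-then-minimize idea, restricted to pixel centers as candidate points. It assumes exact floating-point accumulation, so it deliberately ignores the precision issue noted above; the function name and the sample sites are hypothetical.

```python
import math


def one_median_by_blending(sites, width, height):
    """Approximate the 1-median over a pixel grid.

    Pass 1 (blending): every site adds its distance to each pixel, so the
    accumulation buffer ends up holding F(q) = sum of d(p, q) over all sites.
    Pass 2 (aggregation): take the pixel with the overall minimum of F.
    """
    accum = [[0.0] * width for _ in range(height)]
    for sx, sy in sites:
        for y in range(height):
            for x in range(width):
                accum[y][x] += math.hypot(x - sx, y - sy)  # additive blend
    best = min((accum[y][x], (x, y))
               for y in range(height) for x in range(width))
    return best[1], best[0]


q_star, f_min = one_median_by_blending([(0, 0), (2, 0), (4, 0)],
                                       width=5, height=1)
```

For three collinear sites the minimizing pixel is the middle site, where the distance sum is 2 + 0 + 2.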
Stream data (data associated with vertices and fragments)
- Color/position/texture coordinates.
- Functionally similar to member variables in a C++ object.
- Can be used for limited message passing: I modify an object state and send it to you.
Memory “connectivity” in the graphics use of a GPU is tricky. In a traditional C program, all global variables can be written by all routines. In the fixed-function pipeline, certain data is private.
- A fragment cannot change a depth or stencil value of a location different from its own.
- The framebuffer can be copied to a texture; a depth buffer cannot be copied in this way, and neither can a stencil buffer.
- Only a stencil buffer can count (efficiently)
In the fixed-function pipeline, depth and stencil buffers can be used in a multi-pass computation only via readbacks. A texture cannot be written directly.
In programmable GPUs, the memory connectivity becomes more open, but there are still constraints.
Understanding access constraints and memory “connectivity” is a key step in programming the GPU.
The most important question to ask when programming the GPU is: what can I do in one pass? Limitations on memory connectivity mean that a step in a computation may often have to be deferred to a new pass. For example, when computing the second smallest element, we could not store the current minimum in read/write memory. Thus, the “communication” of this value has to happen across a pass.
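The two-pass structure for the second smallest element might look like the following CPU model. It assumes each pass can only run a "less" depth test over the whole stream, and that the first pass's result is handed to the second pass as a fixed rejection threshold; the function names are hypothetical, and the result is the second smallest distinct value.

```python
import math


def min_pass(values, reject_at_or_below=None):
    """One rendering pass: a "less" depth test over a stream of values.

    reject_at_or_below models state fixed before the pass starts (for
    example, a value read back from the previous pass); fragments at or
    below it are killed before the depth test.
    """
    depth = math.inf
    for v in values:
        if reject_at_or_below is not None and v <= reject_at_or_below:
            continue
        if v < depth:
            depth = v
    return depth


def second_smallest(values):
    """Two passes: find the minimum, then the minimum of everything above it."""
    smallest = min_pass(values)                            # pass 1
    return min_pass(values, reject_at_or_below=smallest)   # pass 2


result = second_smallest([5, 3, 8, 3, 7])
```

The minimum cannot be both read and updated as general-purpose memory within a single pass, so it crosses the pass boundary as a constant for the second pass.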