OpenCL Sathish Vadhiyar Sources: OpenCL overview from AMD OpenCL learning kit from AMD.
Dealing with Computational Load in Multi-user Scalable City with OpenCL Assets and Dynamics...
-
date post
19-Dec-2015 -
Category
Documents
-
view
226 -
download
0
Transcript of Dealing with Computational Load in Multi-user Scalable City with OpenCL Assets and Dynamics...
Dealing with Computational Load in Multi-user Scalable City
with OpenCL
Assets and Dynamics Computation for Virtual
Worlds
Assets and Dynamics Computation for Virtual
Worlds
Focus on OpenCL
• Leveraging OpenCL allows targeting and testing various parallel compute resources for each workload: Core i7, Cell, Tesla, GPU
• Using a combination of compute accelerators allows us to alter configuration and mappings between software and hardware more easily.
• Vendors continue to optimize OpenCL drivers while we optimize our use of OpenCL: free development!
Multi-User Load Challenges• Communications
• Graphics Rendering– Geometry Processing
– Shaders
– Rendering Techniques
• Dynamics Computation– Physics
– AI or other application specific behaviors
– Animation
Particle Systems
• Each player is represented by a cloud of particles in the shape of a cyclone.
• Particle systems are client-side only: no effect upon environment, no need for synchronizing.
• With many players per city, each player processes many times more particles.
• Causing scalability and performance problems on the client with multiple players visible.
Particle Systems: Now
• CPU computes new positions & texture coordinates
• New info must be sent to the card each frame.
• More CPU & bandwidth used as players increase.
Particle Systems: Solution
• Utilize OpenCL to compute particle updates on the GPU.
• Particle system is an ideal GPU workload.
• Use OpenGL/DirectX interoperability to keep all data on the card: primary reason for GPU supremacy for this task!
• Expect an order of magnitude or more performance improvement on this system.
Server Physics
Effect of Multi-User On Physics
• Multi-user Scalable City will:– Scale total interactivity occurring at once
– Shift focus more towards parallel computation
– Impose greater demands on the state of the art in parallel computation
– Consequently expand upon the state of the art in parallel computation
– Utilize both incremental and parallel methods• Reduce work as much as possible
• Parallelize all work that must be done
Utilizing Parallel Hardware• Next step: offloading physics from z10
– Xeon blades running Scalable Engine
– Cell BE blades running Bullet for Cell
– Distribute heavy computational stages
• Collision Detection on broad phase pair output
• Constraint solving/Integration on contact groups
• Then: OpenCL plan
– Develop physics system using algorithms well-suited to OpenCL parallelization
Server Physics: Parallelization
• Physics is difficult to parallelize well:
– Each stage has vastly different properties
• If stages map to different devices, large data buffer transmissions must synchronize compute accelerators.
– Computationally heavy stages
• Collision detection is coarse-grained
• Independent contact groups can be large, unbalanced
• Constraint solving difficult to break down
– Traditional systems that solve all constraints simultaneously (e.g. using Gauss-Seidel) parallelize in limited ways.
Server Physics: Goals
• Produce a new physics processing pipeline based as much as possible on OpenCL
• Minimize buffer transmission to/from OpenCL devices by keeping most all stages in OpenCL
• Specialize physics algorithms to highly parallel hardware
• Scale to as much activity as possible in real time as we scale hardware resources
Server Physics: Approach• Present efforts for OpenCL physics focus upon
traditional algorithms and pipeline
– These methods have “won out” in single-core era
– Physics programmers most familiar with these methods and their trade-offs
– Ex: AMD-funded OpenCL port of Bullet
• We choose techniques better suited to massively parallel computation!
– Greater potential, more exploration to perform
“Advanced Character Physics” – Jakobsen GDC 2001
• History
– Developed for speed and simplicity
– Not yet implemented in a parallel system
• Features
– All physics operates on solely on particles
– There is no large, global set of constraints to solve• Ever
Jakobsen: Key Features
• Objects represented as a set of particles and stick constraints rather than geometric shapes
• All constraints solved individually, w/o reference to other constraints
• Collision response by simple projection
• Velocity-less Verlet integration method (often used in molecular dynamics)
Jakobsen: Key Features
• Objects represented as a set of particles and stick constraints rather than geometric shapes
• All constraints solved individually, w/o reference to other constraints
• Collision response by simple projection
• Velocity-less Verlet integration method (often used in molecular dynamics)
Rigid Bodies from Particles
• Cube as set of particles and stick constraints
– Corners are particles
– Stick constraints placed for edges
– Stick constraints placed to prevent collapse
• A stick constraint requires that the distance between two points be a constant value
Jakobsen: Key Features
• Objects represented as a set of particles and stick constraints rather than geometric shapes
• All constraints solved individually, w/o reference to other constraints
• Collision response by simple projection
• Velocity-less Verlet integration method (often used in molecular dynamics)
Constraint Solving• Normally, all constraints upon a body are
solved simultaneously– Solution to large set of equations and unknowns– Produces a transform that does not violate any
constraints
• Jakobsen method solves each constraint individually– Solving one constraint violates another– Iteratively solving constraints approaches solution– Fewer iterations can be used when warranted
Jakobsen: Key Features
• Objects represented as a set of particles and stick constraints rather than geometric shapes
• All constraints solved individually, w/o reference to other constraints
• Collision response by simple projection
• Velocity-less Verlet integration method (often used in molecular dynamics)
Jakobsen: Key Features• Verlet Integration– Replace use of ‘velocity’ with ‘previous position’
• Projection– Simply move particle out of collision with face!
• Combination of features results in very stable simulation– Velocity & acceleration never get out of control
• Retaining OpenCL buffers on device– Last cycle’s result array binds to ‘previous position’
Jakobsen: The Good• Physics processing tends toward large number of
simpler, evenly divided computations!– Collision detection and constraints operate on particles– All constraints are solved independently
• Can reduce buffer transfers to half!– No contact graph generation stage– Allows parallel computation across collision detection,
constraint generation, constraint solving, integration
Contact GraphColl. Det. Integration
Coll. Det. Integration
Jakobsen: The Bad
• Constraints can be violated temporarily
– Solving one constraint can re-violate another
– ~10 constraint solving passes produces rigid body-like behavior: fewer passes produce cloth->plant behavior
• Body transforms must be computed based upon particle configuration in post-processing for display purposes
• Additional behaviors require re-engineering (friction model, bouncing, inverse kinematics, etc) in light of Verlet & particle scheme (see paper)
Progress
• Work is just beginning
OpenCL particle system with Verlet integration
Complex forces, texture animation for cyclones
Particle/height-map collision detection
Collision response by projection
Rigid bodies as set of particles + stick constraints
Multi-pass relaxation solver
Progress
Further Out
• Distribute responsibility for dynamics among qualified clients
– Increases world asynchronicity
– Develop operational semantics to indicate synchronous status
– Server gathers, correlates, and distributes updates to world
Conclusion
• OpenCL at front and center of efforts to alleviate performance problems on both server and client in our multi-user systems.
• Focus is on algorithms to– Reduce buffer communication to/from OpenCL– Increase parallelism by breaking up problems into
smaller pieces