Dealing with Computational Load in Multi-user Scalable City with OpenCL Assets and Dynamics...

Dealing with Computational Load in Multi-user Scalable City

with OpenCL

Assets and Dynamics Computation for Virtual

Worlds

Assets and Dynamics Computation for Virtual

Worlds

Focus on OpenCL

• Leveraging OpenCL allows targeting and testing various parallel compute resources for each workload: Core i7, Cell, Tesla, GPU

• Using a combination of compute accelerators allows us to alter configuration and mappings between software and hardware more easily.

• Vendors continue to optimize OpenCL drivers while we optimize our use of OpenCL: free development!

Multi-User Load Challenges• Communications

• Graphics Rendering– Geometry Processing

– Shaders

– Rendering Techniques

• Dynamics Computation– Physics

– AI or other application specific behaviors

– Animation

Particle Systems

• Each player is represented by a cloud of particles in the shape of a cyclone.

• Particle systems are client-side only: no effect upon environment, no need for synchronizing.

• With many players per city, each player processes many times more particles.

• Causing scalability and performance problems on the client with multiple players visible.

Particle Systems: Now

• CPU computes new positions & texture coordinates

• New info must be sent to the card each frame.

• More CPU & bandwidth used as players increase.

Particle Systems: Solution

• Utilize OpenCL to compute particle updates on the GPU.

• Particle system is an ideal GPU workload.

• Use OpenGL/DirectX interoperability to keep all data on the card: primary reason for GPU supremacy for this task!

• Expect an order of magnitude or more performance improvement on this system.

Server Physics

Effect of Multi-User On Physics

• Multi-user Scalable City will:– Scale total interactivity occurring at once

– Shift focus more towards parallel computation

– Impose greater demands on the state of the art in parallel computation

– Consequently expand upon the state of the art in parallel computation

– Utilize both incremental and parallel methods• Reduce work as much as possible

• Parallelize all work that must be done

Utilizing Parallel Hardware• Next step: offloading physics from z10

– Xeon blades running Scalable Engine

– Cell BE blades running Bullet for Cell

– Distribute heavy computational stages

• Collision Detection on broad phase pair output

• Constraint solving/Integration on contact groups

• Then: OpenCL plan

– Develop physics system using algorithms well-suited to OpenCL parallelization

Server Physics: Parallelization

• Physics is difficult to parallelize well:

– Each stage has vastly different properties

• If stages map to different devices, large data buffer transmissions must synchronize compute accelerators.

– Computationally heavy stages

• Collision detection is coarse-grained

• Independent contact groups can be large, unbalanced

• Constraint solving difficult to break down

– Traditional systems that solve all constraints simultaneously (e.g. using Gauss-Seidel) parallelize in limited ways.

Server Physics: Goals

• Produce a new physics processing pipeline based as much as possible on OpenCL

• Minimize buffer transmission to/from OpenCL devices by keeping most all stages in OpenCL

• Specialize physics algorithms to highly parallel hardware

• Scale to as much activity as possible in real time as we scale hardware resources

Server Physics: Approach• Present efforts for OpenCL physics focus upon

traditional algorithms and pipeline

– These methods have “won out” in single-core era

– Physics programmers most familiar with these methods and their trade-offs

– Ex: AMD-funded OpenCL port of Bullet

• We choose techniques better suited to massively parallel computation!

– Greater potential, more exploration to perform

“Advanced Character Physics” – Jakobsen GDC 2001

• History

– Developed for speed and simplicity

– Not yet implemented in a parallel system

• Features

– All physics operates on solely on particles

– There is no large, global set of constraints to solve• Ever

Jakobsen: Key Features

• Objects represented as a set of particles and stick constraints rather than geometric shapes

• All constraints solved individually, w/o reference to other constraints

• Collision response by simple projection

• Velocity-less Verlet integration method (often used in molecular dynamics)

Rigid Bodies from Particles

• Cube as set of particles and stick constraints

– Corners are particles

– Stick constraints placed for edges

– Stick constraints placed to prevent collapse

• A stick constraint requires that the distance between two points be a constant value

Constraint Solving• Normally, all constraints upon a body are

solved simultaneously– Solution to large set of equations and unknowns– Produces a transform that does not violate any

constraints

• Jakobsen method solves each constraint individually– Solving one constraint violates another– Iteratively solving constraints approaches solution– Fewer iterations can be used when warranted

Jakobsen: Key Features• Verlet Integration– Replace use of ‘velocity’ with ‘previous position’

• Projection– Simply move particle out of collision with face!

• Combination of features results in very stable simulation– Velocity & acceleration never get out of control

• Retaining OpenCL buffers on device– Last cycle’s result array binds to ‘previous position’

Jakobsen: The Good• Physics processing tends toward large number of

simpler, evenly divided computations!– Collision detection and constraints operate on particles– All constraints are solved independently

• Can reduce buffer transfers to half!– No contact graph generation stage– Allows parallel computation across collision detection,

constraint generation, constraint solving, integration

Contact GraphColl. Det. Integration

Coll. Det. Integration

Jakobsen: The Bad

• Constraints can be violated temporarily

– Solving one constraint can re-violate another

– ~10 constraint solving passes produces rigid body-like behavior: fewer passes produce cloth->plant behavior

• Body transforms must be computed based upon particle configuration in post-processing for display purposes

• Additional behaviors require re-engineering (friction model, bouncing, inverse kinematics, etc) in light of Verlet & particle scheme (see paper)

Progress

• Work is just beginning

OpenCL particle system with Verlet integration

Complex forces, texture animation for cyclones

Particle/height-map collision detection

Collision response by projection

Rigid bodies as set of particles + stick constraints

Multi-pass relaxation solver

Progress

Further Out

• Distribute responsibility for dynamics among qualified clients

– Increases world asynchronicity

– Develop operational semantics to indicate synchronous status

– Server gathers, correlates, and distributes updates to world

Conclusion

• OpenCL at front and center of efforts to alleviate performance problems on both server and client in our multi-user systems.

• Focus is on algorithms to– Reduce buffer communication to/from OpenCL– Increase parallelism by breaking up problems into

smaller pieces

Dealing with Computational Load in Multi-user Scalable City with OpenCL Assets and Dynamics...

Documents

Transcript of Dealing with Computational Load in Multi-user Scalable City with OpenCL Assets and Dynamics...