
CFD Simulation and Building Control

Yue Wang

Computational Fluid Dynamics (CFD) for Indoor Air Quality (IAQ) modeling

Spyros K. Stamatelos1,2, Panos G. Georgopoulos1

Computational Chemodynamics Laboratory (www.ccl.rutgers.edu)

1Environmental and Occupational Health Sciences Institute, a Joint Institute of UMDNJ-Robert Wood Johnson Medical School and Rutgers University, Piscataway, New Jersey

Acknowledgments

Base funding for the Ozone Research Center is provided by the State of New Jersey Department of Environmental Protection. This work has also been funded in part by the USEPA-funded Environmental Bioinformatics and Computational Toxicology Center (ebCTC), under STAR Grant # GAD R 832721-010, and by the USEPA-funded Center for Exposure and Risk Modeling (CERM), under Cooperative Agreement # CR-83162501. This work has not been reviewed by and does not represent the opinions of the funding agencies.

Abstract

Indoor Air Quality modeling can be conducted under either an ideal (well-mixed) or a non-ideal mixing assumption regarding the dispersion of contaminants in the indoor space or part of it. CFD analysis involves the numerical solution of the equations that describe the balance of mass, momentum and energy in the flow domain. This approach resolves the fluid phenomena and patterns that occur due to ventilation, heat sources (e.g. humans), furniture etc. down to scales corresponding to the resolution of the computational grid used to solve the governing equations. With the indoor airflow thus resolved, species transport and transformation equations can be solved and indoor contamination issues can be characterized.

Case study

Cross-contamination can occur in hospital Emergency Rooms (ER) where patients share the same space before they are examined by a physician. This could become extremely important in cases where a massive civilian contamination occurs (e.g. epidemic or terrorist attack), and the number of affected people may grow due to short term exposure to contaminated air. It is therefore important to know how factors such as ventilation and room architecture can affect the quick escape of contaminants.

• Schematic depiction of source to exposure modeling framework

• Time and length scales define the expediency of each approach for different exposure scenarios

Observation and treatment room of an ER unit, including 2 beds, 2 inlets, 1 outlet, and 4 human subjects who are simulated as rectangular heat-source bodies emitting 60 W/m². The heat emitted by equipment and lighting was also taken into account in the simulations. (Dimensions, architecture and equipment of the room were based on the VA Design Guide, Ambulatory Care, Department of Veterans Affairs, 1995.)

Airflow pathlines and velocity magnitude (m/s) after approximately 70 seconds

• Buoyant flows are created around the heat sources (humans)

• Significant velocity variations in the flow field

Vector plot of the airflow velocity (m/s) in the midplane of the room

• Recirculation as well as stagnant areas can be observed

Contour plot of the turbulence intensity percentage in the midplane of the room

• Significant turbulence gradients

• Non-isotropic turbulence

Particle dispersion and residence time (s) in the airflow domain produced after 70 seconds of airflow simulation

• Steady State tracking (~ 1500 particles tracked)

• Discrete Random Walk (DRW) for characterization of particle fluctuations due to turbulent eddies

• Extremely preferential dispersion of particles in the room

• Obvious influence by the flow field

ER unit modeling domain incorporating 4 thermal mannequins

Conclusions

• Location of inlets, outlets, distribution of furniture and release location of particles can influence the contaminant concentration in the room regions

• The presence of ‘obstacles’ in the flow such as humans, furniture etc. seems to reinforce turbulence gradients

• Unfurnished room mixing times of particles differ significantly from times for furnished room mixing

Preprocessing and model inputs

• Software used for preprocessing: GAMBIT

• Unstructured grid (tetrahedral cell topology) with 1,332,057 cells

• Reynolds number: 3940 (based on the inlet airflow)

• Inlet velocity: 0.1 m/s

• Hydraulic diameter: 0.6 m

• Air exchange rate: 4 per hour
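As a quick consistency check on these inputs, the Reynolds number can be recomputed from the listed inlet velocity and hydraulic diameter. This is only a sketch: the kinematic viscosity of air used here (about 1.52 × 10⁻⁵ m²/s, a typical room-temperature value) is an assumption and is not stated on the poster.

```latex
\mathrm{Re} = \frac{U D_h}{\nu}
\approx \frac{(0.1\ \mathrm{m/s})(0.6\ \mathrm{m})}{1.52\times10^{-5}\ \mathrm{m^2/s}}
\approx 3.9\times10^{3},
\qquad
t_{\text{air change}} = \frac{1}{4\ \mathrm{h^{-1}}} = 15\ \mathrm{min} = 900\ \mathrm{s}
```

Under that assumption the result agrees with the quoted value of 3940 to within rounding. The second expression shows that one nominal air change takes 900 s, far longer than the roughly 70 s of simulated flow time.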

Implementation methodology

• Software used: FLUENT

• Turbulence model: Reynolds Averaged Navier Stokes (RANS) RNG k-ε

• Transient simulation (~ 70 seconds)

• Boussinesq approximation for the air density

• Pressure velocity coupling: SIMPLE algorithm

• Particle tracking: Lagrangian approach

Advantages of RNG over the standard k-ε model

• Indoor airflows are low Re (in the boundary layer)

• Equally valid for both high and low Re number (Yakhot et al., 1986)

• Extra strain term in the dissipation rate equation which can influence the turbulent viscosity and partially account for strong anisotropy in regions of large shearing

• Improved predictions especially in separated flows and anisotropic large-scale eddies (Posner et al., 2003)

• Does not overpredict turbulent kinetic energy and in turn turbulent viscosity compared to the standard k-ε model (Tian et al., 2005)

2Department of Biomedical Engineering, Rutgers University - UMDNJ, Piscataway, New Jersey

References

Yakhot, V. and S. A. Orszag (1986). "Renormalization-Group Analysis of Turbulence." Physical Review Letters 57(14): 1722.

Posner, J. D., C. R. Buchanan, et al. (2003). "Measurement and prediction of indoor air flow in a model room." Energy and Buildings 35(5): 515-526.

Tian, Z. F., J. Y. Tu, et al. (2005). "Numerical simulation and validation of dilute gas-particle flow over a backward-facing step." Aerosol Science and Technology 39(4): 319-332.

Abanto, J., D. Barrero, et al. (2004). "Airflow modelling in a computer room." Building and Environment 39(12): 1393-1402.

Zhu, S. W., S. Kato, et al. (2005). "Study on inhalation region by means of CFD analysis and experiment." Building and Environment 40(10): 1329-1336.

Gao, N. and J. Niu (2006). "Transient CFD simulation of the respiration process and inter-person exposure assessment." Building and Environment 41(9): 1214-1222.

Ongoing and Future Work

• More comprehensive characterization of the airflow and the contamination that may occur for various exposure scenarios

• Simulation of humans' breathing process

• Incorporation of human thermal mannequins

• Incorporation of unsteady particle tracking and interaction of turbulence eddies with particles

• Comparison of the outputs of various turbulence models (k- , LES)

• The velocity vector field can be significantly affected by the jaw

• This influence cannot be ignored when respiration is studied (Zhu et al.,2005)

• Unstructured mesh around the human body can reach y+ values less than 5

• Viscosity affected sub-layer can be resolved only by employing the enhanced wall treatment version

Gao et al., 2006

Background

Airflow study in a model room (Tian et al.,2006)

Airflow study in a computer room (Abanto et al.,2004)

Operating room contaminant dispersion study (Chow et al.,2003)

Tian, Z. F., J. Y. Tu, et al. (2006). "On the numerical study of contaminant particle concentration in indoor airflow." Building and Environment 41(11): 1504-1514.

Chow, T. T. and X. Y. Yang (2003). "Performance of ventilation system in a non-standard operating room." Building and Environment 38(12): 1401-1411.


2 Traditional Computational Fluid Dynamics for Building Simulation

Computational fluid dynamics (CFD) is a branch of fluid mechanics that uses numerical methods and algorithms to solve and analyze problems that involve fluid flows. Computers are used to perform the millions of calculations required to simulate the interaction of liquids and gases with surfaces defined by boundary conditions.

The foundation of CFD is the Navier-Stokes equations. Named after Claude-Louis Navier and George Gabriel Stokes, they describe the motion of fluid substances, that is, substances that can flow. These equations arise from applying Newton's second law to fluid motion, together with the assumption that the fluid stress is the sum of a diffusing viscous term proportional to the gradient of velocity, plus a pressure term.

A fluid whose density and temperature are nearly constant is described by a velocity field $\mathbf{u}$ and a pressure field $p$. These quantities generally vary both in space and in time and depend on the boundaries surrounding the fluid. We will denote the spatial coordinate by $\mathbf{x}$, which for two-dimensional fluids is $\mathbf{x} = (x, y)$ and for three-dimensional fluids is $(x, y, z)$. Given that the velocity and the pressure are known at some initial time $t = 0$, the evolution of these quantities over time is given by the Navier-Stokes equations:

$$\nabla \cdot \mathbf{u} = 0, \qquad \frac{\partial \mathbf{u}}{\partial t} = -(\mathbf{u} \cdot \nabla)\mathbf{u} - \frac{1}{\rho}\nabla p + \nu \nabla^2 \mathbf{u} + \mathbf{f} \tag{1}$$

where $\nu$ is the kinematic viscosity of the fluid, $\rho$ is its density and $\mathbf{f}$ is an external force [6].

The Reynolds-averaged Navier-Stokes (RANS) equations are time-averaged equations of motion for fluid flow. They are primarily used while dealing with turbulent flows. These equations can be

5 July 1687

Sir Isaac Newton

Philosophiæ Naturalis Principia Mathematica

F=mdv/dt

Claude-Louis Navier & Sir George Stokes


[Figure 1, panels (a) to (c): time histories, over times 0 to 100, of (a) the velocity $u$, (b) the fluctuation $u'$ and (c) the squared fluctuation $(u')^2$.]

Figure 1: Example of a time history of a component of a fluctuating velocity at a point in a turbulent flow. (a) Shows the velocity, (b) shows the fluctuating component of velocity $u' \equiv u - \bar{u}$ and (c) shows the square of the fluctuating velocity. Dashed lines in (a) and (c) indicate the time averages.

The most mathematically general average is the ensemble average, in which you repeat a given experiment a large number of times and average the quantity of interest (say velocity) at the same position and time in each experiment. For practical reasons, this is rarely done. Instead, a time or volume average (or combination of the two) is made with the assumption that they are equivalent to the ensemble average. For the sake of this discussion, let us define the time average for a stationary flow (see footnote 1) as

$$\bar{u}(y) \equiv \lim_{\tau \to \infty} \frac{1}{2\tau} \int_{-\tau}^{\tau} u(y, t)\, dt \tag{18}$$

The deviation of the velocity from the mean value is called the fluctuation and is usually defined as

$$u' \equiv u - \bar{u} \tag{19}$$

Note that by definition $\overline{u'} = 0$ (the average of the fluctuation is zero). Consequently, a better measure of the strength of the fluctuation is the average of the square of a fluctuating

Footnote 1: A stationary flow is defined as one whose statistics are not changing in time. An example of a stationary flow is steady flow in a channel or pipe.


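used with approximations based on knowledge of the properties of flow turbulence to give approximate averaged solutions to the Navier-Stokes equations. For a stationary, incompressible flow of Newtonian fluid, these equations can be written as:

$$\rho \bar{u}_j \frac{\partial \bar{u}_i}{\partial x_j} = \rho \bar{f}_i + \frac{\partial}{\partial x_j}\left[ -\bar{p}\,\delta_{ij} + \mu\left( \frac{\partial \bar{u}_i}{\partial x_j} + \frac{\partial \bar{u}_j}{\partial x_i} \right) - \rho\,\overline{u_i' u_j'} \right] \tag{2}$$

The K-epsilon model is one of the most common RANS turbulence models. It is a two-equation model, which means that it includes two extra transport equations to represent the turbulent properties of the flow. This allows a two-equation model to account for history effects like convection and diffusion of turbulent energy. The model is widely used in building science research, especially for indoor air quality and thermal distribution simulation [7], [8], [9], [10], [11].

The model is relatively simple. For turbulent kinetic energy $k$:

$$\frac{\partial}{\partial t}(\rho k) + \frac{\partial}{\partial x_i}(\rho k u_i) = \frac{\partial}{\partial x_j}\left[ \left( \mu + \frac{\mu_t}{\sigma_k} \right) \frac{\partial k}{\partial x_j} \right] + P_k + P_b - \rho\epsilon - Y_M + S_k \tag{3}$$

For dissipation $\epsilon$:

$$\frac{\partial}{\partial t}(\rho \epsilon) + \frac{\partial}{\partial x_i}(\rho \epsilon u_i) = \frac{\partial}{\partial x_j}\left[ \left( \mu + \frac{\mu_t}{\sigma_\epsilon} \right) \frac{\partial \epsilon}{\partial x_j} \right] + C_{1\epsilon}\frac{\epsilon}{k}\left( P_k + C_{3\epsilon} P_b \right) - C_{2\epsilon}\,\rho\,\frac{\epsilon^2}{k} + S_\epsilon \tag{4}$$

The K-epsilon model should be a reference implementation for us since it is the most widely used solver in building science. When we develop a new solver, we should compare its results to those of the K-epsilon model and see whether the new solver is accurate enough.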

3 Fast Fluid Dynamics: A Tentative Way to Make CFD Faster

There are several ways to simplify the numerical solution of the Navier-Stokes equations to make CFD calculations faster. One of


variable. Figures 1(b) and 1(c) show the time evolution of the velocity fluctuation, $u'$, and the square of that quantity, $u'^2$. Notice that the latter quantity is never negative, and that its average is greater than zero.
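The decomposition into a mean and a fluctuation can be illustrated on a sampled signal. The following is a minimal sketch, not part of the original notes: a finite sample average stands in for the limit $\tau \to \infty$, and the synthetic signal (mean 2.0 plus two sinusoids) and the function name `reynolds_decompose` are assumptions chosen only for illustration.

```python
import math


def reynolds_decompose(samples):
    """Split a velocity time series into its mean and fluctuating part.

    Returns (mean, fluctuations, mean of squared fluctuations).
    A finite sample average approximates the time average.
    """
    n = len(samples)
    mean = sum(samples) / n
    fluct = [u - mean for u in samples]
    mean_sq = sum(f * f for f in fluct) / n
    return mean, fluct, mean_sq


# Hypothetical test signal: mean 2.0 plus two sinusoids over whole periods.
N = 1000
signal = [
    2.0
    + 0.2 * math.sin(2 * math.pi * 3 * i / N)
    + 0.1 * math.sin(2 * math.pi * 7 * i / N)
    for i in range(N)
]

u_bar, u_prime, u_prime_sq_bar = reynolds_decompose(signal)
mean_of_fluct = sum(u_prime) / N

print("mean velocity:", u_bar)
print("mean of fluctuation:", mean_of_fluct)
print("mean of squared fluctuation:", u_prime_sq_bar)
```

The mean of the fluctuation vanishes to round-off, while the mean of its square stays positive, which is why the latter is the more useful measure of fluctuation strength.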

The equations governing a turbulent flow are precisely the same as for a laminar flow; however, the solution is clearly much more complicated in this regime. The approaches to solving the flow equations for a turbulent flow field can be roughly divided into two classes. Direct numerical simulations (DNS) use the speed of modern computers to numerically integrate the Navier Stokes equations, resolving all of the spatial and temporal fluctuations, without resorting to modeling. In essence, the solution procedure is the same as for laminar flow, except the numerics must contend with resolving all of the fluctuations in the velocity and pressure. DNS remains limited to very simple geometries (e.g., channel flows, jets and boundary layers) and is extremely expensive to run (see footnote 2).

The alternative to DNS found in most CFD packages (including FLUENT) is to solve the Reynolds Averaged Navier Stokes (RANS) equations. RANS equations govern the mean velocity and pressure. Because these quantities vary smoothly in space and time, they are much easier to solve; however, as will be shown below, they require modeling to “close” the equations and these models introduce significant error into the calculation.

To demonstrate the closure problem, we consider fully developed turbulent flow in a channel of height $2H$. Recall that with RANS we are interested in solving for the mean velocity $\bar{u}(y)$ only. If we formally average the Navier Stokes equations and simplify for this geometry we arrive at the following

$$\frac{d\,\overline{u'v'}}{dy} + \frac{1}{\rho}\frac{d\bar{p}}{dx} = \nu \frac{d^2 \bar{u}(y)}{dy^2} \tag{20}$$

subject to the boundary conditions

$$y = 0: \quad \frac{d\bar{u}}{dy} = 0 \tag{21}$$

$$y = H: \quad \bar{u} = 0 \tag{22}$$

The kinematic viscosity is $\nu = \mu/\rho$. The quantity $\overline{u'v'}$, known as the Reynolds stress (see footnote 3), is a higher-order moment that must be modeled in terms of the knowns (i.e., $\bar{u}(y)$ and its derivatives). This is referred to as the “closure” approximation. The quality of the modeling of this term will determine the reliability of the computations (see footnote 4).
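To make the role of the Reynolds stress concrete, the laminar limit of equation (20) can be worked out by hand. The derivation below is a sketch that simply drops the Reynolds stress term and integrates twice using boundary conditions (21) and (22); it assumes a constant pressure gradient $d\bar{p}/dx$.

```latex
\begin{aligned}
\nu \frac{d^2 \bar{u}}{dy^2} &= \frac{1}{\rho}\frac{d\bar{p}}{dx}
  && \text{(equation (20) with } \overline{u'v'} = 0\text{)} \\
\frac{d\bar{u}}{dy} &= \frac{1}{\mu}\frac{d\bar{p}}{dx}\, y
  && \text{(integrate once; (21) removes the constant)} \\
\bar{u}(y) &= -\frac{1}{2\mu}\frac{d\bar{p}}{dx}\left(H^2 - y^2\right)
  && \text{(integrate again; (22) fixes the constant)}
\end{aligned}
```

The result is the parabolic laminar profile. Any departure of the mean profile from this parabola in a turbulent channel must therefore come from the modeled term $\overline{u'v'}$.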

Turbulence modeling is a rather broad discipline and an in-depth discussion is beyond the scope of this introduction. Here we simply note that the Reynolds stress is modeled in terms of two turbulence parameters, the turbulent kinetic energy $k$ and the turbulent energy dissipation rate $\epsilon$ defined below

$$k \equiv \frac{1}{2}\left( \overline{u'^2} + \overline{v'^2} + \overline{w'^2} \right) \tag{23}$$

$$\epsilon \equiv \nu\, \overline{\left[ \left(\frac{\partial u'}{\partial x}\right)^2 + \left(\frac{\partial u'}{\partial y}\right)^2 + \left(\frac{\partial u'}{\partial z}\right)^2 + \left(\frac{\partial v'}{\partial x}\right)^2 + \left(\frac{\partial v'}{\partial y}\right)^2 + \left(\frac{\partial v'}{\partial z}\right)^2 + \left(\frac{\partial w'}{\partial x}\right)^2 + \left(\frac{\partial w'}{\partial y}\right)^2 + \left(\frac{\partial w'}{\partial z}\right)^2 \right]} \tag{24}$$

where $(u', v', w')$ is the fluctuating velocity vector. The kinetic energy is zero for laminar flow and can be as large as 5% of the kinetic energy of the mean flow in a highly turbulent case. The family of models is generally known as k–ε and they form the basis of most CFD packages (including FLUENT). We will revisit turbulence modeling towards the end of the semester.

Footnote 2: The largest DNS to date was recently published by Kaneda et al., Phys. Fluids 15(2):L21–L24 (2003); they used $4096^3$ grid points, which corresponds roughly to 0.5 terabytes of memory per variable!

Footnote 3: Named after the same Osborne Reynolds from whom we get the Reynolds number.

Footnote 4: Notice that if we neglect the Reynolds stress the equations reduce to the equations for laminar flow; thus, the Reynolds stress is solely responsible for the difference in the mean profile for laminar (parabolic) and turbulent (blunted) flows.


Turbulence Modeling for CFD, David C. Wilcox, July 1993

Finite Volume Method

Finite Element Method

Finite Difference Method


Table 38-1. Vector Calculus Operators Used in Fluid Simulation

Gradient
  Definition: $\nabla p = \left( \dfrac{\partial p}{\partial x}, \dfrac{\partial p}{\partial y} \right)$
  Finite Difference Form: $\left( \dfrac{p_{i+1,j} - p_{i-1,j}}{2\,\delta x},\ \dfrac{p_{i,j+1} - p_{i,j-1}}{2\,\delta y} \right)$

Divergence
  Definition: $\nabla \cdot \mathbf{u} = \dfrac{\partial u}{\partial x} + \dfrac{\partial v}{\partial y}$
  Finite Difference Form: $\dfrac{u_{i+1,j} - u_{i-1,j}}{2\,\delta x} + \dfrac{v_{i,j+1} - v_{i,j-1}}{2\,\delta y}$

Laplacian
  Definition: $\nabla^2 p = \dfrac{\partial^2 p}{\partial x^2} + \dfrac{\partial^2 p}{\partial y^2}$
  Finite Difference Form: $\dfrac{p_{i+1,j} - 2p_{i,j} + p_{i-1,j}}{(\delta x)^2} + \dfrac{p_{i,j+1} - 2p_{i,j} + p_{i,j-1}}{(\delta y)^2}$

The subscripts i and j used in the expressions in the table refer to discrete locations on a Cartesian grid, and δx and δy are the grid spacing in the x and y dimensions, respectively (see Figure 38-2).

The gradient of a scalar field is a vector of partial derivatives of the scalar field. Divergence, which appears in Equation 2, has an important physical significance. It is the rate at which “density” exits a given region of space. In the Navier-Stokes equations, it is applied to the velocity of the flow, and it measures the net change in velocity across a surface surrounding a small piece of the fluid. Equation 2, the continuity equation, enforces the incompressibility assumption by ensuring that the fluid always has zero divergence. The dot product in the divergence operator results in a sum of partial derivatives (rather than a vector, as with the gradient operator). This means that the divergence operator can be applied only to a vector field, such as the velocity, u = (u, v).

Notice that the gradient of a scalar field is a vector field, and the divergence of a vector field is a scalar field. If the divergence operator is applied to the result of the gradient operator, the result is the Laplacian operator ∇ ⋅ ∇ = ∇². If the grid cells are square (that is, if δx = δy, which we assume for the remainder of this chapter), the Laplacian simplifies to:

$$\nabla^2 p = \frac{p_{i+1,j} + p_{i-1,j} + p_{i,j+1} + p_{i,j-1} - 4p_{i,j}}{(\delta x)^2}. \tag{3}$$

The Laplacian operator appears commonly in physics, most notably in the form of diffusion equations, such as the heat equation. Equations of the form ∇²x = b are known as Poisson equations. The case where b = 0 is Laplace's equation, which is the origin of the Laplacian operator. In Equation 2, the Laplacian is applied to a vector
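The finite difference forms of the gradient, divergence and Laplacian can be sketched in a few lines of code. This is an illustrative sketch only: the nested-list grid layout `p[i][j]` (with `i` along x and `j` along y), the function names, and the quadratic test field are assumptions, and the stencils are valid only at interior grid points.

```python
def gradient(p, i, j, dx, dy):
    """Central-difference gradient of scalar field p at interior point (i, j)."""
    return (
        (p[i + 1][j] - p[i - 1][j]) / (2.0 * dx),
        (p[i][j + 1] - p[i][j - 1]) / (2.0 * dy),
    )


def divergence(u, v, i, j, dx, dy):
    """Central-difference divergence of vector field (u, v) at (i, j)."""
    return (u[i + 1][j] - u[i - 1][j]) / (2.0 * dx) + (
        v[i][j + 1] - v[i][j - 1]
    ) / (2.0 * dy)


def laplacian(p, i, j, dx):
    """Five-point Laplacian on a grid with square cells (dx == dy)."""
    return (
        p[i + 1][j] + p[i - 1][j] + p[i][j + 1] + p[i][j - 1] - 4.0 * p[i][j]
    ) / (dx * dx)


# Hypothetical test fields on a 5 x 5 grid with spacing 0.5:
# p = x^2 + y^2 and (u, v) = (x, y).
n, h = 5, 0.5
p = [[(i * h) ** 2 + (j * h) ** 2 for j in range(n)] for i in range(n)]
u = [[i * h for j in range(n)] for i in range(n)]
v = [[j * h for j in range(n)] for i in range(n)]

grad_p = gradient(p, 2, 2, h, h)      # analytic value at (1, 1): (2, 2)
div_uv = divergence(u, v, 2, 2, h, h)  # analytic value: 2
lap_p = laplacian(p, 2, 2, h)          # analytic value: 4
print(grad_p, div_uv, lap_p)
```

For this quadratic field the central differences reproduce the analytic derivatives exactly; for general fields they carry a truncation error of second order in the grid spacing.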

Direct and Iterative Solvers

We saw that we need to perform iterations to deal with the nonlinear terms in the governing equations. We next discuss another factor that makes it necessary to carry out iterations in practical CFD problems.

Verify that the discrete equation system resulting from the finite-difference approximation (12) on our four-point grid is

$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 + 2\Delta x\, u_{g_2} & 0 & 0 \\ 0 & -1 & 1 + 2\Delta x\, u_{g_3} & 0 \\ 0 & 0 & -1 & 1 + 2\Delta x\, u_{g_4} \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix} = \begin{bmatrix} 1 \\ \Delta x\, u_{g_2}^2 \\ \Delta x\, u_{g_3}^2 \\ \Delta x\, u_{g_4}^2 \end{bmatrix} \tag{13}$$

In a practical problem, one would usually have thousands to millions of grid points or cells so that each dimension of the above matrix would be of the order of a million (with most of the elements being zeros). Inverting such a matrix directly would take a prohibitively large amount of memory. So instead, the matrix is inverted using an iterative scheme as discussed below.

Rearrange the finite-difference approximation (12) at grid point i so that $u_i$ is expressed in terms of the values at the neighboring grid points and the guess values:

$$u_i = \frac{u_{i-1} + \Delta x\, u_{g_i}^2}{1 + 2\Delta x\, u_{g_i}}$$

If a neighboring value at the current iteration level is not available, we use the guess value for it. Let's say that we sweep from right to left on our grid, i.e. we update $u_4$, then $u_3$ and finally $u_2$ in each iteration. In the $l$-th iteration, $u_{i-1}^{(l)}$ is not available while updating $u_i^{(l)}$ and so we use the guess value $u_{g_{i-1}}^{(l)}$ for it instead:

$$u_i^{(l)} = \frac{u_{g_{i-1}}^{(l)} + \Delta x \left(u_{g_i}^{(l)}\right)^2}{1 + 2\Delta x\, u_{g_i}^{(l)}} \tag{14}$$

Since we are using the guess values at neighboring points, we are effectively obtaining only an approximate solution for the matrix inversion in (13) during each iteration but in the process have greatly reduced the memory required for the inversion. This tradeoff is a good strategy since it doesn't make sense to expend a great deal of resources to do an exact matrix inversion when the matrix elements depend on guess values which are continuously being refined. In an act of cleverness, we have combined the iteration to handle nonlinear terms with the iteration for matrix inversion into a single iteration process. Most importantly, as the iterations converge and $u_g \to u$, the approximate solution for the matrix inversion tends towards the exact solution for the inversion since the error introduced by using $u_g$ instead of $u$ in (14) also tends to zero.

Thus, iteration serves two purposes:

1. It allows for efficient matrix inversion with greatly reduced memory requirements.

2. It is necessary to solve nonlinear equations.

In steady problems, a common and effective strategy used in CFD codes is to solve the unsteady form of the governing equations and “march” the solution in time until the solution converges to a steady value. In this case, each time step is effectively an iteration, with the guess value at any time level being given by the solution at the previous time level.

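function [x] = conjgrad(A, b, x)
    % Conjugate gradient solution of A*x = b, starting from the guess x.
    % A is assumed to be symmetric positive definite.
    r = b - A*x;                    % initial residual
    p = r;                          % initial search direction
    rsold = r'*r;
    for i = 1:size(A, 1)
        Ap = A*p;
        alpha = rsold/(p'*Ap);      % step length along p
        x = x + alpha*p;
        r = r - alpha*Ap;
        rsnew = r'*r;
        if sqrt(rsnew) < 1e-10      % stop once the residual norm is small
            break;
        end
        p = r + (rsnew/rsold)*p;    % next conjugate search direction
        rsold = rsnew;
    end
end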

Iterative Convergence

Recall that as $u_g \to u$, the linearization and matrix inversion errors tend to zero. So we continue the iteration process until some selected measure of the difference between $u_g$ and $u$, referred to as the residual, is “small enough”. We could, for instance, define the residual $R$ as the RMS value of the difference between $u$ and $u_g$ on the grid:

$$R \equiv \sqrt{\frac{\sum_{i=1}^{N} \left(u_i - u_{g_i}\right)^2}{N}}$$

It's useful to scale this residual with the average value of $u$ in the domain. An unscaled residual of, say, 0.01 would be relatively small if the average value of $u$ in the domain is 5000 but would be relatively large if the average value is 0.1. Scaling ensures that the residual is a relative rather than an absolute measure. Scaling the above residual by dividing by the average value of $u$ gives

$$R = \left( \sqrt{\frac{\sum_{i=1}^{N} \left(u_i - u_{g_i}\right)^2}{N}} \right) \left( \frac{N}{\sum_{i=1}^{N} u_i} \right) = \frac{\sqrt{N \sum_{i=1}^{N} \left(u_i - u_{g_i}\right)^2}}{\sum_{i=1}^{N} u_i} \tag{15}$$

For the nonlinear 1D example, we'll take the initial guess at all grid points to be equal to the value at the left boundary, i.e. $u_g^{(1)} = 1$. In each iteration, we update $u_g$, sweep from right to left on the grid updating, in turn, $u_4$, $u_3$ and $u_2$ using (14) and calculate the residual using (15). We'll terminate the iterations when the residual falls below $10^{-9}$ (which is referred to as the convergence criterion). Take a few minutes to implement this procedure in MATLAB, which will help you gain some familiarity with the mechanics of the implementation. The variation of the residual with iterations obtained from MATLAB is shown below. Note that a logarithmic scale is used for the ordinate. The iterative process converges to a level smaller than $10^{-9}$ in just 6 iterations. In more complex problems, many more iterations would be necessary to achieve convergence.

[Figure: residual (logarithmic scale, from 10⁻¹⁰ to 10⁰) versus iteration number (1 to 6).]
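The procedure described above (guess update, right-to-left sweep with equation (14), scaled residual (15), convergence criterion) can be sketched as follows. This is an illustration in Python rather than the MATLAB exercise the notes ask for, and it rests on assumptions the excerpt does not state: the four grid points are taken to be equally spaced on a domain of unit length, so that Δx = 1/3, and the function name `solve_nonlinear_1d` is hypothetical.

```python
import math


def solve_nonlinear_1d(n_points=4, length=1.0, tol=1e-9, max_iter=100):
    """Iterate update (14) with right-to-left sweeps until residual (15) < tol.

    Assumes equally spaced points on [0, length] and boundary value u_1 = 1.
    Returns the converged values and the residual history.
    """
    dx = length / (n_points - 1)
    ug = [1.0] * n_points  # initial guess: left boundary value everywhere
    history = []
    u = ug[:]
    for _ in range(max_iter):
        u = ug[:]  # u[0] keeps the boundary value
        # Sweep right to left; neighbours always come from the guess ug.
        for i in range(n_points - 1, 0, -1):
            u[i] = (ug[i - 1] + dx * ug[i] ** 2) / (1.0 + 2.0 * dx * ug[i])
        # Scaled residual, equation (15).
        diff_sq = sum((a - b) ** 2 for a, b in zip(u, ug))
        residual = math.sqrt(n_points * diff_sq) / sum(u)
        history.append(residual)
        ug = u  # the new values become the next guess
        if residual < tol:
            break
    return u, history


u, history = solve_nonlinear_1d()
dx = 1.0 / 3.0
for k, r in enumerate(history, start=1):
    print(f"iteration {k}: residual = {r:.3e}")
print("solution:", u)
```

Because every neighbour value is taken from the previous guess, the sweep direction does not change the numbers here; information from the left boundary travels one grid point per iteration, which is one reason larger grids need many more iterations.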

FLUENT 6.0 User’s Guide

Volume 5

December 2001

Introduction to Using FLUENT

Problem Description: The problem to be considered is shown schematically in Figure 1.1. A cold fluid at 26°C enters through the large pipe and mixes with a warmer fluid at 40°C in the elbow. The pipe dimensions are in inches, and the fluid properties and boundary conditions are given in SI units. The Reynolds number at the main inlet is 2.03 × 10⁵, so that a turbulence model will be necessary.

[Figure 1.1 schematic: mixing elbow in the x-y plane, with pipe dimension labels in inches (including 32, 16 and 12) and an angle of 39.93°.]

Large-pipe inlet: U = 0.2 m/s, T = 26°C, I = 5%
Second inlet: U = 1 m/s, T = 40°C, I = 5%

Density: ρ = 1000 kg/m³
Viscosity: µ = 8 × 10⁻⁴ Pa-s
Conductivity: k = 0.677 W/m-K
Specific Heat: Cp = 4216 J/kg-K

Figure 1.1: Problem Specification

© Fluent Inc. November 27, 2001
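The quoted main-inlet Reynolds number can be cross-checked from the listed properties. The sketch below assumes that the 32 in the figure is the large-pipe dimension in inches and that it is the length scale used in the Reynolds number; neither is stated explicitly in the extracted text, and the helper name `reynolds_number` is hypothetical.

```python
INCH_TO_M = 0.0254  # exact conversion factor


def reynolds_number(density, velocity, length, viscosity):
    """Re = rho * U * L / mu, with all quantities in SI units."""
    return density * velocity * length / viscosity


# Values from the problem specification; the 32 in length scale is an assumption.
re_main = reynolds_number(
    density=1000.0,          # kg/m^3
    velocity=0.2,            # m/s, large-pipe inlet
    length=32 * INCH_TO_M,   # m
    viscosity=8e-4,          # Pa-s
)
print(f"Re at main inlet = {re_main:.3g}")
```

Under these assumptions the result is about 2.03 × 10⁵, matching the value given in the problem description.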


Step 1: Grid

1. Read the grid file elbow.msh.

File −→ Read −→Case...

(a) Select the file elbow.msh by clicking on it under Files and then clicking on OK.

Note: As this grid is read by FLUENT, messages will appear in the console window reporting the progress of the conversion. After reading the grid file, FLUENT will report that 918 triangular fluid cells have been read, along with a number of boundary faces with different zone identifiers.


[Figure: grid display, Oct 23, 2000, FLUENT 6.0 (2d, segregated, lam).]

Figure 1.2: The Triangular Grid for the Mixing Elbow

Extra: You can use the right mouse button to check which zone number corresponds to each boundary. If you click the right mouse button on one of the boundaries in the graphics window, its zone number, name, and type will be printed in the FLUENT console window. This feature is especially useful when you have several zones of the same type and you want to distinguish between them quickly.


2. Turn on the standard k-ε turbulence model.

Define −→ Models −→Viscous...

(a) Select k-epsilon in the Model list.

The original Viscous Model panel will expand when you do so.

(b) Accept the default Standard model by clicking OK.



Step 3: Materials

1. Create a new material called water.

Define −→Materials...



[Figure: scaled residuals (continuity, x-velocity, y-velocity, energy, k, epsilon) on a logarithmic axis from 1e-07 to 1e+02 versus iterations 0 to 60; Oct 24, 2000, FLUENT 6.0 (2d, segregated, ske).]

Figure 1.3: Residuals for the First 60 Iterations

5. Save the data file (elbow1.dat).

Use the same prefix (elbow1) that you used when you saved the case file earlier. Note that additional case and data files will be written later in this session.

File −→ Write −→Data...


Introduction to Using FLUENT: Fluid Flow and Heat Transfer in a Mixing Elbow

(b) Click Display and close the Contours panel.

[Figure: contours of static temperature (K), color scale from 2.93e+02 to 3.13e+02; FLUENT 6.3 (3d, pbns, rke).]

Figure 1.6: Predicted Temperature Distribution after the Initial Calculation

© Fluent Inc. September 21, 2006

[Figure: velocity vectors colored by velocity magnitude (m/s), color scale from 2.46e-01 to 1.48e+00; FLUENT 6.3 (3d, pbns, rke).]

Figure 1.7: Resized Velocity Vectors

[Figure: velocity vectors colored by velocity magnitude (m/s), color scale from 2.46e-01 to 1.48e+00; FLUENT 6.3 (3d, pbns, rke).]

Figure 1.8: Magnified View of Velocity Vectors

solved to simulate the flow of a Newtonian fluid with collision mod-

els such as Bhatnagar-Gross-Krook (BGK). By simulating stream-

ing and collision processes across a limited number of particles, the

intrinsic particle interactions evince a microcosm of viscous flow

behavior applicable across the greater mass [34], [35], [36].

The essential quantity in the Lattice Boltzmann Method (LBM)

[11] is a density function (DF) (x ) on a discrete lattice, x =(δ δ δ) ( ∈ [0 N], ∈ [0 N], ∈ [0 N]) with discrete ve-

locity values e ( ∈ [0 Nν − 1]) at time t. Here, each e points from

a lattice site to one of its Nν near-neighbor sites. N , N , and Nare the numbers of lattice sites in the , , and directions, respec-

tively, with δ, δ and δ being the corresponding lattice spacing

and Nµ = 18 being the number of discrete velocity values. From

the DF, we can calculate various physical quantities such as fluid

density ρ(x ) and velocity u(x ):ρ(x ) =

(x ) ρ(x )u(x ) =

e(x ) (13)

The time evolution of the DF is governed by the Boltzmann equation in the Bhatnagar-Gross-Krook (BGK) model. The LBM simulation thus consists of a time-stepping iteration, in which collision and streaming operations are performed as time is incremented by $\delta_t$ at each iteration step.

The collision equation can be written as:

$$f_i(\mathbf{x}, t^+) = f_i(\mathbf{x}, t) - \frac{1}{\tau}\left(f_i(\mathbf{x}, t) - f_i^{eq}\big(\rho(\mathbf{x}, t), \mathbf{u}(\mathbf{x}, t)\big)\right) \tag{14}$$

where

$$f_i^{eq}(\rho, \mathbf{u}) = \rho\left(A + B(\mathbf{e}_i \cdot \mathbf{u}) + C(\mathbf{e}_i \cdot \mathbf{u})^2 + D u^2\right) \tag{15}$$

Here $A$, $B$, $C$ and $D$ are constants, and the time constant $\tau$ is related to the kinematic viscosity $\nu$ through the relation $\nu = (\tau - 1/2)/3$.
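The collision step of Eqs. (13) to (15) can be sketched at a single lattice site as follows. This is a minimal illustration, not the implementation behind the text: it swaps in the two-dimensional D2Q9 lattice (9 velocities, lattice units) instead of the 18-velocity 3D lattice described above, and the constants A, B, C, D take the standard D2Q9 values scaled by the weights `W`. All function names are hypothetical.

```python
# D2Q9 lattice in lattice units (an assumption; the text uses 18 velocities in 3D).
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def tau_from_viscosity(nu):
    """Invert nu = (tau - 1/2) / 3 for the relaxation time tau."""
    return 3.0 * nu + 0.5


def moments(f):
    """Density and velocity at one lattice site, following Eq. (13)."""
    rho = sum(f)
    ux = sum(fi * e[0] for fi, e in zip(f, E)) / rho
    uy = sum(fi * e[1] for fi, e in zip(f, E)) / rho
    return rho, ux, uy


def equilibrium(rho, ux, uy):
    """Eq. (15) with the D2Q9 choice A = w_i, B = 3 w_i, C = 4.5 w_i, D = -1.5 w_i."""
    usq = ux * ux + uy * uy
    feq = []
    for w, (ex, ey) in zip(W, E):
        eu = ex * ux + ey * uy
        feq.append(rho * w * (1.0 + 3.0 * eu + 4.5 * eu * eu - 1.5 * usq))
    return feq


def collide(f, tau):
    """BGK collision of Eq. (14): relax each population toward equilibrium."""
    rho, ux, uy = moments(f)
    feq = equilibrium(rho, ux, uy)
    return [fi - (fi - fe) / tau for fi, fe in zip(f, feq)]
```

Because the equilibrium carries the same density and momentum as the populations it is built from, the collision changes neither quantity at the site; only the non-equilibrium part decays at a rate set by τ.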

The Streaming equation can be written as:

Annu. Rev. Fluid Mech. 1998. 30:329–64. Copyright © 1998 by Annual Reviews Inc. All rights reserved

LATTICE BOLTZMANN METHOD FOR FLUID FLOWS

Shiyi Chen1,2 and Gary D. Doolen2

1IBM Research Division, T. J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY 10598; 2Theoretical Division and Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM 87545

KEY WORDS: lattice Boltzmann method, mesoscopic approach, fluid flow simulation

ABSTRACT

We present an overview of the lattice Boltzmann method (LBM), a parallel and efficient algorithm for simulating single-phase and multiphase fluid flows and for incorporating additional physical complexities. The LBM is especially useful for modeling complicated boundary conditions and multiphase interfaces. Recent extensions of this method are described, including simulations of fluid turbulence, suspension flows, and reaction diffusion systems.

INTRODUCTION

In recent years, the lattice Boltzmann method (LBM) has developed into an alternative and promising numerical scheme for simulating fluid flows and modeling physics in fluids. The scheme is particularly successful in fluid flow applications involving interfacial dynamics and complex boundaries. Unlike conventional numerical schemes based on discretizations of macroscopic continuum equations, the lattice Boltzmann method is based on microscopic models and mesoscopic kinetic equations. The fundamental idea of the LBM is to construct simplified kinetic models that incorporate the essential physics of microscopic or mesoscopic processes so that the macroscopic averaged properties obey the desired macroscopic equations. The basic premise for using these simplified kinetic-type methods for macroscopic fluid flows is that the macroscopic dynamics of a fluid is the result of the collective behavior of many microscopic particles in the system and that the macroscopic dynamics is not sensitive to the underlying details in microscopic physics (Kadanoff 1986). By developing a simplified version of the kinetic equation, one avoids solving


$$f_i(\mathbf{x} + \mathbf{e}_i, t + \delta_t) = f_i(\mathbf{x}, t^+) \tag{16}$$

It should be noted that the collision step involves a large number of floating-point operations that are strictly local to each lattice site, while the streaming step contains no floating-point operations but solely memory copies between nearest-neighbor lattice sites.

Due to its particulate nature and local dynamics, LBM has several advantages over other conventional CFD methods, especially in dealing with complex boundaries, incorporating microscopic interactions, and parallelizing the algorithm [37]. Even though the LBM is based on a particle picture, its principal focus is the averaged macroscopic behavior. The kinetic equation provides many of the advantages of molecular dynamics, including clear physical pictures, easy implementation of boundary conditions, and fully parallel algorithms. Because of the availability of very fast and massively parallel machines, there is a current trend to use codes that can exploit the intrinsic features of parallelism. The LBM fulfills these requirements in a straightforward manner. Benchmarks show that LBM on parallel machines with fast network connections achieves close to linear speedup as more processing units are added [38].
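The remark that streaming is pure memory copying can be made concrete with a short sketch. It assumes a two-dimensional D2Q9 lattice with periodic boundaries (a simplification of the 3D lattice in the text) and stores the populations as `f[i][x][y]`; the layout and the function name `stream` are illustrative choices.

```python
# D2Q9 velocity set (an assumption; the text describes an 18-velocity 3D lattice).
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]


def stream(f, nx, ny):
    """Streaming step of Eq. (16) on a periodic nx-by-ny lattice.

    f[i][x][y] holds the post-collision population for direction i at site (x, y).
    Each value is copied unchanged to the neighbor site (x + ex, y + ey).
    """
    out = [[[0.0] * ny for _ in range(nx)] for _ in E]
    for i, (ex, ey) in enumerate(E):
        for x in range(nx):
            for y in range(ny):
                # No arithmetic on the populations, only an index shift and a copy.
                out[i][(x + ex) % nx][(y + ey) % ny] = f[i][x][y]
    return out
```

Every copy depends only on its own source site, which is why the step maps naturally onto parallel hardware: each lattice site can be handled by a separate thread.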

6 CFD based on GPU Programming And Its Problem

In recent years, the GPU has attracted attention for numerical computing. Its structure is highly parallelized and optimized to achieve high performance for image processing. Thanks to hardware acceleration, a GPU can perform floating-point calculations very fast, especially those that can be expressed as shading operations. Moreover, GPU speed has improved dramatically over the past five years, at a much greater rate than CPU speed [39].

The use of a GPU chip for CFD calculations first appeared in a GPU software-programming book from NVIDIA. The author







SPH: Smoothed-particle hydrodynamics

Real-Time Particle-Based Simulation on GPUs (sap 0151)

Takahiro Harada∗ Masayuki Tanaka† Seiichi Koshizuka‡ Yoichiro Kawaguchi§

The University of Tokyo

Figure 1: Real-time simulation of glasses and liquid. Glass tower is filled with liquid and a glass is thrown into the scene. This simulation runs at 17.1 frames per second on a GeForce 8800GTX.

1 Introduction

As physical laws govern the motion of objects around us, a physically-based simulation plays an important role in computer graphics. For instance, the motion of a fluid, which is difficult to generate by hand, can be produced by solving the governing equations. Acceleration of a simulation is one of the most important research themes because the speed and stability of a simulation are essential for real-time applications.

The current trend in processor technology is to improve the efficiency of processors and not increase their frequency. Processors nowadays are equipped with parallel architecture. Cell Broadband Engine Architecture is a multi-core processor for general-purpose computation and Graphics Processing Units (GPUs) are specialized parallel processors for graphics tasks. Additionally, CPUs are also shifting to multi-core design. What we need to do now is adapt to these platforms. Therefore, we need to develop data-parallel algorithms that exploit their computational power.

In this sketch, we show that a particle-based simulation can be parallelized and implemented entirely on Graphics Processing Units (GPUs) as a parallel computation platform. As a result, we can obtain unprecedented performance with scalar processors. We also present a particle-based method for the interaction of fluids and rigid bodies. In this method, rigid bodies are represented by a set of particles. The benefits of this method are low computational cost and the parallelism of its algorithm.

2 Methods

Smoothed Particle Hydrodynamics (SPH) is employed to solve the governing equation of a fluid [Muller et al. 2003]. A characteristic of particle methods, including SPH, is that there is no numerical dissipation caused by the advection calculation, so mass loss does not occur even if the resolution of a simulation is low. Therefore, particle methods are suited for real-time applications. As for the rigid body simulation, a rigid body is represented by a set of particles (spheres), as the title of this sketch implies. We call them rigid particles. All rigid particles have the same size, which is also the size of the fluid particles. An advantage of this shape representation is that the computation speed is controllable by changing the accuracy, i.e., the resolution of particles. With this shape representation, not only collision between rigid bodies but also interaction between a rigid body and a fluid can be converted to the simple problem of computing particle interactions. Thus, the computation is simple and it can be computed in parallel. However, the shape representation using particles increases the number of simulation entities because a rigid body consists of a few rigid particles. A uniform grid is introduced to make the neighboring particle search efficient. The interaction between fluid particles and rigid particles is calculated by treating rigid bodies as a fluid. The density is also computed on rigid particles, and then the pressure and viscosity forces are calculated between fluid particles. The force on the rigid particle, which is the sum of the force from the fluid and the force from collisions between rigid particles, is used to update the linear and angular momenta of a rigid body.
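The uniform-grid neighbor search mentioned above can be sketched on the CPU as follows. This is an illustration of the general technique, not the authors' GPU implementation (which the sketch describes in terms of textures and shaders): it assumes a cell size equal to the interaction radius `h`, so that all neighbors of a particle lie in its own cell or the 26 surrounding cells. The names `build_grid` and `neighbors` are hypothetical.

```python
from collections import defaultdict
from math import floor


def build_grid(positions, h):
    """Bin particle indices into cubic cells of edge length h."""
    grid = defaultdict(list)
    for idx, (x, y, z) in enumerate(positions):
        grid[(floor(x / h), floor(y / h), floor(z / h))].append(idx)
    return grid


def neighbors(i, positions, grid, h):
    """Indices of all particles within distance h of particle i."""
    x, y, z = positions[i]
    cx, cy, cz = floor(x / h), floor(y / h), floor(z / h)
    found = []
    # Only the 27 cells around particle i can contain a neighbor within h.
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                    if j == i:
                        continue
                    px, py, pz = positions[j]
                    if (px - x) ** 2 + (py - y) ** 2 + (pz - z) ** 2 <= h * h:
                        found.append(j)
    return sorted(found)
```

Since fluid and rigid particles have the same size, a single grid of this kind can serve both fluid-fluid and fluid-rigid interaction queries.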

As described above, the force computation between rigid bodies and a fluid can be executed in parallel. When GPUs are used, a frame buffer is rendered with a fragment shader by assigning a pixel to a particle. The fragment shader computes the force from the physical values of neighboring particles, which are read from other textures. The data that stores information about neighboring particles is generated in advance by a vertex shader. In this way, forces on rigid bodies and a fluid are computed in parallel.

3 Results

In Figure 1, 10 glasses are stacked and a fluid is poured from above them. Then, a glass is thrown onto them and the stacked glasses collapse. This simulation uses 49,153 particles and runs at 17.1 frames per second on a GeForce 8800GTX using a rendering in which point sprites are used to render particles. The simulator outputs simulation data and the surface of the fluid is constructed by Marching Cubes by assigning densities to fluid particles. The polygons are rendered after the simulation with an offline renderer. The accompanying video includes several examples which run in real-time. The simulation with the largest particle number runs at 3.85 frames per second with 245,760 particles. These examples show the capability of the present technique.

References

MULLER, M., CHARYPAR, D., AND GROSS, M. 2003. Particle-based fluid simulation for interactive applications. In Proc. of SIGGRAPH Symposium on Computer Animation, 154–159.

Harada et al. use modern GPUs (GeForce 8800 GTX) to simulate 49,153 particles at 17 fps.

Control, Estimation and Optimization of Energy Efficient Buildings

Jeff Borggaard ∗, John A. Burns †, Amit Surana §, Lizette Zietsman ‡

∗ Interdisciplinary Center for Applied Mathematics, Virginia Tech, Blacksburg, VA
† Interdisciplinary Center for Applied Mathematics, Virginia Tech, Blacksburg, VA
‡ Interdisciplinary Center for Applied Mathematics, Virginia Tech, Blacksburg, VA
§ United Technologies Research Center, Hartford, CT

Abstract: Commercial buildings are responsible for a significant fraction of the energy consumption and greenhouse gas emissions in the U.S. and worldwide. Consequently, the design, optimization and control of energy efficient buildings can have a tremendous impact on energy cost and greenhouse gas emission. Buildings are complex, multi-scale in time and space, multi-physics and highly uncertain dynamic systems with wide varieties of disturbances. Recent results have shown that by considering the whole building as an integrated system and applying modern estimation and control techniques to this system, one can achieve greater efficiencies than obtained by optimizing individual building components such as lighting and HVAC. We consider estimation and control for a distributed parameter model of a multi-room building. In particular, we show that distributed parameter control theory, coupled with high performance computing, can provide insight and computational algorithms for the optimal placement of sensors and actuators to maximize observability and controllability. Numerical examples are provided to illustrate the approach. We also discuss the problems of design and optimization (for energy and CO2 reduction) and control (both local and supervisory) of whole buildings and demonstrate how sensitivities can be used to address these problems.

I. INTRODUCTION

Whole buildings are complex, multi-scale, multi-physics, highly uncertain dynamic systems with wide varieties of disturbances. By itself, whole building simulation is a significant computational challenge. However, when addressing the additional requirements that center on design, optimization (for energy and CO2) and control (both local and supervisory) of whole buildings, it becomes an immense challenge to develop practical computational tools that are scalable and widely applicable to current and future building stock.

At a fundamental level, there are several potential solutions to the design and control of high performance buildings. Roughly speaking, these approaches include: (1) Simulation Based Design, (2) Holistic Fully Integrated Design and (3) Hybrid Design Methods. Regardless of the approach, it is clear that computing resources and the development of computational methods will be an enabling science because at some point in the design and control process, numerical methods must be employed. A major question is, "When does one introduce the approximations?" In the best case one keeps the physics of the problem as long as possible and then introduces approximations at the last stage of the design. The current state is the opposite; the physics is approximated by a numerical (lumped) model and then used as a design model. This is what is known as simulation based design. At the other end of the spectrum is the holistic approach where the design problem is abstracted and then computational methods and tools are developed to solve the fully integrated design, optimization or control problem. Here the numerical approximations are introduced at the last stage. Hybrid methods attempt to take advantage of both approaches. In this paper we show that distributed parameter control, combined with high performance computing, can be used to provide practical insight into important issues such as sensor/actuator placement and state estimation for control.

The Model and Problem Formulation

Before focusing on a specific problem it is important to note that whole buildings are very complex multi-scale (in time and space) systems, as Fig. 1 below illustrates. Optimal design and control of these systems are very challenging problems and are often done by first developing a reduced order model and then basing the design on the simplified model. In this short paper we show that distributed parameter control theory can provide useful information about building design and control and then we suggest future areas of research.

Fig. 1. A Whole Building is a Complex System

In order to illustrate some of the ideas, we consider the problem illustrated by a single room shown in Fig. 2 below.

Fig. 2. Room Control Problem

Here, the goal is to design the room (locate vents, place sensors, etc.) in order to control the room temperature near the workspace and minimize energy. The problems of design and control should be considered simultaneously because the type and effectiveness of the controller depends on the type and quality of the sensed information and conversely. In this problem the system is governed by the Navier-Stokes equations in the room denoted by $\Omega$ and given by

$$\frac{\partial \mathbf{v}(t,\mathbf{x})}{\partial t} + \mathbf{v}(t,\mathbf{x}) \cdot \nabla \mathbf{v}(t,\mathbf{x}) = -\nabla p(t,\mathbf{x}) + \frac{1}{Re} \Delta \mathbf{v}(t,\mathbf{x}) \tag{I.1}$$

$$\nabla \cdot \mathbf{v}(t,\mathbf{x}) = 0, \tag{I.2}$$

$$\frac{\partial T(t,\mathbf{x})}{\partial t} + \mathbf{v}(t,\mathbf{x}) \cdot \nabla T(t,\mathbf{x}) = \frac{1}{Re\,Pr} \Delta T(t,\mathbf{x}) + B(t,\mathbf{x}), \tag{I.3}$$

where $\mathbf{x} \in \Omega \subset \mathbb{R}^3$, $\mathbf{v}(t,\mathbf{x})$ is the velocity vector, $p(t,\mathbf{x})$ is the pressure and $T(t,\mathbf{x})$ is the temperature. Nondimensionalization has been carried out such that $Re$ is the Reynolds number and $Pr$ is the Prandtl number. Note that for this study, the energy equation (I.3) does not influence the momentum equation (I.1).

The ideas presented in this work should ultimately be studied on the Boussinesq equations, where temperature introduces a buoyancy force in (I.1). For ease of presentation, we assume inflow is fixed and the control term is given by $B(t,\mathbf{x}) = b(\mathbf{x})u(t)$ where $b(\mathbf{x})$ is a given distribution and $u(t)$ is a thermal control input. The case where the control is applied at the boundary is slightly more complex and requires a different technical framework. However, for the discussion here it is sufficient to think of $b(\mathbf{x})$ as a function with support near the wall vent defined on the domain $\Omega$. The controlled output, $w$, of the system will be defined by a weighted average over the sub-domain in the room occupied by the workspace. In particular, let

$$w(t) = \int_{\Omega_W} d(\mathbf{x})\, T(t,\mathbf{x})\, d\mathbf{x}, \tag{I.4}$$

where $\Omega_W \subset \Omega$ is specified to be a region around the desk. Consider the problem of finding the control that minimizes

$$J(u) = \int_0^\infty \left\{ [w(t) - r(t)]^2 + R[u(t)]^2 \right\} dt, \tag{I.5}$$

where $R > 0$ and $r(t)$ is a desired average temperature to be tracked. For the discussion here, we set $r(t) = 0$.

Also, assume that one has $p$ sensors with supports in the regions $\Omega_i$. In particular, for $i = 1, 2, \ldots, p$, let

$$y_i(t) = \int_{\Omega_i} c_i(\mathbf{x})\, T(t,\mathbf{x})\, d\mathbf{x}, \tag{I.6}$$

and hence the sensed output is given by

$$y(t) = [y_1(t), y_2(t), \cdots, y_p(t)]^T.$$
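Once the temperature field is available on a grid, outputs of the form (I.4) and (I.6) reduce to weighted sums. The sketch below is only an illustration under stated assumptions: the room is taken to be the unit cube, the integral is approximated with a midpoint rule on a uniform grid (the paper itself uses finite elements), and the weight $d(\mathbf{x})$ is assumed to be a uniform average over the workspace cube $(0.25, 0.75)^3$ used in the paper's numerical example. The function names are hypothetical.

```python
def weighted_output(temperature, weight, n):
    """Midpoint-rule approximation of the integral of weight(x) * T(x) over the unit cube.

    temperature and weight are callables taking a point (x, y, z);
    n is the number of grid cells per axis.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                x = ((i + 0.5) * h, (j + 0.5) * h, (k + 0.5) * h)
                total += weight(x) * temperature(x) * h ** 3
    return total


def workspace_weight(x, lo=0.25, hi=0.75):
    """Assumed d(x): uniform average over the cube (lo, hi)^3, zero elsewhere."""
    inside = all(lo < c < hi for c in x)
    return 1.0 / (hi - lo) ** 3 if inside else 0.0


# Example: a temperature field that varies linearly across the room.
w = weighted_output(lambda x: x[0] + x[1] + x[2], workspace_weight, 8)
```

A sensor output $y_i$ is computed the same way, with `weight` replaced by a function supported on the small region $\Omega_i$ near the wall where the sensor sits.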

II. ABSTRACT FORMULATION OF THE CONTROL PROBLEM

Under suitable assumptions and applying the appropriate boundary conditions, we formulate (I.1)-(I.3) as a differential equation on an infinite dimensional (Hilbert) space, $Z$, of the form

$$\dot{z}(t) = \mathcal{A}z(t) + \mathcal{N}(z(t)) + \mathcal{B}u(t), \quad t > 0, \tag{II.1}$$

$$z(0) = z_0 \in Z,$$

where $\mathcal{A} : D(\mathcal{A}) \subseteq Z \to Z$ generates a $C_0$-semigroup $S(t)$ on $Z$, $\mathcal{N} : D(\mathcal{N}) \subseteq Z \to Z$ is a nonlinear operator and $\mathcal{B} : U \to Z$ is a linear input operator (perhaps unbounded) from the control space $U$ to the state space $Z$. If the operator $\mathcal{E} : Z \to \mathbb{R}$ is defined by (I.4), then the cost function (I.5) has the form

$$J(u) = \int_0^\infty \left[ \langle \mathcal{Q}z(t), z(t) \rangle_Z + \langle Ru(t), u(t) \rangle_U \right] dt, \tag{II.2}$$

where $\mathcal{Q} = \mathcal{E}^*\mathcal{E}$. Also, if the operator $\mathcal{C} : Z \to \mathbb{R}^p$ is defined by (I.6), then the measured output is defined by

$$y(t) = \mathcal{C}z(t). \tag{II.3}$$

Here, $Z = V \times E$ where $V$ is the space of divergence-free vector fields on $\Omega$, and $\mathcal{A}$ has the form

$$\mathcal{A} = \begin{bmatrix} A_O & 0 \\ -(\nabla T)^T & A_{AD} \end{bmatrix}$$
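To see what minimizing a cost of the form (II.2) produces in the simplest possible setting, consider a scalar analogue. This is a sketch under strong assumptions, not the paper's infinite-dimensional computation: the state $z$ is a single number, the dynamics are $\dot{z} = az + bu$ with no nonlinear term, and the cost is $\int_0^\infty (q z^2 + R u^2)\,dt$. The optimal feedback is $u = -kz$ with $k = bp/R$, where $p$ is the positive root of the algebraic Riccati equation $2ap - (b^2/R)p^2 + q = 0$. The function name `scalar_lqr` is hypothetical.

```python
from math import sqrt


def scalar_lqr(a, b, q, r):
    """LQR for z' = a z + b u with cost integral of q z^2 + r u^2.

    Returns (p, k): the positive Riccati solution and the feedback gain in u = -k z.
    Requires b != 0, q >= 0 and r > 0.
    """
    # Positive root of 2 a p - (b^2 / r) p^2 + q = 0.
    p = r * (a + sqrt(a * a + b * b * q / r)) / (b * b)
    k = b * p / r
    return p, k


# Example: a stable thermal mode (a < 0) with unit input and unit weights.
p, k = scalar_lqr(a=-1.0, b=1.0, q=1.0, r=1.0)
closed_loop_rate = -1.0 - 1.0 * k  # a - b k, always negative for the optimal gain
```

In the distributed parameter setting the gain $k$ becomes a functional gain, a spatial weighting function applied to the temperature field, which is why its support indicates where measurements matter most.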

III. NUMERICAL EXAMPLES AND CONCLUSIONS

Here we use distributed parameter LQR theory combined with finite elements to compute both feedback functional gains and observer functional gains for thermal control of the 3D room above (Fig. 2) with a fixed flow.

The flow is computed in a cubical room with an inflow duct at (x = 0, 0.375 < y < 0.625, 0.75 < z < 0.825) with a bi-quadratic flow profile. The Reynolds number, Re, is 50 based on the height of the room and the maximum inlet flow velocity. The Prandtl number is Pr = 0.7, which is appropriate for air. Streamlines as well as velocity vectors and velocity magnitude contours are shown in Fig. 3.

Fig. 3. Flow Through the Room

Fig. 4 shows the optimal LQR feedback functional gain $k_T(\mathbf{x})$. Note that a large portion of the support of $k_T(\mathbf{x})$ is concentrated in the center of the room and is zero near the inflow vent. The maximum value in the center of the room corresponds with our choice of $\Omega_W = (0.25, 0.75)^3$. Thus, the functional gain illustrates the point above, suggesting that "optimal" sensor placements should be focused near the workspace. However, in practice one must use "wall" sensors and use state estimation methods to reconstruct the state.

Figs. 5–7 below contain the observer functional gains, $F_T$, for three locations. Note that when the sensor is placed near the outflow vent the support of the observer gain is the largest, while the support of the observer gain when the sensor is placed on the top wall is the smallest. Thus, as indicated quantitatively in Table I below, placing a sensor near the exit vent will tend to enhance "observability."

Fig. 4. Feedback Functional Gain

Fig. 5. Observer Gain: Sensor Centered on Side Wall

Fig. 6. Observer Gain: Sensor on Side Wall Near Vent

Fig. 7. Observer Gain: Sensor on Top Wall

TABLE I
L2 NORM OF OBSERVER, F

Sensor Location              ||F||_2         max(f_T)
centered on side wall        1.37 × 10^-6    1.83 × 10^-6
on side wall near exit       1.43 × 10^-6    1.94 × 10^-6
on ceiling near side wall    1.22 × 10^-6    1.61 × 10^-6

Conclusions and Future Work

Although these results are preliminary, they illustrate how one can compute and use infinite dimensional theory to develop insight into control problems that arise in the design and operation of high performance buildings. For example, optimization could be used in the wall sensor placement problem to maximize ||F||_2. Likewise, similar optimization algorithms can be envisioned to optimally place the inflow vent. However, considerable work needs to be done before these ideas become useful tools for whole building design and control.

Acknowledgements: The research of the second and fourth authors was supported in part by Air Force Office of Scientific Research grant FA9550-07-1-0273 and the first author was supported in part by Air Force Office of Scientific Research grant FA9550-08-1-0136.


• Re = 50 is unrealistic (usually > 3000)

• One case scenario provides insufficient information
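The first objection can be checked with a back-of-the-envelope calculation of Re = UL/ν. The numbers below are illustrative assumptions, not values from the paper: a room height of 3 m, a supply air speed of 0.1 m/s, and a kinematic viscosity of air of about 1.5e-5 m²/s near room temperature.

```python
def reynolds(velocity, length, kinematic_viscosity):
    """Reynolds number Re = U * L / nu."""
    return velocity * length / kinematic_viscosity


NU_AIR = 1.5e-5      # m^2/s, approximate value for air near room temperature
ROOM_HEIGHT = 3.0    # m, assumed room height

# A modest inlet speed already gives a Reynolds number in the tens of thousands.
re_typical = reynolds(0.1, ROOM_HEIGHT, NU_AIR)

# Inlet speed that would be needed for Re = 50 in the same room.
u_for_re_50 = 50 * NU_AIR / ROOM_HEIGHT
```

Under these assumptions the typical value is about 20,000, while Re = 50 would require an inlet speed of roughly 0.25 mm/s, far below any practical ventilation flow.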

There are few papers discussing building control & CFD

Reason: CFD is slow; real-time simulation is not possible

How to make CFD faster?

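$$\rho \bar{u}_j \frac{\partial \bar{u}_i}{\partial x_j} = \rho \bar{f}_i + \frac{\partial}{\partial x_j}\left[ -\bar{p}\,\delta_{ij} + \mu\left( \frac{\partial \bar{u}_i}{\partial x_j} + \frac{\partial \bar{u}_j}{\partial x_i} \right) - \rho\,\overline{u_i' u_j'} \right] \tag{2}$$

$$\frac{\partial}{\partial t}(\rho k) + \frac{\partial}{\partial x_i}(\rho k u_i) = \frac{\partial}{\partial x_j}\left[ \left( \mu + \frac{\mu_t}{\sigma_k} \right) \frac{\partial k}{\partial x_j} \right] + P_k + P_b - \rho\epsilon - Y_M + S_k \tag{3}$$

$$\frac{\partial}{\partial t}(\rho \epsilon) + \frac{\partial}{\partial x_i}(\rho \epsilon u_i) = \frac{\partial}{\partial x_j}\left[ \left( \mu + \frac{\mu_t}{\sigma_\epsilon} \right) \frac{\partial \epsilon}{\partial x_j} \right] + C_{1\epsilon} \frac{\epsilon}{k}\left( P_k + C_{3\epsilon} P_b \right) - C_{2\epsilon}\,\rho \frac{\epsilon^2}{k} + S_\epsilon \tag{4}$$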

Faster



8. Sun, Huawei, Zhao, Lingying, Zhang, Yuanhui. Evaluating RNG k-epsilon models using PIV data for airflow in animal buildings at different ventilation rates. ASHRAE Transactions, January 1, 2007.

9. Tominaga Yoshihide, Mochida Akashi, et al. Comparison of performance of various revised k-epsilon models applied to CFD analysis of flowfield around a high-rise building. Journal of Architecture, Planning and Environmental Engineering.

10. P. Neofytou, A.G. Venetsanos, et al. CFD simulations of the wind environment around an airport terminal building. Environmental Modelling & Software, Volume 21, Issue 4, April 2006, Pages 520-524.

11. R. Panneer Selvam. Computation of flow around Texas Tech building using k-epsilon and Kato-Launder k-epsilon turbulence model. Engineering Structures, Volume 18, Issue 11, November 1996, Pages 856-860.

12. Stam J. Stable fluids. In: Proceedings of 26th international conference on computer graphics and interactive techniques, SIGGRAPH'99, Los Angeles; 1999.

13. Harris MJ. Real-time cloud simulation and rendering. Ph.D. thesis, University of North Carolina at Chapel Hill; 2003.

14. Song O-Y, Shin H, Ko H-S. Stable but nondissipative water. ACM Transactions on Graphics 2005;24(1):81-97.

15. Zuo W, Chen Q. Validation of fast fluid dynamics for room airflow. In: Proceedings of the 10th international IBPSA conference, Building Simulation 2007, Beijing, China; 2007.

Stable Fluids

Jos Stam

Alias wavefront

Abstract

Building animation tools for fluid-like motions is an important and challenging problem with many applications in computer graphics. The use of physics-based models for fluid flow can greatly assist in creating such tools. Physical models, unlike key frame or procedural based techniques, permit an animator to almost effortlessly create interesting, swirling fluid-like behaviors. Also, the interaction of flows with objects and virtual forces is handled elegantly. Until recently, it was believed that physical fluid models were too expensive to allow real-time interaction. This was largely due to the fact that previous models used unstable schemes to solve the physical equations governing a fluid. In this paper, for the first time, we propose an unconditionally stable model which still produces complex fluid-like flows. As well, our method is very easy to implement. The stability of our model allows us to take larger time steps and therefore achieve faster simulations. We have used our model in conjunction with advecting solid textures to create many fluid-like animations interactively in two- and three-dimensions.

CR Categories: I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism - Animation

Keywords: animation of fluids, Navier-Stokes, stable solvers, implicit elliptic PDE solvers, interactive modeling, gaseous phenomena, advected textures

Alias wavefront, 1218 Third Ave, 8th Floor, Seattle, WA 98101

1 Introduction

One of the most intriguing problems in computer graphics is the simulation of fluid-like behavior. A good fluid solver is of great importance in many different areas. In the special effects industry there is a high demand to convincingly mimic the appearance and behavior of fluids such as smoke, water and fire. Paint programs can also benefit from fluid solvers to emulate traditional techniques such as watercolor and oil paint. Texture synthesis is another possible application. Indeed, many textures result from fluid-like processes, such as erosion. The modeling and simulation of fluids is, of course, also of prime importance in most scientific disciplines and in engineering. Fluid mechanics is used as the standard mathematical framework on which these simulations are based. There is a consensus among scientists that the Navier-Stokes equations are a very good model for fluid flow. Thousands of books and articles have been published in various areas on how to compute these equations numerically. Which solver to use in practice depends largely on the problem at hand and on the computing power available. Most engineering tasks require that the simulation provide accurate bounds on the physical quantities involved to answer questions related to safety, performance, etc. The visual appearance (shape) of the flow is of secondary importance in these applications. In computer graphics, on the other hand, the shape and the behavior of the fluid are of primary interest, while physical accuracy is secondary or in some cases irrelevant. Fluid solvers, for computer graphics, should ideally provide a user with a tool that enables her to achieve fluid-like effects in real-time. These factors are more important than strict physical accuracy, which would require too much computational power.

In fact, most previous models in computer graphics were driven by visual appearance and not by physical accuracy. Early flow models were built from simple primitives. Various combinations of these primitives allowed the animation of particle systems [15, 17] or simple geometries such as leaves [23]. The complexity of the flows was greatly improved with the introduction of random turbulences [16, 20]. These turbulences are mass conserving and, therefore, automatically exhibit rotational motion. Also the turbulence is periodic in space and time, which is ideal for motion "texture mapping" [19]. Flows built up from a superposition of flow primitives all have the disadvantage that they do not respond dynamically to user-applied external forces. Dynamical models of fluids based on the Navier-Stokes equations were first implemented in two-dimensions. Both Yaeger and Upson and Gamito et al. used a vortex method coupled with a Poisson solver to create two-dimensional animations of fluids [24, 8]. Later, Chen et al. animated water surfaces from the pressure term given by a two-dimensional simulation of the Navier-Stokes equations [2]. Their method unlike ours is both limited to two-dimensions and is unstable. Kass and Miller linearize the shallow water equations to simulate liquids [12]. The simplifications do not, however, capture the interesting rotational motions characteristic of fluids. More recently, Foster and Metaxas clearly show the advantages of using the full three-dimensional Navier-Stokes equations in creating fluid-like animations [7]. Many effects which are hard to key frame manually such as swirling motion and flows past objects are obtained automatically. Their algorithm is based mainly on the work of Harlow and Welch in computational fluid dynamics, which dates back to 1965 [11]. Since then many other techniques which Foster and Metaxas could have used have been developed. However, their model has the advantage of being simple to code, since it is based on a finite differencing of the Navier-Stokes equations and an explicit time solver. Similar solvers and their source code are also available from the book of Griebel et al. [9]. The main problem with explicit solvers is that the numerical scheme can become unstable for large time-steps. Instability leads to numerical simulations that "blow-up" and therefore have to be restarted with a smaller time-step. The instability of these explicit algorithms sets serious limits on speed and interactivity. Ideally, a user should be able to interact in real-time with a fluid solver without having to worry about possible "blow ups".

In this paper, for the first time, we propose a stable algorithm that solves the full Navier-Stokes equations. Our algorithm is very


the most popular ways is Fast Fluid Dynamics (FFD), which breaks the Navier-Stokes equations into several sub-equations and solves them one by one. The FFD scheme was originally proposed for computer visualization and computer games [12], [13], [14]. The FFD calculation uses the Helmholtz-Hodge decomposition, which states that any vector field $w$ can be uniquely decomposed into the form:

$w = u + \nabla q$ (5)

where $u$ has zero divergence and $q$ is a scalar field. Define an operator $P$ which projects any vector field $w$ onto its divergence-free part, $u = Pw$. Because $u$ has zero divergence, taking the divergence of Eq. 5 gives $\nabla \cdot w = \nabla^2 q$. Then $u$ can be written as

$u = w - \nabla q$ (6)

So

$\partial u / \partial t = P(-(u \cdot \nabla)u + \nu \nabla^2 u + f)$ (7)

FFD then solves this equation in the following four steps: add force, advect, diffuse and project, over the time step $\delta t$.

Let $u_0 = u(x, 0)$ represent the initial state. Start from the solution $w_0(x) = u(x, t)$ of the previous time step. The first step adds the external force $f$. Assuming the force does not vary considerably during the time step, we have:

$w_1(x) = w_0(x) + \delta t \, f(x, t)$ (8)

The next step accounts for the effect of advection (or convection) of the fluid on itself. A disturbance somewhere in the fluid propagates according to the expression $-(u \cdot \nabla)u$. This term makes the Navier-Stokes equations non-linear. FFD uses a simple treatment to make it linear. At each time step the velocity of the fluid itself moves all the fluid particles. Therefore, to obtain the velocity at a point $x$ at the new time $t + \delta t$, FFD traces the point $x$ backward through the velocity field $w_1$ over a time $\delta t$. This defines a path $p(x, s)$ corresponding to a partial streamline of the velocity field. The new velocity at the point $x$ is then set to the velocity that the particle, now at $x$, had at its previous location a time $\delta t$ ago:

$w_2(x) = w_1(p(x, -\delta t))$ (9)

The third step solves for the effect of viscosity and is equivalent to a diffusion equation:

$\partial w_2 / \partial t = \nu \nabla^2 w_2$ (10)

The equation can be solved using an implicit method:

$(I - \nu \, \delta t \, \nabla^2) w_3(x) = w_2(x)$ (11)
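The implicit step in Eq. 11 can be illustrated with a minimal sketch. It assumes a one-dimensional grid, zero velocity outside the domain, and a three-point discretization of the diffusion operator; none of these choices come from the text above, and the function name `implicit_diffuse_1d` and the tridiagonal (Thomas) solve are hypothetical, not the solver used in the cited FFD work.

```python
def implicit_diffuse_1d(w2, nu, dt, dx):
    """Solve (I - nu*dt*Laplacian) w3 = w2 on a 1D grid (assumed: w = 0 outside the grid)."""
    n = len(w2)
    r = nu * dt / (dx * dx)
    # Each row of the system reads: -r*w3[i-1] + (1 + 2r)*w3[i] - r*w3[i+1] = w2[i]
    diag = 1.0 + 2.0 * r
    c_prime = [0.0] * n  # modified upper-diagonal coefficients
    d_prime = [0.0] * n  # modified right-hand side
    c_prime[0] = -r / diag
    d_prime[0] = w2[0] / diag
    # Forward elimination (Thomas algorithm)
    for i in range(1, n):
        denom = diag + r * c_prime[i - 1]
        c_prime[i] = -r / denom
        d_prime[i] = (w2[i] + r * d_prime[i - 1]) / denom
    # Back substitution
    w3 = [0.0] * n
    w3[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        w3[i] = d_prime[i] - c_prime[i] * w3[i + 1]
    return w3
```

Under these assumptions the result stays bounded by the input even for a very large time step, which is the property that motivates the implicit form.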

The fourth step is the projection step, which makes the resulting field divergence free:

$\nabla^2 q = \nabla \cdot w_3, \quad w_4 = w_3 - \nabla q$ (12)

Some recent papers apply FFD to building simulation. In 2007, Qingyan Chen published a paper in the proceedings of the Building Simulation Conference 2007 describing the initial work to validate FFD for room airflow [15]. They followed this with a comprehensive study published in Indoor Air in 2009 [16]. The results showed that FFD is about 50 times faster than CFD. FFD could correctly predict laminar flow, such as the flow in a lid-driven cavity at Re = 100. But FFD has some problems in computing turbulent flows due to the lack of turbulence treatment. Although FFD can capture the major pattern of the flow, it cannot compute the flow profile as accurately as CFD does. They

Figure 1: One simulation step of our solver is composed of four steps. The first three steps may take the field out of the space of divergent free fields. The last projection step ensures that the field is divergent free after the entire simulation step.

Figure 2: To solve for the advection part, we trace each point of the field backward in time. The new velocity at $x$ is therefore the velocity that the particle had a time $\Delta t$ ago at the old location $p(x, -\Delta t)$.

2.2 Method of Solution

Eq. 5 is solved from an initial state $u_0 = u(x, 0)$ by marching through time with a time step $\Delta t$. Let us assume that the field has been resolved at a time $t$ and that we wish to compute the field at a later time $t + \Delta t$. We resolve Eq. 5 over the time span $\Delta t$ in four steps. We start from the solution $w_0(x) = u(x, t)$ of the previous time step and then sequentially resolve each term on the right hand side of Eq. 5, followed by a projection onto the divergent free fields. The general procedure is illustrated in Figure 1. The steps are:

$w_0(x) \to$ (add force) $\to w_1(x) \to$ (advect) $\to w_2(x) \to$ (diffuse) $\to w_3(x) \to$ (project) $\to w_4(x)$

The solution at time $t + \Delta t$ is then given by the last velocity field: $u(x, t + \Delta t) = w_4(x)$. A simulation is obtained by iterating these steps. We now explain how each step is computed in more detail.

The easiest term to solve is the addition of the external force $f$. If we assume that the force does not vary considerably during the time step, then

$w_1(x) = w_0(x) + \Delta t \, f(x, t)$

is a good approximation of the effect of the force on the field over the time step $\Delta t$. In an interactive system this is a good approximation, since forces are only applied at the beginning of each time step.

The next step accounts for the effect of advection (or convection) of the fluid on itself. A disturbance somewhere in the fluid propagates according to the expression $-(u \cdot \nabla)u$. This term makes the Navier-Stokes equations non-linear. Foster and Metaxas resolved this component using finite differencing. Their method is stable only when the time step is sufficiently small such that $\Delta t < \Delta\tau / |u|$, where $\Delta\tau$ is the spacing of their computational grid. Therefore, for small separations and/or large velocities, very small time steps have to be taken. On the other hand, we use a totally different approach which results in an unconditionally stable solver. No matter how big the time step is, our simulations will never "blow up". Our method is based on a technique to solve partial differential equations known as the method of characteristics. Since this method is of crucial importance in obtaining our stable solver, we provide all the mathematical details in Appendix A. The method, however, can be understood intuitively. At each time step all the fluid particles are moved by the velocity of the fluid itself. Therefore, to obtain the velocity at a point $x$ at the new time $t + \Delta t$, we backtrace the point $x$ through the velocity field $w_1$ over a time $\Delta t$. This defines a path $p(x, s)$ corresponding to a partial streamline of the velocity field. The new velocity at the point $x$ is then set to the velocity that the particle, now at $x$, had at its previous location a time $\Delta t$ ago:

$w_2(x) = w_1(p(x, -\Delta t))$

Figure 2 illustrates the above. This method has several advantages. Most importantly it is unconditionally stable. Indeed, from the above equation we observe that the maximum value of the new field is never larger than the largest value of the previous field. Secondly, the method is very easy to implement. All that is required in practice is a particle tracer and a linear interpolator (see next Section). This method is therefore both stable and simple to implement, two highly desirable properties of any computer graphics fluid solver. We employed a similar scheme to move densities through user-defined velocity fields [19]. Versions of the method of characteristics were also used by other researchers. The application was either employed in visualizing flow fields [13, 18] or improving the rendering of gas simulations [21, 5]. Our application of the technique is fundamentally different, since we use it to update the velocity field, which previous researchers did not dynamically animate.

The third step solves for the effect of viscosity and is equivalent to a diffusion equation:

$\partial w_2 / \partial t = \nu \nabla^2 w_2$

This is a standard equation for which many numerical procedures have been developed. The most straightforward way of solving this equation is to discretize the diffusion operator $\nabla^2$ and then to do an explicit time step as Foster and Metaxas did [7]. However, this method is unstable when the viscosity is large. We prefer, therefore, to use an implicit method:

$(I - \nu \, \Delta t \, \nabla^2) w_3(x) = w_2(x)$

where $I$ is the identity operator. When the diffusion operator is discretized, this leads to a sparse linear system for the unknown field $w_3$. Solving such a system can be done efficiently, however (see below).

The fourth step involves the projection step, which makes the resulting field divergence free. As pointed out in the previous subsection this involves the resolution of the Poisson problem defined by Eq. 4:

$\nabla^2 q = \nabla \cdot w_3, \quad w_4 = w_3 - \nabla q$

The projection step, therefore, requires a good Poisson solver. Foster and Metaxas solved a similar equation using a relaxation scheme. Relaxation schemes, though, have poor convergence and usually require many iterations. Foster and Metaxas reported that they obtained good results even after a very small number of relaxation steps. However, since we are using a different method to resolve for the advection step, we must use a more accurate method.



Chapter 38

Fast Fluid Dynamics Simulation on the GPU

Mark J. Harris
University of North Carolina at Chapel Hill

This chapter describes a method for fast, stable fluid simulation that runs entirely on the GPU. It introduces fluid dynamics and the associated mathematics, and it describes in detail the techniques to perform the simulation on the GPU. After reading this chapter, you should have a basic understanding of fluid dynamics and know how to simulate fluids using the GPU. The source code accompanying this book demonstrates the techniques described in this chapter.

38.1 Introduction

Fluids are everywhere: water passing between riverbanks, smoke curling from a glowing cigarette, steam rushing from a teapot, water vapor forming into clouds, and paint being mixed in a can. Underlying all of them is the flow of fluids. All are phenomena that we would like to portray realistically in interactive graphics applications. Figure 38-1 shows examples of fluids simulated using the source code provided with this book.

Fluid simulation is a useful building block that is the basis for simulating a variety of natural phenomena. Because of the large amount of parallelism in graphics hardware, the simulation we describe runs significantly faster on the GPU than on the CPU. Using an NVIDIA GeForce FX, we have achieved a speedup of up to six times over an equivalent CPU simulation.


apply Equation 16 at every grid cell, using the results of the previous iteration as input to the next ($x^{(k+1)}$ becomes $x^{(k)}$). Because Jacobi iteration converges slowly, we need to execute many iterations. Fortunately, Jacobi iterations are cheap to execute on the GPU, so we can run many iterations in a very short time.

Initial and Boundary Conditions

Any differential equation problem defined on a finite domain requires boundary conditions in order to be well posed. The boundary conditions determine how we compute values at the edges of the simulation domain. Also, to compute the evolution of the flow over time, we must know how it started; in other words, its initial conditions. For our fluid simulation, we assume the fluid initially has zero velocity and zero pressure everywhere. Boundary conditions require a bit more discussion.

During each time step, we solve equations for two quantities, velocity and pressure, and we need boundary conditions for both. Because our fluid is simulated on a rectangular grid, we assume that it is a fluid in a box and cannot flow through the sides of the box. For velocity, we use the no-slip condition, which specifies that velocity goes to zero at the boundaries. The correct solution of the Poisson-pressure equation requires pure Neumann boundary conditions: $\partial p / \partial n = 0$. This means that at a boundary, the rate of change of pressure in the direction normal to the boundary is zero. We revisit boundary conditions at the end of Section 38.3.

38.3 Implementation

Now that we understand the problem and the basics of solving it, we can move forward with the implementation. A good place to start is to lay out some pseudocode for the algorithm. The algorithm is the same every time step, so this pseudocode represents a single time step. The variables u and p hold the velocity and pressure field data.

// Apply the first 3 operators in Equation 12.
u = advect(u);
u = diffuse(u);
u = addForces(u);
// Now apply the projection operator to the result.
p = computePressure(u);
u = subtractPressureGradient(u, p);


In practice, temporary storage is needed, because most of these operations cannot be performed in place. For example, the advection step in the pseudocode is more accurately written as:

uTemp = advect(u);
swap(u, uTemp);

This pseudocode contains no implementation-specific details. In fact, the same pseudocode describes CPU and GPU implementations equally well. Our goal is to perform all the steps on the GPU. Computation of this sort on the GPU may be unfamiliar to some readers, so we will draw some analogies between operations in a typical CPU fluid simulation and their counterparts on the GPU.

38.3.1 CPU–GPU Analogies

Fundamental to any computer are its memory and processing models, so any application must consider data representation and computation. Let's touch on the differences between CPUs and GPUs with regard to both of these.

Textures = Arrays

Our simulation represents data on a two-dimensional grid. The natural representation for this grid on the CPU is an array. The analog of an array on the GPU is a texture. Although textures are not as flexible as arrays, their flexibility is improving as graphics hardware evolves. Textures on current GPUs support all the basic operations necessary to implement a fluid simulation. Because textures usually have three or four color channels, they provide a natural data structure for vector data types with two to four components. Alternatively, multiple scalar fields can be stored in a single texture. The most basic operation is an array (or memory) read, which is accomplished by using a texture lookup. Thus, the GPU analog of an array offset is a texture coordinate. We need at least two textures to represent the state of the fluid: one for velocity and one for pressure. In order to visualize the flow, we maintain an additional texture that contains a quantity carried by the fluid. We can think of this as "ink." Figure 38-4 shows examples of these textures, as well as an additional texture for vorticity, described in Section 38.5.1.

Loop Bodies = Fragment Programs

A CPU implementation of the simulation performs the steps in the algorithm by looping, using a pair of nested loops to iterate over each cell in the grid. At each cell, the same computation is performed. GPUs do not have the capability to perform this inner loop over each texel in a texture. However, the fragment pipeline is designed to perform identical computations at each fragment. To the programmer, it appears as if there is a processor for each fragment, and that all fragments are updated simultaneously. In the parlance of parallel programming, this model is known as single instruction, multiple data (SIMD) computation. Thus, the GPU analog of computation inside nested loops over an array is a fragment program applied in SIMD fashion to each fragment.

Feedback = Texture Update

In Section 38.2.4, we described how we use Jacobi iteration to solve Poisson equations. This type of iterative method uses the result of an iteration as input for the next iteration. This feedback is common in numerical methods. In a CPU implementation, one typically does not even consider feedback, because it is trivially implemented using variables and arrays that can be both read and written. On the GPU, though, the output of fragment processors is always written to the frame buffer. Think of the frame buffer as a two-dimensional array that cannot be directly read. There are two ways to get the contents of the frame buffer into a texture that can be read:

• Copy to texture (CTT) copies from the frame buffer to a texture.
• Render to texture (RTT) uses a texture as the frame buffer so the GPU can write directly to it.

CTT and RTT function equally well, but have a performance trade-off. For the sake of generality we do not assume the use of either and refer to the process of writing to a texture as a texture update.

Earlier we mentioned that, in practice, each of the five steps in the algorithm updates a temporary grid and then performs a swap. RTT requires the use of two textures to implement feedback, because the results of rendering to a texture while it is bound for

Figure 38-4. The State Fields of a Fluid Simulation, Stored in Textures
From left to right, the fields are "ink," velocity (scaled and biased into the range [0, 1], so zero velocity is gray), pressure (blue represents low pressure, red represents high pressure), and vorticity (yellow represents counter-clockwise rotation, blue represents clockwise rotation).



reciprocal of the grid scale δx. The texture wrap mode must be set to CLAMP_TO_EDGE so that back-tracing outside the range [0, N] will be clamped to the boundary texels. The boundary conditions described later correctly update these texels so that this situation operates correctly.

Listing 38-1. Advection Fragment Program

void advect(float2 coords : WPOS,      // grid coordinates
            out float4 xNew : COLOR,   // advected qty
            uniform float timestep,
            uniform float rdx,         // 1 / grid scale
            uniform samplerRECT u,     // input velocity
            uniform samplerRECT x)     // qty to advect
{
  // follow the velocity field "back in time"
  float2 pos = coords - timestep * rdx * f2texRECT(u, coords);

  // interpolate and write to the output fragment
  xNew = f4texRECTbilerp(x, pos);
}

Figure 38-5. Primitives Used to Update the Interior and Boundaries of the Grid
Updating a grid involves rendering a quad for the interior and lines for the boundaries. Separate fragment programs are applied to interior and border fragments.

In this code, the parameter u is the velocity field texture, and x is the field that is to be advected. This could be the velocity or another quantity, such as dye concentration. The function f4texRECTbilerp() is a utility to perform bilinear interpolation of the four texels closest to the texture coordinates passed to it. Because current GPUs do not support automatic bilinear interpolation in floating-point textures, we must implement it with this type of code.
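For readers more at home on the CPU, the back-trace and bilinear lookup of Listing 38-1 can be sketched in plain Python. This is an illustrative analog under stated assumptions, not the chapter's code: fields are nested lists indexed as `field[j][i]`, positions are integer cell indices rather than pixel centers, and the helper names `bilerp` and `advect` are hypothetical. Clamping inside `bilerp` plays the role of the CLAMP_TO_EDGE wrap mode.

```python
def bilerp(field, x, y):
    """Bilinearly interpolate field[j][i] at position (x, y), clamped to the grid."""
    ny, nx = len(field), len(field[0])
    x = min(max(x, 0.0), nx - 1.0)
    y = min(max(y, 0.0), ny - 1.0)
    i0, j0 = int(x), int(y)
    i1, j1 = min(i0 + 1, nx - 1), min(j0 + 1, ny - 1)
    fx, fy = x - i0, y - j0
    bottom = (1 - fx) * field[j0][i0] + fx * field[j0][i1]
    top = (1 - fx) * field[j1][i0] + fx * field[j1][i1]
    return (1 - fy) * bottom + fy * top


def advect(q, u, v, timestep, rdx):
    """Return q carried along the velocity field (u, v) for one time step."""
    ny, nx = len(q), len(q[0])
    q_new = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            # follow the velocity field "back in time"
            x_back = i - timestep * rdx * u[j][i]
            y_back = j - timestep * rdx * v[j][i]
            # interpolate the old field at the back-traced position
            q_new[j][i] = bilerp(q, x_back, y_back)
    return q_new
```

Because every new value is an interpolation of old values, the output can never exceed the largest input value, whatever the time step.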

Viscous Diffusion

With the description of the Jacobi iteration technique given in Section 38.2.4, writing a Jacobi iteration fragment program is simple, as shown in Listing 38-2.

Listing 38-2. The Jacobi Iteration Fragment Program Used to Solve Poisson Equations

void jacobi(half2 coords : WPOS,     // grid coordinates
            out half4 xNew : COLOR,  // result
            uniform half alpha,
            uniform half rBeta,      // reciprocal beta
            uniform samplerRECT x,   // x vector (Ax = b)
            uniform samplerRECT b)   // b vector (Ax = b)
{
  // left, right, bottom, and top x samples
  half4 xL = h4texRECT(x, coords - half2(1, 0));
  half4 xR = h4texRECT(x, coords + half2(1, 0));
  half4 xB = h4texRECT(x, coords - half2(0, 1));
  half4 xT = h4texRECT(x, coords + half2(0, 1));

  // b sample, from center
  half4 bC = h4texRECT(b, coords);

  // evaluate Jacobi iteration
  xNew = (xL + xR + xB + xT + alpha * bC) * rBeta;
}

Notice that the rBeta parameter is the reciprocal of β from Equation 16. To solve the diffusion equation, we set alpha to $(\delta x)^2 / (\nu \, \delta t)$, rBeta to $1 / (4 + (\delta x)^2 / (\nu \, \delta t))$, and the x and b parameters to the velocity texture. We then run a number of iterations (usually 20 to 50, but more can be used to reduce the error).
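The parameter choices above can be tried out with a small CPU sketch of the same iteration. It is a hedged illustration only: the function names `jacobi_step` and `diffuse`, the nested-list grid, and the decision to leave the one-cell border untouched are assumptions made for this example, not part of the chapter's implementation.

```python
def jacobi_step(x, b, alpha, r_beta):
    """One Jacobi sweep over interior cells, mirroring Listing 38-2."""
    ny, nx = len(x), len(x[0])
    x_new = [row[:] for row in x]  # border cells are copied unchanged
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            neighbors = x[j][i - 1] + x[j][i + 1] + x[j - 1][i] + x[j + 1][i]
            x_new[j][i] = (neighbors + alpha * b[j][i]) * r_beta
    return x_new


def diffuse(w, nu, dt, dx, iterations=40):
    """Approximate the implicit viscous diffusion of one scalar field w."""
    alpha = dx * dx / (nu * dt)
    r_beta = 1.0 / (4.0 + alpha)
    x = [row[:] for row in w]  # initial guess is the input field, as in the text
    for _ in range(iterations):
        # each sweep reads the previous result and writes a fresh grid
        x = jacobi_step(x, w, alpha, r_beta)
    return x
```

Returning a new grid from each sweep is the CPU counterpart of the temporary storage and swap discussed earlier.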

Force Application

The simplest step in our algorithm is computing the acceleration caused by external forces. In the demonstration application found in the accompanying materials, you can


Listing 38-3. The Divergence Fragment Program

void divergence(half2 coords : WPOS,    // grid coordinates
                out half4 div : COLOR,  // divergence
                uniform half halfrdx,   // 0.5 / gridscale
                uniform samplerRECT w)  // vector field
{
  half4 wL = h4texRECT(w, coords - half2(1, 0));
  half4 wR = h4texRECT(w, coords + half2(1, 0));
  half4 wB = h4texRECT(w, coords - half2(0, 1));
  half4 wT = h4texRECT(w, coords + half2(0, 1));

  div = halfrdx * ((wR.x - wL.x) + (wT.y - wB.y));
}

pressure field texture to the parameter p in the following program, which computes the gradient of p according to the definition in Table 38-1 and subtracts it from the intermediate velocity field texture in parameter w. See Listing 38-4.

Listing 38-4. The Gradient Subtraction Fragment Program

void gradient(half2 coords : WPOS,    // grid coordinates
              out half4 uNew : COLOR, // new velocity
              uniform half halfrdx,   // 0.5 / gridscale
              uniform samplerRECT p,  // pressure
              uniform samplerRECT w)  // velocity
{
  half pL = h1texRECT(p, coords - half2(1, 0));
  half pR = h1texRECT(p, coords + half2(1, 0));
  half pB = h1texRECT(p, coords - half2(0, 1));
  half pT = h1texRECT(p, coords + half2(0, 1));

  uNew = h4texRECT(w, coords);
  uNew.xy -= halfrdx * half2(pR - pL, pT - pB);
}
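Listings 38-3 and 38-4 are the two ends of the projection step; a pressure solve sits between them. The sketch below strings the three together on the CPU. It rests on assumptions that are not stated in the listings: the pressure equation is discretized with the usual five-point stencil, which gives a Jacobi update with alpha equal to minus the squared grid spacing and a reciprocal beta of 1/4; border cells are left untouched; and all function names are hypothetical.

```python
def divergence(u, v, half_rdx):
    """Central-difference divergence at interior cells (compare Listing 38-3)."""
    ny, nx = len(u), len(u[0])
    div = [[0.0] * nx for _ in range(ny)]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            div[j][i] = half_rdx * ((u[j][i + 1] - u[j][i - 1])
                                    + (v[j + 1][i] - v[j - 1][i]))
    return div


def pressure_jacobi_step(p, div, dx):
    """One Jacobi sweep for the Poisson equation laplacian(p) = div."""
    alpha, r_beta = -dx * dx, 0.25  # assumed five-point discretization
    ny, nx = len(p), len(p[0])
    p_new = [row[:] for row in p]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            neighbors = p[j][i - 1] + p[j][i + 1] + p[j - 1][i] + p[j + 1][i]
            p_new[j][i] = (neighbors + alpha * div[j][i]) * r_beta
    return p_new


def subtract_gradient(u, v, p, half_rdx):
    """Subtract the pressure gradient from the velocity (compare Listing 38-4)."""
    ny, nx = len(u), len(u[0])
    u_new = [row[:] for row in u]
    v_new = [row[:] for row in v]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            u_new[j][i] -= half_rdx * (p[j][i + 1] - p[j][i - 1])
            v_new[j][i] -= half_rdx * (p[j + 1][i] - p[j - 1][i])
    return u_new, v_new


def project(u, v, dx, iterations=40):
    """Divergence, pressure solve, gradient subtraction; borders left as they are."""
    half_rdx = 0.5 / dx
    div = divergence(u, v, half_rdx)
    p = [[0.0] * len(u[0]) for _ in range(len(u))]  # zero initial pressure guess
    for _ in range(iterations):
        p = pressure_jacobi_step(p, div, dx)
    return subtract_gradient(u, v, p, half_rdx)
```

A complete solver would also refresh the border cells of pressure and velocity between sweeps; that part is omitted here.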

Boundary Conditions

In Section 38.2.4, we determined that our "fluid in a box" requires no-slip (zero) velocity boundary conditions and pure Neumann pressure boundary conditions. In Section 38.3.2 we learned that we can implement boundary conditions by reserving the one-pixel perimeter of our grid for storing boundary values. We update these values by drawing line primitives over the border, using a fragment program that sets the values appropriately.



First we should look at how our grid discretization affects the computation of boundary conditions. The no-slip condition dictates that velocity equals zero on the boundaries, and the pure Neumann pressure condition requires the normal pressure derivative to be zero at the boundaries. The boundary is defined to lie on the edge between the boundary cell and its nearest interior cell, but grid values are defined at cell centers. Therefore, we must compute boundary values such that the average of the two cells adjacent to any edge satisfies the boundary condition.

For the velocity boundary on the left side, for example, we have:

where N is the grid resolution. In order to satisfy this equation, we must set $u_{0,j}$ equal to $-u_{1,j}$. The pressure equation works out similarly. Using the forward difference approximation of the derivative, we get:

On solving this equation for $p_{0,j}$, we see that we need to set each pressure boundary value to the value just inside the boundary.
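The two rules just derived (negate the neighboring interior value for velocity, copy it for pressure) differ only by a sign, so one routine with a scale factor covers both. The Python sketch below is an illustration under assumptions: the name `apply_boundary`, the nested-list grid, and leaving the four corner cells unchanged are choices made for this example.

```python
def apply_boundary(field, scale):
    """Set each border cell to scale times its adjacent interior cell.

    scale = -1.0 gives the no-slip velocity condition (the average across the edge is zero);
    scale = +1.0 gives the pure Neumann pressure condition (the difference across the edge is zero).
    Corner cells are left unchanged in this sketch.
    """
    ny, nx = len(field), len(field[0])
    out = [row[:] for row in field]
    for j in range(1, ny - 1):
        out[j][0] = scale * field[j][1]             # left edge
        out[j][nx - 1] = scale * field[j][nx - 2]   # right edge
    for i in range(1, nx - 1):
        out[0][i] = scale * field[1][i]             # bottom edge
        out[ny - 1][i] = scale * field[ny - 2][i]   # top edge
    return out
```

For example, `apply_boundary(u, -1.0)` would be applied to a velocity component and `apply_boundary(p, 1.0)` to the pressure field.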

We can use a simple fragment program for both the pressure and the velocity boundaries, as shown in Listing 38-5.

Listing 38-5. The Boundary Condition Fragment Program

void boundary(half2 coords : WPOS,    // grid coordinates
              half2 offset : TEX1,    // boundary offset
              out half4 bv : COLOR,   // output value
              uniform half scale,     // scale parameter
              uniform samplerRECT x)  // state field
{
  bv = scale * h4texRECT(x, coords + offset);
}

Figure 38-6 demonstrates how this program works. The x parameter represents the texture (velocity or pressure field) from which we read interior values. The offset parameter contains the correct offset to the interior cells adjacent to the current boundary. The coords parameter contains the position in texture coordinates of the fragment being processed, so adding offset to it addresses a neighboring texel. At each

$\frac{u_{0,j} + u_{1,j}}{2} = 0, \quad \text{for } j \in [0, N]$ (17)

$\frac{p_{1,j} - p_{0,j}}{\delta x} = 0$ (18)



38.5.1 Vorticity Confinement

The motion of smoke, air and other low-viscosity fluids typically contains rotational flows at a variety of scales. This rotational flow is vorticity. As Fedkiw et al. explained, numerical dissipation caused by simulation on a coarse grid damps out these interesting features (Fedkiw et al. 2001). Therefore, they used vorticity confinement to restore these fine-scale motions. Vorticity confinement works by first computing the vorticity, $\omega = \nabla \times u$. From the vorticity we compute a normalized vorticity vector field:

$\Psi = \frac{\eta}{|\eta|}$

Here, $\eta = \nabla |\omega|$. The vectors in this vector field point from areas of lower vorticity to areas of higher vorticity. From these vectors we compute a force that can be used to restore an approximation of the dissipated vorticity:


Figure 38-7. Cloud Simulation
A sequence of frames (20 iterations apart) from a two-dimensional cloud simulation running on a GPU.

Fast and informative flow simulations in a building by using fast fluid dynamicsmodel on graphics processing unit

Wangda Zuo, Qingyan Chen*

National Air Transportation Center of Excellence for Research in the Intermodal Transport Environment (RITE), School of Mechanical Engineering, Purdue University, 585 Purdue Mall,West Lafayette, IN 47907-2088, USA

a r t i c l e i n f o

Article history:Received 14 April 2009Received in revised form17 August 2009Accepted 19 August 2009

Keywords:Graphics Processing Unit (GPU)Airflow simulationFast Fluid Dynamics (FFD)Parallel computingCentral Processing Unit (CPU)

a b s t r a c t

Fast indoor airflow simulations are necessary for building emergency management, preliminary design ofsustainable buildings, and real-time indoor environment control. The simulation should also be infor-mative since the airflow motion, temperature distribution, and contaminant concentration are impor-tant. Unfortunately, none of the current indoor airflow simulation techniques can satisfy bothrequirements at the same time. Our previous study proposed a Fast Fluid Dynamics (FFD) model forindoor flow simulation. The FFD is an intermediate method between the Computational Fluid Dynamics(CFD) and multizone/zonal models. It can efficiently solve Navier–Stokes equations and other trans-portation equations for energy and species at a speed of 50 times faster than the CFD. However, thisspeed is still not fast enough to do real-time simulation for a whole building. This paper reports ourefforts on further accelerating FFD simulation by running it in parallel on a Graphics Processing Unit(GPU). This study validated the FFD on the GPU by simulating the flow in a lid-driven cavity, channelflow, forced convective flow, and natural convective flow. The results show that the FFD on the GPU canproduce reasonable results for those indoor flows. In addition, the FFD on the GPU is 10–30 times fasterthan that on a Central Processing Unit (CPU). As a whole, the FFD on a GPU can be 500–1500 times fasterthan the CFD on a CPU. By applying the FFD to the GPU, it is possible to do real-time informative airflowsimulation for a small building.

© 2009 Elsevier Ltd. All rights reserved.

1. Introduction

According to the United States Fire Administration [1], 3430 civilians and 118 firefighters lost their lives in fires in 2007, with an additional 17,675 civilians injured. Smoke inhalation is responsible for most fire-related injuries and deaths in buildings. Computer simulations can predict the transport of poisonous air/gas in buildings. If the prediction is in real time or faster than real time, firefighters can follow appropriate rescue plans to minimize casualties. In addition, to design sustainable buildings that can provide a comfortable and healthy indoor environment with less energy consumption, it is essential to know the distributions of air velocity, air temperature, and contaminant concentration in buildings. Flow simulations in buildings can provide this information [2]. Again, the predictions should be rapid due to the limited time available during the design process. Furthermore, one can optimize building HVAC control systems if the indoor environment can be simulated in real time or faster than real time.

However, none of the current flow simulation techniques for buildings can satisfy the requirements for obtaining results quickly and informatively. For example, CFD is an important tool in studying flow and contaminant transport in buildings [3]. But when the simulated flow domain is large or the flow is complex, the CFD simulation requires a large number of computing meshes. Consequently, it needs a very long computing time if it uses only a single-processor computer [4].

A typical approach to reduce the computing time for indoor airflow simulations is to reduce the order of flow simulation models. Zonal models [5] divide a room into several zones and assume that the air property in a zone is uniform. Based on this assumption, zonal models only compute a few nodes for a room to greatly reduce related computing demands. Multizone models [6] expand the uniform assumption to the whole room so that the number of computing nodes can be further reduced. These approaches are widely used for air simulations in a whole building. However, the zonal and multizone models solve only the mass continuity, energy, and species concentration equations but not the momentum equations. They are fast but not accurate enough since they can only provide the bulk information of each zone without the details about the airflow and contaminant transport inside the zone [6].

* Corresponding author. Tel.: +1 765 496 7562; fax: +1 765 494 0539. E-mail addresses: [email protected] (W. Zuo), [email protected] (Q. Chen).

Building and Environment 45 (2010) 747–757

journal homepage: www.elsevier.com/locate/buildenv

0360-1323/$ – see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.buildenv.2009.08.008

Recently, an FFD method [7] has been proposed for fast flow simulations in buildings as an intermediate method between the CFD and zonal/multizone models. The FFD method solves the continuity equation and unsteady Navier–Stokes equations as the CFD does. By using a different numerical scheme to solve the governing equations, the FFD can run about 50 times faster than the CFD with the same numerical setting on a single CPU [8]. Although the FFD is not as accurate as the CFD, it can provide more detailed information than a multizone model or a zonal model.

Although the FFD is much faster than the CFD, its speed is still not fast enough for real-time flow simulation in a building. For example, our previous work [8] found that the FFD simulation can run in real time with 65,000 grids. If a simulation domain with 30 × 30 × 30 grids is applied for a room, the FFD code can only simulate the airflow in 2–3 rooms in real time. Hence, if we want to do real-time simulation for a large building, we have to further accelerate the FFD simulation.
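The room count quoted above follows from dividing the real-time grid budget by the grids per room. The short Python sketch below only reproduces that arithmetic; the function name `rooms_in_real_time` is hypothetical and not part of the FFD code.

```python
def rooms_in_real_time(realtime_grids: int, grids_per_room: int) -> float:
    """Number of rooms that fit in the real-time grid budget."""
    return realtime_grids / grids_per_room


grids_per_room = 30 * 30 * 30                 # one room meshed with 30 x 30 x 30 grids
rooms = rooms_in_real_time(65_000, grids_per_room)
print(f"{grids_per_room} grids per room, about {rooms:.1f} rooms in real time")
```

The result lies between two and three rooms, which matches the "2–3 rooms" stated in the text.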

To reduce the computing time, many researchers have performed the flow simulations in parallel on multi-processor computers [9,10]. It is also possible to speed up the FFD simulation by running it in parallel on a multi-processor computer. However, this approach needs large investments in equipment purchase and installation and a designated space for installing the computers and the related capacity of the cooling system used in the space. In addition, the fees for the operation and maintenance of a multi-processor computer are also nearly the same as those of several single-processor computers of the same capacity. Hence, multi-processor computers are a luxury for building designers or emergency management teams.

Recently, the GPU has attracted attention for parallel computing. Different from a CPU, the GPU is the core of a computer graphics card and integrates multiple processors on a single chip. Its structure is highly parallelized to achieve high performance for image processing. For example, an NVIDIA GeForce 8800 GTX GPU, available since 2006, integrates 128 processors so that its peak computing speed is 367 GFLOPS (giga floating-point operations per second). Comparatively, the peak performance of an Intel Core 2 Duo 3.0 GHz CPU available at the same time is only about 32 GFLOPS [11]. Fig. 1 compares the computing speeds of CPU and GPU. The speed gap between the CPU and the GPU has been increasing since 2003. Furthermore, this trend is likely to continue in the future. Besides the GPU's high performance, the cost of a GPU is low. For example, a graphics card with an NVIDIA GeForce 8800 GTX GPU costs only around $500. It can easily be installed in a personal computer and there are no other additional costs.

Thus, it seems possible to realize fast and informative indoor airflow simulations by using the FFD on a GPU. This paper reports our efforts to implement the FFD model in parallel on an NVIDIA GeForce 8800 GTX GPU. The GPU code was then validated by simulating several flows that consist of the basic features of indoor airflows.

2. Fast fluid dynamics

Our investigation used the FFD scheme proposed by Stam [7]. The FFD applies a splitting method to solve the continuity equation (1) and Navier–Stokes equation (2) for an unsteady incompressible flow:

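$$\frac{\partial U_i}{\partial x_i} = 0, \tag{1}$$

$$\frac{\partial U_i}{\partial t} = -U_j \frac{\partial U_i}{\partial x_j} + \nu \frac{\partial^2 U_i}{\partial x_j^2} - \frac{1}{\rho} \frac{\partial P}{\partial x_i} + \frac{f_i}{\rho}, \tag{2}$$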

where $U_i$ and $U_j$ are fluid velocity components in the $x_i$ and $x_j$ directions, respectively; $\nu$ is kinematic viscosity; $\rho$ is fluid density; $P$ is pressure; $t$ is time; and $f_i$ are body forces, such as buoyancy force and other external forces.

Fig. 1. Comparison of the computing speeds of GPU (NVIDIA) and CPU (INTEL) since 2003 [11]. [Plot: GFLOPS (0 to 400) against year (2003 to 2007); series: GPU, CPU.]

Nomenclature

a_{i,j}, b_{i,j}: equation coefficients (dimensionless)
C: contaminant concentration (kg/m³)
f_i: body force (kg/m² s²)
H: the width of the room (m)
i, j: mesh node indices
k_C: contaminant diffusivity (m²/s)
k_T: thermal diffusivity (m²/s)
L: length scale (m)
P: pressure (kg/m s²)
S_C: contaminant source (kg/m³ s)
S_T: heat source (°C/s)
T: temperature (°C)
t: time (s)
u_{i,j}: velocity components at mesh node (i, j) (m/s)
U_b: bulk velocity (m/s)
U_i, U_j: velocity components in x_i and x_j directions, respectively (m/s)
U: horizontal velocity or velocity scale (m/s)
V: vertical velocity (m/s)
x_i, x_j: spatial coordinates
x, y: spatial coordinates
Δt: time step (s)
ν: kinematic viscosity (m²/s)
superscript 0: previous time step

The FFD splits the Navier–Stokes equation (2) into three simple equations (3)–(5). Then it solves them one by one.

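$$\frac{\partial U_i}{\partial t} = -U_j \frac{\partial U_i}{\partial x_j}, \tag{3}$$

$$\frac{\partial U_i}{\partial t} = \nu \frac{\partial^2 U_i}{\partial x_j^2} + \frac{f_i}{\rho}, \tag{4}$$

$$\frac{\partial U_i}{\partial t} = -\frac{1}{\rho} \frac{\partial P}{\partial x_i}. \tag{5}$$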

Equation (3) can be reformatted as

$$\frac{\partial U_i}{\partial t} + U_j \frac{\partial U_i}{\partial x_j} = \frac{D U_i}{D t} = 0, \tag{6}$$

where $DU_i/Dt$ is the material derivative. This means that if we follow a flow particle, the flow properties on this particle, such as the velocities $U_i$, will not change with time. Therefore, one can get the value of $U_i$ at a node by tracing the particle back to where it was at the previous time step and taking the value there. The current study used a first-order semi-Lagrangian approach [12] to calculate the value of $U_i$.
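The back-tracing idea can be illustrated with a minimal one-dimensional Python sketch. It is not the authors' implementation: the periodic domain, the linear interpolation, and the names `advect_semi_lagrangian`, `q`, `shifted`, and `smeared` are assumptions made for illustration. For each node, the departure point of the particle arriving there is located, and the transported quantity is interpolated from the previous time level.

```python
import math


def advect_semi_lagrangian(q, u, dx, dt):
    """One first-order semi-Lagrangian step for Dq/Dt = 0 on a periodic 1-D grid.

    q: values at the previous time step; u: advecting velocity at each node.
    """
    n = len(q)
    q_new = [0.0] * n
    for i in range(n):
        # Trace the particle that arrives at node i back over one time step.
        x_dep = i - u[i] * dt / dx          # departure point in grid units
        i0 = math.floor(x_dep)              # node to the left of the departure point
        w = x_dep - i0                      # linear interpolation weight
        q_new[i] = (1.0 - w) * q[i0 % n] + w * q[(i0 + 1) % n]
    return q_new


# A unit pulse carried by a uniform velocity of 1 m/s on a grid with dx = 1 m.
q = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
u = [1.0] * len(q)
shifted = advect_semi_lagrangian(q, u, dx=1.0, dt=1.0)   # moves exactly one cell
smeared = advect_semi_lagrangian(q, u, dx=1.0, dt=2.5)   # time step beyond CFL = 1
print(shifted)
print(smeared)
```

Because each new value is an interpolation between old values, the result stays bounded by the old extremes even when the time step exceeds the usual CFL limit, at the cost of the numerical smearing visible in the second call.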

Equation (4) is a typical unsteady diffusion equation. One can easily solve it by using an iterative scheme such as Gauss–Seidel iteration or Jacobi iteration. This work has applied the Jacobi iteration since it can solve the equation in parallel.
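The following Python sketch shows why Jacobi iteration parallelizes well: every node update reads only values from the previous sweep, so all nodes can be updated independently. It is a one-dimensional illustration under stated assumptions, not the paper's code: an implicit (backward Euler) time discretization is assumed, the body-force term $f_i/\rho$ is omitted, the end values are held fixed, and the name `diffuse_jacobi` is hypothetical.

```python
def diffuse_jacobi(u0, nu, dt, dx, iterations=200):
    """Solve (u - u0)/dt = nu * d2u/dx2 implicitly with Jacobi sweeps (1-D).

    The first and last entries are treated as fixed boundary values.
    """
    r = nu * dt / dx**2
    u = list(u0)
    for _ in range(iterations):
        u_next = list(u)                       # boundaries keep their values
        for i in range(1, len(u) - 1):
            # Each update uses only the previous sweep, so the loop body is
            # independent across i and could run one node per thread.
            u_next[i] = (u0[i] + r * (u[i - 1] + u[i + 1])) / (1.0 + 2.0 * r)
        u = u_next
    return u


u0 = [0.0, 0.0, 1.0, 0.0, 0.0]
u_new = diffuse_jacobi(u0, nu=1.0, dt=1.0, dx=1.0)
print([round(x, 4) for x in u_new])
```

Gauss–Seidel would reuse freshly updated neighbors within a sweep and typically converges in fewer sweeps, but that data dependence is what makes it harder to run in parallel.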

Finally, the FFD ensures mass conservation by solving equations (1) and (5) together with a pressure-correction projection method [13]. The idea of the projection method is that the pressure should be adjusted so that the velocities satisfy mass conservation. Assuming $U_i^0$ is the velocity obtained from equation (4), equation (5) can be expanded to

$$\frac{U_i - U_i^0}{\Delta t} = -\frac{1}{\rho} \frac{\partial P}{\partial x_i}, \tag{7}$$

where $\Delta t$ is the time step size and $U_i$ is the unknown velocity, which satisfies the continuity equation (1):

$$\frac{\partial U_i}{\partial x_i} = 0. \tag{8}$$

Substituting equation (7) into (8), one can get

$$\frac{\partial U_i^0}{\partial x_i} = \frac{\Delta t}{\rho} \frac{\partial^2 P}{\partial x_i^2}. \tag{9}$$

Solving equation (9), one can obtain $P$. Substituting $P$ into equation (7), $U_i$ will be known.
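The projection step can be sketched in Python on a small two-dimensional grid. This is an illustration under stated assumptions, not the paper's solver: a staggered grid in a closed box is assumed (normal velocity zero on all walls), the pressure equation is relaxed with serial Gauss–Seidel sweeps for brevity, and the names `divergence`, `project`, and `max_abs` are hypothetical. Taking the divergence of equation (7) and imposing equation (8) gives a Poisson equation with the divergence of $U_i^0$, scaled by $\rho/\Delta t$, as its source; the pressure gradient then corrects the velocity.

```python
def divergence(u, v, h):
    """Cell-centred divergence on a staggered grid (u on x-faces, v on y-faces)."""
    nx, ny = len(v), len(u[0])
    return [[(u[i + 1][j] - u[i][j] + v[i][j + 1] - v[i][j]) / h
             for j in range(ny)] for i in range(nx)]


def project(u, v, h, dt, rho, sweeps=500):
    """Pressure-correction projection: make (u, v) divergence free in place."""
    nx, ny = len(v), len(u[0])
    div = divergence(u, v, h)
    p = [[0.0] * ny for _ in range(nx)]
    for _ in range(sweeps):                    # relax laplacian(P) = (rho/dt) div(U0)
        for i in range(nx):
            for j in range(ny):
                # Cells outside the box are skipped (zero normal pressure gradient).
                nbrs = [p[a][b]
                        for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                        if 0 <= a < nx and 0 <= b < ny]
                p[i][j] = (sum(nbrs) - h * h * rho / dt * div[i][j]) / len(nbrs)
    for i in range(1, nx):                     # U = U0 - (dt/rho) dP/dx on interior faces
        for j in range(ny):
            u[i][j] -= dt / rho * (p[i][j] - p[i - 1][j]) / h
    for i in range(nx):
        for j in range(1, ny):
            v[i][j] -= dt / rho * (p[i][j] - p[i][j - 1]) / h
    return p


def max_abs(field):
    return max(abs(x) for row in field for x in row)


nx = ny = 6
h, dt, rho = 1.0 / nx, 0.1, 1.2
# An arbitrary intermediate velocity U0 that vanishes on the walls but is not divergence free.
u = [[0.1 * (j + 1) * i * (nx - i) for j in range(ny)] for i in range(nx + 1)]
v = [[0.0] * (ny + 1) for _ in range(nx)]
before = max_abs(divergence(u, v, h))
project(u, v, h, dt, rho)
after = max_abs(divergence(u, v, h))
print(f"max |div U| before: {before:.3f}, after: {after:.2e}")
```

On a staggered grid the discrete divergence of the discrete gradient is exactly the five-point Laplacian, so the corrected field is divergence free up to the tolerance of the pressure solve.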

The energy equation can be written as:

$$\frac{\partial T}{\partial t} = -U_j \frac{\partial T}{\partial x_j} + k_T \frac{\partial^2 T}{\partial x_j^2} + S_T, \tag{10}$$

where $T$ is temperature, $k_T$ is thermal diffusivity, and $S_T$ is the heat source. The FFD solves equation (10) in a similar way to equation (2), except that no pressure-correction projection for mass conservation is needed.

Fig. 2. The schematic of parallel computing on CUDA. [Diagram: the host (CPU) launches Grids 1, 2, 3, 4, ... on the device (GPU); each grid is drawn as a 3 × 3 array of blocks, Block(0,0) to Block(2,2), and each block as a 3 × 3 array of threads, Thread(0,0) to Thread(2,2).]

Similarly, the FFD determines the concentrations of species with the following transport equation:

$$\frac{\partial C}{\partial t} = -U_j \frac{\partial C}{\partial x_j} + k_C \frac{\partial^2 C}{\partial x_j^2} + S_C, \tag{11}$$

where $C$ is the species concentration, $k_C$ is the diffusivity, and $S_C$ is the source.

The FFD scheme was originally proposed for computer visualization and computer games [7,14,15]. In our previous work [8,16], the authors have studied the performance of the FFD scheme for indoor environment by computing different indoor airflows. The results showed that the FFD is about 50 times faster than the CFD. The FFD could correctly predict the laminar flow, such as a laminar flow in a lid-driven cavity at Re = 100 [16]. But the FFD has some

Fig. 3. The schematic for implementing the FFD on the GPU. [Flowchart boxes, CPU side: Read Parameters; Allocate CPU Memory; Initialize CPU Variables; Send Data to GPU; Receive Data from GPU; Write Data File; Free CPU Memory; End. GPU side: Allocate GPU Memory; Initialize GPU Variables; Receive Data from CPU; FFD Solver; Finish? (Yes/No); Send Data to CPU; Free GPU Memory. Arrows are marked as either "Command" or "Command and Data".]

Fig. 4. Allocation of mesh nodes to GPU blocks. [Diagram: a 3 × 3 array of blocks, Block (0,0) to Block (2,2), with the mesh shown as the shaded part.]


simultaneously hold up to 12,288 threads. Because CUDA does not allow one block to spread into two SMs, the allocation of the blocks is crucial to employ the full capacity of a GPU. For example, if a block has 512 threads, then only one block can be assigned to one SM and the remaining 256 threads in that SM are unused. If a block contains 256 threads, then 3 blocks can share all the 768 threads of an SM so that the SM can be fully used. Theoretically, the 8800 GTX GPU can reach its peak performance when all 12,288 threads are running at the same time. Practically, the peak performance also depends on many other factors, such as the time for reading or writing data with the memory.
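The block-size arithmetic above can be checked with a few lines of Python. The sketch models only the per-SM thread limit quoted in the text; real GPUs impose further limits (registers, shared memory, blocks per SM) that are ignored here, and the function name `sm_usage` is hypothetical.

```python
THREADS_PER_SM = 768      # thread limit of one streaming multiprocessor (from the text)
TOTAL_THREADS = 12_288    # thread limit of the whole GPU (from the text)


def sm_usage(threads_per_block, threads_per_sm=THREADS_PER_SM):
    """Return (blocks resident on one SM, threads used, threads left unused)."""
    blocks = threads_per_sm // threads_per_block   # a block cannot straddle two SMs
    used = blocks * threads_per_block
    return blocks, used, threads_per_sm - used


num_sms = TOTAL_THREADS // THREADS_PER_SM          # SM count implied by the two limits
for size in (512, 256):
    blocks, used, unused = sm_usage(size)
    print(f"{size} threads/block: {blocks} block(s) per SM, {unused} threads unused")
print(f"implied number of SMs: {num_sms}")
```

With 512 threads per block one third of each SM sits idle, whereas 256 threads per block fills the SM exactly, which is the configuration the implementation section adopts.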

4. Implementation

The FFD was implemented on the GPU by using CUDA version 1.1 [11]. Fig. 3 shows the program structure. The implementation used the CPU to read, initialize, and write the data. The FFD parallel solver, which is the core of the program, runs on the GPU.

Our program assigned one thread to each mesh node. The implementation further defined a block as a two-dimensional matrix that contained 16 × 16 = 256 threads. By this means, an SM used three blocks to utilize all of its 768 threads. For simplicity, the current implementation adopted only one grid for all the blocks. As a result, the number of threads in each dimension of the grid was a multiple of 16. However, the number of mesh nodes in each dimension may not always be a multiple of 16. For instance, the mesh (shaded part) in Fig. 4 would not fit into four blocks (0,0; 0,1; 1,0; and 1,1). Thus, it is necessary to use nine blocks for the mesh. Consequently, some threads in the five additional blocks (0,2; 1,2; 2,0; 2,1; and 2,2) could be idle since they did not have mesh nodes. Although this strategy is not optimal, it is the easiest to implement.

The FFD parallel solver on the GPU is the core of our program. The solver consists of different functions for the split equations (3)–(5) in the governing equations. However, the implementations of the various functions are similar in principle. Fig. 5 demonstrates the schematic employed in solving the diffusion equation (4) for the velocity component $u_{i,j}$. Before the iteration starts, our program defines the dimensions of grids and blocks for the parallel computing. In each iteration, the program first solves $u_{i,j}$ at the interior nodes in parallel, then $u_{i,j}$ at the boundary nodes.

In the parallel job, it is important to map the thread indices (threadID.x, threadID.y) in a block onto the coordinates of the mesh nodes (i, j). The "Locate Thread (i, j)" step in Fig. 5 applied the following formulas:

$$i = \text{blockDim.x} \times \text{blockID.x} + \text{threadID.x}, \tag{12}$$

$$j = \text{blockDim.y} \times \text{blockID.y} + \text{threadID.y}, \tag{13}$$

where blockID.x and blockID.y are the indices of the block which contains this thread. The blockDim.x and blockDim.y are the block dimensions in the x and y directions, respectively. Both of them are 16 in our program.
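The index mapping of equations (12) and (13), together with the padding to whole 16 × 16 blocks described earlier, can be mimicked on the host in Python. This is a sketch for illustration only: the 40 × 40 mesh is a made-up example rather than a case from the paper, and the helper names `grid_dims`, `node_index`, and `idle_threads` are hypothetical.

```python
BLOCK = 16  # threads per block in each direction, as in the text


def grid_dims(nx, ny, block=BLOCK):
    """Blocks needed in x and y so that every mesh node gets a thread (ceiling division)."""
    return (-(-nx // block), -(-ny // block))


def node_index(block_id, thread_id, block_dim=(BLOCK, BLOCK)):
    """Equations (12) and (13): map (block, thread) indices to mesh node (i, j)."""
    i = block_dim[0] * block_id[0] + thread_id[0]
    j = block_dim[1] * block_id[1] + thread_id[1]
    return i, j


def idle_threads(nx, ny, block=BLOCK):
    """Threads launched that have no mesh node to work on."""
    gx, gy = grid_dims(nx, ny, block)
    return gx * block * gy * block - nx * ny


print(grid_dims(40, 40))             # 3 x 3 blocks for a hypothetical 40 x 40 mesh
print(node_index((2, 1), (5, 7)))    # thread (5, 7) of block (2, 1)
print(idle_threads(40, 40))          # padding threads that must be skipped
```

In a kernel, each thread would compute (i, j) this way and return immediately when the node lies outside the mesh, which is how the idle threads in the extra blocks are handled in the simplest strategy.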

Fig. 6. Coordinates for the computing meshes. [Stencil: node (i, j) with neighbors (i+1, j), (i–1, j), (i, j+1), and (i, j–1).]

Fig. 7. Schematic of the flow in a square lid-driven cavity. [The lid moves at U = 1 m/s; U = V = 0 on the other three walls.]

Fig. 8. a. Comparison of the calculated horizontal velocity profile (Re = 100) at x = 0.5 m with Ghia's data [21]. b. Comparison of the calculated vertical velocity profile (Re = 100) at y = 0.5 m with Ghia's data [21]. [Plots: (a) y (m) against U (m/s); (b) V (m/s) against x (m); series: GPU, Ghia.]




Fig. 9. a. Comparison of the calculated horizontal velocity profile (Re = 10,000) at x = 0.5 m with Ghia's data [21]. b. Comparison of the calculated vertical velocity profile (Re = 10,000) at y = 0.5 m with Ghia's data [21]. [Plots: y (m) against U (m/s) and V (m/s) against x (m); series: FFD on GPU, Ghia.]

Fig. 11. Schematic of the fully developed flow in a plane channel. [Inlet velocity labeled U_in.]

a multi-processor supercomputer. For more information on parallel computing, one can refer to books [18–20].

5. Results and discussion

To evaluate the FFD on the GPU for indoor airflow simulation, this study compared the results of the FFD on the GPU with reference data. In addition, it measured the speed of the simulations.

5.1. Evaluation of the results

The evaluation was performed by using the FFD on the GPU to calculate four airflows relevant to the indoor environment. The four flows were the flow in a lid-driven cavity, the fully developed flow in a plane channel, the forced convective flow in an empty room, and the natural convective flow in a tall cavity. The simulation results are compared with the data from the literature.

5.1.1. Flow in a square cavity driven by a lid

Air recirculated in a room is like the flow in a lid-driven cavity (Fig. 7). This flow is also a classical case for numerical validation [21]. This investigation studied both laminar and turbulent flows. Based on the lid velocity of U = 1 m/s, cavity length of L = 1 m, and kinematic viscosity of the fluid, the Reynolds number of the laminar flow was 100 and that of the turbulent one was 10,000. A mesh with 65 × 65 grid points was enough for a laminar flow with Re = 100. Since the FFD model had no turbulence model, it required a dense mesh for the highly turbulent flow if an accurate result was desired. Thus, this study applied a fine mesh with 513 × 513 grid points for the flow at Re = 10,000. The reference data were the high-quality CFD results obtained by Ghia et al. [21].
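The text gives the lid velocity, the cavity length, and the two Reynolds numbers but not the viscosities. Since Re = UL/ν, the kinematic viscosity implied by each case follows directly; the Python lines below only carry out that division, and the function name `kinematic_viscosity` is hypothetical.

```python
def kinematic_viscosity(U, L, Re):
    """Kinematic viscosity (m^2/s) implied by Re = U * L / nu."""
    return U * L / Re


nu_laminar = kinematic_viscosity(U=1.0, L=1.0, Re=100.0)
nu_turbulent = kinematic_viscosity(U=1.0, L=1.0, Re=10_000.0)
print(f"Re = 100:    nu = {nu_laminar:g} m^2/s")
print(f"Re = 10,000: nu = {nu_turbulent:g} m^2/s")
```

Both values are properties of the test fluid chosen to reach the target Reynolds numbers in a 1 m cavity; they are not the viscosity of room air.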

Fig. 8 compares the computed velocity profiles of the laminar flow (Re = 100) at the vertical (Fig. 8a) and horizontal (Fig. 8b) mid-sections with the reference data. The predictions by the FFD on the GPU are the same as Ghia's data for laminar flow. These results show that the FFD model works well for laminar flow.

The flow at Re = 10,000 is highly turbulent. Although the current FFD model has no turbulence treatment, it could still provide very accurate results by using a dense mesh (513 × 513). As shown in Fig. 9, the FFD on the GPU was able to accurately calculate the velocities at both the vertical and horizontal mid-sections of the cavity. The predicted velocity profiles agree with the reference data. Fig. 10 compares the streamlines calculated by the FFD with the reference ones [21]. The predicted vortices (Fig. 10a) are similar to the reference ones (Fig. 10b). The FFD on the GPU successfully computed not only the primary recirculation in the center of the cavity, but also the secondary vortices in the upper-left, lower-left, and lower-right corners. There was one anti-clockwise vortex in the upper-left corner, and one anti-clockwise vortex plus one smaller clockwise vortex in each of the lower-left and lower-right corners. Although this is a simple case, it shows that the GPU can be used for numerical computing just as the CPU can.

5.1.2. Flow in a fully developed plane channel

The flow in a long corridor can be simplified as a fully developed flow in a plane channel (Fig. 11). The Reynolds number of the flow studied was 2800, based on the mean bulk velocity $U_b$ and the half channel height, H. A mesh with 65 × 33 grid points was adopted by the FFD simulations. The Direct Numerical Simulation (DNS) data from Mansour et al. [22] were selected as a reference. Fig. 12 compares the velocity profiles predicted by the FFD on both the CPU and the GPU with the DNS data. Unlike the turbulent profile given by the DNS data, the FFD on the GPU gave a more laminar-like profile. As discussed by the authors [8], this laminar profile was caused by a lack of turbulence treatment in the current FFD model. Nevertheless, the GPU worked properly and the result of the FFD on the GPU was the same as that on the CPU for this case.

5.1.3. Flow in an empty room with forced convection

A forced convection flow in an empty room represents flows in mechanically ventilated rooms (Fig. 13). The study was based on the experiment by Nielsen [23]. His experimental data showed that the flow in the room can be simplified as two-dimensional. The height of the tested room, H, is 3 m and the width is 3H. The inlet was in the upper-left corner with a height of 0.056H. The outlet height was 0.16H and located in the lower-right corner. The Reynolds number was 5000, based on the inlet height and inlet velocity, which can lead to turbulent flow in a room. This study employed a mesh of 37 × 37 grid points.

Fig. 14 compares the predicted horizontal velocity profiles at the centers of the room (x = H and 2H) and at the near-wall regions (y = 0.028H and 0.972H) with the experimental data. As expected,

Fig. 13. Schematic of a forced convective flow in an empty room. [Room of height H and length L = 3H; inlet of height h_in with U_in = 0.455 m/s; outlet of height h_out.]

Fig. 12. Comparison of the mean velocity profile in a fully developed channel flow predicted by the FFD on a CPU and a GPU with the DNS data [22]. [Plot: U/U_b against y/H; series: GPU, CPU, DNS.]


Proceedings: Building Simulation 2007

Figure 7 The velocity field predicted by different numerical methods: (a) FFD; (b) standard k-ε model (Chen 1995); (c) LES (Su et al. 2001).

The N strongly depends on number of grids and time step size. A coarse grid size and large time steps can accelerate the simulation but accordingly degrade the accuracy. Therefore, one has to find a trade-off between the computational performance and accuracy. For the three cases, the FFD simulations were faster than the real time on a Dell Inspiron laptop with an Intel Core 2 CPU T200 at 2.00 GHz. Table 1 lists the performance of the FFD simulations. Although this CPU is dual core, the FFD simulations used only one processor.

Table 1 Performance of FFD

CASE                 GRIDS        Δt (s)    N
Lid-driven cavity    20 × 20      0.1       44.5
Plane channel        32 × 8       0.05      30.3
Ventilated room      300 × 125    0.5       2.4

CONCLUSION

The Fast Fluid Dynamics (FFD) method based on the semi-Lagrangian method was validated for three different flows: flow in a lid-driven cavity, flow in a plane channel, and flow in a ventilated room. The accuracy of the FFD method has been evaluated by comparing the predicted results with the experimental and reference CFD data. The FFD method can predict the flow with acceptable accuracy at a speed faster than real time.

ACKNOWLEDGMENT

This project was funded by U.S. Federal Aviation Administration (FAA) Office of Aerospace Medicine through the Air Transportation Center of Excellence for Airliner Cabin Environment Research under Cooperative Agreement 04-C-ACE-PU. Although the FAA has sponsored this project, it neither endorses nor rejects the findings of this research. The presentation of this information is in the interest of invoking technical community comment on the results and conclusions of research.

REFERENCES

Bozeman JD. and Dalton C. 1973. "Numerical Study of Viscous Flow in a Cavity," Journal of Computational Physics, 12(3): 348-363.

Chen Q. 1995. "Comparison of Different K-Epsilon Models for Indoor Air-Flow Computations," Numerical Heat Transfer Part B-Fundamentals, 28(3): 353-369.

Erturk E, Corke TC, and Gokcol C. 2005. "Numerical solutions of 2-D steady incompressible driven cavity flow at high Reynolds numbers," International Journal for Numerical Methods in Fluids, 48(7): 747-774.

Ghia U, Ghia KN, and Shin CT. 1982. "High-Re Solutions for Incompressible Flow Using the Navier-Stokes Equations and a Multigrid Method," Journal of Computational Physics, 48(3): 387-411.

Kim J, Moin P, and Moser R. 1987. "Turbulence Statistics in Fully-Developed Channel Flow at Low Reynolds-Number," Journal of Fluid Mechanics, 177: 133-166.

Restivo A. 1979. "Turbulent flow in ventilated room," Ph.D. Thesis, University of London (U.K.).

Robert A, Turnbull C, and Henderson J. 1972. "Implicit Time Integration Scheme for Baroclinic Models of Atmosphere," Monthly Weather Review, 100(5): 329-335.

Su M, Chen Q, and Chiang C. 2001. "Comparison of different subgrid-scale models of large eddy simulation for indoor airflow modeling," Journal of Fluids Engineering-Transactions of the ASME, 123(3): 628-639.

Wang L. 2007. "Coupling of Multizone and CFD Programs for Building Airflow and Contaminant Transport Simulations," Ph.D. Thesis, Purdue University.

the FFD on the GPU could capture the major characteristics of the flow velocities (Fig. 14a and 14b). But the differences between the prediction and the experimental data are large in the near-wall region (Fig. 14c and 14d) since we only applied a simple no-slip wall boundary condition. An advanced wall function may improve the results, but it will make the code more complex and require more computing time.

5.1.4. Flow in a natural convective tall cavity

The flows in the previous three cases were isothermal. The FFD on the GPU was further validated by using a non-isothermal flow, such as a natural convection flow inside a dual window. This case was based on the experiment by Betts and Bokhari [24]. They measured the natural convection flow in a tall cavity 0.076 m wide and 2.18 m high (Fig. 15). The cavity was deep enough so that the flow pattern was two-dimensional. The left wall was cooled at 15.1 °C and the right wall heated at 34.7 °C. The top and bottom walls were insulated. The corresponding Rayleigh number was 0.86 × 10^6. A coarse mesh of 11 × 21 was applied. Fig. 16 compares the predicted velocity and temperature with the experimental data at three different lines across the cavity. The results show that the FFD on the GPU gave reasonable velocity and temperature profiles.

Again, the results obtained by the FFD on the GPU differ from the experimental data, but they are the same as those of the FFD on the CPU. The results lead to a similar conclusion as in the previous cases.

The above four cases show that the FFD code on the GPU produced accurate results for lid-driven cavity flow and reasonable results for other airflows. Due to the limitation of the FFD model, predictions by the FFD on the GPU may differ from the reference data.

5.2. Comparison of the simulation speed

To compare the FFD simulation speed on the GPU with that on the CPU, this study measured their computing time for the lid-driven cavity flow. In addition, this study also measured the computing time of the CFD on a CPU. The commercial CFD software FLUENT was used in the measurement. The simulations were carried out on an HP workstation with an Intel Xeon™ CPU and an NVIDIA GeForce 8800 GTX GPU. The timing data were for 100 time steps with different mesh sizes.

Fig. 17 illustrates that for both CFD and FFD, the CPU computing time increased linearly with the mesh size. The CFD on the CPU was

Fig. 14. a. Comparison of the horizontal velocity at x = H in forced convection predicted by the FFD on a GPU with experimental data [23]. b. Comparison of the horizontal velocity at x = 2H in forced convection predicted by the FFD on a GPU with experimental data [23]. c. Comparison of the horizontal velocity at y = 0.028H in forced convection predicted by the FFD on a GPU with experimental data [23]. d. Comparison of the horizontal velocity at y = 0.972H in forced convection predicted by the FFD on a GPU with experimental data [23]. [Plots: (a, b) y/H against U/U_in; (c, d) U/U_in against x/H; series: GPU, Experiment.]


5.3. Discussion

This study implemented the FFD solver for flow simulation on the GPU. Since the FFD solves the same governing equations as the CFD, it is also possible to implement the CFD solver on the GPU by using a similar strategy. One can also expect that the speed of CFD simulations on the GPU should be faster than that on the CPU. For CFD codes written in the C language, the implementation will be relatively easy since only the parallel computing part needs to be rewritten in CUDA.

The current GPU computing speed can be further increased by optimizing the code implementation. The dimensions of GPU blocks can be made flexible to adapt to the mesh. Meanwhile, many classical optimization techniques for parallel computing are also good for GPU computing. For example, reading or writing data from GPU memory is time consuming, so the processors are often idle while waiting for data transmission. One approach is to reuse the data already on the GPU by calculating several neighboring mesh nodes with one thread.

In addition, the computing time can be further reduced by using multiple GPUs. For example, an NVIDIA Tesla 4-GPU computer has 960 processors and 16 GB system memory [25]. Its peak performance can be as high as 4 teraFLOPS, which is about 10 times faster than the GPU used in this study. Thus, the computing time of a problem with large meshes can be greatly reduced by using multiple GPUs.

6. Conclusions

This paper introduced an approach to conduct fast and informative indoor airflow simulation by using the FFD on the GPU. An FFD code has been implemented in parallel on a GPU for indoor airflow simulation. By applying the code to flow in a lid-driven cavity, a channel flow, a forced convective flow, and a natural convective flow, this investigation showed that the FFD on a GPU could predict indoor airflow motion and air temperature. The prediction was the same as the data in the literature for lid-driven cavity flow. The FFD on the GPU can also capture major flow characteristics for other cases, including fully developed channel flow, forced convective flow and natural convective flow. But some differences exist due to the limitations of the FFD model, such as the lack of a turbulence model and the simple no-slip wall treatment.

In addition, a flow simulation with the FFD on the GPU was 30 times faster than that on the CPU when the mesh size was a multiple of 256. If the mesh size was not an exact multiple of 256, the simulation was still 10 times faster than that on the CPU. As a whole, the FFD on a GPU can be 500–1500 times faster than the CFD on a CPU.
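The speedup figures above can be cross-checked with a little arithmetic. The Python sketch below assumes that the lower bounds pair up (10 with 500) and the upper bounds pair up (30 with 1500), which gives the implied speedup of the FFD on a CPU over the CFD on a CPU. It also shows one hypothetical way to reach a mesh size that is a multiple of 256, by rounding the node count up. The text only reports the observation, so the padding approach and both function names are assumptions.

```python
def implied_ffd_cpu_vs_cfd_cpu(gpu_vs_cpu, ffd_gpu_vs_cfd_cpu):
    """Speedup of FFD-on-CPU over CFD-on-CPU implied by the two reported ratios."""
    return ffd_gpu_vs_cfd_cpu / gpu_vs_cpu

def pad_to_multiple(n_nodes, multiple=256):
    """Round n_nodes up to the next multiple (ceiling division, then scale back)."""
    return -(-n_nodes // multiple) * multiple

low = implied_ffd_cpu_vs_cfd_cpu(10, 500)     # 50.0
high = implied_ffd_cpu_vs_cfd_cpu(30, 1500)   # 50.0
padded = pad_to_multiple(1000)                # 1024
```

Under this pairing both ends of the range imply the same factor of about 50 between the FFD and the CFD on a CPU. Padded nodes would have to be excluded from the physical solution.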

Acknowledgements

This study was funded by the US Federal Aviation Administration (FAA) Office of Aerospace Medicine through the National Air Transportation Center of Excellence for Research in the Intermodal Transport Environment under Cooperative Agreement 07-CRITE-PU and co-funded by the Computing Research Institute at Purdue University. Although the FAA has sponsored this project, it neither endorses nor rejects the findings of this research. The presentation of this information is in the interest of invoking technical community comment on the results and conclusions of the research.

References

[1] United States Fire Administration. Fire statistics, http://www.usfa.dhs.gov/statistics/national/index.shtm; 2008.

[2] Chen Q. Design of natural ventilation with CFD. In: Glicksman LR, Lin J, editors. Sustainable urban housing in China. Springer; 2006. p. 116–23 [chapter 7].

[3] Nielsen PV. Computational fluid dynamics and room air movement. Indoor Air 2004;14:134–43.

[4] Lin C, Horstman R, Ahlers M, Sedgwick L, Dunn K, Wirogo S. Numerical simulation of airflow and airborne pathogen transport in aircraft cabins – part 1: numerical simulation of the flow field. ASHRAE Transactions 2005:111.

[5] Megri AC, Haghighat F. Zonal modeling for simulating indoor environment of buildings: review, recent developments, and applications. HVAC&R Research 2007;13(6):887–905.

[6] Chen Q. Ventilation performance prediction for buildings: a method overview and recent applications. Building and Environment 2009;44(4):848–58.

[7] Stam J. Stable fluids. In: Proceedings of 26th international conference on computer graphics and interactive techniques, SIGGRAPH’99, Los Angeles; 1999.

[8] Zuo W, Chen Q. Real-time or faster-than-real-time simulation of airflow in buildings. Indoor Air 2009;19(1):33–44.

[9] Mazumdar S, Chen Q. Influence of cabin conditions on placement and response of contaminant detection sensors in a commercial aircraft. Journal of Environmental Monitoring 2008;10(1):71–81.

[10] Hasama T, Kato S, Ooka R. Analysis of wind-induced inflow and outflow through a single opening using LES & DES. Journal of Wind Engineering and Industrial Aerodynamics 2008;96(10–11):1678–91.

[11] Nvidia. Nvidia CUDA compute unified device architecture – programming guide (version 1.1). Santa Clara, California: NVIDIA Corporation; 2007.

[12] Courant R, Isaacson E, Rees M. On the solution of nonlinear hyperbolic differential equations by finite differences. Communication on Pure and Applied Mathematics 1952;5:243–55.

[13] Chorin AJ. A numerical method for solving incompressible viscous flow problems. Journal of Computational Physics 1967;2(1):12–26.

[14] Harris MJ. Real-time cloud simulation and rendering. Ph.D. thesis, University of North Carolina at Chapel Hill; 2003.

[15] Song O-Y, Shin H, Ko H-S. Stable but nondissipative water. ACM Transactions on Graphics 2005;24(1):81–97.

[16] Zuo W, Chen Q. Validation of fast fluid dynamics for room airflow. In: Proceedings of the 10th international IBPSA conference, Building Simulation 2007, Beijing, China; 2007.

[17] Rixner S. Stream processor architecture. Boston & London: Kluwer Academic Publishers; 2002.

[18] Roosta SH. Parallel processing and parallel algorithms: theory and computation. New York: Springer; 1999.

[19] Bertsekas DP, Tsitsiklis JN. Parallel and distributed computation: numerical methods. Belmont, Massachusetts: Athena Scientific; 1989.

[20] Lewis TG, El-Rewini H, Kim I-K. Introduction to parallel computing. Englewood Cliffs, New Jersey: Prentice Hall; 1992.

[21] Ghia U, Ghia KN, Shin CT. High-Re solutions for incompressible flow using the Navier–Stokes equations and a multigrid method. Journal of Computational Physics 1982;48(3):387–411.

[22] Mansour NN, Kim J, Moin P. Reynolds-stress and dissipation-rate budgets in a turbulent channel flow. Journal of Fluid Mechanics 1988;194:15–44.

[23] Nielsen PV. Specification of a two-dimensional test case. Aalborg, Denmark: Aalborg University; 1990.

[24] Betts PL, Bokhari IH. Experiments on turbulent natural convection in an enclosed tall cavity. International Journal of Heat and Fluid Flow 2000;21(6):675–83.

[25] NVIDIA, http://www.nvidia.com/object/tesla_computing_solutions.html; 2009.

Fig. 17. Comparison of the computing time used by the FFD on a GPU, the FFD on a CPU, and the CFD on a CPU. [Plot with logarithmic axes. x-axis: Number of Grids, 1.0E+02 to 1.0E+07; y-axis: Computing Time, 1.0E-02 to 1.0E+05; series: FFD on GPU, FFD on CPU, CFD on CPU.]


3 Conclusions

• FFD on CPU is faster than CFD (Fluent) on CPU

• FFD on GPU is faster than FFD on CPU

• FFD is not accurate

Several Problems

• How can CFD be improved? Can we make CFD run on a GPU?

• Why is FFD not accurate? Is it possible to add turbulence models to FFD to make it accurate?

Qingyan Chen’s paper discussed FFD on a GPU.

No paper on CFD with the k-epsilon model running on a GPU has been published yet.

The End ^_^

References

• Sir Isaac Newton, Philosophiæ Naturalis Principia Mathematica, 5 July 1687

• David C. Wilcox, Turbulence Modeling for CFD, July 1993

• Adrián J. Lew, et al., A note on the numerical treatment of the k-epsilon turbulence model, International Journal of Computational Fluid Dynamics

• Rajesh Bhaskaran, Lance Collins, Introduction to CFD Basics

• ANSYS, Fluent 6.0 User's Guide, December 2001

• Fluent, Commercial CFD Package, http://www.ansys.com/

• Shiyi Chen and Gary D. Doolen, Lattice Boltzmann Method for Fluid Flows, 1998

• Palabos, Open source LBM package, http://www.lbmethod.org/

• Blender, Open source 3D modeling package, http://www.blender.org/

• Takahiro Harada, et al., Real Time Particle-Based Simulation on GPUs

• isph, Open source SPH implementation, http://isph.sourceforge.net/

• Jeff Borggaard, et al., Control, Estimation and Optimization of Energy Efficient

• Stam J., Stable Fluids, Proceedings of 26th international conference on computer graphics and interactive techniques, SIGGRAPH'99

• Mark J. Harris, Fast Fluid Dynamics Simulation on the GPU

• Wangda Zuo, Qingyan Chen, Fast and Informative Flow Simulations in a Building by Using Fast Fluid Dynamics Model on Graphics Processing Unit, 2009, Building and Environment

• OpenFOAM, Open source CFD toolkit, http://www.openfoam.com/

• Khronos Group, OpenCL 1.1 Specification, revision 36, September 30, 2010
