BDAM: Big Data Asset Management

Mark Harrison - Mike Sundy {mh,msundy}@pixar.com

Description

Lots of Content? Check. Big assets? Oh yeah. See how Pixar makes movies using Perforce. Find out how to scale your version control system once your data goes from gigabytes to terabytes.

Transcript of BDAM: Big Data Asset Management

Page 1: BDAM: Big Data Asset Management

BDAM: Big Data Asset Management

Mark Harrison - Mike Sundy
{mh,msundy}@pixar.com

Page 2: BDAM: Big Data Asset Management

No Recording

Page 3: BDAM: Big Data Asset Management

What is Asset Management?

• Long-Lived Data
  – 50 year charter

• Large Data
  – Many TB

• Tight Data/Metadata Integration
  – Shot lists, assignments, rights management

• Scalable Data Services
  – Human, Render Farm, Build Farm Scale

Page 4: BDAM: Big Data Asset Management

Long-Lived Data

• How the Templar Project was Started
• Things Change
  – Vendors
  – Software
  – File formats
  – Hardware, OS, Storage
• Your Own Requirements Change
  – How flexible, “hackable” can you be?

Page 5: BDAM: Big Data Asset Management

Large Data

• Expanding Expectations
• Harrison’s Law of 1 Terabyte
• Harrison’s Time Scale of Data
• Harrison’s Law of Mentioning Harrison
• Basic Drivers:
  – Storage: cheaper
  – Expectations: higher
  – Time: stays constant

Page 6: BDAM: Big Data Asset Management

Tight Data/Metadata Integration

• Over Time, you lose information about files
• Important Information:
  – Assignments, shot lists, rights clearances
• Don’t let data disappear into a proprietary hole

Page 7: BDAM: Big Data Asset Management

Scalable Data Services

• Picture of a single server
• Applications need to scale appropriately
• Avoid the bottleneck of a single server (if possible)
• Infrastructure should handle data bandwidth
• Note: bottlenecks will always move, but will always exist

Page 8: BDAM: Big Data Asset Management

Templar

• Pixar’s Proprietary Asset Management System
• Handles all studio data and metadata
  – feature films, shorts, special projects
  – artwork, scripts, movie frames, simulation data, project management data
• 50-Year Timeframe
  – All metadata and data can be accessed and used through 2053

Page 9: BDAM: Big Data Asset Management

Templar Asset Management

• Long-Lived Data
  – 50 year charter

• Large Data
  – Many TB

• Tight Data/Metadata Integration
  – Shot lists, assignments, rights management

• Scalable Data Services
  – Human, Render Farm, Build Farm Scale

Page 10: BDAM: Big Data Asset Management

Templar: Long-Lived Data

• Federated Architecture
  – Loosely Coupled
  – Software hooks into pipeline
• Pieces can be upgraded incrementally
  – Software, file formats
• Exit Strategy Orientation
  – Standards, access to internals

Page 11: BDAM: Big Data Asset Management

Templar: Large Data

• Large, Fast Storage
  – File system caching, etc.
• Scalable Storage Software
  – proprietary system for non-revisioned files
  – Perforce
• Both horizontal and vertical scalability

Page 12: BDAM: Big Data Asset Management

Templar: Data/Metadata Integration

• “Federated” System
  – No monolithic application that “does everything”
• Instead, “best in class” programs that interoperate
  – modeling, rendering, storage, etc.
• Applications lightly coupled to metadata
• Metadata in a relational DB, e.g. Oracle
• Expandable metadata schema (a schema sketch follows below)
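
Templar's actual schema is not shown in the talk; the following is a minimal sketch, assuming a simple entity/attribute layout in SQLite, where new metadata keys can be attached per asset without a schema migration. All table and column names here are hypothetical.

    import sqlite3

    # Minimal sketch of an "expandable" metadata schema (hypothetical, not
    # Templar's): core asset fields live in one table, while arbitrary new
    # metadata keys can be attached per asset without an ALTER TABLE.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE asset (
        asset_id   INTEGER PRIMARY KEY,
        depot_path TEXT UNIQUE NOT NULL,   -- e.g. a Perforce depot path
        show       TEXT NOT NULL           -- e.g. 'toystory3'
    );
    CREATE TABLE asset_metadata (
        asset_id INTEGER REFERENCES asset(asset_id),
        key      TEXT NOT NULL,            -- e.g. 'shot', 'assignee', 'rights'
        value    TEXT NOT NULL,
        PRIMARY KEY (asset_id, key)
    );
    """)

    conn.execute("INSERT INTO asset VALUES (1, '//tech/show/buzz/model.mb', 'show')")
    conn.executemany("INSERT INTO asset_metadata VALUES (1, ?, ?)",
                     [("shot", "sq010_sh004"), ("assignee", "msundy"),
                      ("rights", "internal")])

    # A brand-new metadata key later on needs no schema change.
    conn.execute("INSERT INTO asset_metadata VALUES (1, 'render_priority', '7')")
    print(conn.execute("SELECT key, value FROM asset_metadata "
                       "WHERE asset_id = 1").fetchall())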

Page 13: BDAM: Big Data Asset Management

Templar: Scalable Data Services

• Multiple Access Methods for Assets (see the fallback sketch below)
  – File system, HTTP, direct Perforce
• Load balancer, multiple servers (e.g. HTTP)
• File system optimizations (clusters, caching)
• Perforce: use LINKATRON
• Asynchronous queuing
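
As a rough illustration of the "multiple access methods" idea, here is a sketch of a client-side fallback chain: try a file-system mirror, then an HTTP asset server, then a direct Perforce read. The cache root, URL, and layout are invented for the example; only `p4 print -q` is a real Perforce command.

    import os
    import subprocess
    import urllib.request

    def fetch_asset(depot_path,
                    fs_root="/assets/cache",                 # hypothetical mirror
                    http_base="http://assets.example.com"):  # hypothetical server
        """Fallback chain sketch: file system, then HTTP, then direct Perforce."""
        # 1. File system: assume the depot path is mirrored under a cache root.
        local = os.path.join(fs_root, depot_path.lstrip("/"))
        if os.path.exists(local):
            with open(local, "rb") as f:
                return f.read()

        # 2. HTTP: assume an asset server exposes the same relative path.
        try:
            with urllib.request.urlopen(
                    http_base + "/" + depot_path.lstrip("/"), timeout=5) as resp:
                return resp.read()
        except OSError:
            pass

        # 3. Direct Perforce: 'p4 print -q' writes the file contents to stdout.
        return subprocess.run(["p4", "print", "-q", depot_path],
                              check=True, capture_output=True).stdout

    # data = fetch_asset("//tech/show/buzz/texture.tif")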

Page 14: BDAM: Big Data Asset Management

Perforce

• In use at Pixar since 2000, initially for code only
• File revision history goes back to 1983
• First Perforce-managed film: Toy Story 3

Page 15: BDAM: Big Data Asset Management

Perforce: Long Lived Data

• Matches “exit strategy” requirements
  – All data and metadata extractable, hackable
  – “,d” magic – direct flat-file storage access on the back end (see the archive-lookup sketch below)
• Types of Data – not just code!
  – art – reference and concept art; inspirational art for the film
  – tech – show-specific data, e.g. models, textures, pipeline
  – studio – company-wide reference libraries, e.g. animation reference, configuration files, a Flickr-like company photo site
  – tools – code for our central tools team, software projects
  – dept – department-specific files, e.g. marketing images
  – exotics – patent data, casting audio, data for live-action shorts, story gags, theme park concepts, intern art show
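
The “,d” remark refers to Perforce keeping archive content as plain RCS-style files (file,v) and directories (file,d) under the server root, which is what makes the exit strategy credible. As a hedged sketch, the snippet below asks the server where an archive lives using `p4 fstat -Oc` (an admin-level option that reports the librarian file); the parsing and the example path are illustrative only.

    import subprocess

    def archive_location(depot_path):
        """Sketch: ask Perforce where the back-end archive for a file lives.
        'p4 fstat -Oc' reports the librarian (archive) file and type; it
        requires admin access, and this is illustration, not production code."""
        out = subprocess.run(["p4", "-ztag", "fstat", "-Oc", depot_path],
                             check=True, capture_output=True, text=True).stdout
        info = {}
        for line in out.splitlines():
            # -ztag lines look like: "... lbrFile //depot/path/file"
            parts = line.split(" ", 2)
            if len(parts) == 3 and parts[0] == "...":
                info[parts[1]] = parts[2]
        # lbrFile/lbrType point at the flat ,v / ,d storage on the server.
        return info.get("lbrFile"), info.get("lbrType")

    # print(archive_location("//tech/show/buzz/model.mb"))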

Page 16: BDAM: Big Data Asset Management

Perforce: Large Data

• Vertical Scalability
  – 900 GB single file
  – 6.5 TB checkin
  – 47 TB largest single depot
  – 160 TB total Perforce storage across all depots
• Leverage Perforce features to reduce data:
  – Used the +S auto-purge filetype to save 40% of storage on Toy Story 3 (1.2 TB)
  – Wrote a script to de-duplicate files using Perforce’s checksum data; saved 1 million files and 1 TB (see the sketch below)
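
The de-duplication script itself isn't shown in the talk. A minimal sketch of the general approach, assuming that matching `digest` plus `fileSize` from `p4 fstat -Ol` means duplicate content, might look like this (what to do with the duplicates afterward is left out on purpose):

    import subprocess
    from collections import defaultdict

    def find_duplicates(depot_pattern="//depot/..."):
        """Group head revisions by (digest, fileSize) using Perforce's own
        checksums. A sketch only; the real Pixar script is not public."""
        out = subprocess.run(["p4", "-ztag", "fstat", "-Ol", depot_pattern],
                             check=True, capture_output=True, text=True).stdout

        groups = defaultdict(list)
        record = {}
        for line in out.splitlines() + [""]:     # trailing "" flushes last record
            parts = line.split(" ", 2)
            if parts[0] == "..." and len(parts) >= 2:
                record[parts[1]] = parts[2] if len(parts) == 3 else ""
            elif not line.strip() and record:    # blank line ends a -ztag record
                if "digest" in record and "fileSize" in record:
                    groups[(record["digest"], record["fileSize"])].append(
                        record.get("depotFile", "?"))
                record = {}

        return {key: paths for key, paths in groups.items() if len(paths) > 1}

    # for (digest, size), paths in find_duplicates("//tech/...").items():
    #     print(size, "bytes duplicated across", len(paths), "files:", paths)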

Page 17: BDAM: Big Data Asset Management

Perforce: Data/Metadata Integration

• How does it integrate with Templar?
  – stores the files
  – version control
  – the “authority” for source writes
  – triggers for synchronous operations, e.g. LINKATRON (a trigger sketch follows below)
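
The talk names triggers but doesn't show one. Below is a sketch of what a change-commit trigger and its script could look like; the trigger name, script path, and the Templar endpoint URL are placeholders, while `p4 triggers`, the `%change%` variable, and `p4 describe -s` are standard Perforce pieces.

    #!/usr/bin/env python
    # Sketch of a change-commit trigger that tells an asset-management service
    # about new submits. A matching entry in the 'p4 triggers' table might be
    # (name, script path, and endpoint are placeholders):
    #
    #   notify_templar change-commit //... "python /p4/triggers/notify_templar.py %change%"
    #
    import sys
    import subprocess
    import urllib.request

    def main(change):
        # 'p4 describe -s' lists the change description and affected files
        # without the file diffs.
        describe = subprocess.run(["p4", "describe", "-s", change],
                                  check=True, capture_output=True, text=True).stdout

        # Hypothetical metadata endpoint; Templar's real interface is not public.
        req = urllib.request.Request("http://templar.example.com/api/changes",
                                     data=describe.encode("utf-8"),
                                     headers={"Content-Type": "text/plain"})
        urllib.request.urlopen(req, timeout=5)
        return 0

    if __name__ == "__main__":
        sys.exit(main(sys.argv[1]))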

Page 18: BDAM: Big Data Asset Management

Perforce: Scalable Data Services

• Horizontal Scalability
  – 190+ depots
  – 58 VMware servers
  – 26 million submitted changelists
• Server architecture
  – Scale out
    • Performance on one depot won’t affect another
    • Easier administration/downtime scheduling
  – Virtualization
    • 95% of physical-hardware performance with greater flexibility
    • 15 minutes to build a new server
• Automated p4 server setup (squire)
  – 8 seconds to run the script that creates a new p4 instance (see the setup sketch below)
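
Squire itself is only mentioned by name and runtime. As a sketch of the bare steps such a script has to automate, the following stands up a fresh p4d instance with its own root, port, journal, and log, then sets a couple of configurables; the directory layout and settings are invented for the example.

    import os
    import subprocess

    def create_p4_instance(name, port, base="/p4"):
        """Sketch of automated p4 server setup in the spirit of 'squire'.
        Layout and configurable values here are placeholders."""
        root = os.path.join(base, name, "root")
        logs = os.path.join(base, name, "logs")
        os.makedirs(root, exist_ok=True)
        os.makedirs(logs, exist_ok=True)

        # Start a daemonized p4d on its own port with its own journal and log.
        subprocess.run(["p4d", "-r", root, "-p", str(port), "-d",
                        "-J", os.path.join(logs, "journal"),
                        "-L", os.path.join(logs, "p4d.log")], check=True)

        # Per-instance tuning via configurables (values are illustrative).
        p4 = ["p4", "-p", "localhost:%d" % port]
        subprocess.run(p4 + ["configure", "set", "monitor=1"], check=True)
        subprocess.run(p4 + ["configure", "set",
                             "journalPrefix=" + os.path.join(logs, "checkpoint")],
                       check=True)

    # create_p4_instance("newshow", 1667)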

Page 19: BDAM: Big Data Asset Management

Conclusion

• Templar and Perforce met our four requirements:
  – Long-Lived Data
    • 50-year charter
    • confidence in retrieving data due to access to internals
  – Large Data
    • hundreds of TB
    • 500 TB depot on the horizon
  – Tight Data/Metadata Integration
    • rock-solid file management
    • users trust it
  – Scalable Data Services
    • 190 depots
    • hundreds more to come – we keep finding new uses