Blackbird: Accelerated Course Archives Using Condor with Blackboard Sam Hoover, IT Systems Architect...
-
Upload
terence-osborne -
Category
Documents
-
view
219 -
download
2
Transcript of Blackbird: Accelerated Course Archives Using Condor with Blackboard Sam Hoover, IT Systems Architect...
Blackbird: Accelerated Course Archives Using Condor with Blackboard
Sam Hoover, IT Systems Architect
Matt Garrett, System Administrator
Blackboard @ Clemson
@
Blackboard @ Clemson
• End of Semester archives of all online courses in Blackboard since implementation in 2004
• 77 GB Oracle 10.2.0.4 DB tied to a 1.3 TB Content system with over 13 million files
• Spring 2010: 4610 active Blackboard courses, 31,372 total courses in Blackboard
• Full system backups once a week, nightly incremental backups of entire system
The Archive Problem
• Blackboard is a mission critical system
• Why is 85.5 hours for archives a problem?
• Start of new semester vs. normal operations
• Time between semesters is short and getting shorter
• Faculty have to wait to set up next semester’s courses
• End of semester processes
Why do we need course archives?
Student Add / Drop at start of semester
Loss of course content or an entire course
CRLT archive uses• Grade disputes
Blackboard EoS archives
The Archive Problem
The Archive Problem
• Blackboard provides a script for executing batch archives given a list of courses as input.
• Weekly archive process at Clemson began in Fall 2006 after an accidental deletion of many courses.
• Started out splitting the course list into four equal chunks and giving each server ¼ of the total course list. All four servers usually finished within 2 hours of each other, total time for the batch was < 24 hours.
• By Fall 2008, archiving the active courses took 85.5 hours, and the servers finished at widely varying times.
The Archive Problem
The Archive Problem
• Who wants to work weekends?
Blackboard archive script
• /usr/local/blackboard/apps/content-exchange/bin/batch_ImportExport.sh
• Archive/Restore: The Archive Course function creates a record of the Course including User interactions. It is most useful for recalling Student performance or interactions at later time. The archive package is saved as a .ZIP file that can be restored to the Blackboard system at another time. In effect, Archive/Restore acts as a backup tool at the individual course level.
The Archive Problem
The Archive Problem
Potential Solutions?
Throw money at the problem?
Add more servers?
The Archive Problem
Potential solutions
• Write our own job scheduler?• Could we take advantage of the other 3 (CPUs)?• How do we monitor performance so end user
(Blackboard) experience isn’t impacted?• Use a DB to store and manage the queue?• What about security?• Has anyone else out there already done this?
Project Blackbird
+
Condor to the rescue?
• Job scheduler? Check
• Multi-core capable? Check
• Manage the queue? Check
• Performance monitoring? Check
• Security? Check
• Has anyone done this before? No
Steps in the weekly archive process
• Determine what to archive (active courses, orgs)
• Build a course list
• Create Blackbird submit files
• Submit DAGMan job to Condor
• Monitor Condor queue
• Receive email notification when all courses have been archived
• Look for errors and verify archive integrity
Custom Condor Configuration
• DAGMAN_MAX_JOBS_IDLE = 25
• DAGMAN_MAX_JOBS_SUBMITTED = 50
• SLOTS_CONNECTED_TO_CONSOLE = 0
• SLOTS_CONNECTED_TO_KEYBOARD = 0
• ## Force Condor to use Blackboard Private Network
• NETWORK_INTERFACE = Private Blackboard Net
DAGMan example
JOB UniqueCourseID /path/to/condor/submit/job/file/UniqueCourseID.bbCondor
JOB UniqueCourseID2 /path/to/condor/submit/job/file/UniqueCourseID2.bbCondor
JOB UniqueCourseID3 /path/to/condor/submit/job/file/UniqueCourseID3.bbCondor
JOB UniqueCourseID4 /path/to/condor/submit/job/file/UniqueCourseID4.bbCondor
JOB UniqueCourseID5 /path/to/condor/submit/job/file/UniqueCourseID5.bbCondor
JOB UniqueCourseID6 /path/to/condor/submit/job/file/UniqueCourseID6.bbCondor
SCRIPT POST UniqueCourseID6 /usr/local/CMSIntegration/bin/weeklyArchiveChecker.pl
Condor Submit example
universe = vanillarequirements = (OpSys=="LINUX") && ((Arch=="INTEL") || (Arch=="X86_64"))executable = /usr/local/bin/condorSubmitArchive.plarguments = shoover-S0000BKBRD_401001,/san/weeklyArchives/20091008/getenv = Truelog = /usr/local/logs/bbCondorLogs/archive20091008.lognotification = Errornotify_user = [email protected]_executable = Falsewhen_to_transfer_output = ON_EXITqueue 1
Blackbird archive solution
The Archive Problem
Blackbird archive solution
Blackbird Benefits
• Reduced total archive time from > 85 hrs to < 24 hrs
• Job scheduling – servers finish about the same time
• Zero impact to Blackboard Performance
• Automatic suspension/resumption of archives if Load reaches threshold on any core
• Email notification upon completion of all archives
• Load balancing – archive jobs are distributed as cores become available
• Takes advantage of all available CPU cores instead of just one core per server
Project Blackbird
+
Blackbird Benefits
Blackbird Benefits
What did it take to implement?
• Have one or more multi-core (CPU) machines
• A large amount of shared storage for archives
• Choose one machine as your Central Manager
• Install and configure Condor on each machine
• Automate course list creation (Query DB or Directory)
• Automate Condor submit files and Condor DAGMan file creation
• Automate the whole thing with cron
• Check log files for errors upon archive completion
Where else could I use this?
• Any system that does batch processing that can be broken up into many jobs
• Recently implemented on our MySQL server to export all of the MySQL databases
• Reduced the export time from 10 hours to 3.5 hours on a single, quad core machine
Recent updates
• 64 Bit Red Hat 5.4 OS and JVM 1.6
• Maximum (affordable) RAM per machine – 32 GB
• Web page to view Blackbird Condor Pool status
• Duplicate archives
• Error checking logs
• Redo any courses with errors or not completed
• Major Blackboard upgrade from 7.3 to 9.1 end of June
What’s next?
• New machines have 2 x Quad Core CPUs with HyperThreading so Condor sees 16 Cores
• Add out of warranty machines to the Blackboard Condor Pool (keep users off of them)
• Monitoring of queue (web page)
• Use ClassAds to specify architecture and memory requirements for large archive jobs
• Write code to query DB and find out what courses have changed, backup any course that has changed on a daily basis
• Automate installation and configuration
Please provide feedback for this session by emailing [email protected].
The subject of the email should be title of this session:
[Blackbird: Accelerated Course Archives Using Condor with Blackboard]