P-20W Federated Data Systems, November 16, 2011, 2:45 – 3:45
1
P-20W Federated Data Systems
November 16, 2011, 2:45 – 3:45
Matthew Bryant (VA)
Marina Moschos (VA)
Ajay Rohatgi (VA)
Najmah Thomas (VA)
Henry Paik (VA)
2
Background
• SLDS Project Awarded in 2010
• Divided into 5 “Outcomes”
• Primary objective is creating portal and securely accessing data merged across agencies
• “Data Governance” is unique outcome within the proposal
• SLDS grant proposal was itself a multi-agency “project,” under direction from the Governor’s office
3
Initial Partners
Virginia Department of Education
State Council on Higher Education for Virginia
Virginia Employment Commission
Virginia Community College System (Workforce Office)
4
Federated Model
Driven by Virginia’s Privacy Act
• Consolidated Data Warehouse not Possible
• Received Attorney General Approval
Respects agencies’ need to maintain their own data
• Step 1: Agencies de-identify data and apply hash algorithm with common seed to common data elements
• Step 2: Using the hash, third party (the Shaker) matches records, strips hash and assigns unique identifier
• Step 3: Records delivered to requester
No party can match the linked records back to identifiable data
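The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not name the hash algorithm, so HMAC-SHA256 stands in for it, and the field names (`last_name`, `dob`, `hs_gpa`, `quarterly_wage`) and function names are hypothetical. The sketch keeps unmatched records as well as matched ones, which the slides do not specify.

```python
import hashlib
import hmac
import uuid


def hash_key(seed: bytes, *elements: str) -> str:
    """Keyed hash over normalized common data elements (HMAC-SHA256 is an assumption)."""
    normalized = "|".join(e.strip().lower() for e in elements)
    return hmac.new(seed, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(rows, seed, key_fields):
    """Step 1 (agency side): drop identifying fields and attach the hash instead."""
    out = []
    for row in rows:
        token = hash_key(seed, *(row[f] for f in key_fields))
        payload = {k: v for k, v in row.items() if k not in key_fields}
        out.append({"hash": token, **payload})
    return out


def shake(*agency_batches):
    """Step 2 (Shaker): match on the hash, strip it, assign a fresh random identifier."""
    merged = {}
    for batch in agency_batches:
        for rec in batch:
            fields = {k: v for k, v in rec.items() if k != "hash"}
            merged.setdefault(rec["hash"], {}).update(fields)
    # The hash is discarded here; only a random ID and analytic fields remain.
    return [{"id": uuid.uuid4().hex, **fields} for fields in merged.values()]


# Hypothetical data from two agencies that share a seed and two common elements.
seed = b"shared-seed-for-this-request"
key_fields = ("last_name", "dob")
k12 = [{"last_name": "Smith", "dob": "1990-01-02", "hs_gpa": 3.4}]
wage = [{"last_name": "smith ", "dob": "1990-01-02", "quarterly_wage": 8200}]

# Step 3: the linked, de-identified records are what the requester receives.
linked = shake(deidentify(k12, seed, key_fields), deidentify(wage, seed, key_fields))
```

Because the records delivered in Step 3 carry neither the hash nor the original identifying fields, the requester has nothing to join back against agency data.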
5
Federated Architecture
[Architecture diagram. Labels shown: Portal; Portal API; Web Services; Workflow; Workflow SDK; MS SQL Server; Active Directory; Exchange Server; Alerts / Notifications; Query / Results; Data Sources; Lexicon; QBT; Reports; Canned Data; Public Reports; External Websites (links); University Research; Commonwealth Employees]
6
Lexicon
• Inventory of every available data field in every available data source
• Structure of their storage
• Possible values and meanings of the information stored
• All possible transformations of each set of field values to another set of field values
• Methods of data source access
• Matching algorithms and how they are to be used in conjunction with possible field value transformations
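One way to picture the Lexicon is as a small metadata catalog. The sketch below is a guess at a minimal shape, not the project's actual schema: the class names, the source and field names (`k12`, `higher_ed`, `gender`, `sex`), and the code values are all hypothetical. It covers the field inventory, storage structure, value meanings, access method, and value-to-value transformations; matching algorithms are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

FieldKey = Tuple[str, str]  # (data source, field name)


@dataclass
class FieldEntry:
    source: str               # which agency data source holds the field
    name: str                 # field name within that source
    storage_type: str         # how the value is stored
    values: Dict[str, str]    # possible values and their meanings
    access_method: str        # how the source is reached


@dataclass
class Lexicon:
    fields: Dict[FieldKey, FieldEntry] = field(default_factory=dict)
    transforms: Dict[Tuple[FieldKey, FieldKey], Callable[[str], str]] = field(
        default_factory=dict
    )

    def register(self, entry: FieldEntry) -> None:
        """Add one field to the inventory."""
        self.fields[(entry.source, entry.name)] = entry

    def add_transform(self, src: FieldKey, dst: FieldKey, fn: Callable[[str], str]) -> None:
        """Record how values of one field map onto values of another."""
        self.transforms[(src, dst)] = fn

    def translate(self, src: FieldKey, dst: FieldKey, value: str) -> str:
        """Convert a value from the source field's coding to the target field's coding."""
        return self.transforms[(src, dst)](value)


# Hypothetical entries: two sources code the same concept differently.
lex = Lexicon()
lex.register(FieldEntry("k12", "gender", "char(1)", {"1": "Male", "2": "Female"}, "web service"))
lex.register(FieldEntry("higher_ed", "sex", "char(1)", {"M": "Male", "F": "Female"}, "web service"))
lex.add_transform(("k12", "gender"), ("higher_ed", "sex"), {"1": "M", "2": "F"}.__getitem__)

translated = lex.translate(("k12", "gender"), ("higher_ed", "sex"), "2")
```

Keeping transformations keyed by a (source field, target field) pair lets a query tool look up how to reconcile two sources before records are hashed and matched.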
7
Security Model
Hashing
• Data staged by each participating agency
• Hash algorithm applied to individual records based on common “seed,” creating single-use, unique ID
• Records merged based on unique ID, which is stripped out after merge
• Merged records delivered to researcher
Data Adapter
• Web services used to request data
• Data is staged at each agency
• Adapter installed at each agency’s staging database
• Adapter manages web service calls from Shaker and Lexicon
• Adapter works with Shaker to manage the hashing process
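The "single-use" property of the ID can be illustrated with a toy adapter. Everything here is an assumption made for illustration: the `DataAdapter` class and its `handle_request` method are hypothetical, the seed is assumed to be generated fresh per request and passed to every agency's adapter, HMAC-SHA256 stands in for the unnamed hash algorithm, and the staged row is made up.

```python
import hashlib
import hmac
import secrets


class DataAdapter:
    """Toy stand-in for an adapter sitting on one agency's staging database."""

    def __init__(self, staged_rows, key_fields):
        self._rows = staged_rows
        self._key_fields = key_fields

    def handle_request(self, seed: bytes, wanted_fields):
        """Return de-identified rows for one request, hashed with that request's seed."""
        result = []
        for row in self._rows:
            material = "|".join(str(row[f]).strip().lower() for f in self._key_fields)
            token = hmac.new(seed, material.encode("utf-8"), hashlib.sha256).hexdigest()
            # Only the requested analytic fields leave the agency, never the key fields.
            result.append({"hash": token, **{f: row[f] for f in wanted_fields}})
        return result


adapter = DataAdapter(
    [{"ssn": "000-00-0000", "credential": "AAS"}],  # made-up staged row
    key_fields=("ssn",),
)

seed_a = secrets.token_bytes(32)  # seed for one request
seed_b = secrets.token_bytes(32)  # seed for a later, unrelated request

first = adapter.handle_request(seed_a, ["credential"])
second = adapter.handle_request(seed_b, ["credential"])
repeat = adapter.handle_request(seed_a, ["credential"])
```

Within one request the same person hashes to the same value at every agency, so records can be merged. Across requests the values differ, so output from two requests cannot be joined on the hash.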
8
State Council of Higher Education for Virginia
Impact on VA Higher Education
State Objective
• Where appropriate, align postsecondary education with the workforce needs of business and employment needs of students.
Data Challenges
• Merging K12 and HE data
• Using the data to answer key policy questions
• Granting researchers access to the data
Opportunities
• Virginia College Navigator website
• Feedback reports (High School and Transfer)
• Tracking graduates into the workforce
• Does transfer affect workforce outcomes?
9
Virginia Community College System
WDQI Project Background
Objectives
• To use data to understand workforce programs and improve performance
• Promote the workforce system
Needs
• Linking data across multiple programs
• Automation of data merging process
• Formalized data sharing agreements
Solutions
• SLDS Grant (USED)
• WDQI Grant (US DOL)
• Federated Data System
[Diagram labels: De-identified Data for Reporting & Analysis; DOE Data; DSS Program Data; WIA Program Data]
10
Building Blocks of a Successful Data Governance Model
Book of Data Governance
• Data Governance Council and Constitution – who we are
• Policies – what needs to get done
• Procedures – how things get done
Critical Path Items
• Establish Council
• Draft Council by-laws
• Burning questions – what questions do we want to answer?
• Master Agreement – cooperative agreement amongst the participating agencies that authorizes the Council to make decisions
11
Critical Success Factors
• Members must find common ground and politics through a shared vision/goal or develop one as a first priority
• Members must make significant time commitments to the Governance process
• Documentation is vital to maintaining structure and minimizing rework
• Delegation of tasks to sub-committees/working groups
• Communication with and involvement of the development teams
“Data governance is as much about people as it is policies”
12
P-20W Federated Data Systems: Contacts
Contact Info:
Matthew Bryant, 804-786-1212, [email protected]
Marina Moschos, 804-371-0554, [email protected]
Ajay Rohatgi, 804-786-0529, [email protected]
Najmah Thomas, 804-819-1666, [email protected]
Henry Paik, 703-689-3054, [email protected]