P-20W Federated Data Systems, November 16, 2011, 2:45 – 3:45


Matthew Bryant (VA)
Marina Moschos (VA)
Ajay Rohatgi (VA)
Najmah Thomas (VA)
Henry Paik (VA)


Background

• SLDS Project Awarded in 2010
• Divided into 5 “Outcomes”
• Primary objective is creating portal and securely accessing data merged across agencies
• “Data Governance” is unique outcome within the proposal
• SLDS grant proposal was itself a multi-agency “project,” under direction from the Governor’s office


Initial Partners

Virginia Department of Education

State Council of Higher Education for Virginia

Virginia Employment Commission

Virginia Community College System (Workforce Office)


Federated Model

Driven by Virginia’s Privacy Act
• Consolidated Data Warehouse not Possible
• Received Attorney General Approval

Respects agencies’ need to maintain their own data
• Step 1: Agencies de-identify data and apply hash algorithm with common seed to common data elements
• Step 2: Using the hash, third party (the Shaker) matches records, strips hash and assigns unique identifier
• Step 3: Records delivered to requester

No party can match the linked records back to identifiable data
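The three steps can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the slides do not name the hash algorithm, so HMAC-SHA256 is an assumption, and the function names (`hash_key`, `agency_stage`, `shaker_match`), field names, and sample records are all hypothetical.

```python
import hashlib
import hmac
import uuid


def hash_key(seed: bytes, *fields: str) -> str:
    """Keyed hash over normalized common data elements (HMAC-SHA256 is an assumption)."""
    normalized = "|".join(f.strip().lower() for f in fields)
    return hmac.new(seed, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def agency_stage(records, seed, key_fields, payload_fields):
    """Step 1: keep only non-identifying payload fields plus the hash of the common elements."""
    staged = []
    for rec in records:
        row = {f: rec[f] for f in payload_fields}
        row["hash"] = hash_key(seed, *(rec[f] for f in key_fields))
        staged.append(row)
    return staged


def shaker_match(*staged_sets):
    """Step 2: group rows by hash, strip the hash, and assign a fresh unique identifier.

    Rows that appear in only one agency's set are kept (an outer join); that choice
    is an assumption of this sketch.
    """
    merged = {}
    for staged in staged_sets:
        for row in staged:
            payload = {k: v for k, v in row.items() if k != "hash"}
            merged.setdefault(row["hash"], {}).update(payload)
    return [{"link_id": uuid.uuid4().hex, **payload} for payload in merged.values()]


# Hypothetical data: the same person held by two agencies, with formatting differences.
seed = b"example-common-seed"
keys = ("first", "last", "dob")
k12 = [{"first": "Ann", "last": "Lee", "dob": "1990-01-02", "hs_gpa": 3.6}]
he = [{"first": "ann ", "last": "LEE", "dob": "1990-01-02", "degree": "BS"}]

# Step 3: the linked, de-identified records are what the requester receives.
linked = shaker_match(
    agency_stage(k12, seed, keys, ("hs_gpa",)),
    agency_stage(he, seed, keys, ("degree",)),
)
```

In this sketch the delivered rows carry only the payload fields and a random `link_id`; neither the names nor the hash survive the merge.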


Federated Architecture

[Architecture diagram. Labeled components: Portal, Portal API, Web Services, Workflow, Workflow SDK, MS SQL Server, Active Directory, Exchange Server, Alerts, Notifications, Data Sources, Query Results, Reports, Canned Data, Public Reports, QBT, Lexicon, links to External Websites. User groups: University Research, Commonwealth Employees.]


Lexicon

• Inventory of every available data field in every available data source
• Structure of their storage
• Possible values and meanings of the information stored
• All possible transformations of each set of field values to another set of field values
• Methods of data source access
• Matching algorithms and how they are to be used in conjunction with possible field value transformations
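One way to picture a single Lexicon entry is as a record that carries each of the items listed above. The sketch below is only an illustration under assumptions: the class name `LexiconField`, its attributes, and the sample gender field are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class LexiconField:
    """Hypothetical Lexicon entry describing one field in one data source."""
    source: str                 # data source that holds the field
    name: str                   # field name within that source
    storage_type: str           # structure of its storage
    values: Dict[str, str]      # possible values and their meanings
    access_method: str          # how the data source is reached
    # Transformations from this field's value set to another named value set.
    transforms: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def to(self, target: str, value: str) -> str:
        """Transform a stored value into the named target value set."""
        return self.transforms[target](value)


# Hypothetical entry: a one-character gender code mapped to a common vocabulary.
gender_k12 = LexiconField(
    source="k12_staging",
    name="gender_cd",
    storage_type="CHAR(1)",
    values={"M": "male", "F": "female"},
    access_method="web service",
    transforms={"common_gender": lambda v: {"M": "male", "F": "female"}[v]},
)

common_value = gender_k12.to("common_gender", "F")
```

A matching algorithm could then compare fields from two sources only after both have been transformed into the same target value set.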


Security Model

Hashing
• Data staged by each participating agency
• Hash algorithm applied to individual records based on common “seed,” creating single-use, unique ID
• Records merged based on unique ID, which is stripped out after merge
• Merged records delivered to researcher

Data Adapter
• Web services used to request data
• Data is staged at each agency
• Adapter installed at each agency’s staging database
• Adapter manages web service calls from Shaker and Lexicon
• Adapter works with Shaker to manage the hashing process
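The "single-use" property of the ID can be illustrated with a small sketch. It assumes, beyond what the slides state, that a fresh seed is issued for each data request and that a keyed hash such as HMAC-SHA256 is used; the function name `request_hash` and the sample identifier string are hypothetical.

```python
import hashlib
import hmac
import secrets


def request_hash(seed: bytes, identifier: str) -> str:
    """Hash one record's common data elements under a request's seed (HMAC-SHA256 assumed)."""
    return hmac.new(seed, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Two separate requests, each with its own random seed (an assumption of this sketch).
seed_a = secrets.token_bytes(32)
seed_b = secrets.token_bytes(32)
person = "ann|lee|1990-01-02"

# Within one request every agency derives the same ID, so records can be merged.
same_request = request_hash(seed_a, person) == request_hash(seed_a, person)

# Across requests the IDs differ, so outputs of different requests cannot be joined on them.
across_requests = request_hash(seed_a, person) == request_hash(seed_b, person)
```

Under these assumptions `same_request` is true and `across_requests` is false, which is the behavior that lets the ID be discarded after the merge.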


State Council of Higher Education for Virginia

Impact on VA Higher Education

State Objective
• Where appropriate, align postsecondary education with the workforce needs of business and the employment needs of students.

Data Challenges
• Merging K12 and HE data
• Using the data to answer key policy questions
• Granting researchers access to the data

Opportunities
• Virginia College Navigator website
• Feedback reports (High School and Transfer)
• Tracking graduates into the workforce
• Does transfer affect workforce outcomes?


Virginia Community College System

WDQI Project Background

Objectives
• To use data to understand workforce programs and improve performance
• Promote the workforce system

Needs
• Linking data across multiple programs
• Automation of data merging process
• Formalized data sharing agreements

Solutions
• SLDS Grant (USED)
• WDQI Grant (US DOL)
• Federated Data System

[Diagram: De-identified Data for Reporting & Analysis, shown with DOE Data, DSS Program Data, and WIA Program Data]


Building Blocks of a Successful Data Governance Model

Book of Data Governance
• Data Governance Council and Constitution – who we are
• Policies – what needs to get done
• Procedures – how things get done

Critical Path Items
• Establish Council
• Draft Council by-laws
• Burning questions – what questions do we want to answer?
• Master Agreement – cooperative agreement amongst the participating agencies that authorizes the Council to make decisions


Critical Success Factors

• Members must find common ground and politics through a shared vision/goal or develop one as a first priority
• Members must make significant time commitments to the Governance process
• Documentation is vital to maintaining structure and minimizing rework
• Delegation of tasks to sub-committees/working groups
• Communication with and involvement of the development teams

“Data governance is as much about people as it is policies”


Contact Info:
• Matthew Bryant, 804-786-1212, [email protected]
• Marina Moschos, 804-371-0554, [email protected]
• Ajay Rohatgi, 804-786-0529, [email protected]
• Najmah Thomas, 804-819-1666, [email protected]
• Henry Paik, 703-689-3054, [email protected]
