Security and Privacy of Information and Federal Big Data - Goodier.


Agenda

Federal big data is different

1. Clearly understanding when to use big data and why

2. The Security and Privacy implications for the federal government

3. Big Data Best Practice Use Cases


3 key questions

1. What are the key technology differences between a non-big-data database and a big data database?

2. What are the security and privacy implications of big data for federal application development?

3. What is the Use Case for the federal government?
• How to transition to big data database technology – best practice use cases


Question 1. What are the differences, and what is driving big data vs. non-big data?

• The big data transition has been spurred by several key features of these technologies, including:

1. Big data economics
2. Flexible scaling models
3. Flexible data models


1.1 It’s the economics – no individual record has value – having all records is of increasing value

“Big data is what happened when the cost of keeping information became less than the cost of throwing it away.” – George Dyson

Source: Cloudera, “Return on Byte”


1.2. Flexible Scaling

• Relational database technology is “scale up” technology.
– To add capacity (whether data storage or I/O capacity), simply get a bigger server.

• Ex: 175+ data repositories and ever-growing – it’s the economics for government

The modern approach to architecture is to scale out, rather than scale up.
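To make the scale-out idea concrete, here is a minimal sketch, assuming a toy key-value store in which each "node" is just an in-memory dictionary. The class name `ShardedStore` and the key format are hypothetical and not taken from any product; capacity grows by adding nodes rather than by buying a bigger server.

```python
import hashlib


class ShardedStore:
    """Toy scale-out store: records are spread across nodes by key hash."""

    def __init__(self, node_count):
        # Each "node" is a plain dict here; in practice it would be a server.
        self.nodes = [dict() for _ in range(node_count)]

    def _node_for(self, key):
        # Hash the key and map it onto one of the available nodes.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key):
        return self.nodes[self._node_for(key)].get(key)

    def add_node(self):
        """Scale out: add one node and redistribute the existing records."""
        records = [item for node in self.nodes for item in node.items()]
        self.nodes = [dict() for _ in range(len(self.nodes) + 1)]
        for key, value in records:
            self.put(key, value)


store = ShardedStore(node_count=2)
for i in range(100):
    store.put(f"record:{i}", {"n": i})
store.add_node()  # capacity grows by adding a node, not a bigger server
```

This naive modulo placement moves many records whenever a node is added. Real systems typically use consistent hashing or a similar scheme to limit that movement.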


1.3. Flexible Data models

• With a relational database, you must define a schema before adding records to the database.

• NoSQL databases (whether key-value, document-oriented, column-oriented or otherwise):
– scale out
– don’t require schema definition prior to inserting data
– don’t require a schema change when data capture and management needs evolve.
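The schema flexibility described above can be sketched as follows, assuming a toy document collection held in a Python dictionary. The document IDs and field names are hypothetical, chosen only for illustration.

```python
# A toy document collection: each record is a free-form dict keyed by ID.
collection = {}


def insert(doc_id, document):
    """Store a document without checking it against any predefined schema."""
    collection[doc_id] = dict(document)


def field_names(docs):
    """Return every field name that appears in at least one document."""
    return sorted({name for doc in docs.values() for name in doc})


# The first document has two fields.
insert("report:1", {"title": "Quarterly summary", "author": "analyst-a"})

# Later, capture needs evolve: new fields appear with no schema change
# and no migration of the documents that were stored earlier.
insert("report:2", {
    "title": "Site survey",
    "author": "analyst-b",
    "classification": "CUI",
    "tags": ["survey", "2013"],
})
```

The trade-off is that the application, not the database, becomes responsible for handling documents that lack a given field.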


1.3 NoSQL Model Semantics

• Some NoSQL database systems sort data by ID.
– Data with nearby IDs can be accessed more efficiently than data whose IDs are scattered.
– Keeping data that you tend to access at the same time closer together makes your application faster.

• ID lookup is extremely fast, and by selecting clever semantic IDs you can make data access even faster.
– For example, prefixes can identify ontological categories (target:CountryCode:123) to better group your data.
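A minimal sketch of why semantic ID prefixes help, assuming a store that keeps its IDs in sorted order. The class `SortedKeyStore` and the country codes in the sample IDs are hypothetical. Because IDs that share a prefix sit next to each other, one binary search finds the start of the group and a short sequential scan reads the rest.

```python
import bisect


class SortedKeyStore:
    """Toy store that keeps IDs sorted so related IDs are adjacent."""

    def __init__(self):
        self._keys = []    # sorted list of IDs
        self._values = {}  # ID -> stored value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        # Direct ID lookup.
        return self._values.get(key)

    def scan_prefix(self, prefix):
        """Return all (ID, value) pairs whose ID starts with the prefix."""
        start = bisect.bisect_left(self._keys, prefix)
        results = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # sorted order: no later ID can match
            results.append((key, self._values[key]))
        return results


store = SortedKeyStore()
store.put("target:US:123", {"name": "alpha"})
store.put("source:US:001", {"name": "feed"})
store.put("target:FR:042", {"name": "bravo"})
store.put("target:US:007", {"name": "charlie"})

us_targets = store.scan_prefix("target:US:")
```

Here `us_targets` holds only the two `target:US:` records, read as one contiguous run.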


2.1 What are the security and privacy implications of big data for federal application development?

• Agencies must plan in an integrated manner for managing information throughout its life cycle. Agencies will:
– Consider the effects of their actions on the privacy rights of individuals, and ensure that appropriate legal and technical safeguards are implemented

• Sources:
– Circular No. A-130 Revised, Transmittal Memorandum No. 4, which governs the Management of Federal Information Resources
– Information Resellers: Consumer Privacy Framework Needs to Reflect Changes in Technology and the Marketplace, GAO-13-663, September 2013


2.1.1 Dates of Federal Law enactment

“The federal laws that address the types of consumer information that can be collected and shared for marketing and look-up purposes have limited reach and application.

Under most circumstances, information that many people may consider very personal or sensitive legally can be collected, shared, and used for marketing purposes.”

House Committee on Energy and Commerce, Subcommittee on Communications, Technology, and the Internet, and Subcommittee on Commerce, Trade, and Consumer Protection, Exploring the Offline and Online Collection and Use of Consumer Information, 111th Cong., 1st sess., Nov. 19, 2009; see testimony of Pam Dixon, Executive Director, World Privacy Forum.


2.1.2 What does the Fair Credit Reporting Act have to do with Big Data?

The FCRA prohibits the sale of consumer reports for other than a permissible purpose. Data brokerage companies may be considered consumer reporting agencies (CRAs) under the FCRA and thus subject to substantial civil, and even criminal, penalties. Recent examples from the FCRA enforcement program include:
• Spokeo – In June 2012, $800,000 civil penalty.
• HireRight – In August 2012, $2.6 million penalty.
• Filiquarian – In May 2013, undisclosed amount.
• Certegy – In August 2013, $3.5 million penalty.


2.1.3 Is the federal government a data broker?


2.2 What are the federal information security and privacy basic principles?


• The free flow of information between the government and the public is essential to a democratic society.

• In order to minimize the cost and maximize the usefulness of government information, the expected public and private benefits derived from government information should exceed the public and private costs of the information, recognizing that the benefits to be derived from government information may not always be quantifiable.

• Because the public disclosure of government information is essential to the operation of a democracy, the management of Federal information resources should protect the public's right of access to government information.

• The individual's right to privacy must be protected in Federal Government information activities involving personal information.


2.2.1 Basic federal definitions

• The term "dissemination" means the government-initiated distribution of information to the public.
– Not considered dissemination is distribution limited to government employees or agency contractors or grantees, intra- or inter-agency use or sharing of government information, and responses to requests for agency records under the Freedom of Information Act (5 U.S.C. 552) or Privacy Act.


2.2.1 Controlled Unclassified Information (CUI) Definitions 2011-01 (Executive Order 13556)


• Dissemination means the authorized sharing of CUI amongst parties to include executive branch agencies and State, local, tribal, and private sector partners, but does not include disclosure in response to a request under the Freedom of Information Act.


2.2.1 More CUI Definitions and the Data spill


• Information means any knowledge that can be communicated or documentary material, regardless of its physical form or characteristics, that is owned by, is produced by or for, or is under the control of the United States Government.

• Public Release means the act of making information available to the general public through the approved processes of an agency.

• Safeguarding means measures and controls that are prescribed to protect CUI from unauthorized access and to manage the risks associated with processing, storage, handling, transmission, and destruction of CUI.

• CUI includes Personally Identifiable Information (PII).

If CUI is publicly released without approval, that is a data spill.


2.2.1 Data spills: NIST 800-53 Security Controls

Spill Chain Elements and Security Control Areas (As Defined by NIST 800-53)

Prevent
• Access Control (AC): Defines who or what is granted permission to access a system or system component (especially on security enforcement)
• Configuration Management (CM): Documents the proper configuration for the system and its components to support its mission and protect itself from harm

Assess
• Audit and Accountability (AU): Identifies if a data spillage occurred and the users (who or what) involved

Contain
• System and Communications Protection (SC): Identifies what external systems and information were potentially affected
• System and Information Integrity (SI): Identifies what internal systems, information sets, and components were potentially affected

Eradicate
• Incident Response (IR): Defines the organization’s response to a spill

Recovery
• Contingency Planning (CP): Defines the process for responding to identified data/system loss scenarios

Attribution
• Identification and Authentication (IA): Identifies who or what caused the spill
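The pairing of spill chain elements with control families can also be held as a small lookup table, for example to tag findings during a spill review. This is only a sketch of the mapping listed on this slide. The names `SPILL_CHAIN_CONTROLS` and `stages_for` are hypothetical, and the order of the entries is an assumption.

```python
# Spill chain element -> NIST 800-53 control family abbreviations,
# as paired on the slide. The dictionary order is an assumption.
SPILL_CHAIN_CONTROLS = {
    "Prevent": ["AC", "CM"],
    "Assess": ["AU"],
    "Contain": ["SC", "SI"],
    "Eradicate": ["IR"],
    "Recovery": ["CP"],
    "Attribution": ["IA"],
}


def stages_for(control_family):
    """Return the spill chain elements that a control family supports."""
    return [
        stage
        for stage, families in SPILL_CHAIN_CONTROLS.items()
        if control_family in families
    ]


containment_families = SPILL_CHAIN_CONTROLS["Contain"]
```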


2.2.1 Basic federal definitions

“Personal data” shall mean information from or about consumers, including, but not limited to:
(1) first and last name;
(2) home or other physical address, including street name and name of city or town;
(3) email address or other online contact information;
(4) telephone number;
(5) date of birth;
(6) gender, racial, ethnic, or religious information;
(7) government-issued identification number;
(8) financial information;
(9) employment information;
(10) a persistent identifier, such as a customer number held in a “cookie”.
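One way an application might act on this definition is to mask the listed categories before a record is shared. The sketch below assumes hypothetical field names for the ten categories. A real system would need its own mapping from its data model to the definition, and the definition itself is "not limited to" these items.

```python
# Hypothetical field names standing in for the ten listed categories.
PERSONAL_DATA_FIELDS = {
    "first_name", "last_name",           # (1)
    "address",                           # (2)
    "email",                             # (3)
    "phone",                             # (4)
    "date_of_birth",                     # (5)
    "gender", "race", "ethnicity", "religion",  # (6)
    "government_id",                     # (7)
    "financial_info",                    # (8)
    "employment_info",                   # (9)
    "persistent_id",                     # (10)
}


def redact(record, mask="[REDACTED]"):
    """Return a copy of the record with personal-data fields masked."""
    return {
        name: (mask if name in PERSONAL_DATA_FIELDS else value)
        for name, value in record.items()
    }


record = {
    "first_name": "Pat",
    "email": "pat@example.gov",
    "case_number": "A-17",
    "persistent_id": "cookie-9f3",
}
shareable = redact(record)
```

Only `case_number` survives unmasked in `shareable`, and the original `record` is left unchanged.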


GAO’s Findings Summary

• “Views differ on the approach that any new privacy legislation or regulation should take.

• Nonetheless, the rapid increase in the amount and type of personal information that is collected and resold warrants reconsideration of how well the current privacy framework protects personal information.

• The challenge will be providing appropriate privacy protections without unduly inhibiting the benefits to consumers, commerce, and innovation that data sharing can accord.”


3. Federal Information Obesity challenges and safeguarding big data

Safeguarding Information while Sharing Big Data:
1. Volume, Velocity, Variety, Veracity challenge
2. Concurrency challenge
3. Query advantage challenge


3.1. What is the Use Case for the federal government? Volume, Velocity, Variety, Veracity challenge

PROBLEM: Suppose you have a copyrighted photo that appears multiple times in a federal application AND you want to be able to edit that photo’s associated metadata.
• This photo has a title. It can show up in a Variety of ways: in your photo stream, in sets, collections, and groups, on your web front page, and in many, many more places.
– If the title is stored with every copy, the Volume of copies is unlimited, and the Velocity of change is also increasing, then that naming approach could potentially require you to update thousands of documents. Your naming approach just won’t scale.

How do you ensure that the photo’s copyright is being protected?

RESOLUTION: Separate and define self-describing metadata


3.1.1. What is the Use Case for the federal government? Resolving the 4Vs

RESOLUTION: Separate key metadata
– Place the title and other identifying metadata (cognitive metadata) in a single “indexed photo information” or indexing document
– Create a separate “photo placement” document for each place the photo appears (these “photo placement” documents would then point to the “indexed photo information” document).
– Now when you display a target photo you will make two lookups:
• one for the photo placement document
• another for the indexed photo information document.

BENEFIT: You can easily change the title of the photo and still track the number of instances for your copyright statistics
– Just edit the indexed photo information document, and the title will be changed everywhere on your site.
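The two-lookup pattern can be sketched with a toy document store held in a dictionary. The document IDs, field names, and sample titles below are hypothetical. The point is only that the title lives in one place and each placement points to it.

```python
documents = {
    # One "indexed photo information" document holds the shared metadata.
    "photo_info:42": {
        "title": "Capitol at dusk",
        "copyright": "Example Agency",
    },
    # One "photo placement" document per place the photo appears.
    "placement:front_page:1": {
        "photo_info_id": "photo_info:42",
        "location": "front page",
    },
    "placement:set:7": {
        "photo_info_id": "photo_info:42",
        "location": "set 7",
    },
}


def display_photo(placement_id):
    """Resolve a placement with two lookups: placement, then photo info."""
    placement = documents[placement_id]           # lookup 1
    info = documents[placement["photo_info_id"]]  # lookup 2
    return {
        "location": placement["location"],
        "title": info["title"],
        "copyright": info["copyright"],
    }


def rename_photo(info_id, new_title):
    """Change the title once; every placement sees it on the next display."""
    documents[info_id]["title"] = new_title


def count_placements(info_id):
    """Count how many placements point at a photo (copyright statistics)."""
    return sum(
        1 for doc in documents.values()
        if doc.get("photo_info_id") == info_id
    )


rename_photo("photo_info:42", "Capitol at dawn")
```

After the single `rename_photo` call, both placements display the new title, and `count_placements` still reports two instances.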


3.2.1 What is the Use Case for the federal government? Concurrency challenge

• PROBLEM: There are multiple federal target authors, maybe even an editor, and each of them is looking at a single analytic report at any given time.
– If you have data that you know is only edited by a single person at any given time, it’s a good idea to place it into a single document based on a common ontology.
– RESOLUTION example: Palantir


3.2.1 What is the Use Case for the federal government? Concurrency challenge

• PROBLEM: Comments are different. Many people can write comments, and they can do so independently and simultaneously. Once the analytic report is published, comments can be added immediately.
– To avoid write contention (that is, concurrent writes happening to the same document), you can store comments in separate documents, thus ensuring that only one author is editing a single document at any given time.
– RESOLUTION example: Drupal for Government
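A minimal sketch of the separate-documents approach, again using a toy dictionary as the document store. It does not describe how Drupal or any other product implements comments. The key format and function names are hypothetical. Each comment gets its own document, so adding one never rewrites the report or another author's comment.

```python
import itertools

documents = {}
_comment_ids = itertools.count(1)


def publish_report(report_id, body):
    """Store the report as one document, edited by one author at a time."""
    documents[f"report:{report_id}"] = {"body": body}


def add_comment(report_id, author, text):
    """Write the comment to its own document instead of the report."""
    # Zero-padded counter keeps keys in creation order when sorted.
    comment_key = f"comment:{report_id}:{next(_comment_ids):08d}"
    documents[comment_key] = {"author": author, "text": text}
    return comment_key


def comments_for(report_id):
    """Collect a report's comments by key prefix, in creation order."""
    prefix = f"comment:{report_id}:"
    return [
        doc for key, doc in sorted(documents.items())
        if key.startswith(prefix)
    ]


publish_report("r1", "Analytic report text")
first_key = add_comment("r1", "reviewer-a", "Needs a source for section 2.")
second_key = add_comment("r1", "reviewer-b", "Agree with the conclusion.")
```

The cost is an extra prefix query when the report is displayed with its comments.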


3.3 What is the Use Case for the federal government? Query advantage challenge

• PROBLEM: Semantically rich data models and viewing technologies enable complex federal target queries, but how does the federal government safeguard personal information while exploiting targets of interest?
• Where is my target today?
• What is my target doing?
• When is my target active?

– One man’s target is another man’s freedom fighter…

– RESOLUTION example: Palantir


Questions and Looking ahead


• Next Week – Cognitive Metadata: the killer enabler for federal Big Data Security and Privacy in the clouds

• It’s all about the metadata
