Components of a Data Analysis System
Scientific Drivers in the Design of an Analysis System

Data Import
• Format
    – Either widely used/accepted, or
    – Can be converted easily from something widely used
    – User need not know the details of the format
    – Well documented (e.g., which flavor of latitude).
• Fast Access
    – Disk I/O speeds do not follow Moore’s law
    – Read speed is more important than write speed
    – Caching
    – File size is only important to keep access times low
• Content must represent the details of the data
• E2E - Full intent of the observer must be embedded

Data Export
• Format
    – Either widely used/accepted, or
    – Can be converted easily into something widely used
    – User need not know the details of the format
    – Well documented (e.g., which flavor of latitude).
• You can read what you write
    – Import format == Export format
• Fast Access
    – Disk I/O speeds do not follow Moore’s law
    – Read speed is more important than write speed
• Content must represent the details of the data
• E2E - Full intent of the observer must be embedded.
• Includes user annotation/comments
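
The “you can read what you write” requirement can be checked mechanically with a round-trip test. The sketch below is only an illustration: JSON stands in for whatever format the system actually adopts, and the function names and record layout (`header`, `comments`, `data`) are hypothetical.

```python
import json

def export_scan(path, data, header, comments=()):
    """Write one scan with its header and any user comments (JSON stand-in)."""
    record = {"header": header, "comments": list(comments), "data": list(data)}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f)

def import_scan(path):
    """Read a scan written by export_scan; returns (data, header, comments)."""
    with open(path, encoding="utf-8") as f:
        record = json.load(f)
    return record["data"], record["header"], record["comments"]

def round_trip_ok(path, data, header, comments=()):
    """True if exporting and re-importing loses nothing, annotations included."""
    export_scan(path, data, header, comments)
    return import_scan(path) == (list(data), header, list(comments))
```

A check like this belongs in the test suite of any import/export layer, so that a format change that drops header fields or user comments is caught immediately.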

Data Base System
• Ability to work with more than one data set
• Data base for both export and import files
• Large data volumes
    – Access using scan numbers is no longer sufficient
    – Require the ability to select subsets of data via sophisticated data-base queries
    – Moderate number of columns in data base index
    – ‘Index’ to data kept in memory to speed data access
    – File summaries at various levels of detail
• Various levels of ‘granularity’
• Calibrated and raw data
• E2E - User can add annotation/comments
• Security – Only the observer can access data
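
One way to picture an in-memory index with a moderate number of columns is an in-memory SQLite table that maps selection criteria to file offsets. This is a sketch under assumptions: the column names (`scan`, `source`, `freq_mhz`, `calibrated`, `file_offset`) are invented for illustration and are not taken from any actual system.

```python
import sqlite3

def build_index(rows):
    """Build an in-memory index.

    rows: iterable of (scan, source, freq_mhz, calibrated, file_offset).
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE idx (scan INTEGER, source TEXT, freq_mhz REAL, "
        "calibrated INTEGER, file_offset INTEGER)"
    )
    db.executemany("INSERT INTO idx VALUES (?, ?, ?, ?, ?)", rows)
    return db

def select_offsets(db, source, fmin, fmax, calibrated=True):
    """Return file offsets of records matching a source and frequency range."""
    cur = db.execute(
        "SELECT file_offset FROM idx "
        "WHERE source = ? AND freq_mhz BETWEEN ? AND ? AND calibrated = ? "
        "ORDER BY scan",
        (source, fmin, fmax, int(calibrated)),
    )
    return [row[0] for row in cur]
```

The data themselves stay on disk; only the index lives in memory, so a query costs no disk I/O until the selected records are actually read.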

Data Archive
• Write speed more important than read speed.
• File size is very important
• Cannot anticipate types of user queries
    – Large number of columns in data base index
    – Very sophisticated/fast RDBMS
• Storage need not be a widely used data format
    – Format can be very different from that used by analysis system.
• Export format should be a widely used data format

Interactive On-Line Data Analysis
• The ability to access data ASAP
    – Import file updates automatically as observations proceed (real-time “filler”).
    – Index to file updates automatically
    – Updates happen per ‘integration’ (spectral-line) or per N seconds (continuum)
    – Minimum integration time ~ few times the minimum time of real-time “filler”
    – Analysis system automatically is aware of updated index.
    – Read-protect online/filled data?
• User should be able to ‘see’ the data within an ‘integration’ of when it was taken (or N seconds).
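
A minimal sketch of how an analysis session might stay aware of an index that the filler keeps appending to, assuming (purely for illustration) a text index with one line per integration. The class name and file layout are hypothetical.

```python
class IndexFollower:
    """Tracks a growing text index file, one line per integration."""

    def __init__(self, path):
        self.path = path
        self.offset = 0  # bytes of the index already consumed

    def poll(self):
        """Return the index lines appended since the last call."""
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            chunk = f.read()
        # Consume complete lines only; the filler may be mid-write on the last one.
        end = chunk.rfind(b"\n") + 1
        self.offset += end
        return chunk[:end].decode("utf-8").splitlines()
```

Calling `poll()` once per integration period (or every N seconds for continuum) keeps the session current without rereading the whole index.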

User Interface
• Command line
    – Familiar syntax better than a good syntax
    – Procedural with byte-wise compiling (performance)
    – History, min-match or command completion
    – Useful error messages
    – Interruptible
    – Error trapping and exception handling
    – Ability to “Undo”
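
Min-match and useful error messages go together: an abbreviation either resolves to exactly one command or the user is told why not. The function and the command names in the test are illustrative only.

```python
def min_match(word, commands):
    """Resolve an abbreviation to the unique command it is a prefix of."""
    if word in commands:
        return word  # an exact match always wins
    hits = sorted(c for c in commands if c.startswith(word))
    if len(hits) == 1:
        return hits[0]
    if not hits:
        raise ValueError(f"unknown command '{word}'")
    raise ValueError(f"ambiguous command '{word}': could be {', '.join(hits)}")
```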

User Interface
• GUIs best for:
    – Interacting with data visualizations
    – Filling in forms
        • data base queries
        • options for data pipelines
    – Browsing for data files
    – Defining E2E data flow (à la LabVIEW)

Imaging Tools
• Visualization
    – Shouldn’t try to recreate those things already available in another package – export instead.
• Data Flagging – Pick a system that works
• Graphics
    – Traditional capabilities (zoom in/out, scroll, print, save, …)
    – Data volume requires great performance, smart libraries (screen resolution << # data pts)
    – Interactive feedback (e.g., defining baseline regions).
• Publishable plots or export into something else?
    – Default plot style
    – Ability to tweak everything (label formats; char sizes; add, remove, move annotation; tick mark size; major/minor ticks, full box; grid; multiple X and Y axes, …)
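
When the number of data points far exceeds the screen resolution, a plotting layer can draw only the extremes within each pixel column. The sketch below shows one such min/max decimation; it is a generic technique chosen for illustration, not something the slides prescribe, and the function name is hypothetical.

```python
def minmax_decimate(y, n_pixels):
    """Reduce y to at most 2 * n_pixels points, keeping each bin's extremes.

    Returns (indices, values). Spikes survive because every pixel column
    contributes both its minimum and its maximum sample.
    """
    n = len(y)
    if n <= 2 * n_pixels:
        return list(range(n)), list(y)
    xs, ys = [], []
    for p in range(n_pixels):
        lo = p * n // n_pixels
        hi = (p + 1) * n // n_pixels
        bin_indices = range(lo, hi)
        i_min = min(bin_indices, key=lambda i: y[i])
        i_max = max(bin_indices, key=lambda i: y[i])
        # Keep the two extremes in their original order (one point if equal).
        for i in sorted({i_min, i_max}):
            xs.append(i)
            ys.append(y[i])
    return xs, ys
```

Plain striding (every k-th point) would be faster still but can drop narrow features entirely, which matters for spectral-line data.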

Analysis Algorithms
• Algorithms well documented
• Study what exists in other packages.
• Robustness very important but so is speed
    – Provide less robust but faster alternatives
• Developers should not force an algorithm on users
• Developers should provide ‘defaults’ only
• Building blocks better than a do-all algorithm.
• Ability to use and modify ‘header’ information as well as data.
• E2E – do-alls are built out of the same building blocks.
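
The points about defaults, robust versus fast alternatives, and do-alls built from building blocks can be sketched together. Everything here is illustrative: the function names are hypothetical, and a constant-offset baseline stands in for a real reduction step.

```python
from statistics import median

def offset_fast(y):
    """Mean of the channels: a single pass, but sensitive to spikes and lines."""
    return sum(y) / len(y)

def offset_robust(y):
    """Median of the channels: needs a sort, but tolerates outliers."""
    return median(y)

def subtract_offset(y, estimator=offset_robust):
    """Building block: remove a constant baseline from a spectrum.

    The robust estimator is only a default; the caller may pass any other.
    """
    level = estimator(y)
    return [v - level for v in y]

def scale(y, factor):
    """Building block: multiply every channel by a gain factor."""
    return [v * factor for v in y]

def reduce_spectrum(y, gain=1.0, estimator=offset_robust):
    """A 'do-all' composed only of the building blocks above."""
    return scale(subtract_offset(y, estimator), gain)
```

Because the do-all is nothing more than a composition, a user who disagrees with one step can call the blocks directly and swap that step out.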

Documentation
• On-line and hardcopy
    – Tutorials/Quick Guides
    – Cookbook
        • Based on observing types
    – Reference Manuals
        • Full, gory details
        • Data Formats
        • Algorithms
    – Searchable by keywords
• Quick, interactive command help from within the system.
• Never release until these are in place
User Support/Feedback
• A familiar system minimizes staff support
• Easily accessed, on-line “help desk” and “Suggestion” box
• Automatic generation of “bug” reports
• Observers of observers

Marketing
• A familiar system already has a market
• Don’t be another cereal on the supermarket shelf
• Workshops are better than papers
• Create a User Community
• Responsive feedback from developers
• Independent Beta testers
• Reputation & first experiences are everything

User Community
• User Forums
• Newsletters
• Accept User Contributions/Additions
    – SourceForge-like system
    – NRAO-seal-of-approval
• NRAO Moderator

Real-Time Data Display
• To guarantee data quality
    – Product is not stored (except for hardcopy)
    – Sequential processing -- different from E2E/Data pipeline
    – Fast is more important than accurate
    – Few bells and whistles -- must avoid the RTD black hole
    – A simple display for all observation types more important than sophisticated displays for a few data types
• Display happens within an ‘integration’ of when data were taken – tied to real time filler
• GUI based – underlying language is unimportant
• Output understandable by an operator

Real Time Data Analysis
• Pointing/Focus/Tipping/… are different from RTD
    – Results should be stored (Data Base)
    – Results are used by the control system (pointing/focus) or by subsequent analysis (tipping)
    – Accuracy is as important as speed
    – More bells, whistles, user-options
    – Sequential processing (non E2E/data pipeline)
    – Only a few observation types are handled
• Analysis happens within an ‘integration’ of when data were taken
• GUI based – underlying language is unimportant
• Output understandable by an operator

IDL Work Package
• SDFITS
    – Interim solution for data import/export
    – Class/IDL specific; soon Aips++/Aips/UniPOPS?
    – MD/BDFITS next generation (keywords, incompleteness of contents, versatility, …)
• IDL – Tom Bania
    – Uses UniPOPS as a ‘model’ – familiar to many
    – Very good reproduction
    – Bania-centric – needs to be generalized

IDL Work Package
• Glen Langston
    – Assess whether IDL will meet performance, extensibility, usability, … goals.
    – Generalization to other observing types.
    – Real-Time data access and display
    – Developed on top of and in parallel with Tom’s work (so, implementations have diverged)
    – Works well for Glen’s own experiments

IDL Work Package
• Institutionalize what Tom and Glen have done
    – Code management
    – Code review
    – Combine Tom’s and Glen’s branches
    – Generalize code
    – Provide ways for Tom and Glen to contribute within the same revision-control branch.
• Develop ‘Institutionalized’ code
    – Improve performance, usability, maintenance
    – Add/Replace I/O components with better CS methods.

Calibration Work Package
• User-tunable algorithms
    – Options for the ‘real-time filler’ – sequential
    – Options for E2E pipeline – non-sequential
    – Options for interactive data reduction
• Default algorithms for all observing cases
• Extensible as new algorithms are developed
• User-defined/tweaked algorithms
• Robust and not-so-robust algorithms

Calibration Work Package
• Opacity/atmosphere model
• Output units
• Efficiencies
    – Source size
    – Telescope model
• Tsys(f) estimates
• Differencing schemes
• Non-linearities/template fitting/…
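
To make two of these items concrete, here is a sketch of one common differencing scheme and one simple opacity correction. Both are textbook forms chosen for illustration, and the slides do not say which schemes or atmosphere models the package would adopt. The assumptions are: a signal/reference difference of the form Tsys × (sig − ref) / ref, a single scalar Tsys rather than a per-channel Tsys(f), and a plane-parallel atmosphere with airmass = 1 / sin(elevation). Function names are hypothetical.

```python
import math

def sig_ref_difference(sig, ref, tsys):
    """Per-channel Tsys * (sig - ref) / ref.

    tsys is a scalar here (assumption); a per-channel Tsys(f) would be
    zipped in alongside sig and ref.
    """
    return [tsys * (s - r) / r for s, r in zip(sig, ref)]

def opacity_correction(elevation_deg, tau_zenith):
    """Factor exp(tau * airmass) for a plane-parallel atmosphere (assumption)."""
    airmass = 1.0 / math.sin(math.radians(elevation_deg))
    return math.exp(tau_zenith * airmass)
```

Keeping each step as its own small function is what lets the same code serve the real-time filler, the pipeline, and interactive reduction with different options.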