Transcript of Pipeline Basics

Jared Crossley, NRAO

Page 1: Pipeline Basics

Jared Crossley

NRAO

Page 2: What is a data pipeline?

One or more programs that perform a task with reduced user interaction.

May be developed as an extension of a more general and more interactive software system.

Page 3: Why use it?

Saves time
  Especially with large (repetitive) data sets
  Interactive data reduction may take a lot of time (even for an expert)

Consistency

Increased accessibility of a data reduction system
  You don’t have to be an “expert” to use a pipeline.
  A good learning tool -- with good documentation

Page 4: Building a Pipeline: Start simple

Build a pipeline in layers.
  The lowest level of the pipeline should still be interactive.
  For example:
    Level 1: allow the user to specify the input parameters needed by the following tasks.
    Level 2: find the best default parameter values for most data sets.
  Given these default values, most data can be processed with little interaction.

Focus on a subset of input data.
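The layering described on this page can be sketched in a few lines of Python. This is only an illustration, not the AIPS pipeline: the task functions (`flag_data`, `average_data`), the parameter names, and the default values are all hypothetical. Level 1 requires the user to supply every parameter; level 2 wraps level 1 with defaults that the user may still override.

```python
# Hypothetical task functions; names and parameters are illustrative only.

def flag_data(data, threshold):
    """Level 1 task: drop samples whose absolute value exceeds the threshold."""
    return [x for x in data if abs(x) <= threshold]


def average_data(data, bin_size):
    """Level 1 task: average consecutive samples in bins of bin_size."""
    return [
        sum(data[i:i + bin_size]) / len(data[i:i + bin_size])
        for i in range(0, len(data), bin_size)
    ]


def run_level1(data, threshold, bin_size):
    """Level 1: the user must supply every parameter the tasks need."""
    return average_data(flag_data(data, threshold), bin_size)


# Level 2: defaults assumed to suit most data sets (values are made up).
DEFAULTS = {"threshold": 10.0, "bin_size": 2}


def run_level2(data, **overrides):
    """Level 2: start from the defaults and let the user override any of them."""
    params = {**DEFAULTS, **overrides}
    return run_level1(data, **params)
```

Calling `run_level2(data)` needs no interaction, while `run_level2(data, threshold=200.0)` keeps the lower, interactive layer within reach.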

Page 5: Building a Pipeline: continued

The pipeline will evolve with time
  Parameter dependencies will reveal themselves
  Data processing algorithms will become apparent to the user. When an algorithm is well defined, add it to the pipeline.

Acquire metadata when possible. This can be used to initialize parameters.
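As a rough illustration of initializing parameters from metadata, the sketch below fills a parameter set from a metadata dictionary and falls back to defaults when a field is missing. The metadata keys (`source`, `n_channels`), the parameter names, and the rule for choosing the mode are assumptions made for this example, not part of any real pipeline.

```python
# Defaults used when the metadata says nothing (values are made up).
DEFAULT_PARAMS = {"mode": "continuum", "n_channels": 1, "source": None}


def init_params(metadata):
    """Return pipeline parameters initialized from observation metadata."""
    params = dict(DEFAULT_PARAMS)
    if "source" in metadata:
        params["source"] = metadata["source"]
    n_channels = metadata.get("n_channels")
    if n_channels is not None:
        params["n_channels"] = n_channels
        # Assumed rule for this sketch: more than one channel means line mode.
        params["mode"] = "line" if n_channels > 1 else "continuum"
    return params
```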

Page 6: Areas of concern

1. How much control should the user be given?
  Depends on the target audience. Experts want more control than novices.
  A compromise is lots of controls, but most of them pre-set to good initial conditions.

Page 7: Areas of concern

2. How many output diagnostics should the pipeline produce?
  Varies by processing goal and user preference.
  If possible, include a pipeline parameter that determines the amount of diagnostics.
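One way to expose such a parameter is a single integer level that each step consults. The sketch below is an assumption-laden example: `diag_level` and its three settings are hypothetical, and the doubling step stands in for real processing.

```python
def run_step(data, diag_level=1):
    """Process data and return (result, diagnostics).

    diag_level is a hypothetical pipeline parameter:
      0 = no diagnostics, 1 = summary only, 2 = summary plus per-sample detail.
    """
    result = [x * 2 for x in data]  # stand-in for real processing
    diagnostics = {}
    if diag_level >= 1:
        diagnostics["n_samples"] = len(data)
    if diag_level >= 2:
        diagnostics["per_sample"] = list(zip(data, result))
    return result, diagnostics
```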

Page 8: More on Output

In addition to the primary output product, consider outputting calibrated data and log files.
  This allows advanced users to build upon what the pipeline has done.
  And, this allows for quick “upgrades” to data products.

Page 9: Validating Output

This job is necessarily interactive.

However, a pipeline can simplify the process by…
  Providing an easy way to view output, including diagnostics
  And an easy way to delete (or flag) unacceptable output.
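The two aids listed above (viewing output with its diagnostics, and flagging unacceptable output) might look like the following in outline. The `OutputProduct` class and the helper names are hypothetical; a real validation tool would present images rather than text.

```python
from dataclasses import dataclass, field


@dataclass
class OutputProduct:
    """One pipeline output plus the diagnostics a reviewer needs to judge it."""
    name: str
    diagnostics: dict = field(default_factory=dict)
    flagged: bool = False  # True once a reviewer marks it unacceptable


def summarize(product):
    """Return a one-line view of an output and its diagnostics for review."""
    diag = ", ".join(f"{k}={v}" for k, v in sorted(product.diagnostics.items()))
    status = "FLAGGED" if product.flagged else "ok"
    return f"{product.name} [{status}] {diag}".rstrip()


def flag(products, name):
    """Mark the named output as unacceptable without deleting it."""
    for product in products:
        if product.name == name:
            product.flagged = True


def accepted(products):
    """Return only the outputs that have not been flagged."""
    return [p for p in products if not p.flagged]
```

Flagging rather than deleting keeps the decision reversible; the judgment itself stays with the person reviewing.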

Page 10: The VLA (AIPS) Pipeline

Page 11: Description

The pipeline is a script (AIPS run file) that automates
  Editing,
  Calibration,
  And Imaging
of VLA continuum data. May also process spectral line data.

Emulates an AIPS task
  Takes input parameters
  Outputs images and calibration plots

Suggested default parameters contained in AIPS memo.

Page 12: Demo: VLA (AIPS) Pipeline

To use the AIPS pipeline: load data into AIPS; split out different frequencies.

Page 13: Demo: VLA (AIPS) Pipeline

Set the VLARUN input parameters.

Flagging control

Pause during calibration

Diagnostic plots

Imaging control

Self-cal (fragile)

Page 14: Demo: VLA (AIPS) Pipeline

Image output by pipeline (axes and wedge added)

Page 15: Demo of VLA Pipeline System (Imaging the VLA Archive)

Page 16: Description

The VLA Pipeline System is an extension of the AIPS pipeline.

Includes
1. Data acquisition, and preparation for processing
2. Data processing (AIPS pipeline)
3. Image finalization, and export
4. Archiving
5. Easy interactive data validation
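The automated part of this list can be pictured as a chain of stages that pass a shared state along. The sketch below is purely illustrative: the function names and the toy data are invented, and each body is a stand-in for the real work (the actual system uses AIPS and supporting scripts, not Python).

```python
def acquire_and_prepare(state):
    """Stage 1 stand-in: fetch raw data and prepare it for processing."""
    state["raw"] = [3.0, 1.0, 2.0]
    return state


def process(state):
    """Stage 2 stand-in: where the AIPS pipeline would run."""
    state["image"] = sorted(state["raw"])
    return state


def finalize_and_export(state):
    """Stage 3 stand-in: finalize the image and export it."""
    state["exported"] = tuple(state["image"])
    return state


def archive(state):
    """Stage 4 stand-in: record the exported product in an archive."""
    state["archive"] = [state["exported"]]
    return state


# Stage 5 (interactive validation) is left to a person, so it is not listed.
STAGES = [acquire_and_prepare, process, finalize_and_export, archive]


def run_system(stages=STAGES):
    """Run each stage in order, logging the stage names as they complete."""
    state = {"log": []}
    for stage in stages:
        state = stage(state)
        state["log"].append(stage.__name__)
    return state
```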

Page 17: Demo: VLA Pipeline

At a high level of pipeline automation, initial user interaction takes place only on the command line.

The user can query the raw data archive via a Perl script:

Page 18: Demo: VLA Pipeline

Next, select data files for download and filling.

Select files

Download

Page 19: Demo: VLA Pipeline

A Unix shell script waits to be called by cron.

Start AIPS

Execute AIPS Pipeline
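The talk describes a Unix shell script here; the Python sketch below only illustrates the general pattern of a job that a scheduler such as cron calls periodically. The queue and done directories and the `process` callback are assumptions for this example; in the real system that step starts AIPS and executes the AIPS pipeline.

```python
import os
import shutil


def run_once(queue_dir, done_dir, process):
    """Process every file waiting in queue_dir, then move it to done_dir.

    Meant to be called periodically (for example from cron); it returns
    an empty list straight away when the queue is empty.
    """
    handled = []
    for name in sorted(os.listdir(queue_dir)):
        path = os.path.join(queue_dir, name)
        if not os.path.isfile(path):
            continue
        process(path)  # stand-in for starting AIPS and running the pipeline
        shutil.move(path, os.path.join(done_dir, name))
        handled.append(name)
    return handled
```

Because each call handles whatever is waiting and then exits, the scheduler alone decides how often processing happens.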

Page 20: Demo: VLA Pipeline

After processing, the output is archived via scripts invoked by cron.

The data is now available online. The final step is image validation…

Page 21: Demo: VLA Pipeline

A web-based tool is used for validation.

Page 22: Demo: VLA Pipeline

Images and diagnostics can be viewed together and flagged for removal.

Page 23: For more info

About AIPS Pipeline (VLARUN):
  AIPS Memo 112, by L. Sjouwerman. http://www.aips.nrao.edu/aipsmemo.html
  VLARUN “online” documentation. From the AIPS prompt type explain VLARUN

About Pipeline System and NVAS:
  See the NVAS web page. http://www.aoc.nrao.edu/~vlbacald
  For data acquisition scripts, see J. Crossley’s web page. http://www.aoc.nrao.edu/~jcrossle/

About pipeline basics:
  See notes on J. Crossley’s web page.