reduction: data treatment for ARCS
Tim Kelley, Caltech Materials Science
DANSE workshop—June 22, 2004
reduction
• components to reduce raw, inelastic data from time-of-flight chopper spectrometer
• essential processing—must be done for every experiment
• validation is essential
• descended from Pharos IDL procedures
why change?
• information sources badly mixed
  – data formats, instrument specs, algorithms
  – not reusable
• monolithic
  – unit testing difficult
  – difficult/impossible to tinker with
• slow
gross testing
[flowchart: is the output recognizable? Yes: Success! No: oh, $%^!]
gross testing useful only as long as it is not needed!
unit testing
• test every execution path
• programmer provides
  – standard inputs, expected results
  – means of comparing actual and expected results to determine pass/fail
• combine unit tests to create regression tests
• unit tests part of writing code!
example: execution paths
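import os

def theAnswer(yourInt):
    # two execution paths: yourInt is 42, or it is not
    if yourInt == 42:
        return True
    else:
        return False

def theAnswer2(yourInt):
    # an extra, environment-dependent path: raises on some machines
    if 'Tim' in os.uname():
        raise RuntimeError("I hate that guy!")
    if yourInt == 42:
        return True
    else:
        return False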
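A minimal sketch of unit tests that exercise each execution path of the two functions above, using Python's standard unittest module. The test class, the helper run_path_tests, and the faked os.uname values are assumptions made for illustration; they are not from the original slides.

```python
import os
import unittest
from unittest import mock


def theAnswer(yourInt):
    return yourInt == 42


def theAnswer2(yourInt):
    if 'Tim' in os.uname():
        raise RuntimeError("I hate that guy!")
    return yourInt == 42


class TestExecutionPaths(unittest.TestCase):
    """One test per execution path (hypothetical test names)."""

    def test_answer_is_42(self):
        self.assertTrue(theAnswer(42))

    def test_answer_is_not_42(self):
        self.assertFalse(theAnswer(41))

    def test_answer2_raises_for_tim(self):
        # Fake the machine description so the error path is reachable anywhere.
        fake = ("Linux", "Tim", "5.0", "#1", "x86_64")
        with mock.patch.object(os, "uname", create=True, return_value=fake):
            with self.assertRaises(RuntimeError):
                theAnswer2(42)

    def test_answer2_normal_paths(self):
        fake = ("Linux", "somehost", "5.0", "#1", "x86_64")
        with mock.patch.object(os, "uname", create=True, return_value=fake):
            self.assertTrue(theAnswer2(42))
            self.assertFalse(theAnswer2(0))


def run_path_tests():
    """Run the tests above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestExecutionPaths)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The second function shows why path coverage matters: without faking os.uname, the error path would only ever be tested on one particular machine.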
two approaches
change to what?
• Object oriented
  – separate, equalize information sources
  – histograms, instrument data, transformations
• Finer granularity, more layered
  – simpler components, easier to test, replace
  – add scientific context to generic components
• Unit tests, integrate NeXus, Pyre
example: layering
[diagram, C++ and Python layers: std::vector, StdVector, histograms, S_QE]
basic layers have no scientific content: can be reused for any science
higher layers add context
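The layering idea might be sketched in Python as follows. FlatVector and SQE are hypothetical stand-ins written for this illustration, and the row-major layout is an assumption; the actual StdVector and S_QE classes are not shown in the slides.

```python
class FlatVector:
    """Generic layer: typed flat storage with no scientific meaning."""

    def __init__(self, datatype, size, fill=0):
        self._datatype = datatype
        self._data = [datatype(fill)] * size

    def datatype(self):
        return self._datatype

    def data(self):
        return self._data

    def __len__(self):
        return len(self._data)


class SQE:
    """Context layer: S(|Q|, E) stored in a FlatVector, plus its axes."""

    def __init__(self, q_values, e_values):
        self.q_values = list(q_values)
        self.e_values = list(e_values)
        size = len(self.q_values) * len(self.e_values)
        self._store = FlatVector(float, size)

    def sizes(self):
        return [len(self.q_values), len(self.e_values)]

    def _index(self, iq, ie):
        # Assumed row-major layout: E varies fastest.
        return iq * len(self.e_values) + ie

    def get(self, iq, ie):
        return self._store.data()[self._index(iq, ie)]

    def set(self, iq, ie, value):
        self._store.data()[self._index(iq, ie)] = float(value)
```

Only SQE knows about Q and E; FlatVector could back any other data set unchanged.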
reduction structure
[diagram: C++ and Python layers; components: histograms, instruments, transformations]
Histograms
Store and access neutron scattering data
• multidimensional data sets
• 200 to 1400 MB
Need
• ability to associate metadata
• efficient iteration
• availability, stability
Histograms: C++
Storage based on STL vector class
• ANSI standard C++: universally available
• Optimized by compilers
Adaptor classes extend to
• Different storage schemes
• Different dimensionality
Low-level routines: arithmetic, average, etc.
Histograms: Python
Group C++ containers to model complex data sets
• associate signal & error histograms, ...
Add context, preferred behaviors
• names, axes, units
• error propagation?
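As an illustration of grouping signal and error arrays with names and units, here is a minimal sketch. The Histogram class and its fields are hypothetical. Adding errors in quadrature assumes independent uncertainties, which is only one possible answer to the error propagation question the slide leaves open.

```python
import math


class Histogram:
    """Hypothetical container pairing signal and error with context."""

    def __init__(self, name, axis_name, unit, signal, error):
        if len(signal) != len(error):
            raise ValueError("signal and error must have the same length")
        self.name = name
        self.axis_name = axis_name
        self.unit = unit
        self.signal = list(signal)
        self.error = list(error)

    def __add__(self, other):
        if self.unit != other.unit or len(self.signal) != len(other.signal):
            raise ValueError("incompatible histograms")
        signal = [a + b for a, b in zip(self.signal, other.signal)]
        # Assumption: independent errors, so they combine in quadrature.
        error = [math.hypot(a, b) for a, b in zip(self.error, other.error)]
        return Histogram(self.name, self.axis_name, self.unit, signal, error)
```

Keeping the propagation rule inside the class means callers cannot update the signal and forget the error.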
Instruments
Find, store, and serve instrument data
• detector & monitor configuration, properties
  – pixel-sample distances, angles, dimensions, ...
• moderator, chopper, sample positions...
Need
• comprehensive description of instrument
• flexibility to describe different instruments
Instruments: C++
Abstract base classes parallel NeXus
Concrete subclasses specialize to given instrument, detector type, etc.
STL-based containers implement sorting
• e.g. pixels sorted by
  – pixel-sample distance
  – scattering angle(s)
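The sorting idea can be illustrated in Python (the slides describe a C++ implementation, which is not shown). The Pixel record and its field names are assumptions for this sketch.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass(frozen=True)
class Pixel:
    """Hypothetical detector pixel description."""
    pixel_id: int
    distance_m: float            # pixel-sample distance
    scattering_angle_deg: float  # scattering angle


def sorted_by(pixels, field_name):
    """Return the pixels ordered by the named attribute."""
    return sorted(pixels, key=attrgetter(field_name))
```

For example, sorted_by(pixels, "scattering_angle_deg") yields the pixels in order of increasing scattering angle.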
Instruments: Python
Interface to instrument descriptions
• Hard-coded
• NeXus or other files
Pass info to C++ layer
• Python is better suited to parsing tasks
• Also useful for instrument diagnostics and simulation
Transformations
Variable changes, reductions, slicing
• time-of-flight to energy
• detector coord. to scattering angle (powder)
• S(Qi, Qj) for Qi in (Qx, Qy, Qz, E), ...
Need
• scientific integrity
• efficient (fast) operations
• flexibility
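For the time-of-flight to energy change of variable, a minimal sketch using the non-relativistic kinetic energy E = m v^2 / 2 with v = L / t. The function name and argument units are assumptions. This is not the ARCS reduction code, which must also handle flight-path geometry and timing offsets.

```python
NEUTRON_MASS_KG = 1.67492749804e-27
JOULES_PER_MEV = 1.602176634e-22  # one milli-electron-volt in joules


def tof_to_energy_meV(distance_m, tof_s):
    """Kinetic energy (meV) of a neutron covering distance_m in tof_s."""
    if tof_s <= 0:
        raise ValueError("time of flight must be positive")
    velocity = distance_m / tof_s
    return 0.5 * NEUTRON_MASS_KG * velocity ** 2 / JOULES_PER_MEV
```

As a sanity check, a neutron moving at 2200 m/s comes out at about 25.3 meV, the familiar thermal-neutron energy.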
Transformations: C++
Modular design
• Building blocks perform simple tasks
  – BinOverlapCalculator
  – BinOverlapMultiplier
Generic (iterator) interface
• Independent of underlying container
• Independent of instrument
Compiled, optimized for maximum speed
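The slides name BinOverlapCalculator and BinOverlapMultiplier without showing them. The sketch below is one plausible reading of that split: one function computes how much each old bin overlaps each new bin, and another applies those fractions to the counts. The function names and the assumption that counts are spread uniformly within a bin are illustrative.

```python
def bin_overlaps(old_edges, new_edges):
    """overlaps[i][j] = fraction of old bin i lying inside new bin j."""
    overlaps = []
    for i in range(len(old_edges) - 1):
        lo, hi = old_edges[i], old_edges[i + 1]
        row = []
        for j in range(len(new_edges) - 1):
            new_lo, new_hi = new_edges[j], new_edges[j + 1]
            shared = max(0.0, min(hi, new_hi) - max(lo, new_lo))
            row.append(shared / (hi - lo))
        overlaps.append(row)
    return overlaps


def rebin(counts, old_edges, new_edges):
    """Redistribute counts from old bins into new bins by overlap fraction."""
    new_counts = [0.0] * (len(new_edges) - 1)
    for i, row in enumerate(bin_overlaps(old_edges, new_edges)):
        for j, fraction in enumerate(row):
            new_counts[j] += counts[i] * fraction
    return new_counts
```

Separating the overlap calculation from its application lets the same fractions be reused for every pixel that shares a binning.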
Transformations: Python
Driver routines
• coordinate simpler objects to perform complex tasks
• EnergyRebinDriver, QRebinDriver
• easily modified by user
Experiment with pipeline strategies
• e.g. reduce data one pixel at a time
  – Trivially parallel for much of process
  – Real-time refinement of ARCS data!
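A pixel-at-a-time pipeline might look like the sketch below. The per-pixel step (dividing raw counts by an efficiency) and the dictionary layout are placeholders invented for illustration; the point is only that each pixel is reduced independently before being accumulated.

```python
def reduce_pixel(pixel):
    """Hypothetical per-pixel step: correct raw counts for efficiency."""
    return [count / pixel["efficiency"] for count in pixel["counts"]]


def accumulate(total, spectrum):
    """Add one reduced spectrum into the running total."""
    return [t + s for t, s in zip(total, spectrum)]


def reduce_run(pixels, n_bins, mapper=map):
    """Reduce every pixel independently, then sum the results.

    `mapper` defaults to the built-in map; because pixels do not depend on
    each other, a parallel map with the same call shape could be swapped in.
    """
    total = [0.0] * n_bins
    for spectrum in mapper(reduce_pixel, pixels):
        total = accumulate(total, spectrum)
    return total
```

Because the running total is valid after every pixel, partial results could be inspected while data are still arriving.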
example: a C++ transform
namespace PharosMath
{
    template <typename NumT>
    void reduceSum2d( std::vector<NumT> const & vec2d,
                      std::vector<NumT> & vec1d,
                      std::vector<size_t> const & sizes,
                      size_t axis)
    ...
1 1 1 1 | 4
1 1 1 1 | 4
1 1 1 1 | 4
--------
3 3 3 3
Reduce a 2d array to 1d by summing rows or columns.
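The same reduction can be written in a few lines of pure Python to make the picture concrete. This is a sketch, not a port of the C++ routine: it returns the 1d result rather than filling an output vector, and both the row-major storage order and the meaning of each axis value are assumptions.

```python
def reduce_sum_2d(vec2d, sizes, axis):
    """Sum a flat, row-major 2d array along one axis.

    sizes is [n_rows, n_cols].
    axis=0 collapses the rows (one total per column);
    axis=1 collapses the columns (one total per row).
    """
    n_rows, n_cols = sizes
    if len(vec2d) != n_rows * n_cols:
        raise ValueError("sizes do not match the data length")
    if axis == 0:
        out = [0] * n_cols
        for r in range(n_rows):
            for c in range(n_cols):
                out[c] += vec2d[r * n_cols + c]
    elif axis == 1:
        out = [0] * n_rows
        for r in range(n_rows):
            for c in range(n_cols):
                out[r] += vec2d[r * n_cols + c]
    else:
        raise ValueError("axis must be 0 or 1")
    return out
```

Applied to the 3 by 4 array of ones pictured above, axis=1 gives the three row totals of 4 and axis=0 gives the four column totals of 3.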
reduceSum2d unit tests
[diagram: test_reduceSum2d comprises test_1 through test_5; other labels: _runTest, _compareVecs, input #1, input #2, input sizes, reduceSum2d]
exposing it to Python
template <typename NumT>
static void _callReduceSum2d( PyObject *py2dv, PyObject *py1dv,
                              std::vector<size_t> const & dimSizes,
                              size_t axis)
{
    std::vector<NumT> *p2dv =
        static_cast<std::vector<NumT>*>(PyCObject_AsVoidPtr(py2dv));
    std::vector<NumT> *p1dv =
        static_cast<std::vector<NumT>*>(PyCObject_AsVoidPtr(py1dv));
    Pharos::reduceSum2d<NumT>( *p2dv, *p1dv, dimSizes, axis);
    return;
}
calling it from Python
reduction.reduceSum2d( vector2d.handle(), vector2d.datatype(),
                       vector1d.handle(), dimSizes, whichAxis)
a more interesting Python call
def S_QE_To_S_E( sQE):
    # sQE: instance of class that encapsulates S(|Q|,E)
    # SE: class that encapsulates S(E)
    sE = SE( sQE.datatype(), sQE.numE())
    reduction.reduceSum2d( sQE.data(), sQE.datatype(), sE.data(),
                           sQE.sizes(), sQE.QAxis() )
    return sE
Implementation
Category     Progress                          To Do
Histograms   • STL routines +90%               • “high-level” classes
             • Pharos adaptors 100%            • abstract Adaptor hierarchy
             • unit testing ~50%
Instrument   • Pharos/ARCS C++ classes 100%    • abstract “NeX-ified” hierarchy
                                               • unit testing ~20%
Transforms   • S(|Q|, E) powders 98%           • revise interface
             • S(Q, E) single crystal 30%      • unit testing ~40%
Primary: C++ ~4000 lines; Python ~1800 lines
Auxiliary: Python-C bindings ~4500 lines; tests ~4800+ lines
how much effort?
• “topline” original IDL to C/C++ conversion 2-3 months
• topline reduction 10-12 weeks
Still to do
• Complete unit tests ~ 3-4 weeks
• Another iteration ~ 4-6 weeks
  – code cleaning/reorg (1)
  – iterator interfaces for transformations (1-2)
  – Python dataset classes (1-2)
  – Performance questions (1)
• Test/integrate single crystal ~ 2 weeks
• Pyre integration ~ 3+ weeks
• Related, general tasks ~ 4-5 weeks
  – NeXus C++ hierarchy (2)
  – Strengthening Python-C++ integration (2-3)