INDEX
Symbols
“ “ (double quotes), building strings using, 175
+ (string concatenation), 177, 186–187
== (equivalence operator), 175
[ ] (bracket characters), qualification of column names using, 181–182
Numbers
32-bit mode
    DTExecUI and, 790
    Visual Studio and, 791
    Windows OSs in, 474
64-bit mode, 790–791
80/20 rule (Pareto principle), 228
A
absolute references, environment references, 765
Access (Microsoft)
    64-bit support in, 415–417
    accessing source data from, 414–415, 421–427
    referencing columns in expressions within, 181
accessibility, UI design principles, 667
ACE (Access Engine), for Microsoft Office, 415–417
ACE OLE DB Provider, 415
ACH (Automated Clearing House) files
    Control Flow batch creation, 850–853
    Control Flow loop, 846–848
    Control Flow retrieval of XML file size, 848–850
    Data Flow capturing total batch items, 859–860
    Data Flow detail processing ETL, 860–861
    Data Flow parsing and error handling, 854–856
    Data Flow validation, 853–854, 856–859
    input file specification, 800
    as load package, 845
    package structure, 801
    payments via, 806–807
    setting up, 845–846
    solution architecture, 803
AcquireConnection method
    adding connection time methods to components, 595
    building Destination adapter component, 627–628
    building Source adapter component, 604–606
    defined, 272
    retrieving data from database, 273–274
    retrieving files from FTP Server, 274–275
active time, Data Flow components, 506–507
administration, of SSIS
    64-bit issues, 790–791
    basic reporting, 791–795
    catalog and, 743–744
    clustering, 768–770
    command-line utilities, 774
bindex.indd 883bindex.indd 883 2/27/12 8:37:25 AM2/27/12 8:37:25 AM
COPYRIGHTED MATERIAL
administration, of SSIS (continued)
    creating central server, 766–768
    creating database (SSISDB), 747–748
    custom reporting, 795
    data taps, 765–766
    deployment models, 748
    DTExec, 774
    DTExecUI, 775–780
    DTUtil, 780–782
    environments, 760–765
    legacy security, 785–787
    monitoring package execution, 791
    overview of, 743
    package configuration, 770–773
    package deployment, 751–757
    performance counters, 796
    project deployment, 748–751
    scheduling packages, 787–790
    securing catalog, 782–785
    setting catalog properties, 744–747
    ssis_admin role, 782–783
    summary, 796
    T-SQL for managing security, 785
    T-SQL for package execution, 757–758
    T-SQL for setting parameter values, 758–759
    T-SQL querying tables to set parameter values, 759–760
administrators, Management Studio and, 36–37
ADO
    coding SQL statement property, 76
    executing parameterized SQL statements, 69–71
    populating recordsets, 117
ADO.NET
    coding SQL statement property, 76
    Connection Manager, 595, 851–852
    creating connection for CDC tools, 399
    executing parameterized SQL statements, 69–71
    outputting Analysis Services results to, 46
    sorting data with SQL Server, 383
    source in Data Flow, 11, 115
Advanced Editor
    design-time functionality and, 589
    Import Column Transformation using, 142–143
    OLE DB Command Transformation using, 146–147
    transformation outputs and, 497–498
    user interface as alternative to, 643
    user interface overriding, 651
    viewing components with, 666
Advanced Windowing Extensions (AWE), 475
AES (Advanced Encryption Standard), 746
Aggregate Transformation
    asynchronous transformation outputs and, 498
    as blocking transformation, 496–497
    in Data Flow, 119–121
    example using, 159
Agile
    iterative development, 525
    MSF Agile, 537–539
All Executions report, 792–794
Analysis Services. See SSAS (SQL Server Analysis Services)
ANDS, in data extraction, 381–382
annotations, on packages, 32, 805
Application object
    maintaining, 683
    operations of, 682
    package management and, 683–686
    package monitoring and, 686–687
applications, interaction with external. See external applications, interaction with
architecture
    data architecture, 805–806
    scaling out, 474
    of SSIS, 5
archiving files
    creating dynamic packages, 251–252
    overview of, 52
artifacts, in SDLC, 523
ASP.NET, 727–731
assemblies
    adding to GAC, 598–602
    creating new projects, 597
    example using custom .NET, 261–264
    strong names, 646–647, 651–652
    using managed, 260–261
asynchronous transformations
    identifying, 493, 500
    vs. synchronous transformations, 119, 498–500
    writing Script components to act as, 302–305
Audit Transformation
    in Data Flow, 128–129
    handling more bad data with, 248
auditing, SSIS database, 791
authentication
    types supported, 782
    Windows Authentication and, 18
Automated Clearing House files. See ACH (Automated Clearing House) files
Autos window, script debugging using, 309–310
AWE (Advanced Windowing Extensions), 475
B
backpressure
    in SSIS 2012, 488
    staging environments for source, 512
bad data, handling, 471–473
bank file package
    Control Flow batch creation, 828–832
    Control Flow file loop, 824–825
    Control Flow retrieval of file properties, 825–828
    Data Flow capturing total batch items, 840
    Data Flow detail processing ETL, 841–845
    Data Flow parsing and error handling, 832–835
    Data Flow validation, 832, 835–839
    flat files, 801
    setting up, 819–823
BaseSelect variable, using expressions in Data Flow, 198–199
batch operations
    ACH file package, 850–853
    bank file package, 828–832
    BankBatch table, 813–814
    BankBatchDetail table, 813–814
    batch entities in case study database, 809
    Data Flow capturing total batch items, 840, 859–860
    executing batch of SQL statements, 71–72
    stored procedures for adding, 816–817
    stored procedures for balancing, 818–819
    stored procedures for updating, 818
    stored procedures for working with, 816–819
bcp.exe, inserting data into SQL Server database, 64–65
Beginning C# 3.0: An Introduction to Object Oriented Programming (Purdum), 254
benchmarks, 796
BI (Business Intelligence) platform, 1
BI xPress, Pragmatic Works, 791
BIDS (Business Intelligence Development Studio), 4
BLOB (Binary Large Objects) counters, Performance Monitor, 519–520
blocking transformations
    Data Flow design practices, 508–510
    non-blocking, streaming, and row-based transformations, 493–495
    optimizing package processing and effects of, 516
    overview of, 496–497
    semi-blocking transformations, 495–496
Boole, George, 524
Boolean expressions
    in conditional expressions, 187–188
    precedence constraints used with, 551–555
    syntax of, 182–183
Boolean literals, 180
boot.ini file, 474
bottlenecks, in Data Flow, 516–518
bracket characters ([ ]), qualification of column names using, 181–182
branching, as source control method, 546
breakpoints
    adding to Data Flow Task, 638
    enabling and using, 569–572
    setting for debugging script, 308
buffer manager
    in asynchronous component outputs, 499
    in execution trees, 502
buffers
    Data Flow memory, 492–493
    Destination adapters de-allocating data in, 501
    in execution trees, 502
    monitoring Data Flow execution, 503–505
    optimizing package processing, 513–514
    performance counters, 519–520, 796
    synchronous transformation outputs and, 499–500
Build menu, projects and, 752
BULK INSERT statement, SQL, 64
Bulk Insert Task
    adding to Control Flow, 65–66
    overview of, 64–65
    using with typical data load, 67–68
Business Intelligence (BI) platform, 1
Business Intelligence Development Studio (BIDS), 4
C
C#
    expression language and, 174
    Hello World example, 257–258
    Script Task accessing C# libraries, 43–44
    scripting with, 254
    selecting as scripting language, 255–256
Cache Connection Manager (CCM). See CCM (Cache Connection Manager)
Cache Data Sources, Lookup Transformation, 474
cache options
    limitations of SCD, 336
    in Lookup Transformation, 474
Cache Transformation
    configuring Cache Connection Manager, 229
    Data Flow and, 124
    loading Lookup Cache with, 229–230
Call Stack window, 571
Candidate Key Profiles
    Data Profiling Task, 318–319
    turning results into actionable ETL steps, 321
capture instance (shadow or change) tables, in CDC
    overview of, 394–396
    querying, 401–405
    writing entries to, 394
cascaded Lookup operations, 227–228
case sensitivity, of variables, 170, 268
case study
    ACH Control Flow batch creation, 850–853
    ACH Control Flow loop, 846–848
    ACH Control Flow retrieval of XML file size, 848–850
    ACH Data Flow capturing total batch items, 859–860
    ACH Data Flow detail processing ETL, 860–861
    ACH Data Flow parsing and error handling, 854–856
    ACH Data Flow validation, 853–854, 856–859
    ACH file for bank payments, 806–807
    ACH load package, 845
    ACH package setup, 845–846
    advantages of, 798
    background information related to company in, 798–799
    bank file Control Flow batch creation, 828–832
    bank file Control Flow file loop, 824–825
    bank file Control Flow retrieval of file properties, 825–828
    bank file Data Flow capturing total batch items, 840
    bank file Data Flow detail processing ETL, 841–845
    bank file Data Flow parsing and error handling, 832–835
    bank file Data Flow validation, 832, 835–839
    bank file package and variable setup, 819–823
    BankBatch tables, 813–815
    business problem addressed by, 799
    corporate ledger data, 815–816
    customer table, 810–811
    CustomerLookup table, 813
    data architecture, 805–806
    database model for, 808–809
    database setup for, 810
    driver package setup, 880–881
    e-mail Control Flow processing, 862–865
    e-mail Data Flow processing, 865–866
    e-mail load package, 861–862
    e-mail package setup and file system tasks, 862
    ErrorDetail table, 816
    file storage locations, 806
    interpreting the results, 879–880
    invoice table, 811–813
    load packages, 819
    lockbox files, 807–808
    matching process Control Flow, 867
    matching process high-confidence Data Flow, 870–874
    matching process (invoice matching), 867
    matching process logic, 868–870
    matching process medium-confidence Data Flow, 875–878
    matching process package setup, 867–868
    naming conventions in, 804–805
    overview of, 797
    PayPal or direct credits to corporate account, 808
    solution architecture, 801–804
    solution summary, 799–800
    stored procedures working with batches, 816–819
    summary, 800–881
    testing, 866
    tips related to package development, 805
casting
    casting operator, 169–170
    conditional expression issues, 188
catalog
    built-in reporting, 791–792
    as central storage location, 743–744
    Create Catalog command, 747–748
    executing packages deployed to, 680–681
    logging, 582–584
    Managed Object Model and, 671
    managing, 672–673
    operation logs and, 703–704
    package monitoring and, 686–687
    permissions, 784
    project, folder, and package listings, 688–689
    project deployment model and, 749–751
    securing, 782–785
    setting catalog properties, 744–747
    stored procedures securing, 785
Catalog class, 671–673
CatalogCollection class, 672–673
CatalogFolder class
    folder management with, 673–674
    overview of, 672
    server deployment project, 679
.caw file, 229–230
CCM (Cache Connection Manager)
    defined, 203
    loading Lookup Cache from any source with, 229–230
    selecting in full-cache mode of Lookup Transformation, 216
CDC (Change Data Capture)
    API, 396–398
    benefits of, 392–393
    instance tables, 394–396
CDC (Change Data Capture) (continued)
    overview of, 391–392
    preparing, 393–394
    querying, 401–405
    sources in Data Flow, 11
    using new SSIS tools, 398–401
CDC Control Task, 398–400
CDC Source, 398–400
CDC Splitter, 398–401
change management, in development, 522
Change Tracking, 392
Changing Attributes
    complex dimension changes with SCD, 331–333
    dimension tables, 323
    updates output, 333
Character Map Transformation
    column properties in user interface assembly, 665–667
    in Data Flow, 129–130
    processing bank file check and invoice details, 841–842
checkpoints
    controlling start location, 463
    creating simple control flow, 456–457
    Data Flow restart using, 476–477
    effect of containers and transactions on, 457–459
    inside checkpoint file, 461–463
    restarting packages using, 454
    variations of FailPackageOnFailure property, 459–461
child packages, 80–81
Class Library, 596
classes, scripting in SSIS, 259–260
cleansing data. See data cleansing
Cleanup method, component runtime and, 594
CLR (Common Language Runtime), 670
CLS (Command Language Specification), 602
clustering, 768–770
code
    scripting in SSIS, 259–260
    source code control, 525–526
code reuse
    copy-and-paste operation for, 259–260
    custom assemblies for, 261–264
    managed assemblies for, 260–261
CodePlex.com, 795
Collection class, 688–689
Column NULL Ratio Profile, Data Profiling Task, 319, 321
Column Pattern Profile, Data Profiling Task, 320
Column Statistics Profile, Data Profiling Task, 320–321
Column Value Distribution Profile, Data Profiling Task, 319
columns
    Copy Column Transformation, 130
    Derived Column Transformation, 121–122
    design-time methods for column data types, 591
    design-time methods for setting column properties, 592
    Export Column Transformation, 131–133
    Import Column Transformation, 142–144
    referencing in expressions, 181–182
columns, in UI
    displaying, 654–657
    properties, 665–667
    selecting, 657–661
ComboBox control, for column selection, 658–661
comma-delimited files, Flat File sources as, 110
Command Language Specification (CLS), 602
command-line
    DTExec, 774
    DTExecUI, 775–780
    DTUtil, 780–782
    executing console application in Control Flow, 81–82
    utilities, 774
comment fields, analyzing with Term Extraction Transformation, 153–156
Common Language Runtime (CLR), 670
common table expressions (CTEs), 389–391
communication mechanism, of transformations, 493
comparison operations
    casting issues in, 170
    concatenation operator in, 186–187
complex queries, writing for Change Data Capture, 391
ComponentMetaData properties, Source Component, 603–604
components
    adding connection time functionality, 594–595
    adding design-time functionality, 589–593
    adding run-time functionality, 593–594
    adding to SSIS Toolbox, 633–634
    building, 595
    building complete package, 636–637
    component-level properties in user interface, 661–663
    design time debugging, 634–636
    Destination. See Destination Component
    Pipeline Component methods and, 588–589
    preparing for coding Pipeline Components, 596–602
    Row Count Component, 309
    runtime debugging, 637–640
    Script Component. See Script Component
    separating component projects from UI (user interface), 645
    Source Component. See Source Component
    Transformation. See Transformation Component
    types of, 586
    upgrading to SQL Server 2012, 641
composite domains, DQS, 367–368
compound expressions
    conditional, 188
    creating, 174
concatenation operator (+), string functions and, 186–187
conditional expressions
    building logical evaluation expressions, 187–188
    creating, 174
conditional operator, 174
Conditional Split Transformation
    capturing total batch items, 840
    connecting to Lookup Transformation, 245–246
    handling dirty data, 244
    loading fact tables, 342
    matching process medium-confidence, 875–876
    Merge Join Transformation using, 203
    processing bank file check and invoice details, 841
    querying CDC in SSIS, 404
    scaling across machines using, 477–478
Configuration object
    overview of, 707–708
    programming, 708–709
Connection Managers
    ADO.NET, 851
    Analysis Services, 46
    building Destination Component, 625
    building Source Component, 605–606
    Cache. See CCM (Cache Connection Manager)
    defining connection characteristics, 9
    expressions in properties of, 193–194
    File, 595, 625
    flat files, 595, 822
    Foreach ADO Enumerator example, 100
    FTP, 53
    HTTP, 55–56
    OLE DB, 67, 107–108, 727, 830
    overview of, 31
    Package Designer tab for, 32
    Project, 822–823
    properties, 193–194
    SMTP, 83–84, 868
    Source adapters and, 586–587
    sources pointing to, 106–107
    values returned by, 595
    WMI, 84
connection time, adding methods to components, 594–595
connections
    coding SQL statement property according to, 76
    creating across packages, 234–236
    to data sources in Script Task, 271–279
    executing parameterized SQL statement, 69–71
Connections collection, Script Task, 272
console application, executing in Control Flow, 81–82
constraints
    evaluating, 30
    precedence constraints. See precedence constraints
containers
    container tasks, 42
    in Control Flow architecture, 8–9
    effect on checkpoints, 457–459
    Foreach ADO Enumerator example, 100–102
    Foreach File Enumerator example, 98–99
    Foreach Loop Container, 97–98
    grouping tasks into, 31
    groups vs., 95
    logging, 576–577
    For Loop Container, 95–97
    precedence constraints controlling, 550–551
    Sequence Container, 94
    for storing parameters (environments), 674–676
    summary, 103
    Task Host Container, 93
Control Flow
    adding bulk insert to, 65
    checkpoints occurring only at, 454
    completing package, 239
    connections in, 31
    containers in. See containers
    customizing item properties, 28
    Data Flow compared with, 28, 105–106, 488–491
    defining for package, 237
    evaluating tasks, 30
    example using Script Task variables for, 269–271
    expressions in precedence, 195–196
    expressions in tasks, 194–195
    handling workflows with, 491
    looping and sequence tasks, 42–43
    options for setting variables, 13
    overview of, 6
    precedence constraints, 8, 29–30, 549
    Script Task in. See Script Task
    tasks in, 6–7, 194–195
    Toolbox tabs related to, 27–28
Control Flow, in case study
    ACH file batch creation, 850–853
    ACH file loop, 846–848
    ACH file retrieval of XML file size, 848–850
    bank file batch creation, 828–832
    bank file loop, 824–825
    bank file retrieval of file properties, 825–828
    e-mail package, 862–865
    invoice matching process, 867–870
control table, in parallel loading, 479–480
conversion
    rules for date/time types, 166–167
    Unicode and non-Unicode data type issues, 167–169
    using Data Conversion Transformation, 121
Copy Column Transformation, 130
copy-and-paste operation, code reuse with, 259–260
copy-on-first-write technology, database snapshots, 406
corporate ledger data, 815–816
correlation operations, in Data Flow design, 510–511
counters, Performance Monitor, 518–520
CPU cost, 376
Create Catalog command, 747–748
credentials, Windows Authentication and, 18
cross-data flow communication, 115
cross-package communication, 115
CTEs (common table expressions), 389–391
cubes, processing, 46
customers
    Customer table, 810–811
    CustomerLookup table, 813
    database entities in case study, 809
customizing SSIS
    adding connection time functionality, 594–595
    adding design-time functionality, 589–593
    adding run-time functionality, 593–594
    building complete package, 636–637
    building components, 595
    building Destination Component, 625–633
    building Source Component, 602–614
    building Transformation Component, 614–625
    debugging components, 634–636
    Destination Component, 588
    installing components, 633–634
    overview of, 585–586
    Pipeline Component methods and, 588–589
    preparing for coding Pipeline Components, 596–602
    runtime debugging, 637–640
    Source Component, 586–587
    summary, 641
    Transformation Component, 587
    UI component, 667
    upgrading components to SQL Server 2012, 641
D
Dashboard report, 794
data cleansing
    analyzing source data for. See data profiling
    Derived Column use, 354–357
    DQS (Data Quality Services), 366–370
    DQS Cleansing Transformation, 131, 370–373
    error outputs and, 471–473
    Fuzzy Grouping, 363–365
    Fuzzy Lookup, 357–363
    overview of, 353–354
    sources in this book for, 322
    summary, 373
    transformations in Data Flow design, 511–512
Data Conversion Transformation
    in Data Flow, 122
    in Excel Source, 109
    Unicode and non-Unicode data type issues, 168–169
Data Definition Language (DDL)
    defining Data Flow for package, 238
    Execute DDL Task, 45
Data Encryption Standard (DES), 746
data extraction
    Data Flow restart using, 476
    JOINS, UNIONS and subqueries in, 381–382
    modularizing, 384–385
    overview of, 376
    SELECT * problem in, 376–377
    set-based logic in, 389–391
    sorting databases, 382–384
    sources in this book for, 322
    SQL Server and text files, 385–389
    transformations during, 378–381
    WHERE clause tool in, 377–378
Data Flow
    connections, 31
    Control Flow compared with, 28
    creating for package, 237–239, 242
    customizing item properties, 28
    data taps for viewing data in, 765–766
    data viewers, 106
    destinations in. See destinations
    error handling and logging, 14
    Error Row Configuration properties, 572
    example, 157–160
Data Flow (continued)
    expressions in, 197–200
    matching process high-confidence, 870–874
    matching process medium-confidence, 875–878
    NULL values in, 183, 185
    overview of, 9–10
    performing query tuning when developing, 378–381
    pipeline and, 586
    restart, 475–477
    scripting in. See Script Component
    sources in. See sources
    summary, 160
    synchronous vs. asynchronous transformations, 294–302
    transformations in. See transformations
    understanding, 105–106
    working with, 34
Data Flow, in case study
    ACH file capturing total batch items, 859–860
    ACH file detail processing ETL, 860–861
    ACH file parsing and error handling, 854–856
    ACH file validation, 853–854, 856–859
    bank file capturing total batch items, 840
    bank file detail processing ETL, 841–845
    bank file parsing and error handling, 832–835
    bank file validation, 832, 835–839
    e-mail package, 865–866
Data Flow engine
    comparing with Control Flow, 488–491
    data processing in Data Flow, 491–492
    design practices, 508–513
    execution trees, 501–503
    handling workflows with Control Flow, 491
    memory buffer architecture, 492–493
    monitoring execution, 503–505
    optimizing package processing, 513–516
    overview of, 487–488
    pipeline execution reporting, 506–507
    pipeline execution tree log details, 505–506
    pipeline performance monitoring, 518–520
    SSIS engine, 488
    summary, 520
    transformations types, 493–501
    troubleshooting performance bottlenecks, 516–518
Data Flow Task
    adding to Control Flow, 34
    breakpoints added to, 638
    Data Flow restart using, 476–477
    defining Data Flow for package, 237–239
    Foreach ADO Enumerator example, 102
    implementing as checkpoint, 454
    For Loop Container, 97
    overview of, 10, 47–48
    in parallel loading, 483–484
    querying CDC in SSIS, 403
    referencing columns in expressions within, 181–182
data loading
    database snapshots and, 406–408
    MERGE operator and, 408–411
data mining
    Analysis Services tasks, 45
    Data Mining Query Task, 46–47
    mining objects, 46
Data Mining Extension (DMX), 47, 130–131
Data Mining Model Training Destination, 118
Data Mining Query Task, 46–47
Data Mining Query Transformation, 130–131
data pipeline architecture, parallelism, 474
data preparation tasks
    archiving files, 52
    Data Profiling Task, 48–50
    File System Task, 50–51
    FTP Task, 53–55
    overview of, 48
    Web Service Task, 55–60
    XML Task, 60–64
data processing, in Data Flow, 491–492
Data Profile Viewer, 318–321
data profiling
    defined, 315
    executing Data Profiling Task, 315–317
    overview of, 48–50
    turning results into actionable ETL steps, 321
    viewing results of Data Profiling Task, 317–321
Data Profiling Task
    initial execution of, 315–317
    overview of, 48–50
    viewing results of, 317–321
Data Quality Services. See DQS (Data Quality Services)
data scrubbing. See mainframe ETL, with data scrubbing
data sharpening, 378
data sources. See sources
data stores, 706
data taps, 765–766
Data Transformation Services. See DTS (Data Transformation Services)
data types
    configuring in Flat File sources, 111–112
    date and time support, 166
    design-time methods for column data types, 591
    Destination Component, 630–631
    impact on performance, 167
    mapping and converting as needed, 20
    parameters, 172–173
    Source Component, 608–609
    SSIS, 164–166
    tips related to working with large projects, 805
    Transformation Component, 618–619
    understanding, 164
    Unicode and non-Unicode conversion issues, 167–169
    variables, 172–173, 184
Data Viewer
    adding to Fuzzy Grouping Transformation, 364–365
    adding to Fuzzy Lookup, 360–361
    benefits of, 805
    CDC Splitter outputs, 401
    data taps and, 765–766
    overview of, 106
    querying CDC in SSIS, 405
    script debugging using, 309
    using relational join in source, 210
The Data Warehouse Toolkit (Kimball and Ross), 323
data warehouses
    data extraction and cleansing, 322
    data profiling. See data profiling
    dimension table loading. See dimension table loading
    fact table loading, 337–344
    overview of, 313–315
    SSAS processing, 345–350
    summary, 351
    using Master ETL package, 350–351
database
    building basic package for joining data, 207–209
    creating, 747–748
    retrieving data, 272–274
    snapshots, 406–408
    sorting data, 382–384
    Transfer Database Task, 88–89
    Transfer Error Messages Task, 89
    Transfer Logins Task, 89–90
    transferring SQL Server objects, 91–92
database, for case study
    BankBatch tables, 813–815
    corporate ledger data, 815–816
    customer table, 810–811
    CustomerLookup table, 813
    data architecture and, 805–806
    ErrorDetail table, 816
    invoice table, 811–813
    model used, 808–809
    setup, 810
    stored procedures working with batches, 816–819
DataReader Destination, 118, 728
DataView controls
    column display, 654
    column selection, 657–658
date
    adding new columns for Change Data Capture, 391
    data types, 166
    functions for expressions, 188–190
DatePart() expression function
    Boolean expressions and, 182
    overview of, 188–189
    string functions, 186
    T-SQL function vs., 175
DBAs, 521
DDL (Data Definition Language)
    defining Data Flow for package, 238
    Execute DDL Task, 45
debug mode, package execution and, 240
debugging
    breakpoints. See breakpoints
    components at design time, 634–636
    components at runtime, 637–640
    interacting with external applications and, 720
debugging, script
    Autos, Locals, and Watch windows, 309–310
    breakpoints, 308
    Immediate window, 310–311
    overview of, 308
    Row Count Component and Data Viewers, 309
de-duplication, in Fuzzy Grouping Transformation, 363–365
DELETE statements, 408–411
deployment
    of custom .NET assembly, 263
    executing packages deployed to catalog, 680–681
    executing packages with T-SQL, 736–737
    models, 748
    package model. See package deployment model
    project model. See project deployment model
    server deployment, 679–680
    utility for, 751–752, 754
deployment manifest, creating, 751–752
Derived Column Transformation
    advanced data cleansing with, 354–357
    as alternative to SCD, 336
    Audit Transformation compared with, 129
    configuring Lookup Transformation with, 225
    in Data Flow, 122–123
    example using, 158
    expressions and, 199–200
    handling dirty data, 243
    InfoPath example, 724–725
    loading fact tables, 338–339
    processing bank file check and invoice details, 841–843
DES (Data Encryption Standard), 746
DescribeRedirectedErrorCode method, 594
design practices, Data Flow
    data cleansing and transformation, 511–512
    data integration and correlation, 510–511
    leveraging Data Flow, 509–510
    overview of, 508–509
    staging environments, 512–513
design time
    adding methods to components, 589–593
    Advanced Editor and, 589
    component phases, 588
    creating package parameters, 172
    debugging components, 634–636
    defining variables, 170
    Transformation Component methods, 615–620
Destination adapters
    as integral to Data Flow, 500–501
    troubleshooting bottlenecks in Data Flow by removing, 516–517
Destination Assistant, 107, 237–238
Destination Component
    AcquireConnection method, 627–628
    ComponentType property and, 598
    configuring Script Component Editor, 289–290
    Connection Managers and, 625
    debugging, 634–636
    defined, 288
    installing, 633–634
    overview of, 588
    PreExecute method, 631–633
    ProcessInput method, 631–632
    ProvideComponentProperties method, 626–627
    ReinitializeMetaData method, 629–630
    SetUsageType method, 630–631
    types of pipeline components, 586
    Validate method, 628–629
DestinationConnection property, Foreach File Enumerator example, 99
destinations
    connectivity to, 719
    creating destination table, 20
    in Data Flow, 13, 88–89
    Data Mining Model Training, 118
    DataReader, 118
    Dimension and Partition Processing, 118
    dragging DataReader to Data Flow, 728
    Excel, 116
    Flat File, 116
    function of, 106
    OLE DB, 116–117, 843
    overview of, 115–116
    Raw File, 117
    Recordset, 117
    selecting for bulk insert, 65
    specifying in Import and Export Wizard, 19
    SQL Server and Mobile, 118
    troubleshooting bottlenecks in Data Flow, 516–517
development
    custom. See customizing SSIS
    software development. See SDLC (software development life cycle)
Diff operation, 721
Diffgram, 721
Dimension and Partition Processing Destination, 118
dimension table loading
    complex tables, alternatives to SCD Transformation, 335–336
    complex tables, preparing data, 327–331
    complex tables, using SCD Transformation, 331–335
    overview of, 332–333
    simple tables, 323–327
dimensions
    Dimension and Partition Processing Destination, 118
    processing, 46
    solving changing dimensions with SCD Transformation, 126
directives, creating new projects, 597
directories
    creating, 51
    polling for file delivery, 86–87
dirty data
    cleansing. See data cleansing
    handling, 242–246
disk I/O, 508–510
Distributed Transaction Coordinator Transactions. See DTC (Distributed Transaction Coordinator) Transactions
DMX (Data Mining Extension), 47, 130–131
Document Type Definitions (DTDs), 61
documents, MSF Agile, 538
domains, DQS
    DQS Cleansing Transformation and, 370–373
    overview of, 367–368
double quotes (“ “), building strings using, 175
DQS (Data Quality Services)
    as alternative to Integration Services, 799
    Cleansing Transformation, 131, 370–373
    data cleansing workflow of, 366–370
    KB (Knowledge Base), 366
    overview of, 366
driver package setup, 880–881
DT_DBDATE data type, 164–166
DT_DBTIME data type, 164–166
DT_DBTIME2 data type, 166
DT_DBTIMESTAMP2 data type, 166
DT_DBTIMESTAMPOFFSET data type, 166
DT_NUMERIC data type, 178
DT_UI4 data type, 178
DTC (Distributed Transaction Coordinator) Transactions
    defined, 463–464
    single package, multiple transactions, 466–468
    single package, single transaction, 464–466
    two packages, one transaction, 468–469
DTDs (Document Type Definitions), 61
DTExec
    32 and 64-bit versions, 791
    32-bit runtime executables in 64-bit mode, 416–417
    debugging components, 634
    executing packages, 774
    runtime debugging, 637–640
DTExecUI
    as 32-bit application, 790
    executing packages, 775–780
DTS (Data Transformation Services)
    Import and Export Wizard and, 2
    package failure. See package restartability
    runtime managed code library, 676
    SSIS compared with, 1–2
Dts object
    accessing variables in Script Task, 267–268
    configuring Script Task Editor, 265–266
    connecting to data sources in Script Task, 272–279
    overview of, 287–288
DtsDebugHost.exe, 638
DtsPipelineComponent attribute, 598
DTUtil, 780–782
dump and reload, for Change Data Capture, 391
dynamic packages, 162, 250–252
E
Edit Script button, 255–256
editors
    Advanced Editor. See Advanced Editor
    FTP Task Editor, 53
    Precedence Constraint Editor, 29
    Property Expressions Editor, 41
    Script Component Editor, 289–291
    Script Task Editor, 265–266
    task editors, 39–41, 50, 65
    Term Extraction Transformation Editor, 153–154
e-mail, Send Mail Task, 83
e-mail package
    Control Flow processing, 862–865
    Data Flow processing, 865–866
    as load package, 861–862
    payments via, 801
    setup and file system tasks, 862
encryption
    algorithms, 745–746
    data protection, 21
    re-encrypting all packages in a directory, 781
end-to-end packages. See package creation
EngineThreads property, Data Flow, 503
enumerators
    Foreach ADO Enumerator example, 100–102
    Foreach File Enumerator example, 98–99
    Foreach Loop Container, 97–98
environment references
    absolute and relative, 765
    configuring projects to use environments, 763
    EnvironmentReference object, 681–682
environment variables
    EnvironmentVariable class, 674
    package configuration and, 706
    referenced during package execution, 749
EnvironmentInfo class, 674
EnvironmentReference object, 681–682
environments
    configuring project to use, 763–765
  containers for storing parameters, 674–676
  creating and configuring project level parameters, 761
  Data Flow design practices for staging, 512–513
  Managed Object Model and, 671
  migrating packages between, 773
  overview of, 760
  package configuration and, 771
  referencing, 681–682
  setting up, 761–762
  setting up environment references, 765
  variables referenced during package execution, 749
equivalence operator (==), 175
error handling
  ACH file package, 854–856
  advanced precedence constraints, 551
  bank file package, 832–835
  basic precedence constraints, 549–551
  Boolean expressions used with precedence constraints, 551–555
  breakpoints, 569–572
  building Transformation Component and, 623–624
  catalog logging, 582–584
  combining expressions and multiple precedence constraints, 556–557
  error rows and, 572–576
  ErrorQueue table, 248–249
  in Excel Destination, 116
  log events, 577–581
  logging, 576–577
  logging providers, 577
  with Merge Transformation, 144
  in OLE DB Source, 109
  overview of, 14, 549
  staged data in, 513
  summary, 584
  user interface assembly and, 663–665
  working with multiple precedence constraints, 555–556
error messages
  in Excel Destination, 116
  Lookup Transformation and, 207, 223–226
  with Merge Transformation, 144
  in OLE DB Source, 109
  Transfer Error Messages Task, 89
error outputs, 471–473
error rows
  error handling in Data Flow, 572
  example demonstrating use of, 573–576
  table of error handlers and descriptions, 573
ErrorDetail table, 816
ErrorQueue table, SQL Server, 248–249
escape sequences, string literals, 179–180
ETL (extraction, transformation, and loading)
  ACH file detail processing, 860–861
  bad data handling with Fuzzy Lookup Transformation, 133–138
  bank file detail processing, 841–845
  data transformation aspect of, 47
  development and, 523
  Import and Export Wizard, 3
  mainframe ETL. See mainframe ETL, with data scrubbing
  Master ETL package, 350–351
  SSIS as ETL tool, 1–2, 5
  tasks in SSIS, 6–7
  team preparation and, 522–523
  turning Data Profile results into actionable ETL steps, 321
Evaluation Operations, in precedence constraints, 552
event handling
  breakpoints, 569–572
  catalog logging, 582–584
  events available at package level, 558–560
  inheritance, 567–569
  log events, 577–581
  logging and, 576–577
  logging providers, 577
  OnError events, 565–566
  OnPreExecute events, 567
  overview of, 557–558
  responding to events in Script Task, 283–284
  summary, 584
  working with event handlers, 34–35, 560–565
event logs
  log providers, 700–701
  programming to log providers, 703
  specifying events to log, 701–702
events
  available at package level, 558–559
  custom, 560
  defined, 281
  log events, 577–581
  log provider for Windows events, 577
  logging, 284–286, 576–577
  methods for firing, 281
  monitoring pipeline logging, 503–505
  OnError events, 565–566
  OnPreExecute events, 567
  raising in Script Component, 292–293
  raising in Script Task, 281–283
  responding to in Script Task, 283–284
  WMI Event Watcher Task, 86
Excel (Microsoft)
  64-bit support in, 110, 415–417
  accessing source data from, 414–415, 417–421
  destinations in Data Flow, 116
  executing parameterized SQL statement, 69–71
  expressions similar to cells in, 163
  referencing columns in expressions within, 181
  sources in Data Flow, 10, 109–110
EXCEPT, set-based logic for extraction, 389–391
exception handling. See also error handling, 305–308
exception logs, 703
Execute Package Task
  master ETL package and, 350–351
  overview of, 80–81
  package execution, 240
  scaling out memory pressures with, 475
Execute Package window, 507
Execute Process Task
  overview of, 81–82
  SSAS cube processing with, 345, 349
Execute SQL Task
  ADO.NET properties, 851–852
  capturing multi-row results, 73–75
  capturing singleton results, 72–73
  coin toss example, 552–555
  combining expressions and multiple precedence constraints, 556–557
  completing packages, 239
  creating simple Control Flow, 455–457
  e-mail Control Flow processing, 863
  executing batch of SQL statements, 71–72
  executing parameterized SQL statements, 69–71
  executing stored procedures, 75–78
  expressions in, 194–195
  Foreach ADO Enumerator example, 100–101
  matching process logic and, 868–870
  OLE DB properties and, 830
  overview of, 68–69
  in parallel loading, 481–484
  project deployment model and, 750
  retrieving output parameters from stored procedures, 78–80
execution, package
  from command-line with DTExec, 774
  from command-line with DTExecUI, 775–780
  monitoring Data Flow, 503–505
  monitoring execution, 791
  overview of, 240, 493–495
  total time in Data Flow vs. Control Flow, 490–491
  T-SQL for, 736–737, 757–758
Execution Results tab, 281–282
execution trees, Data Flow
  monitoring, 503–505
  optimizing package processing, 513–515
  overview of, 501–503
  pipeline log details, 505–506
  pipeline reporting, 506–507
ExecutionOperation objects, 686
explicit variable locking, in Script Task, 267
Export Column Transformation
  in Data Flow, 131–133
  optimizing processing with, 515
  task, 132
expression adorners, 163
Expression Builder
  creating dynamic packages, 251
  opening, 251
  referencing parameters, 181
  referencing variables, 180–181
  working with, 175–176
Expression Task, for setting variables, 13, 196–197
expressions
  Boolean expressions, 182–183
  Boolean literals, 180
  C#-like syntax of, 174–175
  casting, 169–170
  column references, 181–182
  combining with precedence constraints, 556–557
  conditional expressions, 187–188
  configuring Derived Column Transformation, 121–122
  in Connection Manager properties, 193–194
  in Control Flow, 194–196
  in Data Flow, 197–200
  data types, 164–170
  date and time functions, 188–190
  dealing with NULLs, 183–185
  dynamic package objects and, 162
  equivalence operator, 177
  evaluating, 30
  Expression Builder, 175–176
  Expression Task, 196–197
  Foreach ADO Enumerator example, 100
  line continuation, 177–178
  in Lookup Transformation, 226
  numeric literals, 178–179
  overview of, 163–164, 190
  parameter data types, 172–173
  parameter definition, 171–172
  parameter reference, 181
  parameters as, 162–163, 191–193
  reading string data conditionally, 121
  setting task properties at runtime, 40–41
  string concatenation, 177
  string functions, 185–187
  string literals, 179–180
  summary, 200
  variable data types, 172–173
  variable definition, 170–171
  variable references, 180–181
  variables as, 162, 191–193
Expressions tab, task editors, 40–41, 265
Extensible Markup Language. See XML (Extensible Markup Language)
Extensible Stylesheet Language Transformations (XSLT), 61, 722
external applications, interaction with
  InfoPath data source, 720–726
  outputting to ASP.NET, 727–731
  overview of, 719–720
  summary, 736–741
  T-SQL for package execution, 736–741
  Winform application for dynamic property assignment, 731–736
external management, of SSIS
  application object maintenance operations, 683
  catalog management, 672–673
  Configuration object and, 707–709
  deployment project model, 676–677
  DTS runtime managed code library, 676
  EnvironmentReference object, 681–682
  environments, 674–676
  event logging, 701–702
  executing packages deployed to catalog, 680–681
  folder management, 673–674
  LogProviders collection object and, 702–703
  managed code in, 670
  Managed Object Model code library, 671–672
  operation logs in SQL Server 2012, 703–705
  package configurations, 705–707
  package log providers, 699–701
  package maintenance, 684–686
  package management example, 689–699
  package monitoring, 686–687
  package operations, 682–684
  parameter objects, 677–678
  project, folder, and package listings, 688–689
  server deployment, 679–680
  setting up demonstration package for, 670–671
  summary, 716–718
  WMI Data Reader Task example, 710–715
  WMI Data Reader Task explained, 709–710
  WMI Event Watcher Task example, 716–718
  WMI Event Watcher Task explained, 715
  WMI task overview, 709
extraction. See data extraction
extraction, transformation, and loading. See ETL (extraction, transformation, and loading)
F
fact table, data warehouses and, 337–344
Fail Component, Lookup Transformation, 223, 225
FailPackageOnFailure property
  checkpoints and, 454
  creating simple control flow, 455–457
  variations of, 459–461
Failure value, constraints, 8
False
  Boolean expressions and, 182–183
  Boolean literals and, 180
  in conditional expressions, 187–188
fast load option, OLE DB Destination, 117
FastParse option, Flat File Source, 113–114
File Connection Manager
  building Destination Component, 625
  values returned by, 595
file system deployment, 752
File System Task
  ACH file package, 848–850
  archiving files, 52
  bank file batch creation, 828–832
  bank file package, 823, 825–828
  basic file operations, 50–51
  e-mail package, 862
  Foreach File Enumerator example, 99
File Transfer Protocol. See FTP (File Transfer Protocol)
files
  ACH. See ACH (Automated Clearing House) files
  archiving, 52, 251–252
  bank files. See bank file package
  checkpoint. See checkpoints
  copying assembly file into GAC, 263
  flat. See flat files
  generating unique filenames, 259–260
  lockbox files. See lockbox files
  locking for editing and committing changes, 531–532
  operations, 50–51
  polling a directory for file delivery, 86–87
  raw. See raw files
  represented in Solution Explorer, 27
  retrieving from FTP Server, 54–55, 274–275
  storage locations, 806
  text. See text files
  XML. See XML files
FileUsageType property, building Source Component, 605–606
fixed attributes, 331–332, 335
Flat File Destination
  in Data Flow, 116
  example using, 160
  Merge Join Transformation using, 211
Flat File Source
  Advanced page, 111–112
  Columns page, 111
  defined, 10
  exporting batches of text files, 385
  FastParse option, 113–114
  generating Unpivot Transformation, 151
  Import Column Transformation using, 142–143
  MultiFlatFile Connection Manager, 114
  overview of, 31, 110
  SQL Server data types and, 113
  text qualifier option, 110–111
flat files
  accessing source data from, 414, 442–447
  Connection Managers, 595, 822
  creating connection for, 235
folders
  data architecture, 806
  granting user access to, 783
  managing with CatalogFolder class, 673–674
  removing from catalog, 675
For Loop Container
  coin toss example, 552–555
  combining expressions and multiple precedence constraints, 556–557
  overview of, 95–97
  in parallel loading, 480–481
  tasks, 42–43
Foreach ADO Enumerator, 97, 100–102
Foreach File Enumerator, 97
Foreach Loop Container
  ACH file package and, 847
  creating loop with, 250
  Foreach ADO Enumerator example, 100–102
  Foreach File Enumerator example, 98–99
  lockbox files and, 824
  overview of, 97–98
  tasks, 42
forms
  building UI form, 653
  modifying form constructor, 653–654
  steps in building UI (user interface), 644
FTP (File Transfer Protocol)
  Connection Manager, 53
  FTP Task, 54–55
  FTP Task Editor, 53
  package deployment via, 53
  retrieving file from FTP server, 54–55, 274–276
full-cache mode, Lookup Transformation
  Cache Connection Manager option in, 230
  in cascaded Lookup operations, 227–228
  data preparation for complex dimension table, 329
  defined, 474
  features of, 205
  overview of, 202
  partial-cache mode option, 220–222
  trade-off between no-cache mode and, 220
  working in, 216–219
fully blocking transformations, 119
fully qualified variable names, Script Task, 268
Functional Dependency Profile, 319
functions
  Change Data Capture, 396–398
  date and time, 188–189
  expression, 174–175
  string, 185–187
Fuzzy Grouping Transformation
  advanced data cleansing with, 363–365
  in Data Flow, 138–141
  defined, 357
Fuzzy Lookup Transformation
  adding Data Viewer to, 360–361
  Advanced tab, 135, 359
  Columns tab, 135, 358
  connection to SQL Server database, 361–362
  defined, 357
  example of, 136–138
  handling bad data with, 133–134
  matching process high-confidence, 872–873
  matching process medium-confidence, 875–876
  output to, 134
  Reference Table tab, 134–135, 358
G
GAC (global assembly cache)
  adding assemblies to, 598–602
  copying assembly file into, 263
  installing user interface assembly in, 645–646
  Managed Object Model and, 671
  using managed assemblies, 260
gacutil.exe, 600, 652
GateKeeperSequence expression, for Control Flow precedence, 195–196
global assembly cache. See GAC (global assembly cache)
GridView controls
  column display, 654
  column selection, 657–658
  displaying SSIS data with ASP.NET control, 729
groups
  containers vs., 95
  Fuzzy Grouping, 138–141, 363–365
  highlighting tasks to create, 95
  task groups, 31, 94
GUI, managing security with, 783–784
H
Header Derived Column Transformation, 844
Hello World example, of SSIS scripting, 257–258
helper methods, Source Component, 608
heterogeneous data
  Access, 415–417, 421–427
  Excel, 415–421
  flat files, 442–447
  ODBC, 447–449
  Oracle, 427–430
  other sources, 450
  overview of, 413–415
  summary, 451
  XML and Web Services, 431–442
historical attribute, 331–334
horizontal partitioning, 477–478
HTTP Connection Manager, 55–56
HttpConnection property, Web Service Task, 56
hubs, creating central SSIS server, 766–768
I
IBM MQ Series, 82
icons, expression adorner and, 163
IDTSComponentEvents interface, 281
IDtsComponentUI interface
  Delete method, 648
  Edit method, 649–651
  Help method, 648
  implementing, 647–648
  Initialize method, 649
  New method, 648–649
  steps in building UI (user interface), 644
IF.THEN logic, in conditional expressions, 187–188
Ignore Failure, Lookup Transformation, 223–224
Immediate window, script debugging using, 310–311
implicit variable locking, in Script Task, 267
Import and Export Wizard
  as basic tool in ETL world, 3
  creating destination table, 20
  DTS and, 2
  moving data from sources, 17
  opening and selecting source in welcome screen, 18
  options for saving and executing package, 20–22
  specifying destination for data, 19
Import Column Transformation
  in Data Flow, 142–144
  optimizing processing with, 515
  saving file snapshots and, 844
inferred members
  fact tables and, 344
  SCD and, 332–333
  updates output, 335
InfoPath data source, 720–726
inheritance
  components and, 588
  event handling and, 567–569
Input tab, Web Service Task, 56
input verification
  design-time methods, 590
  Transformation Component, 619–620
Insert Destination
  complex dimension changes with SCD, 333
  limitations of SCD, 336
  optimizing SCD packages, 336
INSERT statements, MERGE operator for, 408–411
Integration Services. See SSIS (SQL Server Integration Services), introduction to
IntegrationServices class, 671–672
INTERSECT, set-based logic for extraction, 389–391
invoices, in case study
  as database entity, 809
  Invoice table, 811–813
  matching process Control Flow, 867
  matching process high-confidence Data Flow, 870–874
  matching process logic, 868–870
  matching process medium-confidence Data Flow, 875–878
  matching process package setup, 867–868
I/O cost, 376–377
ISNULL() expression function
  setting NULL values in Data Flow, 183, 185
  T-SQL function vs., 175
IsSorted property, data sources, 383
iterative methodology
  in MSF Agile, 537–539
  in SDLC, 525
J-K
JET (Joint Engine Technology)
  JET engine, 415
  OLE DB Provider, 415
jobs, SQL Server Agent, 91
joins
  contrasting SSIS and relational joins, 203–206
  in data extraction, 381–382
  overview of, 201–202
  summary, 231
joins, with Lookup Transformation
  building basic package, 207–209
  with cascaded operations, 227–228
  with CCM and Cache Transform, 229–230
  with expressionable properties, 226
  features of, 206–207
  in full-cache mode, 216–219
  in multiple outputs mode, 223–226
  in no-cache mode, 219–220
  overview of, 202–203
  in partial-cache mode, 220–222
  using relational join in source, 209–211
joins, with Merge Join Transformation
  building packages, 211–212
  overview of, 144–145
  retrieving relational data, 212–214
  specifying sort order, 214–216
  working with, 203
L
labeling (striping) source versions, 547–548
legacy security, 785–787
libraries
  Class Library, 596
  DTS runtime managed code library, 676
  Managed Object Model code library, 671–672
  of views, 745
line continuation characters, expression syntax and, 177–178
lineage number, referring to columns by, 182
LineageIDs
  asynchronous transformation outputs and, 498–499
  Source adapters and, 500
  synchronous transformation outputs and, 499–500
  transformation outputs and, 498
literals
  Boolean, 180
  numeric, 178–179
  string, 179–180
load packages, in case study
  ACH file package. See ACH (Automated Clearing House) files
  bank file package. See bank file package
  e-mail package. See e-mail package
loading
  Data Flow restart using, 476
  data warehouse. See data warehouses
  Lookup Cache from any source, 229–230
  scaling out using parallel, 479–485
localization, UI design principles, 667
Locals window, script debugging using, 310
lockbox files. See also bank file package
  looping, 824
  parsing and error handling, 832–835
  saving file snapshot to database, 844–845
  solution architecture for case study, 803
  specification for input files, 800
  structure of, 807–808
Log method, Dts object, 287–288
log providers
  overview of, 699–701
  programming for, 702–703
  in SSIS, 700–701
logging
  catalog logs, 582–584
  designing logging framework, 523
  event logging, 284–286, 577–581, 701–702
  LOGGING_LEVEL parameter, 739
  LogProviders collection object, 702–703
  monitoring pipeline events, 503–505
  operation logs in SQL Server 2012, 703–705
  overview of, 14, 576–577
  package log providers, 699–701
  pipeline execution reporting, 506–507
  pipeline execution tree log details, 505–506
  providers of, 577
  writing log entry in Script Component, 293–294
  writing log entry in Script Task, 287–288
logical AND, 29–30
logical expressions
  casting issues in, 170
  using with precedence constraints, 29–30
logical OR, 29–30
login, database, 89–90
LogProviders collection object, 702–703
Lookup Transformation
  ACH Data Flow validation, 856
  as alternative to SCD for dimension table data, 336
  building basic package, 207–209
  caching optimized in, 474
  caching smallest table in, 212
  with cascaded operations, 227–228
  with CCM and Cache Transform, 229–230
  for complex dimension table, 328–330
  in Data Flow, 123–124
  with expressionable properties, 226
  features of, 206–207
  in full-cache mode, 216–219
  Fuzzy Lookup compared with, 133–138, 357
  handling dirty data, 245–246
  loading fact table, 338
  matching process high-confidence, 871
  matching process medium-confidence, 876
  in multiple outputs, 223–226
  in no-cache mode, 219–220
  in partial-cache mode, 220–222
  relational joins compared with, 203–205
  relational joins performed with, 202–203
  relational joins used in source, 209–210
  for simple dimension table, 324–325
  Term Lookup, 156–157
looping
  ACH file package, 846–848
  bank file package, 824–825
  CCM enabling reuse of caches across iterations, 230
  Foreach Loop Container, 97–98
  For Loop Container, 95–97
  tasks, 42–43
LTRIM function, Conditional Split Transformation, 244
M
magic numbers, converting to NULLs, 378
mail servers, SMTP, 868
Main() function, Hello World example, 257–259
mainframe ETL, with data scrubbing
  creating Data Flow, 242
  finalizing, 246–247
  handling dirty data, 242–246
  handling more bad data, 247–249
  looping, 250
  overview of, 241–242
  summary, 252
maintenance, package, 684–686
Manage_Object_Permissions, granting, 783
managed assemblies
  code reuse and, 260–261
  using custom .NET assemblies, 261–264
managed code
  catalog management, 672–673
  deployment project model, 676–677
  DTS runtime managed code library, 676
  EnvironmentReference object, 681–682
  environments, 674–676
  executing packages deployed to catalog, 680–681
  external management of SSIS with, 670
  folder management with, 673–674
  Managed Object Model code library, 671–672
  overview of, 670
  parameter objects, 677–678
  server deployment, 679–680
  setting up demonstration package for, 670–671
Managed Object Model. See MOM (Managed Object Model)
Management Studio
  creating Customer table with, 810–811
  creating table with, 731–732
  overview of, 36–37
  package deployment with, 754
MapInputColumn/MapOutputColumn methods
  design-time methods, 590
  Source Component, 611
mapping
  defining Data Flow for package, 238–239
  DQS Cleansing Transformation and, 371–372
  handling more bad data, 249
  loading fact table, 342–343
  sources to destinations, 20
  variable data types to SSIS Data Flow types, 172–173
master ETL packages, 350–351
memory
  buffers, 492–493
  Data Flow and, 105–106
  design practices for, 508–510
  increasing in 32-bit Windows OS, 474
  Merge Join Transformation and, 203
  monitoring in blocking transformations, 508–509
  pipeline processing occurring in, 474–475
  transformations working in, 119
Merge Join Transformation
  as alternative to SCD for dimension table data, 336
  in Data Flow, 144–145
  features of, 203
  InfoPath example, 723–726
  loading fact tables, 339, 341–342, 344
  Lookup Transformation compared with, 203
  matching process high-confidence, 873
  matching process medium-confidence, 875–876
  pre-sorting data in, 127
  processing bank file check and invoice details, 842
  relational joins compared with, 203–204, 206
  semi-blocking nature of, 495–496
  working with, 211–216
Merge operation
  as source control method, 546–547
  XML Task, 721–722
MERGE operator, for mixed-operation data loads, 408–411
Merge Transformation
  in Data Flow, 144
  pre-sorting data in, 127
  semi-blocking nature of, 495–496
Message Queue Task
  For Loop Container, 95
  overview of, 82–83
messaging systems, 82–83
methodology, in SDLC
  iterative, 525
  overview of, 523
  waterfall, 524
Microsoft Access. See Access
Microsoft Excel. See Excel
Microsoft Message Queuing (MSMQ), 82–83
Microsoft Office, 720–726
Microsoft Solution Framework, 525
Microsoft Team Foundation Server, 526, 533–536
mining models, training, 118
mining objects, processing, 46
miss-cache feature, Lookup Transformation, 206–207, 221–222
modularize, in data extraction, 384–385
MOM (Managed Object Model)
  catalog management, 672–673
  code library, 671–672
  deployment projects, 676–677
  environment references, 681–682
  environments, 674–676
  executing packages deployed to catalog, 680–681
  folder management, 673–674
  package parameters, 678
  server deployment, 679–680
monitoring
  built-in reporting, 791–795
  custom reporting, 795
  Data Flow execution, 503–505
  package execution, 791
  packages, 686–687
MQ Series, IBM, 82
MSF Agile
  documents, 538
  overview of, 537
  reports, 538–539
  source control, 539
  team builds, 539
  work items, 537–538
MSMQ (Microsoft Message Queuing), 82–83
Multicast Transformation, in Data Flow, 145, 516–518
MultiFlatFile Connection Manager, in Data Flow, 114
multiple outputs, Lookup Transformation with, 223–226
MyExpressionTester variable, 181
N
naming conventions
  best practices, 241
  creating connections across packages, 235
  generating unique filename for archiving file, 259–260
  referencing columns in expressions and, 181–182
  SSIS data types, 164–166
  using fully qualified variable names, 268
  variables, 170
native transaction
  defined, 463
  single package in SQL Server using, 469–471
nesting
  conditional expressions, 188
  containers, 94
.NET
  ADO.NET. See ADO.NET
  ASP.NET, 727–731
  custom assemblies, 261–264
  scripts, 124–125
  Winform application for dynamic property assignment, 731–736
no-cache mode, of Lookup Transformation
  in cascaded Lookup operations, 227–228
  defined, 202, 474
  partial-cache mode option, 220–222
  trade-off between full-cache mode and, 220
  variables used in auditing, 32
  working with, 219–220
non-blocking transformations
  overview of, 493
  row-based, 494–495
  server resources required by, 495
  streaming, 493–494
  with synchronous outputs, 500
nonmatches, in Lookup Transformation, 207
normal load option, OLE DB Destination, 117
NULL values
  Boolean expressions used in, 182–183
  converting magic numbers to, 378
  in Data Flow, 121–122, 185
  Multicast Transformation compared with, 145
  variables and, 183–184
numeric literals, 178–179
O
objects
  buffers, 614
  data mining, 46
  Dts object. See Dts object
  dynamic package, 162
  environment references, 681–682
  external management, 707–709
  logging, 702–703
  package management, 683–686
  package monitoring, 686–687
  parameters, 677–678
  permissions, 784
  storing recordset in memory using object variables, 101
  tasks, 40
  transferring database objects between databases, 91–92
ODBC
  accessing source data from, 414, 447–449
  coding SQL statement property according to, 76
  executing parameterized SQL statements, 69–71
  sources in Data Flow, 11
ODS (operational data store), 48
Office (Microsoft), InfoPath example of interaction with, 720–726
OLAP (online analytical processing), 45
OLE DB
  coding SQL statement property, 76
  as Data Flow source, 10
  outputting Analysis Services results to, 46
OLE DB Command Transformation
  in Data Flow, 145–147
  loading fact table, 343
  optimizing processing with, 515–516
  optimizing SCD package by removing, 333
  using set-based update vs., 344
OLE DB Connection Manager
  adding connections across packages, 234–236
  adding new connections, 67
  configuring connections, 727
  Execute SQL Task properties, 830
  selecting, 107–108
  selecting in full-cache mode of Lookup Transformation, 216
OLE DB Destination
  adding, 843
  in Data Flow, 116–117
  e-mail Data Flow processing, 866
  finalizing package with scrubbed data, 246–247
  loading fact table, 339, 342
OLE DB Source
  ADO.NET Source vs., 115
  configuring, 727–728
  in Data Flow, 107–109
  data preparation for complex dimension table, 327–329
  loading fact table, 337, 340–341
  Merge Join Transformation and, 212–213
  querying CDC in SSIS, 403
  relational join for data extraction, 209–210
  sorting data with SQL Server in, 383
OnError events
  applying, 565–566
  defining event handler for, 560–565
  error handling and logging and, 14
  inheritance and, 567–569
  specifying events to log, 701
online analytical processing (OLAP), 45
online references
  for 64-bit version of Office 2010, 415
  for conversion rules for date/time types, 166
  for DQS (Data Quality Services), 366
  for regular expressions, 297
OnPreExecute events
  applying, 567
  defining event handler for, 560–565
OPENROWSET function
  data extraction and text files, 385–389
  MERGE operator and, 411
operational data store (ODS), 48
operations
  Application object, 682–683
  logging in SQL Server 2012, 703–705
  Managed Object Model and, 671
  package, 682–684
  Project class, 676–677
optimization, staging environments for Data Flow, 512–513
Oracle
  accessing source data from, 414, 427–430
  CDC option, 405
ORDER BY clause
  loading fact table and, 340
  Merge Join Transformation and, 213–214
Output tab, Web Service Task, 57
output verification
  design-time methods, 590–591
  Transformation Component, 619–620
outputs
  asynchronous and synchronous transformation, 498–500
  DQS Cleansing Transformation, 372
  evaluating results of Data Profiling Task, 317–321
  improving reliability and scalability of, 471–473
  Lookup Transformation multiple, 223–226
  turning Data Profile results into actionable ETL steps, 321
Overview report, in SSIS administration, 792–793
OverwriteDestination property, Foreach File Enumerator example, 99
P
Package Configuration Wizard, 771
package creation
  adding connections, 234–236
  basic transformation tutorial, 233–234
  completing, 239
  creating Control Flow, 237
  creating Data Flow, 237–239
  executing, 240
  making packages dynamic, 250–252
  performing mainframe ETL with data scrubbing. See mainframe ETL, with data scrubbing
  saving, 239
  summary, 252
package deployment model
  creating deployment manifest, 751–752
  list of, 748
  overview of, 751
  Package Deployment Wizard, 752–755
  SSIS Package Store and, 755–757
Package Deployment Wizard, 752–755
Package Designer
  annotations, 32
  Connection Manager tab, 31–32
  Control Flow tab, 29–31
  Data Flow tab, 34
  Event Handlers tab, 34–35
  grouping tasks, 31
  overview of, 28
  Package Explorer tab, 35–36
  Parameters tab, 34
  Variables window, 33–34
Package Explorer, 35–36
Package object
  LogProviders collection and, 702
  operations, 682
  package maintenance and, 684–686
Package Protection Levels, 21
package restartability
  containers within containers and checkpoints, 457–459
  FailPackageOnFailure property, 459–461
  inside checkpoint file, 461–463
  overview of, 453–455
  simple control flow, 455–457
  staging environments for, 512
package transactions
  effect on checkpoints, 457–459
  overview of, 463–464
  single package, multiple transactions, 466–468
  single package, single transaction, 464–466
  single package using native transaction in SQL Server, 469–471
  two packages, one transaction, 468–469
packages
  32-bit and 64-bit modes, 416–417
  annotations, 31, 805
  Application object maintenance operations, 683
  building basic, 207–209
  building custom, 636–637
  built-in reports, 791–792
  compiled assemblies in, 263–264
  Configuration object, 707–708
  configurations, 705–707, 770–773
  containers as miniature, 94
  Control Flow and, 6
  as core component in SSIS, 5
  creating first, 25–26
  creating to run parallel loads, 480
  deploying via FTP, 53
  deployment models. See package deployment model
  designing, 28
  executing, 36, 680–681
  execution time in Data Flow vs. Control Flow, 490–491
  expressions in. See expressions
  grouping tasks in Sequence Containers, 94
  handling corrupt, 781–782
  lists, 688–689
  log providers, 699–701
  maintaining, 684–686
  Managed Object Model and, 671
  management example, 689–699
  modular, 523
  monitoring, 686–687, 791
  naming conventions, 804
  operations, 682–684
  optimizing, 513–516
  parameters, 14, 162, 678
  parent and child, 80–81
  precedence constraints, 8
  properties of, 28
  re-encrypting, 781
  scheduling, 787–790
  security of, 782
  T-SQL for executing, 736–741, 757–758
packages, in case study
  ACH file package. See ACH (Automated Clearing House) files
  bank file package. See bank file package
  e-mail package. See e-mail package
parallel loading, scaling out with, 479–485
parameters
  compared with variables, 34
  creating and configuring project level, 761
  data types for, 172–173
  defining, 171–172
  Managed Object Model and, 671
  overview of, 162–163
  packages and, 14
  parameter objects, 677–678
  project deployment model and, 749
  referencing in expressions, 181
  T-SQL setting parameter values, 758–760
  using as expressions, 191–193
parent packages, 80–81
Pareto principle (80/20 rule), 228
parsing
  ACH file package, 854–856
  bank file package, 832–835
partial-cache mode, of Lookup Transformation
  in cascaded Lookup operations, 227–228
  defined, 202, 474
  overview of, 220–222
partially blocking transformations, 119
Partition Processing Destination, in Data Flow, 118
partitioned fact tables, considerations, 344
partitioning
  scaling across machines using horizontal, 477–478
  staged data as, 475–477
passwords, for data protection, 21
Patch operation, XML Task, 722
paths
  path argument, 730
  path attachment, 593
patterns, analyzing source data for. See data profiling
PDSA (Plan, Do, Study, and Act), 524
Percentage Sampling Transformation, in Data Flow, 147
perfmon, 796
performance counters, 796
performance metrics, 576
Performance Monitor, 518–520
performance monitoring
  of pipeline, 518–520
  troubleshooting bottlenecks in Data Flow, 516–518
performance overhead
  of data types, 167
  of database snapshots, 406
  of Fuzzy Lookup Transformation, 133, 136
  of Lookup Transformation caching modes, 217
PerformUpgrade method, design-time methods, 591
permissions
  catalog, 784
  folder, 783
  object, 784
persisted cache, Lookup Transformation, 474
persistent file storage, Lookup Transformation, 474
pipeline
  component types, 586
  debugging components in, 635–636
  defined, 474
  execution reports, 506–507
  execution tree log details, 505–506
  monitoring Data Flow execution, 503–505
  monitoring performance, 518–520
  overview of, 585–586
  scaling out memory pressures, 474–475
  troubleshooting bottlenecks, 516–518
Pipeline Components
  connection time functionality, 594–595
  design-time functionality, 589–593
  methods, 588–589
  preparing for coding, 596–602
  run-time functionality, 593–594
  UI (user interface) and, 643, 645
PipelineComponent base class, 598
pivot tables, 147–148
Pivot Transformation, in Data Flow, 147–150
placeholders, troubleshooting bottlenecks and, 516
Plan, Do, Study, and Act (PDSA), 524
PostExecute method
  adding runtime methods to components, 594
  Transformation Component, 625
Pragmatic Works BI xPress, 791
precedence, staging environments for, 512
Precedence Constraint Editor, 29, 181
precedence constraints
  advanced, 551
  basic, 549–551
  Boolean expressions used with, 551–555
  combining expressions and multiple precedence constraints, 556–557
  Control Flow and, 29–30, 195–196
  overview of, 8
  working with multiple, 555–556
predictive queries, Data Mining Query Task, 46–47
PreExecute method
  adding runtime methods to components, 593
  Destination Component, 631–633
  Transformation Component, 620
prefix, generating unique filename for archiving file, 259–260
PrepareForExecute method, adding runtime methods to components, 593
PrimeOutput method
  adding runtime methods to components, 594
  Transformation Component, 620, 625
processing windows, staging environments for, 512
ProcessInput method
  adding runtime methods to components, 594
  Destination Component, 631–632
  Transformation Component, 620–623, 625
Professional SQL Server Analysis Services 2012 with MDX and DAX (Harinath et al.), 118
profiling, Data Profiling Task, 48–50
programming custom features. See customizing SSIS
Project class, 676–677
Project Connection Managers, 822–823
project deployment model
  catalog logging and, 582
  deploying projects with, 761
  managed code and, 676–677
  overview of, 748–751
Project Portal, 540
projects
  adding to UI, 645–647
  Build menu, 752
  configuring to use environments, 763–765
  creating and aligning with solutions, 24–25
  creating and configuring project level parameters, 761
  creating from Solution Explorer window, 27
  defined, 24
  deploying. See project deployment model
  listings, 688–689
  Managed Object Model and, 671
  parameters, 162
  server deployment project, 679
projects (continued)
  tips related to working with large projects, 805
  versioning in SQL Server 2012, 746
properties
  checkpoint file, 454
  configuring Script Task Editor, 265–266
  of Dts objects, 266
  groups vs. containers and, 95
  setting catalog properties, 744–747
  of tasks, 41–42
  using expressions in Connection Manager, 193–194
Properties windows, 28, 171
Property Expressions Editor, 41
ProvideComponentProperties method
  debugging components, 636
  design-time methods, 589–590
  Destination Component, 626–627
  Source Component, 602–603
  Transformation Component, 615–616
proxy accounts, 789–790
Q
QBE (Query-By-Example) tool, 71
queries
  catalog logging and, 583–584
  Data Mining Query Task, 46–47
  T-SQL querying tables to set parameter values, 759–760
  WQL queries, 84–85
Query Optimizer, 816
Query-By-Example (QBE) tool, 71
Quick Watch window, 310
R
Ragged Right option, in SSIS, 296
relational database management systems (RDBMS)
  reducing reliance on, 508–510
  types of database systems, 64
Raw File Destination, in Data Flow, 114, 117
Raw File Source
  configuring Cache Connection Manager, 230
  in Data Flow, 10, 114–115
raw files
  Data Flow restart using, 476–477
  scaling across machines using, 477–479
  scaling out by staging data using, 475
RDBMS (relational database management systems)
  reducing reliance on, 508–510
  types of database systems, 64
RDBMS Server tasks
  Bulk Insert Task, 64–68
  Execute SQL Task. See Execute SQL Task
  overview of, 64
ReadOnlyVariables property
  Script Component, 291–292
  Script Task, 267–268
ReadWriteVariables property
  Script Component, 291–292
  Script Task, 267–268
Recordset Destination, in Data Flow, 117
recordsets, Execute SQL Task, 73–75
Redirect Rows to Error Output, Lookup Transformation, 223–224
Redirect Rows to No Match Output, Lookup Transformation
  handling dirty data, 245–246
  in multiple outputs, 223–225
references
  columns, 181–182
  environment, 765
  parameters, 181
  variable, 180–181
RegisterEvents method, design-time methods, 591
registration, of assembly, 263
registry, package configuration and, 705–706, 771
regular expressions, validating data using, 297–298
ReinitializeMetaData method
  design-time methods, 590
  Destination Component, 629–630
  Source Component, 609
  Transformation Component, 617–618
relational engine
  Change Data Capture. See CDC (Change Data Capture)
  data extraction. See data extraction
  data loading, 405–411
  overview of, 375
relational joins. See also joins
  overview of, 203–204
  using in source, 209–210
relative references, environment references, 765
ReleaseConnections method, 595
reliability and scalability
  overview of, 453
  restarting packages for. See package restartability
  scaling out for. See scaling out
  summary, 485
  using error outputs for, 471–473
  using package transactions for data consistency. See package transactions
Rendezvous, Tibco, 82
Reporting Services, 795
reports
  All Executions report, 792–794
  catalog logging, 582–583
  custom, 795
  MSF Agile, 538–539
  options, 794–795
  performance bottlenecks, 516–518
  pipeline execution, 506–507
Required property, parameters, 162, 172
resources
  used by blocking transformations, 497
  used by non-blocking vs. semi-blocking transformations, 497
restarting packages. See package restartability
reusability, caching operations for. See Lookup Transformation
Reverse String Transformation
  building UI with, 643
  design time debugging, 635–636
  operating on user interface columns, 654–655
  runtime debugging, 637–640
Review Data Type Mapping screen, Import and Export Wizard, 20
root cause analysis, 576
Row Count Component, 309
Row Count Transformation
  capturing total batch items, 840
  in Data Flow, 124–125
row counters, Performance Monitor, 519–520
Row Number Transformation, 478
Row Sampling Transformation, 147
row-based transformations
  optimizing processing with, 515–516
  overview of, 494–495
Rows Read, performance counters, 796
Rows Written, performance counters, 796
rules
  conditional expressions, 188
  date/time type conversion, 166–167
  numeric literals, 178–179
runtime
  adding methods to components, 593–594
  component phases, 588
  debugging components, 634, 637–640
  defining variables, 170
  Source Component methods, 611–614
  Transformation Component methods, 620–625
  UI connections and, 658–661
S
Save and Execute Package screen, Import and Export Wizard, 20–21
Save SSIS Package Screen, Import and Export Wizard, 21
saving data, to XML file, 276–277
saving packages, 239
scaling out. See also reliability and scalability
  architectural features of, 474
scaling out (continued)
  memory pressures, 474–475
  overview of, 473
  with parallel loading, 479–485
  by staging data, 475–479
SCD (Slowly Changing Dimension) Transformation
  complex dimension changes with, 331–335
  considerations and alternatives to, 335–336
  in Data Flow, 126
  loading simple dimension table with, 325–327
  querying CDC in SSIS, 402–403
scheduling packages
  overview of, 787
  proxy accounts and, 789–790
  SQL Server Agent for, 787–788
scope, variable, 31–32
Script Component
  accessing variables in, 291–292
  adding programmatic code to, 255–256
  as alternative to SCD for dimension table data, 336
  compiled assemblies in, 263–264
  configuring Script Component Editor, 289–291
  connecting to data sources, 292
  data validation example, 294–302
  editor, 289–291
  logging, 293–294
  overview of, 125–126, 288
  primary role of, 254
  raising events in, 292–293
  script debugging and troubleshooting, 308–310
  Script Task compared with, 288–289
  synchronous vs. asynchronous transformations, 302–305
  when to use, 255
Script tab, Script Task Editor, 265
Script Task
  accessing variables in, 267–271
  adding programmatic code to, 255–256
  breakpoints set in, 569
  checkpoint file and, 461–462
  coin toss example, 552–555
  compiled assemblies in, 263–264
  connecting to data sources in, 271–279
  in Control Flow, 264
  defined, 254
  Dts object, 266
  Foreach ADO Enumerator example, 102
  Hello World example, 257–258
  logging, 287–288
  For Loop Container, 95, 96–97
  overview of, 43–45
  raising events in, 281–286
  Script Component compared with, 288–291
  script debugging and troubleshooting, 308–311
  setting variables in, 13, 171
  SSAS cube processing with, 345, 349
  when to use, 255
Script Task Editor, 265–266
scripting
  adding code and classes, 259–260
  custom .NET assemblies for, 261–264
  debugging and troubleshooting, 308–311
  getting started, 255
  Hello World example, 257–258
  interacting with external applications and, 719
  introduction to, 253–254
  managed assemblies for, 260–261
  overview of, 253
  Script Component. See Script Component
  Script Task. See Script Task
  selecting scripting language, 255–256
  structured exception handling, 305–308
  summary, 311–312
  VSTA Scripting IDE, 256–257
scrubbing data. See mainframe ETL, with data scrubbing
SDLC (software development life cycle)
  branching, 546
  as development methodology, 719
  history of, 524
  iterative approach, 525
  labeling (striping) source versions, 547–548
  merging, 546–547
  MSF Agile and, 537–539
  overview of, 521–523
  Project Portal, 540
  shelving and unshelving, 544–546
  Subversion (SVN), 526–533
  summary, 548
  Team Foundation Server and, 533–536
  Team System features, 540–542
  Team System version and source control, 542–544
  versioning and source code control, 525–526
  waterfall approach, 524
security
  catalog, 782–785
  legacy security, 785–787
SEH (structured exception handling), 305–308
SELECT * statements
  with JOINS, UNIONS and subqueries, 381–382
  performing transformations, 380–381
  problems with, 376–377
  sorting data, 382–384
  WHERE clause and, 377–378
SelectSQL variable, 198–199
SelectSQL_ExpDateParm variable, 198–199
SelectSQL_UserDateParm variable, 198–199
semi-blocking transformations
  Data Flow design practices, 508–510
  overview of, 495–496
Send Mail Task
  adding, 870
  overview of, 83–84
Sequence Container
  overview of, 94
  in single package, multiple transactions, 467
  tasks, 42
sequence tasks, 42–43
serialization, XML object-based, 277–280
service-oriented architectures, Web services and, 55
set-based logic, in data extraction, 389–391
SetComponentProperty method, design-time methods, 591
SetUsageType method
  design-time methods, 592
  Destination Component, 630–631
  Source Component, 608–609
  Transformation Component, 618–619
shadow tables, SQL Server Agent writing entries to, 394
Shannon, Claude, 524
shared methods, 262
SharePoint Portal Services, 540
shelving/unshelving, source control and, 544–546
Shewhart, Dr. Walter, 524
shredding recordsets, Execute SQL Task, 73
signing assemblies, 262–263
Slowly Changing Dimension Transformation. See SCD (Slowly Changing Dimension) Transformation
SMO (SQL Management Objects)
  Managed Object Model based on, 671
  overview of, 87
SMO administration tasks
  overview of, 87–88
  Transfer Database Task, 88–89
  Transfer Error Messages Task, 89
  Transfer Job Task, 91
  Transfer Logins Task, 89–90
  Transfer Master Stored Procedures Task, 90
  Transfer SQL Server Objects Task, 91–92
SMTP
  Connection Manager, 83–84
  e-mail messages via, 83
  invoice matching process and, 868
  values returned by, 595
snapshots, database
  creating, 406–408
  saving ACH file to database, 861
  saving bank file to database, 844–845
Soft NUMA node, in parallel loading, 484–485
software development life cycle. See SDLC (software development life cycle)
Solution Explorer
  components in, 26
  creating new project, 27
  executing packages, 25–26, 36
  OLE DB Connection in, 822
Solution Framework, Microsoft, 525
solutions
  creating new project in, 27
  creating projects and aligning with, 24–25
  defined, 24
sort in database, data extraction and, 382–384
Sort Transformation
  asynchronous transformation outputs and, 498
  as blocking transformation, 496–497
  data flow example using, 159
  InfoPath example, 725
  loading fact table, 339
  overview of, 126–127
  presorting data for Data Mining Model Training Destination, 118
  presorting data for Merge Join Transformation, 212–215
  presorting data for Merge Transformation, 144
  processing bank file check and invoice details, 842
  sorting data in SQL Server compared with, 382–384
Source adapters
  debugging, 634–636
  installing, 633–634
  as integral to Data Flow, 500–501
  overview of, 586–587
Source Assistant
  accessing heterogeneous data in, 414
  configuring source in Data Flow with, 107
  defining Data Flow for packages, 237
Source Component
  AcquireConnections method, 604–606
  buffer objects and, 614
  columns and, 613–614
  ComponentMetaData properties, 603–604
  ComponentType property and, 598
  Connection Managers and, 605–606
  data types, 608–609
  debugging source adapter, 634–636
  FileUsageType property, 605–606
  helper methods, 608
  installing source adapter, 633–634
  MapInputColumn/MapOutputColumn methods, 611
  overview of source adapter, 586–587
  ParseTheFileAndAddToBuffer method, 612–613
  PrimeOutput method, 611–612
  ProvideComponentProperties method, 602–603
  querying CDC in SSIS, 404
  ReinitializeMetaData method, 609
  SetUsageType method, 608–609
  types of pipeline components, 586
  Validate method, 606–609
source control
  branching, 546
  iterative development and, 525
  labeling (striping) source versions, 547–548
  merging, 546–547
  MSF Agile, 539
  shelving and unshelving, 544–546
  Team System and, 542–544
  tools for, 523
  versioning and source code control, 525–526
Source type, of Script Component
  configuring Script Component Editor, 289–290
  connecting to data sources, 292
  defined, 288
sources
  ADO.NET Source, 115
  CDC Source, 398–400
  configuring destination vs., 115–116
  connecting in Script Component to, 292
  connecting in Script Task to, 271–279
  connectivity, 719
  in Data Flow, 10–11
  ETL development and, 523
  Excel Source, 109–110
  flat files. See Flat File Source
  function of, 106
  Import and Export Wizard and, 17–18
  mapping to destinations, 20
  OLE DB. See OLE DB Source
  overview of, 106–107
  permissions. See data profiling
  processing data from heterogeneous sources, 800
  raw files. See Raw File Source
  Transfer Database Task and, 88–89
  XML Source, 57–60, 115
space padding, string functions and, 186–187
SPC (statistical process control), 524
special characters, string literals with, 179–180
Spiral, iterative development, 525
SQL (Structured Query Language)
  capturing multi-row results, 73–75
  capturing singleton results, 72–73
  creating BankBatch table, 813–814
  creating BankBatchDetail table, 814–815
  creating corporate ledger data, 815
  creating CustomerLookup table, 813
  creating ErrorDetail table, 816
  creating Invoice table with, 811–812
  executing batch of statements, 71–72
  executing parameterized statements, 69–71
  executing stored procedure, 75–78
  Management Objects. See SMO (SQL Management Objects)
  retrieving output parameters from stored procedures, 78–80
SQL Profiler
  log provider for, 577
  package log provider for, 700
  programming to log providers, 703
SQL Server
  Analysis Services. See SSAS (SQL Server Analysis Services)
  authentication, 782
  Bulk Insert Task, 64–65
  CDC. See CDC (Change Data Capture)
  creating central server, 766–768
  Data Tools. See SSDT (SQL Server Data Tools)
  deploying SQL Server 2012, 679–680
  deployment options, 753
  destinations, 118
  editions, 14–15
  Integration Services. See SSIS (SQL Server Integration Services), introduction to
  log provider for, 577
  Management Studio. See Management Studio
  operation logs in SQL Server 2012, 703–705
  package configuration and, 705–706
  package log provider for, 700
  programming to log providers, 703
  project versioning in SQL Server 2012, 746
  single package using native transaction in, 469–471
  Transfer SQL Server Objects Task, 91–92
  upgrading components for SQL Server 2012, 641
  WMI Data Reader Task for gathering operational type data, 86
SQL Server Agent, 393–394, 787–788
SQL Server Business Intelligence Edition, 15
SQL Server Enterprise Edition, 14–15
SQL Server Standard Edition, 15
SQLCMD command, in parallel loading, 484
SQLMOBILE, 69–71
SQLStatement property, Execute SQL Task, 194–195, 470
SSAS (SQL Server Analysis Services)
  cube processing with, 314–315, 345–350
  Data Mining Query Task, 46–47
  Execute SQL Task, 45
  Processing Task, 46
SSDT (SQL Server Data Tools)
  adding components to, 633–634
  common task properties, 41–42
  creating deployment utility, 751–752
  creating first package, 25–26
  creating new project, 732
  data taps, 765–766
  debugging components, 634
  locating and opening, 23
  opening Import and Export Wizard, 18
  overview of, 4
  Properties windows, 28
  runtime debugging, 637–640
  Solution Explorer window, 26–27
  solutions and projects in, 24
  Toolbox items, 27–28
SSIS (SQL Server Integration Services), introduction to
  architecture of, 5
  containers, 8–9
  Control Flow, 6
  Data Flow, 9–10
  data tools. See SSDT (SQL Server Data Tools)
  destinations, 13
  error handling and logging, 14
  history of and what’s new, 2
  Import and Export Wizard, 3
  overview of, 1–2
  packages, 5–6
  parameters, 14
  precedence constraints, 8
  sources, 10–11
  SQL Server editions and, 14–15
  summary, 15–16
  tasks, 6–7
  transformations, 11–12
  variables, 13–14
SSIS external management. See external management, of SSIS
SSIS interaction with external applications. See external applications, interaction with
SSIS Package Configuration, 770–773
SSIS Package Store, 755–757
SSIS tools. See tools, SSIS
ssis_admin role, 782
staged data
  across machines, 477–479
  Data Flow design for, 508, 512–513
  Data Flow restart, 475–477
  scaling out by, 475
Standardize Zip Code Transformation, 243–244
static methods, 262
statistical process control (SPC), 524
Stephens' Visual Basic Programming 24-Hour Trainer (Stephens), 254
steps (phases), in SDLC, 523
storage
  catalog for, 743
  of files, 806
  of packages, 683
stored procedures
  catalog security and, 785
  controlling and managing catalog with, 745–747
  in databases, 748
  encapsulating common queries in, 384–385
  executing, 75–78
  in parallel loading, 481
  querying CDC, 402–404
  retrieving output parameters, 78–80
  Transfer Master Stored Procedures Task, 90
  T-SQL for package execution, 758
  for working with batches, 816–819
streaming assertion, 387
strings
  concatenation (+) operator, 177
  functions, 185–187
  literals, 179–180
striping (labeling) source versions, 547–548
strong names
  GAC (global assembly cache) and, 598–599
  signing assembly with, 646–647, 651–652
structured exception handling (SEH), 305–308
Structured Query Language. See SQL (Structured Query Language)
subqueries, in data extraction, 381–382
SUBSTRING function, 243
Subversion (SVN). See SVN (Subversion)
success values, constraints, 8
suffixes, numeric literal, 178–179
surrogate keys, in data warehousing, 323, 338
SVN (Subversion)
  configuring, 526–527
  connecting project to, 529–531
  downloading and installing, 526
  locking files for editing and committing changes, 531–532
  overview of, 525–526
  testing integration with project, 531
  for version control for packages, 239
  walkthrough exercise, 527–529
Swap Inputs button, Merge Join Transformation, 215
synchronous processes
  limiting in Data Flow design, 509
  reducing in Data Flow design, 508
  reducing in data-staging environment, 513
  tasks in Control Flow, 491
synchronous transformations
  identifying, 493, 500
  vs. asynchronous, 119, 498–500
  writing Script components to act as, 302–303
SynchronousInputID property, 500
System Monitor, 796
system variables, 31–34
T
tab-delimited files, 110–111
tables
  creating with Management Studio, 731–732
  in databases, 748
  enabling CDC for, 394
  package configuration and, 705, 771
  T-SQL querying tables to set parameter values, 759–760
table-valued parameters, 389–391
Tabular Data Stream (TDS), 72
task editors
  Bulk Insert Task, 65
  data profiling and, 50
  Expressions tab, 40–41
  FTP Task Editor, 53
  overview of, 39
  Script Task Editor, 265–266
Task Host Container, 93
task objects, 40
tasks
  Analysis Services, 45–46
  archiving files, 52
  Bulk Insert Task, 64–68
  comparing Data Flow with Control Flow, 488–490
  Data Flow Task, 10, 47–48
  Data Mining Query Task, 46–47
  data preparation tasks, 48
  Data Profiling Task, 48–50
  defined, 39
  DQS (Data Quality Services), 366
  ETL tasks, 6–7
  evaluating, 30
  Execute Package Task, 80–81
  Execute Process Task, 81–82
  Execute SQL Task. See Execute SQL Task
  File System Task, 50–51
  FTP Task, 53–55
  grouping in containers, 31
  logging, 576–577
  looping and sequence tasks, 42–43
  Message Queue Task, 82–83
tasks (continued)
  opening for editing, 237
  overview of, 39
  precedence constraints controlling, 550
  properties of, 41–42
  RDBMS Server tasks, 64
  Script Task, 43–45
  Send Mail Task, 83–84
  SMO administration tasks, 87–88
  summary, 92
  Task Editor, 40–41
  Transfer Database Task, 88–89
  Transfer Error Messages Task, 89
  Transfer Job Task, 91
  Transfer Logins Task, 89–90
  Transfer Master Stored Procedures Task, 90
  Transfer SQL Server Objects Task, 91–92
  Web Service Task, 55–60
  WMI Data Reader Task, 84–86
  WMI Event Watcher Task, 86–87
  work flow tasks, 80
  working with multiple precedence constraints, 555–556
  XML Task, 60–64
TDS (Tabular Data Stream), 72
team builds, MSF Agile, 539
Team Foundation Server (Microsoft), 526, 533–536
team preparation, ETL and, 522–523
Team Project, setting up, 534–536
Team System. See VSTS (Visual Studio Team System)
Term Extraction Transformation
  Advanced tab, 154–156
  in Data Flow, 152–156
  Exclusion tab, 154
  overview of, 152–153
  Term Extraction Transformation Editor, 153–154
Term Frequency and Inverse Document Frequency (TFIDF) score, 152–156
Term Lookup Transformation, 156–157
testing
  case study packages, 866
  data flows during development with Union All Transformation, 210
  database snapshot functionality, 407
  expressions with Expression Builder, 175
  external applications, 720
  Immediate window for, 311
  UI component, 667
text
  comma-delimited file requirement, 110–111
  Derived Column for advanced data cleansing, 355–357
  Term Extraction Transformation, 152–156
  Term Lookup Transformation, 156–157
text files
  data extraction, 385–389
  log provider for, 577, 700
  MERGE operator and reading from, 411
  programming to log providers, 703
TFIDF (Term Frequency and Inverse Document Frequency) score, 152–156
third-party solutions
  Change Data Capture, 392
  trash destinations for testing purposes, 210
threads
  monitoring Data Flow execution, 502
  optimizing package processing, 513–515
Tibco Rendezvous, 82
time
  data types, 166
  functions, 188–190
Toolbox
  adding components to, 633–634
  working with, 27–28
tools, SSIS
  annotations, 31
  Connection Managers, 31
  Control Flow, 29–31
  creating first package, 25–26
  Data Flow, 34
  event handlers, 34–35
  executing packages, 36
  Import and Export Wizard and, 17–22
  Management Studio, 36–37
  overview of, 17
  Package Designer, 28
  Package Explorer, 35–36
  parameters, 34
  Properties windows, 28
  Solution Explorer window, 26–27
  SSDT (SQL Server Data Tools), 23–25
  summary, 37
  task groups, 31
  Toolbox items, 27–28
  variables, 31–34
TransactionOption property
  possible settings for, 464
  in single package, multiple transactions, 467
  in single package, single transaction, 465–466
  in two packages, one transaction, 468–469
transactions, package. See package transactions
Transfer Database Task, 88–89
Transfer Error Messages Task, 89
Transfer Job Task, 91
Transfer Logins Task, 89–90
Transfer Master Stored Procedures Task, 90
Transfer SQL Server Objects Task, 91–92
Transformation Component
  building, 614
  ComponentType property and, 598
  configuring Script Component Editor, 289–290
  debugging, 634–636
  defined, 288
  error handling, 623–624
  input/output verification methods, 619–620
  installing, 633–634
  overview of, 587
  PostExecute method, 625
  PreExecute method, 620
  PrimeOutput method, 620, 625
  ProcessInput method, 620–623, 625
  ProvideComponentProperties method, 615–616
  ReinitializeMetaData method, 617–618
  SetUsageType method, 618–619
  types of pipeline components, 586
  Validate method, 616–617
transformations
  Aggregate Transformation. See Aggregate Transformation
  asynchronous outputs, 498–499
  Audit Transformation, 128–129, 248
  blocking, 496–497
  Cache Transformation, 124, 229–230
  Character Map Transformation. See Character Map Transformation
  Conditional Split Transformation. See Conditional Split Transformation
  Copy Column Transformation, 130
  Data Conversion Transformation. See Data Conversion Transformation
  in Data Flow, 11–12, 47
  Data Flow and Control Flow comparison, 488–490
  Data Flow design for correlation and integration, 510–511
  Data Flow design for data cleansing, 511–512
  Data Flow restart using, 476
  Data Mining Query Transformation, 130–131
  Data Quality Services (DQS) Cleansing Transformation, 131
  Derived Column Transformation. See Derived Column Transformation
  DTS (Data Transformation Services). See DTS (Data Transformation Services)
  example, 98–99
  Export Column Transformation. See Export Column Transformation
  function of, 106
  Fuzzy Grouping Transformation. See Fuzzy Grouping Transformation
transformations (continued)
  Fuzzy Lookup Transformation. See Fuzzy Lookup Transformation
  Import Column Transformation. See Import Column Transformation
  InfoPath example, 723–726
  Lookup Transformation. See Lookup Transformation
  Merge Join Transformation. See Merge Join Transformation
  Merge Transformation. See Merge Transformation
  Multicast Transformation, 145, 516–518
  non-blocking (streaming and row-based), 493–495
  OLE DB Command Transformation. See OLE DB Command Transformation
  overview of, 119
  Percentage Sampling and Row Sampling Transformations, 147
  Pivot Transformation, 147–150
  Reverse String Transformation. See Reverse String Transformation
  Row Count Transformation, 124–125, 840
  Row Number Transformation, 478
  SCD (Slowly Changing Dimension) Transformation. See SCD (Slowly Changing Dimension) Transformation
  Script Component and, 125–126
  semi-blocking, 495–496
  Sort Transformation. See Sort Transformation
  Source and Destination adapters, 500–501
  Standardize Zip Code Transformation, 243–244
  synchronous outputs, 499–500
  synchronous vs. asynchronous, 119
  synchronous vs. asynchronous transformations, 294–302
  Term Extraction Transformation. See Term Extraction Transformation
  Term Lookup Transformation, 156–157
  troubleshooting bottlenecks in Data Flow, 517
  types of, 493–497
  Union All Transformation. See Union All Transformation
  Unpivot Transformation, 150–152
  when to use during data extraction, 378–381
  XSLT (Extensible Stylesheet Language Transformations), 61, 722
trash destinations, testing data flow in development with, 210
triggers, adding for Change Data Capture, 391
troubleshooting performance bottlenecks, 516–518
True
  Boolean expressions and, 182–183
  Boolean literals and, 180
  in conditional expressions, 187–188
truncation, during casting, 170
Try/Catch/Finally structure, in Visual Basic or C#, 305–308
T-SQL
  aggregating data, 119–120
  configuring projects to use environments, 763–764
  controlling environments with, 745
  DMX (Data Mining Extension) to, 47
  expression functions vs. functions in, 175
  managing security, 785
  for package execution, 736–741, 757–758
  querying tables to set parameter values, 759–760
  setting environments with, 762
  setting parameter values with, 758–759
U
UI (user interface)
  adding project to, 645–647
  building form for, 653
  column display in, 654–657
  column properties, 665–667
  column selection, 657–658
  component-level properties, 661–663
  design-time functionality and, 589
  Expression Builder, 175
  extending, 658
  handling errors and warnings, 663–665
  implementing IDtsComponentUI interface, 647–651
  managing security with GUI, 783–784
  modifying form constructor, 653–654
  overview of, 643
  runtime connections, 658–661
  setting UITypeName property, 651–653
  steps in building, 644
  summary, 667
UITypeName property, 644, 651–653
unchanged output, SCD Transformation, 335
Ungroup command, 95
Unicode
  conversion issues, 167–169
  string functions in, 186–187
UNION
  in data extraction, 381–382
  set-based logic for extraction, 389–391
Union All Transformation
  adding to Lookup Transformation, 225–226
  as asynchronous transformation, 500
  in Data Flow, 127–128
  data preparation for complex dimension table, 330
  in parallel loading, 483
  querying CDC in SSIS, 405
  sending cleansed data back into main data path with, 246
  testing data flow in development with, 210
  testing data flow with Fuzzy Lookup, 359
  testing Lookup Transformation, 217
  testing Merge Join Transformation, 215
Unpivot Transformation, in Data Flow, 150–152
UPDATE statements, 408–411
updates
  capture instance tables in CDC, 396
  complex dimension changes with SCD, 333–335
  limitations of SCD, 336
  loading simple dimension table, 327
upgrading components, to SQL Server 2012, 641
usability, UI design principles, 667
user interface. See UI (user interface)
user variables, 32
V
Validate method
  design-time methods, 590
  Destination Component, 628–629
  Source Component, 606–609
  Transformation Component, 616–617
Validate operation, XML Task, 722
validation
  ACH file package, 853–854, 856–859
  bank file package, 832, 835–839
  of data using Script Component, 294–302
  staged data in, 513
  timeout, 747
  of XML file, 62–64
Variable Mappings tab
  Foreach ADO Enumerator example, 101–102
  Foreach File Enumerator, 98–99
VariableDispenser object, Script Task, 267
variables
  accessing in Script Component, 291–292
  accessing in Script Task, 267–271
  ACH file package, 845–846
  adding to checkpoint file, 461
  data types for, 172–173
  defining, 170–171
  displaying list of, 32
  as expressions, 191–193
  Immediate window for changing value of, 311
variables (continued)
  matching process package, 867
  NULL values and, 183–184
  options for setting, 13–14
  overview of, 162
  package configuration and, 706, 771
  referencing in expressions, 180–181
  retrieving data from database into, 272–274
  scope, 31–32
  setting up for bank file load package, 819–823
  setting variable values in environments, 761–762
  types of, 31
Variables collection, Script Task, 267–268
VB (Visual Basic)
  Hello World example, 257–258
  overview of, 254
  Script Task accessing VB libraries, 43–44
  selecting as scripting language, 255–256
  using VSTA scripting IDE, 255–256
verification methods, Pipeline Components, 589
version control. See also source control
  packages, 239
  project versioning in SQL Server 2012, 746
  source code control and, 525–526
  Team System and, 542–544
Visual C#, creating Windows application project, 734–736
Visual Studio
  32-bit runtime executables in 64-bit mode, 416–417
  64-bit issues, 791
  creating Visual C# Windows application project, 734–736
  source control and, 526
  SSDT (SQL Server Data Tools) and, 4, 22
  Team System. See VSTS (Visual Studio Team System)
  Tools for Applications. See VSTA (Visual Studio Tools for Applications)
Visual Studio Team Explorer 2010, 533
VSTA (Visual Studio Tools for Applications)
  accessing with Script Task, 43
  Hello World example, 257–258
  Script Task and Script Component using, 254
  using managed assemblies for development purposes, 260
  using scripting IDE, 255–256
VSTS (Visual Studio Team System)
  features, 540–542
  source control and collaboration and, 522
  Team Foundation Server and, 533–536
  version and source control, 542–544
W
warnings
  packages and, 247
  user interface and, 663–665
watch windows
  script debugging using, 310
  viewing debugging with, 571
waterfall methodology, in SDLC, 524
Web Service Task
  General tab, 56
  Input tab, 56
  Output tab, 57
  overview of, 55
  retrieving data from XML source, 57–60
Web Services Description Language (WSDL), 55
Web Services, XML and, 414, 431–442
WHERE clause, in data extraction, 377–378
Windows Authentication
  credentials and, 18
  securing catalog and, 782
Windows clusters, 768–769
Windows Forms
  for displaying user interface, 647
  steps in building UI (user interface), 644
Windows Management Instrumentation. See WMI (Windows Management Instrumentation)
Windows OSs
  increasing memory in 32-bit OS, 474
  log providers for Windows events, 577, 701
Winform application, for dynamic property assignment, 731–736
WMI (Windows Management Instrumentation)
  Connection Managers, 84
  overview of, 709
  values returned by Connection Manager, 595
WMI Data Reader Task
  example, 710–715
  explained, 709–710
  overview of, 84–86
WMI Event Watcher Task
  example, 716–718
  explained, 715
  overview of, 86
  polling a directory for file delivery, 86–87
workflows
  Execute Package Task, 80–81
  Execute Process Task, 81–82
  handling with Control Flow, 491
  Message Queue Task, 82–83
  overview of, 80
  Send Mail Task, 83–84
  WMI Data Reader Task, 84–86
  WMI Event Watcher Task, 86–87
WQL queries, 84–85
wrapper classes, user interface and, 653–654
WSDL (Web Services Description Language), 55
X-Y
XML (Extensible Markup Language)
  retrieving data from XML source, 57–60
  retrieving XML-based result sets using Web service, 55
  sources in Data Flow, 11
  validating XML file, 62–64
  Web Services and, 414, 431–442
XML Diffgram, 62
XML files
  in case study solution architecture, 803
  log provider for, 577, 701
  package configuration and, 705–706, 771
  programming to log providers, 703
  retrieval of file size, 848–850
  saving data to, 276–277
  serializing data to, 277–280
  storing log information in, 576
XML Path Language (XPATH), 61, 722
XML Schema Definition (XSD), 61, 115
XML Source, in Data Flow, 115
XML Task
  configuring, 720–721
  InfoPath document consumed by, 723–726
  operation options, 61
  OperationType options, 61–62
  overview of, 60
  validating XML file, 62–64
XMLA code, 348–349
XPATH (XML Path Language), 61, 722
XSD (XML Schema Definition), 61, 115
XSLT (Extensible Stylesheet Language Transformations), 61, 722
Z
zip codes, handling dirty data, 243