Level 2 Stairway to Integration Services

  • 8/13/2019 Level 2 Stairway to Integration Services

    The SSIS Data Pump - Level 2 of the Stairway to Integration Services

    By Andy Leonard, 2013/01/04 (first published: 2011/02/17)

    The Series

    This article is part of the Stairway Series: Stairway to Integration Services

    Integration Services is one of the most popular subsystems in SQL Server. It allows you to Extract, Transform, and Load (ETL) data between a variety of data sources and programmatically change data in any manner you can think of and script in C#.

    In our last installment (Level 1 BIDS) you learned your way around Business Intelligence Development Studio, and you are now itching to build an SSIS package that moves real data!

    Introduction

    SQL Server Integration Services was built to move data. The Data Flow Task provides this functionality. For this reason, when introducing people to SSIS, I like to start with the Data Flow Task.

    The Basics of a Data Flow Task

    I believe it helps to start with the basics of an SSIS Data Flow Task:

    Figure 1

    In Figure 1 we see a very simple Data Flow Task: data is read into the Data Flow Task from a database via an OLE DB Source Adapter and written out to a database via an OLE DB Destination Adapter. The Source and Destination Adapters interact with databases and other data stores through Connection Managers. For now, let's focus on the internal operations of the Data Flow Task.

    There are three objects in Figure 1:

    1. OLE DB Source Adapter
    2. OLE DB Destination Adapter
    3. A Data Flow Path (the green arrow) connecting the OLE DB Source and OLE DB Destination Adapters

    Connection Managers and the OLE DB Source Adapter

    The OLE DB Source Adapter connects to a Connection Manager. Connection Managers live in a small tab located at the bottom of the Data Flow editor:

    Figure 2

    The important thing to know about Connection Managers is this: Connection Managers are a bridge between the SSIS package and external data sources. The Connection Manager handles things like server name and database instance (if applicable), database name, and credentials. The OLE DB Source Adapter handles things like tables and columns.

    The OLE DB Source Adapter executes a query against the server / database configured in the Connection Manager, in the context of the credentials supplied in the Connection Manager. A couple of important notes:

    First, SSIS will never store or save decrypted password fields. If your connection requires a password, and you check the Save My Password (or Save Password, depending on the version of SSIS you're running) checkbox, and your SSIS package is encrypted, SSIS will store your password internally and encrypted.

    Second, if you use Windows Authentication, the Connection Manager will connect to the database in the context of the user who executes the package.
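    The split of responsibilities described above can be sketched in a few lines of Python. This is only an illustration, not SSIS code: the class names and the SQLNCLI10 provider value are assumptions made for the example, and the connection string simply uses the usual OLE DB keywords for a Windows Authentication connection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionManager:
    """Where to connect and as whom (hypothetical stand-in, not an SSIS class)."""
    server: str
    database: str
    provider: str = "SQLNCLI10"  # assumed provider name, for illustration only

    def connection_string(self) -> str:
        # Integrated Security=SSPI means Windows Authentication: the connection
        # is made as whoever runs the package, so no password is stored.
        return (
            f"Provider={self.provider};Data Source={self.server};"
            f"Initial Catalog={self.database};Integrated Security=SSPI;"
        )


@dataclass(frozen=True)
class OleDbSource:
    """What to read: the adapter knows tables and columns, not servers."""
    connection: ConnectionManager
    query: str


adventure_works = ConnectionManager(server="(local)", database="AdventureWorks")
source = OleDbSource(adventure_works, "Select FirstName, LastName From Person.Contact")
print(source.connection.connection_string())
```

    Keeping the server details in one object means several adapters can share them, which is the same reason SSIS keeps Connection Managers separate from adapters.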

    The OLE DB Destination Adapter

    The OLE DB Destination Adapter also serves as an interface (or bridge) between the Data Flow Task and a Connection Manager. Rows flow into the OLE DB Destination Adapter and are written to the destination table specified in the OLE DB Destination Editor.

    The OLE DB Source Adapter brings data into the Data Flow Task; the OLE DB Destination Adapter writes data out of the Data Flow Task.

    A Data Pipeline

    Data Flow Paths connect the Source Adapters, Transformations, and Destination Adapters inside a Data Flow Task. There are no transformations in the simple Data Flow Task in Figure 1, so the Data Flow Path pipes all the data from the OLE DB Source Adapter into the OLE DB Destination Adapter. We'll look at transformations in the Data Flow in a later installment.

    All components in a Data Flow Task operate on rows of data. A row is the basic unit. Rows are grouped into buffers, and buffers are used to move rows through a data pipeline. It's called a pipeline because rows flow in, then through, and then out of the Data Flow Task.

    Data is read into the OLE DB Source Adapter in chunks. A single chunk of data fills one buffer. A buffer of data is processed before the data moves downstream.
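    To make the rows-and-buffers idea concrete, here is a minimal Python sketch of a source that hands rows downstream one buffer at a time. It is an analogy only: the function names, the buffer size of 2, and the sample rows are all made up for the example, and real SSIS buffers are sized by the engine rather than by a row count you pass in.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, str]


def read_in_buffers(rows: Iterable[Row], buffer_size: int) -> Iterator[List[Row]]:
    """Source side: group incoming rows into fixed-size buffers."""
    iterator = iter(rows)
    while True:
        buffer = list(islice(iterator, buffer_size))
        if not buffer:
            return
        yield buffer


def write_buffers(buffers: Iterable[List[Row]], destination: List[Row]) -> int:
    """Destination side: consume one whole buffer at a time."""
    buffer_count = 0
    for buffer in buffers:
        destination.extend(buffer)
        buffer_count += 1
    return buffer_count


source_rows = [("Ann", "Lee"), ("Bo", "Diaz"), ("Cy", "Park"), ("Di", "Ng"), ("Ed", "Roy")]
destination_table: List[Row] = []
buffers_used = write_buffers(read_in_buffers(source_rows, buffer_size=2), destination_table)
print(buffers_used, len(destination_table))  # 3 buffers carry 5 rows
```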

    Play Along At Home!

    Open (or create) the SSIS project named My_First_SSIS_Project from What is SSIS? Level 1 of the Stairway to Integration Services.

    From the Control Flow toolbox, drag a Data Flow Task onto the Control Flow canvas.

    Figure 3

    The Data Flow Task is a Control Flow task, which is also important. Why? Because when you open the Data Flow tab, you see a lot of similarities between the Control Flow and the Data Flow Task. For example, there's a toolbox. And green and red arrow connectors. It's easy to get confused, especially when you're first starting to use SQL Server Integration Services.

    Right-click the Data Flow Task and click Edit:

    Figure 4

    This opens the Data Flow tab in BIDS, which is the Data Flow Task Editor:

    Figure 5

    Drag an OLE DB Source adapter from the Data Flow Task toolbox onto the canvas:

    Figure 6

    When you first add an OLE DB Source to a Data Flow Task, the component displays with an error icon (the red circle with a white X). You can get more information about the error by viewing the Error List. To open the Error List, click the View dropdown menu in BIDS (Visual Studio), and then click Error List:

    Figure 7

    Note you can also hold the Ctrl key and, while holding it down, press the \ and E keys to display the Error List. I use these keyboard shortcuts often, but that's merely a style choice; interact with BIDS in the way that makes you most comfortable.

    SSIS validates the things you and I do automatically and as we do them. The Error List contains detailed information about SSIS errors and warnings. As you can see from the screenshot below, SSIS finds two problems with the current package:

    Figure 8

    Closer examination reveals a single problem here: there's no Connection Manager assigned to the OLE DB Source adapter. Why two errors? Error #2 is the actual problem: there is no Connection Manager assigned to the OLE DB Source. Error #1 is caused by Error #2. The clue is in the text immediately following "Validation error." Error #2 indicates an error with "Data Flow Task OLE DB Source" while error #1 indicates an error with "Data Flow Task: Data Flow Task". The issue with the OLE DB Source (the fact there's no Connection Manager currently assigned) is causing a validation error in the Data Flow Task.
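    The way one root cause turns into two entries in the Error List can be sketched as follows. This is a toy model, not the SSIS validation engine: the function names and the message wording are invented for the example, and only the cascade (component error first, task error as a consequence) mirrors the text.

```python
from typing import List, Optional


def validate_source(name: str, connection_manager: Optional[str]) -> List[str]:
    """Component-level check: a source needs a Connection Manager."""
    if connection_manager is None:
        return [f"{name}: no connection manager is assigned"]
    return []


def validate_data_flow(task: str, component_errors: List[str]) -> List[str]:
    """Task-level check: the task fails validation if any component does."""
    if not component_errors:
        return []
    summary = f"{task}: {len(component_errors)} component(s) failed validation"
    # The task-level error is listed first, the root cause after it.
    return [summary] + component_errors


errors = validate_data_flow("Data Flow Task", validate_source("OLE DB Source", None))
for number, message in enumerate(errors, start=1):
    print(f"Error #{number}: {message}")
```

    Assigning a Connection Manager to the source clears both entries at once, because the task-level error only exists as a consequence of the component-level one.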

    Let's fix it. You can open the OLE DB Source Editor by right-clicking and selecting Edit or by double-clicking the OLE DB Source adapter. I've learned I sometimes double-click too slowly and enter a text-edit mode for renaming the component. When I do, the component looks like this:

    Figure 9

    This is great if I want to quickly rename the component, but often what I really want is to open the editor. My trick for working around this is to double-click the icon and not the text portion of the component. This opens the editor:

    Figure 10

    If I have previously defined an OLE DB Connection Manager in the project, I select it from the OLE DB Connection Manager dropdown. In this case, we have no Connection Managers defined. You can stop here, return to the Connection Manager window located beneath the Data Flow Task canvas, and configure a new OLE DB Connection Manager. But don't do that. Instead, click the New button to the right of the OLE DB Connection Manager dropdown.

    This actually creates a new OLE DB Connection Manager and opens the Configure OLE DB Connection Manager editor in one step:

    Figure 11

    You can see the new Connection Manager in the screenshot above, along with the Configure OLE DB Connection Manager editor. This is a time-saving feature when developing SSIS packages, and I use it regularly.

    As you develop SSIS packages on a server, you create connections. The connections you create are stored in your profile and will appear in the Data Connections list in the Configure OLE DB Connection Manager window. If this is the first SSIS package you've built, your Data Connections list will be empty:

    Figure 12

    Click the New button here to define a new data connection using the Connection Manager Editor. Earlier I wrote that connection managers handle things like database engine instance and database name. This is where we configure those items:

    Figure 13

    The server name can be typed into or selected from the Server Name dropdown. Once the server name is configured, SSIS actually connects to the server and retrieves a list of databases. You can enter or select the database in the "Select or enter a database name" dropdown. In this case, I connected to the (local) instance of SQL Server and selected the AdventureWorks database.

    Once configured, you can test the connection with the Test Connection button:

    Figure 14

    Click the OK button to close the Test dialog. Click the OK button to close the Connection Manager Editor. Click the OK button to close the Configure OLE DB Connection Manager window. This returns you to the OLE DB Source Editor, which now appears as shown here:

    Figure 15

    Note the warning message in the lower pane of the OLE DB Source Editor when the Table or View Data Access Mode is selected and no table or view name is specified: "Select a table or view from the list."

    The next decision is: How do we bring in the data? The Data Access Mode property configures this for the OLE DB Source adapter. The options are shown here:

    Figure 16

    Table or view allows you to select a table or view as the data source from the database configured in the Connection Manager. This is the default option, and if you elect to acquire data from the source database using this option, you simply select the table or view name from the Name of the table or view dropdown below the Data Access Mode dropdown:

    Figure 17

    You can also store the name of the table or view in an SSIS variable. This provides some flexibility by allowing dynamic source tables. One thing to keep in mind, however, is that OLE DB Source adapters are coupled to the Data Flow Task at design time. This means you can change the name of the table, but not the names of the columns or their data types. There are use cases for utilizing this option, but believing you can write a single data flow to load tables with different schemas is a common misunderstanding.
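    One way to picture the design-time coupling is as a fixed list of column names and types that any substitute table has to satisfy. The Python sketch below is a simplification under that assumption; the function name, the sample tables, and the exact matching rule are illustrative and are not how SSIS itself performs the check.

```python
from typing import Dict

# Column name -> data type, captured when the package is designed.
DESIGN_TIME_COLUMNS: Dict[str, str] = {"FirstName": "DT_WSTR", "LastName": "DT_WSTR"}


def can_swap_table(runtime_columns: Dict[str, str]) -> bool:
    """A different table works only if it offers the same columns and types."""
    return all(
        runtime_columns.get(name) == data_type
        for name, data_type in DESIGN_TIME_COLUMNS.items()
    )


same_shape = {"FirstName": "DT_WSTR", "LastName": "DT_WSTR"}
other_shape = {"ProductName": "DT_WSTR", "ListPrice": "DT_CY"}
print(can_swap_table(same_shape))   # True
print(can_swap_table(other_shape))  # False
```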

    The next set of Data Access Modes allows you to enter SQL statements to query the source:

    Figure 18

    SQL Command and SQL Command From Variable provide a means to query the database configured in the Connection Manager dropdown, returning only the columns you desire from a single table / view or columns from multiple joined tables / views. When you select this option (and please do), the Name of the table or view dropdown is replaced by a SQL Command Text textbox:

    Figure 19

    The warning displayed in the lower pane of the OLE DB Source Editor until you begin typing is "Compose an SQL query." Let's do that. Enter the following SQL statement:

    Select FirstName, MiddleName, LastName, EmailAddress
    From Person.Contact

    The OLE DB Source Editor will now appear as shown:

    Figure 20

    The warning disappeared with the first character you typed into the SQL Command Text textbox.

    You can also build an SQL query visually if you prefer. Click the Build Query button to display the Query Builder utility:

    Figure 21

    The Query Builder provides a powerful and flexible Access-like visual interface. You can add tables by right-clicking the top pane and clicking Add Table, then graphically define relationships between the tables, and then select the fields you wish to include in your SQL statement. You can graphically define criteria and specify sorting as well:

    Figure 22

    The SQL statement is generated and displayed in the SQL pane. If you enter SQL into this pane, the graphics are updated to reflect the statement. As I said previously, this is very powerful and flexible.

    If you have the Query Builder open now in our SSIS project, click the Cancel button to close it; we're sticking with the SQL we entered earlier. In the OLE DB Source Editor, click the Columns page. As shown below, the Columns page displays the columns provided by the Data Access Mode and data selected on the Connection Manager page:

    Figure 23

    The Available External Columns table lists the columns from the data source. The External Column column in the grid beneath reflects the selected columns. If I deselect MiddleName, the editor appears thus:

    Figure 24

    What I've done is this: I've told the OLE DB Source adapter that I want it to read the MiddleName column into the Data Flow Task when it executes, but I do not want to expose the MiddleName field to any subsequent Data Flow Task transformations or destinations. The MiddleName column is still available if I wish to select it for output from the OLE DB Source adapter, but I cannot use it downstream. This is wasteful, and SSIS will provide Warnings during and after execution to this effect.

    Why is it wasteful? I'm loading data that I will never use. It's therefore a best practice to load into an SSIS Data Flow Task only the data you plan to use in the following transformations and/or destinations. If I truly do not want to include MiddleName in this Data Flow Task, I can simply remove it from the SQL statement on the Connection Manager page.
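    The check behind that advice amounts to comparing what the query returns with what the rest of the data flow consumes. Here is a small Python sketch of the idea; the function name and the warning wording are invented for the example and are not the text SSIS emits.

```python
from typing import List, Set


def unused_columns(selected: List[str], used_downstream: Set[str]) -> List[str]:
    """Columns the source reads that nothing downstream consumes."""
    return [column for column in selected if column not in used_downstream]


query_columns = ["FirstName", "MiddleName", "LastName", "EmailAddress"]
downstream = {"FirstName", "LastName", "EmailAddress"}
for column in unused_columns(query_columns, downstream):
    print(f"Warning: {column} is read but never used; remove it from the query.")
```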

    I do want to use MiddleName, so let's check the checkbox beside it before moving forward.

    The External Column column in the grid beneath the Available External Columns table is a dropdown that only allows us to select columns from the Available External Columns list:

    Figure 25

    The Output Column grid column allows me to enter an alias for that data column in the remainder of the Data Flow Task. In the example shown below, I'm aliasing the EmailAddress column as Email:

    Figure 26

    I could also do this in the SQL statement by changing the statement to read:

    Select FirstName, MiddleName, LastName, EmailAddress As Email
    From Person.Contact
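    Either way, the effect on downstream components is a simple rename: the value is unchanged and only the column name differs. A minimal Python sketch of that mapping follows, with a made-up sample row and a hypothetical helper name.

```python
from typing import Dict

# External Column -> Output Column, as configured on the Columns page.
OUTPUT_ALIASES: Dict[str, str] = {
    "FirstName": "FirstName",
    "MiddleName": "MiddleName",
    "LastName": "LastName",
    "EmailAddress": "Email",
}


def rename_columns(row: Dict[str, str]) -> Dict[str, str]:
    """Return the row as downstream components would see it."""
    return {OUTPUT_ALIASES[name]: value for name, value in row.items()}


sample_row = {  # made-up values, not AdventureWorks data
    "FirstName": "Ann", "MiddleName": "B", "LastName": "Lee",
    "EmailAddress": "ann@example.com",
}
print(rename_columns(sample_row))
```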

    For our first OLE DB Source adapter configuration in our first Data Flow Task, we're going to stop here. Click the OK button to close the OLE DB Source Editor.

    Drag an OLE DB Destination adapter onto the Data Flow Task canvas:

    Figure 27

    As with the OLE DB Source, the OLE DB Destination arrives with a "Connection Manager not configured" error. We know how to fix this though: double-click the OLE DB Destination to open the editor. Except instead, we get this:

    Figure 28

    If we think about it, this error makes sense. We are configuring a Destination component and we've given it no inputs to write to a destination. I can hear you thinking, "How do we provide inputs to write to a destination, Andy?" I'm so glad you asked! First, click the No button to dismiss the Warning dialog.

    Click the OLE DB Source adapter. See that green arrow hanging off the bottom? Click that and drag it onto the OLE DB Destination adapter, then let go. It'll look like this:

    Figure 31

    That's better. You can copy that onto a slide for a presentation!

    Let's take a look at this Data Flow Path we just connected. Right-click the Data Flow Path and click Edit to open the Data Flow Path Editor:

    Figure 32

    The Data Flow Path Editor has a Metadata page. It displays the contents of the data pipeline connecting the Data Flow components:

    Figure 33

    The Data Flow Path metadata displayed here looks similar to a table definition. You can see the name of the column, its data type, and several properties such as the length, precision, scale, and code page. These properties describe the data flowing through the Data Flow Task, and this is a peek under the hood. Data flows along this Data Flow Path from the OLE DB Source to the OLE DB Destination.

    Let's rename the OLE DB Destination. Right-click it and click Rename, then change the name to Contact. Double-click the Contact OLE DB Destination adapter to open the editor:

    Figure 34

    When the OLE DB Destination Editor opens, the OLE DB Connection Manager property defaults to the first available OLE DB Connection Manager. In this case, it's the (local).AdventureWorks connection manager. As with the OLE DB Source adapter, the New button allows you to create a new OLE DB Connection Manager. We'll stick with AdventureWorks.

    The Data Access Mode property defaults to "Table or view - fast load". The other options are shown here:

    Figure 35

    The two most common selections are "Table or view - fast load" and "Table or view". The major difference is that the fast load option can execute a BULK INSERT statement, while the non-fast load option (Table or view) is limited to generating INSERT INTO statements.

    If the target table or view exists, you can select it from the Name of the table or view dropdown. If the table or view doesn't exist, you can create the table using the New button beside the Name of the table or view dropdown. This is super cool and I use it all the time. Click the New button to create a new table or view:

    Figure 36

    Let's pick this apart. First, there's a table name. Where did that come from? Oh right, we renamed the OLE DB Destination adapter "Contact" earlier. It's pretty clever that the name of the component is used in the CREATE TABLE statement, don't you think? I do. How about the columns? Where did they come from? Oh, that's right: remember the Data Flow Path metadata? But those columns all had the same data type: DT_WSTR. And this statement creates the columns as nvarchar(50). It turns out DT_WSTR is the SSIS equivalent of SQL Server's nvarchar data type. It's a Unicode string data type. The OLE DB Destination adapter is smart enough to recognize this and build the CREATE TABLE statement accordingly. The column lengths even came from the Data Flow Path metadata. How cool is that?
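    The translation from path metadata to a CREATE TABLE statement can be approximated in a few lines. The Python sketch below is an illustration rather than the adapter's actual logic: the helper name, the small type table, and the exact formatting of the generated statement are assumptions, while the DT_WSTR to nvarchar pairing and the length of 50 come from the walkthrough above.

```python
from typing import List, Optional, Tuple

# A few SSIS pipeline types and their SQL Server counterparts.
SQL_TYPES = {"DT_WSTR": "nvarchar", "DT_STR": "varchar", "DT_I4": "int"}

Column = Tuple[str, str, Optional[int]]  # (name, SSIS type, length)


def create_table_statement(table: str, columns: List[Column]) -> str:
    """Build a CREATE TABLE statement from path metadata."""
    definitions = []
    for name, ssis_type, length in columns:
        sql_type = SQL_TYPES[ssis_type]
        if length is not None:
            sql_type += f"({length})"
        definitions.append(f"    [{name}] {sql_type}")
    return f"CREATE TABLE [{table}] (\n" + ",\n".join(definitions) + "\n)"


path_metadata: List[Column] = [
    ("FirstName", "DT_WSTR", 50),
    ("MiddleName", "DT_WSTR", 50),
    ("LastName", "DT_WSTR", 50),
    ("EmailAddress", "DT_WSTR", 50),
]
print(create_table_statement("Contact", path_metadata))
```

    The table name is passed in separately because, as noted above, it comes from the name of the destination component rather than from the metadata.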

    Click the OK button. When you click the OK button, the CREATE TABLE statement is executed and the table is created. It now shows up in the OLE DB Destination Editor:

    Figure 37

    Note the warning in the lower pane: "Map the columns on the Mappings page." Click on the Mappings page and observe what happens:

    Figure 38

    Again, some pretty cool stuff happens here. The tables in the upper right portion contain Available Input Columns and Available Destination Columns. Here's a better screenshot of that portion of the editor:

    Figure 39

    The Available Input Columns are the columns flowing into the OLE DB Destination from the Data Flow Path. The Available Destination Columns are the columns available in the new Contact table we just created in the destination database. Since the column names and data types are the same, the OLE DB Destination Editor auto-maps them. And they're the same because the Available Destination Columns were built from the metadata for the Available Input Columns. Cool.
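    Auto-mapping by matching name and data type is easy to sketch. In the hypothetical Python example below, one input column has deliberately been given a different name from its destination column to show what happens when the names do not line up; the function name and column sets are made up for the illustration.

```python
from typing import Dict


def auto_map(inputs: Dict[str, str], destinations: Dict[str, str]) -> Dict[str, str]:
    """Pair each input column with a destination column of the same name and type."""
    return {
        name: name
        for name, data_type in inputs.items()
        if destinations.get(name) == data_type
    }


input_columns = {"FirstName": "nvarchar(50)", "LastName": "nvarchar(50)", "Email": "nvarchar(50)"}
destination_columns = {"FirstName": "nvarchar(50)", "LastName": "nvarchar(50)", "EmailAddress": "nvarchar(50)"}
print(auto_map(input_columns, destination_columns))
# {'FirstName': 'FirstName', 'LastName': 'LastName'}
```

    In this sketch the Email column is left unmapped, so it would have to be mapped by hand.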

    Click the OK button to close the OLE DB Destination Editor.

    Time to Test!

    To test the SSIS package we've just constructed, click the Start Debugging button:

    Figure 40

    Alternately, you can click the Debug > Start Debugging dropdown menu item or press the F5 key:

    Figure 41

    Note the asterisk (*) next to the name of Package.dtsx above. This indicates the package has changed since it was last saved. I can hear you thinking, "Gosh Andy, shouldn't we have saved this first before executing the debugger?" I'm glad you asked! BIDS saves the package each time you start debugging. How cool is that? If something tragic happens (your laptop battery dies, BIDS locks up, etc.) your work is still saved. Neat.

    If you've followed this tutorial closely, you will see green boxes:

    Figure 42

    Happiness is green boxes!

    Conclusion

    Wow. This is a long article. But along the way we've discussed a lot of features of the SQL Server Integration Services Data Flow Task. In my opinion, the Data Flow Task is the most important task to master when learning SSIS.

    In this article, we:

    Discussed the relationship between Connection Managers and adapters (OLE DB Source and OLE DB Destination)

    Examined the role and some features of Data Flow Paths that connect Data Flow Task components to each other and represent the "data flow pipeline"

    Learned about design-time validation and how to view Errors and Warnings

    Studied some cool time-saving features of the Adapter Editors - you have to love those New buttons!

    Looked at formatting components on the Data Flow canvas so they're aligned

    Took a peek under the hood of the Data Flow Task, viewing the Data Flow Path metadata

    Learned how this metadata can be used by the OLE DB Destination to build a CREATE TABLE statement

    That was a lot! Thanks for hanging in here with me. We're not done with the SSIS Data Flow Task. In the next installment, we'll dig into transformations and their uses.

    :{>

    This article is part of the Stairway to Integration Services Stairway