Graph Overview

  • 7/29/2019 Graph Overview

    Graph:

    A graph is a data flow diagram that defines the various processing stages of a task and the streams of data as they move from one stage to another. In a graph, a component represents a stage and a flow represents a data stream. In addition, there are parameters for specifying various aspects of graph behavior.
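    The relationship between components and flows can be sketched as a small data structure. This is only an illustrative model in Python, not Ab Initio code; the class name, method names, and component names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy model of a graph: components are stages, flows are data streams."""
    components: set = field(default_factory=set)
    flows: list = field(default_factory=list)  # (source, target) pairs

    def add_component(self, name):
        self.components.add(name)

    def connect(self, source, target):
        # A flow may only join components that already exist in the graph.
        if source not in self.components or target not in self.components:
            raise ValueError("both ends of a flow must be components in the graph")
        self.flows.append((source, target))

    def downstream(self, name):
        """Return the components that receive data directly from `name`."""
        return [t for s, t in self.flows if s == name]


# Hypothetical three-stage graph, used only for illustration.
graph = Graph()
for stage in ("input_file", "partition_by_round_robin", "output_file"):
    graph.add_component(stage)
graph.connect("input_file", "partition_by_round_robin")
graph.connect("partition_by_round_robin", "output_file")
```

    Calling graph.downstream("input_file") on this model lists the stages fed directly by the input file.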

    Build a graph in the GDE by dragging and dropping components, connecting them with flows, and then defining values for parameters. You can run, debug, and tune your graph in the GDE. In the process of building a graph, you are developing an Ab Initio application, and thus graph development is referred to as graph programming. When you are ready to deploy your application, you save the graph as a script that you can run from the command line.

    A sample graph is shown above.

    Graph Parameters

    A graph is completely defined by the totality of its parameter values. A parameter is a name-value pair, with a number of additional associated attributes that describe when and how to interpret or resolve the value. Parameters are used to change the behavior of graphs and projects (groups of graphs in a sandbox) in a controlled and uniform way.
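    To make the idea of a name-value pair with an interpretation attribute concrete, here is a minimal Python sketch. It is not how the GDE resolves parameters; the Parameter class, the "constant" and "substitution" interpretation values, and the parameter names are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from string import Template


@dataclass
class Parameter:
    name: str
    value: str
    # Hypothetical attribute: "constant" keeps the value as written,
    # "substitution" expands ${NAME} references to earlier parameters.
    interpretation: str = "constant"


def resolve(parameters):
    """Resolve parameters in declaration order into a name -> value mapping."""
    resolved = {}
    for p in parameters:
        if p.interpretation == "substitution":
            resolved[p.name] = Template(p.value).substitute(resolved)
        else:
            resolved[p.name] = p.value
    return resolved


params = [
    Parameter("RUN_DATE", "20190729"),
    Parameter("OUT_FILE", "sales_${RUN_DATE}.dat", "substitution"),
    Parameter("LITERAL", "${RUN_DATE}"),  # constant: left unexpanded
]
values = resolve(params)
```

    The same value text resolves differently depending on the attribute, which is the point of attaching interpretation rules to each parameter.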

    In the sample graph shown below, the input file and the output file are database components. Components are explained in detail in the following section. The input file and the Partition by Round-robin component are connected by a flow. A flow is the path along which data passes through the graph.
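    Round-robin partitioning itself is easy to sketch: records are dealt out to the partitions in turn. The Python function below is a generic illustration of the technique, not the Ab Initio component; the function name and the block_size argument are assumptions.

```python
def partition_round_robin(records, n_partitions, block_size=1):
    """Deal records out to partitions in turn, block_size records at a time."""
    if n_partitions < 1 or block_size < 1:
        raise ValueError("n_partitions and block_size must be at least 1")
    partitions = [[] for _ in range(n_partitions)]
    for index, record in enumerate(records):
        # Consecutive blocks go to consecutive partitions, wrapping around.
        partitions[(index // block_size) % n_partitions].append(record)
    return partitions


records = ["r0", "r1", "r2", "r3", "r4"]
parts = partition_round_robin(records, 3)
```

    Because the assignment depends only on record position, the partitions end up nearly equal in size regardless of the record contents.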

    Parts of a Graph

    Metadata: Metadata is any information about data or how to process it. There are two broad categories of metadata:

    1. Technical metadata: metadata associated with graphs. This includes the information needed to build a graph, such as record formats, key specifiers, and transform functions, as well as the graphs themselves, tracking information from the running of graphs, job histories, versioning, and so on. You can store technical metadata as part of a graph, in a file, or in a data store in the EME.

    2. Enterprise metadata: metadata associated with the business using Ab Initio software. This includes user-defined documentation of job functions, roles, categories, and so on.

    Dataset: A dataset is one kind of component. It represents the tables or files that hold the input and output data.

    Component: Components are the building blocks of a graph. The Component Organizer contains all the available components.

    Flows: A flow connects two components.

    Layout:

    1. Layout determines the location of a resource.

    2. A layout is either serial or parallel.

    3. A serial layout specifies one node and one directory.

    4. A parallel layout specifies multiple nodes and multiple directories. It is permissible for the same node to be repeated.

    5. The location of a dataset is one or more places on one or more disks.

    6. The location of a computing component is one or more directories on one or more nodes. By default, the node and directory are unknown.

    7. Computing components propagate their layouts from neighbors, unless specifically given a layout by the user.
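    The layout rules above can be sketched in Python as follows. This is a simplified model under stated assumptions: the Layout class, the effective_layout helper, and the node and directory names are hypothetical, and the propagation rule is reduced to "take the first neighbor whose layout is known".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    locations: tuple  # (node, directory) pairs; the same node may repeat

    def __post_init__(self):
        if not self.locations:
            raise ValueError("a layout needs at least one (node, directory) pair")

    @property
    def kind(self):
        # One node and one directory is serial; anything more is parallel.
        return "serial" if len(self.locations) == 1 else "parallel"


def effective_layout(explicit, neighbor_layouts):
    """Use the user-given layout if present, else propagate from a neighbor."""
    if explicit is not None:
        return explicit
    for layout in neighbor_layouts:
        if layout is not None:
            return layout
    return None  # still unknown


serial = Layout((("node_a", "/data/serial"),))
parallel = Layout((("node_a", "/data/p0"), ("node_a", "/data/p1"), ("node_b", "/data/p2")))
```

    A component with no explicit layout that sits next to the parallel dataset would inherit the three-way layout, while one given the serial layout by the user keeps it.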

    Phases of the Graph

    Phases are used to break up a graph into blocks. The primary purpose of phasing is performance tuning by managing resources. Phasing limits the number of simultaneous processes by breaking up the graph into different phases, only one of which is running at any given time. One common use of phasing is to avoid deadlocks. The temporary files created because of phase breaks are deleted at the end of the phase, regardless of whether the run was successful or not.
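    The phase behavior described above (one phase at a time, temporary files removed at the end of each phase whether or not it succeeded) can be imitated in Python. This is an analogy rather than the actual runtime mechanism; run_phases and the phase functions are hypothetical names.

```python
import os
import tempfile


def run_phases(phases):
    """Run phases strictly one after another, each with its own temp directory."""
    for phase in phases:
        # TemporaryDirectory removes the directory on exit,
        # even when the phase raises an exception.
        with tempfile.TemporaryDirectory() as tmp_dir:
            phase(tmp_dir)


seen_dirs = []


def phase_ok(tmp_dir):
    seen_dirs.append(tmp_dir)
    with open(os.path.join(tmp_dir, "intermediate.dat"), "w") as handle:
        handle.write("data")


def phase_fail(tmp_dir):
    seen_dirs.append(tmp_dir)
    raise RuntimeError("simulated phase failure")


try:
    run_phases([phase_ok, phase_fail])
except RuntimeError:
    pass  # the failed phase's temporary directory has still been removed
```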

    Checkpoint:

    1. Checkpoints are used for the purpose of recovery.

    2. The main aim of checkpoints is to provide the means to restart a failed graph from some intermediate state.

    3. With checkpoints, the temporary files from the last successful checkpoint are retained so that the graph can be restarted from that point in the event of a failure. The temporary files corresponding to the previous checkpoint are deleted only when each new checkpoint completes successfully.
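    The restart idea can be sketched in Python: record how far the run got, and on a rerun skip everything before the last completed checkpoint. This is a toy model, assuming an in-memory state dictionary in place of retained temporary files; all names are hypothetical.

```python
def run_with_checkpoints(stages, state):
    """Run stages in order, resuming after the last completed checkpoint."""
    start = state.get("completed", 0)
    for index in range(start, len(stages)):
        stages[index]()
        # Only a stage that finished successfully advances the checkpoint.
        state["completed"] = index + 1


log = []
attempts = {"load": 0}


def extract():
    log.append("extract")


def load():
    attempts["load"] += 1
    if attempts["load"] == 1:
        raise RuntimeError("simulated failure on the first attempt")
    log.append("load")


state = {}
try:
    run_with_checkpoints([extract, load], state)
except RuntimeError:
    pass  # state still records that extract completed

run_with_checkpoints([extract, load], state)  # restart: extract is skipped
```

    On the second call only the failed stage is repeated, so the work done before the checkpoint is not redone.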

    Graph Runtime Behavior:

    1. A graph can be executed from the GDE itself or from the back end.

    2. A graph can be deployed to the back-end server as a Unix shell script or a Windows NT batch file.

    3. The deployed shell script or batch file can then be executed on the back end.

    Sandbox:

    A sandbox is a user's personal workspace: a collection of graphs and related files that are stored in a single directory tree and treated as a group for purposes of version control, navigation, and migration. A sandbox can be a file system copy of a datastore project.

    Sandbox Structure:

    A sandbox has five folders, organized by type of metadata, as listed below:

    db - database-related

    dml - record formats

    mp - graphs

    run - deployed scripts

    xfr - transforms
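    The five-folder layout can be reproduced as a plain directory tree with a few lines of Python. This only creates empty directories for illustration; it does not create a working sandbox, and the create_sandbox function and the my_sandbox name are hypothetical.

```python
import tempfile
from pathlib import Path

# Folder names and their contents as listed in the text.
SANDBOX_DIRS = {
    "db": "database-related files",
    "dml": "record formats",
    "mp": "graphs",
    "run": "deployed scripts",
    "xfr": "transforms",
}


def create_sandbox(root):
    """Create the five standard folders under root and return their names."""
    root = Path(root)
    for name in SANDBOX_DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    created = create_sandbox(Path(tmp) / "my_sandbox")
```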
