Sunday, 28 September 2014

Informatica Workflow Recovery

When a PowerCenter workflow fails, it is part of the operations team’s responsibility to troubleshoot the issue and complete the workflow successfully. A PowerCenter workflow fails when a task (e.g. a session) inside the workflow fails. Once the root cause of the issue is understood and fixed, the workflow has to be completed in such a way that only the failed task and the tasks that are yet to run are executed. Configuring the workflow for recovery helps you achieve this.

Workflow recovery allows you to continue processing the workflow and workflow tasks from the point of interruption. When you enable a workflow for recovery, the Integration Service saves the workflow state of operation in a shared location. The workflow state of operation includes the status of tasks in the workflow and workflow variable values. You can recover the workflow if it terminates, stops, or aborts. The workflow does not have to be running. You can recover a workflow if the Integration Service can access the workflow state of operation.

What is Workflow Recovery?
During recovery, the Integration Service accesses the workflow state of operation, which is stored in memory or on disk depending on the recovery configuration. The workflow state of operation includes the status of tasks in the workflow and workflow variable values.
Configuring recovery in Informatica PowerCenter involves:
1.       Workflow Configuration for Recovery
2.       Session and Tasks Configuration for Recovery
3.       Recovering the Workflow from Failure

1. Workflow Configuration For Recovery 
To configure a workflow for recovery, either enable the workflow for recovery or configure the workflow to suspend on task error.

Enable Recovery: When you enable a workflow for recovery, the Integration Service saves the workflow state of operation in a shared location. You can recover the workflow if it terminates, stops, or aborts.
Automatic recovery can be set up in the workflow properties, as shown in the image below.


An optional High Availability (HA) license is required for this check box to be available for selection. Without the HA option, workflows must be recovered manually.

Suspend: When you configure a workflow to suspend on error, the Integration Service stores the workflow state of operation in memory. If a task fails, you can fix the task error and then recover the suspended workflow. If the workflow cannot recover automatically from the failure within the maximum allowed number of attempts, it goes into the 'Suspended' state.
The image below shows how to set a workflow to suspend on error:


2. Session and Tasks Configuration for Recovery
Each session or task in a workflow has its own recovery strategy. When the Integration Service recovers a workflow, it recovers each task based on the recovery strategy specified for that task or session. There are three options available:
·    Restart task
·    Fail task and continue workflow
·    Resume from the last checkpoint

Restart task: This recovery strategy is available for all types of workflow tasks. When the Integration Service recovers a workflow, it restarts each recoverable task that is configured with a restart strategy. You can configure Session and Command tasks with a restart recovery strategy; all other tasks have a restart recovery strategy by default.

Fail task and continue workflow: It is only available for session and command tasks. When the Integration Service recovers a workflow, it does not recover the task. The task status becomes failed, and the Integration Service continues running the workflow.

Resume from the last checkpoint: This recovery strategy is only available for session tasks. The Integration Service saves the session state of operation and maintains target recovery tables. If the session aborts, stops, or terminates, the Integration Service uses the saved recovery information to resume the session from the point of interruption.

When you configure the session recovery strategy to resume from the last checkpoint, the Integration Service stores the session state of operation in the shared location $PMStorageDir. It also writes recovery information to the target recovery tables (PM_RECOVERY, PM_TGT_RUN_ID, PM_REC_STATE), which it uses to determine where to resume loading data into the target tables during a recovery.
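As an illustrative sketch (not an official Informatica procedure), an operator troubleshooting a failed session could inspect the target recovery tables named above. Because the database client and the tables' column layout vary by environment, the script below only assembles generic queries as a dry run rather than executing them:

```shell
# Sketch only: the table names come from the text above, but the target
# database client and any column names are environment-specific, so this
# assembles generic queries instead of executing them.
RECOVERY_TABLES="PM_RECOVERY PM_TGT_RUN_ID PM_REC_STATE"

for t in $RECOVERY_TABLES; do
  # Pipe these into your target database client (sqlplus, db2, etc.)
  echo "SELECT * FROM ${t};"
done
```

In practice you would run the queries against the target database user that owns the recovery tables, before and after a recovery run, to see the checkpoint rows change.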
The image below shows the available session recovery options:


The image below shows the available command task recovery options:


3. Recovering the Workflow from Failure
A workflow can be recovered in either of two ways, as described below:
Recovering Automatically:
If you have a High Availability (HA) license and the workflow is configured to recover automatically as described above, the Integration Service automatically attempts to recover the workflow based on the recovery strategy set for each session or task in the workflow. If the workflow cannot recover automatically from the failure within the maximum allowed number of attempts, it goes into the 'Suspended' state, from which it can then be recovered manually.

Recovering Manually:
You can manually recover the whole workflow or individual tasks within a workflow. You can access the options shown in the image below from the Workflow Manager or the Workflow Monitor.

Recover workflow: Continue processing the workflow from the point of interruption.
Recover task: Recover a session but not the rest of the workflow.
Recover workflow from a task: Recover a session and continue processing the workflow.
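The manual options above can also be driven from the pmcmd command line. The sketch below is a hedged example: the recoverworkflow command and the -sv/-d/-u/-f flags follow common pmcmd usage, but the Integration Service, domain, folder, and workflow names are placeholders, and the command is only printed here rather than executed:

```shell
# Assumed placeholders: IS_Dev (Integration Service), Dom_Dev (domain),
# SALES (folder), and wf_load_orders (workflow) are illustrative names.
CMD="pmcmd recoverworkflow -sv IS_Dev -d Dom_Dev -u Administrator -f SALES wf_load_orders"

# Printed as a dry run; in a real environment you would execute pmcmd
# directly and supply the password via -p or an environment variable.
echo "$CMD"
```

Running the equivalent command in your environment restarts the workflow from the point of interruption, honoring each task's configured recovery strategy.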



Thursday, 25 September 2014

New Features in IBM DataStage 9.1



New features and Enhancement on IBM InfoSphere Information Server, Version 9.1

The new and changed features and documentation updates are described in detail in the following sections.

Index 
1.    InfoSphere Information Server for Data Integration
·         InfoSphere Data Click
·         InfoSphere DataStage
2.    InfoSphere Information Server for Data Quality
·         InfoSphere Data Quality Console
·         InfoSphere Information Analyzer
·         InfoSphere QualityStage
3.    InfoSphere Metadata Asset Manager
4.    Common capabilities across the InfoSphere Information Server suite
·         Administering
·         Connecting to external sources
·         InfoSphere Blueprint Director
·         InfoSphere Metadata Asset Manager
·         InfoSphere Metadata Workbench
·         Migrating
5.    InfoSphere Business Information Exchange
·         InfoSphere Business Glossary
·         InfoSphere Business Glossary Client for Eclipse
6.    Documentation changes included in the Version 9.1 release
·         Documentation introduced or enhanced with Version 9.1

Below is a detailed description of the features newly added in DataStage v9.1:

1. InfoSphere Information Server for Data Integration
  •    InfoSphere Data Click
InfoSphere Data Click helps users retrieve data and provision systems with agility. Users can offload individual tables or entire schemas to generate sandbox environments for personal or group development work. The simple interface enables users of any skill level to complete the data integration task. InfoSphere Data Click inherits the built-in data governance features of the InfoSphere Information Server platform.

InfoSphere Data Click generates both design and operational metadata to support data lineage and impact analysis. InfoSphere Data Click assets also support linkages to the business glossary so that users can establish trust in the sources of information that are used. Also, administrators can define policies that control the data integration activity so that users cannot exceed limits that are based on enterprise requirements.

InfoSphere Data Click is installed when you install InfoSphere Information Server for Data Integration. InfoSphere Data Click activities are governed from InfoSphere Blueprint Director. You install InfoSphere Data Click as a plug-in into InfoSphere Blueprint Director.

  •   InfoSphere DataStage
Workload management
You can now use the workload management service in InfoSphere Information Server to allow the administrator to set system resource policies and prioritization of workload classes. The policies and workload classes control the execution of parallel and server jobs.

Web-based job runtime management
Administration and management of the operational environment is simplified by extending the Operations Console. Authorized users can now define the workload management policies, and can run, stop, and reset integration jobs within the projects that they administer.

Balanced optimization for Hadoop
Extending the HDFS features in Version 8.7, you can now use the Balanced Optimization features of InfoSphere DataStage to push sets of data integration processing and related data I/O into a Hadoop cluster. InfoSphere DataStage adds integration with Oozie workflows, as well as real-time integration with InfoSphere Streams.

Support for IBM Rational Team Concert™ as a source control system
You can now use Rational Team Concert as a source control system in IBM InfoSphere Information Server Manager.

XML design and performance optimization enhancements
InfoSphere DataStage 9.1 includes new features to help you work with the type of large XML schemas that are often seen in industry standards. You can use one new feature, the schema view, to narrow the scope of a large XSD to only the subset of the schema tree that you want to work with.

When you narrow the scope, you can focus on a particular business challenge and parse and compose XML documents more easily. Other new features include user-specified parallelization for greater performance, extended support for XSD typing, and usability and productivity improvements in XML job editing through schema search and mapping intelligence.

2.   InfoSphere Information Server for Data Quality
  • InfoSphere Data Quality Console
InfoSphere Data Quality Console is a new unified, browser-based interface that you can use to monitor and track data quality exceptions that are generated by InfoSphere Information Server products and components. Exceptions are entities that are generated by a condition or event and that might require additional information or investigation. For example, records that do not meet the conditions of data rules in InfoSphere Information Analyzer might be considered exceptions. The following screen capture shows how you can view a subset of exception descriptors by specifying search criteria, which include search terms and attributes.

  • InfoSphere Information Analyzer
Predefined rule definitions
A key challenge in assessing and monitoring information quality is starting the process to validate key business requirements. Instead of starting that process without assistance, you can start by using predefined data quality rule definitions.

New installations of this release include more than 100 predefined rule definitions for basic and common domains. Also included are more than 60 predefined rule definitions that are designed to validate standardized address data. Although the rule definitions are optimized for US data, they can be modified for any country or region.

The data domains that are represented include the following domains:
·         Personal identity, such as age, date of birth, and national identifier
·         Asset identity, such as IP address information
·         Financial
·         Orders and sales
·         Data classification, such as identifier, indicator, code, date, and quantity
·         Completeness, which checks whether a field exists
·         Data format, such as alphabetic and numeric
·         Address data

User-named output tables for data rules
When you create data rules, you can specify that you want a user-named rule output table to be created in addition to the system rule output tables. User-named output tables can be simple or advanced. Use a simple table if you plan to use the rule output from one rule to create subsequent rules. Use an advanced table if you want to collect rule output from multiple data rules into one table. Also, you might want to create an advanced user-named table if you plan to use the rule output from multiple rules to create subsequent rules. An advanced user-named table is an additional physical table with copied records, which means that it requires additional storage space.

Distinct output records
You can now specify whether you want only distinct output records or all output records in the rule output table.

Task sequencing
You can now use task sequences to group multiple InfoSphere Information Analyzer jobs that are to be executed sequentially. In this release, task sequencing is available only by using the HTTP API and CLI, and only rules, rule sets, and metrics are supported for task sequencing.
 
  • InfoSphere QualityStage
Standardization Rules Designer
The new Standardization Rules Designer provides an intuitive and efficient framework that you can use to enhance standardization rule sets. You can use the browser-based interface to add or modify classifications, lookup tables, and rules.

You can also import sample data to validate that the enhancements to the rule set work with your data. The following screen capture shows a part of the Standardization Rules Designer in which you can add or modify a rule by mapping input values from an example record to output columns. This rule splits concatenated values in an input address record by mapping each part of the input value to a different output column.

New rule sets
The following rule sets are now available:
·   The PHPROD rule set is a rule set for pharmaceutical data. The rule set demonstrates how you can use rules to standardize description data from the health industry.
·     The RUNAMEL rule set can be used to standardize Russian names.
·    The RUADDRL rule set can be used to standardize Russian addresses and area information.


Rule set enhancements
The predefined rule sets are enhanced in the following ways:
·         The domain-specific rule sets can be used with the Standardization Rules Designer.
·       The CNNAME, HKCNAME, and HKNAME rule sets now have special options for name processing.
·      The CNADDR, CNAREA, CNPHONE, HKADDR, HKCADDR, and HKPHONE rule sets now have user modification subroutines.
·       The CNPHONE and HKPHONE rule sets are enhanced in several ways. For example, input data can be converted to half-width characters.

Sample data available for predefined jobs and tutorial
Sample data is now provided for the predefined standardization jobs that you can use to generate standardized data and the frequency information for that data. The installation media also now contains sample data and other files that are required for the InfoSphere QualityStage tutorial.

3.   InfoSphere Metadata Asset Manager

Enhanced documentation of import and export bridges
Individual reference topics for each bridge contain prerequisites, frequently asked questions, troubleshooting information, and detailed help for each parameter. Individual PDF guides to using BI bridges contain customized information for imports from IBM Cognos, SAP BusinessObjects, Microsoft, and Oracle BIEE.
Mapping documents for each import bridge show how each metadata class in the source tool is displayed in InfoSphere Information Server.

Asset interchange and istool command line
The following functions are documented:
·         Exporting and importing InfoSphere Streams assets
·         Exporting and importing InfoSphere Data Quality Console assets
·         Generating business glossary content from InfoSphere Data Architect glossary models
·         Generating business glossary content from logical data models
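As a hedged illustration of the documented asset-interchange functions, the sketch below prints an example istool export invocation. The host, credentials, archive path, and DataStage asset pattern are placeholders, and exact flags can differ by release, so the command is assembled as a dry run rather than executed:

```shell
# Placeholders: services_host:9443, isadmin, /tmp/assets.isx, and the
# DataStage asset pattern are all illustrative; consult the istool
# reference for the exact syntax in your installation.
CMD="istool export -domain services_host:9443 -username isadmin -archive /tmp/assets.isx -datastage 'engine/project/*/*.*'"
echo "$CMD"
```

A matching istool import against the generated .isx archive is the usual way to move these assets between environments.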

InfoSphere QualityStage
To help you learn about the new Standardization Rules Designer, tutorials are provided that use data from the product and address domains.
New and updated topics provide information about the standardization process and standardization rule sets:
·         Standardization workflow
·         Developing rule sets
·         Enhancing standardization rule sets by using the Standardization Rules Designer

4. Common capabilities across the InfoSphere Information Server suite
  •    Administering
New repository administration tool
The InfoSphere DataStage and QualityStage operations database and the InfoSphere QualityStage Standardization Rules Designer database are typically installed by the installation program unless you are using a database other than DB2 or unless you want to create them yourself.
To assist in the management of repositories that are not installed by the installation program, the RepositoryAdmin command line tool is provided. You can also use the RepositoryAdmin tool for other purposes, such as to assist you in relocating a repository to another server or to update a connection to a repository.

New database for InfoSphere QualityStage
The InfoSphere QualityStage Standardization Rules Designer is supported by an additional database for your Version 9.1 installation.

  • Connecting to external sources
Stage for IBM Operational Decision Management
IBM Operational Decision Management allows customers to externalize complex business rules from applications. With the new ILOG JRules stage, you can invoke complex business rules within the context of a job.

InfoSphere Streams connector
The new InfoSphere Streams connector enables integration between InfoSphere Streams and InfoSphere DataStage. You can use the InfoSphere Streams connector to send data from an InfoSphere DataStage job to an InfoSphere Streams job, and also to send data from an InfoSphere Streams job to an InfoSphere DataStage job.

Unstructured Data stage
Use the new Unstructured Data stage to extract information, such as formulas or document authors, from Microsoft Excel files. The stage supports style sheets for .xls and .xlsx file types.

Java™ Integration stage
You can use the new Java Integration stage to integrate your code into your job design by writing your Java code using the Java Integration stage API. The Java Integration stage API defines interfaces and classes for writing Java code that can be invoked from within InfoSphere DataStage and QualityStage parallel jobs.

Support for new data sources
The following connectors and stages are now available:
·         DB2 connector for IBM DB2 for Linux, UNIX, and Microsoft Windows, Version 10.1.x
·         DB2 connector for IBM DB2 for z/OS, Version 10
·         MQ connector for IBM WebSphere MQ, Version 7.1.x and 7.5.x
·         Informix stage for IBM Informix, Version 11.7
·         Streams connector for IBM InfoSphere Streams 3.0
·         Teradata connector for Teradata Database 13.10 and 14.0
·         Oracle connector for Oracle Database 11g Release 2
·         Sybase stage for Sybase ASE, Version 15.7 and Sybase IQ, Version 15.4
·         Netezza connector for Netezza 4.6x, 6.0.x, and 7.0.x
·         ODBC connector for DataDirect ODBC, Version 7.0.x
·         ILOG JRules stage for ILOG-JRules 7.1.x and WODM 8.0.x
·         Big Data File stage for IBM BigInsights 1.4 and Cloudera CDH 4.0
  • InfoSphere Blueprint Director
Publication of blueprints
Blueprints can now be published to the metadata repository of InfoSphere Information Server so that other users can view or use them.

InfoSphere Metadata Asset Manager
Import metadata by bridge from additional tools
Import support was added for the following tools and types of metadata:
·         CA ERwin Data Modeler 8. Logical and physical data models.
·         IBM Cognos, Version 10. Business intelligence (BI) models, BI reports, and related implemented data resources.
·         IBM InfoSphere Streams MetaBroker. Endpoints and tuples.
·         Oracle BI Enterprise Edition. Business intelligence (BI) models, BI reports, and related data resources.

Export metadata
You can now use the OMG CWM 1 XMI 1 bridge to export the contents of databases and database schemas to XML files that are compliant with the OMG CWM XMI file format.

Create and edit data connections
When you import by using a connector, you can now create a data connection, use an existing data connection, or edit an existing data connection. Data connections are saved to the metadata repository.

Automatic creation of metadata interchange servers
Metadata interchange servers that enable import from bridges and connectors are now created automatically during installation.

  • InfoSphere Metadata Workbench
Enhancements in Manage Lineage utility
You can now select or clear InfoSphere DataStage projects to be included in lineage. Previously, the Manage Lineage utility included all jobs in a selected project. In addition, you can run the Manage Lineage utility on database views without selecting an InfoSphere DataStage project to link the database view to its source database table.

Integration with IBM InfoSphere Blueprint Director
You can browse, query, and display published blueprints. You can display the blueprint diagram.

Integration with IBM InfoSphere Information Analyzer
You can browse, query, and display published rule definitions and published rule set definitions.
You can browse, query, display, and include for lineage the InfoSphere DataStage Data Rules stage and its relationship to the published data rule.

Integration with Big Data platform
You can browse, query, display, and include for lineage the InfoSphere DataStage Unstructured Data, Big Data File, and Streams Connector stages.

Integration with IBM InfoSphere Business Glossary
You can browse, query, display, and assign assets to information governance rules and information governance policies.
You can query and display the new Is A and Has A term relationships. In previous versions, only the parent category of the term was displayed.

Integration with IBM InfoSphere DataStage
You can browse, query, display, and include for lineage the InfoSphere DataStage Java Client and Java Transformer stages. You can display additional database stage properties: the server, database, schema, and table properties of the stage.
You can display additional data file stage properties: the file and location properties of the stage.

Integration with IBM InfoSphere Data Click
You can browse, query, display, and include for lineage published Change Data Capture (CDC) subscriptions from InfoSphere Data Click. In addition, you can invoke the CDC subscription process from a blueprint diagram.

Importing assets into the metadata repository
You can generate database, data file, and business intelligence (BI) report assets from a CSV file for later import into the metadata repository.

  • Migrating
New migration functions
To help you to migrate automatically, you can now use two new migration wizards. The wizards automate the process of exporting and importing databases, profiles, and directories that are associated with InfoSphere Information Server. The wizards collect information about your computer and InfoSphere Information Server configuration. The information is then used to export and import your system.

The migration wizards support all three server tiers: the services tier, the engine tier, and the metadata repository tier. When you export or import by using the wizards, all tiers that are installed on the computer are backed up simultaneously.

5.   InfoSphere Business Information Exchange
  • InfoSphere Business Glossary
Expanded enterprise information governance policies and rules
Now, in addition to creating and managing terms and categories, you can create and manage information governance policies and information governance rules. Information governance policies and rules describe the way that information should be used and managed to comply with business objectives. You can define relationships among the policies and rules and between the policies and rules and other metadata information assets.

Advanced term relationships
You can use new relationships between terms to express hierarchies of type and containment. The relationships enable consumers of the information to understand the meaning of terminology more fully, in the context of other terms.

Single sign-on for Windows users
Integration with Windows desktop authentication enables users who are logged in to Windows to work with InfoSphere Business Glossary immediately, without requiring a separate login process.

Web-based access to blueprints
You can now define information about blueprints and view published blueprints directly from the business glossary.

Dynamic display of external content from OSLC providers
OSLC (Open Services for Lifecycle Collaboration) is a method of communicating among different systems. InfoSphere Business Glossary can now be a consumer of OSLC services from Rational Asset Manager and Rational Software Architect Design Manager.

The metadata content that is stored in these OSLC providers is displayed dynamically in the business glossary. The dynamic display ensures that data is synchronized and eliminates the need for separate data transfer procedures.

Enhanced integration with InfoSphere Information Analyzer
In previous releases, you were able to view the results of table and column analysis, including valid values for columns. You can now browse, search, view details of, and assign published data rule definitions and data rule set definitions to business glossary assets.

  • InfoSphere Business Glossary Client for Eclipse
Information governance policy and information governance rule assets
You can now browse, search, and display the properties of two new InfoSphere Business Glossary assets: information governance policies and information governance rules. You can assign an information governance rule to an asset, such as a database table, so that the information governance rule governs the asset.

Import and export of glossary assignments
Earlier versions supported import and export of term assignments. In Version 9.1, you can import and export glossary assignments, which include both term assignments and information governance rule assignments.

Advanced term relationships from InfoSphere Business Glossary
Two new term relationships, Is A and Has A, are included in the Properties view of a term. You can view the supertype and subtype relationship between terms in the Term Type Hierarchy view.

Business Process Modeling Notation (BPMN) model elements
You can now view and remove term assignments in BPMN model elements that are displayed in IBM Rational Software Architect. With the Business Process Model Integration API, you can build functions to add, remove, and get term assignments to BPMN model elements.

Local indexing
Local term assignments and local information governance rule assignments are now indexed to improve search and display performance.

6. Documentation introduced or enhanced with Version 9.1

Introduction to InfoSphere Information Server
This information is more complete and streamlined to help you understand how the suite and its components interact. Diagrams show where each component fits in the suite architecture, and scenarios explain how each component might be used to solve real business problems.

InfoSphere Business Glossary
New topics provide information about populating your business glossary by using the command line:
·         Generating business glossary content from InfoSphere Data Architect glossary model (*.ndm) files
·         Generating business glossary content from logical data models

InfoSphere DataStage
The quality of information is improved and task steps are clarified in the InfoSphere DataStage tutorial.
More troubleshooting information, with focus on client login and job runtime issues, is provided. The enhanced troubleshooting information includes information about specific operating systems and information about how to prevent errors.