New features and enhancements in IBM InfoSphere Information Server,
Version 9.1
The new and changed
features and the documentation updates are described in detail in the
following sections.
Index
1. InfoSphere Information Server for Data
Integration
· InfoSphere Data Click
· InfoSphere DataStage
2. InfoSphere Information Server for Data
Quality
· InfoSphere Data Quality Console
· InfoSphere Information Analyzer
· InfoSphere QualityStage
3. InfoSphere Metadata Asset Manager
4. Common capabilities across the InfoSphere
Information Server suite
· Administering
· Connecting to external sources
· InfoSphere Blueprint Director
· InfoSphere Metadata Asset Manager
· InfoSphere Metadata Workbench
· Migrating
5. InfoSphere Business Information Exchange
· InfoSphere Business Glossary
· InfoSphere Business Glossary Client for Eclipse
6. Documentation changes included in the Version
9.1 release
· Documentation introduced or enhanced with
Version 9.1
Below is a detailed
description of the newly added features in Version 9.1:
1. InfoSphere Information
Server for Data Integration
- InfoSphere Data Click
InfoSphere Data Click
helps users retrieve data and provision systems with agility. Users can offload
individual tables or entire schemas to generate sandbox environments for
personal or group development work. The simple interface enables users of any
skill level to complete the data integration task. InfoSphere Data Click
inherits the built-in data governance features of the InfoSphere Information
Server platform.
InfoSphere Data Click
generates both design and operational metadata to support data lineage and impact
analysis. InfoSphere Data Click assets also support linkages to the business
glossary so that users can establish trust in the sources of information that
are used. Also, administrators can define policies that control the data
integration activity so that users cannot exceed limits that are based on
enterprise requirements.
InfoSphere Data Click is
installed when you install InfoSphere Information Server for Data
Integration. InfoSphere Data Click activities are governed from InfoSphere
Blueprint Director. You install InfoSphere Data Click as a plug-in into
InfoSphere Blueprint Director.
- InfoSphere DataStage
Workload management
Administrators can now use the
workload management service in InfoSphere Information Server to set
system resource policies and to prioritize workload classes. The policies and
workload classes control the execution of parallel and server jobs.
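As a rough illustration of how prioritized workload classes and a resource policy could interact, the following Python sketch queues jobs by class priority and starts them only while a concurrency limit allows. The class names, the priority values, and the max_running limit are hypothetical and are not taken from the product. The actual workload management service is configured in InfoSphere Information Server, not through code like this.

```python
import heapq
import itertools

# Hypothetical workload classes; a lower number means a higher priority.
PRIORITY = {"high": 0, "medium": 1, "low": 2}


class WorkloadQueue:
    """Toy model: jobs wait in priority order until a slot is free."""

    def __init__(self, max_running):
        self.max_running = max_running    # assumed system resource policy
        self.running = []
        self._waiting = []
        self._order = itertools.count()   # keeps submission order within a class

    def submit(self, job_name, workload_class):
        entry = (PRIORITY[workload_class], next(self._order), job_name)
        heapq.heappush(self._waiting, entry)

    def dispatch(self):
        """Start waiting jobs, highest priority first, up to the limit."""
        started = []
        while self._waiting and len(self.running) < self.max_running:
            _, _, job_name = heapq.heappop(self._waiting)
            self.running.append(job_name)
            started.append(job_name)
        return started


queue = WorkloadQueue(max_running=2)
queue.submit("nightly_load", "low")
queue.submit("fraud_check", "high")
queue.submit("daily_report", "medium")
started = queue.dispatch()
```

In this sketch the low-priority job stays in the queue until one of the two running jobs finishes and frees a slot.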
Web-based job runtime
management
Administration and management
of the operational environment are simplified by extensions to the Operations
Console. Authorized users can now define the workload management policies, and
can run, stop, and reset integration jobs within the projects that they
administer.
Balanced optimization
for Hadoop
Version 9.1 extends the HDFS
features in Version 8.7: you can now use the Balanced Optimization features of
InfoSphere DataStage to push sets of data integration processing and related
data I/O into a Hadoop cluster. InfoSphere DataStage adds integration with
Oozie workflows, as well as real-time integration with InfoSphere Streams.
Support for IBM Rational
Team Concert™ as a source control system
You can now use Rational
Team Concert as a source control system in IBM InfoSphere Information Server
Manager.
XML design and
performance optimization enhancements
InfoSphere DataStage 9.1
includes new features to help you work with the type of large XML schemas that
are often seen in industry standards. You can use one new feature, the schema
view, to narrow the scope of a large XSD to only the subset of the schema tree
that you want to work with.
When you narrow the
scope, you can focus on a particular business challenge and parse and compose
XML documents more easily. Other new features include user-specified
parallelization for greater performance, extended support for XSD typing, and
usability and productivity improvements in XML job editing through schema
search and mapping intelligence.
2. InfoSphere Information
Server for Data Quality
- InfoSphere Data Quality Console
InfoSphere Data Quality
Console is a new unified, browser-based interface that you can use to monitor
and track data quality exceptions that are generated by InfoSphere
Information Server products and components. Exceptions are entities that
are generated by a condition or event and that might require additional
information or investigation. For example, records that do not meet the
conditions of data rules in InfoSphere Information Analyzer might be considered
exceptions. You can view a subset of exception descriptors by specifying search
criteria, which include search terms and attributes.
- InfoSphere Information Analyzer
Predefined rule
definitions
A key challenge in
assessing and monitoring information quality is starting the process to
validate key business requirements. Instead of starting that process without
assistance, you can start by using predefined data quality rule definitions.
New installations of
this release include more than 100 predefined rule definitions for basic and
common domains. Also included are more than 60 predefined rule definitions that
are designed to validate standardized address data. Although the rule
definitions are optimized for US data, they can be modified for any country or
region.
The data domains that
are represented include the following domains:
· Personal identity, such as age, date of birth,
and national identifier
· Asset identity, such as IP address information
· Financial
· Orders and sales
· Data classification, such as identifier,
indicator, code, date, and quantity
· Completeness, which checks whether a field
exists
· Data format, such as alphabetic and numeric
· Address data
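To make the completeness and data format domains more concrete, here is a minimal Python sketch of what such checks test for. It is not InfoSphere Information Analyzer rule syntax; the function names and the sample records are hypothetical and serve only to show the idea of flagging records that fail a rule.

```python
import re


def is_complete(record, field):
    """Completeness: the field exists and holds a non-blank value."""
    value = record.get(field)
    return value is not None and str(value).strip() != ""


def is_numeric(value):
    """Data format: the value consists only of the digits 0-9."""
    return re.fullmatch(r"[0-9]+", value) is not None


def is_alphabetic(value):
    """Data format: the value consists only of the letters A-Z or a-z."""
    return re.fullmatch(r"[A-Za-z]+", value) is not None


def failed_records(records, field, check):
    """Return the records whose field is missing, blank, or fails the check."""
    return [r for r in records
            if not is_complete(r, field) or not check(str(r[field]))]


customers = [
    {"name": "Alice", "age": "34"},
    {"name": "Bob", "age": ""},
    {"name": "Carol", "age": "4x"},
]
exceptions = failed_records(customers, "age", is_numeric)
```

Here the records for Bob (blank age) and Carol (non-numeric age) end up in exceptions, while the record for Alice passes both checks.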
User-named output tables
for data rules
When you create data
rules, you can specify that you want a user-named rule output table to be
created in addition to the system rule output tables. User-named output tables
can be simple or advanced. Use a simple table if you plan to use the rule
output from one rule to create subsequent rules. Use an advanced table if you
want to collect rule output from multiple data rules into one table. Also, you
might want to create an advanced user-named table if you plan to use the rule
output from multiple rules to create subsequent rules. An advanced user-named
table is an additional physical table with copied records, which means that it
requires additional storage space.
Distinct output records
You can now specify
whether you want only distinct output records or all output records in the rule
output table.
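The difference between the two settings can be sketched in Python as follows. This is only an illustration of the concept; the function name, the distinct flag, and the sample rows are assumptions and do not reflect how the product stores rule output.

```python
def rule_output(records, distinct=False):
    """Return all output records, or only the first copy of each one."""
    if not distinct:
        return list(records)
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))   # hashable form of the record
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


rows = [
    {"id": 1, "city": "Boston"},
    {"id": 1, "city": "Boston"},
    {"id": 2, "city": "Austin"},
]
all_rows = rule_output(rows)
distinct_rows = rule_output(rows, distinct=True)
```

With all output records, the duplicate row is kept; with distinct output records, each combination of values appears once.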
Task sequencing
You can now use task
sequences to group multiple InfoSphere Information Analyzer jobs that
are to be executed sequentially. In this release, task sequencing is available
only by using the HTTP API and CLI, and only rules, rule sets, and metrics are
supported for task sequencing.
- InfoSphere QualityStage
Standardization Rules
Designer
The
new Standardization Rules Designer provides an intuitive and
efficient framework that you can use to enhance standardization rule sets. You
can use the browser-based interface to add or modify classifications, lookup
tables, and rules.
You can also import
sample data to validate that the enhancements to the rule set work with your
data. In the Standardization Rules Designer, you can add or modify a rule by
mapping input values from an example record to output columns. For example, a
rule can split concatenated values in an input address record by mapping each
part of the input value to a different output column.
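The effect of a rule that splits a concatenated value can be approximated with a short Python sketch. The Standardization Rules Designer defines such rules through its browser-based interface, not through code; the pattern and the output column names HouseNumber and StreetName below are hypothetical.

```python
import re

# Hypothetical output column names for the two parts of the input value.
PATTERN = re.compile(r"^(?P<HouseNumber>[0-9]+)(?P<StreetName>[A-Za-z]+)$")


def split_concatenated(value):
    """Map each part of a value such as '100MAIN' to its own output column."""
    match = PATTERN.match(value)
    if match is None:
        return None                  # the value does not fit this rule
    return match.groupdict()


result = split_concatenated("100MAIN")
```

A value that does not match the pattern is left for other rules to handle, which the sketch signals by returning None.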
New rule sets
The following rule sets
are now available:
· The PHPROD rule set is a rule set for pharmaceutical data. The
rule set demonstrates how you can use rules to standardize description data
from the health industry.
· The RUNAMEL rule set can be used to standardize Russian names.
· The RUADDRL rule set can be used to standardize Russian addresses
and area information.
Rule set enhancements
The predefined rule sets
are enhanced in the following ways:
· The domain-specific rule sets can be used with
the Standardization Rules Designer.
· The CNNAME, HKCNAME, and HKNAME rule sets now
have special options for name processing.
· The CNADDR, CNAREA, CNPHONE, HKADDR, HKCADDR,
and HKPHONE rule sets now have user modification subroutines.
· The CNPHONE and HKPHONE rule sets are enhanced
in several ways. For example, input data can be converted to half-width
characters.
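For readers unfamiliar with the term, half-width conversion replaces full-width characters, which are common in East Asian text, with their standard ASCII equivalents. The following Python sketch shows the idea using the Unicode full-width range; it is an illustration only and says nothing about how the rule sets implement the conversion.

```python
def to_half_width(text):
    """Convert full-width ASCII variants (U+FF01 to U+FF5E) to half-width."""
    converted = []
    for ch in text:
        code = ord(ch)
        if 0xFF01 <= code <= 0xFF5E:
            converted.append(chr(code - 0xFEE0))   # e.g. U+FF11 -> '1'
        elif code == 0x3000:
            converted.append(" ")                  # ideographic space
        else:
            converted.append(ch)
    return "".join(converted)


# Full-width digits and a full-width hyphen, written as escapes.
phone = to_half_width("\uff10\uff11\uff10\uff0d\uff11\uff12\uff13\uff14")
```

After the conversion, phone holds the plain ASCII string 010-1234, which is easier to match against patterns and lookup tables.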
Sample data available
for predefined jobs and tutorial
Sample data is now
provided for the predefined standardization jobs that you can use to generate
standardized data and the frequency information for that data. The installation
media also now contains sample data and other files that are required for
the InfoSphere QualityStage tutorial.
3. InfoSphere Metadata
Asset Manager
Enhanced documentation
of import and export bridges
Individual reference
topics for each bridge contain prerequisites, frequently asked questions,
troubleshooting information, and detailed help for each parameter. Individual
PDF guides to using BI bridges contain customized information for imports from
IBM Cognos, SAP BusinessObjects, Microsoft, and Oracle BIEE.
Mapping documents for
each import bridge show how each metadata class in the source tool is displayed
in InfoSphere Information Server.
Asset interchange and
istool command line
The following functions
are documented:
· Exporting and importing InfoSphere Streams
assets
· Exporting and importing InfoSphere Data Quality
Console assets
· Generating business glossary content from
InfoSphere Data Architect glossary models
· Generating business glossary content from
logical data models
InfoSphere QualityStage
To help you learn about
the new Standardization Rules Designer, tutorials are provided that use
data from the product and address domains.
New and updated topics
provide information about the standardization process and standardization rule
sets:
· Standardization workflow
· Developing rule sets
· Enhancing standardization rule sets by using the
Standardization Rules Designer
4. Common capabilities
across the InfoSphere Information Server suite
- Administering
New repository
administration tool
The InfoSphere DataStage
and QualityStage operations database and the InfoSphere QualityStage
Standardization Rules Designer database are typically installed by the
installation program unless you are using a database other than DB2 or unless
you want to create them yourself.
To assist in the
management of repositories that are not installed by the installation program,
the RepositoryAdmin command line tool is provided. You can also use the
RepositoryAdmin tool for other purposes, such as to assist you in relocating a
repository to another server or to update a connection to a repository.
New database for
InfoSphere QualityStage
The InfoSphere
QualityStage Standardization Rules Designer is supported by an additional
database for your Version 9.1 installation.
- Connecting to external sources
Stage for IBM
Operational Decision Management
IBM Operational Decision
Management allows customers to externalize complex business rules from
applications. With the new ILOG JRules stage, you can invoke complex business
rules within the context of a job.
InfoSphere Streams
connector
The new InfoSphere
Streams connector enables integration between InfoSphere Streams and InfoSphere
DataStage. You can use the InfoSphere Streams connector to send data from an
InfoSphere DataStage job to an InfoSphere Streams job, and also to send data
from an InfoSphere Streams job to an InfoSphere DataStage job.
Unstructured Data stage
Use the new Unstructured
Data stage to extract information, such as formulas or document authors, from
Microsoft Excel files. The stage supports style sheets
for .xls and .xlsx file types.
Java™ Integration stage
You can use the new Java
Integration stage to integrate your code into your job design by writing your
Java code using the Java Integration stage API. The Java Integration stage API
defines interfaces and classes for writing Java code that can be invoked from
within InfoSphere DataStage and QualityStage parallel jobs.
Support for new data
sources
The following connectors
and stages are now available:
· DB2 connector for IBM DB2 for Linux, UNIX, and
Microsoft Windows, Version 10.1.x
· DB2 connector for IBM DB2 for z/OS,
Version 10
· MQ connector for IBM WebSphere MQ, Version 7.1.x
and 7.5.x
· Informix stage for IBM Informix, Version 11.7
· Streams connector for IBM InfoSphere Streams 3.0
· Teradata connector for Teradata Database 13.10
and 14.0
· Oracle connector for Oracle Database 11g
Release 2
· Sybase stage for Sybase ASE, Version 15.7 and
Sybase IQ, Version 15.4
· Netezza connector for Netezza 4.6.x, 6.0.x, and
7.0.x
· ODBC connector for DataDirect ODBC, Version
7.0.x
· ILOG JRules stage for ILOG-JRules 7.1.x and WODM
8.0.x
· Big Data File stage for IBM InfoSphere BigInsights 1.4 and
Cloudera CDH 4.0
- InfoSphere Blueprint Director
Publication of
blueprints
Blueprints can now be
published to the metadata repository of InfoSphere Information
Server so that other users can view or use them.
- InfoSphere Metadata Asset Manager
Import metadata by
bridge from additional tools
Import support was added
for the following tools and types of metadata:
· CA ERwin Data Modeler 8. Logical and physical
data models.
· IBM Cognos, Version 10. Business intelligence
(BI) models, BI reports, and related implemented data resources.
· IBM InfoSphere Streams MetaBroker. Endpoints and
tuples.
· Oracle BI Enterprise Edition. Business
intelligence (BI) models, BI reports, and related data resources.
Export metadata
You can now use the OMG
CWM 1 XMI 1 bridge to export the contents of databases and database schemas to
XML files that are compliant with the OMG CWM XMI file format.
Create and edit data
connections
When you import by using
a connector, you can now create a data connection, use an existing data connection,
or edit an existing data connection. Data connections are saved to the metadata
repository.
Automatic creation of
metadata interchange servers
Metadata interchange
servers that enable import from bridges and connectors are now created
automatically during installation.
- InfoSphere Metadata Workbench
Enhancements in Manage
Lineage utility
You can now select or
clear InfoSphere DataStage projects to be included in lineage. Previously, the
Manage Lineage utility included all jobs in a selected project. In addition,
you can run the Manage Lineage utility on database views without selecting an
InfoSphere DataStage project to link the database view to its source database
table.
Integration with IBM
InfoSphere Blueprint Director
You can browse, query,
and display published blueprints. You can display the blueprint diagram.
Integration with IBM
InfoSphere Information Analyzer
You can browse, query,
and display published rule definitions and published rule set definitions.
You can browse, query,
display, and include for lineage the InfoSphere DataStage Data Rules stage and
its relationship to the published data rule.
Integration with Big
Data platform
You can browse, query,
display, and include for lineage the InfoSphere DataStage Unstructured Data,
Big Data File, and Streams Connector stages.
Integration with IBM
InfoSphere Business Glossary
You can browse, query,
display, and assign assets to information governance rules and information
governance policies.
You can query and
display the new Is A and Has A term relationships. In previous versions, only
the parent category of the term was displayed.
Integration with IBM
InfoSphere DataStage
You can browse, query,
display, and include for lineage the InfoSphere DataStage Java Client and Java
Transformer stages. You can display additional database stage properties: the
server, database, schema, and table properties of the stage.
You can display additional
data file stage properties: the file and location properties of the stage.
Integration with IBM
InfoSphere Data Click
You can browse, query,
display, and include for lineage published Change Data Capture (CDC)
subscriptions from InfoSphere Data Click. In addition, you can invoke the CDC
subscription process from a blueprint diagram.
Importing assets into
the metadata repository
You can generate
database, data file, and business intelligence (BI) report assets from a CSV
file for later import into the metadata repository.
- Migrating
New migration functions
To help you to migrate
automatically, you can now use two new migration wizards. The wizards automate
the process of exporting and importing databases, profiles, and directories that
are associated with InfoSphere Information Server. The wizards collect
information about your computer and InfoSphere Information
Server configuration. The information is then used to export and import
your system.
The migration wizards
support all three server tiers: the services tier, the engine tier, and the
metadata repository tier. When you export or import by using the wizards, all
tiers that are installed on the computer are backed up simultaneously.
5. InfoSphere Business
Information Exchange
- InfoSphere Business Glossary
Expanded enterprise
information governance policies and rules
Now, in addition to
creating and managing terms and categories, you can create and manage
information governance policies and information governance rules. Information
governance policies and rules describe the way that information should be used
and managed to comply with business objectives. You can define relationships
among the policies and rules and between the policies and rules and other
metadata information assets.
Advanced term
relationships
You can use new
relationships between terms to express hierarchies of type and containment. The
relationships enable consumers of the information to understand the meaning of
terminology more fully, in the context of other terms.
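As a small illustration, type and containment relationships such as Is A and Has A can be modeled as simple mappings between terms. The terms and helper functions in this Python sketch are hypothetical; they only show how a consumer could read a hierarchy of type and a containment relationship, not how InfoSphere Business Glossary stores them.

```python
# Hypothetical glossary terms.
IS_A = {
    "Savings Account": "Account",       # a Savings Account is an Account
    "Account": "Financial Product",     # an Account is a Financial Product
}
HAS_A = {
    "Account": ["Account Number", "Balance"],   # an Account has these terms
}


def supertypes(term):
    """Follow Is A links upward and return every supertype of the term."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def contained_terms(term):
    """Return the terms that the given term contains through Has A."""
    return list(HAS_A.get(term, []))


account_types = supertypes("Savings Account")
account_parts = contained_terms("Account")
```

Reading the mappings this way shows the context of a term at a glance: what kind of thing it is, and which other terms it is made up of.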
Single sign-on for
Windows users
Integration with Windows
desktop authentication enables users who are logged in to Windows to work
with InfoSphere Business Glossary immediately, without requiring a
separate login process.
Web-based access to
blueprints
You can now define
information about blueprints and view published blueprints directly from the
business glossary.
Dynamic display of
external content from OSLC providers
OSLC (Open Services for
Lifecycle Collaboration) is a method of communicating among different
systems. InfoSphere Business Glossary can now be a consumer of OSLC
services from Rational Asset Manager and Rational Software Architect Data
Manager.
The metadata content
that is stored in these OSLC providers is displayed dynamically in the business
glossary. The dynamic display ensures that data is synchronized and eliminates
the need for separate data transfer procedures.
Enhanced integration
with InfoSphere Information Analyzer
In previous releases,
you were able to view the results of table and column analysis, including valid
values for columns. You can now browse, search, view details of, and assign
published data rule definitions and data rule set definitions to business
glossary assets.
- InfoSphere Business Glossary Client for Eclipse
Information governance
policy and information governance rule assets
You can now browse,
search, and display the properties of two new InfoSphere Business Glossary
assets: information governance policies and information governance rules. You
can assign an information governance rule to an asset, such as a database
table, so that the information governance rule governs the asset.
Import and export of
glossary assignments
Earlier versions
supported the import and export of term assignments. In Version 9.1, you can
import and export glossary assignments, which include both term assignments and
information governance rule assignments.
Advanced term relationships
from InfoSphere Business Glossary
Two new term
relationships, Is A and Has A, are included in the Properties view of a term.
You can view the supertype and subtype relationship between terms in the Term
Type Hierarchy view.
Business Process
Modeling Notation (BPMN) model elements
You can now view and
remove term assignments in BPMN model elements that are displayed in IBM
Rational Software Architect. With the Business Process Model Integration API,
you can build functions to add, remove, and get term assignments to BPMN model
elements.
Local indexing
Local term assignments
and local information governance rule assignments are now indexed to improve
search and display performance.
6. Documentation introduced
or enhanced with Version 9.1
Introduction to InfoSphere
Information Server
This information is more
complete and streamlined to help you understand how the suite and its
components interact. Diagrams show where each component fits in the suite
architecture, and scenarios explain how each component might be used to solve
real business problems.
InfoSphere Business
Glossary
New topics provide
information about populating your business glossary by using the command line:
· Generating business glossary content from
InfoSphere Data Architect glossary model (*.ndm) files
· Generating business glossary content from
logical data models
InfoSphere DataStage
The quality of
information is improved and task steps are clarified in the InfoSphere
DataStage tutorial.
More troubleshooting
information, with focus on client login and job runtime issues, is provided.
The enhanced troubleshooting information includes information about specific
operating systems and information about how to prevent errors.