Overview of the Oracle GoldenGate architecture
Oracle GoldenGate can be configured for the following purposes:
· A static extraction of data records from one database and the loading of those records to another database.
· Continuous extraction and replication of transactional DML operations and DDL changes (for supported databases) to keep source and target data consistent.
· Extraction from a database and replication to a file outside the database.
Oracle GoldenGate is composed of the following components:
· Extract
· Data pump
· Replicat
· Trails or extract files
· Checkpoints
· Manager
· Collector
Overview of Extract
The Extract process runs on the source system
and is the extraction (capture) mechanism of Oracle GoldenGate. You can
configure Extract in one of the following ways:
· Initial loads: For initial data loads, Extract extracts (captures) a current, static set of data directly from the source objects.
· Change synchronization: To keep source data synchronized with another set of data, Extract captures DML and DDL operations after the initial synchronization has taken place.
Extract captures from a data source that can be one of the
following:
· Source tables, if the run is an initial load.
· The database recovery logs or transaction logs (such as the Oracle redo logs or SQL/MX audit trails). The actual method of capturing from the logs varies depending on the database type.
· A third-party capture module. This method provides a communication layer that passes data and metadata from an external API to the Extract API. The database vendor or a third-party vendor provides the components that extract the data operations and pass them to Extract.
When configured for change synchronization, Extract captures
the DML and DDL operations that are performed on objects in the Extract
configuration. Extract stores these operations until it receives commit records
or rollbacks for the transactions that contain them. When a rollback is
received, Extract discards the operations for that transaction.
When a commit is received, Extract persists the transaction
to disk in a series of files called a trail,
where it is queued for propagation to the target system. All of the operations
in each transaction are written to the trail as a sequentially organized
transaction unit. This design ensures both speed and data integrity.
NOTE Extract ignores operations on objects that are not in the Extract
configuration, even though the same transaction may also include operations on
objects that are in the Extract configuration.
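As a hedged illustration of the change-synchronization behavior described above, a minimal primary Extract parameter file might look like the following (the group name, credentials, trail path, and table names are hypothetical):

```
EXTRACT exta
USERID ggadmin, PASSWORD ********
EXTTRAIL ./dirdat/aa
TABLE hr.employees;
TABLE hr.departments;
```

Only operations on the objects named in TABLE statements are captured; operations on other objects in the same transactions are ignored, as noted above.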
Overview of data pumps
A data pump is a secondary Extract group within the source
Oracle GoldenGate configuration. If a data pump is not used, Extract must send
the captured data operations to a remote trail on the target. In a typical
configuration with a data pump, however, the primary Extract group writes to a
trail on the source system. The data pump reads this trail and sends the data
operations over the network to a remote trail on the target. The data pump adds
storage flexibility and also serves to isolate the primary Extract process from
TCP/IP activity.
In general, a data pump can perform data filtering, mapping,
and conversion, or it can be configured in pass-through mode, where data is passively transferred as-is, without manipulation.
Pass-through mode increases the throughput of the data pump, because all of the
functionality that looks up object definitions is bypassed.
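The pass-through behavior described above can be sketched in a data-pump parameter file such as this (the group, host, port, and trail names are hypothetical):

```
EXTRACT pmpa
PASSTHRU
RMTHOST target1, MGRPORT 7809
RMTTRAIL ./dirdat/bb
TABLE hr.*;
```

With PASSTHRU, the data pump does not look up object definitions, so it can forward the captured operations as-is without a database connection for metadata.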
Advantages
· Protection against network and target failures: In a basic Oracle GoldenGate configuration, with only a trail on the target system, there is nowhere on the source system to store the data operations that Extract continuously extracts into memory. If the network or the target system becomes unavailable, Extract could run out of memory and abend. However, with a trail and data pump on the source system, captured data can be moved to disk, preventing the abend of the primary Extract. When connectivity is restored, the data pump captures the data from the source trail and sends it to the target system(s).
· Implementing several phases of data filtering or transformation: When using complex filtering or data transformation configurations, you can configure a data pump to perform the first transformation either on the source system or on the target system, or even on an intermediary system, and then use another data pump or the Replicat group to perform the second transformation.
· Consolidating data from many sources to a central target: When synchronizing multiple source databases with a central target database, you can store extracted data operations on each source system and use data pumps on each of those systems to send the data to a trail on the target system. Dividing the storage load between the source and target systems reduces the need for massive amounts of space on the target system to accommodate data arriving from multiple sources.
· Synchronizing one source with multiple targets: When sending data to multiple target systems, you can configure data pumps on the source system for each target. If network connectivity to any of the targets fails, data can still be sent to the other targets.
Overview of Replicat
The Replicat process runs on the target system,
reads the trail on that system, and then reconstructs the DML or DDL operations
and applies them to the target database. You can configure Replicat in one of
the following ways:
· Initial loads: For initial data loads, Replicat can apply a static data copy to target objects or route it to a high-speed bulk-load utility.
· Change synchronization: When configured for change synchronization, Replicat applies the replicated source operations to the target objects using a native database interface or ODBC, depending on the database type. To preserve data integrity, Replicat applies the replicated operations in the same order as they were committed to the source database.
You can use multiple Replicat processes with
multiple Extract processes in parallel to increase throughput. To preserve data
integrity, each set of processes handles a different set of objects. To
differentiate among Replicat processes, you assign each one a group name.
You can delay Replicat so that it waits a
specific amount of time before applying the replicated operations to the target
database. A delay may be desirable, for example, to prevent the propagation of
errant SQL, to control data arrival across different time zones, or to allow
time for other planned events to occur. The length of the delay is controlled
by the DEFERAPPLYINTERVAL
parameter.
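As a hedged sketch, a Replicat parameter file for change synchronization with a ten-minute apply delay might look like this (the group name, credentials, delay, and object mappings are illustrative):

```
REPLICAT repa
USERID ggadmin, PASSWORD ********
ASSUMETARGETDEFS
DEFERAPPLYINTERVAL 10 MINS
MAP hr.*, TARGET hr.*;
```

Here DEFERAPPLYINTERVAL causes Replicat to wait the specified interval after the source commit before applying each replicated operation.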
Overview of trails
To support the continuous extraction and replication of
database changes, Oracle GoldenGate stores records of the captured changes
temporarily on disk in a series of files called a trail. A trail can exist on the source
system, an intermediary system, the target system, or any combination of those
systems, depending on how you configure Oracle GoldenGate. On the local system
it is known as an extract
trail (or local trail). On a remote system it is known as a remote trail.
Processes that write to, and read, a trail
The primary Extract and the data-pump Extract write to a
trail. Only one Extract process can write to a trail, and each Extract must be
linked to a trail.
Processes that read the trail are:
· Data-pump Extract: Extracts DML and DDL operations from a local trail that is linked to a previous Extract (typically the primary Extract), performs further processing if needed, and transfers the data to a trail that is read by the next Oracle GoldenGate process downstream (typically Replicat, but could be another data pump if required).
· Replicat: Reads the trail and applies replicated DML and DDL operations to the target database.
Trail creation and maintenance
The trail files themselves are created as needed during
processing, but you specify a two-character name for the trail when you add it
to the Oracle GoldenGate configuration with the ADD RMTTRAIL or ADD EXTTRAIL command. By default, trails are stored in the dirdat subdirectory of the Oracle GoldenGate directory.
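For example, a local trail and a remote trail might be registered in GGSCI as follows (the trail paths and Extract group names are hypothetical; "aa" and "bb" are the two-character trail names):

```
ADD EXTTRAIL ./dirdat/aa, EXTRACT exta
ADD RMTTRAIL ./dirdat/bb, EXTRACT pmpa
```

ADD EXTTRAIL links a local trail to the Extract group that writes it, and ADD RMTTRAIL does the same for a trail on a remote system.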
Full trail files are aged automatically to allow processing
to continue without interruption for file maintenance. As each new file is
created, it inherits the two-character trail name appended with a unique,
six-digit sequence number from 000000 through 999999 (for example c:\ggs\dirdat\tr000001). When the sequence number reaches 999999, the numbering starts
over at 000000.
To maximize throughput, and to minimize I/O load on the
system, extracted data is sent into and out of a trail in large blocks.
Transactional order is preserved. By default, Oracle GoldenGate writes data to
the trail in canonical
format, a
proprietary format which allows it to be exchanged rapidly and accurately among
heterogeneous databases. However, data can be written in other formats that are
compatible with different applications.
Overview of extract files
In some configurations, Oracle GoldenGate stores extracted
data in an extract
file instead
of a trail. The extract file can be a single file, or it can be configured to
roll over into multiple files in anticipation of limitations on file size that
are imposed by the operating system. In this sense, it is similar to a trail,
except that checkpoints are not recorded. The file or files are created
automatically during the run. The same versioning features that apply to trails
also apply to extract files.
Overview of Manager
Manager is the control process of Oracle
GoldenGate. Manager must be running on each system in the Oracle GoldenGate
configuration before Extract or Replicat can be started, and Manager must
remain running while those processes are running so that resource management
functions are performed. Manager performs the following functions:
· Start Oracle GoldenGate processes
· Start dynamic processes
· Maintain port numbers for processes
· Perform trail management
· Create event, error, and threshold reports
One Manager process can control many Extract or Replicat processes. On Windows systems, Manager can run as a service.
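A minimal Manager parameter file illustrating these functions might look like the following (the port numbers and purge rule are illustrative):

```
PORT 7809
DYNAMICPORTLIST 7810-7820
PURGEOLDEXTRACTS ./dirdat/*, USECHECKPOINTS
```

PORT sets the TCP/IP port on which Manager listens, DYNAMICPORTLIST reserves ports that Manager can assign to dynamic processes such as Collector, and PURGEOLDEXTRACTS performs trail maintenance by deleting trail files that all readers have finished processing.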
Overview of Collector
Collector is a process that runs in the
background on the target system when continuous, online change synchronization
is active. Collector does the following:
· Upon a connection request from a remote Extract to Manager, scan for and bind to an available port and then send the port number to Manager for assignment to the requesting Extract process.
· Receive extracted database changes that are sent by Extract and write them to a trail file.
Manager starts Collector automatically when a network connection is required, so Oracle GoldenGate users do not interact with it. Collector can receive information from only one Extract process, so there is one Collector for each Extract that you use. Collector terminates when the associated Extract process terminates.
By default, Extract initiates TCP/IP connections from the
source system to Collector on the target, but Oracle GoldenGate can be
configured so that Collector initiates connections from the target. Initiating
connections from the target might be required if, for example, the target is in
a trusted network zone, but the source is in a less trusted zone.
Overview of process types
Depending on the requirement, Oracle GoldenGate
can be configured with the following processing types.
· An online Extract or Replicat process runs until stopped by a user. Online processes maintain recovery checkpoints in the trail so that processing can resume after interruptions. You use online processes to continuously extract and replicate DML and DDL operations (where supported) to keep source and target objects synchronized. The EXTRACT and REPLICAT parameters apply to this process type.
· A source-is-table Extract process extracts a current set of static data directly from the source objects in preparation for an initial load to another database. This process type does not use checkpoints. The SOURCEISTABLE parameter applies to this process type.
· A special-run Replicat process applies data within known begin and end points. You use a special Replicat run for initial data loads, and it also can be used with an online Extract to apply data changes from the trail in batches, such as once a day rather than continuously. This process type does not maintain checkpoints, because the run can be started over with the same begin and end points. The SPECIALRUN parameter applies to this process type.
· A remote task is a special type of initial-load process in which Extract communicates directly with Replicat over TCP/IP. Neither a Collector process nor temporary disk storage in a trail or file is used. The task is defined in the Extract parameter file with the RMTTASK parameter.
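As a hedged sketch of the initial-load process types above, a source-is-table Extract and a special-run Replicat might be parameterized like this (all group names, credentials, paths, and object names are hypothetical):

```
-- Extract parameter file: capture a static set of rows from the source table
SOURCEISTABLE
USERID ggadmin, PASSWORD ********
RMTHOST target1, MGRPORT 7809
RMTFILE ./dirdat/initld, PURGE
TABLE hr.employees;

-- Replicat parameter file: special run that applies the extract file
SPECIALRUN
USERID ggadmin, PASSWORD ********
EXTFILE ./dirdat/initld
MAP hr.employees, TARGET hr.employees;
END RUNTIME
```

Because neither process type uses checkpoints, a failed run is simply restarted from the same begin point.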
Overview of groups
To differentiate among multiple Extract or Replicat
processes on a system, you define processing groups.
For example, to replicate different sets of data in parallel, you would create
two Replicat groups.
A processing group consists of a process (either Extract or
Replicat), its parameter file, its checkpoint file, and any other files
associated with the process. For Replicat, a group also includes the associated
checkpoint table.
You define groups by using the ADD EXTRACT and ADD
REPLICAT commands
in the Oracle GoldenGate command interface, GGSCI.
All files and checkpoints relating to a group share the name
that is assigned to the group itself. Any time that you issue a command to
control or view processing, you supply a group name or multiple group names by
means of a wildcard.
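For example, two parallel Replicat groups could be created and then viewed with a wildcard in GGSCI (the group and trail names are hypothetical):

```
ADD REPLICAT repa, EXTTRAIL ./dirdat/bb
ADD REPLICAT repb, EXTTRAIL ./dirdat/bb
INFO REPLICAT rep*
```

The wildcard in the INFO command matches both groups, since all files and checkpoints for a group share the group name.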
Overview of the Commit Sequence Number (CSN)
When working with Oracle GoldenGate, you might need to refer
to a Commit
Sequence Number, or
CSN. A CSN is an identifier that Oracle GoldenGate constructs to identify a transaction
for the purpose of maintaining transactional consistency and data integrity. It
uniquely identifies a point in time at which a transaction commits to the
database. The CSN can be required to position Extract in the transaction log,
to reposition Replicat in the trail, or for other purposes. It is returned by
some conversion functions and is included in reports and certain GGSCI output.
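As one hedged example of using a CSN, some GGSCI commands accept a CSN directly, such as starting Replicat at a specific commit point (the group name and CSN value below are hypothetical, and the exact option support varies by release):

```
START REPLICAT repa, ATCSN 6488359
```

This positions Replicat so that it begins applying operations at the transaction identified by that CSN.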