The Data Migration Gap

by Tony Velcich

October 26, 2020

While organizations are benefiting from the performance, flexibility, and cost savings offered by the cloud, many enterprises have struggled with their data migration initiatives. Cloud data migration can be fraught with business risks including disruption of critical business operations, risk of data loss, and overall project complexities that often result in cost overruns or failed initiatives. According to Gartner, “Through 2022, more than 50% of data migration initiatives will exceed their budget and timeline—and potentially harm the business—because of flawed strategy and execution.”

These issues are particularly pronounced with legacy data migration approaches, and together they make up what we call the “data migration gap”. How can organizations migrate petabytes of business-critical, actively changing customer data without causing business disruption, while minimizing the time, costs and risks associated with legacy data migration approaches?

Legacy Data Migration Approaches

Legacy data migration approaches typically fall under the following categories.

Lift and Shift

A lift and shift approach migrates applications and data from one environment to another with zero or minimal changes. This approach is probably the most common, as companies perceive it to be straightforward. However, it is simplistic and risk-laden because it assumes that no changes are needed. As a result, a majority of these projects fail. Instead of gaining the efficiencies promised by the cloud, organizations fail to take advantage of the new capabilities available to them and end up transferring the shortcomings of their existing on-premises implementation to the new cloud environment.

Summary of issues:

Simplistic and risk-laden approach
Results in failed projects
Requires downtime during migration to prevent data changes from occurring
All applications must be cut over at one time, requiring a “big-bang” approach
Does not take advantage of new capabilities available in the cloud environment
Often results in unexpected costs and performance issues

Incremental Copy

An incremental copy approach periodically copies new and modified data from the source to the target environment, making multiple passes over the source data. The first pass migrates all original data from the source system to the target, and each subsequent pass processes the incremental changes made since the previous one. The issue with this approach is that if a large volume of data is changing, it may be impossible to ever catch up and complete the migration without requiring downtime.

Summary of issues:

Requires multiple passes of the source data
May be impossible to catch up with all data changes
May require application downtime
Typically supports only small data volumes
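
To make the catch-up problem concrete, here is a minimal simulation of a multi-pass incremental copy. It is only a sketch under simplifying assumptions: the function name, the `copy_rate` and `change_rate` parameters, and the idea that every change touches a distinct item are all hypothetical and chosen for illustration, not taken from any particular migration tool.

```python
def incremental_copy_passes(initial_items, copy_rate, change_rate, max_passes=20):
    """Simulate a multi-pass incremental copy (illustrative model only).

    initial_items: number of items the first pass has to copy
    copy_rate:     items copied per unit of time
    change_rate:   items modified at the source per unit of time
    Returns the number of items pending at the start of each pass.
    """
    pending = initial_items
    history = []
    for _ in range(max_passes):
        history.append(pending)
        if pending == 0:
            break  # the target has caught up with the source
        # While this pass copies `pending` items, the source keeps changing.
        # Assumption: every change touches a distinct item, so the next pass
        # must copy (pass duration * change_rate) items.
        pending = pending * change_rate // copy_rate
    return history


# Changes arrive much more slowly than data is copied: the passes converge.
converging = incremental_copy_passes(1_000_000, copy_rate=1000, change_rate=100)

# Changes arrive as fast as data is copied: the backlog never shrinks.
stalled = incremental_copy_passes(1_000_000, copy_rate=1000, change_rate=1000,
                                  max_passes=5)
```

In this model the backlog shrinks only when the change rate is lower than the copy rate. Once the two are comparable, every pass ends with as much work outstanding as it started with, and the only way to finish is to stop changes at the source, which means downtime.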

Dual Pipeline / Ingest

A dual pipeline or dual ingest approach is where new data is ingested simultaneously into both the source and target environments. This approach requires significant effort to develop, test, operate and maintain the multiple pipelines. It also requires that all applications be modified to always update both source and target environments when performing any data changes.

Summary of issues:

Increased development, test and maintenance efforts
Impacts to application complexity and performance
Does not address initial migration of source data
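
The following sketch illustrates what dual ingest asks of an application. The `DualWriter` class and the dictionaries standing in for the source and target environments are hypothetical; a real pipeline would write to actual storage systems and would need error handling for partial failures.

```python
class DualWriter:
    """Applies every change to both environments (hypothetical helper)."""

    def __init__(self, source, target):
        self.source = source
        self.target = target

    def put(self, key, value):
        # Both writes must succeed. A failure between them leaves the two
        # environments inconsistent unless the application handles it.
        self.source[key] = value
        self.target[key] = value

    def delete(self, key):
        self.source.pop(key, None)
        self.target.pop(key, None)


source = {"old-record": 1}  # data that existed before dual ingest began
target = {}

writer = DualWriter(source, target)
writer.put("new-record", 2)

# The target only ever sees data written after dual ingest was switched on.
missing_from_target = sorted(set(source) - set(target))
```

Every code path that changes data has to go through something like `DualWriter`, which is where the added development and testing effort comes from. Data written before the switch, such as `old-record` here, never reaches the target and still needs a separate initial migration.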

A Live Data Strategy

If the migration technology used does not need to compare data that have been modified during an extended period of data transfer, it can eliminate the gap that exists between source and target and maintain the target as a current replica of the source data. This requires combining change event information with the activity of scanning existing data, and an intelligent approach to bringing those two streams of information together to take the correct actions against the migration target. This is not just change data capture, but the use of a continuous stream of change notifications together with a scan of data that are undergoing change.

This approach is what we call a live data strategy because it enables data migrations to be performed even as the data sets are undergoing active change. A live data strategy solves all of the challenges associated with the legacy data migration approaches by enabling migrations to be performed without requiring any application downtime or business disruption. A live data approach can support data migration and replication use cases regardless of the data volumes or the amount of data change occurring, and is the only approach able to cost-effectively manage these large-scale data migrations even while large amounts of data changes are occurring.
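
The idea of combining a scan of existing data with a stream of change notifications can be illustrated with a toy model. This is not a description of any product's implementation. The `migrate_live` function, the dictionaries standing in for storage systems, and the list of scripted mutations are all assumptions made for the example.

```python
from collections import deque


def migrate_live(source, target, mutations):
    """Scan existing keys while reacting to changes made during the scan.

    source, target: dicts standing in for the two storage systems
    mutations: (key, value) changes the application makes to the source
               while the migration runs; a value of None means delete
    """
    listing = list(source)      # keys discovered when the scan starts
    pending = deque(mutations)  # scripted application activity

    def apply_current_state(key):
        # Always transfer the *current* source state for the key, so it does
        # not matter whether the scan or a change event reaches it first.
        if key in source:
            target[key] = source[key]
        else:
            target.pop(key, None)

    def mutate_and_notify():
        key, value = pending.popleft()
        if value is None:
            source.pop(key, None)   # the application deletes data ...
        else:
            source[key] = value     # ... or creates and modifies it
        apply_current_state(key)    # the change notification is handled

    for key in listing:
        apply_current_state(key)    # scan step
        if pending:
            mutate_and_notify()     # a change arrives mid-scan

    while pending:                  # changes that arrive after the scan
        mutate_and_notify()
    return target


source = {"a": 1, "b": 2, "c": 3}
target = {}
changes = [("c", 30), ("a", None), ("d", 4), ("b", 20)]
migrate_live(source, target, changes)
```

In this sketch the source is modified, extended and partly deleted while the scan is in progress, yet the target ends up matching it without a second pass and without pausing the application. The design choice that makes this work is that both the scan and the change handler act on the current state of an item rather than on a remembered copy.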

Author's Bio

Tony Velcich

Sr. Director of Product Marketing, WANdisco
Tony is an accomplished product management and marketing leader with over 25 years of experience in the software industry. Tony is currently responsible for product marketing at WANdisco, helping to drive go-to-market strategy, content and activities. Tony has a strong background in data management having worked at leading database companies including Oracle, Informix and TimesTen where he led strategy for areas such as big data analytics for the telecommunications industry, sales force automation, as well as sales and customer experience analytics.
