Zhongyuan Bank: Joining Hands with Huawei Distributed Storage to Build a More Economical, Faster, and More Stable Integrated Data Lake Platform
Zhongyuan Bank adheres to the principle of "improving customer experience through digital means". Drawing on Huawei's years of technical expertise in distributed storage, it continues to consolidate its business foundation, serve households across the region, and build a bridge connecting finance, technology, and people's livelihoods in China's heartland.
As the only provincial-level corporate bank in Henan Province, Zhongyuan Bank has followed the development concept of "building a bank through science and technology, promoting development through science and technology" since its establishment. As of June 2023, its total assets exceeded 1.3 trillion yuan, ranking 8th among city commercial banks in China.
In 2021, Zhongyuan Bank launched a three-stage digital transformation program covering retail, corporate, risk, and back-end construction. The goal was to deepen digital application capabilities throughout the bank and to move from improving operational efficiency to innovating the business model. With a platform operating model of "scenario driven, technology empowered, open and win-win", the bank is jointly building a financial service ecosystem with its partners. It continues its agile transformation, strengthens its online and digital capabilities, and is striding forward on the journey to become a technology bank and a digital bank.
The original integrated storage-compute data warehouse and Hadoop big data cluster faced many challenges
In recent years, Zhongyuan Bank has built a data technology platform centered on distributed data warehouses and big data technology. The platform provides one-stop data integration, storage, computing, and development, and supports business decision-making within the bank and the large-scale delivery of applications.
As banking business has moved online, onto mobile, and into scenario-based services, the volume of business data has exploded and data types have become more diverse. The data warehouse is Zhongyuan Bank's main data infrastructure: the total data it stores has more than doubled in three years, and the number of applications built on the warehouse architecture has grown to more than 60. At the same time, AI technologies, chiefly RPA, human-computer interaction, and knowledge graphs, have created new requirements for fast access, feature extraction, and processing of semi-structured and unstructured data under high concurrency and low latency.
In this process, the original integrated storage-compute data warehouse and Hadoop big data cluster have faced unprecedented challenges in elastic scalability, data analysis efficiency, and the stability of the business experience:
How to reduce the cost of data storage
As data volume keeps growing, expanding the data warehouse has become routine. The original integrated storage-compute architecture requires storage and compute to be expanded together. However, storage demand grows linearly with data volume, while compute demand fluctuates with business peaks and troughs. Expanding the two in lockstep therefore leaves resources underused.
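The effect of lockstep expansion can be illustrated with a small calculation. The sketch below is a minimal illustration only: the node sizes and demand figures are hypothetical and are not taken from Zhongyuan Bank's environment. It compares compute utilization when each node bundles storage and compute with the case where the two are sized independently.

```python
import math


def nodes_needed(demand: float, capacity_per_node: float) -> int:
    """Smallest number of nodes whose combined capacity covers the demand."""
    return math.ceil(demand / capacity_per_node)


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
STORAGE_DEMAND_TB = 2000   # data to be stored
COMPUTE_DEMAND_CORES = 640 # cores needed at the business peak
TB_PER_NODE = 100          # storage carried by one node
CORES_PER_NODE = 64        # compute carried by one node

storage_nodes = nodes_needed(STORAGE_DEMAND_TB, TB_PER_NODE)
compute_nodes = nodes_needed(COMPUTE_DEMAND_CORES, CORES_PER_NODE)

# Coupled architecture: every node adds both storage and compute, so the
# cluster must be sized for whichever resource needs more nodes.
coupled_nodes = max(storage_nodes, compute_nodes)
coupled_compute_util = COMPUTE_DEMAND_CORES / (coupled_nodes * CORES_PER_NODE)

# Decoupled architecture: compute nodes are sized for compute demand alone.
decoupled_compute_util = COMPUTE_DEMAND_CORES / (compute_nodes * CORES_PER_NODE)

print(f"coupled: {coupled_nodes} nodes, compute utilization {coupled_compute_util:.0%}")
print(f"decoupled: {compute_nodes} compute nodes, compute utilization {decoupled_compute_util:.0%}")
```

With these assumed numbers, storage growth forces the coupled cluster to twice the size that compute demand alone would need, so half of the purchased compute sits idle even at peak.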
How to improve data analysis efficiency
Following a layered modeling approach, the Zhongyuan data warehouse is divided into an integration cluster (where data is integrated and summarized by financial subject area), an application cluster (where data is extracted from the integration cluster and processed for specific business scenarios), and a data lake (which collects source data from the business systems and historical data from the integration layer). Because data is stored separately in the data warehouse and the data lake, analyzing data efficiently across the two has become an urgent problem.
How to improve the stability of business experience
In the original integrated storage-compute big data architecture, data was held exclusively by individual compute nodes, so when a node failed, its data had to be rebuilt during migration to a new node. Business recovery took hours or even days, seriously affecting the experience of upper-layer applications.
Joining hands with Huawei: adopting a data lake plus distributed storage to create a more cost-effective, faster, and more stable integrated data platform
Faced with these challenges, we began to explore an integrated lake and warehouse architecture and gradually clarified our approach to building it:
1. Now that the data warehouse has been built, the data lake serves as a supplement to it. Positioned behind the data warehouse, the lake is mainly used to offload part of the warehouse's workload, such as storing and querying historical data.
2. The data lake extends data exploration and AI analysis capabilities to other scenarios, while the services the original data warehouse provides remain unchanged. The warehouse continues to focus on batch processing tasks and report queries at the integration layer, data mart layer, and application layer.
3. At the logical level, integrated development of lake and warehouse tasks, unified metadata management, and federated queries connect the data lake, the data warehouse, and data services into a whole, so that users can work with them as a single platform.
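The idea behind the federated queries mentioned in point 3 can be shown with a toy example. The source does not name the bank's query engine, so the sketch below uses Python's built-in sqlite3 module purely as a stand-in: one in-memory database plays the role of the warehouse, a second attached database plays the role of the lake, and a single SQL statement joins across both. All table names, column names, and values are hypothetical.

```python
import sqlite3

# "main" stands in for the data warehouse (current data).
conn = sqlite3.connect(":memory:")
# A second, separate database stands in for the data lake (historical data).
conn.execute("ATTACH DATABASE ':memory:' AS lake")

# Hypothetical tables: current balances in the warehouse, history in the lake.
conn.execute("CREATE TABLE main.balances (account_id TEXT, balance REAL)")
conn.execute("CREATE TABLE lake.history (account_id TEXT, year INTEGER, balance REAL)")
conn.executemany(
    "INSERT INTO main.balances VALUES (?, ?)",
    [("A1", 120.0), ("A2", 80.0)],
)
conn.executemany(
    "INSERT INTO lake.history VALUES (?, ?, ?)",
    [("A1", 2020, 100.0), ("A2", 2020, 90.0)],
)

# One query spans both stores: the caller does not copy data between them.
rows = conn.execute(
    """
    SELECT b.account_id, b.balance - h.balance AS delta
    FROM main.balances AS b
    JOIN lake.history AS h ON h.account_id = b.account_id
    WHERE h.year = 2020
    ORDER BY b.account_id
    """
).fetchall()

for account_id, delta in rows:
    print(account_id, delta)

conn.close()
```

In a production lake and warehouse setup, a dedicated federated engine and a shared metadata catalog would take the place of the attached database, but the user-facing pattern is the same: one statement, two stores.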
In terms of technical architecture, the bank introduced Huawei OceanStor Pacific's storage-compute separation solution for big data. Separating storage from compute and storing lake and warehouse data together allows storage and compute to be expanded on demand, lets data flow efficiently, and meets the personalized, multi-dimensional statistical analysis needs of the bank's different business lines.