“Technology serves business value.”
– Wang Kepan, CTO of AddNewer Technology
This article introduces practical experience from the cloud-native upgrade of big data platforms in the digital marketing industry. It covers the problems and challenges AddNewer Technology encountered when upgrading its big data platform, the restructuring of its data platform architecture, and the changes after the restructuring. The article consists of the following three parts.
1. An Introduction to AddNewer
AddNewer Technology was founded in 2014 and built its technical services in 2015. Our entire service model is built around brand advertisers, and we provide marketing solutions for customers with marketing needs.
1.1 Service Model of AddNewer
The following are the media and data providers that cooperate with AddNewer Technology:
Our service model integrates all media traffic into a single channel. Our clients may need to control ad frequency jointly across different media. For example, a user sees one advertisement on Youku and another on iQiYi, but the client only wants that user to see three advertisements in total. AddNewer can then exercise cross-platform frequency control, as sketched below. When clients need third-party ad selection and monitoring, we can cooperate with other service providers to deliver advertising services to clients.
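To illustrate the mechanism, here is a minimal sketch of cross-platform frequency capping against a shared Redis counter. The key names, cap value, and time window are illustrative assumptions, not AddNewer's actual implementation.

```python
# A minimal sketch of cross-platform frequency capping, assuming a shared
# Redis instance reachable by all media channels; names and values are
# illustrative placeholders.
import redis

r = redis.Redis(host="localhost", port=6379)

FREQUENCY_CAP = 3           # max impressions per user across all media
WINDOW_SECONDS = 24 * 3600  # capping window (one day)

def should_serve(user_id: str) -> bool:
    """Return True if this user is still under the cross-media cap."""
    key = f"freq:{user_id}"
    count = r.incr(key)                # one counter shared by Youku, iQiYi, etc.
    if count == 1:
        r.expire(key, WINDOW_SECONDS)  # start the window on the first impression
    if count > FREQUENCY_CAP:
        r.decr(key)                    # roll back; the ad will not be served
        return False
    return True
```

Because every platform increments the same counter, the cap applies to the user's total exposure rather than to each media channel separately.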
1.2 Data Scale of AddNewer
The data scale of AddNewer Technology has grown rapidly. In the beginning, our traffic was lower than that of small and medium-sized media. Last month, peak requests reached 80 billion. The data is also relatively complex: each request carries the corresponding advertisement information, with nearly 100 relevant dimensions to process. Our service reaches over 500 million users every day, more than 5,000 campaigns are launched every year, and we serve over 100 brand customers.
2. Challenges of AddNewer Big Data Service
2.1 Challenges from Service Scenarios
As the size of our business increases, we have encountered some problems and pain points:
The data scale is large, and service operations are complex. We have to carry out real-time analysis every day, and clients ask us to summarize campaign information within a given time range or to remove cross-media duplicates.
Client needs are changeable and complicated. The needs of clients often change, and there are many data dimensions to analyze for the clients we serve. There is no unified requirement because each media platform's users carry different tag attributes, so we need to split the data and remove duplicates; a sketch of cross-media deduplication follows this list of challenges. These requirements must be handled within the big data platform.
Computing volume fluctuates a lot, and peaks are difficult to predict. Computing scale follows client demand: when clients have urgent delivery needs, a large share of media traffic is occupied, producing a very high traffic peak in a short time; when clients place no orders, the scale drops accordingly. We need elasticity that balances service cost against capability.
Service guarantee requirements are high. For each media request, we send the information to a third-party or traffic monitoring platform, wait for the response, and only then choose the creative to serve to the user. The whole process must complete within 100 milliseconds, including multiple network and computing delays. Any data errors adversely affect our service to clients.
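To make the deduplication requirement concrete, here is a minimal sketch of cross-media duplicate removal over impression logs. The field names and sample records are illustrative assumptions; at AddNewer's scale this logic would run on a big data engine rather than on in-process sets.

```python
# A minimal sketch of cross-media duplicate removal, assuming impression
# logs keyed by (media, user_id); the records below are illustrative.
from collections import defaultdict

impressions = [
    {"media": "youku", "user_id": "u1"},
    {"media": "iqiyi", "user_id": "u1"},  # same user on another platform
    {"media": "iqiyi", "user_id": "u2"},
]

per_media = defaultdict(set)
unique_users = set()

for event in impressions:
    per_media[event["media"]].add(event["user_id"])
    unique_users.add(event["user_id"])

# Per-media reach counts each platform separately; the cross-media figure
# counts the same user only once however many platforms saw them.
print({m: len(u) for m, u in per_media.items()})  # {'youku': 1, 'iqiyi': 2}
print(len(unique_users))                          # 2
```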
2.2 In-House Big Data Architecture
AddNewer Technology ran an in-house service platform. When the data scale was not as large, we chose a commercial database to support the data as a whole. The service system of AddNewer Technology has always been built on Alibaba Cloud, but our database was a commercial one. At that time, we had to weigh personnel costs against service performance. For a complex analysis system, the commercial database still performed much better than a self-built cluster, and the server costs were lower.
The data sources were mostly logs obtained from ECS instances. We did not have high real-time requirements at that time because analysis was mostly offline, so logs were compressed and aggregated to the data cluster at regular intervals for processing. We then used Kafka to collect data from our partners and integrated it into the business reports presented to clients.
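For reference, here is a minimal sketch of consuming partner records from Kafka with the kafka-python client. The topic name, brokers, group ID, and payload fields are illustrative assumptions, not AddNewer's actual configuration.

```python
# A minimal sketch of collecting partner data from Kafka; all names below
# are hypothetical placeholders.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "partner-data",                        # hypothetical topic name
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
    group_id="report-aggregator",
)

for message in consumer:
    record = message.value
    # Fold each partner record into the business report store here,
    # e.g. accumulate per-campaign counters before writing them out.
    print(record.get("campaign_id"), record.get("impressions"))
```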
Historical data was stored in OSS, and another in-house business intelligence (BI) system displayed the corresponding complex data reports, supporting some drag-and-drop analysis. Data analysis was simplified to reduce costs. Combining hour-level offline data with cached data in ApsaraDB for Redis, we built a module for online statistics.
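The online-statistics module described above follows a common cache-then-fallback pattern. Here is a minimal sketch of that pattern against a Redis cache; the key names and the offline loader are illustrative stand-ins, not the actual module.

```python
# A minimal sketch of online statistics served from a Redis cache with an
# hour-level offline fallback; names are hypothetical placeholders.
import redis

r = redis.Redis(host="localhost", port=6379)
CACHE_TTL = 3600  # refresh once per hour, matching the offline granularity

def load_from_offline_store(campaign_id: str) -> int:
    """Placeholder for reading the latest hourly aggregate (e.g. from OSS)."""
    return 0

def get_campaign_impressions(campaign_id: str) -> int:
    key = f"stats:{campaign_id}:impressions"
    cached = r.get(key)
    if cached is not None:
        return int(cached)                       # hot path: serve from cache
    value = load_from_offline_store(campaign_id)  # cache miss: fall back
    r.set(key, value, ex=CACHE_TTL)               # repopulate for later reads
    return value
```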
2.3 Pain Points of the Obsolete Architecture
The obsolete architecture had many pain points, and as the business and the data scale grew, more problems arose.
The architecture had poor computing elasticity. When the amount of data was small, the commercial database cluster could scale in and out relatively fast, but the larger the load, the harder scaling became. The difficulty of handling sudden crashes and the unpredictable time consumed by scale-in and scale-out imposed a burden on the business, and a server failure affected the overall business significantly.
The architecture made data management complex. Over the years, many intermediate tables had accumulated, and it was difficult to sort out the logical relationships among them. The data was hard to partition and highly complex to govern. There were strong dependencies between businesses and contention for task resources, and the resource consumption created during computing increased the need for elasticity.
The architecture was inefficient in specific scenarios. Our service scenarios often involve queries over large data intersections. A single data engine is very fast in some scenarios but inefficient in others, and when all data is placed in the same cluster, efficiency suffers significantly.
The architecture's computing consumption was unpredictable. From a business perspective, costs were uncontrollable, and it was difficult to attribute computing tasks to specific businesses, which made it hard to provide clients with a standardized, visualized service.
3. Upgrading of AddNewer Big Data Platform
3.1 Architecture after Upgrading
The most important part of the restructuring is the computing engine. Data was migrated to the MaxCompute platform, and DataWorks schedules and manages it. Adopting MaxCompute has improved flexibility significantly.
It is convenient to scale in and out in a cloud environment, and computing and storage resources are guaranteed. We can also better manage the sharding of the original data tables, making it clear how the data is used and how the tables relate to one another, which enables better workflow management. A sketch of partition management on MaxCompute follows.
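Purely as an illustration of table and partition management on MaxCompute, here is a minimal sketch using the PyODPS SDK. The credentials, project, endpoint, table, and column names are placeholders, not AddNewer's actual setup.

```python
# A minimal sketch of creating a date-partitioned table on MaxCompute via
# PyODPS; all identifiers and credentials are hypothetical placeholders.
from odps import ODPS

o = ODPS(
    access_id="<access-key-id>",
    secret_access_key="<access-key-secret>",
    project="addnewer_marketing",  # hypothetical project name
    endpoint="https://service.cn-hangzhou.maxcompute.aliyun.com/api",
)

# Daily partitions keep raw request logs easy to trace, query, and prune.
o.execute_sql("""
    CREATE TABLE IF NOT EXISTS ods_request_log (
        user_id STRING,
        media   STRING,
        payload STRING
    )
    PARTITIONED BY (ds STRING)
""")
```

Explicit date partitions are one way the relationships and usage of tables become easier to track than in a pile of ad hoc intermediate tables.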
During the migration, MaxCompute did not support our kind of open-source scheduling, so we worked with Alibaba Cloud to develop a way to call MaxCompute tasks; a sketch of such a call is shown below. The biggest change took place in our BI 2.0 module. The previous module was a drag-and-drop product, but we found that some clients did not know how to drag and drop and found the method unacceptable, so it was replaced with services that generate reports automatically. This service is now available to clients, and the number of data queries is expected to increase substantially.
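Here is a minimal sketch, under the same placeholder assumptions as above, of how an external scheduler might submit a MaxCompute SQL task through PyODPS. It illustrates the general shape of such a call, not the actual integration AddNewer and Alibaba Cloud developed.

```python
# A minimal sketch of submitting a MaxCompute task from an external
# scheduler via PyODPS; all names and credentials are placeholders.
from odps import ODPS

o = ODPS(
    access_id="<access-key-id>",
    secret_access_key="<access-key-secret>",
    project="addnewer_marketing",
    endpoint="https://service.cn-hangzhou.maxcompute.aliyun.com/api",
)

# run_sql submits the job without blocking, letting a scheduler track
# many MaxCompute instances at once.
instance = o.run_sql(
    "INSERT OVERWRITE TABLE dws_campaign_daily PARTITION (ds='20211019') "
    "SELECT campaign_id, COUNT(DISTINCT user_id) FROM ods_request_log "
    "WHERE ds='20211019' GROUP BY campaign_id"
)
print("logview:", instance.get_logview_address())  # link for debugging
instance.wait_for_success()                        # block until completion
```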
3.2 Effect of Architecture Restructuring: Data
The most significant effect of the restructuring lies in the data. First, average daily usage has increased significantly, from hundreds of real-time requests to thousands. MaxCompute has also taken over some time-consuming tasks: waiting times that used to be very long have been cut roughly five-fold. Resource adjustment used to take about 72 hours on average; now it takes less than half an hour. This is the capability the cloud brings.
3.3 Benefits of Cloud Big Data Product for AddNewer
Finally, I want to discuss some changes brought about by the restructuring in the following aspects:
There is an upgrade in the response to business needs. Service capability has improved, and the cost of each service has dropped. Business costs are now predictable, which improves business service efficiency.
There is an improvement in service stability and resilience. The time and effort spent on resource adjustment, and the time consumed by specific computing scenarios, have been reduced significantly.
There is a transformation of the data team's capabilities. On the one hand, business O&M capabilities are transforming into business-driven capabilities; on the other hand, data analysis is turning toward machine learning.
There is an extension into new application scenarios. Process and task management have been automated, and services for the technology stack and the business are continuously optimized.
Generally speaking, we do not use many complex open-source technologies. When optimizing the service architecture, we think more about business elasticity and team management. As decision-makers and practitioners in technology, we pay close attention to the technology supply chain. Thanks to Alibaba Cloud's mature technologies, our team has been able to focus on solving business problems. We can flexibly combine existing technologies on the market and quickly support the development of our business, and this professional division of labor ensures that our team serves clients better.