In recent years, digital transformation has replaced IT transformation as the focus of enterprises. What exactly is digital transformation? It is the process of using data and technology to revolutionize how companies drive organizational, business model, and cultural change. The key idea is to apply new technology advances in mobile, web, social, big data, AI/ML, IoT, cloud computing, and blockchain (just to name a few) so that companies and vendors can deliver new and differentiated products and services. Digital transformation, at its core, therefore relies on the deep integration of technology and business models. The ultimate goal is to revamp the business models of enterprises.
Complete digital transformation requires a system infrastructure that is "open, integrated, software defined, contextual/situational, automated, and intelligent". Digitization not only requires the transformation and upgrade of business processes and applications, it also puts tremendous pressure on the system architecture (especially the data infrastructure) on which those applications reside. To adapt to an ever-changing, fast-moving business environment and competitive landscape, enterprise architects need to reevaluate and optimize both the corporate structure and the IT architecture based on business needs. Key considerations include software defined data centers, hybrid cloud strategy, multi-center active-active systems, multi-availability-zone networking, business continuity, geolocation security and operations, digital marketing, etc.
Whether you are planning a new system architecture or restructuring an existing, aging one, you should engage experienced consulting and implementation teams to carry out the project independently and holistically. This avoids well-known pitfalls such as isolated business requirements, half-baked designs by non-professionals, and focusing on short-term gains while sacrificing long-term benefits. A well-executed infrastructure project will have two clear benefits:
LongDB has a professional team with rich consulting experience and mature solution references across different industry sectors. We specialize in helping customers set up the best system architecture for their use cases. The service includes, but is not limited to:
A big data platform is a platform that encompasses data acquisition and transformation (ETL), storage, computing, operations, and data services.
Without a balanced environment, your big data project's value cannot be realized. We can perform full-stack tuning to achieve the best utilization of CPU, memory, disk, and network bandwidth according to your workload patterns.
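As a minimal sketch of what such profiling looks like, the snippet below samples CPU, memory, disk, and network counters on a single node, assuming the psutil package is installed; the sampling interval is an illustrative placeholder, not a recommended value.

```python
# Hypothetical utilization snapshot for one node, assuming psutil is available.
import psutil

def utilization_snapshot(interval_s: float = 1.0) -> dict:
    """Sample CPU, memory, disk, and network counters over one interval."""
    disk_before = psutil.disk_io_counters()
    net_before = psutil.net_io_counters()
    cpu_pct = psutil.cpu_percent(interval=interval_s)   # blocks for interval_s
    disk_after = psutil.disk_io_counters()
    net_after = psutil.net_io_counters()
    mem = psutil.virtual_memory()
    mb = 2 ** 20
    return {
        "cpu_percent": cpu_pct,
        "mem_percent": mem.percent,
        "disk_read_mb_s": (disk_after.read_bytes - disk_before.read_bytes) / interval_s / mb,
        "disk_write_mb_s": (disk_after.write_bytes - disk_before.write_bytes) / interval_s / mb,
        "net_recv_mb_s": (net_after.bytes_recv - net_before.bytes_recv) / interval_s / mb,
        "net_sent_mb_s": (net_after.bytes_sent - net_before.bytes_sent) / interval_s / mb,
    }

if __name__ == "__main__":
    print(utilization_snapshot())
```

Collecting snapshots like this across nodes while a representative workload runs is one way to see which of the four resources is the real bottleneck before tuning anything.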
Some vendors offer solutions tailored to application requirements (I/O intensive or compute intensive). However, due to the complexity of the systems and modules involved, many of these solutions fail to address the fundamental system issues, leaving considerable room for improvement. In other projects, the hardware environment has already been determined before the software stack is chosen. In that case, making better use of the existing hardware environment and configuration to improve performance becomes critical.
LongDB's technical team has rich experience in tuning and optimizing large-scale big data systems. We can guide clients in tuning their workloads based on hardware configuration, system resource availability, data placement and format, and specific application requirements.
The hardware environment is the backbone of system performance. Big data platforms consist of various service components that place different demands on the underlying hardware resources. For example, storage capacity planning can be done primarily according to predictable job patterns and data growth. Disk I/O throughput and network bandwidth have hard limits and usually leave little room for tuning for services with tight SLAs. CPU and memory resources, on the other hand, can be allocated more elastically.
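To make the capacity-planning arithmetic concrete, here is a back-of-the-envelope sketch; every input (ingest rate, retention, replication factor, compression ratio, scratch-space overhead, usable disk per node) is a hypothetical assumption for illustration, not a recommendation.

```python
# Hypothetical storage sizing from a daily ingest rate and retention window.
import math

def estimate_data_nodes(daily_ingest_tb: float,
                        retention_days: int,
                        replication_factor: int = 3,
                        compression_ratio: float = 0.5,
                        temp_space_overhead: float = 0.25,
                        usable_tb_per_node: float = 40.0) -> int:
    """Estimate how many data nodes are needed to hold the retained data."""
    logical_tb = daily_ingest_tb * retention_days
    on_disk_tb = logical_tb * compression_ratio * replication_factor
    on_disk_tb *= (1 + temp_space_overhead)   # scratch space for shuffles, compactions
    return math.ceil(on_disk_tb / usable_tb_per_node)

# Example: 2 TB/day ingested, 365-day retention
print(estimate_data_nodes(daily_ingest_tb=2.0, retention_days=365))
```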
We can perform disk I/O tuning, balance disk and network I/O, choose the most suitable server model for the best performance/cost ratio, and pre-allocate or dynamically allocate memory pools.
We know that resource requirements from analytical workloads (including growing needs from statistical analysis, data mining, and AI/ML applications) are more elastic, so there is far more opportunity to tune complex, long-running analytics jobs. However, the operating system cannot distinguish the resource needs of these workloads on its own. We will also consider turning off services that are unnecessary or that impact critical business workloads.
A distributed system runs many services, and each requires specific configurations. Some of these configurations can conflict. For example, increasing the Spark executor memory allocation can improve the response time of a single Spark job, but it may also limit the parallelism of the system. Conversely, running tiny executors (a single core and just enough memory for one task) throws away the benefits of running multiple tasks in a single JVM, resulting in unnecessary copying of data.
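The tradeoff can be made concrete with a small sizing sketch. The node specification (32 cores, 128 GB RAM) and the reserved resources are hypothetical assumptions used only to illustrate the arithmetic behind "fat" versus "tiny" executors.

```python
# Hypothetical executor-layout arithmetic for a 32-core / 128 GB node.
def executor_layout(cores_per_node: int, mem_gb_per_node: int,
                    cores_per_executor: int, os_reserved_cores: int = 2,
                    os_reserved_gb: int = 8, overhead_fraction: float = 0.10):
    """Return (executors_per_node, heap_gb_per_executor) for a given core split."""
    usable_cores = cores_per_node - os_reserved_cores
    usable_gb = mem_gb_per_node - os_reserved_gb
    executors = usable_cores // cores_per_executor
    mem_per_executor = usable_gb / executors
    heap_gb = mem_per_executor * (1 - overhead_fraction)   # leave room for off-heap overhead
    return executors, round(heap_gb, 1)

# "Fat" executors: few JVMs with many cores each -> tasks share data in one JVM,
# but long GC pauses and fewer concurrent jobs.
print(executor_layout(32, 128, cores_per_executor=15))   # (2, 54.0)

# "Tiny" executors: one core each -> maximum scheduling flexibility,
# but no in-JVM sharing between tasks and more data copying.
print(executor_layout(32, 128, cores_per_executor=1))    # (30, 3.6)
```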
Optimization should consider multiple factors holistically, for example JVM settings, the required parallelism, compression tradeoffs (spending CPU cycles to reduce I/O), and pre-allocation, pre-fetching, and caching of resources.
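The snippet below is a hedged sketch of how several of these knobs appear together when building a PySpark session; the property names are standard Spark configuration keys, but the specific values are placeholders and would need to be derived from the workload and the executor layout above.

```python
# Illustrative PySpark session with placeholder tuning values.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("tuning-sketch")
    # Parallelism: number of shuffle partitions for DataFrame jobs.
    .config("spark.sql.shuffle.partitions", "400")
    # Compression tradeoff: spend CPU cycles to shrink shuffle and storage I/O.
    .config("spark.io.compression.codec", "lz4")
    .config("spark.sql.parquet.compression.codec", "snappy")
    # Executor sizing (see the layout sketch above).
    .config("spark.executor.cores", "5")
    .config("spark.executor.memory", "18g")
    .config("spark.executor.memoryOverhead", "2g")
    # JVM settings: G1 collector to limit long GC pauses on large heaps.
    .config("spark.executor.extraJavaOptions", "-XX:+UseG1GC -XX:MaxGCPauseMillis=200")
    .getOrCreate()
)
```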
Completing certain pre-processing when data is loaded can greatly improve workload performance. As data grows, deciding where to place hot, warm, and cold data in a hybrid cloud environment while still satisfying business needs directly impacts the bottom line.
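As a simplified sketch of such a placement decision, the function below classifies date-based partitions into tiers by age. The tier boundaries and target paths (local HDFS versus cloud object storage) are hypothetical assumptions used to illustrate the decision, not a prescribed policy.

```python
# Hypothetical age-based tiering for date-partitioned data.
from datetime import date, timedelta
from typing import Optional

HOT_DAYS, WARM_DAYS = 30, 180   # assumed business-driven boundaries

def tier_for_partition(partition_date: date, today: Optional[date] = None) -> str:
    """Classify a partition as hot, warm, or cold by its age and return a target path."""
    today = today or date.today()
    age_days = (today - partition_date).days
    if age_days <= HOT_DAYS:
        return "hdfs://prod/hot/"        # fast local storage, frequently queried
    if age_days <= WARM_DAYS:
        return "hdfs://prod/warm/"       # cheaper local tier, occasional access
    return "s3://archive-bucket/cold/"   # object storage, rarely read

print(tier_for_partition(date.today() - timedelta(days=400)))
```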
Application development, packaging, and deployment can be tricky with regard to how best to utilize different storage architectures, coordinate microservices, and balance load. A well-thought-out, systematic design should be in place from the start.
In a production environment, typical areas of consideration include job skew, runaway executions, and the serialization and encryption algorithms used.
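For job skew in particular, a quick diagnostic is to count rows per join or grouping key and flag keys that dominate the distribution. The sketch below does this in PySpark; the input path, column name, and skew threshold are hypothetical placeholders.

```python
# Hypothetical skew check: flag keys whose row counts dwarf the average.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("skew-check").getOrCreate()

df = spark.read.parquet("hdfs://prod/events/")        # assumed input path
key_counts = df.groupBy("customer_id").count()        # assumed join/grouping key

stats = key_counts.agg(F.mean("count").alias("mean"),
                       F.max("count").alias("max")).first()

# A handful of keys far above the mean usually explains straggler tasks;
# typical remedies are salting the hot keys or broadcasting the smaller join side.
if stats["max"] > 100 * stats["mean"]:
    key_counts.orderBy(F.desc("count")).show(10)
```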
We can provide guidance on managing the full data life cycle. This can include formulating enterprise master data standards, setting up data flow frameworks, planning an easily extensible metadata management platform, and establishing proper data security guidelines and compliance policies.
Based on customers' requirements, we can suggest the most suitable data governance tools (e.g., master data management, metadata cataloging, or lineage tools) and help clients establish a data governance policy and playbook.