Why is platform engineering getting so popular among CTOs in 2023?

Platform Engineering - A CTO's Guide
Discover what's behind the recent popularity of platform engineering and why you should care.

What is Platform Engineering? A brief overview

“Platform engineering is the discipline of designing and building toolchains and workflows that enable self-service capabilities for software engineering organizations in the cloud-native era.” The Gartner Hype Cycle has identified platform engineering as a vital trend for the coming years.

Particularly in cloud computing, platforms have helped enterprises realize values long promised by the cloud like fast product releases, portability across infrastructures, more secure and resilient products, and greater developer productivity. (Source: CNCF Platforms white paper)

Platform engineering and platforms provide key capabilities to developers: continuous integration, continuous deployment, infrastructure provisioning, messaging and event services, environments and workflows, authorization, secrets, registries and repositories, and logging and monitoring of workloads.

Some organizations that have adopted platform engineering practices are:

  • Netflix uses platform engineering to manage its massive streaming infrastructure.
  • Uber has developed an advanced microservices platform that supports their large-scale, distributed architecture.
  • Spotify uses a workflow management system which enables its developers to make rapid deployments at scale.

Is Platform Engineering different from DevOps?

At an overall level, Platforms and DevOps complement each other. Let’s take a look.

Rise of DevOps

During the early days of computing, there was a clear distinction between the roles of developers and operations teams. Developers were responsible for writing the code, while operations teams were responsible for managing the infrastructure that ran the code.

As the complexity of software applications increased, the demand for scalability and availability grew. Developers began to take on more responsibility for managing the infrastructure that ran their applications. The need for active collaboration between development and operations teams gave rise to DevOps. The term “DevOps” was first coined in 2009 by Patrick Debois, who was inspired by the work of the Agile Alliance and the Lean Startup movement.

But as the cloud-native ecosystem matured and grew more complex, the philosophy of having the developers who build software also run it became unmanageable. Something had to change, and one solution that emerged was platform engineering.

“When organizations successfully establish a platform model for enabling application development or significantly improve their change management effectiveness, they achieve the goal that DevOps initiatives aim at: faster and easier delivery of better quality, more secure software.”

~State of DevOps Report 2020

DevOps paves the path for Platform Engineering

DevOps has changed the IT industry for the better. There has been a significant uptick in automation, infrastructure as code, measurement, collaboration, and systems thinking, all helping us deliver better software, more quickly.  

The DevOps philosophy empowered developer teams to create the tooling they needed to build, test, and release software faster.

However, despite the notable progress, many organisations find themselves stuck, unable to move further in their DevOps journey. This is mainly due to two key forces:

  • Decentralization at scale: Decentralization empowered developers following the DevOps principles. But it also led to disorganised sets of tools, processes and automation. The problem was further exacerbated as software matured and organisations grew larger. Decentralization, at scale, became inefficient.
  • The complexity of the cloud native ecosystem: Initially, making Dev & Ops work seamlessly together was the perfect solution. But as the ecosystem matured, the Ops part required additional effort. The ecosystem is so complex that asking developers to learn it is no longer viable.

[Figure: The cloud native landscape]

Source: Cloud Native Computing Foundation

The Ops part of DevOps started overwhelming the developers and the organisation’s efforts. Platform engineering tames the complexity. To cite Puppet’s State of DevOps Report 2021, “The existence of a platform team does not inherently unlock higher evolution DevOps; however, great platform teams scale out the benefits of DevOps initiatives.”

Platform engineering: A step in the right direction

Microservices and containerisation remain common ground for DevOps and platform engineering. At its core, the platform ensures that the complexity of ops and infrastructure does not incapacitate developers. It begins with taking advantage of containerisation: developers can pick their own tech stacks, and containerisation lets the platform function regardless of the stack chosen. Additionally,

  • Platform engineering encourages the design and development of tooling to meet developers’ evolving needs.
  • Developers focus on the core, while the platform engineering team tackles the complexity of the cloud native ecosystem.
  • Platforms lead to centralization, increasing standardization and decreasing duplication of effort across teams.
  • Platforms incorporate comprehensive, automated security and compliance checks as part of test suites.
  • Automation and standardized tools increase collaboration across teams and make software applications easier to integrate.
  • Self-service interfaces give clear guidance on where teams interact.

Organisations with established platform teams have reported a notable improvement in development speeds. Those who had formed platform teams more than three years ago reported greater improvements than those whose teams were less than three years old. 

While it may take some time to embed platform engineering within the organization, once established it can greatly improve the speed, scalability and consistency of development. From that point on, the benefits are often realized relatively quickly.

Source: The State of DevOps Report 2023

According to a Gartner report, by 2026, 80% of software engineering organizations will establish platform teams as internal providers of reusable services, components and tools for application delivery. Many organizations are waking up to the benefits of platform engineering and developer self-service.

9 ways platform engineering can benefit your organization

According to a recent study by Puppet, platform engineering not only helps improve the speed of software delivery but also has a broad impact on the culture of the organisation. The multiple benefits cited by respondents include:

  1. Improved productivity
  2. Higher system reliability
  3. Standardization
  4. Lower time-to-market
  5. Improved security
  6. Enhanced collaboration
  7. Increased scalability and flexibility
  8. Cost optimization
  9. Managed complexity

Improved productivity

Developers often use many fragmented tools for different tasks. Switching between tools and having to debug incompatibility issues causes inefficiency and lost productivity.  

“Most developers’ day-to-day is inefficient primarily because of the dozens of fragmented services and tools they use to build, run and scale applications. The inefficiency inadvertently leads to lost productivity. For small companies, the fragmented developer experiences might be tolerable, but the need to unify them grows as the business grows.”

~Brian Leathem, Senior Software Engineer at Netflix

With platform engineering, the self-service capabilities reduce dependency on IT operations teams and empower developers to provision resources and deploy applications on demand. This boosts productivity by eliminating delays. 

Platforms also improve operational efficiency by reducing the manual work of the operations team and allowing them to focus on strategic initiatives.
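
To make the self-service idea concrete, here is a minimal sketch of how a platform might approve a provisioning request automatically instead of routing it through an operations ticket. Everything in it is an assumption made for illustration: the size catalogue, the per-team CPU quota, and the `Request` and `review` names are hypothetical and do not come from any particular platform product.

```python
from dataclasses import dataclass

# Hypothetical guardrails a platform team might publish (assumed values).
ALLOWED_SIZES = {"small": 1, "medium": 2, "large": 4}  # size -> CPU cores
TEAM_CPU_QUOTA = 8                                     # cores per team


@dataclass(frozen=True)
class Request:
    team: str
    environment: str  # e.g. "dev" or "staging"
    size: str         # must be a key of ALLOWED_SIZES
    replicas: int


def review(request, cpu_in_use):
    """Return (approved, reason) for a self-service provisioning request."""
    if request.size not in ALLOWED_SIZES:
        return False, f"unknown size '{request.size}'"
    if request.replicas < 1:
        return False, "replicas must be at least 1"
    needed = ALLOWED_SIZES[request.size] * request.replicas
    available = TEAM_CPU_QUOTA - cpu_in_use
    if needed > available:
        return False, f"quota exceeded: needs {needed} cores, {available} available"
    return True, f"approved: {needed} cores reserved"


if __name__ == "__main__":
    print(review(Request("payments", "dev", "medium", 2), cpu_in_use=2))
    print(review(Request("payments", "dev", "large", 2), cpu_in_use=2))
```

The point of the sketch is the division of labour: the platform team encodes the rules once, and developers get an immediate answer on demand.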

Netflix’s platform engineering efforts with the Netflix API Gateway improved developer productivity by providing a unified interface for managing API traffic. (Source: PlatformEngineering.org)

Improved Reliability and Stability 

Platform engineering emphasizes robustness and reliability by implementing monitoring, observability, and incident response tooling. This ensures early detection and prompt resolution of issues, minimizing downtime, and enhancing system stability.

Platforms automate logging, monitoring and alerting, which reduces the chance of manual errors in the system and further improves reliability and stability.

Standardization

Platforms do away with the custom scripts that may have plagued the DevOps era. Instead, they allow best practices to be standardized in the platform itself, so all teams can build on top of good standards.

Uber has developed a workflow orchestration platform called Cadence. It has proven to be a robust engine that scales seamlessly to handle complex scenarios while enabling developers to build workflows in the programming language of their choice. It is used by thousands of developers at Uber and other companies (e.g. DoorDash, HashiCorp, Coinbase), and it operates at 99.9% availability despite its scale. This is a good example of the second-order impact of platform engineering: it provided a solid foundation for others to build on.

Lower Time-to-market

Automation and self-service capabilities reduce the time required for provisioning resources, deploying applications, and iterating on features.  Platform engineering ensures that developer teams are not distracted from building the software. It ensures that testing and deployment happen without delays.


Spotify has developed a Fleet Management platform that enables its developers to make rapid changes at scale. More than 95% of Spotify developers believe Fleet Management has improved the quality of their software. It has also helped reduce the time to deliver new features. The graph below shows that earlier it used to take 200 days for a new version/feature to reach 70% of the developers. With the Fleet Management platform, the time has been reduced to less than 7 days. (Source: Spotify Engineering blog)

[Figure: graph from the Spotify Engineering blog showing the time for a new version to roll out, before and after Fleet Management]

Engineers are freed from manually building and maintaining pipelines and managing infrastructure. They can instead focus on new product development (NPD) and innovation efforts.

Enhanced collaboration

Platform engineering fosters cross-team collaboration and knowledge sharing by providing a common platform and tooling. Imagine 10 teams, each with its own technology stack, toolchain and processes. All of these teams end up solving similar problems, spending far too much time evaluating technologies for CI/CD, orchestration, monitoring and so on, and then spending further time integrating, maintaining and debugging them.

A platform model defined by a clear purpose, boundaries and responsibilities can significantly reduce the toil and overhead of individual product teams. This leads to increased efficiency and improved overall productivity. 

Improved Security

A short list of ways in which a platform improves security:

  • Platform engineering helps establish organisation-wide best security practices and tools. 
  • Automated workflows also reduce vulnerabilities caused by manual errors.
  • Standardized workflows with built-in checks ensure compliance with regulations and industry standards. For example, the platform can automatically scan Docker images for known CVEs.
  • Automated tools can monitor system logs, network traffic, and user behaviour to identify potential security breaches or suspicious activities, enabling timely mitigation and response.
  • A robust identity and access management system enforces MFA and role-based access control.
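
As an illustration of a built-in check, the sketch below shows how a pipeline step might decide whether an image can be promoted based on scan findings. It assumes the findings have already been produced by some image scanner and parsed into simple records; the `Finding` structure, the severity ranking, and the placeholder CVE identifiers are all hypothetical.

```python
from dataclasses import dataclass

# Assumed severity ranking; real scanners define their own scales.
SEVERITY_RANK = {"LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}


@dataclass(frozen=True)
class Finding:
    cve_id: str    # placeholder identifiers, not real CVEs
    severity: str  # one of the SEVERITY_RANK keys
    package: str


def blocking_findings(findings, threshold="HIGH"):
    """Return the findings at or above the threshold severity."""
    limit = SEVERITY_RANK[threshold]
    return [f for f in findings if SEVERITY_RANK[f.severity] >= limit]


def may_promote(findings, threshold="HIGH"):
    """True if the image may be promoted, False if the pipeline should stop."""
    return not blocking_findings(findings, threshold)


scan_report = [
    Finding("CVE-0000-0001", "LOW", "libexample"),
    Finding("CVE-0000-0002", "CRITICAL", "logging-lib"),
]

if __name__ == "__main__":
    for finding in blocking_findings(scan_report):
        print(f"blocked by {finding.cve_id} ({finding.severity}) in {finding.package}")
    print("promote image:", may_promote(scan_report))
```

Because the threshold lives in the platform rather than in each team's scripts, tightening the policy is a single change that applies to every pipeline.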

For example, Spotify was able to deploy a fix to the infamous Log4j vulnerability to 80% of their production backend services within 9 hours. This was possible because they consistently ensure that every component of the system-wide Fleet Management platform is up to date with bug fixes and improvements. (Source: Spotify Engineering blog)

Increased scalability and flexibility

Platform engineering facilitates scalable and flexible infrastructure management, allowing organizations to adapt to changing demands. By leveraging cloud services, containerization, and auto-scaling capabilities, platforms can dynamically scale resources based on workload needs.
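
A rough sketch of what scaling on workload can mean in practice: the function below applies a proportional rule of the kind documented for the Kubernetes Horizontal Pod Autoscaler, where the desired replica count is the current count scaled by the ratio of the observed metric to its target. It is a simplification (real autoscalers add tolerances, stabilization windows and more), and the function name and default bounds are assumptions for illustration.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Proportional scaling: ceil(current * observed / target), then clamp."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 4 replicas at 90% average CPU against a 60% target -> scale out
    print(desired_replicas(4, current_metric=90, target_metric=60))
    # 4 replicas at 30% average CPU against a 60% target -> scale in
    print(desired_replicas(4, current_metric=30, target_metric=60))
```

The clamp to `min_replicas` and `max_replicas` is where a platform team would typically encode its cost and availability guardrails.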

Intuit, the company behind TurboTax and QuickBooks, leverages cloud-based infrastructure, particularly on AWS (Amazon Web Services), to support their platform engineering practices. By utilizing scalable compute resources, managed services, and containerization technologies, they can handle fluctuations in demand, improve resource utilization, and easily scale their applications to meet customer needs. (Source: Intuit Engineering Blog)

Cost Optimization 

Platform engineering optimizes resource utilization and reduces costs by leveraging cloud services, automation, and efficient infrastructure management. Organizations can scale resources as needed, pay only for what they use when they use it, and avoid the overhead of idle resources.
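
The pay-for-what-you-use argument can be shown with a small piece of arithmetic. The sketch below compares running enough instances for peak load around the clock with scaling the instance count to hourly demand. All numbers are made up for illustration: the demand profile, the capacity of 100 requests per instance, and the rate of 10 cents per instance-hour are assumptions, not real cloud pricing.

```python
import math


def always_on_cost(peak_instances, hours, rate_cents):
    """Cost of keeping peak capacity running for the whole period."""
    return peak_instances * hours * rate_cents


def scaled_cost(hourly_demand, capacity_per_instance, rate_cents):
    """Cost when the instance count follows demand, hour by hour."""
    total = 0
    for demand in hourly_demand:
        instances = max(1, math.ceil(demand / capacity_per_instance))
        total += instances * rate_cents
    return total


# Hypothetical day: quiet night, short peak, moderate rest of the day.
demand = [100] * 8 + [800] * 4 + [300] * 12

fixed = always_on_cost(peak_instances=8, hours=24, rate_cents=10)
elastic = scaled_cost(demand, capacity_per_instance=100, rate_cents=10)

if __name__ == "__main__":
    print(f"always-on: {fixed} cents, scaled: {elastic} cents")
```

Under these assumed numbers the scaled setup uses 76 instance-hours instead of 192. The gap depends entirely on how spiky the demand is, so a flat profile would show little or no saving.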

Nike built composable platforms utilizing cloud services to unite strategically related business capabilities. The platforms are modular and composable, exposed through APIs. Nike finds the platforms help it respond quickly, shorten time to market, increase scalability, and decrease costs.

Manages complexity

Platform engineering provides tooling to manage the inherent complexity of multi-cloud environments, applications with large numbers of microservices, and complex workloads like those of AI/ML.  

To be honest, Kubernetes itself can get quite complex. It requires laying the foundation with a multitude of components like service mesh, secrets, observability, certificate management, registries, repositories, messaging services, authorization services and more.

A platform ensures these complexities and responsibilities do not bother the developers.

What are the challenges in Platform Engineering?

No matter what the benefits, implementing new technology and bringing about a culture change is never a bed of roses. Here are some common challenges an organization faces when implementing platform engineering practices:

  • Continuous Evolution: Platforms need to evolve continuously to meet changing business needs and technological advancements. Keeping up with emerging technologies, industry trends, and evolving user requirements can be a challenge. Platform engineers must invest time and effort in research and development to ensure the platform remains up-to-date and competitive. 
  • Resource Allocation: Platform engineering requires dedicated resources, including skilled engineers, infrastructure, and tools. Organizations may face challenges in allocating sufficient resources for platform development, maintenance, and support. Balancing the needs of platform engineering with other project requirements and priorities can be a challenge, especially in resource-constrained environments.
  • Adoption and Culture: Introducing platform engineering practices or getting teams to adopt the platform can be challenging. Adoption may face resistance from teams accustomed to pre-existing practices. Developers might feel their freedoms are being curbed if they were used to DevOps practices. It will take training and change management to help them realise the benefits of having a platform take care of ops and infrastructure.
  • Integration and Compatibility: Platforms often need to integrate with existing systems, third-party services, or legacy infrastructure. Platform engineers must address interoperability issues, perform thorough testing, and implement robust integration mechanisms.
  • Documentation and Knowledge Sharing: Comprehensive documentation and knowledge sharing are vital for platform engineering success. Documenting platform architecture, design decisions, and best practices can be time-consuming, and ensuring that the documentation stays up to date can be a challenge. Encouraging knowledge sharing through documentation, wikis, and internal communication channels is essential.
  • Vendor Lock-in: If organizations rely heavily on third-party platforms or cloud services, there is a risk of vendor lock-in. Depending too heavily on, say, AWS's or GCP’s proprietary pipelines or APIs can limit flexibility and create dependency. Platform engineers must design solutions with portability and interoperability in mind.
  • Operational Complexity: Platform engineering includes managing the operational aspects of the platform, such as monitoring, maintenance, and troubleshooting. Dealing with performance bottlenecks and system failures, and identifying their root causes, can be complex and time-consuming.
  • Complexity and Scale: Dealing with distributed systems, microservices, and interdependencies among various components can introduce complexity. Balancing scalability, performance, and maintainability becomes a challenge.

Being aware of these challenges, you can tackle them head-on in the execution phase.

A note on stages of platform maturity and impact

We look at three levels of platform engineering maturity: manual, hybrid, and fully automated.

Running a startup with manual processes

For an early-stage startup with a small team working on product delivery, running a manual process might appear enticing due to cost benefits. Manual processes offer greater flexibility and customization than rigid automated systems, without requiring substantial effort from the get-go. Startups in niche industries or those with unique requirements may find it easier to tailor processes to their specific needs without relying on pre-packaged software solutions.

However, manual processes could prove to be more challenging than beneficial in the long run. Manual processes are often time-consuming and prone to human errors. As a startup aims to scale and grow, inefficiencies in manual processes can become bottlenecks, hindering progress. Scaling operations becomes difficult, leading to delays and inconsistencies.

For example, with a manual pipeline, every deployment has a chance of failing to reach production. This in turn means deployments have to be scheduled, overtime has to be budgeted for troubleshooting, and the improved version of the software launches late. The uncertainty of deployments is not good for team morale either. Now imagine this inefficiency occurring across the multitude of operations and infrastructure functions.

Startups need to make data-driven decisions to optimize operations, identify growth opportunities, and understand customer behaviour. Manual processes make it difficult to gather accurate and timely data, limiting the ability to derive insights and make informed decisions.

To overcome these challenges, startups often adopt automation, digital tools, and streamlined workflows enabled by platform engineering and DevOps practices. These approaches help automate repetitive tasks, enhance scalability, improve agility, foster collaboration, and enable data-driven decision-making, ultimately driving efficiency, productivity, and growth.

Hybrid model for startups

Startups can benefit from adopting hybrid models that incorporate both automation and manual processes. This approach allows startups to leverage the advantages of automation while maintaining flexibility and customization where necessary.

This is also the case for startups that began with manual processes and are gradually automating components over time. For example, CI/CD might be automated first while monitoring and alerting wait their turn.

Managing a semiautomated infrastructure can introduce complexity. Coordinating and integrating automated and manual processes requires additional effort and resources. Maintenance and updates for both automated and manual components need to be considered.

As the startup grows so will the complexity of its infrastructure. When different parts of the infrastructure are managed manually or rely on disparate tools, it becomes challenging to maintain a cohesive and standardized environment. Lack of knowledge sharing can contribute to increased tech debt as the system becomes reliant on a limited set of experts. 

To mitigate the potential tech debt associated with a semiautomated infrastructure, it’s important to establish good practices and processes:

  • Define clear standards and guidelines for both automated and manual processes. Strive for consistency across the infrastructure to minimize fragmentation.
  • Allocate dedicated resources for ongoing maintenance and updates of all components, including automated and manual processes. 
  • Address integration challenges early on to avoid tech debt accumulation.
  • Maintain up-to-date documentation of both automated and manual processes to facilitate knowledge transfer and onboard new team members effectively.

A mix of manual and automated models can give the advantage of speed, but it does accumulate tech debt. It is crucial that the leadership team has a plan to pay off that debt in a timely manner. This means setting aside time, effort and budget that would otherwise go to core software development. It is the price of the speed gained in the initial stages.

Overall, it’s important for startups to conduct a thorough analysis of their specific needs, objectives, and resources when considering a hybrid infrastructure and operations model. Every startup’s circumstances and requirements will vary, and understanding the trade-offs involved will help in making informed decisions.

Automated self-service platforms

A self-service platform allows startups to scale their operations efficiently. It enables developers to independently access and use the resources and services they need without relying on manual processes or support. This scalability and efficiency help startups handle growing demand and increase their productivity. End-to-end automation supports strong performance on key metrics such as deployment frequency, resource utilization, change failure rate, and mean time to restore service after a failure (MTTR).
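
For readers who want to track these metrics, here is a minimal sketch of how three of them could be computed from a deployment log. The `Deployment` record and its fields are hypothetical, and time to restore is simplified to the gap between a failed deployment and its recovery; a real setup would usually take failure and recovery times from incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Deployment:
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set once a failed change is recovered


def deployment_frequency(deployments, period_days):
    """Average number of deployments per day over the period."""
    return len(deployments) / period_days


def change_failure_rate(deployments):
    """Share of deployments that caused a failure (0.0 if there were none)."""
    if not deployments:
        return 0.0
    return sum(d.failed for d in deployments) / len(deployments)


def mean_time_to_restore(deployments):
    """Mean gap between a failed deployment and its recovery, or None."""
    gaps = [d.restored_at - d.deployed_at
            for d in deployments if d.failed and d.restored_at is not None]
    if not gaps:
        return None
    return sum(gaps, timedelta()) / len(gaps)


start = datetime(2023, 1, 2, 9, 0)
log = [
    Deployment(start),
    Deployment(start + timedelta(days=1), failed=True,
               restored_at=start + timedelta(days=1, hours=2)),
    Deployment(start + timedelta(days=2)),
    Deployment(start + timedelta(days=3), failed=True,
               restored_at=start + timedelta(days=3, hours=1)),
]

if __name__ == "__main__":
    print("deployments per day:", deployment_frequency(log, period_days=4))
    print("change failure rate:", change_failure_rate(log))
    print("mean time to restore:", mean_time_to_restore(log))
```

Tracking these numbers before and after introducing a platform gives leadership a concrete way to judge whether the investment is paying off.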

“One big recommendation I can make is that you get rid of all the daily recurring tasks…these manual activities, cost a lot of time and have no great benefits for your developers so, automate as much as you can so that you have more time to spend with your developer community”

~Roland Fetcher, Platform Architect at Mercedes Benz

Implementing a self-service internal platform can offer numerous benefits to startups, but there are also tradeoffs that need to be considered. Building and maintaining a fully automated self-service platform can be complex and requires expertise in infrastructure management and support services. Startups need to invest precious time, resources, and skills in developing and maintaining the platform.

The effort and costs are akin to managing two product teams: one for the core software product and one solely for the internal development platform.

Developing a fully automated self-service platform internally can require a significant upfront investment. Startups need to allocate budgets and resources to build and maintain the platform, which might divert resources from other areas of the business. It’s important for startups to carefully evaluate these tradeoffs and align them with their business goals, resources, and customer needs. One alternative is to acquire a platform, which keeps resources and teams focused on the core software. (You can take a look at a case study about how an ML company in the pharmaceutical industry went about choosing the right model here.)