Cloud cost management vs FinOps
It goes without saying that if you use public or private clouds, you need to be able to track IT costs. This is especially important for public clouds, where there is no physical cap and the cost of an error (e.g. recursive Lambda functions) can be extremely high. You can use either cloud-native cost management tools like AWS Cost Explorer or Azure Advisor, or third-party solutions. But as mentioned in one of our previous articles, cloud cost management tools have significant limitations: generally, they are used by only a few people in an organization, and they don’t establish a cost-saving process or a culture of engaging R&D teams. As a result, you get a nice report with a list of action items, but after analyzing it you realize that you need to bother many people and take multiple actions, and in the end it’s easier to pass the expenses on to your customers than to act on the whole report. This is where FinOps comes into play.
FinOps is an evolution of cloud cost management with an established culture and process, in which R&D, SRE and DevOps teams and their individual members are involved in planning and execution. But how do you build it? How do you evolve from cloud cost management, and how long will it take to get there?
3 main steps to establish a FinOps process
If you don’t have a cloud or Kubernetes cost management tool yet, start by adopting one; both free and paid solutions exist. Tag your resources and allocate budgets or pools to distribute expenses between applications, departments or teams. You can assign multiple tags to a resource: one tag can identify an application, another a team. I suggest having separate budgets for every R&D team and every application running in production. That way, you can track expenses, analyze them and get forecasts. With a proper cost management tool, you can set this up in a few days, but it’s okay if it takes 4–6 weeks for all teams to update their automation scripts and CI/CD jobs to use tags.
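As a rough illustration of how tags turn into per-team and per-application budgets, here is a minimal Python sketch. The record layout, the tag keys ("team", "app") and the budget figures are assumptions made for the example, not the export format of any particular cost management tool.

```python
from collections import defaultdict

# Hypothetical cost records, e.g. rows exported from a billing report.
# Field names and tag keys are assumptions for this sketch.
RECORDS = [
    {"resource_id": "vm-1", "cost": 120.0, "tags": {"team": "search", "app": "indexer"}},
    {"resource_id": "vm-2", "cost": 80.0, "tags": {"team": "search", "app": "api"}},
    {"resource_id": "db-1", "cost": 200.0, "tags": {"team": "billing", "app": "api"}},
    {"resource_id": "vm-3", "cost": 40.0, "tags": {}},  # untagged resource
]


def cost_by_tag(records, tag_key):
    """Sum cost per value of one tag; untagged resources go to 'untagged'."""
    totals = defaultdict(float)
    for record in records:
        totals[record["tags"].get(tag_key, "untagged")] += record["cost"]
    return dict(totals)


def over_budget(totals, budgets):
    """Return {pool: overspend} for every pool that exceeds its budget."""
    return {
        name: spent - budgets[name]
        for name, spent in totals.items()
        if name in budgets and spent > budgets[name]
    }


if __name__ == "__main__":
    by_team = cost_by_tag(RECORDS, "team")
    print(by_team)
    print(over_budget(by_team, {"search": 150.0, "billing": 250.0}))
```

Because the same records can be grouped by either tag, one resource contributes to both its team budget and its application budget. The "untagged" bucket shows how much spend still has no owner.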
Assign cloud or Kubernetes resources to R&D or SRE teams. At this stage, it is important to make the whole team responsible for its resources. The team should periodically get a list of resources, a summary of expenses and alerts about budget overruns or TTL violations. With this setup, everybody and nobody is responsible at the same time, which makes the transition to the third step very smooth. Shared responsibility removes the fear of being punished for an individual error and teaches team members to think about cost saving and why it matters to the organization. This stage generally takes 3 to 9 months.
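The periodic team report described above could be assembled along these lines. This is only a sketch: the `Resource` fields, the `team_digest` function and the alert wording are hypothetical, and a real tool would pull this data from the cloud provider or cluster.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Resource:
    name: str
    team: str
    monthly_cost: float
    expires_at: Optional[datetime] = None  # None means no TTL was set


def team_digest(team, resources, budget, now):
    """Build a team-level summary: resources, total spend and alerts."""
    owned = [r for r in resources if r.team == team]
    total = sum(r.monthly_cost for r in owned)
    alerts = []
    if total > budget:
        alerts.append(f"budget exceeded by {total - budget:.2f}")
    for r in owned:
        if r.expires_at is not None and r.expires_at < now:
            alerts.append(f"TTL expired for {r.name}")
    return {
        "team": team,
        "resources": [r.name for r in owned],
        "total": total,
        "alerts": alerts,
    }


if __name__ == "__main__":
    now = datetime(2024, 1, 10, tzinfo=timezone.utc)
    inventory = [
        Resource("vm-1", "search", 300.0, datetime(2024, 1, 5, tzinfo=timezone.utc)),
        Resource("vm-2", "search", 250.0),
        Resource("db-1", "billing", 100.0),
    ]
    print(team_digest("search", inventory, budget=500.0, now=now))
```

The digest is addressed to the team as a whole and names no individual, which matches the shared-responsibility idea of this stage.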
If members of your engineering team own specific resources, assign individual Kubernetes or cloud resources to them so that engineers can properly manage the resource lifecycle. By this time they already understand the business need for the process and are used to tracking cloud expenses. At this stage you reach the optimal cloud cost management and FinOps process: engineers, who in most cases are the main generators of cloud costs, are educated and motivated to reduce IT costs. That is a significant improvement over traditional cloud cost management tools, where someone from IT has to figure out who owns each resource and chase engineers to optimize cloud costs and reduce waste.
Ideally, companies should use tools that:
- give a way to set and update TTLs and send notifications about expiration (setting tags and running a script that sends emails can be an option)
- give engineers a way to track their resources, get alerts and update TTLs via Slack or Microsoft Teams
- give personalized recommendations about an engineer’s resources so that he or she doesn’t need to involve the IT team
- give managers and budget owners a way to track progress and results so they can take corrective action if necessary
- educate engineers and managers, simplify their use of the tools and explain the business need for such actions
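The "tags plus a script" option from the first item in the list can be sketched as follows. The tag names (`ttl`, `owner`), the ISO 8601 timestamp format and the three-day warning window are assumptions chosen for illustration. Actually delivering the messages by email, Slack or Microsoft Teams is left out.

```python
from datetime import datetime, timedelta, timezone

TTL_TAG = "ttl"      # assumed tag holding an ISO 8601 expiry timestamp
OWNER_TAG = "owner"  # assumed tag holding the owner's contact


def extend_ttl(tags, days, now):
    """Return a copy of the tags with the TTL set to `days` from `now`."""
    updated = dict(tags)
    updated[TTL_TAG] = (now + timedelta(days=days)).isoformat()
    return updated


def expiry_notices(resources, now, warn_within=timedelta(days=3)):
    """Return (owner, message) pairs for expired or soon-to-expire resources."""
    notices = []
    for name, tags in resources.items():
        if TTL_TAG not in tags:
            continue  # no TTL set; a stricter policy could flag this too
        expires_at = datetime.fromisoformat(tags[TTL_TAG])
        owner = tags.get(OWNER_TAG, "unassigned")
        if expires_at <= now:
            notices.append((owner, f"{name} expired on {expires_at.date()}"))
        elif expires_at - now <= warn_within:
            notices.append((owner, f"{name} expires on {expires_at.date()}"))
    return notices


if __name__ == "__main__":
    now = datetime(2024, 1, 10, tzinfo=timezone.utc)
    resources = {
        "vm-1": {"ttl": "2024-01-08T00:00:00+00:00", "owner": "ann@example.com"},
        "vm-2": {"ttl": "2024-01-12T00:00:00+00:00"},
    }
    for owner, message in expiry_notices(resources, now):
        print(owner, message)
    # An engineer who still needs vm-1 extends its TTL by a week.
    resources["vm-1"] = extend_ttl(resources["vm-1"], days=7, now=now)
```

Keeping the TTL in a tag means the expiry date travels with the resource, so the same script works for any team that follows the tagging convention.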
Don’t skip the first and second steps: you need to arrive at the third step with the right mindset and an established process. If you force the change, your team can have a negative experience, and it will be difficult to give it another try. And that means the only thing left to improve your metrics is to make your customers pay for this mistake.