As we discussed in our previous blog post on the technical and business benefits of being Well-Architected, the Well-Architected Framework is comprehensive guidance that helps organizations build and improve their architectures in line with proven best practices. In this post, you can explore the five pillars of the Well-Architected Framework and the recommended best practices for each pillar.
The operational excellence pillar focuses on your ability to support your operations and extract insights that improve your supporting processes and procedures. Operational excellence combines a clear understanding of your organizational needs, expected business outcomes, and value promise with an organizational culture that spans all work processes.
Defining your organizational priorities for both business and technical needs is key to operational excellence. Clear definitions of expected business and organizational outcomes help you set achievable targets and design your organizational processes, workflows, and culture accordingly.
A clear understanding of the nature of your workloads, their behavior, and your operational readiness enables you to achieve operational excellence. The way you design your workloads, production flows, and deployment processes helps you go into production successfully.
Understanding the health of your workloads, and ultimately of your operations, helps you stay on track and achieve your expected business outcomes. Defining, collecting, and analyzing metrics for workload and operational performance helps you understand your current status and take action when needed. Understanding operational health also allows you to respond to events that affect your workloads and to reduce the risk of downtime.
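As a minimal sketch of what defining and analyzing a metric can look like, the Python snippet below computes an error rate from request counts and compares it against a threshold. The names `WorkloadSample`, `error_rate`, and `is_healthy`, as well as the 1% threshold, are illustrative assumptions and are not prescribed by the framework.

```python
from dataclasses import dataclass


@dataclass
class WorkloadSample:
    """One observation window for a workload (hypothetical structure)."""
    total_requests: int
    failed_requests: int


def error_rate(sample: WorkloadSample) -> float:
    """Return the fraction of failed requests; 0.0 when there was no traffic."""
    if sample.total_requests == 0:
        return 0.0
    return sample.failed_requests / sample.total_requests


def is_healthy(sample: WorkloadSample, max_error_rate: float = 0.01) -> bool:
    """Treat the workload as healthy while the error rate is at or below the threshold.

    The 1% default is an assumed value; pick one that matches your own targets.
    """
    return error_rate(sample) <= max_error_rate


if __name__ == "__main__":
    window = WorkloadSample(total_requests=2000, failed_requests=50)
    print(f"error rate: {error_rate(window):.2%}, healthy: {is_healthy(window)}")
```

In practice the counts would come from your monitoring system; the point of the sketch is only that a metric becomes actionable once it is paired with an explicit threshold.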
Learning from operational activities, metrics, and data, and sharing those learnings across your organization, is key to evolving your operations. As your business needs, requirements, and priorities change, it is important to evolve your operations accordingly. Small, frequent improvements across your organization also help you evolve your operational activities and position your organization better.
The security pillar focuses on protecting your organization’s information systems and data through risk assessments and mitigation strategies. Key security principles include a strong identity foundation, traceability, security at multiple layers, data protection, preparation for security events, and automation of security best practices to ensure holistic security across your workloads.
How well you implement identity and access management plays a key role in your information security. AWS Identity and Access Management (AWS IAM) allows you to control access for both users and programs, create policies and permissions for defined principals such as users, roles, or services, and enforce strong credential management. Key questions to ask when assessing your identity and access management practices include how you manage identities across users and services for resource access, and how effectively you design and manage the permissions of these principals. Other important questions concern least privilege, credential management, and multi-factor authentication (MFA).
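To make least privilege concrete, here is a small Python sketch that builds an IAM policy document granting read access to the objects of a single S3 bucket, and only when the caller signed in with MFA. The function name `read_only_bucket_policy` and the bucket name are hypothetical; the policy structure is assumed to follow the standard IAM JSON policy grammar and should be validated in your own account before use.

```python
import json


def read_only_bucket_policy(bucket_name: str) -> dict:
    """Build a policy document that allows reading objects from one bucket, with MFA.

    Hypothetical helper for illustration; it only constructs the document
    and does not call any AWS API.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Grant a single action instead of "s3:*" (least privilege).
                "Action": ["s3:GetObject"],
                # Limit the grant to objects in one bucket.
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
                # Require that the request was authenticated with MFA.
                "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}},
            }
        ],
    }


if __name__ == "__main__":
    # "example-reports-bucket" is a placeholder name.
    print(json.dumps(read_only_bucket_policy("example-reports-bucket"), indent=2))
```

The design choice worth noting is the narrow scope: one action, one resource pattern, one condition. Broader permissions can be added later if a real need appears, which is usually easier than retracting them.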
How well you detect, manage, and respond to security events determines the success of your security management. Detective controls help you identify potential security events and assess their scope. In a cloud environment, various services help you set up strong detective controls through log collection, monitoring, and automation. You can collect audit logs with AWS CloudTrail, analyze your resources and their configurations with AWS Config, and monitor your resources and analyze logs with Amazon CloudWatch. These mechanisms help you identify and understand security events, and the effectiveness of your security posture depends on how well you implement them.
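A simple detective control can be sketched in a few lines of Python: count failed console sign-ins per source IP address in a batch of audit log records. The field names below (`eventName`, `responseElements`, `sourceIPAddress`) are assumed to follow the CloudTrail record format and should be checked against your own logs; the function name and the sample records are made up for illustration.

```python
from collections import Counter


def failed_console_logins(records):
    """Count failed console sign-in attempts per source IP address."""
    counts = Counter()
    for record in records:
        if record.get("eventName") != "ConsoleLogin":
            continue
        # responseElements may be null in a record, hence the fallback to {}.
        response = record.get("responseElements") or {}
        if response.get("ConsoleLogin") == "Failure":
            counts[record.get("sourceIPAddress", "unknown")] += 1
    return counts


# Hand-written sample records (illustrative only, not real log data).
SAMPLE_RECORDS = [
    {"eventName": "ConsoleLogin", "sourceIPAddress": "203.0.113.10",
     "responseElements": {"ConsoleLogin": "Failure"}},
    {"eventName": "ConsoleLogin", "sourceIPAddress": "203.0.113.10",
     "responseElements": {"ConsoleLogin": "Failure"}},
    {"eventName": "ConsoleLogin", "sourceIPAddress": "198.51.100.7",
     "responseElements": {"ConsoleLogin": "Success"}},
    {"eventName": "DescribeInstances", "sourceIPAddress": "198.51.100.7",
     "responseElements": None},
]

if __name__ == "__main__":
    for ip, attempts in failed_console_logins(SAMPLE_RECORDS).items():
        print(f"{ip}: {attempts} failed sign-in(s)")
```

A real setup would more likely rely on managed alarms or queries than on a hand-rolled script, but the logic is the same: filter the events that matter, aggregate them, and alert when a pattern stands out.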
The effectiveness of your infrastructure protection depends on how well you protect your compute and network resources. Setting up multiple security layers around your resources (as supported by many AWS services such as Amazon EC2, Amazon S3, and Amazon EBS) and securing your network, for example by operating in a Virtual Private Cloud (VPC), helps you meet both the requirements you have set for yourself and those imposed by regulatory authorities.
The way you classify your sensitive data and how effectively you handle it affect your ability to protect its confidentiality and integrity. The extent to which you protect your data both in transit and at rest also plays a critical role in data protection and compliance. You should evaluate the effectiveness of your data classification and of protection methods such as encryption and tokenization to ensure data security.
You should evaluate the effectiveness of your incident management program to assess your ability to anticipate, respond to, and recover from security incidents. Preparing for security incidents by implementing tools, defining procedures beforehand, running simulations regularly, and designing your architecture accordingly helps you recover from them faster and more effectively.
The reliability pillar focuses on how your workloads perform and on their ability to carry out their functions throughout their lifecycle. Key principles for reliability include automated recovery, tested recovery procedures, horizontal scaling, automatic scaling, and effective change management. The reliability pillar also emphasizes fault tolerance and workload status monitoring.
How well you define your architectural foundations determines your reliability. When designing the architecture for your workloads, you should consider your service limits and your chosen cloud environment(s). You should make informed decisions about your network topology and about managing service limits, which guard against unintended resource usage caused by external abuse.
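One way to keep service limits in view is to track how close current usage is to each limit. The Python sketch below shows the arithmetic; the function names, the example numbers, and the 80% warning level are assumptions chosen for illustration.

```python
def quota_utilization(current_usage: int, quota: int) -> float:
    """Return usage as a fraction of the service limit."""
    if quota <= 0:
        raise ValueError("quota must be positive")
    return current_usage / quota


def is_near_limit(current_usage: int, quota: int, warn_at: float = 0.8) -> bool:
    """Flag a resource once usage reaches the warning level (assumed default: 80%)."""
    return quota_utilization(current_usage, quota) >= warn_at


if __name__ == "__main__":
    # Hypothetical example: 17 running instances against a limit of 20.
    print(f"utilization: {quota_utilization(17, 20):.0%}, near limit: {is_near_limit(17, 20)}")
```

Checking this regularly gives you time to request a limit increase or redesign before a scaling event runs into the ceiling.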
Your service architecture design, in turn, affects the reliability of your workloads throughout your operations. You should focus on how well you manage interactions within your distributed systems to prevent failures or, where that is not possible, to mitigate them.
You should monitor your workload health to get a holistic view of your workload environment and to be able to implement necessary changes. Your workloads should be scalable and adaptive to these changes by design. Change management also ensures that these changes produce predictable, expected results.
Being fault-tolerant is critical for reliability. Setting organizational recovery procedures and objectives helps you implement efficient data backups and recover from workload failures. You should also consider your business needs when building a disaster recovery strategy for your workloads, and exercise that strategy through game days within the organization.
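Recovery objectives are easier to reason about once they are written down as numbers. The sketch below assumes two commonly used objectives, a recovery point objective (RPO, the maximum tolerable data loss measured in time) and a recovery time objective (RTO, the maximum tolerable downtime), and checks a backup plan against them. The function names and the example durations are hypothetical.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """In the worst case you lose everything since the last backup,
    so the interval between backups must not exceed the RPO."""
    return backup_interval <= rpo


def meets_rto(estimated_restore_time: timedelta, rto: timedelta) -> bool:
    """The time needed to restore service must not exceed the RTO."""
    return estimated_restore_time <= rto


if __name__ == "__main__":
    # Hypothetical plan: backups every 6 hours, restore takes about 30 minutes.
    print("RPO met:", meets_rpo(timedelta(hours=6), rpo=timedelta(hours=4)))
    print("RTO met:", meets_rto(timedelta(minutes=30), rto=timedelta(hours=1)))
```

A game day is a good opportunity to replace the estimated restore time with a measured one.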
The performance efficiency pillar focuses on your ability to use your computing resources efficiently in line with your business needs and to keep your operations efficient as those needs change. Encouraging next-generation technologies within your organization, supporting experiments to improve performance, and adopting serverless architectures all contribute to your overall performance.
Deciding on your workload architecture is important for achieving performance efficiency throughout your operations. Multiple factors such as your business vertical, size, strategic objectives, and priorities affect the performance of your workloads, and you should support these factors with different approaches within your architecture. Your network decisions should likewise reflect your needs in terms of maximum acceptable latency, required bandwidth, and throughput.
Compute resources come in different types to serve the different technical needs of organizations. You should define your computing needs when designing your workloads and choose the compute resource type that best fits them, in terms of both function and cost.
Cloud storage is a major advantage for organizations in terms of convenience and cost, but you should still spend time deciding which storage type is the best fit. You probably do not need to access all of your data 24/7; however, there may be cases where you need immediate access. Well-defined storage and access needs help you choose the best storage type.
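The decision described above can be sketched as a small rule in Python. The tier labels (`frequent-access`, `infrequent-access`, `archive`) are generic placeholders, not names of specific storage products, and the cut-off of 30 reads per month is an assumption picked only to make the example concrete.

```python
def suggest_storage_tier(reads_per_month: float, needs_immediate_access: bool) -> str:
    """Suggest a generic storage tier from access frequency and latency needs.

    Thresholds and tier labels are illustrative assumptions.
    """
    if reads_per_month >= 30:
        # Data read about daily or more belongs in the fastest tier.
        return "frequent-access"
    if needs_immediate_access:
        # Rarely read, but must be available without a retrieval delay.
        return "infrequent-access"
    # Rarely read and a retrieval delay is acceptable.
    return "archive"


if __name__ == "__main__":
    print(suggest_storage_tier(reads_per_month=100, needs_immediate_access=True))
    print(suggest_storage_tier(reads_per_month=2, needs_immediate_access=True))
    print(suggest_storage_tier(reads_per_month=0.1, needs_immediate_access=False))
```

Real decisions also weigh retrieval fees, minimum storage durations, and durability requirements, so treat this as a starting point for the conversation and not as a sizing tool.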
When choosing your database solution, you should consider the nature of your workload and both the current and future needs associated with your data. Factors such as availability, consistency, and query capability have a great impact on your database’s performance.
Keeping up with new technologies and encouraging their use in your workloads lets you take advantage of improved offerings and can improve your performance. Monitoring your ongoing system performance is also critical for catching the effects of internal and external factors and for maintaining performance efficiency over time.
Building the optimal architecture for your organization requires a clear understanding of the trade-offs between your priorities and needs. Your preferences regarding availability, latency, or performance each require a different approach to your architecture setup, and you should reflect your priorities in these decisions to achieve performance efficiency.
The cost optimization pillar focuses on operating in the cloud and delivering business value at the lowest possible cost, based on your operational needs. Putting emphasis on cloud financial management, total cost of ownership, migration from physical data centers, and analysis of your cloud expenditure helps you achieve your expected business outcomes at lower cost.
You should align your organization's financial objectives with your technical requirements to take advantage of the lower costs that cloud computing offers. Alignment between the financial and technical departments within your organization helps you manage your cloud expenditure effectively. Continuous monitoring of costs and usage also helps you better understand both current and future cost figures.
As with performance efficiency, identifying your technical needs and understanding your service usage patterns can help you make more cost-effective service selections. You should monitor and evaluate your ongoing service usage, utilization, and available options so that you can lower costs when needed. As new features and service offerings emerge, adopting them can help you cut costs thanks to their improved capabilities. Improving your services and embracing new technologies therefore enables you to achieve lower costs in addition to increasing your performance and reliability.
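Evaluating utilization often starts with a simple question: which resources are barely used? The Python sketch below flags resources whose average CPU utilization falls below a threshold. The function name, the resource names, and the 10% threshold are assumptions for illustration; in practice the figures would come from your monitoring data.

```python
def flag_underutilized(avg_cpu_by_resource: dict, threshold_percent: float = 10.0) -> list:
    """Return the names of resources whose average CPU is below the threshold, sorted."""
    return sorted(
        name
        for name, avg_cpu in avg_cpu_by_resource.items()
        if avg_cpu < threshold_percent
    )


if __name__ == "__main__":
    # Hypothetical average CPU utilization (percent) over the last 30 days.
    usage = {"web-1": 45.0, "batch-7": 3.2, "cache-2": 8.9}
    print("candidates for downsizing:", flag_underutilized(usage))
```

Flagged resources are candidates for review, not automatic removal: a low average can hide short bursts that still justify the current size.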
A recent graduate specializing in marketing, Deniz is excited to learn and share her knowledge of business technologies and technology culture. Having worked at technology companies during her school years, she is always eager to learn more about how technology transforms businesses.