Download and learn the Site Reliability Engineer Udacity Nanodegree course (2023) for free with a Google Drive download link.

Master the job-ready skills you need to be a successful site reliability engineer and start designing systems to automate responses to software site issues.

What You’ll Learn in Site Reliability Engineer Nanodegree

Site Reliability Engineer

Master the skills necessary to become a successful site reliability engineer. Learn to build automation tools that ensure designed solutions respond to requirements such as availability, performance, security, and maintainability.

Prerequisite knowledge

Python or Java, Bash or PowerShell, Linux, the UNIX shell, and SQL.

A well-prepared learner is already able to:

  • Write basic functions and methods in an object-oriented language (Python or Java), using for loops, conditionals, and other control flow.
  • Write basic shell scripts in Bash or PowerShell, including for loops and conditionals.
  • Work with the Linux command line (Bash) and the UNIX shell.
  • Create simple SQL queries using SELECT, JOIN, and GROUP BY.
  • Display networking skills including knowledge of virtual networks, DNS, subnets, and basic network troubleshooting techniques.
  • Perform DevOps tasks, such as setting up monitoring, rolling out features, and troubleshooting production issues, ideally for large systems.
  • Work with Kubernetes and basic kubectl, such as kubectl apply, kubectl create, kubectl config.

Foundations of Observability

Get a practical introduction to what observability requires in terms of people and tools. Learn about site reliability engineering, its roles and responsibilities, and how those differ from other teams. See how the role helps an enterprise improve, discuss the associated costs, and learn about the types of team members and the tools a team may use.

Project – Observing Cloud Resources

Configure a monitoring software stack to collect and display a variety of metrics for commonly used cloud resources. Establish and configure rules for alerting and set parameters to be notified prior to the occurrence of failures. Test and observe the implementation of the monitoring software stack to apply and showcase SRE methodologies and practices.

Planning for High Availability and Incident Response

This course covers monitoring, high availability (HA) and disaster recovery (DR), infrastructure as code, and database recovery and availability. Learn the basics of service level objectives (SLOs) and service level indicators (SLIs), how to translate them into queries, and finally how to turn those queries into graphs. Also learn how to design highly available databases and deploy them to AWS.
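As a rough illustration of the arithmetic behind an SLI and an SLO, here is a minimal Python sketch. It is not taken from the course material: the function names, the request counts, and the 99.9% target are all assumptions chosen for the example. The SLI is computed as the share of successful requests, and the error budget is the share of allowed failures that has not yet been used.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Return the availability SLI: the fraction of requests that succeeded."""
    if total_requests == 0:
        # Assumption for this sketch: no traffic counts as fully available.
        return 1.0
    return good_requests / total_requests


def error_budget_remaining(sli: float, slo: float) -> float:
    """Return the unused fraction of the error budget.

    1.0 means no budget has been spent, 0.0 means it is exhausted,
    and a negative value means the SLO has been breached.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    allowed_failure = 1.0 - slo   # failure rate the SLO tolerates
    actual_failure = 1.0 - sli    # failure rate actually observed
    return 1.0 - actual_failure / allowed_failure


if __name__ == "__main__":
    # Hypothetical numbers: 1,000,000 requests, 500 of which failed.
    sli = availability_sli(999_500, 1_000_000)
    remaining = error_budget_remaining(sli, slo=0.999)
    print(f"SLI: {sli:.4%}, error budget remaining: {remaining:.1%}")
```

In a real monitoring stack the two counts would typically come from a metrics query rather than hard-coded values, and the remaining budget is the kind of number one would plot over time.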

Project – Deploying HA Infrastructure

Design HA infrastructure and deploy it to AWS with Terraform. Start by defining SLOs and SLIs, create a disaster recovery plan, and define the high availability infrastructure. Develop Terraform code to deploy the infrastructure to multiple AWS regions and then deploy replicated databases.

Self-Healing Architecture

Learn how to deploy microservices or cloud architecture that is resilient enough to withstand failures, and predictable enough to resolve issues via automation without human intervention. Understand self-healing system design fundamentals, deployment strategies, implementation steps, and use cases. Learn cloud automation to increase the resiliency of systems.
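The idea of resolving issues through automation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's approach: `heal`, `FlakyService`, and the restart limit are hypothetical names and values invented for the example. The loop runs a health check, triggers a restart when the check fails, and gives up after a bounded number of attempts so that a human can be alerted.

```python
from typing import Callable


def heal(check: Callable[[], bool],
         restart: Callable[[], None],
         max_restarts: int = 3) -> bool:
    """Restart a component until its health check passes.

    Returns True if the component ends up healthy, False if it is
    still unhealthy after max_restarts restart attempts.
    """
    for _ in range(max_restarts):
        if check():
            return True
        restart()
    # Final check after the last permitted restart.
    return check()


class FlakyService:
    """Hypothetical service that becomes healthy after enough restarts."""

    def __init__(self, restarts_needed: int) -> None:
        self.restarts_needed = restarts_needed
        self.restarts = 0

    def healthy(self) -> bool:
        return self.restarts >= self.restarts_needed

    def restart(self) -> None:
        self.restarts += 1


if __name__ == "__main__":
    service = FlakyService(restarts_needed=2)
    recovered = heal(service.healthy, service.restart)
    print(f"recovered={recovered} after {service.restarts} restarts")
```

Bounding the number of restarts is a deliberate choice in this sketch: an unbounded loop could hide a persistent fault, while a bounded one hands the problem to a person once automation has clearly failed.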

Project – Deployment Roulette

Play the role of an engineer at a growing consulting firm. Applications left by a departing team are in an undocumented, unknown state. Identify failing applications and implement fixes to resolve the problems. Create an architecture diagram that communicates the status of the cloud environment to improve the onboarding of future developers.

Establishing a Culture of Reliability

Learn how to develop processes and frameworks that drive workplaces toward putting reliability first by working through the incident management process and learning how to run effective on-call rotations. Understand how to perform reliability reviews at various phases of your system, how to manage system capacity effectively, and how to reduce toil.

Project – Plan, Reduce, Repeat

Participate in three mock scenarios one might encounter as an SRE. In the first scenario, apply capacity management skills and demonstrate how to maintain an as-built document. In the second scenario, apply on-call best practices and complete a post-mortem. In the third scenario, develop a toil reduction plan and perform some hands-on automation.

The median base salary for a site reliability engineer is $200,000.
