Software Development Practices and Tools

Questions and Answers

What is the purpose of dependency scanning in software development?

  • To optimize application performance
  • To automate deployment processes
  • To manage version control of dependencies
  • To find security vulnerabilities in dependencies (correct)

Which tool is NOT mentioned as a dependency scanning tool?

  • Black Duck (correct)
  • Retire.js
  • Synopsys
  • Gemnasium

What is the function of a vulnerability database in the delivery pipeline?

  • To generate application documentation
  • To collect and disseminate information about security vulnerabilities (correct)
  • To monitor application performance in real-time
  • To integrate multiple CI/CD tools

What does Continuous Delivery enable in software development?

  • Software deployment can occur at any time (correct)

Which of the following tools is used for release orchestration?

  • GitLab CI (correct)

What is the main goal of fuzzing in automated testing?

  • To provide random data inputs and observe outcomes (correct)

Review apps are designed to facilitate which of the following?

  • Real-time code reviewing by spinning up environments (correct)

Which type of scanning ensures that a container image does not contain known vulnerabilities?

  • Container scanning (correct)

What is the primary goal of Site Reliability Engineering (SRE)?

  • To create ultra-scalable and highly reliable distributed software systems (correct)

How do Site Reliability Engineers (SREs) allocate their working time?

  • 50% on ops work and 50% on development tasks (correct)

Which statement about SRE is true?

  • SRE involves both software engineering and operational challenges. (correct)

What was a primary reason for the creation of Site Reliability Engineering at Google?

  • To address the challenges of infrastructure and operations problems. (correct)

Which of the following is NOT one of the key pillars of success for DevOps as defined at Google?

  • Implement rapid changes (correct)

What percentage of their time do SREs spend on monitoring, alerting, and automation?

  • Part of their 50% ops time and development tasks (correct)

Which of these activities is primarily associated with the 'ops' related work of an SRE?

  • Manual interventions and issue resolution (correct)

What does incremental rollout involve?

  • Deploying small, gradual changes to a service (correct)

Who popularized the Site Reliability Engineering discipline?

  • Ben Treynor at Google (correct)

What is the primary purpose of canary deployments?

  • To identify issues before full rollout by testing a small user base (correct)

What are feature flags primarily used for?

  • To decide which behavior a system should invoke without altering code (correct)

What is the focus of release governance?

  • To track and manage releases in an auditable way for security and compliance (correct)

What is meant by secrets management?

  • The tools and methods for managing digital authentication credentials (correct)

Which of the following best describes Auto DevOps?

  • An automated approach to applying DevOps best practices across projects (correct)

What is a common characteristic of blue/green deployments?

  • Utilizes colored environments to manage incremental rollouts (correct)

What is the primary purpose of tracing in application management?

  • It helps track the performance and health of a deployed application. (correct)

What is the significance of LaunchDarkly in feature flags?

  • It facilitates the toggling of features without changing code (correct)

What does synthetic monitoring involve?

  • Simulating the actions of end-users to monitor service behavior. (correct)

Which tool actively blocks threats in the production environment?

  • RASP (correct)

What is the purpose of a status page in service management?

  • To communicate the status of services to customers and users. (correct)

In incident management, which aspects are captured to ensure service level objectives are met?

  • The who, what, when of service incidents. (correct)

What does User and Entity Behavior Analytics (UEBA) primarily focus on?

  • Analyzing normal and abnormal user behavior. (correct)

Which type of monitoring informs the deployment environment's health?

  • Cluster Monitoring (correct)

Which of the following is a feature of a Web Application Firewall (WAF)?

  • Examine traffic and block malicious requests. (correct)

What is the primary focus of vulnerability management?

  • Scanning for vulnerabilities in assets and applications (correct)

What does DLP stand for in the context of security?

  • Data Loss Prevention (correct)

Which metric is used to measure the Mean Time to Recover for components?

  • MTTR (correct)

What is chaos engineering primarily used for?

  • To enhance resilience by identifying key dependencies (correct)

How can organizations benefit from anti-fragility?

  • By creating systems that become stronger through challenge (correct)

What does SLO stand for in service metrics?

  • Service Level Objective (correct)

Which metric measures the Mean Time to Detect failures or incidents?

  • MTTD (correct)

What is the Recovery Point Objective (RPO) associated with?

  • Total data loss time tolerance (correct)

What does MTRS represent in terms of service metrics?

  • Mean Time to Recover Service (correct)

What purpose does simulating component failure serve in a resilient system?

  • It creates an opportunity for auto-recovery processes (correct)

Which tool is commonly used for managing the lifecycle of services?

  • ServiceNow (correct)

What functionality do tools like Jira and Trello provide in value stream management?

  • Time tracking (correct)

Which of the following is not a capability of issue tracking tools?

  • Source code storage (correct)

Which tools are specifically noted for providing visualization in the DevOps lifecycle?

  • GitLab CI and DevOptics (correct)

In Agile Portfolio Management, what is the primary focus?

  • Evaluating in-flight projects (correct)

Which of these is a tool that can be used for peer code reviews?

  • Crucible (correct)

Which aspect of management is associated with handling test case planning and defect tracking?

  • Quality Management (correct)

What is a primary feature of Kanban boards in relation to issue tracking?

  • Work item representation (correct)

Flashcards

Site Reliability Engineering (SRE)

A discipline combining aspects of software engineering and operations, applied to infrastructure and operations problems. It emerged at Google around 2003 and seeks to create ultra-scalable, highly reliable distributed software systems.

SRE Operational Tasks

SREs dedicate half their time to operational work like resolving issues, on-call duties, and manual interventions.

SRE Development Tasks

SREs spend the other half of their time on development activities, building features, scaling systems, and automating processes.

SRE Tools and Practices

Monitoring, alerting, and automation are key components of SRE, aiding in detecting and resolving issues proactively.

DevOps

DevOps is a broader philosophy focusing on collaboration and automation within IT organizations. It aims to reduce silos, embrace failure, implement gradual changes, leverage tools, and promote continuous improvement.

Reducing Organizational Silos (DevOps Pillar)

One of the five key DevOps pillars at Google, focusing on breaking down barriers between teams (like development and operations) to enable smoother collaboration.

Accepting Failure as Normal (DevOps Pillar)

Another DevOps pillar, acknowledging that failures are inevitable in complex systems and focusing on learning and improving from them.

Implementing Gradual Changes (DevOps Pillar)

This DevOps pillar emphasizes making small, incremental changes to systems, reducing the risk of major disruptions.

Issue Tracking

Tools like Jira, Trello, CA's Agile Central, and VersionOne are used to capture incidents and work backlogs. They organize tasks and projects in a structured way.

Kanban Boards

Kanban Boards are used to visualize the flow of work and manage tasks in a project. They help you see the progress of each task, identify bottlenecks, and prioritize work.

Time Tracking

Time tracking tools record the time spent on tasks, projects, or issues and support analysis of that time. The results can feed reports and improve project planning.

Agile Portfolio Management

Agile Portfolio Management involves the evaluation of current projects and potential future initiatives to guide investments in projects and tasks. It's used to manage a large portfolio of projects efficiently.

Service Desk

ServiceNow is a platform used to manage the lifecycle of services, including internal and external stakeholder engagement. It helps automate tasks, track incidents, and manage service requests efficiently.

Requirements Management

Requirements Management involves defining, tracing, and organizing requirements for projects, including code requirements and test cases. These tools help ensure all requirements are met throughout the development process.

Quality Management

Quality Management tools are used to plan, execute, and track tests. They help to identify defects, analyze their severity and priority, and ensure the quality of a product.

Source Code Management

Git is a popular tool used to manage and store source code in a secure and scalable environment, allowing developers to collaborate effectively.

Dependency Scanning

A process that automatically identifies security vulnerabilities in your application's dependencies during development and testing. Popular tools include Synopsys, Gemnasium, Retire.js, and bundler-audit.
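
The core check behind dependency scanning can be sketched as a lookup of pinned versions against known advisories. This is a minimal illustration, not how any of the tools above work internally: the `ADVISORIES` table, the package names, and `scan_dependencies` are all hypothetical, and real scanners pull advisory data from a vulnerability database.

```python
# Hypothetical advisory data: package name -> versions with known issues.
# These entries are made up for illustration only.
ADVISORIES = {
    "examplelib": {"1.0.0", "1.0.1"},
    "otherlib": {"2.3.0"},
}


def scan_dependencies(pinned, advisories=ADVISORIES):
    """Return sorted (name, version) pairs that match a known advisory."""
    findings = []
    for name, version in pinned.items():
        if version in advisories.get(name, set()):
            findings.append((name, version))
    return sorted(findings)


if __name__ == "__main__":
    # Hypothetical pinned dependencies of an application.
    pinned = {"examplelib": "1.0.1", "otherlib": "2.4.0", "thirdlib": "0.9.0"}
    for name, version in scan_dependencies(pinned):
        print(f"VULNERABLE: {name}=={version}")
```

Running a check like this in the pipeline lets a build fail as soon as a flagged version is introduced.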

Container Scanning

When building a container image for your application, this process scans the image to detect known security vulnerabilities in the environment where your code will be deployed. Popular tools include Black Duck, Synopsys, Snyk, Clair, and klar.

License Compliance

Tools like Black Duck and Synopsys check whether the licenses of your dependencies are compatible with your application and approve or blacklist them accordingly, supporting legal compliance.

Vulnerability Database

A database that collects, stores, and shares information about discovered computer security vulnerabilities. This info is used during the delivery pipeline to check for potential threats.

Fuzzing

A testing technique where invalid, unexpected, or random data is fed to a service to test its robustness. This aims to find bugs and security vulnerabilities by pushing the system to its limits.
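
A very small random fuzzer can illustrate the idea. This is a sketch under stated assumptions: `parse_age` is a hypothetical function under test, `fuzz` is an illustrative helper rather than a real fuzzing framework, and a `ValueError` is assumed to be the only acceptable way for the target to reject bad input.

```python
import random


def parse_age(text):
    """Hypothetical function under test: parse a non-negative integer age."""
    value = int(text)  # raises ValueError on non-numeric input
    if value < 0:
        raise ValueError("age must be non-negative")
    return value


def fuzz(target, runs=1000, seed=0):
    """Feed random strings to target and collect unexpected exceptions."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    alphabet = "0123456789-+ abc\x00\n"
    crashes = []
    for _ in range(runs):
        length = rng.randint(0, 8)
        data = "".join(rng.choice(alphabet) for _ in range(length))
        try:
            target(data)
        except ValueError:
            pass  # expected rejection of invalid input
        except Exception as exc:  # anything else counts as a robustness bug
            crashes.append((data, repr(exc)))
    return crashes


if __name__ == "__main__":
    print("unexpected failures:", len(fuzz(parse_age)))
```

Production fuzzers are usually coverage-guided and mutate known-good inputs; purely random strings are only the simplest form of the technique.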

Continuous Delivery

A software development approach where software is continuously built, tested, and deployed, enabling rapid releases and frequent updates. The goal is to have software ready for production at any time.

Release Orchestration

A pipeline used to automate the release process, detecting changes that might cause problems in production. It orchestrates tools to identify performance, security, and usability issues.

Review Apps

A technique that allows developers to review their applications in real-time by spinning up temporary environments whenever code is committed. This enables rapid feedback loops and faster deployment cycles.

Canary Deployment

A software deployment strategy where a new version is released to a small group of users first, acting as a test group. If issues arise, the release is quickly rolled back, limiting the impact.

Incremental Rollout

A process involving multiple smaller deployments of a service, gradually shifting all users to the new version. This allows for controlled updates and minimizes disruption.
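
One common way to implement a canary or incremental rollout is to hash each user into a stable bucket and compare it to the current rollout percentage. The sketch below assumes string user IDs; `bucket` and `serves_new_version` are hypothetical names, and real systems often do this at the load balancer or service mesh instead.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range [0, 100)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def serves_new_version(user_id: str, rollout_percent: int) -> bool:
    """True if this user falls inside the current rollout percentage."""
    return bucket(user_id) < rollout_percent


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    # Raise the percentage step by step: a small canary first, then everyone.
    for percent in (5, 25, 50, 100):
        on_new = sum(serves_new_version(u, percent) for u in users)
        print(f"{percent:>3}% rollout -> {on_new} of {len(users)} users on the new version")
```

Because the bucket depends only on the user ID, a user who receives the new version at 5% keeps it as the percentage grows, and rolling back is a matter of lowering the percentage.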

Release Governance

A method of managing and tracking releases, ensuring they meet business requirements, comply with regulations, and are well-documented.

Secrets Management

The practice of automating and managing digital authentication credentials (secrets) used by applications and services, ensuring their security and access control.

Feature Flags

Utilizing special flags to control specific features or behaviors within a system, allowing changes in functionality without altering the underlying code.
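
A feature flag can be as simple as a configuration value checked at runtime. The sketch below assumes flags are read from environment variables named `FEATURE_<NAME>`; that naming scheme, `flag_enabled`, and `checkout_total` are hypothetical, and hosted flag services offer richer targeting than this.

```python
import os


def flag_enabled(name, default=False, environ=None):
    """Read a boolean flag from the environment (assumed naming: FEATURE_<NAME>)."""
    env = os.environ if environ is None else environ
    raw = env.get(f"FEATURE_{name.upper()}")
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def checkout_total(prices, environ=None):
    """Hypothetical business logic with a flag-guarded code path."""
    total = sum(prices)
    if flag_enabled("new_discount", environ=environ):
        total *= 0.9  # new behavior, active only while the flag is on
    return round(total, 2)


if __name__ == "__main__":
    print(checkout_total([10, 20], environ={}))
    print(checkout_total([10, 20], environ={"FEATURE_NEW_DISCOUNT": "true"}))
```

Both code paths ship in the same build; switching behavior only requires changing the flag value, not redeploying code.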

Auto DevOps

A set of tools and practices that enable the automatic configuration of software development lifecycles, including building, testing, deploying, and monitoring applications.

SRE Tools & Automation

A set of tools and processes used to automate and manage the development, testing, and deployment of applications, facilitating a smooth and efficient workflow.

Tracing

Tracing provides insight into the performance and health of a deployed application by following the path of a request through different functions or microservices.

Cluster Monitoring

Tools that monitor the health and performance of applications running in cluster environments like Kubernetes.

Error Tracking

Tools that help identify and analyze errors generated by applications, providing detailed information about the issues.

Incident Management

Involves capturing and analyzing information about incidents affecting services. This data is used to meet service level objectives.

Synthetic Monitoring

Monitoring service behavior by running scripts that mimic user actions and analyze the results.

Status Page

Web pages that communicate the status of services to users and customers.

RASP

Runtime Application Self Protection (RASP) tools actively monitor and block threats within the production environment.

WAF

Web Application Firewalls (WAFs) examine incoming traffic and block malicious requests before they reach the application.

Vulnerability Management

Ensuring assets and applications are scanned for vulnerabilities, then recording, managing, and mitigating those vulnerabilities.

Data Loss Prevention (DLP)

Tools that prevent data from leaving a service environment or organization.

Storage Security

Focuses on securing data storage systems and the data stored on them.

Container Network Security

Ensures that applications running in containers cannot access or communicate with each other in unintended ways.

Antifragility

The ability of a system to withstand and adapt to unexpected events, benefiting from them to improve its performance.

Mean Time To Detect (MTTD)

The average time it takes to detect a failure or incident.

Mean Time To Recover (MTTR)

The average time it takes to recover a component from failure.

Mean Time To Recover Service (MTRS)

The average time it takes to recover the entire service after a failure.

Service Level Objective (SLO)

A target for service performance, often defined as a percentage of uptime or a specific metric.

Recovery Point Objective (RPO)

The maximum amount of data loss that can be tolerated during a failure.

Study Notes

Site Reliability Engineering Foundation Course

  • Course goals include learning about SRE, its core vocabulary, principles, practices, and automation.
  • The course also aims to explore real-life scenarios and have fun while doing so.
  • Passing the SRE Foundation Exam is also a goal. The exam has 40 multiple-choice questions, a 60-minute time limit, and a 65% passing score.
  • The exam is accredited by the DevOps Institute.
  • A digital badge is awarded upon successful completion.

Course Content

  • Module 1: SRE principles and practices
  • Module 2: Service Level Objectives & Error Budgets
  • Module 3: Reducing Toil
  • Module 4: Monitoring & Service Level Indicators
  • Module 5: SRE Tools & Automation
  • Module 6: Anti-fragility & learning from failure
  • Module 7: Organizational Impact of SRE
  • Module 8: SRE, other frameworks, the future

Module 1: SRE Principles & Practices

  • What is site reliability engineering?

  • SRE & DevOps: What is the difference?

  • SRE principles & practices

  • Site Reliability Engineering (SRE) is a discipline that incorporates software engineering aspects and applies them to infrastructure and operations problems.

  • It originated at Google around 2003 and was publicized via SRE books.

  • SREs spend 50% of their time on operations-related tasks (e.g., issue resolution, on-call, manual interventions).

  • The other 50% of their time is dedicated to development tasks (e.g., new features, scaling, automation).

  • DevOps (at Google) defines 5 key pillars of success:

    1. Reduce organizational silos.
    2. Accept failure as normal.
    3. Implement gradual changes.
    4. Leverage tooling and automation.
    5. Measure everything.
  • SRE is a specific implementation of DevOps with some extensions.

  • DevOps is a set of practices, guidelines, and culture designed to break down silos in IT development, operations, architecture, networking, and security.

  • SRE is a set of practices found to work and some beliefs animating those practices, as well as a job role.

  • Operations is a software problem.

  • SRE utilizes software engineering approaches to solve operational problems.

  • Estimates suggest anywhere from 40% to 90% of total ownership costs are incurred after launch.

  • A Service Level Objective (SLO) is an availability target for a product or service (it's not 100%).

  • SRE services are managed to the SLO.

  • SLOs need consequences if they are violated.

  • Any manual, mandated operational task is considered bad.

  • If a task can be automated, it should be automated.

  • Tasks can provide wisdom from production to inform better system design and behavior.

  • SRE teams have the ability to regulate their workload.

  • Automate what is currently done manually.

  • Decide what to automate and how to automate it.

  • Take an engineering-based approach to problems rather than toiling at them repeatedly.

  • Prioritize what to automate, and avoid automating bad processes.

  • Late problem (defect) discovery is expensive, so SREs look for ways to avoid it.

  • Look to improve MTTR (mean time to repair).

  • Smaller changes can address this.

  • Canary deployments are also related to this.

  • SREs share skill sets with product development teams.

  • Boundaries between application development and production (Dev & Ops) should be removed.

Module 2: SLOs & Error Budgets

  • Example SLOs and error budgets

  • SLIs for measurement

  • SLO adoption

  • Error Budgets – Good and Bad

  • Error Budgets - Fixed?

  • Consequences of missed SLOs

  • The VALET Dimensions of SLO

  • The importance of SLOs in error budget and policies

  • The service level objective is a goal for how well a product or service functions.

  • SLOs are strongly related to the user experience.

  • Setting and measuring service-level objectives is important for SRE roles.

  • Availability is the most widely tracked SLO.

  • Products and services often have multiple SLOs.

  • SLOs aim to improve the user experience.
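
The relationship between an availability SLO and its error budget can be sketched with a little arithmetic: whatever the SLO does not promise is the budget. The function names and the 30-day window below are assumptions chosen for illustration.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of unavailability allowed by an availability SLO over a window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo_percent / 100)


def budget_remaining(slo_percent: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Error budget left after the downtime already consumed (negative if overspent)."""
    return error_budget_minutes(slo_percent, window_days) - downtime_minutes


if __name__ == "__main__":
    budget = error_budget_minutes(99.9)
    print(f"99.9% over 30 days allows about {budget:.1f} minutes of downtime")
    print(f"after 30 minutes of downtime, about {budget_remaining(99.9, 30):.1f} minutes remain")
```

A 99.9% target over 30 days leaves roughly 43 minutes of budget. When the remaining budget reaches zero, the agreed consequences for a missed SLO apply.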

Module 3: Reducing Toil

  • What is toil?

  • Why toil is bad

  • Doing something about toil

  • Work is toil if it is manual, repetitive, automatable, tactical, lacks enduring value, and scales linearly.

  • Doing the same test over and over, acknowledging the same alert every morning, dealing with interrupts, physical meetings to approve production deployments, manual starts/resets of equipment and components, and creating users are also forms of toil.

  • Known workarounds, on-call responses, and manual scaling infrastructure are also forms of toil.

  • Extracting some data is also a form of toil.

  • Toil (a specific description) isn't "stuff I don't like doing."

  • Toil reduction requires engineering time.

  • Creating external automation, internal automation, or enhancing services to avoid intervention are all choices for reducing toil.

  • Google has an advertised goal of keeping operational work (toil) below 50% of an engineer's time.

  • At least 50% of each SRE's time should be spent on engineering.

  • The 50% rule ensures that one team or person doesn't handle operational tasks solely.
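
The 50% rule can be checked from a simple time log. This is a sketch only: the category names, the hours, and the helper functions are hypothetical, and deciding which work counts as toil still requires the criteria listed above.

```python
def toil_fraction(hours_by_category, toil_categories):
    """Share of logged hours that falls into categories classified as toil."""
    total = sum(hours_by_category.values())
    if total == 0:
        return 0.0
    toil = sum(hours for category, hours in hours_by_category.items()
               if category in toil_categories)
    return toil / total


def exceeds_toil_cap(hours_by_category, toil_categories, cap=0.5):
    """True if toil takes more than the cap (50% by default) of logged time."""
    return toil_fraction(hours_by_category, toil_categories) > cap


if __name__ == "__main__":
    # Hypothetical week for one engineer, in hours.
    week = {"on_call": 12, "manual_deploys": 10, "automation": 18}
    toil = {"on_call", "manual_deploys"}
    print(f"toil share: {toil_fraction(week, toil):.0%}")
    print("over the 50% cap:", exceeds_toil_cap(week, toil))
```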

Module 4: Monitoring & SLIs

  • SLIs - Service Level Indicators
  • Monitoring
  • Observability
  • SLIs are ways for engineers to communicate quantitative data about systems.
  • Many measurements can serve as an SLI, generally expressed as a ratio of good events to total events.
  • Service-level indicators may also need client-side data collection.
  • SLI measurement needs to be time-bound in some way.
  • Monitoring tools frequently used include Catchpoint, Nagios, Prometheus, Splunk, Grafana, and Collectd.
  • Monitoring is the use of hardware or software components to monitor system resources and their performance.
  • Telemetry is the automated communications process for receiving measurements.
  • Application Performance Management (APM) monitors and manages application performance and availability.
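
An SLI expressed as a ratio of good events to total events over a bounded time window can be sketched as follows. The event format, a list of `(timestamp, is_good)` pairs, and the `sli_ratio` helper are assumptions for illustration.

```python
def sli_ratio(events, window_start, window_end):
    """Ratio of good events to total events inside [window_start, window_end).

    events is an iterable of (timestamp, is_good) pairs.
    Returns None when the window contains no events.
    """
    in_window = [is_good for timestamp, is_good in events
                 if window_start <= timestamp < window_end]
    if not in_window:
        return None
    return sum(in_window) / len(in_window)


if __name__ == "__main__":
    # Hypothetical request log: (seconds since start, request succeeded?)
    events = [(0, True), (10, True), (20, False), (30, True), (70, False)]
    print(sli_ratio(events, 0, 60))
```

Here three of the four requests in the first 60 seconds succeeded, so the SLI for that window is 0.75; the request at 70 seconds falls outside the window and is ignored.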

Module 5: SRE Tools & Automation

  • Automation Defined

  • Automation Focus

  • Hierarchy of Automation Types

  • Secure Automation

  • Automation Tools

  • Manage (1): Audit Management

  • Authentication & Authorization

  • Manage (2): DevOps Score

  • Value Stream Management

  • Plan (1): Issue Tracking, Kanban Boards, Time Tracking, Agile Portfolio Management

  • Plan (2): Service Desk, Requirements Management, Quality Management

  • Create (1): Source Code Management, Code Review, Wiki

  • Create (2): Web IDE, Snippets

  • Verify (1): Continuous Integration, Code Quality

  • Verify (2): Performance Testing, Usability Testing

  • Package (1): Package Registry, Container Registry, Dependency Proxy

  • Package (2): Helm Chart Registry, Dependency Firewall

  • Secure (1): SAST, DAST, IAST, Secret Detection

  • Secure (2): Dependency Scanning, Container Scanning, License Compliance

  • Secure (3): Vulnerability Database, Fuzzing

  • Release (1): Continuous Delivery, Release Orchestration, Pages, Review Apps, Incremental Rollout

  • Release (2): Canary Deployments, Feature Flags, Release Governance, Secrets Management

Module 6: Antifragility & Learning from Failure

  • Why learn from failure?

  • Benefits of antifragility

  • Shifting the organizational balance

  • MTTD - Mean Time to Detect (Failure/Incidents)

  • MTTR - Mean Time to Recover (Components)

  • MTRS - Mean Time to Recover (Service)

  • SLO - Service Level Objective

  • RPO - Recovery Point Objective

  • Chaos Engineering Next Steps

  • Simulating component failure allows for automation of recovery

  • Chaos engineering approaches identify key interfaces and dependencies across services, pinpointing areas where more resilience may be required.

  • A fire drill (where, e.g., a database is taken down) may result in an SLO being broken, whereas caching data during a database outage could mean the SLO is still met.

  • Introducing failure to a messaging queue could indicate excessive data loss outside the RPO.

  • More frequent backups of the queue data may be needed to meet the RPO.
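
The metrics in this module can be computed from incident timestamps, and the RPO point reduces to a comparison with the backup interval. In this sketch the incident records, the field names, and the choice to measure MTTR from failure start to recovery are all assumptions; some teams measure recovery from the moment of detection instead.

```python
from statistics import mean


def mttd(incidents):
    """Mean Time To Detect: average of (detected - started), in minutes."""
    return mean([i["detected"] - i["started"] for i in incidents])


def mttr(incidents):
    """Mean Time To Recover: average of (recovered - started), in minutes."""
    return mean([i["recovered"] - i["started"] for i in incidents])


def meets_rpo(backup_interval_minutes, rpo_minutes):
    """Worst-case data loss is roughly one backup interval, so compare it to the RPO."""
    return backup_interval_minutes <= rpo_minutes


if __name__ == "__main__":
    # Hypothetical incidents, with times in minutes on a shared clock.
    incidents = [
        {"started": 0, "detected": 5, "recovered": 35},
        {"started": 100, "detected": 103, "recovered": 118},
    ]
    print("MTTD:", mttd(incidents), "minutes")
    print("MTTR:", mttr(incidents), "minutes")
    print("hourly backups meet a 15-minute RPO:", meets_rpo(60, 15))
```

If a chaos experiment shows that hourly queue backups cannot satisfy a 15-minute RPO, shortening the backup interval to 15 minutes or less is the kind of follow-up the last bullet describes.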

Module 7: Organizational Impact of SRE

  • Why organizations embrace SRE

  • Patterns for SRE adoption

  • SRE Job Description

  • Sustainable Incident Response

  • Blameless post mortems

  • SRE & Scale

  • Increased Service Resilience

  • Minimize Loss of Revenue

  • Average cost of service downtime is $5,600 per minute.

  • Downtime differences per hour vary greatly.

  • Typical SRE adoption steps include consulting, embedded, platform, slice & dice, and full SRE.

  • SRE ownership of common tools and platforms ('platform SRE') may be used.

  • Shared responsibility with development teams ('embedded SRE') is a common strategy.

  • Automation saves SRE time for crucial development tasks.

  • The challenge of scale is always a good one.

  • Automation techniques (such as auto-scaling, containerization, and clustering), flexible platforms (such as public/private cloud), non-relational databases (NoSQL, such as MongoDB), and 'as-a-service' capabilities are critical to platform growth.

  • SRE teams own common tools and platforms that other developers use ("platform SRE"), SRE expertise 'shifts left' into dev teams ("embedded SRE"), and toil automation frees up time for development.

  • Toil reduction mechanisms include automated ticket responses and "self-service" features, while DRY (don't repeat yourself) solutions prevent toil-related problem repetition.

Module 8: SRE, Other Frameworks, the Future

  • SRE & Other Frameworks
  • SRE Evolution
  • SRE teams can operate in an agile way.
  • SRE can help with ITSM compliance.
  • SRE is part of a "system of systems" for delivery.
  • SRE evolution includes related reliability engineering roles:
  • Network Reliability Engineer (NRE)
  • Database Reliability Engineer (DBRE)
  • Customer Reliability Engineer (CRE)
  • Heritage Reliability Engineer (HRE)
