Questions and Answers
Which of the following best describes the primary goal of resilience engineering in software systems?
- To eliminate all possible system failures and vulnerabilities.
- To reduce the cost of software development and maintenance.
- To maximize system performance under ideal operating conditions.
- To ensure the system maintains continuity of critical services despite disruptive events. (correct)
Resilience engineering places emphasis on:
- Avoiding system failures entirely.
- Minimizing the cost of system failures and facilitating recovery. (correct)
- Maximizing the number of technical faults in a system.
- Ignoring the potential for system failures.
In the context of resilience activities, what is the role of 'Resistance'?
- To recognize early indications of system failure.
- To reduce the probability that the system will fail when problems are detected. (correct)
- To restore all system services to normal operation.
- To restore critical system services quickly after a failure.
What does 'proactive resistance' involve in system resilience?
Which statement accurately describes the scope of cybersecurity?
What is a key factor contributing to cybersecurity failure?
Which type of cybersecurity threat involves data being made accessible to unauthorized individuals?
In cybersecurity, what is the primary purpose of firewalls?
Which practice supports recovery after a successful cyberattack?
Which of the following is a key step in cyber-resilience planning?
What is the focus of 'Threat Identification' in cyber resilience planning?
What does 'Asset Reinstatement' involve in the context of cyber resilience planning?
Why is it important to consider sociotechnical systems design when building resilient systems?
In the Mentcare example, what is presented as a better strategy to prevent data theft from a user's credentials rather than using complex authentication procedures?
In a nested technical and sociotechnical system, what happens if a failure in system S1 leads to a failure in system ST1?
What is a key characteristic of resilient organizations?
What differentiates 'the person approach' from 'the systems approach' when considering human error?
According to the systems approach, what is one reason people are likely to make mistakes?
What does redundancy and diversity achieve in creating defensive layers?
According to the Swiss Cheese model, how do system failures occur?
What action increases system resilience?
What is the relationship between efficiency and resilience in process design?
What is an implication of only presenting operators with the information they 'need to know'?
How can process automation negatively influence system resilience?
What is a disadvantage of process automation?
What is the initial step in resilient systems design?
What is the purpose of 'Survivability Analysis'?
In the stages of survivability analysis, what is the focus of 'Identify softspots and survivability strategies'?
What is a limitation of using survivability analysis for business systems?
What phase is 'Plan backup strategy' in Resilience Engineering?
What is one of the work streams of resilience engineering?
What should the aim be of good critical service maintenance?
What is one way of minimizing risks to confidentiality on multiple copies of information on laptops?
What should resilience planning be based on?
If the database is unavailable, how can doctors still access essential patient information?
Which is required for client and server communication?
Is the following an example of Recognition? A watchdog timer on the client that times out if there is no response to a client access.
Is the following an example of Resistance to malware infection of client computers? Security awareness workshops for all system users.
What is the primary focus of resistance strategies in system resilience?
When is reactive resistance employed in system resilience?
What is the relationship between cybersecurity and system security engineering?
Which of the following represents a threat to the integrity of assets in cybersecurity?
How can multi-stage diverse authentication enhance system resilience?
In cyber-resilience planning, what role does 'Threat Resistance' play?
What does the 'Asset Classification' stage primarily involve in cyber resilience planning?
What is the primary objective of 'Threat Recognition' within cyber resilience planning?
What is the purpose of 'Asset Recovery' in the context of cyber resilience planning?
What is the focus of the 'Asset Reinstatement' phase in cyber resilience planning?
What does sociotechnical systems design emphasize in resilient system development?
According to the Mentcare example, what organizational strategy is more effective in preventing data theft from user credentials than complex authentication?
How do resilient organizations approach handling future threats and vulnerabilities?
In the context of organizational resilience, what signifies 'the ability to learn'?
What is a key concept of the 'systems approach' to human error?
Which statement aligns with the principles of the Swiss Cheese model of accident causation?
What strategy increases overall system resilience according to the content?
In balancing efficiency and resilience, what is the impact of focusing process improvement solely on efficiency?
What is a potential pitfall of providing operators only with the 'need to know' information to promote efficiency?
How could process automation potentially detract from system resilience?
How can automated management systems undermine resilience?
What is the immediate next step in resilient systems design after identifying critical services and assets?
What is the purpose of Attack Simulation in survivable systems analysis?
In the stages of survivability analysis, what does 'Identify attacks and compromisable components' involve?
What is a key limitation of survivability analysis for business systems, according to the content?
Which action falls under the work stream of 'Plan event recognition and resistance' in Resilience Engineering?
What key pieces of information are needed to maintain the availability of critical services?
In the Mentcare system, what client-side action enhances resilience by minimizing risks to confidentiality?
Why is it better to download information to the client before consultation occurs?
Flashcards
System Resilience
How well a system maintains critical services during disruptive events, like failures and cyberattacks.
Critical Services
Services whose failure could cause serious harm (human, social, economic).
Disruptive Events
Unexpected events that disrupt the ability of a system to deliver its services.
Resilience Engineering
Recognition (Resilience)
Resistance (Resilience)
Recovery (Resilience)
Reinstatement (Resilience)
Cybercrime
Cybersecurity
Threat to Confidentiality
Threat to Integrity
Threat to Availability
Authentication
Encryption
Firewalls
Data Redundancy
Asset Classification
Threat Identification
Threat Recognition
Threat Resistance
Asset Recovery
Asset Reinstatement
Sociotechnical Resilience
Sociotechnical System Failure
Characteristics of Resilient Organizations
Ability to Respond
Ability to Monitor
Ability to Anticipate
Ability to Learn
Human Error
The Person Approach
The Systems Approach
Systems Engineer Assumptions
Defensive Layer Strategy
Swiss Cheese Model
Maximize System Resilience
Operational Processes
Operations Design
Inefficient Practices
Automated Management system flaws
Resilient System Designs
System Goals
Services Identification
More important than general requirements.
Resilience Engineering
Crucial services
Study Notes
Resilience
- System resilience gauges how well a system maintains critical service continuity when disruptive events occur.
- Disruptive events include equipment failure and cyberattacks.
- Resilience is intended to cope with system failures and other disruptive events; cyberattacks by malicious actors pose the most serious threat to networked systems.
Essential Resilience Ideas
- Some system services are critical, and their failure can lead to severe human, social, or economic consequences.
- Some events are disruptive and can impact the ability of a system to deliver its critical services.
- Resilience is a matter of expert judgment: there are no resilience metrics, and it cannot be measured quantitatively.
- Experts assess resilience through examination of the system and its operational processes.
Resilience Engineering Assumptions
- Resilience Engineering acknowledges that avoiding system failures is impossible.
- The focus is on limiting the costs of failures and recovering from them.
- It assumes that good reliability engineering practices have already been used to minimize technical faults.
- It therefore places more emphasis on limiting failures that arise from external events such as operator errors or cyberattacks.
Resilience Activities
- Recognition involves the system or its operators identifying early signs of system failure.
- Resistance involves implementing strategies to reduce the failure probability when problems or cyberattacks are detected early.
- Recovery ensures the quick restoration of critical system services when a failure occurs.
- Reinstatement involves restoring all system services, allowing normal system operation to continue.
Resistance Strategies
- Resistance strategies may isolate critical parts of the system so that they are unaffected by problems elsewhere.
- Proactive resistance builds defenses into the system to trap problems.
- Reactive resistance involves actions taken upon problem discovery.
Cybersecurity
- Cybercrime is the illegal use of networked systems and poses a significant societal challenge.
- Cybersecurity is broader than system security engineering.
- Cybersecurity is a sociotechnical issue that protects citizens, businesses, and critical infrastructure from threats arising from computer and internet use.
- Cybersecurity is concerned with protecting all IT assets, from networks to application systems.
Factors Contributing to Cybersecurity Failure
- Organizational ignorance of problem severity.
- Poor security procedure design and lax application.
- Human carelessness.
- Inappropriate trade-offs between usability and security.
Cybersecurity Threats
- Threats to confidentiality involve unauthorized data access without damage.
- Threats to integrity involve system or data damage through cyberattacks.
- Threats to availability aim to prevent authorized users from accessing assets.
Examples of Security Controls
- Authentication requires users to prove their authorization.
- Encryption algorithmically scrambles data to prevent unauthorized access.
- Firewalls examine incoming network packets and accept or reject them based on a set of organizational rules.
- Firewalls ensure that only trusted traffic passes from the internet to the local network.
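The rule-based accept/reject behaviour of a firewall can be sketched in a few lines of Python. This is only an illustration: the `Rule` fields, the `decide` function, the first-match-wins ordering, and the default-reject policy are assumptions made for the example, not the behaviour of any particular firewall product, and the addresses are documentation-only example ranges.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """One hypothetical organizational rule; not a real firewall's rule format."""
    action: str            # "accept" or "reject"
    source: str            # source network in CIDR notation
    port: Optional[int]    # destination port, or None to match any port


def decide(rules, src_ip, dst_port, default="reject"):
    """Return the action of the first matching rule (assumed first-match-wins)."""
    address = ip_address(src_ip)
    for rule in rules:
        if address in ip_network(rule.source) and rule.port in (None, dst_port):
            return rule.action
    return default  # assumption: anything not explicitly trusted is rejected


rules = [
    Rule("reject", "203.0.113.0/24", None),  # block one untrusted network entirely
    Rule("accept", "0.0.0.0/0", 443),        # allow HTTPS from any other source
]
```

With these example rules, a packet from `203.0.113.9` to port 443 is rejected by the first rule, while a packet from `198.51.100.7` to port 443 is accepted and the same source to port 23 falls through to the default and is rejected.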
Redundancy and Diversity
- Data and software copies should be maintained on separate computer systems to support cyberattack recovery.
- Multi-stage diverse authentication protects against password attacks and serves as a resistance measure.
- Critical servers may be over-provisioned, that is, more powerful than is needed to handle their expected load. This allows attacks to be resisted without degrading normal service.
Cyber-Resilience Planning Steps
- Asset classification involves classifying hardware, software, and human assets based on their importance to normal operations.
- Threat identification involves identifying and classifying threats to each asset.
- Threat recognition involves identifying how each threat might be recognized.
- Threat resistance involves identifying potential resistance strategies for each threat.
- Asset recovery involves determining the recovery process for each critical asset after a successful cyberattack.
- Asset reinstatement involves defining procedures to restore the system to normal operation.
Sociotechnical Resilience
- Resilience Engineering addresses external events leading to system failure.
- Resilient system design considers sociotechnical systems, rather than only software.
- Addressing adverse events is easier and more effective within a broader sociotechnical system.
Mentcare Example
- Cyberattacks may aim to steal legitimate user credentials.
- A technical solution is more complex authentication, but this irritates users and may reduce security if they leave systems unattended rather than log out.
- A better strategy involves organizational policies emphasizing strong passwords and discouraging credential sharing.
Failure Hierarchy
- A failure in system S1 may be trapped by operator actions in the broader sociotechnical system ST1, which limits organizational damage.
- If a failure in S1 does lead to a failure in ST1, managers in the broader organization have to cope with it.
Organizational Resilience Characteristics
- Characteristics include responsiveness, monitoring, anticipation, and learning.
- Organizations must adapt processes and procedures to both anticipated risks and detected threats.
- Internal operations and the external environment should be monitored for threats.
- Resilient organizations anticipate future events and changes.
- Organizational resilience improves by learning from experience.
- Learning from successful responses such as resisting cyberattacks is particularly important.
Human Error
- People inevitably make mistakes that can lead to serious system failures.
- The person approach attributes errors to individual carelessness or reckless behavior.
- The systems approach recognizes that people are fallible and make mistakes because of high workload, poor training, or inappropriate system design.
Systems Approach Regarding Human Error
- Systems engineers should assume human errors will occur during system operation.
- System designers should consider defences and barriers to human error.
- Barriers can involve either technical components or processes, procedures, and guidelines.
Defensive Layers
- Redundancy and diversity should be used to create a set of defensive layers.
- Each layer uses a different approach to deter attackers or trap technical/human failures.
- Air Traffic Control (ATC) system examples include conflict alert systems, formalized recording procedures, and collaborative checking.
Swiss Cheese Model
- Defensive layers have vulnerabilities, which the model pictures as 'holes' like those in slices of Swiss cheese.
- The ‘holes’ are not always in the same place and the size of the holes may vary depending on the operating conditions.
- System failures occur when all defenses fail because the holes in the layers align.
Increasing System Resilience
- Reduce the probability of an external event that might trigger system failures.
- Increase the number of defensive layers.
- The more layers a system has, the less likely it is that the holes will line up and a system failure will occur.
- Design the system so that diverse types of barriers are included.
- With diverse barriers, the 'holes' will probably be in different places, so there is less chance of them lining up and failing to trap an error.
- Minimize the number of latent conditions in the system.
- This means reducing the number and size of the 'holes' in the system.
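As a rough illustration of why extra layers and smaller holes help, the Python sketch below treats each layer's 'hole' as the probability that the layer fails to trap an error and multiplies those probabilities together. It assumes the layers fail independently, which is a simplification made for this example and not part of the model itself; diverse barriers make that assumption more plausible. The function name and the probabilities are hypothetical.

```python
from math import prod


def breach_probability(hole_probs):
    """Probability that an error slips through every layer.

    Assumes each layer fails independently with the given probability.
    """
    probs = list(hole_probs)
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return prod(probs)


two_layers = breach_probability([0.1, 0.1])
three_layers = breach_probability([0.1, 0.1, 0.1])  # one extra defensive layer
smaller_holes = breach_probability([0.05, 0.05])    # same layers, fewer latent conditions
```

Under these assumptions, both adding a third layer and shrinking the holes give a lower breach probability than the two-layer baseline.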
Operational And Management Processes
- All software systems have associated operational processes that reflect the assumptions of the designers about how these systems will be used.
- For example, in an imaging system in a hospital, the operator may have the responsibility of checking the quality of the images immediately after these have been processed.
- This allows the imaging procedure to be repeated if there is a problem.
Operational Processes
- Operational processes are involved in using the system for its defined purpose.
- These processes must be defined and documented during system development for new systems.
- Operators require training, and other work processes may need adaptation for effective new system use.
Personal And Enterprise IT Processes
- Designers for personal systems may describe expected system use, but have no control over user behavior.
- Enterprise IT systems may provide user training to teach users how to use the system.
- While user behaviour cannot be controlled, it is reasonable to expect that users will normally follow the defined process.
Process Design
- Operational and management processes are vital defence mechanisms that must balance efficient operation and problem management.
- Process improvement focuses on identifying and codifying good practice and developing software to support this.
- A focus on efficiency during process improvement can make dealing with problems more difficult.
Efficiency And Resilience
- Efficient process operation involves process optimization and control, information hiding and security, and role specialization.
- Problem management requires process flexibility and adaptability, information sharing and visibility, and manual processes with spare operator and manager capacity.
Coping With Failures
- Retaining redundant information or sharing information helps handle problems effectively.
- Operators and system managers often recover from issues, even if it requires breaking rules or working around the defined process.
- Operational processes should therefore be designed to be flexible and adaptable.
Information Provision And Management
- Presenting operators with necessary information when needed can increase efficiency.
- Operators may struggle to detect issues not directly affecting immediate tasks if shown only what the process designer considers necessary.
- Lack of a broad system overview complicates strategy formulation for dealing with problems.
Process Automation
- Process automation can affect system resilience both positively and negatively.
- Automated systems can efficiently detect problems, invoke cyberattack resistance, and initiate recovery.
- If the automated system cannot handle a problem, fewer people may be available to deal with it, and the automation itself may make the damage worse.
Disadvantages Of Process Automation
- Automated management systems may take unexpected actions that make problems worse and that system managers cannot understand.
- Problem solving is collaborative, so with fewer managers available it is likely to take longer to work out a recovery strategy.
Resilient Systems Design
- Critical services and assets are those elements of the system that allow a system to fulfill its primary purpose.
- For example, the critical services in a system that handles ambulance dispatch are those concerned with taking calls and dispatching ambulances.
- System components should be designed to support problem recognition, resistance, recovery, and reinstatement.
- For example, in an ambulance dispatch system, a watchdog timer may be included to detect if the system is not responding to events.
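A watchdog timer of the kind mentioned above can be sketched in Python as follows. This is a minimal illustration rather than a production design: the class name, the `kick`/`expired` methods, and the injectable `clock` parameter are all assumptions made for the example.

```python
import time


class WatchdogTimer:
    """Flags a monitored service as unresponsive if it is not 'kicked' in time."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock            # injectable so the timer can be tested
        self._last_kick = clock()

    def kick(self):
        """Called by the monitored service each time it handles an event."""
        self._last_kick = self._clock()

    def expired(self):
        """True if no kick has arrived within the timeout window."""
        return self._clock() - self._last_kick > self.timeout_s
```

A supervising component would poll `expired()` and, when it returns true, raise an alert or start a resistance or recovery action. `time.monotonic` is used as the default clock because it is not affected by changes to the system clock.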
Survivable Systems Analysis
- System understanding involves reviewing goals, requirements, and architecture.
- Critical service identification involves identifying the services that must always be maintained and the components required to maintain them.
- Attack simulation involves finding scenarios and use-cases for attacks along with system components that would be affected.
- Survivability analysis involves identifying essential and compromisable components and finding survivability strategies based on resistance, recognition and recovery.
Problems For Business Systems
- The starting point for survivability analysis is the system's requirements and architecture documentation, which causes problems for business systems.
- The analysis is not explicitly related to the business requirements for resilience.
- It assumes that a detailed requirements statement exists for the system.
Streams Of Work In Resilience Engineering
- First identify business resilience requirements.
- Next, plan how to bring systems back to their normal operating state.
- Then, identify system failures and cyberattacks that could compromise a system.
- Next, plan how to recover critical services quickly after damage from a cyberattack.
- Finally, test all aspects of resilience planning.
Maintaining Critical Service Availability
- Identify the system services that are most critical for the business.
- Determine the minimal quality of service that must be maintained.
- Determine how these services might be compromised.
- Determine how these services can be protected.
- Plan how to recover quickly if the services become unavailable.
- Assets may be hardware, software, data or people
Mentcare System Resilience
- Mentcare assists clinicians treating patients with mental health issues.
- It offers patient data and consultation records from doctors and nurses.
- It includes alerts for patients at risk of harm or suicidal tendencies.
- It’s based on a client-server architecture.
Critical Mentcare Services
- Information service about a patient's current diagnosis and treatment.
- A warning service that highlights patients that could pose a danger to others or to themselves.
- Complete patient record availability is NOT a critical service, because routine information is not normally required during consultations.
Assets Required for Normal Mentcare Service Operation
- The patient record database with all patient information.
- A database server providing database access for local clients.
- A network that allows for client/server communication.
- Local laptop or desktop computers used to access patient information.
- A rule set to identify if patients are dangerous, highlighting dangerous patients to system users.
Adverse Events Affecting Mentcare System
- The database server may be unavailable due to failure, a network issue, or a cyberattack.
- Patient records, or the rules that define at-risk designations, may be corrupted accidentally or deliberately.
- Client computers may be infected with malware.
- Unauthorized individuals may access client computers and so gain access to patient records.
Recognition And Resistance Strategies For Mentcare
Event | Recognition | Resistance |
---|---|---|
Server unavailability | Watchdog timer on the client that times out if there is no response to a client access. Text messages from the system manager to clinicians. | System architecture that maintains local copies of critical information. Peer-to-peer search across clients. Staff smartphones that can access the network if the server fails. Backup server. |
Patient database corruption | Record-level cryptographic checksums. Regular automatic checking of database integrity. Reporting system for incorrect information. | Replayable transaction log to update the database backup with recent transactions. Maintenance of local copies of patient information, and software to restore the database from local copies and backups. |
Malware infection of client computers | Reporting system where computer users can report unusual activity. Automated malware checks on startup. | Security awareness workshops for all system users. Disabling of USB ports on client computers. Automated system setup for new clients. Installation of security software. |
Unauthorized access to patient information | Warning text messages from users about possible intruders. Log analysis for unusual activity. | Multilevel system authentication process. Security awareness workshops for all system users. Disabling of USB ports on client computers. Access logging and real-time log analysis. |
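Record-level checksums, listed above as a way to recognize database corruption, can be sketched in Python with the standard `hashlib` module. The record layout, the function names, and the choice of SHA-256 over a canonical JSON encoding are assumptions made for this example; the source does not say how Mentcare computes its checksums.

```python
import hashlib
import json


def record_checksum(record):
    """SHA-256 digest of a canonical JSON encoding of one record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_corrupted(records, stored_checksums):
    """Return the ids of records whose current checksum differs from the stored one."""
    return [
        record_id
        for record_id, record in records.items()
        if record_checksum(record) != stored_checksums.get(record_id)
    ]
```

A plain hash like this detects accidental corruption. To detect deliberate tampering, the checksums would have to be stored where an attacker cannot rewrite them, or be replaced by a keyed construction such as the standard library's `hmac`.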
Architecture For Resilience
- Client computers store summary patient records locally and can exchange information over the network or, if the database is unavailable, over ad hoc connections, so doctors and nurses can still access essential patient information.
- A backup server holds snapshots of the main server and can take over if the main server fails.
- Database integrity checking and recovery software checks for database corruption and, if it finds any, initiates automatic recovery from backups; the transaction log is used to bring those backups up to date with recent changes.
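The idea of bringing a backup up to date from a transaction log can be sketched in Python as below. The log entry format (`seq`, `op`, `id`, `record`), the `put`/`delete` operations, and the `replay` function are hypothetical and chosen only for illustration; they are not taken from the Mentcare design.

```python
def replay(backup, log, since_seq):
    """Return a copy of the backup with log entries newer than since_seq applied in order."""
    restored = {record_id: dict(record) for record_id, record in backup.items()}
    for entry in sorted(log, key=lambda e: e["seq"]):
        if entry["seq"] <= since_seq:
            continue  # already reflected in the backup
        if entry["op"] == "put":
            restored[entry["id"]] = dict(entry["record"])
        elif entry["op"] == "delete":
            restored.pop(entry["id"], None)
    return restored
```

The backup itself is left untouched, so a failed or interrupted replay can simply be run again from the same starting point.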
Critical Service Maintenance
- Downloading information to the client before consultations begin allows them to continue without server access; only the information for patients scheduled to attend needs to be downloaded.
- The warning service can be implemented in the same way: the records of patients who may harm themselves or others are identified before download, so the software can highlight them to indicate that special care is required.
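The download-before-consultation approach can be sketched in Python as follows. The function name, the record fields, and the `special_care` flag are hypothetical, introduced only to illustrate the idea of a per-session local cache with at-risk patients marked in advance.

```python
def prepare_clinic_session(summaries, scheduled_ids, at_risk_ids):
    """Build the local cache for one clinic session.

    Only scheduled patients are included, and each cached record is
    marked if the patient was identified as at risk before download.
    """
    at_risk = set(at_risk_ids)
    cache = {}
    for patient_id in scheduled_ids:
        if patient_id in summaries:
            record = dict(summaries[patient_id])   # copy, so server data is not altered
            record["special_care"] = patient_id in at_risk
            cache[patient_id] = record
    return cache
```

Because the at-risk flag travels with the cached record, the client can highlight those patients even when the server is unreachable.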
Risks To Confidentiality
- To limit the number of records that could be compromised, download only the summary records of patients scheduled to attend the clinic.
- Encrypt the disk on local client computers so that an attacker who obtains a laptop cannot read the data.
- Securely delete the downloaded information at the end of each clinic session, which further reduces the chance of an attacker reaching confidential data.
- Encrypt all network transactions so that intercepted traffic cannot be read.
Key points
- System design requires defensive layers with a variety of tools to trap potential human and technical failures.
- To cope with problems, system processes should be designed to be flexible and adaptable.
- Designers should build problem recognition, resistance, recovery, and reinstatement into the system.
- Cyberattacks may come from either insiders or outsiders seeking entry to a system.
- System designers should implement multi-layer defences to trap cyberattacks before they gain entry.