Incident Response and Recovery PDF

Summary

This document provides an overview of incident response and recovery methods, emphasizing the importance of operational testing and ethical penetration testing. It explores the need for organizations to evaluate end-user activities and security policies and to understand the ethical implications of using penetration testing to evaluate systems.

Full Transcript

end users need to be able to accomplish everyday work (including successful handling of contingencies, special cases, and errors, so long as these were part of the use cases that drove the design and development). It is not acceptance testing—it does not check off each functional and nonfunctional requirement as satisfied or deficient; instead, OT&E is based on scenarios that have varying levels of operational realism in their flow of activities, test input data, and the conditions to be verified upon completion. OT&E evaluates, or at least provides insight about, the readiness of the people elements of the system under test as much as it does the hardware, software, data, and communications capabilities. Quite often, OT&E discovers that the tacit knowledge of the end users and operators is not effectively captured in the system specifications or operational procedures; this can also reflect that business needs and the real world have moved on from the time the requirements were specified and design and development began. OT&E events usually run in separate test environments, with some degree of isolation from production systems and data, and can run from hours to days in nonstop duration.

This same lag between what the requirements asked the systems to do and what its users know (tacitly) that they actually do with the system can be a mixed blessing when you look to include OT&E activities as part of security assessment and testing. It’s quite likely, too, that your security team’s knowledge of the threats and vulnerabilities has also continued to evolve. For everyone, OT&E activities can and should be a great learning (and knowledge management) experience. OT&E provides a great opportunity to look for ways to operationally assess specific security concerns in a realistic setting. It is white-box testing primarily; the test team, planners, users, and security personnel are all aware of the test and have as perfect knowledge about the system and the scenarios as possible.

Ethical Penetration Testing

Ethical penetration testing involves running attacks against your own systems; you (or, more correctly, your organization) contractually spell out what tests to accomplish, what objectives to attempt to achieve, and what limitations, constraints, and special conditions apply to every aspect of the ethical penetration testing process. Emphasis must be placed on that first word—ethical—because the people planning, conducting, and reporting on these tests work for your organization, either as direct employees or via contracts. Their loyalties must be with your organization; they have to be your “white hats for hire” because you are trusting them with your most vital business secrets: knowledge of the vulnerabilities in your security posture.

Ethical penetration testing, therefore, depends upon the trust relationship between testers and the target organization; it depends upon the integrity of those testers, including their absolute adherence to contract terms or statements of work regarding your need to have them protect the confidentiality of all information about your systems and your processes that they gather, observe, learn, or evaluate as part of the testing.
Ethical penetration testing also depends upon a legally binding written agreement that grants specific permissions to the test team to attempt to penetrate your facilities and your systems, attempt deceptions (such as social engineering), plant false data or malware, or take actions that change the state of your systems. In most jurisdictions around the world, it is illegal to perform such actions without the express consent of the owners or responsible managers of the systems in question—so this contract or agreement is all that keeps your ethical penetration testers out of jail for doing what you’ve asked them to do! (This is not a good time to save some money by hiring convicted former hackers simply because they seem to know their technical stuff, without some very powerful and enforceable legal assurances that your testers will turn over all copies of all data about your systems and retain absolutely nothing about them once testing and reporting is completed.)

Even with such contracts in place or detailed, written permission in hand, things can always go wrong during any kind of testing, especially during penetration testing. Such tests are attempting to do what an advanced persistent threat would do if it was attacking your systems. Test activities could inadvertently crash your systems, corrupt data, degrade throughput, or otherwise disrupt your normal business operations; if things go horribly wrong, the actions the penetration testers are taking could jump from your systems out into the wild and in effect springboard their attack onto some third-party systems—whose owners or managers no doubt have not signed your penetration test plan and contract.

It’s beyond the scope of this book to delve further into ethical penetration testing. One good resource is Chapter 6 of Gray Hat Hacking, Fifth Edition,2 which provides an excellent overview of penetration testing from the insider’s perspective. It also makes the point that penetration testing, as with other security assessments, should confirm what is working properly as much as it should find vulnerabilities that need correction (whether you knew about them before but hadn’t done anything about them yet or not).

2 Allen Harper, Daniel Regalado, Ryan Linn, Stephen Sims, Branko Spasojevic, Linda Martinez, Michael Baucom, Chris Eagle, and Shon Harris (2018). Gray Hat Hacking: The Ethical Hacker’s Handbook, Fifth Edition. McGraw-Hill Education. ISBN-13: 978-1260108415.

✔✔ Pen Testing and Moral Hazards

We normally think of the ethical in ethical penetration testing as describing the pen tester’s honesty, integrity, and ultimately their professional dedication to their client. Pen testing by its nature is trying to break your systems; it’s trying to find exploitable weaknesses, and oftentimes this involves placing your people under the microscope of the pen test. For the test results to be meaningful, you need your people to act as if everything is normal; they should respond to strange events (which might be test injects) as they have been trained to. The testing is evaluating the effectiveness of that training and how well it really equips your end users to do their bit in keeping your systems secure.
Security testing of any kind can quickly lose its value to your organization if the workforce perceives it as nothing more than a tool to weed out workers who need to be moved to less sensitive jobs or out of the organization completely. Security testing is a legitimate and necessary means to assess training effectiveness, people effectiveness, and systems functionality, as they all contribute to organizational security and success. These aims must be harmonized with keeping your workforce engaged with security, while avoiding having them see it as a thinly disguised reduction in staffing levels.

Assessment-Driven Training

Whether your assessment results indicate findings (of problems to fix) or good findings (celebrating the things the organization is doing well), each set of assessment results is an opportunity to improve the effectiveness of the human element in your information security system of systems. Problems or recommendations for corrective actions are typically exploited as opportunities to identify the procedural elements that could be improved, as well as possibly identifying the need for refresher training, deeper skills development training, or a more effective engagement strategy with some or all of your end users.

The good news—the good findings—are the gold that you share with your end users. It’s the opportunity to share with them the “wins” over the various security threats that the total organization team has achieved; it’s a time to celebrate their wins over the APTs, offer meaningful appreciation, and seek their input on other ways to improve the overall security posture. Sadly, many organizations are so focused on threat and risk avoidance that they fail to reap the additional benefits of sharing successes with the workforce that made it possible. (It does require that assessment analysts make the effort to identify these good findings in their overall assessment reports; one might argue that this is an ethical burden that these analysts share with management.)

Post-assessment debriefs to your end-user groups that were affected by or involved with the assessment can be both revealing and motivating. Questions and discussions can identify potential areas of misunderstanding about security needs, policies, and controls, or highlight opportunities to better prepare, inform, and train users in their use of these controls. Each such bit of dialogue, along with more informal conversations that you and your other team members have with end users, is an opportunity to further empower your end users as teammates; it can help them to be more intentional and more purposeful in their own security hygiene efforts. Be sure to invite end users to post-assessment debriefs and discuss both findings and good findings with them.

Design and Validate Assessment, Test, and Audit Strategies

Projects require creating a methodology and scope for the project, and security assessment and audit efforts are no different. Management must determine the scope and targets of the assessment, including what systems, services, policies, procedures, and practices will be reviewed, and what standard, framework, or methodology the organization will select or create as the foundation of the assessment.
Commonly used industry frameworks include the following:

- NIST SP 800-53A, “Assessing Security and Privacy Controls in Federal Information Systems and Organizations.”
- NIST SP 800-115, “Technical Guide to Information Security Testing and Assessment.” This is an important information source for you, as it provides an in-depth explanation of information systems testing, penetration testing, assessment, analysis, and reporting.
- ISO/IEC 18045, “Information technology – Security techniques – Methodology for IT security evaluation,” and the related controls standard ISO/IEC 27002, “Information technology – Security techniques – Code of practice for information security controls.”
- ISO/IEC 15408, “Information technology – Security techniques – Evaluation criteria for IT security,” also known as the Common Criteria.

Although NIST standards may appear U.S.-centric at first glance, they are used as a reference for organizations throughout the world if there is not another national, international, or contractual standard those organizations must meet. In addition to these broad standards, specific standards like the ISA/IEC 62443 series of standards for industrial automation and control systems may be used where appropriate.

Using a standard methodology or framework allows consistency between assessments, allowing comparisons over time and between groups or divisions. In many cases, organizations will conduct their own internal assessments using industry standards as part of their security operations efforts, and by doing so, they are prepared for third-party or internal audits that are based on those standards.

In addition to choosing the standard and methodology, it is important to understand that audits can be conducted as internal audits using the organization’s own staff or as external audits using third-party auditors. In addition, audits of third parties like cloud service providers can be conducted. Third-party audits most often use external auditors, rather than your organization’s own staff.

Once the high-level goals and scope have been set and the assessment standard and methodology have been determined, assessors need to determine further details of what they will examine. Detailed scoping questions may include the following:

- What portions of the network and which hosts will be tested?
- Will auditing include a review of user files and logs?
- Is susceptibility of staff to social engineering being tested?
- Are confidentiality, integrity, and availability in scope?
- Are there any privacy concerns regarding the audit and the data it collects?
- Will processes, standards, and documentation be reviewed?
- Are employees and adherence to standards being examined?
- Are third-party service providers, cloud vendors, or other organizations part of the assessment?

Other aspects of security are also important. A complete assessment should include answers to these questions:

- Are architectural designs documented, with data flows and other details matching the published design?
- Are things designed securely from the beginning of the design process?
- Is change management practiced? Does a configuration management database exist?
- Are assets tracked?
- Are regular vulnerability scans, and maybe even penetration tests, conducted?
- Are policies, procedures, and standards adhered to?
- Is the organization following industry-recognized best practices?
Budget and time constraints can make it impossible to test everything, so management must determine what will be included while balancing their assessment needs against their available resources. Once the goals, scope, and methodology have been determined, the assessment team must be selected. The team may consist of the company’s own staff, or external personnel may be retained. Factors that can aid in determining which option to select can include industry regulations and requirements, budget, goals, scope, and the expertise required for the assessment.

With the team selected, a plan should be created to identify how to meet the assessment’s goals in a timely manner and within the budget constraints set forth by management. With the plan in place, the assessment can be conducted. This phase should generate significant documentation on how the assessment target complies or fails to comply with expectations. Any exceptions and noncompliance must be documented. Once the assessment activities are completed, the results can be compiled and reported to management.

Upon receipt of the completed report, management can create an action plan to address the issues found during the audit. For instance, a timeframe can be set for installing missing patches and updates on hosts, or a training plan can be created to address process issues identified during the assessment.

Interpretation and Reporting of Scanning and Testing Results

Your security assessment workflow doesn’t stop when the tests are done and the scans are complete. In many respects, this is when the hardest task begins: analyzing and assessing what those tests and scans have told you and trying to determine what they mean with respect to your security posture, a particular set of security or risk controls, or a potential threat. NIST 800-115 provides succinct but potent guidance on this subject when it says that (in the context of security assessment and testing) the purpose of analysis is to identify false positives, identify and categorize vulnerabilities, and determine (if possible) the underlying cause(s) of the vulnerabilities that have been detected. Once that analysis is complete, you can then make informed judgments as to whether each vulnerability represents a risk to avoid, accept, transfer, or treat. You’re also in a more informed position to recommend risk treatment approaches, some of which may need further analysis and study to determine costs, implementation strategies, and anticipated payback periods.

Root-cause analysis (RCA) is a simple but powerful technique to apply here, as you’re struggling to reduce e-mountains of test data into actionable intelligence and reporting for your senior managers and leaders. RCA is essentially asking “why?” over and over again, until you’ve chased back through proximate causes and contributing factors to find the essential best opportunity to resolve the problem. NIST 800-115 identifies a variety of categories of root or contributing (proximate) causes of vulnerabilities.3

3 NIST 800-115, 2008, p. 58.

- Insufficient patch management, such as failing to apply patches in a timely fashion or failing to apply patches to all vulnerable systems.
- Insufficient threat management, including outdated antivirus signatures, ineffective spam filtering, and firewall rulesets that do not enforce the organization’s security policy.
- Lack of security baselines, such as inconsistent security configuration settings on similar systems.
- Poor integration of security into the system development life cycle, such as missing or unsatisfied security requirements and vulnerabilities in organization-developed application code.
- Security architecture weaknesses, such as security technologies not being properly integrated into the infrastructure (e.g., poor placement, insufficient coverage, or outdated technologies), or poor placement of systems that increases their risk of compromise.
- Inadequate incident response procedures, such as delayed responses to penetration testing activities.
- Inadequate training, both for end users (e.g., failure to recognize social engineering and phishing attacks, deployment of rogue wireless access points) and for network and system administrators (e.g., deployment of weakly secured systems, poor security maintenance).
- Lack of security policies or policy enforcement (e.g., open ports, active services, unsecured protocols, rogue hosts, weak passwords).

As you do your analysis, characterize your conclusions into two broad sets: findings and good findings. On the one hand, findings are the recommendations you’re making for corrective action; they identify problems, deficiencies, hazards, or vulnerabilities that need prompt attention. Your analysis may or may not provide enough insight to recommend a particular approach to resolving the finding, but that’s not immediately important. Getting management’s attention on the findings should be the priority. On the other hand, good findings are the positive acknowledgment that previously instituted security controls and procedures are working properly and that the investment of time, money, and people power in creating, installing, using, maintaining, and monitoring these controls is paying off. Management and leadership need to hear this as well. (Ethical penetration testers often make good use of this analysis and reporting tactic; it helps keep things in perspective.)

Remediation Validation

So, you found a risk or a vulnerability, and you decided to fix it; you’ve put some kind of control in place that in theory or by design is supposed to eliminate the risk or reduce it to a more acceptable level. Perhaps part of that remediation includes improving the affected component’s ability to detect and generate alarms concerning precursors or indicators of possible attempts to attack the component. Common sense dictates that before turning that risk control and the new versions of the affected systems or applications over to end users, some type of regression testing and acceptance testing must be carried out. Two formal test processes, often conducted together, are used to validate that risk remediation actions do what is required without introducing other disruptions into the system. Security acceptance testing validates that the risk control effectively does what is required by the risk mitigation plan and that any residual risks are less than or equal to what was anticipated (and approved by management) in that plan. Regression testing establishes confidence that the changes to the component (the fix to the identified problem or vulnerability) did not break other required functions; the fix didn’t introduce other errors into the system.
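As a concrete illustration (a minimal sketch of my own, not an example from this text), the Python snippet below pairs the two processes: one test asserts the new behavior the remediation was supposed to add (the security acceptance check), while the others re-assert behavior that already worked before the fix (the regression checks). The validate_password() function and its rules are hypothetical stand-ins for whatever component your fix actually changed.

import re

def validate_password(password: str) -> bool:
    """Post-fix implementation: now enforces a 12-character minimum (the remediation)."""
    if len(password) < 12:  # the new control added by the fix
        return False
    # Pre-existing rule: letters and digits are both required.
    if not re.search(r"[A-Za-z]", password) or not re.search(r"\d", password):
        return False
    return True

# Security acceptance test: does the fix do what the mitigation plan requires?
def test_fix_rejects_short_passwords():
    assert not validate_password("Abc123")  # was accepted before the fix

# Regression tests: did the fix break anything that already worked?
def test_valid_passwords_still_accepted():
    assert validate_password("CorrectHorse42X")

def test_digitless_passwords_still_rejected():
    assert not validate_password("NoDigitsHereAtAll")

if __name__ == "__main__":
    for check in (test_fix_rejects_short_passwords,
                  test_valid_passwords_still_accepted,
                  test_digitless_passwords_still_rejected):
        check()
    print("acceptance and regression checks passed")

In practice, checks like these would live in the component’s existing test suite and run automatically before the remediated version is promoted to production.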
Unfortunately, it’s all too common to discover that security acceptance testing (or regression testing) has identified additional items of risk or levels of residual risk that go beyond what was anticipated when the decision was made to apply the particular mitigation technique in question. At this point, the appropriate levels of management and leadership need to be engaged; it is their responsibility to decide whether to accept this changed risk posture and migrate the control into production systems for operational use or to continue to accept the risk as originally understood while “going back to the drawing board” for a better fix to the vulnerability and its root cause.

Audit Finding Remediation

In almost all cases, audit findings present your organization with a set of deficiencies that must be resolved within a specified period of time. Depending upon the nature and severity of the findings and the audit standards themselves, your business might be debarred (blocked) from continuing to engage in those types of business operations until you can prove that the deficiencies have been remediated successfully. This might require the offending systems and procedures be subjected to a follow-on audit or third-party inspection. Less severe audit findings might allow your organization to provisionally continue to operate the affected systems, but perhaps with additional temporary safeguards (such as increased monitoring and inspection) or other types of compensating controls until the remediation can be successfully demonstrated.

Naturally, this suggests that finishing the problem-solving analysis regarding each audit finding, identifying and scoping the cost-effective remediation options, and successfully implementing management’s chosen risk control are key to staying in the good graces of your auditors and the regulatory authorities they represent. As with any other risk control, your implementation planning for controls related to audit findings should contain a healthy dose of regression and acceptance testing. It should also have clearly defined decision points for management and leadership to sign off on the as-tested fix and commit to having it moved into production systems and use.

The final audit findings closure report package should also contain the relevant configuration management and change control records pertaining to the systems elements affected by the finding and its remediation; don’t forget to include operational procedures in this too!

Manage the Architectures: Asset Management and Configuration Control

Think back to how much work it was to discover, understand, and document the information architecture that the organization uses and then the IT architectures that support that business logic and data. Chances are that during your discovery phase, you realized that a lot of elements of both architectures could be changed or replaced by local work unit managers, group leaders, or division directors, all with very little if any coordination with any other departments. If that’s the case, you and the IT director, or the chief information security officer and the CIO, may have an uphill battle on your hands as you try to convince everyone that proper stewardship does require more central, coordinated change management and control than the company is accustomed to.
The definitions of these three management processes are important to keep in mind:

- Asset management is the process of identifying everything that could be a key or valuable asset and adding it to an inventory system that tracks information about its acquisition costs, its direct users, its physical (or logical) location, and any relevant licensing or contract details. Asset management also includes processes to periodically verify that “tagged property” (items that have been added to the formal inventory) is still in the company’s possession and has not disappeared, been lost, or been stolen. It also includes procedures to make changes to an asset’s location, use, or disposition.
- Configuration management is the process by which the organization decides what changes in controlled systems baselines will be made, when to implement them, and the verification and acceptance needs that the change and business conditions dictate as necessary and prudent. Change management decisions are usually made by a configuration management board, and that board may require impact assessments as part of a proposed change.
- Configuration control is the process of regulating changes so that only authorized changes to controlled systems baselines can be made. Configuration control implements what the configuration management process decides and prevents unauthorized changes. Configuration control also provides audit capabilities that can verify that the contents of the controlled baseline in use today are in fact what they should be.

What’s at Risk with Uncontrolled and Unmanaged Baselines?

As a member of your company’s information security team, consider asking (or looking yourself for the answers to!) the following kinds of questions:

- How do you know when a new device, such as a computer, phone, packet sniffer, etc., has been attached to your systems or networks?
- How do you know that one of your devices has “gone missing,” possibly with a lot of sensitive data on it?
- How do you know that someone has changed the operating system, updated the firmware, or updated the applications that are on your end users’ systems?
- How do you know that an update or recommended set of security patches, provided by the systems vendor or your own IT department, has actually been implemented across all of the machines that need it?
- How do you know that end users have received updated training to make good use of these updated systems?

This list should remind you of NIST 800-115’s list of root causes of vulnerabilities that you examined in the “Interpretation and Reporting of Scanning and Testing Results” section. If you’re unable to get good answers to any of these kinds of questions, from policy and procedural directives, from your managers, or from your own investigations, you may be working in an environment that is ripe for disaster.

Auditing Controlled Baselines

To be effective, any management system or process must collect and record the data used to make decisions about changes to the systems being managed; they must also include ways to audit those records against reality. For most business systems, you need to consider three different kinds of baselines: recently archived, current operational, and ongoing development. Audits against these baselines should be able to verify that:

- The recently archived baseline is available for fallback operations if that becomes necessary. If this happens, you also need to have an audited list of what changes (including security fixes) are included in it and which documented deficiencies are still part of that baseline.
- The current operational baseline has been tested and verified to contain proper implementation of the changes, including security fixes, which were designated for inclusion in it.
- The next ongoing development baseline has the set of prioritized changes and security fixes included in its work plan and verification and test plan.

Audits of configuration management and control systems should be able to verify that the requirements and design documentation, source code files, builds and control systems files, and all other data sets necessary to build, test, and deploy the baseline contain authorized content and changes only. This was covered in more depth in Chapter 1.

OPERATE AND MAINTAIN MONITORING SYSTEMS

Traditional approaches to security process data collection involved solution-specific logging and data capture, sometimes paired with a central SIEM or other security management device. As organizational IT infrastructure and systems have become more complex, security process data has also increased in complexity and scope. As the pace of change of your systems, your business needs, and the threat environment continue to accelerate, this piecemeal approach to monitoring applications, systems, infrastructures, and endpoints is no longer workable.

Note: Extending your security monitoring systems to include OT systems, such as smart buildings, ICS, SCADA, or IoT, has its own challenges, which are covered in the Appendix.

Information security continuous monitoring (ISCM) is a holistic strategy to improve and address security. ISCM is designed to align facets of the organization including the people, the processes, and the technologies that make up the IT infrastructure, networks, systems, core applications, and endpoints. As with any security initiative, it begins with senior management buy-in. The most effective security programs consistently have upper management support. This creates an environment where the policies, the budget, and the vision for the company all include security as a cornerstone of the company’s success.

Implementing a continuous information security monitoring capability should improve your ability to do the following:

- Monitor all systems.
- Understand threats to the organization.
- Assess security controls.
- Collect, correlate, and analyze security data.
- Communicate security status.
- Actively manage risk.

A number of NIST publications, and others, provide planning and implementation guidance for bringing ISCM into action within your organization. Even if you’re not in the U.S. Federal systems marketplace, you may find these provide a good place to start:

- NIST SP 800-137, “Information Security Continuous Monitoring (ISCM) for Federal Information Systems and Organizations” (https://csrc.nist.gov/publications/detail/sp/800-137/final).
- Cloud Security Alliance STAR level 3 provides continuous monitoring-based certification (https://cloudsecurityalliance.org/star/continuous/).
- The FedRAMP Continuous Monitoring Strategy Guide (https://www.fedramp.gov/assets/resources/documents/CSP_Continuous_Monitoring_Strategy_Guide.pdf).

Most of these show a similar set of tasks that organizations must accomplish as they plan for, implement, and reap the benefits of an effective ISCM strategy.
- Define the strategy based on the organization’s risk tolerance.
- Formally establish an ISCM program by selecting metrics.
- Implement the program and collect the necessary data, ideally via automation.
- Analyze and report findings, and determine the appropriate action.
- Respond to the findings based on the analysis, using standard options such as risk mitigation, risk transference, risk avoidance, or risk acceptance.
- Plan strategy and programs as needed to continually increase insight and visibility into the organization’s information systems.

✔✔ ISCM Is a Strategy; SIEM Is Just One Tool

Don’t confuse the overall tasks you need to get done with the marketing copy describing a tool you may want to consider using. Security information and event management (SIEM) systems have become increasingly popular over the last few years; be cautious, however, as you consider them for a place in your overall security toolkit. Your best bet is to focus first on what jobs the organization needs to get done and how those jobs need to be managed, scheduled, and coordinated, as well as how the people doing them need to be held accountable for producing on-time, on-target results. Once you understand that flow of work and the metrics such as key performance indicators (KPIs) or key risk indicators (KRIs) that you’ll manage it all with, you’ll be better able to shop for vendor-supplied security information management and analysis tools.

It’s prudent to approach an ISCM project in a step-by-step fashion; each step along the way, as that task list suggests, offers the opportunity for the organization to learn much more about its information systems architectures and the types of data their systems can generate. With experience, your strategies for applying continuous monitoring as a vital part of your overall information security posture will continue to evolve.

ISCM has become increasingly complex as organizations spread their operations into hosted and cloud environments and as they need to integrate third parties into their data-gathering processes. Successful ISCM now needs to provide methods to interconnect legacy ISCM processes with third-party systems and data feeds. Be mindful, too, that compliance regimes (and their auditors) are becoming increasingly more aware of the benefits of a sound ISCM strategy and will be looking to see how your organization is putting one into practice. Let’s take a closer look at elements of an ISCM program; you may already have many of these in place (as part of “traditional” or legacy monitoring strategies).

Events of Interest

Broadly speaking, an event of interest is something that happens (or is still ongoing) that may have a possible information systems security implication or impact to it. It does not have to be an ongoing attack in and of itself to be “of interest” to your security operations center or other IT security team members. Vulnerability assessments, threat assessments, and operational experience with your IT infrastructures, systems, and applications should help you identify the categories of events that you want to have humans (or machine learning systems) spend more time and effort analyzing, to determine if they are a warning sign of an impending attack or an attack in progress. Root-cause analysis should help you track back to the triggering events that may lead to a series of other events that culminate in the event of interest that you want to be alarmed about.
(Recall that by definition an event changes something in your system.) Let’s start with the three broad categories of events or indicators that you’ll need to deal with. Think of each as a step in a triage process: the further along this list you go, the greater the likelihood that your systems are in fact under attack and that you need to take immediate action.

First, let’s look at precursor events. A precursor is a signal or observable characteristic of the occurrence of an event; the event itself is not an attack but might indicate that an attack could happen in the future. Let’s look at a few common examples to illustrate this concept:

- Server or other logs that indicate a vulnerability scanner has been used against a system
- An announcement of a newly found vulnerability by a systems or applications vendor, an information security service, or a reputable vulnerabilities and exploits reporting service that might relate to your systems or platforms
- Media coverage of events that put your organization’s reputation at risk (deservedly or not)
- Email, phone calls, or postal mail threatening an attack on your organization, your systems, your staff, or those doing business with you
- Increasingly hostile or angry content in social media postings regarding customer service failures by your company
- Anonymous complaints in employee-facing suggestion boxes, ombudsman communications channels, or even graffiti in the restrooms or lounge areas

Genuine precursors—ones that give you actionable intelligence—are quite rare. They are often akin to the “travel security advisory codes” used by many national governments. They rarely provide enough insight that something specific is about to take place. The best you can do when you see such potential precursors is to pay closer attention to your indicators and warnings systems, perhaps by opening up the filters a bit more. In doing so, you’re willing to accept more false positive alarms, and the time and effort to assess them, as the price to pay to avoid a false negative (a genuine attack spoofing its way into your systems going overlooked). You might also consider altering your security posture in ways that might increase protection for critical systems, perhaps at the cost of reduced throughput due to additional access control processing.

An indicator is a sign, signal, or observable characteristic of an event suggesting that an information security incident may have occurred or may be occurring right now. Common examples of indicators include:

- Network intrusion detectors generate an alert when input buffer overflows might indicate attempts to inject SQL or other script commands into a web page or database server.
- Antivirus software detects that a device, such as an endpoint or removable media, has a suspected infection on it.
- Systems administrators, or automated search tools, notice filenames containing unusual or unprintable characters.
- Access control systems notice a device attempting to connect which does not have required software or malware definition updates applied to it.
- A host or an endpoint device does an unplanned restart.
- A new or unmanaged host or endpoint attempts to join the network.
- A host or an endpoint device notices a change to a configuration-controlled element in its baseline configuration.
- An applications platform logs multiple failed login attempts, seemingly from an unfamiliar system or IP address.
- Email systems and administrators notice an increase in the number of bounced, refused, or quarantined emails with suspicious content, or ones with unknown addressees.
- Unusual deviations in network traffic flows or systems loading are observed.

One type of indicator worth special attention is called an indicator of compromise (IOC), which is an observable artifact that signals with high confidence that an information system has been compromised or is in the process of being compromised. Such artifacts might include recognizable malware signatures, attempts to access IP addresses or URLs known or suspected to be of hostile or compromising intent, or domain names associated with known or suspected botnet control servers. The information security community is working to standardize the format and structure of IOC information to aid in rapid dissemination and automated use by security systems.

As you’ll see in Chapter 4, the fact that detection is a war of numbers is both a blessing and a curse; in many cases, even the first few “low and slow” steps in an attack may create dozens or hundreds of indicators, each of which may, if you’re lucky, contain information that correlates them all into a suspicious pattern. Of course, you’re probably dealing with millions of events to correlate, assess, screen, filter, and dig through to find those few needles in that field of haystacks.

There’s strong value in also characterizing events of interest in terms of whether they are anomalies, intrusions, unauthorized changes, or event types you are doing extra monitoring of to meet compliance needs. Let’s take a closer look.

Anomalies

In general terms, an anomaly is any event that is out of the ordinary, irregular, or not quite normal. Endpoint systems that freeze for a few seconds and then seem to come back to life with no harm done are anomalies. Timeouts, or time synchronization mismatches between devices on your network, may also be anomalies. Failures of disk drives to respond correctly to read, write, or positioning commands may be indicators of incipient hardware failures, or of contention for that device from multiple process threads. In short, until you know something odd has occurred, and its “oddness” has passed your filter and you’ve decided it’s worth investigating, you probably won’t know the anomaly occurred or whether it was significant (as an event of interest) until you gather up all of the log data for the affected systems and analyze it.

There are some anomalous events that ought to be considered suspicious, perhaps even triggering immediate alarms to security analysts and watch officers. Unscheduled systems reboots or restarts, or re-initializations of modems, routers, switches, or servers, usually indicate either that there’s an unmanaged software or firmware update process going on, that a hung application has tempted a user into a reboot as a workaround, or that an intruder is trying to cover their tracks. It’s not that our systems are so bug-free that they never hang, never need a user-initiated reboot, or never crash and restart themselves; it’s that each time this happens, your security monitoring systems should know about it in a timely manner and, if conditions warrant, send up an alarm to your human security analysts.
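To make that concrete, here is a minimal sketch (my own illustration, not from the source text) of how a monitoring script might flag restart events that fall outside an approved maintenance window. The log line layout, the marker strings, and the window itself are hypothetical assumptions; a real deployment would read from your actual log pipeline and raise alarms through your alerting system.

from datetime import datetime, time

# Assumed marker strings that indicate a restart in this hypothetical log format.
REBOOT_MARKERS = ("kernel: reboot", "systemd: Startup finished", "restart requested")
MAINTENANCE_WINDOW = (time(2, 0), time(4, 0))  # planned reboots happen 02:00-04:00

def in_maintenance_window(ts: datetime) -> bool:
    start, end = MAINTENANCE_WINDOW
    return start <= ts.time() <= end

def scan_for_unscheduled_reboots(log_lines):
    """Yield (timestamp, line) for restart events outside the approved window."""
    for line in log_lines:
        if any(marker in line for marker in REBOOT_MARKERS):
            # Assumed log layout: ISO 8601 timestamp, a space, then the message.
            ts = datetime.fromisoformat(line.split(" ", 1)[0])
            if not in_maintenance_window(ts):
                yield ts, line

if __name__ == "__main__":
    sample = [
        "2019-03-14T02:15:00 systemd: Startup finished in 4.2s",  # planned
        "2019-03-14T13:47:22 kernel: reboot: machine restart",    # suspicious
    ]
    for ts, line in scan_for_unscheduled_reboots(sample):
        print(f"ALERT {ts}: unscheduled restart -> {line}")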
Intrusions

Intrusions occur because something happens that allows an intruder to bypass either the access control systems you’ve put in place or your expectations for how well those systems are defending you against an intrusion. Let’s recap some of the ways intruders can gain access to your systems:

- You’ve left the factory default usernames and passwords set on anything, even guest access.
- Your network communications devices, especially wireless access points, are physically accessible and can be manually triggered to install a bogus firmware update.
- Your chosen identity authentication approaches have exploitable vulnerabilities in them.
- A user’s login credentials have been compromised, exposed, intercepted, or copied.
- An otherwise trustworthy employee becomes a disgruntled employee or has been coerced or incentivized to betray that trust.
- A social engineering attacker discovers sufficient information to be able to impersonate a legitimate user.
- An endpoint device has been lost, stolen, or otherwise left untended long enough for attackers to crack its contents and gain access information.
- An attacker can find or access an endpoint device which an authorized user has left logged in, even if only for a few minutes.
- Keystroke loggers or other endpoint surveillance technologies permit an attacker to illicitly copy a legitimate user’s access credentials.

And so on. Other chapters in this book offer ways to harden these entry points into your systems; when those hardening techniques fail, and they will, what do you do to detect an intrusion while it is taking place, rather than waiting until a third party (such as law enforcement) informs you that you’ve been the victim of a data breach? Your organization’s security needs should dictate how strenuously you need to work to detect intrusions (which by definition are an unauthorized and unacceptable entry by a subject, in access control terms, into any aspect of your systems); detection and response will be covered in Chapter 4.

Unauthorized Changes

Configuration management and configuration control must be a high priority for your organization. Let’s face it: If your organization does not use any type of formalized configuration management and change control, it’s difficult if not impossible to spot a change to your systems, hardware, networks, applications, or data in the first place, much less decide that it is an unauthorized change. Security policy is your next line of defense: administrative policies should establish acceptable use; set limits or establish procedures for controlling user-provided software, data, device, and infrastructure use; and establish programs to monitor and ensure compliance.

Automated and semi-automated tools and utilities can help significantly in detecting and isolating unauthorized changes:

- Many operating systems and commercial software products now use digital signatures on individual files and provide auditing tools that can verify that all the required files for a specific version have been installed.
- Software blocked listing, typically done with antimalware systems, can identify known or suspected malicious code, code fragments, or associated files.
- Software allowed listing tools can block installation of any application not on the accepted, approved lists.
- Network scanning and mapping can find devices that may not belong, have been moved to a different location in the system, or have been modified from previous known good configurations.

It may be that some or all of your information systems elements are not under effective configuration management and control (or may even be operating with minimal access control protections). This can happen during mergers and acquisitions or when acquiring or setting up interfaces with special-purpose (but perhaps outdated) systems. Techniques and approaches covered in other chapters, notably Chapter 1, should be considered as input to your plan to bring such potentially hazardous systems under control and then into your overall IT architecture.

Compliance Monitoring Events

Two types of events can be considered as compliance monitoring events by their nature: those that directly trace to a compliance standard and thus need to be accounted for when they occur, and events artificially triggered (that is, not as part of routine business operations nor as part of a hostile intrusion) as part of compliance demonstrations.

Many compliance standards and regulations are becoming much more specific in terms of the types of events that have to be logged, analyzed, and reported on as part of their compliance regime. This has led to the development of a growing number of systems and services that provide what is sometimes called real-time compliance monitoring. These typically use a data mart or data warehouse infrastructure into which all relevant systems, applications, and device logs are updated in real time or near real time. Analysis tools, including but not limited to machine learning tools, examine this data to detect whether events have occurred that exceed predefined limits or constraint conditions. Many of these systems try to bridge the conceptual gap between externally imposed compliance regimes (imposed by law, regulation, contract, or standards) and the detail-level physical, logical, and administrative implementation of those compliance requirements. Quite often, organizations have found that more senior, policy-focused individuals are responsible for translating contracts, standards, or regulations into organizational administrative plans, programs, and policies, while more technically focused IT experts are implementing controls and monitoring their use.

The other type of compliance events might be seen when compliance standards require the use of deliberately crafted events, data injects, or other activities as part of verification and validation that the system meets the compliance requirements. Two types of these you might encounter are synthetic transactions and real user monitoring events.

Synthetic Transactions

Monitoring frequently needs to involve more than simple log reviews and analysis to provide a comprehensive view of infrastructure and systems. The ability to determine whether a system or application is responding properly to actual transactions, regardless of whether they are simulated or performed by real users, is an important part of a monitoring infrastructure. Understanding how a system or application performs, and how that performance impacts users as well as underlying infrastructure components, is critical to management of systems for organizations that want a view that goes deeper than whether their systems are up or down or under a high or low load.
Two major types of transaction monitoring are performed to do this: synthetic transactions and real user monitoring. Synthetic transactions are actions run against monitored objects to see how the system responds. The transaction may emulate a client connecting to a website and submitting a form, or viewing the catalog of items on a web page, which pulls the information from a database. Synthetic transactions can confirm the system is working as expected and that alerts and monitoring are functioning properly.

Synthetic transactions are commonly used with databases, websites, and applications. They can be automated, which reduces the workload carried by administrators. For instance, synthetic transactions can ensure that the web servers are working properly and responding to client requests. If an error is returned during the transaction, an alert can be generated that notifies responsible personnel. Therefore, instead of a customer complaining that the site is down, IT can proactively respond to the alert and remedy the issue while impacting fewer customers. Synthetic transactions can also measure response times, allowing staff to proactively respond to and remediate slowdowns, or mimic user behavior when evaluating newly deployed services prior to deploying them to production. (A minimal code sketch of such a check appears after the discussion of real user monitoring below.)

Synthetic transactions can be used for several functions, including the following:

- Application monitoring: Is an application responsive, and does it respond to queries and input as expected?
- Service monitoring: Is a selected service, such as a website or file server, responding to requests in a timely manner?
- Database monitoring: Are back-end databases online and responsive?
- TCP port monitoring: Are the expected ports for an application or service open, listening, and accepting connections?
- Network services: Are the DNS and DHCP servers responding to queries? Is the domain controller authenticating users?

Real User Monitoring

Real user monitoring (RUM) is another method to monitor the environment. Instead of creating automated transactions and interactions with an application, the developer or analyst monitors actual users interacting with the application, gathering information based on actual user activity. Real user monitoring is superior to synthetic transactions when actual user activity is desired. Real people will interact with an application in a variety of ways that synthetic transactions cannot emulate, because real user interactions are harder to anticipate. However, RUM can also generate much more information for analysis, much of which is spurious, since it will not be specifically targeted at what the monitoring process is intended to review. This can slow down the analysis process or make it difficult to isolate the cause of performance problems or other issues. In addition, RUM can be a source of privacy concerns because of the collection of user data that may include personally identifiable information, usage patterns, or other details.

Synthetic transactions can emulate certain behaviors on a scheduled basis, including actions that a real user may not perform regularly or predictably. If a rarely used element of an application needs testing and observation, a synthetic transaction is an excellent option, whereas the developer or analyst may have to wait for an extended amount of time to view the transaction when using RUM.
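Here is that sketch: a minimal, hedged example (my illustration, not from the source text) of a scripted synthetic transaction that checks availability, expected content, and response time for a hypothetical web endpoint. The URL, marker text, and latency threshold are assumptions; a production version would run on a schedule and feed alerts into your monitoring or SIEM pipeline rather than printing them.

import time
import urllib.error
import urllib.request

TARGET_URL = "https://shop.example.com/catalog"  # hypothetical monitored endpoint
EXPECTED_MARKER = "Product Catalog"              # text a healthy page should contain
MAX_LATENCY_SECONDS = 2.0                        # alert threshold for slow responses

def run_synthetic_check(url: str) -> list[str]:
    """Return a list of alert messages; an empty list means the check passed."""
    alerts = []
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            elapsed = time.monotonic() - start
            if EXPECTED_MARKER not in body:
                alerts.append("expected content missing from response")
            if elapsed > MAX_LATENCY_SECONDS:
                alerts.append(f"slow response: {elapsed:.2f}s")
    except (urllib.error.URLError, TimeoutError) as exc:
        # Covers connection failures, timeouts, and non-2xx HTTP errors.
        alerts.append(f"request failed: {exc}")
    return alerts

if __name__ == "__main__":
    for problem in run_synthetic_check(TARGET_URL):
        print(f"ALERT [{TARGET_URL}]: {problem}")

Scheduling a check like this every few minutes gives the around-the-clock availability monitoring described above, without waiting for a real user to trip over the failure.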
By using a blend of synthetic transactions and real user monitoring, the effectiveness of an organization’s testing and monitoring strategy can be significantly improved. Downtime can be reduced because staff are alerted more quickly when issues arise. Application availability can be monitored around the clock without human intervention. Compliance with service level agreements can also be accurately determined. The benefits of using both types of monitoring merit consideration.

Logging

Logs are generated by most systems, devices, applications, and other elements of an organization’s infrastructure. They can be used to track changes, actions taken by users, service states and performance, and a host of other purposes. These events can indicate security issues and highlight the effectiveness of security controls that are in place. Assessments and audits rely on log artifacts to provide data about past events and changes and to indicate whether there are ongoing security issues, misconfigurations, or abuse issues. Security control testing also relies on logs, including those from security devices and security management systems.

The wide variety of logs, as well as the volume of log entries that can be generated by even a simple infrastructure, means that logs can be challenging to manage. Logs can capture a significant amount of information and can quickly become overwhelming in volume. They should be configured with industry best practices in mind, including implementing centralized collection, validation using hashing tools, and automated analysis of logs. Distinct log aggregation systems provide a secure second copy, while allowing centralization and analysis. In many organizations, a properly configured security information and event management (SIEM) system is particularly useful as part of both assessment and audit processes and can help make assessment efforts easier by allowing reporting and searches. Even when centralized logging and log management systems are deployed, security practitioners must strike a balance between capturing useful information and capturing too much information.

✔✔ CIANA+PS Applies to Log Files Too!

Maintaining log integrity is a critical part of an organization’s logging practice. If logs cannot be trusted, then auditing, incident response, and even day-to-day operations are all at risk, since log data is often used in each of those tasks. Thus, organizations need to assess the integrity of their logs as well as their existence, content, and relevance to their purpose. Logs should have proper permissions set on them, they should be hashed to ensure that they are not changed, a secure copy should be available in a separate secure location if the logs are important or require a high level of integrity, and of course any changes that impact the logs themselves should be logged!

Assessing log integrity involves validating that the logs are being properly captured, that they cannot be changed by unauthorized individuals or accounts, and that changes to the logs are properly recorded and alerted on as appropriate. This means that auditors and security assessors cannot simply stop when they see a log file that contains the information they expect it to. Instead, technical and administrative procedures around the logs themselves need to be validated as part of a complete assessment process.
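To make the hashing idea concrete, here is a minimal sketch (my own illustration, not a specific product’s mechanism) of a tamper-evident hash chain over log entries: each record’s SHA-256 digest folds in the previous digest, so altering or deleting any earlier entry invalidates every digest that follows it when the chain is verified. The sample entries are hypothetical.

import hashlib

def chain_digest(prev_hex: str, entry: str) -> str:
    """Digest of this entry, bound to the digest of everything before it."""
    return hashlib.sha256((prev_hex + entry).encode("utf-8")).hexdigest()

def build_chain(entries):
    """Return the list of per-entry digests, starting from a fixed seed value."""
    digests, prev = [], "0" * 64  # well-known genesis value
    for entry in entries:
        prev = chain_digest(prev, entry)
        digests.append(prev)
    return digests

def verify_chain(entries, digests) -> bool:
    """Recompute the chain and compare it to the digests recorded earlier."""
    return build_chain(entries) == digests

if __name__ == "__main__":
    log = ["user alice logged in",
           "config change: firewall rule 12",
           "user alice logged out"]
    recorded = build_chain(log)
    assert verify_chain(log, recorded)
    log[1] = "config change: firewall rule 99"  # simulated tampering
    print("chain valid after tampering?", verify_chain(log, recorded))  # False

Keeping the recorded digests (or just the final one) in a separate, tightly controlled location is what gives the scheme its value; an attacker who can rewrite both the log and its digests defeats it.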
Assessments and audits need to look at more than just whether logs are captured and their content. In fact, assessments that consider log reviews look at items including the following:

- What logs are captured?
- How is log integrity ensured? Are log entries hashed and validated?
- Are the systems and applications that generate logs properly configured?
- Do logging systems use a centralized time synchronization service?
- How long are logs retained for, and does that retention time period meet legal, business, or contractual requirements?
- How are the logs reviewed, and by whom?
- Is automated reporting or alarming set up and effective?
- Is there ongoing evidence of active log review, such as a sign-off process?
- Are logs rotated or destroyed on a regular basis?
- Who has access to logs?
- Do logs contain sensitive information such as passwords, keys, or data that should not be exposed via logs, to avoid data leakage?

Policies and procedures for log management should be documented and aligned to standards. ISO 27001 and ISO 27002 both provide basic guidance on logging, and NIST provides SP 800-92, “Guide to Computer Security Log Management.” Since logging is driven by business needs, infrastructure and system design, and the organization’s functional and security requirements, specific organizational practices and standards need to be created and their implementation regularly assessed.

Source Systems

On the one hand, nearly every device, software package, application, and platform or service that is part of your systems should be considered as a data source for your continuous monitoring and analysis efforts. But without some logical structure or sense of purpose to your gathering of sources, you’re liable to drown in petabytes of data and not learn much in the process. On the other hand, it might be tempting to argue that you should use a prioritized approach, starting with your highest-valued information assets or your highest-priority business processes and the platforms, systems, and other elements that support them. Note the danger in such a viewpoint: It assumes that your attackers will use the most important “crown jewels” of your systems as their entry points and the places from which they’ll execute their attack. In many respects, you have to face this as an “all-risks” approach, as insurance underwriters refer to it.

There is some benefit in applying a purposeful or intentional perspective as you look at your laundry list of possible data sources. If you’re trying to define an “operational normal” and establish a security baseline for anomaly detection, for example, you might need to tailor what you log on which devices or systems differently than if you’re trying to look at dealing with specific categories of risk events. No matter how you look at it, you’re talking large volumes of data, which require smart filtering and analysis tools to help you make sense of it quickly enough to make risk containment decisions before it’s too late.

✔✔ Data Collection and Processing: Probably Cheaper Than Disaster Recovery

In some IT circles, people are known to say that disk space is cheap as a way of saying that the alternatives tend to be far, far more costly in the long run.
✔✔ Data Collection and Processing: Probably Cheaper Than Disaster Recovery

In some IT circles, people are known to say that disk space is cheap as a way of saying that the alternatives tend to be far, far more costly in the long run. A Dell EMC survey, reported by Johnny Wu at Searchdatabackup.techtarget.com in March 2019, suggests that the average impact to businesses of 20 hours of downtime can exceed half a million dollars; losing 2.13 TB of data can double that average impact. Compare that with the budget you'd need to capture all log and event data and have sufficiently high-throughput analysis capabilities to make sense of it in near real time, and you've got the makings of your business case argument for greater IT security investment.

This is also part of the argument for moving to SIEM and SOAR (security orchestration, automation, and response) systems or services. These can provide smarter ways to gather operational security insights—and detect security incidents more quickly—than the traditional filtering-the-logs approach could ever support.

For all data sources, be they hardware, software, or firmware, part of your infrastructure or a guest endpoint, you should strongly consider making current health and status data something that you request, capture, and log. This data would nominally reflect the current identity and version of the hardware, software, and firmware, showing in particular the latest (or a complete list) of the patches applied. It would include antimalware, access control rule sets, or other security-specific dataset versions and update histories. Get this data every time a subject connects to your systems; consider turning on health- and status-related access control features, such as quarantines or remediation servers, to prevent out-of-date and out-of-touch endpoints from possibly contaminating your infrastructures. And of course, log everything about such accesses! In doing this, and in routinely checking this health information throughout the day, you're looking for any indicators of compromise that signal that one of your otherwise trusted subjects may have been corrupted by malware.

Depending upon the security needs of your organization, you may need to approach the continuous monitoring, log data analysis, and reporting set of problems with the same sensibilities you might apply to investigating an incident or crime scene. The data you can gather from an incident or crime scene is dirty; it is incomplete, or it may have been inadvertently or deliberately corrupted by people and events at the scene or by first responders. So, you focus first on what looks to be a known signature of an event of interest and then look for multiple pieces of corroborating evidence. Your strongest corroboration comes from evidence gathered by dissimilar processes or from different systems (or elements of the incident scene); thus, it's got to walk, talk, and swim like a duck, rather than just be seen to walk by three different people, for you to sound the alarm that a duck has intruded into your systems.

Let's look at the obvious lists in a bit more detail; note that a given device or software element may fit in more than one category.
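A connect-time health and status check might look something like the following sketch. The report fields, freshness thresholds, and the admit-or-quarantine decision are illustrative assumptions, not any particular network access control product's interface.

```python
"""Minimal sketch of a connect-time endpoint health and status check.

Field names, policy thresholds, and the admit/quarantine decision are
hypothetical; real NAC products define their own schemas and hooks.
"""
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class HealthReport:
    device_id: str
    os_patch_level: date      # date of last applied OS patch set
    av_definitions: date      # date of last antimalware definition update
    firmware_version: str


MAX_PATCH_AGE = timedelta(days=30)   # policy knobs; set per your baseline
MAX_AV_DEF_AGE = timedelta(days=7)


def admit(report: HealthReport, today: date) -> bool:
    """Return True to admit the endpoint, False to route it to a
    quarantine or remediation server. Log the decision either way."""
    stale_patches = today - report.os_patch_level > MAX_PATCH_AGE
    stale_av = today - report.av_definitions > MAX_AV_DEF_AGE
    decision = not (stale_patches or stale_av)
    print(f"access-log: {report.device_id} admitted={decision} "
          f"patch={report.os_patch_level} av={report.av_definitions}")
    return decision


if __name__ == "__main__":
    # Patches are 31 days old here, so this endpoint gets quarantined.
    report = HealthReport("laptop-042", date(2019, 5, 1), date(2019, 5, 28), "1.2.3")
    admit(report, today=date(2019, 6, 1))
```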
On-Premises Servers and Services

Almost all server systems available today have a significant number of logging features built in, both as troubleshooting aids and as part of providing accountability for access control and other security features. Services are engaged by users via applications (such as Windows Explorer or their web browser), which use application programming interfaces or system interfaces to make service requests to the operating system and server routines; more often than not, these require a temporary elevation of privilege for that execution thread. All of those transactions—requests, acknowledgment or rejection, service performance, completion, or error conditions encountered—can generate log entries; many can be set to generate other events that route alarm signals to designated process IDs or other destinations. Key logs to look for include the following:

- The server security log will show successful and unsuccessful logins, attempts to elevate privilege, and connection requests to resources. Depending upon the operating system and server system in use and your own customization of it, this log may also keep track of attempts to open, close, write, delete, or modify the metadata associated with files. The access control system logs should be considered a "mother lode" of rich and valuable data. (A simple analysis of this log is sketched after this list.)
- System logs on each server keep track of device-level issues, such as requests, errors, or failures encountered in attempting to mount or dismount removable, fixed, or virtual storage volumes. Operating system shutdown and restart requests, OS-level updates, hibernation, and even processor power level settings are reflected here.
- Directory services, including workstation, endpoint, and system-level directory services (such as Microsoft Active Directory or other X.500 directory services), can be tailored to log virtually everything associated with entities known to these systems as they attempt to access other entities.
- Single sign-on (SSO) activities should be fully logged and included on your shopping list as a quality data source.
- File replication services, journaling services, and other storage subsystem services log or journal a significant amount of information. This is done to greatly enhance the survivability of data in the event of device-level, server, or application problems (it's what makes NTFS or EFS a far more reliable and better-performing file system than good old FAT, for example). These logs and journals are great sources of data when hunting for a possible exfiltration in the works.
- DNS servers can provide extensive logs of all attempts to resolve names, IP addresses, flush or update caches, and the like.
- Virtual machine managers or hypervisors should be logging the creation, modification, activation, and termination of VMs.
- DHCP services should log when new leases are issued or expire and when devices connect or disconnect.
- Print servers should log jobs queued, their sources, the destination printer, and the completion or error status of each job.
- Fax servers (yes, many businesses still use fax traffic, even if over the Internet) should log all traffic in and out.
- Smart copiers and scanners should log usage, user IDs, and destination files if applicable.
- Email servers should log connection requests, spoof attempts or other junk mail filtered at the server, attempts to violate quality or security settings (such as maximum attachment sizes), and the use of keyword-triggered services such as encryption of outbound traffic or restriction of traffic based on keywords in the header or message body.
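As one example of putting the server security log to work, the following sketch counts failed login attempts per source address and flags sources that exceed a threshold, a crude but useful indicator of password guessing or replay activity. The log line format and threshold are assumed for illustration; match the parsing to your actual OS and logging configuration.

```python
"""Minimal sketch of spotting repeated failed logins in a server
security log. The line format is a simplified, assumed one."""
import re
from collections import Counter

FAILED = re.compile(r"Failed password for (?P<user>\S+) from (?P<src>\S+)")
THRESHOLD = 10  # failures per source before we raise an alert


def failed_login_sources(lines):
    """Count failed-login attempts per source address."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[match.group("src")] += 1
    return counts


if __name__ == "__main__":
    with open("auth.log") as fh:   # hypothetical export of the security log
        for src, n in failed_login_sources(fh).most_common():
            if n >= THRESHOLD:
                print(f"ALERT: {n} failed logins from {src} - possible "
                      f"password guessing or replay activity")
```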
Applications and Platforms

What started out decades ago as a troubleshooting and diagnostic capability has come of age: most application programs and integrated platform solutions provide extensive logging features as part of ensuring auditable security for the apps themselves and for the user data they store or manage. Many apps and platforms also do their own localized versions of access control and accounting, which supports change management and control; advanced collaboration features such as co-authoring and revision management; and, of course, security auditing and control. Some things to look for include the following:

- User-level data, such as profiles, can and should be logged, as changes may reveal that a legitimate user's identity has been spoofed or pirated.
- Document-, file-, or dataset-level logging can reveal patterns of access that might be part of an exfiltration, a covert path, or other unauthorized access.
- Integrated applications platforms, particularly ones built around a core database engine, often have their own built-in features for defining user identities, assigning and managing identity-based privileges, and accounting for access attempts, successes, and failures.
- Application crash logs might reveal attacks against the application.
- Other application log data can highlight abnormal patterns of application usage.
- Application-managed data backup, recovery, and restoration should all be creating log events.

External Servers and Services

Your organization may have migrated much of its business logic to cloud-hosted solutions, or hosted it there from the start, using a variety of cloud service models. Unless these have been done "on the cheap," there should be extensive event logging information available about these services, the identities of subjects (users or processes) making access attempts to them, and other information. If you're using an integrated security continuous monitoring (ISCM) product or system, you should explore how best to automate the transfer of such data from your cloud systems provider into your ISCM system (which, for many good reasons based on reliability, availability, and integrity, may very well be cloud hosted itself). A sketch of one such transfer appears after the following list.

Other services that might provide rich security data sources and logs, nominally external to your in-house infrastructure and not directly owned or managed by your team, might include:

- IDaaS and other identity management solutions
- Services provided via federated access arrangements
- Data movement (upload and download, replication, etc.) across an external interface to such service providers
- Hot, warm, and cold backup site service providers
- Off-site data archiving services
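Automating that transfer might look roughly like the sketch below, which pulls audit events from a cloud provider's audit API and appends them to a locally controlled archive. The endpoint URL, response envelope, and token variable are entirely hypothetical placeholders; substitute your provider's actual audit interface and authentication scheme.

```python
"""Minimal sketch of pulling cloud audit events into a local collection
point. The URL, token variable, and response shape are hypothetical."""
import json
import os
import urllib.request

AUDIT_URL = "https://api.example-cloud.test/v1/audit-events"  # placeholder
TOKEN = os.environ["CLOUD_AUDIT_TOKEN"]  # never hard-code credentials


def fetch_audit_events(since_iso: str) -> list[dict]:
    """Fetch audit events newer than the given ISO-8601 timestamp."""
    req = urllib.request.Request(
        f"{AUDIT_URL}?since={since_iso}",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["events"]  # assumed response envelope


def archive(events: list[dict], path: str = "cloud-audit.jsonl") -> None:
    """Append events to a local newline-delimited JSON archive, giving
    your ISCM/SIEM tooling a second, locally controlled copy."""
    with open(path, "a") as fh:
        for event in events:
            fh.write(json.dumps(event) + "\n")


if __name__ == "__main__":
    archive(fetch_audit_events("2019-01-01T00:00:00Z"))
```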
Workstations and Endpoints

All endpoint devices that are allowed any type of access into your systems should, by definition and by policy, be considered part of your systems; thus, they should be subject to some degree of your security management, supervision, and control. You'll look at endpoint security in greater detail in Chapter 7, and endpoint device access control will be addressed in Chapter 6. That said, consider gathering the following kinds of data from each endpoint every time it connects:

- Health check data (current levels of patches, malware definitions, rule sets, etc., as appropriate); require this data at initial connect, query it throughout the connected day, and use automated tools to detect changes that might be worthy of an alarm.
- Local account login and logoff.
- Device-level reboots.
- Application installations or updates.
- Security events, such as elevation of user privilege or invoking trusted superuser or administrative IDs.
- File services events, such as creation, deletion, movement, and replication.
- USB or removable storage mounts, dismounts, and use.
- Other USB device type connections.
- Bluetooth, Wi-Fi, or other connection-related events.
- Application use, modification, and diagnostics logs.
- IP address associated with the device.
- Changes to roaming, user, or device-level profiles.

Network Infrastructure Devices

All of the modems, routers, switches, gateways, firewalls, IDS or IPS, and other network security devices and systems that make up your networks and communications infrastructures should be as smart as possible and should be logging what happens to them and through them, including:

- Administrator logins and logouts to the device itself
- Reboots, resets, loss of power, or similar events
- Connections established to other services (such as DHCP and DNS)
- Health check information
- Data transfers in and out of the device
- Configuration changes of any kind
- Attempts to access restricted domains, IP addresses, applications, or services
- Attempts to circumvent expired certificates
- Dial-in connection attempts from caller IDs outside of your normal, accepted ranges (that is, if you still have real POTS-supported dial-in connections available!)

Some of these classes of data, and others not in this list, may be found in the services and servers that provide the supporting functions (such as managing certificates, identities, or encryption services). A bare-bones collection sketch for device syslog traffic follows the next subsection.

IoT Devices

The first generations of Internet of Things (IoT) devices have not been known for much in the way of security features, even to the level of an ability to change the factory-default username, password, or IP address. If your company is allowing such "artificial stupidity" to connect to your systems, this could be a significant hazard and is worthy of extra effort to control the risks it could be exposing the organization to. (See Chapter 7 and the Appendix for more information.) If your IoT or other robot devices can provide any of the types of security-related log or event information that you'd require of other types of endpoints, by all means include them as data sources for analysis and monitoring.
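Many network infrastructure and IoT devices can forward their event streams via syslog. The following sketch is a teaching illustration of a minimal UDP syslog collection point that writes each device's messages to a central file; a production deployment should instead use an established collector (rsyslog, syslog-ng, or a SIEM agent) with protected transport.

```python
"""Minimal sketch of a central syslog (UDP) collection point for
network-device and IoT event traffic. For illustration only."""
import socketserver

LOG_FILE = "network-devices.log"  # hypothetical central copy


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is (datagram_bytes, socket).
        data = self.request[0].strip().decode("utf-8", errors="replace")
        source_ip = self.client_address[0]
        with open(LOG_FILE, "a") as fh:
            fh.write(f"{source_ip} {data}\n")


if __name__ == "__main__":
    # Binding to the standard port 514 normally requires elevated
    # privilege; 5514 is used here so the sketch runs unprivileged.
    with socketserver.UDPServer(("0.0.0.0", 5514), SyslogHandler) as server:
        server.serve_forever()
```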
Legal and Regulatory Concerns

Chapter 1 highlighted many of the significant changes in international and national laws that dictate information security requirements upon organizations operating within their jurisdictions. And all but the most local of organizations, businesses and nonprofits alike, find themselves operating within multiple jurisdictions: their actions and information, as well as their customers and suppliers, cross multiple frontiers. As a result, you may find that multiple sets of laws and regulations constrain your ability to monitor your information systems, collect specific information on user activities, and share that data (and with whom) as you store, collate, analyze, and assess it. There is often a sense of damned-if-you-do to all of this: you may violate a compliance requirement if you do collect and exploit such information in the pursuit of better systems security, but you may violate another constraint if you do not.

Almost all audit processes require that critical findings be supported by an audit trail that establishes the pedigree (or life history) of each piece of information pertinent to that finding. This requires that data cleansing efforts, for example, cannot lose sight of the original form of the data, errors and omissions included, as it was originally introduced into your systems (and where it came from). Auditors must be able to walk back each step in the data processing, transformation, use, and cleansing processes that have seen, touched, or modified that data. In short, audit standards dictate very much the same type of chain of custody of information—of all kinds—that forensic investigations require.

From one point of view, the information systems industries and their customers have brought this upon themselves. They've produced and use systems that roughly two out of three senior leaders and managers do not trust, according to surveys in 2016 by CapGemini and EMC and in 2018 by KPMG International; they produce systems that seemingly cannot keep private data private nor prevent intruders from enjoying nearly seven months of undetected freedom to explore, exploit, exfiltrate, and sometimes destructively disrupt the businesses that depend upon them. At the same time, organized crime continues to increase its use of cybercrime to pursue its various agendas.

Governments and regulators, insurers and financial services providers, and shareholders have responded to these many threats by imposing increasingly stringent compliance regimes upon public and private organizations and their use of information systems. Yet seemingly across the board, senior leadership and management in many businesses consider the fines imposed by regulators or the courts to be just another cost of doing business; those costs are passed on to customers, shareholders, or perhaps to workers if the company must downsize. Regulators and legislatures are beginning to say "enough is enough," and we are seeing increasing efforts by these officials to respond to data breaches and information security incidents by imposing penalties and jail time on the highest-ranking individual decision-makers found to be negligent in their duties of due care and due diligence.

It's beyond the scope of this book to attempt to summarize the many legal and regulatory regimes you might need to be familiar with. Your organization's operating locations, and where your customers, partners, and suppliers are, will also make the legal compliance picture more complex. Translating the legal, regulatory, and public policy complexities into organizational policies takes considerable education and expertise, along with sound legal advice. As the on-scene information security practitioner, be sure to ask the organization's legal and compliance officers what compliance, regulatory, or other limitations and requirements constrain your ability to monitor, assess, and report on information security-related events of interest affecting your systems. Let the organization's attorneys and compliance officers or experts chart your course through this minefield.
✔✔ Your Logbooks as Your Lifeline

As a practicing information security professional, you have many good reasons for keeping your own logbook or journal of activity. Take notes about key decisions you've made and how you reached them. Log when you've been given direction (even if veiled as suggestions or requests) to make changes to the security posture of the systems and information that you protect. These files can provide important clues to assist in a forensic investigation, as well as capture key insights vital to responding to and cleaning up after a security incident. Create them with the same discipline you'd use in establishing a chain of custody for important evidence or the pedigree or audit trail of a set of important data. All of these contribute to achieving a transparent, auditable, and accountable security climate in your organization.

ANALYZE MONITORING RESULTS

Ongoing and continuous monitoring should be seen as fulfilling two very important roles. First, it's part of your real-time systems security alarm system: the combination of controls, filters, and reporting processes provides your on-shift watch standers in your security or network operations centers with tipoffs to possible events of interest. These positive signals (that is, detected alarm conditions) may be true indications of a security incident or false positives; either way, they need analytic and investigative attention to determine what type of response, if any, is required. The second function that the analysis of monitoring data should fulfill is the hunt for the false negative—the events in which an intruder spoofed your systems with falsified credentials or found an exploitable vulnerability in your access control system's logic and control settings.

In either case, analysis of monitoring data can provide important insight into potential vulnerabilities within your systems. And it all starts with knowing your baselines. The term baseline can refer to any of three different concepts when used in an IT or information security context.

An architectural baseline is an inventory or configuration management list of all of the subsystems, elements, procedures, or components that make up a particular system. From most abstract to most detailed, these are:

- An information architecture baseline captures what organizations need to know in order to get work done, as they use, create, and share information.
- An information systems architecture baseline provides the "how-to" of an information architecture by describing the workflows and processes used by an organization and its people in terms of the information they know, learn, and create.
- An information technology architecture baseline identifies all the hardware, software, firmware, communications and networks, and procedural elements that comprise a system, detailed to the specific version, update, patch, and other configuration-controlled changes that have been applied to it. This baseline description should also include the physical and logical location or deployment details of each element (see the sketch after this list).
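An IT architecture baseline lends itself naturally to structured data. The sketch below records expected versions per asset and reports drift against an observed snapshot; the asset names, fields, and observed values are invented for illustration, and in practice the observed state would come from a CMDB, endpoint agent, or scanner.

```python
"""Minimal sketch of an IT architecture baseline as structured data,
with a drift check against an observed snapshot. All names and
versions here are hypothetical examples."""

BASELINE = {
    "web-frontend-01": {"os": "Ubuntu 18.04.2", "app": "nginx 1.14.0"},
    "db-server-01": {"os": "RHEL 7.6", "app": "postgresql 10.7"},
}

# Hypothetical "observed" state; in practice this comes from a CMDB,
# endpoint agent, or network scan rather than a literal dictionary.
OBSERVED = {
    "web-frontend-01": {"os": "Ubuntu 18.04.2", "app": "nginx 1.15.1"},
    "db-server-01": {"os": "RHEL 7.6", "app": "postgresql 10.7"},
}


def drift_report(baseline: dict, observed: dict) -> list[str]:
    """Compare each asset's observed configuration to its baseline entry."""
    findings = []
    for asset, expected in baseline.items():
        actual = observed.get(asset, {})
        for field, value in expected.items():
            if actual.get(field) != value:
                findings.append(f"{asset}: {field} expected {value!r}, "
                                f"found {actual.get(field)!r}")
    return findings


if __name__ == "__main__":
    for finding in drift_report(BASELINE, OBSERVED):
        print("DRIFT:", finding)  # flags the unapproved nginx upgrade
```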
We typically see asset-based risk management needing to focus on the IT architecture as the foundational level; but even an asset purist has to link each asset back up to the organizational priorities and objectives, and these are often captured in process-oriented or outcomes-oriented terms, which the other two baselines capture and make meaningful.

Chapter 1 examined the tailored application of a standard, or a set of required or desired performance characteristics, as a baseline. As a subset of baselines, security baselines express the minimum set of security controls necessary to safeguard the information security requirements and properties of a particular configuration. Scoping guidance is often published as part of a baseline, defining the acceptable range of deviation from that baseline for a particular implementation. This scoping guidance should interact with configuration management and control processes to ensure that the directed set of security performance characteristics is in fact properly installed and configured in the physical, logical, and administrative controls that support them.

The third use of baselines in information security contexts refers to a behavioral baseline, which combines a description of a required set of activities with the observable characteristics that the supporting systems should demonstrate; these characteristics act as confirmation that the system is performing the required activities correctly. Many times, these are expressed as confirmation-based checklists: you prepare to land an aircraft by following a checklist that dictates flap, engine power, landing gear, landing lights, and other aircraft configuration settings, and you verify readiness to land by checking the indicators associated with each of these devices (and many more).

The next section explores how you can put these concepts to work to enhance your information security posture.

✔✔ Anomaly Detection: UEBA Takes Center Stage

The dramatic growth in the frequency and severity of cyberattacks demands a greater emphasis on anomalous behavior detection and response. Behavioral modeling with machine learning and other AI techniques is now being applied to architectures and systems across all layers of systems design and use. User and entity behavior analytics (UEBA), for example, is being expanded to examine the behavior of multiple entities and users, across larger spans of time, to call attention to more complex, sophisticated attacks in which no single user's or entity's actions are suspicious in their own right but which, taken together, suggest malicious intent. SSCPs are becoming more involved with the operational use of such technologies, even to the point of helping to explain to human managers the decisions the machines have made or the actions they have taken or recommended based on their models.

Security Baselines and Anomalies

As you saw in the "Source Systems" section, you've got many rich veins of data to mine that can give you near-real-time descriptions of the behavior of your IT systems and infrastructures. Assuming these IT systems and infrastructures are properly described, documented, and under effective configuration management and configuration control, you're ready for the next step: identifying the desired baseline behavior sets and gathering the measurements and signature data that your systems throw off when they're acting properly within a given behavioral baseline.
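One simple way to capture part of such a behavioral baseline is statistically: learn the mean and spread of a metric during known-normal operation, then flag readings that deviate sharply. The metric, sample values, and three-sigma threshold below are illustrative assumptions; real baselines need tuning per system and per operational state.

```python
"""Minimal sketch of a statistical behavioral baseline: fit mean and
standard deviation on known-normal readings, then flag outliers."""
import statistics


def fit_baseline(normal_readings: list[float]) -> tuple[float, float]:
    """Capture the 'fingerprint' of normal behavior as (mean, stdev)."""
    return statistics.mean(normal_readings), statistics.stdev(normal_readings)


def is_anomalous(value: float, mean: float, stdev: float,
                 z_threshold: float = 3.0) -> bool:
    """Flag readings more than z_threshold standard deviations from normal."""
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > z_threshold


if __name__ == "__main__":
    # e.g., outbound traffic in MB/hour sampled during a quiet baseline week
    normal = [420.0, 415.5, 430.2, 410.8, 425.1, 418.9, 422.3]
    mu, sigma = fit_baseline(normal)
    for reading in [419.0, 2045.7]:   # the second looks like exfiltration
        label = "anomalous" if is_anomalous(reading, mu, sigma) else "normal"
        print(reading, label)
```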
Let's use as an illustration a hypothetical industrial process control environment, such as a natural gas-fired electric power generation and distribution system. Furthermore, let's look at just one critical subsystem in that environment: the real-time pricing system that networks with many different electric power wholesale distributor networks, using a bid-ask-sell system (as any commodity exchange does) to determine how much electricity to generate and sell to which distributor. This process is the real business backbone of national and regional electric power grids, such as the North American or European grid systems.

Note  Enron's manipulation of demand and pricing information, via the network of real-time bid-ask-sell systems used by public utilities across North America, led to the brownouts and rolling blackouts that affected customers in California in 2000 and 2001. Nothing went wrong with the power generation and distribution systems—just the marketplaces that bought and sold power in bulk. Accidents of data configuration caused a similar cascade of brownouts in Australia in the mid-1990s; it is rumored that Russian interference caused similar problems for Estonia in 2007.

This bid-ask-sell system might be modeled as having the following major behavioral conditions or states:

- Development, test, and pre-deployment
- Transition from pre-deployment to operational use
- Normal demand cycles, based on North American seasonal weather patterns
- Disrupted demand cycles, due to major storms
- Distribution network failures (weather, accidents, or other events that disrupt high-voltage bulk power distribution)
- Emergency shutdown of generating stations or key power distribution substations

This list isn't comprehensive; many permutations exist for various circumstances.

Define the Behavioral Baselines

For each of those behavioral sets, analysts who know the systems inside and out need to go through the architectures and identify what they would expect to see, in observable terms, for key elements of the system. The bulk price of electricity (dollars per megawatt-hour [MWh] in North America, euros per MWh in Europe) would be a gross-level indicator of how the overall system is behaving, but it's not fine-grained enough to tell you why the system is misbehaving. For each of those behavioral states (and many, many more), picture a set of "test points" that you could clip a logic probe, a protocol sniffer, or a special-purpose diagnostic indicator onto, and make lots of measurements over time. If the system behaved "normally" while you gathered all of those measurements, then you have a behavioral fingerprint of the system for that mode of operational use. (A state-by-state fingerprint check is sketched below.)

Behavioral baselines can also be tightly localized in scope, such as at the individual customer or individual end-user level. Attribute-based access control, for example, is based on the premise that an organization can sufficiently characterize the work style, movement patterns, and other behaviors of its chief financial officer as a way of protecting itself from that CFO being the target of a whaling attack.

Your systems, the tools you're using, and your ability to manage and exploit all of this data will shape your strategies and implementation choices.
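Picking up the forward reference above, here is a minimal sketch of a state-conditioned fingerprint check for the hypothetical pricing system: each behavioral state carries its own expected band per test point, and live readings are compared against the band for the declared state. All states, test points, and ranges are invented for the illustration.

```python
"""Minimal sketch of state-conditioned behavioral fingerprints for the
hypothetical bid-ask-sell illustration. States, test points, and
ranges are invented example data, not real grid parameters."""

# (low, high) expected bands per test point, per behavioral state
FINGERPRINTS = {
    "normal_demand": {"price_usd_per_mwh": (20.0, 60.0),
                      "bids_per_minute": (50, 400)},
    "storm_disrupted": {"price_usd_per_mwh": (40.0, 250.0),
                        "bids_per_minute": (200, 1500)},
}


def check_state(state: str, readings: dict) -> list[str]:
    """Compare live readings against the fingerprint for the declared
    operational state; return a list of out-of-band findings."""
    findings = []
    for test_point, (low, high) in FINGERPRINTS[state].items():
        value = readings.get(test_point)
        if value is None or not (low <= value <= high):
            findings.append(f"{test_point}={value} outside [{low}, {high}] "
                            f"for state '{state}'")
    return findings


if __name__ == "__main__":
    # Price spike without a declared storm state: worth an analyst's look.
    live = {"price_usd_per_mwh": 310.0, "bids_per_minute": 90}
    for finding in check_state("normal_demand", live):
        print("anomaly:", finding)
```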
It's beyond our scope here to go into much detail about this, save to say that something has to merge all of that data, each stream of which is probably in a different format, into a useful data mart or data warehouse that you can mine and analyze. You'll no doubt want to apply various filters, data smoothing and cleansing, and verification tools as you preprocess this data. Be mindful of the fact that you're dealing with primary source evidence as you do this; protect that audit trail or chain of custody as you carry out these manipulations, and preserve your ability to walk back and show who authorized and performed which data transformations, where, when, and why. Successful detection of and response to an incident, and survival of the post-response litigation that might follow, may depend upon this bit of pedigree protection.

Finding the Anomalies

Think about your systems environments, within your organization, as they exist right now, today. Some obvious candidates for anomalies to look for in your data should come to mind:

- Internal IP addresses, user IDs, and devices (MAC addresses or subject IDs) that aren't predefined and known to your access control moat dragons
- Large, inexplicable swings in performance metrics, such as traffic levels on internal network segments, your external connections, or the rate of help-desk ticket creation or user complaints
- Multiple failures of antimalware systems to work effectively, or their needing intervention to restart
- Multiple attempts to log into accounts (perhaps a "replay attack" being conducted)
- Logins outside of normal business hours
- Dramatic changes in outbound traffic, especially from database, mail, or multimedia servers (are you being exfiltrated, perhaps?)
- Numerous hits on your firewalls from countries, regions, or other address ranges that are outside of your normal business patterns
- Too many attempts by an internal workstation to connect to internal hosts or external services, in ways that exceed "normal" expectations

Some of those anomalies might be useful precursors to pay attention to; others, such as changes in traffic and loading, are probably high-priority emergency alarm signals! Start with that list; then peel that behavioral onion down further, layer by layer, and identify other possible anomalies to look for.

Do You Allow or Block Behaviors?

By its very name, behavioral anomaly detection suggests that you can define what "business normal" (or acceptable) behavior is and is not. Defined too tightly, your positive and negative control approach risks becoming little more than signature recognition, templating, or profiling. To avoid this, a mix of AI and ML techniques may help you set the balance between positive and negative control approaches.

Visualizations, Metrics, and Trends

If the abiding purpose of doing analysis is to inform and
