SEBA Bachelor Summary
Technische Universität München
Summary
This document provides an overview of requirements engineering and software estimation for business applications. It covers topics like motivation, scope, specification documents, agile contexts, and Scrum. Furthermore, it discusses user stories, class diagrams, the estimation of software projects, and the later chapters on technical foundations, persistent data management, distributed architectures, security engineering, and workflow management systems.
Full Transcript
IT Support for Business Applications

Requirements Engineering

1. Motivation
   a. The vast majority of IT projects are not completed on time, within budget, or at all (only a 29% reported success rate in 2017)
   b. Insufficient requirements engineering is, broadly, the #1 cause of failed IT projects (the Standish Group lists "incomplete requirements", "changing requirements" and "lack of user involvement" as some of their top factors)
2. Scope
   a. Requirements engineering is concerned with the elicitation, analysis, specification, and validation of software requirements
   b. Software requirements express the needs and the constraints of a software project
      i. Needs can be categorized into functional requirements (what the system needs to do) and non-functional requirements (how quickly, reliably, securely, user-friendly, etc. the system needs to do it)
      ii. Constraints are represented by so-called pseudo-requirements, which confine the possible solution space for the project
   c. The requirements engineering phase should account for roughly 30% of the total effort expended in the realization of a software project
3. Specification documents
   a. Written mostly in natural, informal language to minimize misunderstanding. Formal language and diagrams are introduced later, in the software engineering phase
   b. Requirements Specification ("Lastenheft")
      i. Result of the requirements identification phase, where the customer's desires are compiled into a list of all "deliverables and services" to be fulfilled
      ii. Describes what is expected of the software solution to the customer's problem
      iii. As general as possible, as restrictive as necessary
      iv. Enables the contractor to develop optimal solutions to the problem without unnecessary constraint
   c. Functional Specification ("Pflichtenheft")
      i. Result of the requirements analysis phase, where the contractor creates a solution proposal that fulfills the Requirements Specification as well as possible
      ii. Describes how (concrete approaches, technologies, algorithms) the software solution to the customer's problem is to be realized, in addition to detailing and supplementing the what of the Requirements Specification
      iii. Contains a complete description of the externally observable behaviour of the system
      iv. Serves as the basis for the contract between customer and contractor
4. Relevance in an agile context
   a. Agile software engineering fosters continuous communication with the customer to respond and adapt to changing needs and desires
   b. Despite the constantly changing nature of agile development and the lower emphasis on extensive documentation, specification documents may still be necessary as the contract baseline (presumably in a less detailed form)
5. Requirements Engineering in Scrum
   a. The Scrum Guide does not make any mention of "requirements engineering", and yet understanding the customer's needs plays a consistently crucial role
   b. Often, customer requirements in Scrum are formulated as so-called User Stories, and are constantly adapted in each sprint
   c. Other requirements engineering artifacts commonly found in agile software development include:
      i. Use cases: sequences of actions/events demonstrating a particular functionality of the system and the interactions involved
      ii. Scenarios: informal descriptions of a user problem and the interaction the user has with the system to solve that problem
      iii. UML diagrams: the Unified Modeling Language provides an extensive standard of abstraction to foster clear communication about system design
      iv. Prototypes: work-in-progress mockups of a system to foster communication and exploration of alternative solutions
6. Formulating and Validating User Stories (INVEST)
   a. User stories are formulated from the perspective of the end user and serve to describe a feature of the software system and why it is meaningful
   b. Template: As a (role), I want to (action) so that (benefit).
   c. INVEST
      i. Independent - user stories should be self-contained and not reference or depend on any other user stories
      ii. Negotiable - user stories should only capture the needs of the user and leave room for conversation about possible ways to fulfill those needs (see "the what" in the Requirements Specification)
      iii. Valuable - user stories should deliver value to the end user (see "so that (benefit)")
      iv. Estimable - user stories should be formulated in a way that enables the estimation of their time and effort requirements, so that they can be properly prioritized and fit into sprints
      v. Small - user stories should represent small chunks of work, able to be implemented in a maximum of 3 to 4 days
      vi. Testable - user stories must be able to be verified using pre-written user acceptance criteria
7. From User Stories to Class Diagrams (Abbott's Technique)
   a. Abbott's Technique is a systematic process to guide the conceptual conversion of informal natural language into formal UML class diagrams
   b. Identify candidates for UML classes
      i. Look for nouns that are relevant for the understanding of the problem
         1. Identify synonyms: different words that refer to the same concept in the application domain
         2. Identify homonyms: the same word being used to describe different concepts in the application domain
      ii. Categorize the nouns into conceptual groups
         1. Concrete objects or things (e.g. account)
         2. People and their roles (e.g. employee, customer)
         3. Information about actions (e.g. bank transfer)
         4. Places (e.g. waiting room)
         5. Relationships between classes (e.g. contract between people)
      iii. Eliminate superfluous classes. A candidate should be dropped...
         1. if it has no identifiable attributes or functionality
         2. if it only represents implementation details (solution domain instead of application domain)
         3. if its only purpose is to represent a role that another class takes on in relation to another class
         4. if it only contains operations that can be better assigned to other existing classes
         5. if it represents the system as a whole (a surprisingly common mistake! That would defeat the whole point of the diagram)
      iv. Choose expressive and binding class names. Every name should be...
         1. a singular noun
         2. as concrete as possible
         3. not a homonym (no overlap in meaning with another class)
         4. (in the vast majority of cases) not an acronym
         5. representative of all attributes and functionality within the class
      v. Document each class clearly and concisely with...
         1. a defining sentence
         2. (optional) a list of synonyms to aid in understanding the purpose of the class
   c. Identify attributes
      i. All attributes should be...
         1. relevant to the application context
         2. necessary for implementation
      ii. Choose expressive attribute names. Every name should be...
         1. a noun (singular or plural, depending on multiplicity)
         2. as concrete as possible
         3. not a homonym
      iii. Define the attribute types
         1. In a conceptual class diagram the type may remain only vaguely specified
         2. Attributes with a complex, structured type should be replaced by independent classes
   d. Identify associations
      i. Search for relevant relationships between classes
         1. Try to identify the relationship type (inheritance, aggregation, composition, etc.)
         2. Important distinction between aggregation and composition:
            a. Aggregation implies that the child object can exist independently of the parent object (e.g. wheels on a car)
            b. Composition implies that the child object can only exist while the parent object exists (e.g. rooms in a building)
      ii. Define the multiplicity of these relationships
      iii. Define the roles of the involved classes with role names in the diagram
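To make Abbott's Technique concrete, here is a small, hypothetical example (not from the lecture): from the sentence "A customer rents a car; for each rental an invoice is issued", the nouns yield the candidate classes Customer (person/role), Car (concrete object), Rental (information about an action) and Invoice. In the lecture these would be drawn as a conceptual UML class diagram; the Java sketch below only records the same classes and associations in code form, with multiplicities as comments.

    // Hypothetical candidate classes derived with Abbott's Technique.
    // A conceptual diagram would contain no methods and no visibility modifiers.
    class Customer {            // person and their role
        String name;
    }

    class Car {                 // concrete object
        String licensePlate;
    }

    class Rental {              // information about an action ("customer rents car")
        Customer customer;      // association, multiplicity: many rentals to 1 customer
        Car car;                // association, multiplicity: many rentals to 1 car
        Invoice invoice;        // composition: an invoice cannot exist without its rental
    }

    class Invoice {
        boolean isPaid;
    }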
8. Conceptual vs Implementation-Oriented Class Diagrams
   a. Conceptual class diagrams are designed to communicate the important entities and relationships in the application domain of the project
   b. Implementation-oriented class diagrams include more solution-domain-specific detail
   c. Important differences between the two:
      i. Included in implementation-oriented diagrams but NOT in conceptual diagrams:
         1. Visibility modifiers (public, private, etc.)
         2. Methods! (really!)
         3. Abstract classes
      ii. Included in conceptual diagrams but NOT in implementation-oriented diagrams:
         1. Generic associations between classes (these should be resolved into one of the more specific association types)
      iii. Included in conceptual diagrams, but in less detail than in implementation-oriented diagrams:
         1. Attribute data types
         2. Generalization / inheritance

Software Estimation

1. Motivation
   a. Every software project requires the definition of a budget and a timeframe necessary to deliver the specified final product
   b. This must be done during the early stages of the project lifecycle
   c. It is in the best interest of both customer and contractor to have accurate estimates for each metric
      i. Necessary for contract negotiations
   d. Within an agile context
      i. Software estimation also aids the internal organization of the development team, as developers can be more efficiently allocated to more resource-intensive tasks
      ii. Cost estimates are made multiple times throughout the development process (changing requirements, changing costs) with varying degrees of detail
2. Scope
   a. Software estimation aims to provide an as-accurate-as-possible prognosis of the amount of resources required to complete a project, given all information known about the project at the present time
   b. Sources of cost:
      i. Personnel costs (salaries) make up the vast majority of all costs in any software project
      ii. If the customer is not located nearby, traveling to meet with them and discuss the product may be another large source of cost
3. Devil's Square
   a. Four competing variables that the customer wants:
      i. High quality software
      ii. High functionality
      iii. Low price (as cheap as possible)
      iv. Low time (as soon as possible)
   b. These variables form the corners of a square with limited surface area (limited by the productivity of the development team)
   c. In order to satisfy the customer more in a single area (e.g. higher quality), other areas must be sacrificed (e.g. higher cost, longer development period)
   d. The assumption is that productivity (the area of the square) will remain constant for a given organization and resource input
      i. Putting more pressure on the development team does not increase development productivity (it usually has the opposite effect!)
4. Traditional software estimation
   a. Information available early in the project lifecycle
      i. The size of the codebase (lines of code) is NOT known during the planning phase, and is therefore not a reliable metric for software estimation
         1. Early methods of software estimation were based on expected LOC and the average programming productivity (LOC per month, LOC per year) of any given developer
         2. Problem: LOC != LOC
            a. More lines of code does not necessarily mean more functionality, quality, or reliability
            b. One developer might solve a problem in 10 lines of code, whereas another needs 100 lines. Is the second developer more productive because he wrote more code? Of course not!
            c. 10 lines of code that fix a critical bug are much more valuable than 1000 lines of code that add a small feature
      ii. The functional scope / data scope of the project IS known during the planning phase (derived from the requirements elicitation phase) and therefore CAN be a reliable metric for software estimation
         1. Requirements can also be weighted by complexity for a more accurate estimate
   b. Quality as a metric
      i. The higher the quality requirements, the greater the effort
      ii. There is no single measurement of quality, but rather many different characteristics that make up quality, to each of which performance indicators can be assigned (see non-functional requirements)
   c. Human productivity
      i. Influenced by countless different factors
      ii. The learning aptitude and motivation of the development team are crucial!
      iii. Adding more team members is not always a good thing!
         1. The number of communication links grows quadratically with team size (n(n-1)/2 links for n members: a team of 5 has 10 links, a team of 10 already has 45). Higher complexity is not helpful to productivity
         2. Brooks's Law: "Adding manpower to a late software project makes it later."
         3. There must exist an optimal number of team members, at which communication complexity and division of labor are in equilibrium. Below this number, the team members have too many individual responsibilities. Above this number, the team members cannot communicate effectively with one another.
5. Estimation methods
   a. Three categories
      i. Comparison methods are based on the effort analysis of other, already completed software projects with similar characteristics
         1. Analogy methods
         2. Relation methods
      ii. Algorithmic methods are based on mathematical (statistical) models derived from the observed effort required in other projects
         1. Weighting methods (e.g. Function Point Method)
         2. Sampling method
      iii. Key figure methods are based on extrapolating the costs of individual units or project phases
         1. Multiplication methods
         2. Percentage-based methods
   b. Distinction between top-down and bottom-up strategies of estimation
      i. Top-down involves a holistic approach using general information known about the scope of the project and about other projects of similar scale (produces a rough estimate, preferred by higher-ups in the company who want a bird's-eye view of the project and other projects)
      ii. Bottom-up involves a more component-based breakdown of the project to calculate the price of each individual item separately and add them up (a more involved, detailed estimate used to validate the top-down approach)
   c. Which estimation method to use is heavily dependent on the project and on the required estimate accuracy/speed
      i. Early, rough estimates can be made with the analogy, relation, and percentage-based methods
      ii. More precise (but slower, and requiring more information!) estimates can be made with the weighting, multiplication, and sampling methods
      iii. Multiple estimation methods can and should be combined for more accurate results
   d. Some common concrete methods
      i. Function Point Method
      ii. Object Point Method
      iii. Constructive Cost Model (COCOMO)
      iv. SHELL Method
      v. EGW Method
      vi. IFA-PASS Method
      vii. Integrated method for effort estimation (Integriertes Verfahren zur Aufwandsschätzung, INVAS)
      viii. Use Case Point 3.0 Method
6. Function Point Method
   a. The most established method of software estimation
   b. Generally delivers the most accurate results by combining relational and weighting approaches
   c. Based on product requirements, NOT on lines of code
   d. Developed at IBM by Albrecht in 1979
   e. Seven steps involved
      i. Categorization of product requirements into one of 5 categories, based on the type and the primary purpose of the function
         1. For functional requirements (LFs): input, output, or query
            a. input is chosen when the primary purpose is the input of data from the user
            b. output is chosen when the primary purpose is the calculation (!) and output of some information to the user
            c. query is chosen when the primary purpose is to display raw data to the user, without calculation involved (think of simple SELECT statements in SQL)
         2. For product data requirements (LDs): database or reference data
            a. database is chosen when the database has to be updated with new information (for example, customer information)
            b. reference data is chosen when the required information is read-only (for example, weather information pulled from Google)
      ii. Classification of product requirements by complexity (simple, medium, complex)
         1. Based on the number of pieces of information involved and the number of locations where that information is sourced from, inserted into, etc.
      iii. Entry into the calculation sheet
      iv. Evaluation of influencing factors
         1. Use your best intuition while scoring the factors, and be able to justify your scores
      v. Calculation of the evaluated Function Points
      vi. Determination of personnel expenses using a Function Point => man-month curve or table
         1. Data often derived from previous projects
         2. Such curves often display non-linear growth, which can be attributed to decreases in productivity in larger projects
      vii. Update of empirical data as an estimation basis for future projects; add the new data point into the curve/table
         1. Optional step: remove the oldest entry in the dataset to keep it reflecting the most current state of the process
   f. The actual effort needed for the software project should be measured afterwards, to be better able to estimate projects in the future
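A minimal sketch of the calculation steps, assuming the standard Albrecht/IFPUG weight table and adjustment formula (the notes describe the steps but do not spell out the numbers, so these values are an assumption); the requirement counts and the influence-factor total are illustrative:

    // Hypothetical Function Point calculation; weights and the 0.65 + 0.01 * sum
    // adjustment are the standard Albrecht/IFPUG values, not taken from the notes.
    import java.util.Map;

    public class FunctionPointSketch {
        // weight per category -> {simple, medium, complex}
        static final Map<String, int[]> WEIGHTS = Map.of(
                "input",     new int[]{3, 4, 6},
                "output",    new int[]{4, 5, 7},
                "query",     new int[]{3, 4, 6},
                "database",  new int[]{7, 10, 15},
                "reference", new int[]{5, 7, 10});

        public static void main(String[] args) {
            // steps i-iii: categorized and classified requirements entered into the sheet
            int unadjusted = WEIGHTS.get("input")[1]      // one medium input
                           + WEIGHTS.get("query")[0]      // one simple query
                           + WEIGHTS.get("database")[2];  // one complex database requirement
            // steps iv-v: influencing factors scored, then the adjustment applied
            int influenceSum = 30; // illustrative total of the scored influencing factors
            double adjusted = unadjusted * (0.65 + 0.01 * influenceSum);
            // step vi would look up man-months for this value in the empirical curve/table
            System.out.printf("Evaluated Function Points: %.1f%n", adjusted);
        }
    }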
7. Agile estimation techniques
   a. In the context of the Scrum framework
      i. Estimation of Story Points for each item in the product backlog
      ii. Estimation of days required for each item in the sprint backlog
      iii. User stories are the individual units of work with which software features are estimated
      iv. The user stories are managed by the Product Owner, while the actual estimation is done by the Scrum Team
   b. Planning Poker
      i. Idea: avoid "group-think" by having each team member come to their own conclusion independently, and then compare
      ii. After reviewing and discussing a particular feature to be implemented, all team members put a card on the table face-down with the number of days they think it will take to complete. Everyone turns their cards over at the same time, so no answers can be changed
      iii. This ensures everyone has a voice of equal importance, and the meeting is not dominated by the most talkative or the most influential members of the team
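A tiny sketch of one Planning Poker round as described above; the member names and card values are illustrative. The point is the ordering: every estimate is committed before any are revealed.

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class PlanningPokerRound {
        public static void main(String[] args) {
            // every team member puts a card face-down (no one sees the others' numbers)
            Map<String, Integer> faceDownCards = new LinkedHashMap<>();
            faceDownCards.put("Ana", 3);
            faceDownCards.put("Ben", 5);
            faceDownCards.put("Chris", 13);

            // all cards are turned over at the same time, so no answer can be changed
            faceDownCards.forEach((member, days) ->
                    System.out.println(member + " estimates " + days + " days"));

            // a large spread (3 vs. 13 days) would now trigger discussion and another round
        }
    }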
Technical Foundations of Business Information Systems

1. Architectural patterns
   a. An architectural pattern describes a particular recurring design problem that arises in specific contexts, along with a well-proven generic scheme for its solution. Some examples include...
      i. Pipes and filters architecture
      ii. Blackboard architecture
      iii. Broker architecture
      iv. Microkernel architecture
      v. Layered architecture
      vi. Tiered architecture
   b. Layered architectures define a logical partitioning of software components into "layers" of common scope to reduce overall complexity
      i. Strict layered architectures allow individual layers to access only components of the layer directly below
      ii. Open layered architectures allow individual layers to access components of all layers below
      iii. No layered architecture (strict or open) allows a layer to access components of a higher layer in the hierarchy
   c. Tiered architectures define a physical (read: involving separate hardware nodes) partitioning of software components into different process spaces of a system
      i. What differentiates process spaces is simply their set of responsibilities in the system
         1. A process space does not necessarily have to reside on its own hardware node
         2. Think of the client version of a piece of software running on perhaps millions of computers across the country, and the server version of that software residing in one (or many!) central computation centers. These represent two distinct process spaces. Note how the number of distinct hardware nodes plays no role in distinguishing the boundaries of the process spaces
         3. Another typical set of process spaces:
            a. Presentation tier
            b. Business logic tier
            c. Resource tier
      ii. Example: client-server architecture
         1. As mentioned previously, a very common example of a tiered (in this case two-tiered) architecture is the client-server architecture
         2. In this architecture, a client sends a service request to the server, and the server sends a response back
         3. Communication is always initiated by the client
         4. One server can serve many clients
         5. A server can itself be a client of another server
         6. Visual presentation is handled by the client tier. Business logic can reside on both/either client and/or server. Resource management is handled by the server tier
         7. The client-server architecture offers advantages over other tiered architectures in performance and ease of implementation
      iii. Example: three-tiered architecture
         1. Occasionally, to prevent the unnatural distribution of three responsibilities (presentation, business logic, resource management) onto two tiers, a third middle tier can be introduced
         2. This tier exclusively handles business logic. Presentation remains on the client tier, and resource management remains on the server tier
         3. This is the standard model for simple web applications:
            a. Client tier realized by HTML/CSS/JavaScript in a browser
            b. Middle tier realized by a web application server (see: Spring Boot "controller" classes)
            c. Server tier realized by a database management system
      iv. Higher-tiered architectures (four-tiered, n-tiered)
         1. The business logic of the middle tier is further split into separate tiers to reduce code complexity
         2. Offers greater protection and isolation of individual application processes
         3. Can enable many concurrent application processes
            a. Web Server
               i. Processes requests coming in over various network protocols like HTTP, HTTPS, FTP
               ii. Provides clients (web browsers, web clients) with content (HTML, CSS, JS, files, images)
               iii. Manages sockets, access control, cookies, script execution, caching
            b. Application Server
               i. Specialized web servers that specifically generate HTML responses from HTTP requests
               ii. Handles authentication, authorization, session management, database encapsulation, transaction processing, asynchronous communication (messaging)
            c. Database Server
               i. Software with the specific task of storing and managing control of persistent application data
2. Libraries vs. Frameworks
   a. A library is a reusable software component (typically consisting of multiple classes) that is designed to be used by other software through its API. The order in which the functions of the API are called is determined by the individual user of the library
   b. A framework is a partially finished generic software system designed to be augmented by application-specific code in order to create a complete software system with relative ease. A framework does not offer an API, but rather a basic architecture and a set of generic functions which can/must be overridden by the individual user
3. Inversion of control
   a. A program running within a framework no longer has the agency to decide which functions to perform when. This is instead handled by the framework (analogy: the implementing code is the puppet, and the framework is the puppeteer)
   b. The Hollywood Principle: "don't call us, we'll call you!"
      i. The framework is a director in Hollywood
      ii. The implementing code is the actor, who waits to be called by the director
4. Advantages and disadvantages of frameworks
   a. Faster development, more standardization, fewer errors
   b. More training required, language and environment strictly specified, creating frameworks themselves is hard, often difficult to combine multiple frameworks (incompatibility)
5. Dependency injection
   a. Dependency injection is a design pattern where an object receives the other objects that it depends on from a so-called injector, rather than fetching those objects itself
   b. Instead of the client object specifying the exact implementation to use, the injector makes this decision on its behalf
   c. This helps achieve a separation of concerns in the construction and use of objects, which can greatly improve code readability and reusability
   d. Spring Boot makes extensive use of dependency injection with its @Autowired annotation
6. Other annotations in Spring Boot
   a. @SpringBootApplication is the primary annotation denoting the main class of a Spring Boot project. It is equivalent to using the following three annotations in combination...
      i. @EnableAutoConfiguration will auto-configure the Spring modules with a default (opinionated!) configuration, which can be tweaked later. This is the "Boot" in Spring Boot. Without this, all configuration must be done by hand
      ii. @ComponentScan will recursively search the current package for configuration beans (see the @Bean annotation) and for classes marked with @Component or a derivative annotation (@Repository, @Service, etc.), allowing them to be injected with @Autowired
      iii. @Configuration allows adding extra Spring @Beans to the application context (example from the exercise group: marking the getter method for a new Argon2PasswordEncoder as a @Bean) and importing additional configuration classes
   b. @Component
      i. Generic annotation for any class that needs to be managed by Spring
      ii. Has multiple derivative annotations, including:
         1. @Repository for interaction with the database
         2. @Service for business logic and entity manipulation
         3. @RestController for managing a web API
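A minimal sketch tying these annotations together, with a hypothetical GreetingService (not from the lecture): component scanning finds both classes, and the injector supplies the managed service instance to the controller - the inversion of control described above.

    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.stereotype.Service;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RestController;

    @SpringBootApplication // = @EnableAutoConfiguration + @ComponentScan + @Configuration
    public class DemoApplication {
        public static void main(String[] args) {
            SpringApplication.run(DemoApplication.class, args);
        }
    }

    @Service // derivative of @Component: business logic, managed by Spring
    class GreetingService {
        String greet(String name) { return "Hello, " + name + "!"; }
    }

    @RestController // derivative of @Component: manages a web API
    class GreetingController {
        private final GreetingService service;

        // constructor injection: Spring passes in the managed GreetingService bean
        // (equivalent to marking a field with @Autowired)
        GreetingController(GreetingService service) { this.service = service; }

        @GetMapping("/greet")
        String greet() { return service.greet("SEBA"); }
    }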
Persistent Data Management

1. Motivation: impedance mismatch
   a. There exists a significant mismatch between data manipulation patterns in code (high-frequency iterations, with loops) and data manipulation patterns in relational databases (low-frequency bulk operations, with large, wide-reaching queries)
   b. When our objects in code must also be persisted in a database, we must find a way to eliminate this mismatch
      i. Not solving this problem would entail sending queries to the database in a loop, fetching just a single object at a time. This would cause a massive performance hit
2. Database transactions
   a. A transaction is a single logical unit of work, which cannot be split up into smaller units (it is atomic)
   b. Transactions adhere to the ACID paradigm
      i. Atomicity
         1. Cannot be broken down into subcomponents
         2. Transactions are executed in their entirety, or not at all
      ii. Consistency
         1. At no point in a transaction is inconsistent data stored in the database
         2. No possibility of a failure leading to corrupt data
      iii. Isolation
         1. Transactions have fully synchronized control of the database until their successful completion
         2. Two transactions cannot interfere with one another while running at the same time (which might otherwise lead to a race condition)
      iv. Durability
         1. Transactions have persistent effects on the data in the database
         2. Transactions cannot get "lost"
3. JDBC ("Java Database Connectivity")
   a. Enables interaction between Java code and a generic relational database (MySQL, SQLite, PostgreSQL, H2, etc.)
   b. Often too complex to handle directly in code (error-prone)
   c. Requires committing to a single persistence strategy and database schema. Changes are cumbersome and require extensive changes to the code
   d. It is generally considered (very) bad practice to integrate JDBC into business logic code
4. JPA ("Java Persistence API")
   a. A standard API for the management of persistent data in Java environments
   b. Only a specification! A concrete implementation must be chosen in order to use it (EclipseLink, Hibernate, OpenJPA, etc.)
5. Hibernate
   a. An object-relational mapping (ORM) framework for Java and relational database management systems
      i. Handles the translation of Java objects into database data, and vice versa
      ii. This translation occurs behind the scenes and is not visible to the programmer
   b. Full JPA support
   c. Provides an SQL-like query language called Hibernate Query Language (HQL)
      i. We did not use this in the exercise; instead we used standardized JPQL
   d. Serves as an abstraction above JDBC by wrapping JDBC connections in Session objects from a SessionFactory
      i. Managing these Session objects directly is surely better than raw JDBC, but it is still error-prone and leads to a lot of boilerplate code
6. Object serialization and deserialization
   a. Serialization is the process of converting an object and all its components into a byte or character stream that can be transmitted over the internet and reconstructed (deserialized) on the other side
   b. Not all objects can be serialized
      i. Threads (there is no meaningful definition of what it would mean to send a thread over the internet)
      ii. Sockets (same as above)
      iii. Attributes explicitly marked as transient (should not be serialized, e.g. passwords)
   c. The Serializable interface in Java enables this functionality
   d. (De)serialization is also referred to as (un)marshalling in the context of distributed information systems
7. Persisting entities
   a. In order to obey the JPA entity specification, a Java class that represents an entity to be persisted to the database must...
      i. be marked with the JPA annotation @Entity
      ii. have a default constructor with no parameters. If a custom constructor is added to the class, this default constructor must also be explicitly added (public Entity() {})
      iii. provide access to its attributes only through getters and setters
      iv. have no final attributes
      v. implement the Serializable interface (only if an entity instance is to be passed by value through a remote interface, i.e. sent over the internet. This was not the case in the exercise groups)
         1. Note from the editor: this requirement is redundant, as this is already the case for any Java object that you want to send over the internet. They included it anyway...
   b. Entities must also define...
      i. an id attribute marked with @Id to serve as its primary key in the database (can also be marked with @GeneratedValue if this id should be generated by the database automatically)
      ii. a table name (defaults to the name of the class, can be overridden with the @Table annotation)
      iii. column names for each of its attributes (default to the name of the attribute, can be overridden with the @Column annotation)
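A minimal sketch of an entity following these rules; the Rental entity and its attributes are illustrative (jakarta.persistence in Spring Boot 3+, javax.persistence in older versions):

    import jakarta.persistence.Column;
    import jakarta.persistence.Entity;
    import jakarta.persistence.GeneratedValue;
    import jakarta.persistence.Id;
    import jakarta.persistence.Table;

    @Entity
    @Table(name = "rentals") // optional: defaults to the class name
    public class Rental {

        @Id
        @GeneratedValue // primary key generated by the database
        private Long id;

        @Column(name = "start_date") // optional: defaults to the attribute name
        private String startDate;

        public Rental() {} // required default constructor with no parameters

        // access to attributes only through getters and setters
        public Long getId() { return id; }
        public String getStartDate() { return startDate; }
        public void setStartDate(String startDate) { this.startDate = startDate; }
    }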
8. Spring Data JPA
   a. Provides yet another layer of abstraction above Hibernate and JDBC by allowing seamless interaction with relational databases within Spring projects
   b. Repositories
      i. Repository code (to query the database and reconstruct Java objects from the data) can be generated automatically using Spring and the @Repository interface annotation
      ii. Method stubs within the @Repository interface receive auto-generated implementations at runtime, such that the interfaces can be @Autowired in other classes and used as if they were already complete implementations
      iii. Spring provides a set of pre-made repository interfaces with increasingly greater functionality, from which your repositories can inherit
         1. CrudRepository provides basic CRUD operations:
            a. Create entities
            b. Read entities
            c. Update entities
            d. Delete entities
         2. PagingAndSortingRepository
            a. Offers extra methods for pagination and sorting of queries
         3. JpaRepository
            a. Offers extra JPA-related methods for flushing pending tasks to the database and deleting records in a batch
         4. Update: this taxonomy has been extended as of Spring Boot 3.0
      iv. Any additional methods you may want to add to your repository interface can be automatically implemented by Spring as well (see the sketch after this section). The implementing query can be...
         1. derived from the method name
         2. specified manually using the @Query annotation and the SQL-like JPQL language defined in the JPA specification
            a. Most notable difference for basic SELECT statements: you must always use an alias! For example: "SELECT t FROM Table t (...)" rather than "SELECT * FROM Table (...)"
            b. Attributes of attributes are directly accessible (normally you would have to use a join). For example: "SELECT r FROM Rental r, Invoice i WHERE r.invoice.id = i.id AND i.isPaid = false"
            c. Query parameters are numbered and autofilled from the Java method signature (normally they would just be ? and would have to be populated manually). For example: @Query("SELECT r FROM Rental r WHERE r.id = ?1") public Rental findRentalById(int id); — here ?1 refers to the first parameter of the method
      v. To explicitly mark a repository as an abstract super-interface that does not correspond to a concrete database table, replace the @Repository annotation with @NoRepositoryBean. This is useful if you have multiple application-specific repositories with overlapping functionality
   c. Services
      i. A service wraps a repository for a specific entity and provides selective and customized access to it through delegation (this will be your primary interface between your database and your business logic)
      ii. Marked with the @Service annotation
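A minimal repository sketch for the Rental entity from the previous section; findRentalById is the @Query example from the notes, while findByStartDate is an illustrative derived query:

    import java.util.List;
    import org.springframework.data.jpa.repository.Query;
    import org.springframework.data.repository.CrudRepository;
    import org.springframework.stereotype.Repository;

    @Repository
    public interface RentalRepository extends CrudRepository<Rental, Long> {

        // implementation derived automatically from the method name
        List<Rental> findByStartDate(String startDate);

        // implementation specified manually in JPQL; ?1 is the first method parameter
        @Query("SELECT r FROM Rental r WHERE r.id = ?1")
        Rental findRentalById(long id);
    }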
9. Inheritance strategies
   a. There are multiple ways to represent object inheritance in a relational database
   b. Single Table Strategy
      i. @Inheritance(strategy = InheritanceType.SINGLE_TABLE)
      ii. Stores all information about the supertype and subtypes in one single table; the fastest inheritance strategy available, with easy primary key handling and good polymorphic query performance
      iii. Uses a discriminator column to describe the specific entity type of each individual row (e.g. "Dog", "Cat" in the table "Pets") (can be named with the @DiscriminatorColumn and @DiscriminatorValue annotations)
      iv. Results in a large number of columns that only apply to one specific subtype and are filled with NULL values for all other types
         1. As a result, NOT NULL constraints on subtype attributes are effectively impossible
   c. Joined Table Strategy
      i. @Inheritance(strategy = InheritanceType.JOINED)
      ii. Attributes of the supertype are stored in the supertype table, while subtype attributes are stored in separate tables for each individual subtype, along with the entity ID
      iii. Eliminates the NULL problem, but introduces the need to join supertype and subtype tables together in order to read complete data
   d. Table-per-Class Strategy
      i. @Inheritance(strategy = InheritanceType.TABLE_PER_CLASS)
      ii. Like Joined Table, but supertype attributes are now mirrored into each individual subtype table, so that every entity (both supertype and subtype) is stored entirely within its own table
      iii. Eliminates the join problem, but makes primary key handling much harder
         1. A complete list of primary keys is no longer stored within one table, but rather spread out over all subtype tables. This means all tables must be queried in order to generate the next primary key, or an external key management system must be devised
   e. Mapped Superclass Strategy
      i. @MappedSuperclass
      ii. Like Table-per-Class, but the supertype table is removed completely, leaving only the subtype tables with the included attributes from the supertype
      iii. Eliminates the primary key problem by generating IDs per table, but also prevents the use of polymorphic queries on, and relational associations with, the supertype table (as no such table exists anymore)
10. Persistence associations
    a. These association annotations are applied to the attributes of an entity class
    b. @OneToOne
    c. @OneToMany / @ManyToOne
       i. @OneToMany should be applied in the class with single multiplicity, on the attribute with n-multiplicity (one of this to many of those)
       ii. @ManyToOne should be applied in the class with n-multiplicity, on the attribute with single multiplicity (many of these to one of that)
       iii. The owner of this relationship (determined with mappedBy) should be the entity with n-multiplicity (one foreign key per database entry), NOT the entity with single multiplicity (many foreign keys per database entry)
    d. @ManyToMany
       i. Many-to-many relationships must be stored in a separate join table
       ii. mappedBy has the effect of eliminating one of the two join tables Spring Boot would otherwise create by default, containing the same data in a different order (e.g. Student_Lectures and Lecture_Students)
    e. These annotations can be modified with the "mappedBy", "cascade", and "fetch" attributes (see the sketch after this list)
       i. "mappedBy" is a String value which holds the name of the attribute in the other class in relation with this one. It has the effect of preventing Spring Boot from mapping this attribute to a foreign key column in the database, because the specified attribute in the other class already maps to such a foreign key, and adding another would be redundant
       ii. "cascade" is an enum value which determines which database actions performed on this entity should cascade down to the associated entity. For example, when a student is saved to the database, all of their devices are automatically saved too
          1. CascadeType.ALL - cascades all CRUD operations
          2. CascadeType.REMOVE - will cascade DELETE operations
          3. CascadeType.PERSIST - will cascade SAVE operations
          4. Default (no cascade attribute specified): no cascading
          5. ...
       iii. "fetch" determines how soon attribute entities should be loaded from the database
          1. FetchType.EAGER - loads attributes immediately
          2. FetchType.LAZY - loads attributes only when necessary (see lazy loading)
          3. Further details: https://www.baeldung.com/hibernate-lazy-eager-loading
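A minimal sketch of a bidirectional one-to-many association using the modifiers described above; Student and Device are the entities from the notes' cascading example, reduced to the bare minimum:

    import java.util.ArrayList;
    import java.util.List;
    import jakarta.persistence.*;

    @Entity
    class Student {
        @Id @GeneratedValue
        private Long id;

        // mappedBy: the foreign key lives in Device (the n-multiplicity side);
        // cascade: saving a student automatically saves their devices;
        // fetch: the devices are only loaded from the database when accessed
        @OneToMany(mappedBy = "owner", cascade = CascadeType.ALL, fetch = FetchType.LAZY)
        private List<Device> devices = new ArrayList<>();
    }

    @Entity
    class Device {
        @Id @GeneratedValue
        private Long id;

        @ManyToOne // owner of the relationship: maps to a foreign key column
        private Student owner;
    }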
11. Query languages
    a. JPQL
       i. Highly SQL-similar query language
       ii. Can be used in the @Query repository method annotation to manually define queries to be implemented
       iii. As mentioned before, SELECT statements always require an alias ("SELECT t FROM Table t" as opposed to "SELECT * FROM Table")
       iv. The "FROM" selection must always refer to a persistence entity class
       v. Attributes of attributes are directly accessible without joining
       vi. Can be parametrized with ?1, ?2, ?3; method parameters are matched with these wildcards in the same order they appear in the method signature (by default)
       vii. Downsides: query strings cannot be checked by the compiler, and long queries can become hard to understand. The desire for a type-safe way to construct queries arises
    b. Criteria API
       i. Introduces an object-oriented library for the creation of queries
       ii. The first two steps in query creation always look the same:
          1. Creation of a CriteriaBuilder object by calling .getCriteriaBuilder() on an EntityManager object
             a. The EntityManager is a JPA artifact which wraps a JDBC connection and is typically hidden behind the abstraction of Spring Boot. However, for this purpose it can be injected using the @PersistenceContext annotation
          2. Creation of a parametrized CriteriaQuery object by calling .createQuery(Entity.class) on your CriteriaBuilder object
       iii. The middle steps involve modifying your CriteriaQuery object ("query") with different query methods
          1. SELECTing a certain entity table
             a. Create a Root object table with query.from(Entity.class);
             b. Modify query with query.select(table);
          2. Adding a WHERE clause
             a. Create a Predicate object pred by calling one of CriteriaBuilder's methods that return a Predicate, such as .gt(), .equal(), .and(), .exists(), etc. (Expression objects for the method parameters can be obtained by calling table.get(String attributeName) on the Root of the correct table — this is not type-safe!)
             b. Modify the CriteriaQuery object with query.where(pred);
       iv. The last two steps in query creation (also) always look the same:
          1. Creation of a parametrized TypedQuery object by calling em.createQuery(query) on your EntityManager object, passing in your CriteriaQuery object
          2. Execute the query and get the result by calling TypedQuery#getSingleResult() -> Entity, TypedQuery#getResultList() -> List, or TypedQuery#getResultStream() -> Stream on your TypedQuery object
       v. Downsides of the Criteria API
          1. Still no complete type-safety, due to the reliance on Root#get(String), which cannot guarantee the attribute type (and is error-prone like JPQL)
          2. Lots of code for very simple queries
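The steps above, assembled into one minimal sketch; it finds all unpaid rentals, assuming the Rental entity has a hypothetical boolean attribute paid (in the spirit of the isPaid example earlier):

    import jakarta.persistence.EntityManager;
    import jakarta.persistence.PersistenceContext;
    import jakarta.persistence.TypedQuery;
    import jakarta.persistence.criteria.CriteriaBuilder;
    import jakarta.persistence.criteria.CriteriaQuery;
    import jakarta.persistence.criteria.Predicate;
    import jakarta.persistence.criteria.Root;
    import java.util.List;

    public class RentalCriteriaQueries {

        @PersistenceContext
        private EntityManager em; // injected JPA EntityManager

        public List<Rental> findUnpaidRentals() {
            CriteriaBuilder cb = em.getCriteriaBuilder();               // first steps
            CriteriaQuery<Rental> query = cb.createQuery(Rental.class);
            Root<Rental> table = query.from(Rental.class);              // SELECT ... FROM Rental
            Predicate pred = cb.equal(table.get("paid"), false);        // WHERE clause (not type-safe!)
            query.select(table).where(pred);
            TypedQuery<Rental> typed = em.createQuery(query);           // last steps
            return typed.getResultList();
        }
    }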
    c. Querydsl
       i. Provides a more functional, Stream-like method of creating queries based on a QueryFactory source object
       ii. Makes use of so-called Q-types to create an abstract persistence model above your entity classes
          1. Every entity class receives its own Q-type, which is a statically (before compilation) generated class to be used in Querydsl queries
          2. The name of an entity's Q-type is the name of the entity class with a "Q" prefix (by default)
          3. You do not need to know how to generate these Q-types for the exam; you can simply assume that they are available for use
       iii. The basic Querydsl syntax looks like this: queryFactory.selectFrom(QEntity object).where(Predicate...).fetch();
          1. QEntity objects can be found as public static attributes inside each Q-type class, named after the entity type. For example: QUser.user or QAccount.account
          2. Predicates can be acquired by accessing the public static attributes of a QEntity object and calling predicate methods on them. For example: QUser.user.name.eq("John") or QAccount.account.iban.eq()
          3. Alternatively to .selectFrom(), which will fetch all columns from the database and compile them into entity objects, .select(Attribute...).from(QEntity object) will only fetch certain attributes and return a List. For example: .select(QAccount.account.iban).from(QAccount.account)
       iv. Advantages of Querydsl
          1. More human-readable than the Criteria API
          2. Open-source
          3. Statically typed
          4. Type-safe queries
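The same unpaid-rentals query as a minimal Querydsl sketch, assuming a generated QRental Q-type (with a boolean path paid) and using JPAQueryFactory, one common QueryFactory implementation:

    import com.querydsl.jpa.impl.JPAQueryFactory;
    import jakarta.persistence.EntityManager;
    import java.util.List;

    public class RentalDslQueries {

        private final JPAQueryFactory queryFactory;

        public RentalDslQueries(EntityManager em) {
            this.queryFactory = new JPAQueryFactory(em);
        }

        public List<Rental> findUnpaidRentals() {
            // statically typed: QRental.rental.paid is checked by the compiler
            return queryFactory
                    .selectFrom(QRental.rental)
                    .where(QRental.rental.paid.eq(false))
                    .fetch();
        }
    }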
12. Alternatives for persistent and bulk data management
    a. File system
       i. .txt, .xml, other proprietary file formats
       ii. Advantages
          1. Universally available, very simple
       iii. Disadvantages
          1. Not extensible
          2. No multi-user access
          3. Limited scalability
          4. OS-dependent
          5. Not easily changeable
    b. XML files ("eXtensible Markup Language")
       i. Advantages
          1. Platform-independent (works everywhere)
          2. Extensible, flexible format
          3. Human-readable, self-describing
          4. Suitable for encapsulation of legacy applications
          5. Well-supported by database management systems
          6. Separates content from its presentation (unlike HTML)
          7. Supported by APIs for XML processing
          8. Uses application-specific tags
       ii. Disadvantages
          1. Not trivial in detail
          2. Does not provide direct support for references
          3. Limited to hierarchical nesting (like the file system)
          4. Supports updates only via load & save
    c. Content management system
       i. Ideal for storing lots of non-text-based files (photo, video, audio, etc.)
       ii. Multi-user support
       iii. Provides additional support for manipulating the visual representation of data irrespective of how it is stored in the database
          1. "Decoupling of information and representation"
    d. NoSQL database ("Not only SQL")
       i. Non-relational database model
       ii. Focuses on horizontal as opposed to vertical scalability (many small database servers rather than one large database server)
          1. Horizontal scaling immediately implies a distributed system
       iii. Schemaless - weaker restrictions on table attributes. Some entries in a table might have different attributes than the rest
       iv. Advantages
          1. Typically open-source
          2. Easier data replication due to the distributed nature
          3. Often provide a simple API
          4. Can handle read and write operations at a higher rate than relational database systems
       v. Disadvantages
          1. Cannot guarantee ACID properties due to the high level of distribution; can only be "eventually consistent"
          2. Weak consistency in and between tables
    e. Relational database (not an alternative, but included in the slides)
       i. Advantages
          1. Strict table schemata
          2. Can guarantee ACID transaction properties
       ii. Disadvantages
          1. Often show performance issues with data-intensive applications, since they can be optimized for either small, frequent transactions or large, infrequent transactions, but not both
    f. Native XML database
    g. Object database
13. Data validation
    a. Spring Boot supports entity attribute validation using annotations from (primarily) the jakarta.validation.constraints package (see https://jakarta.ee/specifications/bean-validation/3.0/apidocs/jakarta/validation/constraints/package-summary.html)
    b. Some validation cannot happen until an entity reaches the database
       i. The database will throw an exception if any table constraints are violated, for example if a unique attribute value is already present in the database
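A minimal sketch of attribute validation with jakarta.validation annotations; the Customer entity and its constraints are illustrative:

    import jakarta.persistence.Entity;
    import jakarta.persistence.GeneratedValue;
    import jakarta.persistence.Id;
    import jakarta.validation.constraints.Email;
    import jakarta.validation.constraints.Min;
    import jakarta.validation.constraints.NotBlank;

    @Entity
    public class Customer {
        @Id @GeneratedValue
        private Long id;

        @NotBlank // must not be null or empty
        private String name;

        @Email // must be a well-formed email address
        private String email;

        @Min(18) // violations are reported before the entity ever reaches the database
        private int age;
    }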
Architecture of Distributed Information Systems

1. Motivation
   a. Sharing resources among multiple distinct machines connected over a network (for example, all computers can access a single printer on the network)
   b. Highly extensible hardware and software
   c. Allows for a high degree of concurrency, with many independent users on the network
   d. High scalability to allow for greater workloads at the same speed
   e. Failure tolerance through redundancy
   f. Offers (but does not guarantee!) transparency in implementation
      i. Remote files behave like local files
      ii. The physical location of objects on the network is not disclosed, nor are location changes
      iii. Consistency of functionality despite concurrent interaction
2. False assumptions about distributed information systems
   a. The network is reliable
      i. You cannot assume that a message sent is a message received!
   b. The network is secure
      i. You cannot assume that nobody can steal or spoof your messages
   c. The network is homogeneous
      i. A system that works on Windows will not necessarily comply with macOS or Linux
   d. The topology does not change
      i. Active servers might be moved to a different country
   e. Latency is zero
      i. Latency exists! It makes a big speed difference whether you communicate one time or 100 consecutive times with the server, especially when the server is very far away
   f. Bandwidth is infinite
      i. Sending an entire serialized object across the internet, as opposed to just the necessary attributes, might not seem like a big deal in local testing, but it can cause major problems if thousands of users are doing so simultaneously
   g. Transport cost is zero
      i. High bandwidth load => high costs
      ii. Not everyone knows how to fix every problem - sometimes sending someone to a different location to fix an issue is necessary
   h. There is one administrator
      i. One person cannot be everywhere at once
3. Disadvantages of system distribution
   a. Increased complexity
   b. More vulnerable to attacks
   c. Hardware and software used in the system are no longer as uniform - lots of different hardware levels, software systems, operating systems
   d. Performance loss due to the network
   e. Much harder to debug and solve problems when they occur
   f. Therefore: more distribution is not always a good thing! Use as much as necessary, and as little as possible
4. Synchronous vs asynchronous communication
   a. Synchronous communication is characterized by lots of "idle waiting" - processes get blocked while awaiting a response from another node in the system and cannot do anything else in the meantime
   b. Asynchronous communication allows processes to continue working on other tasks while awaiting a response
5. Remote procedure calls
   a. Nothing more than a method call that must be sent over a network
   b. Calls (and their eventual responses) are marshalled (serialized) and unmarshalled (deserialized)
      i. On the side of the calling entity, this (un)marshalling is done by a "proxy"
      ii. On the side of the receiver, this (un)marshalling is done by a "skeleton"
6. Client and server communication
   a. There are multiple points of failure in client-server communication
      i. The message might be lost in transit to the server
      ii. The server might crash before sending a response
      iii. The message might be lost in transit back to the client
   b. What should a synchronous client do if one of these things happens? It cannot just sit there and wait forever. The answer: timeouts
      i. Wait a predetermined amount of time for a response, but stop waiting if none arrives
   c. What should the client do upon a timeout? Multiple ideas:
      i. Maybe semantics: do not send the request again. We do not know whether the request was received or not, but that's okay
      ii. At-least-once semantics: send the request as many times as it takes to get a response. We have to know that the request was received, even if it was actually received (and executed by the server) 300 times
      iii. At-most-once semantics: send the request as many times as it takes to get a response, but also mark each resend as a duplicate. We have to know that the request was received, but we also ensure that the server only has to execute the request a single time (if the server is smart enough to identify duplicates)
      iv. The "myth" of exactly-once semantics: this would be ideal, but is essentially impossible to achieve
         1. There is simply no way to always know what happened to a request you sent, and therefore no way to decide whether to send it again
   d. Communication through HTTP
      i. HTTP defines a set of basic request methods that correspond to the already-familiar CRUD operations
         1. Create => POST
         2. Read => GET
         3. Update => PUT
         4. Delete => DELETE
      ii. HTTP also defines a set of basic response statuses that enable clear, concise communication from the server back to the client
         1. Every status consists of a three-digit code and a brief textual phrase, for example "200 OK"
         2. Code ranges
            a. 100 - 199: Informational
            b. 200 - 299: Success
            c. 300 - 399: Redirect
            d. 400 - 499: Client error
            e. 500 - 599: Server error
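A minimal client-side sketch of these ideas using the JDK's built-in HttpClient (Java 11+); the URL is illustrative. If no response arrives within the timeout, the call throws an HttpTimeoutException, and it is then up to the caller to apply maybe, at-least-once, or at-most-once semantics:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.time.Duration;

    public class StatusCheck {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest
                    .newBuilder(URI.create("http://localhost:8080/api/accounts/1"))
                    .timeout(Duration.ofSeconds(2)) // stop waiting if no response arrives
                    .GET()                          // Read => GET
                    .build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode()); // e.g. 200 (success) or 404 (client error)
        }
    }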
7. JSON ("JavaScript Object Notation")
   a. Lightweight, text-based open standard designed for human-readable data interchange
   b. Derived from the key:value pair object syntax of JavaScript
8. Asynchronous messaging models
   a. Idea: introduce a persistent, intermediate message queue to enable indirect (not immediate) communication
      i. MOM ("Message Oriented Middleware")
      ii. Example: email
   b. Senders can "produce" messages into the queue for receivers to "consume"
   c. Different executions of the same idea
      i. Point-to-point communication
         1. One producer, one consumer
         2. Only after the receiver consumes a message does it acknowledge this to the queue, so that the message can finally be removed from the queue. No message is taken out of the queue before being successfully consumed!
         3. Example: direct messaging (person-to-person)
      ii. Publish-subscribe scheme
         1. One producer, many consumers
         2. Every message enqueued by the publisher gets forwarded by the queue to all of its subscribers
   d. Big advantages of such a system
      i. Decoupling of producer/consumer failures
         1. A producer can continue to send messages to the queue even if the consumer is offline
         2. A consumer can continue to read messages in the queue even if the producer is offline
      ii. The queue strategy can be freely adapted - FIFO, LIFO (maybe of limited usefulness), priority queue
      iii. Producers and consumers require no information about one another - the address of the queue is all that is needed
9. Implementation in Spring Boot
   a. Spring Boot supports the development of REST-compliant web applications (see the sketch after this section)
   b. A controller class can be marked with the @RestController annotation in order to mark it as a RESTful web application controller, and with the @RequestMapping(String path) annotation to set the API path this controller should be responsible for (for example "/api/accounts")
   c. API methods in this class should be marked with a mapping annotation corresponding to one of the aforementioned HTTP request methods
      i. @GetMapping for GET operations
      ii. @PostMapping for POST operations
      iii. @PutMapping for PUT operations
      iv. @DeleteMapping for DELETE operations
      v. These annotations optionally accept path strings which can more specifically define the path that a method is responsible for handling
         1. These paths can define path variables inside {}, for example {id}, which can then be accessed in the method parameters with the @PathVariable annotation
      vi. POST and PUT operations also accept a request body object, which can be reconstructed from a prespecified encoding format (JSON, XML, etc.) and passed to the method parameters as a fully-formed Java entity object
         1. Mark these parameters with @Valid to ensure all entity attribute constraints are fulfilled, and with @RequestBody to specify that this object should be parsed from the body of the request
      vii. Controller classes should have an autowired Service instance to which method calls can be delegated as necessary
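A minimal controller sketch along these lines; Account and AccountService are illustrative stand-ins, reduced to compilable stubs:

    import jakarta.validation.Valid;
    import org.springframework.stereotype.Service;
    import org.springframework.web.bind.annotation.*;

    record Account(Long id, String iban) {}

    @Service
    class AccountService {
        Account findById(long id) { return new Account(id, "DE00..."); } // stub
        Account save(Account account) { return account; }                // stub
    }

    @RestController
    @RequestMapping("/api/accounts") // this controller owns the /api/accounts path
    class AccountController {
        private final AccountService accountService; // autowired via constructor injection

        AccountController(AccountService accountService) { this.accountService = accountService; }

        @GetMapping("/{id}") // GET /api/accounts/42 -> {id} bound via @PathVariable
        Account getAccount(@PathVariable long id) {
            return accountService.findById(id);
        }

        @PostMapping // POST /api/accounts; the JSON body is reconstructed into an Account
        Account createAccount(@Valid @RequestBody Account account) {
            return accountService.save(account);
        }
    }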
10. Global error handling
    a. Spring Boot allows for the configuration of global (within the context of the API controllers) error handler classes using the annotation @ControllerAdvice
    b. Methods in such classes are marked with the annotations...
       i. @ResponseBody, to indicate that the return value of this method (usually a String) should be provided as-is in the response body sent back to the client
       ii. @ExceptionHandler(SpecificExceptionClass.class), to specify which exception(s) should be caught by this method
       iii. @ResponseStatus(HttpStatus.SOME_RESPONSE_STATUS), to specify what response status should be sent back to the client
    c. In addition, such methods should accept a parameter of the exception type specified in the @ExceptionHandler annotation

Security Engineering

1. Motivation
   a. Business applications often store sensitive data that should only be available to certain parties. This data needs to be protected from attacks
2. Core information security principles
   a. Confidentiality: data is only accessible to explicitly authorized parties
   b. Integrity: data cannot be unnoticeably altered (e.g. without a log entry)
   c. Availability: data is consistently available to authorized parties
3. Some additional information security principles
   a. Non-repudiation: the impossibility for a user to plausibly deny having performed an action on the system
   b. Auditability: the ability to reconstruct earlier states of the system (in some level of detail)
   c. Usability: the ease of observing the security requirements in place (requirements are not so hard to fulfill that users start to take shortcuts to get around them)
4. How much security is right for me?
   a. Since 100% security is impossible to achieve, some compromise must be made
   b. How much security is necessary depends on how much risk the system provider is willing to undertake
      i. Simple (weighted) risk analysis: what is the probability of a given thing being compromised, and how bad would it be if that happened?
5. Common vulnerabilities in web-based APIs
   a. OWASP publishes a list of the top 10 risks for web applications every 4 years
   b. Broken object-level authorization
      i. Authorization checks whether a user is logged in, but not as whom they are logged in
      ii. Fails to prevent a logged-in user from accessing another user's private information (e.g. by changing the user id in the URL)
   c. Broken function-level authorization
      i. Users gain access to administrative privileges because the server fails to check their role or fails to prevent them from changing their role (e.g. in an altered POST/PUT request)
6. Authentication vs Authorization
   a. Authentication: "Who am I?"
      i. Username and password
      ii. Certificate authentication (the user possesses a secure certificate file and the associated private key on their system, which grants them access)
      iii. Multi-factor authentication (biometrics, SMS messages, time-based codes, predefined security questions, possession of a device)
   b. Authorization: "What am I allowed to do?"
      i. Role-based authorization: users are categorized into different roles and inherit all permissions from their respective role(s)
      ii. Attribute-based authorization: users are authorized based on specific attributes they have (more fine-grained, more complicated to set up)
   c. Authentication is a prerequisite for authorization
7. Dealing with passwords
   a. Passwords should never be stored in plain text, but rather hashed with a secure algorithm (see the sketch after this section)
   b. Difference between encoding, encryption, and hashing
      i. Encoding is a reversible (not secure!) transformation of a string into another string (e.g. to avoid problems due to unsupported characters)
      ii. Encryption is a specific type of encoding which can only be reversed with an additional secure piece of information (a private key)
      iii. Hashing is a deterministic one-way transformation (reversal is infeasible)
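A minimal sketch of password hashing with Spring Security's PasswordEncoder, using the Argon2 encoder mentioned earlier in the exercise context (the defaults factory method assumes Spring Security 5.8+; the password is illustrative):

    import org.springframework.security.crypto.argon2.Argon2PasswordEncoder;
    import org.springframework.security.crypto.password.PasswordEncoder;

    public class PasswordDemo {
        public static void main(String[] args) {
            PasswordEncoder encoder = Argon2PasswordEncoder.defaultsForSpringSecurity_v5_8();
            String hash = encoder.encode("s3cret");              // one-way: the plain text cannot be recovered
            System.out.println(encoder.matches("s3cret", hash)); // true: verified by re-hashing, not reversing
        }
    }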
8. HTTP Basic Authentication
   a. In its default configuration, Spring Boot uses HTTP Basic Authentication, an encoding (not encryption or hashing!) method that translates username and password into a Base64 string and sends it over the HTTP authentication header
   b. This Base64 string must be sent with every request to the server (stateless authentication)
   c. In order for this authentication method to be secure (as Base64 itself provides no security), the information must be sent securely over the web (e.g. with HTTPS)
   d. Another limitation: HTTP Basic Authentication cannot be delegated to and managed by another system, which is often desirable
9. Non-stateless authorization
   a. Idea: remember who is logged in, so that users do not have to send their username and password with every request
   b. Server-side sessions and session cookies
      i. The server gives the client a session ID, which both will store and compare on later requests
      ii. This solves the problem of the client having to send their username and password every time, but interferes with statelessness on the server side (which is a principle of RESTful web APIs)
   c. Token-based authentication and authorization
      i. The server gives an authenticated client an authorization token, which can be used as proof of authorization and as an access right to certain functionalities in the system (exactly those functionalities that the client is authorized to use)
      ii. The client is no longer authorized directly, but rather via the token they are carrying (and provide with every request)
      iii. Tokens have a short lifetime and expire quickly, forcing the user to log in again and request a replacement. This prevents a would-be attacker from having access to a stolen token for very long
      iv. The server does not need to keep a log of all tokens it has given out: it simply reads a client's token and determines what access privileges it grants (security is provided by the signature of the token, which the server verifies was created using the secret key that only it knows)
      v. This enables server-side statelessness as well as non-stateless authorization on the client side
      vi. JWT (JSON Web Tokens) is the most common token standard in use today
         1. Describes a method of sending tokens through JSON objects
         2. Every token consists of...
            a. a header, which contains the token type and the hashing algorithm used later on in the token signature
            b. a payload, which contains multiple claims about the expiration date of the token and various permission-granting user attributes
            c. a signature, which is generated from the hash of the previous two components and the aforementioned secret key stored on the server. This ensures the integrity of the token
10. OAuth 2.0
    a. A protocol for delegating authorization to other services (Google, Apple, Facebook, etc.)
    b. It is not an authentication protocol - how the user is authenticated is outside the scope of OAuth
    c. Defines 4 main roles
       i. Resource Owner
          1. An entity capable of being authorized (such as an end user)
       ii. Resource Server
          1. The server with the resources the Resource Owner would like to be authorized to access
       iii. Client
          1. The application the Resource Owner is trying to log into (distinct from the Resource Server, which is where the actual information is stored!)
       iv. Authorization Server
          1. The third-party server which authorizes the Resource Owner to access specific information
11. Configuring Spring Security
    a. The Spring Security configuration can be adjusted by creating a web security class and marking it with @Configuration
    b. Define @Bean configuration methods to fit your security needs. Of particular importance to us is the method: public SecurityFilterChain filterChain(HttpSecurity http)
    c. Use http.httpBasic() to enable HTTP Basic Authentication, or http.formLogin() to enable a login splash screen upon visiting the site
    d. Use http.anyRequest().authenticated() to require every request to be authenticated
    e. Use http.authorizeRequests().requestMatchers(String path).hasRole(String role) to require every request to a particular path to be authorized with a particular role
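A minimal configuration sketch combining these calls; note that it uses the newer lambda-style DSL of Spring Security 6 (the method names in the notes reflect the older chained style), and the "/api/admin/**" path and "ADMIN" role are illustrative:

    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.security.config.Customizer;
    import org.springframework.security.config.annotation.web.builders.HttpSecurity;
    import org.springframework.security.web.SecurityFilterChain;

    @Configuration
    public class WebSecurityConfig {

        @Bean
        public SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
            http.authorizeHttpRequests(auth -> auth
                    .requestMatchers("/api/admin/**").hasRole("ADMIN") // path-specific role requirement
                    .anyRequest().authenticated())                     // every other request: must be logged in
                .httpBasic(Customizer.withDefaults());                 // enable HTTP Basic Authentication
            return http.build();
        }
    }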
12. Implementing users in the system
    a. To manage users in the system, Spring Boot offers the interfaces UserDetails and UserDetailsService (for fetching users by username) and UserDetailsManager (for all CRUD operations on users)
    b. User entity classes must also implement the UserDetails interface and override the required methods
       i. The method getAuthorities() should return a Collection (read: a list of permissions) in accordance with the user's role or access rights
    c. By-user API method authorization
       i. API methods also support Authentication objects in their parameters, which represent the authentication details of the specific user currently calling the API method. This can be used to selectively grant or deny access to a method

Workflow Management Systems

1. Motivation
   a. Business processes typically must follow a predefined order of operations involving multiple distinct systems in order to produce a result
   b. Idea: create a horizontal "helping" layer over these vertical systems to guide users along this order of operations and improve the efficiency of the process
   c. The people executing these business processes no longer need to perform menial, repetitive tasks like sending emails, forwarding documents, notifying coworkers, etc. They can spend more time on more useful tasks
      i. Comparable to a software framework (like Spring) that is extended and adapted by people instead of by more code
2. Classification of business processes
   a. Alignment along two factors
      i. Value to the company
      ii. Frequency of execution
   b. Production processes: high value, high frequency
      i. Core processes of the organization
      ii. Complex information processing
      iii. Hard to automate due to the high required semantic knowledge of multiple distinct applications
   c. Administrative processes: low value, high frequency
      i. Often repetitive, predictable
      ii. Simple rules, few variations
      iii. Easier to automate
   d. Collaborative processes: high value, low frequency
      i. Occasional complex tasks that require higher-level thinking
      ii. Not at the core of the organization like production processes
      iii. Hard to automate
      iv. E.g. designing a special (one-time) event, developing a new software system
   e. Ad hoc processes: low value, low frequency
      i. Mostly done manually
      ii. Do not follow a set scheme due to their low frequency
      iii. Tasks are unique, not repetitive
3. Normative processes
   a. See changes relatively infrequently
   b. Characterized by a strict schema - every step of the way is more or less planned out
   c. Examples
      i. Loan processing at a bank
      ii. Claims processing at an insurance company
      iii. A vehicle production line
4. Adaptive processes
   a. May have a clearly defined start and end point, but the inner steps are not clearly defined or are strongly dependent on the individual case
   b. Generally not easy to automate in whole
   c. Often called "data-intensive" or "knowledge-intensive" processes because of their high dependence on instance data and on specialized steps based on that data
5. Challenges in the realization of workflow management systems
   a. Misalignment of the business and technical domains
      i. Business processes are defined from a business-centric perspective which does not necessarily translate 1:1 into a technical implementation
      ii. A technical implementation may not fit into a business environment
   b. Security
      i. Users are hesitant to use the application because of potential data vulnerabilities
      ii. It may be difficult to integrate real-world security procedures in the first place
   c. Rapid development
      i. Changing requirements mean that the system needs to be flexible and not follow a completely hard-coded schema
   d. Integration of heterogeneous systems
      i. The various systems being joined in a workflow are not standardized, and a new interface must be designed for every interaction between them
   e. Different skills and backgrounds in the workforce
      i. People are typically only familiar with a single area of the system, but now must learn to interact with the other components as well
      ii. Requires communication and training