exam 2 part 1.pdf
Uploaded by AttentivePink
Lecture 14: Web Server Performance Notes
● Secure Web Communication
○ Public Key Cryptography
■ What can happen to the key fob for my car if I give it to a friend?
● Even if he doesn't make copies, somebody can steal the key I gave him and drive away with the car
■ The minute I share my private key, I lose control of the key
■ The private key you keep; the public key is for the public.
■ For authentication: the sender creates both keys
■ Ensure nobody else receives the private key
■ For privacy: the receiver creates the keys
■ Private key encryption: sender and receiver share one private key
■ Public key encryption: for authentication, the sender creates both keys
■ For privacy, the receiver has the private and public keys
■ 3 researchers at MIT invented the first practical public key encryption algorithm
■ The most popular algorithm for public key encryption (for authentication) is the RSA algorithm, named after them (Rivest, Shamir, Adleman)
■ Professor Adleman coined the term "computer virus"
■ RSA is one of the first practical public-key cryptosystems and is widely used for secure data transmission
■ A user of RSA creates two large prime numbers and then publishes their product (together with a public exponent) as the public key; the primes themselves stay secret. The keys are typically 1024 bits or more, and there are a few more steps involved
■ Determining the private key from the public key involves factoring very large numbers, and there is no known efficient algorithm for factoring large numbers
○ Hash Functions
■ A hash function (or hash algorithm) is a function that maps a domain of values into a range of numbers. It is used to map a very large object (e.g. an mp4 file) to a small value that is, in practice, unique to that object: the bits of the source file go through the hash function and the output is a small hash
■ Two well-known cryptographic hash functions, both used today, are MD5 and SHA
■ H(X) is called the message digest (or digital fingerprint) of X under the hashing algorithm H.
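A minimal sketch of the hash-function idea above, using Python's standard hashlib module. The sample messages and the helper name `digest` are made up for illustration, and SHA-256 is picked here simply as one member of the SHA family; the notes do not say which variant the lecture had in mind.

```python
import hashlib


def digest(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex message digest H(X) of data under the named algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


# Hypothetical sample messages: the second differs only in its last character.
original = b"Lecture 14: Web Server Performance"
altered = b"Lecture 14: Web Server Performancf"

# A large input still maps to a small, fixed-size digest.
large_object = b"x" * 1_000_000

print(digest(original))
print(digest(altered))
print(len(digest(large_object)))
```

The two printed digests look unrelated even though the inputs differ by one character, which is the property the message-digest check relies on.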
■ Property of a good hash function:
● Collisions: the probability of collisions is low, i.e. the probability that 2 different files produce the same hash is low
● SHA and MD5 were designed to have low collision probability (practical collisions have since been found for MD5 and SHA-1, so SHA-256 is preferred for new work)
● Super fast hash functions
○ Bulk cipher methods
■ Public/private key encryption methods are not suitable for general purposes, e.g.
● The RSA method can only encrypt blocks of data no larger than the key size minus 11 bytes, and each decryption involves complex mathematical calculations
■ Therefore, secure communication on the web uses a combination of public key encryption and conventional symmetric (shared-key) ciphers
■ A bulk cipher is one in which the same key is used to encrypt and decrypt the data; bulk ciphers are fast and can encrypt files of any size
■ Some sample bulk ciphers: RC2, RC4-40, RC4-56, DES40-CBC. The number is the key length in bits; the longer the key, the harder the cipher is to break
○ Making sure a message has not been altered: Message Digest
■ Generated in the frontend, before encrypting the resource
■ A message digest is the number produced by applying a cryptographic hash function to a message
● The message digest is included along with the message; here are the steps that are followed:
● 1. Sender produces a message digest using a known hashing algorithm
● 2. Message digest is encrypted and sent with the message
● 3. Receiver decrypts the digest and then computes the message digest from the actual message to make sure they are identical
■ For greater security, the message can also be encrypted
■ Systems combining public key cryptography and message digests are called digital signatures
■ If one bit is changed, the computed digest no longer matches the one that was sent
○ Certificate Authority
■ Encryption of data does not solve the entire problem; how do we guarantee that the organizations that we are dealing with are legitimate?
■ A certificate authority (CA) is an organization that both parties involved in a secure communication trust
■ The role of the CA is to verify the identity of an entity (client/server)
● Once the CA verifies the entity, it issues a digitally signed electronic certificate, signed with the CA's private key
■ Web browsers are usually pre-configured with a list of CAs that are trusted
■ Proving you are who you say you are: that is what the CA is for
■ You can become your own CA using OpenSSL. Why would you be your own CA? For a 1-to-1 transaction between yourself and no one else
■ A certificate authority issues certificates for public keys; the matching private key stays with its owner
○ Secure Sockets Layer Protocol and HTTPS
■ Given public key encryption, cryptographic hashing, message digests and digital certificates, how are these put together to produce secure electronic commerce?
■ The answer: SSL, a protocol for establishing an encrypted link between server and client that uses authentication and encryption of transactional data
■ Netscape was the original designer of SSL
■ TLS (Transport Layer Security) replaced SSL as the protocol, although the name SSL is still often used for it
■ SSL is transparent to users, except for the https (rather than http) that appears in the URL
■ The SSL protocol fits between the TCP layer and the HTTP layer
● Therefore, SSL can be used to encrypt other application-level protocols such as FTP and NNTP
■ SSL is supported by all major browsers
■ 7 steps (the handshake) happen before any data is transmitted
■ SSL Labs: give them a domain and they run tests against the site and give a score showing how secure the site is
■ Introduction to SSL
● Secure Sockets Layer (SSL) provides end-to-end security between client and server
● Authentication of both parties is done using digital certificates
● Privacy is maintained using encryption
● Message integrity is accomplished using message digests
● SSL for HTTP is referred to as HTTPS and operates on port 443
Web Server Performance
○ Popular platforms
■ Selecting a Web Platform
● Capacity: what capacity
is needed from the server, databases, applications
● Cost/investment: what are the initial costs and the continuing costs?
● Maintenance: who will perform it, and how complex is it?
● Security: a strategy is needed
● Development support: are there staff to support application development?
■ Popular platforms
● Microsoft
○ Available on cloud
● Linux
○ Customized by the cloud providers: GCP and AWS have their own versions of Linux
○ Red Hat
○ Available on cloud
○ Open source
○ Oracle, MySQL
○ Java servlets
● UNIX
○ Oracle Solaris
○ Oracle WebLogic
○ Financial transactions
■ Estimate server performance requirement
● What is the time required to deliver a request to the server?
● What is the time required to obtain the result?
● What is the time to deliver the result?
● E.g., 120 clients connecting twice each minute implies 240 connections per minute, or 4 per second. If each client sends 1 Kbyte of data and receives 15 Kbytes back, each connection moves 16 Kbytes, so the server needs to support 4 × 16 = 64 Kbytes per second, or 512 Kbits per second, roughly 0.5 megabits per second.
● General rule: the link should have at least twice the bandwidth of the average above
■ Web server farms
● Multiple server machines and load balancing hardware that distributes web requests across the servers
● Hot standby: the hot server is the one that is live; each insert into a table is also copied to a second table that is kept ready to take over
● If storage is shared across all servers, then there is a single point of failure
■ Why is load balancing needed?
● DNS redirection is a simple solution: round robin sends each client to the next server, but some servers will end up under- or over-utilized
○ This form of load balancing (DNS redirection) has problems because
■ web browsers will cache the IP address for a given domain
■ some operating systems cache IP addresses for given domains
● However, some DNS servers use algorithms other than round robin, e.g.
■ load-balancing: they check the load on many web servers and send the request to the least loaded
■ proximity-routing: they send the request to the nearest server, when the servers are geographically distributed
■ fault-masking: they check for web servers that are down and avoid them
● Load balancing hardware exists to prevent requests going to servers that have failed
○ Web Server Farms
○ Load Balancing
■ Switches
■ DNS redirection
Web Server as a Proxy Server
○ The more traffic, the more RAM Apache needs; if you don't have the memory, everything slows down.
○ More connections means more traffic. The measure is how many HTTP requests you can respond to per second: Apache's ability to respond to requests goes down as the number of connections grows, so Apache struggles to support high traffic
○ NginX has been used more in the last few years because it handles this better: its memory usage does not climb as connections increase, and it keeps responding to requests as the connection count grows
○ Lighttpd is second to NginX
○ Apache is the weakest of the three under high load
○ If you need speed, use NginX
○ Improving Apache web server performance
■ Additional RAM is the obvious fix, but it is costly
■ Direct modules can help: they are embedded within the binary
■ Use NginX as the reverse proxy and Apache as the app server (tier 2); route traffic to the proxy instead of the actual application server
○ Proxy Server: An intermediary server that accepts requests from clients and either forwards them or services the request from its own cache.
■ A proxy server acts as a server to the client that makes a request, and it acts as a client to the servers it connects to
■ A proxy server can monitor all client requests to other servers
■ On the client side, it is called a "forward proxy"
● Use a proxy server on the client side
● Why use a forward proxy server:
○ Runs on client side
○ Prevent access to restricted sites (blacklisting)
○ Control access to a restricted site:
the proxy server can request a name/password
○ To improve performance by maintaining a cache
○ To enhance security by controlling which application-level protocols are permitted
○ To act as an anonymizer, removing identifying information from HTTP messages: some headers are stripped and only what is needed is left, so nothing extra is told to the server side
Caching
■ Get something from the cache or from the server side
■ Through the
■ Validation is used by servers and caches to communicate when an object has changed.
● Caches avoid having to download the entire object when they already have a copy locally but are not sure if it is still fresh.
■ The most common validator is the Last-Modified time.
■ HTTP 1.1 introduced a new kind of validator called the ETag.
● ETags are unique identifiers that are generated by the server and changed every time the object does.
■ Almost all caches use Last-Modified times and ETags in determining if an object is fresh.
■ Most modern Web servers will generate both ETag and Last-Modified validators for static content automatically
■ Last-Modified: header used by the server
■ If-Modified-Since: header used by the client side
HTTP Headers
● HTTP headers give you a lot of control over how both browser caches and proxies handle your objects.
○ They can't be seen in the HTML and are usually automatically generated by the Web server. However, you can control them to some degree, depending on the server you use.
● HTTP headers are sent by the server in front of the HTML, and only seen by the browser and any intermediate caches.
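A rough sketch of how the validators from the Caching notes could be checked on the server side. The function name `validate`, the sample ETag value and the timestamp are hypothetical. If-None-Match is the standard request header that carries a cached ETag back to the server; it is not named in the notes, so treat its use here as an addition. This is not how any particular web server implements validation.

```python
from email.utils import formatdate, parsedate_to_datetime


def validate(request_headers: dict, etag: str, last_modified_ts: int) -> int:
    """Return 304 if the client's cached copy is still fresh, else 200."""
    # ETag validator: the client echoes the ETag it cached in If-None-Match.
    cached_etag = request_headers.get("If-None-Match")
    if cached_etag is not None:
        return 304 if cached_etag == etag else 200

    # Last-Modified validator: the client sends the date it cached.
    cached_date = request_headers.get("If-Modified-Since")
    if cached_date is not None:
        cached_ts = parsedate_to_datetime(cached_date).timestamp()
        return 304 if last_modified_ts <= cached_ts else 200

    # No validator sent: the full object has to be returned.
    return 200


# Hypothetical resource state on the server.
resource_etag = '"v1"'
resource_ts = 1_700_000_000
http_date = formatdate(resource_ts, usegmt=True)  # value for the Last-Modified header

print(validate({"If-None-Match": '"v1"'}, resource_etag, resource_ts))
print(validate({"If-Modified-Since": http_date}, resource_etag, resource_ts + 60))
```

A 304 response carries no body, which is how the cache avoids downloading the entire object again.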
Lecture 15: Web Services and REST Notes
● Web Services is the idea of offering the capabilities/information of a web site via a programming interface, so application programs can more readily access the information on the site
● Web Services are APIs for accessing a website's information across the Internet
● Big Web services use complex APIs: SOAP (Simple Object Access Protocol), based on XML and definitions. Which companies need XML security and XML encryption in their transactions? Financial companies: anything that involves money needs to be protected, and XML security and encryption will do that.
● Implementation of web services is divided into 3 categories:
○ Big web services
○ REST (Representational State Transfer) services, which use the HTTP methods PUT, GET, POST and DELETE; server-to-client communication, not server-to-server
○ Cloud services, which provide cloud storage, application hosting, content delivery, and other hosting services
● All three types of Web Services provide access through APIs.
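To make the difference between a Big (SOAP) web service and a REST service concrete, here is a small sketch that builds the same request both ways. The service namespace, the `GetPart` operation and the example.com URL are invented for illustration; only the SOAP 1.1 envelope namespace is a real constant.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.com/parts"  # hypothetical service namespace


def soap_request(part_id: str) -> str:
    """Wrap a hypothetical GetPart call in a SOAP envelope (XML)."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}GetPart")
    ET.SubElement(call, f"{{{APP_NS}}}PartId").text = part_id
    return ET.tostring(envelope, encoding="unicode")


def rest_request(part_id: str) -> tuple:
    """The REST equivalent: an HTTP method plus a URL naming the resource."""
    return ("GET", f"http://example.com/parts/{part_id}")


print(soap_request("00345"))
print(rest_request("00345"))
```

The SOAP version needs an XML library on both ends just to pack and unpack the call, while the REST version is an ordinary HTTP request.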
REST Services
○ Many web sites are now offering their facilities through REST Web Services
○ Access is provided using one or both of these methods:
■ Direct URL, returning a response in one or more formats (XML, JSON, PHP)
■ Library-based APIs, embedded in JavaScript, Java, C#, Objective-C and other source and binary library formats
○ Many of these services now require or include OAuth user authentication
■ OAuth is a standard for clients to access server resources on behalf of a resource owner
■ Better to use library-based APIs, which do the authentication for you
○ Free up to a point: there is a limit on the number of free calls per day, so if your site becomes popular the services will no longer be free
Cloud Services
○ Application hosting: GCP, Azure, AWS
○ Backup and storage: use the cloud for backup
○ Content delivery: Netflix uses AWS
○ Ecommerce: Amazon
○ Media hosting
○ DNS protection services (Cloudflare): protection against DDoS attacks
○ Access is provided using one or both of these methods:
■ Dashboard
■ Library-based APIs, embedded in Java, C#, Objective-C and other binary library formats
○ All these services are commercial services that require monthly payments
○ The consumer cloud services provide limited, free basic storage
REST (Representational State Transfer)
○ REST is a style of software architecture for distributed hypermedia systems
■ Initially proposed by Roy Fielding in a 2000 doctoral dissertation (remember which RFC he was involved with? The HTTP specification, with Tim Berners-Lee)
● He co-founded the Apache HTTP Server Project.
● The World Wide Web is an example of REST
○ There are three fundamental aspects of the REST Design Pattern
■ 1. client, 2. servers and 3.
resources
■ Resources are typically represented as documents
■ Systems that follow Fielding's REST principles are often referred to as RESTful
● Every entity is a resource: the name after the /
● URLs uniquely identify the resource
● Simple operations (PUT, GET, POST, DELETE) invoke what you can do on the resource
REST vs Other Approaches
○ REST
■ Software architectural style for distributed hypermedia systems like the WWW
■ Quickly gained popularity through its simplicity
■ Became the norm because it is simple
○ SOAP
■ Protocol for exchanging XML-based messages, normally using HTTP
■ Much more robust way to make requests, but more robust than most APIs need
■ More complicated to use
● https://www.w3schools.com/xml/xml_soap.asp
■ It is very complex and requires libraries on both ends
○ XML-RPC (remote procedure call)
■ RPC protocol with XML as an encoding and HTTP as a transport
■ More complex than REST but much simpler than SOAP
■ Supported by Python:
● https://docs.python.org/3/library/xmlrpc.html
■ This is the middle ground: not as complex as SOAP but not as simple as REST
■ Runs a remote function on the server machine
○ JSON-RPC
■ RPC protocol encoded in JSON instead of XML
■ Very simple protocol (and very similar to XML-RPC)
REST as Lightweight Web Services
○ Much like Web services, a REST service is:
■ Totally platform-independent: you don't care if the server is Unix or if the client is a Mac or anything else
■ Language-independent (C# can talk to Java, etc.)
■ Standards-based (runs on top of HTTP)
■ Usable in the presence of firewalls (ports 80/443 are normally left open)
○ Like Web Services, REST offers no built-in security features, encryption, session management, QoS guarantees, etc. But also as with Web Services, these can be added by building on top of HTTP:
■ For security, username/password tokens are often used.
■ For encryption, REST can be used on top of HTTPS (secure sockets).
○ Stateless: Each HTTP request from a client to server must contain all the information needed to understand and complete the request. The server does not store any state about the client session on the server side.
More complex REST requests
○ REST can easily handle more complex requests, including multiple parameters.
○ All types of HTTP requests: GET, POST, PUT, PATCH, DELETE, HEAD, OPTIONS
○ Rarely used: LINK, UNLINK, PURGE
○ As a rule,
○ 1. GET requests should be for read-only queries; they should not change the state of the server and its data. Listing items is an example
○ 2. For creation and updating, use POST requests. POST can also be used for read-only queries, as noted above, when complex parameters are required.
○ 3. PUT and DELETE are also used for updating and deleting items.
Amazon Associates web services
○ Amazon offers web services to 3 types of users:
■ Associates: third-party site owners wishing to build more effective sponsored affiliate links to Amazon products, thus increasing their referral fees
■ Vendors: sellers on the Amazon platform looking to manage inventory and receive batch product data feeds
■ Developers: third-party developers building Amazon-driven functionality into their applications
○ Developers can build businesses by creating Web sites and Web applications that use Amazon products, charging and delivery mechanisms
Apple iCloud for developers
○ Apple's iCloud service places all information captured on any Apple device into the cloud, making it immediately available to all other Apple devices
○ 5 GB for free
REST Best Practices
○ Provide a URI for each resource that you want exposed.
○ Try not to use name-value pairs such as "id=xxx"; use planes/747
○ As a corollary to (2), use nouns in the logical URI, not verbs. Resources are "things", not "actions"
■ Use nouns, not verbs
○ Make all HTTP GETs side-effect free. The actions are the HTTP actions.
○ Minimize the use of query strings.
For example, prefer http://www.parts-depot.com/parts/00345 over http://www.parts-depot.com/parts?part-id=00345
○ Always implement a service using HTTP GET when the purpose of the service is to allow a client to retrieve a resource representation, i.e., don't use HTTP POST
■ Make sure GETs are read-only
■ POST is used to add stuff
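The best practices above (noun-based URIs, side-effect-free GET, POST to add) can be illustrated with a tiny in-memory dispatcher. This is only a sketch under stated assumptions: the class name `PartsService`, the stored part data and the ID numbering are all hypothetical, and no real HTTP server is involved.

```python
class PartsService:
    """Toy REST-style handler for the /parts resource (in-memory, no network)."""

    def __init__(self):
        self._parts = {"00345": {"name": "widget"}}  # hypothetical sample data
        self._next_id = 346

    def handle(self, method, path, body=None):
        """Return (status_code, payload) for a method and a noun-based path."""
        segments = [s for s in path.split("/") if s]
        if not segments or segments[0] != "parts" or len(segments) > 2:
            return 404, None

        if method == "GET":
            # GET is read-only: it never modifies self._parts.
            if len(segments) == 1:
                return 200, sorted(self._parts)
            part = self._parts.get(segments[1])
            return (200, dict(part)) if part is not None else (404, None)

        if method == "POST" and len(segments) == 1:
            # POST adds a new resource and reports its identifier.
            new_id = f"{self._next_id:05d}"
            self._next_id += 1
            self._parts[new_id] = dict(body or {})
            return 201, new_id

        if method == "DELETE" and len(segments) == 2:
            if self._parts.pop(segments[1], None) is None:
                return 404, None
            return 204, None

        return 405, None


service = PartsService()
print(service.handle("GET", "/parts/00345"))
print(service.handle("POST", "/parts", {"name": "gear"}))
print(service.handle("GET", "/parts"))
```

The resource is addressed as /parts/00345 rather than /parts?part-id=00345, and the action comes from the HTTP method rather than from a verb in the path.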