Multimedia

Dr./ Ahmed Mohamed Rabie
Summary
This document provides an overview of multimedia, highlighting its use of multiple media components like text, audio, images, and video. It also touches on the role of computers in multimedia applications and their increasing importance in areas such as communication and education.
The word multimedia, originating from the Latin words "multum" and "medium," means a combination of multiple media contents. It is a technology that, even today after two decades of explosive growth, means different things to different people.

Multimedia is a technology that allows us to present text, audio, images, animations, and video in an interactive way. It has created a tremendous impact on all aspects of our day-to-day life, and it has the potential to continue to create ever more fascinating applications.

The key phrase in this definition is "digital processing device." It was the digital computer and its many variants, such as tablets, smartphones, and PDAs, that transformed tradition and produced "new media." The computer displaced traditional techniques for creating and editing all forms of media.

The multimedia revolution is not just about performing traditional tasks in new ways. It is also about creating new approaches to communication, commerce, education, and entertainment. Cell phones have become text messengers, cameras, and video displays. E-commerce gives shoppers instant access to countless products and services, complete with pictures, demos, reviews, and price comparisons.

New forms of entertainment, such as podcasts, video games, online poker tournaments, and interactive film, have transformed that industry as well. In these cases, and in many more, digital multimedia is changing the world by making it possible for users to interact with information in new ways.

A major component required in multimedia applications is a computer with high processing speed and large storage capacity. Hardware cost is decreasing at a rate never seen before, along with a rapid increase in storage capacity, computing power, and network bandwidth.
The integration of multimedia technology into the communication environment has the potential to transform an audience from passive recipients of information into active participants in a media-rich learning process.

Within a short span of time, personal computers proved to be an appropriate medium to deliver full-screen, full-motion video with their innovative video compression technologies. The Apple computer was one of the early computers equipped with these multimedia features.

A good multimedia application is one that keeps the technology invisible to the user. The purpose of a good multimedia presentation is to envelop the viewer with rich text, clear sound, sharp images, and smooth motion that can be stopped, started, and cross-referenced with ease.

Multimedia may be divided into the following three categories based on function and organization: linear and non-linear; interactive and non-interactive; and real-time and recorded.

Linear content progresses without any navigation control for the viewer, such as a cinema presentation. Non-linear content offers the user interactivity to control progress, as in a computer game or in self-paced computer-based training.

In non-interactive multimedia the user has no control over the flow of information. The developer establishes a sequence of media elements and determines the manner in which they will be presented. An information kiosk at a museum might regularly repeat a series of slides describing the day's events. Such applications are often a simple and effective way to draw attention to announcements, products, or services without requiring any action on the part of the viewer.

In interactive multimedia users are able to control the flow of information. There are several types of interactive multimedia. The first provides basic interactivity.
Basic interactions include menu selections, buttons to advance screens, VCR-like controls, clickable objects, links, and text boxes for questions or responses.

Hypermedia is a more advanced form of interactive media in which the developer provides a structure of related information and the means for a user to access that information. An online anatomy tutorial, for example, organizes information based on physiological relationships and may enhance a user's understanding through hyperlinks to related text, drawings, animations, or video. The term "rich media" is synonymous with interactive multimedia.

Another powerful form of multimedia interactivity is found in advanced simulations and games that create their own virtual reality. Virtual realities are not simply responsive to users; they are immersive. An immersive multimedia application draws its users into an alternate world, engaging them intellectually, emotionally, and even viscerally. Advanced flight simulators so thoroughly immerse pilots in a world of virtual flight that they routinely serve as substitutes for training in actual aircraft. Video games can draw players into other worlds for hours or even days on end.

Multimedia presentations can be live (real-time) or recorded. A recorded presentation may allow interactivity via a navigation system. A live multimedia presentation may allow interactivity via interaction with the presenter or performer.

Multimedia Authoring: Multimedia authoring tools are by far the most versatile and have many interactive controls, allowing the user to develop complete multimedia applications, from simple ones (e.g., slide show presentations) to the most complex (e.g., computer games or interactive computer-aided learning applications). Most authoring tools provide a WYSIWYG (what you see is what you get) environment or a timeline-based environment.

Programming and scripting languages are also supported for designing customized and advanced scenes.
Other important features include exporting developed projects as self-executing and self-installing files to CD or DVD recording media. Some well-known authoring tools are Macromedia Director, Authorware, Flash, HyperCard, HyperStudio, and IconAuthor.

Multimedia networks, as the name indicates, are combinations of two basic technologies: networking and multimedia computing. A multimedia network is a system of connected nodes made to share multimedia data, hardware, and software. Multimedia networking started placing continuous demands on the network infrastructure and was at odds with packet switching and LAN technologies.

Lately, asynchronous transfer mode (ATM) has been developed to accommodate real-time multimedia application issues, especially the delay and jitter problems. By using a 53-byte standard cell size to carry voice, data, and video signals, the delay problems can be avoided. ATM can also switch data in hardware, which is more efficient and less expensive. Different traffic types have been defined in ATM, each delivering a different quality of service (QoS).

One of these traffic types, known as constant bit rate (CBR), is the most suitable for multimedia applications. CBR supplies a fixed-bandwidth virtual circuit that takes care of delay-sensitive multimedia applications, which may contain real-time video and voice. ATM also provides low latency, high throughput, and scalability, which make it a network of choice for supporting new high-bandwidth multimedia applications as well as LAN and TCP/IP traffic. ATM speeds are scalable and can exceed 2.5 Gbps over fiber.

Corporate Information Superhighway (COINS): Though with the advent of the Internet the usage of multimedia applications has reached more people, overcoming bandwidth limitations remains a challenge for some time to come. COINS is a globally connected, fast, efficient, cost-effective, high-capacity multimedia network that supports multimedia applications.
It is based on a fiber-optic backbone with a capacity of up to 10 gigabits per second to transmit voice, video, data, and images. COINS offers seamless internetworking, vertical integration, electronic home banking, and electronic commerce. It also provides security and reliability, and is extremely cost-effective. The quest for building an information superhighway, that is, setting up a high-capacity telecommunications network that can carry vast amounts of digital binary data, is still on.

Multimedia Wireless Networking: In addition to supporting all existing applications, emerging wireless systems will bring a new generation of wireless multimedia applications. The Nippon Electric Company (NEC) Corporation developed one of the early wireless transmission technologies based on the IEEE 1394 high-speed serial bus, capable of 400 megabits per second at transmission ranges of up to 7 meters through interior walls and up to 12 meters by line of sight, which brought multimedia home networking another step closer to reality.

IEEE 1394 is well suited to multimedia networking in homes. It can connect up to 63 devices at a bandwidth of up to 400 Mbps and enables a variety of graphics, video, computer, and other data to use the network simultaneously. The development of wireless IEEE 1394 networking technology now allows for creativity in homes without the hassle of installing new wiring. Some recent research works in the multimedia networking area are listed in the references.

1- Real-Time Transport Protocol (RTP): It is used on the Internet for transmitting real-time data such as audio and video. RTP does not have a dedicated TCP or UDP port to communicate on. It runs over UDP via an open port (generally an even one in the range 16384 to 32767), and the next higher (odd) port is used for the RTP control protocol (RTCP).

UDP cannot detect packet loss or restore packet sequence; RTP recovers from these problems using sequence numbers and timestamping.
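The sequence-number and timestamp fields mentioned here live in the 12-byte fixed RTP header defined in RFC 3550. As an illustrative sketch (not part of the original text), the header can be unpacked in Python; the sample packet bytes below are made up for the example.

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Parse the 12-byte fixed RTP header (RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("packet too short for an RTP fixed header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version":   b0 >> 6,           # should be 2
        "padding":   bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker":    bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,         # lets the receiver detect loss/reordering
        "timestamp": timestamp,         # lets the receiver restore media timing
        "ssrc": ssrc,                   # identifies the media source
    }

# Made-up example packet: version 2, payload type 0, seq 7, ts 160, ssrc 0x1234
pkt = struct.pack("!BBHII", 0x80, 0x00, 7, 160, 0x1234) + b"\x00" * 4
hdr = parse_rtp_header(pkt)
print(hdr["version"], hdr["sequence_number"], hdr["timestamp"])  # 2 7 160
```

The receiver reorders packets by sequence number and schedules playout by timestamp, which is exactly how RTP papers over UDP's lack of ordering guarantees.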
RTP also provides other end-to-end real-time data delivery services, including payload type identification and delivery monitoring.

2- Real-Time Control Protocol (RTCP): It is an Internet protocol that works in conjunction with RTP to monitor the quality of service and to convey information about the participants in an ongoing session. RTCP is based on the periodic transmission of control packets to all participants in the session, using the same distribution mechanism as the data packets.

This feedback can be used to control performance: the sender may modify its transmissions based on it. Each RTCP packet contains sender and/or receiver reports, with statistics such as the number of packets sent, the number of packets lost, and inter-arrival jitter.

3- Resource Reservation Protocol (RSVP): A host uses RSVP to request a specific quality of service (QoS) from the network on behalf of an application data stream. RSVP carries the request through the network, visiting each node the network uses to carry the stream. At each node, RSVP attempts to make a resource reservation for the stream. RSVP does not perform its own routing; instead, it uses underlying routing protocols to determine where it should carry reservation requests.

There are seven messages used in RSVP: Path, Resv, Path Teardown, Resv Teardown, Path Error, Resv Error, and Confirmation. The RSVP protocol is described in RFC 2205.

4- Real-Time Streaming Protocol (RTSP): It is a protocol for use in streaming media systems that allows a client to remotely control a streaming media server, issuing VCR-like commands (i.e., pause/resume, repositioning of playback, fast forward, and rewind) and allowing time-based access to files on a server. Most RTSP servers use RTP as the transport protocol for the actual audio/video data.
However, a proprietary transport protocol known as Real Data Transport (RDT), developed by RealNetworks, is used as the transport protocol by RTSP servers from RealNetworks. The RTSP protocol is described in RFC 2326 (http://tools.ietf.org/html/rfc2326).

5- H.323 and SIP (for call control and signaling), together with RTP, RTCP, and RTSP (for audio/video), are the protocols and standards used for Internet telephony. SIP is a signaling protocol that initiates, manages, and terminates multimedia sessions. H.323 supports H.245 over UDP/TCP, Q.931 over UDP/TCP, and RAS over UDP.

In both academia and industry, peer-to-peer (P2P) applications have attracted great attention. In a P2P overlay network, a large number of heterogeneous peer processes are interconnected across networks (with wide varieties of computing and network resources) with the aim of exchanging multimedia contents, such as movies, music, pictures, and animations, in a reliable and real-time manner.

It differs from client-server systems in that the peers bring server capacity with them. Multimedia streaming is a key technology for realizing multimedia applications in networks. P2P streaming applications, such as PPLive and UUSee, are on the rise. They make P2P file sharing/streaming applications inexpensive to build and excellent in scalability.

Multimedia Compression Algorithms

Multimedia compression employs tools and techniques to reduce the file size of various media formats. With the development of the World Wide Web, the importance of compression algorithms was highlighted, because compressed data travels faster over networks thanks to its greatly reduced file size.

Uncompressed graphics, audio, and video data require substantial storage capacity, which is not practical in the case of uncompressed video data, even given today's CD and DVD technology. The same is true for multimedia communications.
Data transfer of uncompressed video data over digital networks requires that very high bandwidth be provided for a single point-to-point communication. To be cost-effective and feasible, multimedia systems must use compressed video and audio streams. The most important compression techniques in use today are JPEG for single pictures, H.263 for video, and MPEG for video and audio.

Currently, Digital Video Interactive (DVI), the Joint Photographic Experts Group (JPEG) standard, and the Motion Picture Experts Group (MPEG) standards are the three compression techniques that are widely used. Many of these algorithms employ the discrete cosine transform (DCT), because of its excellent energy compaction, to achieve data compression.

Even with the intensive computations used in the DCT algorithm, grayscale images still require millions of computations; extending this to color or moving images requires billions of computations. Data compression for multimedia systems is therefore a step toward reducing the intensity of the computations by employing fast transform algorithms.

There are three main reasons why present multimedia systems require data to be compressed: the large storage requirements of multimedia data; relatively slow storage devices, which do not allow playing multimedia data in real time; and present network bandwidth, which does not allow real-time video data transmission.

A compression ratio is the average number of bits per pixel (bpp) before compression divided by the number of bits per pixel after compression. For example, if an 8-bit image is compressed so that each pixel is represented by 1 bit, the compression ratio = 8/1 = 8. Equivalently, for a 24-bit image, if the compression ratio = 18, the compressed image will have 24/18 = 1.33 bpp.
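The ratio arithmetic above can be checked with a short illustrative sketch (the function name is ours, not from the text):

```python
def compression_ratio(bits_before: float, bits_after: float) -> float:
    """Average bits per unit before compression / bits per unit after."""
    return bits_before / bits_after

# 8-bit image compressed to 1 bpp:
print(compression_ratio(8, 1))   # 8.0
# 24-bit image compressed at ratio 18 leaves 24/18 bpp:
print(round(24 / 18, 2))         # 1.33
# The same formula applies to file sizes: a 10 MB file compressed to 2 MB
print(compression_ratio(10, 2))  # 5.0, i.e. a 5:1 ratio
```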
The data compression ratio can equivalently be defined as the ratio between the uncompressed size and the compressed size. Thus, a representation that compresses a 10 MB file to 2 MB has a compression ratio of 10/2 = 5, often notated as an explicit ratio, 5:1 (read "five to one"), or as an implicit ratio, 5/1. Note that this formulation applies equally for compression, where the uncompressed size is that of the original, and for decompression, where the uncompressed size is that of the reproduction.

1- Lossless compression refers to compression methods for which the original uncompressed data set can be recovered exactly from the compressed stream. The need for lossless compression arises from the fact that many applications, such as the compression of digitized medical data, require that no loss be introduced by the compression method.

Lossless data compression is used in many applications. For example, it is used in the ZIP file format and in the GNU tool gzip. It is also often used as a component within lossy data compression technologies (e.g., lossless preprocessing by MP3 encoders and other lossy audio encoders). Lossless compression is used in cases where it is important that the original and the decompressed data be identical, or where deviations from the original data would be unfavorable.

Common examples are executable programs, text documents, and source code. Some image file formats, like PNG or GIF, use only lossless compression, while others, like TIFF and MNG, may use either lossless or lossy methods. Lossless audio formats are most often used for archiving or production purposes, while smaller lossy audio files are typically used on portable players and in other cases where storage space is limited or exact replication of the audio is unnecessary.

In recent years, several compression standards have been developed for the lossless compression of such images.
In general, even when lossy compression is allowed, the overall compression scheme may be a combination of a lossy compression process followed by a lossless compression process.

1-1 Run-Length Encoding (RLE) is a simple method of compressing data by specifying the number of times a character or pixel color repeats, followed by the value of the character or pixel. The aim is to reduce the number of bits used to represent a set of data. Using fewer bits means the data takes up less storage space and is quicker to transfer. The process involves going through the text and counting the number of consecutive occurrences of each character (called a "run"). The number of occurrences and the character itself are then stored in pairs.

Encoding algorithm: Run-length encoding compresses data by reducing the physical size of a repeating string of characters. This process converts the input data into a compressed format by identifying and counting consecutive occurrences of each character. The steps are as follows:
1. Traverse the input data.
2. Count the number of consecutive repeating characters (the run length).
3. Store the character and its run length.

Decoding algorithm: The decoding process reconstructs the original data from the encoded format by repeating characters according to their counts. The steps are as follows:
1. Traverse the encoded data.
2. For each count-character pair, repeat the character count times.
3. Append these characters to the result string.

Applications of RLE: The following applications show how RLE can be used to streamline data with multiple compression runs.
1. Text Compression: RLE is commonly used in text compression to reduce the size of files, especially those containing large amounts of repetitive text. For example, consider a file containing a long list of names.
Without compression, the file would be quite large, but by using RLE to replace repeated names with a count and a single occurrence, the file size can be significantly reduced.

2. Image Compression: RLE is also used in image compression, particularly for black-and-white images or images with large areas of solid color. In such images, there may be long runs of identical pixels that can be compressed using RLE. For example, in a black-and-white image, a run of 1000 black pixels can be represented by the code "1000B" (where "B" represents black).

3. Video Compression: RLE is used in video compression to reduce the amount of data needed to represent a video sequence. In video compression, RLE is applied to individual frames or to a series of frames to capture the changes between them. By compressing the frames using RLE, the overall file size can be reduced, making it easier to store and transmit the video.

4. Data Transmission: RLE is also used in data transmission to reduce the amount of data that needs to be transmitted. By compressing the data using RLE before transmission, the amount of data that needs to be sent is reduced, leading to faster transmission times and lower bandwidth usage.

RLE is a versatile and efficient algorithm that can be applied to various types of data to reduce their size. From text and images to video and data transmission, RLE can be used to streamline data, leading to faster transmission times, lower storage requirements, and improved data transfer.

The advantages and disadvantages of RLE compression are as follows:
Advantages of RLE: The algorithm is fast and uses few CPU cycles. It is suitable for compressing data containing repeating characters, such as spaces or nulls.
Disadvantages of RLE: The algorithm is not suitable for compressing binary files that contain few repeating characters. It typically does not compress data as well as ZLIB compression.
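The encoding and decoding steps above can be sketched in a few lines of Python (an illustrative sketch; representing the output as a list of count/character pairs is one common choice):

```python
def rle_encode(data: str) -> list[tuple[int, str]]:
    """Count consecutive repeats ("runs") and store (count, char) pairs."""
    pairs = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1                     # extend the current run
        pairs.append((j - i, data[i])) # store run length and character
        i = j
    return pairs

def rle_decode(pairs: list[tuple[int, str]]) -> str:
    """Repeat each character `count` times to rebuild the original."""
    return "".join(ch * count for count, ch in pairs)

encoded = rle_encode("WWWWBBBWWW")
print(encoded)              # [(4, 'W'), (3, 'B'), (3, 'W')]
print(rle_decode(encoded))  # WWWWBBBWWW
```

Note how the pairs only pay off when runs are long, which is why RLE shines on solid-color image rows and struggles on diverse text.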
1-2 Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to input characters, where the lengths of the assigned codes are based on the frequencies of the corresponding characters. The variable-length codes assigned to input characters are prefix codes, meaning the codes (bit sequences) are assigned in such a way that the code assigned to one character is never the prefix of the code assigned to any other character. This is how Huffman coding makes sure that there is no ambiguity when decoding the generated bit stream.

Let us understand prefix codes with a counterexample. Let there be four characters a, b, c, and d, and let their corresponding variable-length codes be 00, 01, 0, and 1. This coding leads to ambiguity because the code assigned to c is the prefix of the codes assigned to a and b. If the compressed bit stream is 0001, the decompressed output may be "cccd", "ccb", "acd", or "ab". There are two major parts in Huffman coding: building a Huffman tree from the input characters, and traversing the Huffman tree to assign codes to the characters.

Algorithm: The method used to construct an optimal prefix code is called Huffman coding. The algorithm builds a tree in a bottom-up manner. We can denote this tree by T. Let |C| be the number of leaves; then |C| - 1 merge operations are required to build the tree. Let Q be the priority queue, which can be implemented as a binary heap.

Steps to build a Huffman tree. The input is an array of unique characters along with their frequencies of occurrence; the output is the Huffman tree.
1. Create a leaf node for each unique character and build a min-heap of all leaf nodes. (The min-heap is used as a priority queue; the value of the frequency field is used to compare two nodes, so initially the least frequent character is at the root.)
2. Extract the two nodes with the minimum frequency from the min-heap.
3. Create a new internal node with a frequency equal to the sum of the two nodes' frequencies. Make the first extracted node its left child and the other extracted node its right child. Add this node to the min-heap.
4. Repeat steps 2 and 3 until the heap contains only one node. The remaining node is the root node and the tree is complete.

[Figure: worked example of Huffman tree construction. Decompression example: with the tree built there, the binary stream 101100111 decodes to the text "dce".]

Time complexity: O(n log n), where n is the number of unique characters. If there are n nodes, extractMin() is called 2(n - 1) times. extractMin() takes O(log n) time, as it calls minHeapify(), so the overall complexity is O(n log n). If the input array is sorted, a linear-time algorithm exists.

Applications of Huffman Coding:
1. It is used for transmitting fax and text.
2. It is used by conventional compression formats like PKZIP, GZIP, etc.
3. Multimedia codecs like JPEG, PNG, and MP3 use Huffman encoding (to be more precise, prefix codes).
4. It is useful in cases where there is a series of frequently occurring characters.

Advantages of Huffman Coding:
1- Huffman coding is an efficient method of data compression, as it assigns shorter codes to symbols that appear more frequently in the dataset. This results in a higher compression ratio.
2- Huffman coding is a prefix coding scheme, which means that it does not require any special markers to separate different codes. This makes it easy to implement and decode.
3- Huffman coding is a widely used method of data compression and is supported by many software libraries and tools, making it easy to integrate into existing systems.
4- Huffman coding is a lossless compression method, meaning the original data can be reconstructed exactly from the compressed data.
5- Huffman coding is a simple and efficient algorithm and can be easily implemented in software and hardware.
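The tree-building steps above map directly onto a min-heap. A minimal Python sketch (illustrative; the sample text is made up) builds the tree and reads off the prefix codes:

```python
import heapq
from collections import Counter

def build_huffman_codes(text: str) -> dict[str, str]:
    """Build a Huffman tree bottom-up with a min-heap and assign prefix
    codes: '0' for a left edge, '1' for a right edge."""
    freq = Counter(text)
    if len(freq) == 1:                      # degenerate single-symbol input
        return {next(iter(freq)): "0"}
    # Heap entries are (frequency, tie_breaker, node); a node is either a
    # character (leaf) or a (left, right) pair (internal node).
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # the two least frequent nodes
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tie, (left, right)))
        tie += 1
    codes: dict[str, str] = {}
    def walk(node, prefix):
        if isinstance(node, tuple):         # internal node: recurse
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                               # leaf: record the code
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

codes = build_huffman_codes("aaaabbbccd")   # frequencies a:4, b:3, c:2, d:1
encoded = "".join(codes[ch] for ch in "aaaabbbccd")
print(codes, len(encoded))                  # total length is 19 bits here
```

The unique integer tie-breaker keeps the heap from ever comparing a leaf character against an internal-node tuple, a classic pitfall in `heapq`-based Huffman implementations.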
Disadvantages of Huffman Coding:
1- Huffman coding requires the frequency of each symbol to be known in advance, making it less suitable for situations where the distribution of symbols is not known or changes dynamically.
2- Huffman trees can be complex and difficult to understand, making it harder to debug and maintain the code.
3- The coding process can be time-consuming and computationally expensive, especially for large datasets.
4- Huffman coding is not always the most efficient method of compression, and there may be other methods that provide better compression ratios for a given dataset.
5- Huffman coding can be less effective on data where there are very few unique symbols, or where the symbols are already highly compressed.

1-3 Lempel–Ziv–Welch (LZW) Algorithm: The LZW algorithm is a very common compression technique, typically used in GIF, optionally in PDF and TIFF, and in Unix's 'compress' utility, among other uses. It is lossless, meaning no data is lost when compressing. The algorithm is simple to implement and has the potential for very high throughput in hardware implementations.

The idea relies on recurring patterns to save data space. LZW is a foremost technique for general-purpose data compression due to its simplicity and versatility. It is the basis of many PC utilities that claim to "double the capacity of your hard drive." LZW compression works by reading a sequence of symbols, grouping the symbols into strings, and converting the strings into codes. Because the codes take up less space than the strings they replace, we get compression. Characteristic features of LZW include the following:

LZW compression uses a code table, with 4096 as a common choice for the number of table entries. Codes 0-255 in the code table are always assigned to represent single bytes from the input file.
When encoding begins, the code table contains only the first 256 entries, with the remainder of the table blank. Compression is achieved by using codes 256 through 4095 to represent sequences of bytes.

As the encoding continues, LZW identifies repeated sequences in the data and adds them to the code table. Decoding is achieved by taking each code from the compressed file and translating it through the code table to find what character or characters it represents.

Example: ASCII code. Typically, every character is stored with 8 binary bits, allowing up to 256 unique symbols for the data. This algorithm extends the alphabet to 9- to 12-bit codes, where the new unique symbols are made up of combinations of symbols that occurred previously in the string. It does not always compress well, especially with short, diverse strings, but it is good for compressing redundant data, and it does not have to save the dictionary with the data: this method can both compress and uncompress data.

The idea of the compression algorithm is the following: as the input data is being processed, a dictionary keeps a correspondence between the longest encountered words and a list of code values. The words are replaced by their corresponding codes, and so the input file is compressed. Therefore, the efficiency of the algorithm increases as the number of long, repetitive words in the input data increases.

Example: Use the LZW algorithm to compress the string BABAABAAA (A = 65, B = 66). The steps involved are shown systematically in the table below.

Current  Next  C+N in       Output  New entry  Index
string   char  dictionary?
B        A     No           66      BA         256
A        B     No           65      AB         257
B        A     Yes          -       -          -
BA       A     No           256     BAA        258
A        B     Yes          -       -          -
AB       A     No           257     ABA        259
A        A     No           65      AA         260
A        A     Yes          -       -          -
AA       EOF   No           260     -          -

Output: 66 65 256 257 65 260

The LZW decompressor creates the same string table during decompression. It starts with the first 256 table entries initialized to single characters.
The string table is updated for each character in the input stream, except the first one. Decoding is achieved by reading codes and translating them through the code table being built.

Example (LZW decompression): Use LZW to decompress the output sequence 66 65 256 257 65 260 from the previous example. The steps involved are shown systematically in the table below.

Current  Next  Next in      Output  First  New entry,
code     code  dictionary?  string  char   index
66       -     -            B       -      -
66       65    Yes          A       A      BA = 256
65       256   Yes          BA      B      AB = 257
256      257   Yes          AB      A      BAA = 258
257      65    Yes          A       A      ABA = 259
65       260   No           AA      A      AA = 260

Output: BABAABAAA

In this example, 72 bits of input (9 characters at 8 bits each) are represented with 72 bits of output (6 codes at 12 bits each), so nothing is gained; after a reasonable string table is built, compression improves dramatically.

LZW Summary: This algorithm compresses repetitive sequences of data very well. Since the code words are 12 bits, any single encoded character will expand the data size rather than reduce it.

Advantages of LZW: LZW requires no prior information about the input data stream and can compress the input stream in one single pass. Another advantage of LZW is its simplicity, allowing fast execution.
High Compression Ratio: LZW can achieve high compression ratios, particularly for text-based data, which can significantly reduce file sizes and storage requirements.
Fast Decompression: LZW decompression is typically faster than other compression algorithms, making it a good choice for applications where decompression speed is critical.
Universal Adoption: LZW is widely used and supported across a variety of software applications and operating systems, making it a popular choice for compression and decompression.
Dynamic Compression: LZW uses a dynamic compression algorithm, meaning it adapts to the data being compressed, which allows it to achieve high compression ratios even for data with repetitive patterns.
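The worked example above can be reproduced with a short Python sketch (illustrative; it uses growing integer codes rather than fixed 12-bit code words, and handles the one tricky decode case where a code is not yet in the table):

```python
def lzw_encode(text: str) -> list[int]:
    """Code table starts with the 256 single characters; codes from 256 up
    are added for each new sequence seen."""
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    w, out = "", []
    for ch in text:
        if w + ch in dictionary:
            w += ch                          # keep extending the match
        else:
            out.append(dictionary[w])        # emit code for longest match
            dictionary[w + ch] = next_code   # learn the new sequence
            next_code += 1
            w = ch
    if w:
        out.append(dictionary[w])
    return out

def lzw_decode(codes: list[int]) -> str:
    """Rebuild the same table while decoding; a code not yet in the table
    must be w + w[0] (the classic KwKwK case)."""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        entry = dictionary[code] if code in dictionary else w + w[0]
        out.append(entry)
        dictionary[next_code] = w + entry[0]
        next_code += 1
        w = entry
    return "".join(out)

codes = lzw_encode("BABAABAAA")
print(codes)                # [66, 65, 256, 257, 65, 260]
print(lzw_decode(codes))    # BABAABAAA
```

The final step of the worked example (code 260 arriving before the decoder has created entry 260) is exactly the `w + w[0]` special case in `lzw_decode`.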
Disadvantages of LZW:
Patent Issues: LZW compression was patented in the 1980s, and for many years its use was subject to licensing fees, which limited its adoption in some applications.
Memory Requirements: LZW compression requires significant memory to maintain the compression dictionary, which can be a problem for applications with limited memory resources.
Compression Speed: LZW compression can be slower than some other compression algorithms, particularly for large files, due to the need to constantly update the dictionary.
Limited Applicability: LZW compression is particularly effective for text-based data, but may not be as effective for other types of data, such as images or video, which have different compression requirements.

2- Lossy compression differs from its counterpart, lossless compression, in that, as its name implies, some amount of data may be lost in the process. Thus, after a compression/decompression cycle, the data set will be modified from the uncompressed original, and information may be lost. Lossy compression techniques attempt to eliminate unnecessary or redundant information, focusing more on saving space than on preserving the accuracy of the data. Ideally, the loss is either minimal or undetectable by human observers.

Lossy compression techniques are used for pictures and music files that can be trimmed at the edges. Unlike text and data files, pictures and music do not require reconstruction to be identical to the original, especially if the dropped data is insignificant or undetectable. Lossy compression is achieved by combining standard compression techniques with simplifications to the image that reduce the amount of data required to store it.

A very popular image format, JPEG, is also lossy. The goal in lossy compression is to reduce the file size further than is possible with lossless compression, while keeping the appearance of the image as intact as possible despite the simplifications.
Images and videos require more storage space than text. When exporting a movie, a codec is used to compress the information for storage and transfer (such as on a DVD), and to decompress the information so it can be viewed again. 122 Lossy compression is most commonly used to compress multimedia data (audio, video, and images), especially in applications such as streaming media and internet telephony. By contrast, lossless compression is typically required for text and data files, such as bank records and text articles. It can be advantageous to make a master lossless file which can then be used to produce additional copies. This allows one to avoid basing new compressed copies on a lossy source file, which would yield additional artifacts and further unnecessary information loss. 123 In many cases, files or data streams contain more information than is needed. For example, a picture may have more detail than the eye can distinguish when reproduced at the largest size intended; likewise, an audio file does not need a lot of fine detail during a very loud passage. Developing lossy compression techniques as closely matched to human perception as possible is a complex task. Sometimes the ideal is a file that provides exactly the same perception as the original, with as much digital information as possible removed; other times, perceptible loss of quality is considered a valid tradeoff. 124 2.1 Transform coding is the process of taking blocks of consecutive samples from a source input (such as the pixels in a frame), converting them into vectors, and quantizing those vectors. The goal of transform coding is to decompose or transform the input signal into something easier to handle. There is a good chance that there will be substantial correlations among neighboring samples; to put it in other words, adjacent pixels are usually similar, therefore, a compressor can remove some samples to reduce file size.
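As a toy illustration of this idea, the following sketch uses a simple difference transform standing in for a real linear transform such as the DCT; the sample values are invented for illustration:

```python
# Hypothetical row of adjacent pixel values: neighbors are similar (correlated).
X = [100, 102, 101, 103, 104, 104, 105, 107]

# A trivial "transform": keep the first sample, then store only differences.
# Real transform coders use linear transforms such as the DCT instead.
Y = [X[0]] + [X[i] - X[i - 1] for i in range(1, len(X))]
print(Y)  # [100, 2, -1, 2, 1, 0, 1, 2] -- mostly small, weakly correlated values

# The transform itself is invertible, so no information is lost at this stage;
# lossy coding comes later, when the transformed values are quantized.
X_back = [Y[0]]
for d in Y[1:]:
    X_back.append(X_back[-1] + d)
assert X_back == X
```

The small difference values can be coded with far fewer bits than the raw samples, which is exactly the efficiency gain the text describes for the decorrelated vector Y.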
125 The range of pixels that can be removed without degrading quality irreparably is calculated by considering the most salient ones in a block. For example: If Y is the result of a linear transform T of the input vector X in such a way that the components of Y are much less correlated, then Y can be coded more efficiently than X. 126 If most information is accurately described by the first few components of a transformed vector Y, then the remaining components can be coarsely quantized, or even set to zero, with little signal distortion. As correlation decreases between blocks and subsequent samples, the efficiency of the encoded data signal increases. 127 2.2 Discrete Cosine Transform is used in lossy image compression because it has very strong energy compaction, i.e., most of a signal's information is concentrated in its very low frequency components, while the remaining frequencies carry very little data and can be stored using very few bits (usually at most 2 or 3). 128 To perform the DCT transformation on an image, first we have to fetch the image file information (pixel values as integers in the range 0 – 255), divide it into blocks of 8 X 8, and then apply the discrete cosine transform to each block of data. After applying the discrete cosine transform, we will see that more than 90% of the data is concentrated in the low-frequency components. For simplicity, we take a matrix of size 8 X 8 with every value equal to 255 (considering the image to be completely white) and perform the 2-D discrete cosine transform on it to observe the output. 129 DCT 130 2.3 Fractal Compression is a method of data compression that leverages the self-similarity property of fractals to encode images and other data types. By identifying and mathematically describing repeating patterns within the data, fractal compression algorithms can significantly reduce the amount of storage space required while maintaining high-quality output upon decompression.
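Returning to the DCT example above, the 8 X 8 all-255 block can be transformed with a direct (unoptimized) 2-D DCT. This sketch uses the standard orthonormal DCT-II definition; for a constant block, every bit of energy lands in the single lowest-frequency (DC) coefficient, which is energy compaction in its most extreme form:

```python
import math

def dct2(block):
    """Direct 2-D orthonormal DCT-II of an N x N block (O(N^4); for illustration only)."""
    n = len(block)
    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

# An all-white 8 x 8 block (every pixel 255).
block = [[255] * 8 for _ in range(8)]
coeffs = dct2(block)

# All the energy lands in the DC (lowest-frequency) coefficient.
print(round(coeffs[0][0]))  # 2040
# Every other (AC) coefficient is zero up to rounding error.
print(max(abs(coeffs[u][v]) for u in range(8) for v in range(8)
          if (u, v) != (0, 0)) < 1e-9)  # True
```

For a real photograph the AC coefficients are small but nonzero; JPEG-style coders quantize them coarsely, which is where the lossy saving comes from.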
131 Fractal compression, a revolutionary approach in the field of data compression, is particularly notable for its application in image compression. Unlike traditional compression methods that rely on specific algorithms to reduce data size, fractal compression uses fractals—complex geometric shapes that can be split into parts, each of which is a reduced-scale copy of the whole. This self-similarity characteristic allows fractal compression to efficiently encode data. 132 Fractal compression was first introduced by Michael Barnsley in the late 1980s. His work demonstrated that natural images often contain repetitive patterns that can be described using mathematical functions. These functions, known as iterated function systems (IFS), form the core of fractal compression techniques. 133 The process of fractal compression involves several key steps: 1. Partitioning the Image: The image is divided into non-overlapping blocks, known as range blocks. 2. Identifying Similarities: Each range block is compared with larger sections of the image, called domain blocks, to find the best match based on similarity. 3. Transform Function: A mathematical transform (rotation, scaling, translation) is applied to the domain block to approximate the range block. 134 4. Encoding: The parameters of the transformation functions are stored instead of the actual pixel values. These parameters are much smaller in size compared to the original image data. During decompression, these transformation functions are applied iteratively to generate the image, leveraging the self-similarity properties to reconstruct the image with high fidelity. 135 136 Benefits of Fractal Compression Fractal compression offers several benefits that make it an attractive option for certain applications: High Compression Ratios: Fractal compression can achieve higher compression ratios compared to traditional methods like JPEG or PNG, particularly for images with high levels of detail and self-similarity.
137 Resolution Independence: The fractal representation of an image allows it to be scaled to different resolutions without significant loss of quality. This makes it ideal for applications where images need to be viewed at multiple scales. Progressive Transmission: Fractal compressed images can be progressively transmitted and reconstructed. This means that a lower resolution version of the image can be quickly displayed while the higher resolution details are still being downloaded. 138 Applications of Fractal Compression Fractal compression is used in various fields where efficient data storage and high-quality image reproduction are crucial: Medical Imaging: High-resolution medical images can be stored and transmitted more efficiently using fractal compression. Satellite Imaging: Satellite images, which often contain repetitive patterns, benefit from the high compression ratios of fractal compression. 139 Multimedia: Video and image storage in multimedia applications can be optimized using fractal compression techniques. Pattern Recognition: The self-similar nature of fractals is useful in pattern recognition tasks where identifying and encoding repetitive patterns is essential. 140 Implementation of Fractal Compression To implement fractal compression, one must understand the mathematical foundations of fractals and the iterated function systems used in the process. Here is a simplified outline of the steps involved in implementing fractal compression for an image: 1. Image Partitioning: Divide the image into smaller non-overlapping range blocks. 2. Domain Pool Creation: Create a pool of larger overlapping domain blocks from the image. 141 3. Similarity Search: For each range block, search the domain pool to find the best matching domain block based on a similarity measure (e.g., mean squared error). 4. Transformation Calculation: Calculate the transformation parameters (rotation, scaling, translation) that best map the domain block to the range block. 5.
Parameter Encoding: Store the transformation parameters instead of the original image data. 6. Decompression: Apply the stored transformation parameters iteratively to reconstruct the image. 142 143 The name “codec” comes from an abbreviation of its function of compression and decompression. During compression, repetitive and unnecessary information in the original file is discarded, causing the original file to lose information. For this reason, most codecs are considered lossy; a good codec nevertheless allows the file to retain a high level of quality. The DV and MPEG codecs are especially good at maintaining excellent quality. Compressing video reduces its file size and data transfer rate, facilitating smooth playback and reducing storage requirements. 144 Codec denotes a complete system capable of encoding and decoding data, which consists of an encoder and a decoder; transcoding is a conversion from one encoded digital representation into another one. A wide range of codecs is available, and no single codec is best for all situations. For example, the best codec for compressing cartoon animation is generally not efficient for compressing live-action video. 145 A codec takes data in one form, encodes it into another form and decodes it at the egress point in the communication session. Codecs are made up of an encoder and a decoder. The encoder compresses a media file, and the decoder decompresses the file. There are hundreds of different codecs designed to encode different media such as video and audio. 146 Codecs are invisible to the end user and come built into the software or hardware of a device. For example, Windows Media Player, which comes pre-installed with every edition of Windows, provides a limited set of codecs that play media files. Users can also download codecs to their computers if they need to open a specific file, but in those cases, it might be easier to download a codec pack or a player program.
However, before adding codecs, users should first check which codecs are already installed on their system by using a software program. 147 In communications, codecs can be hardware- or software-based. Hardware-based codecs perform analog-to-digital and digital-to-analog conversions. A common example is a modem used to send data traffic over analog voice circuits. In this case, the term codec is a blend of coder/decoder. 148 Software-based codecs describe the process of encoding source audio and video captured by a microphone or video camera in digital form for transmission to other participants in calls, video conferences, and streams or broadcasts, as well as shrinking media files for users to store or send over the internet. In this example, the term codec is a blend of compression/decompression. 149 Codecs are used for several reasons, including the following: Take up less space. Media files are compressed to save space. Some media files, like video files, can be huge and can take up a lot of space if not compressed. According to tech newsletter Review Geek, an uncompressed 4K video file is the equivalent of about 5 terabytes (TB) of data per hour, which is way more than Blu-ray or streaming could handle. Compressed, the file would be in the gigabyte range. 150 Enable efficient transfers. If media files were not compressed, it would be much more difficult to send these files over the internet. Uncompressed files would take much longer to share, since the files are bigger and it takes more resources to send them. 151 152 As an example, a typical multimedia application may require the storage of more than 30 min of video, 2000 images, and 40 min of stereo sound on each laser disc side. This application would require about 50 GB of storage for video, 15 GB for images, and 0.4 GB for audio, giving a total of 65.4 GB of storage. Video files are collections of images, audio and other data.
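The storage figures quoted above can be sanity-checked with simple arithmetic. In this sketch the 4K frame size, color depth, and frame rate are assumptions chosen to illustrate the rough "5 TB per hour" figure, not values given in the text:

```python
# Rough data rate of uncompressed 4K video.
# Assumed: 3840 x 2160 pixels, 24-bit RGB color, 60 frames per second.
width, height = 3840, 2160
bytes_per_pixel = 3          # 24-bit color = 3 bytes per pixel
fps = 60

bytes_per_second = width * height * bytes_per_pixel * fps
terabytes_per_hour = bytes_per_second * 3600 / 1e12
print(f"{terabytes_per_hour:.1f} TB per hour")  # ~5.4 TB, close to the quoted 5 TB

# Laser-disc example from the text: 50 GB video + 15 GB images + 0.4 GB audio.
total_gb = 50 + 15 + 0.4
print(total_gb)  # 65.4
```

Even with slightly different assumptions (30 fps, different chroma format), uncompressed video lands in the terabytes-per-hour range, which is why codecs are unavoidable.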
The attributes of the video signal include the pixel dimensions, frame rate, audio channels, and more. In addition, there are many different ways to encode and save video data. 153 1- H.265: High Efficiency Video Coding (HEVC): H.264's successor, which supports 8K video resolution, is one of the most popular codecs you can find today. It is known for being capable of delivering impeccable quality at a higher degree of compression than the one you get with the H.264 codec. It is perfect for optimizing huge high-resolution video files for the web. 154 The only big issue with H.265 right now is compatibility, because not every device can handle HEVC's advanced video compression algorithms and decode them properly. To make sure your PC would be okay with a HEVC video file, you need a powerful CPU or graphics card - the minimum would be, for instance, an Intel 6th generation Skylake CPU or an NVIDIA GeForce GTX 950 graphics card. 155 2- H.264: Advanced Video Coding (AVC): Though more advanced codecs are being released, H.264 is still the most widely used one. Compared to MPEG-2, AVC is much better at compressing video files while keeping the image quality high. Compressing with H.264 results in lower bitrates and smaller file sizes. On top of that, you get a video with lower bitrate requirements, making AVC an ideal choice for streaming. 156 Because H.265 is a little too demanding and requires more processing power, H.264 is still the safest choice for video editing. Both H.265 and H.264 have lossless versions for compression without sacrificing video quality. 157 3- MPEG-2: This codec, released in 1996, was developed by the Moving Picture Experts Group, and while AVC and HEVC are taking over, MPEG-2 is still often used in video production. MPEG-2, also known as H.222/H.262, is a popular codec of choice in over-the-air digital TV broadcasting.
158 It is also a go-to for the DVD video format, but when it comes to the web, MPEG-2 is not the top choice, which is why some browsers do not support it without a special plugin. MPEG-2 offers resolutions of 720×480 and 1280×720 at 60 fps and is great at reducing file sizes. Needless to say, it does not require a high-end GPU or a latest-generation CPU. 159 4- AV1: AOMedia Video 1 (AV1) is one of the codecs that supports lossless compression. What makes it so awesome is that it's open and royalty-free. AV1 is AVC, HEVC, and VP9's main competitor, as it can deliver higher compression rates at comparable image quality. AV1 also needs a lower bitrate to achieve the same picture quality, proving its superiority as a video codec for streaming. The main downside is that AV1 is one of the newest codecs out there, and software support issues are common. 160 5- VP9: Like AV1, VP9 is an open and royalty-free codec that supports lossless compression; it was developed by Google. It is very similar to the almighty HEVC in terms of bitrates and video quality. Though VP9's main profile supports 8-bit color depth at 4:2:0 chroma subsampling levels, its other profiles offer support for the full range of chroma subsampling modes and greater color depth. It is one of the most powerful codecs for compressing high-resolution videos for the web. Note, however, that Apple devices do not support VP9. 161 6- ProRes: ProRes is an Apple proprietary codec developed to provide high-quality video with efficient compression. ProRes is widely used for post-production, distribution, and archiving of professional video content. However, the codec is primarily supported on Apple devices, and compatibility with other software may be limited. 162 Choosing the right video codec is vital for any project. While quality and compatibility are essential, other factors such as compression efficiency, licensing fees, and hardware requirements should be considered.
With this guide's help and an understanding of the popular codecs available, you should be able to choose the right video codec for your project and requirements. 163 If making online videos and streaming is your go-to goal, H.264 or H.265 are excellent choices (especially H.265 if you're looking to make 8K videos). However, these will come at a hefty price, so keep that in mind. Alternatively, if you want a free video codec, either AV1 or VP9 would be a sound choice. VP9 might be a better pick if your primary target audience is on Android and Google's platforms. But its playback limitations may prove to be a deal breaker if you aim to target a broader market. 164 A container format is a package or a wrapper that contains all the necessary metadata of a digital file, including an audio codec, video codec, and closed captioning. These containers can hold several types of codecs, so they are essentially just storage units. They will only open and allow the codecs to work their magic if the target device or program supports the stored codec. 165 166 The crucial thing to note here is that not all programs accept all types of containers and codecs. In other words, you'd be wise to use multi-format encoding when looking to get your content to a variety of devices. 167 168 1. MP4: MP4 is a widely supported container format compatible with various devices and platforms, making it highly versatile. It is based on the MPEG-4 Part 14 standard and can store video, audio, subtitles, and metadata. MP4 files are known for their relatively small file sizes without compromising video quality. 169 2. AVI: AVI (Audio Video Interleave) is an older container format that remains popular for its compatibility with legacy systems. Microsoft introduced it in the early 1990s, supporting audio and video data. AVI files are larger than more modern formats, but they offer good video quality and are widely supported by media players. 170 3.
MKV: MKV (Matroska Video) is a flexible, open-source container format supporting multiple audio and subtitle tracks. It is commonly used for storing high-definition videos and is known for retaining high-quality video and audio while keeping file sizes relatively small. MKV files also support advanced features like chapter navigation and metadata. 171 4. MOV: MOV is a video container format developed by Apple and is prevalent in the Mac ecosystem. However, MOV files can also be played on other platforms with compatible media players. MOV files are commonly used for storing videos, especially those created using Apple's QuickTime framework. They support various codecs and can contain multiple audio and video tracks, subtitles, and metadata. 172 173 Codecs are responsible for compressing the video, while containers encapsulate the compressed video, audio, and other multimedia elements. Together, they enable efficient transmission and delivery of video content over the internet while ensuring compatibility across different devices and platforms. 174 Choosing the appropriate codec and container combination depends on the intended use, target devices, and desired video quality. When selecting a codec, consider factors like compression efficiency, playback compatibility, and licensing requirements. For containers, consider the desired additional features, compatibility across devices and playback platforms, and the specific multimedia components you intend to include in your video file. 175 For example, the MP4 container paired with the H.264 codec has become a de facto standard for online video streaming. This combination provides excellent compression, quality, and compatibility with most devices and streaming services. Similarly, the MKV container with the H.265/HEVC codec is gaining popularity for high-definition content due to its superior compression capabilities and versatility.
176 177 Multimedia Components 178 Text 179 Text is one of the most imperative components of multimedia and an essential source of presenting information to a wide range of people. Proper use of text, keeping in mind elements such as font style, size and various design tools, helps the content creator to communicate the idea and message to the user. Text is also the most widely used and flexible means of communicating information and ideas on an electronic medium. 180 Billboards are used in a public place where a large number of people can see them. To make sure everybody can see the billboard, a clear, large font style and size is used. It is also important to use text in a concise manner so that the billboard does not look text heavy and can be easily and quickly read by people. 181 A multimedia developer can also customize fonts according to his wants and design. There are various software packages available in the market that help in creating a variety of typefaces. The involvement of text in hypermedia and hypertext on the internet allows users to read information, listen to music, play games and shop online. Hypertext uses hyperlinks to present text and graphics, whereas interactive multimedia with hyperlinks is called hypermedia. 182 Text is one of the easiest of all multimedia elements to use. Most computer users are familiar with word processing and know the processes of entering and editing text and working with fonts and font sizes. Factors affecting legibility of text are as follows: Size and style Background and foreground colors Leading 183 A glyph is a graphic representation of a character's shape, where a character may be represented by many glyphs. A typeface is a family of many characters, often with many type sizes and styles. On the other hand, a font is a collection of characters or glyphs of a single size and style belonging to a particular typeface family. 184 Glyph 185 Many fonts are also available online and people can download them from a server.
They are classified on the basis of spacing between characters and words, the presence or absence of serifs, their shape, stretch and weight such as bold or italics. Underlining, outlining and strikeout of characters may also be added to the text. 186 Typefaces 187 Fonts 188 Font size is measured in points, and it does not describe the height or width of its characters. This happens because the height of two different fonts (in both upper and lower case) may differ. One point is approximately 1/72 of an inch, i.e., 0.0138 inch. 189 190 Fonts are very useful as they help in gaining the attention of the reader by highlighting headings, increasing readability and projecting an image. They can be classified into three categories – serif, sans serif and decorative. The serif is the little decoration at the end of a letter stroke. 191 192 Example: Times New Roman, Bodoni and Bookman are some fonts which come under the serif category. Arial, Avant Garde and Verdana are some examples of sans serif fonts. The spacing between character pairs is called kerning and the space between lines is called leading. 193 There are a few things that a user must keep in mind before selecting fonts for a multimedia presentation. The following guidelines will help a user in choosing appropriate fonts: Choose a font that is legible and easy to read. The different effects and colors of a font can be chosen to make the text look distinctive. Try to use only a few different colors within the same presentation. Try to use only a few typefaces within the same presentation. Play with the style and size to match the purpose and importance of the text. For instance, use a large font size for headings. 194 Drop caps and initial caps can be used to accent words. Anti-aliasing can be used to make text look gentle and blended. To attract instant attention to the text, the words can be wrapped onto a sphere or bent like a wave. In the case of text links (anchors) on web pages, the messages can be highlighted.
Meaningful words and phrases can be used for links and menu items. Overcrowding of text on a single page should be avoided. Do not use decorative typefaces for longer paragraphs. 195 196 The basic element of multimedia is the text. However, the text should be kept to a minimum to avoid overcrowding unless the application contains a lot of reference material. Less text can be read easily and quickly, unlike longer text passages which can be time consuming and tiring. A lot of information in a multimedia presentation is not ideally the best way to transfer information to a wide range of audience. Combining other elements such as pictures, graphics, diagrams, etc., can help reduce the amount of text written to provide information. 197 From a design point of view, text should fill less than half the screen. The following are ways in which text can be used in multimedia: in text messaging; in advertisements; in a website; in films, such as titles and credits; as subtitles in a film or documentary that provide a translation. 198 Using text in websites attracts a visitor's attention as well as helps him in understanding the webpage better. It is far better than the use of meaningless graphics and images which do not contribute to understanding of the page. Website loading speed is one of the important factors that influences conversion, as visitors start to leave the page if it takes more than eight seconds to load. 199 Another important thing is how easily visitors find what they are looking for, which depends upon both eye catching images and informative text. However, informative text draws many more visitors than graphics and images. This is why text should be the primary concern of the website rather than graphic elements. Informative text can also boost search engine traffic and conversions a great deal. 200 A font editor is a class of application software specifically designed to create or modify font files.
Font editors differ greatly depending on whether they are designed to edit bitmap fonts or outline fonts. Most modern font editors deal with outline fonts. Special font editing tools can be used to make your own type, so you can communicate an idea or graphic feeling exactly. With these tools, professional typographers create distinct text and display faces. 201 202 Sometimes a physical web page behaves like two or more separate chunks of content. The page is not the essential unit of content in websites built with Flash (an animation technology from Macromedia) and in many non-web hypertext systems. Hence, the term node is used as the fundamental unit of hypertext content. Links are the pathways between nodes. When a user clicks links, a succession of web pages appears and it seems that the user is navigating the website. 203 For a user, exploring a website is much like finding the way through a complex physical environment such as a city. The user chooses the most promising route and, if he gets lost, he may backtrack to familiar territory or even return to the home page to start over. A limitation of the navigation metaphor is that it does not correspond to the full range of user behavior. The majority of users click the most promising links they see, which has forced web designers to create 204 links that would attract users. Designers of websites and other hypertexts must work hard to decide which nodes will be linked to which other nodes. There are familiar arrangements of nodes and links that guide designers as they work. They are called information structures. Hierarchy, web-like and multi-path are three of the most important of these structures. 205 The hierarchy is the most important structure because it is the basis of almost all websites and most other hypertexts. Hierarchies are orderly (so users can grasp them) and yet they provide plenty of navigational freedom.
Users start at the home page, descend the branch that most interests them, and make further choices as the branch divides. At each level, the information on the nodes becomes more specific. Notice that branches may also converge. 206 207 When designing larger hypertexts, website designers must choose between making the hierarchy broader (putting more nodes on each level) or deeper (adding more levels). One well-established design principle is that users more easily navigate a wide hierarchy (in which nodes have as many as 32 links to their child nodes) than a deep hierarchy. 208 Nodes can be linked to one another in web-like structures. There are no specific designs to follow, but web designers must take care in deciding which links will be most helpful to users. Many such structures turn into a hierarchical structure and cause trouble to users in navigating them. This is why few web-like websites and non-web hypertexts are made. Many web-like hypertexts are short stories and other works of fiction, in which artistic considerations may override the desire for efficient navigation. 209 210 Multi-path Structures: It is possible to build a sequence of nodes that is in large part linear but offers various alternative pathways. This is called a multi-path structure. Users find multi-path structures within hierarchical websites. For instance, a corporate website may have a historical section with a page for each decade of the company's existence. Every page has optional digressions, which allow the user to discover events of that decade. 211 212 Computing and the web will continue to evolve in a great many ways. Monitors may give way to near-eye displays, at least for mobile computing. Virtual reality may become more widespread and may be routinely incorporated into the web. For instance, websites may provide much improved site maps consisting of a 3D view of the site structure, maybe using the metaphor of galaxies and solar systems.
213 The web may well become more intelligent and capable of generating personalized links that really match users' interests. The web may also become more social as users click links that open up live audio or video sessions with another user. As the communications medium changes, theory must keep pace. Or else it becomes increasingly difficult to understand the medium and design successfully for it. 214 215 We acquire a great deal of knowledge through our ears. Many multimedia developers take advantage of this sense by incorporating sound into their multimedia products. Sound enhances a multimedia application by supplementing presentations, images, animation, and video. In the past, only those who could afford expensive sound recording equipment and facilities could produce high-quality, digital sound. Today, computers and synthesizers make it possible for the average person to produce comparable sound and music. 216 Sound is the terminology used in the analogue form, and the digitized form of sound is called audio. A sound is a waveform. It is produced when waves of varying pressure travel through a medium, usually air. It is inherently an analogue phenomenon, meaning that the changes in air pressure can vary continuously over a range of values. 217 The sound recorded on an audio tape through a microphone or from other sources is in an analogue (continuous) form. The analogue format must be converted to a digital format for storage in a computer. This process is called digitizing. The method used for digitizing sound is called sampling. 218 Digitizing Sampling 219 Digital audio represents a sound stored in thousands of numbers or samples. The quality of a digital recording depends upon how often the samples are taken. Digital data represents the loudness at discrete slices of time. It is not device dependent and should sound the same each time it is played. It is used for music CDs.
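Digitizing and sampling, as just described, can be illustrated in a few lines of code. This is a toy sketch; the tone frequency, sampling rate, and bit depth are arbitrary illustration values:

```python
import math

# Sampling: measure an analogue waveform at discrete points in time.
# Assumed example: a 440 Hz sine tone sampled at 8 kHz for 10 ms.
sample_rate = 8000       # samples per second
freq = 440               # tone frequency in Hz
duration = 0.01          # seconds

samples = [math.sin(2 * math.pi * freq * t / sample_rate)
           for t in range(int(sample_rate * duration))]

# Quantization: round each sample to one of 2**bits discrete levels (here 8-bit).
bits = 8
levels = 2 ** bits       # 256 possible values
quantized = [round((s + 1) / 2 * (levels - 1)) for s in samples]

print(len(samples))                                   # 80 samples for 10 ms at 8 kHz
print(min(quantized) >= 0 and max(quantized) <= 255)  # True
```

Raising the sampling rate adds more slices per second, and raising the bit depth adds more loudness levels per slice; both improve fidelity at the cost of storage, exactly as the following paragraphs describe.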
220 The sampling rate determines the frequency at which samples will be drawn for the recording. The number of times the analogue sound is sampled during each period and transformed into digital information is called the sampling rate. The most common sampling rates used in multimedia applications are 192 kHz, 96 kHz, 48 kHz, 44.1 kHz, 22.05 kHz and 11.025 kHz. 221 222 Sampling at higher rates more accurately captures the high frequency content of the sound. A higher sampling rate means a higher quality of sound. However, a higher sampling rate occupies greater storage capacity. Conversion from a higher sampling rate to a lower rate is possible. 223 224 Sampling rate and sound bit depth are the audio equivalent of resolution and color depth of a graphic image. Bit depth depends on the amount of space in bytes used for storing a given piece of audio information. The higher the number of bytes, the higher the quality of sound. Multimedia sound comes in 8-bit, 16-bit, 32-bit and 64-bit formats. An 8-bit sample has 2^8 or 256 possible values. A single bit rate and single sampling rate are recommended throughout the work. An audio file size can be calculated with the simple formula: 225 226 Bit Rate refers to the amount of data, specifically bits, transmitted or received per second. It is comparable to the sample rate but refers to the digital encoding of the sound. It refers specifically to how many digital 1s and 0s are used each second to represent the sound signal. This means the higher the bit rate, the higher the quality and size of your recording. For instance, an MP3 file might be described as having a bit rate of 320 kb/s or 320000 b/s. This indicates the amount of compressed data needed to store one second of music. 227 Example: The standard audio CD is said to have a data rate of 44.1 kHz/16, implying the audio data was sampled 44,100 times per second, with a bit depth of 16.
CD tracks are usually stereo, using a left and a right track, so the amount of audio data per second is double that of mono, where only a single track is used. The bit rate is then 44,100 samples/second × 16 bits/sample × 2 = 1,411,200 bit/s, or about 1.4 Mbit/s. Mono sounds are flat and unrealistic compared to stereo sounds, which are much more dynamic and lifelike. However, stereo sound files require twice the storage capacity of mono sound files. Therefore, if storage and transfer are concerns, mono sound files may be the more appropriate choice. The formula for determining the size (in bytes) of a digital audio recording is: Monophonic = sampling rate × duration of recording in seconds × (bit depth/8) × 1; Stereo = sampling rate × duration of recording in seconds × (bit depth/8) × 2. Analogue versus digital: There are two types of sound, analogue and digital. Analogue sound is a continuous stream of sound waves. To be understood by the computer, these sound waves must be converted to numbers. The process of converting analogue sound into numbers is called digitizing, or sound sampling. Analogue sounds that have been converted to numbers are digital sounds; when we work with digital sound, we call it audio. Therefore, sound that has been converted from analogue to digital is often called digital audio. Non-destructive sound processing methods preserve the original file: a copy of the original can be manipulated by playing it louder or softer, combining it with other sounds on other tracks, or modifying it in other ways. Once a sound has been recorded, digitized, processed, and incorporated into a multimedia application, it is ready to be delivered. So that you can hear it through your speakers, the digital sound is sent through a digital-to-analogue converter (DAC). In digital recording, sound can be captured through a microphone, a keyboard, or DAT (Digital Audio Tape). 
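The file-size and bit-rate formulas above can be checked with a short Python sketch (the function name is ours, chosen for illustration):

```python
def audio_file_size(sample_rate, duration_s, bit_depth, channels):
    """Size in bytes: sampling rate x duration x (bit depth / 8) x channels."""
    return sample_rate * duration_s * (bit_depth // 8) * channels

# One minute of CD-quality stereo audio (44.1 kHz, 16-bit, 2 channels):
size = audio_file_size(44_100, 60, 16, 2)
print(size)  # 10,584,000 bytes, roughly 10 MB

# The CD bit rate from the example above:
bit_rate = 44_100 * 16 * 2  # samples/s x bits/sample x channels
print(bit_rate)  # 1,411,200 bit/s
```

Note that a mono recording (channels = 1) at the same rate and depth would need exactly half the storage, which is the trade-off the text describes.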
Recording through a microphone connected directly to a sound card is best avoided because of problems with sound amplification and recording consistency; it is recommended instead to record on a tape recorder, make all the changes there, and then transfer the result through the sound card. Sound editors are used to create sound, convert file formats, and enhance sound quality by removing noise. Sound Edit 16, Cool Edit and Sound Forge are three commonly used sound editors in multimedia applications. Sound Edit 16 allows a user to record, edit and transform digital audio quickly and effortlessly, whereas Cool Edit is low-cost, easy-to-use software that gives reasonably good sound quality. Sound Forge, for its part, is regarded as the best audio recording and editing software for PCs. The main sound editing operations used in multimedia work are described below. One of the first is to delete any blank space from the beginning and end of a recording; this is called trimming. Using this function, a sound editor can also remove any extraneous noises that crept in during recording. Another essential capability is handling multiple tasks: the software should be able to combine and edit multiple tracks, then merge and export them as a single audio file. Volume adjustment is also important when combining, say, ten tracks into a single track, because the tracks may have different volumes. Sound editors must perform format conversion when the digital audio editing software reads a format different from the one read by the presentation program. If the sound editor edits and records sounds at 16-bit resolution but the project uses lower rates, the files must be resampled, or downsampled. Some editing software provides digital equalization capabilities that let a sound editor revise a recording's frequency content so that it sounds brighter or darker. 
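Two of the operations above, trimming silence and downsampling, can be sketched on a plain list of samples. The silence threshold and the 2:1 decimation factor below are illustrative choices, not values from the text, and real editors use far more careful resampling filters:

```python
def trim(samples, threshold=2):
    """Drop near-silent samples from the start and end of a recording."""
    start = 0
    while start < len(samples) and abs(samples[start]) <= threshold:
        start += 1
    end = len(samples)
    while end > start and abs(samples[end - 1]) <= threshold:
        end -= 1
    return samples[start:end]

def downsample(samples, factor=2):
    """Naive downsampling: keep every Nth sample (a factor of 2 turns
    a 44.1 kHz recording into a 22.05 kHz one)."""
    return samples[::factor]

clip = [0, 1, 0, 40, -35, 50, -20, 0, 1]
print(trim(clip))        # [40, -35, 50, -20]
print(downsample(clip))  # [0, 0, -35, -20, 1]
```

Destructive editors would overwrite the clip in place; the non-destructive approach the text recommends returns a new list and leaves the original untouched, as both functions here do.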
Some programs allow a sound editor to process the signal with reverberation, multi-tap delay, and other special effects using digital signal processing routines. To produce a surreal sound, editors can reverse all or a portion of a digital audio recording. Advanced programs let you alter the length of a sound file without changing its pitch. This feature can be very useful, but beware: most time-stretching algorithms severely degrade audio quality. Audio file formats are formats for storing audio data on a computer system. Generally they are container formats or audio data formats with a defined storage layer, but they can also be a raw bit stream. Storing digital audio involves sampling the audio voltage, which corresponds to a certain signal level in a channel, at regular intervals and with a particular resolution. The data can then be stored uncompressed or compressed; compression reduces the file size. It is essential here to distinguish between a file format and a codec: a codec encodes and decodes the raw audio data, while the audio file format stores that data in a compatible file. Most audio file formats are created with one or more encoders, or codecs. Generally an audio file format supports only one type of audio data, created with an audio coder, whereas a multimedia container format may support multiple types of audio and video data. Audio file formats are classified into three major groups: uncompressed audio formats such as WAV, AIFF, AU and raw header-less PCM; lossless compressed audio formats such as FLAC, Monkey's Audio, WavPack, Shorten, Tom's lossless Audio Compressor, TTA, ATRAC Advanced Lossless, Apple Lossless, MPEG-4 SLS, MPEG-4 ALS, MPEG-4 DST and Windows Media Audio Lossless; and lossy compressed formats such as MP3, Vorbis, Musepack, AAC, ATRAC and lossy Windows Media Audio. 1- Uncompressed audio files are digital representations of the sound wave and are the most accurate. 
However, uncompressed audio can be a resource-intensive method of recording and storing digital audio in terms of storage and management. Uncompressed formats are generally the master audio formats of choice: because of their accuracy, they are suitable for archiving and for delivering audio at high resolution, as well as for working with audio at a professional level. a- Pulse Code Modulation (PCM) is the major uncompressed audio format, stored as .wav on Windows or as .aiff on Macintosh operating systems. These are flexible file formats designed to store any combination of sampling rates and bit depths. The AIFF format is based on the IFF format, and the WAV format on the RIFF file format, which is similar to IFF. These formats serve as working formats within many audio, video and multimedia applications, and in the digital audio workstations employed for professional audio production and editing. The uncompressed audio file types listed here are 'wrapper' formats that carry PCM audio and add additional data to enable compatibility with specific codecs and operating systems. Although some of them were developed for specific platforms, they have open-source codecs available for all standard operating systems, including Windows, macOS and Linux. The different types include: b- Microsoft Wave format, commonly known as WAV, the most widely used uncompressed format. All forms of WAV are PCM wrapper formats with the .wav extension that store the audio data in a basic form; the wrapper has been altered over time to provide compatibility with non-PCM audio streams. WAV (Waveform Audio File Format): a flexible format capable of storing very high quality audio, but it cannot hold any metadata describing its audio contents, and its file size is limited to 4 GB. BWF (Broadcast Wave Format): includes an extra header that contains metadata about the audio and synchronization information (the BEXT chunk). It is the default audio format of some non-linear digital audio/video workstations and also has a file size limited to 4 GB. MBWF (Multichannel Broadcast Wave Format): the more recent evolution of Broadcast WAV, combining RF64 audio with a BEXT chunk. It can contain up to 18 simultaneous streams of surround audio, non-PCM data streams and a stereo 'mixdown'; its 64-bit address header extends the file size limit to over 18 billion GB. AIFF (Audio Interchange File Format): developed by Apple and Amiga, AIFF is the native format for audio on Mac OS X. All uncompressed audio formats share some technical attributes, such as: Bit depth – 8-bit, 12-bit, 16-bit and 24-bit. Sampling Rate -
It is the default audio format of some non-linear digital audio/video workstations and has a file size limited to 4 GB. MBWF (Multichannel Broadcast Wave Format): The recent evolution of Broadcast WAV, MBWF has RF64 audio with a BEXT chunk. 246 It contains up to 18 simultaneous streams of surround audio, non-PCM data streams and a stereo ‘mixdown’. The 64-bit address header extends the file size to over 18 billion GB. AIFF: Audio Interchange File Format was developed by Apple and Amiga and is the native format for audio on Mac OSX. 247 All uncompressed audio formats share some of the technical attributes like: Bit Depth – 8-bit, 12-bit, 16-bit and 24-bit. Sampling Rate -