High Volume Functional Validation - Master's Dissertation PDF

Document Details


National Institute of Technology Kurukshetra

2024

Nigu Kumari

Tags

embedded system design, post-silicon validation, system-on-chip (SoC), high-volume validation

Summary

This dissertation, titled "High Volume Functional Validation," focuses on the post-silicon validation of System-on-Chip (SoC) devices. It details methods for high-volume testing to detect and rectify rare/intermittent post-silicon defects. The methodology utilizes extensive testing across numerous SoC units, incorporating a continuous integration framework (Jenkins) and data analysis (Splunk) for improved defect detection.

Full Transcript


HIGH VOLUME FUNCTIONAL VALIDATION A dissertation Submitted in partial fulfillment of the requirements for the award of the degree of Master of Technology in Embedded System Design Submitted by Nigu Kumari (32219106) Under the supervision of Dr. Niraj Pratap Singh (Associate Professor, Department of ECE, NIT KURUKSHETRA) & Mr. Upender Cherukupally (Senior Staff Engineer, Qualcomm India Private Limited, Hyderabad) SCHOOL OF VLSI DESIGN AND EMBEDDED SYSTEM NATIONAL INSTITUTE OF TECHNOLOGY KURUKSHETRA, HARYANA-136119 June 2024 © NIT Kurukshetra CANDIDATE’S DECLARATION This is to certify that the work which is being presented in dissertation titled “High Volume Functional Validation” for the partial fulfillment towards the award of degree of Master of Technology in Embedded System Design of National Institute of Technology, Kurukshetra, is an authentic record of my own work carried out during the period from August 2023 to June 2024, under the supervision and guidance of Dr. N.P Singh, Associate Professor , Department of Electronics and Communication Engineering, National Institute of Technology Kurukshetra, Haryana, India and Mr. Upender Cherukupally, Senior Staff Engineer, SVE, Qualcomm India Private Limited. The material embodied in present dissertation is original and has not been submitted for the award of any other degree or diploma of any University. (Nigu Kumari) Roll No.:32219106 Embedded System Design CERTIFICATE This is to certify that the above statement made by the candidate is correct to the best of our knowledge. Mr. Upender Cherukupally Dr. N.P Singh Senior Staff Engineer Associate Professor, ECE Qualcomm India Private Limited National Institute of Technology Hyderabad Kurukshetra ii ACKNOWLEDGEMENT I am deeply grateful to my Manager, Mr. Upender Cherukupally, Senior Staff Engineer at SVE, Qualcomm India Private Limited, and my academic supervisor, Dr. Niraj Pratap Singh, Associate Professor in the Department of ECE at the National Institute of Technology Kurukshetra. Their unwavering guidance, encouragement, and understanding have been invaluable, and his lessons in patience have greatly benefited my endeavor. My association with him has extended beyond academics, offering me the chance to collaborate with a true expert in Embedded System Design. I also wish to extend my heartfelt gratitude to Prof. R.K. Sharma, Coordination of School of VLSI Design and Embedded system, and Prof. B.V. Ramana Reddy, Director of our college at the National Institute of Technology Kurukshetra. Their insightful guidance, support, and knowledge have been crucial in my field of work. Moreover, I am thankful to my friends for their encouragement and motivation, which played a significant role in the completion of this task. Finally, I appreciate everyone who has helped me, directly or indirectly, in the successful completion of this project. Nigu Kumari Roll No.:32219106 iii ABSTRACT Post-silicon validation is a critical phase in the development of System-on-Chip (SoC) devices, ensuring that the hardware functions correctly and performs reliably after fabrication. Despite rigorous pre-silicon verification, certain issues remain undetected due to the inherent limitations of simulation and emulation environments. These issues include rare and intermittent failures that can significantly impact the device's performance and user experience. High-Volume Validation (HVV) addresses these challenges by employing extensive testing across numerous SoC units to uncover these elusive defects. 
This dissertation presents the application and effectiveness of HVV in post-silicon validation processes. It provides the systematic approach of receiving large quantities of SoC devices, flashing new firmware, validating for defective components, and executing comprehensive regression tests within a continuous integration framework using Jenkins. The compiled test results are analyzed using Splunk, providing a detailed overview of the data and insights into the validation process. This method not only improves the detection rate of post-silicon defects but also ensures secure and reliable firmware and software updates. The outcomes of HVV are highly promising, demonstrating a 0% failure rate and an Overall Failure Rate (OFR) of less than or equal to 0.5, indicating 100% content stability. These results show a significant enhancement in the detection and rectification of post-silicon defects, thereby substantially improving the quality and reliability of SoC devices. The future scope of post-silicon validation includes integrating advanced machine learning algorithms to predict and detect defects more efficiently, further automating the validation process, and expanding the HVV methodology to accommodate increasingly complex SoC architectures. This evolution will not only streamline the validation process but also ensure the continual improvement of SoC device quality and reliability. iv TABLE OF CONTENTS CANDIDATE’S DECLARATION……………………………………………………………II CERTIFICATE………………………………………………………………………………...II ACKNOWLEDGEMENT…………………………………………………………………….III ABSTRACT…………………………………………………………………………………….IV LIST OF FIGURES………………………………………………………………………….VIII LIST OF TABLES…………………………………………………………………………......X LIST OF ABBREVIATIONS………………………………………………………………...XI CHAPTER 1 1. INTRODUCTION 1.1. OVERVIEW………………………………………………………………………….. 1 1.2. BACKGOUND……………………………………………………………………….. 1 1.3. ASIC DESIGN FLOW………………………………………………………………. 2 1.3.1. System Specification……………………………………………………………….. 2 1.3.1.1. Product Definition…………………………………………………………….. 3 1.3.1.2. High-level Design Specification………………………………………………. 3 1.3.1.3. Low-level Specification…………………………………………………………4 1.3.2. Architectural Design………………………………………………………………... 5 1.3.2.1. Architectural Exploration…………………………………………………….. 5 1.3.2.2. Architectural Level Design…………………………………………………… 6 1.3.2.2.1. Transaction Level Modelling……………………………………………….. 6 1.3.2.2.2. High-level Synthesis………………………………………………………..... 7 1.3.2.2.3. Architectural Design Flow………………………………………………….. 7 1.3.3. Logic/RTL Design Flow..…………………………………………………………… 8 1.3.4. Verification…………………………………………………………………………... 9 1.3.5. Gate-level Design...…………………………………………………………………..10 1.3.5.1. Synthesis Flow...………………………………………………………………..11 1.3.6. Physical Design……………………………………………………………………... 12 1.3.7. Silicon Fabrication and Validation……………………………………………….. 13 1.4. HIGH VOLUME VALIDATION...………………………………………………… 16 v 1.4.1. JTAG……………………………………………………………………………....... 16 1.4.2. Serial Wire Debug (SWD) Standard……………………………………………… 17 1.5. INTERNAL DPPM VALIDATION (IDV).………………………………………... 18 1.6. DSF VALIDATION (DDR SYSTEM FIRMWARE)...…………………………….20 1.7. OBJECTIVES…………………………………………………………………………21 1.8. METHODOLOGY……………………………………………………………………22 1.9. DISSERTATION ORGANIZATION.……………………………………………….23 CHAPTER 2 2. LITERATURE REVIEW……………………………………………………………........24 2.1. POST SILICON VALIDATION METHOD………………………………………...27 2.1.1. Post-silicon milestones……………………………………………………………….28 2.2. SERIAL PERIPHERAL INTERFACE……………………………………………...29 CHAPTER 3 3. 
SOC VALIDATION FOR THE CONTENT STABILTY 3.1. FLASHING…………………………………………………………………………... 31 3.2. SETTING UP THE REGRESSION TESTS………………………………………. 32 3.2.1. Create a Jenkins Job……………………………………………………………...... 32 3.2.2. Add regression tests..……………………………………………………………….. 32 3.2.3. Test reporting.………………………………………………………………….…....34 3.2.4. Integration with other tools.…………………………………………………….......34 3.2.5. Schedule builds...…………………………………………………………………….34 3.3. COMPILE THE RESULTS...………………………………………………………..35 3.3.1. Splunk…..…………………………………………………………………………… 36 3.3.1.1. Splunk dashboard.……………………………………………………………..36 3.3.1.2. Operations on Splunk dashboard……………………………………………..36 3.3.2. Python Libraries..…………………………………………………………………...38 3.3.2.1. argparse library.………………………………………………………………. 38 3.3.2.2. Pandas library.…………………………………………………………………39 3.3.2.3. OS library…..……………………………………………………………….….40 vi 3.3.2.4. Tkinter library.………………………………………………………………....41 CHAPTER 4 4. SOFTWARE AND TOOLS USED 4.1. CORE DEVELOPMENT PLATFORM (CDP)……………………………………..42 4.1.1. Components of a CDP……………………………………………………………….43 4.1.2. Testing scenarios……………………………………………………………………..43 4.1.3. Software Development……………………………………………………………....43 4.1.4. Prototyping and Iterations………………………………………………………….43 4.2. HANDLER…………………………………………………………………………….44 4.2.1. Advantages of Handlers……………………………………………………………..44 4.3. JENKINS………………………………………………………………………………45 4.4. JIRA……………………………………………………………………………………46 4.5. CONFLUENCE……………………………………………………………………….47 4.6. VISUAL STUDIO CODE…………………………………………………………….49 4.7. LINUX…………………………………………………………………………………50 4.8. SUMMARY……………………………………………………………………………51 CHAPTER 5 5. RESULTS & DISCUSSIONS…………………………………………………………….52 CHAPTER 6 6. CONCLUSION & FUTURE SCOPE 6.1. CONCLUSION………………………………………………………………………..55 6.2. FUTURE SCOPE..……………………………………………………………………55 REFERENCES…………………………………………………………………..57 PUBLICATIONS………………………………………………………………...61 vii LIST OF FIGURES Figure 1.1 Asic Design flow…………………………………………………………………… 2 Figure 1.2 System Specification……………………………………………………………….. 3 Figure 1.3 SOC High-Level Specification Example…………………………………………… 4 Figure 1.4 Low-Level Specification……………………………………………………………. 5 Figure 1.5 Architectural Exploration…………………………………………………………… 6 Figure 1.6 Architectural-Level Design Flow…………………………………………………… 8 Figure 1.7 Logic/RTL Design Flow……………………………………………………………..10 Figure 1.8 Verification Design Flow…………………………………………………………….11 Figure 1.9 RTL Synthesis………………………………………………………………………..12 Figure 1.10 Synthesis Flow………………………………………………………………………13 Figure 1.11 Physical Design Flow………………………………………………………………..14 Figure 1.12 JTAG Based System…………………………………………………………………17 Figure 1.13 Serial Wire Debug………………………………………………………………….. 18 Figure 1.14 DDR Configuration and Validation Suite………………………………………….. 21 Figure 2.1 Silicon on Board………………………………………………………………………28 Figure 2.2 Serial peripheral interface Protocol…………………………………………………...29 Figure 3.1 Flash Memory………………………………………………………………………...30 Figure 3.2 Flash Memory Controller……………………………………………………………. 30 Figure 3.3 Creating a Jenkins Job……………………………………………………………….. 32 Figure 3.4 Build the Job…………………………………………………………………………. 33 Figure 3.5 Display Test Results…………………………………………………………………. 33 Figure 3.6 Splunk Enterprise……………………………………………………………………. 35 viii Figure 3.7 Display the Statistics………………………………………………………………… 36 Figure 3.8 Splunk Visualization………………………………………………………………… 36 Figure 3.9 Example of argparse Library………………………………………………………… 38 Figure 3.10 Example of Pandas Library……………………………………………………….... 
38 Figure 3.11 Example of OS Library………………………………………………………………39 Figure 3.12 Example of Tkinter Library………………………………………………………… 40 Figure 4.1 Core Development Platform………………………………………………………… 41 Figure 4.2 Handler………………………………………………………………………………. 44 Figure 4.3 Stages in Jenkins Pipelining…………………………………………………………. 45 Figure 4.4 Jira Ticket……………………………………………………………………………. 46 Figure 4.5 Confluence Page………………………………………………………………………47 Figure 4.6 Design of Linux OS………………………………………………………………….. 49 ix LIST OF TABLES Table 5.1 13E FEOL Mask Removal……………………………………………………………..52 Table 5.2 Failure rate for 13E FOEL Mask Removal Parts………………………………………52 Table 5.3 Retest result for failure parts…………………………………………………………..53 Table 5.4 Final results of 13E…………………………………………………………………….53 Table 5.5 Qultivated Parts………………………………………………………………………..53 Table 5.6 Latest readout for qultivate parts………………………………………………………53 Table 5.7 Readout for IST PMIC release…………………………………………………………54 Table 5.8 Readout for IST DDR release…………………………………………………………54 x LIST OF ABBREVIATIONS SOC System on chip RISC Reduced instruction set Computing DSP Digital signal processor HDL Hardware Description language RTL Register transfer logic ESL Electronic System Level TLM Transaction Level Modelling HLS High Level Synthesis IP Intellectual Property FPGA Field programmable gate array ASIC Application Specific Integrated Circuit PD Physical Design HVV High Volume Validation DPPM Defect parts per Million PVT Process Variation and Temperature SLT System Level Test JTAG Joint Test Action Group PCB Printed Circuit Board TAP Test Action Port DDR Double Data Rate SDRAM Synchronous Dynamic Random-Access memory SWDIO Serial Wire Debug I/O SWCLK Serial Wire Debug Clock ARM Advanced RISC Machine IDV Internal DPPM Validation DSF DDR System Firmware SD Secure Digital USB Universal Serial Bus ECU Electronics Control Unit CI Continuous Integration xi API Application Programming Interfaces CDP Core Development Platform LCD Liquid Crystal Display IC Integrated Circuits CI Continuous Integration CD Continuous Development SLS Service Level Agreements VS Visual Studio LAMP Linux, Apache, MySQL, PHP XAMPP cross-platform Apache, MariaDB, PHP, Perl OFR Overall Failure Rate POC Point of Contact SS Sub-System CPUSS Central Processing Unit Sub-System ATM Address Translation Mechanism EWS Electrical Wafer Sorting IJTAG IEEE 1687 Internal Joint Test Action Group I2C Inter Integrated Circuit SPI Serial Peripheral Interface UART Universal Asynchronous Receiver Transmitter MPCU Multi-Protocol Conversion unit COSE Conversion Select PMIC Power Management Integrated Circuits IST Initial System Test xii Chapter 1 INTRODUCTION This chapter covers the fundamental theory related to ASIC flow design, post-silicon validation, and high-volume validation. It explains the processes involved, how high-volume validation is achieved, and the specific activities undertaken within HVV. 1.1 OVERVIEW High Volume Validation (HVV) stands as a crucial phase in the semiconductor industry's design and manufacturing process, ensuring the quality, reliability, and performance of integrated circuits (ICs) produced in large quantities. This validation process occurs post-manufacturing, following tape out, and aims to detect and rectify any defects or issues that may have arisen during chip fabrication. The primary goal of HVV is to validate that semiconductor devices meet the required specifications and standards, guaranteeing their functionality, performance, and reliability. 
HVV encompasses a range of validation activities, including content stability, Internal Defective Parts per Million (DPPM) Validation (IDV), DDR System Firmware (DSF) validation, temperature testing, and more. These activities are vital for verifying various aspects of ICs, such as functionality, timing, power consumption, electrical characteristics, and overall performance. HVV teams employ sophisticated automation tools and testing methodologies to execute these validation activities efficiently and effectively. Automation platforms like Jenkins enable distributed and scalable testing, allowing simultaneous testing on numerous devices. Moreover, tools such as Splunk are used for storing and analyzing the vast amounts of testing data generated during the validation process. The successful implementation of HVV is paramount for semiconductor companies to ensure the quality, reliability, and performance of their products. By utilizing robust validation methodologies and cutting-edge automation technologies, HVV teams can validate semiconductor devices at high volumes, thereby minimizing defects, reducing time-to-market, and enhancing overall product quality and customer satisfaction.

1.2 BACKGROUND
In the semiconductor industry, the relentless pursuit of innovation and the demand for high-performance, reliable integrated circuits (ICs) have necessitated rigorous validation processes. High Volume Validation (HVV) is a pivotal phase in the design and manufacturing lifecycle of semiconductors, playing a critical role in ensuring that the ICs produced in large quantities meet stringent quality and performance standards. This process becomes especially crucial post-manufacturing, following the tape-out stage, where the design is finalized and sent for fabrication. The primary objective of HVV is to identify and correct any defects or issues that may have arisen during the fabrication process. These defects, if left undetected, can significantly impact the functionality, performance, and reliability of the semiconductor devices, leading to potential failures in the field and decreased customer satisfaction. HVV ensures that the ICs adhere to the required specifications and standards, thereby guaranteeing their operational integrity and reliability in real-world applications.

1.3 ASIC DESIGN FLOW
As shown in Figure 1.1, the flow starts with a specification created by the architects based on marketing inputs. The architects evaluate the system using a software model and finalize the specification and architecture. Based on the specification, engineers design, verify, and implement the ASIC and send it for fabrication. Once the ASIC is fabricated, it is characterized and tested before release to the target.

Figure 1.1 ASIC design flow (System Specification → Architectural-level Design → Logic Design → Verification → Gate-level Design → Physical Design → Silicon Fabrication & Validation)

1.3.1 System Specification
As illustrated in Figure 1.2, the specification defines the details of the design. For example, the System on Chip (SOC) uses a 32-bit reduced instruction set computing (RISC) processor and a digital signal processor (DSP) with peripherals such as UART and USB, and includes flash, RAM, and ROM memories. The specification is developed in three stages: product definition, high-level specification, and low-level (micro-architecture) specification.
Figure 1.2 System Specification (within the overall ASIC design flow, the System Specification stage expands into Product Definition, High-level Specification, and Low-level Specification)

1.3.1.1 Product Definition
Any product being developed needs the support of the marketing team. The marketing team derives the requirements and identifies how the product can gain an edge over competitors; this process defines the product requirements. The product team defines the high-level requirements, for example, that the SOC should operate at 600 MHz for a given application and should support audio-video applications. Architects convert the product requirements into a high-level specification, which is used by the hardware and software engineers. Evaluation of the overall system is done at the architectural phase, where the system-level model is created and evaluated for performance and functionality. The engineers work with the system architects to break down the high-level specifications into low-level specifications.

1.3.1.2 High-level Design Specification
It is created by the architects and details the functional requirements of the system defined by the marketing team. Here, partitioning of hardware and software is carried out to define what is handled in hardware and what in software. It also defines the frequency of operation, technology node, memories, performance parameters, and software use cases, as in Figure 1.3. Architects and engineers identify reusable cores or hardware logic blocks, which can shorten the SOC development cycle. Once these parameters are defined, architecture exploration is done for hardware and software using SystemC or C.

Figure 1.3 SOC High-level Specification Example:
Processor: 32 bits
Data Bus Support: 64
Clock Speed: 600 MHz
Internal RAM: 16 MB
Flash: 256 MB
Supply Voltage: 1.2 V
Pin Count: 150

1.3.1.3 Low-Level Design Specification
The hardware team adopts a top-down methodology, beginning with high-level specifications and progressively refining them into low-level specifications. This process involves deriving implementation details for analog and digital blocks, including clock distribution, power domain management, and interaction protocols at the block level, as given in Figure 1.4. These specifications ensure that the design meets performance, power, and interface requirements. Once the low-level specifications are defined, hardware engineers proceed to develop the Register Transfer Level (RTL) design using Hardware Description Languages (HDL) such as Verilog or VHDL. This RTL design serves as a detailed blueprint for the digital logic that will be implemented on the chip. It defines the behavior of the circuitry at a level that can be synthesized into actual hardware components. This meticulous planning and design phase is critical for ensuring that the final chip meets its functional and performance goals.

Figure 1.4 Low-level specification (block diagram: CPU sub-system, system controller, AV engine, DSP, flash interface, USB, UART, GPIO, RAM, boot ROM, and the peripheral-bus and system-bus interconnects)

1.3.2 Architectural Design
In this phase, the system specification is translated into various system architectures. This helps engineers choose the architecture that offers a good trade-off between area, power, and performance.

1.3.2.1 Architecture Exploration
In this phase, system architects and SOC designers create abstract models using C, C++, or SystemC as per the specification.
This model should be functionally accurate with respect to the specification. As shown in Figure 1.5, decisions on partitioning the hardware and software aspects of the system are made based on the details gathered during exploration. Electronic System Level (ESL) design helps the architects further optimize the functionality of the system-level model; designers can leverage ESL design by using transaction-level modelling (TLM) to choose between various macro-architectures and micro-architectures.

Figure 1.5 Architectural Exploration (block diagram: application software and system software mapped onto the CPU sub-system, system controller, AV engine, DSP, flash interface, USB, UART, GPIO, RAM, boot ROM, and the peripheral-bus and system-bus interconnects)

1.3.2.2 Architectural-level Design
Architectural exploration is done in two ways: by TLM or by HLS (high-level synthesis).

1.3.2.2.1 Transaction Level Modelling
TLM places more emphasis on data transfer and less on the actual implementation, i.e. pin-level transaction details. It gives SOC designers the flexibility to create a generic hardware model without focusing much on the actual hardware. Such models are useful for virtual prototyping, on which the system software is tested and validated. The time taken to develop and validate system software, which may take several iterations, is greatly reduced when early virtual hardware prototypes are available. Software developers need not wait for actual hardware to functionally validate the software against the specification. The C, C++, or SystemC models developed to mimic the hardware functionality are refined further and synthesized using an HLS tool to generate a functionally equivalent RTL description.

1.3.2.2.2 High Level Synthesis
HLS is a process that converts an untimed C++ or SystemC description of a hardware module into an RTL module description. The RTL can be in Verilog HDL or VHDL. The user can define the hardware at an abstract level without dealing with clock boundaries, state machines, target technology, memories, and hardware register management. Note that not all C, C++, or SystemC code can be synthesized into a good RTL description. The designer should model the abstract-level code with hardware in mind and use only those SystemC or C++ constructs that can be synthesized to RTL. For example, software can use large arrays with effectively unlimited storage and fast access times, whereas hardware must consider memory ports, access conflicts, and resource reuse. A modern embedded system involves a number of software elements along with multiple embedded processors, DSPs, custom logic, and memories. Conventional coding of this logic in Verilog HDL or VHDL is quite tedious. Programming complex algorithms and custom hardware blocks in C++ or SystemC, further processed by the automated flow of an HLS tool, has therefore become an attractive solution for SOC development. Likewise, different boundaries between software and hardware can be evaluated in terms of power, performance, and area trade-offs. Reuse of behavioral IP is an added advantage, apart from the reduction in line count compared to RTL. These behavioral IPs can be re-targeted to diverse implementations and architectures, whereas RTL IPs are constrained to fixed architectures. Any change to the micro-architecture in RTL means substantial rework in terms of code rewrites and subsequent functional verification and annotation. HLS thus reduces code complexity.
1.3.2.2.3 Architectural-level Design Flow

Figure 1.6 Architectural-level Design Flow (within the overall ASIC flow, the architectural-level design stage expands into architectural exploration (ESL), behavioral design (HLS) with explore-architecture, refine-microarchitecture, and rescheduling steps, and virtual prototyping with application software development, system software development, and software validation; results are analyzed before RTL generation)

In behavioral design (HLS), the designer creates C++ or SystemC models of the system. Programmers must use only code that is synthesizable by the HLS tool, so that various architectures can be explored against the area and timing requirements of the design. As noted above, an HLS tool allows the designer to create different configurations of the same C++ or SystemC model, letting them experiment with area, clock speed, power, and latency. This enables the designer to choose the best architecture that fits the specification. During this phase designers can visualize the data flow and control flow of the design, enabling them to make decisions and changes to the logic. The architecture can be further refined to improve the quality of the results, as shown in Figure 1.6. Inlining of SystemC or C++ can result in much faster logic but can lead to redundant resources and therefore more area. Similarly, multiplexers can be shared between cycles by two or more threads, thereby reducing the area. Such manipulation at the micro-architecture level gives the designer more options to try, and to choose the architecture that gives the better quality of results. Scheduling in the HLS flow is where all the micro-architecture refinements and constraints are processed. The scheduler takes the clock-speed and resource-sharing inputs and then tries to create optimized data-path and control-path logic for the design. The scheduler also considers the technology library for ASIC or FPGA implementation, thus creating optimized RTL for the specific application. When scheduling is done, it provides a detailed report of area, power, and timing. Based on the results, engineers can further refine the architecture to get the desired or better quality of results. Once the engineers are satisfied with the QoR, the RTL is generated. Typically, the control path consists of an FSM and glue logic for the data path.

1.3.3 LOGIC/RTL DESIGN
After defining the low-level specifications, designers specify the registers within the design and describe how data is transferred and operated on within these registers. The RTL (Register Transfer Level) description defines the control flow for data transfer and operations. It starts with design entry of the specifications and ends with hardware generation, as shown in Figure 1.7. It is written according to specific coding guidelines and is validated with tools to ensure it meets the required standards for the next level of processing. A high-quality RTL design leads to an optimized gate count, which is crucial for efficient chip design. Before starting the RTL design, each aspect of the low-level specification is carefully analyzed, and an optimal design plan is devised. Once the RTL design is completed, it undergoes RTL validation to ensure that it can be successfully converted into a gate-level design. RTL validation is crucial as it ensures that the RTL design accurately reflects the intended behavior of the hardware.
It involves simulating the RTL code to verify that it performs the specified operations correctly and meets the design requirements. By detecting and resolving issues at this stage, designers can avoid costly errors later in the design process. Additionally, RTL validation helps ensure that the design can be successfully synthesized into a gate-level netlist for implementation in hardware.

Figure 1.7 Logic/RTL Design Flow (design entry, design validation against a VHDL testbench, compilation and synthesis, post-synthesis timing analysis, simulation, and hardware generation)

1.3.4 VERIFICATION
ASIC verification is the process of testing and verifying the design of an application-specific integrated circuit (ASIC) to ensure that it functions correctly and meets its specifications, as shown in Figure 1.8. This verification process is crucial during ASIC design and can consume as much as 70-80% of the total ASIC design and verification time. Without verification, one cannot guarantee that the blocks or IPs are functionally correct. Verification starts with the specification. Once the specification is understood, we move on to the verification strategy: identifying the custom blocks for unit-level verification and deciding the verification techniques and methodology, tool chain, and sign-off criteria. Once the strategy is in place, we start creating the test plan. A typical test plan lists the functionality of each block in a hierarchical manner, the stimulus plan to test each feature, and the metrics, such as functional coverage, that will be collected to decide on verification closure. It also lists the constraints and design assumptions that the design engineer made. If the SOC is verified using multiple verification techniques, it also lists the functionality that will be covered by the other techniques. A test plan is created for each block of the SOC; it can be a unit-level or top-level test plan.

Figure 1.8 Verification design flow (specification study, verification strategy, test plan, test creation, regression, closure metric analysis, and verification sign-off, alongside the main ASIC design stages)

1.3.5 GATE-LEVEL DESIGN
RTL synthesis, also known as logic synthesis, is a fundamental step in the design flow of digital circuits. It involves translating a high-level description of the desired circuit behavior, often described in an RTL language like Verilog or VHDL, into a gate-level netlist, as shown in Figure 1.9. This netlist represents the physical implementation of the circuit using logic gates, flip-flops, and other standard cells. During RTL synthesis, the tool analyzes the RTL description and performs various optimizations to improve the design's area, power consumption, and performance. These optimizations include logic restructuring, technology mapping, and resource sharing. The goal is to achieve a design that meets the specified functionality while minimizing area and power consumption and maximizing performance. RTL synthesis is a critical step because it bridges the gap between the abstract RTL description and the physical implementation of the circuit. It enables designers to explore different design options and trade-offs early in the design process, helping to ensure that the final design meets the project's requirements.
Additionally, RTL synthesis plays a crucial role in the overall design flow, as the output of the synthesis process is used as input for further stages, such as placement and routing, which ultimately lead to the fabrication of the physical chip. Therefore, the quality of the RTL synthesis directly impacts the final chip's performance, area, and power characteristics.

Figure 1.9 RTL Synthesis (the RTL, a technology library (.lib), and design constraints (.sdc) feed logic synthesis to produce a netlist; simulation and formal equivalence checking flag mismatches, and bugs found feed back into verification)

1.3.5.1 Synthesis Flow
The process of expanding all the modules in the RTL description into design objects is known as elaboration. It also involves the evaluation and propagation of port constraints and parameters throughout the RTL description. In the next phase, the logic is optimized using algebraic or Boolean manipulation techniques. Many optimization algorithms can be employed here depending on the amount of combinational and sequential logic involved. The output is an optimized but technology-independent circuit. The whole flow is shown in Figure 1.10.

Technology Dependent Synthesis: Optimization is done by breaking down the logic circuit and reducing the number of logic cells. The final output is a gate-level netlist that can be used for further steps in the design flow, such as placement and routing.

Why RTL synthesis matters:
Efficiency: By converting RTL to gate-level logic, we can optimize the design for area and performance.
Automation: Manual gate-level design is time-consuming and error-prone. Synthesis tools automate this process.
Integration: The gate-level netlist integrates seamlessly with other design tools (such as place-and-route tools) to create a complete physical layout.

Figure 1.10 Synthesis Flow (elaboration and translation, a technology-independent optimization stage, and technology-dependent synthesis, driven by the RTL, the technology library (.lib), and the design constraints (.sdc), producing the netlist)

1.3.6 PHYSICAL DESIGN
At this stage, the synthesized netlist, which is in logical form, is implemented as a transistor-level layout, as shown in Figure 1.11. The logical cells are implemented as physical geometries within the given area that can be manufactured in silicon. In physical design, logical cells, digital macros, and analog macros are implemented as physical geometries that must meet the design rules specified by the silicon fabricators, called foundries. The layout is implemented using a technology node such as 40 nm, 28 nm, or 16 nm, where the node indicates the length of the transistor gate. The technology libraries provided by fabrication houses play a crucial role in the physical design process. These libraries include information about silicon wafers, standard cells, and layout rules (such as design rule checks, or DRC). The choice between full-custom and semi-custom design depends on the project requirements.

Figure 1.11 Physical Design Flow

1.3.7 SILICON FABRICATION AND VALIDATION
Fabrication is the process in which silicon wafers are transformed into intricate integrated circuits.

Crystal Growth and Wafer Preparation: The process begins with a silicon ingot, which is essentially a cylindrical crystal of pure silicon. Here is a step-by-step breakdown:
- Crystal Growth: Pure silicon is melted in a crucible at temperatures around 1400°C. A small seed crystal with the desired orientation (usually a single crystal) is carefully inserted into the molten silicon.
- Seed Pulling: The seed crystal is slowly pulled out of the molten silicon, allowing it to solidify and form a cylindrical ingot. This ingot serves as the raw material for wafers.
- Wafer Slicing: The silicon ingot is sliced into thin, flat discs called wafers. These wafers are typically around 200-300 micrometers thick (about the thickness of a human hair).

Epitaxy: During epitaxy, additional silicon layers are grown on the wafer surface. Epitaxy involves depositing a thin layer of silicon (or other semiconductor material) onto the wafer. This layer can be doped (intentionally contaminated with specific atoms) to create regions with varying electrical properties. Epitaxial layers are commonly used for forming transistor channels, creating isolation layers, and enhancing device performance.

Dielectric and Polysilicon Deposition:
- Dielectric Materials: Dielectric layers (such as silicon dioxide, SiO₂) are deposited on the wafer. These insulating layers serve various purposes, including isolating different components and preventing electrical interference.
- Polysilicon Deposition: Polysilicon (partially crystalline silicon) is also deposited. Polysilicon is used for gates in transistors, interconnects, and other structural elements.

Oxidation:
Purpose of Oxidation: The wafer is exposed to oxygen at high temperatures (typically around 1000°C). This process forms a thin layer of silicon dioxide (SiO₂) on the wafer’s surface.
Insulating Layer: The silicon dioxide layer acts as an insulator, protecting the underlying silicon and providing electrical isolation between different components.

Lithography: It is a critical process in semiconductor manufacturing that enables the creation of intricate patterns on silicon wafers, essential for forming integrated circuits (ICs). Photolithography, a key step, involves using masks and UV light to transfer patterns onto the wafer's surface. Each lithography step, from photolithography to pattern transfer, is crucial in converting raw silicon wafers into functional integrated circuits by defining intricate patterns.

Post-silicon validation is a critical phase that follows manufacturing, involving the testing of actual silicon prototypes on real-world system boards to verify functionality and adherence to specifications. Unlike pre-silicon validation, which is virtual, post-silicon validation ensures that the chip meets quality and performance standards before release. The goal of post-silicon validation is to guarantee the chip functions as intended in real-world applications, ensuring reliability and customer satisfaction with the final product.

1.4 HIGH VOLUME VALIDATION
High Volume Validation (HVV) is a process that involves testing many parts to identify functional and electrical issues. Unlike the previous approach that only validated around 50 parts and could detect 20,000 defective parts per million (DPPM), HVV aims to find marginalities at a much lower DPPM (100). The HVV factory allows comprehensive testing of standalone and concurrent scenarios for extended periods without time constraints. It also evaluates the impact of process variations, voltage margins, and temperature ranges on system-on-chip (SoC) functionality. The goal is to create, fine-tune, and optimize tests and scenarios for production-level testing.
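To illustrate how such a sweep over process, voltage, and temperature conditions can be organized, the following Python sketch enumerates the corners for a single unit and records the outcome of each run. The corner values, the unit naming, and the run_test() hook are assumptions made for this example only; they are not the actual HVV test content or infrastructure.

```python
# Illustrative sketch only: corner values, unit naming, and the simulated
# run_test() hook are assumptions, not the dissertation's real test setup.
import itertools

PROCESS_SKEWS = ["slow", "typical", "fast"]   # process variation corners
VOLTAGE_MARGINS_PCT = [-5, 0, 5]              # supply margin around nominal (%)
TEMPERATURES_C = [-40, 25, 85, 125]           # ambient test temperatures (deg C)

def run_test(unit_id, skew, v_margin_pct, temp_c):
    """Stand-in for a real test hook that would configure the bench and
    execute the content on the device; here it simply reports a pass."""
    return True

def sweep_unit(unit_id):
    """Run the content at every PVT corner for one unit and log the outcome."""
    results = []
    for skew, v, t in itertools.product(PROCESS_SKEWS, VOLTAGE_MARGINS_PCT, TEMPERATURES_C):
        results.append({"unit": unit_id, "skew": skew, "v_margin_pct": v,
                        "temp_c": t, "passed": run_test(unit_id, skew, v, t)})
    return results

if __name__ == "__main__":
    print(len(sweep_unit("SOC-0001")), "corner runs recorded")  # 3 * 3 * 4 = 36
```

In practice each corner run would drive the tester and the device under test rather than returning a fixed result, and the same sweep would be repeated across thousands of units.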
1.4.1 JTAG JTAG (Joint Test Action Group) is a crucial interface used in the electronics industry for testing and debugging electronic devices, providing a standardized method for accessing and controlling internal components of integrated circuits (ICs). One of its example is shown in Figure 1.12. Its primary purpose is in verifying designs and testing printed circuit boards (PCBs) after manufacturing. Based on the IEEE 1149.1 standard, JTAG defines the Test Access Port (TAP) and boundary scan architecture. The TAP provides a serial communication interface for accessing on- chip test registers with minimal overhead, eliminating the need for direct external access to system buses. 16 One of the key features of JTAG is boundary scan testing, which enables engineers to check the continuity of interconnects within an IC. By scanning signals and memory components linked to interfaces like DDR SDRAM, JTAG helps identify faults and issues. Moreover, JTAG facilitates debugging during development by allowing engineers to halt the processor, read/write memory, and inspect internal states. This capability is crucial for diagnosing issues and optimizing software for better performance. In summary, JTAG plays a vital role in ensuring the quality and reliability of electronic devices by providing efficient testing and debugging capabilities. Figure 1.12 JTAG based System 1.4.2 Serial Wire Debug (SWD) Standard It is a protocol designed by ARM for programming and debugging their microcontrollers. Unlike traditional JTAG (Joint Test Action Group), which uses multiple pins, SWD operates with just two pins: SWDIO (Serial Wire Debug I/O) and SWCLK (Serial Wire Debug Clock). SWD simplifies the debug interface while maintaining compatibility with the JTAG protocol as shown in Figure 1.13. 17 SWD allows bidirectional communication between the debugger and the target microcontroller. With only two pins, SWD minimizes the number of connections required on the PCB. SWD enables efficient debugging, including halting the processor, reading/writing memory, and examining internal states. It is part of the ARM Debug Interface Architecture Specification. The SWDIO line carries both data and control signals. SWCLK provides the clock signal for synchronous communication. SWD uses a shift register mechanism to transfer data between the debugger and the target device. It supports various operations, such as reading and writing memory, accessing registers, and setting breakpoints. SWD is commonly used during development, production testing, and field debugging of ARM-based microcontrollers. It provides an efficient and streamlined way to interact with the target device. In summary, SWD offers a simplified and efficient debugging interface, making it a valuable choice for ARM-based systems. Figure 1.13 Serial Wire Debug 1.5 INTERNAL DPPM VALIDATION (IDV) Internal DPPM Validation (IDV) is a critical step in the quality assurance process for semiconductor manufacturers Company. It involves simulating the shipping process by internally validating a batch of chips before they are sent to actual clients or customers. The goal is to predict the approximate Defective Parts per Million (DPPM) value for the chips. 18 DPPM represents the number of defective parts (chips) per one million devices. For example, if a batch of one million chips has 100 defective units, the DPPM would be 100.Lower DPPM values indicate higher quality and reliability. 
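The arithmetic of this example is straightforward and can be expressed directly. The short Python sketch below reproduces it; the helper name is ours and is not part of the project's tooling.

```python
def dppm(defective_units, total_units):
    """Defective Parts per Million for a tested population."""
    if total_units <= 0:
        raise ValueError("total_units must be positive")
    return defective_units / total_units * 1_000_000

# The example above: 100 defective units in a batch of one million chips.
print(dppm(100, 1_000_000))  # 100.0
```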
The primary objective of IDV is to catch and address any manufacturing defects or hardware issues before the chips reach customers. By identifying failures internally, Company can take corrective actions, improve processes, and enhance product quality. Manufacturing processes are complex, involving various steps (such as lithography, etching, and packaging). Defects can occur at any stage. External factors (such as temperature variations, humidity, and handling during shipping) can also impact chip quality. IDV helps uncover these issues early, preventing defective chips from reaching customers. When defects are detected during IDV, engineers perform root cause analysis. They investigate whether the issue is related to design, fabrication, assembly, or other factors. Corrective actions are then taken to prevent similar defects in future batches. IDV covers both hardware and software aspects: Hardware Failures: These include defects in transistors, interconnects, memory cells, and other components. Software Failures: These involve issues related to firmware, drivers, or system-level functionality. It ensures that only high-quality chips are shipped to customers and reduces the risk of field failures, warranty claims, and customer dissatisfaction. Also, enhances overall product reliability and reputation. IDV includes several activities some of which are listed below:  Testing 2000+ Devices with Different Test Contents During IDV, many devices (typically 2000 or more) are subjected to various test scenarios. The goal is to identify hardware failures early in the validation process. By testing different contents, engineers can simulate real-world usage and uncover potential issues before the devices reach customers. This proactive approach helps prevent defects from escaping to the field.  Temperature Testing 19 Temperature plays a crucial role in device performance and reliability. IDV includes testing devices at various temperature extremes: Hot Temperatures: Ensures that the devices function correctly even under high thermal stress. Low Temperatures: Verifies their performance in cold environments. Observing functionality across different physical factors (such as temperature) helps assess robustness.  RAM Configuration Testing Different RAM configurations (e.g., 8GB, 12GB) are tested during IDV. Engineers verify that the devices operate seamlessly with varying memory capacities. This ensures compatibility and stability across different system configurations.  Frequency Corners Testing Frequency corners refer to extreme operating conditions (high and low frequencies). IDV involves testing devices at these corners to assess their behavior. It helps identify any frequency-related issues, such as timing violations or signal integrity problems.  Identifying, Reporting, and Debugging Failures Throughout the IDV process, any recorded failures are meticulously documented. Engineers analyze these failures to understand their root causes. The goal is to pinpoint issues related to design, manufacturing, or other factors. Once identified, corrective actions are taken to improve product quality.  Collaboration and Continuous Improvement IDV is a collaborative effort involving cross-functional teams. Engineers, testers, and quality assurance personnel work together to ensure thorough validation. Lessons learned from IDV feed into continuous improvement processes. 
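Because recorded failures are documented and analyzed, a tabular summary is a natural reporting format. The sketch below uses pandas, one of the Python libraries employed in this work, to aggregate hypothetical per-unit results by test content and temperature condition; the column names and sample records are illustrative assumptions rather than the project's actual log schema.

```python
# Hedged sketch: column names and sample records are invented for illustration.
import pandas as pd

records = [
    {"unit": "U0001", "content": "cpu_stress", "temp": "hot",  "passed": True},
    {"unit": "U0002", "content": "cpu_stress", "temp": "cold", "passed": False},
    {"unit": "U0003", "content": "ddr_sweep",  "temp": "hot",  "passed": True},
]
df = pd.DataFrame(records)

# Failure rate per test content and temperature condition (percent).
summary = (df.groupby(["content", "temp"])["passed"]
             .agg(runs="count", failures=lambda s: (~s).sum()))
summary["failure_rate_pct"] = summary["failures"] / summary["runs"] * 100
print(summary)
```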
In summary, IDV is a comprehensive validation process that combines rigorous testing, temperature assessments, configuration checks, and failure analysis. Its purpose is to catch and address issues internally, ensuring that only high-quality devices reach customers. 20 1.6 DSF VALIDATION (DDR SYSTEM FIRMWARE) It’s a way to verify the quality and reliability of content provided by the DDR team. Its internal structure and the interaction with the other devices are shown in Figure 1.14. The DDR team provides test content and test plan. We regress the runs for their content stability. After regression analysis of multiple runs, we compile a report. This report informs the DDR team about the stability of their content. If the content is stable, it can be trusted for downstream applications. Figure 1.14 DDR Configuration and Validation Suite 1.7 OBJECTIVES The objectives of High-Volume Validation (HVV) encompass four key areas:  Content Stability: Ensuring the stability and integrity of the content within the integrated circuits (ICs) to guarantee that they meet the required specifications and standards.  IDV (Internal DPPM Validation): Conducting validation to identify and rectify any internal defects within the ICs, aiming to minimize the Defective Parts per Million (DPPM) and ensure high-quality semiconductor devices. 21  DSF Validation (DDR System Firmware): Validating the design and content of the DDR to ensure that they are optimized for serviceability, enabling easier and more efficient maintenance and repair processes.  Temperature Testing: Performing rigorous testing under various temperature conditions to assess the functionality, performance, and reliability of the ICs across different environmental scenarios. These objectives are critical for verifying the functionality, performance, and reliability of semiconductor devices, ensuring they meet the required standards and specifications while also minimizing defects and enhancing overall product quality. 1.8 METHODOLOGY The project involves receiving system-on-chip (SoC) devices in large quantities. Once received, the next step is to flash new firmware or software onto the SOCs to update them. After flashing, the SOCs are validated for defective parts using automated tests. Regression testing is then performed using Jenkins to ensure that the changes do not introduce new issues. The test results are compiled using Splunk for data analysis and visualization. Secure updates of firmware and software are ensured to prevent unauthorized modifications. Finally, the information is transformed and transmitted to upstream or downstream teams to maintain stability and reliability. Python scripts are utilized in this project for automating various aspects of the system. These scripts play a crucial role in streamlining tasks such as flashing new firmware, running regression tests, and compiling test results. They help in automating repetitive tasks, saving time, and reducing the risk of human error. Python's versatility allows it to interact with different components of the system, such as Jenkins for running tests and Splunk for data analysis. 1.9 DISSERTATION ORGANISATION Chapter 1: This chapter provides an overview, outlines the objectives of the work, and details the organization of the dissertation. It covers the fundamental theory related to ASIC flow design, post-silicon validation, and high-volume validation. It explains the processes involved, how high- volume validation is achieved, and the specific activities undertaken within HVV. 
Chapter 2: This chapter delves into various methodologies and innovations in the semiconductor industry aimed at optimizing processes, ensuring quality, and enhancing performance. It 22 underscores the importance of early defect detection, efficient validation, and debugging techniques, and the integration of advanced technologies to address contemporary challenges in semiconductor design and manufacturing. It includes a review of existing literature and identifies the research gap addressed by the work. Chapter 3: This chapter details the comprehensive work and achievements in high-volume validation of system-on-chip (SoC) devices. It covers the entire process from receiving SoCs in bulk, flashing new firmware, validating defective parts, conducting regression testing with Jenkins, and compiling results using Splunk. Additionally, it highlights the importance of secure updates and the effective transformation and transmission of information to ensure system stability and reliability. Chapter 4: This chapter provides an in-depth overview of the software and tools used throughout the project, highlighting their roles in enhancing efficiency and accuracy. It discusses the integration of Jenkins for automated regression testing, Splunk for data analysis and visualization, and various Python scripts for automating routine tasks. Chapter 5: This section presents the experimental results pertaining to the project, showcasing the outcomes of the high-volume validation process for SoC devices. It includes detailed analyses of the data collected from automated tests, regression tests, and defect validations. The results highlight the effectiveness of the flashing and updating processes, the stability and reliability of the SoCs, and the efficiency of the automated systems in identifying and addressing issues. These findings are visualized through various dashboards and reports, providing clear insights into the project's performance. Chapter 6: This chapter outlines the conclusions drawn from the high-volume validation (HVV) work, summarizing the key achievements and insights gained from the project. It highlights the effectiveness of the implemented processes in ensuring the stability and reliability of SoC devices. Additionally, the chapter discusses the future scope, suggesting potential enhancements and further research opportunities to improve validation techniques, automation processes, and overall system performance. The focus is on continuous improvement and adapting to emerging technological advancements. 23 Chapter 2 LITERATURE REVIEW This chapter delves into various methodologies and innovations in the semiconductor industry aimed at optimizing processes, ensuring quality, and enhancing performance. It underscores the importance of early defect detection, efficient validation, and debugging techniques, and the integration of advanced technologies to address contemporary challenges in semiconductor design and manufacturing. The increasing demand for Silicon-on-Chip devices has significantly impacted the processes of leading semiconductor companies. To optimize costs and production yield, the industry is redesigning internal technology processes. A crucial aspect of this effort is the early identification of wafer defects. The Electrical Wafer Sorting (EWS) stage is vital for efficiently analyzing wafer defects by examining the visual maps associated with wafers. In authors proposed an effective solution for the automatic evaluation of EWS defect maps. 
The proposed approach utilizes recent deep learning techniques, both supervised and unsupervised, to robustly classify EWS defect patterns across various device technologies, including Silicon and Silicon Carbide. The solution includes an end-to-end pipeline for supervised EWS defect pattern classification and a hierarchical unsupervised system to identify new defects in the production line. The authors demonstrate the effectiveness of their approach through numerical experiments using real-world data. The proposed method outperforms existing state-of-the-art methods in terms of internal cluster validation quality and normalized mutual information, an external cluster validation metric. The results indicate substantial improvements in SPR performance, particularly for more complex- shaped patterns. In author introduces an innovative router architecture aimed at minimizing trace data during Network-on-Chip (NoC) post-silicon validation. This architecture effectively reduces the amount of trace data generated, thereby alleviating storage and bandwidth constraints while maintaining high validation coverage. By incorporating advanced compression techniques and efficient data routing mechanisms, the proposed solution significantly enhances the efficiency of the post-silicon validation process. For instance, the method achieved an overall failure rate (OFR) of 0% for certain parts, indicating complete content stability and readiness for deployment without further testing. In contrast, 24 other parts exhibited an OFR of 4.07% over 1180 iterations, necessitating retesting to ensure content stability. In author addresses the issue by defining comprehensive bug models and proposing an ISA- independent validation methodology for high-end microprocessors' ATMs. The method detects ATM bugs immediately after their manifestation, facilitating diagnosis and debugging. Experiments using an enhanced Gem5 simulator demonstrate the method's effectiveness, detecting known ATM bugs five orders of magnitude faster than traditional approaches. Post-silicon validation and debugging is a highly time-consuming process due to several challenges, such as bug localization, signal selection, trace buffer management, and trace data bandwidth. In author proposed a method to validate and debug complex design bugs that demand large trace bandwidths by dividing the problem into multiple hierarchies. The proposed method uses a two-phase approach. In the first phase, important signals are selected at a coarse level, and in the second phase, the method zooms into specific problematic circuit parts. This hierarchical approach enables identifying the root cause of problems with limited trace data bandwidth, significantly reducing the volume of trace data. In , Wang et al. (2021) investigate the concurrent testing of reconfigurable scan networks in self-aware systems, highlighting the critical role of real-time monitoring and adaptability. Their proposed methodology utilizes reconfigurable scan architectures to facilitate efficient, simultaneous testing and operational monitoring, thus improving system reliability and performance. This approach minimizes disruptions during testing and allows for the dynamic reconfiguration of scan paths to address various operational states and fault conditions. The study makes a significant contribution to the development of robust, self-aware systems that maintain high performance and reliability through advanced testing methods. 
In author proposed a thorough protection method to defend against data flow attacks in IJTAG networks. By optimizing the network structure according to the access times of instruments, the method enhances test efficiency and protects third-party intellectual property. Moreover, it suggests grouping instruments to minimize hardware overhead while ensuring security. Abdennadher and Meixner explore the efficacy of defect-based testing for high-speed I/O interfaces, presenting a compelling argument for its adoption over traditional testing methodologies. They identify that conventional testing methods often fall short in detecting 25 the specific and nuanced faults associated with high-speed interfaces, which are increasingly prevalent in modern electronic systems. Defect-based testing, in contrast, focuses on identifying faults resulting from manufacturing inconsistencies, material defects, and operational stresses, offering a more targeted and effective approach. Their study provides empirical evidence showing that defect-based testing significantly enhances fault detection accuracy, leading to more reliable and robust high-speed I/O interface designs. This research is particularly relevant as it addresses the growing complexity and performance demands of contemporary digital circuits, advocating for a shift towards more sophisticated and precise testing strategies to ensure the integrity and performance of high-speed interfaces. In , the author introduces a Multi-Protocol Conversion Unit (MPCU) designed and simulated using Hardware Descriptive Language (HDL). This unit acts as a bridge to facilitate data communication between three widely used serial communication protocols: Serial Peripheral Interface (SPI), Inter Integrated Circuit (I2C), and Universal Asynchronous Receiver Transmitter (UART). The MPCU can receive input from any of these protocols and uses a Conversion Select (COSE) input value to determine the protocol for data transfer. Based on the COSE input, the MPCU directs data from its internal bus to the appropriate protocol slave, which converts the data and outputs it as an 8-bit value. In , the author proposes integrating clock gating techniques to improve the performance of the Serial Peripheral Interface (SPI) protocol commonly used in embedded systems. This technique selectively disables the clock signal during idle states, significantly reducing power consumption while maintaining high-speed, reliable data transfer. Extensive simulations and experimental validations demonstrate the effectiveness of this clock gating strategy in balancing power consumption and communication speed. This research not only enhances SPI applications but also provides a model for optimizing other communication protocols. The findings offer valuable insights into the implementation challenges and considerations of integrating clock gating into the SPI protocol, paving the way for advancements in power-aware design methodologies. In author presents the design and implementation of an FPGA controller for writing to and reading from SD cards using the SPI protocol. Aimed at addressing the memory needs of FPGA systems, this solution provides a cost-effective, high-capacity, and portable storage option. Developed on a Xilinx AC701 platform with Artix-7 FPGA using VHDL, the controller efficiently 26 accesses multiple SD cards with minimal data lines. Testing with various SD cards demonstrated reliable, error-free operation, offering a robust storage solution for real-time data processing applications. 
In , the author introduces the design of a Built-In Self-Test (BIST) embedded Serial Peripheral Interface (SPI) module configured for single-master, single-slave communication. The BIST feature enhances self-testability, ensuring fault-free circuits and reducing maintenance and testing costs. The module transfers 8-bit data and uses BIST to verify the correctness of the circuit under test (CUT). Designed using Verilog HDL on the EDA Playground platform, this SPI module is suitable for applications in Application-Specific Integrated Circuits (ASICs) and Systems on Chip (SoCs).

Lee et al. present a novel approach to utilizing a RISC-V FPGA platform for ROS-based robotics applications, showcasing the potential of open-source hardware and software integration in robotics. The study highlights the benefits of using RISC-V, a popular open-source instruction set architecture, in FPGA-based systems for robotics, offering flexibility and customization crucial for complex robotic applications. By integrating RISC-V with the Robot Operating System (ROS), a widely used framework for robotics development, the authors demonstrate a scalable and adaptable platform for developing advanced robotics systems. Their work contributes to the growing body of research on open-source hardware and software collaboration in robotics, paving the way for more accessible and cost-effective robotic solutions. The study underscores the importance of leveraging open-source technologies to drive innovation and accelerate the development of robotics applications.

2.1 POST SILICON VALIDATION METHOD

Post-silicon validation refers to the process of testing and verifying the functionality and performance of a semiconductor chip after it has been manufactured. This validation phase is crucial to ensure that the chip meets its design specifications and functions correctly in real-world conditions. It involves running various tests, simulations, and debugging procedures to identify and fix any issues or defects in the chip's design or fabrication. It is the last step in the IC design flow. Our work starts after tape-out is completed, i.e., the design is finalized and sent for manufacturing.

Figure 2.1 Silicon on board

2.1.1 Post-Silicon Milestones

After fabrication, the silicon is placed on the board, as shown in Figure 2.1. This typically refers to the integration of silicon chips directly onto a circuit board, rather than using separate components. This integration can lead to more compact and efficient electronic devices, as it reduces the need for additional components and interconnections. It is often seen in applications where space is limited or where power efficiency is critical, such as in mobile devices or IoT (Internet of Things) devices.

An engineering sample refers to an early version of a product that is produced for testing, evaluation, and validation purposes. These samples are often used by engineers to identify potential issues, assess performance, and make necessary adjustments before mass production begins. Engineering samples are crucial in the product development process as they help refine and improve the final product before it reaches consumers.

Customer sampling refers to the process of providing samples of a product to potential customers for evaluation and testing purposes. This allows customers to assess the quality, performance, and suitability of the product for their specific needs before making a purchasing decision.
Customer sampling is a common practice in various industries, especially in the semiconductor and electronics sectors, where companies often provide engineering samples or prototype units to interested clients for feedback and validation.

2.2 SERIAL PERIPHERAL INTERFACE

The Serial Peripheral Interface (SPI) protocol governs communication between master and slave devices using control and data signals. Key control signals include the clock signal (SCLK) and the slave select (SS) or chip select (CS) signal, which determines which slave device to communicate with. Data is transmitted using the master output slave input (MOSI) and master input slave output (MISO) signals. The clock polarity (CPOL) and clock phase (CPHA) settings determine the idle level of the clock and the clock edge on which data is read or written.

Figure 2.2 Serial Peripheral Interface Protocol

SPI supports full-duplex communication, allowing data to be transferred simultaneously in both directions between the master and a selected slave, and a single master can communicate with multiple slave devices as shown in Figure 2.2. The selection of the slave device is controlled by the slave select pin, which enables communication with a specific slave device when pulled low. The SPI bus interface includes four main signals: MOSI, MISO, SCLK, and SS. MOSI is used for data transmission from master to slave, while MISO is used for data transmission from slave to master. SCLK is the clock signal that synchronizes data transfer, and SS is the slave select signal used to select a specific slave device for communication.

In summary, HVV (High-Volume Validation), which is a crucial part of post-silicon validation, uses the SPI (Serial Peripheral Interface) protocol to flash content onto integrated circuits (ICs). This process involves transferring data from the PC to the hardware and vice versa, ensuring efficient communication and data exchange. The SPI protocol facilitates reliable and high-speed data transfer, making it an ideal choice for flashing and validating ICs during the post-silicon phase. This method ensures that the ICs meet the required specifications and perform correctly before they are deployed in final products.

Chapter 3 SOC VALIDATION FOR CONTENT STABILITY

This chapter provides a detailed description of the high-volume validation process for system-on-chip (SoC) devices when they are received in large quantities. It explains how this process ensures the stability of the SoCs' content, validates defective parts, and manages the transmission of information to the upstream or downstream teams. The chapter outlines the steps involved in flashing new releases onto the SoCs, setting up regression tests using Jenkins for automated testing, and compiling reports using Splunk for data analysis and visualization. Additionally, it highlights the importance of secure updates and testing methodologies in maintaining the performance and reliability of embedded systems.

3.1 FLASHING

Flashing content in embedded systems refers to the process of updating or programming the firmware or software on a microcontroller or other embedded device, typically stored in embedded flash memory. Embedded flash memory is a type of non-volatile memory, as shown in Figure 3.1, that is integrated directly into microcontrollers or system-on-chip (SoC) devices. A flash memory controller, shown in Figure 3.2, manages access to this memory. Embedded flash allows these devices to store program code, data, and configuration settings. Unlike external storage (such as SD cards or USB drives), embedded flash memory is an integral part of the chip itself.
The main advantage is that it provides fast access to data and instructions without the need for external storage devices. Common applications include microcontrollers, automotive ECUs (Electronic Control Units), IoT devices, and smart cards.

Figure 3.1 Flash Memory

Figure 3.2 Flash Memory Controller

The process of flashing involves writing new firmware or software to the embedded flash memory. The typical steps are:
 Bootloader: Many embedded systems have a bootloader, which is a small piece of code that runs when the device starts up. The bootloader allows the system to load new firmware.
 Downloading Updates: The bootloader enables downloading updated program and data code. This can include control-unit firmware updates (bug fixes or feature improvements) or downloading additional multimedia files.
 Updating Flash Memory: During the flashing process, the existing content in the flash memory is overwritten with the new code or data.
 Verification: After flashing, the system verifies that the new content is correctly written to the flash memory.
 Reboot: Finally, the system reboots with the updated firmware or software.

Flashing is essential for keeping embedded systems up to date, but it also poses security risks. Ensuring secure updates is crucial to prevent unauthorized modifications or attacks. Techniques like signed firmware, secure boot, and encrypted communication are used to enhance security during the flashing process. In summary, flashing content in embedded systems is a critical operation that allows devices to stay current with the latest software and firmware updates. It ensures optimal performance, bug fixes, and feature enhancements.

3.2 SETTING UP THE REGRESSION TESTS

For running the regressions, we use the Jenkins portal. Jenkins is a popular continuous integration (CI) server that can be used for automating various tasks, including regression testing. Jenkins is built to run regression tests as part of the CI process. It is plugin-based, allowing you to customize it for your specific needs. Regression tests help ensure that changes to your codebase do not introduce new issues or break existing functionality. The steps to set up regression tests using Jenkins are described below.

3.2.1 Create a Jenkins Job

Figure 3.3 Creating a Jenkins Job

In Jenkins, start by accessing the dashboard and clicking "New Item" to create a new job, as shown in Figure 3.3. Name your job and select "Freestyle project" or "Pipeline" as needed. Configure the job under "Source Code Management" by selecting "Git" and entering your repository URL, along with the necessary credentials and branch details. Next, specify the build steps by adding commands or scripts to compile the code and run tests. Finally, set up build triggers, such as polling the SCM or triggering on commits, and save the configuration to complete the job setup.

3.2.2 Add Regression Tests

The regression tests can be written in various languages (Java, Python, etc.). Add a build step, as shown in Figure 3.4, that runs your regression test suite. You can use shell scripts, batch files, or other tools to execute your tests; a minimal Python example is sketched below.

Figure 3.4 Build the job

Figure 3.5 Display test results

3.2.3 Test Reporting

Jenkins provides test reporting features. After running the tests, Jenkins displays the test results, including pass/fail status and any errors.

3.2.4 Integration with Other Tools

Jenkins integrates well with other tools like Selenium, JUnit, TestNG, and more. You can use these tools to write and execute your regression tests.
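To make the regression build step concrete, the following is a minimal sketch of a Python test script that a Jenkins "Execute shell" build step could invoke. It is an illustration only, not the actual HVV test suite: the flashing command, firmware image path, boot-log path, and the "BOOT OK" marker are hypothetical placeholders.

# regression_smoke.py - minimal post-flash sanity checks (illustrative sketch).
# The flash tool, image path, log path, and "BOOT OK" marker are hypothetical.
import subprocess
import unittest


class SmokeRegression(unittest.TestCase):
    IMAGE = "builds/latest/firmware.bin"       # hypothetical build artifact
    FLASH_CMD = ["./flash_tool", "--image"]    # hypothetical flashing CLI

    def test_flash_returns_success(self):
        # Run the flashing tool and expect a zero exit code.
        result = subprocess.run(self.FLASH_CMD + [self.IMAGE],
                                capture_output=True, text=True)
        self.assertEqual(result.returncode, 0, msg=result.stderr)

    def test_boot_log_reports_ready(self):
        # Inspect a captured boot log and confirm the device reported readiness.
        with open("logs/boot.log", encoding="utf-8") as log:  # placeholder path
            self.assertIn("BOOT OK", log.read())


if __name__ == "__main__":
    # A non-zero exit code on test failure marks the Jenkins build as failed.
    unittest.main(verbosity=2)

In a Freestyle job, this script can be called with "python regression_smoke.py" from an "Execute shell" build step; if the suite is instead run with pytest, its --junitxml option produces a results file that the test-reporting features described in Section 3.2.3 can display.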
3.2.5 Schedule Builds

Set up triggers to automatically run your regression tests when code changes are pushed to the repository. Jenkins can schedule builds periodically or based on events (e.g., after a commit). If the regression tests involve GUI applications or interactive methods (e.g., SendKeys), consider the following:
 Jenkins Slave Agent: Run Jenkins slave agents that can handle active desktop sessions.
 Keep-Alive Jobs: Set up keep-alive jobs to maintain an active session for interactive tests.
Jenkins supports Pipeline (declarative and scripted) for defining complex workflows, as shown in Figure 3.5. Ensure that any plugins work seamlessly with Pipeline by including the necessary dependencies. Secure your Jenkins setup to prevent unauthorized access, and use credentials securely when accessing repositories and other resources.

3.3 COMPILE THE RESULTS

HVV data is ingested into Splunk from various sources (logs, databases, APIs, etc.). Splunk indexes this data, making it searchable and accessible. A search query is created in Splunk to retrieve the relevant HVV data, and the search results are visualized using panels on a dashboard. These panels can include charts, tables, maps, and other visual elements. The panels are compiled into a dashboard that provides an overview of HVV performance, trends, and anomalies. The layout can later be customized, filters added, and permissions set. Once the dashboard is ready, it is shared with the downstream team, who can access it to gain insights into HVV metrics and use this information for decision-making, troubleshooting, or reporting.

3.3.1 Splunk

Splunk, shown in Figure 3.6, streamlines data processing, analysis, and visualization, making it invaluable for organizations dealing with large volumes of machine data and other forms of big data. Its ability to automatically integrate data from various sources and accept data in any format makes it a versatile solution. Additionally, its searching, analyzing, reporting, and visualization functionalities make it a comprehensive tool for extracting insights and making informed decisions.

3.3.1.1 Splunk Dashboard

Splunk dashboards consist of panels, which are customizable components that can display various modules like search boxes, fields, charts, tables, and lists. Each panel is typically linked to a report, allowing users to visualize and interact with the data in a meaningful way.

3.3.1.2 Operations on Splunk Dashboard

Figure 3.6 Splunk Enterprise

 Change Dashboard Permissions: Adjust who can view or edit the dashboard by modifying its access settings. This allows you to control visibility and collaboration options, as shown in Figure 3.7.
 Change Panel Visualization: Modify how data is presented within a panel by switching between different visualization types, such as changing a table to a chart or a list.
 Edit the XML Configuration of a Dashboard: To directly modify the underlying XML code of a Splunk dashboard, access the dashboard's edit mode and switch to the XML view, as shown in Figure 3.8. Here, manually edit the XML code to customize the layout, add advanced functionalities, and change the appearance of the dashboard. This allows for more precise control over the dashboard elements, enabling configurations that are not achievable through the standard user interface. After making the desired changes, save the XML code to apply the customizations.
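For illustration, the short sketch below shows one way HVV results could be pushed into Splunk through its HTTP Event Collector (HEC) using the requests library. The endpoint URL, token, index name, and sourcetype are hypothetical placeholders; in practice the data may instead arrive through forwarders or monitored log files, as noted above.

# send_to_splunk.py - illustrative sketch of posting an HVV result to a Splunk HEC endpoint.
import json
import requests

HEC_URL = "https://splunk.example.com:8088/services/collector/event"  # hypothetical endpoint
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"                    # hypothetical token


def send_result(unit_id: str, test_name: str, status: str) -> None:
    """Post a single HVV test result as a Splunk event."""
    payload = {
        "event": {"unit": unit_id, "test": test_name, "status": status},
        "sourcetype": "hvv:result",  # placeholder sourcetype
        "index": "hvv",              # placeholder index
    }
    response = requests.post(
        HEC_URL,
        headers={"Authorization": f"Splunk {HEC_TOKEN}"},
        data=json.dumps(payload),
        timeout=10,
    )
    response.raise_for_status()  # fail loudly if Splunk rejects the event


if __name__ == "__main__":
    send_result("SOC-0421", "spi_flash_regression", "PASS")

Once such events are indexed, a search along the lines of index=hvv sourcetype=hvv:result status=FAIL can drive the dashboard panels described in this section.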
Figure 3.7 Display the statistics

Figure 3.8 Splunk visualization

Apart from generating these results and publishing them to the downstream team, we also write automation scripts in Python. Python offers several advantages that make it a popular choice among developers. We are developing scripts to read and write information between the hardware and the host machine over the SPI and I2C protocols, among other tasks. Python has one of the largest and most active developer communities worldwide; it is widely used, and plenty of support, helpful documentation, and resources are available. Python is powerful, flexible, and easy to use. It supports multiple programming paradigms and performs automatic memory management. Its English-like syntax makes it easy to learn and read, and it uses indentation rather than braces to delimit code blocks, which aids readability. It boasts an extensive selection of open-source libraries covering various domains, from web development to data science, making development quicker and easier. It is also platform-independent, allowing you to write code once and run it on different platforms without modification.

3.3.2 Python Libraries

Python's standard library functions, also known as built-in functions, come pre-packaged with the language. They serve various purposes and simplify common tasks. These built-in functions are part of Python itself, so we do not need to create them from scratch. They cover a wide range of functionalities, from basic operations (like printing to the terminal) to more complex tasks. Unlike languages such as C or C++, Python does not require explicit header files; the functions' prototypes and definitions are already available within the language. By using standard library functions, developers streamline their workflow: instead of writing custom code for routine tasks, they leverage these pre-existing functions. Standard library functions undergo thorough testing, ensuring reliability and robustness, and a dedicated group of developers continually improves and optimizes them. Functions for printing, calculating squares, or handling data parsing are readily available, and reusing them saves time and effort. As real-world needs evolve, Python's standard library adapts, ensuring that applications work consistently across platforms. In this work, we have used Python for data parsing, bandwidth extraction, and automatic graph plotting. Python's performance and user-friendly GUI frameworks make it an excellent choice for such tasks.

3.3.2.1 argparse library

The argparse library is a powerful Python module that simplifies creating user-friendly command-line interfaces. When a Python script requires user input before execution, argparse helps define what those inputs should look like and generates helpful messages so users understand what is required. It is a great tool for handling command-line arguments and options. It allows you to define command-line arguments and specify different actions for them. These actions include storing argument values (either singly or as part of a list), handling Boolean switches, calling callbacks, and counting argument occurrences. By default, the argument value is stored, but you can also provide a type for conversion. If a destination argument is specified, the value is saved as an attribute with that name in the Namespace during argument parsing. Some key features of argparse:
 User-Friendly Interfaces: argparse allows you to define the arguments your program needs.
It then parses those arguments from sys.argv, making it easy for users to provide input.
 Automatic Help and Usage Messages: It generates help messages and usage instructions automatically, enhancing the user experience.
An example of the argparse library is shown in Figure 3.9.

Figure 3.9 Example of argparse Library

3.3.2.2 Pandas Library

Pandas is a robust, open-source Python library primarily used for data manipulation and analysis. It provides data structures and functions that enable efficient operations on datasets. The core data structure in Pandas is the DataFrame, which resembles a table with rows and columns and allows efficient data manipulation, indexing, and alignment. An example is shown in Figure 3.10.

Figure 3.10 Example of Pandas Library

3.3.2.3 OS library

The os module provides functions for interacting with operating-system-dependent features, such as file and directory manipulation, process management, and environment variables. An example is shown in Figure 3.11.

Figure 3.11 Example of OS Library

Features of the os library:
 File and Path Manipulation: You can use it for tasks like opening files, manipulating paths, and working with directories.
 Process Parameters: It allows you to work with process-related information, such as forking processes or executing external commands.
 Environment Variables: Access and modify environment variables using this module.
 Platform Independence: The same interface is used across different operating systems (e.g., 'posix', 'nt', 'java'), ensuring portability.
 Exception Handling: Functions in this module raise OSError (or its subclasses) for invalid or inaccessible file names and paths.

3.3.2.4 Tkinter library

When it comes to creating graphical user interfaces (GUIs) in Python, there are several libraries such as Tkinter, Kivy, and PySimpleGUI. Tkinter is the default GUI framework bundled with Python. It is simple and comes with standard layouts and widgets, making it suitable for small portable applications and basic tool GUIs. It is cross-platform, but its widgets may look dated on Windows. An example is shown in Figure 3.12.

Figure 3.12 Example of Tkinter Library

In summary, when large quantities of SoCs arrive or a new release is issued, the High-Volume Validation (HVV) process begins by flashing the new release onto the hardware to ensure content stability. This process includes checking for defective parts, especially when new batches of 100-200 parts are received. To streamline and automate these tasks, automated Python scripts are developed and utilized. These scripts help in efficiently managing the flashing process, defect detection, and overall regression testing, ensuring that the hardware functions correctly and reliably before moving forward in the development cycle.

Chapter 4 SOFTWARE AND TOOLS USED

This chapter outlines the internal tools and automation techniques employed to streamline the Integrated Device Validation (IDV) process. By automating repetitive tasks, these tools significantly enhance testing efficiency. Through this project, I have developed the skills needed to effectively utilize and manage these automation tools.

4.1 CORE DEVELOPMENT PLATFORM (CDP)

A CDP, also known as a development board or evaluation board, serves as a fundamental hardware platform for testing and validating integrated circuits (ICs) or chips. It provides an environment where engineers can interact with the chip, as shown in Figure 4.1, evaluate its performance, and verify its functionality.
CDPs are essential during the early stages of chip development, allowing designers to assess their designs before moving to mass production.

Figure 4.1 Core Development Platform

4.1.1 Components of a CDP
 Hardware Board: The CDP typically consists of a hardware board on which the chip (integrated circuit) is mounted. This board provides electrical connections, power supply, and interfaces for testing.
 Peripherals: CDPs come equipped with various peripherals (such as sensors, actuators, and communication modules) that simulate real-world interactions.
 Displays: Some CDPs include displays (like LCD screens) to visualize chip output or debug information.
 Programmability: Engineers can program and configure the chip on the CDP using software tools.
 Debugging Interfaces: CDPs often have debugging interfaces (e.g., JTAG) for real-time monitoring and debugging.

4.1.2 Testing Scenarios
 Functional Testing: Engineers verify that the chip performs its intended functions correctly.
 Stress Testing: CDPs subject the chip to extreme conditions (e.g., high temperature, voltage fluctuations) to assess reliability.
 Performance Testing: Engineers measure the chip's speed, power consumption, and other performance metrics.
 Compatibility Testing: CDPs check how the chip interacts with other components (e.g., memory, peripherals).

4.1.3 Software Development
 Engineers write and debug software for the chip using the CDP.
 They can simulate different use cases and validate software behavior.

4.1.4 Prototyping and Iteration
 CDPs allow rapid prototyping and iteration during chip development.
 Engineers can modify the design, reprogram the chip, and test again.

In summary, a CDP is a versatile platform that accelerates chip development by providing a controlled environment for testing, debugging, and software development. It ensures that chips meet performance, functionality, and reliability requirements before production.

4.2 HANDLER

Handlers are specialized machines used in semiconductor manufacturing and testing. Their primary function is to automate the process of picking up chips (integrated circuits) from trays or carriers and placing them onto the Core Development Platform (CDP) for testing, as shown in Figure 4.2. Handlers use stepper motors to precisely control movement; these motors allow fine adjustments during chip handling. They employ a vacuum system to pick up chips: a suction cup or nozzle creates a vacuum that securely holds the chip during transfer. Handlers use gears and belts to move the chip from the tray to the CDP, and these mechanical components ensure accurate positioning. A robotic arm performs the actual chip handling, moving along predefined paths to pick up, transport, and place chips. In a typical cycle, the handler's vacuum system picks up a chip from the tray, the robotic arm moves the chip to the CDP, the chip is accurately placed on the CDP for testing, and the process repeats for all chips in the tray. Handlers enhance efficiency, accuracy, and consistency in chip testing by automating the transfer from trays to the CDP, and their autonomous operation allows continuous testing without human supervision.

4.2.1 Advantages of Handlers
 Precision: Handlers offer high precision, critical for delicate chips.
 Speed
