Assignment 2: CISC Architecture Performance Comparison (Fall 2024) PDF

Document Details

Uploaded by CuteWatermelonTourmaline

Kangwon National University

2024

Tags

computer architecture CISC ISA performance analysis

Summary

This document is an assignment for a computer architecture course. It involves evaluating the performance of different versions of the memcpy function by using ISA extensions. Students are required to implement their own memcpy functions using MOVS instructions and other ISA features, and compare their performance on various CPUs. The assignment includes details of the main() function, logistics, experiments, and evaluation procedures.

Full Transcript

4471029, Fall 2024
Assignment #2: Better exploiting the ISA extension in CISC architecture and comparing performance
Assigned: Nov. 13, Due: Sun., Nov. 24, 11:59PM
Computer Architecture
TA: SeungHo Choi([email protected])
You can contact the TA directly for questions or use Q/A in E-Ruri.

1 Introduction

The purpose of this assignment is to give you a better understanding of the characteristics of a CISC architecture, x86-64, and to make you more familiar with inline-assembly programming in C for better optimization. In particular, you will learn how to use the CISC instructions that the CPU provides to improve the performance of your code, and thus understand how ISA features can affect your program's performance. By evaluating your code on different CPUs (e.g., the class server and your own computer), you will see how S/W development drives H/W change and vice versa.

The goal is very simple: implement different versions of a custom memcpy function, evaluate them, and compare their performance.

- You will implement your own memcpy function by using a single MOVS instruction that Intel provides.
- You will further improve your own memcpy function by using other ISA features that Intel provides (also using your own secret sauce).
- You will measure the elapsed cycles of the different versions of memcpy: the old version of memcpy, the current version of memcpy, your custom memcpy using MOVS, and another custom memcpy().
- You will run several experiments, minimizing run-time variation, and evaluate the results of the different versions of memcpy on different CPUs.
- You will draw bar graphs, in which all results are normalized to the result of the old version of memcpy, to compare your experiment results.

2 Logistics

This is an individual project. All handins are electronic. Clarifications and corrections will be posted on the course Web page (E-Ruri).
2.1 Get Your VM Ready

I believe you already have some experience setting up a Linux virtual machine from other classes. You also need your own Linux VM for this and later assignments in this class, so please create one. Of course, you can set up the Linux environment on either a virtual machine or a native machine[1], depending on the H/W resources of your computer system. Please make sure that your environment has at least 80GB of free disk space[2]. After setting up your virtual machine, please install any stable version of Ubuntu[3] on it.

2.2 Warm up by exercising basic Linux commands

Please use the manpages (e.g., $ man cp) to review the following commands; I recommend that you practice them enough.

- cd (changing your working directory), ls (listing files in your working directory), mv (moving a file or a directory to another location), cp (copying your files to another location), rm (deleting a file or directory), etc.
- tar (compression/decompression), make (build automation), an editor (e.g., VIM, gedit)
- Please review the materials from the Linux class.
- Googling makes your life easier.

3 Handout Instructions

The handout also contains a Tar file (Lab2.tar). Please download it. Start by copying Lab2.tar to a directory in which you plan to do your work (i.e., create a directory and place the file in it on both your machine[4] and the Gomduri server). Then give the command below.

unix> tar xvf Lab2.tar

This will cause a number of files to be unpacked in the directory.

[1] Please make sure that your host system is an Intel or AMD x86-based system.
[2] At least 50GB is required for this assignment; however, you need more free space to build for the later assignment.
[3] Currently, Ubuntu 18.04.6 LTS (Bionic Beaver) is installed on your Gomduri server.
[4] Either a VM or a native system.

3.1 main.c

main.c contains the main() function, which calls several sub-routines (e.g., file open, memory allocation, and memory copy) and measures the performance of the memory copy, as shown in Figure 1.

Figure 1: main() function

Let's look into the main() function in Figure 1.

- At line 33, you first open the /proc/cpuinfo file from Procfs[5].
- At line 38, the getSize() function is called with the file descriptor that was just opened as its argument; it returns the size of the given file.
- At line 41, the file size is multiplied by a macro constant, ITER, to increase the memory footprint.
- At line 43, the alloc_SRC_and_DEST() function is called; it allocates memory of the given size (alloc_size) for each of the two pointer variables (SRC and DEST).
- At line 49, the allocated memory area pointed to by SRC is initialized with the content of the /proc/cpuinfo file[6].
- At lines 63 and 78, RDTSC instructions are used to measure the elapsed cycle count of the memory-to-memory copy (i.e., SRC to DEST).
- At lines 64-72, the different versions of the memory-copy function are called. All four functions (memcpy(), old_memcpy(), movs_memcpy(), and custom_memcpy()) perform the same operation (i.e., copy all contents of the memory area pointed to by SRC to the memory area pointed to by DEST).
- At line 80, the memcmp() function is called to check whether the contents of SRC and DEST are identical.

Note: Please don't modify the main.c file.

3.2 libLab2.c

libLab2.c contains several functions that are called in the main() function. Some of them are three different versions of the memory-copy function (old_memcpy(), movs_memcpy(), and custom_memcpy()). old_memcpy()[7] is a simple implementation of the memory-copy function that was used in the C library of a Unix-like operating system, OpenBSD[8]. movs_memcpy() and custom_memcpy() are skeletons for different versions of the memory-copy function.
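To make the flow described in Section 3.1 concrete for a MOVS-based routine, here is a minimal sketch, assuming GCC or Clang on x86-64. The names rep_movsb_copy and timed_copy are hypothetical and are not the handout's skeletons; whether main.c reads the counter through an intrinsic or through inline assembly is not stated in the handout, so the use of __rdtsc() is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <x86intrin.h>  /* __rdtsc(); GCC/Clang on x86-64 only */

/* Hypothetical name, not the handout's skeleton. Copies n bytes with a
   single REP MOVSB, which moves RCX bytes from [RSI] to [RDI]. */
static void *rep_movsb_copy(void *dst, const void *src, size_t n)
{
    void *ret = dst;  /* the instruction advances dst, so keep the start */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)  /* RDI, RSI, RCX */
                     :
                     : "memory");
    return ret;
}

/* Hypothetical helper mirroring the flow described for main(): read the
   time-stamp counter, copy, read it again, then verify with memcmp(). */
static uint64_t timed_copy(void *dst, const void *src, size_t n, int *same)
{
    assert(dst != NULL && src != NULL && same != NULL);
    uint64_t start = __rdtsc();
    rep_movsb_copy(dst, src, n);
    uint64_t end = __rdtsc();
    *same = (memcmp(dst, src, n) == 0);  /* 1 if DEST now matches SRC */
    return end - start;                  /* elapsed time-stamp ticks */
}
```

RDTSC is not a serializing instruction, so for very short copies the two readings can be reordered around the copy by the processor; larger buffers, such as the ITER-scaled one the handout describes, make this effect less visible.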
This file also contains other functions such as alloc_SRC_and_DEST() and getSize().

Your first goal is to complete the two skeleton functions (movs_memcpy() and custom_memcpy()) using ISA features that x86 provides.

- Implement the movs_memcpy() function using MOVS instructions and see whether it outperforms memcpy() or old_memcpy() on both systems. After evaluating and comparing the performance, please analyze the results and discuss your analysis.
- Also implement the custom_memcpy() function. This time, you can use any CISC instructions that the Intel ISA supports except for MOVS; however, make sure that it performs better than the other versions of the memory-copy function (old_memcpy(), memcpy(), and movs_memcpy()). Please explain your optimization strategy.

Your second goal is to complete the remaining functions (alloc_SRC_and_DEST() and getSize()) accordingly as well.

- The getSize() function takes a file descriptor as an argument, calculates the size of the file, and returns it to its caller.
- The alloc_SRC_and_DEST() function takes three arguments: two pointer variables (SRC and DEST) and the file size returned by getSize(). It allocates two memory regions of the given size, one for SRC and one for DEST.

So, you can edit this file except for the old_memcpy() function.

[5] https://www.kernel.org/doc/Documentation/filesystems/proc.rst
[6] The duplicated contents of the /proc/cpuinfo file are used to fill the entire memory area that SRC points to.
[7] https://github.com/openbsd/src/blob/master/sys/lib/libkern/memcpy.c
[8] https://www.openbsd.org

3.3 Makefile

Figure 2: Makefile

The Makefile generates four different versions of the memory-copy executable: old, cur, movs, and custom. As shown in Figure 2, the old executable is generated by compiling main.c with old_memcpy() in libLab2.c, movs is generated by compiling main.c with movs_memcpy() in libLab2.c, and so on.
Note: You can't modify the Makefile.

3.4 Miscs.

- gen_graph.xlsx: it helps you plot figures. Please feel free to use it to write your report. After all your experiments are done, you have to fill in the sheet accordingly.
- ex.sh: you can use this shell script to run your experiments. You can understand how the script works by reading it.

4 Evaluation

Figure 3: Experiment results on Gomduri server

Figure 4: Experiment results on Dakgalb server

Once your libLab2.c and binaries are ready, run experiments with the executable files that you generated and compare performance across the different versions of the memory-copy program. However, please make sure to set up your system so as to minimize runtime variation[9] before kicking off your experiments. For the evaluation, you can use the ex.sh shell script.

Please note that:

1. I'll test your code on the Gomduri server and run it with '/proc/cpuinfo' as a command-line argument. So please test your code with that argument before submitting it.
2. I'll examine whether your movs_memcpy() and custom_memcpy() functions operate correctly, in terms of functionality, by checking the result of the memcmp() function in the main() function.
3. I'll check for plagiarism based on your submission, the libLab2.c file.

[9] Your Gomduri is already configured properly.

Your score will be determined out of a maximum of 100 points based on the following objectives. Each objective has a different score range, and I will evaluate your answers based on the quality of your writing (i.e., elaboration, examples, and research).

[~5 pts] Objective #1: Please fill in Table 1 with system information about both the Gomduri server and your system.
               Gomduri server    Your system
vendor_id
model name
cpu MHz
memory size

Table 1: System information

[~10 pts] Objective #2: Implement the getSize() function (no hard coding allowed)
[~10 pts] Objective #3: Implement the alloc_SRC_and_DEST() function
[~10 pts] Objective #4: Implement the movs_memcpy() function
[~30 pts] Objective #5: Implement the custom_memcpy() function

Objective #6: Evaluate your code on both the Gomduri server and your system. Then, write a report considering the following subjects.

[~5 pts] #6-1: Please explain what efforts you have made to minimize the runtime variation on your system. Please elaborate on this carefully, because your description will be highly related to your answer to #6-4.
[~10 pts] #6-2: After completing all experiments, plot two figures of your results, one for the Gomduri server and one for your system. Each figure should consist of three bars, where the results of movs, cur, and custom are normalized to the execution time of old. For example, Figures 3 and 4 are the evaluated results on the Gomduri and Dakgalb[11] servers respectively.
[~10 pts] #6-3: Please analyze your result on Gomduri[3]. Do your graphs show a similar trend to Figure 3? If so, please explain what your design choices were for implementing movs_memcpy() and custom_memcpy(). If not, please discuss why.
[~10 pts] #6-4: Please analyze your result on your system and compare it with your result on the Gomduri server. In particular, your discussion should elaborate on what makes the differences. You can do some research to support your claim.

4.1 Handin Instructions

Please write a report that includes your answers to Objective #1 and Objective #6. Please compress the three files below into a Tar file (your_student_id.tar).

- gen_graph.xlsx[10]
- libLab2.c
- report_student_id.pdf[11]

You should electronically hand in your assignment (in Tar format) to E-Ruri.
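The normalization requested in #6-2 amounts to dividing each measured result by the result of old. A minimal sketch follows; the function name normalize_to_baseline and the use of double for cycle counts are assumptions made for illustration and are not part of the handout or of gen_graph.xlsx.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: divides every measured value by the baseline
   (the result of the "old" executable) and stores the ratios in out. */
static void normalize_to_baseline(const double *cycles, size_t count,
                                  double baseline, double *out)
{
    assert(cycles != NULL && out != NULL);
    assert(baseline > 0.0);  /* a zero baseline would divide by zero */
    for (size_t i = 0; i < count; i++)
        out[i] = cycles[i] / baseline;
}
```

With this convention old itself maps to 1.0, and a bar below 1.0 means that version took fewer cycles than old.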
5 Notes

- You can study the Intel software developer manuals to find CISC instructions for implementing your movs_memcpy() and custom_memcpy().
- Files in Procfs do not exist on your storage.
- You can reverse-engineer the memcpy() code by using disassembler tools such as GDB and objdump. It gives you a hint for implementing custom_memcpy().
- You can ask questions about the handout instructions, but you can't ask anything about the error messages that you face in the implementation.
- Of course, you can ask anything right after the deadline.
- Hope you enjoy this :-)

References

Intel® 64 and IA-32 Architectures Software Developer Manuals, https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html, Intel.

[10] Please fill in the sheet with your experiment results accordingly.
[11] No template given; use a self-selected form.
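The note that files in Procfs do not exist on storage has a practical consequence: such files are generated when they are read, and /proc/cpuinfo reports a size of 0 through stat(). One way to learn the length is therefore to read the descriptor until end of file and count the bytes. The sketch below illustrates that idea only; the name count_bytes and its return convention are hypothetical and are not the handout's getSize() skeleton.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: counts the bytes available from fd by reading
   until end of file. Returns the byte count, or -1 on a read error.
   The file offset is left at end of file. */
static ssize_t count_bytes(int fd)
{
    char buf[4096];
    ssize_t total = 0;

    assert(fd >= 0);
    for (;;) {
        ssize_t got = read(fd, buf, sizeof buf);
        if (got < 0) {
            if (errno == EINTR)
                continue;  /* interrupted by a signal: retry the read */
            return -1;
        }
        if (got == 0)
            break;         /* end of file reached */
        total += got;
    }
    return total;
}
```

Because the offset ends up at end of file, a caller that wants to read the content afterwards would first need to move back to the start, for example with lseek(fd, 0, SEEK_SET); whether main.c relies on this is not stated in the handout.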
