Tuesday, November 26, 2019

High Performance Computing Essay Example

HIGH PERFORMANCE COMPUTING: DESIGN, BUILD AND BENCHMARK A LINUX CLUSTER

by GOH YOKE LOONG
KEM 050024
Semester 1, Session 2008/09

Final year research project report submitted to the Faculty of Engineering, University of Malaya, in partial fulfillment of the requirements for the degree of Bachelor of Engineering

DEPARTMENT OF MECHANICAL ENGINEERING
ENGINEERING FACULTY
UNIVERSITY OF MALAYA
October 2008

ABSTRACT

Nowadays almost every industry needs fast processing power, especially in the engineering field. Manipulating high-resolution interactive graphics in engineering, such as in aircraft engine design, has always been a challenge in terms of performance and scalability because of the sheer volume of data involved. [1] Linux clustering is popular in many industries these days. With the advent of clustering technology and the growing acceptance of open source software, supercomputers can now be built for a fraction of the cost of traditional high-performance machines. As a result, the number, variety, and specialized configurations of these machines are increasing dramatically, with 32 to 128 node clusters being commonplace in science labs. [2]

The intent of this research is therefore to use the latest open source software and the computers available in the computer laboratory of the Mechanical Engineering Department, University of Malaya, to design and build a high performance Linux cluster. This paper presents the clustering fundamentals and the details of how to set up the cluster. Work on a high performance cluster is mostly parallel programming, and this paper shows how to run parallel programs with the Message Passing Interface (MPI). High-quality implementations of MPI are freely available for FORTRAN, C, and C++ on Linux. In this research, MPICH2 is used as the MPI implementation.
Extensive research will be carried out by benchmarking the performance of the cluster with standard test codes. The results will be compared with those of the existing clusters. So far, a simple cluster has been set up and preliminary results have been obtained. Further investigation is ongoing.

TABLE OF CONTENTS

Title
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Introduction
1.2 Research Objective
Chapter 2 Literature Study
2.1 What is a Cluster
2.2 Linux, Open Sources and Cluster
2.3 High Performance Computing
2.4 Benchmark of Linux Cluster
Chapter 3 Methodology
3.1 Methodology
3.2 Work Plan
Chapter 4 Problems Faced
4.1 Operating System
4.2 Managing Cluster
Chapter 5 Preliminary Result
5.1 Prerequisites
5.2 Creating a Simple Linux Cluster
5.3 Testing on Conjugate Gradient Solver
List of References
Appendix A - Open Sources Location
Appendix B - How to Change Hostname on Linux Machines
Appendix C - Tabulated Data of Testing on CG Solver

LIST OF FIGURES

Figure 2.2.1 - Logical view of HPC
Figure 3.3.1 - Flow of the project methodology
Figure 5.3.1 - Speedup versus processes for different grid

LIST OF TABLES

Table 3.2.1 - Gantt chart of project

CHAPTER 1 INTRODUCTION

1.1 INTRODUCTION

Computing power and capabilities have increased dramatically over the years, but never as dramatically as in recent times. In early times, mathematical computations were facilitated by lines drawn in the sand. This eventually led to the abacus, the first mechanical device for assisting with mathematics. Much later came punch cards, a mechanical method to assist with tabulation. Ultimately, this led to ever more complex machines, mechanical and electronic, for computation. Early computers used small toroids to store hundreds or thousands of bits of information in an area the size of a broom closet.
Modern computers use silicon to store billions of bits of information in a space not much larger than a postage stamp. However, as computers become more capable, certain constraints still arise. Early computers worked with 8 bits, or one byte, at a time to solve problems. Nowadays, most computers work with 32 bits at a time, with many dealing with 64 bits per operation, which is like increasing the width of a highway. Another method for increasing performance is to increase the clock speed, which is similar to raising the speed limit. So, modern computers are the equivalent of very wide highways with very high speed limits. [2]

However, there are limits to the performance benefits that can be achieved by simply increasing the clock speed or bus width. Supercomputers, introduced in the 1960s and designed primarily by Seymour Cray at Control Data Corporation (CDC), took an alternative approach to increasing computing power. [3] Instead of using one computer to solve a problem, why not use many computers, in concert, to solve the same problem?

A computer is not built on hardware alone. There are also the operating system and the software. There have been noteworthy developments in operating systems that help in the search for higher processing power. A fairly recent development is Linux, an operating system written by a Finnish student named Linus Torvalds in 1991, with very robust multi-user and multi-tasking capabilities. [2] The Linux source code is openly available, allowing a level of control and modification unavailable in a proprietary environment.

1.2 RESEARCH OBJECTIVE

The main objective of this research project is to design, build and benchmark a Linux cluster for high performance computing. This means that Linux must be used as the operating system for the cluster. Four new high performance computers in the computer laboratory of the Mechanical Engineering Department, University of Malaya, will be used to build the Linux cluster.
This cluster is going to replace the existing cluster in the Faculty of Engineering, which is outdated in terms of hardware and software capabilities. The performance of the cluster will be benchmarked using standard test codes and compared with the performance of the existing clusters.

The sub-objectives of the project are:
a) Managing a cluster in a production environment with a large user base, including job scheduling and monitoring.
b) Studying the Message Passing Interface (MPI) programming model, in which a computation comprises one or more processes that communicate by calling library routines to send messages to and receive messages from other processes.
c) Studying parallel programming in order to design and build efficient and cost-effective programs for parallel computer systems, based on Amdahl’s Law.

CHAPTER 2 LITERATURE STUDY

2.1 WHAT IS A CLUSTER?

In its simplest form, a cluster is two or more computers that work together to provide a solution. The idea behind clusters is to join the computing power of the nodes involved to provide higher scalability or more combined computing power, or to build in redundancy to provide higher availability. Clusters of computers must be somewhat self-aware; that is, the work being done on a specific node often must be coordinated with the work being done on other nodes. Consequently, clusters require complex connectivity configurations and sophisticated inter-process communication between the nodes. Furthermore, the sharing of data between the nodes of a cluster through a common file system is almost always a requirement.

All clusters basically fall into two broad categories:
a) High Availability (HA) clusters strive to provide extremely reliable services, where the failure of one or more components (hardware, software, or networking) does not significantly affect the availability of the application being used.
b) High Performance Computing (HPC) clusters are designed to provide greater computational power than one computer alone could provide, by using parallel computing techniques. [4]

2.2 LINUX, OPEN SOURCES AND CLUSTERS

Linux is being developed at a faster pace than any operating system in history. The basic idea of open source is very simple: when programmers can read, redistribute, and modify the source code for a piece of software, the software evolves. Operating systems such as Linux, which can be obtained virtually for free, provide a very economical solution to operating system licensing on large numbers of nodes. Besides that, given the widespread familiarity with Linux, there are many tools, utilities, and applications available to help build and manage a cluster. Many of these programs are available either for free or for a very reasonable cost. [4]

[Figure 2.2.1: Logical view of HPC. Elements shown: parallel application, MPI, Linux, local area network, master node, interconnect, compute nodes, cluster management tools.]

2.3 HIGH PERFORMANCE COMPUTING

High Performance Computing (HPC) is a branch of computer science that focuses on developing supercomputers, parallel processing algorithms and related software. HPC is very important due to its lower cost and because it is implemented in sectors where distributed parallel computing is needed to:
a) Solve large scientific problems: advanced product design, environmental studies (weather prediction and geological studies), and research.
b) Store and process large amounts of data: data mining, genomics research, internet engine search, and image processing. [1]

2.4 BENCHMARK OF LINUX CLUSTER

For a cluster, benchmarking means measuring the speed with which the cluster system executes a computing task, in a way that allows comparison between different hardware/software combinations. Benchmarking is helpful in understanding how a system responds under varying conditions.
[5] Benchmarking is a tedious, repetitive task that demands attention to detail. Often the results are not what one would expect, and they are subject to interpretation. Benchmarking deals with facts and figures, not opinion or approximation.

There are many benchmark programs for HPC. Perhaps the best-known benchmark in technical computing is the LINPACK benchmark. The version of this benchmark that is appropriate for clusters is High Performance LINPACK (HPL). Obtaining and running this benchmark is relatively easy, though getting good performance can require a significant amount of effort. In addition, while the LINPACK benchmark is widely known, it tends to significantly overestimate the achievable performance for many applications because it involves $n^3$ computation on $n^2$ data and is thus relatively insensitive to the performance of the node memory system. [6]

CHAPTER 3 METHODOLOGY

3.1 METHODOLOGY

Initially, a literature review of cluster usage worldwide is compiled and commented on. Information on the usage of clusters in the computer science field, worldwide and locally, was gathered from online sources such as journals and articles. A concise summary of the procedures for designing and building a cluster is presented and documented. At the same time, the user guides for Linux, which serves as the operating system of the cluster, are studied. Besides that, reviews of cluster benchmarks will be obtained from online sources. In order to understand the workings of the test codes, a study of parallel programming models is needed.

The first thing to manage is the physical deployment of the cluster. Once the minimum hardware requirements are fulfilled, the OS will be installed manually on each machine. Next, the free open source software to be used for the cluster construction is chosen and downloaded from the internet. The setup procedures will then be explained and documented to illustrate the method of building a cluster of 4 Linux machines.
This includes preliminary tests. After the cluster is built, it will be benchmarked. The experimental data for different test codes, provided by the supervisor and obtained from the internet, such as LINPACK and in-house codes, will be recorded systematically to enable comparison of the cluster’s performance with that of other existing clusters. The data collected will be tabulated and the relevant graphs plotted. Next, the results will be critically analyzed. Finally, a conclusion is drawn based on the experimental results. In the meantime, the cluster performance will be improved through job scheduling and monitoring.

[Figure 3.3.1: Flow of the project methodology. Stages shown: literature review on history and current usage of clusters; study of Linux user guides; reviews of cluster benchmarks; study of parallel programming models; physical deployment of the cluster; installation of OS (CentOS 5) on each machine; setup of the Linux cluster; benchmarking the Linux cluster; improving the Linux cluster; analysis of results; discussion and conclusion.]

3.2 WORK PLAN

[Table 3.2.1: Gantt chart of project. Schedule by week (1 to 4) for each month from July 2008 to April 2009, with completion percentages between 0% and 100% marked per task against a "Today" marker. Tasks listed: introduction to titles of final year project; literature review on clusters; study of Linux user’s guide; reviews of benchmarking; study of parallel programming; physical deployment; installation of OS; setup of cluster; benchmarking; improvement; analysis; discussion and conclusion; preparation of report and presentation.]

CHAPTER 4 PROBLEMS FACED

4.1 OPERATING SYSTEM

Problem
Initially, the OS (CentOS 5) installation kit provided by the supervisor was 32-bit, but the machines currently in use are based on the AMD Opteron, which supports 64-bit systems. This means that the existing machines were not being used to their full capability.

Solution
CentOS 5.2 for 64-bit systems is downloaded.
Two more machines will be installed with the 64-bit OS. A comparison between the performance of the cluster on 32-bit and 64-bit systems will be carried out.

4.2 MANAGING CLUSTER

Problem
When it comes to managing a cluster in a production environment with a large user base, job scheduling and monitoring are crucial. In order to do so, a Rocks Cluster will be built. On the frontend node (head node), at least two Ethernet interfaces are required, but the machine has only one Ethernet interface.

Solution
An Ethernet interface (network card) has been bought, and its installation is waiting for a technician to open the locked CPU casing.

CHAPTER 5 PRELIMINARY RESULT

5.1 PREREQUISITES

Node Hardware
The 4 machines each have the following setup:
Processor: 2nd-Generation AMD Opteron 2.0 GHz, 1 MB L2 cache per core
RAM: 2 GB DDR2-667 MHz non-ECC
Hard drive: 80 GB 7200 RPM SATA 3 Gb/s NCQ
Network: Integrated Broadcom 5755 10/100/1000 LAN
Switch: D-Link 5-port 10/100 Mbps desktop switch

Software
The default installation requires:
a) A copy of the latest distribution, mpich2-1.0.7.tar.gz
b) GNU C compiler
c) GNU FORTRAN, C++ and Java compilers, if MPI programs are to be written or executed in any of these languages
d) Python 2.2 or a later version, for building the default process management system, MPD
e) Password-less SSH
f) Any UNIX operating system; in this case CentOS 5 (a member of the Linux family) is used

The configure script checks for the prerequisites, and some dependencies are needed to complete the installation. Details on obtaining the open source software are provided in Appendix A.

5.2 CREATING A SIMPLE LINUX CLUSTER

Here are the steps for building a bare-bones simple Linux cluster from MPICH2 and 4 regular Linux machines.

Step 1
The GCC compiler is installed.
# rpm -Uvh gcc-4.1.2-42.el5.i386.rpm
This is followed by the Gfortran, C++ and Java compilers. Dependencies are needed to complete the installation.
(Refer to Appendix A.)

Step 2
The installation of MPICH2 begins. The tar file is unpacked in the directory /home/ab01/libraries.
# tar xfz mpich2-1.0.7.tar.gz
That directory now contains a subdirectory named mpich2-1.0.7.

Step 3
The installation directory is chosen.
# mkdir /home/ab01/mpich2-install

Step 4
MPICH2 is configured: the installation directory is specified and the configure script is run in the source directory.
# ./configure --prefix=/home/ab01/mpich2-install 2>&1 | tee configure.log

Step 5
MPICH2 is built. [6]
# make 2>&1 | tee make.log

Step 6
MPICH2 is installed.
# make install 2>&1 | tee install.log
This step collects all required executables and scripts in the bin subdirectory of the directory specified by the prefix argument to configure.

Step 7
The bin subdirectory of the installation directory is added to the path by adding the line below to the file /etc/bashrc:
PATH=/home/ab01/mpich2-install/bin:$PATH ; export PATH

Step 8
At this point, everything is checked to be in order with
# which mpd
# which mpiexec
# which mpicc

Step 9
The default process manager is called MPD, which is a ring of daemons on the machines where the MPI programs run. In the next few steps, the mpd ring is brought up and tested. A file named mpd.conf is created in /etc (for a non-root user, the file is .mpd.conf in the home directory) by:
# cd /etc
# vi mpd.conf
Press ‘a’, type “secretword=hpcluster”, press “ESC”, then type “:x” to save and exit. The file is made readable and writable only by the root user.
# chmod 600 mpd.conf

Step 10
As a first sanity check, a ring of one mpd is brought up on the local machine, one mpd command is tested, and the ring is brought down.
# mpd &
# mpdtrace
# mpdallexit

Step 12
A ring of mpds is brought up on a set of machines. A file named mpd.hosts is created, consisting of a list of machine names, one per line, and placed in the root directory. These hostnames will be used as targets for ssh or rsh, so full domain names are included if necessary.
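The layout of the hosts file described above can be illustrated with a minimal Python sketch. The hostnames hpcluster1 to hpcluster4 are taken from Appendix B; the function names and the temporary output directory are assumptions made for illustration and are not part of MPICH2.

```python
import os
import tempfile

# Assumed hostnames: Appendix B renames the four machines to
# hpcluster1 ... hpcluster4, so those names are used here.
HOSTNAMES = ["hpcluster%d" % i for i in range(1, 5)]


def write_mpd_hosts(path, hostnames):
    """Write one hostname per line, the layout described for mpd.hosts."""
    with open(path, "w") as handle:
        for name in hostnames:
            handle.write(name + "\n")


def read_mpd_hosts(path):
    """Read the file back, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


# Demonstration in a temporary directory so that nothing on the
# system is touched; the real file location is chosen by the user.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "mpd.hosts")
write_mpd_hosts(demo_path, HOSTNAMES)
hosts = read_mpd_hosts(demo_path)
print(hosts)
```

Writing to a path passed as a parameter keeps the sketch independent of where the file is finally placed.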
Steps to permanently change the hostname of the Linux machines are provided in Appendix B. The ability to reach these machines with ssh or rsh without entering a password is tested with
# ssh othermachine
where othermachine is the hostname or IP address of another machine.

Step 13
The daemons on the hosts listed in the file mpd.hosts are started by
# mpdboot -n

Step 14
There are some examples in the install directory mpich2-1.0.7/examples. One of them, the cpi example, which computes the value of π by numerical integration in parallel, is tested by
# mpiexec -n 4 cpi
The value of π and the wall time are shown after execution finishes. This example was tested with varying numbers of machines.

Once all of the above steps are completed, MPICH2 has been successfully installed. The Linux cluster is ready to run other MPI programs and to be benchmarked for its performance.

5.3 TESTING ON CONJUGATE GRADIENT SOLVER

Description of solver:
This is an incomplete Cholesky preconditioned conjugate gradient solver for symmetric matrices (e.g. pressure or pressure-correction equation, heat conduction, etc.), for multi-processor runs. For the preconditioning matrix, the parallelization technique follows that of Ferziger and Peric (2004) for SIP. The rest of the code uses loop distribution.

Purpose of testing:
To obtain the wall clock time required to solve the problem with different numbers of processes. Wall clock time, or wall time, is a measure of how much real time elapses from the start to the end of a task, including time that passes due to programmed (artificial) delays or waiting for resources to become available. In computing, wall clock time is the actual time taken by a computer to complete a task. The results obtained are used to plot a graph of speedup versus number of processes, which is then analyzed.

Result of testing:
[Figure 5.3.1: Speedup versus processes for different grid. Vertical axis: speedup, S. Horizontal axis: processes, N. Series: Case 1 (i=100; resmax=1E-20; grid 64x64x64), Case 2 (i=100; resmax=1E-20; grid 128x128x128), Case 3 (i=100; resmax=1E-15; grid 256x256x256).]

Discussion on testing result:
As shown in the graph, there is a speedup of around 1.4 on 2 processes in case 1; the tabulated data are provided in Appendix C. However, the speedup declines at 3 and 4 processes. When the increase in communication time exceeds the decrease in computing time as more processes are added, the overall wall clock time becomes slightly longer than before. The time to transfer data between processes is usually the most significant source of parallel processing overhead.

In case 2 there is almost no improvement at 3 processes compared with 2 processes. Parallel processing overhead occurs at this point because it is impossible to distribute the subtask workload equally to each processor when 3 processes are working on it. That means that at some points all but one of the processes might be done and waiting for one process to complete. This phenomenon is called load imbalance.

As also shown in the graph, case 3 has a normal speedup curve. There is a speedup of around 2 for 4 processes, which is quite low. Amdahl’s Law states that if P is the proportion of a program that can be made parallel, and (1 − P) is the proportion that cannot be parallelized (remains serial), then the maximum speedup that can be achieved by using N processes is [8]

$$S = \frac{1}{(1 - P) + \frac{P}{N}}$$

The speedup in this case is about 2 (1.9915 in Appendix C) with N = 4, so the proportion of this conjugate gradient solver that can be made parallel is about 0.66. This also means that about 34% of the program runs in serial.

LIST OF REFERENCES

1. High Performance Linux Clustering, Part 1: Build a Working Cluster, Oct 2005, Aditya Narayan, Founder, QCD Microsystems, United States of America, viewed 2 August 2008.
2. Linux HPC Cluster Installation, June 2001, IBM International Technical Support Organization, Luis Ferreira, Gregory Kettemann, United States of America, viewed 18 July 2008.
3. Supercomputer, July 2008, Wikipedia, viewed 10 Oct 2008.
4. Linux Clustering with CSM and GPFS, January 2004, IBM International Technical Support Organization, Stephen Hochstetler, Bob Beringer, United States of America, viewed 20 July 2008.
5. Benchmarking, April 2007, Wikipedia, viewed 10 Oct 2008.
6. High Performance Linux Clustering, Part 2: Build a Working Cluster, Oct 2005, Aditya Narayan, Founder, QCD Microsystems, United States of America, viewed 2 August 2008.
7. MPICH2 Installer’s Guide version 1.0.7, April 2008, Mathematics and Computer Science Division, Argonne National Laboratory, U.S. Department of Energy, viewed 10 August 2008.
8. Amdahl’s Law, Dec 2007, Wikipedia, viewed 20 Oct 2008.

APPENDIX A
Open Sources Location

1) OS: CentOS 5 for 32-bit and 64-bit
Source:
2) GCC, Gfortran, C++ and Java compilers
Source: CD/DVD of the CentOS 5 installer
Name:
a) gcc-4.1.2-42.el5.i386.rpm
b) gcc-gfortran-4.1.2-42.el5.i386.rpm
c) gcc-c++-4.1.2-42.el5.i386.rpm
d) gcc-java-4.1.2-42.el5.i386.rpm
Dependencies required
Source: CD/DVD of the CentOS 5 installer
Name:
a) glibc-devel-2.5-18.i386.rpm
b) glibc-devel-2.5-24.i386.rpm
c) glibc-headers-2.5-18.i386.rpm
d) glib-java-0.2.6-3.fc6.i386.rpm
e) libgcj-devel-4.1.2-14.el5.i386.rpm
f) libgfortran-4.1.2-14.el5.i386.rpm
g) libgomp-4.1.2-14.el5.i386.rpm
h) libstdc++-devel-4.1.2-14.el5.i386.rpm
3) MPICH2 version 1.0.7
Source:
4) How to set up password-less SSH using public-private keys

APPENDIX B
How to Change the Hostname of a Linux System

Permanent hostname change

Step 1: The hostname is edited in the file /etc/sysconfig/network. On a Linux machine the /etc/sysconfig/network file looks like this:
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=hpcluster1
The hostname in /etc/sysconfig/network was changed from the original to hpcluster1, hpcluster2, hpcluster3 and hpcluster4 on the respective machines.

Step 2: The old hostname listed in the /etc/hosts file was changed to the new hostname manually.

Step 3: The Linux machines are then rebooted.
The network service is restarted before rebooting with
# /etc/init.d/network restart
The hostname on each machine was changed to the specified name and functions properly.

APPENDIX C
Tabulated Data of Testing on CG Solver

Case 1: i=100; resmax=1E-20; Grid=64x64x64

N    T1           T2           AVE           Speedup
1    22.1375      22.2508      22.19415      1
2    16.0003      16.3247      16.1625       1.3732
3    20.272       20.2426      20.2573       1.0956
4    20.9793      20.691       20.9742       1.0582

Table C.1: Data for case 1.

Case 2: i=100; resmax=1E-20; Grid 128x128x128

N    T1           T2           AVE           Speedup
1    248.3205     245.6234     246.97195     1
2    129.9253     131.2424     130.58385     1.8913
3    129.9289     128.8646     129.39675     1.9086
4    110.3437     111.3847     110.8642      2.2277

Table C.2: Data for case 2.

Case 3: i=100; resmax=1E-15; Grid 256x256x256

N    T1           T2           AVE           Speedup
1    1538.1186    1536.8205    1537.46955    1
2    1023.5759    1022.2592    1022.91755    1.503
3    837.1469     840.9828     839.06485     1.8324
4    773.0982     770.9155     772.00685     1.9915

Table C.3: Data for case 3.
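The speedup columns above, and the parallel fraction estimated with Amdahl’s Law in Section 5.3, can be checked with a short Python sketch. It assumes that speedup is the single-process average time divided by the N-process average time and works from the AVE column only; the function and variable names are illustrative and not taken from the project code.

```python
# Average wall clock times (the AVE column of Tables C.1 to C.3),
# keyed by the number of processes N.
AVERAGE_TIMES = {
    "Case 1 (64x64x64)": {1: 22.19415, 2: 16.1625, 3: 20.2573, 4: 20.9742},
    "Case 2 (128x128x128)": {1: 246.97195, 2: 130.58385,
                             3: 129.39675, 4: 110.8642},
    "Case 3 (256x256x256)": {1: 1537.46955, 2: 1022.91755,
                             3: 839.06485, 4: 772.00685},
}


def speedup(times, n):
    """Speedup on n processes: one-process time over n-process time."""
    return times[1] / times[n]


def amdahl_speedup(p, n):
    """Amdahl's Law: maximum speedup for parallel fraction p on n processes."""
    return 1.0 / ((1.0 - p) + p / n)


def parallel_fraction(s, n):
    """Invert Amdahl's Law: the fraction p giving speedup s on n processes."""
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / n)


# Recompute the speedup column of each table.
for case, times in AVERAGE_TIMES.items():
    row = ", ".join("N=%d: %.4f" % (n, speedup(times, n))
                    for n in sorted(times))
    print(case, "->", row)

# Estimate the parallel fraction for case 3 from its 4-process speedup.
case3 = AVERAGE_TIMES["Case 3 (256x256x256)"]
s4 = speedup(case3, 4)
p = parallel_fraction(s4, 4)
print("Case 3, N=4: speedup %.4f, parallel fraction %.2f" % (s4, p))
```

Solving Amdahl’s formula for P, as parallel_fraction does, avoids trial and error when going from a measured speedup back to the parallel proportion.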
