Discuss the parallel performance of the lu factorization
Course:- Theory of Computation
Reference No.:- EM13923511

Assignment Help
Expertsmind Rated 4.9 / 5 based on 47215 reviews.
Review Site
Assignment Help >> Theory of Computation

You are provided with a program matrix_serial . c to solve a linear system of equations Ax = b. The code includes routines to initialize A and b, compute the LU factorization of A, and solve triangular systems with lower and upper triangular matrices. The main program uses these routines to compute the solution of the system Ax = b by computing A = LU, and solving Ly = b for y, followed by Ux = y for x. Instructions to compile and execute the code are included in the file.

You need to parallelize the routines for computing LU factorization and solving the triangular systems using pthreads. The file should be named matrix . c.

1. You need to parallelize the routines for computing LU factorization and solving the triangular systems using pthreads. The file should be named matrix . c. A total of 20 points are reserved for performance of the code: speedup obtained by the multithreaded code and the overall execution time will be considered when awarding these points.

2. Execute the code for n= 212 with p chosen to be 2k, for k = 0, 1,...,6. Plot execution time versus p to demonstrate how time varies with the number of threads. Use logarithmic scale for the x-axis. Plot speedup versus p to demonstrate the change in speedup with p.

3. Discuss the parallel performance of the LU factorization routine and the triangular solver routines. Comment on the observed performance and the possible reasons for the observations.

4. You will receive bonus points equal to the amount the sum of speedups observed in the following routines - LU factorization, lower triangular solve, and upper triangular solve - exceeds 2.0. Speedup is computed as the speed improvement achieved by each routine over the execution time of the routine reported by matrix_serial . c for a single run. Bonus points are subject to a maximum of 10 points. Total speedup value will be rounded. Individual routine speedup values lower than 1.0 will be raised to 1.0 to compute bonus points. For example, a speedup of 3.5, 2.1, and 0.7, respectively, in the three routines is awarded 5 bonus points. In your submission, you need to specify the input arguments to the executable that produce the best speedup. Also indicate the speedup you observe in each of the routines. Compilation will be done using icc with default optimization.

Attachment:- hw4.txt

Put your comment

Ask Question & Get Answers from Experts
Browse some more (Theory of Computation) Materials
Considering a single programmed operating system, what is the minimal total time required to complete executions of the two processes? You should explain your answer with a di
Create a program that reads integers in range 0 .. 9999. The event stops reading if -99 is entered. Your event should use Stack to store those numbers then it used Priority Qu
Goal is to find expedition of maximum profit. Either show that there exists polynomial-time algorithm for GDP, or show that corresponding decision problem is NP-complete.
a tape that is infinitely long in both directions and is divided into cells; at any given step, each cell either is blank or contains a 1 (we will refer to the latter type o
A javascript program to demonstrate computational complexity. Using the wikipedia article; a computer program that calculates the number of moves necessary to solve Tower of
Explain informally, the working of the machine M. Give the language L(M ) in set notation. Provide a grammar having 12 or less rules that accepts the same language as M. What
Write down which of the following Turing nuchines is suitable for this task. For each machine which is unsuitable, explain why it is unsuitable this explanation can take the
Explain the importance of having a test coupled with real users' views of the product at the end of the development effort, even if it is the test of a prototype and not the