Discuss the parallel performance of the lu factorization
Course:- Theory of Computation
Reference No.:- EM13923511

Assignment Help
Expertsmind Rated 4.9 / 5 based on 47215 reviews.
Review Site
Assignment Help >> Theory of Computation

You are provided with a program matrix_serial . c to solve a linear system of equations Ax = b. The code includes routines to initialize A and b, compute the LU factorization of A, and solve triangular systems with lower and upper triangular matrices. The main program uses these routines to compute the solution of the system Ax = b by computing A = LU, and solving Ly = b for y, followed by Ux = y for x. Instructions to compile and execute the code are included in the file.

You need to parallelize the routines for computing LU factorization and solving the triangular systems using pthreads. The file should be named matrix . c.

1. You need to parallelize the routines for computing LU factorization and solving the triangular systems using pthreads. The file should be named matrix . c. A total of 20 points are reserved for performance of the code: speedup obtained by the multithreaded code and the overall execution time will be considered when awarding these points.

2. Execute the code for n= 212 with p chosen to be 2k, for k = 0, 1,...,6. Plot execution time versus p to demonstrate how time varies with the number of threads. Use logarithmic scale for the x-axis. Plot speedup versus p to demonstrate the change in speedup with p.

3. Discuss the parallel performance of the LU factorization routine and the triangular solver routines. Comment on the observed performance and the possible reasons for the observations.

4. You will receive bonus points equal to the amount the sum of speedups observed in the following routines - LU factorization, lower triangular solve, and upper triangular solve - exceeds 2.0. Speedup is computed as the speed improvement achieved by each routine over the execution time of the routine reported by matrix_serial . c for a single run. Bonus points are subject to a maximum of 10 points. Total speedup value will be rounded. Individual routine speedup values lower than 1.0 will be raised to 1.0 to compute bonus points. For example, a speedup of 3.5, 2.1, and 0.7, respectively, in the three routines is awarded 5 bonus points. In your submission, you need to specify the input arguments to the executable that produce the best speedup. Also indicate the speedup you observe in each of the routines. Compilation will be done using icc with default optimization.

Attachment:- hw4.txt

Put your comment

Ask Question & Get Answers from Experts
Browse some more (Theory of Computation) Materials
How many bits are left for the address part of the instruction - What is the maximum allowable size for memory and what is the largest unsigned binary number that can be accom
Discuss the impact of Moore's law on data center costs on such things as servers and communications equipment. List at least 3 steps or recommendations your data center can
Write a VHDL module for a 6-bit accumulator with carry-in (CI) and carry-out (CO). When Ad = o, the accumulator should hold its state. When Ad = 1, the accumulator should ad
Create and DFA or LR(0) items for this grammar. Is this grammar LR(0) parsing table? If not, explain LR(0) conflict. If so create LR(0) parsing table.
Argue that the following prob is NP Complete. Given list of positive integers, u1,u2,...un (in binary representation) and asked if there is partition of this set into 3 subse
Analyse the Case Study documents and develop a candidate architecture to meet the functional and non-functional requirements - Document your proposed architecture with a high
Construct a DFA that recognizes each of the following languages. Unless otherwise noted we are assuming that ω ∈ {0,1}*. (A drawing of a state diagram is sufficient.)
Use rules of inference to show that the hypotheses "If it does not rain or if it is not foggy, then the sailing race will be held and the lifesaving demonstration will go on