What is the maximum speedup obtainable on the program

Assignment Help Computer Engineering
Reference no: EM131218167

Throughout, be sure to type your homework; you may want to use the electronic version of the homework as you can then just insert your answers where they need to be. Show your work on the problems so you can get partial credit for incorrect answers (and tips from the grader about where you went wrong). Writing your units in the calculations will help ensure you wind up with a correct value. When done with a problem, ask yourself: "Does the answer make sense?" Please remember that the homework in here requires some time to complete-it is best if you work on it several times between now and the due date.

1. Three processors P1, P2, and P3 have the same ISA. Their clock rates and average CPI are as follows:

 

P1

P2

P3

Clock Rate

3 GHz

2.5 GHz

4.0 GHz

Average CPI

1.5

1.0

2.2

a. Which processor has the highest performance?

b. How much faster is the fastest processor than the slowest processor?

c. If the processors each execute a program in 10 seconds, find the number of cycles and the number of instructions for each processor.

 

P1

P2

P3

Number of cycles:

 

 

 

Number of instructions:

 

 

 

d. For processor P1, we want to reduce the execution time of the reference program from 10 seconds to 7 seconds, but the change leads to an increase of 20% in the CPI. What clock rate is required to reach the target reduction in execution time?

2. The following table shows the number of instructions for programs A and B on a processor:

 

Arith

Store

Load

Branch

Total

Prog A

650

100

600

50

1400

Prog B

750

250

500

500

2000

Assume that arithmetic instructions take 1 clock cycle, load and store each take 5 cycles, and branches require 2 cycles.

a. What is the execution time of each program in a 2 GHz processor?

b. Find the average CPI for each program.

c. If the number of load instructions can be reduced by one half (thus also reducing the total number of instructions), calculate the new average CPI of each respective program over its original version?

d. What is the speedup of each revised program in the previous problem compared with the original versions?

3. Consider two implementations of the same ISA, P1 and P2, with five classes of instructions (A through E) in the instruction set. P1 has a 4 GHz clock rate, while P2 has a 6 GHz clock rate. The average number of cycles for each instruction class are in the following table:

 

CPI Classes:

 

A

B

C

D

E

P1

1

2

3

4

5

P2

3

3

3

5

5

a. Assume that peak performance is defined as the fastest rate that a computer can execute any instruction sequence (ie: the instructions can be chosen to maximize performance). What are the peak performances of P1 and P2 expressed in instructions per second?

b. If the number of instructions in a program has equal numbers of all instruction classes except for class A, which occurs twice as often as the others, how much faster is P2 than P1?

c. Using the same instruction mix, what clock frequency for P1 will give it the same performance as P2?

4. Consider a CPU manufacturing process using 300mm wafers.

a. If the dimensions of the die are 1.2cm x 0.8cm, what is the approximate number of dies produced?

b. Assuming a defect density of 0.55/cm2, what is the die yield?

c. If the wafer costs $300, what is the price of a die?

d. If the chip price is 22% more than the cost per die, what is the chip price?

5. Consider a program comprising 62% arithmetic instructions, 16% load instruction, 8% store instructions, and 14% branch instructions. Assume the CPI for all instructions is 2, except for branches which have a CPI of 3.

a. Suppose we consider two alternate improvements to a processor. P1 will execute arithmetic instructions twice as fast. P2 will execute both load and store instructions 3 times faster. In each case, other instructions are unaffected by the changes. Which is faster, P1 or P2, and by how much?

b. Consider running the program on a machine with a large graphics card. When we run the program on this machine, the arithmetic instructions only can be run in parallel on the card, everything else is run sequentially. As the number of stream processors on the GPU goes toward infinity, what is the maximum speedup obtainable on this program?

c. In the previous problem, the GPU improvement was applied toward arithmetic instructions. Assuming that you could apply the GPU to any one instruction class, this is still the smartest choice. What "Great Idea in Computer Architecture" is this an example of?

d. Assuming we can apply the GPU improvement to both load and store instructions. With infinitely many streaming GPU processors, is it possible to make the program run two times faster? Explain.

6. Compilers can dramatically affect performance of some applications. Assume compiler A results in a dynamic instruction count of 1.0E9 with an execution time of 1.1 seconds. Compiler B results in a dynamic IC of 1.2E9 and an execution time of 1.5 seconds.

a. Find the average CPI for each program given that the processor has a clock cycle time of 1 ns.

b. Assume P1 executes compilers A's code, and P2 executes compiler B's code, and the execution times are the same. How much faster is the clock on P1 versus P2?

c. A third compiler is developed that uses only 6.0E8 instructions and has an average CPI of 1.1. Calculate the speedup using this new compiler relative to both compiler A and compiler B.

7. Consider a reference program. Running on processor P1, with a clock rate of 4 GHz and an average CPI of 0.9, it requires execution of 5.0E9 instructions. Running on processor P2, with a clock rate of 3 GHz and an average CPI of 0.75, it requires execution of 1.0E9 instructions to complete.

a. One fallacy is that the fastest clock rate equates to the best performance. Calculate the execution times. Is the chip with the fastest clock the one with the best performance?

b. Another fallacy is to use MIPS (millions of instructions per second) to compare performance of two processors. Calculate the MIPS for each chip. Does the chip with the larger MIPS on this program also the faster one?

c. Another fallacy is to use MFLOPS (millions of floating-point operations) as a performance metric. MFLOPS is calculated as:

MFLOPS=(# floating pt operations)/(execution time · 10^6 )

d. Assume 35% of the instructions on both processors are floating-point operations. Find the MFLOPS figures for both programs. Is the chip with the highest MFLOPS also the fastest on this program?

8. Suppose we run a program on various parallel architectures. 75% of the program can be parallelized. What is the speedup for 2, 8, 32 and 128 processor machines? What is the maximum speedup achievable on this program?

9. Computationally hard problems are an important part of computer science. In such problems, the execution time of a solution program often grows exponentially as a size of the input. If I need to be able to substantially increase the size of the input I can handle on such a problem in order to solve a problem instance of interest, why is parallelizing the code likely to have little effect?

Reference no: EM131218167

Questions Cloud

New technologies have emerged : How have ICT architectures changed over time as new technologies have emerged?
Visit a museum or gallery exhibition : Visit a museum or gallery exhibition or attend a theater, dance, or musical performance.- Write a report that describes your experience.
Setup an information policy for policy : Would information policy help the company to preventing unintentionally security attack from the employees?, if you were the decision maker of a company, would you like to setup an information policy for your policy?
Provide sufficient information in the summary : Do not paste the article abstract in the paper. Annotations must be your summary of the article. The summary must include the findings of research that was included in the article.
What is the maximum speedup obtainable on the program : What is the speedup of each revised program in the previous problem compared with the original versions? what is the maximum speedup obtainable on this program?
Derive an analytical expression of flux density : Compare the obtained computational results with infinite surface charge approximation - Derive an analytical expression of flux density - Compare the computational results with the analytical solution.
Case analysis-research in motion : Case Analysis - Research in Motion - RIM Research the history of RIM, specifically with an eye toward product development. Your research should lead to answers to the following questions.
Page paper on compilers and interpreters : There are Java compilers and interpreters. Write a 3 page paper on compilers and interpreters (2-3 paragraphs on each topic). Your paper must include the following:
Important to understand visitors : Explain why it is important to understand visitors to your site who are not your customers and visitors who do not return.

Reviews

Write a Review

Computer Engineering Questions & Answers

  Write down an application that displays a menu

Write down an application that displays a menu.

  Determine how much money you earned or lost with each stock

Summarize the various accounting systems that each firm provides. Be sure to address the following for each firm: a. The various types of accounting systems it sells (e.g., Oracle sells Oracle Financials as well as PeopleSoft financials)

  Telecommunications planning model

Discuss how to utilize the telecommunications planning model in detail and also explain the significance of the strategic management in order to support this model.

  Imagine you are asked to head a non-computerized department

Now assume you would like to connect the internal network you devised in Q1 to the outside world. What are the steps you consider to follow? And what are the additional equipment/software you would purchase and why?

  Design a flowchart and pseudocode using control breaks

design a flowchart and pseudocode using control breaks that enable the user to print invoices for any raw material that is selected.

  Analyze and explain specific remote network monitoring rmon

you are working as the network administrator in abc technologies. your companys network has been experiencing many

  Create a program that reads an exchange rate

Write a program that reads an exchange rate for converting English currency to U.S. currency and after that reads several values in English currency and converts each amount to the equivalent U.S. currency.

  Discuss how a cio might handle schrage scenarios

Must businesses choose between good ethics and financial benefits? Explain your answer using Schrage's scenarios or your own examples.

  Closed-loop functional block diagram for grinding process

Draw a closed-loop functional block diagram for the grinding process, showing the input, output, force, and grinder feed rate.

  Write a short on structured design

Identify task you perform regularly, such as cooking, mowing the lawn, or driving a car

  Give some of the major threats to information systems

What will happen if that information were compromised? What are some of the main threats to information systems? Provide three examples to support your answer.

  Questionmno incorporated is considering acquiring its own

questionmno incorporated is considering acquiring its own leased lines to handle voice and data communications between

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd