What is the asymptotic accuracy for a one-bit predictor

Assignment Help Computer Engineering
Reference no: EM131884932

High-Performance Computer Architecture Midterm Exam

Problem 1 - Amdahl's Law

We are optimizing a program which spends 25% of its time in third-party libraries. We do not have source code for those libraries and, as a result, cannot apply any optimizations to them. What speedup do we need on the remaining 75% of the program to achieve an overall speedup of 2 on the entire program?

Problem 2 - Processor Performance Equation

We are working on a new processor, Techium 5, which is going to replace the old Techium 4. Techium 5 has an improved ISA that results in executing only half as many instructions as Techium 4 for the same program. Additionally, the CPI on Techium 5 is reduced by 40% and clock frequency is increased by 25% compared to Techium 4. What is the overall speedup of moving from Techium 4 to Techium 5?

Problem 3 - Branch Prediction

The accuracy of a predictor is the fraction of all predictions that turn out to be correct. Asymptotic accuracy of a predictor is its accuracy when the time over which we observe its behavior approaches infinity. A processor is executing a long-running program in which the most frequently occurring branch is having the following repeating sequence of outcomes:

T, T, NT, T, NT, T, T, NT, T, NT, etc.

(twice T, once NT, once T, once NT, then repeat)

A) What is the asymptotic accuracy for a one-bit predictor on this branch?

B) What is the asymptotic accuracy of a two-bit counter predictor on this branch? (Assume that we are using the using the normal 0 ↔ 1 ↔ 2 ↔ 3 counter for the 2-bit predictor)

C) What is the asymptotic accuracy on this branch for a local history predictor with a two-bit history and (four) one-bit counters?

D) Can we design a local history predictor with K bits of history and one-bit counters that would have asymptotic accuracy of 1 (always correct after the initial warm-up period) on this branch? If yes, how many bits of history do we need? If no, why not?

Problem 4 - Predication

Consider the code fragment shown below. We show source code, the actual instructions the compiler generates without using predication, and the instructions generated when using conditional-move predication. The predicated code is not complete - make sure you complete the last instruction (MOVN) in the predicated code before you answer the rest of this question.

Source

MIPS (No predication)

MIPS (predication)

if cond then

a= a - 1;

b = b + 1;

endif

BEQZ R1, End

ADDI R2, R2, -1

ADDI R3, R3, 1

End:

ADDI R8, R2, -1

ADDI R9, R3, 1

MOVN R2, R8, R1 ; if R1!=0 then R2 = R8

MOVN R _____, R _____, R _____

Without branch mispredictions, the processor achieves the CPI of 1, i.e. it completes one instruction every cycle. A branch misprediction results in the one cycle to complete the branch instruction, plus 10 wasted cycles.

Suppose that "cond" is true 90% of the time, i.e. out of 1000 times that this code is executes, R1 is zero 100 times and is non-zero 900 times. Recall that "BEQZ" stands for "branch on equal to zero" and that the "N" in MOVN stands for "non-zero".

  • With a perfect branch predictor (no branch misprediction penalty ever), what is the average execution time of this code fragment without predication? What is the average execution time with predication? Which would you use? Note: You can compute the average execution time of this fragment by computing how long it takes to execute the 1000 execution of the fragment as illustrated above, then dividing that number by 1000.
  • Suppose now that we have a branch predictor whose overall accuracy is 50%. What is now the average execution time with predication and without predication? Which would you use?
  • The predicated code performs better than the non-predicated code when the overall branch predictor accuracy is below ______ percent. Show your work.

Problem 5 -

A one-wide processor uses out-of order execution with an ROB. The processor has two unified reservation stations (RS0 andRS1) and four ROB entries (ROB0 through ROB3). There are three fully pipelined execution units, one for branches and two for all other instructions. Execution takes two cycles for any instruction. Every instruction spends a cycle for a CDB broadcast, even those instructions that don't produce any values. A reservation station is freed in the cycle in which its instruction begins execution, and another instruction can use that RS in the next cycle after that. Similarly, a ROB entry can be reused in the next cycle after the one in which it was freed. An instruction that is waiting for a result to be produced can begin execution in the next cycle after the one in which its last missing operand is broadcast. Assume that all branches are correctly predicted, that the ROB is empty at the beginning of the starting cycle (cycle 1), and that the following instructions are already fetched, decoded, and waiting in a buffer to be issued:

ADD R1, R2, R2

ADD R2, R1, R3

SUB R3, R0, R4

BNE R1, R2, Label1

ADD R4, R1, R4

BNE R3, R0, Label2

ADD R3, R3, R4

Note that this is not a listing of a program, this is a list of instructions in the pre-issue buffer, in the order in which they appear in that buffer. In other words, branches have already been (correctly) predicted and these are the instructions that have been fetched and decoded.

A) Which instruction is the last one to commit? Why?

B) What is the first cycle in which a branch is committed? Why?

C) What is the first cycle in which no instruction is issued? Why?

D) Which tag (name) is broadcast on the CDB when the last ADD is executed? Why?

Problem 6 - Tomasulo's Algorithm

A processor uses Tomasulo's algorithm in its floating-point unit, and is executing the sequence of instructions given in the first table on this page. You don't know what the execution times are for various instructions, but you do know the following

  • There are 3 reservation stations, 2 for the ADD unit and 1 for the MUL unit. If multiple reservation stations of the right kind are free, the processor uses the first free one (the RS with the lowest number).
  • When a reservation station is freed in some cycle, it cannot be used by another instruction in the same cycle in which it was freed.
  • There is a cycle (let's call it Cycle X) during which the processor's CDB is used for the very first time, and the result that is broadcast on the CDB during cycle X is a result of an ADD instruction.

A) At the end of cycle X, for each instruction determine whether or not it has issued, began execution, and whether its result has been written. To help you out, we have already filled out this data for the first instruction. Note: "Execute" means that the instruction has been sent from the RS to the execution unit, but such an instruction may or may not have completed execution.

 

Instruction Status

 

Issue?

Execute?

Wr. Result?

MUL F1, F2, F3

Yes

Yes

No

ADD F1, F0, F1

 

 

 

ADD F2, F3, F2

 

 

 

ADD F3, F1, F4

 

 

 

ADD F0, F2, F3

 

 

 

B) What is the content of the reservation stations at the end of cycle X?

Reservation Stations

Name

Busy?

Op

Vj

Vk

Qj

Qk

Add1

 

 

 

 

 

 

Add2

 

 

 

 

 

 

Mul1

 

 

 

 

 

 

C) At the beginning of cycle X, what is the content of the processor's RAT?

Register

F0

F1

F2

F3

F4

Mapping

 

 

 

 

 

Problem 7 - Compiler Optimizations and VLIW

We have a VLIW processor with 16 registers and the following units:

1) A branch unit (for branch and jump operations), with a one-cycle latency.

2) A load/store (LW/SW operations) unit, with a six-cycle latency.

3) Two arithmetic units for all other operations, with a two-cycle latency.

Each instruction specifies what each of these four units should be doing in each cycle. The processor does not check for dependences nor does it stall to avoid violating program dependences - the program must be written so that all dependences are satisfied. For example, when a load-containing instruction is executed, the next five instructions cannot use that value. If we execute the following loop (each row represents one VLIW instruction), which performs element-wise addition of two 60-element vectors (whose address are in R0 in R1) into a third (address is in R2), where the end address of the result vector is in R3, and each element is a 32-bit integer:

Label

Branch Unit

LD/ST Unit

Arith. Unit 1

Arith. Unit 2

Loop:

NOP

LW R4, 0(R0)

NOP

NOP

 

NOP

LW R5, 0(R1)

ADDI R4, R4, 4

ADDI R2, R2, 4

 

NOP

NOP

ADDI R5, R5, 4

NOP

 

NOP

NOP

NOP

NOP

 

NOP

NOP

NOP

NOP

 

NOP

NOP

NOP

NOP

 

NOP

NOP

NOP

NOP

 

NOP

NOP

ADD R6, R4, R5

NOP

 

NOP

NOP

NOP

NOP

 

BNE R2, R3, Loop

SW R6, -4(R2)

NOP

NOP

Note how we increment the vector points while waiting for loads to complete. Also note how each operation reads its source registers in the first cycle of its execution, so we can modify source registers in subsequent cycles. Each operation also writes its result registers in the last cycle, so we can read the old value of the register until then. Overall, this code takes ten cycles per element, for a total of 600 cycles.

A) If we unroll this loop twice and re-schedule the code in each (new) iteration, what does the code look like (use only as many instruction slots in the table below as you need) and how many cycles does the entire 60-iteration (20 iterations after unrolling) loop take now?

B) If we apply software pipelining to split the loop (the one from page 6, not your unrolled loop from page 7) into two stages (one loads , the other adds and stores) and then re-schedule the code in each iteration, what does the code look like and how many cycles does the entire loop take now?

Your code for the SW-pipelined loop (show prologue and epilogue code, too).

Reference no: EM131884932

Questions Cloud

Explain the importance of a budget for fire : Explain the importance of a budget for fire and emergency medical services (EMS) administration in preparation for emergency incidents.
Integrate elements of reasoning and intellectual standards : Read the following three situations. Describe what you would do in EACH situation. Integrate the elements of reasoning and intellectual standards in your.
Thirteen original colonies : Using the Webtext, briefly describe three (3) characteristics for each of the English colonies located in the South, Middle, and New England regions.
Describe the firms general environment : Describe the firm's General Environment (the 6 elements that disrobe a company -technological change, demographic trends, cultural trends.
What is the asymptotic accuracy for a one-bit predictor : CS 4290/6290: High-Performance Computer Architecture Midterm Exam. What is the asymptotic accuracy for a one-bit predictor on this branch
How would you describe the given class to others : In this class, you have reflected on multiple aspects of critical thinking. What did you get out of the class? How would you describe this class to others?
What are three of the cultural missteps that wally astor : What are three of the cultural missteps that Wally Astor and his father-in-law, Henry Williams, made in this scenario? Why do you think this happened?
Describe what makes the leader effective : Select a known business leader that you believe demonstrates strong leadership traits and write a research paper on this individual.
How would you measure the amount of the given loss : Operationally, how would you measure the amount of this loss? Might firms that make cupcakes suffer a loss, too? What is the measure of their loss?

Reviews

len1884932

3/3/2018 12:12:03 AM

High-Performance Computer Architecture Practice Midterm Exam - This is a closed-book exam. There are 6 problems on this exam and the maximum total number of points is 50. Write your answers in the space provided. Use the extra pages for scratch space is needed. If we apply software pipelining to split the loop (the one from page 6, not your unrolled loop from page 7) into two stages (one loads , the other adds and stores) and then re-schedule the code in each iteration, what does the code look like and how many cycles does the entire loop take now?

len1884932

3/3/2018 12:11:56 AM

Note: You can compute the average execution time of this fragment by computing how long it takes to execute the 1000 execution of the fragment as illustrated above, then dividing that number by 1000. Note that this is not a listing of a program, this is a list of instructions in the pre-issue buffer, in the order in which they appear in that buffer. In other words, branches have already been (correctly) predicted and these are the instructions that have been fetched and decoded.

Write a Review

Computer Engineering Questions & Answers

  Compare the isochronous to other connections

How does the isochronous connection compare to the asynchronous and synchronous connections? Compare the applications and efficiencies of all three.

  An information technology recruiting firm has been growing

an information technology recruiting firm has been growing rapidly over the past few years. the number of clients over

  Design a detailed network in netsim

MN503 Overview of Internetworking Assessment Title - Network design with configuration. Design a detailed Network in Netsim

  Thread problem the eaters must give their dishes to the

thread problem the eaters must give their dishes to the dishwasher by putting them on a conveyor belt that has a

  Discuss the optimality of the dynamic programming solution

Discuss the optimality of the dynamic programming solution. Discuss the time complexity of this algorithm in terms of the size of the inputs X and Y.

  Show that variable elimination on poly trees can performed

Show that variable elimination on poly trees can be performed in linear time, assuming that the local probability models are represented as full tables.

  Write a scientific research paper that investigates

write a scientific research paper that investigates the properties of assembly language from its inception till today and how it revolutionized today's technology.

  Define the rule that is applied to place a class

The system development team at Wilson Company is working on developing a new customer order entry system. In the method on designing the new system, the team has identified the following class and its attributes.

  Write a program that request a student lastname

Write a program that request a student LastName and Social Security Number.

  Compare and contrast the useradd

Compare and contrast the useradd and adduser commands in Linux. What is their purpose? Which one would you use? What other processes besides using these two commands might you employ to accomplish the same task?

  Making the network the organization

Comment about this topic: Comment on any one of the ten tech solution- Where we are at this time with respect to this trend and the authors prediction

  Regarding delivery of the packet wirelessly

Supposing no malfunction in any of stations or nodes of the network, also explain in scholarly detail if it is possible for the packet to be delivered to the wrong destination

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd