Implementing efficient code on the gpu with cuda

Assignment Help Programming Languages
Reference no: EM131035690

Introduction-

The aim of the assignment is to test your understanding and technical ability of implementing efficient code on the GPU with CUDA. You will be expected to benchmark and optimise the implementation of a simple rule based simulation. You will start by implementing a serial CPU version, you will then parallelise this version for multi-core processors using OpenMP, before finally implementing the same simulation in CUDA. The emphasis of the assignment is on how you have optimised and progressively improved your code in order to implement the simulation efficiently. In order to demonstrate this you are expected to document the processes of benchmarking and improving your code by producing a written report. The written report should show the performance improvements of your code and demonstrate that you understand what limiting factors affect the performance at each stage of your implementation. Handing in just a piece of code with excellent performance will not score highly in the assessment unless you have demonstrated in understanding in the written report of how you have progressively refined your implementation to achieve the final solution.

The Task-

Conway's Game of Life1 is a simple cellular automaton which describes a rule based game in which a player describes only the initial system state. A cellular automaton is a discrete model which consists of a population of regularly spaced cells each with a number of states. By communicating with neighbouring cells, a new generation of the population (at t + 1)   can be generated by applying a set of rules which use   the cell's current state and the cells neighbouring states to determine the next state of the current cell. With respect to the Game of Life, the specific rules of the game are very simple.  A cell within the population is either in an alive or dead state. At each time step, a cell considers its Moore    neighborhood   of immediately adjacent neighbors, including diagonal neighbors,   of   which there are 8 in total (see Figure 1).

1144_Figure.png

For any live cells the following rules are applied in determining the state of the cell at ?? + 1;

1) If there are fewer than two live neighbours; the cell dies of loneliness.

2) If there are two or three live neighbours; the cell remains alive.

3) If there are four or more live neighbours; the cell will die due to overcrowding.

For any dead cells the following rules are applied in determining the state of the cell at t+1;

1) If there are exactly three live neighbours; the dead cell becomes alive due to reproduction.

2) For any other configuration of neighbours the cell remains dead.

The implementation of the task is split into three parts. The first part of the assignment requires implementing the game of life simulation. The second part of the assignment considers how to count a number of patterns observed during simulation. The third part allows you to pick a specific extension to implement within the program.

Part 1 Requirements:

You should start from the provided source code and complete the TODO sections. Your program should accept the arguments described by the existing print_help() function. The file format for Game of Life files (both input and output) should use plain text with a line width and number of lines matching the program arguments W and H respectively. Each line should be terminated with a carriage return ('\n'). A single character should be used to encode the state of each grid cell. If the cell is dead the character should have a space value (i.e. character ' ' with ASCII code #32). If the cell is alive the character should have a value of '#' (i.e. ASCII code #35). Using this plain text format you can visually check the state of any Game of Life file.

You should implement a serial C version of the code and make this as efficient as you can. Once implemented your serial version should act as a baseline to measure potential performance speedup of your parallel versions. Note: It is not permitted to use data structures such as trees to spatially reduce the grid. You code must explicitly store a 2D set of state data for each cell and progress the simulation by considering the Moore neighbourhood.

You should implement a multi core parallel version of the same simulation which uses OpenMP.

Finally, you should implement a CUDA version of the simulation. You should consider how using different approaches for memory caching affect performance over a range of problem sizes. You should also consider how boundary conditions (overlapping regions between thread blocks) are handled and give an overview of any approaches you have applied to improve boundary conditions performance.

For all versions of the implementation the environment should have wrapping bounds. This means that the environment should be 2D but form a 3D torus where cells on the extreme right side of the environment are neighbours to cells on the extreme left, and cells on the extreme top side of the environment are neighbours to cells on the extreme bottom (Figure 2).

1250_Figure1.png

Part 2 Requirements:

It is required that you update your three implementations (serial, CPU and GPU) to be able to provide analysis of the simulation by recognising common patterns. For example, there are a number of patterns which once created in the Game of Life will remain static unless their immediate neighbours are modified. These static patterns are often referred to as a still life patterns. For the purposes of the assignment it is required to know the total number of still life Cross patterns (Figure 3) for any given iteration.

1175_Figure2.png

The Game of Life can also exhibit oscillating patterns. These oscillating patterns are classed by the period (number of iterations) which is required for them to repeat. For the purposes of the assignment it is required to know the total number of Blinker patterns (period 2) which are exhibited for any given iteration. See Figure 4.

40_Figure3.png

A Glider is a special case of an oscillating pattern which migrates across the screen as it oscillates.   For the purposes of the assignment it is required to know the total number of Gliders which are exhibited for any given iteration. Figure 5 shows a the 4 patterns that are exhibited by a Gliders over 4 iterations when travelling in a south east direction. You are required to calculate the number of gliders heading in any direction. In total there are 16 unique patterns which can be obtained by rotating each of the patterns in Figure 5 by 90, 180 and 270 degrees.

1778_Figure4.png

Note: that for all pattern recognition tasks (Cross, Blinkers and Gliders), it is important that the pattern and surrounding 5x5 neighbours match those shown in the figures exactly. For example when matching a pattern both the live and dead cells must represent the 5x5 patterns exactly. I.e. there must be 25 exactly matching cells. Figure 6 shows some examples of patterns which should  not be matched.

2336_Figure5.png

In order to implement counting of patterns you will be required to implement a form of 2D parallel reduction. You should consider different approaches (e.g. recursion, shared memory and warp shuffling) for implementing this reduction and describe how the performance of each suits the technique you have implemented.

Part 3:

For the final part of the assignment you have the freedom to implement ONE of the following more advanced changes to the program.

  • Real time visualisation using CUDA OpenGL interoperability.

 • Modify the simulation to recognise arbitrary patterns of any size and any period. Patterns should be provided as input to the simulation. You will need to create an input file to describe all possible patterns and their names.

  • Extend the simulation to three dimensions. You will need to modify the rules to ensure that the simulation does not die out or demonstrate explosive growth.

You should allow your extension to the program to execute ONLY when using some additional command line option. You should update the print_help() function to describe how this works. As with the previous parts of the assignment you should benchmark your code to demonstrate the performance implications of your implementation.

Attachment:- Assignment.rar

Reference no: EM131035690

Questions Cloud

Differences in factor intensity between stages of production : “Differences in factor intensity between stages of production are very important for firms engaged in vertical FDI than horizontal FDI” Explain why.
Reduce the potential severity of the top five hazards : Based on your knowledge of health and safety matters and your actual observations of operations that are similar to ours , make a list of the potential hazardous conditions employees and others face at Learn In Motion. What should we do to reduce the..
Breakeven and leverage wingler communications corporation : BREAKEVEN AND LEVERAGE Wingler Communications Corporation (WCC) produces premium stereo headphones that sell for $28 80 per set, and this year's sales are expected to be 450,000 units. Variable production costs for the expected sales under present..
Specified by normal distribution with mean lead time : The daily demand of a product is very stable at 250 units per day. However, it's delivery lead time varies and can be specified by a normal distribution with a mean lead time of twelve days and standard deviation of three days. What are the safety st..
Implementing efficient code on the gpu with cuda : Com4521/Com6521: Parallel Computing with GPUs. The aim of the assignment is to test your understanding and technical ability of implementing efficient code on the GPU with CUDA. You will be expected to benchmark and optimise the implementation of a..
Determine if the farmer can successfully restrain the cow : A 180-lb farmer tries to restrain the cow from escaping by wrapping the rope two turns around the tree trunk as shown. If the cow exerts a force of 250 lb on the rope
Write a paper about major factors that contribute to project : Write a two pages paper about the major factors that contribute to project's success? describe and justify your position.
Breakeven and operating leverage : a. Given the following graphs, calculate the total fixed costs, variable costs per unit, and sales price for Firm A. Firm B's fixed costs are $120,000, its variable costs per unit are $4, and its sales price is $8 per unit.
Effect on cash received and interest expense recognized : Prepare a statement of cash flows for the year ended December 31, 2015 using the indirect method. (Hint: for any accounts where the balance changed.

Reviews

Write a Review

Programming Languages Questions & Answers

  Validation and test documentation

The final project consists of a currency conversion application. This application-similar to simple, practical programs on many travel or financial Web sites-includes the following elements: Validation and test documentation

  You can always add a compareto method to a class without

q1. you can always add a compareto method to a class without making it implement the comparable interface so why do you

  Write down a program which explores the seating patterns

write a program that explores the seating patterns related to course performance by using an array of student scores.

  Explain overriding method

Assuming the class B inherits from class A and method foo belongs to class A, what do we mean by overriding method foo in class B? We are coding another version of method foo in class B

  Write a program that will keep track of golf handicaps.

Write a program that will keep track of golf handicaps. To find a player's handicap, you will need to know the player's score, the course rating and the course slope.

  Create a script that requires your visitors to accept

If the check box is not selected, display an alert dialog box information the user that he or she must accept the terms. if the user clicks the Decline link, open the Declinepage.html document.

  Identify two major segments erp systems

And a brief summary describing how such a system would be installed/implemented

  Brief discussion for problem

Brief discussion for Problem - What does the following transcript teach us

  Create a custom application using eclipse

Create a custom Application Using Eclipse Android Development

  Draw a line joining the first two points

Write a program that allows the user to specify a triangle with three mouse presses - Draw a line joining the first two points. After the third mouse press, draw the entire triangle.

  Write a recursive-descent parser

Use any programming language you prefer to write a recursive-descent parser that parses the language generated by the following EBNF descriptions.

  Examine the loop statements in the code sample column

Examine the loop statements in the Code Sample column of the template. Identify the loop type used in the Loop Type column of the template

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd