Implementing efficient code on the gpu with cuda

Assignment Help Programming Languages
Reference no: EM131035690

Introduction-

The aim of the assignment is to test your understanding and technical ability of implementing efficient code on the GPU with CUDA. You will be expected to benchmark and optimise the implementation of a simple rule based simulation. You will start by implementing a serial CPU version, you will then parallelise this version for multi-core processors using OpenMP, before finally implementing the same simulation in CUDA. The emphasis of the assignment is on how you have optimised and progressively improved your code in order to implement the simulation efficiently. In order to demonstrate this you are expected to document the processes of benchmarking and improving your code by producing a written report. The written report should show the performance improvements of your code and demonstrate that you understand what limiting factors affect the performance at each stage of your implementation. Handing in just a piece of code with excellent performance will not score highly in the assessment unless you have demonstrated in understanding in the written report of how you have progressively refined your implementation to achieve the final solution.

The Task-

Conway's Game of Life1 is a simple cellular automaton which describes a rule based game in which a player describes only the initial system state. A cellular automaton is a discrete model which consists of a population of regularly spaced cells each with a number of states. By communicating with neighbouring cells, a new generation of the population (at t + 1)   can be generated by applying a set of rules which use   the cell's current state and the cells neighbouring states to determine the next state of the current cell. With respect to the Game of Life, the specific rules of the game are very simple.  A cell within the population is either in an alive or dead state. At each time step, a cell considers its Moore    neighborhood   of immediately adjacent neighbors, including diagonal neighbors,   of   which there are 8 in total (see Figure 1).

1144_Figure.png

For any live cells the following rules are applied in determining the state of the cell at ?? + 1;

1) If there are fewer than two live neighbours; the cell dies of loneliness.

2) If there are two or three live neighbours; the cell remains alive.

3) If there are four or more live neighbours; the cell will die due to overcrowding.

For any dead cells the following rules are applied in determining the state of the cell at t+1;

1) If there are exactly three live neighbours; the dead cell becomes alive due to reproduction.

2) For any other configuration of neighbours the cell remains dead.

The implementation of the task is split into three parts. The first part of the assignment requires implementing the game of life simulation. The second part of the assignment considers how to count a number of patterns observed during simulation. The third part allows you to pick a specific extension to implement within the program.

Part 1 Requirements:

You should start from the provided source code and complete the TODO sections. Your program should accept the arguments described by the existing print_help() function. The file format for Game of Life files (both input and output) should use plain text with a line width and number of lines matching the program arguments W and H respectively. Each line should be terminated with a carriage return ('\n'). A single character should be used to encode the state of each grid cell. If the cell is dead the character should have a space value (i.e. character ' ' with ASCII code #32). If the cell is alive the character should have a value of '#' (i.e. ASCII code #35). Using this plain text format you can visually check the state of any Game of Life file.

You should implement a serial C version of the code and make this as efficient as you can. Once implemented your serial version should act as a baseline to measure potential performance speedup of your parallel versions. Note: It is not permitted to use data structures such as trees to spatially reduce the grid. You code must explicitly store a 2D set of state data for each cell and progress the simulation by considering the Moore neighbourhood.

You should implement a multi core parallel version of the same simulation which uses OpenMP.

Finally, you should implement a CUDA version of the simulation. You should consider how using different approaches for memory caching affect performance over a range of problem sizes. You should also consider how boundary conditions (overlapping regions between thread blocks) are handled and give an overview of any approaches you have applied to improve boundary conditions performance.

For all versions of the implementation the environment should have wrapping bounds. This means that the environment should be 2D but form a 3D torus where cells on the extreme right side of the environment are neighbours to cells on the extreme left, and cells on the extreme top side of the environment are neighbours to cells on the extreme bottom (Figure 2).

1250_Figure1.png

Part 2 Requirements:

It is required that you update your three implementations (serial, CPU and GPU) to be able to provide analysis of the simulation by recognising common patterns. For example, there are a number of patterns which once created in the Game of Life will remain static unless their immediate neighbours are modified. These static patterns are often referred to as a still life patterns. For the purposes of the assignment it is required to know the total number of still life Cross patterns (Figure 3) for any given iteration.

1175_Figure2.png

The Game of Life can also exhibit oscillating patterns. These oscillating patterns are classed by the period (number of iterations) which is required for them to repeat. For the purposes of the assignment it is required to know the total number of Blinker patterns (period 2) which are exhibited for any given iteration. See Figure 4.

40_Figure3.png

A Glider is a special case of an oscillating pattern which migrates across the screen as it oscillates.   For the purposes of the assignment it is required to know the total number of Gliders which are exhibited for any given iteration. Figure 5 shows a the 4 patterns that are exhibited by a Gliders over 4 iterations when travelling in a south east direction. You are required to calculate the number of gliders heading in any direction. In total there are 16 unique patterns which can be obtained by rotating each of the patterns in Figure 5 by 90, 180 and 270 degrees.

1778_Figure4.png

Note: that for all pattern recognition tasks (Cross, Blinkers and Gliders), it is important that the pattern and surrounding 5x5 neighbours match those shown in the figures exactly. For example when matching a pattern both the live and dead cells must represent the 5x5 patterns exactly. I.e. there must be 25 exactly matching cells. Figure 6 shows some examples of patterns which should  not be matched.

2336_Figure5.png

In order to implement counting of patterns you will be required to implement a form of 2D parallel reduction. You should consider different approaches (e.g. recursion, shared memory and warp shuffling) for implementing this reduction and describe how the performance of each suits the technique you have implemented.

Part 3:

For the final part of the assignment you have the freedom to implement ONE of the following more advanced changes to the program.

  • Real time visualisation using CUDA OpenGL interoperability.

 • Modify the simulation to recognise arbitrary patterns of any size and any period. Patterns should be provided as input to the simulation. You will need to create an input file to describe all possible patterns and their names.

  • Extend the simulation to three dimensions. You will need to modify the rules to ensure that the simulation does not die out or demonstrate explosive growth.

You should allow your extension to the program to execute ONLY when using some additional command line option. You should update the print_help() function to describe how this works. As with the previous parts of the assignment you should benchmark your code to demonstrate the performance implications of your implementation.

Attachment:- Assignment.rar

Reference no: EM131035690

Questions Cloud

Differences in factor intensity between stages of production : “Differences in factor intensity between stages of production are very important for firms engaged in vertical FDI than horizontal FDI” Explain why.
Reduce the potential severity of the top five hazards : Based on your knowledge of health and safety matters and your actual observations of operations that are similar to ours , make a list of the potential hazardous conditions employees and others face at Learn In Motion. What should we do to reduce the..
Breakeven and leverage wingler communications corporation : BREAKEVEN AND LEVERAGE Wingler Communications Corporation (WCC) produces premium stereo headphones that sell for $28 80 per set, and this year's sales are expected to be 450,000 units. Variable production costs for the expected sales under present..
Specified by normal distribution with mean lead time : The daily demand of a product is very stable at 250 units per day. However, it's delivery lead time varies and can be specified by a normal distribution with a mean lead time of twelve days and standard deviation of three days. What are the safety st..
Implementing efficient code on the gpu with cuda : Com4521/Com6521: Parallel Computing with GPUs. The aim of the assignment is to test your understanding and technical ability of implementing efficient code on the GPU with CUDA. You will be expected to benchmark and optimise the implementation of a..
Determine if the farmer can successfully restrain the cow : A 180-lb farmer tries to restrain the cow from escaping by wrapping the rope two turns around the tree trunk as shown. If the cow exerts a force of 250 lb on the rope
Write a paper about major factors that contribute to project : Write a two pages paper about the major factors that contribute to project's success? describe and justify your position.
Breakeven and operating leverage : a. Given the following graphs, calculate the total fixed costs, variable costs per unit, and sales price for Firm A. Firm B's fixed costs are $120,000, its variable costs per unit are $4, and its sales price is $8 per unit.
Effect on cash received and interest expense recognized : Prepare a statement of cash flows for the year ended December 31, 2015 using the indirect method. (Hint: for any accounts where the balance changed.

Reviews

Write a Review

 

Programming Languages Questions & Answers

  Write program to find smallest-largest value from n numbers

Write down program which will determine the smallest, largest and average values in collection of N numbers. Get value of N before scanning each value in collection of N numbers.

  Write program to prints the question

Write a program that prints the question"do you want to continue?" and reads a user input. if the user input is"y", "yes", "ok", "sure", or "why not?".

  Write a new class file with a main() method in the project

Write a new class file with a main() method in the same project, use it to load in the names.txt, sort the names, and store them to a file called, nameout.txt. Each name on its own line.

  Write a machine-language program

Write a machine-language program to input two one-digit numbers add them, and output the one-digit sum. Write it in a format suitable for the loader and execute it on the Pep/8 simulator.

  Write program to open file for reading

Write the program to open file for reading which has twenty (20) rows and in each row there are three (3) columns. After reading each row call user-defined function to display each row.

  Which array types could hold object references

What is the default initialization value for a character array?

  Design program that allows clerk to go through cards

Design a program that allows a clerk to go through the cards, entering the district for each citizen until an appropriate sentinel value is entered.

  Find the sum of the cost line column

Write the SQL to count how many authors have a last name that starts with S.

  Write function template that accepts array

Write a function template arraySum () whihc accepts an array and number of values stored in it and returns sum of those values.

  Write a program that takes one command-line argument

Write a program called runsim that takes one command-line argument. Check for the appropriate command-line argument and output a usage message if the command line is incorrect.

  Applyfunction that receives an array and a function

Write a function called applyFunction that receives an array (arr) and a function (func) as a parameter.

  A police officer or investigator has sufficient cause

If a police officer or investigator has sufficient cause to support a search warrant, the prosecuting attorney might direct him or her to submit a(n) ???a. exhibitb. verdict

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd