Produce the OpenCL kernel and the application driver

Reference no: EM13908944

Part 1:

Description

You are given C code for an application that draws a dragon curve fractal, producing the output shown in the accompanying picture. Your task (should you choose to accept it) is to move the curve-calculation section of the provided application into an OpenCL kernel and to investigate kernel execution times for input depths of 5, 10 and 12.

To build the application use the provided Makefile. The application can be launched with input depth of 10 as follows:

~$ ./dragon 10 > output.pnm
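
The provided curve-calculation code is not reproduced here, so the following is only a sketch of one well-known way to make the dragon curve data-parallel, not the application's own method. It assumes the curve is a sequence of unit segments numbered from 0 and that left and right turns are labelled +1 and -1 (an arbitrary convention). The names dragon_turn and dragon_heading are hypothetical. The point is that the heading of segment n can be computed from n alone, which is the property an OpenCL work-item indexed by n would need.

```c
#include <assert.h>
#include <stdint.h>

/* Turn taken on entering segment n (n >= 1): +1 = left, -1 = right.
   Write n = m * 2^k with m odd; the turn depends only on bit 1 of m. */
int dragon_turn(uint32_t n)
{
    assert(n >= 1);
    uint32_t lowest = n & (~n + 1u);          /* isolates 2^k */
    return ((lowest << 1) & n) ? -1 : +1;
}

/* Heading of segment n in quarter turns (0..3), computed without walking
   the curve: the number of set bits in the Gray code of n, modulo 4. */
int dragon_heading(uint32_t n)
{
    uint32_t gray = n ^ (n >> 1);
    int bits = 0;
    while (gray) {
        gray &= gray - 1;                     /* clear the lowest set bit */
        bits++;
    }
    return bits & 3;
}
```

Summing the turns serially and calling dragon_heading directly give the same heading for every segment. Segment positions still require a running sum of the heading vectors, so a kernel built this way would need either a prefix sum or a short serial pass for that step.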

Deliverables

You are expected to produce the OpenCL kernel and the application driver. In addition, a document covering the following points is expected:

1. Understanding of the algorithm

2. Explanation for the chosen kernel implementation

3. Analysis of the measurements made, with an explanation of the results

You are allowed to use the matrix-matrix multiplication example as the starting base.

Part 2:

Description

In this assignment, you will experiment with accelerating an application kernel on a GPU architecture and explore the GPU design space using a GPU simulation environment. To complete this homework, you need to download GPGPU-Sim (https://www.gpgpu-sim.org/). GPGPU-Sim is a GPU simulator that provides a model for detailed studies of GPU architecture. It supports GPU software written in either CUDA or OpenCL. You need access to a Linux-based system to run the simulator. The application studied is the kernel function for bitcoin mining.

The assignment is divided into the following steps. Please submit your OpenCL code and a report that documents what you did, your results, and your analysis.

Part A: OpenCL. Take the bitcoin mining kernel function (written in C) and convert it to OpenCL. The bitcoin kernel function performs two SHA-256 hashes and tests the output. The mining kernel can be accelerated using GPU streaming processors and parallel threads.
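
As a reference point for checking a converted kernel, the "two hashes" step can be sketched in plain C. This is a generic SHA-256 following FIPS 180-4, not the provided kernel function; the names sha256 and double_sha256 are hypothetical, and the final test of the output is omitted because it depends on the provided code.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* SHA-256 round constants (FIPS 180-4). */
static const uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
};

/* 32-bit rotate right; only called with 0 < n < 32. */
static uint32_t rotr32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32 - n));
}

/* Process one 64-byte block p, updating the 8-word state s. */
static void sha256_block(uint32_t s[8], const uint8_t *p)
{
    uint32_t w[64], v[8];
    for (int i = 0; i < 16; i++)
        w[i] = ((uint32_t)p[4 * i] << 24) | ((uint32_t)p[4 * i + 1] << 16) |
               ((uint32_t)p[4 * i + 2] << 8) | (uint32_t)p[4 * i + 3];
    for (int i = 16; i < 64; i++) {
        uint32_t s0 = rotr32(w[i - 15], 7) ^ rotr32(w[i - 15], 18) ^ (w[i - 15] >> 3);
        uint32_t s1 = rotr32(w[i - 2], 17) ^ rotr32(w[i - 2], 19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16] + s0 + w[i - 7] + s1;
    }
    memcpy(v, s, sizeof v);                    /* v[0..7] = a..h */
    for (int i = 0; i < 64; i++) {
        uint32_t a = v[0], e = v[4];
        uint32_t t1 = v[7] + (rotr32(e, 6) ^ rotr32(e, 11) ^ rotr32(e, 25))
                    + ((e & v[5]) ^ (~e & v[6])) + K[i] + w[i];
        uint32_t t2 = (rotr32(a, 2) ^ rotr32(a, 13) ^ rotr32(a, 22))
                    + ((a & v[1]) ^ (a & v[2]) ^ (v[1] & v[2]));
        memmove(v + 1, v, 7 * sizeof v[0]);    /* h=g, g=f, ..., b=a */
        v[4] += t1;                            /* e = d + t1 */
        v[0] = t1 + t2;                        /* a = t1 + t2 */
    }
    for (int i = 0; i < 8; i++)
        s[i] += v[i];
}

/* SHA-256 of msg[0..len), written big-endian into out. */
void sha256(const uint8_t *msg, size_t len, uint8_t out[32])
{
    uint32_t s[8] = { 0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                      0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19 };
    uint8_t tail[128] = { 0 };
    size_t full = len / 64 * 64, rem = len - full;
    size_t tail_len = (rem < 56) ? 64 : 128;   /* room for 0x80 and the length */
    uint64_t bits = (uint64_t)len * 8;

    assert(msg != NULL && out != NULL);
    for (size_t off = 0; off < full; off += 64)
        sha256_block(s, msg + off);
    memcpy(tail, msg + full, rem);
    tail[rem] = 0x80;
    for (size_t i = 0; i < 8; i++)
        tail[tail_len - 1 - i] = (uint8_t)(bits >> (8 * i));
    for (size_t off = 0; off < tail_len; off += 64)
        sha256_block(s, tail + off);
    for (int i = 0; i < 32; i++)
        out[i] = (uint8_t)(s[i / 4] >> (24 - 8 * (i % 4)));
}

/* The "two hashes" step: SHA-256 applied to the SHA-256 digest. */
void double_sha256(const uint8_t *msg, size_t len, uint8_t out[32])
{
    uint8_t first[32];
    sha256(msg, len, first);
    sha256(first, sizeof first, out);
}
```

In an OpenCL version, each work-item would typically run this computation on its own candidate input. The rotate calls in sha256_block are the operations that the later funnel-shift question targets.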

Part B: GPGPU-Sim. Test the developed bitcoin mining kernel using GPGPU-Sim. Use the GPU configuration below.

Number of Shader Cores: 28
Warp Size: 32
SIMD Pipeline Width: 8
Number of Threads/Core: 1024
Number of CTAs/Core: 8
Number of Registers/Core: 16384
Shared Memory Size/Core: 16 KB
Constant Cache Size/Core: 8 KB (2-way set associative, 64 B lines, LRU)
Texture Cache Size/Core: 64 KB (2-way set associative, 64 B lines, LRU)
Number of Memory Channels: 8
L1 Cache: None
L2 Cache: None
Warp Scheduling Policy: Round robin among ready warps

Report performance for the following:
- Classification of the bitcoin mining kernel's instruction types (percentage of each dynamic instruction type)
- Breakdown of memory operations (shared, constant, tex, local, global)
- Warp occupancy (number of active threads in an issued warp)
- Performance of shared memory and constant cache
- IPC for bitcoin mining kernel (instructions per cycle)
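
The reported figures above are simple ratios of raw counters. The sketch below shows the arithmetic only; the function names and the histogram layout (hist[i] holds the number of issued warps with exactly i active threads) are assumptions for illustration and do not describe the simulator's actual output format.

```c
#include <assert.h>
#include <stddef.h>

/* Instructions per cycle from two raw counters. */
double ipc(unsigned long long instructions, unsigned long long cycles)
{
    assert(cycles > 0);
    return (double)instructions / (double)cycles;
}

/* Share of one instruction or memory-operation class, in percent. */
double percent_of(unsigned long long part, unsigned long long total)
{
    assert(total > 0);
    return 100.0 * (double)part / (double)total;
}

/* Mean number of active threads per issued warp.
   hist[i] = issued warps with exactly i active threads, i = 0..warp_size
   (hypothetical layout). */
double mean_warp_occupancy(const unsigned long long *hist, size_t warp_size)
{
    unsigned long long warps = 0, threads = 0;
    for (size_t i = 0; i <= warp_size; i++) {
        warps += hist[i];
        threads += hist[i] * i;
    }
    return warps ? (double)threads / (double)warps : 0.0;
}
```

For example, three warps issued with 32 active threads and one with 16 give a mean occupancy of 28 threads per warp.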

Part C: Code Optimization. Optimize the bitcoin mining kernel using loop unrolling and compare GPU performance. Does the optimization improve Warp occupancy, IPC, or memory performance?

Part D: GPU Design Space Exploration. The next exercise explores the design space of GPU architecture for accelerating the bitcoin mining kernel.

Will increasing the number of threads per core improve bitcoin mining performance?

Experiment and compare the performance using different numbers of threads per core.

Number of Shader Cores: 28
Warp Size: 32
SIMD Pipeline Width: 8
Number of Threads/Core: 1024 / 1536 / 2048
Number of CTAs/Core: 8
Number of Registers/Core: 16384
Shared Memory Size/Core: 16 KB
Constant Cache Size/Core: 8 KB (2-way set associative, 64 B lines, LRU)
Texture Cache Size/Core: 64 KB (2-way set associative, 64 B lines, LRU)
Number of Memory Channels: 8
L1 Cache: None
L2 Cache: None
Warp Scheduling Policy: Round robin among ready warps

Will increasing the number of registers per core improve bitcoin mining performance?

Number of Shader Cores: 28
Warp Size: 32
SIMD Pipeline Width: 8
Number of Threads/Core: 1024
Number of CTAs/Core: 8
Number of Registers/Core: 16384 / 24576 / 32768
Shared Memory Size/Core: 16 KB
Constant Cache Size/Core: 8 KB (2-way set associative, 64 B lines, LRU)
Texture Cache Size/Core: 64 KB (2-way set associative, 64 B lines, LRU)
Number of Memory Channels: 8
L1 Cache: None
L2 Cache: None
Warp Scheduling Policy: Round robin among ready warps

Experiment and compare the performance using different numbers of registers per core.

Will increasing shared memory size or texture cache size improve bitcoin mining performance? Answer the question with performance measurement.
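
Before running the experiments, it can help to estimate which per-core limit actually bounds the number of resident CTAs. The following is a simplified sketch, not GPGPU-Sim's own allocation logic: it ignores allocation granularity, and the kernel figures (threads per CTA, registers per thread, shared memory per CTA) are hypothetical inputs that would come from the compiler output for the real kernel.

```c
#include <assert.h>

/* Per-core limits taken from the configuration tables. */
struct core_limits {
    unsigned max_threads;     /* Number of Threads/Core */
    unsigned max_ctas;        /* Number of CTAs/Core */
    unsigned registers;       /* Number of Registers/Core */
    unsigned shared_bytes;    /* Shared Memory Size/Core, in bytes */
};

/* Kernel resource usage (hypothetical values for illustration). */
struct kernel_usage {
    unsigned threads_per_cta;
    unsigned regs_per_thread;
    unsigned shared_per_cta;  /* bytes; 0 if the kernel uses none */
};

static unsigned min_u(unsigned a, unsigned b)
{
    return a < b ? a : b;
}

/* Number of CTAs that can be resident on one core at the same time:
   the tightest of the CTA, thread, register and shared-memory limits. */
unsigned resident_ctas(struct core_limits c, struct kernel_usage k)
{
    assert(k.threads_per_cta > 0 && k.regs_per_thread > 0);
    unsigned n = c.max_ctas;
    n = min_u(n, c.max_threads / k.threads_per_cta);
    n = min_u(n, c.registers / (k.regs_per_thread * k.threads_per_cta));
    if (k.shared_per_cta > 0)
        n = min_u(n, c.shared_bytes / k.shared_per_cta);
    return n;
}
```

With an assumed kernel of 256 threads per CTA and 32 registers per thread, the baseline configuration fits 2 CTAs per core because registers run out first. In that case raising threads per core to 2048 changes nothing, while raising registers per core to 32768 allows 4 CTAs. The measured results should confirm or contradict an estimate of this kind.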

Assume you want to maximize bitcoin mining performance by adding new features to the GPU. The bitcoin mining kernel computes SHA-256 hashes, and the SHA-256 algorithm uses a 32-bit integer right-rotate operation. If n is stored in 8 bits, rotating n = 11100101 right by 3 gives n = 10111100 (n is shifted right by 3 and the 3 bits shifted out are placed back at the most significant end). If n is stored in 16 or 32 bits, rotating n (000...11100101) right by 3 gives 101000...0011100.
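
The rotation example can be checked with a short C sketch. The function names rotr8 and rotr32 are hypothetical; each rotate is built from two shifts and an OR, which is the instruction sequence a GPU without rotate support would have to execute.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate an 8-bit value right by n bits (0 <= n < 8). */
uint8_t rotr8(uint8_t x, unsigned n)
{
    assert(n < 8);
    /* (8 - n) & 7 keeps the left shift at 0 when n is 0 */
    return (uint8_t)((x >> n) | (x << ((8 - n) & 7)));
}

/* Rotate a 32-bit value right by n bits (0 <= n < 32). */
uint32_t rotr32(uint32_t x, unsigned n)
{
    assert(n < 32);
    /* (32 - n) & 31 avoids an undefined shift by 32 when n is 0 */
    return (x >> n) | (x << ((32 - n) & 31));
}
```

Here rotr8(0xE5, 3) yields 0xBC (10111100), and rotr32(0x000000E5, 3) yields 0xA000001C, which is 101 followed by zeros and ending in 11100.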

As a GPU designer, you decide to add a funnel shifter to accelerate the integer shift operations. The funnel shifter is pipelined (ptx_opcode_latency = 1 and ptx_opcode_initiation = 1). For developers, it is exposed as a 64-bit "funnel shift" instruction that may be accessed with the following intrinsics:

funnelshift_lc(): returns most significant 32 bits of a left funnel shift.

funnelshift_rc(): returns least significant 32 bits of a right funnel shift.

These intrinsics are implemented as inline device functions (using inline PTX assembler). By default, only the least significant 5 bits of the shift count are used (sh & 31); the _lc and _rc intrinsics instead clamp the shift count to the range 0..32.

 

INTRINSIC: DESCRIPTION

funnelshift_l(hi, lo, sh): Concatenates [hi:lo] into a 64-bit quantity, shifts it left by (sh & 31) bits, and returns the most significant 32 bits

funnelshift_lc(hi, lo, sh): Concatenates [hi:lo] into a 64-bit quantity, shifts it left by min(sh, 32) bits, and returns the most significant 32 bits

funnelshift_r(hi, lo, sh): Concatenates [hi:lo] into a 64-bit quantity, shifts it right by (sh & 31) bits, and returns the least significant 32 bits

funnelshift_rc(hi, lo, sh): Concatenates [hi:lo] into a 64-bit quantity, shifts it right by min(sh, 32) bits, and returns the least significant 32 bits
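
The descriptions above can be modelled on the host in plain C, which is useful as a reference when validating a simulator extension. This is only an emulation sketch built from those descriptions: the emu_ prefix, the helper concat64, and rotr32_funnel are hypothetical names, not part of any GPU toolkit.

```c
#include <assert.h>
#include <stdint.h>

/* Build the 64-bit quantity [hi:lo]. */
static uint64_t concat64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Left shift by (sh & 31), return the most significant 32 bits. */
uint32_t emu_funnelshift_l(uint32_t hi, uint32_t lo, uint32_t sh)
{
    return (uint32_t)((concat64(hi, lo) << (sh & 31)) >> 32);
}

/* Left shift by min(sh, 32), return the most significant 32 bits. */
uint32_t emu_funnelshift_lc(uint32_t hi, uint32_t lo, uint32_t sh)
{
    return (uint32_t)((concat64(hi, lo) << (sh < 32 ? sh : 32)) >> 32);
}

/* Right shift by (sh & 31), return the least significant 32 bits. */
uint32_t emu_funnelshift_r(uint32_t hi, uint32_t lo, uint32_t sh)
{
    return (uint32_t)(concat64(hi, lo) >> (sh & 31));
}

/* Right shift by min(sh, 32), return the least significant 32 bits. */
uint32_t emu_funnelshift_rc(uint32_t hi, uint32_t lo, uint32_t sh)
{
    return (uint32_t)(concat64(hi, lo) >> (sh < 32 ? sh : 32));
}

/* 32-bit rotate right as a single funnel shift: both halves hold x. */
uint32_t rotr32_funnel(uint32_t x, uint32_t n)
{
    return emu_funnelshift_r(x, x, n);
}

/* Check against the rotation example: 000...11100101 rotated right by 3. */
void check_funnel_rotate(void)
{
    assert(rotr32_funnel(0x000000E5u, 3) == 0xA000001Cu);
}
```

Passing the same word as both hi and lo turns a funnel shift into a rotate, so each SHA-256 rotate becomes one instruction instead of two shifts and an OR.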

To complete this question, you need to add an extension to GPGPU-Sim that accounts for this new shift support and simulates its performance (assume one funnel shift unit per SP).

Rewrite the bitcoin mining kernel using the funnel shift intrinsics and measure its performance using the modified GPGPU-Sim with funnel shift support. Can bitcoin mining performance be improved significantly with dedicated hardware support for integer shift operations? Answer the question using simulated performance measurements.

Please submit your GPGPU-Sim code, your OpenCL bitcoin mining kernel using the funnel shift intrinsics, and a report that documents the design of your extension to GPGPU-Sim, the performance results, and your analysis.


Attachment: hwDocs.rar
