MatrixMultiply

From CSE231 Wiki
Revision as of 15:38, 17 August 2017

=Background=

Matrix multiplication is a simple mathematical operation which we will replicate in this studio. For our purposes, we will only deal with square matrices (same number of rows and columns). However, we will tackle this problem with several different parallel constructs and approaches. For those unfamiliar with how to multiply two matrices, take a look at this link for a quick overview.
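As a quick worked refresher (an added example, not from the linked overview): each entry of the result is the dot product of a row of the first matrix with a column of the second,

```latex
C_{ij} = \sum_{k=1}^{n} A_{ik}\,B_{kj},
\qquad
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
\begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}
=
\begin{pmatrix} 1\cdot 5 + 2\cdot 7 & 1\cdot 6 + 2\cdot 8 \\
                3\cdot 5 + 4\cdot 7 & 3\cdot 6 + 4\cdot 8 \end{pmatrix}
=
\begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}
```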

=Where to Start=

You can find all of the relevant files for the assignment under the '''matrix-multiply''' directory. From there, you will need to implement the <code>MatrixMultiplySolutions</code> class.

==MatrixMultiplySolutions==

There are three methods you will need to implement, all of which are different ways to use parallel for loops to solve the problem. To assist you, the sequential implementation has already been completed for you. We recommend starting from the top and working your way down. There is also an optional recursive implementation, as well as a manual grouping implementation that has been done for you (the latter is just there to demonstrate how chunking works behind the scenes).

===FORALL_FORALL===

In this implementation, you will simply convert the sequential solution into a parallel one using two forall loops.
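The structure can be sketched in plain Java. Note that this is a sketch of the loop shape only: <code>java.util.stream</code> parallel streams stand in for HJ's forall here, and the class and method names are made up for illustration, not taken from the assignment code.

```java
import java.util.stream.IntStream;

public class ForallForallSketch {

    // Two parallel loops, one over rows and one over columns; each (i, j)
    // iteration computes one entry of the result. The inner k loop stays
    // sequential, exactly as in the provided sequential implementation.
    public static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        IntStream.range(0, n).parallel().forEach(i ->
            IntStream.range(0, n).parallel().forEach(j -> {
                double sum = 0.0;
                for (int k = 0; k < n; k++) {
                    sum += a[i][k] * b[k][j];
                }
                c[i][j] = sum; // each task writes a distinct entry, so no races
            }));
        return c;
    }

    public static void main(String[] args) {
        double[][] c = multiply(new double[][]{{1, 2}, {3, 4}},
                                new double[][]{{5, 6}, {7, 8}});
        System.out.println(java.util.Arrays.deepToString(c));
    }
}
```

Because every (i, j) pair owns exactly one output cell, the parallel iterations never write to the same location, which is what makes this direct conversion safe.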

===FORALL2D===

In this implementation, we will cut down the syntax of the two-forall implementation with the use of HJ's <code>forall2d</code> method. Functionally, this method serves the same purpose as using two nested forall loops, and its input parameters correspond to the input parameters of the two separate forall loops. For example,

forall(min, max, (i) -> {
	forall(min, max, (j) -> doSomething());
});

would appear as <code>forall2d(min, max, min, max, (i, j) -> doSomething());</code> in the forall2d syntax.

===FORALL2D_CHUNKED===

In this implementation, we will add a minor performance boost to the process by using the forall-chunked construct. Although similar to the traditional forall loop, it improves speedup through iteration grouping/chunking. This topic is discussed in detail in this [https://edge.edx.org/courses/RiceX/COMP322/1T2014R/courseware/a900dd0655384de3b5ef01e508ea09d7/6349730bb2a149a0b33fa23db7afddee/?activate_block_id=i4x%3A%2F%2FRiceX%2FCOMP322%2Fsequential%2F6349730bb2a149a0b33fa23db7afddee Rice video] and explained in the [http://pasiphae.cs.rice.edu/#hjlib-for-parallelchunkediterationwithimplicitfinish HJ documentation]. There is no need to specify anything; allow the runtime to determine the chunking.

===FORALL_GROUPED & FORALL2D_GROUPED===

In this approach, we will show you how you can specify the grouping yourself and what forAllChunked is doing behind the scenes. There is no need to implement these methods; they are just here to serve as a reference for you. However, you should understand the concept of iteration grouping before moving forward.
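To make the grouping concrete, here is a plain-Java sketch of the idea: the flattened (i, j) iteration space is split into a handful of contiguous groups, with one task per group rather than one task per iteration. A standard <code>ExecutorService</code> stands in for HJ's constructs, and the names here are invented for illustration; the provided reference methods will look different in detail.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class GroupedMultiplySketch {

    // Manual iteration grouping: the n*n iteration space is divided into
    // numGroups contiguous chunks; each task walks its whole chunk
    // sequentially. This is what forall-chunked arranges automatically.
    public static double[][] multiply(double[][] a, double[][] b, int numGroups)
            throws Exception {
        int n = a.length;
        double[][] c = new double[n][n];
        int total = n * n;
        int groupSize = (total + numGroups - 1) / numGroups; // ceiling division
        ExecutorService pool = Executors.newFixedThreadPool(numGroups);
        List<Future<?>> tasks = new ArrayList<>();
        for (int start = 0; start < total; start += groupSize) {
            final int lo = start;
            final int hi = Math.min(start + groupSize, total);
            tasks.add(pool.submit(() -> {
                for (int idx = lo; idx < hi; idx++) {
                    int i = idx / n, j = idx % n; // recover (i, j) from the flat index
                    double sum = 0.0;
                    for (int k = 0; k < n; k++) {
                        sum += a[i][k] * b[k][j];
                    }
                    c[i][j] = sum;
                }
            }));
        }
        for (Future<?> t : tasks) {
            t.get(); // wait for every group, like forall's implicit finish
        }
        pool.shutdown();
        return c;
    }
}
```

The payoff of grouping is fewer, larger tasks: per-task overhead is paid numGroups times instead of n*n times, while each output cell is still written by exactly one task.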

===OPTIONAL_SEQ_RECURSIVE & OPTIONAL_PAR_RECURSIVE===

In this implementation, you will solve the matrix multiply problem sequentially and in parallel using recursion. Although your solution should be able to take in a matrix of any size, try to imagine it as a 2x2 matrix in order to make the problem easier to reason about. Once you have solved the sequential method, the parallel method should look very similar, with the exception of an async/finish block.

In order to obtain the desired result matrix, you will need to recursively multiply the correct pairs of submatrices for each of the four result submatrices. Viewing each matrix as a 2x2 grid of blocks, remember that the dot products of the rows of the first matrix with the columns of the second matrix form the result matrix.

Hint: Each result submatrix should have two recursive calls, for a total of eight recursive calls.
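The sequential recursive structure can be sketched as follows. This sketch uses index offsets to identify quadrants rather than whatever submatrix helper the assignment provides, so treat the layout as an assumption; only the pattern of eight recursive calls is the point.

```java
public class RecursiveMultiplySketch {

    // Recursive block multiply for n-by-n matrices with n a power of two.
    // Each result quadrant is the sum of two block products
    // (C11 = A11*B11 + A12*B21, and so on), giving eight recursive calls.
    // (ar, ac) etc. are the top-left corners of the current quadrants.
    static void multiply(double[][] a, double[][] b, double[][] c,
                         int ar, int ac, int br, int bc, int cr, int cc, int n) {
        if (n == 1) {
            c[cr][cc] += a[ar][ac] * b[br][bc]; // base case: scalar multiply-add
            return;
        }
        int h = n / 2;
        multiply(a, b, c, ar,     ac,     br,     bc,     cr,     cc,     h); // C11 += A11*B11
        multiply(a, b, c, ar,     ac + h, br + h, bc,     cr,     cc,     h); // C11 += A12*B21
        multiply(a, b, c, ar,     ac,     br,     bc + h, cr,     cc + h, h); // C12 += A11*B12
        multiply(a, b, c, ar,     ac + h, br + h, bc + h, cr,     cc + h, h); // C12 += A12*B22
        multiply(a, b, c, ar + h, ac,     br,     bc,     cr + h, cc,     h); // C21 += A21*B11
        multiply(a, b, c, ar + h, ac + h, br + h, bc,     cr + h, cc,     h); // C21 += A22*B21
        multiply(a, b, c, ar + h, ac,     br,     bc + h, cr + h, cc + h, h); // C22 += A21*B12
        multiply(a, b, c, ar + h, ac + h, br + h, bc + h, cr + h, cc + h, h); // C22 += A22*B22
    }

    public static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        multiply(a, b, c, 0, 0, 0, 0, 0, 0, n);
        return c;
    }
}
```

One caution for the parallel version: the two calls that accumulate into the same result quadrant both update that quadrant, so they must not run concurrently with each other. A natural placement is one async per result quadrant (four asyncs inside a finish), keeping each quadrant's two products sequential.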