Difference between revisions of "MatrixMultiply"
m (Added note about auto-coarsening) |
|||
(89 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
+ | =Motivation= | ||
+ | We gain experience using the parallel for loop constructs. | ||
+ | |||
=Background= | =Background= | ||
Line 7: | Line 10: | ||
* [http://mathworld.wolfram.com/MatrixMultiplication.html Wolfram MathWorld] | * [http://mathworld.wolfram.com/MatrixMultiplication.html Wolfram MathWorld] | ||
* [https://en.wikipedia.org/wiki/Matrix_multiplication Wikipedia] | * [https://en.wikipedia.org/wiki/Matrix_multiplication Wikipedia] | ||
− | + | ||
If <math>\mathbf{A}</math> is an <math>n \times m</math> matrix and <math>\mathbf{B}</math> is an <math>m \times p</math> matrix | If <math>\mathbf{A}</math> is an <math>n \times m</math> matrix and <math>\mathbf{B}</math> is an <math>m \times p</math> matrix | ||
Line 13: | Line 16: | ||
: <math>(\mathbf{A}\mathbf{B})_{ij} = \sum_{k=1}^m A_{ik}B_{kj}</math> | : <math>(\mathbf{A}\mathbf{B})_{ij} = \sum_{k=1}^m A_{ik}B_{kj}</math> | ||
+ | |||
+ | :<math>\mathbf{C}=\begin{pmatrix} | ||
+ | a_{11}b_{11} +\cdots + a_{1n}b_{n1} & a_{11}b_{12} +\cdots + a_{1n}b_{n2} & \cdots & a_{11}b_{1p} +\cdots + a_{1n}b_{np} \\ | ||
+ | a_{21}b_{11} +\cdots + a_{2n}b_{n1} & a_{21}b_{12} +\cdots + a_{2n}b_{n2} & \cdots & a_{21}b_{1p} +\cdots + a_{2n}b_{np} \\ | ||
+ | \vdots & \vdots & \ddots & \vdots \\ | ||
+ | a_{m1}b_{11} +\cdots + a_{mn}b_{n1} & a_{m1}b_{12} +\cdots + a_{mn}b_{n2} & \cdots & a_{m1}b_{1p} +\cdots + a_{mn}b_{np} \\ | ||
+ | \end{pmatrix} | ||
+ | </math> | ||
+ | |||
source: [https://en.wikipedia.org/wiki/Matrix_multiplication#General_definition_of_the_matrix_product Matrix Multiplication on Wikipedia] | source: [https://en.wikipedia.org/wiki/Matrix_multiplication#General_definition_of_the_matrix_product Matrix Multiplication on Wikipedia] | ||
− | == | + | =Code To Investigate= |
− | + | ==Demo Video== | |
+ | <youtube>iEuYiy1Bx2A</youtube> | ||
+ | ==SequentialMatrixMultiplier== | ||
+ | {{CodeToInvestigate|SequentialMatrixMultiplier|multiply|matrixmultiply.demo|demo}} | ||
− | + | ==SequentialMatrixMultiplierClient== | |
− | + | {{CodeToInvestigate|SequentialMatrixMultiplierClient|main|matrixmultiply.client|demo}} | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | = | + | ==MatrixMultiplyApp== |
− | + | {{Viz|MatrixMultiplyApp|matrixmultiply.viz|demo}} | |
− | + | [[File:Martix multiply app 3x5 X 5x4.png|800px]] | |
− | + | =The Core Questions= | |
− | + | *What are the tasks? | |
− | + | *What is the data? | |
− | + | *Is the data mutable? | |
− | + | *If so, how is it shared? | |
− | |||
− | |||
− | + | =Code To Implement= | |
− | |||
− | = | + | There are three methods you will need to implement, all of which are different ways to use parallel for loops to solve the problem. To assist you, the [https://classes.engineering.wustl.edu/cse231/core/index.php?title=MatrixMultiply#SequentialMatrixMultiplier sequential implementation] has been implemented in a [[#Demo_Video|demo video]]. |
− | + | ==LoopLoopMatrixMultiplier== | |
+ | {{CodeToImplement|LoopLoopMatrixMultiplier|multiply|matrixmultiply.exercise}} | ||
− | + | {{Parallel|public double[][] multiply(double[][] a, double[][] b)}} | |
− | |||
− | + | In this implementation, you will simply convert the sequential solution into a parallel one using two nested [https://www.cse.wustl.edu/~dennis.cosgrove/courses/cse231/spring22/apidocs/fj/FJ.html#join_void_fork_loop(int,int,fj.api.TaskIntConsumer) parallel fork loops]. | |
− | In this implementation, | ||
− | |||
− | |||
− | |||
− | |||
− | == | + | === Computation Graph === |
− | |||
− | + | For 3x3 Matrix X 3x3 Matrix: | |
− | + | [[File:LoopLoopMatrixMultiplier_Computation_Graph.svg|800px]] | |
− | |||
− | + | ==Loop2dMatrixMultiplier== | |
+ | {{CodeToImplement|Loop2dMatrixMultiplier|multiply|matrixmultiply.exercise}} | ||
− | + | {{Parallel|public double[][] multiply(double[][] a, double[][] b)}} | |
− | + | <!-- | |
− | In <code> | + | In this implementation, we will cut down the syntax of the two forall implementation with the use of V5’s <code>forall2d</code> method. Functionally, this method serves the purpose of using two forall loops. [[Reference_Page#Forall_2d|Take a look at the reference page]] if you have questions on how to utilize this loop. |
+ | --> | ||
− | + | [https://www.cse.wustl.edu/~dennis.cosgrove/courses/cse231/spring22/apidocs/fj/FJ.html#join_void_fork_loop_2d(int,int,int,int,fj.api.TaskBiIntConsumer) join_void_fork_loop_2d] | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | === Computation Graph === | |
− | |||
− | |||
− | + | For 3x3 Matrix X 3x3 Matrix: | |
− | : | + | [[File:Loop2dMatrixMultiplier_Computation_Graph.svg|800px]] |
− | + | ||
− | + | ==Loop2dAutoCoarsenMatrixMultiplier== | |
− | + | {{CodeToImplement|Loop2dAutoCoarsenMatrixMultiplier|multiply|matrixmultiply.exercise}} | |
− | + | ||
− | + | {{Parallel|public double[][] multiply(double[][] a, double[][] b)}} | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | [https://www.cse.wustl.edu/~dennis.cosgrove/courses/cse231/spring22/apidocs/fj/FJ.html#join_void_fork_loop_2d_auto_coarsen(int,int,int,int,fj.api.TaskBiIntConsumer) join_void_fork_loop_2d_auto_coarsen] | |
− | |||
− | + | This implementation will look very similar to the previous one, so don't overthink it! The real benefit can be seen in the performance difference between the two based on the coarsening being done behind the scenes. | |
− | + | <!-- | |
− | + | In this implementation, we will add a minor performance boost to the process by using the forall-chunked construct. Although similar to the traditional forall loop, it increases performance using iteration grouping/chunking. This topic is discussed in detail in this [https://edge.edx.org/courses/RiceX/COMP322/1T2014R/courseware/a900dd0655384de3b5ef01e508ea09d7/6349730bb2a149a0b33fa23db7afddee/?activate_block_id=i4x%3A%2F%2FRiceX%2FCOMP322%2Fsequential%2F6349730bb2a149a0b33fa23db7afddee Rice video] and explained in the [[Reference_Page#Parallel_Loops|V5 documentation]]. There is no need to specify anything, allow the runtime to determine the chunking. | |
− | |||
− | |||
− | |||
− | + | NOTE: we contemplated also assigning building a 1D forall chunked version. We deemed this more work that it was worth given that you are already building the 2d version. Just know that forall(chunked(), ...) exists for 1d loops as well. | |
− | + | Use chunking. It is a nice feature. | |
− | + | --> | |
− | |||
− | = | + | =Extra Credit Challege Divide and Conquer= |
− | + | [[Matrix_Multiply_Divide_and_Conquer_Assignment|Divide and Conquer Matrix Multiplication]] | |
− | === | + | =Testing Your Solution= |
− | + | ==Correctness== | |
− | + | {{TestSuite|__MatrixMultiplyTestSuite|matrixmultiply.studio}} | |
− | + | ==Performance== | |
+ | {{Performance|MatrixMultiplicationTiming|matrixmultiply.performance}} | ||
− | + | Investigate the performance difference for your different implementations. When you run MatrixMultiplicationTiming it will put a CSV of the timings into your copy buffer. You can then paste them into a spreadsheet and chart the performance. Feel free to tune the parameters of the test to see the impacts of, for example, different matrix sizes. | |
− | |||
− | + | [[File:Matrix multiply performance.png]] | |
− | [ | ||
− | |||
− | + | =Pledge, Acknowledgments, Citations= | |
+ | {{Pledge|matrix-multiply}} |
Latest revision as of 00:21, 14 February 2023
Contents
Motivation
We gain experience using the parallel for loop constructs.
Background
Matrix multiplication is a simple mathematical operation which we will replicate in this studio. For our purposes, we will only deal with square matrices (same number of rows and columns). However, we will approach this problem with several different parallel constructs and approaches.
For those unfamiliar on how to multiply two matrices, take a look at these overviews:
If is an matrix and is an matrix
for each i=[0..n) and for each j=[0..p)
source: Matrix Multiplication on Wikipedia
Code To Investigate
Demo Video
SequentialMatrixMultiplier
class: | SequentialMatrixMultiplier.java | DEMO: |
methods: | multiply | |
package: | matrixmultiply.demo | |
source folder: | src/demo/java |
SequentialMatrixMultiplierClient
class: | SequentialMatrixMultiplierClient.java | DEMO: |
methods: | main | |
package: | matrixmultiply.client | |
source folder: | src/demo/java |
MatrixMultiplyApp
class: | MatrixMultiplyApp.java | VIZ |
package: | matrixmultiply.viz | |
source folder: | student/src/demo/java |
The Core Questions
- What are the tasks?
- What is the data?
- Is the data mutable?
- If so, how is it shared?
Code To Implement
There are three methods you will need to implement, all of which are different ways to use parallel for loops to solve the problem. To assist you, the sequential implementation has been implemented in a demo video.
LoopLoopMatrixMultiplier
class: | LoopLoopMatrixMultiplier.java | |
methods: | multiply | |
package: | matrixmultiply.exercise | |
source folder: | student/src/main/java |
method: public double[][] multiply(double[][] a, double[][] b)
(parallel implementation required)
In this implementation, you will simply convert the sequential solution into a parallel one using two nested parallel fork loops.
Computation Graph
For 3x3 Matrix X 3x3 Matrix:
Loop2dMatrixMultiplier
class: | Loop2dMatrixMultiplier.java | |
methods: | multiply | |
package: | matrixmultiply.exercise | |
source folder: | student/src/main/java |
method: public double[][] multiply(double[][] a, double[][] b)
(parallel implementation required)
Computation Graph
For 3x3 Matrix X 3x3 Matrix:
Loop2dAutoCoarsenMatrixMultiplier
class: | Loop2dAutoCoarsenMatrixMultiplier.java | |
methods: | multiply | |
package: | matrixmultiply.exercise | |
source folder: | student/src/main/java |
method: public double[][] multiply(double[][] a, double[][] b)
(parallel implementation required)
join_void_fork_loop_2d_auto_coarsen
This implementation will look very similar to the previous one, so don't overthink it! The real benefit can be seen in the performance difference between the two based on the coarsening being done behind the scenes.
Extra Credit Challege Divide and Conquer
Divide and Conquer Matrix Multiplication
Testing Your Solution
Correctness
class: | __MatrixMultiplyTestSuite.java | |
package: | matrixmultiply.studio | |
source folder: | testing/src/test/java |
Performance
class: | MatrixMultiplicationTiming.java | |
package: | matrixmultiply.performance | |
source folder: | src/main/java |
Investigate the performance difference for your different implementations. When you run MatrixMultiplicationTiming it will put a CSV of the timings into your copy buffer. You can then paste them into a spreadsheet and chart the performance. Feel free to tune the parameters of the test to see the impacts of, for example, different matrix sizes.
Pledge, Acknowledgments, Citations
file: | matrix-multiply-pledge-acknowledgments-citations.txt |
More info about the Honor Pledge