Demonstrate and evaluate the speed improvement due to parallelization of the matrix multiply algorithm.
Goal of this assignment
Distributed parallel computing
SIMD using vectorized data
Symmetric Multiprocessing using OpenMP
Distributed Memory using Message Passing Interface (MPI)
Team work
Source code management
Code review
Writing a common report
Full project life cycle
Task elicitation
Development with remote system
Testing
Coding
Problem Statement
Given the following matrix multiplication algorithm, implement multiple variations using the different types of parallel processing we saw in class: SIMD, OMP, MPI, and OMP+MPI.
for (i = 0; i < N; i++)
    for (j = 0; j < N; j++) {
        c[i][j] = 0;
        for (k = 0; k < N; k++)
            c[i][j] += a[i][k] * b[k][j];
    }
Lab Parallel Computing – First day review (due before end of first lab):
Run Matrix Multiplication non-vectorized in C
Create a vectorized SIMD matrix multiplication version in C
Run the HelloWorld.c MPI example
Lab Parallel Computing – One Week Review
Project board is created, tasks are created and updated
The problem-research writing is due
Automate running matrix multiplication on matrices of different sizes and generating timing data in tabular format for graph production.
Rewrite mmult.c to use SIMD parallelization. (Create and run mmult_simd.c, a rewrite of mmult.c that adds SIMD optimisations (refer to the ppt for the algorithm to use), and try running it with and without -O3 on the Wolfgand cluster.)
Compare running matrix multiplication in C on the Wolfgand cluster with and without SIMD.
Compare running matrix multiplication in C on the Wolfgand cluster with and without -O3 optimizations.
Produce a single graph comparing all four ways of running matrix multiplication.
Lab Parallel Computing – Full project – (This Canvas Assignment)
Project board is updated and reflects current project status
Read the input matrices from two files as described in the "main program" section of MPI and OpenMP Approaches to consider.docx. This will be used for demo and grading.
Matrix multiplication in C on Wolfgand cluster with OpenMP.
Matrix multiplication in C on Wolfgand cluster with MPI (Distributed Memory)
Update the graph to include the SIMD, OpenMP, and MPI versions. (You can remove the unoptimized algorithm, as it is expected to be “off the chart” and would make the chart difficult to read.)
Extra credit: Matrix multiplication in C on Wolfgand cluster with both OpenMP and MPI.
Extra credit: Automate the production of the graph with Gnuplot or other tools. If you want to use Python libraries, such as Matplotlib, you need to first create a Python virtual environment on the cluster machine. See these instructions (updated).
Helpful Git commands
<git add -A> adds all new and modified files to git.
<git status> shows the current branch and the pending changes.
<git commit -m "message goes here"> commits your edits to the local branch.
<git checkout -b <branchname>> creates a new branch.
<git push --set-upstream origin <branchname>> pushes the new branch to the remote repository.
Helpful MPI Commands
Every file currently in the directory needs to be compiled in a specific way so that it can be run in parallel. For simplicity, a Makefile already exists, so just type the command
make
and it will compile the files for you, which you can then run using
mpiexec -f ~/hosts -n 4 ./<nameOfFile> <anyParametersNeeded>
Example:
mpiexec -f ~/hosts -n 4 ./mmult_mpi_omp a.txt b.txt
If you would like to remove all compiled files, type the command
make clean
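The handout says a Makefile is already provided; for teams that need to reconstruct or extend one, a minimal sketch might look like the following. The file names, targets, and flags here are assumptions, not the course's actual Makefile; mpicc is the standard MPI compiler wrapper.

```make
# Sketch of a Makefile for the MPI/OpenMP builds (assumed names/flags).
CC      = mpicc
CFLAGS  = -O3 -fopenmp

all: mmult_mpi_omp

mmult_mpi_omp: mmult_mpi_omp.c
	$(CC) $(CFLAGS) -o $@ $<

clean:
	rm -f mmult_mpi_omp
```

With such a file, `make` builds the binaries and `make clean` removes them, matching the commands above.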
Small writing (word.doc)
Present the work you have done so far:
What did you do last week?
What will you do before submission next week?
Demo your current working code. Tasks expected to be completed:
Automate running matrix multiplication on matrices of different sizes and generating data in tabular format for graph production.
Matrix multiplication in C on the Wolfgand cluster without SIMD and without parallelization.
Matrix multiplication in C on the Wolfgand cluster with the SIMD rewrite, both non-vectorized (without -O3) and vectorized (with -O3). (Rewrite the algorithm accordingly and try it with and without -O3.)
Produce a single graph comparing the speed of all implementations
Writing
Research question: what are SIMD, OMP, and MPI? What are the differences between them?
Describe what is shown on the graph you have produced.
Big writing (word.doc)
Distributed parallel computing
Explain the architecture of your solution.
Explain the variations of algorithm you implemented.
Teamwork
List all team members in the README.md and explain the contribution of each person.
Did you lock the master branch as explained in GitHW2 Lab 2 – Git? How did you proceed to review each other's work?
Full project life cycle
Have you used a project board? How did you use it? If you did not use a project board, how did you plan and manage your project and teamwork?
Was the usual cycle (write code, compile code, run code, test code) the same when doing remote development on the Wolfgand cluster? Did you need to adapt your way of working or use different tools?
What kind of testing did you plan to use for this project? Did you consider measuring speed, memory consumption, and validity of results for matrix multiplication? Did you consider that the code provided by the professor could have flaws?
Did you need to write code or use tools to generate random matrices of specific sizes? Did you put this in your plan?
Did you put in your plan the work needed to generate tables or graphs? Did you automate this work?
What proportion of the tasks (and time) in your plan is about writing variations on the matrix multiplication algorithm and what proportion is about testing and reporting activities?