c - Optimising and why OpenMP is much slower than the sequential way?


I am a newbie in OpenMP programming. I wrote a simple C program to multiply a matrix by a vector. Unfortunately, when comparing execution times, I found that the OpenMP version is slower than the sequential one.

Here is my code (the matrix is N*N int, the vector is N int, the result is N long long):

    #pragma omp parallel private(i,j) shared(matrix,vector,result,m_size)
    for(i=0;i<m_size;i++)
    {
        for(j=0;j<m_size;j++)
        {
            result[i]+=matrix[i][j]*vector[j];
        }
    }

And this is the code for the sequential way:

    for (i=0;i<m_size;i++)
        for(j=0;j<m_size;j++)
            result[i] += matrix[i][j] * vector[j];

When I tried these two implementations with a 999x999 matrix and a 999-element vector, the execution times were:

Sequential: 5439 ms
Parallel: 11120 ms

I cannot understand why OpenMP is so much slower than the sequential algorithm (over 2 times slower!). Can anyone help me solve this problem?

Because when OpenMP distributes work among threads, there is a lot of administration/synchronisation going on to ensure that the values in your shared matrix and vector are not corrupted somehow. Even though they are read-only: humans see that easily, your compiler may not.
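One thing worth checking against the snippet in the question: as quoted, the directive reads `#pragma omp parallel` without the `for` work-sharing keyword. If that is how it was actually written, nothing is distributed at all, because every thread runs the complete loop nest and all of them update the same `result[i]` concurrently. A minimal sketch of a work-shared version follows. The function name `matvec_parallel_for` and the flat row-major matrix layout are assumptions made for illustration, not the asker's code. It has to be built with the compiler's OpenMP switch (for example `-fopenmp` with GCC); without it the pragma is ignored and the loop simply runs sequentially.

```c
#include <stddef.h>

/* Hypothetical helper, not the asker's code: result = matrix * vector.
 * Assumes an n-by-n matrix stored row-major in one flat array. */
void matvec_parallel_for(int n, const int *matrix, const int *vector,
                         long long *result)
{
    int i, j;

    /* "parallel for" hands each thread its own subset of the i iterations.
     * i is made private automatically; j must be listed because it is
     * declared outside the loop. */
    #pragma omp parallel for private(j) shared(matrix, vector, result, n)
    for (i = 0; i < n; i++) {
        result[i] = 0;
        for (j = 0; j < n; j++) {
            /* widen before multiplying so the product is computed as long long */
            result[i] += (long long)matrix[(size_t)i * n + j] * vector[j];
        }
    }
}
```

Each thread writes only the `result[i]` entries of the rows it owns, so no locking is needed for the output in this form.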

Things to try out, for pedagogic reasons:

0) What happens if matrix and vector are not shared?

1) Parallelize the inner "j-loop" first and keep the outer "i-loop" serial. See what happens.

2) Do not collect the sum in result[i], but in a variable temp, and assign its contents to result[i] only after the inner loop has finished, to avoid repeated index lookups. Don't forget to init temp to 0 before the inner loop starts.
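Experiments 1) and 2) could be sketched roughly as below. Both function names and the flat row-major matrix layout are assumptions made for illustration. The inner-loop variant needs a `reduction(+:temp)` clause so that each thread sums into its own copy of `temp`; it also opens a parallel region once per row, which is why it is expected to carry more overhead than parallelizing the outer loop.

```c
#include <stddef.h>

/* Experiment 1 (hypothetical name): outer i-loop serial, inner j-loop parallel.
 * A parallel region is started for every row. */
void matvec_inner_parallel(int n, const int *matrix, const int *vector,
                           long long *result)
{
    for (int i = 0; i < n; i++) {
        long long temp = 0;
        /* each thread accumulates a private temp; the copies are added at the end */
        #pragma omp parallel for reduction(+:temp)
        for (int j = 0; j < n; j++)
            temp += (long long)matrix[(size_t)i * n + j] * vector[j];
        result[i] = temp;
    }
}

/* Experiment 2 (hypothetical name): outer loop parallel, row sum kept in a
 * local temp and written to result[i] once per row. */
void matvec_outer_temp(int n, const int *matrix, const int *vector,
                       long long *result)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        long long temp = 0;  /* declared inside the loop, so private per thread */
        for (int j = 0; j < n; j++)
            temp += (long long)matrix[(size_t)i * n + j] * vector[j];
        result[i] = temp;
    }
}
```

Timing both against the sequential loop on the same 999x999 input should show how much of the cost comes from starting and joining threads rather than from the arithmetic itself.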

