c - Optimising, and why is OpenMP much slower than the sequential way?
I am a newbie in programming with OpenMP. I wrote a simple C program to multiply a matrix by a vector. Unfortunately, when comparing execution times, I found that the OpenMP version is slower than the sequential one.
Here is my code (the matrix is N*N int, the vector is N int, the result is N long long):
    #pragma omp parallel for private(i,j) shared(matrix,vector,result,m_size)
    for (i = 0; i < m_size; i++) {
        for (j = 0; j < m_size; j++) {
            result[i] += matrix[i][j] * vector[j];
        }
    }
And this is the code for the sequential way:
    for (i = 0; i < m_size; i++)
        for (j = 0; j < m_size; j++)
            result[i] += matrix[i][j] * vector[j];
When I tried these 2 implementations with a 999x999 matrix and a 999-element vector, the execution times were:
sequential: 5439 ms
parallel: 11120 ms
I cannot understand why OpenMP is so much slower than the sequential algorithm (over 2 times slower!). How can I solve this problem?
Because when OpenMP distributes the work among threads, there is a lot of administration/synchronisation going on to ensure that the values in your shared matrix and vector are not corrupted somehow. That happens even though they are read-only: humans see that easily, but your compiler may not.
Things to try out for pedagogic reasons:
0) What happens if matrix and vector are not shared?
1) Parallelize the inner "j-loop" first and keep the outer "i-loop" serial. See what happens.
2) Do not collect the sum in result[i], but in a variable temp, and assign its contents to result[i] after the inner loop is finished, to avoid repeated index lookups. Don't forget to init temp to 0 before the inner loop starts.
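Suggestions 1) and 2) could look roughly like the sketch below. It is only an illustration under some assumptions, not the asker's actual program: the function names matvec_outer and matvec_inner are made up, the matrix is assumed to be stored row-major in one flat array (the question indexes matrix[i][j]), each product is widened to long long before adding, and result[i] is assigned rather than accumulated. Compile with OpenMP enabled (for example -fopenmp with GCC); without it the pragmas are ignored and both functions simply run serially.

```c
#include <stddef.h>

/* Suggestion 2: outer loop shared among threads, row sum kept in a local
   variable. Assumes a flat, row-major matrix of n*n ints (an assumption). */
void matvec_outer(int n, const int *matrix, const int *vector, long long *result)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        long long temp = 0;   /* declared inside the loop, so private per thread */
        for (int j = 0; j < n; j++)
            temp += (long long)matrix[(size_t)i * n + j] * vector[j];
        result[i] = temp;     /* one write per row instead of n */
    }
}

/* Suggestion 1: outer loop serial, inner loop shared among threads.
   The reduction clause gives each thread its own partial sum and
   adds the partial sums together at the end of the inner loop. */
void matvec_inner(int n, const int *matrix, const int *vector, long long *result)
{
    for (int i = 0; i < n; i++) {
        long long temp = 0;
        #pragma omp parallel for reduction(+:temp)
        for (int j = 0; j < n; j++)
            temp += (long long)matrix[(size_t)i * n + j] * vector[j];
        result[i] = temp;
    }
}
```

Timing both variants against the sequential loop on the 999x999 case should show how much the placement of the parallel region matters: matvec_inner starts a parallel loop once per row, so expect it to carry far more overhead than matvec_outer.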