Tpetra_DistributionLowerTriangularBlock.hpp
// @HEADER
// ***********************************************************************
//
//          Tpetra: Templated Linear Algebra Services Package
//                 Copyright (2008) Sandia Corporation
//
// Under the terms of Contract DE-AC04-94AL85000 with Sandia Corporation,
// the U.S. Government retains certain rights in this software.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// 1. Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//
// 2. Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
//
// 3. Neither the name of the Corporation nor the names of the
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY SANDIA CORPORATION "AS IS" AND ANY
// EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SANDIA CORPORATION OR THE
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Questions? Contact Michael A. Heroux (maherou@sandia.gov)
//
// ************************************************************************
// @HEADER

#ifndef __TPETRA_DISTRIBUTIONLOWERTRIANGULARBLOCK_HPP
#define __TPETRA_DISTRIBUTIONLOWERTRIANGULARBLOCK_HPP

// Needed by DistributionLowerTriangularBlock
#include "Tpetra_Distributor.hpp"

// Needed by LowerTriangularBlock operator
#include "Tpetra_Core.hpp"
#include "Tpetra_Map.hpp"
#include "Tpetra_CrsMatrix.hpp"
#include "Tpetra_MultiVector.hpp"
#include "Tpetra_Operator.hpp"
#include "Tpetra_Vector.hpp"

namespace Tpetra
{

template <typename gno_t, typename scalar_t>
class DistributionLowerTriangularBlock : public Distribution<gno_t,scalar_t> {
// Seher Acer's lower-triangular block decomposition for triangle counting.
// See also: LowerTriangularBlockOperator below that allows this distribution
// to be used in Tpetra SpMV.
//
// Requirements:
//   Matrix must be square (undirected graph).
//   Number of processors np = q(q+1)/2 for some q.
//
// Only the lower triangular portion of the matrix is stored.
// Processors are arranged logically as follows:
//   0
//   1 2
//   3 4 5
//   ...
//
// The lower triangular part of the matrix is stored in a 2D distribution.
// For example, the dense 7x7 lower triangular matrix below would be assigned
// to processors according to numbers shown as follows:
//   0 |   |
//   00|   |
//   ---------
//   11|2  |
//   11|22 |
//   11|222|
//   ---------
//   33|444|5
//   33|444|55
//   ...
// (Note that we expect the matrix to be sparse. For dense matrices,
// CrsMatrix is the wrong tool.)
//
// Matrix rows are assigned to processor rows greedily to roughly balance
//   (# nonzeros in processor row / # processors in processor row)
// across processor rows.
// The same cuts are used to divide rows and columns among processors
// (that is, all processors have a square block).
//
// The lower triangular algorithm:
// 1. distribute all matrix entries via 1D linear distribution
//    (this initial distribution is needed to avoid storing the entire
//    matrix on one processor, while providing info about the nonzeros per
//    row needed in step 2)
// 2. (optional) sort rows in decreasing order with respect to the number of
//    nonzeros per row
// 3. find "chunk cuts": divisions in row assignments such that
//    (# nonzeros in processor row / # processors in processor row) is
//    roughly equal for all processor rows
// 4. send nonzeros to their new processor assignment
//
// Known issues: (TODO)
// - The sorting in Step 2 and computation of chunk cuts in Step 3 are
//   currently done in serial and require O(number of rows) storage on each
//   processor. More effort could parallelize this computation, but parallel
//   load-balancing algorithms are more appropriate in Zoltan2 than Tpetra.
// - The sorting in Step 2 renumbers the rows (assigns new Global Ordinals to
//   the rows) to make them contiguous, as needed in Acer's triangle counting
//   algorithm.
//   (Acer's algorithm relies on local indexing from the chunk boundaries to
//   find neighbors needed for communication.)
//   The class currently provides a permutation matrix P describing the
//   reordering. Thus, the matrix stored in the lower triangular block
//   distribution is actually P^T A P -- the row and column permutation of
//   matrix A in the Matrix Market file.
//   The fact that a permuted matrix is stored complicates use of the matrix
//   in algorithms other than Acer's triangle counting. For SpMV with the
//   vector numbered according to the MatrixMarket numbering, for example,
//   P^T must be applied to the vector before SpMV, and P must be applied to
//   the result of SpMV. See LowerTriangularBlockOperator to see how this
//   permutation matrix is used.
//
// Before addressing these issues, we will decide (TODO)
// - Is this Distribution general enough to be in Tpetra?
// - Should we, instead, have a separate package for distributions (that could
//   use Zoltan2 and Tpetra without circular dependence)?
// - Or should we allow users (such as the triangle counting algorithm) to
//   provide their own distributions (e.g., LowerTriangularBlock) that
//   inherit from Tpetra's Distribution class?
// For now, we will push this Distribution into Tpetra, but we will revisit
// this decision.

public:
  using Distribution<gno_t,scalar_t>::me;
  using Distribution<gno_t,scalar_t>::np;
  using Distribution<gno_t,scalar_t>::comm;
  using Distribution<gno_t,scalar_t>::nrows;
  using typename Distribution<gno_t,scalar_t>::NZindex_t;
  using typename Distribution<gno_t,scalar_t>::LocalNZmap_t;

  using map_t = Tpetra::Map<>;
  using matrix_t = Tpetra::CrsMatrix<scalar_t>;

  DistributionLowerTriangularBlock(size_t nrows_,
                 const Teuchos::RCP<const Teuchos::Comm<int> > &comm_,
                 const Teuchos::ParameterList &params) :
                 Distribution<gno_t,scalar_t>(nrows_, comm_, params),
                 initialDist(nrows_, comm_, params),
                 sortByDegree(false), permMatrix(Teuchos::null),
                 redistributed(false), chunksComputed(false), nChunks(0)
  {
    int npnp = 2 * np;
    nChunks = int(std::sqrt(float(npnp)));
    while (nChunks * (nChunks + 1) < npnp) nChunks++;

    TEUCHOS_TEST_FOR_EXCEPTION(nChunks * (nChunks+1) != npnp, std::logic_error,
                               "Number of processors np = " << np <<
                               " must satisfy np = q(q+1)/2 for some q" <<
                               " for LowerTriangularBlock distribution");
    nChunksPerRow = double(nChunks) / double(nrows);

    const Teuchos::ParameterEntry *pe = params.getEntryPtr("sortByDegree");
    if (pe != NULL) sortByDegree = pe->getValue<bool>(&sortByDegree);

    pe = params.getEntryPtr("readPerProcess");
    if (pe != NULL) redistributed = pe->getValue<bool>(&redistributed);

    if (me == 0) std::cout << "\n LowerTriangularBlock Distribution: "
                           << "\n     np      = " << np
                           << "\n     nChunks = " << nChunks
                           << std::endl;
  }

  enum DistributionType DistType() { return LowerTriangularBlock; }

  bool areChunksComputed() { return chunksComputed; }

  Teuchos::Array<gno_t> getChunkCuts() {
    if (chunksComputed)
      return chunkCuts;
    else {
      throw std::runtime_error("Error: Requested chunk cuts have not been "
                               "computed yet.");
    }
  }

  // Return whether this rank owns vector entry i.
  // TODO: for now, use same vector dist as 1DLinear;
  // TODO: think about best distribution of Vectors
  inline bool VecMine(gno_t i) { return initialDist.VecMine(i); }

  // Return whether this rank owns nonzero (i,j).
  // Vector map and row map are the same in the 1D distribution,
  // but keep only the lower triangular entries.
  bool Mine(gno_t i, gno_t j) {
    if (redistributed) {
      if (j > i) return false; // Don't keep any upper triangular entries
      else return (procFromChunks(i,j) == me);
    }
    else
      return initialDist.Mine(i,j);
  }

  inline bool Mine(gno_t i, gno_t j, int p) { return Mine(i,j); }

  // How to redistribute according to chunk-based row distribution
  void Redistribute(LocalNZmap_t &localNZ)
  {
    // Going to do chunking and sorting serially for now;
    // need to gather per-row information from each processor
    // TODO: think about a parallel implementation

    gno_t myFirstRow = initialDist.getFirstRow(me);
    gno_t nMyRows = initialDist.getNumRow(me);
    Teuchos::Array<gno_t> nnzPerRow(nMyRows, 0);

    Teuchos::Array<int> rcvcnt(np);
    Teuchos::Array<int> disp(np);
    for (int sum = 0, p = 0; p < np; p++) {
      int prows = initialDist.getNumRow(p);
      rcvcnt[p] = prows;
      disp[p] = sum;
      sum += prows;
    }

    // If sortByDegree is requested, first need to sort with respect to ALL
    // entries in the matrix (not just lower-triangular entries):
    // decreasing sort by number of entries per row in the global matrix.
    // Generate permuteIndex for the sorted rows.

    Teuchos::Array<gno_t> permuteIndex; // This is the inverse permutation
    Teuchos::Array<gno_t> sortedOrder;  // This is the original permutation

    Teuchos::Array<gno_t> globalRowBuf;
    // TODO Dunno why there isn't a Teuchos::gatherAllv;
    // TODO for now, compute and broadcast
    if (me == 0) {
      globalRowBuf.resize(nrows, 0); // TODO: Ick! Need parallel
    }

    if (sortByDegree) {
      // Compute nnzPerRow; distribution is currently 1D and includes all nz
      for (auto it = localNZ.begin(); it != localNZ.end(); it++) {
        gno_t I = it->first.first;
        nnzPerRow[I-myFirstRow]++;
      }

      Teuchos::gatherv<int,gno_t>(nnzPerRow.getRawPtr(), nMyRows,
                                  globalRowBuf.getRawPtr(),
                                  rcvcnt.getRawPtr(), disp.getRawPtr(),
                                  0, *comm);

      permuteIndex.resize(nrows); // TODO: Ick! Need parallel
      sortedOrder.resize(nrows);  // TODO: Ick! Need parallel

      if (me == 0) { // TODO: do on all procs once have allgatherv

        for (size_t i = 0; i != nrows; i++) sortedOrder[i] = i;

        std::sort(sortedOrder.begin(), sortedOrder.end(),
                  [&](const size_t &a, const size_t &b) {
                    return (globalRowBuf[a] > globalRowBuf[b]);
                  });

        // Compute inverse permutation; it is more useful for our needs
        for (size_t i = 0; i < nrows; i++) {
          permuteIndex[sortedOrder[i]] = i;
        }
      }

      Teuchos::broadcast<int,gno_t>(*comm, 0, permuteIndex(0,nrows));
      // Ick! Use a directory TODO

      // Sorting changes the global IDs associated
      // with rows/columns. To make this distribution applicable beyond
      // triangle counting (e.g., in a Tpetra operator), we need a way
      // to map from the original global IDs and back again.
      // Create a permutation matrix for use in the operator; use
      // default Tpetra layout.
      Teuchos::Array<gno_t> myRows;
      for (size_t i = 0; i < nrows; i++) {
        if (VecMine(i)) myRows.push_back(i);
      }

      Tpetra::global_size_t dummy =
              Teuchos::OrdinalTraits<Tpetra::global_size_t>::invalid();
      Teuchos::RCP<const map_t> permMap =
              rcp(new map_t(dummy, myRows(), 0, comm));

      permMatrix = rcp(new matrix_t(permMap, 1)); // one nz / row in permMatrix

      Teuchos::Array<gno_t> cols(1);
      Teuchos::Array<scalar_t> vals(1); vals[0] = 1.;

      for (size_t i = 0; i < permMap->getNodeNumElements(); i++) {
        gno_t gid = permMap->getGlobalElement(i);
        cols[0] = permuteIndex[gid];
        permMatrix->insertGlobalValues(gid, cols(), vals());
      }

      permMatrix->fillComplete(permMap, permMap);
    }

    // Now, to determine the chunks, we care only about the number of
    // nonzeros in the lower triangular matrix.
    // Compute nnzPerRow; distribution is currently 1D
    nnzPerRow.assign(nMyRows, 0);
    size_t nnz = 0;
    for (auto it = localNZ.begin(); it != localNZ.end(); it++) {
      gno_t I = (sortByDegree ? permuteIndex[it->first.first]
                              : it->first.first);
      gno_t J = (sortByDegree ? permuteIndex[it->first.second]
                              : it->first.second);
      if (J <= I) { // Lower-triangular part
        nnzPerRow[it->first.first - myFirstRow]++;
        nnz++;
      }
    }

    // TODO Dunno why there isn't a Teuchos::gatherAllv;
    // TODO for now, compute and broadcast

    Teuchos::gatherv<int,gno_t>(nnzPerRow.getRawPtr(), nMyRows,
                                globalRowBuf.getRawPtr(),
                                rcvcnt.getRawPtr(), disp.getRawPtr(),
                                0, *comm);

    Teuchos::Array<int>().swap(rcvcnt); // no longer needed
    Teuchos::Array<int>().swap(disp);   // no longer needed

    size_t gNnz;
    Teuchos::reduceAll(*comm, Teuchos::REDUCE_SUM, 1, &nnz, &gNnz);

    chunkCuts.resize(nChunks+1, 0);

    if (me == 0) { // TODO: when have allgatherv, can do on all procs
                   // TODO: or better, implement parallel version

      // Determine chunk cuts
      size_t target = gNnz / np; // target nnz per processor
      size_t targetRunningTotal = 0;
      size_t currentRunningTotal = 0;
      gno_t I = gno_t(0);
      for (int chunkCnt = 0; chunkCnt < nChunks; chunkCnt++) {
        targetRunningTotal = (target * (chunkCnt+1));
        currentRunningTotal = 0;
        while (I < static_cast<gno_t>(nrows)) {
          size_t nextNnz = (sortByDegree ? globalRowBuf[sortedOrder[I]]
                                         : globalRowBuf[I]);
          if (currentRunningTotal + nextNnz <= targetRunningTotal) {
            currentRunningTotal += nextNnz;
            I++;
          }
          else
            break;
        }
        chunkCuts[chunkCnt+1] = I;
      }
      chunkCuts[nChunks] = static_cast<gno_t>(nrows);
    }

    // Free memory associated with globalRowBuf
    Teuchos::Array<gno_t>().swap(globalRowBuf);

    Teuchos::broadcast<int,gno_t>(*comm, 0, chunkCuts(0,nChunks+1));
    chunksComputed = true;

    //std::cout << comm->getRank() << " KDDKDD chunkCuts: ";
    //for (int kdd=0; kdd <= nChunks; kdd++) std::cout << chunkCuts[kdd] << " ";
    //std::cout << std::endl;

    // Determine new owner of each nonzero; buffer for sending
    Teuchos::Array<gno_t> iOut(localNZ.size());
    Teuchos::Array<gno_t> jOut(localNZ.size());
    Teuchos::Array<scalar_t> vOut(localNZ.size());
    Teuchos::Array<int> pOut(localNZ.size());

    size_t sendCnt = 0;
    for (auto it = localNZ.begin(); it != localNZ.end(); it++) {
      iOut[sendCnt] = (sortByDegree ? permuteIndex[it->first.first]
                                    : it->first.first);
      jOut[sendCnt] = (sortByDegree ? permuteIndex[it->first.second]
                                    : it->first.second);
      if (jOut[sendCnt] <= iOut[sendCnt]) { // keep only lower triangular entries
        vOut[sendCnt] = it->second;
        pOut[sendCnt] = procFromChunks(iOut[sendCnt], jOut[sendCnt]);

        //std::cout << comm->getRank()
        //          << " KDDKDD IJ ("
        //          << it->first.first << "," << it->first.second
        //          << ") permuted to ("
        //          << iOut[sendCnt] << "," << jOut[sendCnt]
        //          << ") being sent to " << pOut[sendCnt]
        //          << std::endl;
        sendCnt++;
      }
    }

    // Free memory associated with localNZ and permuteIndex
    LocalNZmap_t().swap(localNZ);
    if (sortByDegree) Teuchos::Array<gno_t>().swap(permuteIndex);

    // Use a Distributor to send nonzeros to new processors.
    Tpetra::Distributor plan(comm);
    size_t nrecvs = plan.createFromSends(pOut(0,sendCnt));
    Teuchos::Array<gno_t> iIn(nrecvs);
    Teuchos::Array<gno_t> jIn(nrecvs);
    Teuchos::Array<scalar_t> vIn(nrecvs);

    // TODO: With more clever packing, could do only one round of communication
    plan.doPostsAndWaits<gno_t>(iOut(0,sendCnt), 1, iIn());
    plan.doPostsAndWaits<gno_t>(jOut(0,sendCnt), 1, jIn());
    plan.doPostsAndWaits<scalar_t>(vOut(0,sendCnt), 1, vIn());

    // Put received nonzeros in map
    for (size_t n = 0; n < nrecvs; n++) {
      NZindex_t nz(iIn[n], jIn[n]);
      localNZ[nz] = vIn[n];
    }

    redistributed = true;
  }

  Teuchos::RCP<matrix_t> getPermutationMatrix() const { return permMatrix; }

private:
  // Initially distribute nonzeros with a 1D linear distribution
  Distribution1DLinear<gno_t,scalar_t> initialDist;

  // Flag indicating whether matrix should be reordered and renumbered
  // in decreasing sort order of number of nonzeros per row in full matrix
  bool sortByDegree;

  // Permutation matrix; built only when sortByDegree = true
  Teuchos::RCP<matrix_t> permMatrix;

  // Flag whether redistribution has occurred yet.
  // This is true
  //  i) after Tpetra performs the redistribution, or
  // ii) when Tpetra reads already-distributed nonzeros via the
  //     readPerProcess function
  bool redistributed;

  // If we read the already-distributed nonzeros from per-process files,
  // this will remain false until a triangle counting code actually computes
  // the chunks when the need arises.
  bool chunksComputed;

  int nChunks;           // in np = q(q+1)/2 layout, nChunks = q
  double nChunksPerRow;
  Teuchos::Array<gno_t> chunkCuts;

  int findIdxInChunks(gno_t I) {
    int m = I * nChunksPerRow;           // initial guess
    while (I < chunkCuts[m]) m--;
    while (I >= chunkCuts[m+1]) m++;
    return m;
  }

  int procFromChunks(gno_t I, gno_t J) {
    int m = findIdxInChunks(I);
    int n = findIdxInChunks(J);
    int p = m*(m+1)/2 + n;
    return p;
  }
};


// Tpetra::Operator that works with the DistributionLowerTriangularBlock

template <typename scalar_t,
          class Node = ::Tpetra::Details::DefaultTypes::node_type>
class LowerTriangularBlockOperator :
      public Tpetra::Operator<scalar_t, Tpetra::Map<>::local_ordinal_type,
                                        Tpetra::Map<>::global_ordinal_type,
                                        Node>
{
public:
  using lno_t = Tpetra::Map<>::local_ordinal_type;
  using gno_t = Tpetra::Map<>::global_ordinal_type;
  using map_t = Tpetra::Map<>;
  using import_t = Tpetra::Import<>;
  using export_t = Tpetra::Export<>;
  using vector_t = Tpetra::Vector<scalar_t>;
  using mvector_t = Tpetra::MultiVector<scalar_t>;
  using matrix_t = Tpetra::CrsMatrix<scalar_t>;
  using dist_t = Tpetra::DistributionLowerTriangularBlock<gno_t, scalar_t>;

  LowerTriangularBlockOperator(
    const Teuchos::RCP<const matrix_t> &lowerTriangularMatrix_,
    const dist_t &dist)
  : lowerTriangularMatrix(lowerTriangularMatrix_),
    permMatrix(dist.getPermutationMatrix())
  {
    // LowerTriangularBlockOperator requires the range map and domain map
    // to be the same. Check it here.
    TEUCHOS_TEST_FOR_EXCEPTION(
      !lowerTriangularMatrix->getRangeMap()->isSameAs(
        *lowerTriangularMatrix->getDomainMap()),
      std::logic_error,
      "The Domain and Range maps of the LowerTriangularBlock matrix "
      "must be the same");

    // Extract diagonal entries

    vector_t diagByRowMap(lowerTriangularMatrix->getRowMap());
    lowerTriangularMatrix->getLocalDiagCopy(diagByRowMap);
    diag = Teuchos::rcp(new vector_t(lowerTriangularMatrix->getRangeMap()));
    export_t exporter(lowerTriangularMatrix->getRowMap(),
                      lowerTriangularMatrix->getRangeMap());
    diag->doExport(diagByRowMap, exporter, Tpetra::ADD);
  }

  void apply(const mvector_t &x, mvector_t &y, Teuchos::ETransp mode,
             scalar_t alpha, scalar_t beta) const
  {
    scalar_t ZERO = Teuchos::ScalarTraits<scalar_t>::zero();
    scalar_t ONE  = Teuchos::ScalarTraits<scalar_t>::one();
    if (alpha == ZERO) {
      if (beta == ZERO) y.putScalar(ZERO);
      else y.scale(beta);
      return;
    }

    if (permMatrix == Teuchos::null) {

      // Multiply lower triangular
      lowerTriangularMatrix->apply(x, y, Teuchos::NO_TRANS, alpha, beta);

      // Multiply upper triangular
      lowerTriangularMatrix->apply(x, y, Teuchos::TRANS, alpha, ONE);

      // Subtract out duplicate diagonal terms
      y.elementWiseMultiply(-alpha, *diag, x, ONE);
    }
    else {

      // With sorting, the LowerTriangularBlock distribution stores (P^T A P)
      // in the CrsMatrix, for permutation matrix P.
      // Thus, apply must compute
      //   y = P (beta (P^T y) + alpha (P^T A P) (P^T x))

      mvector_t xtmp(x.getMap(), x.getNumVectors());
      mvector_t ytmp(y.getMap(), y.getNumVectors());

      permMatrix->apply(x, xtmp, Teuchos::TRANS);
      if (beta != ZERO) permMatrix->apply(y, ytmp, Teuchos::TRANS);

      // Multiply lower triangular
      lowerTriangularMatrix->apply(xtmp, ytmp, Teuchos::NO_TRANS, alpha, beta);

      // Multiply upper triangular
      lowerTriangularMatrix->apply(xtmp, ytmp, Teuchos::TRANS, alpha, ONE);

      // Subtract out duplicate diagonal terms
      ytmp.elementWiseMultiply(-alpha, *diag, xtmp, ONE);

      permMatrix->apply(ytmp, y, Teuchos::NO_TRANS);
    }
  }

  Teuchos::RCP<const map_t> getDomainMap() const {
    return lowerTriangularMatrix->getDomainMap();
  }

  Teuchos::RCP<const map_t> getRangeMap() const {
    return lowerTriangularMatrix->getRangeMap();
  }

  bool hasTransposeApply() const { return true; } // Symmetric matrix

private:
  const Teuchos::RCP<const matrix_t> lowerTriangularMatrix;
  const Teuchos::RCP<const matrix_t> permMatrix;
  Teuchos::RCP<vector_t> diag;
};


}
#endif