ITERATIVE(3S)                                                    ITERATIVE(3S)

NAME
     DIterative, DIterative_DropTol, DIterative_DropStorage - Parallel sparse
     iterative linear system solver

SYNOPSIS
     Fortran synopsis:

       SUBROUTINE DITERATIVE (n, pointers, indices, values, storage, x, b,
                              method, precond, maxiters, convtol, iters,
                              finalres)
       INTEGER n, storage, method, precond, maxiters, iters
       INTEGER pointers(*), indices(*)
       DOUBLE PRECISION values(*), x(*), b(*)
       DOUBLE PRECISION convtol, finalres

       SUBROUTINE DITERATIVE_DROPTOL (DropTolerance)
       DOUBLE PRECISION DropTolerance

       SUBROUTINE DITERATIVE_DROPSTORAGE (Storage_Multiplier)
       DOUBLE PRECISION Storage_Multiplier

     C/C++ synopsis:

       #include <scsl_sparse.h>
       void DIterative (int n, int pointers[], int indices[], double values[],
                        int storage, double x[], double b[], int method,
                        int precond, int maxiters, double convtol, int *iters,
                        double *finalres);
       void DIterative_DropTol (double drop_tolerance);
       void DIterative_DropStorage (double storage_multiplier);

IMPLEMENTATION
     These routines are part of the SCSL Scientific Library and can be loaded
     using either the -lscs or the -lscs_mp option. The -lscs_mp option
     directs the linker to use the multi-processor version of the library.

     When linking to SCSL with -lscs or -lscs_mp, the default integer size is
     4 bytes (32 bits). Another version of SCSL is available in which integers
     are 8 bytes (64 bits). This version allows the user access to larger
     memory sizes and helps when porting legacy Cray codes. It can be loaded
     by using the -lscs_i8 option or the -lscs_i8_mp option. A program may use
     only one of the two versions; 4-byte integer and 8-byte integer library
     calls cannot be mixed.

     The C and C++ prototypes shown above are appropriate for the 4-byte
     integer version of SCSL. When using the 8-byte integer version, the
     variables of type int become long long and the <scsl_sparse_i8.h> header
     file should be included.

DESCRIPTION
     DITERATIVE uses iterative techniques to solve the sparse system of
     equations A x = b. Four different parallel preconditioned iterative
     solvers can be chosen: conjugate gradient (CG) and conjugate residual
     (CR) for symmetric systems, and conjugate gradient squared (CGS) and
     BiCGSTAB, a variant of CGS with smoother convergence properties, for
     unsymmetric systems.

     Several preconditioning schemes are available: Jacobi, symmetric
     successive over-relaxation (SSOR), incomplete LU (ILU) by pattern, also
     known as no-fill ILU, and incomplete LU by value, also known as
     thresholded ILU. In this release, the incomplete LU preconditioners are
     available only for symmetric matrices; specifically, they are incomplete
     LDLT (ILDLT) preconditioners. The ILDLT by value preconditioner
     currently does not run in parallel.

     Two additional routines control parameters for the ILDLT by value
     preconditioner:

     *   DIterative_DropTol() allows the user to set the drop tolerance for
         the incomplete factorization.

     *   DIterative_DropStorage() allows the user to control the amount of
         storage used for the incomplete factor.

   Sparse Matrix Format
     Sparse matrix A must be input to DIterative in Compressed Sparse Column
     Storage format (CSC) (also known as Harwell-Boeing format) or Compressed
     Sparse Row Storage format (CSR). The matrix is held in three arrays:
     pointers, indices, and values.

     In CSC format, the indices array contains the row indices of the
     non-zeros in A. The values array holds the corresponding non-zero
     values. The pointers array contains the index in indices for the first
     non-zero in each column of A. Thus, the row indices for the non-zeros in
     column i can be found in locations indices[pointers[i]] through
     indices[pointers[i+1]-1]. The corresponding values can be found in
     locations values[pointers[i]] through values[pointers[i+1]-1].

     For a symmetric matrix A, the user must input either the lower or upper
     triangle of A, but not both. Non-zeros within a column of A can be
     stored in any order.

     In the following example, the symmetric matrix (lower triangle shown):

          1.0
          0.0  3.0
          2.0  0.0  5.0
          0.0  4.0  0.0  6.0

     stored in CSC format would be represented in FORTRAN as follows:

          INTEGER pointers(5), indices(6), i
          DOUBLE PRECISION values(6)
          DATA (pointers(i), i = 1, 5) / 1, 3, 5, 6, 7 /
          DATA (indices(i), i = 1, 6) / 1, 3, 2, 4, 3, 4 /
          DATA (values(i), i = 1, 6) / 1.0, 2.0, 3.0, 4.0, 5.0, 6.0 /

     Zero-based indexing is used in C, so the pointers and indices arrays
     would instead contain the following:

          int pointers[] = {0, 2, 4, 5, 6};
          int indices[] = {0, 2, 1, 3, 2, 3};
          double values[] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};

     In CSR format, the indices array contains the column indices of the
     non-zeros in A. The values array holds the corresponding non-zero
     values. The pointers array contains the index in indices for the first
     non-zero in each row of A. Thus, the column indices for the non-zeros in
     row i can be found in locations indices[pointers[i]] through
     indices[pointers[i+1]-1]. The corresponding values can be found in
     locations values[pointers[i]] through values[pointers[i+1]-1].
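
     The index arithmetic described for both formats can be sketched in C as
     a single loop over the compressed dimension. This is only an
     illustration: the helper name expand_entries, its output arrays, and the
     example_* array names are assumptions and not part of SCSL. The sketch
     assumes zero-based arrays, as in the C examples, and interprets its
     storage parameter the same way as the storage argument of DIterative
     (0 for CSC, 1 for CSR).

```c
/* Illustrative helper, not an SCSL routine: expand a zero-based compressed
 * matrix into (row, column, value) triples.
 * storage: 0 = stored by columns (CSC), 1 = stored by rows (CSR).
 * The output arrays must have room for pointers[n] entries.
 * Returns the number of entries written. */
int expand_entries(int n, const int pointers[], const int indices[],
                   const double values[], int storage,
                   int rows[], int cols[], double vals[])
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        /* Column i (CSC) or row i (CSR) occupies positions
         * pointers[i] through pointers[i+1]-1. */
        for (int k = pointers[i]; k < pointers[i + 1]; k++) {
            rows[count] = (storage == 0) ? indices[k] : i;
            cols[count] = (storage == 0) ? i : indices[k];
            vals[count] = values[k];
            count++;
        }
    }
    return count;
}

/* Lower triangle of the 4x4 symmetric example matrix, CSC, zero-based. */
static const int example_pointers[] = {0, 2, 4, 5, 6};
static const int example_indices[] = {0, 2, 1, 3, 2, 3};
static const double example_values[] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
```

     Called with n = 4 and storage = 0 on the arrays above, the helper
     reports six entries; the second one is row 2, column 0 with value 2.0,
     which is the first entry of the third row of the triangle shown earlier.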

     Using the same symmetric matrix as in the CSC example above:

          1.0
          0.0  3.0
          2.0  0.0  5.0
          0.0  4.0  0.0  6.0

     the corresponding CSR format would be represented in FORTRAN as follows:

          INTEGER pointers(5), indices(6), i
          DOUBLE PRECISION values(6)
          DATA (pointers(i), i = 1, 5) / 1, 2, 3, 5, 7 /
          DATA (indices(i), i = 1, 6) / 1, 2, 1, 3, 2, 4 /
          DATA (values(i), i = 1, 6) / 1.0, 3.0, 2.0, 5.0, 4.0, 6.0 /

     Zero-based indexing is used in C, so the pointers and indices arrays
     would instead contain the following:

          int pointers[] = {0, 1, 2, 4, 6};
          int indices[] = {0, 1, 0, 2, 1, 3};
          double values[] = {1.0, 3.0, 2.0, 5.0, 4.0, 6.0};

     These routines have the following arguments:

     n         (input) Integer. The number of rows and columns in the matrix
               A. n >= 0.

     pointers, indices, values
               (input) The pointers and indices arrays store the non-zero
               structure of sparse input matrix A in Compressed Sparse Column
               (CSC) or Compressed Sparse Row (CSR) format. In CSC format,
               the pointers array stores n+1 integers, where pointers[i]
               gives the index in indices of the first non-zero in column i
               of A. The indices array stores the row indices of the
               non-zeros in A. The values array stores the non-zero values in
               the matrix A.

     storage   (input) An integer. Specifies whether the matrix is stored by
               columns or by rows. If storage = 0, the matrix is stored by
               columns (CSC); if storage = 1, the matrix is stored by rows
               (CSR).

     x         (input/output) The initial guess and final solution vector.

     b         (input) The right-hand-side vector in a DIterative call.

     method    (input) An integer specifying the iterative method used.

               method = 0:  conjugate gradient

               method = 1:  conjugate residual

               method = 10: conjugate gradient squared

               method = 11: BiCGSTAB

     precond   (input) An integer specifying the preconditioner used.
               0 <= precond <= 3.

               precond = 0: use Jacobi preconditioner. This option is not yet
               supported for unsymmetric matrices stored in CSC format.

               precond = 1: use SSOR preconditioner

               precond = 2: use no-fill ILU preconditioner

               precond = 3: use thresholded ILU preconditioner

     maxiters  (input) An integer. The solver terminates once it has
               performed maxiters iterations.

     convtol   (input) A double precision number. The solver terminates when
               the norm of the residual relative to the norm of the
               right-hand-side is less than convtol.

     iters     (output) The number of iterations performed by DIterative.

     finalres  (output) A double precision number containing the final 2-norm
               of the residual.

     drop_tolerance
               (input) A double precision argument to DIterative_DropTol. In
               thresholded factorization, an entry is discarded if it is
               smaller than drop_tolerance times the corresponding diagonal
               element.

     storage_multiplier
               (input) A double precision argument to DIterative_DropStorage.
               In thresholded factorization, the drop tolerance is
               automatically increased if the incomplete factor matrix
               contains more than storage_multiplier times the number of
               non-zero values in A.
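
     The stopping test described for convtol can be checked independently
     once a solution vector is available. The following C sketch is written
     under stated assumptions: the helper names sym_matvec and
     residual_converged are hypothetical and not part of SCSL, the matrix is
     symmetric with only one triangle stored, indexing is zero-based, and the
     2-norm is assumed for both norms (this page names the 2-norm only for
     finalres). Squared norms are compared so that no square root is needed.

```c
/* Hypothetical helper: y = A*x for a symmetric n-by-n matrix of which only
 * one triangle is stored in zero-based CSC or CSR form. For a symmetric
 * matrix the same loop serves both layouts, because each stored
 * off-diagonal entry a(i,j) also stands for a(j,i). */
void sym_matvec(int n, const int pointers[], const int indices[],
                const double values[], const double x[], double y[])
{
    for (int i = 0; i < n; i++)
        y[i] = 0.0;
    for (int i = 0; i < n; i++) {
        for (int k = pointers[i]; k < pointers[i + 1]; k++) {
            int j = indices[k];
            y[j] += values[k] * x[i];
            if (j != i)                 /* mirror the off-diagonal entry */
                y[i] += values[k] * x[j];
        }
    }
}

/* Hypothetical helper: returns 1 if ||b - A*x|| < convtol * ||b||
 * (assumed 2-norms, compared in squared form), otherwise 0.
 * work must hold n doubles. */
int residual_converged(int n, const int pointers[], const int indices[],
                       const double values[], const double x[],
                       const double b[], double convtol, double work[])
{
    double rnorm2 = 0.0, bnorm2 = 0.0;
    sym_matvec(n, pointers, indices, values, x, work);
    for (int i = 0; i < n; i++) {
        double r = b[i] - work[i];
        rnorm2 += r * r;
        bnorm2 += b[i] * b[i];
    }
    return rnorm2 < convtol * convtol * bnorm2;
}
```

     For the 4-by-4 example matrix and x set to all ones, sym_matvec yields
     (3, 7, 7, 10), so with that vector as b the test passes for any positive
     convtol, while an all-zero x fails it for any convtol not greater
     than 1.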

ENVIRONMENT VARIABLES
     Environment variables can control various run-time features:

     *   ITERATIVE_VERBOSE prints messages about steps taken during
         factorization.

     *   ITERATIVE_DUMP prints the matrix into the file "ppcr.mat" in CSC
         (Harwell-Boeing) format.

     *   ITERATIVE_RCM controls matrix reordering. Reordering of the matrix
         is not done by default. This variable, if defined, must be set to
         one of the following values:

         0    No reordering of the matrix is done (equivalent to not setting
              ITERATIVE_RCM).

         1    A trimmed-down version of Cuthill-McKee is performed, with only
              the search for peripheral nodes and level sets.

         -1   Reverse Cuthill-McKee reordering is performed.

     *   OMP_NUM_THREADS determines the number of processors that are used by
         the iterative solver.

NOTES
     These routines are optimized and parallelized for the SGI R8000, R10000,
     R12000 and R14000 platforms.

SEE ALSO
     INTRO_SCSL(3S), INTRO_SOLVERS(3S)