INTRO_SHMEM(3)                                                INTRO_SHMEM(3)

NAME
     intro_shmem - Introduction to logically shared memory access routines

DESCRIPTION
     The logically shared, distributed memory access (SHMEM) routines
     provide low-latency, high-bandwidth communication for use in highly
     parallelized scalable programs. The SHMEM routines are data passing
     library routines similar to message passing library routines. They
     can be used as an alternative to message passing routines such as
     Message Passing Interface (MPI) or Parallel Virtual Machine (PVM).
     Like the message passing routines, the SHMEM routines pass data
     between cooperating parallel processes.

     SHMEM routines can be used in programs that perform computations in
     separate address spaces and that explicitly pass data to and from
     different processes in the program. These processes are also called
     processing elements (PEs).

     The SHMEM routines minimize the overhead associated with data
     passing requests, maximize bandwidth, and minimize data latency.
     Data latency is the period of time that starts when a PE initiates a
     transfer of data and ends when a PE can use the data.

     SHMEM routines support remote data transfer through put operations,
     which transfer data to a different PE, and get operations, which
     transfer data from a different PE. Other operations supported are
     work-shared broadcast and reduction, barrier synchronization, and
     atomic memory operations. An atomic memory operation is an atomic
     read-and-update operation, such as a fetch-and-increment, on a
     remote or local data object. The value read is guaranteed to be the
     value of the data object just prior to the update.

   SHMEM Routines
     The following SHMEM-related routines enhance the portability of
     SHMEM programs across platforms.

     * PE queries:
          C/C++ only: _num_pes(3I), _my_pe(3I)
          Fortran only: NUM_PES(3I), MY_PE(3I)

     * Block data put routines:
          C/C++ and Fortran: shmem_put32, shmem_put64, shmem_put128
          C/C++ only: shmem_double_put, shmem_float_put, shmem_int_put,
               shmem_long_put, shmem_short_put
          Fortran only: shmem_complex_put, shmem_integer_put,
               shmem_logical_put, shmem_real_put

     * Block data get routines:
          C/C++ and Fortran: shmem_get32, shmem_get64, shmem_get128
          C/C++ only: shmem_double_get, shmem_float_get, shmem_int_get,
               shmem_long_get, shmem_short_get
          Fortran only: shmem_complex_get, shmem_integer_get,
               shmem_logical_get, shmem_real_get

     * Strided put routines:
          C/C++ and Fortran: shmem_iput32, shmem_iput64, shmem_iput128
          C/C++ only:
               shmem_double_iput, shmem_float_iput, shmem_int_iput,
               shmem_long_iput, shmem_short_iput
          Fortran only: shmem_complex_iput, shmem_integer_iput,
               shmem_logical_iput, shmem_real_iput

     * Strided get routines:
          C/C++ and Fortran: shmem_iget32, shmem_iget64, shmem_iget128
          C/C++ only: shmem_double_iget, shmem_float_iget,
               shmem_int_iget, shmem_long_iget, shmem_short_iget
          Fortran only: shmem_complex_iget, shmem_integer_iget,
               shmem_logical_iget, shmem_real_iget

     * Point-to-point synchronization routines:
          C/C++ only: shmem_int_wait, shmem_int_wait_until,
               shmem_long_wait, shmem_long_wait_until,
               shmem_longlong_wait, shmem_longlong_wait_until,
               shmem_short_wait, shmem_short_wait_until
          Fortran: shmem_int4_wait, shmem_int4_wait_until,
               shmem_int8_wait, shmem_int8_wait_until

     * Barrier synchronization routines:
          C/C++ and Fortran: shmem_barrier_all, shmem_barrier

     * Atomic memory fetch-and-operate (fetch-op) routines:
          C/C++ and Fortran: shmem_swap

     * Reduction routines:
          C/C++ only: shmem_int_and_to_all, shmem_long_and_to_all,
               shmem_longlong_and_to_all, shmem_short_and_to_all,
               shmem_double_max_to_all, shmem_float_max_to_all,
               shmem_int_max_to_all, shmem_long_max_to_all,
               shmem_longlong_max_to_all, shmem_short_max_to_all,
               shmem_double_min_to_all, shmem_float_min_to_all,
               shmem_int_min_to_all,
               shmem_long_min_to_all, shmem_longlong_min_to_all,
               shmem_short_min_to_all, shmem_double_sum_to_all,
               shmem_float_sum_to_all, shmem_int_sum_to_all,
               shmem_long_sum_to_all, shmem_longlong_sum_to_all,
               shmem_short_sum_to_all, shmem_double_prod_to_all,
               shmem_float_prod_to_all, shmem_int_prod_to_all,
               shmem_long_prod_to_all, shmem_longlong_prod_to_all,
               shmem_short_prod_to_all, shmem_int_or_to_all,
               shmem_long_or_to_all, shmem_longlong_or_to_all,
               shmem_short_or_to_all, shmem_int_xor_to_all,
               shmem_long_xor_to_all,
               shmem_longlong_xor_to_all, shmem_short_xor_to_all
          Fortran only: shmem_int4_and_to_all, shmem_int8_and_to_all,
               shmem_real4_max_to_all, shmem_real8_max_to_all,
               shmem_int4_max_to_all, shmem_int8_max_to_all,
               shmem_real4_min_to_all, shmem_real8_min_to_all,
               shmem_int4_min_to_all, shmem_int8_min_to_all,
               shmem_real4_sum_to_all, shmem_real8_sum_to_all,
               shmem_int4_sum_to_all, shmem_int8_sum_to_all,
               shmem_real4_prod_to_all, shmem_real8_prod_to_all,
               shmem_int4_prod_to_all, shmem_int8_prod_to_all,
               shmem_int4_or_to_all, shmem_int8_or_to_all,
               shmem_int4_xor_to_all, shmem_int8_xor_to_all

     * Broadcast routines:
          C/C++ and Fortran: shmem_broadcast32, shmem_broadcast64

     * Generalized barrier synchronization routine:
          C/C++ and Fortran: shmem_barrier

     * Cache management routines:
          C/C++ and Fortran: shmem_udcflush, shmem_udcflush_line

     * Byte-granularity block put and get routines:
          C/C++ and Fortran: shmem_putmem and shmem_getmem
          Fortran only: shmem_character_put and shmem_character_get

     * Collect routines:
          C/C++ and Fortran: shmem_collect32, shmem_collect64,
               shmem_fcollect32, shmem_fcollect64

     * Atomic memory fetch-and-operate (fetch-op) routines:
          C/C++ only: shmem_double_swap, shmem_float_swap,
               shmem_int_cswap, shmem_int_fadd, shmem_int_finc,
               shmem_int_swap, shmem_long_cswap, shmem_long_fadd,
               shmem_long_finc, shmem_long_swap,
               shmem_longlong_cswap, shmem_longlong_fadd,
               shmem_longlong_finc, shmem_longlong_swap
          Fortran only: shmem_int4_cswap, shmem_int4_fadd,
               shmem_int4_finc, shmem_int4_swap, shmem_int8_swap,
               shmem_real4_swap, shmem_real8_swap, shmem_int8_cswap

     * Atomic memory operation routines:
          Fortran only: shmem_int4_add, shmem_int4_inc

     * Remote memory pointer function:
          C/C++ and Fortran: shmem_ptr

     * Reduction routines:
          C/C++ only: shmem_longdouble_max_to_all,
               shmem_longdouble_min_to_all,
               shmem_longdouble_prod_to_all,
               shmem_longdouble_sum_to_all
          Fortran only: shmem_real16_max_to_all,
               shmem_real16_min_to_all, shmem_real16_prod_to_all,
               shmem_real16_sum_to_all

   Remotely Accessible Data Objects
     Typically, target or source arrays that reside on remote processing
     elements (PEs) are identified by passing the address of the
     corresponding data object on the local PE. The local existence of a
     corresponding data object implies that a data object is symmetric as
     described on this man page. Symmetric data objects passed to SHMEM
     routines can be arrays or scalars.

     A symmetric data object is one for which the local and remote
     addresses have a known relationship. You can use SHMEM routines to
     access remote symmetric data objects by using the address of the
     corresponding data object on the local PE. The following data
     objects are symmetric:

     * Fortran data objects in common blocks or with the SAVE attribute.
       These data objects must not be defined in a dynamic shared object
       (DSO).

     * Non-stack C and C++ variables. These data objects must not be
       defined in a DSO.

     * Fortran arrays allocated with shpalloc(3F)

     * C and C++ data allocated by shmalloc(3C)

     SHMEM collective routines that operate on the same data object on
     multiple PEs require that symmetric data objects be passed. This
     restriction is for algorithm simplicity and efficiency. These
     routines define the set of target PEs by the following triplet of
     arguments: PE_start, logPE_stride, and PE_size.

   Collective Routines
     Some SHMEM routines, for example, shmem_broadcast(3) and
     shmem_float_sum_to_all(3), are classified as collective routines
     because they distribute work across a set of PEs.
     They must be called concurrently by all PEs in the active set
     defined by the PE_start, logPE_stride, PE_size argument triplet. The
     following man pages describe the SHMEM collective routines:

     * shmem_and(3)
     * shmem_barrier(3)
     * shmem_broadcast(3)
     * shmem_collect(3)
     * shmem_max(3)
     * shmem_min(3)
     * shmem_or(3)
     * shmem_prod(3)
     * shmem_sum(3)
     * shmem_xor(3)

   Using the Symmetric Work Array, pSync
     Multiple pSync arrays are often needed if a particular PE calls a
     SHMEM collective routine twice without intervening barrier
     synchronization. Problems would occur if some PEs in the active set
     for call 2 arrive at call 2 before processing of call 1 is complete
     by all PEs in the call 1 active set. You can use shmem_barrier(3) or
     shmem_barrier_all(3) to perform a barrier synchronization between
     consecutive calls to SHMEM collective routines.

     There are two special cases:

     * The shmem_barrier(3) routine allows the same pSync array to be
       used on consecutive calls as long as the active PE set does not
       change.

     * If the same collective routine is called multiple times with the
       same active set, the calls may alternate between two pSync arrays.
       The SHMEM routines guarantee that a first call is completely
       finished by all PEs by the time processing of a third call begins
       on any PE.
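
     The active set triplet and the two-array pSync alternation described
     above can be illustrated with a small sketch in plain C. It does not
     call any SHMEM routine. The helper names in_active_set and psync_slot
     are hypothetical and are not part of libsma. The sketch assumes that
     the active set consists of the PEs PE_start + i * 2^logPE_stride for
     i = 0 through PE_size - 1, which is how logPE_stride (the base-2
     logarithm of the stride) is conventionally defined; see the man page
     of the individual collective routine for the authoritative
     definition.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of SHMEM: returns true if PE `pe` belongs
 * to the active set (pe_start, log_pe_stride, pe_size), assuming the
 * members are pe_start + i * 2^log_pe_stride for i in [0, pe_size).
 */
bool in_active_set(int pe, int pe_start, int log_pe_stride, int pe_size)
{
    assert(pe_start >= 0 && pe_size > 0);
    assert(log_pe_stride >= 0 && log_pe_stride < 31);

    int stride = 1 << log_pe_stride;   /* distance between member PEs */
    int offset = pe - pe_start;        /* position relative to first PE */

    if (offset < 0 || offset % stride != 0)
        return false;                  /* below the set or off-stride */
    return offset / stride < pe_size;  /* within the first pe_size members */
}

/*
 * Hypothetical helper: which of two pSync arrays (0 or 1) to pass on the
 * n-th consecutive call (counting from 0) to the same collective routine
 * with the same active set, when alternating between the two arrays.
 */
int psync_slot(unsigned long call_number)
{
    return (int)(call_number % 2UL);
}
```

     For example, under the stated assumption the triplet (0, 1, 4)
     selects PEs 0, 2, 4, and 6, so in_active_set(6, 0, 1, 4) is true and
     in_active_set(3, 0, 1, 4) is false.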

     Because the SHMEM routines restore pSync to its original contents,
     multiple calls that use the same pSync array do not require that
     pSync be reinitialized after the first call.

   SHMEM Function Inlining
     Some SHMEM functions that can be called from C/C++ are defined in
     the form of macros in the mpp/shmem.h header file. These functions
     are inlined by default on some platforms. To deactivate the
     automatic inlining of SHMEM functions from C/C++, add the following
     option to your C/C++ command line: -D_SHMEM_MACRO_OPT=0.

   SHMEM Application Placement on NUMA Systems
     On non-uniform memory access (NUMA) systems, such as Origin series
     systems, SHMEM start-up processing ensures that the process
     associated with a SHMEM PE executes on a processor near the memory
     associated with a SHMEM PE. The following environment variables
     allow you to control the placement of the SHMEM application on the
     system:

     Variable
          Description

     PAGESIZE_DATA
          Specifies the desired page size in kilobytes for program data
          areas. Specify an integer value. On Origin series systems,
          supported values include 16, 64, 256, 1024, and 4096.

     SMA_BAR_COUNTER
          Specifies the use of a simple counter barrier algorithm. By
          default, this variable is not enabled for jobs with PE counts
          of 64 or more.

     SMA_BAR_DISSEM
          Specifies the use of the alternate barrier algorithm, the
          dissemination/butterfly, within the shmem_barrier_all(3)
          function.
          This alternate algorithm provides better performance on jobs
          with larger PE counts. The SMA_BAR_DISSEM option is enabled by
          default for jobs with PE counts of 64 or higher. It is not
          enabled by default for jobs with PE counts below 64.

     SMA_DBX
          Specifies the PE number to be debugged. If you set SMA_DBX to
          n, PE n prints a message during program startup, describing how
          to attach to it with the DBX debugger. PE n sleeps for seven
          seconds. If you set SMA_DBX to n,s, PE n will sleep for s
          seconds.

     SMA_DPLACE_INTEROP_OFF
          Disables a SHMEM/dplace interoperability feature available
          beginning with IRIX 6.5.13. By setting this variable, you can
          obtain the behavior of SHMEM with dplace on older releases of
          IRIX. By default, this variable is not enabled.

     SMA_DSM_CPULIST
          Specifies a list of CPUs on which to run a SHMEM application.
          To ensure that processes are linked to CPUs, this variable
          should be used in conjunction with SMA_DSM_MUSTRUN. For an
          explanation of the syntax for this environment variable, see
          the section entitled "Using a CPU List."

     SMA_DSM_MUSTRUN
          Enforces memory locality for SHMEM processes. Use of this
          feature ensures that each SHMEM process will get a CPU and
          physical memory on the node to which it was originally
          assigned. This variable has been observed to improve program
          performance on IRIX systems running release 6.5.7 and earlier,
          when running a program on a quiet system. With later IRIX
          releases, under certain circumstances, setting this variable is
          not necessary.
          Internally, this feature directs the library to use the
          process_cpulink(3) function instead of process_mldlink(3) to
          control memory placement. SMA_DSM_MUSTRUN should not be used
          when the job is submitted to miser (see miser_submit(1))
          because program hangs may result. By default, this variable is
          not enabled.

          The process_cpulink(3) function is inherited across process
          fork(2) or sproc(2). For this reason, when using mixed
          SHMEM/OpenMP applications, it is recommended either that this
          variable not be set, or that _DSM_MUSTRUN also be set (see
          p_environ(5)).

     SMA_DSM_OFF
          When set to any value, deactivates processor-memory affinity
          control. When set, SHMEM processes run on any available
          processor, whether or not it is near the memory associated with
          that process.

     SMA_DSM_PPM
          When set to an integer value, specifies the number of
          processors to be mapped to every memory. The default is 2 on
          Origin 2000 systems. The default is 4 on Origin 3000 systems.

     SMA_DSM_TOPOLOGY
          Specifies the shape of the set of hardware nodes on which the
          PE memories are allocated. Set this variable to one of the
          following values:

          Value
               Action

          cube
               A group of memory nodes that form a perfect hypercube.
               NPES/SMA_DSM_PPM must be a power of 2. If a perfect
               hypercube is unavailable, a less restrictive placement
               will be used.

          cube_fixed
               A group of memory nodes that form a perfect hypercube.
               NPES/SMA_DSM_PPM must be a power of 2. If a perfect
               hypercube is unavailable, the placement will fail,
               disabling NUMA placement.

          cpucluster
               Any group of memory nodes. The operating system attempts
               to place the group numbers close to one another, taking
               into account nodes with disabled processors. (Default for
               IRIX 6.5.11 and higher.)

          free
               Any group of memory nodes. The operating system attempts
               to place the group numbers close to one another. (Default
               for IRIX 6.5.10 and earlier releases.)

     SMA_DSM_VERBOSE
          When set to any value, writes information about process and
          memory placement to stderr.

     SMA_INFO
          Prints information about environment variables that can control
          libsma execution.

     SMA_SYMMETRIC_SIZE
          Specifies the size, in bytes, of symmetric memory. This is the
          size of static space plus per-PE symmetric heap size.

     SMA_VERSION
          Prints the libsma library release version.

   Using a CPU List
     You can manually select CPUs to use for a SHMEM application by
     setting the SMA_DSM_CPULIST shell variable. This is treated as a
     comma- and/or hyphen-delimited ordered list, specifying a mapping of
     SHMEM processes to CPUs. The shepherd process is not included in
     this list.

     Examples:

     Value
          CPU Assignment

     8,16,32
          Place three SHMEM processes on CPUs 8, 16, and 32.

     32,16,8
          Place the SHMEM process rank zero on CPU 32, one on 16, and two
          on CPU 8.

     8-15,32-39
          Place the SHMEM processes 0 through 7 on CPUs 8 to 15. Place
          the SHMEM processes 8 through 15 on CPUs 32 to 39.

     39-32,8-15
          Place the SHMEM processes 0 through 7 on CPUs 39 to 32. Place
          the SHMEM processes 8 through 15 on CPUs 8 to 15.

     Note that the process rank is the value returned by _my_pe(3I). CPUs
     are associated with the cpunum values given in the hardware graph
     (hwgraph(4)).

     The number of processors specified must equal the number of SHMEM
     processes (excluding the shepherd process) that will be used. If an
     error occurs in processing the CPU list, the default placement
     policy is used.

   Using dplace(1)
     The environment variables described previously allow you to map
     SHMEM processes and memories with hardware processors and nodes. The
     dplace(1) command, which is available on Origin series systems, can
     give you additional control over application placement. Perform the
     following steps to use the dplace(1) command with SHMEM programs:

     * Create file placefile with these contents:

          threads $NPES + 1
          memories ($NPES +1)/2 in topology cube
          distribute threads 1:$NPES across memories

     * Execute your program with NPES set to the number of PEs. For
       example, to run with 4 PEs, invoke your program this way:

          env NPES=4 dplace -place placefile a.out

   Interoperability
     SHMEM routines can be used in conjunction with MPI message passing
     routines in the same application. Programs that use both MPI and
     SHMEM should call MPI_Init and MPI_Finalize but omit the call to the
     start_pes routine. SHMEM PE numbers are equal to the MPI rank within
     the MPI_COMM_WORLD communicator.

     On IRIX clustered systems, you can use SHMEM to communicate only
     with processes running on the same host.
     Use the shmem_pe_accessible function to determine if a remote PE is
     accessible via SHMEM communication from the local PE.

   Compiling SHMEM Programs
     The SHMEM routines reside in libsma.so. The following sample command
     lines compile programs that include SHMEM routines:

     * IRIX systems:

          cc -64 c_program.c -lsma
          CC -64 cplusplus_program.c -lsma
          f90 -64 -LANG:recursive=on fortran_program.f -lsma
          f77 -64 -LANG:recursive=on fortran_program.f -lsma

     * IRIX systems with Fortran 90 version 7.2.1 available:

          f90 -64 -LANG:recursive=on -auto_use shmem_interface
               fortran_program.f -lsma

     The shmem_interface module is intended for use only with the
     -auto_use option. This module provides compile-time checking of
     interfaces. The keyword=arg actual argument format is not supported
     for SHMEM subroutines defined in the shmem_interface procedure
     interface module.

     The IRIX N32 ABI, selected by the -n32 compiler option, is also
     supported by SHMEM, but is recommended only for small process counts
     and program memory sizes, due to the limitation in the size of
     virtual addresses imposed by the N32 ABI. The use of the N64 ABI,
     selected by the -64 compiler option, is recommended for most SHMEM
     programs.

   Program Start-up
     The SHMEM implementation uses mapped files to render static memory
     remotely accessible on IRIX systems. The result is that enough file
     space must be available in /var/tmp to accommodate a file of size
     npes * staticsz, where npes is the number of PEs and staticsz is the
     size of the program's static data area. Static data includes Fortran
     common blocks and C/C++ static data.

     If a SHMEM program's memory requirements exceed available file space
     in /var/tmp, a SHMEM run-time error message is generated. You can
     use the TMPDIR environment variable to select a directory in a file
     system with sufficient file space.

     To minimize SHMEM program start-up time, use symmetric memory
     allocated by the SHPALLOC(3F) or shmalloc(3C) routines instead of
     static memory. Memory allocated by these routines does not require a
     corresponding file space allocation in /var/tmp. This avoids
     problems when file space is low, and the program starts more quickly
     when start-up processing would otherwise need to handle large static
     memory areas.

NOTES
     The SHMEM software is packaged with the Message Passing Toolkit.

ENVIRONMENT VARIABLES
     For information on environment variables that affect SHMEM routines,
     see the "SHMEM Application Placement on NUMA Systems" section of
     this man page.

EXAMPLES
     Example 1.
The following Fortran SHMEM program runs on IRIX systems:

      PROGRAM REDUCTION
      REAL VALUES, SUM
      COMMON /C/ VALUES
      REAL WORK

      CALL START_PES(0)
      VALUES = MY_PE()
      CALL SHMEM_BARRIER_ALL            ! Synchronize all PEs
      SUM = 0.0
      DO I = 0,NUM_PES()-1
         CALL SHMEM_REAL_GET(WORK, VALUES, 1, I)   ! Get next value
         SUM = SUM + WORK               ! Sum it
      ENDDO
      PRINT*,'PE ',MY_PE(),' COMPUTED SUM=',SUM
      CALL SHMEM_BARRIER_ALL
      END

Because start_pes(3) is called with a value of 0, the number of PEs used to run the program is specified by the NPES environment variable. This Fortran program directs every PE to sum, simultaneously, the values held in the VALUES variable on all PEs. The following command line runs the program with 4 PEs:

      env NPES=4 a.out

Example 2. The following C SHMEM program runs on IRIX systems:

      #include <stdio.h>
      #include <mpp/shmem.h>

      int main(void)
      {
          long source[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
          static long target[10];

          start_pes(0);
          if (_my_pe() == 0) {
              /* put 10 words into target on PE 1 */
              shmem_long_put(target, source, 10, 1);
          }
          shmem_barrier_all();   /* sync sender and receiver */
          if (_my_pe() == 1)
              shmem_udcflush();  /* needed on T90 */
          /* target is long, so print it with %ld */
          printf("target[0] on PE %d is %ld\n", _my_pe(), target[0]);
          return 0;
      }

In this C program, PE 0 sends 10 long integers to the target array on PE 1. The following command line runs the program with 2 PEs:

      env NPES=2 a.out

SEE ALSO

dplace(1)

The following man pages also contain information on SHMEM routines. See the specific man pages for implementation information.
CC(1), cld(1), f90(1), f90(1M), mpprun(1)

shmem_add(3), shmem_and(3), shmem_barrier(3), shmem_barrier_all(3), shmem_broadcast(3), shmem_cache(3), shmem_collect(3), shmem_cswap(3), shmem_event(3), shmem_fadd(3), shmem_fence(3), shmem_finc(3), shmem_get(3), shmem_iget(3), shmem_inc(3), shmem_iput(3), shmem_ixput(3), shmem_lock(3), shmem_max(3), shmem_min(3), shmem_mswap(3), shmem_my_pe(3), shmem_or(3), shmem_prod(3), shmem_put(3), shmem_quiet(3), shmem_short_g(3), shmem_short_p(3), shmem_stack(3), shmem_sum(3), shmem_swap(3), shmem_wait(3), shmem_xor(3), start_pes(3)

shmalloc(3C)

shpalloc(3F)

MY_PE(3I), NUM_PES(3I)

For information on using SHMEM routines with message passing routines, see the Message Passing Toolkit: PVM Programmer's Manual, or the Message Passing Toolkit: MPI Programmer's Manual.
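
The file-space rule given in the Program Start-up section (npes * staticsz bytes in /var/tmp, or in the directory selected by TMPDIR) can be checked before a job is launched. The following C fragment is only a minimal sketch and is not part of the SHMEM library. The helper names shmem_static_file_bytes, shmem_tmp_dir, and dir_has_space are hypothetical, the use of statvfs(3) assumes a POSIX system, and treating an empty TMPDIR as unset is an assumption.

```c
#include <stdlib.h>
#include <sys/statvfs.h>

/* Hypothetical helper: file space needed for the mapped static data,
 * computed as npes * staticsz (see Program Start-up). */
unsigned long long shmem_static_file_bytes(unsigned long long npes,
                                           unsigned long long staticsz)
{
    return npes * staticsz;
}

/* Hypothetical helper: directory assumed to hold the mapped file.
 * Returns TMPDIR if it is set and non-empty, otherwise /var/tmp. */
const char *shmem_tmp_dir(void)
{
    const char *dir = getenv("TMPDIR");
    return (dir != NULL && dir[0] != '\0') ? dir : "/var/tmp";
}

/* Hypothetical helper: returns 1 if the file system holding dir has at
 * least `needed` bytes available to unprivileged users, 0 if it does
 * not, and -1 if statvfs() fails (for example, dir does not exist). */
int dir_has_space(const char *dir, unsigned long long needed)
{
    struct statvfs vfs;
    unsigned long long avail;

    if (statvfs(dir, &vfs) != 0)
        return -1;
    avail = (unsigned long long)vfs.f_bavail * vfs.f_frsize;
    return avail >= needed ? 1 : 0;
}
```

A caller might evaluate dir_has_space(shmem_tmp_dir(), shmem_static_file_bytes(npes, staticsz)) and, when the result is 0, advise the user to point TMPDIR at a larger file system or to allocate the data with shmalloc(3C) instead of static memory.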