IRIX Base Documentation 2001 May

Text File | 2001-04-17 | 42.6 KB | 1,189 lines

7. Device Specific Hints and Features

7.1 Max Impact, High Impact, Solid Impact, MXI, SSI, SI

7.1.1 Pbuffers on Impact

The OpenGL Pbuffer extension is now supported on all Impact configurations: Max Impact, High Impact, and Solid Impact. These release notes describe the Impact Pbuffer implementation in detail. This will be of use primarily to developers of OpenGL applications that use pbuffers.

7.1.1.1 Pbuffer Extensions

Note that you must use the FBConfig extension to create pbuffers. Also, if you want to copy between pbuffers and windows, you must use the MakeCurrentRead extension. This release includes all three extensions: Pbuffers, MakeCurrentRead, and FBConfig.

7.1.1.2 Pbuffer Sizes

Each pbuffer allocated consumes a set of bitplanes with the same geometry as the entire screen. So, if you run the monitor at 1280x1024, allocating a single pbuffer will actually consume 1280x1024 of offscreen framebuffer memory. It is impossible to create a pbuffer larger than the screen size. You can create a smaller one, but the remaining screen memory will be wasted and cannot be allocated.

In some Impact configurations (e.g. Max Impact), there are multiple sets of bitplanes available to pbuffers (pbuffer banks), so it is possible to allocate more than one pbuffer at a time. See the section below called "Example pbuffer bank calculations."

7.1.1.3 Pbuffer formats

All X visuals (except overlays) support pbuffers. This includes double-buffered visuals, stereo quad-buffered visuals, and visuals with depth and stencil planes. All appear to work fine, and allow copies from pbuffers to windows, and from windows to pbuffers. Note that the context used to do this copy must be created with the same fbconfig as the window (not the pbuffer).
See the "glXMakeCurrent compatibility workaround" section below for details.

7.1.1.4 glXMakeCurrent compatibility relaxation

In this release, glXMakeCurrent's strict compatibility requirements have been relaxed for pbuffers (but not for windows). Any context that meets certain very minimal requirements can be used to render into a pbuffer, or to copy from a pbuffer. The only requirement is that the pbuffer and the context have the same renderType; in other words, they must both be color index, or both be rgba. There is no requirement that the color depths of the context match those of the pbuffer, or anything of the sort.

Note that windows still have strict similarity requirements. Contexts and windows bound together with glXMakeCurrent must have been created from the same visual.

7.1.1.5 glXMakeCurrent compatibility workaround

In order to copy between pbuffers and windows, you will need to use the glXMakeCurrentReadSGIX extension. This is all very straightforward if the window, pbuffer, and context you are using were all created with the same FBConfigID (using glXCreateContextWithConfigSGIX). If they were not created from the same FBConfigID, things get more complicated.

The problem arises from a bug in the X server, which requires that windows (but not pbuffers) and contexts be created from the exact same visual (or fbconfig). (This does not follow the spec, since the fbconfig spec specifies that windows should work with any "compatible" context if the context was created from an fbconfig.) You must make sure that the window and context are always created with the EXACT SAME fbconfig or visual id.

So, to copy from a single-buffered (sb) pbuffer to a double-buffered (db) window, you must:

   o  create an sb pbuffer
   o  create a db window
   o  create a db context

This will allow glXMakeCurrentReadSGIX to work correctly around this bug.
Note that the above example would not work if the context were created single-buffered. In that case makecurrent between the sb context and the db window would fail, since the window and context were not created from the exact same visual (or fbconfig).

Note that the db context in the example above may be used with makecurrent(pbuffer), makecurrent(window), makecurrentread(window, pbuffer), or makecurrentread(pbuffer, window), even though the pbuffer is single-buffered and the context double-buffered. This is correct behavior according to the pbuffer spec. Window behavior should be corrected in a subsequent release so that window compatibility is properly tested against contexts.

See the section at the end about Impact-specific glXMakeCurrentRead Compatibility.

7.1.1.6 Pbuffer performance

The single most important thing you can do to ensure good pbuffer performance on Impact is to make sure that the windows your applications use do not use X visuals with Z bitplanes unless absolutely necessary. There are conditions outlined below under "Pbuffer bank calculations" which force zbuffers to swap into host memory when their bank usage conflicts with pbuffers. The simplest way to avoid this is to create all windows using visuals without zbuffers.

The most expensive operation with pbuffers is glXMakeCurrentReadSGIX. Try to minimize your use of this routine in order to maximize performance. To that end, you should also try to minimize the number of contexts you use in your application.

7.1.1.7 Pbuffer bank calculations

In order to calculate how many pbuffers you can have concurrently with your windows, use the following procedure.

Determine how many pbuffer banks are available in your system, using the table below called "Pbuffer bank availability". This is the number of pbuffer banks available on your system.
Now you need to determine how many banks your application requires. Use the table below called "Pbuffer bank usage" to look up how many banks each buffer used by your application requires.

Note that this is a global resource, so you must include in your calculations all applications running on the machine. If another application allocates a pbuffer, then there is one fewer pbuffer bank available for your application. Similarly, if any application uses a Z buffer, there will not be enough pbuffer banks to support a pbuffer at the same time.

Pbuffers are capable of sharing the pbuffer banks with Z buffers, and the X server supports swapping the pbuffer bank when necessary so the bitplanes may be used for both purposes at once. This will incur a substantial performance penalty which may be prohibitive for some applications. In other cases, where applications are willing to accept pbuffer/zbuffer swapping, you may allow a pbuffer to "share" bitplanes with a zbuffer in your calculations.

The one exception to this sharing is that you cannot use glXMakeCurrentReadSGIX with both a window with a zbuffer and a pbuffer that resides in the same bitplanes as that zbuffer. In such a case, glXMakeCurrentReadSGIX will return GL_FALSE and fail.

7.1.1.8 Example pbuffer bank calculations

Note that "overlap" in the lists below refers to zbuffer/pbuffer overlap. Such overlap is not allowed in a single call to glXMakeCurrentReadSGIX, and it may incur swapping performance penalties.
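
The bank-counting procedure described above can be sketched in C as follows. This is a minimal sketch and not part of the release: the struct and function names are hypothetical, every pbuffer is assumed to be single-buffered (one bank each), windows without depth or stencil are assumed to cost no pbuffer banks, and at most one pbuffer is assumed to share bank 0 with the zbuffer. It also applies the rule from the "Pbuffer bank usage" section that 12/12/12 pbuffers cannot be placed in the zbuffer bank.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of the buffers in use; these names are
 * illustrative only and are not part of any SGI API. */
struct bank_request {
    int  shareable_pbuffers; /* single-buffered pbuffers allowed in bank 0,
                                e.g. 12(L) or 8888 */
    int  rgb12_pbuffers;     /* single-buffered 12/12/12 pbuffers, which
                                may not live in bank 0 */
    bool window_has_zs;      /* a window visual has depth and/or stencil */
};

/* Number of pbuffer banks the request consumes.  With allow_overlap set,
 * one shareable pbuffer is assumed to share bank 0 with the z/s buffer
 * (at the cost of pbuffer/zbuffer swapping). */
int banks_required(const struct bank_request *req, bool allow_overlap)
{
    assert(req->shareable_pbuffers >= 0 && req->rgb12_pbuffers >= 0);

    /* Assumption: one bank per single-buffered pbuffer. */
    int banks = req->shareable_pbuffers + req->rgb12_pbuffers;

    if (req->window_has_zs) {
        banks += 1;                       /* z/s buffer occupies a bank */
        if (allow_overlap && req->shareable_pbuffers > 0)
            banks -= 1;                   /* one pbuffer shares bank 0  */
    }
    return banks;
}

/* True if the request fits in the banks listed for the system under
 * "Pbuffer bank availability". */
bool banks_fit(const struct bank_request *req, bool allow_overlap,
               int banks_available)
{
    return banks_required(req, allow_overlap) <= banks_available;
}
```

Under these assumptions, a window with a z/s buffer plus one 8888 pbuffer needs two banks without overlap and one bank with overlap, which agrees with the example lists in this section. Double-buffered or stereo pbuffers would need a per-pbuffer bank count instead of the fixed cost of one used here.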

7.1.1.8.1 no overlap

   o  0 banks   window
   o  1 bank    window + z/s buffer
   o  1 bank    window + 12(L) pbuffer
   o  1 bank    window + 8888 pbuffer
   o  2 banks   window + (2) 12(L) pbuffers
   o  2 banks   window + (2) 8888 pbuffers
   o  2 banks   window + 8888 pbuffer + z/s buffer
   o  3 banks   window + (2) 8888 pbuffers + z/s buffer
   o  2 banks   window + 12/12/12 pbuffer + z/s buffer

7.1.1.8.2 with overlap

   o  0 banks   window
   o  1 bank    window + z/s buffer
   o  1 bank    window + 12(L) pbuffer
   o  1 bank    window + 8888 pbuffer
   o  2 banks   window + (2) 12(L) pbuffers
   o  2 banks   window + (2) 8888 pbuffers
   o  1 bank    window + 8888 pbuffer + z/s buffer
   o  2 banks   window + (2) 8888 pbuffers + z/s buffer
   o  2 banks   window + 12/12/12 pbuffer + z/s buffer

7.1.1.9 Pbuffer bank availability

The framebuffer memory available for pbuffers in Max and High Impact systems is organized as follows. Note that on High Impact, the number of banks available for pbuffers depends on the timing table which is loaded when the X server starts.

7.1.1.9.1 Max Impact

   o  normal timing tables:     2 pbuffer banks
   o  1024x768 timing tables:   4 pbuffer banks
   o  1600x1200 timing table:   1 pbuffer bank
   o  1600x1200 32db:           none

7.1.1.9.2 High Impact and Solid Impact

   o  normal timing tables:     1 pbuffer bank
   o  1024x768 timing tables:   1 pbuffer bank
   o  1024x768 pbuf:            2 pbuffer banks
   o  32db timing tables:       none
   o  1600x1200 timing table:   none

7.1.1.10 Pbuffer bank usage

OpenGL pbuffer bank usage:

7.1.1.10.1 Color buffers

   o  two banks   db 8/8/8/8
   o  two banks   db 12/12/12
   o  two banks   stereo db (any resolution)
   o  one bank    all other color resolutions

7.1.1.10.2 Ancillary buffers

   o  add one extra bank for visuals with Z and/or stencil

N.B.: 12-12-12 color buffers (without depth) are prohibited from being allocated in the bitplanes normally used by the zbuffer (pbuffer bank 0).
The zbuffer bank will be allocated last when you are allocating a series of pbuffers, so the simplest workaround is to make sure that you allocate any 12-12-12 pbuffers before your other pbuffers. This restriction will manifest itself as glXCreateGLXPbufferSGIX failing due to BadAlloc.

7.1.1.11 Impact-specific glXMakeCurrentRead Compatibility

   1) Render types must match (color index/rgba).
   2) Pbuf with depth & window with depth are incompatible.
   3) Pbufs in bank 0 and window with z are incompatible. (Pbuffers will be put in bank 0 last.) (Pbuffers allocated earlier are not likely to be in bank 0.)
   4) Color depths of drawables do NOT need to match.
   5) DB/Stereo do NOT need to match.

7.1.2 Optimized Vertex Arrays

IRIX 6.5.1 introduces performance optimizations for back-face culled primitives drawn using OpenGL vertex arrays. Acceleration is currently available in direct-rendered OpenGL contexts for GL_TRIANGLE_STRIP primitives rendered using glDrawArrays with back-face culling (GL_CULL_FACE) enabled.

To enable acceleration for an OpenGL context, the GLMGRARRAYOPT environment variable must be defined when the context is created and first made the current rendering context.

Enabling this optimization may yield incorrect back-face culling results when combined with noncanonical usage of the projection and modelview matrices.

7.1.3 Yielding CPU Cycles to a Lower Priority Thread During DMA Operations

The MXI, SSI and SI OpenGL implementations utilize a DMA completion synchronization mechanism that will not explicitly yield CPU cycles to lower priority threads while a DMA operation is outstanding. IRIX 6.5.5 supports a new systune variable which configures the OpenGL implementation for these products (only) to provide that behavior. The systune variable is in the gfx group and is named gfxdmasleepthreshold.
The default value of 0 configures the system for the pre-6.5.5 behavior. Nonzero values are interpreted as the transfer size, in bytes, of a DMA operation beyond which the CPU is explicitly yielded until the operation completes.

Enabling this property of the MXI, SSI and SI OpenGL implementations can have undesirable performance effects upon imaging operations if not tuned properly. At small transfer size threshold values, aggregate throughput degradations of up to 15 percent have been measured (for 16KB glDrawPixels transfers). Empirical testing on a two-processor 250MHz R10K MXI system has shown that setting the value to 80000 yields a marked improvement in regaining CPU cycles for lower priority threads, with about a 5 percent degradation in pixel transfer rate. Individual application performance will vary; this suggested value should be used only as a guide.

Note that the default behavior will yield to threads of equal or higher priority. Therefore, consider enabling this only if it is desirable to have a lower priority thread run during lengthy pixel movement operations.

7.2 InfiniteReality Performance Hints

Set all texture parameters (especially the minification filter) before downloading a texture.

Enable texturing before downloading any textures.

For best performance and greatest control over texture memory allocation, use subtexture loads to manage texture memory.

7.3 InfiniteReality Specific Features

7.3.1 Digital Display Option (DDO) support

DDO is a new video option board for Onyx InfiniteReality systems.

7.3.2 High Quality Multisampled Anti-Aliased Points

The quality of multisampled anti-aliased points has been improved.
To use these new improved points:

     glDisable(GL_POINT_SMOOTH);
     glEnable(GL_MULTISAMPLE_SGIS);

To get the best looking points, some care must be taken to set the GL_POINT_FADE_THRESHOLD_SIZE correctly. For general use we suggest using a threshold of zero to disable the threshold size feature.

     glPointParameterfSGIS(GL_POINT_FADE_THRESHOLD_SIZE, 0.0);

7.3.3 Pipeline Instrumentation

The SGIX_instruments OpenGL extension defines a new mechanism for measuring the performance of the graphics pipeline. It can be used to determine when an application is limited by pixel fill, geometry processing load, etc. This is helpful for general performance tuning and for maintaining a guaranteed frame rate in simulation systems.

7.3.4 Forcing Completion of Rasterization

The SGIX_flush_raster OpenGL extension forces all rasterization operations to be completed before processing the next OpenGL command. Unlike glFinish(), it does not prevent the application from issuing new commands. This is used in conjunction with the SGIX_instruments extension to ensure that rasterization is complete before taking a measurement of the graphics pipeline.

7.3.5 Display-List Memory Management

One of the new hardware features of InfiniteReality is a display-list cache memory. Display lists may be transferred from this memory at roughly twice the maximum speed possible for display lists stored in main memory. The SGIX_list_priority OpenGL extension allows applications to manage the contents of the display list cache by setting residence priorities for display lists.

Note that this interface is experimental and its behavior is subject to change. We are very interested in feedback concerning it.

7.3.6 Calligraphic Light Support

The SGIX_calligraphic_fragment OpenGL extension allows position and coverage information for light points to be transmitted to a combination calligraphic/raster display system. This is valuable for night-time flight simulation.

7.3.7 Large Color Tables

Luminance-format color tables may now have up to 32K entries. Non-luminance color tables may have up to 16K entries. (Performance is maximized when color tables have 4K or fewer entries, however.)

7.3.8 Virtual Clipmaps

Clipmaps are an extension of mipmaps, intended to handle texture mapping for extremely large textures (such as high-resolution satellite photographs of the entire Earth). The first release of clipmaps was limited to 15 levels (implying a maximum texture size of 32Kx32K). This has been changed to allow a much larger number of levels, provided that no more than 15 adjacent levels are resident in texture memory at any one time.

Even now, though, there remains a limitation: oversubscribing texture memory with two 17-level clipmaps will fail. For now, you must not allocate more clipmaps than will fit into your physical texture memory.

7.3.9 Video Pan/Zoom

libXvc now supports the ability to pan over a framebuffer area larger than the display, as well as to zoom the display up or down without re-rendering.

7.3.10 Old-Style Stereo

The first InfiniteReality release included support only for stereo-in-a-window (``new-style'' stereo). This release also supports old-style stereo, in which the screen is split into two parts and each part is scaled up by a factor of two.

7.3.11 Depth Textures

The SGIX_depth_texture extension defines the behavior of depth textures (analogous to color textures). Currently depth textures are used for real-time shadows.

7.3.12 Synchronized Buffer Swap

The SGIX_swap_barrier extension allows buffer swaps to be synchronized with an external event. Normally this is used to coordinate swapping among several machines, each of which is responsible for a portion of a video wall or other sophisticated multiple-display system.

7.3.13 Swap Groups

The SGIX_swap_group OpenGL extension provides the ability to synchronize the buffer swaps of a group of windows. For example, a double-buffered main window and its associated double-buffered overlay window can be placed in a swap group so that they will always be buffer-swapped together.

7.3.14 Display Lists

Display lists can now be transferred from main memory to the graphics pipeline by DMA. This is substantially faster than immediate mode, though not as fast as display lists in the display list cache memory.

For now, applications are limited to 128M of DMA-able display lists. Beyond that, the code falls back to non-DMA display lists.

7.3.15 Packed Vertex Array Formats

Support for certain vertex array formats has been optimized with special microcode. See ``man glvertexpointerext'' for details.

7.3.16 Bitmaps (Text)

Drawing display-listed bitmaps (OpenGL-based text) is dramatically faster.

7.3.17 Histograms

Single-component histograms (a common case for luminance-only image-processing operations) and the glGetHistogramEXT() command have been tuned.

7.3.18 Small-Area Pixel Operations

The overhead for glCopyPixels() and glDrawPixels() operations has been reduced, allowing significantly better throughput for those operations when applied to small pixel arrays.

7.3.19 Mode Changes

Performance for a variety of mode-change operations has been improved.

7.3.20 Culling

Backface culling performance has been improved, though the penalty for using culling remains higher than it was on RealityEngine.

7.3.21 Evaluators

A number of optimizations have been applied to parametric polynomial surfaces (evaluators).

7.3.22 Convolution

Performance has been improved substantially, especially for separable convolutions and for luminance-only convolutions.

7.3.23 Texture Binds

Binding to a texture that is resident in texture memory is now over three times faster than it was in the first release.

7.3.24 Quadrilaterals

Decomposition of quadrilaterals into triangles has been changed, yielding 30%-50% better performance for geometry-limited quadrilateral strips.

7.4 Octane2 VPro Performance Hints

Octane2 VPro host software support is optimized for the n32 and n64 object file formats, and the mips4 instruction set. The old o32 object file format is supported, but cannot use the mips4 instruction set. Programs in this format will therefore miss out on optimizations in matrix arithmetic and in efficient submission of display lists to the graphics hardware.

Users who run applications which make heavy use of the glAccum() function should configure their Octane2 VPro systems to take advantage of the hardware accumulation buffer. This buffer is not part of the default configuration as these systems are shipped. See the xsetmon(1) man page, and the "Unified Graphics Memory" topic in the Features section below, for details.

Octane2 VPro provides optimized transfer paths for a selection of vertex array layouts. Use of these formats may substantially increase performance in glDrawArrays(), glDrawElements() and glDrawRangeElements(). This set of formats includes some in the set of enums provided for glInterleavedArrays() and some others as well. Using any means (either glInterleavedArrays() or explicit use of glVertexPointer() and related functions) for specifying the optimized formats will result in the optimized path being taken.

The list of optimized formats follows.
In this list, abbreviations are used which are akin to the enums provided for glInterleavedArrays(), even though some of these abbreviations are not actual legal enums. For those, use glVertexPointer() and related functions to specify the array format.

   o  Packed V3F
   o  Interleaved C3F_V3F
   o  Packed separate C3F_V3F
   o  Interleaved N3F_V3F
   o  Interleaved V3F_N3F
   o  Packed separate N3F_V3F
   o  Packed separate N3S_V3F
   o  Interleaved T2F_V3F
   o  Packed separate T2F_V3F

Octane2 VPro provides a way to improve precision of parameter interpolation across primitives (especially those primitives which are large in screen space), using the SGIX_vertex_preclip extension. See the man pages for glIntro(3G), glEnable(3G) and glHint(3G) for details.

When vertex preclipping is disabled (glDisable(GL_VERTEX_PRECLIP_SGIX)), primitive performance is maximized. When it is enabled (glEnable(GL_VERTEX_PRECLIP_SGIX)), there is a hint which controls how much work is done to detect primitives for which enhanced interpolation may be beneficial. This hint, glHint(GL_VERTEX_PRECLIP_HINT_SGIX, ...), may as usual be set to either GL_FASTEST, GL_NICEST or GL_DONT_CARE. (GL_DONT_CARE operates identically to GL_FASTEST.) Use of vertex preclipping results in a decrease in peak primitive rates, with GL_NICEST costing more than GL_FASTEST.

The initial state of vertex preclipping is disabled; the initial state of the hint is GL_DONT_CARE. However, we have provided an environment variable which sets the initial values, so that an unmodified application may operate in any of these modes. (If the application uses the SGIX_vertex_preclip extension at runtime, the application's settings will override the initial settings provided by the environment variable.) This environment variable is called GL_VERTEX_PRECLIP.
It can have three values: the string "DISABLED" disables vertex preclipping (this simply confirms the default case); the string "FASTEST" both enables vertex preclipping and sets the hint to GL_FASTEST; and the string "NICEST" both enables vertex preclipping and sets the hint to GL_NICEST.

Octane2 VPro systems represent the depth buffer in eye space rather than in window space (see the topic "Depth Buffer in Eye Space" in the Features section below for details). As a result, glReadPixels() and glDrawPixels() incur a cost to convert GL_DEPTH_COMPONENT data to and from the hardware format. Applications can use the GL_DEPTH_COMPONENT24_SGIX format parameter to cause depth buffer reads and writes to be done directly in the hardware format, thus avoiding this cost. See the man pages for glReadPixels() and glDrawPixels().

Octane2 VPro systems are capable of asynchronous transfers of pixel data and texture images. This feature can provide dramatic performance gains for certain kinds of applications. One example is applications which engage in frequent, ongoing texture downloads, such as those which use dynamically changing textures. Another example is applications which can overlap substantial CPU work with glDrawPixels or glReadPixels operations, and can't easily structure themselves to be multithreaded.

This feature is provided via the SGIX_async and SGIX_async_pixel extensions. For information on SGIX_async, see the man pages for glAsyncMarkerSGIX(3G), glPollAsyncSGIX(3G), glFinishAsyncSGIX(3G), glGenAsyncMarkersSGIX(3G), glIsAsyncMarkerSGIX(3G), glDeleteAsyncMarkersSGIX(3G), and glFinish(3G). For information on SGIX_async_pixel, see the man pages for glEnable(3G), glDrawPixels(3G), glReadPixels(3G), glTexImage1D(3G), glTexImage2D(3G), glTexImage3D(3G), glTexSubImage1D(3G), glTexSubImage2D(3G), and glTexSubImage3D(3G).

The SGIX_async and SGIX_async_pixel extensions relax the normal OpenGL semantics of sequentiality.
In particular, asynchronous transfers of pixel or texel data may happen out-of-order with respect to each other and with respect to any synchronous OpenGL commands that follow. Buffers for asynchronous glDrawPixels and glTexImage commands must not be modified before the transfers have finished. Likewise, buffers for asynchronous glReadPixels commands will not be valid until the transfers have finished.

Applications must take care to enforce their own dependencies on asynchronously transferred data. To support this, the SGIX_async extension provides bookkeeping mechanisms and both blocking and non-blocking synchronization commands. Failure to enforce all dependencies may result in obscure, timing-related bugs, as well as bugs which remain latent until the application is run on higher-performance systems than may presently be available.

As with all SGIX extensions, this feature may not be available on future products.

7.5 Octane2 VPro Specific Features

7.5.1 Unified Graphics Memory

The Octane2 VPro graphics architecture supplies a pool of memory out of which many graphics memory resources are allocated. (This pool of memory is, however, separate from system memory and not addressable in the system's virtual address space.) The resources present in this "unified graphics memory" are: the framebuffer (including the color, depth and stencil buffers, and optionally the accumulation buffer); framebuffer overlays; pbuffers; textures; and some system overhead.

The amount of unified graphics memory present, in megabytes, can be found as the last field of the string returned by glGetString(GL_RENDERER). See the man page for glGetString() for details.

The xsetmon(1) program (also accessible from the IRIX desktop at Toolchest->System->Display Properties) is used to statically allocate the framebuffer.
The user may choose among a variety of framebuffer X and Y dimensions; a framebuffer depth of 8 or 16 bytes; and the optional presence of a hardware accumulation buffer. The unified graphics memory remaining after this static allocation (and after a small amount of system overhead) is available for dynamic allocation of pbuffers and textures.

If the hardware accumulation buffer is selected, it provides 24 bits of precision per component. When no hardware accumulation buffer is present, OpenGL will allocate a host software accumulation buffer for use by the glAccum() function. This software accumulation buffer provides 16 bits per component.

The default configuration is a 1280x1024 framebuffer with 16 bytes per pixel, and no hardware accumulation buffer.

Depending on the amount of memory present in the graphics subsystem, some framebuffer configurations may not fit. A rough calculation of the amount of memory consumed by a framebuffer configuration may be done as follows. Count pixels by multiplying the X and Y dimensions of the framebuffer. For each pixel, allow 8 or 16 bytes according to the selection made in xsetmon(1). Add 2 bytes per pixel for the overlay/WID buffer. If the hardware accumulation buffer is selected, add 12 more bytes per pixel. For example, the default 1280x1024 configuration with 16 bytes per pixel and no hardware accumulation buffer consumes roughly 1280 x 1024 x (16 + 2) = 23,592,960 bytes, or about 22.5 MB.

7.5.2 Clipping as Scissoring

The Octane2 VPro architecture substitutes a function with scissoring semantics for fine-grained geometry clipping of the usual kind. This includes both view-frustum clipping and glClipPlane() clipping. In the latter case, fragments lying on the wrong side of the clip plane are efficiently killed in rasterization hardware. Therefore, the OpenGL clipping semantics with respect to the glPolygonMode(..., GL_LINE) function are modified so that new edges are not supplied when such polygons are clipped. As well, wide points and wide lines never slop over the bounds of the viewport.
Although these are technical violations of the OpenGL specification, in practice most users consider the specified behaviors to be bugs, and their absence to be features.

7.5.3 Wide Lines

Octane2 VPro systems use a "french-cut" style of line endings for anti-aliased wide lines. This means that the ends of wide line segments are either vertical, if the line has an X-major slope on the screen, or horizontal, if the line has a Y-major slope.

7.5.4 Decomposition of Quads

Unlike most other OpenGL implementations, Octane2 VPro systems decompose quads into triangles using the diagonal between the first and third vertices, rather than between the second and fourth vertices. This may produce different interpolations of parameters such as color, texture coordinates or Z coordinate at given interior points of the quad. This is most likely to be noticeable when the four values of a given parameter at the four vertices are not planar in the parameter space. One example would be a square with three vertices white and one vertex red. Another would be a quad whose geometry is grossly non-planar. In such cases, the application has underspecified the desired interpolation, and different OpenGL implementations are free to behave differently.

7.5.5 Computation of Texture Level of Detail

In computing the lambda parameter controlling level-of-detail while sampling a texture, the Octane2 VPro hardware uses a diagonal distance between pixels rather than the usual rectangular distances in estimating the partial derivatives of s and t with respect to x and y. The contours where level-of-detail changes across the surface of the primitive may be of different shape as a result.

7.5.6 Visuals with Multiple Buffers

To provide maximum rasterization performance, Octane2 VPro systems provide duplicate depth buffers in double-buffered visuals which have depth.
The depth buffer associated with the current draw buffer is used for depth tests. Therefore, the depth buffer should ideally be cleared at the same time as the color buffer. In any case, the depth buffer may be observed to have different contents as the result of a call to glXSwapBuffers() or glDrawBuffer().

7.5.7 Clears

On Octane2 VPro systems, dithering is not applied to screen clears.

7.5.8 Depth Buffer in Eye Space

The Octane2 VPro graphics system stores eye Z values in the depth buffer instead of device Z values as many other graphics chips do. Specifically, it stores eye Z divided by eye W. Let us call this divided eye Z. In addition, the eye Z values are stored in the depth buffer in an internal floating point representation.

Note: Eye coordinates are obtained by multiplying the object coordinates by the modelview matrix. Eye coordinates multiplied by the projection matrix yield clip coordinates. Clip coordinates divided by their W coordinates (clip W) result in device coordinates.

The advantage of this approach is in improved resolution of Z values in areas where objects are most often located, that is, in the first half of the viewing frustum (closer to the camera). First, divided eye Z values are always uniformly spaced in the view frustum, even in case of perspective projection. Second, the use of floating point values increases the resolution at the beginning of the viewing frustum. Thus for objects closer to the camera (both in perspective and orthographic projection) the precision of depth tests is increased, compared to objects further away from the camera.

The increase in resolution close to the camera is not as drastic as in the case of device Z values in perspective projection.
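
The behavior of device Z under perspective projection can be made concrete with a small calculation. The sketch below is an illustration only and does not describe the Octane2 VPro hardware: it assumes a conventional perspective projection, a depth range of [0,1], and a 24-bit fixed-point depth buffer, and it reads "bits of precision" at a distance d as the base-2 logarithm of d divided by the smallest resolvable change in d. All function names are hypothetical.

```c
#include <assert.h>

/* Window-space depth in [0,1] of a point at eye-space distance d, for a
 * standard perspective projection with the given near and far planes. */
double window_depth(double d, double near_plane, double far_plane)
{
    assert(near_plane > 0.0 && far_plane > near_plane);
    assert(d >= near_plane && d <= far_plane);
    return (far_plane / (far_plane - near_plane)) * (1.0 - near_plane / d);
}

/* Smallest change in eye-space distance that a fixed-point depth buffer
 * with depth_bits bits can resolve near distance d.  Uses the local slope
 * of window_depth():  dz/dd = f*n / ((f - n) * d*d). */
double eye_step(double d, double near_plane, double far_plane,
                int depth_bits)
{
    assert(depth_bits > 0 && depth_bits < 32);
    assert(near_plane > 0.0 && far_plane > near_plane && d > 0.0);
    double slope = (far_plane * near_plane) /
                   ((far_plane - near_plane) * d * d);
    double lsb = 1.0 / (double)(1UL << depth_bits);  /* one depth unit */
    return lsb / slope;
}

/* floor(log2(d / eye_step)): the assumed reading of "bits of precision"
 * at distance d.  Computed by repeated halving to avoid <math.h>. */
int relative_precision_bits(double d, double near_plane, double far_plane,
                            int depth_bits)
{
    double ratio = d / eye_step(d, near_plane, far_plane, depth_bits);
    int bits = 0;
    while (ratio >= 2.0) {
        ratio /= 2.0;
        bits++;
    }
    return bits;
}
```

For a frustum with the near plane at 1 and the far plane at 1001, relative_precision_bits() gives 20 at distance 16, 19 at distance 32, and 18 at distance 64 under these assumptions: one bit is lost each time the distance doubles.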
For example, in the case of a frustum with the near plane at 1 and the far plane at 1001, the precision in the interval of eye Z values from -501 to -1001 is 18 bits, in the interval (-251,-501) is 19 bits, in the interval (-126,-251) is 20 bits, and so on (the values have a 6 bit mantissa). On the other hand, device Z values would have 20 bit precision around -16, 19 bit around -32, and 18 bit around -64, assuming that the values are stored as integers or fixed point numbers.

Thus eye Z values have about 4 bits better precision than device Z values in the first half of the Z range, except at the very beginning, where device Z values are more precise. The difference increases as one moves further away from the camera, because eye Z values have 18 bit precision in the whole second half of the frustum, while device Z values continue to lose precision with greater distance from the camera.

There are a few caveats the user should be aware of:

The resolution of the depth buffer in orthographic projection is lower for objects in the second half of the view frustum (further away from the camera), compared to a depth buffer of the same size with integer values.

If your application changes the projection matrix without clearing the depth buffer, the behavior of your program may be different, since the values are distributed uniformly in the case of perspective projection. For example, if you draw an instrument panel using orthographic projection and then the cockpit view using perspective projection, you may have to adjust the parameters of your projections (or even better, use glDepthRange()).

If you specify perspective projection, it is very important to make the proper distinction between the modelview and projection matrices, and to correctly specify the projection on the projection matrix.
Most programmers who use lighting in their applications are familiar with this condition, since lighting works in eye coordinates and it is necessary to specify the projection on the projection matrix stack to obtain correct eye coordinates.

7.5.9 _V_i_s_u_a_l_s__w_i_t_h__1_6_-_b_i_t__D_e_p_t_h

On Octane2 VPro V12 systems, there is a new visual type which provides a 16-bit depth buffer in combination with color buffers of 12 bits each for the red, green, blue and alpha channels; it may be either single- or double-buffered. (This visual type does not provide a stencil buffer.) Because most applications that use a depth buffer need the full precision of 24 bits, we have arranged the visual ranking algorithms in glXChooseVisual() and glXChooseFBConfig() to prefer the visuals with 24-bit depth buffers, even though they have fewer bits of color precision. If an application directly requests 24 bits of depth, then of course only those visuals will be considered by these GLX functions. Conversely, an application may get one of the new visuals by asking for 12 bits each of red, green, blue and alpha, together with 1 to 16 bits of depth.

7.5.10 _S_p_e_c_u_l_a_r__H_i_g_h_l_i_g_h_t_s

The Octane2 VPro graphics system is the first system from SGI to implement the GL_SGIX_fragment_lighting extension. The implementation is optimized for providing improved lighting effects with no loss of performance at interactive frame rates. If only a fragment light is enabled, and if a single highlight occupies a large area of the screen (more than 200 or so pixels across), some slight Mach banding may be visible in the highlight. The effect is ameliorated by enabling other lights in addition to the fragment light, and is unnoticeable in the normal case where highlights are only a small part of the visible window area. For materials with very low shininess, the extent of a specular highlight will also be slightly different from the value given by the formula in the OpenGL specification.
The discrepancy diminishes as the shininess increases, and disappears once the shininess reaches the range typical of glossy surfaces, where specular reflection is most significant.