dimanche 12 décembre 2010

How to use Python to plugin C++

Goal
Plug-ins permit to easily extend a software without modifying it. For the language, Python plug-ins avoid the per platform compilation needed with c++. Actually it does not need compilation.

We'll make a basic sample to call python implemented member function on a python built c++ object.

Prerequisite
CMake will be used to manage project build. I no more able to start a project without it :).

There is multiple way to use Python, for example the official Python C API, I'll use boost 1.42 python.

Files
There will be 3 different "programs" :
  • test executable : it loads the python plug-in, extract and use the corresponding c++ object. It need two arguments : plug-in interface library directory path and python plug-in path
  • plugin interface library : defines the plug-in interface so that it can be inherited in python and used in the c++
  • python plugin
  
CMakeLists.txt
cmake_minimum_required(VERSION 2.8)

#
# Setup depencies
#
 # For python you can't specify the version, so it'll take the default one (for me it's 2.7)
 # set PYTHON_INCLUDE_DIRS and PYTHON_LIBRARIES
find_package( PythonLibs REQUIRED )
 # set PYTHON_LIBRARIES
find_package( Boost 1.42.0 COMPONENTS python )

#
# boost/python.hpp need this to compile
#
include_directories ( ${PYTHON_INCLUDE_DIRS} )

#
#c++ plug-in interface
#
add_library( plugin SHARED "Plugin.hpp" "Plugin.cpp"  )
 # avoid the lib prefix on unix
set_target_properties( plugin PROPERTIES PREFIX "" )
target_link_libraries( plugin ${Boost_LIBRARIES} )

#
#test executable
#
add_executable( test "main.cpp" )
 # need to link with boost python and python
target_link_libraries( test ${Boost_LIBRARIES} ${PYTHON_LIBRARIES} plugin )

Plug-in interface Library
hpp
#include <boost/python.hpp>

namespace bpy = boost::python;

namespace plugin
{
  // Need to inherits from bpy::wrapper
  // to easily access python implementation
  // (see python_virtual definition )
  class Object : public bpy::wrapper<object>
  {
  public:    
    void exported();  // will be accessible from python ( see line 32 )
    void non_exported(); // won't be accessible from python

    void python_virtual(); // will call the python plugin implementation
  };
}

cpp
#include "Plugin.hpp"
#include <iostream>

namespace plugin
{
  void Object::exported()
  {
    std::cout << "exported called " << std::endl;
  }

  void Object::non_exported()
  {
  }

  void Object::python_virtual()
  {
      this->get_override("python_virtual")(); // call the inheriting python object python_virtual member function
  }
}

// Define the python module that will be used
// to encapsulate all exported symbols.
// It must be compiled in a library with this
// exact name ( plugin.so/dll )
BOOST_PYTHON_MODULE(plugin)
{
//All this will be accessible from python  
    boost::python::class_<plugin::Object, boost::noncopyable>("Object")
        .def("exported", &plugin::Object::exported );
    
    // You can export enums with 
    //boost::python::enum_<EnumType>("EnumTypePython")
    //  .value("EnumVal0Python", EnumVal0)
    //  .value("EnumVal1Python", EnumVal1)
    //;
}

Test executable
#include <boost/python.hpp>

#include <string>
#include <iostream>

#include "Plugin.hpp"

int main(int argc, char * argv[] )
{
  if( argc == 3 ) // need two parameters
  {
    std::string const python_interface_path = argv[1];
    std::string const python_plugin_path = argv[2];

    namespace plg = plugin;
    
    try
    {
      Py_Initialize(); // Initialize the python system
      
    //Construct a context for the plugin to run in
      bpy::object main_module = bpy::import( "__main__" );
      bpy::object main_namespace = main_module.attr( "__dict__" );
      // With sys.path setup to be able to import plugin library
      bpy::object sys_module = bpy::import( "sys" );
      bpy::object path = sys_module.attr( "path" );
      path.attr("append")( python_interface_path );
    
    //We execute the plugin in the previously built context
      bpy::exec_file(
         python_plugin_path.c_str(),
         main_namespace
      );    
      
    //And extract the object variable
    //It must be an instance of a plugin::Object child
      plg::Object & object = bpy::extract<plg::Object &>( main_namespace["object"] );

    //Call a "python virtual" function
      object.python_virtual();
    }
    catch( bpy::error_already_set const & )
    {
      PyErr_Print(); // For any boost::python exception the python log will be printed
    }    
  }
}

Python plugin-in
import plugin # import the plugin library

class PluginObject ( plugin.Object ): # inherit from plugin base class
  def __init__(self):
    plugin.Object.__init__(self)
    self.exported()
    #self.non_exported() # can't be called because it have not been exported

  def python_virtual(self) :
    print( "python virtual called" )

# this object variable will be extracted
# in the c++ text executable
object = PluginObject()


This is just a tiny part of what you can do. So, have fun coding your plug-ins ;)

jeudi 30 septembre 2010

Memory alignment : theory and c++ examples

--- [ e-on software research ] ---

1. Alignment theory
a. Definition
b. How processor fetch memory
c. Data structure padding

2. C++ examples
a. What the c++ specification says
b. GCC and Visual c++ x86 / x86-64 implementation
c. Benchmarks
d. Controlling alignment and padding
e. Common data type size and alignment

3. References


1. Alignment theory


This post is a refactoring of what you can find over the web. Used sources can be found in references section.


a. Definition


The alignment of a given variable is the largest power-of-2 value, where the address of the variable, modulo this power-of-two value is 0, that is :
address modulo alignment = 0
We will call this variable alignment-byte aligned.

Note
– Different types can have different alignment requirement
– If x > y, and both x and y are power-of-two values, a variable that is x-byte aligned is also y-byte aligned

Example
Address (bytes)Alignment
0x00infinite
0x011-byte
0x022-byte
0x031-byte
0x044-byte (so also 2-byte)
0x051-byte
0x062-byte
0x071-byte
0x088-byte (4 and 2)


b. How processor fetch memory


Aligned address
– Read the chunk and place it into the register

Unaligned address
– read the first chunk of the unaligned address
– shift out the "unwanted" bytes from the first chunk
– read the second chunk of the unaligned address
– shift out some of its information
– merged together the two chunks for placement in the register

Compared to only read a chunk, it's a lot of work !

schema

Some processors just aren't willing to do all of that work for you :
– exception (68000)
– nothing
– something wrong (Altivec, Itanium)


c. Data structure padding


Compilators add unnamed data members in structures :
– After members, to keep members aligned on their required alignment
– After the last member to keep structure aligned in arrays

Note
To keep these two constraints, a structure alignment requirement, is the stricter member alignment requirement.

Example
We take :
– char : 1-byte aligned and take 1 byte
– int : 2-byte aligned. And take 2 byte

struct S // must be 2–byte aligned
{
   char c1; // can be placed on any address
   int i;   // must be 2-byte aligned
   char c2; // can be placed on any address
};
S s[2]; // sizeof(s) == 10 bytes (2 bytes for padding)


AddressVariable
0x0s[0].c1
0x1unnamed member
0x2s[0].i
0x3s[0].i
0x4S[0].c2
0x5s[1].c1
0x6unnamed member
0x7s[1].i
0x8s[1].i
0x9S[1].c2

Tips
– We could have saved the 2 padding bytes by placing c2 just before i
– With power of two alignments ascending/descending (by size) declaration deliver an optimal size, but writing readable code should be your primary goal


2. C++ examples


a. What the c++ specification says


The C+ + memory model [intro.memory] (1.7 § 1)
The fundamental storage unit in the C + + memory model is the byte. A byte […] is composed of a contiguous sequence of bits, the number of which is implementation-defined.
Types (3.9 §5)
[…] The alignment of a complete object type is an implementation-defined integer value representing a number of bytes; an object is allocated at an address that meets the alignment requirements of its object type.
Sizeof (5.3.3 §2)
When applied to a class, the result is the number of bytes in an object of that class including any padding required for placing objects of that type in an array.


b. GCC and Visual c++ x86/x86-64 implementation


For performance reason, all types are aligned on their natural lengths, except items that are greater than 8 bytes in length,. It is recommended that all structures larger than 16 bytes align on 16-byte boundaries.

In general, for the best performance, align data as follows:
– align 8-bit data at any address
– align 16-bit data to be contained within an aligned four-byte word
– align 32-bit data so that its base address is a multiple of four
– align 64-bit data so that its base address is a multiple of eight
– align 80-bit data so that its base address is a multiple of sixteen
– align 128-bit data so that its base address is a multiple of sixteen

SSE2 instructions on x86 CPUs do require the data to be 128-bit (16-byte) aligned and there can be substantial performance advantages from using aligned data on these architectures.


c. Benchmarks


9 000 000 iterations double copy from source[i] to dest[i].

unaligned / aligned access time ratio :
– pentium III (731 MHz) : 3.25 times slower
– pentium IV (2.53 GHz) : 2 times slower
– itanium2 (900 MHz) : 459 times slower


d. Controlling alignment and padding


visual 2008
#pragma pack(4) // 4-byte aligned
struct S
{
   char c;   // 1-byte aligned
   double d; // 4-byte aligned instead of 8-bytes aligned
             // causes warning C4121
};
#pragma pack() // reset to default

This pragma directive permits to have a maximum alignment of N-byte.


gcc 4
GCC understands pragma pack as visual. But can use more accurate syntaxes :
struct foo
{
   int x[2] __attribute__ ((aligned (8))); // minimum 8-byte aligned
};

struc foo
{
   char a;
   int x[2] __attribute__ ((packed)); // pack this member behind a
};


e. Common data type size and alignment


VISUAL C++ / GCC (WIN32 )
typesize (bytes)alignment(byte)
void *44
bool11
char11
short22
int44
long44
float44
double88

VISUAL C++ (WIN64)
typesize (bytes)alignment(byte)
void *88
bool11
char11
short22
int44
long44
float44
double88

MAC OS 10.6 (32 bits)
typesize (bytes)alignment(byte)
void *44
bool11
char11
short22
int44
long44
float44
double84

MAC OS 10.6 (64 bits)
typesize (bytes)alignment(byte)
void *88
bool11
char11
short22
int44
long88
float44
double88


3. References


C++ specification
INTERNATIONAL STANDARD ISO/IEC 14882 Second edition 2003-10-15

WIKIPEDIA
http://en.wikipedia.org/wiki/Data_structure_alignment

IBM
http://www.ibm.com/developerworks/library/pa-dalign

Microsoft MSDN
http://msdn.microsoft.com/en-us/library/aa290049%28VS.71%29.aspx

Intel
http://software.intel.com/en-us/articles/data-alignment-when-migrating-to-64-bit-intel-architecture

mardi 3 août 2010

OpenGL 3.3 / OpenCL 1.1 texture sharing

I'll present a sample that create an OpenGL texture, update it with an OpenCL kernel, and display it on a screen space quad.

The complete source code is located on sourceforge : virtrev

A CMakeList (CMake) is provided to create the project type you want.

Requirements
  • An OpenCL (1.0 or 1.1) SDK supporting the gl_sharing_khr extension
  • Freeglut libraries is used for windows management
  • An OpenGL 3.3 SDK


Code
To be more concise, almost all message printing and error management has been removed. But remember that error management is essential to know exactly what's happening, and information printing could be interesting.

Also here I use OpenGL, Glut, and OpenCL API directly, it's easier for teaching. But in a real project I guess I should be better to use higher lever libraries.

Main
int main(int argc, char * argv[])
{
  glutInit(&argc, argv);
  glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE | GLUT_ALPHA | GLUT_DEPTH | GLUT_STENCIL);

  glutCreateWindow("clgl_sharing");
  glutSetOption( GLUT_ACTION_ON_WINDOW_CLOSE, GLUT_ACTION_GLUTMAINLOOP_RETURNS );
   
  glutDisplayFunc( display );
  glutReshapeFunc( reshape );
  glutIdleFunc( idle );

  init_gl();
  glFinish(); //Wait until all opengl commands are processed
  init_cl();

  glutMainLoop();

  destroy_gl();
  destroy_cl();
}


Global data
size_t const TEXTURE_WIDTH = 1024;
size_t const TEXTURE_HEIGHT = 1024;


OpenGL data
namespace gl
{
  GLuint positionBuffer = 0;
  GLuint texcoordBuffer = 0;
  GLuint texture = 0;
  GLuint vertexShader = 0;
  GLuint fragmentShader = 0;
  GLuint program = 0;
  GLuint vertexArray = 0;
  GLuint sampler = 0;
  
  float const positionData[] = { -1,-1,0, -1,1,0, 1,1,0, 1,-1,0 };
  float const texcoordData[] = { 0,0,   0,1,   1,1,   1,0   };
  
  char const * const vertexShaderSrc = "\
    #version 330 core \n\
    \n\
    in Vertex{ \n\
      vec3 position; \n\
      vec2 texcoord; \n\
    } vertex; \n\
    \n\
    out vec2 texcoord; \n\
    \n\
    void main() \n\
    { \n\
      texcoord = vertex.texcoord; \n\
      gl_Position = vec4(vertex.position, 1); \n\
    } \n\
  ";

  char const * const fragmentShaderSrc = "\
    #version 330 core \n\
    \n\
    uniform sampler2D texture; \n\
    \n\
    in vec2 texcoord; \n\
    \n\
    out vec4 fragColor; \n\
    \n\
    void main() \n\
    { \n\
      fragColor = texture2D( texture, texcoord ); \n\
    } \n\
  ";
  
  GLuint positionAttrib;
  GLuint texcoordAttrib;
  GLuint samplerLocation;
}


OpenCL data
namespace cl
{
  cl_device_type const DEVICE_TYPE = CL_DEVICE_TYPE_GPU;
  cl_platform_id platform = 0;
  cl_device_id device = 0;
  cl_context context = 0;
  cl_mem texture = 0;
  cl_command_queue command_queue = 0;
  cl_program program = 0;
  cl_kernel kernel = 0;
  
  char const * programSrc = "\
   float m1_1_to_0_1( float val ) \n\
  { \n\
    return (val + 1.f ) / 2.f; \n\
  } \n\
  \n\
  __kernel void writeTexture( __write_only image2d_t image, float t ) \n\
  { \n\
    int2 coordi = (int2)( get_global_id(0), get_global_id(1) ); \n\
    float2 coordf = convert_float2(coordi) / (float2)( get_global_size(0), get_global_size(1) ); \n\
    float4 color = (float4)( \n\
      m1_1_to_0_1 ( cos( (coordf.x+0.5f) * 3.14f * 10.f + t ) * sin( coordf.y * 3.14f * 10.f + t ) * tan( coordf.y * coordf.x * 3.14f * 10.f + t ) ), \n\
     1.f -  m1_1_to_0_1 ( cos( (coordf.x+0.5f) * 3.14f * 10.f + t ) * sin( coordf.y * 3.14f * 10.f + t ) * tan( coordf.y * coordf.x * 3.14f * 10.f + t ) ), \n\
      0.f, \n\
      1.f \n\
    ); \n\
    \n\
    write_imagef( \n\
      image, \n\
      coordi, \n\
      color \n\
    );\n\
  } \n\
  ";
  
  char const * build_options = "";
  
  size_t const GLOBAL_WORK_SIZE[2] = {TEXTURE_WIDTH, TEXTURE_HEIGHT};
}


OpenGL initialization
void init_gl()
{
// Buffers
  glGenBuffers(1, &gl::positionBuffer);
  glGenBuffers(1, &gl::texcoordBuffer);
  
  glBindBuffer( GL_ARRAY_BUFFER, gl::positionBuffer );
  glBufferData( GL_ARRAY_BUFFER, sizeof(gl::positionData), gl::positionData, GL_STATIC_DRAW );
  
  glBindBuffer( GL_ARRAY_BUFFER, gl::texcoordBuffer );
  glBufferData( GL_ARRAY_BUFFER, sizeof(gl::texcoordData), gl::texcoordData, GL_STATIC_DRAW );
 
  glBindBuffer( GL_ARRAY_BUFFER, 0 );
  
// Texture
  glActiveTexture( GL_TEXTURE0 );

  glGenTextures(1, &gl::texture);
  glBindTexture( GL_TEXTURE_2D, gl::texture );
  
  GLfloat * textureData = new GLfloat[TEXTURE_WIDTH * TEXTURE_HEIGHT];
  glTexImage2D( GL_TEXTURE_2D, 0, GL_RGBA, TEXTURE_WIDTH, TEXTURE_HEIGHT, 0, GL_RED, GL_FLOAT, textureData );
  delete textureData;
  glGenerateMipmap( GL_TEXTURE_2D );
  
  glBindTexture( GL_TEXTURE_2D, 0 );
  
// Sampler
  glGenSamplers(1, &gl::sampler );
  glSamplerParameterf( gl::sampler, GL_TEXTURE_MIN_FILTER, GL_NEAREST );
  glSamplerParameterf( gl::sampler, GL_TEXTURE_MAG_FILTER, GL_NEAREST );  
  
// Shaders
  gl::vertexShader = create_shader( GL_VERTEX_SHADER, gl::vertexShaderSrc );
  gl::fragmentShader = create_shader( GL_FRAGMENT_SHADER, gl::fragmentShaderSrc );
  
  gl::program = glCreateProgram();
  
  glAttachShader( gl::program, gl::vertexShader );
  glAttachShader( gl::program, gl::fragmentShader );
  
  glLinkProgram( gl::program );
  
  GLint status;
  glGetProgramiv( gl::program, GL_LINK_STATUS, &status ); 
  
  if( status == GL_FALSE)
  {
    GLint length;
    glGetProgramiv( gl::program, GL_INFO_LOG_LENGTH, &length );
    
    GLchar * infoLog = new GLchar[length];
    
    glGetProgramInfoLog( gl::program, length, NULL, infoLog);

    std::cerr << "shader link log : " << infoLog << std::endl;
  }
  
//Attribute
  gl::positionAttrib = glGetAttribLocation( gl::program, "Vertex.position" );
  gl::texcoordAttrib = glGetAttribLocation( gl::program, "Vertex.texcoord" );
  
// Uniform
  gl::samplerLocation = glGetUniformLocation( gl::program, "texture" );
  
  glUseProgram( gl::program );
  
  glUniform1i( gl::samplerLocation, 0 );
  
  glUseProgram( 0 );
  
// Vertex array
  glGenVertexArrays( 1, &gl::vertexArray );
   
  glBindVertexArray( gl::vertexArray );
  
  glBindBuffer( GL_ARRAY_BUFFER, gl::positionBuffer );
  glVertexAttribPointer( gl::positionAttrib, 3, GL_FLOAT, false, 0, 0);
  
  glBindBuffer( GL_ARRAY_BUFFER, gl::texcoordBuffer );
  glVertexAttribPointer( gl::texcoordAttrib, 2, GL_FLOAT, false, 0, 0);
  
  glEnableVertexAttribArray( gl::positionAttrib );
  glEnableVertexAttribArray( gl::texcoordAttrib );
  
  glBindBuffer( GL_ARRAY_BUFFER, 0 );
  
  glBindVertexArray( 0 );
  
  glFinish();
}


OpenCL platform/device filtering
We only want one platform/device supporting gl_sharing_khr.
bool use_this_cl_platform(cl_platform_id const & platform)
{
  size_t extensions_size;
  char * extensions;
  
  clGetPlatformInfo( platform, CL_PLATFORM_EXTENSIONS, 0, NULL, &extensions_size);
  extensions = new char[extensions_size];
  clGetPlatformInfo( platform, CL_PLATFORM_EXTENSIONS, extensions_size, extensions, NULL);
  
  std::string ext = extensions;
  bool use_this_platform = ext.find("cl_khr_gl_sharing") != std::string::npos;
  
  delete[] extensions;
  
  return use_this_platform;
}

bool use_this_cl_device(cl_device_id const & device)
{
  size_t exensions_size;
  char * extensions;

  clGetDeviceInfo( device, CL_DEVICE_EXTENSIONS, 0, NULL, &exensions_size);
  extensions = new char[exensions_size];
  clGetDeviceInfo( device, CL_DEVICE_EXTENSIONS, exensions_size, extensions, NULL);
  
  std::string ext = extensions;
  bool use_this_device = ext.find("cl_khr_gl_sharing") != std::string::npos;
  
  delete[] extensions;
  
  return use_this_device;
}


OpenCL initialization
void init_cl()
{
//Platforms
  cl_uint num_platforms;
  cl_platform_id * platforms;
    
  clGetPlatformIDs( 0, NULL, &num_platforms);
  
  platforms = new cl_platform_id[num_platforms];
   
  clGetPlatformIDs( num_platforms, platforms, NULL);
  
  for(size_t p=0; p<num_platforms; ++p)
  {
    bool use_this_platform = false;
    if( cl::platform == 0 && use_this_cl_platform( platforms[p] ) )
    {
      cl::platform = platforms[p];
      use_this_platform = true;
    }
    
//Devices
    cl_uint num_devices;
    cl_device_id *devices;
    clGetDeviceIDs( platforms[p], cl::DEVICE_TYPE, 0, NULL, &num_devices);
    devices = new cl_device_id[num_devices];
    clGetDeviceIDs( platforms[p], cl::DEVICE_TYPE, num_devices, devices, NULL);
    
    for(size_t d=0; d<num_devices; ++d)
    {    
      bool use_this_device = false;
      if( use_this_platform && cl::device == 0 && use_this_cl_device( devices[d] ) )
      {
         cl::device = devices[d];
         use_this_device = true;
      }
    }
    
    delete[] devices;
  }
  
  delete[] platforms;

  if( cl::platform == 0 )
  {
    std::cerr << "no cl platform found with  cl_khr_gl_sharing extension" << std::endl;
    exit( 1 );
  }
  
  if( cl::device == 0 )
  {
    std::cerr << "no cl device found with  cl_khr_gl_sharing extension" << std::endl;
    exit( 1 );
  }
  
//Context
#ifdef BUILD_UNIX
  Display * display = XOpenDisplay(NULL);
#endif

  cl_context_properties properties[] =
  {
    CL_CONTEXT_PLATFORM, reinterpret_cast<cl_context_properties>(cl::platform),
#ifdef BUILD_UNIX
    CL_GL_CONTEXT_KHR, reinterpret_cast<cl_context_properties>(glXGetCurrentContext()),
    CL_GLX_DISPLAY_KHR, reinterpret_cast<cl_context_properties>(display),
#endif
#ifdef BUILD_WINDOWS
    CL_GL_CONTEXT_KHR, reinterpret_cast<cl_context_properties>( wglGetCurrentContext() ),
    CL_WGL_HDC_KHR, reinterpret_cast<cl_context_properties>( wglGetCurrentDC() ),
#endif
    0
  };
  
  cl::context = clCreateContext( properties, 1, &cl::device, NULL, NULL, NULL);

#ifdef BUILD_UNIX
  XCloseDisplay(display);
#endif

//Texture 2D
  cl::texture = clCreateFromGLTexture2D ( cl::context, CL_MEM_WRITE_ONLY, GL_TEXTURE_2D, 0, gl::texture, NULL);
  
//Command queue
  cl::command_queue = clCreateCommandQueue( cl::context, cl::device, 0, NULL); 
   
//Program
  cl::program = clCreateProgramWithSource  ( cl::context, 1, &cl::programSrc, NULL, NULL);
    
  clBuildProgram ( cl::program, 1, &cl::device, cl::build_options, NULL, NULL );

  cl_build_status buildStatus;
  
  clGetProgramBuildInfo ( cl::program, cl::device, CL_PROGRAM_BUILD_STATUS, sizeof( cl_build_status ), &buildStatus, NULL);
  
  if( buildStatus != CL_BUILD_SUCCESS )
  {
    std::cerr << "Build status : " << buildStatus << std::endl;
    char * log = NULL;
    size_t logSize;
    clGetProgramBuildInfo ( cl::program, cl::device, CL_PROGRAM_BUILD_LOG, 0, NULL, &logSize);
    log = new char[logSize];
    clGetProgramBuildInfo ( cl::program, cl::device, CL_PROGRAM_BUILD_LOG, logSize, log, NULL);
    
    std::cerr << "Build log : " << log << std::endl;
    
    delete[] log;
  }
   
//Kernel
  cl::kernel = clCreateKernel( cl::program, "writeTexture", NULL);

  clSetKernelArg( cl::kernel, 0, sizeof( cl_mem), &cl::texture );
}



GPU resources destruction
It's really important to destroy data before exiting, otherwise the easyier way to free space on the GPU is to reboot the system.
void destroy_gl()
{
  glDeleteBuffers( 1, &gl::positionBuffer );
  glDeleteBuffers( 1, &gl::texcoordBuffer );
  glDeleteTextures( 1, &gl::texture );

  glDeleteShader( gl::vertexShader );
  glDeleteShader( gl::fragmentShader );
  glDeleteProgram( gl::program );
  
  glDeleteVertexArrays( 1, &gl::vertexArray );
 
  glDeleteSamplers( 1, &gl::sampler );
}

void destroy_cl()
{
  clReleaseMemObject( cl::texture );
  clReleaseKernel( cl::kernel );
  clReleaseProgram( cl::program );
  clReleaseCommandQueue( cl::command_queue );
  clReleaseContext( cl::context );
}


Display
void display()
{
  static float t = 0; // a "time" variable
  t += 0.01f;

//OPENCL
//Acquire shared objects
  clEnqueueAcquireGLObjects ( cl::command_queue, 1, &cl::texture, 0, NULL, NULL );

  clSetKernelArg( cl::kernel, 1, sizeof( cl_float ), &t );

//Run OpenCL
  clEnqueueNDRangeKernel(
    cl::command_queue,
    cl::kernel,
    2,
    NULL, cl::GLOBAL_WORK_SIZE, NULL,
    0, NULL, NULL
  );
 
//Release shared Objects
  clEnqueueReleaseGLObjects ( cl::command_queue, 1, &cl::texture, 0, NULL, NULL );
  
//Wait until all OpenCL commands are processed
  clFinish( cl::command_queue );

//OPENGL
  glClear( GL_COLOR_BUFFER_BIT );
  glUseProgram( gl::program );
  
  glActiveTexture( GL_TEXTURE0 + 0 );
  glBindTexture( GL_TEXTURE_2D, gl::texture );
  glBindSampler( 0, gl::sampler );
  
  glBindVertexArray( gl::vertexArray );
  glDrawArrays(GL_TRIANGLE_FAN, 0, 4);
  glBindVertexArray( 0 );
  
  glBindTexture( GL_TEXTURE_2D, 0 );
  
  glutSwapBuffers(); //After this all OpenGL commands are processed
}


Other functions
void idle()
{
  glutPostRedisplay();
}
void reshape(int width, int height)
{
  glViewport(0, 0, width, height);
}


Testing
  • on kubuntu 10.4 64bits with a nVidia 9800gt (256.35 drivers).
  • on windows vista 64 bits with nVidia gtx275