Dear Zufar and Kirill,
let me summarise the options for calling native code from Java on MOE.
You essentially have two options: JNI and Nat/J.
Unlike RoboVM’s Bro, Nat/J is a layer on top of JNI, not a parallel solution. The reason for this design decision is that we wanted Nat/J to work with any JNI-compatible VM, not just the one used by MOE. For example, we use Nat/J with the Oracle VM in the device launcher and in the Nat/J Generator (to bind libclang to Java).
This means that, in general, Nat/J offers no shortcut around JNI. If you already have JNI code (e.g. for use on Android), it should work with the same performance characteristics on iOS with MOE, since the very same VM code (ART) runs on both platforms.
We also need to keep in mind that ART includes a lot of advanced features, e.g. a compacting, concurrent GC, which make things like “just give me the address of this object” impossible (or at least very dangerous).
One thing you can try is enabling “fast JNI” mode for specific, performance-critical JNI functions. This is an ART extension, so it should work both on Android and on iOS with MOE. It is something of a hidden feature of ART; here is the comment from the ART source code (jni_internal.cc):
> // Notes about fast JNI calls:
> //
> // On a normal JNI call, the calling thread usually transitions
> // from the kRunnable state to the kNative state. But if the
> // called native function needs to access any Java object, it
> // will have to transition back to the kRunnable state.
> //
> // There is a cost to this double transition. For a JNI call
> // that should be quick, this cost may dominate the call cost.
> //
> // On a fast JNI call, the calling thread avoids this double
> // transition by not transitioning from kRunnable to kNative and
> // stays in the kRunnable state.
> //
> // There are risks to using a fast JNI call because it can delay
> // a response to a thread suspension request which is typically
> // used for a GC root scanning, etc. If a fast JNI call takes a
> // long time, it could cause longer thread suspension latency
> // and GC pauses.
> //
> // Thus, fast JNI should be used with care. It should be used
> // for a JNI call that takes a short amount of time (eg. no
> // long-running loop) and does not block (eg. no locks, I/O,
> // etc.)
> //
> // A '!' prefix in the signature in the JNINativeMethod
> // indicates that it's a fast JNI call and the runtime omits the
> // thread state transition from kRunnable to kNative at the
> // entry.
When we tested fast JNI mode, we measured up to a 10x performance increase with short JNI functions, so it might be worth a try for you. But keep in mind that this is very platform-specific and should be used with care to avoid unwanted side effects.
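As an illustration, here is a minimal sketch of where the '!' prefix goes. The class, method, and library names are made up for the example; the prefix itself is applied on the native side, in the JNINativeMethod table passed to RegisterNatives, which is the mechanism the ART comment above refers to:

```java
// Hypothetical example: a short, non-blocking native method that is a
// good candidate for fast JNI (no locks, no I/O, no long-running loop).
final class FastMath {
    // In a real app you would load the native library first, e.g.:
    //   static { System.loadLibrary("fastmath"); }
    static native int sumInts(int[] values, int count);
}

// The '!' prefix is added on the native (C) side, shown here as a comment:
//
//   static const JNINativeMethod gMethods[] = {
//       // '!' before the signature "([II)I" marks the call as fast JNI
//       { "sumInts", "!([II)I", (void *) FastMath_sumInts },
//   };
//   (*env)->RegisterNatives(env, clazz, gMethods, 1);
```

On a VM that does not understand the prefix, registration with "!(…)…" will fail, so you would need to strip the prefix on such platforms.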
Another option is to use direct NIO buffers (possibly allocated from native code) instead of Java arrays. This can avoid GC overhead completely.
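For example (a self-contained sketch; sizes and offsets are arbitrary): a direct buffer's storage lives outside the Java heap, so the GC never moves it and native code can read and write it through a raw pointer, without copying:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class DirectBufferDemo {
    public static void main(String[] args) {
        // 4 KB allocated outside the Java heap; native byte order so the
        // native side can read the values without byte swapping.
        ByteBuffer buf = ByteBuffer.allocateDirect(4096)
                                   .order(ByteOrder.nativeOrder());
        buf.putInt(0, 42); // write at an absolute offset

        // A native function receiving this buffer would call
        // GetDirectBufferAddress(env, buffer) to obtain the raw pointer.
        System.out.println(buf.isDirect()); // true
        System.out.println(buf.getInt(0));  // 42
    }
}
```

The trade-off is that direct buffers are more expensive to allocate than arrays, so they work best when allocated once and reused.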
Finally, we are not opposed to adding platform-specific optimizations to Nat/J or the MOE runtime. We just always want to make sure that there is an alternative code path that works on other Java runtimes.
For example, we might introduce a @FastJNI annotation in the future that would enable fast JNI mode on the ART VM and do nothing on other VMs.
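No such annotation exists yet; purely as a sketch (all names here are hypothetical), it could look like this:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker annotation: on ART, the runtime (or a build step)
// would register the annotated native method with the '!' signature
// prefix; on other VMs the annotation would simply be ignored.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface FastJNI {}

final class Checksum {
    @FastJNI // short and non-blocking: a safe fast JNI candidate
    static native int crc32(java.nio.ByteBuffer data, int length);
}
```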
Best Regards,
Gergely