The Cygnus Native Interface for C++/Java Integration

Writing native Java methods in natural C++

Cygnus Solutions
March, 2000

This documents CNI, the Cygnus Native Interface, which is a convenient way to write Java native methods using C++. This is a more efficient, more convenient, but less portable alternative to the standard JNI (Java Native Interface).

Basic Concepts

In terms of language features, Java is mostly a subset of C++. Java has a few important extensions, plus a powerful standard class library, but on the whole that does not change the basic similarity. Java is a hybrid object-oriented language, with a few native types, in addition to class types. It is class-based, where a class may have static as well as per-object fields, and static as well as instance methods. Non-static methods may be virtual, and may be overloaded. Overloading is resolved at compile time by matching the actual argument types against the parameter types. Virtual methods are implemented using indirect calls through a dispatch table (virtual function table). Objects are allocated on the heap, and initialized using a constructor method. Classes are organized in a package hierarchy.

All of the listed attributes are also true of C++, though C++ has extra features (for example in C++ objects may be allocated not just on the heap, but also statically or in a local stack frame). Because gcj uses the same compiler technology as g++ (the GNU C++ compiler), it is possible to make the intersection of the two languages use the same ABI (object representation and calling conventions). The key idea in CNI is that Java objects are C++ objects, and all Java classes are C++ classes (but not the other way around). So the most important task in integrating Java and C++ is to remove gratuitous incompatibilities.

You write CNI code as a regular C++ source file. (You do have to use a Java/CNI-aware C++ compiler, specifically a recent version of G++.)
You start with:

    #include <gcj/cni.h>

You then include header files for the various Java classes you need to use:

    #include <java/lang/Character.h>
    #include <java/util/Date.h>
    #include <java/lang/IndexOutOfBoundsException.h>

These header files are automatically generated by gcjh.

In general, CNI functions and macros start with the `Jv' prefix, for example the function `JvNewObjectArray'. This convention is used to avoid conflicts with other libraries. Internal functions in CNI start with the prefix `_Jv_'. You should not call these; if you find a need to, let us know and we will try to come up with an alternate solution. (This manual lists _Jv_AllocBytes as an example; CNI should instead provide a JvAllocBytes function.)

Packages

The only global names in Java are class names, and packages. A package can contain zero or more classes, and also zero or more sub-packages. Every class belongs to either an unnamed package or a package that has a hierarchical and globally unique name.

A Java package is mapped to a C++ namespace. The Java class java.lang.String is in the package java.lang, which is a sub-package of java. The C++ equivalent is the class java::lang::String, which is in the namespace java::lang, which is in the namespace java.

Here is how you could express this:

    // Declare the class(es), possibly in a header file:
    namespace java {
      namespace lang {
        class Object;
        class String;
        ...
      }
    }

    class java::lang::String : public java::lang::Object
    {
      ...
    };

The gcjh tool automatically generates the necessary namespace declarations.

Nested classes as a substitute for namespaces

It is not that long since g++ got complete namespace support, and it was very recent (end of February 1999) that libgcj was changed to use namespaces. Releases before then used nested classes, which are the C++ equivalent of Java inner classes. They provide similar (though less convenient) functionality.
The old syntax is:

    class java {
      class lang {
        class Object;
        class String;
      };
    };

The obvious difference is the use of class instead of namespace. The more important difference is that all the members of a nested class have to be declared inside the parent class definition, while namespaces can be defined in multiple places in the source. Namespaces are therefore more convenient, since they correspond more closely to how Java packages are defined. The main difference is in the declarations; the syntax for using a nested class is the same as with namespaces:

    class java::lang::String : public java::lang::Object
    { ... };

Note that the generated code (including name mangling) using nested classes is the same as that using namespaces.

Leaving out package names

Having to always type the fully-qualified class name is verbose. It also makes it more difficult to change the package containing a class. The Java package declaration specifies that the following class declarations are in the named package, without having to explicitly name the full package qualifiers. The package declaration can be followed by zero or more import declarations, which allow either a single class or all the classes in a package to be named by a simple identifier. C++ provides something similar with the using declaration and directive.

A Java simple-type-import declaration:

    import PackageName.TypeName;

allows using TypeName as a shorthand for PackageName.TypeName. The C++ (more-or-less) equivalent is a using-declaration:

    using PackageName::TypeName;

A Java import-on-demand declaration:

    import PackageName.*;

allows using TypeName as a shorthand for PackageName.TypeName. The C++ (more-or-less) equivalent is a using-directive:

    using namespace PackageName;

Primitive types

Java provides 8 primitive types: byte, short, int, long, float, double, char, and boolean. These are the same as the following C++ typedefs (which are defined by gcj/cni.h): jbyte, jshort, jint, jlong, jfloat, jdouble, jchar, and jboolean.
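The size guarantees attached to these typedefs can be checked at compile time. The following is a standalone sketch and not the contents of gcj/cni.h: the `sketch` namespace, the `bits` helper, and the choice of <cstdint> types are assumptions made for illustration, and the real header may define the typedefs differently. It only illustrates why a fixed-width name such as jint is a safer choice than plain int.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical stand-ins for the CNI typedefs. The real definitions
// come from gcj/cni.h and may be written differently.
namespace sketch {

typedef std::int8_t   jbyte;     // Java byte:   8-bit signed integer
typedef std::int16_t  jshort;    // Java short:  16-bit signed integer
typedef std::int32_t  jint;      // Java int:    32-bit signed integer
typedef std::int64_t  jlong;     // Java long:   64-bit signed integer
typedef float         jfloat;    // Java float:  32-bit floating point
typedef double        jdouble;   // Java double: 64-bit floating point
typedef std::uint16_t jchar;     // Java char:   16-bit character
typedef bool          jboolean;  // Java boolean

// Width of a type in bits, usable in constant expressions.
template <class T>
constexpr int bits() { return static_cast<int>(sizeof(T) * CHAR_BIT); }

// Plain int is only required to be at least 16 bits wide, so the
// widths are asserted on the typedefs rather than on int itself.
static_assert(bits<jbyte>()   == 8,  "jbyte must be 8 bits");
static_assert(bits<jshort>()  == 16, "jshort must be 16 bits");
static_assert(bits<jint>()    == 32, "jint must be 32 bits");
static_assert(bits<jlong>()   == 64, "jlong must be 64 bits");
static_assert(bits<jchar>()   == 16, "jchar must be 16 bits");
static_assert(bits<jfloat>()  == 32, "jfloat must be 32 bits");
static_assert(bits<jdouble>() == 64, "jdouble must be 64 bits");

}  // namespace sketch
```

If a platform's types do not have these widths, the sketch fails to compile instead of silently producing a different object layout.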
You should use the C++ typenames (e.g. jint), and not the Java type names (e.g. int), even if they are the same. This is because there is no guarantee that the C++ type int is a 32-bit type, but jint is guaranteed to be a 32-bit type.

    Java type   C/C++ typename   Description
    byte        jbyte            8-bit signed integer
    short       jshort           16-bit signed integer
    int         jint             32-bit signed integer
    long        jlong            64-bit signed integer
    float       jfloat           32-bit IEEE floating-point number
    double      jdouble          64-bit IEEE floating-point number
    char        jchar            16-bit Unicode character
    boolean     jboolean         logical (Boolean) values
    void        void             no value

    JvPrimClass (primtype)

This is a macro whose argument should be the name of a primitive type, e.g. byte. The macro expands to a pointer to the Class object corresponding to the primitive type. E.g., JvPrimClass(void) has the same value as the Java expression Void.TYPE (or void.class).

Objects and Classes

Classes

All Java classes are derived from java.lang.Object. C++ does not have a unique root class, but we use a C++ java::lang::Object as the C++ version of the java.lang.Object Java class. All other Java classes are mapped into corresponding C++ classes derived from java::lang::Object.

Interface inheritance (the implements keyword) is currently not reflected in the C++ mapping.

Object references

We implement a Java object reference as a pointer to the start of the referenced object. It maps to a C++ pointer. (We cannot use C++ references for Java references, since once a C++ reference has been initialized, you cannot change it to point to another object.) The null Java reference maps to the NULL C++ pointer.

Note that in some Java implementations an object reference is implemented as a pointer to a two-word handle. One word of the handle points to the fields of the object, while the other points to a method table. Gcj does not use this extra indirection.

Object fields

Each object contains an object header, followed by the instance fields of the class, in order.
The object header consists of a single pointer to a dispatch or virtual function table. (There may be extra fields in front of the object, for example for memory management, but this is invisible to the application, and the reference to the object points to the dispatch table pointer.)

The fields are laid out in the same order, alignment, and size as in C++. Specifically, 8-bit and 16-bit native types (byte, short, char, and boolean) are not widened to 32 bits. Note that the Java VM does extend 8-bit and 16-bit types to 32 bits when on the VM stack or temporary registers.

If you include the gcjh-generated header for a class, you can access fields of Java classes in the natural way. Given the following Java class:

    public class Int
    {
      public int i;
      public Int (int i) { this.i = i; }
      public static Int zero = new Int(0);
    }

you can write:

    #include <gcj/cni.h>
    #include <Int.h>

    Int*
    mult (Int *p, jint k)
    {
      if (k == 0)
        return Int::zero;  // static member access.
      return new Int(p->i * k);
    }

CNI does not strictly enforce the Java access specifiers, because Java permissions cannot be directly mapped into C++ permissions. Private Java fields and methods are mapped to private C++ fields and methods, but other fields and methods are mapped to public fields and methods.

Arrays

While in many ways Java is similar to C and C++, it is quite different in its treatment of arrays. C arrays are based on the idea of pointer arithmetic, which would be incompatible with Java's security requirements. Java arrays are true objects (array types inherit from java.lang.Object). An array-valued variable is one that contains a reference (pointer) to an array object.
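To make the idea of an array as a true object concrete, here is a standalone sketch that compiles without gcj. Everything in the `sketch` namespace is hypothetical: `Object` stands in for java::lang::Object, and `Array` keeps its elements in a std::vector purely for simplicity, whereas the JArray template that CNI provides stores its elements inline after the length field. The bounds check in `at` models the checking Java performs on array accesses.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

namespace sketch {

// Hypothetical stand-in for java::lang::Object.
struct Object {
    virtual ~Object() {}
};

// An array is itself an object: it derives from Object and carries
// its own length, unlike a bare C array.
template <class T>
class Array : public Object {
    std::vector<T> data_;  // assumption: real CNI arrays store elements inline

    static std::size_t checked(int n) {
        if (n < 0) throw std::length_error("negative array size");
        return static_cast<std::size_t>(n);
    }

public:
    const int length;

    // Elements are value-initialized (zero for numeric types), matching
    // Java's default initialization of array elements.
    explicit Array(int n) : data_(checked(n)), length(n) {}

    // Bounds-checked element access.
    T& at(int i) {
        if (i < 0 || i >= length)
            throw std::out_of_range("array index out of bounds");
        return data_[static_cast<std::size_t>(i)];
    }
};

// An array-valued variable is a pointer to the array object.
inline long sum(Array<int>* a) {
    long total = 0;
    for (int i = 0; i < a->length; ++i) total += a->at(i);
    return total;
}

}  // namespace sketch
```

Code that receives a `sketch::Array<int>*` can read the `length` field directly and never needs a separate size argument, which is the property that distinguishes Java arrays from C arrays.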
Referencing a Java array in C++ code is done using the JArray template, which is defined as follows:

    class __JArray : public java::lang::Object
    {
    public:
      int length;
    };

    template<class T>
    class JArray : public __JArray
    {
      T data[0];
    public:
      T& operator[](jint i) { return data[i]; }
    };

    template<class T> T *elements (JArray<T> &array)

This template function can be used to get a pointer to the elements of the array. For instance, you can fetch a pointer to the integers that make up an int[] like so:

    extern jintArray foo;
    jint *intp = elements (foo);

The name of this function may change in the future.

There are a number of typedefs which correspond to typedefs from JNI. Each is the type of an array holding objects of the appropriate type:

    typedef __JArray *jarray;
    typedef JArray<jobject> *jobjectArray;
    typedef JArray<jboolean> *jbooleanArray;
    typedef JArray<jbyte> *jbyteArray;
    typedef JArray<jchar> *jcharArray;
    typedef JArray<jshort> *jshortArray;
    typedef JArray<jint> *jintArray;
    typedef JArray<jlong> *jlongArray;
    typedef JArray<jfloat> *jfloatArray;
    typedef JArray<jdouble> *jdoubleArray;

You can create an array of objects using this function:

    jobjectArray JvNewObjectArray (jint length, jclass klass, jobject init)

Here klass is the type of elements of the array; init is the initial value to be put into every slot in the array.

For each primitive type there is a function which can be used to create a new array holding that type. The name of the function is of the form `JvNew<Type>Array', where `<Type>' is the name of the primitive type, with its initial letter in upper-case. For instance, `JvNewBooleanArray' can be used to create a new array of booleans. Each such function follows this example:

    jbooleanArray JvNewBooleanArray (jint length)

    jsize JvGetArrayLength (jarray array)

Returns the length of array.

Methods

Java methods are mapped directly into C++ methods. The header files generated by gcjh include the appropriate method definitions.
Basically, the generated methods have the same names and corresponding types as the Java methods, and are called in the natural manner.

Overloading

Both Java and C++ provide method overloading, where multiple methods in a class have the same name, and the correct one is chosen (at compile time) depending on the argument types. The rules for choosing the correct method are (as expected) more complicated in C++ than in Java, but given a set of overloaded methods generated by gcjh the C++ compiler will choose the expected one.

Common assemblers and linkers are not aware of C++ overloading, so the standard implementation strategy is to encode the parameter types of a method into its assembly-level name. This encoding is called mangling, and the encoded name is the mangled name. The same mechanism is used to implement Java overloading. For C++/Java interoperability, it is important that both the Java and C++ compilers use the same encoding scheme.

Static methods

Static Java methods are invoked in CNI using the standard C++ syntax, using the `::' operator rather than the `.' operator. For example:

    jint i = java::lang::Math::round((jfloat) 2.3);

Defining a static native method uses standard C++ method definition syntax. For example:

    #include <java/lang/Integer.h>

    java::lang::Integer*
    java::lang::Integer::getInteger(jstring str)
    {
      ...
    }

Object Constructors

Constructors are called implicitly as part of object allocation using the new operator. For example:

    java::lang::Integer *x = new java::lang::Integer(234);

Java does not allow a constructor to be a native method. Instead, you could define a private native method which you can have the constructor call.

Instance methods

Virtual method dispatch is handled essentially the same way in C++ and Java, i.e. by doing an indirect call through a function pointer stored in a per-class virtual function table. C++ is more complicated because it has to support multiple inheritance, but this does not affect Java classes.
However, G++ has historically used a different calling convention that is not compatible with the one used by gcj. During 1999, G++ will switch to a new ABI that is compatible with gcj. Some platforms (including Linux) have already changed. On other platforms, you will have to pass the -fvtable-thunks flag to g++ when compiling CNI code. Note that you must also compile your C++ source code with -fno-rtti.

Calling a Java instance method in CNI is done using the standard C++ syntax. For example:

    java::lang::Number *x;
    if (x->doubleValue() > 0.0) ...

Defining a Java native instance method is also done the natural way:

    #include <java/lang/Integer.h>

    jdouble
    java::lang::Integer::doubleValue()
    {
      return (jdouble) value;
    }

Interface method calls

In Java you can call a method using an interface reference. This is not yet supported in CNI.

Object allocation

New Java objects are allocated using a class-instance-creation-expression:

    new Type ( arguments )

The same syntax is used in C++. The main difference is that C++ objects have to be explicitly deleted; in Java they are automatically deleted by the garbage collector. Using CNI, you can allocate a new object using standard C++ syntax. The C++ compiler is smart enough to realize the class is a Java class, and hence it needs to allocate memory from the garbage collector. If you have overloaded constructors, the compiler will choose the correct one using standard C++ overload resolution rules. For example:

    java::util::Hashtable *ht = new java::util::Hashtable(120);

    void *_Jv_AllocBytes (jsize size)

Allocate size bytes. This memory is not scanned by the garbage collector. However, it will be freed by the GC if no references to it are discovered.

Interfaces

A Java class can implement zero or more interfaces, in addition to inheriting from a single base class. An interface is a collection of constants and method specifications; it is similar to the signatures available as a G++ extension.
An interface provides a subset of the functionality of C++ abstract virtual base classes, but they are currently implemented differently. CNI does not currently provide any support for interfaces, or calling methods from an interface pointer. This is partly because we are planning to re-do how interfaces are implemented in gcj.

Strings

CNI provides a number of utility functions for working with Java String objects. The names and interfaces are analogous to those of JNI.

    jstring JvNewString (const jchar *chars, jsize len)

Creates a new Java String object, where chars are the contents, and len is the number of characters.

    jstring JvNewStringLatin1 (const char *bytes, jsize len)

Creates a new Java String object, where bytes are the Latin-1 encoded characters, and len is the length of bytes, in bytes.

    jstring JvNewStringLatin1 (const char *bytes)

Like the first JvNewStringLatin1, but computes len using strlen.

    jstring JvNewStringUTF (const char *bytes)

Creates a new Java String object, where bytes are the UTF-8 encoded characters of the string, terminated by a null byte.

    jchar *JvGetStringChars (jstring str)

Returns a pointer to the array of characters which make up a string.

    int JvGetStringUTFLength (jstring str)

Returns the number of bytes required to encode the contents of str as UTF-8.

    jsize JvGetStringUTFRegion (jstring str, jsize start, jsize len, char *buf)

This puts the UTF-8 encoding of a region of the string str into the buffer buf. The region of the string to fetch is specified by start and len. It is assumed that buf is big enough to hold the result. Note that buf is not null-terminated.

Class Initialization

Java requires that each class be automatically initialized at the time of the first active use. Initializing a class involves initializing the static fields, running code in class initializer methods, and initializing base classes. There may also be some implementation specific actions, such as allocating String objects corresponding to string literals in the code.
The Gcj compiler inserts calls to JvInitClass (actually _Jv_InitClass) at appropriate places to ensure that a class is initialized when required. The C++ compiler does not insert these calls automatically; it is the programmer's responsibility to make sure classes are initialized. However, this is fairly painless because of the conventions assumed by the Java system.

First, libgcj will make sure a class is initialized before an instance of that class is created. This is one of the responsibilities of the new operation. This is taken care of both in Java code, and in C++ code. (When the G++ compiler sees a new of a Java class, it will call a routine in libgcj to allocate the object, and that routine will take care of initializing the class.) It follows that you can access an instance field, or call an instance (non-static) method and be safe in the knowledge that the class and all of its base classes have been initialized.

Invoking a static method is also safe. This is because the Java compiler adds code to the start of a static method to make sure the class is initialized. However, the C++ compiler does not add this extra code. Hence, if you write a native static method using CNI, you are responsible for calling JvInitClass before doing anything else in the method (unless you are sure it is safe to leave it out).

Accessing a static field also requires the class of the field to be initialized. The Java compiler will generate code to call _Jv_InitClass before getting or setting the field. However, the C++ compiler will not generate this extra code, so it is your responsibility to make sure the class is initialized before you access a static field.

Exception Handling

While C++ and Java share a common exception handling framework, things are not yet perfectly integrated. The main issue is that the run-time type information facilities of the two languages are not integrated.

Still, things work fairly well.
You can throw a Java exception from C++ using the ordinary throw construct, and this exception can be caught by Java code. Similarly, you can catch an exception thrown from Java using the C++ catch construct.

Note that currently you cannot mix C++ catches and Java catches in a single C++ translation unit. We do intend to fix this eventually.

Here is an example:

    if (i >= count)
      throw new java::lang::IndexOutOfBoundsException();

Synchronization

Each Java object has an implicit monitor. The Java VM uses the instruction monitorenter to acquire and lock a monitor, and monitorexit to release it. The JNI has corresponding methods MonitorEnter and MonitorExit. The corresponding CNI macros are JvMonitorEnter and JvMonitorExit.

The Java source language does not provide direct access to these primitives. Instead, there is a synchronized statement that does an implicit monitorenter before entry to the block, and does a monitorexit on exit from the block. Note that the lock has to be released even if the block is abnormally terminated by an exception, which means there is an implicit try-finally.

From C++, it makes sense to use a destructor to release a lock. CNI defines the following utility class.

    class JvSynchronize
    {
      jobject obj;
    public:
      JvSynchronize(jobject o) { obj = o; JvMonitorEnter(o); }
      ~JvSynchronize() { JvMonitorExit(obj); }
    };

The equivalent of Java's:

    synchronized (OBJ) { CODE; }

can be simply expressed:

    { JvSynchronize dummy(OBJ); CODE; }

Java also has methods with the synchronized attribute. This is equivalent to wrapping the entire method body in a synchronized statement. (Alternatively, an implementation could require the caller to do the synchronization. This is not practical for a compiler, because each virtual method call would have to test at run-time if synchronization is needed.)
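The destructor-based locking described above can be tried without gcj. The following standalone sketch is an analogy, not CNI code: `sketch::Monitor` and `sketch::Synchronize` are hypothetical names, and std::recursive_mutex is assumed as a stand-in for a Java monitor because Java monitors may be re-entered by the thread that already holds them. The `depth` counter exists only so that the behaviour can be observed.

```cpp
#include <cassert>
#include <mutex>

namespace sketch {

// Hypothetical stand-in for a Java object and its implicit monitor.
struct Monitor {
    std::recursive_mutex lock;
    int depth = 0;  // number of times the holder has entered the monitor
};

// Same shape as JvSynchronize: acquire in the constructor, release in
// the destructor, so the monitor is released on every exit path.
class Synchronize {
    Monitor& m_;

public:
    explicit Synchronize(Monitor& m) : m_(m) {
        m_.lock.lock();
        ++m_.depth;
    }
    ~Synchronize() {
        --m_.depth;
        m_.lock.unlock();
    }
    // Copying a guard would release the monitor twice.
    Synchronize(const Synchronize&) = delete;
    Synchronize& operator=(const Synchronize&) = delete;
};

// Throws while holding the monitor. Stack unwinding runs the guard's
// destructor, which plays the role of Java's implicit try-finally.
inline void fail_inside(Monitor& m) {
    Synchronize guard(m);
    throw 1;
}

}  // namespace sketch
```

After `fail_inside` throws, the monitor's `depth` is back at zero, which is the property that makes the destructor approach a fit for the synchronized statement.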
Since in gcj the synchronized attribute is handled by the method implementation, it is up to the programmer of a synchronized native method to handle the synchronization (in the C++ implementation of the method). In other words, you need to manually add JvSynchronize in a native synchronized method.

Reflection

The types jfieldID and jmethodID are as in JNI.

The functions JvFromReflectedField, JvFromReflectedMethod, JvToReflectedField, and JvToReflectedMethod (as in Java 2 JNI) will be added shortly, as will other functions corresponding to JNI.

Using gcjh

The gcjh program is used to generate C++ header files from Java class files. By default, gcjh generates a relatively straightforward C++ header file. However, there are a few caveats to its use, and a few options which can be used to change how it operates:

--classpath path
--CLASSPATH path
-I dir
    These options can be used to set the class path for gcjh. Gcjh searches the class path the same way the compiler does; these options have their familiar meanings.

-d directory
    Puts the generated .h files beneath directory.

-o file
    Sets the name of the .h file to be generated. By default the .h file is named after the class. This option only really makes sense if just a single class file is specified.

--verbose
    gcjh will print information to stderr as it works.

-M
-MM
-MD
-MMD
    These options can be used to generate dependency information for the generated header file. They work the same way as the corresponding compiler options.

-prepend text
    This causes the text to be put into the generated header just after class declarations (but before declaration of the current class). This option should be used with caution.

-friend text
    This causes the text to be put into the class declaration after a friend keyword. This can be used to declare some other class or function to be a friend of this class. This option should be used with caution.

-add text
    The text is inserted into the class declaration.
    This option should be used with caution.

-append text
    The text is inserted into the header file after the class declaration. One use for this is to generate inline functions. This option should be used with caution.

All other options not beginning with a - are treated as the names of classes for which headers should be generated.

gcjh will generate all the required namespace declarations and #include directives for the header file. In some situations, gcjh will generate simple inline member functions. Note that, while gcjh puts #pragma interface in the generated header file, you should not put #pragma implementation into your C++ source file. If you do, duplicate definitions of inline functions will sometimes be created, leading to link-time errors.

There are a few cases where gcjh will fail to work properly:

gcjh assumes that all the methods and fields of a class have ASCII names. The C++ compiler cannot correctly handle non-ASCII identifiers. gcjh does not currently diagnose this problem.

gcjh also cannot fully handle classes where a field and a method have the same name. If the field is static, an error will result. Otherwise, the field will be renamed in the generated header; `__' will be appended to the field name.

Eventually we hope to change the C++ compiler so that these restrictions can be lifted.