Input Method Editor Help



This is an editor for Input Method keymaps for the systems Yudit (Unicode editor), GATE Unicode Toolkit and EUDICO / ELAN (created at MPI Nijmegen, NL). The first version of the Input Method Editor was written by Eric Auer during an internship at MPI Nijmegen in 2003.


GETTING STARTED:


You can load, merge, edit and save keymaps in the different formats, and you can use GATE Input Methods directly, for example to edit the glyphs column of the main tables or to pick glyphs for the glyph palette. Glyphs in the glyph palette can be sent to the lower table, the clipboard or (if applicable) the parent frame from which you started this editor.




INTRODUCTION:


To know more about the abilities and limitations of the various file formats (and implementations!), read the text files that come with this distribution. You should also find a copy of the GPL and LGPL licenses (see www.gnu.org). These licenses mean that the package must either contain the Java source code (at least of the Input Method Editor in the LGPL case, or of the whole package in the GPL case) or contain instructions on where to get it (at no additional cost). Which license applies to which files is stated in each file: The files from GATE use the LGPL license, most of the other files use the GPL license.


Using the sources and javadoc, you can generate detailed technical documentation on the IM editor and GATE IM (in HTML format). The GATE IM is part of the Gate Unicode Kit (GUK) and distributed under the LGPL license. The other Input Method Editor files may be distributed under the GPL license. The initial version by Eric Auer is the result of an internship at the MPI for Psycholinguistics in Nijmegen (NL).


Please use tool tips (bubble help) to learn about interaction components: keep the mouse pointer hovering over a menu item or other input field, or use the appropriate hotkey for your system, and you will see a short explanation of that item.


USER INTERFACE:


The main user interface is the central editing area, which consists of two tables. The upper table has exactly one row for each 16-bit Unicode character. You can select which of the rows should be visible by using the three Unicode menus in the top menu bar. The first menu offers a special selection, "Show USED glyphs": in this mode, the whole range of Unicode is scanned and only glyphs which have a key sequence (or comment) assigned will be shown.


In normal mode, all glyphs in the ranges that you select will be visible (so some used glyphs may be invisible!). Initially, some basic ASCII characters are enabled, but no ranges (turn the ASCII range on and off again to disable displaying of those basic ASCII characters). Ranges are sorted by glyph number (from \u0000 to \uffff) and detected by some Java function, so newer Java versions will give more detailed menus. If there are very many characters / glyphs in one range, you will be able to select sub ranges by using a sub menu. Whenever you activate some range, the corresponding glyph number will be copied to the scroll to field. Just click it and press enter to scroll to the first glyph of the newly activated range. You can enter a glyph number, \u1234 style escape or a glyph into the field at any time to try to scroll to that glyph. If the glyph is in a currently invisible row, scrolling will be to the best possible point.
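The range detection described above can be sketched as follows. This is only an illustration: it assumes the "Java function" is Character.UnicodeBlock.of, which may or may not be what the editor actually calls, and the class name UnicodeRanges and its Range type are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Input Method Editor sources.
final class UnicodeRanges {
    /** One contiguous run of 16-bit code points that share a Unicode block. */
    static final class Range {
        final int first;
        final int last;
        final Character.UnicodeBlock block; // null if Java knows no block here

        Range(int first, int last, Character.UnicodeBlock block) {
            this.first = first;
            this.last = last;
            this.block = block;
        }
    }

    /** Scans \u0000 to \uffff and returns the ranges in glyph-number order. */
    static List<Range> scan() {
        List<Range> ranges = new ArrayList<>();
        int start = 0;
        Character.UnicodeBlock current = Character.UnicodeBlock.of((char) 0);
        for (int cp = 1; cp <= 0xFFFF; cp++) {
            Character.UnicodeBlock b = Character.UnicodeBlock.of((char) cp);
            if (b != current) {
                ranges.add(new Range(start, cp - 1, current));
                start = cp;
                current = b;
            }
        }
        ranges.add(new Range(start, 0xFFFF, current));
        return ranges;
    }

    public static void main(String[] args) {
        for (Range r : scan()) {
            System.out.printf("\\u%04x - \\u%04x  %s%n", r.first, r.last, r.block);
        }
    }
}
```

Because the block table is built into the Java runtime, the number of ranges printed depends on the Java version, which matches the remark that newer versions give more detailed menus.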


The lower table allows you to edit the glyphs column. You can use that to define key sequences that produce multiple glyphs, or to assign different key sequences to equal glyphs (or glyph strings), which will happen for example during file merge. In the upper table you have the special feature of color-coded Unicode character ranges - which should make it easier to find a place again while scrolling - and smart bubble help. The bubble help will show the range and number of a glyph when you leave the mouse pointer over the glyph for a while (this only happens if the table cell contains exactly one glyph). Both tables will try to highlight keystroke escapes in the key sequence column. However, Java cannot display HTML and Unicode in JLabel fields at the same time, so highlighting will be invisible when non-ASCII key strokes are found in a cell.


When you export to a GIM file and activate that file (usually by mentioning it in the im.list file of the GATE Input Method system and then restarting Java), you will be able to use the Input Method which is defined by that file while using the Input Method Editor. You have a color-coded menu bar at the left which you can use to enable some GATE IM as the new keyboard mapping for the tables and the scroll to field (other menu components are set up to use the default keyboard driver, so you do not have to worry about typing strange glyphs while working with the menus). You can assign ten of the available "locales" (keyboard mappings in our case) to the buttons, using the File menu, or activate a locale directly from the file menu. The none button (also available in the activate sub menu of the file menu) allows you to turn off the GATE IM at any time. The virtual keyboard which is displayed by the GATE IM will sometimes be visible or invisible at inappropriate times. You can tell it to become (in)visible by clicking the virtual keyboard menu item in the file menu, which does not necessarily show the current status, but only sends commands to the GATE IM about the desired status of the virtual keyboard.


A very useful feature is the glyph preview and palette system: Whichever glyph (row) you select in the upper table, it will be shown in a bigger font on a button below the locale selection buttons on the left. You can either click that button directly, which will add that glyph to the palette under the tables, or click one of the four tiny buttons below it. The tiny buttons (see the tool tips to know which does what) allow you to copy the active glyph to the clipboard, to the left column of the lower table, or to the window that has invoked the IM Editor (if the IM Editor was started from another Java application), or to scroll the upper table so that the glyph will be in the top visible row. Normally, you will want to collect glyphs in the palette first and use them later, to find places in the table again or to send special glyphs which you do not have on your keyboard mapping yet to table, clipboard or caller. When you use glyphs from the palette, the drop down list near the palette determines what happens when you click a palette glyph. When the palette runs full, the oldest entries are dropped. Glyph button and palette also show useful tool tips (bubble help) when you hover your mouse pointer above them.


The main purpose of the tables is to represent and edit a mapping from key sequences to glyphs (or glyph strings). Usually the key sequences are plain text, but the GIM file format also allows you to have key stroke sequences like abA-cC-d, which for example means typing a, b, Alt-c and Control-d in sequence. Both GIM and "human readable" export formats will use that syntax, but you cannot import the human readable formats. They are only for dumping data in text or HTML format, for manual reading or for pasting into articles. For each mapping, you can give a comment, but only the human readable and Yudit formats will happily save comments. In GIM, you can export comments but not import them into the comment column: imported comments will be preserved, but will not stay attached to any particular row of the table - so you cannot see or edit them after importing.
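To make the abA-cC-d example concrete, here is a minimal sketch of how such a sequence could be split into key strokes. It assumes that A- and C- are the only prefixes and that each applies to exactly one following character. The real syntax accepted by the editor and by GATE IM may have further variants and escaping rules that this sketch ignores, and the class name KeySequenceSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the editor's own parser.
final class KeySequenceSketch {
    /** Splits e.g. "abA-cC-d" into ["a", "b", "Alt-c", "Control-d"]. */
    static List<String> parse(String seq) {
        List<String> strokes = new ArrayList<>();
        int i = 0;
        while (i < seq.length()) {
            char c = seq.charAt(i);
            // Assumption: a modifier prefix is 'A' or 'C', then '-', then one key.
            boolean prefixed = (c == 'A' || c == 'C')
                    && i + 2 < seq.length()
                    && seq.charAt(i + 1) == '-';
            if (prefixed) {
                String modifier = (c == 'A') ? "Alt-" : "Control-";
                strokes.add(modifier + seq.charAt(i + 2));
                i += 3;
            } else {
                strokes.add(String.valueOf(c));
                i++;
            }
        }
        return strokes;
    }

    public static void main(String[] args) {
        System.out.println(parse("abA-cC-d"));
    }
}
```

Note that under these assumptions a literal capital A followed by a minus sign cannot be expressed, so a real parser needs some escape mechanism for that case.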


The U8 format is very simplistic and allows neither Control or Alt nor comments. However, it saves disk space and can encode mappings that contain choice lists (that is, when you type a key sequence, there are several candidates for the result, and you can select one using the cursor, digit or space keys). Usage of choice lists is in early development in the Input Method Editor! In GATE IM, you have limited support (there can be strings of different length but with the same start leading to different results, but you should not have several key sequences which are exactly the same leading to different results). I am planning to add automatic spreading when you export into other file formats: After sorting by key sequence, all key sequences which are the same are made different by adding a digit (or a character, if we run out of digits) to them. As a result, each key sequence will have only one output, making life easy for GATE IM and the like but maybe a bit harder for you.
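Since the automatic spreading is only planned, the following is just one possible reading of the idea, not the editor's behaviour. It assumes the suffixes are the digits 0 to 9 followed by the letters a to z, and it does not check whether a newly built key sequence collides with one that already exists. The class name SpreadDuplicates is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the planned "spreading" step.
final class SpreadDuplicates {
    // Assumption: digits first, then lower-case letters once digits run out.
    static final String SUFFIXES = "0123456789abcdefghijklmnopqrstuvwxyz";

    /** Each entry is {keySequence, output}; returns sorted entries with unique keys. */
    static List<String[]> spread(List<String[]> entries) {
        List<String[]> sorted = new ArrayList<>(entries);
        // List.sort is stable, so equal keys keep their original relative order.
        sorted.sort(Comparator.comparing((String[] e) -> e[0]));
        List<String[]> result = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            int j = i;
            while (j < sorted.size() && sorted.get(j)[0].equals(sorted.get(i)[0])) {
                j++;
            }
            int groupSize = j - i;
            if (groupSize > SUFFIXES.length()) {
                throw new IllegalArgumentException("too many candidates for " + sorted.get(i)[0]);
            }
            for (int k = i; k < j; k++) {
                String key = sorted.get(k)[0];
                if (groupSize > 1) {
                    key += SUFFIXES.charAt(k - i); // only duplicated keys get a suffix
                }
                result.add(new String[] {key, sorted.get(k)[1]});
            }
            i = j;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> in = new ArrayList<>();
        in.add(new String[] {"ni", "\u4f60"});
        in.add(new String[] {"a", "x"});
        in.add(new String[] {"ni", "\u5c3c"});
        for (String[] e : spread(in)) {
            System.out.println(e[0] + " -> " + e[1]);
        }
    }
}
```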


One of the more important features is the ability to load, merge and save the current table contents from and into files of various formats. You can select the file format independently from the file name. The encoding, however, is fixed: You can export into UTF-8 or ISO-8859-1 text (for pasting into your documents), into HTML (with non-ASCII characters either as \u12a4 escapes or as HTML 4.0 &#1234; entities), but you cannot import those special formats. You can import and export the better-defined file formats, however: GIM (optionally with the probably useless XGIM extension, see below) for the free GATE IM system, Yudit for the free self-contained Unicode editor (self-contained means that the rest of your system does not need to be Unicode enabled), and U8 which is used by IM implementations like the one which the MPI media annotation tool ELAN / EUDICO uses (it can use GATE IM as well). GIM and Yudit both use ASCII / ISO-8859-1 encoding.
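The two ways of writing non-ASCII characters in the HTML export can be illustrated with a short sketch. It only shows the two notations (a backslash-u escape with four hex digits, and a decimal numeric character reference). It is not the editor's export code, it does not escape HTML special characters such as < or &, and the class name HtmlEscapes is hypothetical.

```java
// Hypothetical helper illustrating the two escape notations.
final class HtmlEscapes {
    /** Replaces every non-ASCII char by a backslash-u escape with four hex digits. */
    static String toUnicodeEscapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    /** Replaces every non-ASCII char by a decimal numeric character reference. */
    static String toHtmlEntities(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append("&#").append((int) c).append(';');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sample = "a\u04d2";
        System.out.println(toUnicodeEscapes(sample));
        System.out.println(toHtmlEntities(sample));
    }
}
```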


U8 is good for mappings with huge numbers of glyphs and with choice lists, for example Chinese, but has some other disadvantages: You cannot add comments to .u8 files and you have to use a UTF-8 enabled editor (like the Editor.java that comes with the GATE Unicode Kit) to view and edit it. Yudit is a very friendly format (easy comments, lots of ways to escape characters that you cannot or do not want to have in the .kmap file). However, only GIM allows you to specify the text that should be sent separately from the text that should be shown on the virtual keyboard (neither Yudit nor U8 uses virtual keyboards; talking about special features, Yudit has a "draw that CJK glyph and I will type it for you" function). And - probably more important - only GIM allows you to have key sequences with Control and Alt in addition to the normal modifiers (usually only Shift) involved. The GIM file format .gim seems to be a subset of a data format that is used by the Computing Research Labs of the New Mexico State University: Several of the GIM files that come with the GATE IM mention this place in the comments, and GATE IM itself does not make use of all possible features that GIM files can encode.


A final general remark: When you merge a file, you may get duplicate mappings. You can postprocess the files to fix that. However, when you have one key sequence leading to several output strings (or the other way round) which are more than just copies of one string, this effect will probably be useful. Further, merging a new file into the editor will overwrite all un-displayable data, like headers and those comments that are not assigned to particular key / string mappings. You can use a simple grep to fetch the "loose" comments from GIM or Yudit files in order to combine the comments of several files: Comments are lines that start with # or // in GIM, and anything between // (which is not escaped, so you do not need to worry about that) and the end of the affected line in Yudit. In Yudit, whitespace is encoded by giving the space as a glyph number, and un-encoded spaces are ignored on import. This means that you simply write / / if you mean // as data, as opposed to // which starts a comment. The trouble is that Yudit allows quotes to be escaped as \", so be careful when writing your own parsers. Standard C style parsers are optimal for Yudit files. See the warning about "multiplying" Yudit files below: Files where the first assignment contains + signs but no = sign are in that special format. This Input Method Editor only handles normal Yudit files.
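If grep is not at hand, the same comment extraction for GIM files can be sketched in Java. The sketch assumes that a comment marker must be the very first thing on the line (no leading whitespace) and reads the files as ISO-8859-1. The class name LooseComments is hypothetical, and Yudit files are not handled because their comments can start in the middle of a line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects GIM comment lines so they can be merged by hand.
final class LooseComments {
    /** Returns the lines that start with # or //, in their original order. */
    static List<String> extract(List<String> lines) {
        List<String> comments = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith("#") || line.startsWith("//")) {
                comments.add(line);
            }
        }
        return comments;
    }

    /** Usage: java LooseComments first.gim second.gim ... */
    public static void main(String[] args) throws IOException {
        for (String path : args) {
            List<String> lines = Files.readAllLines(Paths.get(path), StandardCharsets.ISO_8859_1);
            for (String comment : extract(lines)) {
                System.out.println(comment);
            }
        }
    }
}
```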




BUGS:


Be warned that Java 1.3 for Unix is likely to crash when using the clipboard menu items. Using Ctrl-C and Ctrl-V is okay, though. Keyboard shortcuts might not be available in the scroll to field and the main tables, depending on which Input Method you are using. You may have to toggle the "Show USED glyphs" setting, scroll or resize the tables, if the updated contents are not displayed correctly after a load or merge operation. The file selection window is kind of simplistic in Java 1.3, but okay in Java 1.4 - however, the user can create directories and rename files from there, which cannot be blocked easily. Further, you may have to toggle the "show virtual keyboard" and "block IM" settings to synchronize the IM editor settings with the actual status of the GATE IM.


HINTS, LIMITATIONS, PLANS:


The GATE Input Method files (*.gim) are only parsed partially by GATE itself. Headers are ignored and only one half of a digit assignment is processed. The idea with digit assignments is that a future GATE Input Method version would have a toggle keystroke to toggle between national and ASCII digits. The Input Method Editor splits such statements during import, so that the normal digit key will give the ASCII digit and pressing Control d before a digit key will give the national digit.


GATE Input Methods use a file "im.list" as source of the locale names for the GIM files, while the Input Method Editor lets you edit some header in the files: You have to update im.list by hand for current GATE Input Method versions that do not parse that header but only use the im.list file.


Text files (UTF-8 or plain) are meant for human readers and may fail to re-import. Use Yudit or GATE GIM files if you want to process the exported data further (or use it with Yudit or GATE, of course).


Even in GIM, you cannot have keycaps that differ from the strings to be sent (this means the virtual keyboard looks are less flexible than what GATE IM would allow - maybe storing the looks in the comment column during editing would help). This is a limitation of the Input Method Editor only, not of the GATE IM implementation!


Yudit allows a format where several sections are "multiplied" with each other. You will get unusable results when you try to import or export such files (currently, Yudit only uses this feature for Hangul and for a keymap where you can type Unicode glyphs as "U1234"). I can write a Perl program to convert from "multiplied" to standard Yudit keymap format (the *.kmap text format; you cannot use the binary variant with the IM editor).


There are currently only UK and US keyboard layouts available. Others, like German, are planned. Contact me if you have wishes or suggestions (Eric Auer, or contact the MPI). You cannot use AltGr yet - some US / UK keyboards treat it as Ctrl-Alt, but GATE IM normally only allows either Alt or Ctrl to select a "shift layer". Because the IM editor accepts more syntactic variants for "Alt", it can probably be used to fix broken GIM files by importing and exporting them.


You cannot use any special keystrokes with Yudit or U8 files: Those files describe a translation from text to text, not (like GIM) from keystrokes to text. Arbitrary keystrokes can be described in the XGIM language extension (see below), but this is probably not going to be processed by any Input Method system. It is unlikely that XGIM will become the official file format of any IM implementation.


GATE IM does not allow in-line comments in GIM files, while the IM editor does in some cases during import. The editor will export the comments as separate comment lines to be safe. It will also store and export imported GIM headers, but does not use them. If you need to edit GIM headers, simply use a text editor to edit the GIM file. You may have to restart Java to activate the changes, and you do have to restart Java to add new GIM files to the GATE IM collection (edit "im.list" for that).


U8 files may not contain any comments, so when you save to U8, you will lose all comments. Yudit allows very flexible comments, but not GIM headers. All GIM headers will become comments when you export to Yudit, and cannot be re-imported from Yudit. Importing and exporting files will cause all comments that do not belong to a line to end up at the top of the file. Note that using GIM format means that no comments belong to any line. The order of the comments is, however, preserved. They will just not stay between data lines.




APPENDIX: Possible escapes for the key sequence fields:



KeyStroke description strings have the form:


<modifiers>* (<typedID> | <pressedReleasedID>)
modifiers := shift | control | meta | alt | button1 | button2 | button3
typedID := typed <typedKey>
typedKey := single unicode character
pressedReleasedID := (pressed | released)? key
key := KeyEvent keycode name, i.e. the name following "VK_".
(x | y)? means at most one of (x or y)
a* means any number (even zero) of items from a
a | b | c means a or b or c


Read the Java KeyEvent and KeyStroke documentation for more details.
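As a quick way to try out such description strings, the standard javax.swing.KeyStroke.getKeyStroke(String) method parses them. The small class below is only a test harness written for this help text (the name KeyStrokeDemo and the sample strings are made up), and the set of accepted modifier words depends on your Java version.

```java
import javax.swing.KeyStroke;

// Hypothetical harness for experimenting with KeyStroke description strings.
final class KeyStrokeDemo {
    /** Returns the parsed KeyStroke; Swing documents a null result for malformed input. */
    static KeyStroke parse(String description) {
        return KeyStroke.getKeyStroke(description);
    }

    public static void main(String[] args) {
        String[] samples = {
            "typed a",           // typedID: a single Unicode character
            "control D",         // modifier plus key name (VK_D), "pressed" implied
            "alt pressed C",     // explicit "pressed"
            "shift released F1"  // key release with a modifier
        };
        for (String s : samples) {
            System.out.println(s + " -> " + parse(s));
        }
    }
}
```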



Last updated: Nijmegen (NL), March 2003, by Eric Auer.
Licensing information updated February 2004.