AH Formatter V5.0 can hyphenate text in more than 40 languages. There is no need to prepare a dictionary.
AH Formatter V5.0 supports hyphenation for the following languages.
Code (2-letter) | Code (3-letter) | Language | Characters |
---|---|---|---|
af | afr | Afrikaans | Latin characters and Apostrophe |
bg | bul | Bulgarian | Cyrillic characters |
ca | cat | Catalan | Latin characters and Apostrophe and Decimal point (Full stop or Middle dot) |
cs | ces | Czech | Latin characters |
cy | cym | Welsh | Latin characters and Apostrophe |
da | dan | Danish | Latin characters and Apostrophe |
de | deu | German / Swiss German | Latin characters and Apostrophe |
el | ell | Greek | Greek characters |
en | eng | English | Latin characters and Apostrophe |
en-US | eng-US | American | Latin characters and Apostrophe |
eo | epo | Esperanto | Latin characters |
es | spa | Spanish | Latin characters |
et | est | Estonian | Latin characters |
eu | eus | Basque | Latin characters |
fi | fin | Finnish | Latin characters |
fr | fra | French / Canadian French | Latin characters and Apostrophe |
ga | gle | Irish (Erse or Gaelic) | Latin characters and Apostrophe |
hr | hrv | Croatian | Cyrillic characters or Latin characters |
hu | hun | Hungarian | Latin characters |
id | ind | Indonesian | Latin characters and Apostrophe and Digit 2 |
is | isl | Icelandic | Latin characters |
it | ita | Italian | Latin characters and Apostrophe |
la | lat | Latin | Latin characters |
lt | lit | Lithuanian | Latin characters |
lv | lav | Latvian | Latin characters |
ms | msa | Bahasa Malay | Latin characters and Apostrophe and Digit 2 |
mt | mlt | Maltese | Latin characters and Apostrophe |
nl | nld | Dutch / Flemish | Latin characters and Apostrophe |
no | nor | Norwegian | Latin characters and Apostrophe |
pl | pol | Polish | Latin characters |
pt | por | Portuguese / Brazilian | Latin characters |
ro | ron | Romanian / Moldavian | Latin characters and Apostrophe |
ru | rus | Russian | Cyrillic characters |
sk | slk | Slovak | Latin characters and Apostrophe |
sl | slv | Slovenian | Latin characters and Apostrophe |
sr | srp | Serbian | Cyrillic characters or Latin characters |
sv | swe | Swedish | Latin characters and Apostrophe |
sw | swa | Swahili | Latin characters and Apostrophe |
th | tha | Thai | Thai characters |
tr | tur | Turkish | Latin characters |
uk | ukr | Ukrainian | Cyrillic characters |
To use Czech hyphenation, place the following in the FO file:
<fo:block hyphenate="true" language="ces">
Všichni lidé rodí se svobodní a sobě rovní co do důstojnosti a práv. Jsou nadáni rozumem a svědomím a mají spolu jednat v duchu bratrství.
</fo:block>
When a country code is specified, as in xml:lang="nl-BE", the country code is ignored for every combination other than "en-US".
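The rule above can be sketched as a small normalization step. This is only an illustration of the stated behavior, not the formatter's implementation: the function name `hyphenation_language` is hypothetical, and the handling of case and of the 3-letter form "eng-US" is an assumption based on the language table.

```python
def hyphenation_language(xml_lang: str) -> str:
    """Return the language key assumed to select hyphenation for an xml:lang value."""
    language, _, country = xml_lang.strip().partition("-")
    language = language.lower()
    country = country.upper()
    # Assumption: only English combined with US keeps its country code
    # (the table lists en-US / eng-US as a separate entry).
    if language in ("en", "eng") and country == "US":
        return f"{language}-US"
    # Every other country code is dropped.
    return language


print(hyphenation_language("nl-BE"))  # nl
print(hyphenation_language("en-US"))  # en-US
print(hyphenation_language("en-GB"))  # en
```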
AH Formatter V5.0 does not require you to prepare a dictionary. However, you may want to treat words that are hyphenated in an unexpected way as exceptions. In that case, you can register those words in the exception dictionary.
The exception dictionary is stored in the hyphenation folder in the AH Formatter V5.0 installation folder, or in the folder indicated by the AHF50_HYPDIC_PATH environment variable (AHF50_64_HYPDIC_PATH for the Windows x64 version). The name of the dictionary file conforms to the following rules, which are the same as for a TeX dictionary.
For example: de.xml, en_US.xml
The following table shows the content of the exception dictionary.
Element | Location | Description |
---|---|---|
<hyphenation-info> | root element | |
<hyphen-char> | child of <hyphenation-info> | Specifies a character that can be used in place of <hyphen/> inside the exceptions element. The character is given by the value attribute. The initial value is "-" (U+002D). |
<exceptions> | child of <hyphenation-info> | The data of the exception dictionary. The text of the exceptions element is a list of hyphenated words separated by white space. Hyphenation points are indicated by the hyphen element; the character specified by the hyphen-char element can also be used. |
<hyphen> | child of <exceptions> | A fully functional hyphen equivalent to TeX's \discretionary. The hyphen element has the pre, post and no attributes. The pre attribute gives the string inserted before the hyphenation character when a hyphenation break occurs. The post attribute gives the string inserted after the hyphenation character when a hyphenation break occurs. The no attribute gives the string that appears when no hyphenation break occurs. The hyphen element is used when the spelling changes at a hyphenation break. |
<non-eol-words> | child of <hyphenation-info> | Specifies words that should not end a line, separated by white space. The formatter tries to keep the words listed here from being placed at the end of a line, although in some cases this is unavoidable. This processing is always in effect, regardless of the hyphenate property in the FO. |
The DTD of the exception dictionary is simple, as follows:
<!ELEMENT hyphenation-info (hyphen-char?, exceptions?, non-eol-words?) >
<!ELEMENT hyphen-char EMPTY >
<!ATTLIST hyphen-char value CDATA #REQUIRED >
<!ELEMENT exceptions (#PCDATA|hyphen)* >
<!ELEMENT hyphen EMPTY >
<!ATTLIST hyphen pre CDATA #IMPLIED >
<!ATTLIST hyphen no CDATA #IMPLIED >
<!ATTLIST hyphen post CDATA #IMPLIED >
<!ELEMENT non-eol-words #PCDATA >
Suppose the following exception dictionary is prepared.
<hyphenation-info>
  <exceptions>
    ta-ble
    present
    ba<hyphen pre="k" no="c"/>ken
  </exceptions>
</hyphenation-info>
The word table can be hyphenated only as ta-ble, and the word present is never hyphenated. The word backen is hyphenated as bak-ken. In this example, ta<hyphen/>ble is exactly equivalent to ta-ble.
The hyphen element can also specify hyphenation that changes the spelling of the word.
Settings for Exception Dictionary | Word | Hyphenation |
---|---|---|
ab<hyphen/>def | abdef | ab-def |
ab<hyphen no="c"/>def | abcdef | ab-def |
ab<hyphen pre="x"/>def | abdef | abx-def |
ab<hyphen pre="x" no="c"/>def | abcdef | abx-def |
ab<hyphen post="z"/>def | abdef | ab-zdef |
ab<hyphen no="c" post="z"/>def | abcdef | ab-zdef |
ab<hyphen pre="x" post="z"/>def | abdef | abx-zdef |
ab<hyphen pre="x" no="c" post="z"/>def | abcdef | abx-zdef |
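The semantics of the pre, no and post attributes can be checked with a short script. The following is a minimal sketch, not the formatter's own parser: the names `read_exceptions` and `EXAMPLE` are hypothetical, and the broken form is always shown with "-" here, whereas the character used in real output is determined by the FO.

```python
import xml.etree.ElementTree as ET

EXAMPLE = """<hyphenation-info>
  <exceptions>
    ta-ble
    present
    ba<hyphen pre="k" no="c"/>ken
  </exceptions>
</hyphenation-info>"""


def read_exceptions(xml_text):
    """Return {unbroken word: form shown when every listed break is taken}."""
    root = ET.fromstring(xml_text)
    char_el = root.find("hyphen-char")
    hyphen_char = char_el.get("value", "-") if char_el is not None else "-"
    exceptions = root.find("exceptions")
    result = {}
    if exceptions is None:
        return result

    # Flatten the mixed content into text chunks and <hyphen> elements.
    stream = [exceptions.text or ""]
    for child in exceptions:
        stream.append(child)
        stream.append(child.tail or "")

    unbroken, broken = [], []

    def flush():
        # White space ends an entry; store it and start a new one.
        if unbroken or broken:
            result["".join(unbroken)] = "".join(broken)
        unbroken.clear()
        broken.clear()

    for item in stream:
        if isinstance(item, str):
            for ch in item:
                if ch.isspace():
                    flush()
                elif ch == hyphen_char:
                    broken.append("-")  # plain break point, spelling unchanged
                else:
                    unbroken.append(ch)
                    broken.append(ch)
        elif item.tag == "hyphen":
            # "no" appears only without a break; "pre" and "post" only with one.
            unbroken.append(item.get("no", ""))
            broken.append(item.get("pre", "") + "-" + item.get("post", ""))
    flush()
    return result


print(read_exceptions(EXAMPLE))
# {'table': 'ta-ble', 'present': 'present', 'backen': 'bak-ken'}
```

An entry whose two forms are identical, such as present, contains no break point and is therefore never hyphenated.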
It is also possible to hyphenate using a TeX dictionary with AH Formatter V5.0. To hyphenate with a TeX dictionary, specify HyphenationOption="false" in the Option Setting File. A dictionary is required for each language that needs hyphenation. Dictionaries are XML files in the same format as those used by FOP. See also the Apache website. Only the hyphenation dictionary for English (en.xml) is ready and provided with XSL Formatter V4.0.
Hyphenation dictionaries are stored in the "hyphenation" folder where AH Formatter V5.0 is installed. The file name of a hyphenation dictionary follows the rules shown below.
For example: de.xml, en_GB.xml
The 3-letter language code in the FO is converted to the 2-letter language code automatically. When a country code is also specified in the language setting (GB with en in this example), the hyphenation dictionary en_GB.xml is searched for first; if it is not found, the hyphenation dictionary en.xml is searched for. In the latter case the country code is ignored.
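The lookup order described above can be sketched as follows. This is a hedged illustration only: the names `candidate_names`, `find_dictionary` and `THREE_TO_TWO` are hypothetical, and the mapping contains just a few rows copied from the language table.

```python
from pathlib import Path
from typing import List, Optional

# A few 3-letter to 2-letter codes taken from the language table; extend as needed.
THREE_TO_TWO = {"ces": "cs", "deu": "de", "eng": "en", "fra": "fr", "nld": "nl"}


def candidate_names(language: str, country: Optional[str] = None) -> List[str]:
    """List dictionary file names in the order they would be tried."""
    lang = THREE_TO_TWO.get(language.lower(), language.lower())
    names = []
    if country:
        names.append(f"{lang}_{country.upper()}.xml")  # country-specific file first
    names.append(f"{lang}.xml")  # language-only fallback
    return names


def find_dictionary(folder: Path, language: str,
                    country: Optional[str] = None) -> Optional[Path]:
    """Return the first candidate that exists in the folder, or None."""
    for name in candidate_names(language, country):
        path = folder / name
        if path.is_file():
            return path
    return None


print(candidate_names("eng", "GB"))  # ['en_GB.xml', 'en.xml']
```

Passing the "hyphenation" folder of an installation as `folder` would return en.xml when no en_GB.xml is present, which matches the fallback described in the text.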
The contents of a hyphenation dictionary are defined in hyphenation.dtd, which is included in the FOP distribution. In AH Formatter V5.0, it is installed in the hyphenation folder where AH Formatter V5.0 is installed. Below is a brief explanation of the DTD. Refer to hyphenation.dtd for more details.
Element | Location | Description |
---|---|---|
<hyphenation-info> | root element | |
<hyphen-char> | child of <hyphenation-info> | Specifies the hyphenation character used in the exception dictionary data. The character is given by the value attribute. The initial value is "-" (U+002D). The hyphenation character in the actual formatted result, however, is given by the hyphenation-character property in the XSL specification. |
<hyphen-min> | child of <hyphenation-info> | The before and after attributes give the minimum number of characters that must appear before and after the hyphenation character when a hyphenation break occurs. The before attribute corresponds to the XSL hyphenation-remain-character-count property, and the after attribute corresponds to the XSL hyphenation-push-character-count property. AH Formatter V5.0 uses these properties, and the hyphen-min element in the dictionary is ignored. |
<classes> | child of <hyphenation-info> | Defines character equivalence classes. The text of the classes element is a white-space-separated list of character groups; all characters in a group are treated as equivalent. In practice, each group consists of the lowercase and uppercase forms of a character. The following is a sample from the English dictionary (en.xml): aA bB cC dD eE fF gG hH iI jJ kK lL mM nN oO pP qQ rR sS tT uU vV wW xX yY zZ |
<patterns> | child of <hyphenation-info> | The hyphenation patterns, separated by spaces. A pattern consists of characters and digits. Each character is the first character of a group in classes (normally the lowercase form). Digits between characters indicate the strength of the hyphenation potential (hyphenation value). |
<exceptions> | child of <hyphenation-info> | The data of the hyphenation exception dictionary. The text of the exceptions element is a space-separated list of hyphenated words. A hyphen is indicated by the hyphen element, but the character defined in the hyphen-char element can also be used. The exceptions element is used when the hyphenation points determined by the hyphenation patterns are not appropriate, or when you want to use special hyphenation of your own. |
<hyphen> | child of <exceptions> | A fully functional hyphen equivalent to TeX's \discretionary. The hyphen element has the pre, post and no attributes. The pre attribute gives the string inserted before the hyphenation character when a hyphenation break occurs. The post attribute gives the string inserted after the hyphenation character when a hyphenation break occurs. The no attribute gives the string that appears when no hyphenation break occurs. The hyphen element is used when the spelling changes at a hyphenation break. |
If the text is placed in a narrow region and a single word is hyphenated at more than one point, the result sometimes does not follow the exception dictionary.
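How the digits in hyphenation patterns translate into break points can be illustrated with a small sketch. It assumes the conventional TeX interpretation (Liang's algorithm), in which the highest value at each position wins and an odd value permits a break; whether AH Formatter applies patterns in exactly this way is an assumption. The function names and the three toy patterns are made up for illustration and are not taken from en.xml. The `remain` and `push` parameters stand in for hyphenation-remain-character-count and hyphenation-push-character-count.

```python
import re


def parse_pattern(pattern):
    """Split a pattern such as 'a1b' into its letters and inter-letter values."""
    letters = re.sub(r"\d", "", pattern)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            values[pos] = int(ch)  # value sits before the letter at index pos
        else:
            pos += 1
    return letters, values


def hyphenate(word, patterns, remain=2, push=2):
    """Insert '-' at every position whose highest pattern value is odd."""
    parsed = dict(parse_pattern(p) for p in patterns)
    text = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(text) + 1)   # scores[i] applies before text[i]
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            values = parsed.get(text[start:end])
            if values:
                for offset, value in enumerate(values):
                    scores[start + offset] = max(scores[start + offset], value)
    pieces, last = [], 0
    # A break before word[k] leaves k characters before and len(word) - k after.
    for k in range(remain, len(word) - push + 1):
        if scores[k + 1] % 2 == 1:
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return "-".join(pieces)


TOY_PATTERNS = ["a1b", "b1l", "ab2l"]  # "ab2l" overrides the break proposed by "b1l"
print(hyphenate("table", TOY_PATTERNS))            # ta-ble
print(hyphenate("table", TOY_PATTERNS, remain=3))  # table
```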