Sunday, December 13, 2020

Aksharamukha: An Open Source Transliteration Tool

You have probably used Google's transliteration service or one of the many others. There is a new open source transliteration tool in town called Aksharamukha. Its main objective is to transliterate languages, primarily those of South Asia and South East Asia, between their own scripts and romanized forms. It currently supports 85 languages. You can use it online or install it as a Python package.

To use it as an online tool, go to its website, select the source language and the target language, and you are good to go. If you want to use it on your own computer, you can install it as a Python package. Run the following command:

pip3 install aksharamukha
Now start python3 and type 

from aksharamukha import transliterate
transliterate.process('HK', 'Telugu', 'buddhaH')

Your transliterated word will now appear in the console. That's it.

Typesetting in Tamil Brahmi in XeLaTeX

Recently I found that Tamil Brahmi, the precursor of the modern Tamil script, is included in the Unicode Standard, encoded in the range 11000-1107F. Since it is in Unicode, I wondered whether I could typeset it in LaTeX. This is my experiment. First we need fonts for Brahmi. Fortunately there is a font called Adinatha Tamil Brahmi, which comes with a font manual, and another font from the Google Noto fonts called Noto Sans Brahmi.
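Since the block is a fixed range, a quick check can tell whether a piece of text really consists of Brahmi code points before it goes into a LaTeX file. Here is a minimal Python sketch of that idea. The function names are my own; only the range 11000-1107F comes from the text above.

```python
# Range of the Brahmi block as quoted above (hexadecimal code points).
BRAHMI_START = 0x11000
BRAHMI_END = 0x1107F


def is_brahmi(ch: str) -> bool:
    """Return True if the single character `ch` lies in the Brahmi block."""
    return BRAHMI_START <= ord(ch) <= BRAHMI_END


def non_brahmi_chars(text: str) -> list:
    """List the characters of `text` that are neither whitespace nor Brahmi."""
    return [ch for ch in text if not ch.isspace() and not is_brahmi(ch)]
```

An empty list from non_brahmi_chars means every visible character of the text falls inside the block, which is a cheap way to spot a stray Tamil or Latin letter that slipped through.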

I checked whether I could type freely using IBus, but unfortunately you can only enter characters one at a time, which would be very time consuming. Luckily there is a transliteration tool called Aksharamukha from the authors of the font. To transliterate, set the base to Tamil and the output to Tamil Brahmi and you are good to go. Don't set the base to Tamil Extended; it doesn't work. Remember that Aksharamukha is not a translation tool. Type your text, transliterate it, and copy the Tamil Brahmi letters.
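If you installed the Python package, the same step can probably be scripted instead of copying from the website. The sketch below assumes that the package names the two scripts 'Tamil' and 'TamilBrahmi'; I have not confirmed those strings here, so check them against Aksharamukha's script list. The helper transliterate_lines and its backend parameter are made up for illustration. Only the call transliterate.process(source, target, text) is taken from the Aksharamukha example on this page.

```python
from typing import Callable, Iterable, List, Optional

# A transliteration backend takes (source, target, text) and returns text.
Backend = Callable[[str, str, str], str]


def transliterate_lines(
    lines: Iterable[str],
    source: str = "Tamil",
    target: str = "TamilBrahmi",  # assumed script name, not verified
    backend: Optional[Backend] = None,
) -> List[str]:
    """Transliterate each line from `source` to `target`, one call per line."""
    if backend is None:
        # Imported lazily so the helper can be tried with a stand-in backend
        # on a machine where aksharamukha is not installed.
        from aksharamukha import transliterate
        backend = transliterate.process
    return [backend(source, target, line) for line in lines]


# Typical use, with the package installed:
#     for line in transliterate_lines(["வணக்கம்"]):
#         print(line)
```

Passing the backend as a parameter is only a convenience: it keeps the helper usable with any function of the same shape and makes it easy to try out without the package.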

Download and install the font, then run the following commands:
sudo fc-cache -v
sudo mkfontscale
sudo mkdir  
If you are using LuaLaTeX, run the following command so that it can find the font:
luaotfload-tool --find "Adinatha Tamil Brahmi"
The font is now available, and you can typeset in either XeLaTeX or LuaLaTeX. Since neither Polyglossia nor Babel supports Tamil Brahmi natively, we turn to the fontspec package. Here are the font declarations from the minimal working example:

\newfontfamily{\TACtam}{Lohit Tamil}
\newfontfamily{\bram}{Adinatha Tamil Brahmi}
Here is the screenshot.  

Image showing the word hello in three languages

Well, my document compiled perfectly. Once again, kudos to Vinodh Rajan, Shriramana Sharma and Udhaya Sankar for the font, and to Vinodh Rajan for maintaining the Aksharamukha transliteration tool.


Saturday, December 12, 2020

Some Essential Differences in Writing a Letter in Tamil using XeLaTeX

Typesetting a Tamil letter in XeLaTeX is essentially the same as typesetting a letter in English, but there are some significant differences. I will list them below.

1. Load the Polyglossia or Babel Package

To typeset in Tamil, you have to load either the polyglossia or the babel package.
For polyglossia, add the following to your preamble.

\usepackage{polyglossia}
\setmainlanguage{tamil}
% if you need English
\setotherlanguage{english}
% if you need English then set the font
\newfontfamily\englishfont{Times New Roman}[Scale=MatchLowercase,Renderer=Harfbuzz,Ligatures=TeX]
Or, if you are using the babel package, add the following to your preamble:

\usepackage{babel}
\babelprovide[main, import]{tamil}
% if you need English
\babelprovide[import]{english}
% if you need English then set the font
\babelfont[english]{rm}{Times New Roman}

2. Date is Messed up. 

If you let the date print automatically, it comes out in year-month-date format, whereas most people use either dd/mm/yyyy or month date, year. So you have to set the date manually. To do this, load the datetime package and give the date in one of these forms:
\date{dd/mm/yyyy}   or
\date{month date, year} 

3. You Can't use \cc or \encl command  

Unless you create your own class and style, it is impossible to use the native \cc and \encl commands of the letter class. However, we can do it manually. Since CC and enclosures are placed after the signature, we typeset them by hand.
In Tamil, cc is typed as நகல் and enclosures as இணைப்புகள். You can use the following commands after the signature in the letter.

நகல்:{\par\hspace{10mm}{1. அ \par\hspace{10mm} 2. ஆ}}
இணைப்புகள்:{\par\hspace{10mm}{1. அ \par\hspace{10mm} 2. ஆ}}

Positioning an image at the top of the page in LaTeX

To position an image at the top of the page, begin the page with that image and load the following packages:
  • seqsplit
  • textpos
  • graphicx   
The image is placed inside a text block, which follows the syntax

\textblockorigin{x}{y} % co-ordinates of the origin
\begin{textblock*}{width}(x,y) % block width, then co-ordinates
Now type the following code in your document

        \end{textblock*} ~\\[3em]

Miscellaneous Tamil Symbols in LaTeX

Apart from the Tamil fractions and numerals, there are other miscellaneous symbols. These can be typeset in XeLaTeX or LuaLaTeX. Make sure you have installed the Noto Sans Tamil Supplement font. To insert a Unicode character in, say, TeXstudio, press CTRL+ALT+U, enter the Unicode number in the dialog box and press Enter. I will mention the Unicode code point and its corresponding character. Where a symbol can be typed in IBus [using the Tamil99 layout], I'll mention the keystrokes in the first column.
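If your editor has no such dialog box, the code points listed in the tables can also be turned into characters with a few lines of Python and pasted into the document. This is only a sketch with function names of my own choosing. The name lookup depends on the Unicode version that ships with your Python, so it falls back to a placeholder for characters it does not know.

```python
import unicodedata


def char_from_hex(code_point: str) -> str:
    """Turn a hexadecimal code point such as '11FD5' into its character."""
    return chr(int(code_point, 16))


def describe(code_point: str) -> str:
    """Return 'U+XXXX <character> <Unicode name>' for a hexadecimal code point."""
    ch = char_from_hex(code_point)
    # Older Python versions do not know the newer blocks, hence the default.
    name = unicodedata.name(ch, "(name not known to this Python)")
    return "U+" + code_point.upper() + " " + ch + " " + name


# Print a few of the code points from the tables, ready to copy.
for cp in ["0BF3", "11FD5", "11FE0"]:
    print(describe(cp))
```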

Measures of grain

11FD5    𑿕    நெல்             1 grain of paddy
11FD6    𑿖    செவிட்டு         360 grains of paddy
11FD7    𑿗    ஆழாக்கு          1,800 grains of paddy
11FD8    𑿘    உழக்கு           3,600 grains of paddy
-        -    உரி (2 உழக்கு)   7,200 grains of paddy
11FD9    𑿙    மூவுழக்கு        10,800 grains of paddy
0BF3     ௳    படி              14,400 grains of paddy
11FDA    𑿚    குருனி*          115,200 grains of paddy
11FDB    𑿛    பதக்கு           230,400 grains of paddy
11FDC    𑿜    முக்குருனி*      345,600 grains of paddy
* Can also be called marakkaal (மரக்கால்).
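The measures build on one another, so the grain counts can be rebuilt from a chain of small multipliers. In the sketch below only "uri = 2 uzhakku" is stated in the table itself. The other factors are my own reading of the numbers (for example 1,800 / 360 = 5), and the romanized unit names are just labels for this example.

```python
# (unit, factor, base unit): one `unit` equals `factor` times `base unit`.
# Factors are inferred from the grain counts in the table, not quoted from it.
RATIOS = [
    ("cevitu", 360, "nel"),
    ("aazhaakku", 5, "cevitu"),
    ("uzhakku", 2, "aazhaakku"),
    ("uri", 2, "uzhakku"),
    ("muuvuzhakku", 3, "uzhakku"),
    ("padi", 4, "uzhakku"),
    ("kuruni", 8, "padi"),
    ("pathakku", 2, "kuruni"),
    ("mukkuruni", 3, "kuruni"),
]


def grains_table() -> dict:
    """Return the number of paddy grains in each unit, starting from one grain."""
    grains = {"nel": 1}
    for unit, factor, base in RATIOS:
        grains[unit] = factor * grains[base]
    return grains


for unit, count in grains_table().items():
    print(unit, count)
```

Each unit is defined in terms of one that appears earlier in the list, so a single pass is enough to fill in every count.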

Old currency symbols 

11FDD    𑿝    காசு        Paise
11FDE    𑿞    பணம்      Money
11FDF    𑿟    பொன்     Gold coin
11FE0    𑿠    வராகன்    Gold coin bearing boar insignia

Symbols of weight, length, and area

11FE1    𑿡    பாரம்     Equals approx 227 kg
11FE2    𑿢    குழி*     Approx 121 sq ft
11FE3    𑿣    வேலி     Approx 242,000 sq ft

Agricultural symbols         

11FE4    𑿤    நன்செய்    Wet Cultivation
11FE5    𑿥    புன்செய்    Dry Cultivation
11FE6    𑿦    நிலம்       Land
11FE7    𑿧    உப்பளம்    Salt Pan

Clerical symbols  

11FE8    𑿨    வரவு        Credit
11FE9    𑿩     எண்        Number
11FEA    𑿪    நாளது       Current date sign
11FEB    𑿫    சில்லரை     Change
11FEC    𑿬    போக         Spent sign
11FED    𑿭    ஆக         Total Sign

Other symbols and abbreviations  

11FEE    𑿮    வசம்        Possession
11FEF    𑿯    முதல்        Principal [Money]
11FF0    𑿰    முதலிய      et cetera
11FF1    𑿱    வகையரா   indicates items of a family or kind 
11FFF    𑿿    -            End of Text

Symbols in the Standard Tamil Unicode Block

Z    0BF3    ௳    Day, also pillaiyar suli    நாள்
X    0BF4    ௴    Month                       மாதம்
C    0BF5    ௵    Year                        வருடம்
V    0BF6    ௶    Debit                       பற்று
B    0BF7    ௷    Credit                      வரவு
D    0BF8    ௸    As above                    மேற்படி
A    0BF9    ௹    Rupee                       ரூபாய்
S    0BFA    ௺    Number                      எண், நிலுவை
N    0BD0    ௐ    Om                          -
L    0BF1         Raja                        ராஜ

Some Special Symbols

G        🌕       Full Moon    பவுர்ணமி
       🌑       New Moon     அமாவாசை
J               Karthigai    கார்த்திகை
^$       ₹        Rupee        ரூபாய்  

To typeset the above symbols, the following packages are needed:
  • lmodern
  • MnSymbol
  • wasysym
  • tfrupee
  • marvosym [has some other common symbols]
The full moon, new moon and the star need to be in math mode. Here is an MWE:

\setmainfont{Times New Roman}
     $\newmoon \fullmoon $
    \rupee $\filledlargestar$
* Different authorities give different measures for kuli.
Most of these symbols are not in current use.
Some of the measurements are still used in oral form.
Some symbols are used only in official documents, although this has become rare nowadays.

Shriramana Sharma and others have done a huge amount of work to make sure the Tamil numbering and measuring systems were included in Unicode. Shriramana Sharma is also the author of the Lohit Tamil Chart Font. Kudos to them for their hard work.



