In News: Researchers from IIT Madras have developed a method for reading documents written in Bharati script using a multilingual optical character recognition (OCR) scheme. They had earlier developed the Bharati script itself, a unified script for nine Indian languages.
About Bharati Script:
- It is an alternative script for the languages of India, developed by a team at the Indian Institute of Technology (IIT) Madras led by Dr. Srinivasa Chakravarthy.
- Type of writing system: alphabet
- Direction of writing: left to right in horizontal lines
- Used to write: all major languages of India
- The scripts that have been integrated are Devanagari, Bengali, Gurmukhi, Gujarati, Odia (Oriya), Telugu, Kannada, Malayalam and Tamil.
- The Bharati characters are made up of three tiers stacked vertically.
- It has 17 vowels and 22 consonants.
It is hoped that a common script for the entire country will bring down many communication barriers in India.
Optical Character Recognition (OCR) scheme:
- The first step is separating (or segmenting) the document into text and non-text regions.
- The text is then segmented into paragraphs, sentences, words and letters.
- Each letter then has to be recognised and mapped to a character in a standard encoding such as ASCII or Unicode.
- Each letter is made up of several components, such as the basic consonant, consonant modifiers and vowels.
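The segmentation and recognition steps above can be illustrated with a minimal Python sketch. It is not the IIT Madras method: it assumes a tiny black-and-white page given as rows of '#' (ink) and '.' (background), uses the classic projection-profile technique to cut the page into lines and glyphs, and looks each glyph up in a hypothetical template table (TEMPLATES) that maps toy shapes to arbitrary Unicode characters. The text/non-text separation and the paragraph, sentence and word levels are left out for brevity, and all function names are illustrative.

```python
from typing import Dict, List, Tuple

Image = List[str]  # each row is a string of '#' (ink) and '.' (background)


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return [start, end) index ranges where the profile is non-zero."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            spans.append((start, i))       # the run ends at a blank row/column
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_lines(image: Image) -> List[Image]:
    """Split a page into text lines using the row-wise ink count."""
    profile = [row.count("#") for row in image]
    return [image[a:b] for a, b in runs(profile)]


def segment_glyphs(line: Image) -> List[Image]:
    """Split a text line into glyphs using the column-wise ink count."""
    width = len(line[0])
    profile = [sum(row[x] == "#" for row in line) for x in range(width)]
    return [[row[a:b] for row in line] for a, b in runs(profile)]


def recognise(glyph: Image, templates: Dict[Tuple[str, ...], str]) -> str:
    """Map a glyph to a Unicode character by exact template match."""
    return templates.get(tuple(glyph), "\uFFFD")  # U+FFFD marks an unknown glyph


def read_page(image: Image, templates: Dict[Tuple[str, ...], str]) -> List[str]:
    """Run the whole toy pipeline: lines, then glyphs, then characters."""
    return [
        "".join(recognise(g, templates) for g in segment_glyphs(line))
        for line in segment_lines(image)
    ]


# Hypothetical toy templates (not real Bharati shapes or code points).
TEMPLATES = {
    ("#", "#", "#"): "I",
    ("#..", "#..", "###"): "L",
    ("###", ".#.", ".#."): "T",
}

PAGE = [
    "........",
    ".#..#...",
    ".#..#...",
    ".#..###.",
    "........",
    ".###....",
    "..#.....",
    "..#.....",
]

print(read_page(PAGE, TEMPLATES))
```

A real OCR system would replace the exact template match with a trained classifier and would need to handle noise, skew and touching characters. For Bharati, the recogniser would also have to identify the components of each letter (basic consonant, consonant modifiers, vowel) before choosing the output character.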