Detect non US-ASCII character entry in a text input field

How would you detect Arabic character entry in a text input field with JavaScript? I recently had to do this in one of my projects and did some research on it. Thanks to Pablo for a great piece of information on this.

“First 128 characters (US-ASCII) need one byte. The next 1,920 characters need two bytes to encode. This includes Latin letters with diacritics and characters from Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac and Tāna alphabets. Three bytes are needed for the rest of the Basic Multilingual Plane (which contains virtually all characters in common use). Four bytes are needed for characters in the other planes of Unicode, which include less common CJK characters and various historic scripts.” – Wikipedia, on UTF-8.
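To see those byte lengths in practice, here is a minimal sketch. It assumes an environment that provides the standard `TextEncoder` (modern browsers and Node.js), and the helper name `utf8ByteLength` is made up for illustration.

```javascript
// Hypothetical helper: number of bytes the string occupies when encoded as UTF-8.
function utf8ByteLength(text) {
  // TextEncoder always encodes to UTF-8 and returns a Uint8Array.
  return new TextEncoder().encode(text).length;
}

console.log(utf8ByteLength("A"));  // 1 (US-ASCII)
console.log(utf8ByteLength("é"));  // 2 (Latin letter with diacritic)
console.log(utf8ByteLength("ب"));  // 2 (Arabic letter beh, U+0628)
console.log(utf8ByteLength("中")); // 3 (rest of the Basic Multilingual Plane)
console.log(utf8ByteLength("😀")); // 4 (outside the Basic Multilingual Plane)
```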

So in UTF-8, Arabic letters take two bytes each and English letters take one byte each. We just need a JavaScript function that works out the byte count of the text and compares it with the length of the text:

function isDoubleByteCharacter(Text) {
    // encodeURI writes every byte it escapes as a 3-character "%XX" sequence
    // and leaves every other character as a single ASCII character.
    // Note: encodeURI throws a URIError if the text contains an unpaired surrogate.
    var escapedStr = encodeURI(Text);
    var count;
    if (escapedStr.indexOf("%") != -1) {
        // number of escaped bytes
        count = escapedStr.split("%").length - 1;
        // number of characters that were not escaped (one byte each)
        var tmp = escapedStr.length - (count * 3);
        count = count + tmp;
    } else {
        count = escapedStr.length;
    }
    // If the text is US-ASCII, the total byte count is equal to the length of the text.
    // true: the text contains non US-ASCII characters; false: it is plain US-ASCII.
    return count > Text.length;
}
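The function above only tells you that the text is not plain US-ASCII. As a rough alternative sketch (not the approach from this post), the same check can be done with a regular expression, and a second one can test for Arabic specifically. The helper names `containsNonAscii` and `containsArabic` are made up for illustration, and the Arabic check assumes that the main Arabic block (U+0600 to U+06FF) is enough for your input.

```javascript
// Hypothetical helper: true if any character lies outside US-ASCII (0x00-0x7F).
function containsNonAscii(text) {
  return /[^\x00-\x7F]/.test(text);
}

// Hypothetical helper: true if any character falls in the main Arabic block.
// Assumption: the Arabic supplement and presentation-form blocks are not needed.
function containsArabic(text) {
  return /[\u0600-\u06FF]/.test(text);
}

console.log(containsNonAscii("hello")); // false
console.log(containsNonAscii("héllo")); // true
console.log(containsArabic("héllo"));   // false
console.log(containsArabic("مرحبا"));    // true
```

Either helper could be called with the field's `value` from an `input` event handler to react as the user types.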

Posted in Web development.
