Ali Farghaly

Senior Knowledge Engineer
DataFlux Corporation | A SAS Company
ali.farghaly@sas.com

Dr. Ali Farghaly is Senior Knowledge Engineer at DataFlux Corporation, a SAS Company. He has experience in designing and implementing English-to-Arabic and Arabic-to-English machine translation systems, English and Arabic entity extraction, building knowledge bases and ontologies, developing sentiment analysis systems, and designing and using tools for automating the population of ontologies. Prior to moving to DataFlux, he was a senior technical member at Oracle, USA. He is also the founder of the Language Applications and Services company. He is currently Visiting Professor of Arabic Computational Linguistics at the Faculty of Computers and Information, Cairo University, and Adjunct Professor at the Monterey Institute of International Studies.

He firmly believes linguistic knowledge can, if used properly, enhance search quality and information retrieval.

Dr. Farghaly received his PhD in Linguistics from The University of Texas at Austin. His specialties include machine translation, entity extraction, sentiment analysis, ontology and knowledge base development, conceptual search, and discourse analysis.

Challenges in Information Retrieval for Arabic Script-based Languages

Information Retrieval (IR) has become one of the most widely used Natural Language Processing (NLP) applications and is now indispensable in the daily life of ordinary citizens. For example, it was estimated that in 2006 more than 200 million searches were performed every day at google.com alone. Web search, which is primarily concerned with document retrieval, is only one aspect of IR. Question-answering systems, clustering and classification, topic detection and tracking, sentiment analysis and opinion mining, and multilingual and multimedia IR constitute other important aspects. Since IR is primarily concerned with the encoding, decoding and creation of knowledge, it is imperative that countries aspiring to end the monopoly of the English language in the acquisition and creation of knowledge, and by extension their languages, be actively involved in IR research and development. Arabic script-based languages and the countries where they are spoken are a striking example, as the advent of the Arab Spring is a precursor to what will follow. However, Arabic script-based languages do not belong to a single language family. For example, Arabic is a Semitic language but Farsi is an Indo-European language. The twenty languages that use the Arabic script differ in many ways in their internal structure, but the fact that they all use the Arabic script imposes common challenges when it comes to developing IR systems. For example, like Chinese, Japanese and Korean, Arabic has neither capitalization nor strict punctuation. Both Chinese and Arabic use the comma to separate sentences, while the period usually signals the end of the paragraph.
