CURRICULUM VITAE

EDUCATION

Never stop learning!

Ph.D. in Electrical and Computer Engineering,

University of Maryland, College Park, MD, May 2003.

Ph.D. Thesis: Probabilistic Methods for Searching OCR-Degraded Arabic Text

Thesis Supervisor: Dr. Douglas W. Oard

Major area: Computer Engineering

Minor area: Microelectronics

GPA: 3.73


Master's Degree in Electrical and Computer Engineering,

University of Maryland, College Park, August 1999.

GPA: 3.40


Bachelor of Electrical and Computer Engineering,

University of Maryland, College Park, December 1995.

Magna Cum Laude, GPA: 3.88

SENIOR SCIENTIST - QATAR COMPUTING RESEARCH INSTITUTE

Feb. 2011 – Present

  • Defining the priorities of information retrieval and natural language processing research at QCRI

  • Leading and participating in collaborations between QCRI and external entities such as Aljazeera.net and Boeing

  • Conducting productive research in the following:

    • Social search

      • Improving language handling for search using language-dependent and language-independent methods

      • Performing topic detection and filtering in social media

      • Developing cross-media (social media and news) summarization (e.g., tweetMogaz.com)

    • Web search

      • Building the infrastructure for an Arabic web search engine including: crawling, distributed indexing, distributed search, etc.

    • Basic natural language processing

      • Developing tools to perform stemming, part of speech tagging, named entity recognition, phrase detection, automatic language detection, Arabizi to Arabic conversion, automatic diacritization, parsing, etc.

    • Social computing

      • Developing technologies for the automated analysis and understanding of social media streams

      • Applying the methodologies to specific case studies including turmoil in Egypt, ISIS sympathizers, Islamophobia, and xenophobia.

RESEARCHER - MICROSOFT, CAIRO

Mar. 2007 – Feb. 2011

Conducting productive research in the following projects:

  • BookWeb:

    • Exploring digitized content by cross linking topical segments and automatically extracted key phrases to topical segments

    • Automatically identifying table-of-contents pages and linking their entries to the appropriate pages

    • Automatically constructing tables of contents by identifying headlines in digitized content

    • Improving search of digitized content by identifying most valuable parts of content

    • Resulting in: ThinkWeek paper; invention disclosure; collaboration with MSRC (Natasa Milic-Frayling's group); joint TechFest 2008 demo with MSRC; papers in INEX and CIKM workshop.

  • IBIS:

    • Measuring Bing's (formerly Live) search effectiveness for Arabic

    • Working on improving search effectiveness:

      • Index coverage: supplied a white list of good Arabic URLs for manual boosting of their static rank

      • Word breaking: designed, tested, and helped code a new Arabic word breaker

    • Resulting in: a ThinkWeek paper; a white list that helped quadruple Bing's Arabic index; a word breaker that was checked into Bing's tree

  • Transbulletization:

    • Attempting to trim superfluous parts from sentences translated from other languages into English, using parsing and statistical summarization techniques, without breaking the flow of the sentences

    • Integrating concept into a cross-language search application

    • Resulting in: TechFest 2009 demo; invention disclosure

  • Enterprise document linking:

    • Attempting to help users navigate intranets by providing contextual links/recommendations based on a user's:

      • Current context -- searching using salient terms in context

      • Browsing history -- utilizing relevance feedback and filtering

    • Providing related resources such as documentation, people pages, etc.

    • Resulting in: a TechFest 2009 demo; tool to become TechFest 2010 official search/browsing tool; incubation with FuturePoint group

  • Search results diversification:

    • Attempting to identify a query's different meanings (extrinsic diversity) or different facets (intrinsic diversity) to diversify top search results

    • Using knowledge bases to perform diversification

    • Using density-based clustering to perform diversification

    • Resulting in: paper submitted to ECIR-2010; participation in TREC-2009 relevance feedback track (in collaboration with Stephen Robertson's group at MSRC) with an oral presentation at TREC; collaboration with the PAMI group at the University of Waterloo (Dr. Mohamed Kamel)

  • OCRless retrieval:

    • Language independent searching of document images without performing OCR by clustering similar connected components and rendering queries into images

    • Applying IR techniques such as weighted structured queries to improve retrieval effectiveness

    • Applying density-based clustering to improve the grouping of connected components

    • Resulting in: TechFest 2009 demo; invention disclosure; SPIRE-2009 paper

  • Machine translation:

    • Developing Arabic language handling to improve Ar <=> En MT

    • Designing and testing an Arabic word breaker for Ar => En MT to improve lexical coverage

    • Helping design and test Arabic reverse word breaker for En => Ar MT

    • Performing acronym expansion to aid En => any_language MT

    • Detecting named entities as targets for transliteration

    • Transliterating Ar <=> En named entities

    • Resulting in: Tech transfer of Arabic word breaker in MSR MT system; reverse word breaker tech transfer expected in Nov. 2009

  • Bing instant answers:

    • Performing cross language search to improve image search for Arabic by overcoming the lack of Arabic meta-information about images

      • Translating Arabic queries into English and showing the results to the user

    • Translating English instant answers automatically into Arabic

    • Translating English Wikipedia infoboxes into Arabic to aid Arabic question answering based on knowledge bases

    • Resulting in: 2 ThinkWeek papers; development of instant answers in progress

RESEARCHER - IBM, CAIRO

Mar. 2005 – Mar. 2007

  • Performing productive research and development in the following areas:

    • Arabic OCR degraded text retrieval

      • Incorporating character, word, stem, and stem-template based language modeling to improve error correction

      • Adapting blind relevance feedback to OCR-degraded documents

      • Investigating/developing techniques for developing relevance judgments without pooling for OCR degraded collections

      • Developing degraded Arabic word clustering to improve search term highlighting

    • Machine/Human Assisted Human/Machine translation

      • Using the output of a machine translation system to build localized language models to improve type-ahead for translators

      • Incorporating speech recognition technology with existing machine translation technology to speed-up human translation

    • Information extraction from biomedical text

      • Employing unsupervised learning techniques to infer patterns that contain relationships/interactions between biomedical named entities

      • Applying inferred models in extracting protein-protein interaction

    • Adaptive cross-language text filtering

  • Team leader, InfoMind Project

    • Integrating MT, IE, IR, adaptive filtering, and information visualization

    • Managing research engineers in varying parts of the projects

ASSOCIATE PROFESSOR - CAIRO UNIVERSITY, CAIRO

Aug. 2005 – Aug. 2016

Information Systems Department, Faculty of Computer Science and Informatics

  • Designing and teaching courses

    • Data structures

    • Unstructured document retrieval

  • Managing teaching assistants

  • Supervising research assistants in

    • Rapid development of IR test collection

    • Arabic-Hebrew cross language retrieval

    • Employing Arabic morphological analysis in word clustering and highlighting of search results

    • Interactive query expansion

    • Automatic web page structural analysis

    • Wikipedia named entity tagging

  • Supervising senior graduation projects: natural language question authoring; multi-lingual desktop search; affect resolution; browser history caching and search; Arabic named entity recognition

Egyptian Ministry of Communication and Information Technology Research Center of Excellence (co-PI)

  • Developing a web portal for aggregating Arabic web news (www.alzoa.com)

    • Deploying state-of-the-art Arabic text search

    • Using automatic document clustering

    • Investigating automatic Arabic phrase extraction for summarization purposes

    • Investigating ways to anonymously customize web pages to suit specific users

    • Exploring ways to make news interactive and solicit user input

LECTURER - GERMAN UNIVERSITY IN CAIRO

Jan. 2004 – Aug. 2005

Department of Information Engineering and Technology

  • Designing and teaching introductory Electrical Engineering Courses

    • Basic Circuit Theory

    • Electric Circuits Lab

    • Digital Logic Design

  • Managing teaching and research assistants

  • Performing research in Arabic information retrieval

    • Comparing different word stemming and clustering techniques for Arabic information retrieval

    • Devising new methods for rapid construction of test collections for monolingual and cross-lingual retrieval

  • Performing research in Bioinformatics

    • Evaluating the effectiveness of controlled vocabularies in information retrieval

    • Designing automatic methods for assigning controlled vocabulary entries to documents

    • Participating in the 2004 Text REtrieval Conference (TREC) Genomics Track (Ranked 3rd in the Adhoc Retrieval task and 6th in the Triage task)

  • Collaborating with the Library of Alexandria in the Million Book Project

    • Introducing and evaluating information retrieval technology for scanned and OCR’ed Arabic books

    • Exploring and evaluating error-tolerant Arabic word clustering and morphology techniques

SENIOR CONSULTANT - KEVRIC, SILVER SPRING, MD

Jun. 2003 – Feb. 2004

Knowledge Management project – under contract from the National Institutes of Health (NIH)

  • Served as Principal Investigator (PI) on the project to evaluate emerging knowledge management technologies intended to facilitate NIH’s grant review process

  • Identifying the needs of scientific review administrators tasked with routing incoming grant proposals to appropriate reviewers

  • Researching existing technologies and developing alternative ones to address the stated needs

  • Establishing criteria for evaluating potentially viable technologies

  • Evaluating the usability and effectiveness of the different technologies

The BISC project – under contract from NIH

  • Served as a research scientist to facilitate the adoption and integration of varying biomedical ontologies intended to support clinical research applications

  • Migrating different biomedical ontologies in varying formats (such as DAML and OWL) to standardized formats to facilitate their integration

  • Designing applications that rely on the ontologies and defining insertion points for integrating the ontologies into the applications

  • Working on interfacing applications and the ontologies using APIs
