
CURRICULUM VITAE
EDUCATION
Never stop learning!
Ph.D. in Electrical and Computer Engineering,
University of Maryland, College Park, MD, May 2003.
Ph.D. Thesis: Probabilistic Methods for Searching OCR-Degraded Arabic Text
Thesis Supervisor: Dr. Douglas W. Oard
Major area: Computer Engineering
Minor area: Microelectronics
GPA: 3.73
Master's Degree in Electrical and Computer Engineering,
University of Maryland, College Park, August 1999.
GPA: 3.40
Bachelor of Electrical and Computer Engineering,
University of Maryland, College Park, December 1995.
Magna Cum Laude, GPA: 3.88
SENIOR SCIENTIST - QATAR COMPUTING RESEARCH INSTITUTE
Feb. 2011 - Present
Defining the priorities of information retrieval and natural language processing research at QCRI
Leading and participating in collaborations between QCRI and external entities such as Aljazeera.net and Boeing
Conducting productive research in the following:
Social search
Improving language handling for search using language dependent and language independent methods
Performing topic detection and filtering in social media
Developing cross-media (social media and news) summarization (e.g., tweetMogaz.com)
Web search
Building the infrastructure for an Arabic web search engine including: crawling, distributed indexing, distributed search, etc.
Basic natural language processing
Developing tools to perform stemming, part of speech tagging, named entity recognition, phrase detection, automatic language detection, Arabizi to Arabic conversion, automatic diacritization, parsing, etc.
Social computing
Developing technologies for the automated analysis and understanding of social media streams
Applying the methodologies to specific case studies including turmoil in Egypt, ISIS sympathizers, Islamophobia, and xenophobia.
RESEARCHER - MICROSOFT, CAIRO
Mar. 2007 – Feb. 2011
Conducting productive research in the following projects:
BookWeb:
Exploring digitized content by cross linking topical segments and automatically extracted key phrases to topical segments
Automatically identifying table of contents pages and linking their entries to appropriate pages
Automatically constructing tables of contents by identifying headlines in digitized content
Improving search of digitized content by identifying most valuable parts of content
Resulting in: ThinkWeek paper; invention disclosure; collaboration with MSRC (Natasa Milic-Frayling's group); joint TechFest 2008 demo with MSRC; papers in INEX and CIKM workshop.
IBIS:
Measuring Bing's (formerly Live) search effectiveness for Arabic
Working on improving search effectiveness:
Index coverage: supplied a white list of good Arabic URLs for manual boosting of their static rank
Word breaking: designed, tested, and helped code a new Arabic word breaker
Resulting in: a ThinkWeek paper; a white list that helped quadruple Bing's Arabic index; a word breaker that was checked into Bing's tree
Transbulletization:
Attempting to trim superfluous parts from sentences that are translated from different languages into English using parsing and statistical summarization techniques without breaking the flow of sentences
Integrating concept into a cross-language search application
Resulting in: TechFest 2009 demo; invention disclosure
Enterprise document linking:
Attempting to help users navigate intranets by providing contextual links/recommendations based on a user's:
Current context -- searching using salient terms in context
Browsing history -- utilizing relevance feedback and filtering
Providing related resources such as documentation, people pages, etc.
Resulting in: a TechFest 2009 demo; tool to become TechFest 2010 official search/browsing tool; incubation with FuturePoint group
Search results diversification:
Attempting to identify a query's different meanings (extrinsic diversity) or different facets (intrinsic diversity) to diversify top search results
Using knowledge bases to perform diversification
Using density-based clustering to perform diversification
Resulting in: paper submitted to ECIR-2010; participation in TREC-2009 relevance feedback track (in collaboration with MSRC's Stephen Robertson's group) with an oral presentation at TREC; collaboration with PAMI group at the University of Waterloo (Dr. Mohamed Kamel)
OCRless retrieval:
Language independent searching of document images without performing OCR by clustering similar connected components and rendering queries into images
Applying IR techniques such as weighted structured queries to improve retrieval effectiveness
Applying density based clustering to improve clustering
Resulting in: TechFest 2009 demo; invention disclosure; SPIRE-2009 paper
Machine translation:
Developing Arabic language handling to improve Ar <=> En MT
Designing and testing an Arabic word breaker for Ar => En MT to improve lexical coverage
Helping design and test Arabic reverse word breaker for En => Ar MT
Performing acronym expansion to aid En => any_language MT
Detecting named entities as targets for transliteration
Transliterating Ar <=> En named entities
Resulting in: Tech transfer of Arabic word breaker in MSR MT system; reverse word breaker tech transfer expected in Nov. 2009
Bing instant answers:
Performing cross language search to improve image search for Arabic by overcoming the lack of Arabic meta-information about images
Translating Arabic queries into English and showing the results to the user
Translating English instant answers automatically into Arabic
Translating English Wikipedia Info-boxes into Arabic to aid in Arabic question answering based on knowledge bases
Resulting in: 2 ThinkWeek papers; development of instant answers in progress
RESEARCHER - IBM, CAIRO
Mar. 2005 – Mar. 2007
Performing productive research and development in the following areas:
Arabic OCR degraded text retrieval
Incorporating character, word, stem, and stem-template based language modeling to improve error correction
Adapting blind relevance feedback to OCR degraded documents
Investigating/developing techniques for developing relevance judgments without pooling for OCR degraded collections
Developing degraded Arabic word clustering to improve search term highlighting
Machine/Human Assisted Human/Machine translation
Using the output of a machine translation system to build localized language models to improve type-ahead for translators
Incorporating speech recognition technology with existing machine translation technology to speed-up human translation
Information extraction from biomedical text
Employing unsupervised learning techniques to infer patterns that contain relationships/interactions between biomedical named entities
Applying inferred models in extracting protein-protein interaction
Adaptive cross-language text filtering
Team leader, InfoMind Project
Integrating MT, IE, IR, adaptive filtering, and information visualization
Managing research engineers in varying parts of the projects
ASSOCIATE PROFESSOR - CAIRO UNIVERSITY, CAIRO
Aug. 2005 – Aug. 2016
Information Systems Department, Faculty of Computer Science and Informatics
Designing and teaching courses
Data structures
Unstructured document retrieval
Managing teaching assistants
Supervising research assistants in
Rapid development of IR test collection
Arabic-Hebrew cross language retrieval
Employing Arabic morphological analysis in word clustering and highlighting of search results
Interactive query expansion
Automatic web page structural analysis
Wikipedia named entity tagging
Supervising senior graduation projects: natural language question authoring; multi-lingual desktop search; affect resolution; browser history caching and search; Arabic named entity recognition
Egyptian Ministry of Communication and Information Technology Research Center of Excellence (co-PI)
Developing a web portal for aggregating Arabic web news (www.alzoa.com)
Deploying state-of-the-art Arabic text search
Using automatic document clustering
Investigating automatic Arabic phrase extraction for summarization purposes
Investigating ways to anonymously customize web pages to suit specific users
Exploring ways to make news interactive and solicit user input
LECTURER - GERMAN UNIVERSITY IN CAIRO
Jan. 2004 - Aug. 2005
Department of Information Engineering and Technology
Designing and teaching introductory Electrical Engineering courses
Basic Circuit Theory
Electric Circuits Lab
Digital Logic Design
Managing teaching and research assistants
Performing research in Arabic information retrieval
Comparing different word stemming and clustering techniques for Arabic information retrieval
Devising new methods for rapid construction of test collections for monolingual and cross-lingual retrieval
Performing research in Bioinformatics
Evaluating the effectiveness of controlled vocabularies in information retrieval
Designing automatic methods for assigning controlled vocabulary entries to documents
Participating in the 2004 Text REtrieval Conference (TREC) Genomics Track (ranked 3rd in the ad hoc retrieval task and 6th in the triage task)
Collaborating with the Library of Alexandria in the Million Book Project
Introducing and evaluating information retrieval technology for scanned and OCR’ed Arabic books
Exploring and evaluating error-tolerant Arabic word clustering and morphology techniques
SENIOR CONSULTANT - KEVRIC, SILVER SPRING, MD
Jun. 2003 – Feb. 2004
Knowledge Management project – under contract from the National Institutes of Health (NIH)
Served as Principal Investigator (PI) on the project to evaluate emerging knowledge management technologies intended to facilitate NIH’s grant review process
Identifying the needs of scientific review administrators tasked with routing incoming grant proposals to appropriate reviewers
Researching existing technologies and developing alternative ones to address the stated needs
Establishing criteria for evaluating potentially viable technologies
Evaluating the usability and effectiveness of the different technologies
The BISC project – under contract from NIH
Served as a research scientist to facilitate the adoption and integration of varying biomedical ontologies intended to support clinical research applications
Migrating different biomedical ontologies in varying formats (such as DAML and OWL) to standardized formats to facilitate their integration
Designing applications that rely on the ontologies and defining insertion points for integrating the ontologies into the applications
Working on interfacing applications and the ontologies using APIs