Senior Research Engineer · AI & ML

Hi, I'm Emad. I build intelligent systems.

I design and build end-to-end ML systems — from large-scale annotation pipelines and entity resolution to LLM-powered summarization and agentic AI products for compliance and due diligence workflows. Seven years of industry experience, primarily in regulated enterprise environments.

Jul 2021 - Present

Senior Machine Learning Engineer II

Thomson Reuters

Toronto, Canada

Details & Impact

Senior Research Engineer at TR Labs, Thomson Reuters' applied research division, focused on CLEAR — the company's flagship due diligence platform for KYC/AML screening and investigations. Built production ML systems across the full lifecycle — including a commercially launched agentic investigation product, LLM-powered summarization features, and large-scale data infrastructure — in a regulated environment where model accuracy has direct compliance consequences.

Jul 2018 - Jun 2021

NLP & Machine Learning Engineer

INAGO INC.

Toronto, Canada

Details & Impact

Built NLP models for language understanding and automated text generation, including fine-tuning BERT and T5 for domain-specific tasks. Managed the full training lifecycle and led collaborative research projects with university partners.

Education

2014 - 2018

M.Sc. in Computer Science

York University

Toronto, Canada

GPA: 8.17 / 9

Thesis: Interactive Question Answering Using Frame-based Knowledge Representation

2010 - 2014

B.Sc. in Computer Engineering

Amirkabir University of Technology

Tehran, Iran

GPA: 17.18 / 20

Technical Skills

Python · PyTorch · Scikit-Learn · NumPy · Pandas · SpaCy · NLTK · Transformers · Large Language Models · Prompt Engineering · Agentic AI · PydanticAI · Retrieval-Augmented Generation (RAG) · LLM Evaluation · NLP · Machine Learning · Deep Learning · XGBoost · Apache Spark · AWS · Amazon SageMaker · AWS CloudFormation · Docker · PostgreSQL · Elasticsearch · OpenSearch · OpenAI API · Java · Scala · Bash · Agile Development

Projects

Agentic Investigation System — CLEAR Investigate

2025 - 2026

Contributed to building CLEAR Investigate, Thomson Reuters' first agentic AI product, live in production for customers. Designed the agent experimentation architecture using PydanticAI for rapid prototyping of multi-step workflows across entity search, report retrieval, web search, and an internal knowledge graph. Implemented LLM caching (~25% cost reduction), automated LLM-as-judge evaluation pipelines, and SME annotation workflows for ground-truth benchmarking.

Python · PydanticAI · OpenAI GPT-4 · Claude Sonnet · LLM-as-Judge Evaluation · AWS

CLEAR Business AI — GenAI Report Summarization

2024

Built the AI-powered summary panel for CLEAR Business entity reports, live in production processing hundreds of reports daily. Designed a selective XML parsing engine and multi-prompt LLM architecture with separate calls for business overview, risk analysis, and social media discovery — with Bing Search integration and a verification LLM call to filter false positives. Implemented citation linking to source locations and ran SME annotation and evaluation rounds before launch.

Python · OpenAI GPT-4 · Claude Sonnet · Bing Search API · Prompt Engineering · XML Processing

Entity Resolution Data Infrastructure — CLEAR KYC/AML Platform

2022 - 2024

Designed and built the data engineering infrastructure for an entity resolution system operating across ~800M entities and billions of documents at 700 idents/second. Architected a PII-isolated AWS environment and a unified versioned schema reconciling two incompatible annotation sources spanning hundreds of thousands of labeled records. Built Spark and Python pipelines for multi-format merging, conflict signal surfacing, and model experimentation, later automated into SageMaker Pipelines.

Spark · Python · Scikit-Learn · AWS SageMaker · CloudFormation · S3 · EMR · OpenSearch · RDS

Semantic Search Improvement — Checkpoint Tax Research

2021

Shipped a semantic search improvement for Checkpoint, Thomson Reuters' tax research platform, improving access to conceptual documents by 95%. Built a query intent classifier using sentence embeddings to dynamically promote higher-level content for general queries, delivered to production within a ~100ms latency budget.

Python · Sentence Embeddings · NLTK · Elasticsearch · AWS · Java

Automated Question Generation from Documents

2020

Fine-tuned a T5 transformer for automated question generation, cutting manual data curation effort by 40%. Experimented with model input representations and evaluated outputs with BLEURT as part of a collaborative university research project.

PyTorch · SpaCy · HuggingFace Transformers · Python

Domain-Specific Language Understanding Engine

2019

Trained domain-specific Word2Vec embeddings and LSTM-based NLU models with interpretability testing to improve language understanding accuracy and model transparency.

PyTorch · Python · Word2Vec · LSTM

Conversational Question Answering

2018

Built a domain-specific question answering dialogue system using syntactic and semantic document analysis and ontology generation, as part of a collaborative industry research project.

SpaCy · Python

Publications

Question-worthy Sentence Selection for Question Generation

Canadian AI 2020 · co-authored

Interactive Question Answering Using Frame-based Knowledge Representation

York University M.Sc. Thesis 2018

Time Aware Topic-based Recommender System

Big Data & Information Analytics 2016 · co-authored

A Study on Prediction of User's Tendency Toward Purchases in Websites based on Behavior Models

Information and Knowledge Technology (IKT), 6th Conference, IEEE 2014 · co-authored