Biography

Arif Khan's research interest includes graph algorithm, high performance computing, approximation algorithm along with their applications in bioinformatics, social network and machine learning. His goal is to explore how approximation algorithms can solve big graph problems using leadership class supercomputers.

Interests

  • Distributed Training
  • Combinatorial Optimization
  • Operations Research
  • Supply Chain Optimization
  • AI4Science

Education

  • PhD in Computer Science, 2017 Purdue University
  • MSc in Computer Science, 2011 University of Florida
  • BSc in Computer Science and Engg, 2006 Bangladesh University of Engg & Tech (BUET)

Projects

bMatching

Details: This project implements b-Suitor, a half-approximation algorithm for maximum weight b-matching in weighted graphs, where each vertex has a degree budget. The implementation targets large sparse inputs in Matrix Market format and supports configurable matching modes.

Major Contributions: The code operationalizes the SIAM SISC 2016 method with practical data structures for scalable matching and verification. It provides a reproducible baseline for studying approximation quality and runtime behavior on very large graphs.

Summary Result: The implementation demonstrates very fast convergence in iterative refinement and produces high-quality weighted matchings, making it useful for large network optimization workloads.

distMatching

Details: distMatching extends b-Suitor to distributed-memory systems using hybrid MPI+OpenMP parallelism. The design follows asynchronous supersteps to overlap computation and communication for high concurrency.

Major Contributions: The project introduces communication-reduction strategies and scalable scheduling choices that improve strong and weak scaling behavior on supercomputers. It turns an approximation algorithm into a production-grade distributed implementation.

Summary Result: As reported in SC'16, the method computes b-matching on graphs with up to 2 billion edges in under 4 seconds on 16K processors, showing high-end distributed performance.

netAlign

Details: netAlign focuses on network alignment via approximate matching with multithreaded and Python-accessible implementations. The repository combines algorithmic kernels with an OpenMP path and Python workflow support.

Major Contributions: The work translates the SC'12 network alignment approach into runnable software components suitable for experimentation and integration. It enables practical testing of alignment quality across graph datasets and mapping scenarios.

Summary Result: The implementation provides an efficient route to approximate network alignment on modern multicore systems while keeping the workflow flexible for research extensions.

QMAP

Details: QMAP implements algorithms for mapping sparse one-body and two-body interactions onto one-dimensional quantum circuit layouts. It combines assignment, matching, and ordering strategies tailored to sparse Hamiltonians.

Major Contributions: Based on the HiPC 2019 work, the project unifies MINLA/matching for one-body terms and HOLA/partial distance-2 coloring for two-body terms in one workflow. This design directly targets utilization and execution efficiency on constrained qubit connectivity.

Summary Result: Reported outcomes include up to 100% improvement for one-body mapping over prior methods and up to 86% utilization for two-body terms relative to theoretical peak.

WhatsApp ML Platform - Feature Engineering & TextRay Migration

Details: Built ML feature pipelines for WhatsApp Business Integrity, including recidivism, WABA lifecycle, and device signals for fraud/scam detection workflows.

Major Contributions: Led end-to-end TextRay migration from V20211208 to V20221219 with versioned embedding banks, phased JustKnobs rollout, and multi-version infrastructure across business entities.

Summary Result: Migration shipped with zero downtime and improved model data quality through label filtering and onboarding-ban handling in BM scam training.

MTIA/Athena Amodel - Elastic DLRM Training

Details: Built elastic distributed DLRM training for Meta's network-isolated Athena/MTIA Amodel environments where standard internal services are unavailable.

Major Contributions: Implemented custom launchers, trainers, module providers, TCP Store rendezvous, aten.fmod backend support, and graph-mode paths including AOT Autograd and TBE kernel work.

Summary Result: Delivered the first elastic HPC-style DLRM training workload on Athena silicon with improved backend operator parity and production-ready testing.

MCP Capacity Planning Solver - Time-Based Decomposition

Details: Designed Time-Based Decomposition for long-horizon datacenter planning by splitting large MIP formulations into sliding quarterly windows.

Major Contributions: Built iterative solve/merge framework, added scaling corrections and tests, automated hardware refresh output generation, and stabilized constraints with adaptive relaxations.

Summary Result: Enabled practical 4-5 year planning horizons, reduced infeasibility incidents, and replaced manual reporting steps with validated automated pipelines.

OPAL Solver - Performance Optimization

Details: Optimized OPAL expression-graph execution for very large graphs (137M nodes, 272M edges) and multi-objective optimization workflows.

Major Contributions: Reworked recursive traversal to iterative postorder, merged multi-pass traversal into callback-driven single pass, and removed massive child-vector allocations via index-based APIs.

Summary Result: Improved BuildEvaluatorContext runtime by about 2.7x (154s to 57s), and added per-objective solver time limits and expression-layer correctness fixes.

TAPS HW Placement Solver - Multi-Use-Case Move Engine

Details: Architected a unified TAPS hardware placement and move framework supporting placement, retread move, rack saves, defrag move, and ad-hoc scenarios.

Major Contributions: Implemented staged candidate generation/model building, global move accounting, refined hard/soft/goal constraint handling, and fixed parallel evaluation edge cases.

Summary Result: Delivered measurable production outcomes including successful rack evacuation and large-scale defrag moves with objective improvements.

SCX - LLC-Aware Thread Scheduling on Bergamo

Details: Built experimental sched_ext (SCX) infrastructure for LLC-aware thread scheduling and workload pinning experiments on Bergamo servers.

Major Contributions: Implemented CPU/LLC hint propagation, automated deployment/run/restore scripts, and designed decoupled thread-pinning logic with hard/soft cache modes.

Summary Result: Demonstrated up to 80% throughput gain in benchmark conditions (13 GBps to 23 GBps) with reproducible experiment pipelines.

Recent Publications

  1. A Distributed Travel Time Estimation Capability for Metropolitan-sized Road Transportation Networks
    Arif Khan, Arun Sathanur, Kelsey Maass & Robert Rallo (2020), SIGKDD Urban Computing.

    PDF
  2. Street-level Travel-time Estimation via Aggregated Uber Data
    Kelsey Maass, Arun Sathanur, Arif Khan & Robert Rallo (2020), SIAM CSC.

    PDF
  3. Graph Analytics and Optimization Methods for Insights from the Uber Movement Data
    Arun Sathanur, Vinay Amatya, Arif Khan, Robert Rallo & Kelsey Maass (2019), ACM SCC.

    PDF
  4. Mapping Arbitrarily Sparse Two-body Interactions on One-dimensional Quantum Circuits
    Arif Khan, Mahantesh Halappanavar, Tobias Hagge, Karol Kowalski, Alex Pothen & Sriram Krishnamoorthy (2019), HiPC.

    PDF
  5. Adaptive Anonymization of Data using b-Edge Cover
    Arif Khan, Krzysztof Choromanski, Alex Pothen, S M Ferdous, Mahantesh Halappanavar & Antonino Tumeo (2018), Supercomputing.

    PDF

See all publications

Contact