I study modeling, analysis, and optimization techniques for high-performance software, with a current focus on machine learning algorithms. My philosophy is that the combination of statistical methods, code analysis, and domain knowledge leads to better tools for understanding and building fast systems. My recent work includes the MachSuite and Fathom benchmarks for hardware accelerators and deep learning, respectively; the Minerva optimization framework for designing neural network processors; and a modeling and analysis framework for deep learning models.
My past work was largely on supercomputers. For the most part, this meant looking at what goes wrong when we take software and scale it out by several orders of magnitude. More specifically, I looked at leveraging compilers and runtime systems to automatically solve issues like nested parallelism, load imbalance, and work inefficiency. I spent a lot of time doing this by hand for specific applications on unconventional architectures, but my more recent work looked at how most of the heavy lifting could be generalized and automated.
If you're interested in what else I've done over the years, I can send you a CV. Just ask me via email.
|Book . "Deep Learning for Computer Architects." Morgan-Claypool, 2017.|
|PDF . "A Case for Efficient Accelerator Design Space Exploration via Bayesian Optimization." International Symposium on Low-Power Electronics and Design (ISLPED). July 2017, Taipei, Taiwan.|
|PDF . "Designing Neural Network Hardware Accelerators with Decoupled Objective Evaluations." NIPS Workshop on Bayesian Optimization (BayesOpt). December 2016, Barcelona, Spain.|
|PDF arXiv Code . "Fathom: Reference Workloads for Modern Deep Learning Methods." Proceedings of the IEEE International Symposium on Workload Characterization (IISWC). September 2016, Providence, RI.|
|PDF . "Minerva: Enabling Low-Power, High-Accuracy Deep Neural Network Accelerators." Proceedings of the International Symposium on Computer Architecture (ISCA). June 2016, Seoul, Korea.|
|PDF Code . "MachSuite: Benchmarks for Accelerator Design and Customized Architectures." International Symposium on Workload Characterization (IISWC). October 2014, Raleigh, NC.|
|PDF . "Scalable, Multithreaded, Partially-in-place Sorting." Proceedings of the Seventh Workshop on Multithreaded Architectures and Applications. May 2013, Boston, MA.|
|PDF . "Towards Efficient N-x Contingency Selection Using Group Betweenness Centrality." Proceedings of the Second International Workshop on High-Performance Computing, Networking, and Analytics for the Power Grid. November 2012, Salt Lake City, UT.|
|PDF . "Techniques for Improving Filters in Power Grid Contingency Analysis." 7th International Conference on Machine Learning and Data Mining. August 2011, New York.|
|PDF . "High-Performance Descriptive Semantic Analysis of Semantic Graph Databases." 1st Workshop on High-Performance Computing for the Semantic Web. May 2011, Crete.|
|PDF . "High-performance Computing Applied to Semantic Databases." 8th Extended Semantic Web Conference. May 2011, Crete.|
|PDF . "The Design and Evolution of Deep Learning Workloads." IEEE Micro. Vol. 37, No. 1, 2017.|
|PDF , S. Borkar, N. DeBardeleben, M. Elnozahy, M. Heroux, D. Rogers, R. Ross, V. Sarkar, M. Schulz, M. Snir, and P. Woodward. "Inter-Agency Workshop on HPC Resilience at Extreme Scale." July 2012, Catonsville, MD.|
|PDF . "Materialization is Evil." Invited Position Paper, Workshop on Scalable Graph Libraries. June 2011, Atlanta.|
|PDF . "Graph Analysis for the Semantic Web." Invited Position Paper, Workshop on Scalable Graph Libraries. June 2011, Atlanta.|
|PDF . "Report on April, 2011, Workshop on Semantic Graph Database Search Patterns." Invited Report, 1st Workshop on High-Performance Computing for the Semantic Web. May 2011, Crete.|
|PDF . "High Performance Semantic Factoring." Runner-up Submission, Semantic Web Challenge 2010, Billion Triples Track. November 2010, Shanghai.|