Publications

Conference Papers

"Gregarious data restructuring in a many core architecture"
In The 17th IEEE International Conference on High Performance Computing and Communications, New York, USA, August, 2015.
S. Shrestha, J. Manzano, A. Marquez, S. Song, S. Zuckerman, and G. R. Gao.

"Locality Aware Concurrent Start for Stencil Applications"
In the Proceedings of the 13th International Symposium on Code Generation and Optimization, San Francisco, USA, February, 2015.
S. Shrestha, J. Manzano, A. Marquez, J. Feo, and G. R. Gao.

"ACDT: Architected Composite Data Types Trading-in Unfettered Data Access for Improved Execution"
In The 20th IEEE International Conference on Parallel and Distributed Systems, Hsinchu, Taiwan, December, 2014.
A. Marquez, J. Manzano, S. Song, B. Meister, S. Shrestha, T. St. John and G. R. Gao.

"Jagged Tiling for Intra-tile Parallelism and Fine-Grain Multithreading"
In the 27th International Workshop on Languages and Compilers for Parallel Computing, Hillsboro, OR, USA, September, 2014.
S. Shrestha, J. Manzano, A. Marquez, J. Feo and G. R. Gao.

"On the Feasibility of a Codelet Based Multi-core Operating System"
In 4th Workshop on Data-Flow Execution Models for Extreme Scale Computing (DFM'14). August 24, 2014, Edmonton, Alberta, Canada.
Jack B. Dennis and Guang R. Gao.

"Toward a Self-Aware Codelet Execution Model"
In 4th Workshop on Data-Flow Execution Models for Extreme Scale Computing (DFM'14). August 24, 2014, Edmonton, Alberta, Canada.
Stéphane Zuckerman, Aaron Landwehr, Kelly Livingston, and Guang R. Gao.

"Position Paper: Locality-Driven Scheduling of Tasks for Data-Dependent Multithreading"
In Proceedings of Workshop on Multi-Threaded Architectures and Applications (MTAAP 2014), May 2014.
Jaime Arteaga, Stephane Zuckerman, Elkin Garcia, and Guang R. Gao.

"ASAFESSS: A Scheduler-driven Adaptive Framework for Extreme Scale Software Stacks"
In Proceedings of the 4th International Workshop on Adaptive Self-Tuning Computing Systems (ADAPT'14); 9th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC'14), Vienna, Austria. January 20-22, 2014. Best Paper Award
Tom St. John, Benoit Meister, Andres Marquez, Joseph B. Manzano, Guang R. Gao, and Xiaoming Li.

"A Dynamic Schema to increase performance in Many-core Architectures through Percolation operations"
In Proceedings of the 2013 IEEE International Conference on High Performance Computing (HiPC 2013), Hyderabad, India, December 18 - 21, 2013.
Elkin Garcia, Daniel Orozco, Rishi Khan, Ioannis Venetis, Kelly Livingston, and Guang Gao.

"Optimizing the LU Factorization for Energy Efficiency on a Many-Core Architecture"
In Proceedings of the 26th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2013), Santa Clara, CA, September 25-27, 2013.
Elkin Garcia, Jaime Arteaga, Robert Pavel, and Guang R. Gao.

"COStream: A Dataflow Programming Language and Compiler for Multi-Core Architecture"
In Proceedings of Data-Flow Models (DFM) for extreme scale computing Workshop 2013 in conjunction with Parallel Architectures and Compilation Technologies (PACT 2013), Edinburgh, Scotland, September 8, 2013.
Haitao Wei, Guang R. Gao, Weiwei Zhang, Junqing Yu.

"The TERAFLUX Project: Exploiting the DataFlow Paradigm in Next Generation Teradevices"
In Proceedings of the 16th Euromicro Conference on Digital System Design, Santander, Spain, September 4-6, 2013.
Marco Solinas, Rosa M. Badia, François Bodin, Albert Cohen, Paraskevas Evripidou, Paolo Faraboschi, Bernhard Fechner, Guang R. Gao, Arne Garbade, Sylvain Girbal, Daniel Goodman, Behran Khan, Souad Koliai, Feng Li, Mikel Luján, Laurent Morin, Avi Mendelson, Nacho Navarro, Antoniu Pop, Pedro Trancoso, Theo Ungerer, Mateo Valero, Sebastian Weis, Ian Watson, Stéphane Zuckerman, Roberto Giorgi.

"An Implementation of the Codelet Model"
In Proceedings of 19th International European Conference on Parallel and Distributed Computing (Euro-Par 2013), Aachen, Germany. August 26th, 2013.
Joshua Suetterlein, Stephane Zuckerman, Guang R. Gao.

"Toward a Self-aware System for Exascale Architectures"
In Proceedings of Euro-Par 2013: Parallel Processing Workshops; the 1st Workshop on Runtime and Operating Systems for the Many-core Era (ROME 2013), Aachen, Germany. August 26th, 2013.
Aaron Landwehr, Stephane Zuckerman, and Guang R. Gao.

"Automatic Locality Exploitation in the Codelet Model"
In Proceedings of 11th IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA-13), Melbourne, Australia, July, 2013.
Chen Chen, Yao Wu, Joshua Suetterlein, Long Zheng, Minyi Guo, and Guang R. Gao.

"Towards Memory-Load Balanced Fast Fourier Transformations in Fine-grain Execution Models"
In Proceedings of Workshop on Multithreaded Architectures and Applications (MTAAP 2013), May 24, 2013, Boston, Massachusetts, USA.
Chen Chen, Yao Wu, Stephane Zuckerman, and Guang R. Gao.

"Strategies for improving Performance and Energy Efficiency on a Many-core"
In Proceedings of 2013 ACM International Conference on Computer Frontiers (CF 2013), May 14-16, Ischia, Italy, ACM, 2013.
Elkin Garcia and Guang R. Gao.

"Towards An Energy-Efficient Scheduler in the Codelet Model"
Poster Paper. In Proceedings of IEEE Symposium on Low-Power and High-Speed Chips (IEEE COOL Chips XVI), April 17-19, 2013, Yokohama, Japan.
C. Chen, Y. Wu, J. Suetterlein, L. Zheng and G. Gao.

"Determinacy and Repeatability of Parallel Program Schemata"
In Proceedings of Second Workshop on Data-Flow Execution Models for Extreme Scale Computing (DFM 2012), Minneapolis, MN, USA, September 23, 2012.
Jack B. Dennis, Guang R. Gao, and Vivek Sarkar.

"Demystifying Performance Predictions of Distributed FFT3D Implementations"
In Proceedings of the 9th IFIP International Conference on Network and Parallel Computing (NPC 2012), Gwangju, Korea. September 6 - 8, 2012.
Daniel Orozco, Elkin Garcia, Robert Pavel, Orlando Ayala, Lian-Ping Wang and Guang R. Gao.

"MODA: A Framework for Memory Centric Performance Characterization"
In Proceedings of the 2nd International Workshop on High-Performance Infrastructure for Scalable Tools (WHIST 2012); 26th International Conference of Supercomputing (ICS'12), Venice, Italy. June 29, 2012.
Sunil Shrestha, Chun-Yi Sun, Amanda White, Joseph Manzano, Andres Marquez, John Feo, Kirk Cameron and Guang R. Gao.

"A discussion in favor of Dynamic Scheduling for regular applications in Many-core Architectures"
In Proceedings of 2012 Workshop on Multithreaded Architectures and Applications (MTAAP 2012); 26th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2012), Shanghai, China. May 21 - 25, 2012.
Elkin Garcia, Daniel Orozco, Robert Pavel and Guang R. Gao.

"Dynamic Percolation: A case of study on the shortcomings of traditional optimization in Many-core Architectures"
In Proceedings of 2012 ACM International Conference on Computer Frontiers (CF 2012), Cagliari, Italy. May 15 - 17, 2012.
Elkin Garcia, Daniel Orozco, Rishi Khan, Ioannis Venetis, Kelly Livingston and Guang R. Gao.

"Massively Parallel Breadth First Search Using a Tree-Structured Memory Model"
In Proceedings of International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM 2012); 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’12), New Orleans, LA, USA. February 25-29, 2012.
Tom St. John, Jack B. Dennis and Guang R. Gao.

"Toward High Throughput Algorithms on Many Core Architectures"
In Proceedings of 7th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC 2012), Paris, France. January 23-25, 2012.
Daniel Orozco, Elkin Garcia, Rishi Khan, Kelly Livingston and Guang R. Gao.

"TIDeFlow: The Time Iterated Dependency Flow Execution Model"
In Proceedings of Workshop on Data-Flow Execution Models for Extreme Scale Computing (DFM 2011); 20th International Conference on Parallel Architectures and Compilation Techniques (PACT 2011), Galveston Island, TX, USA. October 10 - 14, 2011.
Daniel Orozco, Elkin Garcia, Robert Pavel, Rishi Khan and Guang R. Gao

"Exploring Fine-Grained Task-based Execution on Multi-GPU Systems"
In Proceedings of Workshop on Parallel Programming on Accelerator Clusters (PPAC 2011); IEEE Cluster 2011. Austin, TX, USA. September 26, 2011.
Long Chen, Oreste Villa and Guang R. Gao

"Towards an integrated multiscale simulation of turbulent clouds on PetaScale computers"
In Proceedings of 13th European Turbulence Conference (ETC13), Warsaw, Poland. September 12-15, 2011.
Lian-Ping Wang, Orlando Ayala, Hossein Parishani, Wojciech W Grabowski, Andrzej A Wyszogrodzki, Zbigniew Piotrowski, Guang R Gao, Chandra Kambhamettu, Xiaoming Li, Louis Rossi, Daniel Orozco and Claudio Torres.

"Polytasks: A Compressed Task Representation for HPC Runtimes"
In Proceedings of the 24th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2011), Fort Collins, CO, USA. September 8-10, 2011.
Daniel Orozco, Elkin Garcia, Robert Pavel, Rishi Khan and Guang R. Gao

"OPELL and PM: A Case Study on Porting Shared Memory Programming Models to Accelerators Architectures"
In Proceedings of the 24th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2011), Fort Collins, CO, USA. September 8-10, 2011.
Joseph B. Manzano, Ge Gan, Juergen Ributzka, Sunil Shrestha and Guang R. Gao

"Hardware and Software Tradeoffs for Task Synchronization on Manycore Architectures"
In Proceedings of International European Conference on Parallel and Distributed Computing (Euro-Par'11), Bordeaux, France. August 29 - September 2, 2011.
Yonghong Yan, Sanjay Chatterjee, Daniel Orozco, Elkin Garcia, Zoran Budimlic, Jun Shirako, Robert Pavel, Guang R. Gao and Vivek Sarkar

"Experiments with the Fresh Breeze Tree-Based Memory Model"
In Proceedings of International Supercomputing Conference (ISC'11), Hamburg, Germany, June 19 - 23, 2011.
Jack B. Dennis, Guang R. Gao and Xiao X. Meng

"Position Paper: Using a "Codelet" Program Execution Model for Exascale Machines"
In Proceedings of ACM SIGPLAN 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era (EXADAPT 2011); Programming Language Design and Implementation (PLDI 2011). San Jose, CA, USA. June 5, 2011.
Stephane Zuckerman, Joshua Suetterlein, Rob Knauerhase and Guang R. Gao

"The Elephant and the Mice: Non-Strict Fine-Grain Synchronization for Many-Core Architectures"
In Proceedings of 25th International Conference on Supercomputing (ICS'11), Tucson, AZ, USA. May 31 - June 4, 2011.
Juergen Ributzka, Joseph B. Manzano, Yuhei Hayashi and Guang R. Gao

"DEEP: An Iterative FPGA-based Many-Core Emulation System for Chip Verification and Architecture Research"
In Proceedings of 19th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA'11), Monterey, CA, USA. February 27 - March 1, 2011.
Juergen Ributzka, Yuhei Hayashi, Fei Chen and Guang R. Gao

"Energy efficient tiling on a Many-Core Architecture"
In Proceedings of 4th Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG 2011); 6th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC), Heraklion, Greece. January 23, 2011.
Elkin Garcia, Daniel Orozco and Guang R. Gao

"Locality Optimization of Stencil Applications using Data Dependency Graphs"
In Proceedings of the 23rd International Workshop on Languages and Compilers for Parallel Computing (LCPC 2010), Houston, TX, USA. October 7-9, 2010.
Daniel Orozco, Elkin Garcia and Guang R. Gao

"Optimized Dense Matrix Multiplication on a Many-Core Architecture"
In Proceedings of International European Conference on Parallel and Distributed Computing (Euro-Par'10), Ischia, Italy. August 31 - September 3, 2010.
Elkin Garcia, Ioannis E. Venetis, Rishi Khan and Guang R. Gao

"A Study of a Software Cache Implementation of the OpenMP Memory Model for Multicore and Manycore Architectures"
In Proceedings of International European Conference on Parallel and Distributed Computing (Euro-Par'10), Ischia, Italy. August 31 - September 3, 2010.
Chen Chen, Joseph B Manzano, Ge Gan, Guang R. Gao and Vivek Sarkar

"TiNy threads on BlueGene/P: Exploring many-core parallelisms beyond The traditional OS"
In Proceedings of Workshop on Multithreaded Architectures and Applications (MTAAP); 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2010), Atlanta, GA, USA. April 23, 2010.
Handong Ye, Robert Pavel, Aaron Landwehr and Guang Gao

"Minimizing Communication in Rate-Optimal Software Pipelining for Stream Programs"
In Proceedings of Symposium on Code Generation and Optimization (CGO 2010), Toronto, Canada. April 24-28, 2010.
Haitao Wei, Junqing Yu, Huafei Yu and Guang R. Gao

"Dynamic Load Balancing on Single- and Multi-GPU Systems"
In Proceedings of the 24th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2010), Atlanta, GA, USA. April 19-23, 2010.
Long Chen, Oreste Villa, Sriram Krishnamoorthy, and Guang R. Gao

"Performance Analysis of Cooley-Tukey FFT Algorithms for a Many-core Architecture"
In Proceedings of The High Performance Computing Symposium (HPC 2010), Orlando, FL, USA. April 12-15, 2010.
Long Chen and Guang R. Gao

"MODA: A Memory Centric Performance Analysis Tool"
In Proceedings of 11th LCI International Conference on High-Performance Clustered Computing, Pittsburgh, PA, USA. March 9-11, 2010
Joseph B. Manzano, Andres Marquez and Guang R. Gao

"Iterative Layer-Based Raytracing on CUDA"
In Proceedings of 28th IEEE International Performance Computing and Communications Conference (IPCCC 2009), Phoenix, AZ, USA. December 14-16, 2009.
Alejandro Segovia, Xiaoming Li and Guang R. Gao

"Mapping the FDTD Application to Many-Core Chip Architectures"
In Proceedings of the 38th International Conference on Parallel Processing (ICPP 2009), Vienna, Austria. September 22-25, 2009.
Daniel Orozco and Guang R. Gao

"Tile Percolation: an OpenMP Tile Aware Parallelization Technique for the Cyclops-64 Multicore Processor"
In Proceedings of International European Conference on Parallel and Distributed Computing (Euro-Par'09), Delft, The Netherlands. August 25-28, 2009
Ge Gan, Xu Wang, Joseph Manzano and Guang R. Gao

"Tile reduction: the first step towards Openmp tile aware parallelization"
In Proceedings of the 5th International Workshop on OpenMP (IWOMP'09), Dresden, Germany, June 3-5, 2009
Ge Gan, Xu Wang, Joseph Manzano, Guang R. Gao

"Mapping the LU Decomposition on a Many Core Architecture: Challenges and Solutions"
In Proceedings of ACM International Conference on Computing Frontiers (CF 2009), Ischia, Italy. May 18-20, 2009
Ioannis E. Venetis and Guang R. Gao

"Just-In-Time Locality and Percolation for Optimizing Irregular Applications on a Manycore Architecture"
In Proceedings of The 21st Annual Languages and Compilers for Parallel Computing Workshop (LCPC 2008), Alberta, Canada. July 31 - August 2, 2008
Guangming Tan, Vugranam Sreedhar, Guang R. Gao

"Experience on Optimizing Irregular Computation for Memory Hierarchy in Manycore Architecture"
Poster Paper. 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2008), Salt Lake City, UT, USA. February 20-23, 2008
Guangming Tan, Dongrui Fan, Junchao Zhang, Andrew Russo, Guang R. Gao

"Performance Tuning of the Fast Fourier Transform on a Multi-core Architecture"
In Proceedings of First Workshop on Programmability Issues for Multi-Core Computers (MULTIPROG 2008), Goteborg, Sweden. January 27, 2008.
Liping Xue, Long Chen, Ziang Hu, Guang R. Gao

"Server I/O Acceleration Using an Embedded Multi-core Architecture"
In Proceedings of Workshop on Application Specific Processors (WASP 2007), Salzburg, Austria. October 4-5, 2007.
Lurng-Kuo Liu, Fei Chen, Christos J. Georgiou and Guang R. Gao

"Software-Pipelining on Multi-Core Architectures"
In Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), Brasov, Romania. September 15-19, 2007.
Alban Douillet and Guang R. Gao

"Concurrency Analysis for Shared Memory Programs with Textually Unaligned Barriers"
In Proceedings of The 20th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2007), Urbana, IL, USA. October 11-13, 2007
Yuan Zhang, Evelyn Duesterwald and Guang R. Gao

"Synchronization State Buffer: Supporting Efficient Fine-Grain Synchronization for Many-Core Architectures"
In Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), San Diego, CA, USA. June 9-13, 2007
Weirong Zhu, Vugranam C. Sreedhar, Ziang Hu, and Guang R. Gao
Available in pdf format

"A Parallel Dynamic Programming Algorithm on a Multi-core Architecture"
In Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2007), San Diego, CA, USA. June 9-11, 2007
Guangming Tan, Ninghui Sun, and Guang R. Gao

"ParalleX: A Study of A New Parallel Computation Model"
In Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), Long Beach, CA, USA. March 26 - 30, 2007.
Guang R. Gao, Thomas Sterling, Rick Stevens, Mark Hereld and Weirong Zhu

"On the Role of Deterministic Fine Grain Data Synchronization for Scientific Applications: A Revisit in the Emerging Many-Core Era"
In Proceedings of First Workshop on Multithreaded Architectures and Applications in the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), Long Beach, CA, USA. March 26 - 30, 2007.
Weirong Zhu, Ziang Hu, and Guang R. Gao

"Exploring a multithreaded Methodology to Implement a Network Communication Protocol on the Cyclops-64 Multithreaded Architecture"
In Proceedings of First Workshop on Multithreaded Architectures and Applications in the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), Long Beach, CA, USA. March 26 - 30, 2007.
Ge Gan, Ziang Hu, Juan del Cuvillo, and Guang R. Gao
Also available in pdf format

"Experience of Optimizing FFT on Intel Core Architecture"
In Proceedings of Workshop on Performance Optimization for High-Level Languages and Libraries in the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), Long Beach, CA, USA. March 26 - 30, 2007.
Daniel Orozco, Liping Xue, Murat Bolat, Xiaoming Li and Guang Gao
Also available in pdf format

"Automatic Program Segment Similarity Detection in Targeted Program Performance Improvement"
In Proceedings of Workshop on Performance Optimization for High-Level Languages and Libraries in the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), Long Beach, CA, USA. March 26 - 30, 2007.
Haiping Wu, Eunjung Park, Mihailo Kaplarevic, Yingping Zhang, Murat Bolat, Xiaoming Li and Guang Gao
Also available in pdf format

"Optimizing Fast Fourier Transform on a Multi-core Architecture"
In Proceedings of Workshop on Performance Optimization for High-Level Languages and Libraries in the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), Long Beach, CA, USA. March 26 - 30, 2007.
Long Chen and Ziang Hu
Also available in pdf format

"Optimized lock assignment and allocation: a method for exploiting concurrency among critical sections"
In the Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming (PPoPP 2007), San Jose, CA, USA, March 14 - 17, 2007.
Yuan Zhang, Vugranam C. Sreedhar, Weirong Zhu, Vivek Sarkar and Guang R. Gao

"Exploring Financial Applications on Many-core-on-a-chip Architecture: A First Experiment"
In Proceedings of Workshop on Frontiers of High Performance Computing and Networking (FHPCN2006), 4th International Symposium on Parallel and Distributed Processing and Applications (ISPA 2006), Sorrento, Italy. December 4-7, 2006.
Weirong Zhu, Parimala Thulasiraman, Ruppa K. Thulasiram and Guang R. Gao
Available in pdf format

"Optimization of Dense Matrix Multiplication on IBM Cyclops-64: Challenges and Experiences"
In Proceedings of the 12th International European Conference on Parallel Processing (Euro-Par 2006), Dresden, Germany. August 29 - September 1, 2006.
Ziang Hu, Juan del Cuvillo, Weirong Zhu, and Guang R. Gao
Also available in pdf format

"Multi-Dimensional Kernel Generation for Loop Nest Software Pipelining"
In Proceedings of the 12th International European Conference on Parallel Processing (Euro-Par 2006), Dresden, Germany. August 29 - September 1, 2006.
Alban Douillet, Hongbo Rong, and Guang R. Gao
Also available in pdf format

"A User-Friendly Methodology for Automatic Exploration of Compiler Options"
In Proceedings of The International Conference on Programming Languages and Compilers (PLC06). Las Vegas, Nevada. June 26-29, 2006.
Haiping Wu, Long Chen, Joseph Manzano and Guang R. Gao
Also available in pdf format

"A User-Friendly Methodology for Automatic Exploration of Compiler Options: A Case Study on the Intel XScale Microarchitecture"
In Proceedings of The International Conference on Programming Languages and Compilers (PLC06). Las Vegas, Nevada. June 26-29, 2006.
Haiping Wu, Eunjung Park, Long Chen, Juan del Cuvillo and Guang R. Gao
Also available in pdf format

"Performance Characteristics of OpenMP Language Constructs on a Many-core-on-a-chip Architecture"
In Proceedings of the 2nd International Workshop on OpenMP (IWOMP2006), Reims, France. June 12-15, 2006.
Weirong Zhu, Juan del Cuvillo and Guang R. Gao
Also available in pdf format

"Towards a Software Infrastructure for the Cyclops-64 Cellular Architecture"
In Proceedings of the 20th International Symposium on High Performance Computing Systems and Applications (HPCS'06), St. John's, Canada. May 14 - 17, 2006.
Juan del Cuvillo, Weirong Zhu, Ziang Hu and Guang R. Gao
Also available in pdf format

"Landing OpenMP on Cyclops-64: An Efficient Mapping of OpenMP to a many-core System-on-a-chip"
In Proceedings of the 3rd ACM International Conference on Computing Frontiers, Ischia, Italy. May 2-5, 2006.
Juan del Cuvillo, Weirong Zhu and Guang R. Gao
Also available in pdf format

"A Study of the On-Chip Interconnection Network for the IBM Cyclops-64 Multi-Core Architecture"
In Proceedings of 20th International Parallel and Distributed Processing Symposium (IPDPS2006), Rhodes Island, Greece. April 25 - 29, 2006.
Ying M. P. Zhang, Taikyeong Jeong, Fei Chen, Haiping Wu, Ronny Nitzsche and Guang R. Gao
Also available in pdf format

"Hierarchical Multithreading: Programming Model and System Software"
In Proceedings of Workshop on NSF Next Generation Software Program (NSFNGS'06), in conjunction with 20th International Parallel and Distributed Processing Symposium (IPDPS2006), Rhodes Island, Greece. April 25 - 29, 2006.
Guang R. Gao, Thomas Sterling, Rick Stevens, Mark Hereld and Weirong Zhu

"Performance Modelling and Optimization of Memory Access on Cellular Computer Architecture Cyclops-64"
In Proceedings of Network and Parallel Computing (NPC 2005), Beijing, China. November 30 - December 3, 2005.
Yanwei Niu, Ziang Hu, Kenneth Barner and Guang R. Gao
Also available in pdf format

"Register Pressure in Software-Pipelined Loop Nests: Fast Computation and Impact on Architecture Design"
In Proceedings of the 18th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2005), Hawthorne, NY, USA. October 20-22, 2005.
Alban Douillet and Guang R. Gao
Also available in pdf format

"Identifying Multiply-Add Operations in Kylin Compiler"
In the proceedings of the 2005 International Conference on Embedded Systems and Applications (ESA'05), Las Vegas, NV, USA. June 27-30, 2005.
Haiping Wu, Ziang Hu, Joseph Manzano, Yingping Zhang and Guang R. Gao

"Register Allocation for Software Pipelined Multi-dimensional Loops"
In Proceedings of Conference on Programming Language Design and Implementation (PLDI 2005), Chicago, IL, USA. June 11 - 15, 2005.
Hongbo Rong, Alban Douillet and Guang R. Gao
Also available in pdf format

"FAST: A Functionally Accurate Simulation Toolset for the Cyclops-64 Cellular Architecture"
In Proceedings of Workshop on Modeling, Benchmarking and Simulation (MoBS), held in conjunction with the 32nd Annual International Symposium on Computer Architecture (ISCA 2005), Madison, WI, USA. June 4, 2005.
Juan del Cuvillo, Weirong Zhu, Ziang Hu and Guang R. Gao
Also available in pdf format

"P3I: The Delaware Programmability, Productivity and Proficiency Inquiry"
In Proceedings of the Second International Workshop On Software Engineering for High Performance Computing System Applications (SE-HPCS '05), St. Louis, MO, USA. May 15, 2005
Joseph B. Manzano, Yuan Zhang and Guang R. Gao

"Atomic Section: Concept and Implementation"
In Proceedings of Mid-Atlantic Student Workshop on Programming Languages and Systems (MASPLAS '05), Newark, DE, USA. April 30, 2005.
Yuan Zhang, Joseph B. Manzano and Guang R. Gao

"TiNy Threads: a Thread Virtual Machine for the Cyclops-64 Cellular Architecture"
In Proceedings of the Fifth Workshop on Massively Parallel Processing (WMPP), held in conjunction with the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), Denver, CO, USA. April 3 - 8, 2005
Juan del Cuvillo, Weirong Zhu, Ziang Hu and Guang R. Gao
Also available in pdf format

"Performance Portability on EARTH: A Case Study across Several Parallel Architectures"
In Proceedings of the 4th International Workshop on Performance Modeling, Evaluation, and Optimization of Parallel and Distributed Systems (PMEO-PDS'05), held in conjunction with IPDPS 2005, Denver, CO, USA. April 4 - 8, 2005.
Weirong Zhu, Yanwei Niu and Guang Gao

"Sequential Consistency Revisited: The Sufficient Conditions and Method to Reason Consistency Model of a Multiprocessor-on-a chip Architecture"
In Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN2005), Innsbruck, Austria. February 15 - 17, 2005.
Yuan Zhang, Weirong Zhu, Fei Chen, Ziang Hu and Guang R. Gao

"If-Conversion in SSA Form"
In Proceedings of the International European Conference on Parallel and Distributed Computing (Euro-Par 2004), Pisa, Italy. August 31 - September 3, 2004.
Arthur Stoutchinin and Guang R. Gao

"Single-Dimension Software Pipelining for Multi-Dimensional Loops"
In Proceedings of International Symposium on Code Generation and Optimization (CGO 2004), San Jose, CA. March 21 - 24, 2004.
Hongbo Rong, Zhizhong Tang, R. Govindarajan, Alban Douillet and Guang R. Gao
Also available in pdf format

"Code Generation for Single-Dimension Software Pipelining of Multi-Dimensional Loops"
In Proceedings of International Symposium on Code Generation and Optimization (CGO 2004), San Jose, CA. March 21 - 24, 2004.
Hongbo Rong, Alban Douillet, R. Govindarajan and Guang R. Gao
Also available in pdf format

"DIMES: An Iterative Emulation Platform for Multiprocessor-System-on-Chip Designs"
In Proceedings of IEEE International Conference on Field-Programmable Technology (FPT'03), Tokyo, Japan. December 15 - 17, 2003.
Hirofumi Sakane, Levent Yakay, Vishal Karna, Clement Leung and Guang R. Gao

"Code Size Oriented Memory Allocation for Temporary Variables"
In Proceedings of Fifth Workshop on Media and Streaming Processors (MSP-5/MICRO-36), San Diego, CA, USA. December 1, 2003.
Ziang Hu, Yan Xie and Guang R. Gao

"Code Size Reduction with Global Code Motion"
In Proceedings of Workshop on Compilers and Tools for Constrained Embedded Systems (CTCES/CASES) 2003, San Jose, CA, USA. October 29, 2003.
Ziang Hu, Yuan Zhang, Hongbo Yang and Guang R. Gao

"Performance Study of a Whole Genome Comparison Tool on a Hyper-Threading Multiprocessor"
In Proceedings of Fifth International Symposium on High Performance Computing, Tokyo, Japan. October 20 - 22, 2003.
Juan del Cuvillo, Xinmin Tian, Guang R. Gao and Millind Girkar

"CARE: Overview of an Adaptive Multithreaded Architecture"
In Proceedings of Fifth International Symposium on High Performance Computing, Tokyo, Japan. October 20 - 22, 2003.
Andres Marquez and Guang R. Gao

"Compiler-Assisted Cache Replacement: Problem Formulation and Performance Evaluation"
In Proceedings of 16th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2003), College Station, TX, USA. October 2 - 4, 2003.
Hongbo Yang, R. Govindarajan, Guang R. Gao and Ziang Hu

"A Cluster-Based Solution for High Performance Hmmpfam Using EARTH Execution Model"
In Proceedings of Fifth IEEE International Conference on Cluster Computing (CLUSTER2003), Hong Kong, China. September 20-23, 2003.
Weirong Zhu, Yanwei Niu, Jizhu Lu, Chuan Shen and Guang R. Gao

"An Executable Analytical Performance Evaluation Approach for Early Performance Prediction"
In Proceedings of International Parallel and Distributed Processing Symposium (IPDPS 2003), Nice, France. April 22 - 26, 2003.
Adeline Jacquet, Vincent Janot, Clement Leung, Guang R. Gao, R. Govindarajan, Thomas L. Sterling

"Programming Models and System Software for Future High-End Computing Systems: Work-in-Progress"
In Proceedings of International Parallel and Distributed Processing Symposium (IPDPS 2003), Nice, France. April 22 - 26, 2003.
Guang R. Gao, Kevin B. Theobald, R. Govindarajan, Clement Leung, Ziang Hu, Haiping Wu, Jizhu Lu, Juan del Cuvillo, Adeline Jacquet, Vincent Janot and Thomas L. Sterling

"On Achieving Balanced Power Consumption in Software Pipelined Loops"
In Proceedings of International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES 2002), Grenoble, France. October 8 - 11, 2002.
Hongbo Yang, Guang R. Gao, Clement Leung, R. Govindarajan and Haiping Wu
Available as gzipped Postscript.

"Exploiting Schedule Slacks for Rate-Optimal Power-Minimum Software Pipelining"
In Proceedings of 3rd Workshop on Compilers and Operating Systems for Low Power (COLP), held in conjunction with The 11th International Conference on Parallel Architecture and Compilation Techniques (PACT), Charlottesville, VA, USA. September 22 - 25, 2002.
Hongbo Yang, R. Govindarajan, Guang R. Gao, George Cai and Ziang Hu
Available as gzipped Postscript.

"Power-Performance Trade-offs for Energy-Efficient Architectures: A Quantitative Study"
In Proceedings of 20th International Conference on Computer Design (ICCD) 2002, Freiburg, Germany. September 16 - 18, 2002.
Hongbo Yang, R. Govindarajan, Guang R. Gao and Kevin B. Theobald
Available as gzipped Postscript.

"Whole Genome Alignment using a Multithreaded Parallel Implementation"
In Proceedings of Symposium on Computer Architecture and High Performance Computing, Pirenopolis, Brazil. September 10 - 12, 2001.
Wellington S. Martins, Juan del Cuvillo, Wenwu Cui and Guang R. Gao

"Next Generation System Software for Future High-End Computing Systems"
In Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS '02). IEEE Computer Society, Washington, DC, USA. April 15, 2002.
Guang R. Gao, Kevin B. Theobald, Ziang Hu, Haiping Wu, Jizhu Lu, Keshav Pingali, Paul Stodghill, Thomas L. Sterling, Rick Stevens, and Mark Hereld

"Power and Energy Impact by Loop Transformations"
In Proceedings of Workshop on Compilers and Operating Systems for Low Power 2001, in conjunction with Parallel Architecture and Compilation Techniques 2001, Barcelona, Spain. September 8, 2001.
Hongbo Yang, Guang R. Gao, Andres Marquez, George Cai and Ziang Hu
Available as gzipped Postscript.

"A Multi-Threaded Runtime System for a Multi-Processor/Multi-Node Cluster"
In Proceedings of 15th Annual International Symposium on High Performance Computing Systems and Applications, Windsor, ON, Canada. June 18 - 20, 2001.
Christopher J. Morrone, Jose N. Amaral, Guy Tremblay, and Guang R. Gao

"Minimum Register Instruction Sequence Problem: Revisiting Optimal Code Generation for DAGs"
In Proceedings of International Parallel and Distributed Processing Symposium (IPDPS 2001), San Francisco, CA, USA. April 24 - 28, 2001.
R. Govindarajan, Hongbo Yang, C. Zhang, Jose N. Amaral and Guang R. Gao
Available as gzipped Postscript.

"Multithreaded Algorithms for Pricing a Class of Complex Options"
In Proceedings of International Parallel and Distributed Processing Symposium (IPDPS 2001), San Francisco, CA, USA. April 24 - 28, 2001.
Ruppa K. Thulasiram, Lubomir Litov, Hassan Nojumi, Christopher T. Downing and Guang R. Gao
Available as gzipped Postscript.

"Compiling Several Classes of Reductions on a Multithreaded Architecture"
In Proceedings of the Mid-Atlantic Student Workshop on Programming Languages and Systems, Yorktown Heights, New York, IBM T. J. Watson Research Center, April 2001.
Rishi Kumar, Gagan Agrawal, Kevin Theobald, Gary M. Zoppetti, and Guang R. Gao

"Speculative Prefetching of Induction Pointers"
In Proceedings of International Conference on Compiler Construction (CC 2001), Genova, Italy. April 2 - 6, 2001.
Artour Stoutchinin, Jose N. Amaral, Guang R. Gao, Jim Dehnert, Suneel Jain and Alban Douillet
Available as gzipped Postscript.

"Computer Detection of Single Nucleotide Polymorphisms (SNPs) in Maize ESTs"
In Proceedings of Plant & Animal Genome IX Conference (PAG-IX), San Diego, CA, USA. January 13 - 17, 2001.
F. Useche, M. Morgante, M. Hanafey, Scott Tingey, Guang R. Gao and Antoni Rafalski

"A Multithreaded Parallel Implementation of a Dynamic Programming Algorithm for Sequence Comparison"
In Proceedings of Pacific Symposium on Biocomputing (PSB 2001), pp. 311-322, Hawaii, HI, USA. January 3 - 7, 2001.
W.S. Martins, J.B. del Cuvillo, F.J. Useche, K.B. Theobald and Guang R. Gao

"Landing CG on EARTH: A Case Study of Fine-Grained Multithreading on an Evolutionary Path"
In Proceedings of Super Computing (SC2000), Dallas, TX, USA. November 4-10, 2000.
Kevin B. Theobald, Gagan Agrawal, Rishi Kumar, Gerd Heber, Guang R. Gao, Paul Stodghill and Keshav Pingali

"Developing a Communication Intensive Application on the EARTH Multithreaded Architecture"
In Proceedings of International European Conference on Parallel and Distributed Computing (Euro-Par 2000), Munchen, Germany. August 28 - September 1, 2000.
Kevin B. Theobald, Rishi Kumar, Gagan Agrawal, Gerd Heber, Ruppa K. Thulasiram and Guang R. Gao

"Multithreaded Algorithms for the Fast Fourier Transform"
In Proceedings of the 12th Annual ACM Symposium on Parallel Algorithms and Architectures, Bar Harbor, Maine, pp. 176-185, July 2000.
Parimala Thulasiraman, Kevin B. Theobald, Ashfaq A. Khokhar, and Guang R. Gao

"Parallel FEM Simulation of Crack Propagation -- Challenges, Status, and Perspectives"
In Proceedings of International Parallel and Distributed Processing Symposium (IPDPS'00), pp. 443-449, Cancun, Mexico. May 1-5, 2000.
Bruce Carter, Chuin-Shan Chen, L. Paul Chew, Nikos Chrisochoides, Guang R. Gao, Gerd Heber, Antony R. Ingraffea, Roland Krause, Chris Myers, Demian Nave, Keshav Pingali, Paul Stodghill, Stephen Vavasis, Paul A. Wawrzynek

"Caching Single-Assignment Structures to Build a Robust Fine-Grain Multi-Threading System"
In Proceedings of International Parallel and Distributed Processing Symposium (IPDPS'00), pp. 589-594, Cancun, Mexico. May 1-5, 2000.
Wen-Yen Lin, Jean-Luc Gaudiot, Jose N. Amaral and Guang R. Gao

"Performance Analysis of the I-Structure Software Cache on Multi-Threading Systems"
In Proceedings of 19th IEEE International Performance, Computing and Communication Conference (IPCCC2000), Phoenix, AZ, USA. February 20-22, 2000.
Wen-Yen Lin, Jean-Luc Gaudiot, Jose N. Amaral and Guang R. Gao

"A Comparative Performance Study of Fine-Grain Multi-threading on Distributed Memory Machines"
In Proceedings of 19th IEEE International Performance, Computing and Communication Conference (IPCCC2000), Phoenix, AZ, USA. February 20-22, 2000.
Prasad Kakulavarapu, Christopher J. Morrone, Kevin B. Theobald, Jose N. Amaral and Guang R. Gao

"Coping With Very High Latencies in Petaflops Computer Systems"
In Proceedings of High Performance Computing, Second International Symposium, ISHPC'99, Kyoto, Japan. May 26-28, 1999.
Sean Ryan, Jose N. Amaral, Guang R. Gao, Zachary Ruiz, Andres Marquez and Kevin Theobald.

"A Multithreading Parallel Computational Approach for Valuing Derivatives"
In Proceedings of First WAFA Finance Research Conference, Fairfax, VA, USA. April 30, 1999.
R.K. Thulasiram and Guang R. Gao

"Load Adaptive Algorithms and Implementations for the 2D Discrete Wavelet Transform on Fine-Grain Multithreaded Architectures"
In Proceedings of Workshop on SPDP '99, San Juan, Puerto Rico, April 12-16, 1999.
Ashfaq A. Khokhar, Gerd Heber, Parimala Thulasiraman and Guang R. Gao
Available as gzipped Postscript.

"A New Approach to Parallel Dynamic Partitioning for Adaptive Unstructured Meshes"
In Proceedings of Workshop on SPDP '99, San Juan, Puerto Rico, April 12-16, 1999.
Gerd Heber, Rupak Biswas and Guang R. Gao.
Available as gzipped Postscript.

"Self-Avoiding Walks over Adaptive Unstructured Grids"
In Proceedings of Workshop on Parallel Algorithms for Irregularly Structured Problems (IRREGULAR 1999), San Juan, Puerto Rico, April 12-16, 1999.
Gerd Heber, Rupak Biswas and Guang R. Gao.
Available as gzipped Postscript.

"Efficient State-Diagram Construction Methods for Software Pipelining"
In Proceedings of International Conference on Compiler Construction (CC 1999), Amsterdam, The Netherlands. March 20-28, 1999.
Chihong Zhang, R. Govindarajan, Sean Ryan and Guang R. Gao.
Available as gzipped Postscript.

"HTMT-C: Proposing A Programming Language For A Petaflop Machine"
In Proceedings of the Mid-Atlantic Student Workshop on Programming Languages and Systems (MASPLAS 1999), pp. 53-68, Baltimore, MD. March 27, 1999.
Sean Ryan, Jose Nelson Amaral, Zachary Ruiz and Guang R. Gao

"Superconducting Processors for HTMT: Issues and Challenges"
In Proceedings of the 7th Symposium on the Frontiers of Massively Parallel Computation (FRONTIERS '99), pp. 260-267, Annapolis, MD, USA. February 21-25, 1999.
Kevin B. Theobald, Guang R. Gao and Thomas L. Sterling.
Available as gzipped Postscript.

"Performance Prediction for the HTMT: A Programming Example"
TFP3 '99, Annapolis, Maryland, February 22, 1999
Jose Nelson Amaral, Guang R. Gao, Phillip Merkey, Thomas Sterling, Zachary Ruiz and Sean Ryan.

"A Superstrand Architecture and its Compilation"
In Proceedings of Workshop on Multithreaded Execution, Architecture and Compilation, held in conjunction with HPCA-V, Orlando, FL, USA. January 9-12, 1999.
Andres Marquez, Kevin B. Theobald, Xinan Tang and Guang R. Gao

"Design and Evaluation of Dynamic Load Balancing Schemes under a Fine-grain Multithreaded Execution Model"
In Proceedings of Workshop on Multithreaded Execution, Architecture and Compilation, held in conjunction with HPCA-V, Orlando, FL, USA. January 9-12, 1999.
Haiying Cai, Olivier Maquelin, Prasad Kakulavarapu and Guang R. Gao.

"An Implementation of a Hopfield Network Kernel on EARTH"
Brazilian Symposium on Computer Architecture and High Performance Processing, Buzios, Brazil, September, 1998.
Jose N. Amaral, Guang R. Gao and Xinan Tang
Available as gzipped Postscript.

"Using Multithreading for the Automatic Load Balancing of Adaptive Finite Element Meshes"
In Proceedings of Workshop on Parallel Algorithms for Irregularly Structured Problems (IRREGULAR 1998), Berkeley, CA, USA. August 9-11, 1998.
Gerd Heber, Rupak Biswas, Parimala Thulasiraman and Guang R. Gao
Available as gzipped Postscript.

"Elastic History Buffer: A Low-Cost Method to Improve Branch Prediction Accuracy"
In Proceedings of International Conference on Computer Design: VLSI in Computers & Processors (ICCD 1997), Austin, TX, USA. October 12-15, 1997
Guang R. Gao, Maria-Dana Tarlescu and Kevin B. Theobald.
Available as gzipped Postscript.

"Thread Partitioning and Scheduling Based on Cost Model"
In Proceedings of 9th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA 1997), Newport, RI, USA. June 22 - 25, 1997.
Guang R. Gao, Xinan Tang, Jian Wang and Kevin B. Theobald.
Available as gzipped Postscript.


 

 

Journal Papers

The Design and Implementation of TIDeFlow: A Dataflow-Inspired Execution Model for Parallel Loops and Task Pipelining
International Journal of Parallel Programming, July 2015, ISSN 0885-7458.
Daniel Orozco, Elkin Garcia, Robert Pavel, Jaime Arteaga, and Guang R. Gao

TERAFLUX: Harnessing dataflow in next generation teradevices
Microprocessors and Microsystems - Journal, Available online 18 April 2014, ISSN 0141-9331.
Roberto Giorgi, Rosa M. Badia, François Bodin, Albert Cohen, Paraskevas Evripidou, Paolo Faraboschi, Bernhard Fechner, Guang R. Gao, Arne Garbade, Rahul Gayatri, Sylvain Girbal, Daniel Goodman, Behran Khan, Souad Koliai, Joshua Landwehr, Nhat Minh Lê, Feng Li, Mikel Luján, Avi Mendelson, Laurent Morin, Nacho Navarro, Tomasz Patejko, Antoniu Pop, Pedro Trancoso, Theo Ungerer, Ian Watson, Sebastian Weis, Stéphane Zuckerman, Mateo Valero

Exploitation of Locality for Energy Efficiency for Breadth First Search in Fine-grain Execution Models
Tsinghua Science and Technology - Journal, Volume 18, Number 3, June 2013.
Chen Chen, Souad Koliai and Guang R. Gao.

StreamTMC: Stream Compilation for Tiled Multi-core Architectures
Elsevier Journal of Parallel and Distributed Computing (JPDC), Volume 73, Issue 4, April 2013, Pages 484-494.
Haitao Wei, Mingkang Qin, Junqing Yu, Dongrui Fan and Guang R. Gao.

Software Pipelining for Stream Programs on Resource Constrained Multicore Architectures
IEEE Transactions on Parallel and Distributed Systems, Vol. 23, No.12, Dec. 2012, pp. 2338-2350.
Haitao Wei, Junqing Yu, Huafei Yu, Mingkang Qin and Guang R. Gao

Toward High Throughput Algorithms on Many Core Architectures
ACM Transactions on Architecture and Code Optimization (TACO), Volume 8, Issue 4, January 2012, Article No. 49.
Daniel Orozco, Elkin Garcia, Rishi Khan, Kelly Livingston and Guang R. Gao.

Experiments with the Fresh Breeze tree-based memory model
Computer Science - Research and Development, June 2011, Volume 26, Issue 3-4, pp 325-337.
Jack B. Dennis, Guang R. Gao, Xiao X. Meng.

Analysis and Performance Results of Computing Betweenness Centrality on IBM Cyclops64
ACM Journal of Supercomputing, Vol. 56, No.1, April 2011, pp. 1-24.
Guangming Tan, Vugranam C. Sreedhar and Guang R. Gao.

Improving Performance of Dynamic Programming via Parallelism and Locality on Multi-core Architectures
IEEE Transactions on Parallel and Distributed Systems, Vol.20, No.2, 2009, pp. 261-274.
Guangming Tan, Ninghui Sun and Guang R. Gao

Register allocation for software pipelined multidimensional loops
ACM Trans. Program. Lang. Syst. 30(4), July 2008.
Hongbo Rong, Alban Douillet and Guang R. Gao

EnGENIUS - Environmental Genome Informational Utility System
Journal of Bioinformatics and Computational Biology, JBCB-119R1, July 2008
M. Kaplarevic, A.E. Murray and Guang R. Gao

Single-Dimension Software Pipelining for Multidimensional Loops
ACM Transactions on Architecture and Code Optimization (TACO), Volume 4, Issue 1, March 2007, Article No. 7.
Hongbo Rong, Zhizhong Tang, R. Govindarajan, Alban Douillet and Guang R. Gao

Performance Portability on EARTH: A Case Study across Several Parallel Architectures
Cluster Computing, Volume 10, Number 2, June 2007, pages 115-126.
Weirong Zhu, Yanwei Niu and Guang R. Gao

Madd Operation Aware Redundancy Elimination
International Journal of Software Engineering and Knowledge Engineering, Vol. 15, No. 2, 2005, pp. 357-362
Haiping Wu, Ziang Hu, Joseph Manzano and Guang R. Gao.

Improving Power Efficiency with Compiler-Assisted Cache Replacement
Journal of Embedded Computing, 2005
Hongbo Yang, R. Govindarajan, Guang R. Gao and Ziang Hu

A Cluster-Based Solution for High Performance Hmmpfam Using EARTH Execution Model
International Journal of High Performance Computing and Networking, Vol 2, Issue 2/3/4, 2004
Weirong Zhu, Yanwei Niu, Jizhu Lu, Chuan Shen and Guang R. Gao

An Improved Hidden Markov Model for Transmembrane Protein Topology Prediction and Its Applications to Complete Genomes
Bioinformatics, Volume 21, Number 9, pp. 1853-1858, 2005
Robel Kahsay, Li Liao and Guang Gao

Quasi-Consensus Based COMParison of Profile Hidden Markov Models for Protein Sequences
Bioinformatics, Volume 21, Number 10, pp. 2287-2293, 2005
Robel Kahsay, Guoli Wang, Guang R. Gao, Li Liao and Roland Dunbrack.

Efficient Multithreaded Algorithms for the Fast Fourier Transform
Parallel and Distributed Computing Practices, Vol. 5, No. 2, Pages: 177-191, 2004
Parimala Thulasiraman, Kevin B. Theobald, Ashfaq A. Khokhar and Guang R. Gao

A Fine-Grain Load Adaptive Algorithm of the 2D Discrete Wavelet Transform for Multithreaded Architectures
Journal of Parallel and Distributed Computing (JPDC), Vol.64, No.1, Pages: 68-78, January 2004
Parimala Thulasiraman, Ashfaq A. Khokhar, Gerd Heber and Guang R. Gao

Evaluation and Choice of Various Branch Predictors for Low-Power Embedded Processor
Journal of Computer Science and Technology, Vol. 18, No. 6, Pages: 833-838, November, 2003
Dong Rui Fan, Hongbo Yang, Guang R. Gao and Rong Cai Zhao

Minimum Register Instruction Sequencing to Reduce Register Spills in Out-of-Order Issue Superscalar Architectures
IEEE Transactions on Computers, Vol. 52, No. 1, Pages: 4-20, January 2003
Ramaswamy Govindarajan, Hongbo Yang, Jose N Amaral, Chihong Zhang and Guang R. Gao

Implementation of the EARTH Programming Model on SMP Clusters: a Multi-Threaded Language and Runtime System
Concurrency and Computation: Practice and Experience, Vol. 15, No. 9, Pages: 821-844, August 2003
Guy Tremblay, Christopher J. Morrone, Jose N. Amaral and Guang R. Gao

Minimizing Buffer Requirements in Rate-Optimal Schedules in Regular Dataflow Networks
Journal of VLSI Signal Processing, Vol. 31, No. 3, Pages: 207-229, Jul 2002
Ramaswamy Govindarajan and Guang R. Gao

Implementation and Evaluation of a Communication Intensive Application on the EARTH Multithreaded System
Concurrency and Computation: Practice and Experience, 14(3):183-201, March 2002
Kevin B. Theobald, Rishi Kumar, Gagan Agrawal, Gerd Heber, Ruppa K. Thulasiram, and Guang R. Gao

A Theory for Co-Scheduling Hardware and Software Pipelines in ASIPs and Embedded Processors
Design Automation for Embedded Systems, Vol. 6, No. 3, Pages: 243-275, March 2002
Ramaswamy Govindarajan, Erik R. Altman and Guang R. Gao

CASA: A Server for The Critical Assessment of Sequence Alignment Accuracy
Bioinformatics, Vol. 18, No. 3, Pages: 496-497, March 2002
Robel Y. Kahsay, Nataraj Dongre, Guang R. Gao, Guoli Wang and Roland L. Dunbrack Jr.

TROLL--Tandem Repeat Occurrence Locator
Bioinformatics, Vol. 18, No. 4, Pages: 634-636, April 2002
Adalberto T. Castelo, Wellington S. Martins and Guang R. Gao

Exploiting Locality in single Assignment Data Structures Updated through Split Phase Transactions
Cluster Computing, Special issue on Internet Scalability: Advances in Parallel, Distributed and Mobile Systems, Vol. 4, No. 4, Pages: 281-293, October 2001
Jose N. Amaral, Wen-Yen Lin, Jean-Luc Gaudiot and Guang R. Gao

Dynamic Load Balancers for a Multithreaded Multiprocessor System
Parallel Processing Letters, Vol. 11, No. 1, Pages: 169-184, March 2001
Prasad Kakulavarapu, Olivier Maquelin, Jose N. Amaral and Guang R. Gao

Location Consistency: A New Memory Model and Cache Consistency Protocol
IEEE Transactions on Computers, Vol. 49, No. 8, Pages: 798-813, August 2000
Guang R. Gao and Vivek Sarkar

Automatically Partitioning Threads for Multithreaded Architectures
Special Issues on Compilation and Architectural Support for Parallel Applications, Journal of Parallel and Distributed Computing, Vol. 58, No. 2, Pages: 159-189, August 1999
Xinan Tang and Guang R. Gao

Advances in the Dataflow Computational Model
Parallel Computing, Vol. 25, No. 13-14, Pages: 1907-1927, 1999
Walid A. Najjar, Edward A. Lee and Guang R. Gao

A New Framework for Elimination Based Data Flow Analysis Using DJ Graphs
ACM Transactions on Programming Languages and Systems, Vol. 20, No. 2, Pages: 388-435, March 1998
Vugranam C. Sreedhar, Guang R. Gao, and Yong-Fong Lee

Optimal Modulo Scheduling Through Enumeration
International Journal of Parallel Programming, Vol. 26, No. 2, Pages: 313-344, 1998
Erik R. Altman and Guang R. Gao

A Unified Framework for Instruction Scheduling and Mapping for Function Units with Structural Hazards
Journal of Parallel and Distributed Computing, Vol. 49, No. 2, Pages: 259-293, 1998
Erik R. Altman, Ramaswamy Govindarajan and Guang R. Gao

Incremental Computation of Dominator Trees
ACM Transactions on Programming Languages and Systems, Vol. 19, No. 2, Pages: 239-252, March 1997
Vugranam C. Sreedhar, Guang R. Gao and Yong-fong Lee

A Quadratic Time Algorithm for Computing Multiple Node Immediate Dominators
Journal of Programming Languages, 1996
Vugranam C. Sreedhar, Guang R. Gao and Yongfong Lee

The W-Network: A Low-Cost Fault-Tolerant Multistage Interconnection Network for Fine-Grain Multiprocessing
Concurrency: Practice and Experience, 8(6):415-428, July-August 1996
Kevin B. Theobald

A Framework for Resource-constrained Rate-optimal Software Pipelining
IEEE Transactions on Parallel and Distributed Systems, Vol. 7, No. 11, Pages: 1133-1149, November 1996
Ramaswamy Govindarajan, Erik R. Altman and Guang R. Gao

A Study of the EARTH-MANNA Multithreaded System
International Journal of Parallel Programming, Vol. 24, No. 4, Pages: 319-347, August 1996
Herbert H. J. Hum, Olivier Maquelin, Kevin B. Theobald, Xinmin Tian, Guang R. Gao and Laurie J. Hendren

Identifying Loops Using DJ Graphs
ACM Transactions on Programming Languages and Systems (TOPLAS), Vol. 18, No. 6, Pages: 649-658, November 1996
Vugranam Sreedhar, Guang R. Gao and Yongfong Lee

A Linear Time Algorithm for Placing OE-nodes
Journal of Programming Languages, 1995. Accepted
Vugranam C. Sreedhar and Guang R. Gao

Automatic Data and Computation Decomposition for Distributed Memory Machines
Parallel Processing Letters, Vol. 5, No. 4, Pages: 539-550, April 1995
Qi Ning, Vincent V. Dongen and Guang R. Gao

Computing phi-nodes in Linear Time Using DJ Graphs
Journal of Programming Languages, Vol. 3, Pages: 191-213, April 1995
Vugranam C. Sreedhar and Guang R. Gao

ABC++: Concurrency by Inheritance in C++
IBM Systems Journal, Vol. 34, No. 1, Pages: 120-137, 1995
Eshrat Arjomandi, William O'Farrell, Ivan Kalas, Gita Koblents, Frank Ch. Eigler and Guang R. Gao

Rate-optimal Schedule for Multi-rate DSP Computations
Journal of VLSI Signal Processing, Vol. 9, No.3, Pages: 211-232, April 1995
Ramaswamy Govindarajan and Guang R. Gao

An Efficient Hybrid Dataflow Architecture Model
Journal of Parallel and Distributed Computing, Vol. 19, No. 4, Pages: 293-307, December 1993
Guang R. Gao

A Register Allocation Framework Based on Hierarchical Cyclic Interval Graphs
The Journal of Programming Languages, Vol. 1, No. 3, Pages: 155-185, 1993
Laurie J. Hendren, Guang R. Gao, Erik R. Altman and Chandrika Mukerji

Optimal Loop Storage Allocation for Argument-fetching Dataflow Machines
International Journal of Parallel Programming, Vol. 21, No. 6, Pages: 421-448, December 1992
Qi Ning and Guang R. Gao

A High-speed Memory Organization for Hybrid Dataflow/von Neumann Computing
Future Generation Computer Systems, Vol. 8, Pages: 287-301, 1992
Herbert H. J. Hum and Guang R. Gao

Toward Efficient Fine-grain Software Pipelining and the Limited Balancing Techniques
International Journal of Mini and Microcomputers, Vol. 13, No. 2, Pages: 57-68, 1991
Guang R. Gao, Herbert H. J. Hum and Yue-Bong Wong

Exploiting Fine-grain Parallelism on Dataflow Architectures
Parallel Computing, Vol. 13, No. 3, Pages: 309-320, March 1990
Guang R. Gao


 

Technical Memos

CAPSL Technical Memo 140:
clCodeletPipe API Documentation : Implementation of Codelet Pipe on Intel Iris Pro Architecture (Available upon request -- Please contact sraskar -AT- udel.edu)

Siddhisanket Raskar, Thomas Applencourt, Kalyan Kumaran, and Guang Gao
May 2021


CAPSL Technical Memo 139:
Realization of Dataflow Software Pipelining for Codelet Model using Hardware-Software Co-design (Available upon request -- Please contact sraskar -AT- udel.edu)

Siddhisanket Raskar, Thomas Applencourt, Kalyan Kumaran, and Guang Gao
May 2021


CAPSL Technical Memo 138:
DECARD & DEMAC, A Distributed Runtime and a Modular Cluster for Embedded Systems (Available upon request -- Please contact diegor -AT- udel.edu)

Diego A. Roa Perdomo, Jose M. Monsalve Diaz, and Guang Gao
December 2020


CAPSL Technical Memo 137:
Towards Surrogate Model aware Program Execution Model (Available upon request -- Please contact sraskar -AT- udel.edu)

Siddhisanket Raskar and Guang Gao
January 2021


CAPSL Technical Memo 136:
DEMAC and CODIR: A whole stack solution for a HW/SW co-design using an MLIR Codelet Model Dialect

D. Roa Perdomo, R. Kabrick, J. Monsalve Diaz, S. Raskar, D. Fox, G. Gao
May 2020


CAPSL Technical Memo 135:
Study of Dataflow Software Pipelining under Codelet Model using Cannon's Algorithm (Available upon request -- Please contact sraskar -AT- udel.edu)

Siddhisanket Raskar, Jose M Monsalve Diaz, Thomas Applencourt, Kalyan Kumaran, and Guang Gao
February 2020


CAPSL Technical Memo 134:
Brain-Flow : A brain inspired dataflow implementation using DEMAC

Diego Roa, Ryan Kabrick, Siddhisanket Raskar, Jose M Monsalve Diaz and Guang Gao
October 2019


CAPSL Technical Memo 133:
Extending Codelet Model for Dataflow Software Pipelining using Software-Hardware Co-design (Available upon request -- Please contact sraskar -AT- udel.edu)

Siddhisanket Raskar, Thomas Applencourt, Kalyan Kumaran, and Guang Gao
June 2019


CAPSL Technical Memo 132:
Sequential Codelet Model for Parallel Execution (Available upon request -- Please contact josem -AT- udel.edu)

Jose M Monsalve Diaz, and Guang R Gao
June 2019


CAPSL Technical Memo 131:
Toward A Parallel Turing Machine Model

Peng Qu, Jin Yan, and Guang Gao
July 2016


CAPSL Technical Memo 130:
Multigrain Parallelism: Compiling Coarse-Grain Parallel Programs for Fine-Grain Execution

Jaime Arteaga, Stephane Zuckerman, and Guang Gao
April 2016


CAPSL Technical Memo 129:
A Multiscale Modeling and Simulation Methodology for Financial Market Stability and Risk Analysis

Guang Gao, Paul Laux, Bintong Chen, Xiaoming Li and Stephane Zuckerman
March 2016


CAPSL Technical Memo 128:
Massively Multi-Core Systems and Virtual Memory

Guang Gao and Jack B. Dennis
April 2014


CAPSL Technical Memo 127:
Architecture and Programming Model for High Performance Interactive Computation

Jack B. Dennis, Arvind, Guang R. Gao, Xiaoming Li, and Lian-Ping Wang
April 2014
Full Document available on request


CAPSL Technical Memo 126:
SPARTA: a Stream-based Processor And Run-Time Architecture

Jean-Luc Gaudiot, Ahmed Louri, Guang R. Gao
March 2014
Available on request


CAPSL Technical Memo 125:
Locality-Driven Scheduling of Tasks for Data-Dependent Multithreading

Jaime Arteaga, Elkin Garcia, Stephane Zuckerman, Robert Pavel and Guang Gao
January 2014


CAPSL Technical Memo 124:
Optimizing the LU Factorization for Energy Efficiency on a Many-Core Architecture

Elkin Garcia, Jaime Arteaga, Robert Pavel and Guang R. Gao
July 2013


CAPSL Technical Memo 123:
Toward a Self-aware System for Exascale Architectures

Aaron Landwehr, Stéphane Zuckerman and Guang R. Gao
June 2013


CAPSL Technical Memo 122:
SMART: a Stream-based Multi-core Architecture & Runtime Technology

Jean-Luc Gaudiot, Guang R. Gao, Elkin Garcia, Ganghee Jang, Souad Koliai and Haitao Wei
February 2013
Available on request


CAPSL Technical Memo 121:
Toward Exascale Systems: from Applications to Architectures

Jean-Luc Gaudiot, Guang R. Gao, Elkin Garcia, Ganghee Jang and Souad Koliai
October 2012
Available on request


CAPSL Technical Memo 120:
Leveraging Dataflow Execution Models for Exascale Performance

Stephane Zuckerman, Marco Solinas, Souad Koliai, Guang R. Gao and Roberto Giorgi
August, 2012
Available on request


CAPSL Technical Memo 119:
Determinacy and Repeatability of Parallel Program Schemata

Jack B Dennis, Guang R. Gao and Vivek Sarkar
August, 2012
Available on request


CAPSL Technical Memo 118:
Performance Modeling of Fine Grain Task Execution Models with Resource Constraints on Many-core Architectures

Elkin Garcia, Robert Pavel, Daniel Orozco and Guang R. Gao
June, 2012
Available on request


CAPSL Technical Memo 117:
Demystifying Performance Predictions of Distributed FFT3D Implementations

Daniel Orozco, Elkin Garcia, Robert Pavel, Orlando Ayala, Lian-Ping Wang and Guang R. Gao
June, 2012


CAPSL Technical Memo 116:
MACO: MetadatA Coalescing and Optimizing Framework

Juergen Ributzka, Aaron M. Landwehr, Sunil Shrestha and Guang R. Gao
June, 2012
Available on request


CAPSL Technical Memo 115:
Design Manual for the Fresh Breeze Simulator

Xiaoxuan Meng, Tom St. John, Jack B. Dennis and Guang R. Gao
April, 2012


CAPSL Technical Memo 114:
Massively Parallel Breadth-First Search Using a Tree-Structured Memory Model

Tom St. John, Jack B. Dennis and Guang R. Gao
April, 2012


CAPSL Technical Memo 113:
Toward a Highly Parallel Framework for Discrete-Event Simulation

Robert Pavel, Elkin Garcia, Daniel Orozco and Guang R. Gao
April, 2012
Available on request


CAPSL Technical Memo 112:
A Fresh Foundation for Software/Hardware Co-Design of Exascale Computing Systems

Jack B. Dennis, Guang R. Gao, Chengmo Yang, Xiaoming Li, Robert Pavel, Aaron Landwehr, Daniel Orozco and Kelly Livingston
February, 2012
Available on request


CAPSL Technical Memo 111:
Toward Efficient Fine-grained Dynamic Scheduling on Many-Core Architectures

Elkin Garcia, Daniel Orozco, Robert Pavel and Guang R. Gao
February, 2012


CAPSL Technical Memo 110:
SHF:Large:Collaborative Research: Power-Efficient Fault Resilience in Massively Parallel Computing

Guang R. Gao, Jack B. Dennis and Chengmo Yang
November, 2011
Available on request


CAPSL Technical Memo 109:
Comparative Evaluation of Alternative Program Execution Models

Jack B. Dennis, Robert Pavel and Guang R. Gao
September, 2011
Available on request


CAPSL Technical Memo 108:
Code Partition and Overlays: A Reintroduction to High Performance Computing

Joseph B. Manzano, Ge Gan, Juergen Ributzka, Sunil Shrestha and Guang R. Gao
August, 2011


CAPSL Technical Memo 107:
TIDeFlow: The Time Iterated Dependency Flow Execution Model

Daniel Orozco, Elkin Garcia, Robert Pavel, Rishi Khan and Guang R. Gao
August, 2011


CAPSL Technical Memo 106:
C64prof: A Parallel Profiling Environment for the Cyclops64 Architecture

Mark Pellegrini and Guang R. Gao
June, 2011


CAPSL Technical Memo 105:
Polytasks: A Compressed Task Representation for HPC Runtimes

Daniel Orozco, Elkin Garcia, Robert Pavel, Rishi Khan and Guang Gao
June, 2011


CAPSL Technical Memo 104:
Toward an Execution Model for Extreme-Scale Systems-Runnemede and Beyond

Guang R. Gao, Joshua Suetterlein and Stephane Zuckerman
April, 2011
Available on request


CAPSL Technical Memo 103:
High Throughput Queue Algorithms

Daniel Orozco, Elkin Garcia, Rishi Khan, Kelly Livingston and Guang R. Gao
January, 2011


CAPSL Technical Memo 102:
Energy efficient tiling on a Many-Core Architecture

Elkin Garcia, Daniel Orozco and Guang R. Gao
October, 2010


CAPSL Technical Memo 101:
Locality Optimization of Stencil Applications using Data Dependency Graphs

Daniel Orozco, Elkin Garcia and Guang R. Gao
October, 2010


CAPSL Technical Memo 100:
Experiments with the Fresh Breeze Tree-Based Memory Model

Jack B. Dennis, Guang R. Gao and Xiao X. Meng
October, 2010


CAPSL Technical Memo 99 Revised:
The Elephant and the Mouse: Non-Strict Fine-Grain Synchronization for Many-Core Architectures

Juergen Ributzka, Yuhei Hayashi and Guang R. Gao
April, 2011


CAPSL Technical Memo 99:
The Elephant and the Mouse: Non-Strict Fine-Grain Synchronization for Many-Core Architectures

Juergen Ributzka, Yuhei Hayashi and Guang R. Gao
June, 2010


CAPSL Technical Memo 98:
Dynamic Percolation - Mapping Dense Matrix Multiplication on a Many-Core Architecture

Elkin Garcia, Rishi Khan, Kelly Livingston, Ioannis E. Venetis and Guang R. Gao
June, 2010


CAPSL Technical Memo 97:
TiNy Threads on BlueGene/P: Exploring Many-Core Parallelisms Beyond The Traditional OS

Handong Ye, Robert Pavel, Aaron Landwehr, and Guang R. Gao
May, 2010


CAPSL Technical Memo 96:
Many-Core Chip Architecture - A Report on a Novel Architecture/Software Co-Verification Platform

Juergen Ributzka, Yuhei Hayashi and Guang R. Gao
April, 2010


CAPSL Technical Memo 95:
Optimized Dense Matrix Multiplication on a Many-Core Architecture

Elkin Garcia, Ioannis E. Venetis, Rishi Khan and Guang R. Gao
February, 2010


CAPSL Technical Memo 94:
Synchronization for Dynamic Task Parallelism on Manycore Architectures

Yonghong Yan, Sanjay Chatterjee, Daniel Orozco, Elkin Garcia, Jun Shirako, Zoran Budimlic, Vivek Sarkar and Guang Gao
February, 2010


CAPSL Technical Memo 93:
A Study of a Software Cache Implementation of the OpenMP Memory Model for Multicore and Manycore Architectures

Chen Chen, Joseph B Manzano, Ge Gan, Guang R. Gao, Vivek Sarkar
February, 2010


CAPSL Technical Memo 92:
Establishing Causality as a Desideratum for Memory Models and Transformations of Parallel Programs

Chen Chen, Wenguang Chen, Vugranam Sreedhar, Rajkishore Barik, Vivek Sarkar and Guang Gao
January, 2010


CAPSL Technical Memo 91:
Diamond Tiling: A Tiling Framework for Time-iterated Scientific Applications.

Daniel Orozco and Guang Gao
December, 2009


CAPSL Technical Memo 90:
Analysis and Performance Results of Computing Betweenness Centrality on IBM Cyclops64

Guangming Tan, Vugranam Sreedhar, Guang R. Gao
October, 2009


CAPSL Technical Memo 89:
Formalizing Causality as a Desideratum for Memory Models and Transformations of Parallel Programs

Chen Chen, Wenguang Chen, Vugranam Sreedhar, Rajkishore Barik, Vivek Sarkar and Guang Gao
July, 2009


CAPSL Technical Memo 88:
Collaborative Research: Programming Models and Storage System for High Performance Computation with Many-Core Processors

Jack B. Dennis, Guang R Gao and Vivek Sarkar
May 11th, 2009


CAPSL Technical Memo 87:
Mapping the FDTD Application to Many-Core Chip Architectures

Daniel A. Orozco and Guang R. Gao.
March 3rd, 2009


CAPSL Technical Memo 86:
A Study of Different Instantiations of the OpenMP Memory Model and Their Software Cache Implementations

Chen Chen, Joseph B Manzano, Ge Gan, Guang R. Gao and Vivek Sarkar.
January, 2009


CAPSL Technical Memo 85:
Tile Reduction: an OpenMP Extension for Tile Aware Parallelization

Ge Gan, Xu Wang, Joseph B Manzano and Guang R. Gao
December, 2008


CAPSL Technical Memo 84:
Optimizing the LU Benchmark for the Cyclops-64 Architecture.

Ioannis E. Venetis and Guang R. Gao
July 8th, 2009


CAPSL Technical Memo 83:
Analysis and Performance Results of Computing Betweenness Centrality on IBM Cyclops64

Guangming Tan, Andrew Russom, Vugranam Sreedhar and Guang R Gao
April 9th, 2008


CAPSL Technical Memo 82:
A New Cache Protocol Based on the Order Free Consistency Memory Model

Chen Chen, Joseph B Manzano, Ge Gan, Guang R Gao and Vivek Sarkar
May, 2008


CAPSL Technical Memo 81:
Performance Tuning of the Fast Fourier Transform on a Multicore Architecture

Liping Xue, Long Chen, Ziang Hu and Guang R Gao
February 8th, 2008


CAPSL Technical Memo 80:
Order Free Consistency: Towards a Fully Asynchronous Memory Model

Chen Chen, Joseph B Manzano, Wenguang Chen and Guang R Gao
November, 2007

CAPSL Technical Memo 79:
Concurrency Analysis for Shared Memory Programs with Textually Unaligned Barriers

Yuan Zhang, Evelyn Duesterwald and Guang R Gao
November, 2007


CAPSL Technical Memo 78:
Implementation of the Smith-Waterman Algorithm on A Reconfigurable Supercomputing Platform

Peiheng Zhang, Guangming Tan and Guang R. Gao
April 16th, 2007


CAPSL Technical Memo 77:
A Study of Parallel Betweenness Centrality Algorithm on a Many-core architecture

Guangming Tan and Guang R. Gao
June 27th, 2007


CAPSL Technical Memo 76:
FAME: Financial Application with Many-core-on-a-chip Architecture

Weirong Zhu, Parimala Thulasiraman, Ruppa K. Thulasiram and Guang R. Gao
February 17th, 2006


CAPSL Technical Memo 75:
Optimizing the LU Benchmark for the Cyclops-64 Architecture

Ioannis E. Venetis and Guang R. Gao
February, 2007


CAPSL Technical Memo 74:
Exploring a Multithreaded Methodology to Implement a Network Communication Protocol on the IBM Cyclops-64 Multithreaded Architecture

Ge Gan, Ziang Hu, Juan del Cuvillo and Guang R. Gao
January, 2007


CAPSL Technical Memo 73:
A Parallel Dynamic Programming Algorithm on a Multi-core Architecture

Guangming Tan and Guang R. Gao
February, 2007


CAPSL Technical Memo 72:
Automatic Program Segment Similarity Detection in Targeted Program Performance Improvement

Haiping Wu, Eunjung Park, Mihailo Kaplarevic, Yingping Zhang, Murat Bolat and Guang R. Gao
December 30, 2006


CAPSL Technical Memo 71:
An Automatic Methodology for Program Segment-based Compiler Optimization Search

Haiping Wu, Eunjung Park, Murat Bolat, Mihailo Kaplarevic, Yingping Zhang, Xiaoming Li and Guang R. Gao
November 14, 2006


CAPSL Technical Memo 70:
Handling Massive Parallelism Efficiently: Introducing Batches of Threads

Ioannis E. Venetis, Theodore S. Papatheodorou and Guang R. Gao
October 18, 2006


CAPSL Technical Memo 69:
Software Pipelining On Multi-core Chip Architectures: A case study on IBM Cyclops-64 Chip Architecture

Alban Douillet, Junmin Lin and Guang R. Gao
February 14, 2006


CAPSL Technical Memo 68:
Server I/O Acceleration Using an Embedded Multi-core Architecture

Lurng-Kuo Liu, Fei Chen, Christos J. Georgiou and Guang R. Gao
May 12, 2006


CAPSL Technical Memo 67 Revised:
Synchronization State Buffer: Supporting Efficient Fine-Grain Synchronization on Many-Core Architectures

Weirong Zhu, Vugranam C. Sreedhar, Ziang Hu and Guang R. Gao
November 20, 2006


CAPSL Technical Memo 67:
Efficient Fine-Grain Synchronization on a Multi-Core Chip Architecture: A Fresh Look

Weirong Zhu, Ziang Hu, and Guang R. Gao
July 17, 2006


CAPSL Technical Memo 66:
An Efficient Communication Infrastructure for the IBM Cyclops-64 Computer System

Ge Gan, Ziang Hu, Juan del Cuvillo and Guang R. Gao
June 12, 2006


CAPSL Technical Memo 65:
Optimized Lock Assignment and Allocation for Productivity: A Method for Exploiting Concurrency among Critical Sections

Yuan Zhang, Vugranam C. Sreedhar, Weirong Zhu, Vivek Sarkar and Guang R. Gao
May 10th, 2006


CAPSL Technical Memo 64:
Multidimensional Kernel Generation for Loop Nest Software Pipelining

Alban Douillet, Hongbo Rong and Guang R. Gao
February 13th, 2006


CAPSL Technical Memo 63:
A New Framework for Analysis and Optimization of Shared Memory Parallel Programs

Vugranam C. Sreedhar, Yuan Zhang and Guang R. Gao
July 18th, 2005


CAPSL Technical Memo 62:
"FAST: A Functionally Accurate Simulation Toolset for the Cyclops-64 Cellular Architecture"
Juan del Cuvillo, Weirong Zhu, Ziang Hu and Guang R. Gao
June 17th, 2005


CAPSL Technical Memo 61:
"P3I: Delaware's Programmability, Productivity and Proficiency Inquiry"
Joseph B. Manzano, Yuan Zhang and Guang R. Gao
June 10th, 2005


CAPSL Technical Memo 60:
"Performance Analysis of Interconnection Network of Cyclops-64 Chip Architecture"
Yingping Zhang, Taikyeong Jeong, Fei Chen, Ronny Nitzsche and Guang R. Gao
June 1st, 2005


CAPSL Technical Memo 59:
"Concurrency Analysis and Its Applications"
Yuan Zhang and Guang Gao
May 28th, 2005


CAPSL Technical Memo 58:
"Register Pressure in Software Pipelined Loop Nests: Fast Computation and Impact on Architecture Design"
Alban Douillet, Hongbo Rong and Guang R. Gao
May 3rd, 2005


CAPSL Technical Memo 57:
"Parallel Reconstruction for Parallel Imaging SPACERIP on Cellular Architecture"
Yuanwei Niu, Ziang Hu and Guang R. Gao
June 15, 2004


CAPSL Technical Memo 56:
"Quasi consensus based comparison of profile hidden Markov models for protein sequences"
Robel Y. Kahsay, Guoli Wang, Li Liao, Roland Dunbrack and Guang R. Gao
May 28, 2004


CAPSL Technical Memo 55:
"Toward a Software Infrastructure for the Cyclops-64 Cellular Architecture"
Juan B. del Cuvillo, Ziang Hu, Weirong Zhu, Fei Chen and Guang R. Gao
April 26, 2004


CAPSL Technical Memo 54:
"Speeding up CG on Cluster with Two Dimensional Blocking Method and EARTH Runtime Support"
Fei Chen, Kevin B. Theobald and Guang R. Gao
April 23, 2004


CAPSL Technical Memo 53:
"Lamport Order Revisit: A Study on How to Efficiently Achieve Sequential Consistency on a Modern Multiprocessor-on-a-Chip Architecture"
Yuan Zhang, Weirong Zhu, Fei Chen, Ziang Hu and Guang R. Gao
March 01, 2004


CAPSL Technical Memo 52:
"Analyzable Atomic Sections: Integrating Fine-Grained Synchronization and Weak Consistency Models for Scalable Parallelism"
Vivek Sarkar and Guang R. Gao
February 09, 2004


CAPSL Technical Memo 51:
"Code Generation for Single-Dimension Software Pipelining of Multi-Dimensional Loops"
Hongbo Rong, Alban Douillet, R. Govindarajan and Guang R. Gao
September 26, 2003


CAPSL Technical Memo 49:
"Single-Dimension Software Pipelining for Multi-Dimensional Loops"
Hongbo Rong, Zhizhong Tang, R. Govindarajan, Alban Douillet and Guang R. Gao
September 26, 2003


CAPSL Technical Memo 48:
"Programming Method and Software Infrastructure for Cellular Architecture"
Guang R. Gao, Juan del Cuvillo, Ziang Hu, Robert Klosiewicz, Clement Leung, Jason McGuiness, Hirofumi Sakane, Yingping Zhang
September 16, 2003


CAPSL Technical Memo 47:
"Compiler-Assisted Cache Replacement: Problem Formulation and Performance Evaluation"
Hongbo Yang, R. Govindarajan, Guang R. Gao and Ziang Hu
September 9, 2003


CAPSL Technical Memo 45:
"Selective Slim Scheduling: On Software Pipelining of Loop Nests"
Hongbo Rong, Zhizhong Tang, R. Govindarajan, Guang R. Gao
June 8, 2003


CAPSL Technical Memo 44:
"Algorithms, Applications, and Environments for Emerging Petascale Architectures"
R. Govindarajan, H. Tufo, S. Thomas, R. Loft, Guang R. Gao, J. Moreira and J. Castanos
March 6, 2003


CAPSL Technical Memo 43:
"Executable Performance Model and Evaluation of High Performance Architectures with Percolation"
Adeline Jacquet, Vincent Janot, R. Govindarajan, Clement Leung, Guang R. Gao and Thomas Sterling
November 21, 2002


CAPSL Technical Memo 42:
"A Quantitative Study on Performance-Power Impact of Dual-Speed Pipeline Architectures"
Hongbo Yang, R. Govindarajan, Guang R. Gao and Kevin B. Theobald
June 13, 2002


CAPSL Technical Memo 41:
"Maximizing Pipelined Functional Units Usage for Minimum Power Software Pipelining"
Hongbo Yang, R. Govindarajan, Guang R. Gao and George Cai
September 27, 2001


CAPSL Technical Memo 40:
"New Normalization Method and Error Analysis for Gene Expression Microarray Data"
Stanley D. Luck, Francisco Jose Useche G., Wellington S. Martins and Guang R. Gao
December 11, 2000


CAPSL Technical Memo 39:
"Threaded-C Language Reference Manual (Release 2.0)"
Guy Tremblay, Kevin B. Theobald, Christopher J. Morrone, Mark D. Butala, Jose Nelson Amaral and Guang R. Gao
September 23, 2000


CAPSL Technical Memo 38:
"Automatic Prefetching of Induction Pointers"
Artour Stoutchinin, Jose Nelson Amaral, Guang R. Gao, Jim Dehnert, Suneel Jain and Alban Douillet
April 18, 2000


CAPSL Technical Memo 37:
"Automatic Prefetching of Induction Pointers for Software Pipelining"
Artour Stoutchinin, Jose Nelson Amaral, Guang R. Gao, Jim Dehnert and Suneel Jain
November 12, 1999


CAPSL Technical Memo 36:
"Minimum Register Instruction Sequence Problem: Revisiting Large Optimal"
R. Govindarajan, Hongbo Yang, Chihong Zhang, Jose Nelson Amaral and Guang R. Gao
November 12, 1999


CAPSL Technical Memo 35:
"A Comparative Performance Study of Fine-Grain Multi-Threading on Distributed Memory Machines"
Prasad Kakulavarapu, Christopher J. Morrone, Kevin B. Theobald, Jose Nelson Amaral and Guang R. Gao
November 11, 1999


CAPSL Technical Memo 34:
"Caching Single-Assignment Structures to Build a Robust Fine-Grain Multi-Threading System"
Wen-Yen Lin, Jose Nelson Amaral, Jean-Luc Gaudiot and Guang Gao
October 13, 1999


CAPSL Technical Memo 33:
"Definition of the EARTH Model"
Kevin B. Theobald
October 6, 1999


CAPSL Technical Memo 32:
"The Benefits of Hardware-Assisted Fine-Grain Multithreading"
Kevin B. Theobald and Guang R. Gao
July 20, 1999


CAPSL Technical Memo 31:
"HTMT Phase 2 Report"
Guang R Gao, Jose Nelson Amaral, Andres Marquez, Kevin B. Theobald, Sean Ryan, Zachary Ruiz, Thomas Geiger and Christopher J. Morrone
July 19, 1999


CAPSL Technical Memo 30:
"Design and Implementation of an Efficient Thread Partitioning Algorithm"
Jose Nelson Amaral, Guang R. Gao, Erturk Dogan Kocalar, Patrick O'Neil and Xiang Tang
July 1, 1999


CAPSL Technical Memo 29:
"Advances in Dataflow Computational Model"
Walid A Najjar, Edward A. Lee and Guang R. Gao
April 1, 1999


CAPSL Technical Memo 28:
"Efficient State-Diagram Construction Methods for Software Pipelining"
Chihong Zhang, R. Govindarajan, Sean Ryan and Guang R. Gao
March 5, 1999


CAPSL Technical Memo 27:
"SEMi: A Simulator for EARTH, MANNA, and i860"
Kevin Theobald
March 1, 1999


CAPSL Technical Memo 26:
"An HTMT Performance Prediction Case Study: Implementing Cannon's Dense Matrix Multiply Algorithm"
Jose Nelson Amaral, Guang R. Gao, Phillip Merkey, Thomas Sterling, Zachary Ruiz and Sean Ryan
February 17, 1999


CAPSL Technical Memo 25:
"Option Pricing Problem on a Multithreaded Parallel Architecture"
Ruppa K. Thulasiram and Guang R. Gao
November 11, 1998


CAPSL Technical Memo 24:
"Design of the Runtime System for the Portable Threaded-C Language"
Prasad Kakulavarapu, Olivier Maquelin and Guang R. Gao
July 21, 1998


CAPSL Technical Memo 23:
"Automatically Partitioning Threads Based on Remote Paths"
Xinan Tang and Guang R. Gao
July 20, 1998


CAPSL Technical Memo 22:
"A Refinement of the HTMT Program Execution Model"
Guang Gao, Jose Nelson Amaral, Andres Marquez and Kevin Theobald
July 13, 1998


CAPSL Technical Memo 21:
"Self-Avoiding Walks Over Two-Dimensional Adaptive Unstructured Grids"
Gerd Heber, Rupak Biswas and Guang R. Gao
April 20, 1998


CAPSL Technical Memo 20:
"Using Multithreading for the Automatic Load Balancing of 2-D Adaptive Finite Element Meshes"
Gerd Heber, Rupak Biswas, Parimala Thulasiraman and Guang R. Gao
March 16, 1998


CAPSL Technical Memo 19:
"Overview of the Threaded-C Language"
Kevin B. Theobald, Jose Nelson Amaral, Gerd Heber, Olivier Maquelin, Xinan Tang and Guang R. Gao
March 16, 1998


CAPSL Technical Memo 18:
"A Superstrand Architecture"
Andres Marquez, Kevin B. Theobald, Xinan Tang, Thomas L. Sterling and Guang R. Gao
March 14, 1998


CAPSL Technical Memo 17:
"An Enhanced Co-Scheduling Method Using Reduced MS-State Diagrams"
R. Govindarajan, N.S.S. Narasimha Rao, Erik R. Altman and Guang R. Gao
February 18, 1998


CAPSL Technical Memo 16:
"Location Consistency -- A New Memory Model and Cache Consistency Protocol"
Guang R. Gao and Vivek Sarkar
February 16, 1998


CAPSL Technical Memo 15:
"Superconducting Processors for HTMT: Issues and Challenges"
Kevin B. Theobald, Guang R. Gao and Thomas L. Sterling
December 15, 1997


CAPSL Technical Memo 14:
"A Superstrand Architecture"
Andres Marquez, Kevin B. Theobald, Xinan Tang and Guang R. Gao
December 1, 1997


CAPSL Technical Memo 13:
"Partial Sampling with Reverse State Reconstruction: A New Technique for Branch Predictor Performance Estimation"
Darren E. Vengroff and Guang R. Gao


CAPSL Technical Memo 11:
"Heap Analysis and Optimizations for Threaded Programs"
Xinan Tang, Rakesh Ghiya, Laurie J. Hendren and Guang R. Gao
November 7, 1997


CAPSL Technical Memo 10:
"A Register Pressure Sensitive Instruction Scheduler for Dynamic Issue Processors"
Raul Silvera, Jian Wang and Guang R. Gao


CAPSL Technical Memo 09:
"The HTMT Program Execution Model"
Guang R. Gao, Kevin B. Theobald, Andres Marquez and Thomas Sterling
July 18, 1997


CAPSL Technical Memo 08:
"Benefits of Efficient Multithreading on Distributed Memory for the Parallelization of Communication-Intensive Applications"
Angela C. Sodan and Guang R. Gao


CAPSL Technical Memo 07:
"An Integer Linear Programming Model of Software Pipelining for the MIPS R8000 Processor"
Artour Stoutchinin


CAPSL Technical Memo 06:
"A New Fast Algorithm for Optimal Register Allocation in Modulo Scheduled Loops"
Sylvain Lelait, Guang R. Gao and Christine Eisenbeis


CAPSL Technical Memo 05:
"Design and Evaluation of Dynamic Load Balancing Schemes under A Multithreaded Execution Model"
Haiying Cai, Olivier Maquelin and Guang R. Gao


CAPSL Technical Memo 04:
"Non-Clustered Statistical Trace Sampling for Large Cache Design Space Exploration"
Darren E. Vengroff, Kenneth Simpson and Guang R. Gao


CAPSL Technical Memo 03:
"Thread Partitioning and Scheduling Based on Cost Model"
Xinan Tang, Jian Wang, Kevin B. Theobald and Guang R. Gao
April 15, 1997


CAPSL Technical Memo 02:
"Elastic History Buffer: A Low-Cost Method to Improve Branch Prediction Accuracy"
Maria-Dana Tarlescu, Kevin B. Theobald and Guang R. Gao
November 14, 1996


CAPSL Technical Memo 01:
"Hybrid Technology Multithreaded Architecture"
Guang R. Gao, Konstantin K. Likharev, Paul C. Messina and Thomas L. Sterling



 

 

Technical Notes

CAPSL Technical Note 23:
"A Brief Overview of the PICASim Model and Framework - Draft"
Robert Pavel
May, 2014

CAPSL Technical Note 22:
"Overview of the UHPC Execution Model"
Guang Gao
June, 2009

CAPSL Technical Note 21:
"Experiences Porting Mstack to ParalleX"
Mark Pellegrini
August, 2008

CAPSL Technical Note 20:
"The EDIF2KSF Converter"
Jonathan Barton
August, 2007

CAPSL Technical Note 19:
"Mrs. Clops Tool Chain Manual"
Matthew Wells
March, 2006

CAPSL Technical Note 18:
"ASAP Low-Level Connection Library"
Inanc Dogru
March, 2006

CAPSL Technical Note 17:
"C64 DDR Verification and Critical Path Reduction"
Michael Bodnar
September, 2005

CAPSL Technical Note 16:
"The Cyclops-E Emulation Environment"
Juan del Cuvillo and Nathaniel Merritt.
August, 2005

CAPSL Technical Note 15:
"SLICED: a Source Level Interacting Cyclops-64 Effective Debugger"
Geoff Gerfin and Ziang Hu.
August 26, 2004

CAPSL Technical Note 14:
"DISC64: A Disassembler for the Instruction Set of Cyclops-64"
John Tully
August 5, 2004

CAPSL Technical Note 13:
"Generate the Multiple and Add Operation during the WHIRL Lowering Phase"
Joseph Bryant Manzano Franco and Haiping Wu
May 31, 2004

CAPSL Technical Note 12:
"Integrate EBO with Pattern Matching"
Divya Parthasarathi
May 28, 2004

CAPSL Technical Note 11:
"A DIMES Demonstration Application: Mandelbrot-Set Generation Using a Work-Stealing Algorithm"
Jason M. McGuiness
June 15, 2002

CAPSL Technical Note 10 Revised:
"A Software Development Kit for CeDIMES"
Juan del Cuvillo, Robert Klosiewicz and Yingping Zhang
March 15, 2005

CAPSL Technical Note 10:
"A Software Development Kit for CeDIMES"
Juan del Cuvillo, Robert Klosiewicz and Yingping Zhang
September 30, 2002

CAPSL Technical Note 09:
"Threaded-C Release 2.0: Motivation, Description, and Rationale"
Guy Tremblay
June 15, 2000

CAPSL Technical Note 08:
"Runtime Locality Transformations for NAS Conjugate Gradient (Sparse Matrix Computation)"
Rishi Kumar, Nathaniel Johnson, Ruppa K. Thulasiram, Gagan Agrawal, Guang R. Gao
December 17, 1999

CAPSL Technical Note 07:
"Computational Financial Derivatives - A Primer"
Ruppa K. Thulasiram, Guang R. Gao
October 9, 1998

CAPSL Technical Note 06:
"Debugging: The 'Feedback' Way"
James P. Durbano
October 9, 1998

CAPSL Technical Note 05:
"Portable Threaded-C Release 1.1"
José Nelson Amaral, Zachary Ruiz, Sean Ryan, Andres Marquez, Christopher Morrone, Prasad Kakulavarapu, Guang R. Gao
October 8, 1998

CAPSL Technical Note 04:
"Implementation of I-Structures as a Library of Functions in Portable Threaded-C"
José Nelson Amaral, Guang R. Gao
June 15, 1998

CAPSL Technical Note 03:
"Proposed Changes to Threaded-C"
Kevin B. Theobald
January 20, 1998

CAPSL Technical Note 02:
"A Portable Threaded-C Language for EARTH Multiprocessors"
Xinan Tang, Olivier Maquelin, Kevin B. Theobald, Guang R. Gao, Prasad Kakulavarapu
January 6, 1998

CAPSL Technical Note 01:
"An Overview of the Threaded-C Language"
Guang R. Gao, Xinan Tang, Parimala Thulasiraman, Kevin B. Theobald
July 25, 1997


 

 

CAPSL Theses

  Ph.D. Theses

  Masters Theses

 

Ph.D. Theses:

"Toward High Performance and Energy Efficiency on Manycore Architectures"
Elkin Garcia
Summer 2014

"Concurrency and Synchronization in the Modern Many-Core Era: Challenges and Opportunities"
Juergen Ributzka
Spring 2013

"TIDeFlow: A Dataflow-inspired execution model for high performance computing programs"
Daniel Orozco
Spring 2012

"A comparison between virtual code management techniques"
Joseph B. Manzano
Summer 2011

"Exploring novel many-core architectures for scientific computing"
Long Chen
Fall 2010

"Programming Model and Execution Model for OpenMP on the Cyclops-64 many-core processor"
Ge Gan
Spring 2010

"Enabling System Validation for the many-core Supercomputer"
Fei Chen
Summer 2009
Available on request

"Breaking away from the OS Shadow: A Program Execution Model Aware Thread Virtual Machine for Multicore Architectures"
Juan del Cuvillo
Summer 2008

"Static Analyses and Optimizations for Parallel Programs with Synchronization"
Yuan Zhang
Summer 2008

"Efficient Synchronization for a Large-Scale Multi-Core Chip Architecture"
Weirong Zhu
Spring 2007

"Advanced Protein Sequence Analysis Methods for Structure and Function Prediction"
Robel Y. Kahsay
Spring 2005

"The CARE Architecture"
Andrés Marquez
Winter 2004

"Power-Aware Compilation Techniques for High Performance Processors"
Hongbo Yang
Fall 2003

"Irregular Computations on Fine-Grain Multithreaded Architecture"
Parimala Thulasiraman
Fall 2000

"Compiling for Multithreaded Architectures"
Xinan Tang
Fall 1999

"EARTH: An Efficient Architecture for Running Threads"
Kevin Bryan Theobald
Spring 1999

 

Masters Theses:

"DARTS: A Runtime Based on the Codelet Execution Model"
Joshua Suetterlein
Spring 2014

"Memory Optimization in Codelet Execution Model on Many-Core Architectures"
Yao Wu
Spring 2014

"Tapestry: Weaving Execution and Synchronization Models"
Joshua Landwehr
Winter 2013

"Parallel Low-Overhead Data Collection Framework for a Resource Centric Performance Analysis Tool"
Sunil Shrestha
Spring 2012

"Memory State Flow Analysis and Its Application"
Xiaomi An
Winter 2011

"Toward a software pipelining framework for many-core chips"
Juergen Ributzka
Summer 2009

"Optimizing the Fast Fourier Transform on a Many core Architecture"
Long Chen
Winter 2008

"Design and Implementation of Tool-chain framework to support OpenMP Single Source Compilation on CELL platform"
Yi Jiang
Winter 2007

"A Study of Simulation and Verification of a Many-core Architecture on two modern reconfigurable platforms"
Dimitrij Krepis
Summer 2007

"Methodology of Dynamic Compiler Option Selection Based on Static Program Analysis - Implementation and Evaluation"
Eun Jung Park
Summer 2007

"Efficient Mapping of Fast Fourier Transform on the Cyclops-64 Multithreaded Architecture"
Liping Xue
Summer 2007

"Tower Methodology for Verification of Multi-Core Architecture - A Case Study"
Divya Parthasarathi
Summer 2005

"A Study of Architecture and Performance of IBM Cyclops-64 Interconnection Network"
Yingping Zhang
Summer 2005

"Quantitative Study of Human-Computer interaction in adaptive search on Mobile Handsets and its Localization for Mandarin Chinese"
Xing Wang
Fall 2004

"A Parallel Debugger for the Cyclops Architecture"
Robert S. Klosiewicz Jr.
Summer 2004

"Multithreaded Parallel Implementation of HPMMPFAM on EARTH"
Weirong Zhu
Spring 2004

"Implementing Parallel CG Algorithm on the EARTH Multithreaded Architecture"
Fei Chen
Spring 2004

"Code Size Oriented Memory Allocation for Temporary Variables"
Yan Xie
Winter 2004

"Binary Diffing"
Kapil Khosla
Fall 2003

"A Portable Runtime System and its Derivation for the Hardware SU Implementation"
Chuan Shen
Fall 2003

"An Interconnect Architecture for Commodity Off-the-shelf Multiprocessor Emulation Testbed"
Mark Lawrence Legutko
Spring 2002

"A Visual Perspective to Motif/Pattern Analysis"
Praveen R Thiagarajan
Summer 2001

"Automated Single Nucleotide Polymorphism Discovery Pipeline"
Francisco Jose Useche Gomez
Summer 2001

"Efficient Parallelization of Reductions and Loop Based Programs on EARTH"
Rishi Kumar
Summer 2001

"Whole Genome Comparison Using A Multithreaded Parallel Implementation"
Juan Del Cuvillo
Summer 2001

"An EARTH Runtime System For Multi-Processor/Multi-Node Beowulf Cluster"
Christopher Jason Morrone
Spring 2001

"Implementation Issues of a Hardware-Based EARTH Synchronization Unit"
Thomas Geiger
Spring 2001

"Register Stack and Optimal Allocation Instruction Placement"
Alban Douillet
Spring 2001

"Advanced Compilers, Architectures and Parallel Systems"
ShaoHua Han
Spring 2001

"Dynamic Load Balancing Issues in the EARTH Runtime System"
Kamala Prasade Kakulavarapu
Fall 1999

"Towards a Custom EARTH Synchronization Unit"
Ian Stuart MacKenzie Walker
Summer 1999

"Static Instruction Schedule For Dynamic Issue Processor"
Raul E. Silvera Muñoz
Spring 1997

 


© CAPSL 1996-2013. All Rights Reserved.
capslwww@capsl.udel.edu