The preliminary BPOE program has been released.


Big data has emerged as a strategic asset for nations and organizations, and there is a pressing need to generate value from it. However, the sheer volume of big data demands significant storage capacity, transmission bandwidth, computation, and power. Systems of unprecedented scale are expected to resolve the problems caused by the variety and daunting volume of big data. Nevertheless, without big data benchmarks, it is very difficult for big data owners to choose the system that best meets their specific requirements. They also face challenges in optimizing their systems and solutions for specific or even comprehensive workloads. Meanwhile, researchers are working on innovative data management systems, hardware architectures, operating systems, and programming systems to improve performance in dealing with big data.

This workshop, the seventh in its series, focuses on architecture and system support for big data systems. It aims to bring together researchers and practitioners from the data management, architecture, and systems research communities to discuss the research issues at the intersection of these areas.

Call for Papers


The workshop seeks papers that address hot topic issues in benchmarking, designing and optimizing big data systems. Specific topics of interest include but are not limited to:

  • Big data workload characterization and benchmarking
  • Performance analysis of big data systems
  • Workload-optimized big data systems
  • Innovative prototypes of big data infrastructures
  • Emerging hardware technologies in big data systems
  • Operating systems support for big data systems
  • Interactions among architecture, systems and data management
  • Hardware and software co-design for big data
  • Practice report of evaluating and optimizing large-scale big data systems

Papers should present original research. As big data spans many disciplines, papers should provide sufficient background material to make them accessible to the broader community.

Download CFP

Paper Submissions

Papers must be submitted in PDF, and be no more than 8 pages in standard two-column SIGPLAN conference format including figures and tables but not including references. Shorter submissions are encouraged. The submissions will be judged based on the merit of the ideas rather than the length. Submissions must be made through the on-line submission site.

Submission site:


Important Dates

Papers due                  February 3, 2016
Papers due                  February 17, 2016
Notification of acceptance  February 23, 2016
Camera-ready copies         March 30, 2016
Workshop session            April 3, 2016



Opening remarks



Keynote I: Big Graph Analytics: Models, Platforms and Optimizations

Speaker: Prof. Ling Liu, Distributed Data Intensive Systems Lab, School of Computer Science, Georgia Institute of Technology

Abstract: Big graphs are finding increasing applications in many science and engineering domains, such as computational biology, cybermanufacturing and social media. Graphs provide a very flexible mathematical abstraction for describing relationships between entities in complex systems. Real-world graphs are characterized by high connectivity and high irregularity. Such non-uniform characteristics increase the mismatch between the vertex-centric parallel computation model and the computer hardware resources. Another problem with the vertex-centric computation model is that it treats vertices symmetrically, and this uniformity assumption breaks down when graphs exhibit high irregularity and graph algorithms reveal non-uniform workloads. In this keynote, I will advocate a fundamental revisit of graph computation models and promote a methodical framework for supporting high-performance graph-parallel abstractions that are resource aware, composable and programmable. I will discuss a suite of graph optimization techniques that exploit the workload characteristics of graph algorithms and the irregularity hidden in graph structures. I will conclude the talk by presenting some interesting research problems and unique opportunities for big graph analytics.

Bio: Ling Liu is a Professor in the School of Computer Science at Georgia Institute of Technology. She directs the research programs in the Distributed Data Intensive Systems Lab (DiSL), examining various aspects of large-scale data-intensive systems, including performance, availability, security and privacy. Prof. Liu is an elected IEEE Fellow and a recipient of the IEEE Computer Society Technical Achievement Award in 2012. She has published over 300 international journal and conference articles and is a recipient of best paper awards from a number of top venues, including ICDCS 2003, WWW 2004, the 2005 Pat Goldberg Memorial Best Paper Award, IEEE Cloud 2012, IEEE ICWS 2013, Mobiquitous 2014, and ACM/IEEE CCGrid 2015. In addition to serving as general chair and PC chair of numerous IEEE and ACM conferences in the data engineering, very large databases, distributed computing, and cloud computing fields, Prof. Liu has served on the editorial boards of over a dozen international journals. Currently, Prof. Liu is the editor in chief of IEEE Transactions on Services Computing and serves on the steering committee of the IEEE Big Data Initiative (BDI). Prof. Liu’s current research is primarily sponsored by NSF, IBM and Intel.



Invited Talk I: On Horizontal Decomposition of the Operating System [pdf]

Speaker: Dr. Gang Lu, Beijing Academy of Frontier Science and Technology

Abstract: Previous OS abstractions and structures fail to explicitly consider the separation between resource users and providers, so the shift toward server-side computing poses serious challenges to OS structures. These challenges are aggravated by increasing many-core scale and workload diversity.
This talk presents the horizontal OS model. We propose a new OS abstraction, the subOS: an independent OS instance owning physical resources that can be created, destroyed, and resized swiftly. We horizontally decompose the OS into a supervisor for the resource provider and several subOSes for resource users. The supervisor discovers, monitors, and provisions resources for subOSes, while each subOS independently runs applications. We confine state sharing among subOSes, but allow on-demand state sharing if necessary.
We present the first implementation, RainForest, which supports unmodified Linux application binaries. Our comprehensive evaluations using six benchmark suites quantitatively show that RainForest outperforms Linux with three different kernels, LXC, and Xen. The RainForest source code will soon be available.

Bio: Gang Lu is the executive director of Beijing Academy of Frontier Science and Technology, Inc. He received his Ph.D. in Computer Science from the Institute of Computing Technology, Chinese Academy of Sciences, under the direction of Professor Jianfeng Zhan. He received his B.S. degree in 2006 from Huazhong University of Science and Technology in China. His research interests include operating systems, cloud computing, and parallel and distributed computing.


Tea Break



Invited Talk II: Exploiting HPC Technologies to Accelerate Big Data Processing [pdf]

Speaker: Prof. Dhabaleswar K. Panda, The Ohio State University

Abstract: Modern HPC clusters have many advanced features, such as multi-/many-core architectures, high-performance RDMA-enabled interconnects, SSD-based storage devices and parallel file systems (Lustre). However, current-generation Big Data middleware (such as Hadoop, Spark, and Memcached) has not fully exploited the benefits of these advanced features on modern HPC clusters. This talk will discuss opportunities for accelerating Big Data middleware on modern HPC clusters by exploiting HPC technologies. An overview of advanced designs based on RDMA and heterogeneous storage architecture for multiple components of Hadoop (HDFS, MapReduce, RPC and HBase), Spark and Memcached will be presented. The benefits of these designs on various cluster configurations will be shown.

Bio: DK Panda is a Professor and University Distinguished Scholar of Computer Science and Engineering at the Ohio State University. He has published over 350 papers in the area of high-end computing and networking. The MVAPICH2 (High Performance MPI and PGAS over InfiniBand, iWARP and RoCE) software packages for modern clusters, developed by his research group, are currently being used by more than 2,525 organizations worldwide (in 77 countries). More than 355,000 downloads of this software have taken place from the project’s site. These software packages have enabled a large number of clusters to achieve TOP500 ranking during the last 14 years. Examples in the latest TOP500 list include the 10th, 13th and 25th ranked ones. The new RDMA-enabled Apache Hadoop and Spark libraries, designed and developed by his team to exploit HPC technologies under the High-Performance Big Data (HiBD) project, are currently being used by more than 150 organizations in 20 countries. More than 15,000 downloads of these libraries have taken place from the project’s site. Prof. Panda’s research has been supported by funding from the US National Science Foundation, the US Department of Energy, and several industry partners including IBM, Intel, Cisco, Cray, SUN, Mellanox, QLogic, NVIDIA and NetApp. He is an IEEE Fellow and a member of ACM. More details about Prof. Panda are available at


Regular Paper I: An analysis of image storage systems for scalable training of deep neural networks [PDF]

Authors: Seung-Hwan Lim, Steven Young and Robert Patton (Oak Ridge National Laboratory)

Abstract: This study presents a principled empirical evaluation of image storage systems for training deep neural networks. We employ the Caffe deep learning framework to train neural network models for three different data sets: MNIST, CIFAR-10, and ImageNet. While training the models, we evaluate five different options to retrieve training image data: (1) PNG-formatted image files on a local file system; (2) pushing pixel arrays from image files into a single HDF5 file on a local file system; (3) in-memory arrays to hold the pixel arrays in Python and C++; (4) loading the training data into LevelDB, a log-structured merge tree based key-value store; and (5) loading the training data into LMDB, a B+tree based key-value store. The experimental results quantitatively highlight the disadvantage of using normal image files on local file systems to train deep neural networks and demonstrate reliable performance with key-value storage systems. When training a model on the ImageNet dataset, the image file option was more than 17 times slower than the key-value storage option. Along with measurements of training time, this study provides in-depth analysis of the causes of the performance advantages and disadvantages of each back-end for training deep neural networks. We envision that the provided measurements and analysis will shed light on the optimal way to architect systems for training neural networks in a scalable manner.


Lunch Break



Keynote II: Energy-Efficient Manycore Architectures for Big Data [pdf]

Speaker: Prof. Josep Torrellas, University of Illinois Urbana-Champaign

Abstract: As transistor sizes continue to scale, we are about to witness stunning levels of chip integration, with 1,000 cores on a single die. At the same time, energy and power will continue to strongly constrain the designs. In this context, this talk presents some of the promising technologies that we may need to deploy to design such multicores. Specifically, we will need Voltage-Scalable cores–i.e., flexible cores that can competitively operate both at high and low voltage ranges, unlike existing big-little designs. Extensive power gating will be crucial, likely with the help of non-volatile memory. Further, to avoid energy waste, we will need power-management controllers that use control-theoretic concepts for maximum energy efficiency. A combination of all of these techniques–and more–will be needed to execute big data workloads.

Bio: Josep Torrellas is a Professor of Computer Science and Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. He is a Fellow of IEEE and ACM. He is the Director of the Center for Programmable Extreme-Scale Computing, a center focused on architectures for extreme energy and power efficiency. He was until recently the Director of the Intel-Illinois Parallelism Center (I2PC), a center created by Intel to advance parallel computing. He has made contributions to parallel computer architecture in the areas of shared memory multiprocessor organizations, cache hierarchies and coherence protocols, thread-level speculation, and hardware and software reliability.


Tea Break



Invited Talk III: Benchmarking and Ranking Big data systems [pdf]

Speaker: Mr. Xinhui Tian, ICT, CAS and University of CAS

Abstract: BigDataBench is an open-source big data benchmark suite which includes diverse and representative datasets and workloads from various big data application scenarios. It currently includes 14 real-world data sets and 34 big data workloads with different implementations on various big data systems. In this talk, we present the recent progress of BigDataBench, discuss future work, and present the preliminary results of BigData100, a project that benchmarks different big data systems using BigDataBench.

Bio: Xinhui Tian is a PhD candidate at the Institute of Computing Technology, Chinese Academy of Sciences. He received his bachelor's degree from Peking University in 2011. His research interests are big data benchmarking, distributed systems and data warehouses.


Regular Paper II: When to use 3D Die-Stacked Memory for Bandwidth-Constrained Big-Data Workloads [PDF]

Authors: Jason Power, Mark D. Hill and David A. Wood (University of Wisconsin–Madison)

Abstract: Response time requirements for big-data processing systems are shrinking. To meet these strict response time requirements, many big-data systems store all or most of their data in main memory to reduce the access latency. Main memory capacities have grown to support the growing amount of data. Systems with 2 TB of main memory capacity are available today. However, the rate at which processors can access this data (the memory bandwidth) has not grown at the same rate. In fact, some of these big-memory systems can access less than 10% of their main memory capacity in one second (billions of processor cycles).
3D die-stacking is one promising solution to this bandwidth problem, and industry is investing significantly in 3D die-stacking. We use a simple back-of-the-envelope-style model to characterize if and when the 3D die-stacked architecture is more cost-effective than current architectures for in-memory big-data workloads. We find that die-stacking has much higher performance than current systems (up to 256× lower response times), and it does not require expensive memory overprovisioning to meet real-time (10 ms) response time service-level agreements. However, the power requirements of the die-stacked systems are significantly higher (up to 50×) than current systems, and their memory capacity is lower in many cases. Even in this limited case study, we find 3D die-stacking is not a panacea. Today, die-stacking is the most cost-effective solution for strict SLAs, and by reducing the power of the compute chip and increasing memory densities, die-stacking can be cost-effective under other constraints in the future.
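
The capacity-versus-bandwidth gap described in this abstract can be illustrated with a few lines of arithmetic. The sketch below is not the authors' model. The 2 TB capacity and the 10 ms deadline come from the abstract, while the 100 GB/s bandwidth figure and all function names are assumptions chosen only for illustration.

```python
TB = 10**12  # bytes in a terabyte (decimal)
GB = 10**9   # bytes in a gigabyte (decimal)


def fraction_scanned_per_second(capacity_bytes, bandwidth_bytes_per_s):
    """Fraction of main memory the processor can read in one second."""
    return bandwidth_bytes_per_s / capacity_bytes


def full_scan_seconds(capacity_bytes, bandwidth_bytes_per_s):
    """Time needed to read all of main memory once."""
    return capacity_bytes / bandwidth_bytes_per_s


def bytes_within_deadline(bandwidth_bytes_per_s, deadline_ms):
    """Upper bound on the data a query can touch before its deadline."""
    return bandwidth_bytes_per_s * deadline_ms / 1000


capacity = 2 * TB      # capacity quoted in the abstract
bandwidth = 100 * GB   # ASSUMPTION: hypothetical memory bandwidth, not from the paper

fraction = fraction_scanned_per_second(capacity, bandwidth)
scan_time = full_scan_seconds(capacity, bandwidth)
budget = bytes_within_deadline(bandwidth, 10)  # 10 ms SLA from the abstract

print(f"fraction of memory readable per second: {fraction:.0%}")
print(f"time for one full scan: {scan_time:.0f} s")
print(f"data reachable within 10 ms: {budget / GB:.0f} GB")
```

Under these assumed numbers, a query with a 10 ms deadline can touch only a small slice of the 2 TB it has in memory, which is the kind of mismatch that motivates higher-bandwidth memory organizations.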



Invited Talk IV: Let’s Get Sirius

Speaker: Prof. Jason Mars, University of Michigan and CEO/Co-founder of Clinc

Abstract: “Ultimately, that’s why [Clarity Lab] is running the Sirius project. The Apples and Googles and the Microsofts know how this new breed of service operates, but the rest of the world doesn’t. And they need to.” — Wired Magazine
Demand for cloud services that deliver sophisticated AI on the critical path of each query, as is the case with intelligent personal assistants like Siri, is estimated to grow significantly. If these trend predictions are correct, these types of applications will likely consume most of the world’s compute cycles. The Sirius project was developed to investigate what this future might look like, and how our cloud architectures should evolve to get us there.

Bio: Jason Mars is an Assistant Professor at the University of Michigan and CEO/Co-founder of Clinc. Jason is recognized as a leading expert in the design of cross-layer systems for emerging cloud computing applications, in particular artificial intelligence, computer vision, and natural language processing. Jason has published dozens of top-notch papers in these areas and received a number of awards and honors for excellence in his research work, including a Google Faculty Research Award. Jason’s work has impacted both industry and academia and is routinely covered by press outlets such as Wired, Venturebeat, Business Insider, and EETimes, among others. Most recently, Jason Mars, along with members of his lab at UMich, has had significant impact on industry and academia with Sirius (Lucida), an open source intelligent personal assistant. This project has taken the community by storm, ascending to the top trending project on GitHub during the early weeks of its release. You can find out more information about Jason Mars at and his company Clarity Lab, Inc. (Clinc) at


Regular Paper III: Characterizing Cloudera Impala Workloads with BigDataBench on Infiniband Clusters [PDF]

Authors: Kunal Kulkarni, Xiaoyi Lu and Dhabaleswar K. Panda (The Ohio State University)

Abstract: Cloudera Impala is an open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Hadoop. Impala brings scalable parallel database technology to Hadoop, enabling users to issue low-latency SQL queries to data stored in HDFS and HBase. The High Performance Computing (HPC) domain has exploited high performance networks such as InfiniBand for many years. InfiniBand provides a high-bandwidth, low-latency network. Cloudera Impala can currently run in IPoIB mode on InfiniBand. BigDataBench is a well-known benchmarking suite for Big Data applications that provides real workload scenarios. In this paper we first characterize BigDataBench query workloads in Impala on an InfiniBand cluster and determine the time spent on computation, communication, and I/O. We see that for Inner Join, computation outweighs communication. With full remote mode operations, we see that BigDataBench queries run faster on InfiniBand QDR (IPoIB, 32Gbps) compared to 10GE Ethernet: Scan by 21%, Aggregate by 27%, and Inner Join by 6%. An interesting observation is that although Join is the most communication intensive, we did not see much difference in the overall runtime of the query between InfiniBand QDR (IPoIB) and 10GE Ethernet. This is because computation dominates the Join query performance, even though the network communication latency and throughput on InfiniBand QDR were better than on 10GE Ethernet by 19%. In the scalability study, fixing the data size and increasing the number of compute nodes in the cluster, we observe that Scan and Aggregate queries scale linearly but the improvement in Join execution time is negligible. This is because Impala by default does a broadcast Join, where the smaller table data is sent to all the nodes, which results in increased computation to do the Join matching. From these experiments we see that although InfiniBand improves the communication part of the Join query, there are opportunities to make the Join computation in Impala better so that we get more benefit in the overall query execution time.
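
The broadcast join behavior mentioned in this abstract can be sketched in a few lines. The code below is a simplified single-process illustration of the general broadcast hash join technique, not Impala's implementation. The function name, table contents and column names are all hypothetical.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_table, key):
    """Join every partition of a large table against a full copy of a small table.

    Each element of large_partitions stands for the rows held by one node.
    Returns the joined rows and the number of small-table rows sent in total.
    """
    joined = []
    rows_broadcast = 0
    for partition in large_partitions:
        # Every node receives the whole small table and builds its own hash table,
        # so the build work is repeated once per node.
        rows_broadcast += len(small_table)
        build = defaultdict(list)
        for row in small_table:
            build[row[key]].append(row)
        # Probe phase: each local row looks up its matches by join key.
        for row in partition:
            for match in build.get(row[key], []):
                joined.append({**match, **row})
    return joined, rows_broadcast


# Hypothetical data: two nodes hold the orders, customers is the small table.
orders = [
    [{"order_id": 1, "cust": 10}, {"order_id": 2, "cust": 11}],
    [{"order_id": 3, "cust": 10}],
]
customers = [{"cust": 10, "name": "a"}, {"cust": 11, "name": "b"}]

joined, rows_broadcast = broadcast_hash_join(orders, customers, "cust")
print(len(joined), rows_broadcast)
```

In this sketch the number of broadcast rows and hash-table builds grows with the number of nodes while the probe work per node shrinks, which is consistent with the abstract's observation that adding nodes yields little improvement for Join.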



Invited Talk V: Leveraging Hardware Address Sampling for Memory Performance Insights [pdf]

Speaker: Prof. Xu Liu, College of William and Mary

Abstract: Hardware address sampling is widely supported in modern Intel, AMD, and IBM processors. Lightweight tools based on address sampling provide deep insights into the performance bottlenecks of programs that use memory subsystems inefficiently. We developed performance tools to collect and attribute samples to quantify the memory bottlenecks in parallel programs. Moreover, we go one step further to analyze address samples and offer optimization guidance. We show that even with sparse samples, our analysis methods can still provide accurate results. Experiments show that our analysis tools pinpoint and quantify performance bottlenecks in complex codes that other tools do not identify. Guided by our tools, we can significantly speed up several well-known parallel programs by optimizing their memory accesses.

Bio: Xu Liu is an assistant professor in the Department of Computer Science at the College of William and Mary. He obtained his Ph.D. from Rice University in 2014. His research interests are parallel computing, compilers, and performance analysis. Prof. Liu has been working on several open-source performance tools, which are used worldwide at universities, DOE national laboratories, and in industry. Prof. Liu received HPC fellowships from NAG, Schlumberger, and BP while a Ph.D. candidate at Rice University. After joining W&M, Prof. Liu received a Best Paper Award at SC’15.

Venue Information

BPOE: Fuchsia, Conference B, Georgia Tech Hotel and Conference Center.

Contact Information

Prof. Jianfeng Zhan:
Dr. Gang Lu:      
Dr. Rui  Han:      


Steering committee:

  • Christos Kozyrakis, Stanford
  • Xiaofang Zhou, University of Queensland
  • Dhabaleswar K Panda, Ohio State University
  • Aoying Zhou, East China Normal University
  • Raghunath Nambiar, Cisco
  • Lizy K John, University of Texas at Austin
  • Xiaoyong Du, Renmin University of China
  • Ippokratis Pandis, IBM Almaden Research Center
  • Xueqi Cheng, ICT, Chinese Academy of Sciences
  • Bill Jia, Facebook
  • Lidong Zhou, Microsoft Research Asia
  • H. Peter Hofstee, IBM Austin Research Laboratory
  • Alexandros Labrinidis, University of Pittsburgh
  • Cheng-Zhong Xu, Wayne State University
  • Jianfeng Zhan, ICT, Chinese Academy of Sciences
  • Guang R. Gao, University of Delaware
  • Yunquan Zhang, ICT, Chinese Academy of Sciences

Program Chairs: 

Prof. Jianfeng Zhan, ICT, Chinese Academy of Sciences and University of Chinese Academy of Sciences
Dr. Gang Lu, Beijing Academy of Frontier Science & Technology
Dr. Rui Han, ICT, Chinese Academy of Sciences

Web and Publicity Chairs: 

Zhen Jia, ICT, CAS and UCAS
Wanling Gao, ICT, CAS and UCAS

Keynote speaker


Program Committee (Confirmed)

Bingsheng He, Nanyang Technological University
Xu Liu, College of William and Mary
Rong Chen, Shanghai Jiao Tong University
Weijia Xu, Texas Advanced Computing Center, University of Texas at Austin
Lijie Wen, School of Software, Tsinghua University
Xiaoyi Lu, The Ohio State University
Yueguo Chen, Renmin University
Edwin Sha, Chongqing University
Mingyu Chen, Institute of Computing Technology, Chinese Academy of Sciences
Zhenyu Guo, Microsoft
Tilmann Rabl, University of Toronto
Farhan Tauheed, EPFL
Chaitanya Baru, San Diego Supercomputer Center, UC San Diego
Seetharami Seelam, IBM
Rene Mueller, IBM Research
Cheqing Jin, East China Normal University
Onur Mutlu, Carnegie Mellon University
Kai Wei, Chinese Academy of Information and Communications
Christos Kartsaklis, Computer Science and Mathematics Division
Weining Qian, East China Normal University
Jian Ouyang, Baidu, Inc.
Lei Wang, ICT, Chinese Academy of Sciences
Zhibin Yu, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
Yuanchun Zhou, Computer Network Information Center, Chinese Academy of Sciences

Photo Gallery


Previous Events





October 7, 2013

IEEE BigData Conference, San Jose, CA


October 31, 2013

CCF HPC China, Guilin, China


December 5, 2013

CCF Big Data Technology Conference 2013, Beijing, China


March 1, 2014

ASPLOS 2014, Salt Lake City, Utah, USA


September 5, 2014

VLDB 2014, Hangzhou, Zhejiang Province, China


September 4, 2015

VLDB 2015, Hilton Waikoloa Village, Kohala Coast, Hawai‘i


April 9, 2017

ASPLOS 2017, Xi’an, China