RCFile (Record Columnar File) is a data placement structure that determines how relational tables are stored on computer clusters. It is designed for systems that use the MapReduce framework. The RCFile structure combines several components, including a data storage format, a data compression approach, and optimization techniques for reading data. It is designed to meet all four requirements of data placement: (1) fast data loading, (2) fast query processing, (3) highly efficient use of storage space, and (4) strong adaptivity to dynamic data access patterns.
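RCFile's storage format is commonly summarized as "first partition horizontally, then partition vertically": a table is split into row groups, and within each row group the values are stored column by column. The Python sketch below illustrates only that layout idea, under simplifying assumptions: an in-memory list of equal-length tuples, a hypothetical `group_size` parameter, and no compression, metadata header, or on-disk encoding. The names `to_row_groups` and `read_column` are illustrative and are not part of any Hive or RCFile API.

```python
from typing import Any, List, Sequence


def to_row_groups(rows: Sequence[Sequence[Any]], group_size: int) -> List[List[List[Any]]]:
    """Split rows into row groups, then store each group column by column.

    Simplified sketch: assumes every row has the same number of columns.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    groups: List[List[List[Any]]] = []
    # Horizontal partitioning: take consecutive slices of at most group_size rows.
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        # Vertical partitioning: zip(*chunk) transposes the slice into one tuple per column.
        groups.append([list(column) for column in zip(*chunk)])
    return groups


def read_column(groups: List[List[List[Any]]], col_index: int) -> List[Any]:
    """Collect one column across all row groups without touching the other columns."""
    values: List[Any] = []
    for group in groups:
        values.extend(group[col_index])
    return values
```

For example, `to_row_groups([(1, "a", 10.0), (2, "b", 20.0), (3, "c", 30.0)], 2)` yields two row groups, `[[1, 2], ["a", "b"], [10.0, 20.0]]` and `[[3], ["c"], [30.0]]`, and `read_column` on that result with index 1 returns `["a", "b", "c"]`. Keeping rows together in groups means a full record can be rebuilt from a single group, while the per-column layout inside each group lets a query read only the columns it needs.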
RCFile is the result of basic research conducted collaboratively by Facebook, The Ohio State University, and the Institute of Computing Technology, Chinese Academy of Sciences. A research paper entitled “RCFile: a Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse systems” was published and presented at ICDE '11. The data placement structure and the implementation presented in the paper have been widely adopted by the open-source community, the big data analytics industry, and application users. See the Impacts section.