The PGX team at Oracle Labs focuses on high-performance shared-memory and distributed graph processing and has open internship positions available.
Oracle, a global provider of enterprise cloud computing, is empowering businesses of all sizes on their journey of digital transformation. Oracle Cloud provides leading-edge capabilities in software as a service, platform as a service, infrastructure as a service, and data as a service.
Oracle’s application suites, platforms, and infrastructure leverage both the latest technologies and emerging ones – including artificial intelligence, machine learning, blockchain, and Internet of Things – in ways that create business differentiation and advantage for customers. Continued technological advances are always on the horizon.
Oracle Labs is the advanced research and development arm of Oracle. We focus on the development of technologies that keep Oracle at the forefront of the computer industry. Oracle Labs researchers look for novel approaches and methodologies, often taking on projects with high risk or uncertainty, or that are difficult to tackle within a product-development organization. Oracle Labs research is focused on real-world outcomes: our researchers aim to develop technologies that will someday play a significant role in the evolution of technology and society. For example, chip multithreading and the Java programming language grew out of work done in Oracle Labs.
Parallel Graph AnalytiX (PGX)
PGX is a toolkit for graph analytics that supports graph algorithms (such as PageRank), graph queries with PGQL (an SQL-like graph query language), and graph ML. PGX includes both a single-machine in-memory engine and a distributed engine for extremely large graphs. It is available as an option in Oracle products and is also an active research project at Oracle Labs.
The goal of this project is to extend PGX, both the single-machine runtime (PGX.SM) and the distributed runtime (PGX.D), with new capabilities. We offer various topics depending on the skills and interests of the candidate (topics are not limited to the ones below; see also the "Related Topics" subsection below):
Recent research shows that machine learning workloads can benefit from information encoded in the graph to achieve higher accuracy and faster convergence when learning models. Given the distributed nature of the graph, in this project we will explore how embeddings for ML algorithms can be retrieved efficiently from such distributed graphs for processing in external ML frameworks.
Extension of an SQL-like graph query processing engine (PGQL)
In this project, we will extend the semantics and implementation of the PGQL graph query language. Example topics include: (i) improving the composability of PGQL queries (i.e., starting a PGQL query from the results of a previous one or from the results of graph algorithms) and optimizing the execution of such composed queries, and (ii) designing and implementing pipelined versions of the PGQL operators to reduce peak memory consumption during query execution.
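The memory benefit of pipelined operators can be illustrated with a toy example. The sketch below is not PGX or PGQL code: it models a query as a chain of hypothetical scan, filter, and projection operators over an in-memory edge list, and contrasts a materializing execution, where each operator builds its complete output before the next one runs, with a pipelined execution built from Python generators, where rows stream through all operators one at a time.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical edge rows: (source vertex id, destination vertex id).
Row = Tuple[int, int]
Predicate = Callable[[Row], bool]


def scan_edges(edges: Iterable[Row]) -> Iterator[Row]:
    # Scan operator: yields one row at a time instead of copying the input.
    for edge in edges:
        yield edge


def filter_op(rows: Iterable[Row], pred: Predicate) -> Iterator[Row]:
    # Filter operator: forwards only the rows that satisfy the predicate.
    for row in rows:
        if pred(row):
            yield row


def project_dst(rows: Iterable[Row]) -> Iterator[int]:
    # Projection operator: keeps only the destination column.
    for _src, dst in rows:
        yield dst


def materialized_query(edges: List[Row], pred: Predicate) -> List[int]:
    # Every intermediate result exists in full at the same time,
    # so peak memory grows with the largest intermediate list.
    scanned = list(edges)
    filtered = [row for row in scanned if pred(row)]
    return [dst for _src, dst in filtered]


def pipelined_query(edges: List[Row], pred: Predicate) -> List[int]:
    # Operators are chained generators: a row passes through the whole
    # pipeline before the next row is read, so no intermediate list is built.
    # Only the final result is collected into a list.
    return list(project_dst(filter_op(scan_edges(edges), pred)))
```

For example, with `edges = [(0, 1), (0, 2), (1, 2), (2, 0)]` and a predicate selecting edges whose source is vertex 0, both functions return `[1, 2]`; they differ only in how much intermediate data is held in memory while the query runs.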
Dynamic data loading for very large graphs
Main memory is a limited resource. Consequently, in a data-analytics engine such as PGX.SM, only the most recent or most important data should be kept in memory, while other data can be offloaded to external storage or systems. During this internship, we will extend PGX.SM's support for dynamically loading data that resides in offloaded systems, in a graceful, efficient, and transparent manner.
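To make "transparent" loading concrete, the following is a minimal sketch under simplifying assumptions; it is not how PGX.SM implements this. A hypothetical `ExternalStore` (backed by a plain dictionary) stands in for the offloaded system, and a hypothetical `DynamicLoader` keeps at most `capacity` entries in memory. Callers only ever call `get`; on a miss, the value is fetched from the store behind the scenes. Least-recently-used (LRU) eviction is an assumed policy chosen for illustration: it captures "most recent" data but not "most important" data.

```python
from collections import OrderedDict
from typing import Dict, List


class ExternalStore:
    """Hypothetical stand-in for offloaded storage (e.g. a database or files)."""

    def __init__(self, data: Dict[str, List[float]]) -> None:
        self._data = data
        self.fetches = 0  # number of times data had to be loaded back

    def fetch(self, key: str) -> List[float]:
        self.fetches += 1
        return self._data[key]


class DynamicLoader:
    """Hypothetical in-memory layer holding at most `capacity` entries (LRU)."""

    def __init__(self, store: ExternalStore, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._store = store
        self._capacity = capacity
        # Insertion order tracks recency: the first entry is the least recently used.
        self._cache: "OrderedDict[str, List[float]]" = OrderedDict()

    def get(self, key: str) -> List[float]:
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        value = self._store.fetch(key)  # transparent load on a miss
        self._cache[key] = value
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return value

    def resident(self) -> List[str]:
        # Keys currently held in memory, least recently used first.
        return list(self._cache)
```

With a capacity of 2, the access sequence `v0, v1, v0, v2` performs three fetches and leaves `v0` and `v2` resident, because `v1` was the least recently used entry when `v2` was loaded. A real engine would also need to handle concurrency, writes to offloaded data, and smarter policies than LRU.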
The successful candidate is expected to draw on a wide and diverse set of skills during the internship.
For more information about the internship, please contact Vasileios Trigonakis or Damien Hilloulin.