From Graph to Knowledge Graph - KDD 2019 Hands-on Tutorial

Mining Large-scale Heterogeneous Networks Using Spark

Presenters: Iris Shen, Charles Huang, Chieh-Han Wu, Anshul Kanakia

Contributors: Yuxiao Dong, Junjie Qian

Microsoft Research - Microsoft Academic Graph team

Time: Thu, August 08, 2019, 9:30 am - 12:00 pm and 1:00 pm - 3:30 pm

Abstract

Many real-world datasets come in the form of graphs. These datasets include social networks, biological networks, knowledge graphs, the World Wide Web, and many more. A comprehensive understanding of these networks is essential to many important applications built on them.

This hands-on tutorial introduces the fundamental concepts and tools used in modeling large-scale graphs and knowledge graphs. The audience will learn a spectrum of techniques used to build applications on graphs and knowledge graphs, ranging from traditional data analysis and mining methods to emerging deep learning and embedding approaches.

Five lab sessions are included to give the audience hands-on experience working through real-life examples of the major topics covered in this tutorial. These include:

  1. understanding basic graph properties;
  2. using graph representation learning to explore network similarity;
  3. utilizing NLP and text mining techniques to build knowledge graphs;
  4. modeling knowledge graphs with embedding techniques and applying them to recommendation applications.
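As a minimal illustration of the first topic, the sketch below computes a few basic properties (node count, edge count, density, and in-degree) of a directed graph in plain Python. The toy edge list and the function name `graph_properties` are hypothetical and chosen only for illustration; they are not taken from MAG or from the lab code.

```python
from collections import defaultdict

# Hypothetical toy edge list (citing paper -> cited paper); not MAG data.
edges = [("p1", "p2"), ("p1", "p3"), ("p2", "p3"), ("p4", "p3")]


def graph_properties(edges):
    """Return basic properties of a directed graph given as (src, dst) pairs."""
    nodes = set()
    in_degree = defaultdict(int)
    for src, dst in edges:
        nodes.update((src, dst))
        in_degree[dst] += 1

    n, m = len(nodes), len(edges)
    # Density of a directed graph without self-loops: m / (n * (n - 1)).
    density = m / (n * (n - 1)) if n > 1 else 0.0
    return {
        "nodes": n,
        "edges": m,
        "density": density,
        # Nodes that are never cited get an in-degree of 0.
        "in_degree": {v: in_degree[v] for v in sorted(nodes)},
    }


props = graph_properties(edges)
print(props)
```

On a graph the size of MAG, the same quantities would normally be computed with a distributed engine rather than in-memory Python; this sketch only shows what is being measured.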

We use Microsoft Academic Graph (MAG), the largest publicly available academic domain knowledge graph, as the dataset to demonstrate the algorithms and applications presented here. MAG includes 6 types of entities with 450 million nodes and over 3 billion edges, covering more than 660K academic concepts. The MAG dataset (500+ GB) is updated bi-weekly. For this hands-on tutorial, we use a Top CS Conference Sub-Graph from one of the most recent data versions. The full graph with bi-weekly updates is available for free here.

Key takeaways for attendees will be:

Agenda

Time | Module | Slides | Code
9:30am - 10:30am | I: Welcome, Setup, Dataset | link | link1 link2
10:30am - 11:15am | II: Graph Basics | link | link
11:15am - 12:00pm | III: Graph Representation Learning | link | link
12:00pm - 1:00pm | LUNCH BREAK
1:00pm - 2:05pm | IV: Knowledge Graph Fundamentals and Construction | link | link
2:05pm - 3:10pm | V: Knowledge Graph Inference and Applications | link | link
3:10pm - 3:30pm | VI: Summary and Looking Forward | link | n/a

Previous Edition

For a longer and more theoretical treatment of the graph and knowledge graph content, please see the DAT278x edX online course, From Graph to Knowledge Graph: Algorithms and Applications: GitHub link (slides and code only) and edX link (with videos, quizzes, and a final exam; you can also earn a certificate for this course at edX).