Position-aware Graph Neural Networks Learning node embedding that captures the position of the node within a broader graph structure is crucial for many prediction tasks on graphs. However, while expressive and most popular, existing Graph Neural Network (GNN) approaches have limited power for representing positions/locations of nodes in a bigger network structure. Here we propose {\em Position-aware Graph Neural Networks (P-GNN)}, a new class of GNNs for computing position-aware node embeddings. P-GNN first selects a set of anchor nodes, characterizes the distance of a given target node towards the anchor-set, and then learns a non-linear aggregation scheme over the anchor-sets adjacent to the target node. P-GNN has several advantages: it is inductive, scalable, and can incorporate node feature information. We apply P-GNNs to multiple prediction tasks including link prediction and community detection. We show that P-GNNs consistently outperform state of the art GNN variants, with an improvement up to 38% in terms of AUC score. Detecting Overlapping and Correlated Communities without Pure Nodes: Identifiability and Algorithm Many machine learning problems come in the form of networks with relational data between entities, and one of the key unsupervised learning tasks is to detect communities from such a network. We adopt the mixed-membership stochastic blockmodel as the underlying probabilistic model, and give conditions under which the membership of a subset of nodes can be uniquely identified. Our method start by constructing a second-order graph moment, which can be shown to converge to a specific product of the true parameters as the size of the network increases. To correctly recover the true membership parameters, we carefully formulate an optimization problem using insights from convex geometry. We show that if the true memberships satisfy a so called sufficiently scattered condition, then solving the proposed problem correctly identifies the ground truth. We also develop an extremely efficient algorithm, which is significantly faster than prior work and with better convergence properties. Experiments on synthetic and real data justify the effectiveness of the proposed learning framework for network data. Learning Generative Models across Incomparable Spaces Generative Adversarial Networks have shown remarkable success in learning a distribution that faithfully recovers a reference distribution in its entirety. However, in some cases, we may want to only learn some aspects (e.g., cluster or manifold structure), while modifying others (e.g., style, orientation or dimension). In this work, we propose an approach to learn generative models across such incomparable spaces, and demonstrate how to steer the learned distribution towards target properties. A key component of our model is the Gromov-Wasserstein distance, a notion of discrepancy that compares distributions relationally rather than absolutely. While this framework subsumes current generative models in identically reproducing distributions, its inherent flexibility allows application to tasks in manifold learning, relational learning and cross-domain learning. Relational Pooling for Graph Representations This work generalizes graph neural networks (GNNs) beyond those based on the Weisfeiler-Lehman (WL) algorithm, graph Laplacians, and graph diffusion kernels. Our approach, denoted Relational Pooling (RP), draws from the theory of finite partial exchangeability to provide a framework with maximal representation power for graphs. RP can work with existing graph representation models, and somewhat counterintuitively, can make them even more powerful than the original WL isomorphism test. Additionally, RP is the first theoretically sound framework to use architectures like Recurrent Neural Networks and Convolutional Neural Networks for graph classification. RP also has graph kernels as a special case. We demonstrate improved performance of novel RP-based graph representations over current state-of-the-art methods on a number of tasks. Disentangled Graph Convolutional Networks The formation of a real-world graph typically arises from the highly complex interaction of many latent factors. The existing deep learning methods for graph-structured data neglect the entanglement of the latent factors, rendering the learned representations non-robust and hardly explainable. However, learning representations that disentangle the latent factors poses great challenges and remains largely unexplored in the literature of graph neural networks. In this paper, we introduce the disentangled graph convolutional network (DisenGCN) to learn disentangled node representations. In particular, we propose a novel neighborhood routing mechanism, which is capable of dynamically identifying the latent factor that may have caused the edge between a node and one of its neighbors, and accordingly assigning the neighbor to a channel that extracts and convolutes features specific to that factor. We theoretically prove the convergence properties of the routing mechanism. Empirical results show that our proposed model can achieve significant performance gains, especially when the data demonstrate the existence of many entangled factors. Open Vocabulary Learning on Source Code with a Graph-Structured Cache Machine learning models that take computer program source code as input typically use Natural Language Processing (NLP) techniques. However, a major challenge is that code is written using an open, rapidly changing vocabulary due to, e.g., the coinage of new variable and method names. Reasoning over such a vocabulary is not something for which most NLP methods are designed. We introduce a Graph-Structured Cache to address this problem; this cache contains a node for each new word the model encounters with edges connecting each word to its occurrences in the code. We find that combining this graph-structured cache strategy with recent Graph-Neural-Network-based models for supervised learning on code improves the models' performance on a code completion task and a variable naming task --- with over 100% relative improvement on the latter --- at the cost of a moderate increase in computation time. Learning Discrete Structures for Graph Neural Networks Graph neural networks (GNNs) are a popular class of machine learning models that have been successfully applied to a range of problems. Their major advantage lies in their ability to explicitly incorporate a sparse and discrete dependency structure between data points. Unfortunately, GNNs can only be used when such a graph-structure is available. In practice, however, real-world graphs are often noisy and incomplete or might not be available at all. With this work, we propose to jointly learn the graph structure and the parameters of graph convolutional networks (GCNs) by approximately solving a bilevel program that learns a discrete probability distribution on the edges of the graph. This allows one to apply GCNs not only in scenarios where the given graph is incomplete or corrupted but also in those where a graph is not available. We conduct a series of experiments that analyze the behavior of the proposed method and demonstrate that it outperforms related methods by a significant margin. Compositional Fairness Constraints for Graph Embeddings Learning high-quality node embeddings is a key building block for machine learning models that operate on graph data, such as social networks and recommender systems. However, existing graph embedding techniques are unable to cope with fairness constraints, e.g., ensuring that the learned representations do not correlate with certain attributes, such as race or gender. Here, we introduce an adversarial framework to enforce fairness constraints on graph embeddings. Our approach is {\em compositional}---meaning that it can (optionally) enforce multiple different fairness constraints during inference. Experiments on standard knowledge graph and recommender system benchmarks highlight the utility of our proposed framework. A Recurrent Neural Cascade-based Model for Continuous-Time Diffusion Many works have been proposed in the literature to capture the dynamics of diffusion in networks. While some of them define graphical Markovian models to extract temporal relationships between node infections in networks, others consider diffusion episodes as sequences of infections via recurrent neural models. In this paper we propose a model at the crossroads of these two extremes, which embeds the history of diffusion in infected nodes as hidden continuous states. Depending on the trajectory followed by the content before reaching a given node, the distribution of influence probabilities may vary. However, content trajectories are usually hidden in the data, which induces challenging learning problems. We propose a topological recurrent neural model which exhibits good experimental performances for diffusion modeling and prediction.