Grale: Designing Networks for Graph Learning

Halcrow, Jonathan; Moşoi, Alexandru; Ruth, Sam; Perozzi, Bryan

doi:10.1145/3394486.3403302

Abstract: How can we find the right graph for semi-supervised learning? In real-world applications, the choice of which edges to use for computation is the first step in any graph learning process. Interestingly, there are often many types of similarity available to choose as the edges between nodes, and the choice of edges can drastically affect the performance of downstream semi-supervised learning systems. However, despite the importance of graph design, most of the literature assumes that the graph is static. In this work, we present Grale, a scalable method we have developed to address the problem of graph design for graphs with billions of nodes. Grale operates by fusing together different measures of (potentially weak) similarity to create a graph which exhibits high task-specific homophily between its nodes. Grale is designed for running on large datasets. We have deployed Grale in more than 20 different industrial settings at Google, including datasets which have tens of billions of nodes and hundreds of trillions of potential edges to score. By employing locality-sensitive hashing techniques, we greatly reduce the number of pairs that need to be scored, allowing us to learn a task-specific model and build the associated nearest neighbor graph for such datasets in hours, rather than the days or even weeks that might be required otherwise. We illustrate this through a case study where we examine the application of Grale to an abuse classification problem on YouTube with hundreds of millions of items. In this application, we find that Grale detects a large number of malicious actors on top of hard-coded rules and content classifiers, increasing the total recall by 89% over those approaches alone.
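The candidate-reduction idea in the abstract (hash nodes into buckets, then score only pairs that share a bucket) can be illustrated with a minimal sketch. This is not Grale's implementation: MinHash with banding is used here as one common locality-sensitive hashing scheme, plain Jaccard similarity stands in for the learned task-specific pairwise model, and every name below (`make_hashes`, `minhash_signature`, `candidate_pairs`, `jaccard`, the toy `nodes` data) is hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations

# Assumption: feature ids are integers in [0, PRIME). PRIME is 2**31 - 1.
PRIME = 2_147_483_647


def make_hashes(num_hashes, seed=0):
    """Draw (a, b) parameters for hash functions h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(features, hashes):
    """MinHash signature of a non-empty set of integer feature ids."""
    return tuple(min((a * x + b) % PRIME for x in features) for a, b in hashes)


def candidate_pairs(nodes, hashes, rows_per_band=2):
    """Return node pairs that share at least one LSH bucket (signature band)."""
    buckets = defaultdict(list)
    for node_id, features in nodes.items():
        sig = minhash_signature(features, hashes)
        for start in range(0, len(sig), rows_per_band):
            band = sig[start:start + rows_per_band]
            buckets[(start, band)].append(node_id)
    pairs = set()
    for members in buckets.values():
        # Only pairs inside a bucket are ever generated, not all O(n^2) pairs.
        pairs.update(combinations(sorted(members), 2))
    return pairs


def jaccard(a, b):
    """Stand-in scorer; Grale instead learns a task-specific pairwise model."""
    return len(a & b) / len(a | b)


# Toy data (hypothetical): "a" and "b" share all features, "c" shares none.
nodes = {
    "a": {1, 2, 3, 4},
    "b": {1, 2, 3, 4},
    "c": {100, 200, 300},
}
hashes = make_hashes(num_hashes=4)
pairs = candidate_pairs(nodes, hashes)
scored = {(u, v): jaccard(nodes[u], nodes[v]) for u, v in pairs}
```

In this sketch only the pairs in `pairs` are passed to the scorer, so the scoring cost depends on bucket sizes rather than on the total number of possible pairs. The edges of the resulting graph would be the scored pairs, typically after keeping only the top-scoring neighbors per node.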

Comments:	10 pages, 6 figures, to be published in KDD'20
Subjects:	Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Cite as:	arXiv:2007.12002 [cs.LG]
	(or arXiv:2007.12002v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2007.12002
Related DOI:	https://doi.org/10.1145/3394486.3403302
