
How Pinterest Uses Its Taxonomy


Pinterest recently rolled out a new tool, Pinterest Trends. The tool offers insights into how content performs on the platform, which trends are emerging, and how consumers behave. Pinterest gathers these insights from over 200 billion ideas saved by 320 million-plus people to over 4 billion boards, with the help of a taxonomy-based knowledge management system.

At Pinterest, the taxonomy is called the Interest Taxonomy because it is used to organize interests and curate nodes for targeted advertising. The interests, which are popular topics and entities, are grouped in a hierarchical parent-child tree structure in which each child is treated as a subclass of its single parent. The top-level taxonomy nodes define the broad verticals and capture the general interests associated with Pins across Pinterest. The child nodes, which extend up to 11 levels deep, capture topics at a more granular level.
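The single-parent tree described above can be illustrated with a minimal sketch. The class names, method names, and example interests below are hypothetical and chosen only for illustration; they are not taken from Pinterest's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    parent: Optional[str] = None  # exactly one parent; None marks a top-level vertical
    children: List[str] = field(default_factory=list)


class Taxonomy:
    """Toy single-parent tree (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}

    def add(self, name: str, parent: Optional[str] = None) -> None:
        if name in self.nodes:
            raise ValueError(f"duplicate node: {name}")
        if parent is not None:
            if parent not in self.nodes:
                raise KeyError(f"unknown parent: {parent}")
            self.nodes[parent].children.append(name)
        self.nodes[name] = Node(name, parent)

    def path(self, name: str) -> List[str]:
        """Return the chain from the top-level vertical down to `name`."""
        chain: List[str] = []
        current: Optional[str] = name
        while current is not None:
            chain.append(current)
            current = self.nodes[current].parent
        return chain[::-1]

    def level(self, name: str) -> int:
        """Depth of a node, counting the top-level vertical as level 1."""
        return len(self.path(name))


# Made-up example interests.
taxonomy = Taxonomy()
taxonomy.add("Food and Drink")
taxonomy.add("Desserts", parent="Food and Drink")
taxonomy.add("Cakes", parent="Desserts")
print(taxonomy.path("Cakes"))
```

Because every node stores exactly one parent, walking upward from any node yields a single unambiguous path to its vertical, which is what lets a child be read as a subclass of everything above it.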

The Interest Taxonomy is used in interest-based targeting to help advertisers reach the right audience based on Pinterest’s unique understanding of Pinners’ interests, tastes, and plans. Pin2Interest (P2I), a scalable Machine Learning (ML) system for content classification, also uses the taxonomy’s hierarchy information. The granularity and accuracy of the taxonomy are critical to the prediction accuracy of P2I. Together, P2I and the Interest Taxonomy provide critical insight into content understanding.

Pinterest also uses the Interest Taxonomy for User2Interest (U2I), an ML system that infers users’ interests. Its inputs are the Pins a user has engaged with and the interest labels that Pin2Interest outputs for those Pins. The resulting user interest signal is widely used at Pinterest for targeted advertising and organic recommendations. It can also provide insights into the Interest Taxonomy from the user’s perspective.
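As a rough illustration of how engaged Pins and their interest labels could be combined into a user-level signal, the sketch below simply sums label scores across Pins and normalizes them. This aggregation rule is an assumption made for the example; the article does not describe how the actual ML system combines its inputs, and the function name, Pin IDs, and scores are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def infer_user_interests(
    engaged_pins: List[str],
    pin_labels: Dict[str, Dict[str, float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Sum per-Pin interest scores and return the top_k normalized interests.

    Assumption: a plain sum stands in for whatever the real model does.
    """
    totals: Dict[str, float] = defaultdict(float)
    for pin_id in engaged_pins:
        for interest, score in pin_labels.get(pin_id, {}).items():
            totals[interest] += score
    norm = sum(totals.values())
    if norm == 0:
        return []
    # Sort by descending score, breaking ties alphabetically for stable output.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(interest, total / norm) for interest, total in ranked[:top_k]]


# Hypothetical interest labels for three Pins.
PIN_LABELS = {
    "pin_a": {"Cakes": 0.9, "Desserts": 0.6},
    "pin_b": {"Cakes": 0.7},
    "pin_c": {"Hiking": 0.8},
}

user_interests = infer_user_interests(["pin_a", "pin_b", "pin_c"], PIN_LABELS)
for interest, weight in user_interests:
    print(f"{interest}: {weight:.2f}")
```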

Another way Pinterest understands users’ intent and serves them relevant results is by mapping queries to the Interest Taxonomy. Query2Interest (Q2I), which is in production and used on various Ads and organic surfaces, maps short text queries to taxonomy nodes. It groups queries with similar categories and meanings under the same taxonomy nodes by using Pintext, Pinterest’s multitask text embedding system, to compute a similarity score between the short text and each taxonomy node.
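The idea of scoring a query against taxonomy nodes in embedding space can be sketched as follows. Pintext itself is not reproduced here: the three-dimensional vectors are hand-made toy values, cosine similarity is assumed as the similarity measure, and the threshold and function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def map_query(
    query_vec: List[float],
    node_vecs: Dict[str, List[float]],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return taxonomy nodes whose similarity to the query meets the threshold,
    ordered from most to least similar."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in node_vecs.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda kv: -kv[1])


# Toy embeddings standing in for the output of a text embedding model.
NODE_VECS = {
    "Cakes": [0.9, 0.1, 0.0],
    "Desserts": [0.8, 0.3, 0.1],
    "Hiking": [0.0, 0.2, 0.9],
}
query_vec = [0.85, 0.15, 0.05]  # pretend embedding of "chocolate cake recipe"

matches = map_query(query_vec, NODE_VECS)
print([name for name, _ in matches])
```

In this toy setup the query lands on both "Cakes" and its parent "Desserts" while "Hiking" falls below the threshold, which mirrors how queries with similar meanings end up grouped under the same nodes.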

The taxonomy curation process at Pinterest has two major components: first, data modeling into a Resource Description Framework (RDF) graph, together with visualization and curation in WebProtégé; and second, the engineering workflow.

To model the data in the taxonomy, Pinterest uses RDF triples to generate graphs, which are subsequently used for curation as well. The open-source tool WebProtégé is used for visualization and human curation of the taxonomy, and its support for collaborative curation helps produce a high-quality taxonomy. The engineering workflow component takes the RDF graphs (in XML format) as input and generates the relational database tables for downstream consumption.
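The step from an RDF graph in XML format to a relational table can be sketched with the Python standard library alone. The sketch assumes each taxonomy node is serialized as an `owl:Class` element with an `rdfs:label` and an optional `rdfs:subClassOf` link; the example.org URIs, the table name `interest_node`, and its columns are hypothetical. The parser handles only this restricted shape of RDF/XML, so a production workflow would more likely rely on a dedicated RDF library.

```python
import sqlite3
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical two-node taxonomy in RDF/XML.
RDF_XML = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/interest/food">
    <rdfs:label>Food and Drink</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/interest/desserts">
    <rdfs:label>Desserts</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://example.org/interest/food"/>
  </owl:Class>
</rdf:RDF>
""".strip()


def extract_rows(rdf_xml: str):
    """Turn each owl:Class element into a (node_id, label, parent_id) row."""
    root = ET.fromstring(rdf_xml)
    rows = []
    for cls in root.findall(f"{{{OWL}}}Class"):
        node_id = cls.get(f"{{{RDF}}}about")
        label = cls.findtext(f"{{{RDFS}}}label")
        parent_el = cls.find(f"{{{RDFS}}}subClassOf")
        # Top-level nodes have no subClassOf link, so their parent is NULL.
        parent_id = parent_el.get(f"{{{RDF}}}resource") if parent_el is not None else None
        rows.append((node_id, label, parent_id))
    return rows


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE interest_node ("
    "node_id TEXT PRIMARY KEY, label TEXT NOT NULL, parent_id TEXT)"
)
conn.executemany("INSERT INTO interest_node VALUES (?, ?, ?)", extract_rows(RDF_XML))

rows = conn.execute(
    "SELECT label, parent_id FROM interest_node ORDER BY label"
).fetchall()
print(rows)
```

Storing the parent as a nullable column keeps the single-parent hierarchy queryable with ordinary SQL joins, which is one plausible reason to flatten the graph for downstream consumers.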

Pinterest generates and develops the taxonomy incrementally: each version builds on the taxonomy from the previous iteration. When a new version is created, Pinterest supports four operations to keep the taxonomy relevant and high in quality: adding a new node, renaming an existing node, deleting a node, and merging two or more nodes into one.
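The four curation operations can be sketched on a simple child-to-parent map. Everything here is a hypothetical illustration: the function names are invented, and the choices about what happens to children (reattaching them to the deleted node's parent, or moving them under the merge target) are assumptions, since the article does not specify them.

```python
from typing import Dict, List, Optional

Parents = Dict[str, Optional[str]]  # node name -> parent name (None for a vertical)


def ancestors(tax: Parents, name: str) -> List[str]:
    out = []
    parent = tax[name]
    while parent is not None:
        out.append(parent)
        parent = tax[parent]
    return out


def add_node(tax: Parents, name: str, parent: Optional[str]) -> None:
    if name in tax:
        raise ValueError(f"duplicate node: {name}")
    if parent is not None and parent not in tax:
        raise KeyError(f"unknown parent: {parent}")
    tax[name] = parent


def rename_node(tax: Parents, old: str, new: str) -> None:
    if new in tax:
        raise ValueError(f"duplicate node: {new}")
    tax[new] = tax.pop(old)
    for child, parent in tax.items():
        if parent == old:
            tax[child] = new


def delete_node(tax: Parents, name: str) -> None:
    """Assumption: children of the deleted node move up to its parent."""
    parent = tax.pop(name)
    for child, current in tax.items():
        if current == name:
            tax[child] = parent


def merge_nodes(tax: Parents, sources: List[str], target: str) -> None:
    """Assumption: children of each merged node move under the target."""
    if any(source in ancestors(tax, target) for source in sources):
        raise ValueError("cannot merge an ancestor into its own descendant")
    for source in sources:
        if source == target:
            continue
        del tax[source]
        for child, current in tax.items():
            if current == source:
                tax[child] = target


# The new version starts as a copy of the previous one, then edits are applied.
v1: Parents = {
    "Food and Drink": None,
    "Desserts": "Food and Drink",
    "Cake": "Desserts",
    "Cakes": "Desserts",
    "Layer Cakes": "Cake",
}
v2 = dict(v1)
add_node(v2, "Cookies", "Desserts")
rename_node(v2, "Desserts", "Dessert Recipes")
merge_nodes(v2, ["Cake"], "Cakes")
delete_node(v2, "Cookies")
print(v2)
```

Copying the previous version before editing keeps the earlier iteration intact, so the two versions can be compared after curation.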

In the future, Pinterest plans to update the Interest Taxonomy and the downstream signals (P2I, U2I, B2I, and Q2I) regularly and automatically. It also plans to work toward automatically building new types of relationships among entities in the taxonomy and associating attributes.

Click here to read the original article published by Pinterest.
