The rise of dominant ﬁrms in data driven industries is often credited to their alleged data advantage. Empirical evidence lending support to this conjecture is surprisingly scarce. In this paper we document that data as an input into machine learning tasks display features that support the claim of data being a source of market power. We study how data on keywords improve the search result quality on Yahoo!. Search result quality increases when more users search a keyword. In addition to this direct network effect caused by more users, we observe a novel externality that is caused by the amount of data that the search engine collects on the particular users. More data on the personal search histories of the users reinforce the direct network effect stemming from the number of users searching the same keyword. Our ﬁndings imply that a search engine with access to longer user histories may improve the quality of its search results faster than an otherwise equally efficient rival with the same size of user base but access to shorter user histories.
Keywords: Competition, network effects, search engines, Big Data