Microsoft open-sources a crucial algorithm behind its Bing Search services

Microsoft today announced that it has open-sourced a key piece of the technology that lets its Bing search services return results quickly. By opening up this technology, the company hopes developers will be able to build similar experiences in other domains where users search through vast troves of data, including retail. In this age of abundant data, chances are developers will find plenty of other enterprise and consumer use cases, too.

The piece of software the company open-sourced today is a library Microsoft developed to make better use of all the data it has collected and the AI models it has built for Bing.

“Only a few years ago, web search was simple. Users typed a few words and waded through pages of results,” the company notes in today’s announcement. “Today, those same users may instead snap a picture on a phone and drop it into a search box or use an intelligent assistant to ask a question without physically touching a device at all. They may also type a question and expect an actual reply, not a list of pages with likely answers.”

With the Space Partition Tree and Graph (SPTAG) algorithm that is at the core of the open-sourced Python library, Microsoft is able to search through billions of pieces of information in milliseconds.

Vector search itself isn’t a new idea, of course. What Microsoft has done, though, is apply this concept to working with deep learning models. First, the team uses a pre-trained model to encode its data into vectors, where every vector represents a word or pixel. Using the new SPTAG library, it then generates a vector index. As queries come in, the deep learning model translates the query text or image into a vector and the library finds the most closely related vectors in that index.
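To make the encode, index and query steps concrete, here is a minimal sketch in Python. It is not SPTAG and does not use its API: the `embed` function is a hypothetical stand-in for a deep learning model (it only counts letters), and the search is a brute-force cosine-similarity scan rather than the tree-and-graph index the library builds. The sample documents are invented for illustration.

```python
import math


def embed(text):
    """Hypothetical stand-in for a deep learning encoder: a 26-dim letter-count vector."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_index(documents):
    """Encode every document once, up front, and keep the vectors alongside the text."""
    return [(doc, embed(doc)) for doc in documents]


def search(index, query, k=2):
    """Encode the query, then return the k documents whose vectors are most similar."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


# Invented sample data, e.g. a tiny retail catalog.
documents = ["red running shoes", "blue denim jacket", "trail running shoe"]
index = build_index(documents)
results = search(index, "running shoes", k=2)
print(results)
```

A brute-force scan like this compares the query against every stored vector, which is fine for three documents but not for billions. That gap is what approximate nearest-neighbor indexes such as the one SPTAG generates are meant to close.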

“With Bing search, the vectorizing effort has extended to over 150 billion pieces of data indexed by the search engine to bring improvement over traditional keyword matching,” Microsoft says. “These include single words, characters, web page snippets, full queries and other media. Once a user searches, Bing can scan the indexed vectors and deliver the best match.”

The library is now available under the MIT license and provides all of the tools to build and search these distributed vector indexes. You can find more details about how to get started with using this library — as well as application samples — here.

TurboTax and H&R Block hide their free tax filing tools from Google on purpose

Low-income Americans can file their taxes for free, but odds are they ended up paying anyway.

ProPublica found that tax-filing giant Intuit is deliberately concealing search results for its free filing service, instead pointing all consumers toward its paid products. While users visiting TurboTax’s homepage will be greeted with what looks like free tax software, the software’s parent company usually finds a way to charge anyone using the product. The manipulative design choice echoes recent conversation around dark pattern design and likely explains why free filing services remain underutilized.

Intuit’s true free filing software is called TurboTax Free File. Compared to the company’s main TurboTax portal, TurboTax Free File is much more difficult to find. That service, designed to make the process free for low-income filers individually making less than $34,000 a year, is part of an agreement between tax-filing companies and the IRS stipulating that a free option must be provided for lower-income filers. In the course of reporting, ProPublica found that Intuit competitor H&R Block uses the same tactic to bury its own free service, H&R Block Free File.

To effectively bury its free filing service, TurboTax included a snippet of code in the page’s robots.txt file instructing search engines not to index it. The code was spotted by Twitter user Larissa Williams and Redditor ethan1el.
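For readers unfamiliar with robots.txt, the sketch below shows how a well-behaved crawler reads such a file, using Python's standard `urllib.robotparser` module. The file contents, the `example.com` URLs, the `/free-file/` and `/paid/` paths and the `ExampleBot` user agent are all hypothetical and are not taken from Intuit's site. Strictly speaking, a `Disallow` rule asks crawlers not to fetch a path, which in practice usually keeps the page's content out of search results.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /free-file/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler checks each URL against the rules before fetching it.
free_allowed = parser.can_fetch("ExampleBot", "https://example.com/free-file/")
paid_allowed = parser.can_fetch("ExampleBot", "https://example.com/paid/")

print("free page crawlable:", free_allowed)
print("paid page crawlable:", paid_allowed)
```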


Instead of pointing users toward its free file tool, TurboTax funnels the vast majority of users toward its paid and premium services, whether they qualify for free filing or not. The Senate Finance Committee’s top Democrat Ron Wyden denounced the tactic as “outrageous” in a statement to ProPublica, indicating that he intended to bring up the issue with the IRS.
