Mitigating Unauthorized Scraping Alliance Newsletter

MUSA Monthly Newsletter

Issue 2 | February 14, 2023

Welcome to the Mitigating Unauthorized Scraping Alliance newsletter, where we highlight topics of interest related to unauthorized data scraping. Unauthorized data scraping involves the automated collection of user data at scale that violates a platform's Terms of Service.

Featured Articles

Safeguarding User Data: Building a United Front Against Unauthorized Scraping:

In a guest blog post for techUK, MUSA offers an explainer on unauthorized scraping and why it is an industry-wide problem. The post discusses why the unauthorized scraping industry has grown over the past decade and examines the motivations behind unauthorized scraping and its negative impacts. It also highlights the importance of industry-wide collaboration to combat the issue.

Building the Conversation Around Unauthorized Scraping:

The Mitigating Unauthorized Scraping Alliance hosted a public event on January 23, 2023, entitled “The State of Unauthorized Scraping and its Impacts on Users and Industry” in observance of International Data Privacy Day. The event featured perspectives from leading academic, legal, and industry representatives who discussed the impacts of unauthorized scraping on users and industry as well as the legal and regulatory landscape. Watch a full recording of the event here.

Industry & Scraping In the News

Twitter curbs researcher access, sparking backlash in Washington:

Twitter has decided to restrict free access to its Application Programming Interfaces (APIs) used by researchers to collect and analyze data on the platform. The company stated its intent to charge for access to its APIs, but did not specify the cost. Academics and lawmakers criticized the decision, arguing that it would decrease transparency on the platform and create a barrier to access for journalists and academics. Read more on The Washington Post

ChatGPT's Data-Scraping Model Under Scrutiny From Privacy Experts:

The new chatbot ChatGPT has raised privacy concerns over its use of scraping to aggregate data for its training models. The article outlines the concerns surrounding the potential sale of publicly available user data, the inability of AI models to identify inaccurate information when scraping data for training purposes, and the potential for global regulatory challenges. Read more on Info Security Magazine

Meta, which pays for web scraping, sues to stop web scraping:

In a recent court filing by Bright Data, the company alleges that Meta Platforms Inc. paid a contractor to scrape data from other websites despite publicly condemning the practice. A Meta spokesperson confirmed that Meta had paid Bright Data to gather information from e-commerce sites to build brand profiles, but ended the relationship after it learned Bright Data violated the company’s Terms of Service. Meta has been an outspoken critic of unauthorized scraping and recently filed a lawsuit against US-based Voyager Labs. Read more on The Register

Web Scraper Software Market to reach a market value of USD 1.73 Billion by 2030, growing at a 13.48% CAGR according to Market Research Future:

The web scraper software market continues to grow and is projected to reach a market value of USD 1.73 billion by 2030. The in-depth report by Market Research Future highlights the reliance on scraping tools for big data collection and outlines the major players in the industry. Read more on Digital Journal

Duolingo investigating dark web post offering data from 2.6 million accounts:

Duolingo, the language learning platform, announced that it is investigating a post on a hacking forum offering information from 2.6 million customer accounts. Duolingo has stated that the information being offered, which included emails, phone numbers, and courses taken, was “obtained by data scraping public profile information.” The poster explained that the information was obtained by scraping an exposed API. The article highlights that scraping has become a widespread problem for many tech companies and platforms. Read more on The Record

ChatGPT Stole Your Work. So What Are You Going to Do?:

AI companies are increasingly reliant on user data to train machine learning models. Much of the innovation in AI technology over the past few years has been focused on how to improve user data collection methods. This article from Wired explains how users can leverage their position as data generators to combat exploitative collection methods. Read more on Wired

Legislation, Regulation, & Court Cases In the News

South Carolina must face NAACP suit over ban on court data scraping:

On January 11, 2023, a South Carolina court ruled that the South Carolina State Conference of the NAACP can move forward with its lawsuit challenging the state high court’s prohibition on the automated data collection of online court records. The suit, filed after the NAACP claimed that the South Carolina Supreme Court prevented the organization from collecting data on evictions, alleges that the court’s prohibition on data scraping violates the First Amendment’s right of access to information. Read more on Courthouse News

Getty Images sues AI art generator Stable Diffusion in the US for copyright infringement:

Getty Images has filed a case against Stability AI for unlawfully copying and processing millions of images protected by copyright to train its AI model without permission or compensation. Getty Images alleges that Stability AI infringed on both the company’s copyright and trademark protections and did not seek permission to collect data for artificial intelligence training systems despite the existence of viable licensing options. It is expected that the copyright infringement arguments in the lawsuit will rely on the interpretation of the US fair use doctrine, which protects certain unlicensed uses of copyrighted work. Read more on The Verge

Lawsuits over Stability AI's Stable Diffusion could threaten the future of AI-generated art:

Stability AI is facing two lawsuits over its AI image-generating product, Stable Diffusion, which trains its systems through data scraping. Stability AI received backlash over its data collection methods, especially from artists who say they were neither asked nor compensated for the rights to use their art in this way. The outcome of the lawsuits against Stability AI from a group of artists and Getty Images has tremendous implications for the future of generative AI and the interpretation of fair use. Read more on Business Insider

Surveillance Company Voyager Labs Sued by Meta for Data Scraping, Use of Fake Accounts:

Meta Platforms Inc. has sued Voyager Labs for violating the company’s Terms of Service. Meta seeks a permanent injunction against Voyager Labs’ proprietary software, which allegedly collects user data without consent and permission. Voyager Labs used fake accounts to collect information from Facebook profiles including profile information, photos, friend lists, and comment history. Read more on CPO Magazine

Small banks, fintechs ask CFPB for more time to phase out screen scraping:

A new Consumer Financial Protection Bureau (CFPB) proposal aims to eliminate screen scraping by requiring banks to establish APIs, which are regarded as a more secure way to connect consumers’ accounts to financial apps. Small banks, credit unions, and fintech companies are requesting that the CFPB provide enough time to make the transition because a sudden ban would disadvantage companies with fewer resources. The proposal seeks to provide users with greater control over their financial data. Read more on Banking Dive

Albright Scratches Patent Claims In Website Data Spat:

A Texas district court recently ruled that MDSave, a Tennessee healthcare e-commerce website, could proceed with a lawsuit against Sesame, Inc. over a potential Lanham Act violation, alleging that Sesame scraped data about Texas residents on its platform without permission. The judge dismissed a concurrent patent infringement claim. Read more on Law360

About MUSA

The Mitigating Unauthorized Scraping Alliance (MUSA) brings together leading companies committed to protecting data from unauthorized scraping and misuse. In collaboration with industry members, policymakers, and the public, MUSA is generating a global dialogue around unauthorized data scraping focused on protecting user data through education, advocacy, public-private partnerships, and the sharing of reasonable practices to mitigate unauthorized scraping.
