StreamLineCrypto.com

AI training dataset used by tech giants allegedly created by scraping YouTube videos in violation of terms

July 16, 2024 (updated July 17, 2024)

Non-profit AI research group EleutherAI scraped YouTube subtitles to create a dataset in violation of YouTube’s terms of service, ProofNews reported on July 16.

The dataset, known as the Pile, allegedly includes subtitles from 173,536 YouTube videos across more than 48,000 channels. About 12,000 of those videos have since been deleted from YouTube.

Several tech and AI companies, including Anthropic, have since used the Pile for training. Anthropic spokesperson Jennifer Martinez said the dataset includes “a very small subset of YouTube subtitles” but declined to comment on possible violations of YouTube’s terms of service.

Enterprise software firm Salesforce also used the dataset. Salesforce VP of AI research Caiming Xiong said the dataset was “publicly available” and that Salesforce used it for academic and research purposes. ProofNews said Salesforce ultimately released the same dataset publicly.

Apple used the Pile to train OpenELM, an efficient language model for on-device AI. Nvidia, Bloomberg, and Databricks also used the Pile for AI training.

ProofNews said its list of companies that used the dataset is not comprehensive, as companies do not always disclose which datasets they use in AI training.

Dataset contains crypto channels, more

ProofNews’ search tool indicates that the Pile includes videos from crypto channels and creators, including Coinbase, Cointelegraph, Bitcoin Magazine, BitBoy Crypto, 99Bitcoins, Ivan On Tech, and Andreas Antonopoulos.

ProofNews highlighted that the dataset includes transcripts from major news channels, education channels, late-night shows, popular YouTube hosts, and other categories. The Pile dataset extends beyond YouTube to other websites and online content.

ProofNews noted an earlier report from The New York Times, which said OpenAI and Google had previously harvested YouTube text. Google, which owns YouTube, said the action was permissible due to its agreement with users. OpenAI did not confirm or deny the report.

AI copyright disputes are far-reaching. Law firm BakerHostetler lists at least fifteen lawsuits involving tech firms such as Anthropic, Meta, GitHub, Stability AI, Nvidia, and Google. OpenAI faces high-profile lawsuits from Mother Jones’ parent company and The New York Times.
