Influence

Published on Nov 18, 2015

PRESENTATION OUTLINE

INFLUENCE

TRACKING THE ZEITGEIST
Photo by ep_jhu

OLD IMPLEMENTATION

Photo by AMANITO

PIPELINE

  • Poll for tweets
  • Extract URLs + scrape pages
  • Extract main content + topics
  • Calculate TF-IDF scores
  • Save 99th percentile
Photo by wili_hybrid
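
The scoring and filtering steps of the pipeline above can be sketched in plain Scala. This is a minimal illustration, not the original implementation: the object name `TfIdfSketch`, the `Doc` alias, the nearest-rank percentile method, and the choice of natural log for IDF are all assumptions made for the example. Polling, scraping, and topic extraction are taken as already done, so each document is just a sequence of terms.

```scala
object TfIdfSketch {
  // Assumption: a document is the sequence of topic terms already extracted from a page.
  type Doc = Seq[String]

  // Term frequency: occurrences of each term divided by the document length.
  def termFrequencies(doc: Doc): Map[String, Double] =
    doc.groupBy(identity).map { case (term, occ) => term -> occ.size.toDouble / doc.size }

  // Inverse document frequency: log(N / number of documents containing the term).
  def inverseDocFrequencies(docs: Seq[Doc]): Map[String, Double] = {
    val n = docs.size.toDouble
    docs.flatMap(_.distinct)
      .groupBy(identity)
      .map { case (term, hits) => term -> math.log(n / hits.size) }
  }

  // TF-IDF score for every (document index, term) pair.
  def tfIdf(docs: Seq[Doc]): Seq[(Int, String, Double)] = {
    val idf = inverseDocFrequencies(docs)
    for {
      (doc, i) <- docs.zipWithIndex
      (term, tf) <- termFrequencies(doc).toSeq
    } yield (i, term, tf * idf(term))
  }

  // Keep only the scores at or above the p-th percentile (nearest-rank method).
  def topPercentile(scores: Seq[(Int, String, Double)], p: Double): Seq[(Int, String, Double)] =
    if (scores.isEmpty) scores
    else {
      val sorted = scores.map(_._3).sorted
      val rank = math.max(1, math.ceil(p / 100.0 * sorted.size).toInt)
      val cutoff = sorted(rank - 1)
      scores.filter(_._3 >= cutoff)
    }
}
```

Calling `TfIdfSketch.topPercentile(TfIdfSketch.tfIdf(docs), 99.0)` keeps only the highest-scoring terms, which mirrors the "save 99th percentile" step. A term that appears in every document gets an IDF of zero and is dropped unless every score ties.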

IMPLEMENTATION

  • Used green threads to parallelize
  • Used celery to scale
  • Processed roughly 7-12 tweets/s
Photo by Entropyer

WHY NOT PRODUCTIONIZE?

Photo by TunnelBug

INSUFFICIENCIES

  • Too slow for complex analysis
  • Twitter specific
  • Resource intensive
  • Requires custom IO primitives
Photo by VinothChandar

INCOMPATIBILITIES

  • Poor Celery support for gevent
  • No multiprocessing within a worker
  • No exactly-once processing guarantee
Photo by timtom.ch

SCALA TO THE RESCUE

NATIVE CONCURRENCY

  • Futures
  • Lightweight threading
  • Lazy async evaluation
Photo by yewenyi
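
As a rough illustration of the Futures point, the sketch below chains two asynchronous steps and fans out over many URLs using only the Scala standard library. The functions `scrape` and `extractTopics` are hypothetical stand-ins that return fake data; they are not the project's scraper or topic extractor, and the global execution context is used only to keep the example short.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureSketch {
  // Hypothetical stand-in for fetching and scraping a page; returns fake content.
  def scrape(url: String): Future[String] = Future { s"content of $url" }

  // Hypothetical stand-in for topic extraction: the distinct lower-cased words.
  def extractTopics(content: String): Future[Set[String]] =
    Future { content.toLowerCase.split("\\s+").toSet }

  // Chain the two steps without blocking a thread between them.
  def topicsFor(url: String): Future[Set[String]] =
    scrape(url).flatMap(extractTopics)

  // Process many URLs concurrently and collect the results in input order.
  def topicsForAll(urls: Seq[String]): Future[Seq[Set[String]]] =
    Future.sequence(urls.map(topicsFor))
}
```

A caller would typically keep composing on the returned `Future`; blocking with `Await.result(FutureSketch.topicsFor(url), 5.seconds)` is only reasonable at the edge of a program or in a test.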

AKKA ACTORS

  • Message passing
  • Green threading
  • 50M messages / s
  • 2.5M actors / GB of heap
Photo by geezaweezer

FASTER PROTOTYPING

  • Types + powerful DSL
  • More concise code
  • Most of the time went into learning Scala

NOT ALL GOOD

  • Version fragmentation
  • Only recently popular
  • Centrally controlled
Photo by ndmpsy

WHERE WE'RE AT

  • Re-implemented RecSys Core
  • Connected to Twitter Streaming API
  • Live update of TF-IDF rankings
Photo by mrtopp

ARCHITECTURE

Photo by marcp_dmoz

KEY PARTS

  • API written with Play
  • Akka worker pool
  • Spark cluster for TF-IDF

BENEFITS

  • Horizontally scalable on all levels
  • No duplicated data
  • Highly re-usable
Photo by kevin dooley

IMPROVEMENTS

  • Kafka queue
  • Stronger decoupling
Photo by Daniele Zanni

WHAT'S LEFT?

FOR PRODUCTION

  • Calculating metrics per campaign
  • Writing microservice API
  • IDF calculated from global baselines
  • Configuration management

FOR QUALITY

  • Windowed ranking calculations
  • Store raw tweets for reprocessing
  • Handling >5000 Twitter handles
  • Better keyword recommendations
  • Allow users to get trends from searches
Photo by nataliej
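
One possible reading of "windowed ranking calculations" is to rank terms using only mentions inside a recent time window. The sketch below assumes that reading and, to stay short, ranks by raw mention count rather than TF-IDF. The `Mention` type, the millisecond timestamps, and the tie-breaking rule are all illustrative assumptions, not part of the system described in these slides.

```scala
object WindowedRankingSketch {
  // Assumption: each extracted term is recorded with the time it was seen.
  final case class Mention(term: String, timestampMs: Long)

  // Rank terms by how often they were mentioned in the window (nowMs - windowMs, nowMs].
  def rank(mentions: Seq[Mention], nowMs: Long, windowMs: Long, topN: Int): Seq[(String, Int)] =
    mentions
      .filter(m => m.timestampMs > nowMs - windowMs && m.timestampMs <= nowMs)
      .groupBy(_.term)
      .map { case (term, ms) => term -> ms.size }
      .toSeq
      .sortBy { case (term, count) => (-count, term) } // most frequent first, ties by name
      .take(topN)
}
```

Recomputing from the full mention list on every call is the simplest approach; a production version would more likely keep incremental counts and expire old mentions as the window moves.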

THE FUTURE!

  • Tracking brand audience
  • Tracking global trend activity
Photo by Brad Higham

DEMO

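import akka.actor.Props
import twitter4j.Status

// Wires the Twitter feeder to the scraper: each incoming status that
// contains a URL is forwarded for topic extraction.
class SimpleCoordinator extends SubscribableActor {
  val feeder = context.system.actorOf(TwitterFeederActor.props())
  val scraper = context.system.actorOf(Props[ScraperActor])

  feeder ! AddUser("Newscred_Devs_P")
  feeder ! SubscribeReceiver(self) // subscribe this actor to the feeder's output

  receiveBuilder += {
    case SendStatus(status: Status) =>
      // Skip tweets without links instead of failing on an empty array.
      status.getURLEntities.headOption.foreach { entity =>
        scraper ! ExtractTopics(entity.getURL, scraper)
      }
  }
}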
Photo by jurvetson