Topic Extraction

Published on Mar 16, 2016

PRESENTATION OUTLINE

TOPICS

Photo by bookfinch

Current Process

  • Topics are hand-made sets of keywords
  • Articles containing one of a topic's keywords are assigned that topic
Photo by Jacob Joaquin
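The current keyword-matching process can be sketched roughly as follows. This is a minimal illustration, not the production code: the topic names, their keywords, and the simple lowercase word tokenization are all assumptions.

```python
import re

# Hypothetical hand-made topics: each topic is a set of keywords.
TOPICS = {
    "elections": {"ballot", "primary", "candidate"},
    "weather": {"storm", "forecast", "rainfall"},
}

def tokenize(text):
    """Lowercase the text and split it into words (a simplifying assumption)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def assign_topics(article_text, topics=TOPICS):
    """Assign every topic that shares at least one keyword with the article."""
    words = tokenize(article_text)
    return sorted(name for name, keywords in topics.items() if words & keywords)

print(assign_topics("The candidate spoke before the storm hit."))
# ['elections', 'weather']
```

Because a single keyword hit is enough to assign a topic, an article about a "primary school" would still be tagged "elections" here, which shows one way keyword matching can produce low tagging accuracy.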

BENEFITS

  • Allows topics to be refined online
  • Human-added topics have high fidelity

DRAWBACKS

  • Can't apply ML techniques to improve topics
  • Tied to keyword matching model
  • Low tagging accuracy
Photo by michael.heiss

The Options

Alchemy

BENEFITS

  • Handles all of our current classification tasks
  • Provides sentiment analysis
Photo by Muffet

DRAWBACKS

  • No full ontology available
  • Little control over how concepts / keywords are extracted
Photo by Theen ...

Watson

BENEFITS

  • Very accurate search
  • Handles keyword expansion and other tagging
  • Improves without developer effort
Photo by Martin LaBar

DRAWBACKS

  • Hosted document storage
  • No control over search
  • Entity / topic extraction costs extra
Photo by chrisotruro

The Solution

Photo by Trace Nietert

  • Build it with Alchemy
  • Transition to Watson

To Prototyping!

PLATFORM CHANGES

  • Add ontology information
  • Add topic extraction + creation to aggregator
  • Add support for category creation
  • Handle concepts
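The platform changes could be sketched as an aggregator step that turns extracted concepts into topics and files each one under an ontology category. Everything here is a hypothetical illustration, not the deck's design: the `Ontology` and `Aggregator` classes, the `(concept, relevance)` input shape, the 0.5 relevance cut-off, and the "Uncategorized" fallback category are assumptions. No extraction API is called; the concept list stands in for whatever the extractor returns.

```python
from dataclasses import dataclass, field

@dataclass
class Ontology:
    """Hypothetical ontology: maps a concept to a category path."""
    concept_to_category: dict = field(default_factory=dict)
    categories: set = field(default_factory=set)

    def category_for(self, concept):
        # Returns None when the ontology has no entry for this concept.
        return self.concept_to_category.get(concept)


@dataclass
class Aggregator:
    """Hypothetical aggregator step that creates topics from extracted concepts."""
    ontology: Ontology
    min_relevance: float = 0.5  # assumed cut-off for keeping a concept
    topics: dict = field(default_factory=dict)  # topic name -> category path

    def ingest(self, concepts):
        """Return the topic names assigned to one article.

        `concepts` is a list of (text, relevance) pairs, an assumed shape for
        the extractor's output. Concepts below the cut-off are dropped; each
        kept concept becomes a topic (created on first sight) filed under its
        ontology category, or under "Uncategorized" when none is mapped.
        """
        assigned = []
        for text, relevance in concepts:
            if relevance < self.min_relevance:
                continue
            category = self.ontology.category_for(text) or "Uncategorized"
            self.ontology.categories.add(category)  # category creation
            self.topics.setdefault(text, category)  # topic creation
            assigned.append(text)
        return assigned


ontology = Ontology({"Climate change": "Science/Environment"})
agg = Aggregator(ontology)
print(agg.ingest([("Climate change", 0.92), ("Paris", 0.31), ("Renewable energy", 0.74)]))
# ['Climate change', 'Renewable energy']
print(agg.topics)
# {'Climate change': 'Science/Environment', 'Renewable energy': 'Uncategorized'}
```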

SOLR CHANGES

  • Use nested documents to support topic metadata
  • Use block join syntax to search
  • Restrict topic core to NC ontology
Photo by rishibando
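The two Solr changes might look roughly like this. It is a minimal sketch that only builds the update payload and the query string; the field names (`doc_type`, `topic_name`, `category`) are hypothetical, and it assumes Solr's JSON update format with the `_childDocuments_` key for nested documents. The block join parent query parser (`{!parent which=...}`) returns parent articles whose child topic documents match, and its `which` filter must match every parent document and no child documents. Nested documents also require the schema to define the `_root_` field.

```python
import json

# Hypothetical field names: "doc_type" separates parents from children,
# "topic_name" and "category" carry topic metadata on the child documents.
def article_with_topics(article_id, title, topics):
    """Build a parent article with one nested child document per topic,
    using the _childDocuments_ key of Solr's JSON update format."""
    return {
        "id": article_id,
        "doc_type": "article",
        "title": title,
        "_childDocuments_": [
            {
                "id": f"{article_id}-topic-{i}",
                "doc_type": "topic",
                "topic_name": name,
                "category": category,
            }
            for i, (name, category) in enumerate(topics)
        ],
    }

def articles_tagged_with(topic_name):
    """Block join parent query: match articles through their child topics.
    The `which` filter must match all parent documents and no children."""
    escaped = topic_name.replace('"', '\\"')
    return f'{{!parent which="doc_type:article"}}topic_name:"{escaped}"'

doc = article_with_topics("a1", "Storm season", [("Hurricanes", "Science/Weather")])
print(json.dumps([doc], indent=2))  # JSON body for an update request to the core
print(articles_tagged_with("Hurricanes"))
# {!parent which="doc_type:article"}topic_name:"Hurricanes"
```

Keeping topic metadata in child documents lets one article carry several topics, each with its own category, without flattening them into parallel multi-valued fields on the parent.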