
DiscoverText & Sifter: Text Analytics for Social Data

When: Tuesday, January 17, 2017 (Three Repeating Sessions)

Where:  Session A: 10-11:30 AM – Green Library Bing Wing, Room 121A

                Session B: 1:30-2:50 PM – McClatchy Hall

                Session C: 3:15-4:45 PM – Green Library Bing Wing, Room 121A

Presenter: Dr. Stuart W. Shulman

Please RSVP to reserve a seat: http://web.stanford.edu/group/iriss/iriss-forms/discovertext.fb

Participate in this workshop to learn how to build custom machine classifiers for sifting social media data. The topics covered include how to:

  • construct precise social data fetch queries,
  • use Boolean search on resulting archives,
  • filter on metadata or other project attributes,
  • count and set aside duplicates, cluster near-duplicates,
  • crowdsource human coding,
  • measure inter-rater reliability,
  • adjudicate coder disagreements, and
  • build high-quality word sense and topic disambiguation engines.
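
As a rough illustration of the inter-rater reliability topic listed above, the sketch below computes Cohen's kappa for two coders who labeled the same items. It is not taken from DiscoverText; the function name cohens_kappa and the sample labels are assumptions made for this example, and the workshop may use a different agreement measure.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders labeling the same items (hypothetical helper)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both coders must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items where both coders chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each coder's label proportions, summed over labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical label; treat as perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up labels for four items, for illustration only.
coder_a = ["relevant", "relevant", "spam", "spam"]
coder_b = ["relevant", "spam", "spam", "spam"]
print(cohens_kappa(coder_a, coder_b))
```

Here the coders agree on three of four items (0.75), while chance agreement is 0.5, so kappa corrects the raw agreement rate downward.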

DiscoverText collects and cleans up messy Twitter and other text data streams. The workshop covers how to reach and substantiate inferences using a theoretical and applied model informed by a decade of interdisciplinary, NSF-funded research into the text classification problem. Participants will learn how to apply “CoderRank” in machine learning. The central idea of the workshop is that, when training machines for text analysis, researchers should rely on input from the humans most likely to create a valid observation.

Bio

Dr. Stuart W. Shulman is founder & CEO of Texifter. He was a Research Associate Professor of Political Science at the University of Massachusetts Amherst and the founding Director of the Qualitative Data Analysis Program (QDAP) at the University of Pittsburgh and at UMass Amherst. Dr. Shulman is Editor Emeritus of the Journal of Information Technology & Politics, the official journal of the Information Technology & Politics section of the American Political Science Association.