U_Cite: American politician network analysis based on Quotebank
Final project for CS-401 Applied Data Analysis
Lecturer: Prof. Robert West
Quotebank is a dataset of 235 million unique, speaker-attributed quotations that were extracted from 196 million English news articles crawled from over 377 thousand English web domains. This project analyzes the mentions found in Quotebank quotations between 2015 and 2020 to reveal the bipolar political landscape of America. Key points of the project:
- data cleaning and preprocessing pipeline for the original Quotebank quotations, a Wikidata dump and the Partisan Audience Bias Scores dataset.
- political mention analysis pipeline including topic, sentiment and bias analysis.
- political network analysis pipeline including network construction, community/centrality analysis and edge/node feature detection.
- visualisation pipeline for the analysis above with interactive network graphs.
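
To make the network construction and centrality steps listed above more concrete, here is a minimal sketch using only the Python standard library. It is not the project's actual code: the function names `build_mention_network` and `in_degree_centrality`, the `(speaker, mentioned)` record shape, and the toy records are all assumptions made for illustration, and the real pipeline may rely on a graph library instead.

```python
from collections import Counter, defaultdict


def build_mention_network(mentions):
    """Aggregate (speaker, mentioned) pairs into weighted directed edges.

    Hypothetical helper: the edge weight is the number of quotations in
    which `speaker` mentions `mentioned`.
    """
    edges = Counter()
    for speaker, mentioned in mentions:
        if speaker != mentioned:  # ignore self-mentions
            edges[(speaker, mentioned)] += 1
    return edges


def in_degree_centrality(edges):
    """Share of the other nodes that mention each node at least once."""
    nodes = {node for edge in edges for node in edge}
    incoming = defaultdict(set)
    for speaker, mentioned in edges:
        incoming[mentioned].add(speaker)
    denominator = max(len(nodes) - 1, 1)  # avoid division by zero
    return {node: len(incoming[node]) / denominator for node in nodes}


# Placeholder records for illustration only, not taken from Quotebank.
toy_mentions = [
    ("politician_a", "politician_b"),
    ("politician_a", "politician_b"),
    ("politician_c", "politician_b"),
    ("politician_b", "politician_a"),
]

network = build_mention_network(toy_mentions)
centrality = in_degree_centrality(network)

# Print nodes from most to least mentioned.
for node, score in sorted(centrality.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.2f}")
```

Keeping edge weights separate from the centrality score means repeated mentions between the same pair strengthen an edge without inflating how many distinct politicians mention a node.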