Wednesday, May 25, 2016

Computational Journalism (Spring Week 7 (late))

Future of computational journalism

(Used the site AMERICAN JOURNALISM REVIEW as primary data source)

http://ajr.org/2014/10/24/using-drones-to-make-3d-models/
http://compute-cuj.org/cj-2014/cj2014_session5_paper1.pdf
Journalists are mashing up drones with GPS-equipped cameras to automatically create 3D models of newsworthy structures.

http://ajr.org/2014/10/24/tool-helps-journalists-track-source-false-twitter-rumors/
http://compute-cuj.org/cj-2014/cj2014_session2_paper2.pdf
Computer scientists are creating software programs to help journalists identify and correct false rumors spreading on Twitter.

http://ajr.org/2014/10/24/teaching-sensor-journalism/
Journalism students are using electronic sensors that monitor dust and noise to investigate construction sites.

http://ajr.org/2014/10/24/artificial-intelligence-tool-for-reporters/
http://compute-cuj.org/cj-2014/cj2014_session1_paper1.pdf
Journo-hackers are developing tools that use artificial intelligence to pull story ideas from big, complicated data sets.

Wednesday, May 11, 2016

Computational Journalism Research Project (Spring Week 6)

Polling and Data Gathering Methods


The Huffington Post:

They don’t so much poll themselves as report the results of every public poll that claims to provide a representative sample of the population or electorate. Their process is more about choosing which polls are worth including. To that end, every poll used has to meet the minimal disclosure requirements of the National Council on Public Polls, and they weed out polls that fail to disclose survey dates, sample size and sponsorship. In addition, they only include closed-ended trial heat poll questions, and will not include open-ended questions or ones that offer choices that won’t be on the ballot. In short, they have a thorough vetting process for their data gathering, and a high standard for what they use.
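A vetting process like that is easy to picture as a filter over poll records. Here’s a minimal sketch; the field names ("survey_dates", "sample_size", "sponsor") are my own illustration, not HuffPost’s actual schema.

```python
# Hypothetical sketch of a poll-vetting filter like the one described above.
# A poll is kept only if it discloses dates, sample size, and sponsorship.

REQUIRED_FIELDS = ("survey_dates", "sample_size", "sponsor")

def passes_vetting(poll):
    """Keep only polls that disclose all required fields."""
    return all(poll.get(field) is not None for field in REQUIRED_FIELDS)

polls = [
    {"survey_dates": "2016-05-01/05", "sample_size": 1000, "sponsor": "XYZ News"},
    {"survey_dates": None, "sample_size": 800, "sponsor": "ABC Poll"},  # dates undisclosed
]
vetted = [p for p in polls if passes_vetting(p)]
```

The second poll gets dropped because its survey dates are undisclosed, mirroring the weeding-out step described above.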


Politifact:

Political statements are rated on a case-by-case basis. They follow major political figures, collect statements made by them, and then start reviewing each statement’s factual accuracy. A writer researches the claim and drafts the Truth-O-Meter article with a recommended ruling. After the article is edited, it is reviewed by a panel of at least three editors that determines the final Truth-O-Meter ruling.


FiveThirtyEight:

Predictions are based not on polling data alone, but on a statistical model that also draws on demographic and past vote data. FiveThirtyEight is a poll aggregator site, which means it predicts upcoming elections by gathering and weighting the pre-election polls published by others.

FiveThirtyEight’s pollster ratings are calculated by analyzing the historical accuracy of each firm’s polls along with its methodology. Accuracy scores are adjusted for the type of election polled, a firm’s sample size, the performance of other polls surveying the same race, and other factors. They also calculate measures of statistical bias in the polls.


Real Clear Politics:

Aggregates polls for presidential and congressional races into averages, drawing from a wide range of sources. Honestly, it seems to just cast a wide net and take the average; I can’t find any more detailed explanation of the methodology.
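The difference between the two aggregation styles above is easy to see in miniature: RCP-style is a plain average, while a FiveThirtyEight-style average weights polls by things like pollster rating and sample size. The numbers and weighting formula below are made up for illustration; the real models are far more involved.

```python
# Toy contrast between a simple poll average and a weighted one.
polls = [
    {"result": 48.0, "sample_size": 1200, "pollster_rating": 0.9},
    {"result": 44.0, "sample_size": 500,  "pollster_rating": 0.6},
    {"result": 46.0, "sample_size": 900,  "pollster_rating": 0.8},
]

# Simple average: every poll counts equally.
simple = sum(p["result"] for p in polls) / len(polls)

# Weighted average: better-rated, larger-sample polls count more.
# (Weighting by rating times sqrt(sample size) is an illustrative choice.)
weights = [p["pollster_rating"] * p["sample_size"] ** 0.5 for p in polls]
weighted = sum(w * p["result"] for w, p in zip(weights, polls)) / sum(weights)
```

Here the weighted average lands closer to the well-rated, large-sample polls than the simple average does.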

Wednesday, May 4, 2016

Computational Journalism Research Project (Spring Week 5)

The Huffington Post - http://www.huffingtonpost.com/

A news/blog site, somewhat left-leaning in viewpoint; overall a good site with just a hair of clickbait-y flavor to it.

Their “Huffpost Pollster” section has a wealth of political polls with near-daily updates on many of them. An effective way to keep up to date on the numbers.


Politifact - http://www.politifact.com/

Quick checking of political statements and their accuracy. Very well put together; allows for quick and spontaneous fact checking of political statements. Seems politically neutral.


Five Thirty Eight - http://fivethirtyeight.com/

Lots and lots of opinion polls. Focuses on politics, economics, and sports. Has detailed interactive graphs and charts to display the polling information.

There’s a lot to work with here: they not only show the data but add notes on why that data is significant.


Real Clear Politics - http://www.realclearpolitics.com/

Shows a wide diversity of polls, and also shows how much a poll number has changed since the last check. Very easy to see trends in the data.

Wednesday, April 27, 2016

Computational Journalism Research Project (Spring Week 4)

How The Guardian is pioneering data journalism with free tools
http://www.niemanlab.org/2010/08/how-the-guardian-is-pioneering-data-journalism-with-free-tools/

The Guardian uses public, read-only Google Spreadsheets to share the data they’ve collected, which require no special tools for viewing and can be downloaded in just about any desired format. They post massive spreadsheets and data graphs for all to see and often just let the data speak for itself.
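Pulling one of those public spreadsheets into a script is straightforward, since Google exposes a standard CSV export route for shared sheets. A quick sketch (the sheet ID below is a placeholder, not a real Guardian dataset):

```python
import csv
import io
import urllib.request

def sheet_csv_url(sheet_id):
    """Build the standard CSV-export URL for a public Google Spreadsheet."""
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv"

def load_public_sheet(sheet_id):
    """Download a public sheet and parse it into a list of rows."""
    with urllib.request.urlopen(sheet_csv_url(sheet_id)) as resp:
        return list(csv.reader(io.TextIOWrapper(resp, encoding="utf-8")))

# Example with a placeholder ID -- substitute a real public sheet's ID.
url = sheet_csv_url("PLACEHOLDER_SHEET_ID")
```

No API keys or special tools needed, which is exactly why the “let the data speak for itself” approach travels so well.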

The popularity of this method shows that a lot of people want the raw information to speak for itself; it need not be dressed up to catch attention. The data displayed has often drawn good traffic, with the Data Blog logging a million hits a month during the recent election coverage.

This is an interesting view on how The Guardian displays data, and how data journalism is relevant and noteworthy. The Guardian will likely be a focus of research in the project.


Four crowdsourcing lessons from the Guardian’s (spectacular) expenses-scandal experiment
http://www.niemanlab.org/2009/06/four-crowdsourcing-lessons-from-the-guardians-spectacular-expenses-scandal-experiment/

The Guardian sifts through the massive amounts of data they deal with via tens of thousands of volunteers who are willing to help them. It’s a rather interesting case of crowdsourced data journalism.

The four points used to keep the system working are:
-Your workers are unpaid, so make it fun.
-Public attention is fickle, so launch immediately.
-Speed is mandatory, so use a framework.
-Participation will come in one big burst, so have servers ready.

The Guardian clearly has a good system in place, and again they clearly should be a focus of my future efforts.


Hacks and Hackers talk computational journalism
http://www.stanforddaily.com/2015/10/28/hacks-and-hackers-talk-computational-journalism/

This is about how we use technology to enhance news narratives. Computational journalism uses data to find interesting trends that generate stories and help complement them, such as through graphics. The meeting “H/H @Stanford: Computational journalism with CIR, Vocativ and SmartNews” goes over that subject.

Data visualization uses maps and graphs to display information about a subject and help the reader understand it, though it can involve generalizations and be misleading.

The SmartNews app chooses news recommendations for its users by using an “exploration” mode, choosing articles outside of the user’s preferences in order to enlarge their knowledge of the world. This type of model contrasts with the “exploitation” model, which only recommends articles within the user’s preferences and is the norm for most such systems.
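The exploration/exploitation trade-off described here can be sketched as a simple epsilon-greedy-style policy: mostly serve the user’s preferred topics, but occasionally pick outside them. The articles, topics, and `explore_rate` parameter below are my own illustration, not SmartNews’s actual system.

```python
import random

def recommend(articles, user_topics, explore_rate=0.2, rng=random):
    """Mostly pick from the user's preferred topics (exploitation),
    but with probability explore_rate pick outside them (exploration)."""
    preferred = [a for a in articles if a["topic"] in user_topics]
    other = [a for a in articles if a["topic"] not in user_topics]
    if other and (not preferred or rng.random() < explore_rate):
        return rng.choice(other)   # exploration: broaden the user's view
    return rng.choice(preferred)   # exploitation: stay in their comfort zone
```

Setting `explore_rate` to 0 gives the pure “exploitation” model the article says is the norm; raising it moves toward the SmartNews-style approach.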

Finally, the Center for Investigative Reporting (CIR) described work mining the data from the National Missing and Unidentified Persons System (NamUs) to make a more user-friendly website to help solve cold cases.

This put forth a lot of interesting concepts for what this tech can and does do. Not sure it’s all closely related enough to include later on, but it’s all noteworthy nonetheless. The solving of cold cases is especially notable; I need to look into that and related work later.


Is that a fact? Checking politicians' statements just got a whole lot easier
http://www.theguardian.com/commentisfree/2016/apr/19/is-that-a-fact-checking-politicians-statements-just-got-a-whole-lot-easier

ClaimBuster is a program that searches sentences for key words and structures that are commonly found in factual statements. It flagged a LOT of (Australian) political statements that rated as either non-true or otherwise disconnected from factual discussion.
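The core idea, flagging sentences whose wording looks like a checkable factual claim, can be sketched with a few regular expressions. The real ClaimBuster uses a trained classifier; the patterns and sample sentences below are purely illustrative.

```python
import re

# Patterns typical of checkable factual claims: numbers/percentages,
# change verbs, and superlative-plus-timeframe constructions.
CLAIM_PATTERNS = [
    r"\b\d+(\.\d+)?\s*(percent|%)",                  # "fell 2 percent"
    r"\b(increased|decreased|rose|fell|doubled)\b",  # change verbs
    r"\b(highest|lowest|most|least)\b.*\b(in|since)\b",
]

def looks_checkable(sentence):
    """Return True if the sentence matches any factual-claim pattern."""
    return any(re.search(p, sentence, re.IGNORECASE) for p in CLAIM_PATTERNS)

sentences = [
    "Unemployment fell 2 percent under this government.",
    "We believe in a fair go for all Australians.",
]
flagged = [s for s in sentences if looks_checkable(s)]
```

The first sentence gets flagged as check-worthy; the second, being pure rhetoric, slips through, which is exactly the kind of statement “disconnected from factual discussion” that the article describes.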


This has interesting implications for the future of fact-checking and the relationship between politics and data journalism. I wonder if we will ever reach the point where politicians can’t bullshit us anymore because their words will be fact-checked in real time as they say them... I can dream.

Thursday, February 18, 2016

Computational Journalism Research Project


Computational journalism can be defined as the application of computation to the activities of journalism such as information gathering, organization, sense-making, communication and dissemination of news information, while upholding values of journalism such as accuracy and verifiability.[1] The field draws on technical aspects of computer science including artificial intelligence, content analysis (NLP, vision, audition), visualization, personalization and recommender systems as well as aspects of social computing and information science.

Week 1-4
(Insert work I SHOULD have done here.)

Week 5
Preliminary data gathering.
Joined Stanford Computational Journalism Lab email newsletter
http://cacm.acm.org/magazines/2011/10/131400-computational-journalism/fulltext
http://news.stanford.edu/news/2015/march/hamilton-computational-journalism-031315.html

Week 6
Create reading list (and start reading):
-What should the digital public sphere do?, Jonathan Stray
-Computational Journalism, Cohen, Turner, Hamilton
-Precision Journalism, Ch.1, Journalism and the Scientific Tradition, Philip Meyer
-The Jobless rate for People Like You, New York Times
-Dollars for Docs, ProPublica
-What did private security contractors do in Iraq and document mining methodology, Jonathan Stray
-Message Machine, ProPublica

Week 7
Expand research into related fields:
Computational journalism, Database journalism, Computer-assisted reporting, Data-driven journalism



What computer science theory and/or subject matter will it illuminate?
Computational journalism is a rapidly growing field at the intersection of computer science and journalism. It is already an important field and will only become more so as time goes on. This research project will yield a better understanding of the concept and its intricacies.
What will you learn in accomplishing it?
I wish to learn about the development of compilers, and how automata theory relates to AI development.
What resources will you use?
I plan on mostly using scientific journals and research papers.

How will you know when you are done (i.e. what criteria should I use in judging the completion of the project)?

Week 8
Watch CJ lecture-
www.youtube.com/watch?v=pjlPyEkDKrA

Week 9
Reading list:
-Fundamentals of Computer Graphics, Third Edition (visualization chapter), Tamara Munzner
    http://www.cs.ubc.ca/labs/imager/tr/2009/VisChapter/akp-vischapter.pdf
-How The Guardian Is Pioneering Data Journalism with Free Tools
    www.niemanlab.org/2010/08/how-the-guardian-is-pioneering-data-journalism-with-free-tools/
-Preference Networks: Probabilistic Models for Recommendation Systems
    http://crpit.com/confpapers/CRPITV70Truyen.pdf

Week 10

-???