Basic Algorithmic Reasoning in Computer Science
Transcript of Basic Algorithmic Reasoning in Computer Science
Simple Reasoning Algorithms
Reporting algorithm on live news
Chris Reeves (10/16/2012)
@cjreeves2011
Goal: come to a consensus on whether an article is positive or negative, and on what the article is about (the “main idea”). This is a naïve approach, but it
shows the searchable results method for reasoning algorithms.
Obtain Data
• Scrape data using a Python script
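A minimal sketch of the parsing half of a scraping script, assuming the page HTML has already been downloaded (for example with `urllib.request.urlopen`). The names `TextExtractor` and `extract_text` are illustrative and do not come from the slides. It uses only the standard-library `html.parser` to keep visible text and skip scripts and styles.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring <script> and <style> contents."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

The extracted string is what the later parsing and scoring steps would consume.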
Algorithmic Method
Obtain Data
Parse Data
Event Actions
Determine an Action to Take
Return an Action
Action
Accuracy
Data Object Form
Action
• An action is a call to an action function that returns nothing (a void return type). An action is the most basic part of an algorithmic method.
• Actions can be grouped into classes, each class holding the actions relevant to it.
void action1(double accuracy) {
    // perform action
}
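The same idea can be sketched in Python, the language the slides name for scraping. This is an illustration under assumptions: the two action names and the 0.8 threshold are hypothetical, chosen only to show "determine an action to take" and "return an action" as separate steps.

```python
def publish_summary(accuracy: float) -> None:
    """Hypothetical action: act on a result the caller trusts."""
    print(f"publishing summary (accuracy {accuracy:.2f})")


def request_review(accuracy: float) -> None:
    """Hypothetical action: hand a low-accuracy result to a person."""
    print(f"requesting review (accuracy {accuracy:.2f})")


def choose_action(accuracy: float, threshold: float = 0.8):
    """Return the action function to call; the threshold is an assumption."""
    return publish_summary if accuracy >= threshold else request_review


# Determine an action, then perform it. The action itself returns nothing.
action = choose_action(0.65)
action(0.65)
```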
Accuracy
• Every call should return an accuracy with an action. Every action should take into account the accuracy of the computation.
• Each word shall be scored in a negative or positive manner as a naïve overview of the article.
Data Object Form
• A structure of objects to hold data associated with a set of data.
• Positive or negative article
• Main people, places, things
• Main idea of article
• References
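One possible shape for such a data object, sketched as a Python dataclass. The class name, field names, and the 0 to 100 sentiment scale with 50 as the midpoint are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ArticleSummary:
    """Hypothetical container for what the algorithm concludes about one article."""

    sentiment: float  # assumed scale: 0 (negative) to 100 (positive)
    accuracy: float   # confidence in the result, assumed to lie in 0..1
    entities: list = field(default_factory=list)    # main people, places, things
    main_idea: str = ""
    references: list = field(default_factory=list)

    @property
    def is_positive(self) -> bool:
        # Assumes 50 is the midpoint between negative and positive.
        return self.sentiment >= 50
```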
Stored Tables and Values
• Massive storage tables are needed to store a list of positive and negative words.
• There are roughly 171,476 words in current use in the English language.
• Each of them must be given a positive or negative weight for its effect on a paragraph.
• Words like sad, glad, upset, down, the, about, all have different positive or negative weights.
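A small sketch of how such a table could be used to score a paragraph. The weights below are made up for illustration, and averaging the weights of known words is only one naïve choice. The second return value, the fraction of words found in the table, is an assumed stand-in for accuracy.

```python
import re

# Hypothetical weights in [-1, 1]; a real table would cover far more words.
WORD_WEIGHTS = {
    "sad": -0.8,
    "upset": -0.7,
    "down": -0.3,
    "glad": 0.8,
    "the": 0.0,
    "about": 0.0,
}


def score_paragraph(text, weights=WORD_WEIGHTS):
    """Return (score, coverage).

    score is the mean weight of the words found in the table;
    coverage is the fraction of all words that were found.
    """
    words = re.findall(r"[a-z']+", text.lower())
    known = [weights[w] for w in words if w in weights]
    if not words or not known:
        return 0.0, 0.0
    return sum(known) / len(known), len(known) / len(words)
```

Neutral words such as "the" pull the mean toward zero in this sketch; dropping zero-weight words before averaging would be an equally reasonable variant.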
How to score words positive or negative?
• Read a lot of text and determine the average number of times a word is used.
• Manually define seed words that are negative, and likewise positive (20-50 should suffice).
• For any word N in the set of all words, search for N and scrape resulting articles and synonyms.
• Determine whether the sample negative and positive words appear disproportionately often in those articles.
How to score words positive or negative? (cont.)
• Record disproportionate results and rank them as a percentage variance from the mean.
• Dynamically add the top 5% of disproportionately frequent words to your manually defined positive and negative lists, respectively.
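The ranking step could look roughly like this. It is a sketch under assumptions: "variance from the mean" is read here as the percentage by which a word's frequency in topic articles deviates from its baseline frequency, and all function names are illustrative.

```python
import math
import re
from collections import Counter


def word_frequencies(texts):
    """Relative frequency of each word across a list of texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def percent_deviation(baseline_texts, topic_texts):
    """Percentage by which each word's frequency in topic_texts deviates
    from its baseline frequency. Words absent from the baseline are skipped."""
    base = word_frequencies(baseline_texts)
    topic = word_frequencies(topic_texts)
    return {
        word: 100.0 * (freq - base[word]) / base[word]
        for word, freq in topic.items()
        if word in base
    }


def top_fraction(deviations, fraction=0.05):
    """Words in the top `fraction` by deviation (at least one, if any exist)."""
    ranked = sorted(deviations, key=deviations.get, reverse=True)
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]
```

Words returned by `top_fraction` for articles found with a negative search term would be candidates for the negative list, and likewise for the positive list.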
Caution
• Do not rely only on specific keywords to determine the meaning of an article.
• Search for articles of meaning “happy”, “upset”, “good news”, “bad news” and let the algorithm set its own idea of “good” and “bad” after reading thousands of articles on the topic.
Caution (cont.)
• Example: if articles pertaining to “happy” happen to contain the word “automotive” at a disproportionate level, leave this data.
• It might not make sense, but with a large enough data set this is the best method.
• Search engines provide an effectively limitless supply of data to be parsed and used for calibrating intelligent systems.
Searchable Results Method
• The searchable results method is where you retrieve articles of a desired type with a search term, to guarantee articles with a known result.
• These articles are all positive or negative so you can see if your algorithm guesses correctly.
You are now ready to set accuracy.
• How do you determine if you were correct in rating an article for positivity or negativity?
Simply search for “good news” or “bad news” and let the algorithm rate each resulting article, usually from 0 to 100 on the good-bad scale.
The algorithm should, on average, return a number in the direction of the term that you searched.
If you search “bad news”, the algorithm should return very low numbers.
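A rough way to turn this check into a number, assuming ratings on a 0 to 100 scale with 50 as the midpoint. The function name and the "good"/"bad" labels are illustrative.

```python
def direction_accuracy(ratings_by_label, midpoint=50.0):
    """Fraction of ratings that fall on the expected side of the midpoint.

    ratings_by_label maps the expected label of a search ("good" or "bad")
    to the 0-100 ratings the algorithm gave the articles that search returned.
    """
    correct = total = 0
    for label, ratings in ratings_by_label.items():
        for rating in ratings:
            total += 1
            if (label == "good" and rating > midpoint) or (
                label == "bad" and rating < midpoint
            ):
                correct += 1
    return correct / total if total else 0.0


# Made-up ratings, for illustration only.
sample = {"good": [80, 65, 40], "bad": [10, 30, 55, 20]}
accuracy = direction_accuracy(sample)  # 5 of the 7 ratings point the right way
```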
Determine an Object Form
• Use language rules to determine topics with searchable results method.
• How often is the title the topic?
• How often does the topic appear in the text?
• Is the topic a person, place, thing, or idea?
Determine an Object Form (cont.)
• You can determine the topic with the searchable results method by searching “windmills”, then counting the number of results with windmills in the title.
• Now you know the percentage chance that the title is also the topic. You must have a well-defined format for topics, though.
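The title count can be sketched as follows, assuming the result titles for a search term have already been collected into a list. The function name and the sample titles are hypothetical.

```python
def title_match_rate(term, titles):
    """Percentage of result titles that contain the search term."""
    if not titles:
        return 0.0
    term = term.lower()
    hits = sum(1 for title in titles if term in title.lower())
    return 100.0 * hits / len(titles)


# Made-up titles standing in for search results.
titles = [
    "Windmills of Holland",
    "Renewable energy trends",
    "How windmills work",
    "Dutch travel guide",
]
rate = title_match_rate("windmills", titles)
```

A plain substring match is the simplest possible test; a well-defined topic format would let this compare normalized topics instead.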
Simple Reasoning Algorithm
• Raw data
• Positive and negative words list
• Search results
• Populate word list
• Match raw data against word list and determine positive or negative
• List of objects and parameters
• Determine main ideas
• Learn language from sample data
• Opinion of article/data
• Test raw data against conclusions from recorded data sample