InfoBright for Analyzing Social Sciences Data Julia Johnson Department of Mathematics and Computer Science, Laurentian University Genevieve Johnson Department of Psychology, MacEwan University CANADA

InfoBright for Analyzing Social Sciences Data

Julia Johnson
Department of Mathematics and Computer Science, Laurentian University

Genevieve Johnson
Department of Psychology, MacEwan University

CANADA

Social Sciences Data

• Quantitative vs Qualitative

Quantitative Data Analysis

When children use the Internet at home, sometimes they are alone and sometimes they are with others. Considering all the time that your child uses the Internet at home, please provide the approximate amount of time that your child uses it alone and with others (the total should equal 100%).

Alone or by themselves ________%
With an adult ________%
With another child ________%
Total 100%

Online Communities for those who Self-Injure

How often do you visit this message board?

Why do you use this board?

• I get support and advice, i can talk openly about problems including SI to people who I know will understand. I also find if I can be any help to anyone else it helps me see things more in proportion and make sense of my own difficulties.

• I use it as a distraction when I feel bad and I use it as a place to find social support.

• I like the philosophy. I also like that there is an established population of people "my age". The later is very important.

Software for Qualitative Data Analysis

Semantic Similarity Comparison Metrics: Previous work:

• Algorithms rely on WordNet lexical database.

• An interface is available that accepts words and gives a measure of their similarity.

• Various algorithms have been developed by researchers including Resnik, Lin, Jiang-Conrath, Leacock-Chodorow, Hirst-St.Onge, Wu-Palmer, Banerjee-Pedersen, and Patwardhan-Pedersen.
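
To make the idea of a taxonomy-based similarity measure concrete, here is a minimal sketch of the Wu-Palmer formula, 2 × depth(lcs) / (depth(a) + depth(b)), where lcs is the deepest common ancestor of the two words. It runs on a tiny hand-made is-a hierarchy, not on WordNet; the words in `PARENT` and the function names are illustrative assumptions only.

```python
# Toy is-a taxonomy (hypothetical, NOT WordNet): child -> parent.
PARENT = {
    "entity": None,
    "feeling": "entity",
    "act": "entity",
    "sadness": "feeling",
    "anger": "feeling",
    "support": "act",
    "advice": "act",
}


def path_to_root(word):
    """Return the chain [word, parent, ..., root]."""
    path = []
    while word is not None:
        path.append(word)
        word = PARENT[word]
    return path


def depth(word):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(word))


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_of_a = set(path_to_root(a))
    # Walking up from b, the first node shared with a is the deepest
    # common ancestor (least common subsumer) in a tree.
    lcs = next(node for node in path_to_root(b) if node in ancestors_of_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


if __name__ == "__main__":
    print(wu_palmer("sadness", "anger"))    # siblings under "feeling"
    print(wu_palmer("sadness", "support"))  # related only through the root
```

In this toy tree, siblings score higher than words that meet only at the root, which is the behavior the measure is designed to capture.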

Work of Chien & Immorlica:

• discover semantically similar queries of search engines

• based on the similarity in behavior of the queries over time.

• find temporally correlated input queries

• quantify relatedness between queries.

This work may be relevant for the relatedness of questions, but logically related qualitative responses may not be temporally correlated in the way that requests to a web server are.
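
One simple way to quantify temporal correlation between two queries is the Pearson correlation of their frequency series over time. The sketch below assumes each query is represented by a list of counts per time period; the counts are made up for illustration, and this is not necessarily the exact measure used by Chien & Immorlica.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no temporal signal
    return cov / sqrt(var_x * var_y)


# Hypothetical counts per day for three queries.
query_a = [10, 20, 30, 20, 10]
query_b = [5, 10, 15, 10, 5]     # rises and falls together with query_a
query_c = [30, 20, 10, 20, 30]   # moves in the opposite direction

if __name__ == "__main__":
    print(pearson(query_a, query_b))
    print(pearson(query_a, query_c))
```

A coefficient near 1 marks queries whose popularity moves together. Survey answers have no such time series, which is the limitation noted above.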

Existing Software for Analysis of Social Sciences Data

• To some extent, the coding of qualitative social sciences data has been improved by the development of software. Current code-based software includes content analysis tools, word frequencies, word indexing with key-word-in-context retrieval, and text-based searching tools [3]. Such software systems are, however, inadequate for making meaning of text. They amount to data management systems that require the user to reformulate the data in a preprocessing step. To avoid the subjectivity so introduced, a view based on rough sets was investigated.

Infobright Implementation

We investigate the usefulness of information generated as a result of decompression for evaluating the meaning of word-based responses to queries.

Observation

A correlation exists between the amount of time required to evaluate a query and the amount of exact computation required for the query.

Hypothesis

• Information generated as a result of decompression has semantic utility.

• There is a correlation between query speed and semantic relatedness of queries.

• Infobright Enterprise Edition was obtained on a special academic promotional offer.

• It was installed on an 8-core server with 8 GB of RAM running the Debian operating system (a Linux distribution).

• The Infobright MySQL database server was accessed using an SQL client running on Windows.

Schema Definition using Infobright server

Data to Populate Database

• ~001~ Nov. 13~ ~9:37pm~ ~bus ~

• ~ no answer From question 4, the answer can be supposed to be ‘34’ ~

• ~ Female ~

• ~ I started cutting completely by accident. I discovered it after a bad day when I was 5 and had broken a glass and accidently cut myself with the glass. I noticed that it helped release the intense emotions and made me feel more able to breathe. There were times when I would burn but that was an entirely different sensation than cutting and was rare.~

• ~ I started around 5 or 6 and I am 34 now.~
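
The sample record uses tildes as field delimiters. A minimal sketch of splitting such a line into fields before loading follows. The function name is hypothetical, and the approach assumes that whitespace-only pieces and slide bullets are not data, so a genuinely empty field would be dropped.

```python
def split_tilde_fields(line):
    """Split a tilde-delimited line into trimmed, non-empty fields.

    Splitting on every "~" tolerates a missing opening or closing tilde,
    at the cost of silently dropping fields that are truly empty.
    """
    pieces = (piece.strip() for piece in line.split("~"))
    # Discard blanks between delimiters and the slide's bullet characters.
    return [piece for piece in pieces if piece and piece != "•"]


# Abbreviated from the sample record shown on the slide.
sample = "~001~ Nov. 13~ ~9:37pm~ ~bus ~• ~ Female ~• ~ I started around 5 or 6 and I am 34 now.~"

if __name__ == "__main__":
    for field in split_tilde_fields(sample):
        print(repr(field))
```

Each resulting field would then map to one column of the table, with the long free-text answers kept whole.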

Self injury table to collect preliminary information and answers to questions

Streamlining Extensions with Existing Infobright Methodology

• The knowledge grid is useful for providing semantic information about whether a given text and one selected from a column are identical in meaning, not similar in meaning at all, or overlapping in meaning.

• An indicator of the degree of overlap is already partially provided by the statistics returned from Infobright (IB) upon each query evaluation.
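
The three-way distinction parallels the way Infobright's knowledge grid is generally described: pack-level statistics let a query treat each data pack as relevant, irrelevant, or suspect, and only suspect packs need exact computation. The sketch below illustrates that idea for a numeric column and the predicate `value > threshold`, assuming only per-pack minimum and maximum are known in advance. `Pack`, `classify_pack`, and `count_greater` are hypothetical names, not Infobright API.

```python
from dataclasses import dataclass


@dataclass
class Pack:
    """A block of column values; in a real engine these are stored compressed."""
    values: list

    @property
    def lo(self):
        return min(self.values)  # stand-in for a precomputed pack minimum

    @property
    def hi(self):
        return max(self.values)  # stand-in for a precomputed pack maximum


def classify_pack(pack, threshold):
    """Classify a pack for the predicate value > threshold using min/max only."""
    if pack.lo > threshold:
        return "relevant"    # every row satisfies the predicate
    if pack.hi <= threshold:
        return "irrelevant"  # no row can satisfy it
    return "suspect"         # some rows may; exact computation is needed


def count_greater(packs, threshold):
    """Count matching rows; return (count, number of packs inspected exactly)."""
    total = 0
    inspected = 0
    for pack in packs:
        status = classify_pack(pack, threshold)
        if status == "relevant":
            total += len(pack.values)
        elif status == "suspect":
            inspected += 1  # only here are the individual values examined
            total += sum(1 for v in pack.values if v > threshold)
    return total, inspected


if __name__ == "__main__":
    packs = [Pack([1, 2, 3]), Pack([4, 5, 6]), Pack([7, 8, 9])]
    print([classify_pack(p, 4) for p in packs])
    print(count_greater(packs, 4))
```

The second return value is the kind of statistic that could serve as a rough indicator of overlap: the more packs that need exact inspection, the less the summary information alone could decide.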

Proposed Extensions to Infobright

• Associating with the results of relational expressions whether the relationship is irrelevant, suspect or relevant.

• In the case of an exact computation (suspect data packs), a measure of the amount of overlap. Column metrics rather than row metrics.

• For the HAVING and GROUP BY clauses in queries, a column-oriented feature extraction and classification process that clusters texts within a given column by placing those requiring a high degree of exact computation (semantically similar ones) all in one cluster.

• Expansion of the statistics about query speed to include other information available during selective decompression to provide a more accurate indicator of the degree of overlap.

• A facility provided whereby an Infobright database may be overlaid by a MySQL schema for specification and enforcement of key constraints and referential integrity constraints (open to debate).
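
As a rough illustration of the proposed clustering, the sketch below groups texts by how much exact computation they required. It assumes the engine could report, for each text, the status of every data pack touched while evaluating it. The input format, the 0.5 cut-off, and both function names are hypothetical, and nothing here reflects an existing Infobright feature.

```python
def overlap_degree(pack_statuses):
    """Fraction of packs marked 'suspect', i.e. needing exact computation."""
    if not pack_statuses:
        return 0.0
    suspect = sum(1 for status in pack_statuses if status == "suspect")
    return suspect / len(pack_statuses)


def cluster_by_overlap(statuses_by_text, threshold=0.5):
    """Place texts with a high degree of exact computation in one cluster."""
    clusters = {"high": [], "low": []}
    for text_id, statuses in statuses_by_text.items():
        key = "high" if overlap_degree(statuses) >= threshold else "low"
        clusters[key].append(text_id)
    return clusters


if __name__ == "__main__":
    # Hypothetical per-text pack statuses for three responses.
    observed = {
        "r1": ["suspect", "suspect", "relevant"],
        "r2": ["irrelevant", "irrelevant"],
        "r3": ["suspect", "irrelevant"],
    }
    print(cluster_by_overlap(observed))
```

A two-way split is the simplest possible choice. A real implementation would presumably use richer decompression statistics, as the last two extensions suggest.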

Summary and Conclusions

• Support for textual column values that tell a story.

• The ability to distinguish texts from a given column that require no decompression from those in the same column that require some.

• Support for the query language to cluster texts from a given column based on the degree of decompression required to materialize them.

• Provision of additional parameters regarding the amount of decompression done.

• Expandable column names to reveal the full text of questions and expandable column entries to reveal the full text of answers given by respondents.