
Collaborative Filtering and Rules for Music Object Rating and Selection

Sifter Project Meeting

Michelle Anderson
Marcel Ball
Harold Boley
Nancy Howse
Daniel Lemire

NRC IIT, Fredericton, NB, Canada

June 19th, 2003 (Revised June 18th)

How to implement industry standards for existing Sifter subprojects: RALOCA, COFI Music?

Currently, several industry standards are in place to facilitate the description, search, storage, etc. of Learning Objects.*

An LO can be expressed as an entity with content surrounded by an outer shell of descriptive tags (metadata).

* Learning Objects can be composed of multimedia content (images, video, sound), instructional content, learning objectives, or a combination of these different formats.

Learning Objects: Metadata components*

LO

General

Meta-Metadata

Technical

Life Cycle

Educational

Rights

Relation

Annotation

Classification

*Based on the SCORM Meta-data Information Model

Where do RALOCA and COFI Music come in?

If these systems (or a single entity combining them) can interpret relevant meta-information about an LO based on current standards, providing interoperability, then these LOs can be sifted, weighted, or compared.

RALOCA / COFI Music

- SCORM
- CanCore
- RSS-LOM
- IMS

LO repository

COFI Music

by

Nancy E. Howse

Collaborative Filtering Systems (COFI)

• Collects ratings from a number of users

• Recommends items to the current user based on correlations between that user's ratings and the ratings of other users in the database

Multi-Dimensional Ratings

Some Algorithms

• Average – O(1)

• Per Item Average – O(1)

• Pearson – O(m)

• Where ‘m’ is the number of users.
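To illustrate the cost difference between these schemes, here is a minimal Python sketch of a per-item-average prediction and the Pearson correlation between two raters. The actual COFI implementation is in Java; the data and function names below are hypothetical.

```python
from math import sqrt

def per_item_average(ratings, item):
    """Predict an item's rating as the mean of all users' ratings for it.

    With a running per-item average maintained incrementally, the
    prediction is O(1); the direct recomputation below is O(m) in the
    number m of users.
    """
    vals = [r[item] for r in ratings.values() if item in r]
    return sum(vals) / len(vals)

def pearson(u, v):
    """Pearson correlation between two users' ratings on shared items (O(m) overall
    when correlating the current user against all m users)."""
    shared = [i for i in u if i in v]
    n = len(shared)
    mu = sum(u[i] for i in shared) / n
    mv = sum(v[i] for i in shared) / n
    num = sum((u[i] - mu) * (v[i] - mv) for i in shared)
    den = sqrt(sum((u[i] - mu) ** 2 for i in shared) *
               sum((v[i] - mv) ** 2 for i in shared))
    return num / den if den else 0.0

# Hypothetical ratings: user -> {item: rating on the 0-10 scale}
ratings = {
    "alice": {"a": 8, "b": 6, "c": 4},
    "bob":   {"a": 9, "b": 7, "c": 5},
    "carol": {"a": 2, "b": 6, "c": 10},
}
print(per_item_average(ratings, "a"))                    # mean of 8, 9, 2
print(round(pearson(ratings["alice"], ratings["bob"]), 3))  # 1.0 (perfect agreement)
```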

Some Admin Features…

• Add/remove items

• Remove users

• View a list of users and the number of items they have rated

• View the ratings of a user

RALOCA

Rule-Applying Learning-Object

Comparison Agent

Marcel A. Ball
National Research Council
Institute for Information Technology, e-Business

RALOCA

• RALOCA is a rule-based system for multi-dimensional comparison of learning objects (currently, music albums) based on jDREW Bottom-up (BU) with data represented in Object-Oriented RuleML.

• Part of Sifter Mosaic/NRC e-Learning project

The functionality of RALOCA

• COFI provides RALOCA with a table of predictions (summarized ratings)

• RALOCA uses a rule-based approach to combine the multi-dimensional predictions from COFI into a one-dimensional ranking of the items (objects)

RALOCA Architecture


Interfacing with COFI Music

• RALOCA builds on top of collaborative filtering technology from the COFI Music project (Nancy Howse) for ratings of the LOs

• Currently, data is exchanged between RALOCA and COFI Music using Java serialization

• Code is currently in place to use the per-item average algorithm

• We will use more advanced collaborative filtering algorithms, which will lead to better predictions

LO RuleML Representation

<fact>
  <_head>
    <atom>
      <_opr><rel>product</rel></_opr>
      <_r n="asin"><ind>B00004YTYO</ind></_r>
      <_r n="title"><ind>Between the Bridges</ind></_r>
      <_r n="artist"><ind>Sloan</ind></_r>
      <_r n="cost"><ind>15.99</ind></_r>
      <_r n="lyrics"><ind>6</ind></_r>
      <_r n="originality"><ind>8</ind></_r>
      <_r n="performance"><ind>6</ind></_r>
      . . .
    </atom>
  </_head>
</fact>

Modification Rules

Modification rules allow the system to dynamically change values for the dimensions of an LO, based on information about the LOs and the user profile.

Example: There is a 5% discount for students buying products costing over $20.00.

The modify relation has four roles:

- amount - in our example this is '%-5'
- variable - we want to change the 'cost'
- product - a variable that will hold the asin of the LO
- comment

<imp>
  <_head>
    <atom>
      <_opr><rel>modify</rel></_opr>
      <_r n="amount"><ind>%-5</ind></_r>
      <_r n="comment"><ind>5% discount for students</ind></_r>
      <_r n="variable"><ind>cost</ind></_r>
      <_r n="product"><var>ASIN</var></_r>
    </atom>
  </_head>
  <_body>
    <and>
      <atom>
        <_opr><rel>isstudent</rel></_opr>
        <ind>yes</ind>
      </atom>
      <atom>
        <_opr><rel>product</rel></_opr>
        <_r n="asin"><var>ASIN</var></_r>
        <_r n="cost"><var>COST</var></_r>
      </atom>
      <atom>
        <_opr><rel>$gt</rel></_opr>
        <var>COST</var>
        <ind>20</ind>
        <ind>true</ind>
      </atom>
    </and>
  </_body>
</imp>

Is the user a student?

Is the cost greater-than 20?

Retrieve asin and cost of the LO
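Read procedurally, the rule above says: if the user is a student and the product's cost exceeds 20, decrease the cost dimension by 5%. A minimal Python sketch of that behaviour follows; the project itself runs on jDREW in Java, so the rule encoding and names here are hypothetical illustrations only.

```python
def apply_modifications(product, user, rules):
    """Apply modification rules (cf. the 'modify' relation above) to a
    product's dimension values. Each rule carries an amount such as
    '%-5' (a percent change) or an absolute delta, the dimension
    ('variable') to change, and a condition on (product, user).
    """
    product = dict(product)  # never mutate the caller's LO
    for rule in rules:
        if not rule["condition"](product, user):
            continue
        amount, dim = rule["amount"], rule["variable"]
        value = product[dim]
        if amount.startswith("%"):
            value *= 1 + float(amount[1:]) / 100.0  # '%-5' -> multiply by 0.95
        else:
            value += float(amount)
        product[dim] = round(value, 2)
    return product

# Hypothetical encoding of the rule shown in RuleML above.
student_discount = {
    "amount": "%-5",
    "variable": "cost",
    "comment": "5% discount for students",
    "condition": lambda p, u: u["isstudent"] and p["cost"] > 20,
}

album = {"asin": "B00004YTYO", "cost": 24.00}
print(apply_modifications(album, {"isstudent": True}, [student_discount]))
# cost becomes 24.00 * 0.95 = 22.80
```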

XML Representation of n-Dimensional Object Ratings

• Ratings of (music, film, …) objects will be on a scale from 0 to 10

• COFI’s n-dimensional ratings of a given music object with some asin code can be represented in OO RuleML as ‘complex term’ (cterm) elements:

– A rating value v becomes marked up as <ind>v</ind>.
– Each v-rated dimension d becomes <_r n="d"><ind>v</ind></_r>.

• There is one rating cterm for every music object and every rater: one row from the COFI prediction table

Two Sample Ratings

For example, for object asinXYZ let us illustrate 3 dimensions, "lyrics", "originality", and "performance", as rated by 2 raters:

<cterm uriref="asinXYZ">
  <_opc><ctor>rating</ctor></_opc>
  <_r n="lyrics"><ind>3</ind></_r>
  <_r n="originality"><ind>8</ind></_r>
  <_r n="performance"><ind>6</ind></_r>
</cterm>

<cterm uriref="asinXYZ">
  <_opc><ctor>rating</ctor></_opc>
  <_r n="lyrics"><ind>7</ind></_r>
  <_r n="originality"><ind>8</ind></_r>
  <_r n="performance"><ind>4</ind></_r>
</cterm>

[Each cterm was also shown as a tree: a rating constructor with branches for the lyrics, originality, and performance roles and their <ind> values.]

COFI: Generating a Summarized Rating

These ratings can act as a 'training set' of typical instances, and a weighted representation can be inferred, e.g. using data-mining / collaborative-filtering techniques: in this example just the arithmetic means, using the standard deviations to determine the significance (w) of the ratings.

<cterm uriref="asinXYZ">
  <_opc><ctor>rating</ctor></_opc>
  <_r n="lyrics" w="0.2"><ind>5</ind></_r>
  <_r n="originality" w="0.7"><ind>8</ind></_r>
  <_r n="performance" w="0.1"><ind>5</ind></_r>
</cterm>

The weights, w, on a scale from 0.0 to 1.0, reflect the raters' agreement in each of the dimensions (weights add up to 1.0).

<_r n="lyrics">
  <cterm>
    <_opc><ctor>rating</ctor></_opc>
    <_r n="value"><ind>6</ind></_r>
    <_r n="stddev"><ind>2.4</ind></_r>
  </cterm>
</_r>
<_r n="music">
  <cterm>
    <_opc><ctor>rating</ctor></_opc>
    <_r n="value"><ind>8</ind></_r>
    <_r n="stddev"><ind>3.8</ind></_r>
  </cterm>
</_r>
<_r n="performance">
  <cterm>
    <_opc><ctor>rating</ctor></_opc>
    <_r n="value"><ind>3</ind></_r>
    <_r n="stddev"><ind>1.6</ind></_r>
  </cterm>
</_r>

. . .
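One way to realize this scheme is sketched below: each dimension's summarized value is the arithmetic mean, and its weight grows with rater agreement, i.e. shrinks as the standard deviation grows. The slides do not specify COFI's exact weighting formula, so the inverse-spread weighting here is an assumption for illustration.

```python
from statistics import mean, pstdev

def summarize(ratings_by_dim):
    """Combine several raters' scores per dimension into one summarized
    rating: the arithmetic mean as the value, plus weights derived from
    the standard deviations (low deviation = high agreement = high
    weight), normalized to sum to 1.0. One possible reading of the
    scheme above, not COFI's exact formula.
    """
    values = {d: mean(v) for d, v in ratings_by_dim.items()}
    # Agreement score: invert the spread so tighter dimensions weigh more.
    agreement = {d: 1.0 / (1.0 + pstdev(v)) for d, v in ratings_by_dim.items()}
    total = sum(agreement.values())
    weights = {d: round(a / total, 2) for d, a in agreement.items()}
    return values, weights

# The two sample raters from the previous slide.
values, weights = summarize({
    "lyrics":      [3, 7],
    "originality": [8, 8],
    "performance": [6, 4],
})
print(values)   # means: lyrics 5, originality 8, performance 5
print(weights)  # originality gets the largest weight (perfect agreement)
```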

Ranking by Standard Deviation

<fact>
  <_head>
    <atom>
      <_opr><rel>product</rel></_opr>
      <_r n="asin"><ind>B00004YTYO</ind></_r>
      <_r n="title"><ind>Between the Bridges</ind></_r>
      . . .
      <_r n="publishYear"><ind/></_r>
      <_r n="numtracks"><ind>13</ind></_r>
      <_r n="genre"><ind>Traditional</ind></_r>
      <_r n="label"><ind>BMG</ind></_r>
    </atom>
  </_head>
</fact>

RALOCA: Retrieval Patterns

A retrieval pattern can now be used to find a subset of ranked instances from (a 'test set' of) many instances, based on the summarized rating.

For example, a user might specify desired (minimum) "lyrics", "originality", and "performance" ratings along with their weights.

<cterm uriref="asinXYZ">
  <_opc><ctor>rating</ctor></_opc>
  <_r n="lyrics" w="0.4"><ind>8</ind></_r>
  <_r n="originality" w="0.3"><ind>7</ind></_r>
  <_r n="performance" w="0.3"><ind>6</ind></_r>
</cterm>
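Conceptually, applying such a retrieval pattern amounts to filtering items by the minimum values and ranking the survivors by a weighted score. A small illustrative sketch (the item data and names are hypothetical, and this is not the RALOCA rule engine itself):

```python
def matches(pattern, rating):
    """Check a summarized rating against a retrieval pattern of desired
    minimum values per dimension."""
    return all(rating.get(d, 0) >= v for d, (v, w) in pattern.items())

def score(pattern, rating):
    """Weighted score of a rating under the pattern's weights."""
    return sum(w * rating.get(d, 0) for d, (v, w) in pattern.items())

# Desired minimums and weights, as in the retrieval pattern above.
pattern = {"lyrics": (8, 0.4), "originality": (7, 0.3), "performance": (6, 0.3)}

candidates = {
    "asinXYZ": {"lyrics": 8, "originality": 8, "performance": 6},
    "asinABC": {"lyrics": 9, "originality": 5, "performance": 9},  # fails a minimum
}
ranked = sorted((a for a, r in candidates.items() if matches(pattern, r)),
                key=lambda a: score(pattern, candidates[a]), reverse=True)
print(ranked)  # only asinXYZ meets every minimum
```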

XSLT: OO RuleML to "Song Rating XML"

We can use XSLT to transform generic OO RuleML into a domain-specific positional format ("Song Rating XML") for the rating of music objects:

<cterm uriref="asinXYZ/userid">
  <_opc><ctor>rating</ctor></_opc>
  <_r n="lyrics"><ind>6</ind></_r>
  <_r n="originality"><ind>9</ind></_r>
  <_r n="performance"><ind>6</ind></_r>
</cterm>

<=== XSLT ===>

<rating song="asinXYZ" user="userid">
  <lyrics>6</lyrics>
  <originality>9</originality>
  <performance>6</performance>
</rating>

OO RuleML to Positional RuleML Translators

• XSLT Transformations

• 3-Step Process

• Similar to Unix pipes

[Diagram: in the OO RuleML representation, the signature (database schema / template) is applied in turn to each <fact> implementation by the three stylesheets that follow.]

applysig.xsl

<fact>
  <_head>
    <atom>
      <_opr><rel>product</rel></_opr>
      <_r n="title"><ind>Between the Bridges</ind></_r>
      <_r n="artist"><ind>Sloan</ind></_r>
      <_r n="asin"><ind>B00004YTYO</ind></_r>
      <_r n="publishYear"><ind>1997</ind></_r>
      <_r n="numtracks"><ind>13</ind></_r>
      <_r n="genre"><ind>Traditional</ind></_r>
      <_r n="cost"><ind>14.99</ind></_r>
      <_r n="label"><ind>BMG</ind></_r>
    </atom>
  </_head>
</fact>

The signature is applied to atoms with rel = product and fills in missing roles. All order is lost.

<fact>
  <_head>
    <atom>
      <_opr><rel>product</rel></_opr>
      <_r n="label"><ind>BMG</ind></_r>
      …
      <_r n="publishYear"><ind/></_r>
      <_r n="quality"><ind/></_r>
      <_r n="originality"><ind/></_r>
      <_r n="performance"><ind/></_r>
      <_r n="music"><ind/></_r>
      <_r n="lyrics"><ind/></_r>
      …
      <_r n="asin"><ind>B00004YTYO</ind></_r>
    </atom>
  </_head>
</fact>

<signature … order="sorted">
  <_head>
    <atom>
      <_opr><rel>product</rel></_opr>
      <_r n="asin"><ind/></_r>
      <_r n="title"><ind/></_r>
      <_r n="artist"><ind/></_r>
      <_r n="cost"><ind/></_r>
      <_r n="lyrics"><ind/></_r>
      <_r n="music"><ind/></_r>
      <_r n="performance"><ind/></_r>
      <_r n="originality"><ind/></_r>
      <_r n="quality"><ind/></_r>
      <_r n="publishYear"><ind/></_r>
      <_r n="numtracks"><ind/></_r>
      <_r n="genre"><ind/></_r>
      <_r n="label"><ind/></_r>
    </atom>
  </_head>
</signature>

The order of the signature is applied to the atom when order = "sorted". Otherwise, _r's are sorted by the n attribute.

nprmlsort.xsl

<fact>
  <_head>
    <atom>
      <_opr><rel>product</rel></_opr>
      <_r n="label"><ind>BMG</ind></_r>
      …
      <_r n="publishYear"><ind/></_r>
      <_r n="quality"><ind/></_r>
      <_r n="originality"><ind/></_r>
      <_r n="performance"><ind/></_r>
      <_r n="music"><ind/></_r>
      <_r n="lyrics"><ind/></_r>
      …
      <_r n="asin"><ind>B00004YTYO</ind></_r>
    </atom>
  </_head>
</fact>

<fact>
  <_head>
    <atom>
      <_opr><rel>product</rel></_opr>
      <_r n="asin"><ind> … </ind></_r>
      <_r n="title"><ind> … </ind></_r>
      <_r n="artist"><ind> … </ind></_r>
      <_r n="cost"><ind> … </ind></_r>
      <_r n="lyrics"><ind/></_r>
      <_r n="music"><ind/></_r>
      <_r n="performance"><ind/></_r>
      <_r n="originality"><ind/></_r>
      <_r n="quality"><ind/></_r>
      <_r n="publishYear"><ind> … </ind></_r>
      <_r n="numtracks"><ind> … </ind></_r>
      <_r n="genre"><ind> … </ind></_r>
      <_r n="label"><ind> … </ind></_r>
    </atom>
  </_head>
</fact>

<signature … order="sorted">
  <_head>
    <atom>
      <_opr><rel>product</rel></_opr>
      <_r n="asin"><ind/></_r>
      <_r n="title"><ind/></_r>
      <_r n="artist"><ind/></_r>
      <_r n="cost"><ind/></_r>
      <_r n="lyrics"><ind/></_r>
      <_r n="music"><ind/></_r>
      <_r n="performance"><ind/></_r>
      <_r n="originality"><ind/></_r>
      <_r n="quality"><ind/></_r>
      <_r n="publishYear"><ind/></_r>
      <_r n="numtracks"><ind/></_r>
      <_r n="genre"><ind/></_r>
      <_r n="label"><ind/></_r>
    </atom>
  </_head>
</signature>

oorml2prml.xsl

<fact>
  <_head>
    <atom>
      <_opr><rel>product</rel></_opr>
      <_r n="asin"><ind> … </ind></_r>
      <_r n="title"><ind> … </ind></_r>
      <_r n="artist"><ind> … </ind></_r>
      <_r n="cost"><ind> … </ind></_r>
      <_r n="lyrics"><ind/></_r>
      <_r n="music"><ind/></_r>
      <_r n="performance"><ind/></_r>
      <_r n="originality"><ind/></_r>
      <_r n="quality"><ind/></_r>
      <_r n="publishYear"><ind> … </ind></_r>
      <_r n="numtracks"><ind> … </ind></_r>
      <_r n="genre"><ind> … </ind></_r>
      <_r n="label"><ind> … </ind></_r>
    </atom>
  </_head>
</fact>

<signature … order="sorted">
  <_head>
    <atom>
      <_opr><rel>product</rel></_opr>
      <_r n="asin"><ind/></_r>
      <_r n="title"><ind/></_r>
      <_r n="artist"><ind/></_r>
      <_r n="cost"><ind/></_r>
      <_r n="lyrics"><ind/></_r>
      <_r n="music"><ind/></_r>
      <_r n="performance"><ind/></_r>
      <_r n="originality"><ind/></_r>
      <_r n="quality"><ind/></_r>
      <_r n="publishYear"><ind/></_r>
      <_r n="numtracks"><ind/></_r>
      <_r n="genre"><ind/></_r>
      <_r n="label"><ind/></_r>
    </atom>
  </_head>
</signature>

<signature … order="sorted">
  <_head>
    <atom>
      <_opr><rel>product</rel></_opr>
      <ind/> <ind/> <ind/> <ind/> <ind/> <ind/> <ind/>
      <ind/> <ind/> <ind/> <ind/> <ind/> <ind/>
    </atom>
  </_head>
</signature>

Metaroles (_r) are removed, leaving a positionalized version of each atom.

<fact>
  <_head>
    <atom>
      <_opr><rel>product</rel></_opr>
      <ind>B00004YTYO</ind>
      <ind>Between the Bridges</ind>
      <ind>Sloan</ind>
      <ind>14.99</ind>
      <ind/> <ind/> <ind/> <ind/> <ind/> <ind/>
      <ind>13</ind>
      <ind>Traditional</ind>
      <ind>BMG</ind>
    </atom>
  </_head>
</fact>
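The net effect of the three stylesheets can be summarized in a few lines: read the atom's roles in the signature's order and emit one positional slot per role, empty where the atom supplies no value. An illustrative Python sketch of the result, not a replacement for the XSLT:

```python
def positionalize(atom_roles, signature_order):
    """Turn a role-filled atom into a positional argument list by reading
    its roles in the signature's order; missing roles become empty
    slots (cf. the empty <ind/> elements above).
    """
    return [atom_roles.get(role, "") for role in signature_order]

# Role order from the signature shown above.
signature = ["asin", "title", "artist", "cost", "lyrics", "music",
             "performance", "originality", "quality", "publishYear",
             "numtracks", "genre", "label"]

# The atom's roles, in arbitrary order, as strings.
atom = {"label": "BMG", "asin": "B00004YTYO", "title": "Between the Bridges",
        "artist": "Sloan", "cost": "14.99", "numtracks": "13",
        "genre": "Traditional"}

print(positionalize(atom, signature))
# positional row: known values in signature order, "" for missing roles
```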

Relational Database

• Table 1: Ratings (ItemID, UserID, Dimension1, …)

• Table 2: Users (UserID, UserName, Password, …)

• Table 3: Comments (ItemID, UserID, Date, Comment, …)

• Table 4: Item (ItemID, Title, Author, …)

Free Text

• We will start collecting and displaying comments for two reasons:
  - add more content to our sites
  - allow further research by Anna Maclachlan and others

Conclusion

• COFI and RALOCA are specific to the music domain they describe, but they can easily be adapted to describe various other e-Learning domains: movies, etc.

• This could be implemented to add an advanced rating / search feature to existing data-collecting systems.

Extra Slides

Learning Objects: Industry Standardization

IEEE-LOM provides structured descriptions of re-usable digital learning resources.

RSS-LOM Module (translation)

RSS 1.0 allows learning object repositories to syndicate listings and descriptions of learning objects.

[Table (RSS-LOM-Eval): comparison of LO and FEED fields, including Date, Author, Technical Format, and Unique Identifier (registry agency identifier number and time in milliseconds).]

Completing Missing Dimensions

Taking the "performance" rating from the collaborative pattern, this will be expanded into the final retrieval pattern:

<cterm uriref="asinXYZ">
  <_opc><ctor>rating</ctor></_opc>
  <_r n="lyrics"><ind>8</ind></_r>
  <_r n="originality"><ind>7</ind></_r>
  <_r n="performance"><ind>5</ind></_r>
</cterm>

Possibly the weights can also be taken from the collaborative pattern (so omitting a dimension would not mean it has weight 0.0):

<cterm uriref="asinXYZ">
  <_opc><ctor>rating</ctor></_opc>
  <_r n="lyrics" w="0.6"><ind>8</ind></_r>
  <_r n="originality" w="0.1"><ind>7</ind></_r>
  <_r n="performance" w="0.3"><ind>5</ind></_r>
</cterm>

Scoring Rules

The system uses a RuleML file to calculate the score of an LO. The only fixed relation within the scoring-rule file is the 'score' relation, which has two arguments: one containing the 'asin' of the album (a unique identifier) and the other the actual score.

• Currently implemented as a normalized weighted sum

• Can be changed to implement another scoring scheme, provided that the scheme can be calculated using the built-in relations in jDREW, currently: addition ($add), subtraction ($sub), multiplication ($mul), division ($div), summation ($sum), less-than ($lt), greater-than ($gt), square-root ($sqrt).
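The default scheme, a normalized weighted sum over the dimensions of a summarized rating, is simple enough to state directly. A sketch follows; the actual rule is written in RuleML and evaluated by jDREW, and the example values are hypothetical.

```python
def score(prediction, weights):
    """Normalized weighted sum over the dimensions of a summarized
    rating: sum(w_d * v_d) / sum(w_d). Uses only operations that have
    jDREW built-in counterparts ($mul, $add/$sum, $div).
    """
    total_w = sum(weights.values())
    return sum(weights[d] * prediction[d] for d in weights) / total_w

# A summarized prediction and its weights, as in the earlier example.
prediction = {"lyrics": 5, "originality": 8, "performance": 5}
weights = {"lyrics": 0.2, "originality": 0.7, "performance": 0.1}
print(score(prediction, weights))  # 0.2*5 + 0.7*8 + 0.1*5 ≈ 7.1 (weights sum to 1)
```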

RALOCA: Technologies used

• Object-Oriented RuleML, using the XSLT translators written by Stephen Greene to convert Object-Oriented RuleML into positional RuleML, which can be interpreted by jDREW

• jDREW BU, developed by Bruce Spencer, with modifications by Marcel Ball

Scale and Translation Invariant Algorithms

• Scale

• Translation

[Chart: sample ratings for Users 1-5 illustrating scale and translation differences.]

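The idea behind scale and translation invariance: two users with the same opinions may still use the 0-10 scale differently, one shifted up, one stretched out. Normalizing each user's ratings to zero mean and unit spread makes them comparable; the z-score sketch below only illustrates the idea and is not COFI's specific algorithm.

```python
from statistics import mean, pstdev

def normalize(ratings):
    """Map a user's ratings to zero mean and unit spread, so that users
    who rate on shifted (translation) or stretched (scale) versions of
    the same scale become comparable.
    """
    m, s = mean(ratings), pstdev(ratings)
    return [(r - m) / s if s else 0.0 for r in ratings]

harsh    = [1, 2, 3, 4, 5]       # uses the bottom of the scale
generous = [6, 7, 8, 9, 10]      # same opinions, translated up
extreme  = [0, 2.5, 5, 7.5, 10]  # same opinions, scaled out

print([round(x, 2) for x in normalize(harsh)])
print([round(x, 2) for x in normalize(generous)])  # identical: same opinions
print([round(x, 2) for x in normalize(extreme)])   # identical: same opinions
```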

Collaborative filtering I

Correlates the current user's ratings with those of other users.

A collaborative filtering system correlates the ratings provided by the current user with the ratings of all other users in the database to predict the current user's ratings for unrated items.

Collaborative filtering II

• RALOCA user pre-rates 3 standard items/objects

• These ratings are used to filter similar raters' ratings from COFI Music

• Similar means