WEB & MEDIA GROUP TOPICS FOR MSC PROJECTS
Jacco van Ossenbruggen
Johan Oomen
Guus Schreiber
Tobias Kuhn
Lora Aroyo
Victor de Boer
Jan Wielemaker
Valentina Maccatrozzo
Davide Ceolin
Anton Eliens
Antske Fokkens
Martine de Vos
Antoine Isaac
Chris Welty
Benjamin Timmermans
Oana Inel
Anca Dumitrache
Dena Tahvildari
Xander Wilcke
Ronald Siebes
http://wm.cs.vu.nl/
• Social Sciences • Communication Sciences • Languages • History
Above Water Systems
WE WORK WITH ….
MSC PROJECTS THEMES
Data, Analysis & Visualization
Cultural Heritage, Web & Visitors
Collection & Metadata Enrichment
Future TV & Web
Crowdsourcing & Social Web
Interactive Mobile & Web Apps
IF YOU LIKE ….
• experimenting • exploring & analytics • to be driven by curiosity • pragmatics-oriented approaches • creative working spirit • having fun • general 'hacker' attitude
… then these are the right projects for you
CULTURAL HERITAGE, THEIR VISITORS & THE WEB INTERACTIVE INTERFACES
PROJECTS
• Museum, Libraries, Archives Collection Enrichment
• Personalized Semantic Search • Mobile Museum & City Tours • Interactive Multitouch Applications • Innovative Interactive User Interfaces • Games with a purpose • Crowdsourcing for Video & Image Tagging • and much more ....
CULTURAL HERITAGE, THEIR VISITORS & THE WEB, UIS
- The Netwerk Oorlogsbronnen (NOB) wants to build, together with its partners, improved digital access to the Dutch collections on the Second World War.
- The goal is to make the collections of some 400 institutions (more) usable and findable digitally, and to develop basic digital services that help people find relevant information about the Second World War. These are (semantically) structured information facilities and innovative digital services that make digital sources accessible along the dimensions "Who", "What", "When" and "Where".
- NOB is a partnership of heritage institutions with WWII collections, facilitated by the NIOD Institute for War, Holocaust and Genocide Studies. www.oorlogsbronnen.nl
Text mining
• The Netwerk Oorlogsbronnen portal currently contains 10 million digital objects. This number will grow enormously in the coming years. Most objects have only limited metadata, and that metadata is often unstructured, or the information relevant to users and researchers is recorded as running text in description fields.
• In many cases newspapers and archives have been digitized as full text, without any structure.
• We are looking for automated ways to recognize names, places and subject keywords in texts and to mark the matched sources.
• The aim of this internship is to explore the field of automated matching and, preferably, to deliver a working solution (tool, application, …).
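One simple baseline for recognizing names, places and keywords in running text is a gazetteer (dictionary) lookup. The sketch below is only an illustration under stated assumptions: the `GAZETTEER` entries, the sample sentence and the function names `find_mentions` and `mark` are all hypothetical, and a real project would load its terms from a thesaurus or name list and would likely need fuzzier matching for OCR errors and spelling variants.

```python
import re

# Hypothetical gazetteer: surface form -> entity type. These entries are
# invented for illustration; a real list would come from a thesaurus.
GAZETTEER = {
    "Jan de Vries": "person",
    "Amsterdam": "place",
    "Westerbork": "place",
    "resistance": "keyword",
}


def find_mentions(text, gazetteer=GAZETTEER):
    """Return (start, end, canonical form, type) for every gazetteer hit."""
    # Longest terms first so multi-word names win over any shorter overlap.
    terms = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    lookup = {t.lower(): (t, gazetteer[t]) for t in terms}
    mentions = []
    for m in pattern.finditer(text):
        canonical, kind = lookup[m.group(0).lower()]
        mentions.append((m.start(), m.end(), canonical, kind))
    return mentions


def mark(text, mentions):
    """Wrap each mention in [type: ...] markers, working right to left
    so that earlier character offsets stay valid."""
    for start, end, _, kind in sorted(mentions, reverse=True):
        text = text[:start] + f"[{kind}: {text[start:end]}]" + text[end:]
    return text


sample = "Jan de Vries moved from Amsterdam to Westerbork."  # invented sentence
hits = find_mentions(sample)
marked = mark(sample, hits)
print(marked)
```

Exact lookup like this is cheap and transparent, which makes it a reasonable first comparison point before trying statistical named-entity recognition.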
Persons portal (Personenportal)
NOB is working with a number of archives, museums and remembrance centres on the development of a persons portal. The portal records persons/individuals (from name lists, archives, personal files and so on) together with relevant biographical data and references to (digital) sources. Its aim is that people can see, from a single portal, which transports or camps family members were in, or which Dutch people fought where (in the Dutch army, in the resistance, on the German side, and so on).
For the persons portal we are looking for:
Intern RDF modelling: semantic structuring of personal data
The aim of this internship is to develop a Linked Open semantic model for recording persons.
- At present there are very different, often locally developed data models for personal data.
- The data from different sources needs to be brought together and shared via Linked, Open, Interoperable structures.
Intern retrieving and matching persons from semi-structured sources
The aim of this internship is to develop methods and scripts that, from semi-structured sources:
- identify persons and automatically predict whether different references concern the same person
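Predicting whether two references denote the same person is a record-linkage problem. The sketch below shows one naive way to score a pair of references using Python's standard `difflib`. Everything in it is an assumption made for illustration: the `PersonRef` fields, the bonus and penalty weights, the 0.85 threshold and the sample records are hypothetical and would need tuning and evaluation on real data.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class PersonRef:
    """One mention of a person in some source (hypothetical schema)."""
    source: str
    name: str
    birth_year: Optional[int] = None


def normalize(name):
    """Lower-case, drop commas and dots, and sort the tokens so that
    'Vries, Jan de' and 'Jan de Vries' compare as equal."""
    tokens = name.lower().replace(",", " ").replace(".", " ").split()
    return " ".join(sorted(tokens))


def match_score(a, b):
    """Name similarity in [0, 1], nudged by birth-year evidence.
    The 0.1 bonus and 0.3 penalty are arbitrary illustrative weights."""
    score = SequenceMatcher(None, normalize(a.name), normalize(b.name)).ratio()
    if a.birth_year is not None and b.birth_year is not None:
        score += 0.1 if a.birth_year == b.birth_year else -0.3
    return max(0.0, min(1.0, score))


def same_person(a, b, threshold=0.85):
    """Predict a match when the score reaches the (assumed) threshold."""
    return match_score(a, b) >= threshold


# Invented sample records.
ref_list = PersonRef("name list", "Jan de Vries", 1920)
ref_archive = PersonRef("archive", "Vries, Jan de", 1920)
ref_older = PersonRef("register", "Jan de Vries", 1895)

print(same_person(ref_list, ref_archive))  # same tokens, same birth year
print(same_person(ref_list, ref_older))    # same name, conflicting birth year
```

A pairwise score like this scales quadratically with the number of references, so in practice candidates are usually first grouped (for example by surname) before scoring.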
Web analysis
The website oorlogsbronnen.nl is currently undergoing a transformation: we are making it more network-oriented. One of the further goals is to carry out web analyses. Alongside an adjustment of the interface, that functionality is being included as well (especially for the portal). The website is NOB's central platform, and we are curious who visits us, when, and with what degree of satisfaction, and, not unimportantly, who uses our sources! We are looking for an intern who is skilled in web analysis, especially of portals.
For the website oorlogsbronnen.nl in general we are thinking of information such as:
- Visitor numbers and profiles (geographical location) and further behaviour (page visits, visit duration, where visitors arrive at oorlogsbronnen.nl from, which ties in with findability via Google, etc.), worked out in detail.
About the use of the portal we would like to know:
- Search queries in the portal: which terms were entered (percentages)? Which results were returned for queries? Subsequent click-through behaviour.
- Use of filters in the portal: general use, but also in which phase of the search are they used?
In addition: recommendations for carrying out structural web analysis. The aim is above all to use the results of the portal analysis for improvement, for example by adding frequently used search terms to the WO2-thesaurus.
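The question "which terms were entered (percentages)?" can be answered from a plain query log with a few lines of counting. The following is a minimal sketch, assuming the log is available as a list of query strings; the function name `term_percentages` and the sample log are hypothetical, and the actual log format of the portal is not described in the source.

```python
from collections import Counter


def term_percentages(queries):
    """Map each search term to its share (in percent) of all term
    occurrences, most frequent first."""
    counts = Counter(
        term
        for query in queries
        for term in query.lower().split()
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: 100.0 * n / total for term, n in counts.most_common()}


# Invented sample log, for illustration only.
log = ["Westerbork transport", "westerbork", "verzet Amsterdam"]
shares = term_percentages(log)
for term, pct in shares.items():
    print(f"{term}: {pct:.1f}%")
```

Terms that rank high here but are missing from the thesaurus would be natural candidates for inclusion.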
CONTROCURATOR Crowds and Machines for Modeling and Discovering Controversy
Summarization of high profile and catastrophic events in broadcast news & social media:
• How to combine machine learning and crowd annotations to improve on the identification of salient sub-events?
• How to present and visualize narrative results and timelines of events to help media professionals create news stories?
http://dive.beeldengeluid.nl collaboration with Sound & Vision
Contextualizing information in videos
• identify meaningful information/keywords (events, keywords, entities) in videos (content, synopsis, subtitles)
• plot the keywords in the timeline of the video
Ranking events in videos based on
• identifying the main event and its sub-events
• relevance and/or salience to the video
• event enrichment with participating entities such as people, location and other concepts
http://dive.beeldengeluid.nl
REPRESENTING HISTORICAL NARRATIVES
(Media) historians find and collect data and media online (for example with the DIVE tool). They collect these into proto-narratives, but when they publish them, they write them down in non-interactive formats.
● How can we develop richer Linked Media Narratives instead of boring old papers?
● What is a good ontology for media narratives (nanopublications?)
● Can we generate nice-looking web publications out of these narratives?
http://crowdtruth.org/ collaboration with IBM
Crowdsourcing Experiments for UI Design for Templates
• perform comparative evaluation of different design choices
• defining optimal template designs for different tasks
How can you capture data ambiguity?
• is ambiguity related to template design or to disagreement between annotators?
Crowdsourcing Games for Art Annotations
Reasoning and representation of Dance
Different representations for Dance and expressive movement exist (for example Labanotation). However, there is a disconnect between the low-level representations and higher-level creative reasoning.
● Investigate opportunities for semantic representations of Dance / creative movement
● Explore possibilities for machine learning and other techniques for semi-automatic choreography
DATA ENRICHMENT FOR MUSEUM, LIBRARIES, ARCHIVES & TV
PROJECTS
ANALYSIS OF THE EUROPEANA SOCIAL MEDIA PRESENCE
• Interested in business information analysis?
• Want to explore what is the influence of social media on visibility, business relations, etc?
• Want to know how to provide effective and efficient strategies with Social Media, based on user log analysis?
http://www.europeana.eu/portal/
Quality & Perspectives in Deep (Web) Data
• 80% of digital data is in unstructured textual form
• Textual data is rich and complex: it contains massive amounts of statements & perspectives on them: emotions, opinions, the interpersonal, as well as the current social debate.
• Textual data = big and also deep
• Framework for deep data representation that makes data provenance, quality and perspective explicit in the way such data is described and consumed.
• Will make it possible to track variations over time and enhance our understanding of data and its reliability.
DATA ANALYSIS & VISUALIZATION
PROJECTS
Interact with Nuclear Radiation
The SafeCast.org community is a collaborative effort of volunteers to measure nuclear radiation and share the measurements as open data. This project investigates how Big Data technology can increase the usability of the SafeCast data.
Amsterdam Tourism Barometer - Monthly visitor trends
- Gives an indication on the current situation in Amsterdam
- Use the comparison tool to have more insights on different nationalities, subregions and periods in time.
[email protected] crowdtruth.org
Data Analysis in Crowdsourcing Games
• What are useful data analytics to motivate more gamers?
• What are useful data analytics to increase the learning in the game?
• Interactive data visualization for crowdsourcing data analytics
collaboration with
FUTURE TV TV GUIDES SECOND SCREEN
PROJECTS
NoTube: Personalized, Social & Interactive TV ViSTA-TV: Linked Open Data, Live Analytics, Recommendations [email protected]
INTERACTIVE TV
SHARED & PERSONALIZED SECOND SCREENS
INTEGRATION WEB & TV
TV RECOMMENDERS http://vista-tv.eu
SOCIAL MEDIA SOCIAL WEB HUMAN-ASSISTED COMPUTING CROWDSOURCING WEB APPS
PROJECTS
The Wisdom of the Crowd in Digital Humanities
http://crowdtruth.org
CROWDSOURCING & HUMAN COMPUTATION
• Video Annotations (Sound & Vision, IBM) • Events, people, locations, times ….
• Image Annotations (Rijksmuseum, IBM) • Events, Flowers, Castles, Birds, ….
• Text Annotations (IBM, Google) • Medical relations, diagnosis, …. • Open domain questions • Video descriptions, transcripts
http://crowdtruth.org
CROWDSOURCING ANNOTATIONS FOR MUSEUM COLLECTIONS
http://sealincmedia.wordpress.com/
Winner EuroITV Competition Best Archives on the Web Award
http://waisda.nl
collaboration with Sound & Vision
GAMES WITH A PURPOSE CROWDSOURCING GAMES
PROJECTS
http://crowdtruth.org/
collaboration with IBM
http://game.crowdtruth.org Designing Crowdsourcing Games
[email protected] crowdtruth.org
Creating Crowdsourcing Workflows for Information Extraction • How to combine paid crowdsourcing with crowdsourcing games? • How to combine machine processing with crowdsourcing? • How to get the most of the crowdsourcing workers for min pay? • How to measure quality of results in open crowdsourcing tasks? • How to optimally combine Amazon Mechanical Turk & CrowdFlower?
Crowdsourcing for (Active) Machine Learning
• Use the crowd to train a machine learning model to recognize entities in text, images, videos or sounds?
collaboration with
ElevatorAnnotator: On-site crowdsourcing to annotate media content using pervasive technology (Raspberry Pi, camera, microphone).
In collaboration with Sound and Vision
USER-GENERATED & LINKED DATA DATA QUALITY & TRUST
PROJECTS
RANKING CROWDSOURCING ANNOTATIONS BASED ON USER’S TRUST
Chris Dijkshoorn
http://crowdtruth.org/ collaboration with IBM
Combining machine & crowd annotations for Images, Video or Text • heuristics for quality of image, video or text annotations • algorithm for using these criteria for an annotation workflow • evaluate & compare quality of machine vs. crowd annotations • optimal workflow for machine & crowd annotating together
Adapting disagreement metrics across a wide range of crowdsourcing tasks
• improve the existing CrowdTruth metrics by experimenting with human annotations
What Crowdsourcing Factors Influence the Quality of Results
• which factors influence quality of results, e.g. time, native language, location of the crowd worker
• determine how and to what degree they influence
• which factors generally influence, and which are task-specific
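As a starting point for thinking about disagreement, the sketch below computes a very simple per-unit agreement score: the share of annotators who chose the most common label. This is explicitly not the CrowdTruth metrics mentioned above, only a naive baseline that such metrics could be compared against. The function name `unit_agreement` and the sample annotations are hypothetical.

```python
from collections import Counter


def unit_agreement(labels):
    """Share of annotators who chose the most common label for one unit.
    1.0 means full agreement; lower values mean more disagreement."""
    if not labels:
        raise ValueError("a unit needs at least one annotation")
    top_count = Counter(labels).most_common(1)[0][1]
    return top_count / len(labels)


# Invented annotations: unit id -> one label per crowd worker.
annotations = {
    "video-1": ["event", "event", "event"],
    "video-2": ["person", "event", "location", "person"],
}
scores = {unit: unit_agreement(labels) for unit, labels in annotations.items()}
for unit, score in scores.items():
    print(unit, round(score, 2))
```

A low score can point to an ambiguous unit as much as to careless workers, which is why factors such as task design and worker background need to be studied separately.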
Trust Management [email protected] - http://wiki.cs.vu.nl/mp/Davide_Ceolin
• Can algorithms mimic human trust?
• What are the characteristics of trustworthy pieces of text?
• Can provenance help determine the trustworthiness of artifacts?
http://www.flickr.com/photos/8185675@N07/
Who made it? When? How? Where?
Project related to Linkflows
http://vu-amsterdam-web-media-group.github.io/linkflows/
https://wiki.cs.vu.nl/mp/index.php/Project_related_to_Linkflows
Tobias Kuhn, VU University Amsterdam
Open Master Projects
LinkEngine: a platform for aggregated scientific contributions on the Web
https://wiki.cs.vu.nl/mp/index.php/LinkEngine:_a_platform_for_aggregated_scientific_contributions_on_the_Web
Docker for Scientific Demos
https://wiki.cs.vu.nl/mp/index.php/Docker_for_Scientific_Demos
Combined Analysis of Bibliometric Datasets
https://wiki.cs.vu.nl/mp/index.php/Combined_Analysis_of_Bibliometric_Datasets
Full-Text Analysis of the Literature of an Entire Scientific Field
https://wiki.cs.vu.nl/mp/index.php/Full-Text_Analysis_of_the_Literature_of_an_Entire_Scientific_Field
The Properties of Knowledge Networks
https://wiki.cs.vu.nl/mp/index.php/The_Properties_of_Knowledge_Networks
Extract Core Sentences from Scientific Articles
[Figure: a network of core sentences ("Mosquitoes transmit malaria.", "Malaria is transmitted by female mosquitoes.", "Malaria is transmitted by moscitos.", "Malaria is transmitted by mosquitoes.") linked by the relations "more specific meaning of", "same meaning" and "corrected version of", with studies A to D attached via "provides evidence for" and "provides counter-evidence against".]
https://wiki.cs.vu.nl/mp/index.php/Extract_Core_Sentences_from_Scientific_Articles
Application of Nanopublications to a Specific Domain
[Figure: Nanopub0001, consisting of three parts.
Assertion: ns1:mosquito, ns3:transmission, ns2:malaria
Provenance: opm:wasDerivedFrom d:DataSourceX
Publication Information: dc:created “2013-01-01”, pav:createdBy p:Isabelle_Dubois]
https://wiki.cs.vu.nl/mp/index.php/Application_of_Nanopublications_to_a_Specific_Domain
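To make the three-part structure of a nanopublication (assertion, provenance, publication information) concrete, the sketch below writes one out as TriG text using only the Python standard library. It is an illustration under several assumptions: the `ex:` and `sub:` namespaces are placeholder example.org URIs, the namespace URI bound to `opm:` is a guess because the slide does not give one, the subject-predicate-object order of the assertion is one reading of the figure, and `build_nanopub` is a hypothetical helper, not part of any nanopublication library.

```python
# Prefix table. np:, dc: and pav: point at the public vocabularies;
# opm:, ex: and sub: are assumptions made for this sketch.
PREFIXES = {
    "np": "http://www.nanopub.org/nschema#",
    "dc": "http://purl.org/dc/terms/",
    "pav": "http://purl.org/pav/",
    "opm": "http://purl.org/net/opmv/ns#",        # assumed binding
    "ex": "http://example.org/terms#",            # placeholder
    "sub": "http://example.org/nanopub0001#",     # placeholder
}


def build_nanopub(subject, predicate, obj, source, created, creator):
    """Return a TriG document with head, assertion, provenance and
    publication-info graphs for a single-triple assertion."""
    lines = [f"@prefix {p}: <{uri}> ." for p, uri in PREFIXES.items()]
    lines += [
        "",
        "sub:head {",
        "  sub:nanopub a np:Nanopublication ;",
        "    np:hasAssertion sub:assertion ;",
        "    np:hasProvenance sub:provenance ;",
        "    np:hasPublicationInfo sub:pubinfo .",
        "}",
        "",
        "sub:assertion {",
        f"  {subject} {predicate} {obj} .",
        "}",
        "",
        "sub:provenance {",
        f"  sub:assertion opm:wasDerivedFrom {source} .",
        "}",
        "",
        "sub:pubinfo {",
        f'  sub:nanopub dc:created "{created}" ;',
        f"    pav:createdBy {creator} .",
        "}",
    ]
    return "\n".join(lines)


trig = build_nanopub(
    subject="ex:mosquito",
    predicate="ex:transmission",
    obj="ex:malaria",
    source="ex:DataSourceX",
    created="2013-01-01",
    creator="ex:Isabelle_Dubois",
)
print(trig)
```

Keeping the assertion in its own named graph is what lets the provenance and publication information refer to it as a unit.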
Integrating Nanopublications into Scientific Articles
[Figure: a nanopublication (assertion, provenance, publication info) embedded in a scientific article]
https://wiki.cs.vu.nl/mp/index.php/Integrating_Nanopublications_into_Scientific_Articles
Nanopublications for Data Curation and Text Annotation
[Figure: nanopublications (assertion, provenance, publication info) derived from curated data and annotated text]
https://wiki.cs.vu.nl/mp/index.php/Nanopublications_for_Data_Curation_and_Text_Annotation
Nanopublications in the KNIME Workflow Tool
https://wiki.cs.vu.nl/mp/index.php/Nanopublications_in_the_KNIME_Workflow_Tool
Publishing Web Annotations as Linked Data
https://wiki.cs.vu.nl/mp/index.php/Publishing_Web_Annotations_as_Linked_Data
Refining and Linking Messy Data
https://wiki.cs.vu.nl/mp/index.php/Refining_and_Linking_Messy_Data
Reliable Linked Web Forms
https://wiki.cs.vu.nl/mp/index.php/Reliable_Linked_Web_Forms
Extend or Apply the Nanopublication Server Network
https://wiki.cs.vu.nl/mp/index.php/Extend_or_Apply_the_Nanopublication_Server_Network
Science Bots
[Figure: network whose nodes are linked by "gives positive assessment for" and "is contributed by", annotated with eigenvector centrality scores (0-100)]
https://wiki.cs.vu.nl/mp/index.php/Science_Bots
Using the Bitcoin Block Chain for Reproducible and Trustworthy Science
https://wiki.cs.vu.nl/mp/index.php/Using_the_Bitcoin_Block_Chain_for_Reproducible_and_Trustworthy_Science
Generating a controlled natural language grammar from example business rules
https://wiki.cs.vu.nl/mp/index.php/Generating_a_controlled_natural_language_grammar_from_example_business_rules
Business vocabulary and linked data / Business rules and type inference
https://wiki.cs.vu.nl/mp/index.php/Business_vocabulary_and_linked_data
https://wiki.cs.vu.nl/mp/index.php/Business_rules_and_type_inference
Applying Controlled Natural Language
https://wiki.cs.vu.nl/mp/index.php/Applying_Controlled_Natural_Language
COMPUTER SCIENCE FOR DEVELOPMENT (CS4D)
PROJECTS
‘Internet of rainmeters’ Paving the way for an African rainradar
• Problem: African farmers do not realize the full potential of their harvest due to poor irrigation planning
• Cause: weather predictions (esp. rain) are very poor in Africa; lack of local meteorological and air measurements
• Solution: combine local sensors, global sensors and weather predictions
• Your task:
• Create a rural weather station
• Combine data with satellite data
• Build an African rainradar
• [improve food security]
• Work (and get paid) at award-winning App Company in Utrecht
• Contact: Chris van Aart, [email protected]
Hans Akkermans www.w4ra.or
‘Smart rural sensoring’ Build a lab on an African scooter
• Problem: Pollution in air and water causes illness
• Cause: lack of local labs / sensors
• Solution: build a (cheap) portable (smart city) sensor kit
• Your task:
• Create a basic sensor kit (e.g. Arduino)
• Select best sensors for air and water quality (we will buy them for you)
• Design interface (e.g. mobile, display, web)
• [create happy kids]
• Work (and get paid) at award-winning App Company in Utrecht
• Contact: Chris van Aart, [email protected]
SDK for smart rural applications
● Problem: reusable software component platform is lacking, but needed as a baseline
➔ Rapid development, robust (desert dust, electricity), simple to use, to (locally) maintain, self-healing
● Approach: technical, software/service engineering, based on commonalities in apps
● Different areas: voice service interfacing, language, …;
● Several projects: Andre Baart + 1
● Contact: Hans Akkermans, Victor De Boer
Value chain improvement by communicating market info
● Problem: Rural agro-value chains (bring produce to market elsewhere) poor or absent: lack of info
● Approach: RadioMarché:
➔ communication mix (mobile, web, radio) exchanging different types of market info from small villages over large distances in different languages
➔ (organization + technology issues)
● Project: Gossa Lô + …
● Contact: Anna Bon and Hans Akkermans
Extending e3value for ICT4D
● Context:
➔ ICT4D: Information and communication technology for development
➔ E3value is a methodology to develop electronic services for networks of enterprises and end-users. (see www.e3value.com)
● Problem:
➔ Can e3value be used as a service development tool?
➔ Socio-economic “Sustainability” of services
● Approach:
➔ Analyze the case studies we have in the field of ICT4D and interview experts
● Supervision/Contact:
➔ Jaap Gordijn ([email protected]) and Anna Bon
Using block chain technology for Intellectual Property Rights Clearing
● Context:
➔ Intellectual Property Rights (IPR) clearing ensures that artists, singer-songwriters, etc. are paid if their music is played on the radio or television
➔ Block chain technology is a technology for distributed administration between various parties (e.g. artists, radio stations, etc.)
➔ IPR clearing is done by IPR societies. They obtain money from IPR users (e.g. radio stations) and pay these revenues to IPR owners (e.g. artists)
● Problem:
➔ Can the IPR clearing process be streamlined by block chain technology?
● Approach:
➔ Obtain understanding of the block chain technology
➔ Work with one or more IPR societies to analyze how block chain technology can streamline the current rights clearing process
● Supervision/Contact: Jaap Gordijn ([email protected])
INTERACTIVE SCREENS, TABLES, TABLETS http://www.networkinstitute.org/tech-labs/intertain-lab/
Constructing qualitative models for understanding and explaining depression
• Create a cause-effect model that prunes and integrates the current knowledge on the working of antidepressants.
• The model should help explain and understand the phenomenon using first principles and general physiological mechanisms.
• Existing authoring tools for qualitative reasoning can be used to create the model.
• Information & Supervision – Zhisheng Huang – http://www.cs.vu.nl/~huang/ – Bert Bredeweg – https://staff.science.uva.nl/b.bredeweg/ – Annette ten Teije – http://www.cs.vu.nl/~annette/