Semantic representation

UPM recommends the morph-xr2rml tool to convert data from MongoDB to RDF.

UPM has experience with morph and could take on this task.

We can also contribute to T2.4 with recommendations on which ontologies fit the model best.

InfAI:

For a semantic-representation of our current data, my colleagues recommended the tools:

RML - http://rml.io/
Karma - https://usc-isi-i2.github.io/karma/
SparqlMap - https://github.com/tomatophantastico/sparqlmap/
morph-xr2rml - https://github.com/frmichel/morph-xr2rml

Depending on our use cases for an RDF representation, it might be sufficient to map only data and queries to our actual MongoDB content. If we have more sophisticated usage scenarios (e.g. a lot of requests), it is better to have an ETL process into a fully featured triple store.
In any case, we came up with the following vocabs to (basically) describe slides, decks, users and all the rest:

SIOC
DOAP
DC
FOAF
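
To make the mapping idea concrete, here is a minimal sketch in Python of how one MongoDB-style slide document could be turned into N-Triples using DC Terms and SIOC. The document fields (`_id`, `title`, `user`, `language`) and the `http://example.org/` namespace are assumptions made for illustration; they are not the actual SlideWiki schema, and a real setup would more likely use a mapping tool such as the ones listed above.

```python
# Hypothetical slide document, shaped like something a MongoDB query might return.
slide = {"_id": 42, "title": "Intro to RDF", "user": 7, "language": "en"}

BASE = "http://example.org/"  # placeholder namespace, not a real SlideWiki URI
DCT = "http://purl.org/dc/terms/"
SIOC = "http://rdfs.org/sioc/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value):
    """Serialise a value as a plain N-Triples literal, escaping \\ and "."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def slide_to_ntriples(doc):
    """Map one slide document to a list of N-Triples lines."""
    subject = f"<{BASE}slide/{doc['_id']}>"
    triples = [
        (subject, f"<{RDF_TYPE}>", f"<{SIOC}Item>"),
        (subject, f"<{DCT}title>", literal(doc["title"])),
        (subject, f"<{DCT}language>", literal(doc["language"])),
        (subject, f"<{DCT}creator>", f"<{BASE}user/{doc['user']}>"),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


for line in slide_to_ntriples(slide):
    print(line)
```

Lines produced this way could be bulk-loaded into a triple store as part of an ETL job; which class and properties to use per field is exactly the kind of decision the vocabulary discussion needs to settle.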

Another colleague of mine researches the co-evolution of RDF-based data and has an extensive overview of vocabularies for describing changesets, which might fit our revision model for slides/decks.

Notes hangout 16-12-2016 (Roy, Antje, Klaas):
Multiple use cases for semantic representation/annotation:

1. In-page semantic annotation (manual add + view. Use NLP?) based on the LOD cloud and custom ontologies (upload own ontology?!). Makes slide content more explicit - good for learners/teaching (RQ: Klaas)
2. NLP/named entity recognition (LSI?) on slide content to detect topics of slide/slides/decks - link to LOD
3. Be an LOD provider - RDF store - store ontologies + instances + provide SPARQL query endpoint
4. Semantic search - search in semantic annotations / decks/slides in RDF. Cluster slides/decks based on topics
5. Translate RDF/semantic annotations?!
6. We already have a system for tagging → help in determining topics / link to LOD cloud. Antje: could be used for training; currently there is a lack of data.
7. Someone creates a deck (presentation) or slide - get recommendations for similar decks/slides that can be reused - do you want to add/reuse this slide? (Antje Schlaf - possible RQ?) – Klaas: different languages - same/similar topic? Antje: difficult if decks have only pictures. Klaas: needs OCR...
8. Share public (CC0) SlideWiki(.org) content among all instances (even private ones)

Do we want to semantically represent all data on SlideWiki (users, comments, etc.), only public data, or only tags/semantic annotations? More data means more use cases.

Research questions:

Klaas - RQ: Do in-slide semantic annotations and (possibly) domain ontology of teaching materials/didactic help learners (slide consumers) and teachers/instructors (slide consumers) in better learning (better grade results, better understanding, support for diverse students (languages)) + better learning analytics? Relates to use case 1, 2 (for analytics) + use case 4 + use case 5
Klaas - RQ: Can we infer new knowledge based on annotations on slides? Relates to use case 1 + use case 3 + use case 4 + use case 5
Klaas - RQ: can we propose an ontology + instances + relationships + complete knowledge base for a certain deck/lecture series based on the annotated instances in a deck/series of slides? Relates to use case 1, 2, 3
Klaas (with Darya Tarasowa?) - Can we generate exams based on semantic annotations/representation?
Roy - RQ: Can we push just the "latest" changes of the semantic representation (depends on what it covers) to other instances of SlideWiki, in order to make content available and searchable? E.g. a versioned semantic representation, that shares diffs, but instances do not need to be synchronized - use case 3, 4 and 8
Roy - RQ: How do we spread the information among SlideWiki instances, based on decentralized Linked Data principles? - use case 3 and 8
Abi - RQ: How do we make an effective UI for semantic annotations when content consumers have little knowledge of linked data etc - use case 1, 4, 7, 8
Abi - RQ: Semantic annotations are often proposed to improve annotations. Can we provide evidence of this?
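
Roy's question about pushing only the "latest" changes can be illustrated with a small sketch: treat each version of the semantic representation as a set of triples and exchange only what was added or removed. The triples and function names below are hypothetical, and the sketch ignores blank nodes, which make real RDF diffs harder.

```python
def diff_triples(old, new):
    """Return the triples added to and removed from `old` to reach `new`."""
    old, new = set(old), set(new)
    return {"added": sorted(new - old), "removed": sorted(old - new)}


def apply_diff(triples, diff):
    """Apply a diff produced by diff_triples to a set of triples."""
    return (set(triples) - set(diff["removed"])) | set(diff["added"])


# Two hypothetical versions of one slide's description.
v1 = [
    ("slide:42", "dct:title", "Intro to RDF"),
    ("slide:42", "dct:subject", "dbr:Semantic_Web"),
]
v2 = [
    ("slide:42", "dct:title", "Introduction to RDF"),
    ("slide:42", "dct:subject", "dbr:Semantic_Web"),
]

changes = diff_triples(v1, v2)
print(changes)

# Another instance holding v1 only needs `changes` to catch up.
assert apply_diff(v1, changes) == set(v2)
```

An instance that missed several diffs would have to apply them in order, which is one reason a version identifier per diff would probably be needed.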

Revisions - if we tag decks, add automatic annotations, detect topics, etc. dynamically → do we create a new revision?!

Notes hangout 21-12-2016 (Roy, Antje, Abi, Mariano, Klaas):

(level 1): annotate at slide level + generic DBpedia ontology. Do before month 3 deliverable. Annotate general things, e.g., general topic of slide. At deck level identify information about accessibility of whole deck → need to identify vocabulary for decks at deck level. At deck level identify educational topics/content. At deck level we may have multiple ontologies (accessibility, availability, education). At slide level → DBpedia.

Next level (level 2): In-page semantic annotation + other ontologies + own ontology.

TODO check deliverable minimal requirements.

Mariano: How will users annotate content? - Extra tab in platform

Mariano: need to agree on ontology - use DBpedia?

Klaas: Need screen/UI design and use cases.

Abi: during plenary: annotate at slide level.

Mariano: First prototype available in 3rd month. First annotate → select from list of things/concepts, e.g., people + add name of presenter, or name of people in slide. Buildings, places, etc.: general labels. Specify literal. With this level we can start with recommendation/semantic search, etc.

Klaas: We do iterative and play it safe → minimal level (level 1): annotate at slide level + generic DBpedia ontology. Next level (level 2): In-page semantic annotation + other ontologies + own ontology.

Mariano: I will send proposal with vocabulary for minimal level.

Mariano: Look at deliverable → what is minimal requirement. Only slides. Recommendations? Minimal: List of topics to users → topic of slides

Roy: autocomplete list.

Mariano: yes/good.

Klaas: is list of topics fixed? Why limit if we connect to DBpedia anyways? At least list should be subset of DBpedia.

Mariano: DBpedia has 400 topics - a bit much.

Klaas: Perhaps we have a clever directory structure/selection mechanism → generic topics first, then more specific.
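
As a rough illustration of the "generic topics first, then more specific" idea combined with Roy's autocomplete list, here is a minimal sketch. The topic names and the two-level structure are made up for the example; a real list would be a subset of DBpedia.

```python
# Hypothetical topic directory: each key maps to its narrower topics.
TOPICS = {
    "Science": ["Physics", "Chemistry", "Biology"],
    "Physics": ["Relativity", "Quantum mechanics"],
    "Arts": ["Music", "Painting"],
}


def all_topics():
    """Every topic that appears anywhere in the directory."""
    return set(TOPICS) | {child for kids in TOPICS.values() for child in kids}


def top_level():
    """Generic topics: those that are not listed under any other topic."""
    children = {child for kids in TOPICS.values() for child in kids}
    return sorted(t for t in TOPICS if t not in children)


def narrower(topic):
    """More specific topics directly below `topic` (empty if none)."""
    return TOPICS.get(topic, [])


def autocomplete(prefix, within=None):
    """Case-insensitive prefix match, optionally limited to one branch."""
    candidates = narrower(within) if within else sorted(all_topics())
    p = prefix.casefold()
    return [t for t in candidates if t.casefold().startswith(p)]


print(top_level())
print(autocomplete("p"))
print(autocomplete("p", within="Science"))
```

Limiting autocomplete to the branch the user has already chosen keeps the suggestion list short even if the full topic list is large.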

Abi: are DBpedia topics in multiple languages?

Abi: concerned about complexity for users → already have data sources, etc. This is of interest to me (Klaas: is RQ?)

Mariano: not sure if topic modelling will work with few slides.

Klaas: nevertheless: Antje can work on this in parallel → she can already prototype/discover requirements for good topic modelling.

Antje: Good → I already did tests on old slidewiki.org which has thousands of slides. Is recommendation system planned?

Mariano: Recommendation system → As far as I know → for suggesting slides related to topic of slides you are working on.

Antje: Recommendation system as in: automatic annotation.

Mariano: Selecting text in slides. We can have both → also automatic annotation. Not sure if this is in proposal.

Klaas: semi-automatic annotation in slides == level 2. (topic modelling is level 1)

Abi: Also learning objects in slides (e.g.?). Darya is working on this. Exam mode. When editing slides, e.g., when putting an image in, ask users: is this a graph with data in it?

Roy: map part of our database to RDF model.

Mariano: yes: to use in our search-facility. We need to think what users can ask → the more things we extract from model, the more they can ask.

Antje: Have graph visualisation at deck level - see connections → e.g., slides about Einstein, relativity, etc.. Can we easily ask DBpedia if entities are connected?

Mariano: yes, also number of steps in between, e.g., 2 people in between.

Antje: would be good for users → show what deck is about.

Mariano: is third or fourth level.
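
The "number of steps in between" that Mariano mentions is a shortest-path length in the link graph. The sketch below shows the idea with a breadth-first search over a tiny hand-made graph; the links are invented for illustration and are not taken from DBpedia, where this would instead be answered with queries against the endpoint.

```python
from collections import deque

# Hypothetical links between entities (treated as undirected).
LINKS = {
    "Albert_Einstein": ["Theory_of_relativity", "Max_Planck"],
    "Max_Planck": ["Quantum_mechanics"],
    "Theory_of_relativity": ["Spacetime"],
}


def steps_between(start, goal, links):
    """Return the minimum number of links between two entities, or None."""
    # Build an undirected adjacency map from the directed link lists.
    adjacency = {}
    for source, targets in links.items():
        for target in targets:
            adjacency.setdefault(source, set()).add(target)
            adjacency.setdefault(target, set()).add(source)

    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None  # not connected in this graph


print(steps_between("Albert_Einstein", "Spacetime", LINKS))
print(steps_between("Spacetime", "Quantum_mechanics", LINKS))
```

A deck-level graph visualisation could draw an edge between two slide entities whenever this distance is below some threshold.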

Mariano: I will look into ontologies. Abi, do you have ideas for ontologies as well?

Abi: yes, ideas for accessibility ontology (colleague will be working more on this in the New Year). Mirette Elias at Bonn has been doing something in this area linked to user profiles. Darya Tarasowa is interested from questions.

Should look at common standards already in use in educational publishing, e.g. IMS https://www.imsglobal.org/metadata/index.html; Dublin Core; Schema.org http://schema.org/docs/schemas.html, which is being extended for accessibility http://www.a11ymetadata.org/ and is mapped to DBpedia.

Roy: Store annotations in DBpedia or a different DB?

Mariano: store in model → map MongoDB model to RDF.

Klaas: concerned about performance. Also if we provide SPARQL query endpoint later.

Mariano: indeed takes time. Batch processing during night → MongoDB to Virtuoso → annotations are 24 hours old max → not real time update if someone annotates.

Klaas: we have to work out technical details later on. Concerned about backwards compatibility.

Mariano: perhaps we can generate RDF every half hour.

Antje: need timestamp

Antje: assign probability to annotations → user has prob. 100%. Automatic annotation has less probability.
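
Antje's two points (a timestamp and a probability per annotation) fit naturally into one record, which also makes the periodic MongoDB → RDF export incremental: only annotations changed since the last run need converting. The following is a minimal sketch; the field names and the default confidence for automatic annotations are assumptions, not agreed values.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Assumed defaults: users are fully trusted, automatic annotation less so.
DEFAULT_CONFIDENCE = {"user": 1.0, "automatic": 0.5}


@dataclass(frozen=True)
class Annotation:
    slide_id: int
    topic: str
    source: str            # "user" or "automatic"
    updated_at: datetime
    confidence: float = -1.0  # -1.0 means "use the default for the source"

    def effective_confidence(self):
        if self.confidence >= 0.0:
            return self.confidence
        return DEFAULT_CONFIDENCE[self.source]


def changed_since(annotations, last_export):
    """Annotations that a batch job started after `last_export` must convert."""
    return [a for a in annotations if a.updated_at > last_export]


last_export = datetime(2016, 12, 21, 0, 0, tzinfo=timezone.utc)
annotations = [
    Annotation(1, "Physics", "user",
               datetime(2016, 12, 20, 23, 0, tzinfo=timezone.utc)),
    Annotation(1, "Relativity", "automatic",
               datetime(2016, 12, 21, 0, 10, tzinfo=timezone.utc)),
]

for a in changed_since(annotations, last_export):
    print(a.slide_id, a.topic, a.effective_confidence())
```

Whether the job runs nightly or every half hour then only changes how `last_export` advances, not the export logic.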

Abi: semantic representation is related to search results → SWIK-883

Mariano: provide DBpedia topics + entries/text as training data → Antje can do tests.

Antje: yes! we can do experiments.

Mariano: we do experiment (in parallel) with DBpedia topics + entries → see if we can do topic suggestions.

TODOs → in 1st sprint - New Year's resolution:

Mariano → provide list of topics for slides (level 1)

Mariano, Abi, Darya Tarasowa? → Look at suitable ontologies (educational, accessibility) for slide (topic?) annotation - select several ontologies per deck? Appropriate ontologies or adapt one?

Klaas: link to Confluence page on schemas suggested by Ben. → Use of Metadata Standards

Klaas: Make above TODOs into tasks for 1st sprint.

Klaas → Look at Month 3 deliverable → what is minimal requirement for level 1?

Antje → work in parallel on prototyping/experimenting topic modelling / automatic recommendations.

Antje → take look at topics in DBpedia

Level 1 semantic annotation:

D2.3: SlideWiki annotator module. SlideWiki component for semi-automatic semantic annotation of content using the ontologies and existing vocabularies.

Related / larger task: T2.4. Semantic annotation, enrichment and recommendation: (Start M1, End M27; Lead: UFRJ; Participants: VUA, InfAI, Pupin, UPM, Fraunhofer, ATHENA, SOTON). Enriching educational content with semantic representations helps to create more efficient and effective search interfaces, such as faceted search or question answering. It will also provide customized and context-specific content which better fits user needs. In this task, we will develop and align SlideWiki content with appropriate ontologies (e.g. DublinCore, SCORM, FOAF, SIOC, Schema.org) for representing semantics of OpenCourseWare material. We will use an RDB2RDF mapping approach (e.g. SparqlMap) for dynamically mapping the existing relational multi-versioning data structure to our semantic model. Providing suitable user interfaces for annotation of content is another goal of this task. We will customize the RDFaCE semantic editor to support manual annotation of content based on RDFa and Microdata markups. For automatic content annotation and interlinking, FOX and LIMES will be employed. We will also take advantage of the generated annotations to create a recommender system to propose related content to authors based on information context and user profile. Building content on top of the existing content will save a considerable amount of time for users and will increase the consistency and integrity of the content.