Description/requirements for D2.3:
(from work packages in grant agreement - https://drive.google.com/open?id=0B4Qow4ezpDrNcnFScklybVZyNXc ) -
In this task, we will develop and align SlideWiki content with appropriate ontologies (e.g. DublinCore, SCORM, FOAF, SIOC, Schema.org) for representing the semantics of OpenCourseWare material. We will use an RDB2RDF mapping approach (e.g. SparqlMap) for dynamically mapping the existing relational multi-versioning data structure to our semantic model. Providing suitable user interfaces for the annotation of content is another goal of this task. We will customize the RDFaCE semantic editor to support manual annotation of content based on RDFa and Microdata markup. For automatic content annotation and interlinking, FOX and LIMES will be employed. We will also take advantage of the generated annotations to create a recommender system that proposes related content to authors based on the information context and user profile.
Goal: decide on implementation
- How will we implement tags?
- Option 1: as plain text that is appended to slide revisions and decks, e.g. tags: [mathematics, fibonacci, ...] (a minimal sketch follows this option's pros and cons)
- Pros: easy to implement
- Cons: enriching the tags with data from the WWW might become difficult, as we have no place to save additional information (Roy Meissner)
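A minimal sketch of what Option 1 could look like on a slide revision document. The field names here are illustrative assumptions, not taken from the actual SlideWiki model:

```typescript
// Hypothetical shape of a slide revision with plain-text tags (Option 1).
// Field names are illustrative; the real SlideWiki model may differ.
interface SlideRevisionWithPlainTags {
  _id: string;
  title: string;
  tags: string[]; // e.g. ["mathematics", "fibonacci"]
}

const example: SlideRevisionWithPlainTags = {
  _id: "slide202-rev3",
  title: "Fibonacci numbers",
  tags: ["mathematics", "fibonacci"],
};
```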
- Option 2: as URIs that are already linked to further knowledge (RDF)?
- Pros: we might be able to infer knowledge from the WWW, and we might be able to present users with enriched recommendations
- Cons: we might not be able to link to distinct topics/concepts in case we are unsure about the meaning of a tag.
- Option 3: as both URIs and plain-text tags (Mariano Rico), providing both options like this: '{ name: Scientific, url: http://dbpedia.org/ontology/Scientific }' (Aleksandr Korovin). (Option 3 added by Klaas Andries de Graaf for discussion overview purposes - check if I misunderstood the arguments/pros/cons.) A sketch of this model and of the array query mentioned in the cons follows after this option's pros and cons.
- Pros: the language label can be retrieved from DBpedia via the URI (Mariano Rico, see SWIK-903). Users can still specify a tag without a URL (Aleksandr Korovin); the link is optional for a tag and links are not requested from users (Roy Meissner). Users are able to add plain-text tags that we might enrich later, and URLs for tags are added later by our automatic system, e.g. a Named Entity Recognition step that is able to disambiguate (Roy Meissner).
- Cons: not ideal for keyword searching in MongoDB - the typical data model for keyword search is a plain array (https://docs.mongodb.com/manual/tutorial/model-data-for-keyword-search/). We need to test a query that searches the tags array by name, i.e. tags: [{name: 'name1', ...}, {name: 'name2'}, ...] (Aleksandr Korovin) → does the same hold for SOLR? (Serafeim Chatzopoulos)
- Possible complication: the disambiguation problem when linking a tag to a URI (Roy Meissner). We need a custom mapping for tags with some meta information, e.g. slide202 → {tag1 → {sameAs: <url>, algorithm: NamedEntityRecognition, manuallyAdded: false, ...}, tag2 → ...} (Roy Meissner)
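To make Option 3 concrete, here is a sketch of the tag model with an optional URI plus the array query discussed in the cons, using the Node.js MongoDB driver. The database, collection, and field names are assumptions for illustration only:

```typescript
import { MongoClient } from "mongodb";

// Option 3: each tag is a small object; the URI is optional and may be
// filled in later by an automatic enrichment step (e.g. NER).
interface Tag {
  name: string;          // plain-text tag entered by the user
  uri?: string;          // e.g. "http://dbpedia.org/ontology/Scientific"
  algorithm?: string;    // e.g. "NamedEntityRecognition"
  manuallyAdded?: boolean;
}

interface SlideRevision {
  _id: string;
  title: string;
  tags: Tag[];
}

// The keyword-search query from the cons: find slide revisions whose tags
// array contains a tag with the given name.
async function findByTagName(mongoUrl: string, tagName: string): Promise<SlideRevision[]> {
  const client = new MongoClient(mongoUrl);
  try {
    await client.connect();
    const slides = client.db("slidewiki").collection<SlideRevision>("slides");
    // "tags.name" matches a field inside an array of embedded documents.
    return await slides.find({ "tags.name": tagName }).toArray();
  } finally {
    await client.close();
  }
}
```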
- Other options? (Ali Khalili , Roy Meissner , Mariano Rico , Antje Schlaf , Luis Daniel Fernandes Rotger )
- How do we save these tags?
- DECISION 1: save tags as part of the current MongoDB, directly at slides/decks (either as JSON-LD or plain JSON) - passed with 5 votes (Roy, Klaas, Alex, Luis, Paul), > 50% of those involved
- DECISION 2: store tags as plain JSON (option 2 below) - passed with 5 votes (Roy, Antje, Klaas, Luis, Alex), > 50% of the 9 possible votes
- Option 1 - as part of the slide model in MongoDB
- Pros: easy to add to our current model (Roy Meissner); same technology stack. We can use Solr to have a (materialized) index for tags, so we are able to search among them efficiently (Roy Meissner).
- Cons: possibly complex (and long-running) queries to find things related to tags; more complex to add information from the WWW. For reasoning, inferring knowledge, and semantic search we will need an RDF store with a SPARQL endpoint working in real time for users (cf. T2.4) (Aleksandr Korovin, Klaas Andries de Graaf agrees). Needs mapping (does this also hold for option 1.1? see "additions/complications").
- Option 1.1 - JSON-LD (Allan Third) (option 1.1 added by Klaas Andries de Graaf for discussion overview purposes - check if I misunderstood the arguments/pros/cons)
- Pros: there is no separation between the Mongo representation and the RDF; it can help with SEO if embedded in pages (Ali Khalili); graph databases support JSON-LD (Aleksandr Korovin)
- Cons: it shouldn't be such a disruptive change - it should just involve adding some fields to the JSON that's there (Allan Third); add an @context field with context.json and that is all (Aleksandr Korovin). May need an additional mapper to a triple store for fast SPARQL queries (see additions/complications below). A JSON-LD sketch follows after this option.
- Summary: the discussion was originally about how we store the tags in MongoDB, and whether we need JSON-LD to do so. As Roy said, after a longer discussion we came to the conclusion that, for the storage decision alone, normal JSON is sufficient for now. Storing as JSON-LD requires design decisions for a possible later RDF graph. We concluded that we can also store it as normal JSON, and as soon as someone wants to create an RDF graph, a good ETL-to-RDF process is all that is needed. So there is no need to discuss details of JSON-LD if we don't yet know what this potential RDF graph should look like. (Antje Schlaf)
- Additions/complications: needs a SPARQL mapper for querying (Aleksandr Korovin, Roy Meissner). The on-the-fly conversion of a SPARQL request into a MongoDB request can be too slow for complex SPARQL queries; I vote for off-line RDF generation by means of mappers (Mariano Rico). For performance reasons I am also more inclined to use a triple store rather than on-the-fly query rewriting; go for both approaches, do a benchmarking, and then decide (Ali Khalili). Transfer (ETL) the database content (or just the changes since last time) to an actual graph store from time to time (Roy Meissner) - a rough ETL sketch follows after Option 2 below.
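To make Option 1.1 concrete, a sketch of how the same tag data could carry an "@context" so the stored JSON doubles as JSON-LD. The context mapping and vocabulary choices below are assumptions for illustration, not an agreed design:

```typescript
// Hypothetical JSON-LD view of a slide revision (Option 1.1): the only
// structural change to the plain-JSON document is the added "@context".
// The vocabulary mapping is an illustrative assumption.
const slideAsJsonLd = {
  "@context": {
    "title": "http://purl.org/dc/terms/title",
    "tags": "http://purl.org/dc/terms/subject",
    "name": "http://www.w3.org/2000/01/rdf-schema#label",
    "uri": { "@id": "http://www.w3.org/2002/07/owl#sameAs", "@type": "@id" }
  },
  "@id": "https://slidewiki.org/slide/202",
  "title": "Fibonacci numbers",
  "tags": [
    { "name": "fibonacci", "uri": "http://dbpedia.org/resource/Fibonacci_number" }
  ]
};
```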
- Option 2 - as a separate DB (e.g. a Graph DB - read next section)
- Pros: simple queries; leverage default Semantic Web technologies (like interlinking/inferring knowledge on the WWW); no schema boundaries
- Cons: possibly weaker performance; another technology stack. Needs synchronising with MongoDB (Roy Meissner) - see the ETL sketch below. How do we search across both the graph DB and MongoDB?
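A rough sketch of the periodic ETL step mentioned in the additions/complications above: read the (assumed) tag documents out of MongoDB and emit N-Triples that a separate graph store could load from time to time. The predicates, base IRI, and document shape are illustrative assumptions:

```typescript
// Rough ETL sketch (assumption, not an agreed design): turn Mongo tag
// documents into N-Triples for bulk loading into a graph store.
interface Tag {
  name: string;
  uri?: string;
}

interface SlideRevision {
  _id: string;
  tags: Tag[];
}

function slideToNTriples(slide: SlideRevision): string {
  const subject = `<https://slidewiki.org/slide/${slide._id}>`;
  const lines: string[] = [];
  for (const tag of slide.tags) {
    // Export every plain-text tag as a literal ...
    lines.push(`${subject} <http://purl.org/dc/terms/subject> "${tag.name}" .`);
    // ... and, where an enrichment URI is present, also as a link.
    if (tag.uri) {
      lines.push(`${subject} <http://purl.org/dc/terms/subject> <${tag.uri}> .`);
    }
  }
  return lines.join("\n");
}

// Example:
// <https://slidewiki.org/slide/202> <http://purl.org/dc/terms/subject> "fibonacci" .
// <https://slidewiki.org/slide/202> <http://purl.org/dc/terms/subject> <http://dbpedia.org/resource/Fibonacci_number> .
console.log(slideToNTriples({
  _id: "202",
  tags: [{ name: "fibonacci", uri: "http://dbpedia.org/resource/Fibonacci_number" }],
}));
```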
- Other options? (Ali Khalili , Roy Meissner , Mariano Rico , Antje Schlaf , Luis Daniel Fernandes Rotger )
- Related:
- RML - http://rml.io/
- Karma - https://usc-isi-i2.github.io/karma/
- SparqlMap - https://github.com/tomatophantastico/sparqlmap/
- morph-xr2rml - https://github.com/frmichel/morph-xr2rml
- How do we realize semi-automatic semantic annotation of decks and slides?
...