...
- How will we implement tags? As plain text appended to slide revisions and decks (e.g. tags: [mathematics, fibonacci, ...]), or as URIs that are already linked to further knowledge (RDF)?
- For the second option: how do we make sure that a concept is unambiguous, so that we can link it to further knowledge? E.g. someone tags a slide with "bench": did they mean furniture or law? (Maybe not the best example in English.)
- How do we handle concepts/topics that we can't disambiguate (i.e. we can't decide whether to link to bench (furniture) or bench (law))?
- For the second option: how do we store these tags? As part of the slide model in MongoDB, or in a separate DB (e.g. a graph DB, see next section)?
- For the second option: do we want to show users that they are working with RDF (e.g. by showing them URIs)?
- Regardless of whether we model the tags inside MongoDB or as an RDF graph: it should be possible to represent a tag even if it can't be linked to further knowledge. The link to a further-knowledge entity should be optional. The process has two steps: 1) get the tags, 2) try to link them to further knowledge where possible.
- ???
- Define tag fields/features: what do we need here? A deck will of course have several tags, not just one, and each tag has certain features: a string for the tag name itself, an assigned probability, and a source (since tags may come from different sources such as manual assignment, tag-algorithmA, tag-algoB, etc.). With this we can say: show tags that were assigned manually, plus those from algoA above threshold X and those from algoB above threshold Y. If there is a recognized link from the tag to a DBpedia entity (or another ontology entity), this should also be modeled somehow (how exactly depends on whether the tags are stored in MongoDB or as a graph, as suggested/asked by Roy).
- Storing tag results vs. showing tags to the user: there should be a difference between storing the tagging results and showing them (e.g. through thresholds). Should the user be allowed to remove non-fitting tags that result from automatic assignment? If we do not delete them but merely stop showing them, we can later learn which tags from which algorithms work best and which are often removed. That's why we should keep them internally in some form. Once the platform has been well tagged manually by its users, we can use this data as evaluation data for our automatic algorithms, as well as training data for models that assign tags more accurately. But this would require something like a notshow / ignore field, so that a user can remove an automatic tag even if it has a high probability.
- ???
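
To make the questions above more concrete, here is a minimal sketch of what a tag record and the "store everything, show only some" rule could look like. Nothing here is decided: the field names (`entity_uri`, `hidden`), the helper names (`link_tag`, `visible_tags`), the threshold values and the candidate lookup table are all assumptions made for illustration, and the URIs are only examples. Ambiguous tags such as "bench" are simply left unlinked in this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tag:
    name: str                         # the tag string itself
    source: str                       # e.g. "manual", "algoA", "algoB"
    probability: float = 1.0          # confidence assigned by the source
    entity_uri: Optional[str] = None  # optional link to further knowledge
    hidden: bool = False              # hypothetical "notshow / ignore" field


def link_tag(tag, candidates):
    """Step 2: link the tag only if exactly one candidate entity exists."""
    uris = candidates.get(tag.name, [])
    if len(uris) == 1:
        tag.entity_uri = uris[0]
    return tag  # ambiguous or unknown tags stay unlinked


def visible_tags(tags, thresholds):
    """Return the tags to show; hidden and low-probability tags stay stored."""
    shown = []
    for tag in tags:
        if tag.hidden:
            continue  # removed by a user, kept for later evaluation
        # Sources without a configured threshold are never shown.
        limit = thresholds.get(tag.source, float("inf"))
        if tag.source == "manual" or tag.probability >= limit:
            shown.append(tag)
    return shown


# Step 1: tags as they might arrive from users and from tagging algorithms.
tags = [
    Tag("mathematics", "manual"),
    Tag("fibonacci", "algoA", 0.9),
    Tag("bench", "algoA", 0.8),
    Tag("furniture", "algoB", 0.4),
    Tag("law", "algoA", 0.95, hidden=True),
]

# Hypothetical lookup result: tag name -> candidate entity URIs.
candidates = {
    "fibonacci": ["http://dbpedia.org/resource/Fibonacci_number"],
    "bench": [
        "http://dbpedia.org/resource/Bench_(furniture)",
        "http://dbpedia.org/resource/Bench_(law)",
    ],
}

tags = [link_tag(t, candidates) for t in tags]
shown = visible_tags(tags, {"algoA": 0.7, "algoB": 0.6})
names = [t.name for t in shown]
```

In this sketch all five tags remain stored, while only the manual tag and the two algoA tags at or above the threshold are shown. The same fields would map onto either an embedded MongoDB document or properties in a graph.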
Concrete Tasks for the future:
...
Real use cases for the semantic representation via RDF:
- Interlink several SlideWiki instances (in case we implement versioning as part of the data store)
- Interlink knowledge (slide/deck content and tags) with further knowledge on the web and provide users with a "discover view" to find related topics, broader topics/concepts, other decks, sources (e.g. papers), information (such as Wikipedia articles), ...
- Queries that might be easier on a graph: show decks that have the same tags or are strongly connected to tag collection x; show users who use tags similar to mine; show which other tags are used by decks/users that share tags with my deck; ...
- ???
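
As a rough illustration of the tag-based queries listed above, the following sketch uses a plain in-memory mapping from decks to tag sets as a stand-in for the graph. This is only an assumption for demonstration: the deck ids, tags and function names are hypothetical, and in an RDF store or graph DB the same questions would be expressed as graph queries.

```python
from collections import Counter

# Hypothetical data: deck id -> set of tags attached to that deck.
deck_tags = {
    "deck1": {"mathematics", "fibonacci"},
    "deck2": {"mathematics", "fibonacci", "sequences"},
    "deck3": {"law", "bench"},
}


def decks_sharing_tags(deck_id, deck_tags):
    """Rank the other decks by how many tags they share with deck_id."""
    mine = deck_tags[deck_id]
    overlap = [
        (other, len(mine & tags))
        for other, tags in deck_tags.items()
        if other != deck_id and mine & tags
    ]
    # Most shared tags first; ties are broken by deck id.
    return sorted(overlap, key=lambda pair: (-pair[1], pair[0]))


def co_occurring_tags(deck_id, deck_tags):
    """Count tags used by decks that share at least one tag with deck_id."""
    mine = deck_tags[deck_id]
    counts = Counter()
    for other, tags in deck_tags.items():
        if other != deck_id and mine & tags:
            counts.update(tags - mine)  # only tags deck_id does not have yet
    return counts


related = decks_sharing_tags("deck1", deck_tags)
suggested = co_occurring_tags("deck1", deck_tags)
```

Here `related` lists deck2 as the only deck sharing tags with deck1, and `suggested` contains "sequences" as a tag that deck1 might also be interested in.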
Academic use cases for the semantic representation:
...