User:Patrick/annotation
From semanticweb.org
User space unless indicated otherwise:
annotation | ask | Help:Attribute name | Help:Browsing and searching | category | Help:Chains of relations and attributes | Help:Custom units | Help:Namespace | Help:Relation name | selection | Help:SearchTriple | sorting | Help:Templates in SMW | this page on sandbox.semantic-mediawiki.org | this page on Wikible
Kinds of annotation
An annotation is a special way of entering data at an arbitrary position on a page that is not a talk page. It is repeated in a factbox at the bottom of the page and can be used in queries on any page. A category tag is a simpler and older kind of annotation; it is also considered here for comparison.
There are three kinds:
[[category:page|sortkey]] - pair (subject article, category name), "the subject is in the given category".
[[relation::page|label]] - triple (subject article, relation name, object article), "the subject has the given relation to the object".
[[attribute:=value|label]] - triple (subject article, attribute name, attribute value), where the attribute has a page specifying its type, "the subject has the given attribute set to the given value".
In all cases the annotation is placed on the subject page. The sortkey and label are optional. In the case of a category, a link to the category page is provided at the bottom of the page. In the case of a relation, a link to the object (the third item) is provided unless the label is a blank space. With this proviso, a relation can be regarded as a categorized link.
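The three kinds of annotation can be told apart by their syntax alone. As an illustration, here is a minimal Python sketch that extracts annotations from a page's wikitext as tuples. It is not SMW's actual parser (it ignores templates, nowiki and nested links), and the attribute "example attribute" with its value is hypothetical.

```python
import re

# A [[...]] link: group 1 is the target, group 2 the optional |label or |sortkey.
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def parse_annotations(subject, wikitext):
    """Return (kind, subject, name, value, label) tuples found in wikitext.

    Simplified sketch: ignores templates, <nowiki>, and nested links.
    """
    found = []
    for target, label in LINK.findall(wikitext):
        if "::" in target:  # [[relation::page|label]]
            name, obj = target.split("::", 1)
            found.append(("relation", subject, name.strip(), obj.strip(), label))
        elif ":=" in target:  # [[attribute:=value|label]]
            name, value = target.split(":=", 1)
            found.append(("attribute", subject, name.strip(), value.strip(), label))
        elif target.lower().startswith("category:"):  # [[category:page|sortkey]]
            category = target.split(":", 1)[1].strip()
            found.append(("category", subject, "category", category, label))
        # Any other [[...]] is an ordinary link, not an annotation.
    return found


text = ("[[has capital::Berlin]] [[example attribute:=42|forty-two]] "
        "[[Category:Country]] [[Europe]]")
for annotation in parse_annotations("Germany", text):
    print(annotation)
# ('relation', 'Germany', 'has capital', 'Berlin', '')
# ('attribute', 'Germany', 'example attribute', '42', 'forty-two')
# ('category', 'Germany', 'category', 'Country', '')
```

The plain link [[Europe]] produces no tuple, which matches the distinction between an ordinary link and a relation.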
The whole category system is comparable to a single relation "Relation:In category". Similar, but a little more specific, are Relation:Subclass of and Relation:Instance of, the latter being deprecated on this wiki.
Compare:
- <ask>[[Category:continent]]</ask> lists the pages in Category:continent
- <ask>[[instance of::continent]]</ask> lists the pages with the annotation [[instance of::continent]]
- <ask>[[Category:continent||country]]</ask> lists the pages in Category:continent or in Category:country
- <ask>[[instance of::continent||country]]</ask> lists the pages that are an instance of continent or of country
- <ask sort="instance of">[[instance of::continent||country]]</ask> lists the same pages, sorted by the object of the "instance of" relation
A query cannot form the union of a category and the set of subjects having a given relation to a given object. Therefore, if one wants to be able to take such a union, one has to use two categories, or two objects of the "instance of" relation, not one of each.
Advantages of a category
In SMW, an advantage of using categories rather than only such relations is the ability to list all pages that are directly or indirectly (via subcategories) in a category; no similar feature is available for a relation like Relation:In category. However, if we also have the converse relation Relation:Category that contains, we can list the contents of one sublevel.
Advantages of "instance of"
In the case of a relation, we can also relate to a page in the main namespace. Using a category we have to choose between:
- Duplication, e.g. having both a page continent and a page category:continent. If relations use the page "C" as subject and/or object, we have the following complications:
  - if C is in the selection resulting from a query because of C R D, then this query cannot list the supercategories of category:C;
  - if category:C is in the selection resulting from a query because it is directly or indirectly in a supercategory, then the query cannot produce, for a given relation R, the pages D such that C R D;
  - if category:C is in the selection resulting from a subquery because it is directly or indirectly in a supercategory, then the query cannot select, for a given relation R, the pages D such that D R C.
- Putting all pages about classes in the category namespace. However, this somewhat clutters the page titles and makes linking more cumbersome.
All instances of "instance of" can be produced by one query, either alphabetically, or sorted by the object of the relation.
Relation
A relation, in OWL called an object property, is a mathematical relation between pages: a set of pairs (subject page, object page), each pair being an instance of the relation. All relations together form a set of triples (subject page, relation name, object page); each triple is an instance of a relation. An instance can be expressed as "the subject has the given relation to the object".
An instance of a relation is defined by the annotation [[relation name::object page|label]] anywhere on the subject page.
We can distinguish between the semantic relation, i.e. the meaning in general, e.g. "parent of", which can be applied to any set of people (living, dead, future, or fictional people) and also to animals, etc., and the actual relation in the wiki, given as the set of page pairs (P, Q) for which P has the annotation [[R::Q]].
For a wiki with a given set of pages V, assuming there are no incorrect annotations, the actual relation is a set of page pairs (P, Q) with P in V, and (P, Q) in the semantic relation. For the given V the actual relation can be said to be complete when it contains all (P, Q) in the semantic relation with P in V.
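The completeness condition is a set comparison. The Python sketch below assumes that the semantic relation is available as an explicit set of pairs, which in a real wiki it never is; the "parent of" data and the people in it are hypothetical.

```python
def is_complete(actual, semantic, pages):
    """Check whether the actual relation is complete for the page set V.

    actual   -- set of (P, Q) pairs annotated in the wiki
    semantic -- set of (P, Q) pairs of the semantic relation (given explicitly here)
    pages    -- the set V of pages in the wiki
    """
    # The text assumes no incorrect annotations: every annotated pair is true
    # and its subject is a page of the wiki.
    if not all(pair in semantic and pair[0] in pages for pair in actual):
        raise ValueError("actual relation contains incorrect annotations")
    required = {(p, q) for (p, q) in semantic if p in pages}
    return actual == required


# Hypothetical "parent of" data.
parent_of = {("Alice", "Bob"), ("Alice", "Carol"), ("Dave", "Alice")}
V = {"Alice", "Bob"}

print(is_complete({("Alice", "Bob")}, parent_of, V))                      # False
print(is_complete({("Alice", "Bob"), ("Alice", "Carol")}, parent_of, V))  # True
```

The pair (Alice, Carol) is required even though Carol has no page, because the condition only constrains the subject P; (Dave, Alice) is not required because Dave is not in V.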
Note that this completeness is not always desired: for transitive relations such as located in, part of, and subclass of, and their inverses, we tend to specify e.g. that Amsterdam is located in the Netherlands and that the Netherlands is located in Europe, but not that Amsterdam is located in Europe. A query about Amsterdam can still find Europe, and conversely. This takes two steps; queries can handle up to three steps.
The question arises whether we want to annotate that the Netherlands is located in Europe, or dispense with that because we already have that the Netherlands is located in the European Union and the European Union in Europe. In the latter case we cannot get a single table with a row for each country of Europe. We can get two tables with non-overlapping sets of countries, provided that there is not more than one level in between. If, similarly, we do not annotate a country as located in Europe when we annotate it as being in the Schengen zone, we need three tables, and the sets of countries they cover overlap in the intersection of the European Union and the Schengen zone.
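Following a transitive relation step by step amounts to a breadth-first search with a depth limit. The Python sketch below mimics the three-step limit mentioned above, using the variant in which the Netherlands is only annotated as located in the European Union. The annotation data is hypothetical, and the code illustrates the idea rather than how SMW evaluates queries.

```python
from collections import deque

# Hypothetical annotations of the relation "located in": page -> objects.
located_in = {
    "Amsterdam": ["Netherlands"],
    "Netherlands": ["European Union"],
    "European Union": ["Europe"],
    "Norway": ["Europe"],
}


def invert(relation):
    """Build the inverse relation (object -> subjects) from the annotations."""
    inverse = {}
    for subject, objects in relation.items():
        for obj in objects:
            inverse.setdefault(obj, []).append(subject)
    return inverse


def reachable(start, relation, max_steps=3):
    """Map each page reachable from start to the number of steps needed,
    following the relation at most max_steps times (breadth-first)."""
    steps = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if steps[page] == max_steps:
            continue  # depth limit reached on this path
        for obj in relation.get(page, []):
            if obj not in steps:
                steps[obj] = steps[page] + 1
                queue.append(obj)
    del steps[start]
    return steps


print(reachable("Amsterdam", located_in))
# {'Netherlands': 1, 'European Union': 2, 'Europe': 3}
print(reachable("Europe", invert(located_in)))
# {'European Union': 1, 'Norway': 1, 'Netherlands': 2, 'Amsterdam': 3}
```

Each page is reported with its distance. Norway (annotated directly) and the Netherlands (reached via the European Union) appear at different depths, which is why one table per depth is needed.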
Symmetry
A binary relation R over a set V is symmetric if for all P and Q in V with P R Q, also Q R P.
In SMW we have to distinguish between semantic symmetry and actual symmetry, i.e., symmetry in the sense that whenever P R Q is annotated, Q R P is annotated as well. Actual symmetry also implies that page Q exists, i.e. that no relation is annotated to a non-existing page. Even if the actual relation is complete in the above sense, it need not be actually symmetric.
Example, for the Relation:Sibling:
- Complete in the above sense means: for all persons with a page, all siblings are annotated.
- Actual symmetry means: if P has the annotation sibling::Q, then Q has a page of its own with the annotation sibling::P.
- The combination implies that for all persons with a page, all siblings have their own page.
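Actual symmetry can be checked mechanically. The Python sketch below takes the annotated pairs of a relation such as Relation:Sibling together with the set of existing pages, and reports objects that have no page and reverse annotations that are missing; all names are hypothetical.

```python
def asymmetric_pairs(annotations, existing_pages):
    """Report violations of actual symmetry for a relation such as Sibling.

    annotations    -- set of (P, Q) pairs: page P carries [[sibling::Q]]
    existing_pages -- set of pages that exist in the wiki
    Returns (objects without a page, reverse annotations that are missing).
    """
    missing_pages = sorted({q for p, q in annotations if q not in existing_pages})
    missing_reverse = sorted((q, p) for p, q in annotations
                             if q in existing_pages and (q, p) not in annotations)
    return missing_pages, missing_reverse


# Hypothetical data.
sibling = {("Ann", "Ben"), ("Ben", "Ann"), ("Ann", "Cleo"), ("Ben", "Dan")}
pages = {"Ann", "Ben", "Cleo"}
print(asymmetric_pairs(sibling, pages))  # (['Dan'], [('Cleo', 'Ann')])
```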
A relation and its inverse
For full query possibilities, for every annotation P R Q one also needs to annotate Q Rinv P:
- selecting P (hence listing properties of P, and, with a nested query, selecting S such that S R1 P) is only possible with the annotation P R Q
- listing P for a selected Q is only possible with the annotation Q Rinv P
Although these double annotations are logically redundant, unfortunately the system does not provide all these possibilities without them. It can therefore be useful, every time one annotates an instance of a relation, to routinely also add the corresponding annotation of the inverse relation, and similarly, every time one adds a category tag, to also add the corresponding annotation of Relation:Has instance or Relation:Has subclass. Templates producing wikitext can be helpful for this purpose.
To check the triples (P R Q) and (Q Rinv P) for consistency and completeness, and to add missing ones, the result of a query on page Q showing the pages P such that P R Q can be compared with the list of P's such that Q Rinv P, shown in the factbox on the same page Q. It can help to order the annotations so that, for each relation, the objects are in alphabetical order, just like the query results.
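The comparison described above can be sketched in Python as follows, assuming both sets of annotations have already been collected (e.g. from query results and factboxes). The relation "located in" comes from the text; its inverse, here called contains, and the data are hypothetical.

```python
def inverse_report(r_pairs, rinv_pairs):
    """Compare annotations (P R Q) with annotations (Q Rinv P).

    r_pairs    -- set of (P, Q): page P carries [[R::Q]]
    rinv_pairs -- set of (Q, P): page Q carries [[Rinv::P]]
    Returns the missing Rinv annotations and the missing R annotations,
    each as an alphabetically sorted list of (page, object) pairs.
    """
    missing_rinv = sorted({(q, p) for p, q in r_pairs} - rinv_pairs)
    missing_r = sorted({(p, q) for q, p in rinv_pairs} - r_pairs)
    return missing_rinv, missing_r


# Hypothetical data for "located in" and an inverse relation called "contains".
located_in = {("Amsterdam", "Netherlands"), ("Rotterdam", "Netherlands")}
contains = {("Netherlands", "Amsterdam"), ("Europe", "Netherlands")}

missing_contains, missing_located_in = inverse_report(located_in, contains)
print(missing_contains)    # [('Netherlands', 'Rotterdam')]
print(missing_located_in)  # [('Netherlands', 'Europe')]
```

The results come out sorted, in line with the suggestion to keep annotations in alphabetical order so that they can be compared with query results by eye.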
If one wants to put annotations for only one of the two relations one should carefully choose between R and Rinv, based on the queries one wants to apply and the possibilities of the software. See also Help:Chains of relations and attributes.
For a symmetric relation there is no choice between R and Rinv, but, similarly to the above, in addition to annotating (P R Q) it is best to also annotate (Q R P).
Possible senses of "completeness" of annotations with regard to a relation and its inverse are:
- for every annotation (P R Q) there is an annotation (Q Rinv P), or, in a weaker sense, for every annotation (P R Q) where page Q exists there is an annotation (Q Rinv P);
- and, in addition, for every existing page P, all true (P R Q) and (P Rinv Q) are annotated.
Trying to fulfil all these conditions is often too much. For example, with the parent/child relations we would end up with a page for every person in the world, and with lines and stops of public transport, the "has stop"/"stop of" relations would lead to pages for all public transport in the world, except parts isolated from the rest of the network.
Of course, a relation can be restricted so that this "completeness" becomes feasible; the completeness is then restricted accordingly, e.g. P R Q is "P is an Intercity train stop in the Netherlands on line Q".
Attributes
An attribute page [[Attribute:attributename]] is required, containing at least
- [[has type::Type:typename]]
Semantic MediaWiki knows a number of different datatypes that we can choose for attributes. In addition, a user can define a type on a page [[Type:typename]].
Datatypes determine what values are valid and how they are sorted, and may allow unit conversion.
Further differences between relations and attributes
A condition with a relation involves a set of pages, possibly defined in terms of categories, namespaces, or a nested query.
A condition with an attribute can involve an inequality.
In an annotation, the third item of a relation is linked unless a blank label is specified; in the result of a query, it is linked unless "link=none" is specified. In both cases the values of attributes are never linked.
With regard to sorting in-page query results: pagenames can only be sorted alphabetically, while attribute values of numeric type can be sorted numerically, and of type date/time chronologically.
To allow numerical sorting and have links we need a relation and a corresponding attribute, see Relation:Relation equal to attribute.
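The difference between sorting page names as text and sorting typed values can be seen in a small Python example; the page names and values are hypothetical, and Python's own sort stands in for the wiki's sorting.

```python
from datetime import date

# Hypothetical query results.
page_names = ["Route 10", "Route 9", "Route 100"]
lengths_km = [10.0, 9.0, 100.0]
dates = [date(2007, 3, 1), date(1999, 12, 31), date(2007, 1, 15)]

# Page names can only be sorted as text, character by character.
print(sorted(page_names))  # ['Route 10', 'Route 100', 'Route 9']

# Numeric attribute values sort by magnitude.
print(sorted(lengths_km))  # [9.0, 10.0, 100.0]

# Date values sort chronologically; the same dates written as text do not.
print(sorted(dates)[0])  # 1999-12-31
print(sorted(d.strftime("%d/%m/%Y") for d in dates))
# ['01/03/2007', '15/01/2007', '31/12/1999']
```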
Organizing info
Suppose we have 3-tuples (K,L,M).
Fitting that into the pattern (subject article, relation name, object article) or similar for attributes, there are 6 choices on which of the three items is the subject, which the relation/attribute, and which the object/attribute value.
E.g., Germany has capital Berlin:
- in Germany use [[has capital::Berlin]]
- in Berlin use [[capital of::Germany]]
- in capital use [[of Germany::Berlin]]
- in capital use [[is Berlin related to::Germany]]
- in Berlin use [[has property related to Germany::capital]]
- in Germany use [[has Berlin as::capital]]
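The six choices are the 3! = 6 permutations of the three items over the roles (subject page, relation/attribute, object/value). A Python sketch enumerating them for the example above:

```python
from itertools import permutations

items = ("Germany", "capital", "Berlin")  # the 3-tuple (K, L, M)

# Each permutation assigns the items to (subject page, relation/attribute, object/value).
roles = list(permutations(items))
print(len(roles))  # 6
for subject, relation, obj in roles:
    print(f"on page {subject}: relation about {relation!r}, object {obj}")
```

Each permutation corresponds to one line of the list above, e.g. ('Berlin', 'capital', 'Germany') to [[capital of::Germany]] on page Berlin; choosing a readable relation name for each assignment is still up to the editor.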
In an in-page query, the relation or attribute name has to be provided. In this case that makes the first two options probably the most useful, because Germany and Berlin are more likely than capital to be items in lists.
Triadic relation
Instead of a binary relation such as "capital of" or "has capital" there are also triadic relations.
Suppose we have 4-tuples (K,L,M,N), e.g. 3 pieces of info and one item explaining the relation between them. These are equivalent to triples (K,L,(M,N)).
There are 6 ways we can choose two items out of 4 to combine; in each case there are 6 choices for which of the three remaining items is the subject, which the relation/attribute, and which the object/attribute value.
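The count can be checked by enumeration: C(4,2) = 6 ways to merge two items, times 3! = 6 role assignments, gives 36 layouts. A Python sketch, using the Germany/capital example below as a hypothetical 4-tuple:

```python
from itertools import combinations, permutations


def arrangements(four_tuple):
    """Yield (subject, relation, object) layouts for a 4-tuple.

    Two of the four items are first merged into one combined item, leaving
    three items that are then assigned to the three roles.
    """
    for pair in combinations(range(4), 2):  # 6 ways to pick the merged pair
        merged = tuple(four_tuple[i] for i in pair)
        rest = [four_tuple[i] for i in range(4) if i not in pair]
        for layout in permutations([merged] + rest):  # 3! = 6 role assignments
            yield layout


layouts = list(arrangements(("Germany", "before/after", "capital", "Bonn/Berlin")))
print(len(layouts))  # 36
```

Merging ('Germany', 'before/after') and using it as the subject gives pages such as Germany before and Germany after.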
Thus we have the following possibilities:
- make subject pages on subtopics about the combination of two items
- make more specific relations or attributes combining two items
- make object pages or attribute values combining two items
E.g., Germany before/after has capital Bonn/Berlin:
- use Germany before and Germany after
- use Relation:Had capital and Relation:Has capital
- use capital of Germany with relations before and after
- etc.
See also [1].
Triadic relation film / character / actor
One can make the character name unique for a film / actor combination, and use:
- a relation between character and film
- another relation between character and actor
To be able to produce a cast list (see e.g. Afblijven) and a filmography (see e.g. Melody Klaver) automatically, a page with two tags is needed for each character, e.g. Debby.
Whereas on e.g. Wikipedia a page for a film character is currently usually only created if the character is of particular interest, in this system a page is created for every character that one wants included in the cast list on the film page and in the filmography on the actor page.
If two film characters have the same name, even if both films are about the same character, separate pages are needed, e.g. character (film), because otherwise it is not clear which actor plays in which film. The disambiguation can wait until the second character is entered: it only requires renaming the page for the first character with the name concerned; the film and actor pages are automatically updated.
It cannot be avoided that the cast list on the film page and the filmography on the actor page show these disambiguating texts (film), although this info is superfluous: on the film page obviously because the whole page is about the film, and on the actor page because the second column in the table shows the films. (If for all characters the pagename is of the form character (film) then at least on the actor page we can avoid the duplication by only providing the first column.)
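The cast list and the filmography are two queries over the same character pages. The Python sketch below models each character page as carrying two relations; the relation names character in and played by, and all films, actors and characters, are hypothetical.

```python
# Hypothetical character pages. Each carries two relations, written here as
# [[character in::FILM]] and [[played by::ACTOR]]; these names are made up.
character_pages = {
    "Anna": {"character in": "Film A", "played by": "Actor X"},
    "Ben (Film A)": {"character in": "Film A", "played by": "Actor Y"},
    "Ben (Film B)": {"character in": "Film B", "played by": "Actor X"},
}


def cast_list(film):
    """Rows (character, actor) as a query on the film page would list them."""
    return sorted((name, rel["played by"]) for name, rel in character_pages.items()
                  if rel["character in"] == film)


def filmography(actor):
    """Rows (character, film) as a query on the actor page would list them."""
    return sorted((name, rel["character in"]) for name, rel in character_pages.items()
                  if rel["played by"] == actor)


print(cast_list("Film A"))     # [('Anna', 'Actor X'), ('Ben (Film A)', 'Actor Y')]
print(filmography("Actor X"))  # [('Anna', 'Film A'), ('Ben (Film B)', 'Film B')]
```

The disambiguating "(Film A)" and "(Film B)" show up in both lists, which is the redundancy described above.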
Although this way we can automatically produce a cast list, we cannot produce info about the cast on the film page; for that we need an annotation on each actor page either indicating whom the actor played, or in which film.
Similarly, although we can automatically produce a filmography, we cannot produce info about the film on the actor page; for that we need an annotation on each film page either listing the characters or the actors.
We can also use the inverse relations. Each character is annotated at the film article and actor article. The articles have a query to extract the annotated info of each other, so that the film article automatically produces the actor, and conversely. Both in the film article and in the actor article we can use a template to avoid entering the character twice per article, once for the article itself and for annotation, and once for the query. However, we have to substitute the template, because a query with a template parameter only works with substitution, so we end up with two occurrences of the character, making later changes cumbersome (either making the change twice or substituting the template again). On the other hand, if the actor article is renamed, or the actor info is corrected by moving the annotation to another actor article, the actor info produced on the film page is automatically updated. See e.g. Step_Up.
A possibility is also to set up a relation or attribute for each film (e.g. Attribute:She's_the_Man) or for each actor. A disadvantage is that in-line queries cannot search for relation or attribute names.
Other examples
A range of examples corresponds to a relation with a time, or with a start time and/or end time, e.g. for capital, presidency (e.g. Presidency of the United States of George Walker Bush), spouse, etc. See also User_talk:Markus_Krötzsch#n-ary relations
Datatypes and units of measurement
Using different types, attributes can be used to describe very different properties. A complete list of available types is available from Special:Allpages. Basic types include:
- Type:String (text strings)
- Type:Integer (whole numbers)
- Type:Float (decimal numbers with optional exponent)
These can be used creatively for very different purposes. For instance, attributes of type string can be used for encoding phone numbers (which in fact can contain non-numeric symbols), email addresses, or URLs.
Attributes that represent decimal numbers appear in many scientific and technological applications, where they denote physical quantities. In these applications, values typically come with a unit of measurement that is essential for understanding the meaning of the value. For example, the value "17" alone does not represent any distance unless we know whether it is measured in metres, kilometres, or miles. Semantic MediaWiki therefore supports units for the datatype Type:Float. This means that you can specify an attribute length as
[[length:=17 km]]
to assign this unit to the value. If other articles use different units for lengths (e.g. light years), then these statements will not be confused by Semantic MediaWiki. Values of different units are kept apart, even if they are given to the same attribute.
If simple floats are used, it is helpful to use one unit in many articles, so that a large number of attribute values can be used together (it is also easier for human readers if standard units are used consistently). Even if a different unit is preferred in an article, one can use a more common unit for the annotation without changing the look of the article. For example
[[length:=3.26km|3260m]]
is displayed as "3260m" in the article, but is evaluated as "3.26km" by Semantic MediaWiki.
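To illustrate how plain Type:Float keeps units apart and how a label changes only the display, here is a simplified Python parser for float annotations. The regular expression and function names are assumptions of this sketch, not the pattern SMW uses.

```python
import re

# [[attribute:=number unit|label]] with an optional unit and an optional label.
FLOAT_ANNOTATION = re.compile(
    r"\[\[(?P<attr>[^:\]]+):=\s*"
    r"(?P<num>[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*"
    r"(?P<unit>[^|\]]*?)\s*"
    r"(?:\|(?P<label>[^\]]*))?\]\]"
)


def parse_float(annotation):
    """Return (attribute, value, unit, displayed text) for one annotation."""
    m = FLOAT_ANNOTATION.fullmatch(annotation)
    if m is None:
        raise ValueError(f"not a float annotation: {annotation}")
    shown = m["label"] if m["label"] is not None else f"{m['num']} {m['unit']}".strip()
    return m["attr"], float(m["num"]), m["unit"], shown


def group_by_unit(annotations):
    """Keep values with different unit strings apart, as plain Type:Float does."""
    groups = {}
    for annotation in annotations:
        _, value, unit, _ = parse_float(annotation)
        groups.setdefault(unit, []).append(value)
    return groups


print(parse_float("[[length:=3.26km|3260m]]"))  # ('length', 3.26, 'km', '3260m')
print(group_by_unit(["[[length:=17 km]]", "[[length:=3.26km|3260m]]",
                     "[[length:=4.2 light years]]"]))
# {'km': [17.0, 3.26], 'light years': [4.2]}
```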
Unit conversion
Type:Float does not know how two different units relate to each other, and even different labels for the same unit (e.g. "m" and "metres") are unrelated. The unit after the number merely prevents confusion between values that might be totally different.
As explained in the help page on custom units, it is easy to use and introduce types with unit support. These automatically convert values to and from standard representations, so that users are free to use their preferred unit in each article yet still query and compare with attribute values in other articles.
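A type with unit support can be modelled as a table of conversion factors to one standard unit. The Python sketch below is hypothetical (the unit labels, the factor table and the function names are made up, although 1 mile = 1609.344 m is the standard definition); it is not the custom-units mechanism itself, which is described on the help page.

```python
# Hypothetical custom length type: each accepted unit label maps to a factor
# relative to the standard unit (the metre). "m" and "metres" are two labels
# for the same unit, which plain Type:Float would keep apart.
LENGTH_FACTORS = {"m": 1.0, "metres": 1.0, "km": 1000.0, "mi": 1609.344}


def to_standard(value, unit):
    """Convert a value in the given unit to metres."""
    return value * LENGTH_FACTORS[unit]


def convert(value, unit, target):
    """Convert between any two known units via the standard unit."""
    return to_standard(value, unit) / LENGTH_FACTORS[target]


# Values entered with different units in different articles can now be compared.
values = [(3.26, "km"), (2, "mi"), (500, "m")]
print(sorted(values, key=lambda v: to_standard(*v)))  # [(500, 'm'), (2, 'mi'), (3.26, 'km')]
print(round(convert(2, "mi", "km"), 3))  # 3.219
print(to_standard(5, "m") == to_standard(5, "metres"))  # True
```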
Special types
There are some special built-in types which support more complicated formats.
- Type:Temperature can't be user-defined since converting temperature units is more complicated than multiplying by a conversion factor.
- Type:Geographic coordinate describes geographic locations. It includes functions for recognizing different forms of geographic coordinates, and it dynamically provides links to online map services.
- Type:Date specifies particular points in time. This type is still somewhat experimental, but may feature complex conversions between (historic) calendar models in the future.
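Why a conversion factor is not enough for temperature can be seen from the Celsius/Fahrenheit formula F = 9/5 C + 32, which has an offset. A short Python check (function names are illustrative only):

```python
def celsius_to_fahrenheit(c):
    """Affine conversion: a scale factor AND an offset."""
    return c * 9 / 5 + 32


def celsius_to_kelvin(c):
    """Pure offset, no scaling."""
    return c + 273.15


# 0 °C is 32 °F, but multiplying 0 by any factor gives 0,
# so no single conversion factor can map Celsius to Fahrenheit.
print(celsius_to_fahrenheit(0), celsius_to_fahrenheit(100))  # 32.0 212.0
print(celsius_to_kelvin(0))  # 273.15
```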
For specifying URLs and emails, there are some special variations of the string type:
- Type:URL stores URLs as a string datavalue, but automatically links them within the page.
- Type:URI: attributes of this type are interpreted as relations to external objects, denoted by the URI. See the type page for documentation.
- Type:Email stores emails as a string datavalue, but automatically links them (with mailto:) within the page.
- Type:Annotation URI: attributes of this type are interpreted as relations to external objects, denoted by the URI. They are special since they are interpreted as annotation properties on export. See the type page for documentation.
Annotation inside other wikitext constructs
Relations:
- c c c - formatting works
- dce fgh dce fgh - a relation annotation can be part of a template parameter
- {{dce}} - an annotation cannot be part of a template name
- Expression error: Unrecognised punctuation character "[" - an annotation cannot be in a numeric expression
Attributes:
- c c c - formatting works
- dce fgh dce fgh - an attribute annotation cannot be part of a template parameter
- {{dce}} - an annotation cannot be part of a template name
- Expression error: Unrecognised punctuation character "[" - an annotation cannot be in a numeric expression
Multiple annotations for one property
In some cases, when a query result shows a property of the selected pages, we want to link only part of it: of a result "P Q" we want to link P. For this purpose we can use a relation for P and an attribute for Q. Furthermore, we may use an attribute S as a sortkey.
Example (see date-related tables):
- P is a year.
- Q is the remaining date info, which may be nothing, the month, or the month and the day of the month.
- S is a sortkey for chronological sorting
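A minimal Python sketch of splitting a (possibly partial) date into P, Q and S. The encoding of S as the integer YYYYMMDD, with zeros for missing parts, is an assumption of this sketch; any encoding that sorts chronologically would do.

```python
# Month names written out to avoid locale-dependent formatting.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def split_date(year, month=None, day=None):
    """Return (P, Q, S) for a possibly partial date.

    P: the year, used as the object of a relation so that it is linked.
    Q: the remaining date text (may be empty), stored as an attribute.
    S: a numeric sortkey attribute for chronological sorting.
    """
    p = str(year)
    if month is None:
        q = ""
    elif day is None:
        q = MONTHS[month - 1]
    else:
        q = f"{day} {MONTHS[month - 1]}"
    s = year * 10000 + (month or 0) * 100 + (day or 0)
    return p, q, s


rows = [split_date(1969, 7, 20), split_date(1969), split_date(1969, 7)]
for p, q, s in sorted(rows, key=lambda row: row[2]):
    print(p, q, s)
```

Sorting on S puts a bare year before a month of that year, and the month before a specific day, which text sorting of Q would not guarantee.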
Factbox
At the bottom of every page with at least one annotation (on a category page: at the bottom of the editable part, before the list of subcategories and the list of pages) there is a factbox showing the instances of relations and attributes regarding the subject page. It has, as far as they are not empty, the following sections:
- Relations to other articles
- Attribute values
- Special properties
The relation name is shown in canonical form; the object is shown exactly as in the annotation. The instances are sorted by relation. The relations are ordered according to the order of the first annotation of each relation on the page. For each relation, the object pages are ordered according to the order of the first occurrence of the instance on the page. If the same object page has been written in different forms (e.g. Relation:B, Relation:b, relation:B, relation:b), all forms are listed. TripleSearch shows as many entries as forms were used, but converted to canonical form. In-page queries show only one instance. Thus there are four levels of rawness of the data:
- annotation: [[a::Relation:B]] [[a::Relation:b]] [[a::relation:B]] [[a::relation:b]], [[a::Relation:B]]
- factbox: Relation:B, Relation:b, relation:B, and relation:b
- TripleSearch: Relation:B, Relation:B, Relation:B, and Relation:B
- in-page query: Relation:B, displayed as B
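The four levels can be reproduced with a simplified canonicalisation rule (capitalise the first letter of the namespace and of the page name). The Python sketch below is an approximation: real MediaWiki title normalisation also handles underscores, namespace aliases, and wikis without first-letter capitalisation, and it only treats known namespaces as namespaces.

```python
def canonical(title):
    """Simplified canonical title: capitalise the namespace and page name."""
    if ":" in title:
        ns, name = title.split(":", 1)
        return f"{ns[:1].upper()}{ns[1:]}:{name[:1].upper()}{name[1:]}"
    return title[:1].upper() + title[1:]


# Objects of the five annotations [[a::...]] from the list above, in page order.
objects = ["Relation:B", "Relation:b", "relation:B", "relation:b", "Relation:B"]

factbox = list(dict.fromkeys(objects))             # distinct forms as written, first occurrence first
triple_search = [canonical(o) for o in factbox]    # one entry per form, canonicalised
in_page_query = list(dict.fromkeys(triple_search)) # one instance per page

print(factbox)        # ['Relation:B', 'Relation:b', 'relation:B', 'relation:b']
print(triple_search)  # ['Relation:B', 'Relation:B', 'Relation:B', 'Relation:B']
print(in_page_query)  # ['Relation:B']
```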
To avoid confusion it may be best to write the object of an annotation in canonical form.
Similarly, the attribute name is shown in canonical form. The instances are sorted by attribute. The attributes are ordered according to the order of the first annotation of each attribute on the page. For each attribute, the attribute values are ordered according to the order of the first occurrence of the instance on the page.
If an attribute value is an external link, it is displayed as a link in the factbox, unlike in the result of a query.
