NSL API Documentation

v1.0, June 2015 (Main documentation for the NSL API version 1.0)

1. Introduction

1.1. Australian Plant Name Index (APNI)

APNI is a database for the botanical community that deals with the names of Australian plants and their usage in the scientific literature, whether as a current name or synonym. APNI does not recommend any particular taxonomy or nomenclature. For a listing of currently accepted scientific names for the Australian vascular flora, see the Australian Plant Census (APC). Information available from APNI includes:

  • Scientific plant names;

  • Author details;

  • Original publication details (protologue), with links to a PDF in some cases via a PDF icon

  • Subsequent usage of the name in the scientific literature (in an Australian context)

  • Typification details;

  • An APC tick showing which, if any, concept has been accepted for the APC

  • State distribution (from the Australian Plant Census (APC));

  • Relevant comments and notes;

  • Links to other information, such as plant distributions and descriptions, and an image search via a picture search icon.

APNI is maintained at the Centre for Australian National Biodiversity Research (CANBR) with staff, resources and financial support from the Australian National Herbarium, the Australian National Botanic Gardens (ANBG) and the Australian Biological Resources Study (ABRS). The CANBR, ANBG and ABRS collaborate to further the updating and delivery of APNI and APC.

1.2. Australian Plant Census (APC)

APC is a database of the accepted scientific names for the Australian vascular flora, ferns, gymnosperms, hornworts and liverworts, both native and introduced, and lists synonyms and misapplications for these names. The APC will cover all published scientific plant names used in an Australian context in the taxonomic literature, but excludes taxa known only from cultivation in Australia. The taxonomy and nomenclature adopted for the APC are endorsed by the Council of Heads of Australasian Herbaria (CHAH).

For further information about names listed in APC, including bibliographic information, secondary references and typification, consult the Australian Plant Name Index (APNI). Alternatively, clicking on hyperlinked names in APC search results will link to the APNI data for any given name.

Information available through APC includes:

  • Accepted scientific name and author abbreviation(s);

  • Reference to the taxonomic and nomenclatural concept adopted for APC;

  • Synonym(s) and misapplications;

  • State distribution;

  • Relevant comments and notes

APC is coordinated through a network of contributors, and is maintained by the Centre for Australian National Biodiversity Research with staff, resources and financial support from the Australian National Herbarium, Australian National Botanic Gardens, Australian Biological Resources Study, CHAH and State and Territory herbaria. These organisations collaborate to further the updating and delivery of APC.

1.3. National Species List (NSL)

The National Species List is a complete database covering vascular plants, mosses, fungi, animals etc. The data for the NSL is kept in disparate systems that are combined under the NSL.

The current NSL infrastructure does this via RDF web services over some semi-static datasets, but that is changing. What you see here is the start of the new NSL infrastructure, which allows the separately governed datasets to be curated by their "owners" while combining them into a live, discoverable, searchable data resource with a consistent modern interface.

The new infrastructure takes the existing datasets and makes them "shards" of the NSL. Each shard will be imported separately into the new system as resources allow.

The new system incorporates an improved editing system and separate distributed search services, including linked data services.

1.3.1. What we have now

We have migrated the following data into NSL "shards":

  1. APNI and APC

  2. Mosses (AusMoss)

  3. Lichens (ABRS)

The NSL services including search and the RDF/SPARQL interfaces are available via https://biodiversity.org.au/nsl .

1.3.2. The road map

We will be adding new shards to the system as we go, with the likely priority as:

  1. Fungi

  2. Algae

  3. Mosses

  4. Lichen

  5. the Australian Faunal Directory

As we add datasets, improvements to the editor and services will be required to cater for differing requirements; these changes will be incorporated based on priorities and resources.

2. Using the NSL

2.1. Searching the NSL

The current NSL service includes the APNI and APC data. APNI is a nomenclator: it includes a list of names of vascular plants and where they have been used in references. In this sense, APNI has no opinion about the name; it just states where and how it has been used in a reference.

APC is a consensus-based classification: it provides an opinion about which name is accepted, and where the taxon sits in the classification "tree".

What is a tree?

We may speak of "trees" when referring to classifications in the NSL system, which shouldn’t be confused with actual woody vascular plants. A tree in this sense is the structure of a set of names: where one name sits in relation to other names. For example, a plant family name is "above" the names of its constituent genera, and a genus name ranks "below" the family in the tree. A family may contain many genera, each genus many species, and so on.

When you look at the NSL services page you will see the APNI and APC "products" as links in the "Navigation Bar" at the top of the page. To search APNI click the APNI link.

Figure 1. The NSL Services page (APNI product)

The search page consists of a set of search forms that can be found by selecting the appropriate tab.

By default when you go to the search page you are on the "Name Search" tab. This gives you a simple text box to enter a search for the name or names you are looking for.

This search just looks at the full name text, it is not trying to comprehend the name or its placement in a classification.

When you type your query into the search box you will get suggestions that tell you what your query is likely to return. The system only returns the first 15 matching names as suggestions, and the list will end with an ellipsis (…) if there are more results. You can click on one of those suggestions to copy it to the search box.

You don’t have to click on or use the suggestions, just press enter or click the search button with your query.

The name search looks at the 'Full Name including the Author', so the system adds an automatic wild card at the end of the query to match all endings. This means that if you just enter the simple name without author you should find what you want, however it may also return more than you expect.

The query is not case sensitive.

The query is an ordered set of search terms, so viola l. will match "Viola L." and "Viola L. sect Viola."

Putting double quotes around your entire query will cause it to be matched exactly (except case). e.g. "Viola L." will match just Viola L.

You can use a % as a wild card inside the search query e.g. hakea elon% be or "hakea % var. elon% benth." to find "Hakea ceratophylla var. elongata Benth."

You can search for hybrids using either the letter 'x' or the multiplication sign '×'.
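To make the matching rules above concrete, here is a small Python sketch that emulates them with a regular expression. It is an approximation written from this description only, not the NSL search implementation. It assumes that unquoted terms must appear in order with anything allowed between them, followed by the automatic trailing wild card; that a quoted query must match the whole name (ignoring case); and that % matches any run of characters. It does not model the 'x'/'×' hybrid handling. `query_to_regex` and `matches` are hypothetical helper names.

```python
import re


def query_to_regex(query: str) -> "re.Pattern[str]":
    """Approximate the documented NSL name-search rules (illustration only)."""
    q = query.strip()
    exact = len(q) >= 2 and q.startswith('"') and q.endswith('"')
    # A quoted query is one literal term; otherwise split into ordered terms.
    terms = [q[1:-1]] if exact else q.split()

    def term_to_regex(term: str) -> str:
        # Escape each piece literally, then let every % match any characters.
        return ".*".join(re.escape(part) for part in term.split("%"))

    # Unquoted terms must appear in order, with anything in between.
    pattern = ".*".join(term_to_regex(t) for t in terms)
    if not exact:
        pattern += ".*"  # the automatic wild card at the end
    return re.compile(pattern, re.IGNORECASE)


def matches(query: str, full_name: str) -> bool:
    """True if the whole full name matches the query."""
    return query_to_regex(query).fullmatch(full_name) is not None


print(matches("viola l.", "Viola L. sect Viola."))   # True
print(matches('"Viola L."', "Viola L. sect Viola."))  # False
print(matches("hakea elon% be", "Hakea ceratophylla var. elongata Benth."))  # True
```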

Under Construction

This documentation is under construction. We understand it needs extending and improving, but we thought we’d let you see what we’ve got so far.

3. Application Programming Interfaces (APIs)

The NSL services provide a number of Application Programming Interfaces, called APIs, to let you access the name data within.

3.1. Changes

This section lists recent changes to the API.

May 1 2016:

  • Removed NSL Simple Name export and JSON representation.

June 22 2016:

  • Added the Export section.

  • Added the name taxon-search API end point.

3.2. Linked Data

The NSL system uses a linked data service called a mapper or broker to map the URIs for a resource to a service that can provide that resource. As linked data we don’t guarantee that the same service will deliver the resource each time, but we do guarantee the object returned will be the same one associated with that URI.

The Mapper takes the URI and will issue you a '303 redirect' to a service that will provide the requested resource in the format requested.

The mapper uses content negotiation to provide you with the correct service for the resource in the format you request, e.g. HTML, JSON, XML, RDF. You must follow the 303 redirect to the service and provide the redirected service the same content type to get the correct format.
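Outside a browser you can set the Accept header yourself. The Python sketch below uses only the standard library; `json_request` and `fetch_json` are illustrative names, not part of the NSL services. It asks for JSON and lets urllib follow the mapper's 303 redirect. urllib builds the redirected request from the original request's headers, so the Accept header is also sent to the service the mapper redirects to, which is what this section asks for.

```python
import json
import urllib.request
from typing import Any

JSON_TYPE = "application/json"


def json_request(uri: str) -> urllib.request.Request:
    """A GET request that asks for JSON through content negotiation."""
    return urllib.request.Request(uri, headers={"Accept": JSON_TYPE}, method="GET")


def fetch_json(uri: str, timeout: float = 30.0) -> Any:
    """Fetch an NSL resource as JSON, following the mapper's 303 redirect.

    urllib's default redirect handler copies this request's headers (apart from
    Content-Length/Content-Type) onto the redirected request, so the Accept
    header reaches the service the mapper points to.
    """
    with urllib.request.urlopen(json_request(uri), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return json.loads(resp.read().decode(charset))


# Usage (needs network access):
# name = fetch_json("https://id.biodiversity.org.au/name/apni/70914")
```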

Browsers do not allow you to request a different content type (easily), and you will always get HTML back in a browser. To make it easier to demonstrate the other formats, or just to browse them, we have provided '.format' handling in our services, so you can add '.json' or '.xml' to a URL to get the desired format. The .format does not survive the 303 redirection, so you will need to add it to the redirected service URL manually.

e.g. the URL https://id.biodiversity.org.au/name/apni/70914.json redirects to https://biodiversity.org.au/nsl/services/api/name/apni/70914

As you can see below, the browser gets the 303 redirect and follows it, but the browser only 'accepts' HTML.

Figure 2. 303 redirect headers

If you now add '.json' to the redirected service URL in the browser, you force the response to be JSON.

Figure 3. Forcing a JSON response using .json
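Adding the '.format' suffix by hand is easy to get wrong when a URL carries a query string. The hypothetical helper below is a sketch that assumes the suffix always goes at the end of the URL path, as in the examples in this document. It inserts the suffix before any query string.

```python
from urllib.parse import urlsplit, urlunsplit


def with_format(url: str, fmt: str) -> str:
    """Append a '.format' suffix (e.g. '.json') to the path of a service URL.

    Any query string is kept after the suffix. Apply this to the redirected
    service URL, because the suffix does not survive the 303 redirect.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip("/") + "." + fmt.lstrip(".")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(with_format("https://biodiversity.org.au/nsl/services/api/name/apni/70914", "json"))
# https://biodiversity.org.au/nsl/services/api/name/apni/70914.json
```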

3.3. Basic Resource Objects

The basic resource objects in the NSL system are representations of the underlying database objects, which in turn represent the taxonomic world. As such, you will need to construct a view of the taxonomic object from these objects. Some higher-level API calls do some of this work for you to create a useful view.

The basic form of a URI to return an object from the API via a preferred link is:

https://id.biodiversity.org.au/[objectType]/[namespace]/[random unique number]
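A client that stores NSL identifiers may want to split a preferred link into its parts. The sketch below assumes exactly the three-segment form shown above. Short forms such as https://id.biodiversity.org.au/1441 (which appears in the Author example) and URLs with a '.format' suffix are rejected. `NslId` and `parse_preferred_link` are hypothetical names.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class NslId(NamedTuple):
    object_type: str  # e.g. "name", "instance", "author", "reference"
    namespace: str    # the shard, e.g. "apni"
    number: int       # the unique number


def parse_preferred_link(uri: str) -> NslId:
    """Split https://id.biodiversity.org.au/[objectType]/[namespace]/[number]."""
    parts = urlsplit(uri)
    if parts.netloc != "id.biodiversity.org.au":
        raise ValueError(f"not an NSL identifier: {uri}")
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 3 or not segments[2].isdigit():
        raise ValueError(f"expected /objectType/namespace/number: {uri}")
    object_type, namespace, number = segments
    return NslId(object_type, namespace, int(number))


print(parse_preferred_link("https://id.biodiversity.org.au/name/apni/70914"))
# NslId(object_type='name', namespace='apni', number=70914)
```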

3.3.1. Author

An Author (person) object represents a Name or Reference author. Its primary purpose is to act as an identifier for an author that can be linked to another identifier such as an IPNI Author.

Currently an author of both references and names may have two Author entries because of the imported APNI data. We will be working to de-duplicate these authors over time.

e.g. https://id.biodiversity.org.au/author/apni/1441

Figure 4. Author object HTML output

Example JSON output (Names left out)

    {
      "class": "au.org.biodiversity.nsl.Author",
      "_links": {
        "permalinks": [
          { "link": "https://id.biodiversity.org.au/1441", "resources": 1 },
          { "link": "https://id.biodiversity.org.au/author/apni/1441", "resources": 1 }
        ]
      },
      "audit": {
        "created": { "by": "LADAMS", "at": "2002-02-13T13:00:00Z" },
        "updated": { "by": "LADAMS", "at": "2002-02-13T13:00:00Z" }
      },
      "namespace": "APNI",
      "abbrev": "R.Br.",
      "name": "Brown, R.",
      "fullName": null,
      "dateRange": null,
      "notes": null,
      "ipniId": null,
      "duplicateOf": null,
      "references": [],
      "names": [
        {
          "class": "au.org.biodiversity.nsl.Name",
          "_links": { "permalink": { "link": "https://id.biodiversity.org.au/name/apni/82208", "resources": 1 } },
          "nameElement": "crassifolius"
        },
        {
          "class": "au.org.biodiversity.nsl.Name",
          "_links": { "permalink": { "link": "https://id.biodiversity.org.au/name/apni/128739", "resources": 1 } },
          "nameElement": "reflacta"
        },
        ... //(1000s of names)
      ],
      "baseNames": [...],
      "exNames": [...],
      "exBaseNames": [...]
    }
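The example above is illustrative rather than strict JSON (it contains comments and elisions). The following Python sketch parses a trimmed, valid copy and collects the permalinks. It assumes, from this example, that a full object lists its links under `_links.permalinks`, while embedded brief objects such as the Names use a single `_links.permalink`. `permalinks` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List

# Trimmed from the Author example above; comments and "..." removed so it parses.
AUTHOR_JSON = """
{
  "class": "au.org.biodiversity.nsl.Author",
  "_links": {
    "permalinks": [
      {"link": "https://id.biodiversity.org.au/1441", "resources": 1},
      {"link": "https://id.biodiversity.org.au/author/apni/1441", "resources": 1}
    ]
  },
  "namespace": "APNI",
  "abbrev": "R.Br.",
  "name": "Brown, R.",
  "names": [
    {
      "class": "au.org.biodiversity.nsl.Name",
      "_links": {"permalink": {"link": "https://id.biodiversity.org.au/name/apni/82208", "resources": 1}},
      "nameElement": "crassifolius"
    }
  ]
}
"""


def permalinks(resource: Dict[str, Any]) -> List[str]:
    """Return every permalink URI found in a resource's _links section."""
    links = resource.get("_links", {})
    found = [p["link"] for p in links.get("permalinks", [])]  # full objects
    single = links.get("permalink")  # embedded brief objects
    if single:
        found.append(single["link"])
    return found


author = json.loads(AUTHOR_JSON)
print(author["abbrev"], permalinks(author))
for name in author["names"]:
    print(name["nameElement"], permalinks(name))
```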

3.3.2. Instance

An Instance object represents an instance of the use of a Name in a Reference (a Usage) and its relationships. There are two main kinds of Instance: 'Standalone' instances and 'Relationship' instances (see figure below).

Figure 5. Instance relationships.

An Instance is of a particular instance type. Each instance type defines a set of properties for that instance. The currently defined instance types are:

  1. replaced synonym

  2. basionym

  3. pro parte replaced synonym

  4. nomenclatural synonym

  5. doubtful nomenclatural synonym

  6. pro parte nomenclatural synonym

  7. pro parte misapplied

  8. doubtful misapplied

  9. doubtful pro parte misapplied

  10. taxonomic synonym

  11. pro parte taxonomic synonym

  12. doubtful taxonomic synonym

  13. doubtful pro parte taxonomic synonym

  14. synonym

  15. pro parte synonym

  16. doubtful synonym

  17. doubtful pro parte synonym

  18. isonym

  19. autonym

  20. trade name

  21. comb. et stat. nov.

  22. comb. nov.

  23. comb. et nom. nov.

  24. misapplied

  25. nom. et stat. nov.

  26. nom. nov.

  27. tax. nov.

  28. excluded name

  29. doubtful invalid publication

  30. primary reference

  31. homonym

  32. invalid publication

  33. [n/a]

  34. [unknown]

  35. sens. lat.

  36. common name

  37. vernacular name

  38. [default]

  39. secondary reference

  40. implicit autonym

  41. orthographic variant

As you can see, the instance type describes the relationship of the reference to the name.

Basic Instance object structure

    {
      "instance": {
        "class": "au.org.biodiversity.nsl.Instance",
        "_links": {...},        //links to this object and related resources
        "audit": {...},         //change information
        "namespace": "APNI",    //the shard or dataset this instance belongs to
        "verbatimNameString": "Doodia R.Br.",  //the name string as written in the reference
        "page": "151",          //the page(s) of the reference where this usage was found
        "pageQualifier": null,
        "nomenclaturalStatus": null,
        "bhlUrl": null,         //link to the Biodiversity Heritage Library http://www.biodiversitylibrary.org/
        "instanceType": {},
        "name": {},
        "reference": {},
        "parent": {},           //parent instance (todo: explain)
        "cites": {},            //an Instance that this instance cites
        "citedBy": {},          //an Instance that this instance is cited by
        "externalRefs": [],
        "instancesForCitedBy": [],
        "instancesForCites": [...],
        "instancesForParent": [],
        "instanceNotes": [...]
      }
    }
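The sketch below summarizes an Instance using fields from the structure above and labels it as standalone or relationship. The rule it uses is an assumption, not documented behaviour: an instance with a non-empty `cites` or `citedBy` link is treated as a relationship instance, and one with neither as standalone. Apply it to the object inside the outer "instance" wrapper. `instance_kind` and `describe` are hypothetical helpers.

```python
from typing import Any, Dict


def instance_kind(instance: Dict[str, Any]) -> str:
    """Return 'relationship' or 'standalone' for an Instance object.

    ASSUMPTION: relationship instances carry a non-empty cites or citedBy link;
    standalone instances carry neither.
    """
    if instance.get("cites") or instance.get("citedBy"):
        return "relationship"
    return "standalone"


def describe(instance: Dict[str, Any]) -> str:
    """One-line summary built from verbatimNameString and page."""
    text = instance.get("verbatimNameString") or "(no name string)"
    if instance.get("page"):
        text += f", p. {instance['page']}"
    return f"{text} [{instance_kind(instance)}]"


example = {"verbatimNameString": "Doodia R.Br.", "page": "151", "cites": None, "citedBy": None}
print(describe(example))  # Doodia R.Br., p. 151 [standalone]
```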

e.g. https://id.biodiversity.org.au/instance/apni/481759

Figure 6. Instance object HTML output

Example Full JSON output

3.3.3. InstanceNote

Instances can have several notes associated with them. An instance Note consists of a key and a value. Current instance note keys are:

  1. Neotype

  2. Ex.distribution

  3. APC Comment

  4. EPBC Impact

  5. Status

  6. Under

  7. Distribution

  8. URL

  9. Lectotype

  10. Context

  11. Vernacular

  12. Text

  13. Comment

  14. Synonym

  15. Type

  16. APC Dist.

  17. Etymology

  18. EPBC Advice

  19. APNI

  20. Type herbarium

e.g. https://id.biodiversity.org.au/instanceNote/apni/1121197

Figure 7. Instance Note object HTML output

Example Full JSON output

3.3.4. Name

The Name object represents a name string, type, rank, and status. This object is an identifier of a name string with enough information to be able to reconstruct the name string from parts. As an identifier it links together Instances and Authors.

This shouldn’t be confused with the Taxon, which is more correctly described by the Protologue Instance of a Name.

e.g. https://id.biodiversity.org.au/name/apni/70914

Figure 8. Name object HTML output

Example Name object in JSON

3.3.5. NslSimpleName

3.3.6. Reference

A Reference object represents a place where a name might be published. References are categorized by their reference type. The reference types currently defined are:

  1. Book

  2. Chapter

  3. Database

  4. Herbarium annotation

  5. Index

  6. Journal

  7. Series

  8. Personal Communication

  9. Database Record

  10. Paper

  11. Section

  12. Unknown

e.g. https://id.biodiversity.org.au/reference/apni/22408

Figure 9. Reference object HTML output

Example Reference object in JSON

3.3.7. Node

To be completed.

3.3.8. Tree

To be completed.

3.3.9. Branch

To be completed.

3.3.10. Event

To be completed.

3.4. Name API V1.0

The name API lets you query and manipulate names. In general, to change a name you need to be authorized to do so.

The general format of a name API call is

where the things in round brackets are optional.
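The examples in this section share a consistent shape, which the following Python sketch builds. The pattern [base]/name/[namespace]/[id]/api/[action](.format)(?parameters) is inferred from those examples, and `name_api_url` is a hypothetical helper. Using `urlencode` also percent-encodes parameter values such as the find-concept term 'B.S. Parris (1998)'.

```python
from typing import Optional
from urllib.parse import urlencode

ID_BASE = "https://id.biodiversity.org.au"
SERVICES_BASE = "https://biodiversity.org.au/nsl/services"


def name_api_url(name_id: int, action: str, fmt: Optional[str] = None,
                 base: str = ID_BASE, namespace: str = "apni", **params: str) -> str:
    """Build [base]/name/[namespace]/[id]/api/[action](.format)(?params).

    The pattern is inferred from the examples in this document.
    """
    url = f"{base}/name/{namespace}/{name_id}/api/{action}"
    if fmt:
        url += "." + fmt
    if params:
        url += "?" + urlencode(params)  # encodes spaces, parentheses, etc.
    return url


print(name_api_url(166271, "find-concept", fmt="json", base=SERVICES_BASE,
                   term="B.S. Parris (1998)"))
# https://biodiversity.org.au/nsl/services/name/apni/166271/api/find-concept.json?term=B.S.+Parris+%281998%29
```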

This is a REST API, and uses HTTP methods to determine the action. If you use the wrong HTTP method you will get an error.

The services use content negotiation to determine the output type, so requesting application/json (e.g. with an 'Accept: application/json' header) gives you a JSON response. The services will give an HTML response in a browser, which doesn’t include all the data available in JSON or XML.

In this document we will use JSON.

The output of the services generally includes standard JSON object representations of Name, Instance, Reference, Instance Note and Author objects.

If the name record ID doesn’t exist for any operation you will get a 404 status result with the following body (in JSON)

An action that completes will in general return a 200 status, though the action may not have 'worked'. In general, if the action doesn’t work, an errors field containing a list of error strings will be returned along with a negative result field.

If you can’t do an action (e.g. can’t delete an instance because of references to it) you will generally get a 403 with a payload containing the error message.

If you try to do an API call that requires authorization without supplying a valid API key, you will get an authorization error response.
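The status rules above can be folded into one client-side check. This is a sketch. The `ok` and `errors` field names come from the delete actions documented below, and treating any other non-200 status (such as an authorization failure) as an error is an assumption. `check_result` and `NslApiError` are hypothetical names.

```python
from typing import Any, Dict, List


class NslApiError(Exception):
    """Raised when an NSL API call did not do what was asked."""


def check_result(status: int, body: Dict[str, Any]) -> Dict[str, Any]:
    """Return the body of a successful call, otherwise raise NslApiError.

    - 404: the record ID does not exist;
    - 403: the action can't be done; the payload carries error messages;
    - 200 with ok == False: the call completed but the action didn't work.
    Any other status is treated as an error (an assumption).
    """
    errors: List[str] = [str(e) for e in body.get("errors") or []]
    if status == 404:
        raise NslApiError("record not found")
    if status == 403 or (status == 200 and body.get("ok") is False):
        raise NslApiError("; ".join(errors) or f"action refused (HTTP {status})")
    if status != 200:
        raise NslApiError(f"unexpected HTTP status {status}")
    return body
```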

3.4.1. branch

GET branch gets the APC branch for this name. This gives you a list of Names in classification order.

Examples

  • using curl

Returns

  • a list of Name objects from the top of the classification tree down to the name requested.

Example response

3.4.2. delete

GET delete tells you if a name can be deleted. If not it gives a list of error messages explaining why not.

DELETE delete deletes the name if it can be deleted. If not it gives a list of error messages explaining why not.

Returns

  • Brief Name object

  • action: 'delete'

  • ok: true/false - false means you can’t delete this name

  • errors: List - if ok is false this contains a list of error strings explaining the problem

If you try to delete and it fails (i.e. you didn’t check whether you could delete it), you will get a 403 return code, with a payload containing error messages.
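The safe pattern is to check with GET before sending DELETE. The sketch below takes a caller-supplied `send(method, url)` function (hypothetical) that performs the HTTP call, including whatever authorization your API key requires, and returns the status and the parsed JSON body. Even after a successful check, the DELETE can still be refused with a 403 if the data changed in between, so inspect the final result too.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical transport: send(method, url) -> (HTTP status, parsed JSON body).
# It is responsible for authentication, e.g. supplying your API key.
Sender = Callable[[str, str], Tuple[int, Dict[str, Any]]]


def delete_if_possible(delete_url: str, send: Sender) -> Dict[str, Any]:
    """Ask with GET whether the resource can be deleted, then DELETE it if so."""
    status, check = send("GET", delete_url)
    if status != 200 or not check.get("ok"):
        return check  # 'errors' lists why it can't be deleted
    _, result = send("DELETE", delete_url)
    return result  # a 403 payload also carries error messages
```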

Example 1

Check if you can delete

response

Example 2

Non working response

Working response

3.4.3. family

GET family returns the family of the name according to the APNI or 'Name classification'. The Name classification may be different to other classifications such as APC.

Returns

  • Brief name object of the name you are querying

  • action: 'family'

  • familyName: the full Name object of the Family this name belongs to

Example

Example response

3.4.4. apc

GET apc tells you if this name is in the APC classification tree.

A name being in the classification tree is not the same as the name being part of the Australian Plant Census. Names in the APC tree have one of the following types:

  • ApcConcept. The name is part of APC.

  • ApcExcluded. An excluded name. For convenience, a boolean attribute 'excluded' is also included in the response.

  • DeclaredBt. Names that, in the previous system, were declared as being "Broader Terms" of names in APC, but which were not themselves in APC.

A typical place where 'DeclaredBt' nodes appear in the tree is when several species under a genus are excluded, but where the genus itself has not been dealt with explicitly. Another place where they can appear is where a higher taxonomy is not completed for a group of names.

That is, you cannot treat 'DeclaredBt' as implying that the name is (or should be) an excluded name, or that it is (or should be) part of APC. Even in the case where a number of species in a genus have been excluded, but where the genus is a 'DeclaredBt', it may or may not be the case that the genus has another species that does appear in Australia.

Work on the Australian Plant Census is ongoing. As users of the data, all that we know about DeclaredBt names is that the name was used as a higher grouping in the previous APC system with nothing more being said about it by the APC team. We imported this data into the NSL as it was.

Returns

  • The Brief name of the name in the query

  • inAPC: true/false

Example

Example response

3.4.5. apni

GET apni tells you if this name is in the APNI classification.

Returns

  • The Brief name of the name in the query

  • inAPNI: true/false

Example

curl -L -H "Accept: application/json" -X GET https://id.biodiversity.org.au/name/apni/54427/api/apni

Example response

3.4.6. name-strings

GET name-strings constructs the name strings for this Name using the rules in the 'ConstructedNameService' and returns them as a JSON resource. This will not change the Name object.

PUT name-strings constructs the name strings for this Name using the rules in the 'ConstructedNameService', updates the Name object with these strings, and returns them as a JSON resource.

Updating the name strings of a Name may be necessary if a name string gets out of sync with the Name data for some reason, (such as an SQL update) or the name construction algorithm has been changed.

This will re-write the full and simple names on the name object and cause the name updater to run updating NSL Simple Names and contacting anyone who has registered to get notifications of changes.

Returns

  • The Brief name of the name in the query

  • action: 'nameStrings'

  • result:

    • fullMarkedUpName: the full name including author marked up with HTML5/XML

    • simpleMarkedUpName: the name sans author marked up with HTML5/XML

    • fullName: the full name with author in plain text

    • simpleName: the name sans author in plain text

Example 1

Response

Example 2

Response

3.4.7. name-update-event-uri

PUT name-update-event-uri registers a callback URI with the NSL services (using registerNameUpdateEventUri) so that your service is notified of changes to names.

DELETE name-update-event-uri removes your URI from our event notification list; your service will stop being notified of name updates.

Parameters

uri: the URI you wish to put or remove

This will register your URI with the name service. When a Name change occurs, this URI will be called with the type of update and the identifier.

The type will be one of:

  • create

  • update

  • delete

For example it may call:

http://myservice.org.au/notify/update?id=https://id.biodiversity.org.au/name/apni/70914

Your service end point can then just call that URI identifier directly to get the updated name details (see Name).
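On the receiving side, your service only needs the update type and the identifier. The sketch below assumes the layout of the example call above, with the update type as the last path segment and the name URI in the `id` query parameter. `parse_notification` is a hypothetical helper.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlsplit

UPDATE_TYPES = {"create", "update", "delete"}


def parse_notification(callback_url: str) -> Tuple[str, str]:
    """Extract (update type, name URI) from an incoming notification call.

    ASSUMPTION: the layout matches the documented example, e.g.
    http://myservice.org.au/notify/update?id=<name URI>
    """
    parts = urlsplit(callback_url)
    update_type = parts.path.rstrip("/").rsplit("/", 1)[-1]
    if update_type not in UPDATE_TYPES:
        raise ValueError(f"unknown update type: {update_type}")
    ids = parse_qs(parts.query).get("id")
    if not ids:
        raise ValueError("missing id parameter")
    return update_type, ids[0]  # then fetch ids[0] to get the updated Name
```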

Example 1 add a URI

Response

Example 2 delete a URI

Example response

{"text":"unregistered http://localhost:8088/test"}

3.4.8. export-nsl-simple

3.4.9. apni-format / apni-format-embed

GET apni-format gets the APNI formatted output for a Name.

GET apni-format-embed gets the APNI formatted output for a Name in an embeddable format.

This is currently only available as HTML.

Example 1

https://id.biodiversity.org.au/name/apni/61294/api/apni-format

Figure 10. APNI Format HTML output

Example 2

https://id.biodiversity.org.au/name/apni/61294/api/apni-format-embed

Figure 11. APNI Format HTML embedded output

3.4.10. apc-format / apc-format-embed

GET apc-format gets the APC formatted output for a Name.

GET apc-format-embed gets the APC formatted output for a Name in an embeddable format.

Parameters

embed: if set to true, get the output as an embeddable fragment of HTML.

Example 1

https://id.biodiversity.org.au/name/apni/61294/api/apc-format

Figure 12. APC Format HTML output

Example 2

https://id.biodiversity.org.au/name/apni/61294/api/apc-format-embed

Figure 13. APC Format HTML embedded output

3.4.11. simple-name

This has been removed.

3.4.12. acceptable-name

GET acceptable-name gets a list of acceptable brief format names given the simple or full name.

An 'acceptable' name is one that is not illegitimate or illegal and has one of the following name statuses:

  • 'legitimate'

  • 'manuscript'

  • 'nom. alt.'

  • 'nom. cons.'

  • 'nom. cons., nom. alt.'

  • 'nom. cons., orth. cons.'

  • 'nom. et typ. cons.'

  • 'orth. cons.'

  • 'typ. cons.'

Parameters

  • name: the search term

Example 1

Response

3.4.13. apni-concepts

GET apni-concepts gets a JSON representation of the APNI Format information on a name. This gives a summary of the name information and the usage instances of the name in references with their relationships.

Parameters

  1. relationships=false - gives you a shorter output without synonymy (see example 2); this takes less time to return.

The output is broken into two main sections, name and references. Name includes the name, its primary instance (normally the protologue), whether it’s in APC, and its Family.

The references section contains an ordered list of references, each containing a list of 'citations', or relationship name usage instances. References without citations are standalone references.
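A client can walk that structure to separate standalone references from those carrying citation instances. The key names `references`, `citations` and `citation` are assumptions taken from the description above and from the Reference API's `citation` string; check them against a real response. `split_references` and `citations` are hypothetical helpers, and the sample data is made up.

```python
from typing import Any, Dict, Iterator, List, Tuple


def split_references(concepts: Dict[str, Any]) -> Tuple[List[dict], List[dict]]:
    """Split the ordered references into (standalone, with citations)."""
    standalone: List[dict] = []
    cited: List[dict] = []
    for ref in concepts.get("references", []):
        (cited if ref.get("citations") else standalone).append(ref)
    return standalone, cited


def citations(concepts: Dict[str, Any]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (reference citation string, citation instance) in reference order."""
    for ref in concepts.get("references", []):
        for cite in ref.get("citations", []):
            yield ref.get("citation", ""), cite


# Hypothetical sample shaped after the description above.
sample = {"references": [
    {"citation": "Ref A", "citations": []},
    {"citation": "Ref B", "citations": [{"page": "12"}]},
]}
standalone, cited = split_references(sample)
print([r["citation"] for r in standalone], list(citations(sample)))
```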

The citation instance link under a Reference should be used in preference to the Reference, since it contains the usage information linking the reference to a name with a relationship, including the page, e.g.:

Example

https://biodiversity.org.au/nsl/services/name/apni/71063/api/apni-concepts.json

Result

Example 2 sans relationships

https://biodiversity.org.au/nsl/services/name/apni/71063/api/apni-concepts.json?relationships=false

Result

3.4.14. find-concept

GET find-concept finds the concept (Instance) with a reference that best matches the given term. You get back the brief name plus the instance that best matched the term given. The rank field tells you how many of the tokens in the term matched the reference citation. The brief Instance tells you the reference found.

Example

https://biodiversity.org.au/nsl/services/name/apni/166271/api/find-concept.json?term=B.S. Parris (1998)

Result

3.4.15. taxon-search

GET taxon-search finds the taxon in a classification, returning the output in the Taxon Export format.

Parameters

  1. q=Some name - e.g. Isopogon asper

  2. tree=APC - The classification to use for the output. Currently only APC available.

Example

Result

3.5. Instance API V1.0

3.5.1. delete

GET delete tells you if an Instance can be deleted. If not it gives a list of error messages explaining why not.

DELETE delete deletes the Instance if it can be deleted. If not it gives a list of error messages explaining why not.

Returns

  • Brief Instance object

  • action: 'delete'

  • ok: true/false - false means you can’t delete this instance

  • errors: List - if ok is false this contains a list of error strings explaining the problem

If you try to delete and it fails (i.e. you didn’t check whether you could delete it), you will get a 403 return code, with a payload containing error messages.

Example: Check if you can delete

response

Failing example trying to delete

Non working response

Working example trying to delete

Working response

3.6. Reference API V1.0

3.6.1. citation-strings

GET citation-strings constructs the citation strings for this Reference using the rules in the 'ReferenceService' and returns them as a JSON resource. This will not change the Reference object.

PUT citation-strings constructs the citation strings for this Reference using the rules in the 'ReferenceService', updates the Reference object with these strings, and returns them as a JSON resource.

Updating the citation strings of a Reference may be necessary if a citation string gets out of sync with the Reference data for some reason, (such as an SQL update) or the citation construction algorithm has been changed.

This will re-write the HTML and plain citation on the Reference object.

Returns

  • The Brief Reference of the reference in the query

  • action: 'citation-strings'

  • result:

    • citationHtml: the full citation marked up with HTML5/XML

    • citation: the citation in plain text

Example 1

Response

Example 2

Response

3.6.2. delete

GET delete tells you if a reference can be deleted. If not it gives a list of error messages explaining why not.

DELETE delete deletes the reference if it can be deleted. If not it gives a list of error messages explaining why not.

You need to be an administrator, or an administrator service, to call this with the 'DELETE' method. Use your apiKey to authenticate.

Returns

  • Brief Reference object

  • action: 'delete'

  • ok: true/false - false means you can’t delete this reference

  • errors: List - if ok is false this contains a list of error strings explaining the problem

If you try to delete and it fails (i.e. you didn’t check whether you could delete it), you will get a 403 return code, with a payload containing error messages.

Parameters

  • reason - the reason this reference is being deleted.

Example 1

Check if you can delete

response

Example 2

Non working response

Working response

3.6.3. move

DELETE move moves all associated resources for a reference to another reference. This is typically used in de-duplicating references that have been entered multiple times. This action will:

  • redirect the URIs associated with the source reference to the target reference,

  • move all instances, comments, external references, notes, to the target reference, then

  • delete the source reference.

Parameters

  • target - the target reference id on the service, i.e. the database ID (this is not intended for use externally)

  • user - (optional) The user to blame, defaults to the administrator.

You use the resource URI as the source reference and pass the target reference ID as a parameter.

example

response

A brief target Reference object is returned along with the result "ok" to indicate success. If there are errors they will be in an errors field as a list.

3.6.4. deduplicate-marked

DELETE deduplicate-marked finds all references with duplicateOf set to another reference and calls the move action on it.

Parameters

  • user - (optional) The user to blame, defaults to the administrator.

example

response

A list of de-duplicated reference DB IDs with an indication of success.

3.7. Author API V1.0

3.7.1. deduplicate

DELETE deduplicate de-duplicates an author. It takes a duplicate author and replaces it with a target author. All References and Names that currently use the duplicate author will have the author replaced with the target author. The mapper is updated so that URL IDs refer to the target author. The duplicate author is then deleted from the database.

Parameters

  • user - (optional) The user to blame, defaults to the administrator.

example

response

The results of deduplicating the author.

3.8. Suggestions API V1.0

The NSL infrastructure provides a number of simple suggestion services as described in http://nerderg.com/Simple+Suggestions+plugin

The simple suggestions service is open and provides a way to do a simple type ahead or suggestion for search results to do with names, references and authors in the NSL.

The suggestion service works on a set of subjects where you provide a search term, and the service gives a list of strings as a result set.

The basic structure of a suggestion service URL is:

Using the jQuery-ui autocomplete widget you can add the suggestions to your web page with the following javascript:

You then just mark up your input text box like this:

That markup says: do an autocomplete here (class="suggest") on subject "apni-search", and quote the result if it is clicked on.

The suggestion subjects are:

  • apni-search - search APNI on full name as per the apni name search service

  • apc-search - search APC on full name as per the search service

  • simpleName - search on simple name, returns all names that match

  • acceptableName - search on simple or full name, only returns names that are deemed 'acceptable' see acceptable-name

  • author - search for an author

  • publication - search for a publication

  • epithet - search for an epithet

  • nameType - search on name types.
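The two example URLs below share the shape [services]/suggest/[subject]?term=[term]. Here is a minimal Python sketch for building such URLs, assuming that shape holds for every subject listed above. `suggest_url` is a hypothetical helper.

```python
from urllib.parse import quote

SUGGEST_BASE = "https://biodiversity.org.au/nsl/services/suggest"
SUBJECTS = {"apni-search", "apc-search", "simpleName", "acceptableName",
            "author", "publication", "epithet", "nameType"}


def suggest_url(subject: str, term: str) -> str:
    """Build [SUGGEST_BASE]/[subject]?term=[percent-encoded term]."""
    if subject not in SUBJECTS:
        raise ValueError(f"unknown suggestion subject: {subject!r}")
    return f"{SUGGEST_BASE}/{subject}?term={quote(term, safe='')}"


print(suggest_url("simpleName", "acacia dealbata sub"))
# https://biodiversity.org.au/nsl/services/suggest/simpleName?term=acacia%20dealbata%20sub
```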

Most suggestion end points just return a simple list of names.

Example

https://biodiversity.org.au/nsl/services/suggest/simpleName?term=acacia%20dealbata%20sub

Results

acceptableName returns a list of JSON objects of name and link, where link is the ID or URI of the name.
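The sketch below turns an acceptableName response body into (name, link) pairs. It assumes each JSON object has exactly the keys `name` and `link` described above. The sample payload is hypothetical and is not real service output.

```python
import json


def parse_acceptable_names(body: str) -> list:
    """Turn an acceptableName response body into a list of (name, link) pairs."""
    return [(item["name"], item["link"]) for item in json.loads(body)]


# Hypothetical payload, for illustration only (not real service output).
sample = (
    '[{"name": "Name one", "link": "https://example.org/name/1"},'
    ' {"name": "Name two", "link": "https://example.org/name/2"}]'
)
for name, link in parse_acceptable_names(sample):
    print(name, "->", link)
```

A list of pairs is used rather than a dict so that order is preserved and any repeated display names are kept.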

Example

https://biodiversity.org.au/nsl/services/suggest/acceptableName?term=acacia%20dealbata%20sub

Results

3.9. Tree Structure V1.0

This documentation is under construction. We understand it needs extending and improving, but we thought we’d let you see what we’ve got so far.

Our tree component stores taxonomic trees, and potentially more. It can store arbitrary document-structured data, such as profile data, whose elements are identified by RDF classes and predicates.

The tree component holds a full history of every change made to any part of the tree.

The principal design goal is that no persistent [1] node of the tree should ever change. That is, given a publicly available node id, the entire document at that node (the node itself and all nodes below it) should be stable.

The purpose of this is to permit node ids to be used as reliable references. Using a node id, an author can declare that they are using a particular name in a particular classification at a particular time, and that declaration will not change its meaning as our classifications are maintained and updated.

To make this possible, nodes are never changed - they are replaced by new nodes [2]. This replacement propagates upward through the tree, resulting in a new root node whose entire document includes the new node in place of the previous one, by way of a series of new nodes along the path from the root to the changed node. In our code, we call this process "versioning".
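The "versioning" idea can be illustrated with a toy in-memory model. Replacing one node creates new copies of every ancestor on the path to the root. Nothing already existing is modified, and untouched subtrees are shared. This is a sketch under simplifying assumptions: immutable nodes, ids from a counter, a single path, and only versioning links. It is not the NSL implementation, and `Node`, `new_node` and `version_path` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import count

_next_id = count(1)  # hypothetical id generator, for illustration only


@dataclass(frozen=True)
class Node:
    id: int
    label: str
    children: tuple = ()  # subnodes, in link order


def new_node(label: str, children=()) -> Node:
    return Node(next(_next_id), label, tuple(children))


def version_path(path: list, new_leaf: Node) -> Node:
    """Replace path[-1] with new_leaf and return the resulting new root.

    `path` runs from the current root down to the node being replaced.
    Every ancestor on the path is copied into a new node; existing nodes
    are never modified, and untouched subtrees are shared.
    """
    replacement = new_leaf
    for parent, old_child in zip(reversed(path[:-1]), reversed(path[1:])):
        children = tuple(replacement if c is old_child else c for c in parent.children)
        replacement = new_node(parent.label, children)
    return replacement


leaf = new_node("Acacia dealbata")
sibling = new_node("Acacia baileyana")
genus = new_node("Acacia", [leaf, sibling])
root = new_node("root", [genus])
new_root = version_path([root, genus, leaf], new_node("Acacia dealbata (edited)"))
```

After the call, `root` still describes exactly the same document as before, while `new_root` is a new node whose document contains the edit.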

The process is somewhat inspired by Git and other version-management tools. We do not use hashes to identify subdocuments, however, just the nodes themselves. We also do not use timestamps to manage or identify versions of a node - the structure over time is directly in the nodes and the links between them. [3]

Our algorithms permit any number of changes to be made to a tree simultaneously. This permits, for instance, a name to be moved from one higher taxon to another as a single operation. Certain bulk updates which have had to be done as part of the maintenance of the classification appear as single changes affecting almost all nodes.

Our algorithms permit a node to be used as a subnode of any number of other nodes (provided no cycles are formed). This would permit in future a user to create their own trees that connect together fragments of other trees.

3.9.1. Overall Structure

Figure 14. Tree overall structure

Our data consists of a number of tree Nodes.
Nodes are linked into directed acyclic graphs with Links.
Each node belongs to an Arrangement.
A node’s lifecycle is recorded in Events.

A node "is" its content plus the set of links of which it is the supernode (and therefore the nodes below them). To put it another way, links are part of the node above them: a node has the links of which it is the supernode. A change to a link is a change to the node above it, not to the node below it. Nodes, in a sense, do not care where they are placed; they include what is placed under them.

Links, therefore, are not separate from nodes, and there is no API to retrieve them separately. A link’s identity is its supernode and its linkSeq. A node’s links are guaranteed to have unique linkSeq numbers but are not guaranteed to have consecutive linkSeq numbers. This makes it possible to work out what has changed about a node from one version to the next: matching linkSeq numbers are "the same" link. If a subnode is deleted from a node, a new version of the node is created with that linkSeq missing.
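Because matching linkSeq numbers identify "the same" link, two versions of a node can be compared link by link. This is a minimal sketch that assumes each version's links are available as a mapping from linkSeq to subnode id. The function name and the output keys are hypothetical.

```python
def diff_links(old: dict, new: dict) -> dict:
    """Compare two versions of one node's links.

    Each argument maps linkSeq -> subnode id. Links with the same linkSeq
    are treated as "the same" link in both versions.
    """
    shared = old.keys() & new.keys()
    return {
        "removed": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
        "retargeted": sorted(seq for seq in shared if old[seq] != new[seq]),
    }


previous = {1: 101, 2: 102, 5: 105}   # linkSeq numbers need not be consecutive
current = {1: 101, 5: 205, 6: 106}
print(diff_links(previous, current))
# {'removed': [2], 'added': [6], 'retargeted': [5]}
```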

Arrangements

We use the term "arrangement" in the sense of "an arrangement of nodes". The term "Tree" or "Classification" is used more strictly - it means an arrangement that has a specific higher-order structure. Most of the arrangements accessible by the public API are in fact classifications. Other arrangements are used internally.

Every node belongs to one Arrangement.

Every arrangement has one 'top' node.

Permissions and authorisation are handled at the arrangement level. We do not keep an access control list on every node.

Lifecycle

A node has two events in its lifecycle. It changes from being a draft node to being a (current) persistent node, and it changes from being a current node to being a replaced node. The checkedInAt attribute links to the Event at which a node is made persistent, and the replacedAt attribute links to the Event at which a node became replaced.

Most Events have many nodes checked in and replaced by them. Events have a timestamp, and so a node is current from the timestamp of its checkedInAt event up to but not including the timestamp of its replacedAt event.
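The currency rule above is a half-open interval test. In this sketch, the event timestamps are assumed to be resolved already into plain `datetime` values. The class and field names are hypothetical stand-ins for checkedInAt and replacedAt.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class NodeLifecycle:
    checked_in_at: Optional[datetime]  # timestamp of the checkedInAt event (None: still a draft)
    replaced_at: Optional[datetime]    # timestamp of the replacedAt event (None: not yet replaced)


def is_current_at(node: NodeLifecycle, when: datetime) -> bool:
    """True if the node is current at `when`."""
    if node.checked_in_at is None:
        return False  # draft nodes are not persistent yet
    if when < node.checked_in_at:
        return False
    # Current up to, but not including, the replacedAt timestamp.
    return node.replaced_at is None or when < node.replaced_at
```

At the instant of a replacing event, the old node is no longer current and the new one is. So exactly one version of a node is current at any moment.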

Most of the time, users want to work with the set of current nodes in an arrangement, and our system is optimised towards that. From the point of view of our editors, the set of current nodes "is" the tree.

History

Corresponding to the lifecycle event attributes, a node also has a prev and next attribute (which we inconsistently call its copyOf and replacedBy node in some places). Nodes and links also have a boolean synthetic attribute.

Most of the time, nodes are updated because the nodes beneath them have been updated - a change has been rippled "up the tree" by the versioning algorithm. In these cases, the next and prev attributes will form a doubly-linked list and the node and its links will be marked 'synthetic'.

Cases where this is not true usually indicate a user edit, that is, that something interesting has happened. A newly created node has no 'prev'. If a node is used in a different tree, and edits are then performed in that different tree, the new node will have a 'prev' of the node from which it was copied, but the node from which it was copied will not indicate that the new node is its 'next'. Many cases are possible, and graphically showing them to a user in a meaningful and useful way … is something that would be very nice to have.

And so to find points in a node’s history where the node itself has been edited for some reason, search for nodes whose synthetic attribute is false.
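A minimal sketch of that search, walking backwards through prev links and keeping only non-synthetic nodes. Following prev rather than next is deliberate. As noted above, a copy made in another tree has a prev, but its original does not point forward to it. `HistoryNode` and `edit_points` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class HistoryNode:
    id: int
    synthetic: bool                        # True if created only by rippling a change up
    prev: Optional["HistoryNode"] = None   # the node this one replaced or was copied from


def edit_points(node: HistoryNode) -> Iterator[HistoryNode]:
    """Walk back through prev links, yielding nodes that were edited directly."""
    current: Optional[HistoryNode] = node
    while current is not None:
        if not current.synthetic:
            yield current
        current = current.prev


created = HistoryNode(1, synthetic=False)
rippled = HistoryNode(2, synthetic=True, prev=created)
edited = HistoryNode(3, synthetic=False, prev=rippled)
latest = HistoryNode(4, synthetic=True, prev=edited)
print([n.id for n in edit_points(latest)])  # [3, 1]
```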

A node has a next node if and only if it has a replacedAt event. We therefore have a special End Node whose id is 0 and whose RDF identifier is a constant belonging to the BOA RDF vocabulary [4]. This node has to belong to an arrangement, and so arrangement 0 is the End Tree consisting only of that single node, also having a constant id that is part of the BOA vocabulary [5]. From the point of view of the semantic web, the end node and end tree are each the same semantic-web "thing" wherever they appear.

The primary reason for this is internal - it is so that SQL queries that look at node histories and changes don’t have to outer join on node.next_node_id.

Certain attributes of Arrangements, Nodes, and Links alter how they are treated internally by the system at the lower level of processing. That is, they do not have taxonomic meaning. These types are separate from the RDF types, which are discussed in [Node and Link RDF types].

Arrangement types

An arrangement may be one of the following types:

E: The End Tree

There is only ever one end tree, and it has an id of 0. This is discussed in History.

P: Public classification

This is the most usual type that a user of our API will deal with. Classification trees have a specific higher-level structure discussed in Classification trees.

U: User

User trees will be made up of fragments of other trees.

B: Bookmark

These will consist of one (or perhaps several) nodes that "track" nodes in other arrangements.

Z: System temporary

These are used internally to perform certain operations and discarded.

Node types

A node may be one of the following types:

S: System node

These are nodes which are used internally by the system, but which do not have scientific or taxonomic meaning.

T: Taxonomic node

This node will be associated with a name, and usually with an instance.

D: Document node

These nodes will be collections of value nodes and other document nodes.

Z: Temp node

These are nodes used internally to perform certain operations and discarded.

V: Value node

A value node either has a literal value or is a semantic web URI.
Value nodes never change, are never replaced, and never have subnodes [6]. Value nodes are always attached to supernodes with fixed links. In RDF, value nodes do not appear as nodes in their own right with an identifying URI; they are instead rendered as properties on the Document or Taxonomic node to which they are attached.

A node may also be synthetic (or not). This is discussed in History.

A link has a versioning Method, which may be one of:

V: Versioning

If the subnode of the link is replaced with a new version, then the supernode must be replaced with a new version.
This is the usual case. Versioning links are how the normal "changes must be rippled up" operation of the system works. If the supernode of a versioning link is a current node, then the subnode will also be current.

F: Fixed

If the subnode of the link is replaced with a new version, then do not ripple the change up.
Value nodes are always attached to their supernodes with fixed links. Aside from this, we do not use fixed links at present, although they may be an option in user-created arrangements. If an arrangement uses fixed links, then it is not possible to identify nodes currently attached to the root of the tree without doing a tree-walk.

T: Tracking

If the subnode of the link is replaced with a new version, then update the link to refer to the new version without making a new version of the supernode.
The subnode of a tracking link is always a current node. This even applies to replaced (old) nodes. The tracking links of replaced nodes are not frozen in time because "where the tracking link happened to be at the time this node was replaced" doesn’t mean anything that could not be meant by using a versioning link.

We use tracking links to provide persistent handles to nodes that change over time, that is, a persistent name for whatever the current version of some other node might be. They do not form part of taxonomic trees [7]. See Classification trees for the most important current use of this.
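The three link methods differ only in what happens when a link's subnode is replaced. This is a sketch of that decision for a single link, using the single-letter codes above. The `Link` class, the field names and the return convention are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Link:
    link_seq: int
    method: str       # "V" versioning, "F" fixed, "T" tracking
    subnode_id: int


def on_subnode_replaced(link: Link, old_id: int, new_id: int) -> tuple:
    """Decide what happens to `link` when node old_id is replaced by new_id.

    Returns (link, supernode_needs_new_version).
    """
    if link.subnode_id != old_id:
        return link, False
    if link.method == "V":
        # Versioning: the change ripples up. The returned link belongs to the
        # new version of the supernode; the old supernode keeps its old link.
        return replace(link, subnode_id=new_id), True
    if link.method == "T":
        # Tracking: retarget the link in place, with no new supernode version.
        return replace(link, subnode_id=new_id), False
    if link.method == "F":
        # Fixed: keep pointing at the old version and do not ripple.
        return link, False
    raise ValueError(f"unknown link method: {link.method!r}")
```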

Nodes and links have attributes that carry data. These attributes are not of interest to the low-level maintenance of the tree structure, but they are of interest to whatever it is that is using the tree to store data.

Firstly, and most obviously, T type nodes usually hold a name and instance id. In NSL, we have a 'Name Tree' whose main job is to provide Phylum and Family for generic and subgeneric names so that suitable output can be produced. We also have an 'APC' classification whose job is to hold accepted names and taxa for the Australian Plant Census.

Parallel with the name and instance ids, we also have name and taxon URIs. Name and taxon URIs match the name and instance ids when these ids refer to names held in the local shard. However, having URIs permits us to create branches that terminate at names and taxa with foreign ids.

Along with name and taxon URIs, we also have a 'Resource' URI. This is mainly intended for V (value) type nodes. In RDF, a value may be a typed primitive, or it may be a "resource" - a URI.

Nodes and links also have a Type URI. In RDF, the link type becomes the RDF predicate, and the node type becomes the type of a typed primitive (where the node is a value), or is used as the OWL class of the node.

The uri type may have meaning within the tree. In particular, our APC tree has nodes of type APCConcept, APCExcluded, and DeclaredBT. Their meanings are described in apc.

Physically, our URIs are broken into a namespace and an id. These are named ‹Name|Taxon|Resource›UriNsPart and ‹Name|Taxon|Resource›UriIdPart. The original purpose for splitting the URIs in this way was to make it easier to generate RDF. However, we now use D2R to generate the RDF, which does not use this feature to label the URI prefixes, so this design may be unnecessary.

Uri namespace 0 is always the 'empty' namespace. A namespace of 0 means that the entire URI is in the 'UriIdPart'. The purpose of this is so that in SQL you don’t need to outer join the namespace table.

Uri namespace 1 is always the boatree namespace http://biodiversity.org.au/voc/boa/Tree# . This namespace is the prefix for internal artifacts when they are exposed as RDF.
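This sketch shows how a URI might be split into a namespace part and an id part, and then rejoined. It assumes a namespace table that maps ids to prefixes, with namespace 0 as the empty prefix and namespace 1 as the boatree prefix, as described above. The table layout and the longest-prefix rule are assumptions, not a description of the NSL schema.

```python
# Namespace table: namespace id -> URI prefix.
# 0 is always the empty namespace and 1 the boatree namespace (per the text);
# the table layout itself is an assumption.
NAMESPACES = {
    0: "",
    1: "http://biodiversity.org.au/voc/boa/Tree#",
}


def join_uri(ns_part: int, id_part: str) -> str:
    """Rebuild a full URI from its UriNsPart and UriIdPart."""
    return NAMESPACES[ns_part] + id_part


def split_uri(uri: str) -> tuple:
    """Split a URI using the longest matching prefix; fall back to namespace 0."""
    best = 0
    for ns, prefix in NAMESPACES.items():
        if prefix and uri.startswith(prefix) and len(prefix) > len(NAMESPACES[best]):
            best = ns
    return best, uri[len(NAMESPACES[best]):]
```

Because namespace 0 has an empty prefix, any URI can be stored without a matching namespace row, and `join_uri` never needs a special case. This mirrors why no SQL outer join is needed.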

3.9.4. Classification trees

Classification trees are trees that have a specific higher-order structure, a specific way of using the lower-level data structure and algorithms.

The Arrangement of a classification tree has type P - public classification. The arrangement always has a label, and that label becomes the suffix of the persistent URI identifying the classification.

The top node of the arrangement is a system node S. This node has RDF type classification-node and holds one tracking link to a taxonomic node T of RDF type 'classification-root'.

The classification root is not part of the taxonomy, does not have a name or an instance, and it may have been a mistake to make it a taxonomic node rather than a system node. We have an unnamed root node because

  • having a single taxon at the top level would make it impossible to change the name at the top of the taxonomy while recording the history correctly. If our taxonomists were to decide to "push down" the top name, the only way to do it would be to change the name of the top node and add a new node under it with the previous name. The versioning history would not reflect what actually happened - it would not show that the top node got pushed down and that the new top node is new. Instead it would show that the 'Plantae' APC taxon had its name changed to 'Eukaryota' (or whatever). This would simply be wrong - the Plantae concept would not have changed, it would just have been moved.

  • some taxonomies have multiple taxa at the top level that are not organised into a higher classification. For instance, AFD has 'Animalia' and 'Prototheria' at the top level. It would - again - simply be wrong to insist on a top level taxon of Eukaryota or "All life on Earth", because these names are not part of the Australian Faunal Directory.

The effect of this is that changes in classification trees result in a new classification-root node, and those nodes form a single line of history that can be navigated by looking at their next/prev attributes and the timestamps on their associated Events. The single classification-node acts as a bookmark - its single sublink always points to the current (most recent) classification root.

There are a couple more rules:

A classification only ever has one current node for any given name. That is, names appear only once. Consequently, our API for classification is built entirely around names: "add this name to that name as an excluded name" and so on.

Every node in a classification arrangement belongs to that arrangement. The purpose of this is to expedite the most important and common operation: find a name’s current placement in a classification.

IF a node

  • has the name you are looking for; AND

  • belongs to the classification you are searching; AND

  • is current

THEN

  • it is the current placement of that name in that classification

  • it will have one superlink that links it to a node that

    • also belongs to the classification; AND

    • is current

Both of these conditions need to be checked. Nodes may be included as subnodes in user classifications, and nodes will be placed under multiple copies of a supernode when sibling nodes are updated. However, only one will be current.

And remember - this only applies to Arrangements of type P, which are maintained in such a way that these rules are followed.
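This is an in-memory sketch of the IF/THEN rule above. It checks both conditions: the node itself must match, and exactly one of its supernodes must be current and in the same classification. In the real system this would be a database query. `TreeNode`, its fields and `current_placement` are hypothetical, and "current" is reduced to a boolean.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TreeNode:
    id: int
    name_id: Optional[int]     # name held by a taxonomic node
    arrangement_id: int        # the Arrangement the node belongs to
    current: bool              # checked in and not yet replaced
    supernode_ids: list = field(default_factory=list)  # nodes that link to this one


def current_placement(nodes: list, name_id: int, arrangement_id: int):
    """Return (node, parent) for a name's current placement, or None if absent."""
    by_id = {n.id: n for n in nodes}
    for node in nodes:
        if node.name_id != name_id or node.arrangement_id != arrangement_id or not node.current:
            continue
        # Among all supernodes (old copies, user trees, ...), exactly one
        # should be current and in the same classification.
        parents = [
            by_id[s] for s in node.supernode_ids
            if s in by_id and by_id[s].arrangement_id == arrangement_id and by_id[s].current
        ]
        if len(parents) != 1:
            raise ValueError(f"node {node.id}: expected one current parent, found {len(parents)}")
        return node, parents[0]
    return None


nodes = [
    TreeNode(10, 100, 1, current=False),                        # old copy of the parent
    TreeNode(11, 100, 1, current=True),                         # current parent
    TreeNode(20, 100, 2, current=True),                         # parent in a user tree
    TreeNode(29, 500, 1, current=False, supernode_ids=[10]),     # replaced node for the name
    TreeNode(30, 500, 1, current=True, supernode_ids=[10, 11, 20]),
]
node, parent = current_placement(nodes, name_id=500, arrangement_id=1)
print(node.id, parent.id)  # 30 11
```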

3.10. Tree API for JSON V1.0

The tree JSON API is built to work with AngularJS and similar platforms.

This API comprises three main components:

  • the tree view services;

  • JSON views of nodes, arrangements, and events; and

  • the tree edit services, which require a user login.

TODO: put in correct URLs for these services.

3.10.1. tree view services

listNamespaces

GET (no parameters) to list the known namespaces.

Returns

  • A JSON array of information about namespaces

Example 1

Response

listClassifications

GET namespace to list the classifications in a namespace

Parameters

namespace: the name of a namespace

Returns

  • A JSON array of the URIs of classifications in the namespace

Example 1

Response

listWorkspaces

GET namespace to list the workspaces in a namespace

Parameters

namespace: the name of a namespace

Returns

  • A JSON array of the URIs of workspaces in the namespace

Example 1

Response

permissions

3.10.2. JSON views of nodes, arrangements, and events

3.10.3. Tree Edit Services

createWorkspace
deleteWorkspace
updateWorkspace

3.11. SPARQL and the semantic web

The NSL system provides a SPARQL endpoint which runs queries written in SPARQL ("SPARQL Protocol and RDF Query Language"), and which serves NSL data as sets of RDF "triples".

This endpoint is located at https://biodiversity.org.au/sparql . Use it as per the standard SPARQL protocol. In a nutshell: pass a URL-encoded SPARQL query as a parameter named 'query' (GET or POST).

The endpoint will respond with an RDF document by default, but can respond with various other formats as specified in the HTTP Accept header. Our endpoint is a Jena Joseki instance, and more details on what it will accept may be available in the Joseki documentation.
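As a minimal sketch of a GET call, the query goes in a URL-encoded `query` parameter and the preferred response format goes in the Accept header. The helper name is hypothetical. The default of `application/rdf+xml` follows the "RDF document by default" behaviour described above. Which other formats are honoured depends on the Joseki configuration.

```python
from urllib.parse import urlencode
from urllib.request import Request

SPARQL_ENDPOINT = "https://biodiversity.org.au/sparql"


def sparql_get_request(query: str, accept: str = "application/rdf+xml") -> Request:
    """Build (but do not send) a GET request carrying the query as `query`."""
    url = f"{SPARQL_ENDPOINT}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": accept})


req = sparql_get_request("SELECT * where { ?s ?p ?o }")
print(req.full_url)
# Send it with urllib.request.urlopen(req) when network access is available.
```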

The query itself is written in SPARQL, a query language whose syntax resembles SQL. SPARQL is documented in the W3C SPARQL specifications. Although this document may show some SPARQL examples, it is not intended as a SPARQL primer. The main focus here is a description of what data we have available at the SPARQL endpoint.

3.11.1. Linked data and semantic web integration

The triples available at our endpoint are intended to be resolvable linked data URIs.

Static vocabulary

Vocabulary items - names of predicates, enumerated values - resolve to ontology documents either externally available or hosted at http://biodiversity.org.au/voc/ .

For instance, some of our RDF objects have an rdf:type of http://biodiversity.org.au/voc/boa/InstanceNote#Key . An HTTP GET of this URL will result in a redirect to http://biodiversity.org.au/voc/boa/InstanceNote.rdf, which is the document containing the definition of that RDF class along with other things.

Data items

Entities corresponding to data in our system are identified by NSL identifiers, eg: http://id.biodiversity.org.au/name/apni/106541 .

Entered into a web browser (eg, by clicking the link above), these identifiers result in a linked data 303 redirect to an HTML page supplied by our service layer.

However, if the requested content type (the HTTP Accept header) is application/rdf+xml, or if an .rdf suffix is appended, then this will result in a 303 redirect to the sparql endpoint. In this particular case, the redirect will be to https://biodiversity.org.au/sparql/?query=DESCRIBE+%3Chttp://id.biodiversity.org.au/name/apni/106541%3E .

This has the effect of sending the query DESCRIBE <http://id.biodiversity.org.au/name/apni/106541> to the SPARQL endpoint, which responds with an RDF document produced by Joseki.

That is: linked data for data items is simply served up from our sparql endpoint, via HTTP redirects.
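The redirect target for an identifier can be reproduced by wrapping the identifier in a DESCRIBE query and URL-encoding it. This sketch infers the encoding from the single example above: spaces become '+', angle brackets are percent-encoded, and ':' and '/' are left as-is. It is not taken from the redirect service's code. `describe_url` is a hypothetical name.

```python
from urllib.parse import quote_plus


def describe_url(identifier: str) -> str:
    """Build the SPARQL DESCRIBE URL for an NSL identifier."""
    query = f"DESCRIBE <{identifier}>"
    # Spaces become '+', '<' and '>' are percent-encoded, ':' and '/' are kept.
    return "https://biodiversity.org.au/sparql/?query=" + quote_plus(query, safe=":/")


print(describe_url("http://id.biodiversity.org.au/name/apni/106541"))
```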

The goal of this "round trip" integration of the sparql service and semantic web standards is that our content should be machine-friendly. It should be possible for tools like Protege, web-based semantic web browsers, even reasoning engines, to be able to work over our content. It should be possible to mount our SPARQL endpoint as a named graph in an external JENA installation, and for queries run against that external installation to simply work.

3.11.2. Overall content

The triples available at our sparql endpoint are organised into named graphs. The default graph contains triples describing what named graphs are available. The intent of doing this is that a simple SELECT * where { ?s ?p ?o } will pull back useful information without the server attempting to serve up the entire content of our data in response.

A more useful organisation of this data can be obtained by way of

(link)

Or

(link)

Table 1. Named graphs available at

Graph URI         Title
g:AFD_PRF         AFD profile data
g:AFD_PUB_CIT     AFD Publication Citations
g:AFD_TAX_CON     AFD Taxa
g:AFD_TAX_NAM     AFD Names
g:AFD_TREE        AFD Taxonomy
g:APC_TREE        APC Taxonomy
g:APNI_PRF        APNI profile data
g:APNI_PUB_CIT    APNI Publication Citations
g:APNI_TAX_CON    APNI taxa
g:APNI_TAX_NAM    APNI Names
g:APNI_TREE       Taxonomy according to the reference
g:COL_TAX_CON     Accepted names and synonyms in the Catalogue of Life
g:COL_TAX_NAM     Taxon names in Catalogue of Life that do not appear elsewhere in our data.
g:NSL_APNI        NSL APNI
g:NSL_OZMOSS      NSL OZMOSS
g:afd             AFD complete
g:apni            APNI complete
g:col             COL complete
g:dc_voc          Dublin Core vocabulary
g:dwc_voc         Darwin Core vocabulary
g:ibis_voc        Complete IBIS vocabulary
g:ibis_voc_local  IBIS vocabulary
g:meta            Service metadata
g:names           All names
g:nsl             National Species List
g:taxa            All taxa
g:tdwg_voc        TDWG Vocabulary

Vocabulary and metadata

Our sparql instance contains rdf ontologies for the various vocabularies we use. That is: our sparql store loads a copy of the linked data ontology documents. With this content, a sparql query can pull back predicate labels and descriptions.

g:ibis_voc_local contains only those terms defined at biodiversity.org.au, and g:ibis_voc is the union of all the vocabulary graphs and is probably the most useful. For instance, to list all classes defined in all vocabularies loaded into the sparql dataset:

This, of course, doesn’t mean that we have objects that are instances of all of these types, merely that they are defined in the vocabularies that we have loaded into our data.

This means that a sparql query can pull back titles for attributes. Consider:

We get a list of labels and values for all properties of the name. We could go further and, for each value that is an RDF resource, find the label of the type of the value, and so on.

Outdated APNI, AFD, and CoL data

Our data set contains a static data extract from the Australian Faunal Directory, the Australian Plant Name Index, and data pulled from the Catalogue of Life 2011 CD-ROM. This data is very out of date and will probably not be updated again (although a fresh extract of AFD may still be possible).

APNI is being superseded by NSL, and AFD will also become available as an NSL-structured data set.

The old data is presented as far as possible using TDWG terms (classes and predicates) with various additions from local vocabularies where the TDWG ontology did not have terms matching closely enough to the meaning of fields in our data. This document will not attempt to describe the structure and meaning of this deprecated data.

The g:afd, g:apni and g:col graphs are a union of the component parts of the AFD, APNI, and CoL datasets and are probably the most useful way to run queries against that data.

NSL data

The NSL datasets, g:NSL_APNI and g:NSL_OZMOSS, are live links to the underlying NSL tables, provided by way of the D2RQ Jena library. The g:nsl graph is a union of these two graphs and the NSL vocabulary. These graphs present live data as RDF triples, queryable by SPARQL. To make sense of the triples available in these graphs, some understanding of the NSL data model is required.

3.11.3. The NSL data model and its RDF representation

TODO: this will probably need some images and diagrams

3.12. Export

3.12.1. Exports

There will be a number of specific exports available via the NSL. These exports are available via the export index found at

Names export

The names export contains all the names currently contained in the shard.

example APNI name output

Taxon export

The taxon export contains a list of Accepted names for the chosen classification.

example APC taxon output

Notes

  1. Nodes start in a draft state that permits edits.

  2. That is: the content does not change. The state obviously does, but only in respect of the internal functioning of the tree, not in terms of the node’s nomenclatural or taxonomic meaning or status.

  3. Timestamps are often problematic. Common problems involve confusion about time zones and system clocks that are not properly set.

  4. http://www.biodiversity.org.au/voc/boa/Tree#END-NODE

  5. http://www.biodiversity.org.au/voc/boa/Tree#END-TREE

  6. It doesn’t make sense to say that 1 has changed into 2. It only makes sense to say that something has a property that has changed from 1 to 2. That is: you are talking about the link, which belongs to the supernode, not about the value node itself.

  7. It might be reasonable for profile data to be linked to with tracking links, depending on how it is managed. Such a link would be more of an FYI inclusion: it would mean that the content of the document does not form part of the identity of the taxon.