TTNWW - TST Tools for the Dutch Language as Web services in a Workflow

<?xml version="1.0" encoding="UTF-8"?>
<cmd:CMD xmlns:cmd="http://www.clarin.eu/cmd/1"
         xmlns:cmdp="http://www.clarin.eu/cmd/1/profiles/clarin.eu:cr1:p_1342181139640"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         CMDVersion="1.2"
         xsi:schemaLocation="http://www.clarin.eu/cmd/1 https://infra.clarin.eu/CMDI/1.x/xsd/cmd-envelop.xsd http://www.clarin.eu/cmd/1/profiles/clarin.eu:cr1:p_1342181139640 https://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/1.1/profiles/clarin.eu:cr1:p_1342181139640/1.2/xsd">
   <cmd:Header>
      <cmd:MdCreator>rogierkraf</cmd:MdCreator>
      <cmd:MdCreationDate>2013-11-30+02:00</cmd:MdCreationDate>
      <cmd:MdProfile>clarin.eu:cr1:p_1342181139640</cmd:MdProfile>
      <cmd:MdCollectionDisplayName>CLARIN Netherlands</cmd:MdCollectionDisplayName>
   </cmd:Header>
   <cmd:Resources>
      <cmd:ResourceProxyList>
		       <cmd:ResourceProxy id="TTNWW001">
			         <cmd:ResourceType>Resource</cmd:ResourceType>
			         <cmd:ResourceRef>http://yago.meertens.knaw.nl/apache/TTNWW/</cmd:ResourceRef>
		       </cmd:ResourceProxy>
	     </cmd:ResourceProxyList>
      <cmd:JournalFileProxyList/>
      <cmd:ResourceRelationList/>
   </cmd:Resources>
   <cmd:Components>
      <cmdp:ClarinSoftwareDescription>
         <cmdp:GeneralInfo>
            <cmdp:name xml:lang="nld">TTNWW</cmdp:name>
            <cmdp:name xml:lang="eng">TTNWW</cmdp:name>
            <cmdp:title xml:lang="nld">TTNWW - TST Tools voor het Nederlands als Webservices in een Workflow</cmdp:title>
            <cmdp:title xml:lang="eng">TTNWW - TST Tools for the Dutch Language as Web services in a Workflow</cmdp:title>
            <cmdp:publicationYear>2013</cmdp:publicationYear>
            <cmdp:url>http://yago.meertens.knaw.nl/apache/TTNWW/</cmdp:url>
            <cmdp:CLARINCentre>Meertens Institute</cmdp:CLARINCentre>
            <cmdp:OriginalSource>http://portal.clarin.nl/node/1964</cmdp:OriginalSource>
            <cmdp:ReleaseStatus>
               <cmdp:LifeCycleStatus>withdrawn</cmdp:LifeCycleStatus>
               <cmdp:lastUpdate>2018-09-07</cmdp:lastUpdate>
            </cmdp:ReleaseStatus>
            <cmdp:NationalProjects>
               <cmdp:Project>
                  <cmdp:name>CLARIN-NL</cmdp:name>
                  <cmdp:title>CLARIN in the Netherlands</cmdp:title>
                  <cmdp:id>184.021.003</cmdp:id>
                  <cmdp:funder>NWO</cmdp:funder>
                  <cmdp:url>http://www.clarin.nl</cmdp:url>
                  <cmdp:Contact>
                     <cmdp:Person>Jan Odijk</cmdp:Person>
                     <cmdp:Role>National Coordinator</cmdp:Role>
                     <cmdp:Address>Utrecht, the Netherlands</cmdp:Address>
                     <cmdp:Email>j.odijk@uu.nl</cmdp:Email>
                     <cmdp:Department>UiL-OTS</cmdp:Department>
                     <cmdp:Organisation>Utrecht University</cmdp:Organisation>
                  </cmdp:Contact>
                  <cmdp:Duration>
                     <cmdp:StartYear>2009</cmdp:StartYear>
                     <cmdp:CompletionYear>2015</cmdp:CompletionYear>
                  </cmdp:Duration>
               </cmdp:Project>
			         </cmdp:NationalProjects>
            <cmdp:Country>
			            <cmdp:CountryName>Netherlands</cmdp:CountryName>
               <cmdp:CountryCoding>NL</cmdp:CountryCoding>
            </cmdp:Country>
            <cmdp:Description>
				           <cmdp:Description>TTNWW integrates and makes available existing Language Technology (LT) software components for the Dutch language that have been developed in the STEVIN and CGN projects. The LT components (for text and speech) are made available as web-services in a simplified workflow system that enables researchers without much technical background to use standard LT workflow recipes.
				 The web services are available in two separate domains: "Text" and "Speech" processing. For "Text", workflows for the following functionality are offered by TTNWW:
				 - Orthographic Normalisation using TICCLops (version CLARIN-NL 1.0);
				 - Part of Speech Tagging, Lemmatisation, Chunking, limited Multiword Unit Recognition, and Grammatical Relation Assignment by Frog (Version 012.012);
				 - Syntactic Parsing (including grammatical relation assignment, limited named entity recognition, and limited multiword unit recognition) by the Alpino Parser (version 1.3);
				 - Semantic Annotation;
				 - Named Entity Recognition;
				 - Co-reference Assignment.
				 
				 For "Speech", the following workflows are offered:
				 - Automatic Transcription of speech files using a Netherlands Dutch acoustic model;
				 - Automatic Transcription of speech files using a Flemish Dutch acoustic model;
				 - Conversion of the input speech file to the required sampling rate, followed by automatic transcription.
				 
				 The TTNWW services have been created in a Dutch and Flemish collaboration project building on the results of past Dutch and Flemish projects. The web services are partly deployed in the SURF-SARA BiG-Grid cloud or at CLARIN centres in the Netherlands and at CLARIN VL University partners.
				 The architecture of the TTNWW portal consists of several components and follows the principles of Service Oriented Architecture (SOA). The TTNWW GUI front-end is a Flex module that communicates with the TTNWW web application, which keeps track of the different sessions and knows which LT recipes are available. TTNWW communicates assignments (workflow specifications) to the WorkflowService, which evaluates the requested workflow and requests the DeploymentService to start the required LT web services. After initialization of the LT web services, the workflow specification is sent to the Taverna Server, which takes further care of the workflow.
				 The CLAM (Computational Linguistics Application Mediator) wrapper software facilitates wrapping applications that were originally designed as standalone tools: it allows for easy and transparent transformation of such applications into RESTful web services. The CLAM software has been used extensively in the TTNWW project for both text and speech processing tools. With the exception of Alpino and MBSRL, all web services operate on CLAM wrappers.
				 Given the number of web services involved in the TTNWW project and the possibilities offered by the cloud environment, the preferred method of delivering the web service installations was delivery of complete virtual machine images by the LT providers. These could be uploaded directly into the cloud environment, thus relieving the CLARIN centres and LT providers of the originally foreseen task of running the web services themselves. A potential advantage of this method, which has not yet been exploited in the project, is that these images may also be delivered directly to end users, who can then run them in a local configuration using virtualization software such as VMware or VirtualBox.
				 The workflow engine used in the project was Taverna. Built on top of this was a number of selectable task recipes, following a task-oriented approach in line with the premise that users with little or no technical expertise should be able to use the system. In this context, tasks are understood in terms of the end results of processes such as semantic role labelling, POS tagging or syntactic analysis, and ready-made workflows are constructed that can be readily used by the end user. </cmdp:Description>
            </cmdp:Description>
         </cmdp:GeneralInfo>
         <cmdp:SoftwareFunction>
            <cmdp:toolCategory>NLP development aid</cmdp:toolCategory>
            <cmdp:toolCategory>written language tool</cmdp:toolCategory>
            <cmdp:toolCategory>spoken language tool</cmdp:toolCategory>
            <cmdp:ToolTasks>
               <cmdp:toolTask>grammatical relation assignment</cmdp:toolTask>
               <cmdp:toolTask>coreference resolution</cmdp:toolTask>
               <cmdp:toolTask>corpus processing</cmdp:toolTask>
               <cmdp:toolTask>dependency parsing</cmdp:toolTask>
               <cmdp:toolTask>lemmatisation</cmdp:toolTask>
               <cmdp:toolTask>multiword unit identification</cmdp:toolTask>
               <cmdp:toolTask>named entity recognition</cmdp:toolTask>
               <cmdp:toolTask>orthographic normalisation</cmdp:toolTask>
               <cmdp:toolTask>part of speech tagging</cmdp:toolTask>
               <cmdp:toolTask>semantic role labeling</cmdp:toolTask>
               <cmdp:toolTask>chunking</cmdp:toolTask>
               <cmdp:toolTask>parsing</cmdp:toolTask>
               <cmdp:toolTask>speech recognition</cmdp:toolTask>
               <cmdp:toolTask>speech transcription</cmdp:toolTask>
               <cmdp:toolTask>tokenisation</cmdp:toolTask>
			            <cmdp:toolTask>up/down sampling</cmdp:toolTask>
			         </cmdp:ToolTasks>
            <cmdp:ResearchPhases>
               <cmdp:ResearchPhase>Enriching Data</cmdp:ResearchPhase>
            </cmdp:ResearchPhases>
            <cmdp:ResearchDomains>
				           <cmdp:researchDomain>Linguistics</cmdp:researchDomain>
				           <cmdp:researchDomain>Communication and Media Studies</cmdp:researchDomain>
				           <cmdp:researchDomain>History</cmdp:researchDomain>
				           <cmdp:researchDomain>Oral History</cmdp:researchDomain>
			         </cmdp:ResearchDomains>
            <cmdp:LinguisticsSubject>
               <cmdp:linguisticsSubject>discourse analysis</cmdp:linguisticsSubject>
				           <cmdp:Description>
					             <cmdp:Description/>
				           </cmdp:Description>
            </cmdp:LinguisticsSubject>
            <cmdp:LinguisticsSubject>
               <cmdp:linguisticsSubject>orthography</cmdp:linguisticsSubject>
				           <cmdp:Description>
					             <cmdp:Description/>
				           </cmdp:Description>
            </cmdp:LinguisticsSubject>
            <cmdp:LinguisticsSubject>
               <cmdp:linguisticsSubject>semantics</cmdp:linguisticsSubject>
				           <cmdp:Description>
					             <cmdp:Description/>
				           </cmdp:Description>
            </cmdp:LinguisticsSubject>
            <cmdp:LinguisticsSubject>
               <cmdp:linguisticsSubject>syntax</cmdp:linguisticsSubject>
				           <cmdp:Description>
					             <cmdp:Description/>
				           </cmdp:Description>
            </cmdp:LinguisticsSubject>
            <cmdp:LanguageVariety>
               <cmdp:languageDependent>yes</cmdp:languageDependent>
               <cmdp:Language>
                  <cmdp:LanguageName>Dutch</cmdp:LanguageName>
                  <cmdp:ISO639>
                     <cmdp:iso-639-3-code>nld</cmdp:iso-639-3-code>
                  </cmdp:ISO639>
               </cmdp:Language>
               <cmdp:Centuries>
					             <cmdp:centuryDependent>yes</cmdp:centuryDependent>
					             <cmdp:CenturyInterval>
					                <cmdp:centuryFrom>20</cmdp:centuryFrom>
					                <cmdp:centuryThrough>21</cmdp:centuryThrough>
					             </cmdp:CenturyInterval>
				           </cmdp:Centuries>
            </cmdp:LanguageVariety>
         </cmdp:SoftwareFunction>
         <cmdp:SoftwareImplementation>
            <cmdp:distributionMedium>Online available</cmdp:distributionMedium>
            <cmdp:UserInterface>
               <cmdp:interfaceType>graphical user interface</cmdp:interfaceType>
               <cmdp:applicationType>web application</cmdp:applicationType>
            </cmdp:UserInterface>
            <cmdp:Input>
	              <cmdp:characterEncoding>UTF8</cmdp:characterEncoding>
	              <cmdp:inputType>text</cmdp:inputType>
	              <cmdp:inputResource>textual input</cmdp:inputResource>
	              <cmdp:Schema>
                  <cmdp:schemaname/>
               </cmdp:Schema>
               <cmdp:MimeType>
		                <cmdp:MimeType>text/plain</cmdp:MimeType>
	              </cmdp:MimeType>	
            </cmdp:Input>
            <cmdp:Input>
	              <cmdp:inputType>audio</cmdp:inputType>
	              <cmdp:inputResource>audio input</cmdp:inputResource>
	              <cmdp:Schema>
                  <cmdp:schemaname/>
               </cmdp:Schema>
               <cmdp:MimeType>
		                <cmdp:MimeType>audio/wav</cmdp:MimeType>
	              </cmdp:MimeType>	
            </cmdp:Input>
            <cmdp:Output>	
	              <cmdp:outputType>text</cmdp:outputType>
	              <cmdp:characterEncoding>UTF8</cmdp:characterEncoding>
	              <cmdp:outputResource>orthographic normalisation</cmdp:outputResource>
	              <cmdp:Schema>
                  <cmdp:schemaname>idiosyncratic XML-like schema (undefined)</cmdp:schemaname>
               </cmdp:Schema>
               <cmdp:MimeType>
		                <cmdp:MimeType>text/xml</cmdp:MimeType>
	              </cmdp:MimeType>	
            </cmdp:Output>
            <cmdp:Output>	
	              <cmdp:outputType>text</cmdp:outputType>
	              <cmdp:characterEncoding>UTF8</cmdp:characterEncoding>
	              <cmdp:outputResource>coreference resolution</cmdp:outputResource>
	              <cmdp:Schema>
                  <cmdp:schemaname>idiosyncratic XML-like schema (undefined)</cmdp:schemaname>
               </cmdp:Schema>
               <cmdp:MimeType>
		                <cmdp:MimeType>text/xml</cmdp:MimeType>
	              </cmdp:MimeType>	
            </cmdp:Output>
            <cmdp:Output>	
	              <cmdp:outputType>text</cmdp:outputType>
	              <cmdp:characterEncoding>UTF8</cmdp:characterEncoding>
	              <cmdp:outputResource>grammatically enriched text (tokenisation, lemmatisation, part of speech tagging, chunking, named entity recognition, limited multi-word identification, limited dependency parsing)</cmdp:outputResource>
	              <cmdp:Schema>
                  <cmdp:schemaname>Frog CSV output</cmdp:schemaname>
               </cmdp:Schema>
               <cmdp:MimeType>
		                <cmdp:MimeType>text/csv</cmdp:MimeType>
	              </cmdp:MimeType>	
            </cmdp:Output>
            <cmdp:Output>	
	              <cmdp:outputType>text</cmdp:outputType>
	              <cmdp:characterEncoding>UTF8</cmdp:characterEncoding>
	              <cmdp:outputResource>grammatically enriched text (tokenisation, lemmatisation, part of speech tagging, chunking, named entity recognition, limited multi-word identification, limited dependency parsing)</cmdp:outputResource>
	              <cmdp:Schema>
                  <cmdp:schemaname>Frog FoLiA output</cmdp:schemaname>
               </cmdp:Schema>
               <cmdp:MimeType>
		                <cmdp:MimeType>text/xml</cmdp:MimeType>
	              </cmdp:MimeType>	
            </cmdp:Output>
            <cmdp:Output>	
	              <cmdp:outputType>text</cmdp:outputType>
	              <cmdp:characterEncoding>UTF8</cmdp:characterEncoding>
	              <cmdp:outputResource>parsed text, text enriched with semantic role assignment</cmdp:outputResource>
	              <cmdp:Schema>
                  <cmdp:schemaname>LASSY DTD</cmdp:schemaname>
                  <cmdp:schemaURL>http://www.let.rug.nl/vannoord/Lassy/alpino_ds.dtd</cmdp:schemaURL>
               </cmdp:Schema>
               <cmdp:MimeType>
		                <cmdp:MimeType>text/xml</cmdp:MimeType>
	              </cmdp:MimeType>	
            </cmdp:Output>
            <cmdp:Output>	
	              <cmdp:outputType>text</cmdp:outputType>
	              <cmdp:characterEncoding>UTF8</cmdp:characterEncoding>
	              <cmdp:outputResource>speech transcription</cmdp:outputResource>
	              <cmdp:Schema>
                  <cmdp:schemaname>idiosyncratic CSV, space separated, .stm, .seg, .ssa, .ctm_pp file extensions</cmdp:schemaname>
               </cmdp:Schema>
               <cmdp:MimeType>
		                <cmdp:MimeType>text/csv</cmdp:MimeType>
	              </cmdp:MimeType>	
            </cmdp:Output>
            <cmdp:Output>	
	              <cmdp:outputType>text</cmdp:outputType>
	              <cmdp:characterEncoding>UTF8</cmdp:characterEncoding>
	              <cmdp:outputResource>speech transcription</cmdp:outputResource>
	              <cmdp:Schema>
                  <cmdp:schemaname>idiosyncratic XML-like (undefined)</cmdp:schemaname>
               </cmdp:Schema>
               <cmdp:MimeType>
		                <cmdp:MimeType>text/plain</cmdp:MimeType>
	              </cmdp:MimeType>	
            </cmdp:Output>
            <cmdp:Output>	
	              <cmdp:outputType>text</cmdp:outputType>
	              <cmdp:characterEncoding>UTF8</cmdp:characterEncoding>
	              <cmdp:outputResource>speech transcription</cmdp:outputResource>
	              <cmdp:Schema>
                  <cmdp:schemaname>undefined csv format, space-separated</cmdp:schemaname>
               </cmdp:Schema>
               <cmdp:MimeType>
		                <cmdp:MimeType>text/csv</cmdp:MimeType>
	              </cmdp:MimeType>	
            </cmdp:Output>
         </cmdp:SoftwareImplementation>
         <cmdp:Access>
            <cmdp:ResourceLicense>
               <cmdp:license>unknown</cmdp:license>
               <cmdp:distributionType>public</cmdp:distributionType>
               <cmdp:url>http://yago.meertens.knaw.nl/apache/TTNWW/</cmdp:url>
               <cmdp:Price>
                  <cmdp:amount>0</cmdp:amount>
                  <cmdp:ISO4217>
                     <cmdp:iso-4217-currency>EUR</cmdp:iso-4217-currency>
                  </cmdp:ISO4217>
               </cmdp:Price>
            </cmdp:ResourceLicense>
            <cmdp:Contact>
			            <cmdp:Person>Daan Broeder</cmdp:Person>
               <cmdp:Email>Daan.Broeder@meertens.knaw.nl</cmdp:Email>
               <cmdp:Organisation xml:lang="eng">Meertens Institute</cmdp:Organisation>
            </cmdp:Contact>
         </cmdp:Access>
         <cmdp:ResourceDocumentation>
            <cmdp:Documentation>
               <cmdp:title>TTNWW handleiding</cmdp:title>
               <cmdp:documentationTarget>user</cmdp:documentationTarget>
               <cmdp:url>http://yago.meertens.knaw.nl/apache/TTNWW/assets/TTNWW.pdf</cmdp:url>
               <cmdp:ISO639>
                  <cmdp:iso-639-3-code>nld</cmdp:iso-639-3-code>
               </cmdp:ISO639>
            </cmdp:Documentation>
			         <cmdp:Publication>
               <cmdp:publicationCategory>in book</cmdp:publicationCategory>
               <cmdp:publicationPurpose>scientific background</cmdp:publicationPurpose>
               <cmdp:peerReviewStatus>yes</cmdp:peerReviewStatus>
               <cmdp:Description>
                  <cmdp:Description LanguageID="eng">Kemps-Snijders, M, Schuurman, I, Daelemans, W, Demuynck, K, Desplanques, B, Hoste, V, Huijbregts, M, Martens, J-P, Paulussen, H, Pelemans, J, Reynaert, M, Vandeghinste, V, van den Bosch, A, van den Heuvel, H, van Gompel, M, van Noord, G and Wambacq, P. 2017. TTNWW to the Rescue: No Need to Know How to Handle Tools and Resources. In: Odijk, J and van Hessen, A. (eds.) CLARIN in the Low Countries, Pp. 83–93. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.7. License: CC-BY 4.0</cmdp:Description>
               </cmdp:Description>
            </cmdp:Publication>
            <cmdp:Pictures>
			            <cmdp:picture height="400" type="other" width="400">
				  http://dev.clarin.nl/sites/default/files/ttnww.jpg
			   </cmdp:picture>
			         </cmdp:Pictures>
			      </cmdp:ResourceDocumentation>
         <cmdp:SoftwareDevelopment>
            <cmdp:Project>
               <cmdp:name>TTNWW</cmdp:name>
               <cmdp:title>TTNWW - TST Tools voor het Nederlands als Webservices in een Workflow</cmdp:title>
               <cmdp:funder>?</cmdp:funder>
               <cmdp:url>http://portal.clarin.nl/node/1964</cmdp:url>
               <cmdp:Contact>
                  <cmdp:Email>Daan.Broeder@meertens.knaw.nl</cmdp:Email>
               </cmdp:Contact>
               <cmdp:Duration/>
            </cmdp:Project>
            <cmdp:Creator>
               <cmdp:Contact>
                  <cmdp:Email>Daan.Broeder@meertens.knaw.nl</cmdp:Email>
               </cmdp:Contact>
            </cmdp:Creator>
         </cmdp:SoftwareDevelopment>
         <cmdp:TechnicalInfo>
            <cmdp:ImplementationLanguage>
               <cmdp:implementationLanguage>unknown</cmdp:implementationLanguage>
               <cmdp:version>unknown</cmdp:version>
            </cmdp:ImplementationLanguage>
         </cmdp:TechnicalInfo>
      </cmdp:ClarinSoftwareDescription>
   </cmd:Components>
</cmd:CMD>