INPOLDER: Integrated Parser and Lemmatizer Dutch in Retrospect

<?xml version="1.0" encoding="UTF-8"?>
<cmd:CMD xmlns:cmd="http://www.clarin.eu/cmd/1"
         xmlns:cmdp="http://www.clarin.eu/cmd/1/profiles/clarin.eu:cr1:p_1342181139640"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         CMDVersion="1.2"
         xsi:schemaLocation="http://www.clarin.eu/cmd/1 https://infra.clarin.eu/CMDI/1.x/xsd/cmd-envelop.xsd http://www.clarin.eu/cmd/1/profiles/clarin.eu:cr1:p_1342181139640 https://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/1.1/profiles/clarin.eu:cr1:p_1342181139640/1.2/xsd">
   <cmd:Header>
      <cmd:MdCreator>rogierkraf</cmd:MdCreator>
      <cmd:MdCreationDate>2013-11-30+02:00</cmd:MdCreationDate>
      <cmd:MdProfile>clarin.eu:cr1:p_1342181139640</cmd:MdProfile>
      <cmd:MdCollectionDisplayName>CLARIN Netherlands</cmd:MdCollectionDisplayName>
   </cmd:Header>
   <cmd:Resources>
      <cmd:ResourceProxyList>
         <cmd:ResourceProxy id="INPOLDER001">
            <cmd:ResourceType>Resource</cmd:ResourceType>
            <cmd:ResourceRef>http://194.171.119.69/InPolderClient/inpolder.html</cmd:ResourceRef>
         </cmd:ResourceProxy>
      </cmd:ResourceProxyList>
      <cmd:JournalFileProxyList/>
      <cmd:ResourceRelationList/>
   </cmd:Resources>
   <cmd:Components>
      <cmdp:ClarinSoftwareDescription>
         <cmdp:GeneralInfo>
            <cmdp:name xml:lang="eng">INPOLDER</cmdp:name>
            <cmdp:title xml:lang="eng">INPOLDER: Integrated Parser and Lemmatizer Dutch in Retrospect</cmdp:title>
            <cmdp:publicationYear>unknown</cmdp:publicationYear>
            <cmdp:url>http://194.171.119.69/InPolderClient/inpolder.html</cmdp:url>
            <cmdp:CLARINCentre>Meertens Institute</cmdp:CLARINCentre>
            <cmdp:OriginalSource>http://portal.clarin.nl/node/1927</cmdp:OriginalSource>
            <cmdp:ReleaseStatus>
               <cmdp:LifeCycleStatus>published</cmdp:LifeCycleStatus>
               <cmdp:lastUpdate>2015-01-01</cmdp:lastUpdate>
            </cmdp:ReleaseStatus>
            <cmdp:NationalProjects>
               <cmdp:Project>
                  <cmdp:name>CLARIN-NL</cmdp:name>
                  <cmdp:title>CLARIN in the Netherlands</cmdp:title>
                  <cmdp:id>184.021.003</cmdp:id>
                  <cmdp:funder>NWO</cmdp:funder>
                  <cmdp:url>http://www.clarin.nl</cmdp:url>
                  <cmdp:Contact>
                     <cmdp:Person>Jan Odijk</cmdp:Person>
                     <cmdp:Role>National Coordinator</cmdp:Role>
                     <cmdp:Address>Utrecht, the Netherlands</cmdp:Address>
                     <cmdp:Email>j.odijk@uu.nl</cmdp:Email>
                     <cmdp:Department>UiL-OTS</cmdp:Department>
                     <cmdp:Organisation>Utrecht University</cmdp:Organisation>
                  </cmdp:Contact>
                  <cmdp:Duration>
                     <cmdp:StartYear>2009</cmdp:StartYear>
                     <cmdp:CompletionYear>2015</cmdp:CompletionYear>
                  </cmdp:Duration>
               </cmdp:Project>
            </cmdp:NationalProjects>
            <cmdp:Country>
               <cmdp:CountryName>Netherlands</cmdp:CountryName>
               <cmdp:CountryCoding>NL</cmdp:CountryCoding>
            </cmdp:Country>
            <cmdp:Description>
               <cmdp:Description>INPOLDER (Integrated Parser and Lemmatizer of Dutch in Retrospect) is a tool that performs morphological tagging, lemmatization, and syntactic parsing of historical Dutch texts. It is built on the Adelheid tool (tagging and lemmatization) and the Collins-Bikel statistical parser.
				The Dutch historical record is an essential part of Dutch cultural heritage, and it is vital that it be made accessible for research into a wide range of historical and linguistic questions. In the transition from the Middle Ages to the modern era, the Netherlands developed from a region with a diverse group of dialects (Hollandic, Brabantic, Flemish, North-eastern, Limburgian) into a country with a standard language, and there is good reason to believe that this process was an extremely dynamic one. Systematic research into these processes, which affect syntax, phonology, morphology and spelling, cannot be done without access to lemmatized, tagged and parsed corpora of historical Dutch. In recent years, a tagger-lemmatizer has been developed by Hans van Halteren (Adelheid, also available in the CLARIN infrastructure). INPOLDER complements this enrichment tool with a parser for historical Dutch.
				
				The INPOLDER parser is trained on a subset of the corpus of fourteenth-century texts (Corpus van Reenen/Mulder, CRM; van Reenen and Mulder, 1993; Rem, 2003) and a subset of the Drenthe corpus (DC). CRM consists of 2700 charters from 345 places of origin. The corpus was designed to be representative of local language use in Middle Dutch and to be suitable for all types of linguistic research.</cmdp:Description>
            </cmdp:Description>
         </cmdp:GeneralInfo>
         <cmdp:SoftwareFunction>
            <cmdp:toolCategory>annotation tool</cmdp:toolCategory>
            <cmdp:toolCategory>written language tool</cmdp:toolCategory>
            <cmdp:ToolTasks>
               <cmdp:toolTask>corpus processing</cmdp:toolTask>
               <cmdp:toolTask>constituency-based parsing</cmdp:toolTask>
               <cmdp:toolTask>morphosyntactic tagging</cmdp:toolTask>
               <cmdp:toolTask>lemmatisation</cmdp:toolTask>
            </cmdp:ToolTasks>
            <cmdp:ResearchPhases>
               <cmdp:ResearchPhase>Enriching Data</cmdp:ResearchPhase>
            </cmdp:ResearchPhases>
            <cmdp:ResearchDomains>
               <cmdp:researchDomain>Linguistics</cmdp:researchDomain>
            </cmdp:ResearchDomains>
            <cmdp:LinguisticsSubject>
               <cmdp:linguisticsSubject>historical linguistics</cmdp:linguisticsSubject>
               <cmdp:Description>
                  <cmdp:Description/>
               </cmdp:Description>
            </cmdp:LinguisticsSubject>
            <cmdp:LinguisticsSubject>
               <cmdp:linguisticsSubject>syntax</cmdp:linguisticsSubject>
               <cmdp:Description>
                  <cmdp:Description/>
               </cmdp:Description>
            </cmdp:LinguisticsSubject>
            <cmdp:LanguageVariety>
               <cmdp:languageDependent>yes</cmdp:languageDependent>
               <cmdp:Language>
                  <cmdp:LanguageName>Dutch</cmdp:LanguageName>
                  <cmdp:ISO639>
                     <cmdp:iso-639-3-code>nld</cmdp:iso-639-3-code>
                  </cmdp:ISO639>
               </cmdp:Language>
               <cmdp:Centuries>
                  <cmdp:centuryDependent>no</cmdp:centuryDependent>
               </cmdp:Centuries>
            </cmdp:LanguageVariety>
         </cmdp:SoftwareFunction>
         <cmdp:SoftwareImplementation>
            <cmdp:distributionMedium>Online available</cmdp:distributionMedium>
            <cmdp:UserInterface>
               <cmdp:interfaceType>graphical user interface</cmdp:interfaceType>
               <cmdp:applicationType>web application</cmdp:applicationType>
            </cmdp:UserInterface>
            <cmdp:Input>
               <cmdp:characterEncoding>UTF8</cmdp:characterEncoding>
               <cmdp:inputType>text</cmdp:inputType>
               <cmdp:inputResource>treebank</cmdp:inputResource>
               <cmdp:Schema>
                  <cmdp:schemaname>Helsinki Penn Treebank .mrg format</cmdp:schemaname>
               </cmdp:Schema>
               <cmdp:MimeType>
                  <cmdp:MimeType>text/xml</cmdp:MimeType>
               </cmdp:MimeType>
            </cmdp:Input>
            <cmdp:Input>
               <cmdp:characterEncoding>UTF8</cmdp:characterEncoding>
               <cmdp:inputType>text</cmdp:inputType>
               <cmdp:inputResource>settings</cmdp:inputResource>
               <cmdp:Schema>
                  <cmdp:schemaname>Collins-Bikel properties file format</cmdp:schemaname>
               </cmdp:Schema>
               <cmdp:MimeType>
                  <cmdp:MimeType>text/plain</cmdp:MimeType>
               </cmdp:MimeType>
            </cmdp:Input>
            <cmdp:Input>
               <cmdp:characterEncoding>UTF8</cmdp:characterEncoding>
               <cmdp:inputType>text</cmdp:inputType>
               <cmdp:inputResource>corpus</cmdp:inputResource>
               <cmdp:Schema>
                  <cmdp:schemaname/>
               </cmdp:Schema>
               <cmdp:MimeType>
                  <cmdp:MimeType>text/plain</cmdp:MimeType>
               </cmdp:MimeType>
            </cmdp:Input>
            <cmdp:Input>
               <cmdp:characterEncoding>UTF8</cmdp:characterEncoding>
               <cmdp:inputType>text</cmdp:inputType>
               <cmdp:inputResource>corpus</cmdp:inputResource>
               <cmdp:Schema>
                  <cmdp:schemaname>Adelheid XML format</cmdp:schemaname>
               </cmdp:Schema>
               <cmdp:MimeType>
                  <cmdp:MimeType>text/plain</cmdp:MimeType>
               </cmdp:MimeType>
            </cmdp:Input>
            <cmdp:Output>
               <cmdp:outputType>binary</cmdp:outputType>
               <cmdp:characterEncoding>UTF8</cmdp:characterEncoding>
               <cmdp:outputResource>training result</cmdp:outputResource>
            </cmdp:Output>
         </cmdp:SoftwareImplementation>
         <cmdp:Access>
            <cmdp:ResourceLicense>
               <cmdp:license>unknown</cmdp:license>
               <cmdp:distributionType>public</cmdp:distributionType>
               <cmdp:Price>
                  <cmdp:amount>0</cmdp:amount>
                  <cmdp:ISO4217>
                     <cmdp:iso-4217-currency>EUR</cmdp:iso-4217-currency>
                  </cmdp:ISO4217>
               </cmdp:Price>
            </cmdp:ResourceLicense>
            <cmdp:Contact>
               <cmdp:Person>Gertjan Postma</cmdp:Person>
               <cmdp:Email>gertjan.postma@meertens.knaw.nl</cmdp:Email>
               <cmdp:Organisation xml:lang="eng">Meertens Institute</cmdp:Organisation>
            </cmdp:Contact>
            <cmdp:Contact>
               <cmdp:Person>Marc Kemps-Snijders</cmdp:Person>
               <cmdp:Email>marc.kemps.snijders@meertens.knaw.nl</cmdp:Email>
               <cmdp:Organisation xml:lang="eng">Meertens Institute</cmdp:Organisation>
            </cmdp:Contact>
         </cmdp:Access>
         <cmdp:ResourceDocumentation>
            <cmdp:Documentation>
               <cmdp:title>Gebruikershandleiding en demonstratiescenario bij de INPOLDER web applicatie</cmdp:title>
               <cmdp:documentationTarget>user</cmdp:documentationTarget>
               <cmdp:url>http://dev.clarin.nl/sites/default/files/User%20manual-demonstatiescenarioINPOLDER.pdf</cmdp:url>
               <cmdp:ISO639>
                  <cmdp:iso-639-3-code>nld</cmdp:iso-639-3-code>
               </cmdp:ISO639>
            </cmdp:Documentation>
         </cmdp:ResourceDocumentation>
         <cmdp:SoftwareDevelopment>
            <cmdp:Project>
               <cmdp:name>INPOLDER</cmdp:name>
               <cmdp:title>INPOLDER: Integrated Parser and Lemmatizer Dutch in Retrospect</cmdp:title>
               <cmdp:funder>CLARIN-NL</cmdp:funder>
               <cmdp:url>http://portal.clarin.nl/node/1927</cmdp:url>
               <cmdp:Contact>
                  <cmdp:Person>Marc Kemps-Snijders</cmdp:Person>
                  <cmdp:Email>marc.kemps.snijders@meertens.knaw.nl</cmdp:Email>
               </cmdp:Contact>
               <cmdp:Duration/>
            </cmdp:Project>
            <cmdp:Creator>
               <cmdp:Contact>
                  <cmdp:Person>Prof. Dr. Ans van Kemenade</cmdp:Person>
                  <cmdp:Email>A.v.Kemenade@let.ru.nl</cmdp:Email>
                  <cmdp:Organisation xml:lang="eng">Radboud University</cmdp:Organisation>
               </cmdp:Contact>
            </cmdp:Creator>
         </cmdp:SoftwareDevelopment>
         <cmdp:TechnicalInfo>
            <cmdp:ImplementationLanguage>
               <cmdp:implementationLanguage>unknown</cmdp:implementationLanguage>
               <cmdp:version>unknown</cmdp:version>
            </cmdp:ImplementationLanguage>
         </cmdp:TechnicalInfo>
      </cmdp:ClarinSoftwareDescription>
   </cmd:Components>
</cmd:CMD>
Organisation:
  • Meertens Institute
  • Radboud University
  • Utrecht University
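
Reading the record programmatically:

The record above is namespaced CMDI XML, so any lookup has to name the `cmd` and `cmdp` namespaces explicitly. The following is a minimal sketch of how the tool name and task list might be read with Python's standard library. It is not part of INPOLDER or of the CLARIN tooling. The function name `summarize` and the abridged `SAMPLE` string are assumptions made for illustration; the namespace URIs and element names are copied from the record.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the root element of the record.
NS = {
    "cmd": "http://www.clarin.eu/cmd/1",
    "cmdp": "http://www.clarin.eu/cmd/1/profiles/clarin.eu:cr1:p_1342181139640",
}

# Abridged excerpt of the record (illustrative), so the sketch runs without a file.
SAMPLE = """<cmd:CMD xmlns:cmd="http://www.clarin.eu/cmd/1"
    xmlns:cmdp="http://www.clarin.eu/cmd/1/profiles/clarin.eu:cr1:p_1342181139640">
  <cmd:Components>
    <cmdp:ClarinSoftwareDescription>
      <cmdp:GeneralInfo>
        <cmdp:name xml:lang="eng">INPOLDER</cmdp:name>
      </cmdp:GeneralInfo>
      <cmdp:SoftwareFunction>
        <cmdp:ToolTasks>
          <cmdp:toolTask>corpus processing</cmdp:toolTask>
          <cmdp:toolTask>constituency-based parsing</cmdp:toolTask>
          <cmdp:toolTask>morphosyntactic tagging</cmdp:toolTask>
          <cmdp:toolTask>lemmatisation</cmdp:toolTask>
        </cmdp:ToolTasks>
      </cmdp:SoftwareFunction>
    </cmdp:ClarinSoftwareDescription>
  </cmd:Components>
</cmd:CMD>"""


def summarize(xml_text):
    """Return the tool name and its task list from a CMDI record string."""
    root = ET.fromstring(xml_text)
    desc = root.find("cmd:Components/cmdp:ClarinSoftwareDescription", NS)
    # findtext returns the default when the element is missing.
    name = desc.findtext("cmdp:GeneralInfo/cmdp:name", default="", namespaces=NS)
    task_path = "cmdp:SoftwareFunction/cmdp:ToolTasks/cmdp:toolTask"
    tasks = [task.text for task in desc.findall(task_path, NS)]
    return {"name": name, "tasks": tasks}


summary = summarize(SAMPLE)
print(summary["name"], summary["tasks"])
```

To apply this to the full record, only the XML itself (from the `<?xml ...?>` declaration through `</cmd:CMD>`) should be passed in; any title line or listing text around it would make the parse fail.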