<!DOCTYPE html>
<!--
Copyright (C) 2025 twagoo

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.
-->
<html>
    <head> 
        <title>CLARIN Tool Portal</title> 
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
        <meta name="viewport" content="width=device-width, initial-scale=1" />
        <link rel="icon" href="data:," />
        <link href="/webjars/font-awesome/7.2.0/css/fontawesome.css" rel="stylesheet" />
        <link href="/webjars/font-awesome/7.2.0/css/solid.css" rel="stylesheet" />
        <link href="/webjars/bootstrap/5.3.8/css/bootstrap.min.css" rel="stylesheet" />
        <link href="/css/style.css" rel="stylesheet" />
        <script src="/webjars/bootstrap/5.3.8/js/bootstrap.bundle.min.js" defer></script>
        <script src="/webjars/htmx.org/2.0.10/dist/htmx.min.js" defer></script>
        <script src="/webjars/alpinejs/3.15.11/dist/cdn.min.js" defer></script>
        

    </head>

    <body>
        <div class="container-xl">

            <nav class="navbar navbar-expand-lg bg-body-tertiary">
            <div class="container-fluid">
                <a class="navbar-brand" href="/">CLARIN Tool Portal <span class="fs-6 text-danger">prototype</span></a>
                <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarSupportedContent" aria-controls="navbarSupportedContent" aria-expanded="false" aria-label="Toggle navigation">
                    <span class="navbar-toggler-icon"></span>
                </button>
                <div class="collapse navbar-collapse" id="navbarSupportedContent">
                    <ul class="navbar-nav me-auto mb-2 mb-lg-0">
                        <li class="nav-item">
            <a class="nav-link active" 
               aria-current="page" 
               href="/search">Search</a>
        </li>
                        <li class="nav-item">
            <a class="nav-link" 
               href="/help">Help</a>
        </li>
                    </ul>
                </div>
            </div>
        </nav>

            <main>


                <div class="mt-3">
                    <form action="/search">
                        <div class="d-flex">
                            <div class="ms-auto w-50 p-2">
                                <input type="search" class="form-control"
                                       name="q" value="" />
                                
                            </div>
                            <div class="p-2">
                                <input type="submit" class="btn btn-primary"
                                       value="Search" />
                            </div>
                        </div>
                    </form>
                </div>

                <nav aria-label="breadcrumb">
                    <ol class="breadcrumb">
                        <li class="breadcrumb-item"><a href="/">Home</a></li>
                        <li class="breadcrumb-item"><a href="/search?q=&amp;fq=languageCode:code:fra">Search</a></li>
                        <li class="breadcrumb-item active" aria-current="page">Record</li>
                    </ol>
                </nav>

                <div class="container-md">
                    <div class="row mt-md-2">
                        <div class="col-md-8">
                            <h1>Ucto Tokeniser</h1>
                        </div>

                        <div class="col-md-4">
                            <div class="mt-2 mb-2">
                                
                            </div>
                        </div>
                    </div>

                    <div class="row">
                        <div class="col-md-8">

                            <div>

        <ul id="mainContentTabsNav" class="nav nav-underline border-bottom" 
            hx-boost="true"
            hx-target="#recordTabsContent"
            hx-swap="innerHTML show:none">

            <li class="nav-item">
        <a class="nav-link active"
           hx-select-oob="#mainContentTabsNav"
           aria-current="page"
           href="/records/CSD_32_Tools_47_UCTO-Tokenizer.cmdi.xml"
           hx-select="#recordOverview">Overview</a>
            </li>



            <li class="nav-item">
        <a class="nav-link"
           hx-select-oob="#mainContentTabsNav"
           href="/records/CSD_32_Tools_47_UCTO-Tokenizer.cmdi.xml/metadata"
           hx-select="#recordAllMetadata">Detailed metadata</a>
            </li>



            <li class="nav-item">
        <a class="nav-link"
           hx-select-oob="#mainContentTabsNav"
           href="/records/CSD_32_Tools_47_UCTO-Tokenizer.cmdi.xml/links"
           hx-select="#recordLinks">Links</a>
            </li>


        </ul>

        <div id="recordTabsContent" class="pt-2">

            

            

            <div>
                <div class="pt-2" id="recordOverview">
                <div class="mb-2" >
                    <p>Ucto tokenizes text files: it separates words from punctuation and splits text into sentences. This is one of the first tasks in almost any natural language processing application. Ucto also offers several other basic preprocessing steps, such as changing case, all of which you can use to prepare your text for further processing such as indexing, part-of-speech tagging, or machine translation. The tokenizer engine is language-independent: supplying language-specific tokenization rules in an external configuration file creates a tokenizer for that language. Ucto comes with tokenization rules for English, Dutch, French, Italian, and Swedish, and it can easily be extended to other languages. It recognizes dates, times, units, currencies, and abbreviations, as well as paired quote spans, sentences, and paragraphs. It produces UTF-8 output with NFC normalization and optionally accepts other encodings as input. It can optionally convert text to all lowercase or all uppercase. Ucto supports FoLiA XML.</p>
                </div>
            </div>
            </div>
        </div>
    </div>

                        </div>
                        <div class="col-md-4">

                            <div class="me-2">
                                <div class="mb-2">
                                    
                                </div>
                                <div class="mb-2">
                                    
                                        <strong>Organisation:</strong><br />
                                        <ul class="list-group list-group-flush">
                                            <li class="list-group-item">Utrecht University</li>
                                            <li class="list-group-item">Radboud University Nijmegen</li>
                                        </ul>
                                    
                                </div>
                                <div class="mb-2">
                                    
                                </div>
                            </div>

                            <div class="mt-2 mb-2">
                                <h2 class="h5">Resources:</h2>
                                <div class="card mb-2">
                                    <div class="card-body">
                                        <h3 class="h5 card-title">Resource</h3>
                                        <div class="h6 card-subtitle mb-2 text-body-secondary">application/pdf</div>
                                        
                                        <div>
                                            <a class="btn btn-outline-secondary btn-sm" href="https://webservices-lst.science.ru.nl/ucto/">
                                                <i class="fa-solid fa-cloud-arrow-down"></i>
                                                Access
                                            </a>
                                        </div>
                                    </div>
                                </div>
                            </div>
                        </div>
                    </div>
                </div>
            </main>

            
            <footer class="row row-cols-1 row-cols-md-3 py-4 my-5 text-bg-light">
                <div class="col mb-3">
                    <p class="text-body-secondary"><a href="/help">About</a></p>
                    <p class="text-body-secondary">vdevelop</p>
                </div>
                <div class="col mb-3 text-md-center">
                    <p>Service provided by <a href="https://www.clarin.eu">CLARIN</a></p>
                    <p class="text-body-secondary">
                        <span class="text-light footer-hidden-info">
                            built: 2026-05-10T20:06:23Z; revision: 22efec9
                        </span>
                    </p>
                </div>
                <div class="col mb-3 text-md-end">
                    <p><a href="mailto:toolportal@clarin.eu">Contact</a></p>
                </div>
            </footer>
            

        

        </div>
    </body>
</html>
