Mòideal:languages
This module is used to retrieve and manage Wiktionary's various languages and the information associated with them. See Wiktionary:Languages for more information.
This module provides access to other modules. To access the information from within a template, see Mòideal:languages/templates.
The information itself is stored in the various data modules that are subpages of this module. They are listed in Roinn-seòrsa:Language data modules. These modules should not be used directly by any other module; the data should only be accessed through the functions provided by Mòideal:languages.
Finding and retrieving languages
The module exports a number of functions that are used to find languages.
getByCode
getByCode(code)
Finds the language whose code matches the one provided. If it exists, it returns a Language object representing the language. Otherwise, it returns nil.
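A minimal sketch of the lookup-and-check pattern this implies. So that it runs outside the wiki, the hypothetical table m_languages below stands in for require("Mòideal:languages"), and its two data entries are illustrative only; describe is likewise a made-up helper.

```lua
-- Hypothetical stand-in for require("Mòideal:languages"), so the sketch runs
-- outside the wiki. The two entries are illustrative, not real module data.
local data = {
	fr = { canonicalName = "French" },
	la = { canonicalName = "Latin" },
}

local m_languages = {}

function m_languages.getByCode(code)
	local entry = data[code]
	if not entry then
		return nil
	end
	return {
		getCanonicalName = function(self)
			return entry.canonicalName
		end,
	}
end

-- Callers should check for nil before calling any method on the result.
local function describe(code)
	local lang = m_languages.getByCode(code)
	if not lang then
		return "unknown language code: " .. code
	end
	return lang:getCanonicalName()
end

print(describe("fr"))
print(describe("zzz"))
```

Skipping the nil check would raise an "attempt to index a nil value" error as soon as a method is called on the result of an unknown code.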
getByCanonicalName
getByCanonicalName(name)
Finds the language whose canonical name (the name used to represent that language on Wiktionary) matches the one provided. If it exists, it returns a Language object representing the language. Otherwise, it returns nil. The canonical name of languages should always be unique (it is an error for two languages on Wiktionary to share the same canonical name), so this is guaranteed to give at most one result.
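The uniqueness requirement is what makes a simple name-to-code table sufficient for this lookup. The sketch below shows one way such a table could be built and checked; buildNameIndex and the sample data are hypothetical and are not taken from the module.

```lua
-- Illustrative data only; the real data lives in the data submodules.
local data = {
	fr = { canonicalName = "French" },
	la = { canonicalName = "Latin" },
}

-- Build a reverse index from canonical name to code, raising an error
-- if two codes claim the same canonical name.
local function buildNameIndex(languages)
	local index = {}
	for code, entry in pairs(languages) do
		local name = entry.canonicalName
		if index[name] then
			error("duplicate canonical name: " .. name)
		end
		index[name] = code
	end
	return index
end

local byName = buildNameIndex(data)
print(byName["French"])
```

With such an index, a lookup by canonical name is a single table access followed by an ordinary lookup by code.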
findByName
findByName(name, inexact)
- This function is expensive
Finds languages which have the provided name among their list of possible names (including their canonical name). It returns a table containing Language objects for the languages found, or an empty table if none were found.
The inexact parameter can be given as true to perform a substring search of the name instead of an exact match. The result will then contain all languages that have the provided name as part of one of their possible names.
This function searches through the whole database of languages, and is therefore relatively resource-intensive. It should be used sparingly.
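The difference between the exact and inexact modes can be sketched in plain Lua. This is an illustration of the described behaviour, not the module's implementation: findCodesByName and the names table are hypothetical, and the sketch returns codes rather than Language objects to stay self-contained.

```lua
-- Illustrative data only: each code maps to its list of possible names.
local names = {
	fr = { "French", "Modern French" },
	frm = { "Middle French" },
	la = { "Latin" },
}

-- Returns a sorted list of codes whose names match `name`.
-- With `inexact` set, a plain (non-pattern) substring match is used.
local function findCodesByName(name, inexact)
	local found = {}
	for code, list in pairs(names) do
		for _, candidate in ipairs(list) do
			local hit
			if inexact then
				hit = candidate:find(name, 1, true) ~= nil
			else
				hit = candidate == name
			end
			if hit then
				table.insert(found, code)
				break
			end
		end
	end
	table.sort(found)
	return found
end

print(table.concat(findCodesByName("French"), ","))
print(table.concat(findCodesByName("French", true), ","))
```

Every call walks the whole table, which is why a search of this kind is costly on the full language database.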
getAll
getAll()
- This function is expensive
Returns a table containing Language objects for all languages, sorted by code.
This function searches through the whole database of languages, and is therefore relatively resource-intensive. It should be used sparingly.
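Since pairs makes no promise about order, a result sorted by code has to be sorted explicitly. A small sketch of that step, using a hypothetical sortedCodes helper and made-up data:

```lua
-- Illustrative data only.
local data = {
	la = { canonicalName = "Latin" },
	fr = { canonicalName = "French" },
	de = { canonicalName = "German" },
}

-- Collect every code, then sort, because pairs() visits keys in no fixed order.
local function sortedCodes(languages)
	local codes = {}
	for code in pairs(languages) do
		codes[#codes + 1] = code
	end
	table.sort(codes)
	return codes
end

print(table.concat(sortedCodes(data), " "))
```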
Language objects
A Language
object is returned from one of the functions above. It is a Lua representation of a language and the data associated with it. It has a number of methods that can be called on it, using the :
syntax. For example:
local m_languages = require("Mòideal:languages")
local lang = m_languages.getByCode("fr")
local name = lang:getCanonicalName()
-- "name" will now be "French"
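The : syntax works because each object is a plain table whose metatable points back at a shared method table. The following is a trimmed, standalone copy of that pattern as it appears in the module source further down (only two methods are kept, and makeObject is declared local here for brevity).

```lua
local Language = {}
Language.__index = Language

function Language:getCode()
	return self._code
end

function Language:getCanonicalName()
	return self._rawData.canonicalName
end

-- Wrap raw data in an object; returns nil when there is no data for the code.
local function makeObject(code, data)
	return data and setmetatable({ _rawData = data, _code = code }, Language) or nil
end

local lang = makeObject("fr", { canonicalName = "French" })

-- The colon call is shorthand for passing the object as the first argument.
print(lang:getCanonicalName())
print(Language.getCanonicalName(lang))
```

Both print statements call the same function; the colon form simply supplies lang as self.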
Language:getCode
:getCode()
Returns the language code of the language. Example: "fr" for French.
Language:getCanonicalName
:getCanonicalName()
Returns the canonical name of the language. This is the name used to represent that language on Wiktionary, and is guaranteed to be unique to that language alone. Example: "French" for French.
Language:getAllNames
:getAllNames()
Returns a table of all names that the language is known by, including the canonical name. The names are not guaranteed to be unique; sometimes more than one language is known by the same name. Example: {"French", "Modern French"} for French.
Language:getType
:getType()
Returns the type of language, which can be "regular", "reconstructed" or "appendix-constructed".
Language:getWikimediaLanguages
:getWikimediaLanguages()
Returns a table containing WikimediaLanguage objects (see Mòideal:wikimedia languages), which represent languages and their codes as they are used in Wikimedia projects for interwiki linking and such. More than one object may be returned, as a single Wiktionary language may correspond to multiple Wikimedia languages. For example, Wiktionary's single code sh (Serbo-Croatian) maps to four Wikimedia codes: sh (Serbo-Croatian), bs (Bosnian), hr (Croatian) and sr (Serbian).
The code for the Wikimedia language is retrieved from the wikimedia_codes property in the data modules. If that property is not present, the code of the current language is used. If none of the available codes is actually a valid Wikimedia code, an empty table is returned.
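The fallback rule can be sketched as follows. The validWikimediaCodes set and the wikimediaCodesFor helper are hypothetical stand-ins: in the module, validity is decided by Mòideal:wikimedia languages, and the result holds WikimediaLanguage objects rather than bare codes.

```lua
-- Hypothetical set of valid Wikimedia codes, for illustration only.
local validWikimediaCodes = { sh = true, bs = true, hr = true, sr = true, fr = true }

-- Use the wikimedia_codes property when present, otherwise fall back to the
-- language's own code; keep only the codes that are valid.
local function wikimediaCodesFor(code, rawData)
	local candidates = rawData.wikimedia_codes or { code }
	local result = {}
	for _, wcode in ipairs(candidates) do
		if validWikimediaCodes[wcode] then
			result[#result + 1] = wcode
		end
	end
	return result
end

print(table.concat(wikimediaCodesFor("sh", { wikimedia_codes = { "sh", "bs", "hr", "sr" } }), ","))
print(table.concat(wikimediaCodesFor("fr", {}), ","))
```

A language whose own code is not a valid Wikimedia code and which has no wikimedia_codes property yields an empty table.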
Language:getScripts
:getScripts()
Returns a table of Script objects for all scripts that the language is written in. See Mòideal:scripts.
Language:getFamily
:getFamily()
Returns a Family object for the language family that the language belongs to. See Mòideal:families.
Language:getAncestors
:getAncestors()
Returns a table of Language objects for all languages that this language is directly descended from. Generally this is only a single language, but creoles, pidgins and mixed languages can have multiple ancestors.
Language:getCategoryName
:getCategoryName()
Returns the name of the main category of that language. Example: "French language" for French, whose category is at Roinn-seòrsa:French language.
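The naming rule used in the module source is small enough to show on its own: " language" is appended unless the canonical name already ends in "language" or "Language". The standalone function name categoryName is chosen here for illustration.

```lua
-- Append " language" unless the canonical name already ends with that word.
local function categoryName(canonicalName)
	if canonicalName:find("[Ll]anguage$") then
		return canonicalName
	end
	return canonicalName .. " language"
end

print(categoryName("French"))
print(categoryName("American Sign Language"))
```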
Language:makeEntryName
:makeEntryName(term)
Converts the given term into the form used in the names of entries. This removes diacritical marks that are not considered part of the normal written form of the language and are therefore not permitted in page names. It also removes certain punctuation characters, such as final question marks or periods, which are never present in page names. Example for Latin: "amō" → "amo" (the macron is removed).
The replacements made by this function are defined by the entry_name setting for each language in the data modules.
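A rough standalone sketch of how paired from/to lists drive the replacement. The latinEntryName rule set is hypothetical, not the module's actual Latin data, and plain string.gsub is used instead of mw.ustring.gsub; that only works here because each pattern is a literal character and the stripped punctuation is limited to ASCII.

```lua
-- Hypothetical rule set for illustration; not the module's real Latin data.
local latinEntryName = {
	from = { "ā", "ē", "ī", "ō", "ū" },
	to = { "a", "e", "i", "o", "u" },
}

local function makeEntryName(text, rules)
	-- Strip a final ASCII question mark, exclamation mark or semicolon.
	text = text:gsub("[?!;]$", "")
	if rules then
		-- Apply each replacement in order; a missing "to" entry means deletion.
		for i, from in ipairs(rules.from) do
			text = text:gsub(from, rules.to[i] or "")
		end
	end
	return text
end

print(makeEntryName("amō", latinEntryName))
print(makeEntryName("quid?", latinEntryName))
```

Patterns that use character classes over non-ASCII characters need the Unicode-aware mw.ustring functions, which is what the module itself relies on.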
Language:makeSortKey
:makeSortKey(term)
Creates a sort key for the given term, following the rules appropriate for the language. This removes diacritical marks from the term if they are not considered significant for sorting, and may perform some other changes. Any initial hyphen is also removed, as is anything in parentheses.
The replacements made by this function are defined by the sort_key setting for each language in the data modules.
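The sequence of steps can be tried outside the wiki with a simplified version. This sketch follows the order used in the module source (lowercase, strip leading hyphens and asterisks, drop parenthesised parts, apply language rules, uppercase), but it uses plain string functions, so it is only reliable for ASCII input; the rules argument is a hypothetical example.

```lua
local function makeSortKey(name, rules)
	name = name:lower()
	-- Remove initial hyphens and asterisks, as long as something follows them.
	name = name:gsub("^[-*]+(.)", "%1")
	-- Remove anything in parentheses that is preceded or followed by something.
	name = name:gsub("(.)%([^()]+%)", "%1")
	name = name:gsub("%([^()]+%)(.)", "%1")
	-- Apply language-specific rules, if any.
	if rules then
		for i, from in ipairs(rules.from) do
			name = name:gsub(from, rules.to[i] or "")
		end
	end
	return name:upper()
end

print(makeSortKey("-ness"))
print(makeSortKey("colo(u)r"))
print(makeSortKey("l'eau", { from = { "'" }, to = { "" } }))
```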
Language:transliterate
:transliterate(text, sc, module_override)
Transliterates the text from the given script into the Latin script (see Wiktionary:Transliteration and romanization). The language must have the translit_module property for this to work; if it is not present, nil is returned.
The sc parameter is handled by the transliteration module, and how it is handled is specific to that module. Some transliteration modules may tolerate nil as the script, others require it to be one of the possible scripts that the module can transliterate, and will show an error if it's not one of them. For this reason, the sc parameter should always be provided when writing non-language-specific code.
The module_override parameter is used to override the default module that is used to provide the transliteration. This is useful in cases where you need to demonstrate a particular module in use, but there is no default module yet, or you want to demonstrate an alternative version of a transliteration module before making it official. It should not be used in real modules or templates, only for testing. All uses of this parameter are tracked by Teamplaid:tracking/module_override.
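The dispatch logic can be sketched without the wiki by replacing the module loader with a table. Everything named here is hypothetical: translitModules stands in for require("Mòideal:" .. name), "demo-translit" is a made-up module with a single replacement rule, and the standalone transliterate takes the raw data and codes as plain arguments.

```lua
-- Hypothetical registry standing in for require("Mòideal:" .. name).
local translitModules = {
	["demo-translit"] = {
		-- Toy rule for illustration: replace one Cyrillic letter.
		tr = function(text, langCode, scCode)
			return (text:gsub("я", "ja"))
		end,
	},
}

local function transliterate(rawData, code, text, scCode, moduleOverride)
	local moduleName = moduleOverride or rawData.translit_module
	-- Without a module or without text there is nothing to do.
	if not (moduleName and text) then
		return nil
	end
	return translitModules[moduleName].tr(text, code, scCode)
end

print(transliterate({ translit_module = "demo-translit" }, "ru", "я", "Cyrl"))
print(transliterate({}, "fr", "bonjour", "Latn"))
```

Passing a fifth argument selects a module even when the language data names none, which mirrors the testing-only role of module_override.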
Language:getRawData
:getRawData()
- This function is not for use in entries or other content pages.
Returns a blob of data about the language. The format of this blob is undocumented, and perhaps unstable; it's intended for things like the module's own unit-tests, which are "close friends" with the module and will be kept up-to-date as the format changes.
local export = {}

local Language = {}

function Language:getCode()
	return self._code
end

function Language:getCanonicalName()
	return self._rawData.canonicalName
end

-- Commented out; I don't think anything uses this, the presence/absence of script errors should confirm
--function Language:getAllNames()
--	return self._rawData.names
--end

function Language:getOtherNames()
	return self._rawData.otherNames or {}
end

function Language:getType()
	return self._rawData.type
end

function Language:getWikimediaLanguages()
	if not self._wikimediaLanguageObjects then
		local m_wikimedia_languages = require("Mòideal:wikimedia languages")
		self._wikimediaLanguageObjects = {}
		local wikimedia_codes = self._rawData.wikimedia_codes or {self._code}
		for _, wlangcode in ipairs(wikimedia_codes) do
			table.insert(self._wikimediaLanguageObjects, m_wikimedia_languages.getByCode(wlangcode))
		end
	end
	return self._wikimediaLanguageObjects
end

function Language:getScripts()
	if not self._scriptObjects then
		local m_scripts = require("Mòideal:scripts")
		self._scriptObjects = {}
		for _, sc in ipairs(self._rawData.scripts) do
			table.insert(self._scriptObjects, m_scripts.getByCode(sc))
		end
	end
	return self._scriptObjects
end

function Language:getFamily()
	if not self._familyObject then
		self._familyObject = require("Mòideal:families").getByCode(self._rawData.family)
	end
	return self._familyObject
end

function Language:getAncestors()
	if not self._ancestorObjects then
		self._ancestorObjects = {}
		for _, ancestor in ipairs(self._rawData.ancestors or {}) do
			table.insert(self._ancestorObjects, export.getByCode(ancestor))
		end
	end
	return self._ancestorObjects
end

function Language:getAncestorChain()
	if not self._ancestorChain then
		self._ancestorChain = {}
		local step = #self:getAncestors() == 1 and self:getAncestors()[1] or nil
		while step do
			table.insert(self._ancestorChain, 1, step)
			step = #step:getAncestors() == 1 and step:getAncestors()[1] or nil
		end
	end
	return self._ancestorChain
end

function Language:getCategoryName()
	local name = self._rawData.canonicalName
	-- If the name already has "language" in it, don't add it.
	if name:find("[Ll]anguage$") then
		return name
	else
		return name .. " language"
	end
end

function Language:makeEntryName(text)
	text = mw.ustring.gsub(text, "^[¿¡]", "")
	text = mw.ustring.gsub(text, "[؟?!;՛՜ ՞ ՟?!।॥။၊་།]$", "")
	if self._rawData.entry_name then
		for i, from in ipairs(self._rawData.entry_name.from) do
			local to = self._rawData.entry_name.to[i] or ""
			text = mw.ustring.gsub(text, from, to)
		end
	end
	return text
end

function Language:makeSortKey(name)
	name = mw.ustring.lower(name)
	-- Remove initial hyphens and *
	name = mw.ustring.gsub(name, "^[-־ـ*]+(.)", "%1")
	-- Remove anything in parentheses, as long as they are either preceded or followed by something
	name = mw.ustring.gsub(name, "(.)%([^()]+%)", "%1")
	name = mw.ustring.gsub(name, "%([^()]+%)(.)", "%1")
	-- If there are language-specific rules to generate the key, use those
	if self._rawData.sort_key then
		for i, from in ipairs(self._rawData.sort_key.from) do
			local to = self._rawData.sort_key.to[i] or ""
			name = mw.ustring.gsub(name, from, to)
		end
	end
	return mw.ustring.upper(name)
end

function Language:transliterate(text, sc, module_override)
	if not ((module_override or self._rawData.translit_module) and text) then
		return nil
	end
	if module_override then
		require("Mòideal:debug").track("module_override")
	end
	return require("Mòideal:" .. (module_override or self._rawData.translit_module)).tr(text, self:getCode(), sc and sc:getCode() or nil)
end

function Language:toJSON()
	local entryNamePatterns = nil
	if self._rawData.entry_name then
		entryNamePatterns = {}
		for i, from in ipairs(self._rawData.entry_name.from) do
			local to = self._rawData.entry_name.to[i] or ""
			table.insert(entryNamePatterns, {from = from, to = to})
		end
	end
	local ret = {
		ancestors = self._rawData.ancestors,
		canonicalName = self:getCanonicalName(),
		categoryName = self:getCategoryName(),
		code = self._code,
		entryNamePatterns = entryNamePatterns,
		family = self._rawData.family,
		otherNames = self:getOtherNames(),
		scripts = self._rawData.scripts,
		type = self:getType(),
		wikimediaLanguages = self._rawData.wikimedia_codes,
	}
	return require("Mòideal:JSON").toJSON(ret)
end

-- Do NOT use this method!
-- All uses should be pre-approved on the talk page!
function Language:getRawData()
	return self._rawData
end

Language.__index = Language

local function getDataModuleName(code)
	if code:find("^[a-z][a-z]$") then
		return "languages/data2"
	elseif code:find("^[a-z][a-z][a-z]$") then
		local prefix = code:sub(1, 1)
		return "languages/data3/" .. prefix
	elseif code:find("^[a-z-]+$") then
		return "languages/datax"
	else
		return nil
	end
end

local function getRawLanguageData(code)
	local modulename = getDataModuleName(code)
	return modulename and mw.loadData("Mòideal:" .. modulename)[code] or nil
end

function export.makeObject(code, data)
	return data and setmetatable({ _rawData = data, _code = code }, Language) or nil
end

function export.getByCode(code)
	return export.makeObject(code, getRawLanguageData(code))
end

function export.getByCanonicalName(name)
	local code = mw.loadData("Mòideal:languages/by name")[name]
	if not code then
		return nil
	end
	return export.makeObject(code, getRawLanguageData(code))
end
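
function export.iterateAll()
	mw.incrementExpensiveFunctionCount()
	local m_data = mw.loadData("Mòideal:languages/alldata")
	local func, t, var = pairs(m_data)
	return function()
		-- Advance the control variable on every call; if it were never updated,
		-- a stateless iterator would keep returning the first entry.
		local data
		var, data = func(t, var)
		return export.makeObject(var, data)
	end
end
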
return export