Merge pull request #77 from koheiw/add-turkish
Add turkish
koheiw committed May 23, 2024
2 parents beffaaa + 5f089d3 commit f8e3003
Showing 10 changed files with 355 additions and 7 deletions.
7 changes: 4 additions & 3 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: newsmap
Type: Package
Title: Semi-Supervised Model for Geographical Document Classification
Version: 0.8.4
Version: 0.9.0
Authors@R: c(person("Kohei", "Watanabe", email = "watanabe.kohei@gmail.com", role = c("aut", "cre", "cph")),
             person("Stefan", "Müller", email = "mullers@tcd.ie", role = "aut"),
             person("Dani", "Madrid-Morales", email = "dani.madrid@my.cityu.edu.hk", role = "aut"),
@@ -13,10 +13,11 @@ Authors@R: c(person("Kohei", "Watanabe", email = "watanabe.kohei@gmail.com", rol
             person("Elad", "Segev", email = "eladseg@gmail.com", role = "aut"),
             person("Dai", "Yamao", email = "daiyamao@scs.kyushu-u.ac.jp", role = "aut"),
             person("Barbara Ellynes", "Zucchi Nobre Silva", email = "barbara@zucchi.science", role = "aut"),
             person("Lanabi", "la Lova", email = "l.lalova@lse.ac.uk", role = "aut"))
             person("Lanabi", "la Lova", email = "l.lalova@lse.ac.uk", role = "aut"),
             person("Lungta", "Seki", email = "yahoo.co.jp0409@gmail.com", role = "aut"))
Maintainer: Kohei Watanabe <watanabe.kohei@gmail.com>
Description: Semi-supervised model for geographical document classification (Watanabe 2018) <doi:10.1080/21670811.2017.1293487>.
    This package currently contains seed dictionaries in English, German, French, Spanish, Italian, Russian, Hebrew, Arabic Japanese and Chinese (Simplified and Traditional).
    This package currently contains seed dictionaries in English, German, French, Spanish, Italian, Russian, Hebrew, Arabic, Turkish, Japanese and Chinese (Simplified and Traditional).
License: MIT + file LICENSE
URL: https://github.com/koheiw/newsmap
BugReports: https://github.com/koheiw/newsmap/issues
5 changes: 5 additions & 0 deletions NEWS.md
@@ -1,3 +1,8 @@
## Changes in v0.9.0

* Add Turkish seed dictionary
* Add `as.dictionary()` for `textmodel_newsmap`.

## Changes in v0.8.4

* Add `select` to `coef()` and improve its documentation
10 changes: 9 additions & 1 deletion R/data.R
@@ -51,7 +51,7 @@ NULL
#' @name data_dictionary_newsmap_ru
#' @docType data
#' @author Katerina Tertytchnaya \email{katerina.tertytchnaya@gmail.com}
#' @author Lanabi la Lova \email{S.Bilalova@lse.ac.uk}
#' @author Lanabi la Lova \email{l.lalova@lse.ac.uk}
#' @keywords data
NULL

@@ -79,6 +79,14 @@ NULL
#' @keywords data
NULL

#' Seed geographical dictionary in Turkish
#'
#' @name data_dictionary_newsmap_tr
#' @docType data
#' @author Lungta Seki \email{yahoo.co.jp0409@gmail.com}
#' @keywords data
NULL

#' Seed geographical dictionary in Chinese (simplified)
#'
#' @name data_dictionary_newsmap_zh_cn
Binary file added data/data_dictionary_newsmap_tr.RData
Binary file not shown.
3 changes: 3 additions & 0 deletions dict/import.R
@@ -30,6 +30,9 @@ save(data_dictionary_newsmap_he, file = 'data/data_dictionary_newsmap_he.RData')
data_dictionary_newsmap_ar <- dictionary(file = 'dict/arabic.yml')
save(data_dictionary_newsmap_ar, file = 'data/data_dictionary_newsmap_ar.RData')

data_dictionary_newsmap_tr <- dictionary(file = 'dict/turkish.yml')
save(data_dictionary_newsmap_tr, file = 'data/data_dictionary_newsmap_tr.RData')

data_dictionary_newsmap_zh_cn <- dictionary(file = 'dict/chinese_simplified.yml')
save(data_dictionary_newsmap_zh_cn, file = 'data/data_dictionary_newsmap_zh_cn.RData')
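`dictionary(file = 'dict/turkish.yml')` reads the YAML keys as a three-level hierarchy: continent (`AFRICA`, `ASIA`, ...), region (`EAST`, `WEST`, ...) and two-letter country code (`TR`, `TM`, ...). quanteda's `tokens_lookup()` has a `levels` argument that selects which key level is returned, so a lookup at level 3 yields country codes. The base-R sketch below only illustrates that nesting on a hand-copied miniature; `dict_tr_mini` and `keys_at_level()` are hypothetical and are not how quanteda implements `levels`.

```r
# Hypothetical miniature of the nesting in dict/turkish.yml:
# continent -> region -> country code -> glob patterns
dict_tr_mini <- list(
  ASIA = list(
    WEST = list(TR = c("Türkiye", "Türk*", "Ankara")),
    CENTER = list(TM = c("Türkmenistan", "Türkmen*", "Aşkabat"))
  ),
  EUROPE = list(
    EAST = list(RU = c("Rusya", "Rus*", "Moskova"))
  )
)

# Hypothetical helper (not part of quanteda or newsmap): collect the keys at a
# given depth, where 1 = continent, 2 = region and 3 = country code
keys_at_level <- function(node, level) {
  if (level == 1) {
    return(names(node))
  }
  unlist(lapply(node, keys_at_level, level = level - 1), use.names = FALSE)
}

keys_at_level(dict_tr_mini, 1)  # returns c("ASIA", "EUROPE")
keys_at_level(dict_tr_mini, 2)  # returns c("WEST", "CENTER", "EAST")
keys_at_level(dict_tr_mini, 3)  # returns c("TR", "TM", "RU")
```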

294 changes: 294 additions & 0 deletions dict/turkish.yml
@@ -0,0 +1,294 @@
# Newsmap Geography Dictionary in Turkish
# Source: https://github.com/koheiw/newsmap
# Author: Lungta Seki

AFRICA:
  EAST:
    'BI': [Burundi, Burundili*, Bujumbura]
    'DJ': [Cibuti, Cibutili*]
    'ER': [Eritre, Eritreli*, Asmara]
    'ET': [Etiyopya, Etiyopyalı*, Addis Ababa]
    'KE': [Kenya, Kenyalı*, Nairobi]
    'KM': [Komorlar, Komorlu*, Moroni]
    'MG': [Madagaskar, Madagaskarlı*, Antananarivo]
    'MU': [Mauritius, Mauritiuslu*, Port Louis]
    'MW': [Malavi, Malavili*, Lilongwe]
    'MZ': [Mozambik, Mozambikli*, Maputo]
    'RE': [Reunion, Réunion, Réunionlu, Reunionlu*, Reunionnais]
    'RW': [Ruanda, Ruandalı*, Kigali]
    'SC': [Seyşeller, Seyşelli*]
    'SO': [Somali, Somalili*, Mogadişu]
    'TZ': [Tanzanya, Tanzanyalı*, Dodoma, Dar es Salaam]
    'UG': [Uganda, Ugandalı*, Kampala]
    'YT': [Mayotte, Mahorlu*, Mamoudzou]
    'ZM': [Zambiya, Zambiyalı*, Lusaka]
    'ZW': [Zimbabve, Zimbabveli*, Harare]

  MIDDLE:
    'AO': [Angola, Angolalı*, Luanda]
    'CD': [Kongo Demokratik Cumhuriyeti, Kongo DC, KDC, Kongolu*, DR Kongolu*, Kinşasa]
    'CF': [Orta Afrika Cumhuriyeti, OAC, Orta Afrikalı*, Bangui]
    'CG': [Kongo, Kongo Cumhuriyeti, Kongolu*, Brazavil]
    'CM': [Kamerun, Kamerunlu*, Yaounde, Yaoundé]
    'GA': [Gabon, Gabonlu*, Librevil]
    'GQ': [Ekvator Ginesi, Ekvator Ginesi Cumhuriyeti, Ekvator Ginesili*, Malabo]
    'ST': [Sao Tome ve Princ, São Tomé ve Príncipe, São Tomé ve Príncipe, Sao Tome ve Princ*]
    'TD': [Çad, Çadlı*, N'Djamena]

  NORTH:
    'DZ': [Cezayir, Cezayirli*, Cezayir]
    'EG': [Mısır, Mısırlı*, Kahire]
    'EH': [Batı Sahra, Sahralı*, El Aaiun]
    'LY': [Libya, Libyalı*, Trablus]
    'MA': [Fas, Faslı*, Rabat]
    'SD': [Sudan, Sudanlı*, Hartum]
    'SS': [Güney Sudan, G Sudan, G Sudanlı*, Güney Sudanlı*, Cuba]
    'TN': [Tunus, Tunuslu*, Tunus]

  SOUTH:
    'BW': [Botsvana, Botsvanalı*, Gaborone]
    'LS': [Lesoto, Lesotolu*, Maseru]
    'NA': [Namibya, Namibyalı*, Vindhuk]
    'SZ': [Esvatini, Svazili*, Lobamba, Mbabane]
    'ZA': [Güney Afrika, G Afrika, GA, Güney Afrikalı*, G Afrikalı*, Cape Town, Johannesburg, Pretoria]

  WEST:
    'BF': [Burkina Faso, Burkinalı*, Vagadugu]
    'BJ': [Benin, Beninli*, Porto Novo]
    'CI': [Fildişi Sahili, Fildişi Sahili Cumhuriyeti, Côte d'Ivoire, Fildişi Sahilli*, Yamusukro, Abican]
    'CV': [Yeşil Burun Adaları, Yeşilburunlu*, Praia]
    'GH': [Gana, Ganalı*, Akra]
    'GM': [Gambiya, Gambiyalı*, Banjul]
    'GN': [Gine, Gineli*, Konakri]
    'GW': [Gine-Bissau, Gine-Bissaulu*, Bissau]
    'LR': [Liberya, Liberyalı*, Monrovia]
    'ML': [Mali, Malili*, Bamako]
    'MR': [Moritanya, Moritanyalı*, Nuakşot]
    'NE': [Nijer, Nijerli*, Niamey]
    'NG': [Nijerya, Nijeryalı*, Abuja, Lagos]
    'SH': [Saint Helena, St Helena, Saint Helenalı*, St Helenalı*, Jamestown]
    'SL': [Sierra Leone, Sierra Leoneli*, Freet]
    'SN': [Senegal, Senegalli*, Dakar]
    'TG': [Togo, Togolu*, Lome, Lomé]

AMERICA:
  CARIB:
    'AG': [Antigua ve Barbuda, Antigualı*, Barbudalı*]
    'AI': [Anguilla, Anguillalı*, The Valley]
    'AW': [Aruba, Arubalı*, Oranjstad]
    'BB': [Barbados, Barbadoslu*, Bridgetown]
    'BL': [Saint Barthelemy, Saint-Barthelemy, Saint-Barthélemy, St Barthelemy, Barthelemois, Gustavia]
    'BQ': [Bonaire, Bonaireli*, Kralendijk]
    'BS': [Bahamalar, Bahamalı*, Nassau]
    'CU': [Küba, Kübalı*, Havana]
    'CW': [Curaçao, Curaçaolu*, Willemstad]
    'DM': [Dominika Milletler Topluluğu, Dominikalı*, Roseau]
    'DO': [Dominik Cumhuriyeti, Dominikli*, Santo Domingo]
    'GD': [Grenada, Grenadalı*, Saint George's, St George's]
    'GP': [Guadeloupe, Guadelupeli*, Basse-Terre]
    'HT': [Haiti, Haitili*, Port-au-Prince]
    'JM': [Jamaika, Jamaikalı*, Kingston]
    'KN': [Saint Kitts ve Nevis, St Kitts ve Nevis, Kittitian*, Nevisian*, Basseterre]
    'KY': [Cayman Adaları, Caymanlı*, George Town]
    'LC': [Saint Lucia, St Lucia, Saint Lucialı*, St Lucialı*, Castries]
    'MF': [Saint Martin, St Martin, Saint Martinli*, St Martinli*, Marigot]
    'MQ': [Martinik, Martinikli*]
    'MS': [Montserrat, Montserratlı*, Brades]
    'PR': [Porto Riko, Porto Rikolu*, San Juan]
    'SX': [Sint Maarten, St Maarten, Sint Maartenli*, St Maartenli*, Philipsburg]
    'TC': [Turks ve Caicos Adaları, Turks ve Caicoslu*, Cockburn Town]
    'TT': [Trinidad ve Tobago, Trinidad, Trinidadlı*, Tobagolu*, Trinbagolu*, Port of Spain]
    'VC': [Saint Vincent ve Grenadinler, St Vincent ve Grenadinler, Vincentli*, Kingstown]
    'VG': [İngiliz Virgin Adaları, Virgin Adalı*, Road Town]
    'VI': [Amerikan Virgin Adaları, ABD Virgin Adaları, Amerikan Virgin Adalı*, ABD Virgin Adalı*, Charlotte Amalie]

  CENTER:
    'BZ': [Belize, Belizeli*, Belmopan]
    'CR': [Kosta Rika, Kosta Rikalı*, Ticos, San José]
    'GT': [Guatemala, Guatemalalı*, Guatemala City]
    'HN': [Honduras, Honduraslı*, Tegucigalpa]
    'MX': [Meksika, Meksikalı*, Mexico City]
    'NI': [Nikaragua, Nikaragualı*, Managua]
    'PA': [Panama, Panamalı*, Panama City]
    'SV': [El Salvador, Salvadorlu*, San Salvador]

  SOUTH:
    'AR': [Arjantin, Arjantinli*, Buenos Aires]
    'BO': [Bolivya, Bolivyalı*, Sucre, La Paz]
    'BR': [Brezilya, Brezilyalı*, Sao Paulo, Rio de Janeiro] #in English ver. Rio
    'CL': [Şili, Şilili*, Santiago]
    'CO': [Kolombiya, Kolombiyalı*, Bogotá]
    'EC': [Ekvador, Ekvadorlu*, Quito]
    'FK': [Falkland Adaları, Falklandlı*]
    'GF': [Fransız Guyanası, Fransız Guyanalı*]
    'GY': [Guyana, Guyanalı*]
    'PE': [Peru, Perulu*, Lima]
    'PY': [Paraguay, Paraguaylı*, Asunción]
    'SR': [Surinam, Surinamlı*, Paramaribo]
    'UY': [Uruguay, Uruguaylı*, Montevideo]
    'VE': [Venezuela, Venezuelalı*, Caracas]

  NORTH:
    'BM': [Bermuda, Bermudalı*]
    'CA': [Kanada, Kanadalı*, Ottawa, Toronto, Quebec]
    'GL': [Grönland, Grönlandlı*, Nuuk]
    'PM': [Saint Pierre ve Miquelon, St Pierre ve Miquelon, Saint Pierrais, Miquelonnais, Saint Pierre]
    'US': [Amerika Birleşik Devletleri, ABD, Amerikalı*, Washington, New York]

ASIA:
  CENTER:
    'KG': [Kırgızistan, Kırgız*, Bişkek]
    'KZ': [Kazakistan, Kazak*, Astana]
    'TJ': [Tacikistan, Tacik*, Duşanbe]
    'TM': [Türkmenistan, Türkmen*, Aşkabat]
    'UZ': [Özbekistan, Özbek*, Taşkent]

  EAST:
    'CN': [Çin, Çinli*, Pekin, Şanghay]
    'HK': [Hong Kong, Hong Konglu*]
    'JP': [Japonya, Japon, Japonyalı*, Tokyo]
    'KP': [Kuzey Kore, K Kore, Kuzey Koreli*, K Koreli*, KDHC, Pyongyang]
    'KR': [Güney Kore, G Kore, Güney Koreli*, G Koreli*, Seul]
    'MN': [Moğolistan, Moğolistanlı*, Moğol*, Ulan Batur]
    'MO': [Makao, Makaolu*]
    'TW': [Tayvan, Tayvanlı*, Taipei]

  SOUTH:
    'AF': [Afganistan, Afgan*, Kabil]
    'BD': [Bangladeş, Bangladeşli*, Dakka]
    'BT': [Butan, Butanlı*, Timbu]
    'IN': [Hindistan, Hintli*, Mumbai, Yeni Delhi]
    'IR': [İran, İranlı*, Tahran]
    'LK': [Sri Lanka, Sri Lankalı*, Kolombo]
    'MV': [Maldivler, Maldivli*]
    'NP': [Nepal, Nepalli*, Katmandu]
    'PK': [Pakistan, Pakistanlı*, İslamabad]

  SOUTH-EAST:
    'BN': [Brunei, Bruneili*]
    'ID': [Endonezya, Endonezyalı*, Cakarta]
    'KH': [Kamboçya, Kamboçyalı*, Phnom Penh]
    'LA': [Laos, Laoslu*, Vientiane]
    'MM': [Myanmar, Burma, Myanmarlı, Burmalı*, Yangon, Naypyitaw]
    'MY': [Malezya, Malezyalı*, Kuala Lumpur, Putrajaya]
    'PH': [Filipinler, Filipinli*, Manila]
    'SG': [Singapur, Singapurlu*]
    'TH': [Tayland, Taylandlı*, Bangkok]
    'TL': [Doğu Timor, Timor Leste, Doğu Timorlu*, Dili]
    'VN': [Vietnam, Vietnamlı*, Hanoi, Ho Chi Minh, Saigon]

  WEST:
    'AE': [Birleşik Arap Emirlikleri, BAE, Emirlikli*, Dubai, Abu Dabi]
    'AM': [Ermenistan, Ermeni*, Erivan]
    'AZ': [Azerbaycan, Azerbaycanlı*, Azeri*, Bakü]
    'BH': [Bahreyn, Bahreynli*, Manama]
    'CY': [Kıbrıs, Kuzey Kıbrıs, Kuzey Kıbrıs Türk Cumhuriyeti, KKTC, Güney Kıbrıs Rum Yönetimi, GKRY, Kıbrıslı*, Lefkoşa] # North Cyprus
    'GE': [Gürcistan, Gürcü*, Tiflis]
    'IL': [İsrail, İsrailli*, Kudüs]
    'IQ': [Irak, Iraklı*, Bağdat]
    'JO': [Ürdün, Ürdünlü*, Amman]
    'KW': [Kuveyt, Kuveytli*, Kuveyt Şehri]
    'LB': [Lübnan, Lübnanlı*, Beyrut]
    'OM': [Umman, Ummanlı*, Maskat]
    'PS': [Filistin, Filistinli*, Gazze Şehri, Gazze, Batı Şeria]
    'QA': [Katar, Katarlı*, Doha]
    'SA': [Suudi Arabistan, Suudi*, Riyad]
    'SY': [Suriye, Suriyeli*, Şam]
    'TR': [Türkiye, Türk*, Ankara, İstanbul]
    'YE': [Yemen, Yemenli*, Sanaa]

EUROPE:
  EAST:
    'BG': [Bulgaristan, Bulgar*, Sofya]
    'BY': [Belarus, Belaruslu*, Minsk]
    'CZ': [Çek Cumhuriyeti, Çekya, Çek*, Prag] # Çekya
    'HU': [Macaristan, Macar Cumhuriyeti, Macar*, Budapeşte]
    'MD': [Moldova, Moldovyalı*, Kişinev]
    'PL': [Polonya, Polonyalı*, Leh*, Varşova]
    'RO': [Romanya, Romanyalı*, Bükreş]
    'RU': [Rusya, Rus*, Moskova]
    'SK': [Slovakya, Slovak*, Bratislava]
    'UA': [Ukrayna, Ukraynalı*, Kiev, Kyiv]

  NORTH:
    'AX': [Åland Adaları, Ålandlı*, Mariehamn]
    'DK': [Danimarka, Danimarkalı*, Kopenhag]
    'EE': [Estonya, Estonyalı*, Tallinn]
    'FI': [Finlandiya, Finlandiyalı*, Helsinki]
    'FO': [Faroe Adaları, Faroeli*, Torshavn]
    'GB': [Birleşik Krallık, BK, İngiltere, İngiliz, Britanya, Britanyalı*, İngiltereli*, Londra]
    'GG': [Guernsey, Guernseylı*, Saint Peter Port, St Peter Port]
    'IE': [İrlanda, İrlandalı*, Dublin]
    'IM': [Manş Adaları, Manş Adalarılı*]
    'IS': [İzlanda, İzlandalı*, Reykjavik]
    'JE': [Kanal Adaları, Kanal Adalı*]
    'LT': [Litvanya, Litvanyalı*, Vilnius]
    'LV': [Letonya, Letonyalı*, Riga]
    'NO': [Norveç, Norveçli*, Oslo]
    'SE': [İsveç, İsveçli*, Stockholm]
    'SJ': [Svalbard ve Jan Mayen Adaları]

  SOUTH:
    'AD': [Andorra, Andorralı*]
    'AL': [Arnavutluk, Arnavut*, Tiran]
    'BA': [Bosna, Bosnalı*, Bosna Hersek, Hersek, Saraybosna]
    'ES': [İspanya, İspanyalı*, İspanyol*,Madrid, Barselona]
    'GI': [Cebelitarık, Cebelitarıklı*, Llanitos]
    'GR': [Yunanistan, Yunan, Yunanlı*, Yunanistanlı*, Atina]
    'HR': [Hırvatistan, Hırvat*, Hırvatistanlı*, Zagreb]
    'IT': [İtalya, İtalyan*, Roma]
    'KV': [Kosova, Kosovalı*, Priştine]
    'ME': [Karadağ, Karadağlı*, Podgorica]
    'MK': [Makedonya, Kuzey Makedonya, Makedonyalı*, Üsküp] #Add North Macedonia
    'MT': [Malta, Maltalı*, Valletta]
    'PT': [Portekiz, Portekizli*, Lizbon]
    'RS': [Sırbistan, Sırp*, Belgrad]
    'SI': [Slovenya, Slovenyalı*, Ljubljana]
    'SM': [San Marino, San Marinolu*]
    'VA': [Vatikan, Vatikanlı*]

  WEST:
    'AT': [Avusturya, Avusturyalı*, Viyana]
    'BE': [Belçika, Belçikalı*, Brüksel]
    'CH': [İsviçre, İsviçreli*, Zürih, Bern]
    'DE': [Almanya, Alman*, Berlin, Frankfurt]
    'FR': [Fransa, Fransız*, Paris]
    'LI': [Liechtenstein, Liechtensteinlı*, Vaduz]
    'LU': [Lüksemburg, Lüksemburglu*]
    'MC': [Monako, Monakolu*]
    'NL': [Hollanda, Felemenk, Hollandalı*, Felemenkli*, Amsterdam]

OCEANIA:
  AU-NZ:
    'AU': [Avustralya, Avustralyalı*, Canberra, Sidney]
    'CK': [Cook Adaları, Cook Adalı*, Avarua]
    'NF': [Norfolk Adası, Norfolk Adalı*]
    'NZ': [Yeni Zelanda, Yeni Zelandalı*, Wellington, Auckland]

  MEL:
    'FJ': [Fiji, Fijili*]
    'NC': [Yeni Kaledonya, Yeni Kaledonyalı*, Noumea]
    'PG': [Papua Yeni Gine, Papua Yeni Gineli*, Papualı*, Port Moresby]
    'SB': [Solomon Adaları, Solomon Adalı*, Honiara]
    'VU': [Vanuatu, Vanuatulu*, Port Vila]

  MIC:
    'FM': [Mikronezya, Mikronezyalı*, Palikir]
    'GU': [Guam, Guamlı*, Hagåtña]
    'KI': [Kiribati, Kiribatili*, Tarawa]
    'MH': [Marshall Adaları, Marshall Adalı*, Marşallı*, Majuro]
    'MP': [Kuzey Mariana Adaları, Kuzey Mariana Adalı*, Capital Hill]
    'NR': [Nauru, Naurulu*, Yaren]
    'PW': [Palau, Palaulu*, Melekeok]

  POL:
    'AS': [Amerikan Samoası, Amerikan Samoalı*, Pago Pago]
    'NU': [Niue, Niueli*, Alofi]
    'PF': [Fransız Polinezyası, Fransız Polinezyalı*, Papeete]
    'PN': [Pitcairn Adaları, Pitcairn Adalı*, Adamstown]
    'TK': [Tokelau, Tokelaulu*, Nukunonu]
    'TO': [Tonga, Tongalı*, Nuku'alofa]
    'TV': [Tuvalu, Tuvalulu*, Funafuti]
    'WF': [Wallis ve Futuna Adaları, Wallis ve Futunalı*, Mata-Utu]
    'WS': [Samoa, Samoalı*, Apia]
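Many entries in `dict/turkish.yml` are prefix globs, and some can match tokens meant for another country: `Türk*` under `TR` also matches `Türkmenistan` (`TM`), and `Kongolu*` is listed under both `CD` and `CG`. Below is a minimal base-R sketch that flags tokens matched by more than one country code in a hand-copied subset of the entries. It assumes case-insensitive glob matching, which is the default for quanteda's `tokens_lookup()` (`valuetype = "glob"`, `case_insensitive = TRUE`); `match_codes()` and `find_collisions()` are hypothetical helpers, not newsmap or quanteda functions.

```r
# Hand-copied subset of entries from dict/turkish.yml (not the full dictionary)
seeds <- list(
  TR = c("Türkiye", "Türk*", "Ankara"),
  TM = c("Türkmenistan", "Türkmen*", "Aşkabat"),
  CD = c("Kongolu*", "Kinşasa"),
  CG = c("Kongo", "Kongolu*", "Brazavil")
)

# Hypothetical helper: return the country codes whose glob patterns match one
# token. utils::glob2rx() turns "Türk*" into the regex "^Türk".
match_codes <- function(token, seeds) {
  hits <- vapply(seeds, function(patterns) {
    any(vapply(patterns, function(p) {
      grepl(utils::glob2rx(p), token, ignore.case = TRUE)
    }, logical(1)))
  }, logical(1))
  names(seeds)[hits]
}

# Hypothetical helper: keep only tokens claimed by more than one country code
find_collisions <- function(tokens, seeds) {
  codes <- lapply(tokens, match_codes, seeds = seeds)
  names(codes) <- tokens
  codes[lengths(codes) > 1]
}

collisions <- find_collisions(c("Türkiye", "Türkmenistan", "Kongolu", "Ankara"), seeds)
print(collisions)
```

On this subset the sketch reports `Türkmenistan` (matched by `TR` and `TM`) and `Kongolu` (matched by `CD` and `CG`), while `Türkiye` and `Ankara` map to `TR` only. Checking the whole dictionary would first require parsing the YAML into a list of this shape.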

2 changes: 1 addition & 1 deletion man/data_dictionary_newsmap_ru.Rd

13 changes: 13 additions & 0 deletions man/data_dictionary_newsmap_tr.Rd
