
RESEARCH PROJECTS 2013

Documentation and analysis of an endangered language: aspects of the grammar of Griko

Dr. Marika Lekakou, Assistant Professor of Linguistics, University of Ioannina
Dr. Valeria Baldissera, University Ca’ Foscari of Venice
Antonis Anastasopoulos, Electrical and Computer Engineering, National Technical University of Athens
Dr. Sjef Barbiers, Professor of Dutch Variation Linguistics, Utrecht University & Senior researcher, Meertens Instituut (KNAW)

December 2013


Project Report

Table of contents
Project summary in English
Project summary in Greek
1. Introduction and objectives
2. Team members
3. Methodology
3.1 Data collection
3.2 Data transcription and enrichment
3.3 Data storage and retrieval
4. Results
5. Extensions and avenues for future research
Acknowledgments
Selected References
APPENDIX A1 Transcription Protocol
APPENDIX A2 Part-of-Speech Tagging Protocol
APPENDIX A3 Database and Website Manual


PROJECT SUMMARY

Documentation and analysis of an endangered language: aspects of the grammar of Griko

The project aimed at collecting, digitizing and analyzing new data from Griko, the Greek dialect spoken in Puglia, Southern Italy. Griko is rare among Greek dialects in retaining the infinitive in particular syntactic contexts. In all Greek varieties spoken in Greece, infinitives have been replaced by finite embedded clauses. Taking the infinitive as our point of departure, we examined aspects of the grammar of Griko, with an emphasis on verbal morphosyntax (voice, tense, mood, modality, aspect) and the structure of embedded clauses.

During our fieldtrip to the Greek-speaking villages of Puglia, we collected new data, which were digitally recorded, transcribed and morphosyntactically tagged. The enriched data, along with the corresponding sound files, have been made available in an online database, where users can perform searches according to parameters such as location, tag, lemma, gloss, and syntactic variable. Our research documents aspects of a language facing extinction, focusing on the often overlooked syntactic level of grammar. Making the enriched data available online gives the international linguistic community ready access to our material. The theoretical analysis of such data is relevant to syntactic theory, to theories of language change through language contact, and to the diachronic development of Greek.


SUMMARY OF THE STUDY

Recording and analysis of a language facing extinction: the grammatical system of Griko

The aim of our study was to collect, record and analyze data from Griko, the Greek dialect of Puglia in Southern Italy. In certain contexts Griko uses the infinitive. This grammatical form has disappeared from the Greek dialects spoken in Greece, where finite embedded clauses have replaced it. Taking this property of Griko as our starting point, we investigated aspects of its grammar, with an emphasis on verbal morphosyntax (voice, tense, mood, modality, aspect) and the structure of embedded clauses.

Through fieldwork in the Greek-speaking villages of Puglia, we collected original linguistic material, which was digitally recorded, transcribed and morphosyntactically annotated. The collected material is available in an electronic database, where users can perform searches according to different parameters, such as location, lemma, morphosyntactic category and syntactic variable. Our research contributes to the documentation of a language facing extinction. Specifically, it describes the syntactic component of grammar, which dialect research often overlooks. Moreover, processing the data and making them available electronically considerably broadens the ways in which the international linguistic community can use them. The theoretical analysis of our data is relevant to contemporary syntactic theory, to the theory of language change through language contact, and to the diachronic development of the Greek language.


1. Introduction and objectives

Scientific research on dialect systems is currently a popular and productive enterprise. Evidence for this is the large number of recently completed or ongoing scientific projects whose explicit focus is on dialects. Dialects have become the point at which language specialists of different convictions (theoretical linguists, traditional dialectologists, typologists, sociolinguists, and historical linguists) join forces towards shared goals, such as:
- to record linguistic variation before it disappears altogether;
- to enhance the reliability of the data collected;
- to broaden the empirical basis of research;
- to subject to scrutiny theories that are, more often than not, based on standard languages;
- to illuminate the diachrony of particular languages (since dialects very often preserve phenomena that are attested only in earlier stages of the (standard) language);
- to shed light on the workings of contact-induced change (since dialects are practically always spoken in bilingual or multilingual communities).

The relevance of dialects for linguistic theory, and especially for syntactic theory, has been recognized at least since Kayne (1996) explicitly likened the investigation of dialect systems to an experimental setting. Closely related varieties differ from each other in minimal ways, so studying them is the closest theoretical linguistics can get to a controlled experiment. This has come to be known as the micro-comparative approach to linguistic variation. (For an illustration of the positive outcomes of the interplay between theoretical linguistics and dialect syntax, see Koeneman & Lekakou 2006.)

The adoption of the micro-comparative approach has led to a wealth of research on dialect syntax in recent years. At the European level, researchers have carried out a number of projects, both large- and small-scale, dedicated specifically to the syntactic properties of dialects (for an overview, see http://www.dialectsyntax.org/wiki). Within Greece, however, dialect syntactic variation has only been recorded through sporadic, individual research, and digital access to the data has not been ensured. In traditional dialectological studies, syntactic description is either lacking or not theoretically informed (Tzitzilis 2000).

It is from this perspective that we approach Griko, the Greek dialect spoken in Puglia (province of Lecce), Southern Italy, in the area known as Grecìa Salentina. Officially, Grecìa Salentina consists of 12 villages (Calimera, Carpignano Salentino, Castrignano dei Greci, Corigliano d'Otranto, Cutrofiano, Martano, Martignano, Melpignano, Sogliano Cavour, Soleto, Sternatia and Zollino). In reality, Griko is actively spoken today in only a subset of these villages, and its speakers are mostly elderly. No reliable estimate of the number of active Griko speakers is currently available. According to the UNESCO Atlas of the World’s Languages in Danger (Moseley 2010), Griko is severely endangered (http://www.unesco.org/culture/languages-atlas/en/atlasmap.html).

Its status as a severely endangered language is not the only reason to study Griko. For centuries, Griko has been spoken alongside the local Romance dialect, Salentino, and, more recently, Standard Italian (or its regional variety), in a «complex linguistic situation of diglossia with expanding bilingualism» (Ledgeway 2013:2). Griko is thus important not only for illuminating the diachrony of Greek but also as a potential window into the workings of contact-induced change.

Possibly the most exotic syntactic property of Griko is that, almost uniquely among Modern Greek varieties, it retains the infinitive (cf. Joseph 1983; Mackridge 1987 on Romeyka of Pontus; and Katsoyiannou 1995 on Grecanico). All Modern Greek varieties spoken within Greece have lost the infinitive and replaced it with embedded finite clauses. The following examples from Baldissera (2013) illustrate the current distribution of infinitives in Griko, namely as complements to the modal verb sodzo ‘can’.

(1) a. Poa sodzi piai ta pipogna?
       when can-2SG take.INF the melons
       ‘When can you take the melons?’
    b. Ta sodzo piai simmeri.
       them can-1SG take.INF today
       ‘I can take them today.’

The data in (1) show not only the retention of the infinitive in Griko but also the commonly co-occurring phenomenon of clitic climbing (Terzi 1996), i.e. the placement of the object clitic belonging to the infinitival clause next to the matrix verb. This is not possible in Standard Modern Greek (SMG), for instance. There, the sentence in (1b) would involve a na-clause as the complement to the modal boro, and no clitic climbing. We thus see that the existence of the infinitive in a language correlates with other morphosyntactic properties, such as clitic climbing.

In this project, we undertook empirical and theoretical research on the morphosyntactic properties of Griko. Our aim was to collect syntactic data (i.e. sentences in Griko), analyze them theoretically, and make them available electronically for future (micro-comparative) research. We thus had two major objectives: one involving data collection and analysis, and one concerning data enrichment and storage.

Regarding the first objective, by taking the infinitive as our point of departure, the project focused on a central aspect of the grammar of Griko, namely the morphosyntax of the verb in main and embedded clauses. The examination served the following goals:
1. to record the distribution of infinitives, na-clauses (subjunctives), and other types of embedded clauses;
2. to provide a description of the dimensions of voice, tense, aspect and modality, focusing in particular on the following phenomena:
   a. the structure of subjunctive and imperative clauses;
   b. the three-way voice distinction (active, passive, reflexive);
   c. the split-auxiliary selection system (based on person) in compound tenses;
   d. the encoding of futurity (periphrastic and non-periphrastic);
   e. the encoding of aspectual distinctions (periphrastic and non-periphrastic);
   f. properties of modal verbs (e.g. non-volitional ‘want’);
3. to provide an analysis of the syntactic status of the subjunctive marker na and of the dependent verbal form it embeds (see section 4).

The second objective of our project was to make the empirical results of our research available to the wider linguistic community by digitizing the data and storing them in a searchable online database. In other words, our second objective was to contribute infrastructure to dialect syntactic research in Greece. We did this by introducing a way of annotating and storing empirical data that is widely used in similar research outside Greece. To this end, our team included international partners. Their input ensured that the database could, if desired, later be integrated into the larger European family of dialect syntactic databases. We implemented this long-term goal by aligning our methodology (in terms of data collection, enrichment, storage and retrieval) with that of large-scale dialect syntactic projects in Europe. This enhances the comparability of our data with the dialect syntactic data already available in corpora unified via the Edisyn search engine (http://www.dialectsyntax.org/wiki/Edisyn_search_engine).

2. Team members

The coordinator of the project was Dr. Marika Lekakou, Assistant Professor of Linguistics (University of Ioannina). The team included the following members: Dr. Valeria Baldissera (University Ca’ Foscari of Venice), Antonis Anastasopoulos (Electrical and Computer Engineering, National Technical University of Athens) and Prof. Dr. Sjef Barbiers, Professor of Dutch Variation Linguistics (Utrecht University) & Senior researcher (Meertens Instituut, KNAW).

3. Methodology

3.1 Data collection

We collected data via oral interviews conducted in May and August 2013.
In May, members of our team visited four villages in the Greek-speaking area: Calimera, Corigliano d’Otranto, Martano and Sternatia. All of our contacts within and outside Greece had identified these villages as extant Griko enclaves (Professor Ralli at the University of Patras, Professor Katsoyiannou at the University of Cyprus and Professor Bernardini at the University of Lecce). In August, Valeria Baldissera also conducted a follow-up interview in Corigliano d’Otranto with the informant consulted in May. In Calimera and Sternatia, the informants included all of those whom Valeria Baldissera had interviewed for the empirical research of her PhD thesis (Baldissera 2013). These informants, already acquainted with Dr Baldissera, were our main contacts in Grecìa Salentina, and they put us in touch with further members of the local communities. As a result, the interviews in Calimera and Sternatia involved more speakers than those in Martano and Corigliano d’Otranto, where we met our informants for the first time upon arrival. There was thus at least one informant per location, and in some cases more than one. Not all speakers participated in the interviews in equal measure. We interviewed ten informants in total, four female and six male. With the exception of the informant in Corigliano, who is younger, all of our informants are over 60.

Our elicitation method involved a translation task (from Standard Italian) and, as follow-up questions, grammaticality judgments of sentences in Griko. As Cornips & Poletto (2004) discuss, these methodological choices have been successfully employed in other dialect syntactic projects, such as SAND (Syntactic Atlas of the Dutch Dialects) and ASIS (Syntactic Atlas of Italian Dialects).
We were able to construct Griko versions of all our test sentences before our fieldtrip by sending an electronic copy of our questionnaire to our contact in Sternatia. This gave us a first idea of what to expect in the oral interviews and allowed us to calibrate our follow-up questions accordingly. In total, the questionnaire we administered contained 78 test sentences. In October and November, Valeria Baldissera conducted telephone interviews with one of the speakers to make final confirmations and to ask follow-up questions needed for our theoretical analysis.

In 2011, Valeria Baldissera interviewed a set of speakers, and all of them were also among the speakers interviewed in 2013. Because the methodology of data collection was also the same, we decided to include parts of the data published in Baldissera (2013) in the online corpus. Both sets of data have been transcribed and annotated in the same way (see section 3.2). The only difference is that the data from Baldissera (2013) systematically lack a corresponding sound file. In the corpus they bear a diacritic, so that users can distinguish them and cite them accordingly.

3.2 Data transcription and enrichment

We transcribed and enriched the data recorded during the fieldtrips of May and August using the free program PRAAT (http://www.fon.hum.uva.nl/praat/). Phonetic research commonly uses PRAAT to transcribe audio files, and we used it to transcribe our material, which consisted of entire sentences. In addition, we provided Part-of-Speech (PoS) tags, glosses (in Italian) and lemmas (in Griko) in separate PRAAT tiers (see «Transcription Protocol» and «Part of Speech Tagging Protocol» in the Appendix). We assigned PoS tags, lemmas and glosses manually. This kind of information makes the data much more accessible to database users and enables advanced searches. We have also assigned one or more syntactic keywords to each test sentence (see Table 1), so that users can also search by syntactic variable. This kind of search is most useful for users with little prior knowledge of the grammatical properties of Griko that our project investigated more closely.

Aspectual periphrasis     Dative argument           Non-volitional want
Aspectual verb            Factive complement        Passive verb
By-phrase                 Focus                     Perception verb
Causative verb            Future                    Purpose clause
Compound tense            Habituality               Raising verb
Conditional clause        Imperative                Reason clause
Concessive clause         Infinitival complement    Reflexive verb
Clitic                    Intensional verb          Subjunctive complement
Clitic climbing           Modal periphrasis         Temporal clause
Clitic doubling           Modal verb                Wh-complement
Declarative complement    Negation                  Wh-question

Table 1: List of syntactic keywords instantiated in the Griko corpus
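The tier structure described above can be pictured as one record per test sentence. Each record holds word-aligned annotation tiers and a set of syntactic keywords from Table 1. The Python sketch below is illustrative only and is not the project's data model. Several elements are assumptions: the class and function names, the placeholder PoS tags, the placeholder lemmas, and the English glosses (the corpus itself glosses in Italian). The example is sentence (1b), tagged with the keywords that the discussion of (1) identifies: an infinitival complement of a modal verb, with clitic climbing.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedSentence:
    """A test sentence with word-aligned annotation tiers (hypothetical layout)."""
    words: list[str]      # Text tier: one entry per token
    tags: list[str]       # PoS tier (placeholder tags, not the Griko tagset)
    lemmas: list[str]     # Lemma tier
    glosses: list[str]    # Gloss tier (the corpus itself glosses in Italian)
    keywords: set[str] = field(default_factory=set)  # syntactic keywords, cf. Table 1

    def __post_init__(self):
        # Every annotation tier must contain exactly one entry per word.
        n = len(self.words)
        if not (len(self.tags) == len(self.lemmas) == len(self.glosses) == n):
            raise ValueError("annotation tiers are not aligned with the Text tier")


def search(corpus, keyword=None, lemma=None):
    """Return the sentences that match every criterion that is not None."""
    return [
        s for s in corpus
        if (keyword is None or keyword in s.keywords)
        and (lemma is None or lemma in s.lemmas)
    ]


# Example (1b). Tags are hypothetical placeholders; all lemmas except the
# citation form 'sodzo' are placeholders copied from the lowercased surface forms.
example = AnnotatedSentence(
    words=["Ta", "sodzo", "piai", "simmeri"],
    tags=["PRON", "VERB", "VERB", "ADV"],
    lemmas=["ta", "sodzo", "piai", "simmeri"],
    glosses=["them", "can-1SG", "take.INF", "today"],
    keywords={"Clitic climbing", "Infinitival complement", "Modal verb"},
)
corpus = [example]
print(len(search(corpus, keyword="Clitic climbing")))  # → 1
print(len(search(corpus, keyword="Passive verb")))     # → 0
```

The construction-time check mirrors the requirement that every annotation tier line up with the Text tier. A real pipeline would read these tiers from the PRAAT files rather than build them by hand.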

 


The transcription protocol we developed for Griko was based on existing community practices for writing the language. Unlike other Modern Greek dialects, Griko has some tradition of written texts. We therefore decided to forgo a phonetic or phonological transcription of our data. Such a transcription would seem foreign to members of the Griko community, who we hope will also be interested in the results of our fieldtrip. The conventions used for the orthographic transcription are set out in the Transcription Protocol (Appendix A1).

For the morphosyntactic annotation, we developed a PoS tagging protocol especially for Griko (see the Part of Speech Tagging Protocol, Appendix A2). We relied on the guidelines of EAGLES (Expert Advisory Group on Language Engineering Standards) (http://www.ilc.cnr.it/EAGLES96/annotate/annotate.html) and on tools developed for the Edisyn search engine (http://www.dialectsyntax.org/wiki/Edisyn_search_engine). Future incorporation of our data into the Edisyn family of dialect syntactic corpora depends on database interoperability. In terms of PoS tagging, this will require a mapping between the tagset developed for Griko and the Edisyn tagset. We will undertake this mapping in the future.

3.3 Data storage and retrieval

The transcribed data have been stored in a MySQL (relational) database, hosted on the same server as the project website (http://griko.project.uoi.gr/). The audio files, wherever available, are also stored on the server, in WAV format. The transcriptions, tags, lemmas and glosses are stored in MySQL tables.
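One plausible storage layout is sketched below. The project used MySQL with the InnoDB engine. This self-contained Python sketch substitutes the standard-library sqlite3 module so that it runs without a database server. Its table and column names, the example location and the audio file name are hypothetical. Each token is stored as one row with its tag, lemma and gloss, and each token row points to its sentence through a foreign key.

```python
import sqlite3

# Hypothetical schema: one row per sentence, one row per token.
SCHEMA = """
CREATE TABLE sentence (
    id         INTEGER PRIMARY KEY,
    location   TEXT NOT NULL,
    audio_file TEXT                 -- NULL when no recording exists
);
CREATE TABLE word (
    sentence_id INTEGER NOT NULL REFERENCES sentence(id),
    position    INTEGER NOT NULL,   -- 1-based token index in the sentence
    form        TEXT NOT NULL,      -- transcription (Text tier)
    tag         TEXT NOT NULL,      -- PoS tag
    lemma       TEXT NOT NULL,
    gloss       TEXT NOT NULL,
    PRIMARY KEY (sentence_id, position)
);
"""


def open_db():
    """Create an in-memory database with foreign-key enforcement switched on."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def sentences_with_lemma(conn, lemma):
    """Return (sentence id, location) pairs for sentences containing the lemma."""
    return conn.execute(
        "SELECT DISTINCT s.id, s.location FROM sentence s "
        "JOIN word w ON w.sentence_id = s.id "
        "WHERE w.lemma = ? ORDER BY s.id",
        (lemma,),
    ).fetchall()


conn = open_db()
# Hypothetical rows; the tags and the location are placeholders.
conn.execute("INSERT INTO sentence VALUES (1, 'village-A', 'example.wav')")
conn.executemany(
    "INSERT INTO word VALUES (1, ?, ?, ?, ?, ?)",
    [(1, "Ta", "PRON", "ta", "them"), (2, "sodzo", "VERB", "sodzo", "can-1SG")],
)
print(sentences_with_lemma(conn, "sodzo"))  # → [(1, 'village-A')]
```

Storing tokens as rows keyed by (sentence, position) keeps the tiers aligned by construction. Every tier value for a token lives in the same row.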

The website provides an interface for querying the database with various parameters. Users can also choose which results are shown. The results are automatically exported to HTML format. In addition, users access the audio files through a simple interface, which automatically selects a player that their browser supports, in order to avoid compatibility issues.
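A query interface of this kind typically turns the non-empty form fields into a parameterized WHERE clause and escapes values before writing HTML. The sketch below is a minimal illustration, not the project's code. It assumes hypothetical tables named sentence and word (aliased s and w), and the whitelist of parameter names is also an assumption. It uses the qmark (?) placeholder style of Python's sqlite3, whereas MySQL drivers for Python typically use %s. A search by syntactic keyword would need an extra join on a keyword table, which is omitted here.

```python
import html

# Hypothetical whitelist: web-form parameter -> database column.
SEARCH_COLUMNS = {
    "location": "s.location",
    "tag": "w.tag",
    "lemma": "w.lemma",
    "gloss": "w.gloss",
}

BASE_QUERY = (
    "SELECT s.id, w.form, w.tag, w.lemma, w.gloss "
    "FROM sentence s JOIN word w ON w.sentence_id = s.id"
)


def build_query(params):
    """Turn non-empty search parameters into SQL text plus bound arguments."""
    clauses, args = [], []
    for name, value in params.items():
        if not value:
            continue  # empty form fields do not restrict the search
        if name not in SEARCH_COLUMNS:
            raise ValueError(f"unsupported search parameter: {name}")
        # Column names come only from the whitelist; values are bound, never inlined.
        clauses.append(f"{SEARCH_COLUMNS[name]} = ?")
        args.append(value)
    sql = BASE_QUERY
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY s.id, w.position", args


def rows_to_html(header, rows):
    """Render a result set as an HTML table, escaping every cell."""
    head = "".join(f"<th>{html.escape(h)}</th>" for h in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


sql, args = build_query({"lemma": "sodzo", "location": ""})
print(sql)
print(args)  # → ['sodzo']
```

Binding values rather than pasting them into the SQL string, and escaping them on output, protects both the database and the generated HTML from user input.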

Since the project focuses on the syntax of Griko, the database is structured accordingly. The features of a PoS tag depend on its category, so tags of different categories are stored in different SQL tables to optimize performance. Foreign keys ensure integrity across these tables.
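The following sketch shows one way to store category-specific features in separate tables, linked back to the words they describe by foreign keys. As in the previous sketches, sqlite3 stands in for MySQL/InnoDB, and the table and column names are hypothetical. The person and number values follow the gloss can-1SG of sodzo in example (1b). The last insert shows the foreign key rejecting a feature row that points to a non-existent word.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a word table and a verb-feature table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE word (
            id       INTEGER PRIMARY KEY,
            form     TEXT NOT NULL,
            category TEXT NOT NULL            -- coarse PoS category, e.g. 'VERB'
        );
        -- Hypothetical verb-only features; other categories would get their own
        -- tables. Deleting a word also deletes its feature row (ON DELETE CASCADE).
        CREATE TABLE verb_features (
            word_id INTEGER PRIMARY KEY REFERENCES word(id) ON DELETE CASCADE,
            person  TEXT,
            number  TEXT,
            mood    TEXT,
            voice   TEXT
        );
    """)
    return conn


conn = make_db()
conn.execute("INSERT INTO word VALUES (1, 'sodzo', 'VERB')")
conn.execute("INSERT INTO verb_features (word_id, person, number) VALUES (1, '1', 'SG')")

try:
    # No word has id 99, so the foreign key must reject this row.
    conn.execute("INSERT INTO verb_features (word_id, person, number) VALUES (99, '3', 'SG')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("dangling feature row rejected:", rejected)
```

Splitting features by category avoids one wide table full of columns that are empty for most words. The foreign keys keep each feature row tied to an existing word.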

The transcribed data (.TextGrid files produced by PRAAT) were parsed, checked and stored using Python and the InnoDB storage engine for MySQL, which supports foreign key constraints.
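Parsing the PRAAT output is the step that connects the manual annotation to the database. The sketch below reads Praat's long ("ooTextFile") TextGrid text format into a dictionary of interval tiers. It is a simplified sketch, not the project's parser, and it rests on several assumptions. It handles only interval tiers written in the long format and ignores point tiers. It assumes each text value fits on one line. It leaves file decoding to the caller, since TextGrid files may be saved in different encodings. The tier names and timings in the sample are invented for illustration.

```python
import re

# Lines of interest in Praat's long ("ooTextFile") TextGrid format.
NAME_RE = re.compile(r'^\s*name = "(.*)"\s*$')
XMIN_RE = re.compile(r'^\s*xmin = ([-\d.eE+]+)\s*$')
XMAX_RE = re.compile(r'^\s*xmax = ([-\d.eE+]+)\s*$')
TEXT_RE = re.compile(r'^\s*text = "(.*)"\s*$')


def parse_textgrid(source):
    """Return {tier name: [(xmin, xmax, text), ...]} for interval tiers."""
    tiers, current, xmin, xmax = {}, None, None, None
    for line in source.splitlines():
        if m := NAME_RE.match(line):
            current = tiers.setdefault(m.group(1), [])
        elif m := XMIN_RE.match(line):
            xmin = float(m.group(1))
        elif m := XMAX_RE.match(line):
            xmax = float(m.group(1))
        elif (m := TEXT_RE.match(line)) and current is not None:
            # Praat escapes a literal double quote as two double quotes.
            current.append((xmin, xmax, m.group(1).replace('""', '"')))
    return tiers


# Invented toy file with two hypothetical tiers.
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 2.5
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "text"
        xmin = 0
        xmax = 2.5
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 2.5
            text = "Ta sodzo piai simmeri"
    item [2]:
        class = "IntervalTier"
        name = "gloss"
        xmin = 0
        xmax = 2.5
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 1.2
            text = "them can-1SG"
        intervals [2]:
            xmin = 1.2
            xmax = 2.5
            text = "take.INF today"
'''

for name, intervals in parse_textgrid(SAMPLE).items():
    print(name, intervals)
```

In this format, each interval's xmin and xmax lines immediately precede its text line. Keeping only the most recent values is therefore enough to pair times with labels.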

For details of this aspect of the project, see Appendix A3.

4. Results

In this project, we pursued two goals: to collect and analyze data pertaining to the level of syntax, and to make the data widely available. Regarding the first objective, we have collected new, theoretically informed data. These data will guide research on Griko verbal morphosyntax and clause structure, and they also raise new empirical questions. Regarding the second objective, the data collected have been transcribed, annotated and stored in a searchable online database. The project preserves significant aspects of our cultural heritage that are in danger of being lost forever. It also brings dialect research in Greece in line with dialect research in most European countries, where (a) syntactic variation is intensively researched and (b) available technological advances in data storage and retrieval are exploited for the benefit of the scientific community.

We have already presented some of our empirical findings, along with our theoretical analysis, at the following two workshops:
- Workshop on Language Contact in the Light of Modern Greek Morphological Variation, 11th International Conference of Greek Linguistics, University of the Aegean, Rhodes, 26-29 September 2013.
- Workshop on Balkan-Romance Contact, University Ca’ Foscari of Venice, 26-27 November 2013.

These oral presentations will result in two peer-reviewed publications. In them, we have provided syntactic arguments for the claim that, contrary to Standard Modern Greek (Holton et al. 1997), Griko encodes subjunctive mood in verbal morphology. We have also examined contact with Romance as a possible cause of this microvariation. The results of our research are thus directly relevant to issues such as the effects of contact between Italo-Greek and Italo-Romance, the diachronic development and origins of Griko (Rohlfs 1950; Profili 1983; Manolessou 2005), and the diachronic development of Greek more generally. In future work, we will turn to the synchronic comparison of Griko and Romeyka of Pontus, which also retains the infinitive (albeit in slightly different syntactic contexts).

5. Extensions and avenues for future research

This is the first Greek dialect study to focus exclusively on a cluster of (morpho)syntactic properties and their repercussions for the overall linguistic system. It is also the first to provide access to a corpus of transcribed and morphosyntactically annotated sentences. A number of actions can be undertaken in the future to maximize the long-term impact of this project.

We will make a number of minor additions to the main deliverable of the project in the future. We aim to provide English glosses, so as to make the data even more accessible to linguists who do not speak Italian (or Griko). We are also compiling an updated bibliography on Griko, which we will add to the website shortly. Finally, we aim to explore the possibility of linking our database to the Edisyn search engine. This would serve international collaboration and visibility, and the standardized methodology employed in this project makes it feasible.


We also hope it will be possible to expand the corpus by importing additional Griko data. These data could come from new rounds of data collection in the near future, carried out by us or by others using similar methodology. Another extension of the corpus would be the incorporation of data from dialects other than Griko, collected and enriched with comparable methodology. In this way, the infrastructure work undertaken for this project will have served as a pilot study, making it easier in the future to undertake theoretically informed and technologically up-to-date dialect syntactic research.

Acknowledgments

We are extremely grateful to the John S. Latsis Foundation and to the members of the Griko communities who took part in our research. Without the financial support of the former and the enthusiastic participation of the latter, this research would not have been possible. For their help with informants and the data collection process, we are extremely thankful to Isabella Bernardini, Carmine Greco, Luigi Tommasi and Giuseppe De Pascalis. For her continuous help with the data, we thank Adriana Spagnolo. For their help with PRAAT, we are grateful to Cinzia Avesani and especially to Evia Kainada. Finally, for their support and/or advice at various stages of the project, we thank Marianna Katsoyiannou, Jan Pieter Kunst, Josep Quer, Ioanna Sitaridou, Angeliki Ralli, and Arhonto Terzi.


Selected References

Baldissera, V. 2013. Il dialetto grico del Salento: elementi balcanici e contatto linguistico [The Griko dialect of Salento: Balkan features and linguistic contact]. Doctoral dissertation, University Ca’ Foscari of Venice.
Cornips, L. & C. Poletto. 2004. On standardizing syntactic elicitation techniques. Lingua 115.7: 939-957.
Holton, D., P. Mackridge & I. Philippaki-Warburton. 1997. Greek: A Comprehensive Grammar of the Modern Language. London: Routledge.
Joseph, B. 1983/2009. The synchrony and diachrony of the Balkan infinitive: A study in areal, general, and historical linguistics. Cambridge: Cambridge University Press.
Katsoyannou, M. 1995. Le parler gréco de Gallicianò (Italie): description d’une langue en voie de disparition. Doctoral dissertation, University of Paris VII.
Kayne, R. 1996. Microparametric syntax: some introductory remarks. In J.R. Black & V. Motapanyane (eds.), Microparametric syntax and dialect variation, 9-18. Amsterdam: John Benjamins.
Koeneman, O. & M. Lekakou. 2006. The role of syntactic theory in the SAND and EDiSyn projects. Ms., Meertens Institute.
Ledgeway, A. 2013. Greek disguised as Romance? The case of Southern Italy. Ms., Cambridge University. To appear in Proceedings of the 5th International Conference on Modern Greek Dialects and Linguistics.
Mackridge, P. 1987. Greek-speaking Moslems of North-East Turkey: Prolegomena to a study of the Ophitic sub-dialect of Pontic. Byzantine and Modern Greek Studies 11: 115-137.
Manolessou, I. 2005. The Greek dialects of southern Italy: An overview. ΚΑΜΠΟΣ 13: 103-135.
Moseley, C. (ed.). 2010. Atlas of the World’s Languages in Danger, 3rd edn. Paris: UNESCO Publishing. http://www.unesco.org/culture/en/endangeredlanguages/atlas
Profili, O. 1983. Le parler grico de Corigliano d'Otranto: phénomènes d'interférence entre ce parler grec et les parlers romans environnants, ainsi qu'avec l'italien. Doctoral dissertation, Université des Langues et Lettres, Grenoble.
Rohlfs, G. 1950. Historische Grammatik der unteritalienischen Gräzität. München: Bayerische Akademie der Wissenschaften.
Terzi, A. 1996. Clitic climbing out of finite clauses and tense raising. Probus 8: 273-295.
Tzitzilis, Ch. 2000. Νεοελληνικές διάλεκτοι και νεοελληνική διαλεκτολογία [Modern Greek dialects and Modern Greek dialectology]. In A.-F. Christidis (ed.), Η ελληνική γλώσσα και οι διάλεκτοί της [The Greek language and its dialects], 15-22. Athens: ΥΠΕΠΘ & Κέντρο Ελληνικής Γλώσσας [Ministry of National Education and Religious Affairs & Centre for the Greek Language].


Appendix A1: Protocol for Transcription of Griko

1. Introduction
This document is the manual used for transcribing the Griko audio files. It describes the conventions employed in transcribing the sound files, as well as how data enrichment more generally was carried out.

Transcription was carried out manually. In addition, each word in the corpus is assigned a Part-of-Speech tag, a lemma, and a translation into Italian. This information is provided in separate tiers:
a. Text Tier: contains the transcription of the sound file.
b. Part-of-Speech (PoS) Tag Tier: contains the Part-of-Speech tag of each item in the Text Tier (see “Protocol for Part-of-Speech Tagging of Griko”).
c. Italian Gloss Tier: contains a word-for-word translation of the Griko sentence into Italian.
d. Griko Lemma Tier: contains the lemma of each word. The Griko lemmas follow the forms given in the following dictionary: Greco, Carmine (2001). Lessico di Sternatia, paese della Grecia salentina: italiano-griko-neogreco, griko-italiano-neogreco, neogreco-griko. Lecce: Edizioni Del Grifo.

All four tiers were created manually using Praat (http://www.fon.hum.uva.nl/praat/). Within this program, the sound file was divided into sentences, which were separated by interval boundaries. Within each pair of boundaries, spaces were used to indicate word boundaries on the text tier. The apostrophe is used instead of a space only to mark word boundaries at which raddoppiamento sintattico (phonosyntactic gemination) takes place (see note 6 of the transcription conventions below).

In order to ensure a one-to-one correspondence between items on the transcription tier and items on the other tiers, spaces are used on those tiers as well. Whenever itemization discrepancies across tiers occurred, the following conventions were used:
1. When a phonological word contains two morphological words of different syntactic categories, e.g. when a preposition fuses with the definite article (text-tag discrepancy), the tags of the two words are separated by a period (“.”).
2. If a Griko word corresponds to more than one word in Italian (text-gloss discrepancy), an underscore (“_”) is placed between the Italian words, e.g. ‘irta is glossed as sono_venuta.
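Because both conventions keep exactly one space-delimited item per Griko word on every tier, a tier can be aligned with the text tier by position and then unpacked. The following is a minimal Python sketch of that check, not the project's own code; the function name `align` is hypothetical, and it assumes the Griko words have already been separated from the question number, sentence-final punctuation and the symbols `#` and `[…]`.

```python
def align(words, other_tier, joiner):
    """Pair each Griko word with its item(s) on another tier (hypothetical helper).

    `words` are the Griko words of one segment; `other_tier` is the raw string
    of the PoS tag tier (joiner ".") or of the Italian gloss tier (joiner "_").
    """
    items = other_tier.split()  # spaces delimit items on every tier
    if len(items) != len(words):
        raise ValueError(f"{len(words)} words but {len(items)} items: {other_tier!r}")
    # One item may pack several tags (".") or several Italian words ("_").
    return [(word, item.split(joiner)) for word, item in zip(words, items)]


if __name__ == "__main__":
    print(align(["‘irta"], "sono_venuta", "_"))
    # [('‘irta', ['sono', 'venuta'])]
    print(align(["s(t)i"], "P+Pfus.D+Det+Fem+S+Acc", "."))
    # [('s(t)i', ['P+Pfus', 'D+Det+Fem+S+Acc'])]
```

A count mismatch raises an error rather than being silently truncated, which is the kind of inconsistency a checking script would need to report.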

Only the speech of the informants is transcribed, not that of the interviewer. Whenever additional speakers were present, an additional tier was used to transcribe their speech. Intonational breaks are transcribed as “#”. Wherever transcription was not possible, this is indicated on the text tier by “[…]”.


Each sentence on the transcription tier is preceded by a number followed by a space. The number corresponds to the number of the test sentence in the questionnaire. The end of a sentence is marked by a period or a question mark, depending on sentence type. The only other punctuation mark used on the text tier is the apostrophe, which marks phonosyntactic gemination. On all other tiers, only the underscore and the period are used, in the way described above.

Synopsis of symbols used

[Text Tier]
.      end of test sentence
?      end of test sentence (question)
’      phonosyntactic gemination
#      intonational break
[…]    material in the sound file is not transcribed

[Gloss Tier]
_      separates two Italian words corresponding to a single Griko word

[PoS Tag Tier]
.      separates two PoS tags realized as a single word in Griko
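Taken together, these symbols make a text-tier string machine-readable. The sketch below is a hypothetical Python parser, not part of the project's tooling, that applies them: it reads the leading question number, the sentence type from the final "." or "?", counts `#` and `[…]`, and treats an apostrophe with no surrounding space as a word boundary (as the protocol states). The input in the usage note is a toy string, not corpus data.

```python
import re

# "<question number> <body><. or ?>"
LINE = re.compile(r"^(\d+) (.*?)\s*([.?])$")
PD = re.compile(r"[’']")  # apostrophe = word boundary with gemination


def parse_text_tier(line):
    """Split one text-tier sentence into its parts (hypothetical helper)."""
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a well-formed text-tier sentence: {line!r}")
    number, body, end = m.groups()
    words, breaks, gaps = [], 0, 0
    for item in body.split():
        if item == "#":
            breaks += 1  # intonational break
        elif item == "[…]":
            gaps += 1  # material that could not be transcribed
        else:
            words.extend(w for w in PD.split(item) if w)
    return {"number": int(number), "question": end == "?",
            "words": words, "breaks": breaks, "untranscribed": gaps}
```

For example, `parse_text_tier("3 Maria # si’putèka […]?")` yields question number 3, a question, the words `Maria`, `si` and `putèka`, one intonational break and one untranscribed stretch.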

The transcription is orthographic. Since there is a (limited) tradition of written Griko, we decided to forgo a phonetic or phonological transcription, which would be unfamiliar to native speakers of Griko. We relied on a version of the orthographic conventions adopted in texts written in Griko, e.g. those employed in the magazine Spitta (digitized issues of which can be found via the relevant link on the project website). The orthographic conventions used for Griko closely resemble those of Standard Italian. Notes explaining the conventions are provided after the table of transcription conventions below.


TRANSCRIPTION CONVENTIONS

Griko    Simplified IPA              Notes
a, à     [a]
b        [b]
c        [k] before [a], [o], [u]    e.g. Carlo [’karlo]
         [tʃ] before [i] and [e]     e.g. ceràsi [tʃe’rasi]; cilìa [tʃi’lia]
                                     <cia> = [tʃa], e.g. cialatèdda [tʃalat’ed:a]
                                     <cio> = [tʃo], e.g. ciofàli [tʃo’fali]
                                     <ciu> = [tʃu], e.g. ciumpì [tʃum’pi]
                                     <cìa> = [tʃ’ia]
                                     <cìo> = [tʃ’io]
ch       [x]                         e.g. rùcho [r’uxo]
d        [d]
e, è     [e]
f        [f]
g        [g] before [a], [o], [u]    e.g. garrofèddo [gar:o’fed:o]
         [dʒ] before [i] and [e]     <ge> = [dʒe]; <gi> = [dʒi]
                                     <gia> = [dʒa], e.g. sangìa [san’dʒia]
                                     <gio> = [dʒo]; <giu> = [dʒu]
gh       [g]                         e.g. ègghene [‘eg:ene]
i, ì     [i]
j        [ɣ]
k        [k]
l        [l]
m        [m]
n        [n]
o, ò     [o]
p        [p]
r        [r]
s        [s]
t        [t]
u, ù     [u]
v        [v]
z        [dz]                        e.g. ziò [z’io]
ts       [ts]                        e.g. tsìlo [‘tsilo]
sc       [ʃ]                         e.g. scìmmata [‘ʃim:ata]
gn       [ɲ]                         e.g. signurèdda [siɲu’red:a]


NOTES
1. The symbol <g> represents the plosive /ɡ/, unless it precedes a front vowel (<i> or <e>), in which case it represents the affricate /dʒ/. When the plosive occurs before a front vowel, <gh> is used, so that <ghe> represents [ge] and <ghi> represents [gi].

2. <c> represents the affricate /tʃ/ before the front vowels <i> and <e>. In some words of Italian origin, <c> before non-front vowels (<a>, <o>, <u>) spells the voiceless plosive /k/, e.g. Carlo [’karlo].

3. Thus, the letter <i> may function as a mere indicator that the preceding <c> or <g> is an affricate, e.g. cia (/tʃa/), ciu (/tʃu/), gia (/dʒa/), giu (/dʒu/).

4. The digraph <ch> always represents [x], e.g. cheretìmmata [xere’tim:ata]; chàri [’xari]. Apart from the words of Italian origin mentioned in note 2, the plosive [k] is transcribed as <k>.

5. In every word of two or more syllables, an accent diacritic marks the location of stress, e.g. cheretìmmata.

6. Raddoppiamento fonosintattico (phonosyntactic doubling, PD) is the phenomenon of syntactic gemination, i.e. the lengthening of word-initial consonants triggered by a particular set of preceding elements. PD is marked by an apostrophe <’> (used whenever a segment of the preceding word is elided) with no space between the two words, e.g. si’putèka [sip:u’teka].
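To make the interaction of these rules concrete, here is a rough grapheme-to-IPA sketch in Python. It is an illustration only, not a tool used in the project: it covers the correspondences in the table and notes above, omits stress marks, writes length with ":" as the table does, and treats <sc> as [ʃ] only before a front vowel, which is an assumption borrowed from Italian spelling rather than something the protocol states.

```python
VOWELS = set("aeiouàèìòù")
PLAIN = {"à": "a", "è": "e", "ì": "i", "ò": "o", "ù": "u"}
FRONT = set("ieìè")
BACK = set("aouàòù")
DIGRAPHS = {"ch": "x", "gh": "g", "gn": "ɲ", "ts": "ts"}
SIMPLE = {"b": "b", "d": "d", "f": "f", "j": "ɣ", "k": "k", "l": "l",
          "m": "m", "n": "n", "p": "p", "r": "r", "s": "s", "t": "t",
          "v": "v", "z": "dz"}
APOSTROPHES = set("'’")


def to_ipa(word):
    """Rough grapheme-to-IPA conversion of a Griko word (no stress marks)."""
    w = word.lower()
    out, i, long_next = [], 0, False
    while i < len(w):
        ch = w[i]
        nxt = w[i + 1:i + 2]  # "" at the end of the word
        after = w[i + 2:i + 3]
        if ch in APOSTROPHES:  # PD (note 6): lengthen the next consonant
            long_next = True
            i += 1
            continue
        if ch in VOWELS:  # accents only mark stress (note 5)
            out.append(PLAIN.get(ch, ch))
            i += 1
            continue
        if nxt == ch:  # doubled letter = one long consonant
            long_next = True
            i += 1
            continue
        if ch + nxt == "sc" and after in FRONT:
            sound, step = "ʃ", 2
        elif ch + nxt in DIGRAPHS:  # <ch>, <gh>, <gn>, <ts>
            sound, step = DIGRAPHS[ch + nxt], 2
        elif ch in "cg" and nxt in FRONT:  # notes 1 and 2: affricates
            sound = "tʃ" if ch == "c" else "dʒ"
            # note 3: unaccented <i> before a back vowel only marks the affricate
            step = 2 if nxt == "i" and after in BACK else 1
        elif ch in "cg":
            sound, step = ("k" if ch == "c" else "g"), 1
        else:
            sound, step = SIMPLE.get(ch, ch), 1
        out.append(sound + (":" if long_next else ""))
        long_next = False
        i += step
    return "".join(out)
```

For instance, `to_ipa("cialatèdda")` gives `tʃalated:a` and `to_ipa("si’putèka")` gives `sip:uteka`, matching the table apart from the omitted stress marks.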


Appendix A2: Protocol for Part-of-Speech Tagging of Griko

1. Introduction
This document is the manual used for part-of-speech (PoS) tagging of Griko texts. All steps of the data enrichment process, namely transcription, tagging, lemmatization and glossing in Italian, were carried out manually using Praat (http://www.fon.hum.uva.nl/praat/); see also “Protocol for Transcription of Griko”. The categories used for PoS tagging are the following:
1. N [Noun]
2. Adj [Adjective]
3. V [Verb]
4. Adv [Adverb]
5. P [Adposition]
6. C [Complementizer]
7. Pr [Pronoun]
8. D [Determiner]
9. Prt [Particle]
10. Num [Numeral]
The values and attributes ascribed to each category are specified in separate subsections below.

2. General remarks
1. In each tag, the category of the word appears first. The specifications for the other attributes follow, separated by a plus (“+”) sign.
2. For each category, there are obligatory and optional attributes. A value for the obligatory attributes is always specified. For optional attributes, when no value is provided, the attribute takes its default value (defaults are specified below for particular categories and optional attributes).

3. The internal composition of each tag is constant within a category, but not identical across categories. For instance, a Griko noun tag minimally needs four fields (category, gender, number, case), whereas finite verb tags are nine fields long.

4. If a specification cannot be given with certainty, e.g. if the gender of a particular noun is unclear, the value ‘unspecified’ (“U”) is used.

5. If a particular attribute does not apply to a given form, 0 (zero) is used.
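Since the category comes first, attributes are joined by "+", and fused words are joined by ".", a tag can be decoded mechanically once the field order of each category is known. The Python sketch below is illustrative only: the field orders in `SCHEMAS` are inferred from the example tags given in Section 3 of this protocol and may not match what the project's CheckData.py actually uses. Values beyond the listed fields (optional, marked values such as Prop, Post or Pfus) are collected under a hypothetical "Other" key.

```python
# Field order per category, inferred from this protocol's example tags (assumption).
SCHEMAS = {
    "N": ["Gender", "Number", "Case"],
    "Adj": ["Gender", "Number", "Case"],
    ("V", "fin"): ["Finiteness", "Type", "Voice", "Tense", "Aspect",
                   "Mood", "Number", "Person"],
    ("V", "nfin"): ["Finiteness", "Type", "Voice", "Subtype", "Aspect",
                    "Number", "Gender"],
    "Adv": ["Type", "Subtype"],
    "C": ["Type", "Subtype"],
    "Pr": ["Type", "Strength", "Person", "Number", "Gender", "Case"],
    "D": ["Type", "Gender", "Number", "Case"],
    "Prt": ["Type", "Subtype"],
}


def parse_tag(tag):
    """Split a PoS tag into one dict per morphological word."""
    parsed = []
    for part in tag.split("."):  # "." joins the tags of a fused word
        category, *values = part.split("+")  # "+" separates attributes
        key = category
        if category == "V" and values:
            key = ("V", values[0].lower())  # finite and nonfinite verbs differ
        fields = SCHEMAS.get(key, [])
        entry = {"Category": category, **dict(zip(fields, values))}
        if values[len(fields):]:
            entry["Other"] = values[len(fields):]  # e.g. Prop, Post, Pfus
        parsed.append(entry)
    return parsed
```

For example, `parse_tag("N+Fem+S+Nom+Prop")` returns a single noun entry with Gender Fem, Number S, Case Nom and Other `["Prop"]`, while `parse_tag("P+Pfus.D+Det+Fem+S+Acc")` returns two entries, one for the adposition and one for the fused determiner.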

3. Specifications, Attributes and Values per Category

3.1 Noun
Abbreviation: N
Specification: Features
Obligatory attributes: Gender, Number, Case
Values for obligatory attributes:
Gender: Masc/Fem/Neu
Number: S/Pl
Case: Nom/Gen/Acc/Voc
Optional attribute: Type. Since most nouns in our corpus are common nouns, the type is not specified for them; common is treated as the default. Thus, only proper names come with a fifth specification, namely Prop (for Proper).
Example: the tag for Maria is N+Fem+S+Nom+Prop.
The case ascribed to a noun does not always reflect morphological distinctions; it may rely on the syntactic context. For instance, nouns realizing the syntactic role of object are tagged as accusative, even if there is no distinct morphological marking for accusative case on the noun. This was deemed necessary for several reasons, one of them being the lack of syntactic annotation in the corpus.

3.2 Adjective
Abbreviation: Adj
Specification: Features, Degree, Position
Obligatory attributes: Gender, Number, Case
Values for obligatory attributes:
Gender: Masc/Fem/Neu
Number: S/Pl
Case: Nom/Gen/Acc/Voc
Optional attributes: Position, Degree, Nominalization
Values for optional attributes:
Position: Post(nominal)
Degree: Comp(arative)/Sup(erlative)
Nominalization: NM (Nominalized)
The default value for Position is Preposed (reflecting the order Adj-N). When postposed, an adjective receives the specification Post.
Example: the tag for the adjective in petìa mincià (“children young”) is Adj+Neu+Pl+Nom+Post.
The default value for Degree is Positive. When the adjective is comparative or superlative, the values Comp and Super are used.
The default value for Nominalization is negative, so NM (Nominalized) only appears in the marked case.
When nominalized, the adjective is neither preposed nor postposed, as there is by definition no overt noun with respect to which the adjective is ordered. NM could therefore be seen as another value for Position.

3.3 Verb
Abbreviation: V
Specification: Features, Type
Obligatory attributes for all members of category V: Finiteness, Voice, Type
Values for obligatory attributes:
Finiteness: Fin(ite)/N(on)Fin(ite)
Voice: Act(ive)/N(on)Act(ive)
Type: M(ain)/Aux(iliary)
Subtypes of Aux:
a. Mod = modal auxiliary verb
b. PRF = perfect auxiliary, e.g. ‘have’ and ‘be’ in compound tenses (i.e. present and past perfect)
c. PASS = passive auxiliaries, e.g. ‘be’ and ‘come’
d. ASP = aspectual auxiliaries, e.g. steo

3A. Attributes of finite verbs (VFin)
Tense: Past/NonPast. NonPast is the present tense form, which in Griko is also used as future tense.
Aspect: Perf(ective)/Imperf(ective). The aspectual distinction is morphologically realized in e.g. the past tense indicative.
Mood: Ind(icative)/Imp(erative)/Subj(unctive). Subjunctive is the value assigned to Griko finite verbs that realize perfective aspect and nonpast tense.
Number: S(ingular)/P(lural)
Person: 1/2/3
For example, a finite main verb like teli (“wants”) is tagged as follows: V+fin+M+Act+Nonpast+Imperf+Ind+S+3

3B. Attributes of nonfinite verbs (VNfin)
Subtype: Inf(initive)/Part(iciple). We characterize all non-finite verb forms that are not infinitives as participles (subsuming gerunds too). This is meant purely as a descriptive label.
Aspect: Perf(ective)/Imperf(ective). The characterization reflects the morphological specification of the stem.
Number: S(ingular)/P(lural)
Gender: Masc(uline)/Fem(inine)/Neu(ter). Griko passive participles inflect for gender and number. In all other cases (active participle, infinitive), these distinctions do not apply, so 0 is used for these attributes.

For example, a VNfin such as vriskonta in pao vriskonta is tagged as follows: V+Nfin+M+Act+Part+Imperf+0+0.

3.4 Adverb
Abbreviation: Adv
Specification: Type, Features
Obligatory attributes: Type
Values for obligatory attributes:
Type: Temp(oral), Loc(ative), Interr(ogative), Asp(ectual), Epist(emic), Quant(ificational), QuantNeg (Negative Quantificational)
Subtype: Temp(oral)/Loc(ative)
Specifying an adverb as interrogative makes it possible to specify it further as temporal or locative.
Optional attribute: Degree. Values for Degree: Comp/Super. The default degree specification is positive, unless otherwise stated.
Examples: pu is tagged as Adv+Interr+Loc, pote as Adv+Interr+Temp.

3.5 Adposition
Abbreviation: P
Specification: Feature
Attribute: P/Pfus(ed)
P is used for simple adpositions, Pfus for an adposition fused with the definite article (D) that follows it. In the latter case, the information of the D head is included too. This is a case where a single word corresponds to two tags, separated by a “.”.
Examples: atsè is tagged as P; s(t)i is tagged as P+Pfus.D+Det+Fem+S+Acc.

3.6 Complementizer/Conjunction
Abbreviation: C
Specification: Type and Subtype
Values for Type: Sub(ordinating)/Coord(inating). Coordinating conjunctions correspond to “and” and “or”. Subordinating conjunctions introduce embedded clauses.
Values for subordinating (Sub) C: Decl(arative), Inter(rogative), Rel(ative), Caus(al), Temp(oral), Cond(itional), Subj(unctive), Def(ault).
Def(ault) is used whenever the value/function of the all-purpose complementizer ka is unclear.
Examples: ce: C+Coord; na: C+Sub+Subj.


3.7 Pronoun
Abbreviation: Pr
Specification: Type, Features
Values for Type: Pers(onal)/Dem(onstrative)/Inter(rogative)/Quant(ificational)/Poss(essive)
Values for Features:
Strength: W(eak)/Str(ong)
Person: 1/2/3
Gender: Masc/Fem/Neu
Number: S/P
Case: Nom/Acc/Gen/Voc
Strength and person specifications only apply to personal pronouns.
Example: cìni (“those”) is tagged as Pr+Dem+0+0+Pl+Masc+Nom.
Optional attributes: Position, Clitic Doubling. The default value for Position is proclisis (weak personal pronouns precede finite verbs in Griko, as in Standard Modern Greek). Encl(isis) is specified when the pronoun follows the verb.
The default value for Clitic Doubling is that no clitic doubling occurs. When doubling occurs, dou(bling) is additionally specified.

3.8 Determiner
Abbreviation: D
Specification: Type, Features
Values for Type: Def(inite)/Indef(inite)
Values for Features:
Gender: Masc/Fem/Neu
Number: S/Pl
Case: Nom/Gen/Acc/Voc
Example: i (definite feminine singular) is tagged as D+Det+Fem+S+Acc.

3.9 Particle
Abbreviation: Prt
Specification: Type, Subtype
Values for Type: Neg/Other
Values for Subtype of Neg: Ind(icative)/N(on)Ind(icative)/Sent(ential)/U(nknown)
In our corpus, all particles are negative. In Griko, as in Standard Modern Greek, sentential negative markers are sensitive to the mood (indicative/nonindicative) of the verb.
Negative particles that occur in clausal ellipsis contexts are characterized as Sent(ential). For example, ndè is tagged as Prt+Neg+Sent.


3.10 Numeral
Abbreviation: Num
Example: ettà (“seven”)


Appendix A3: Database and Website Manual

1. Introduction
This manual includes all the necessary information regarding the implementation of the database and of the website for the project ‘Documentation and analysis of an endangered language: aspects of the grammar of Griko’.
The structure of the manual follows the flow of the project. First, the preprocessing of the transcribed data is presented. The relational database is described in the next section, and the website user interface in the final section.

2. Data Preprocessing
The program used for transcribing the data is Praat. In Praat, the information for each audio segment (which corresponds to a sentence) is stored in tiers. Following the transcription protocol, the tiers used were:

• transcription
• tagging
• gloss
• lemma
• metadata

The metadata tier, which contains information on the speaker, was used only for the locations where multiple speakers took part in the interviews. The preprocessing of the data involved two steps:

1. Parse the .TextGrid files into a more convenient format, such as plain text. This was done with the ParseTextGrid.py Python script, which is available at http://griko.project.uoi.gr/pythoscripts/ParseTextGrid.py.

2. Check the tags for possible inconsistencies with the tagging protocol. This was done with the CheckData.py Python script, which is available at http://griko.project.uoi.gr/pythoscripts/CheckData.py.
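ParseTextGrid.py itself is not reproduced in this report. As a rough idea of what step 1 involves, the Python sketch below reads Praat's default long ("ooTextFile") TextGrid format and turns interval tiers into one tab-separated line per segment. It is a hypothetical stand-in, not the project's script: it assumes that all tiers share the same interval boundaries, that labels fit on one line, and that the tiers carry the names listed above; point tiers are ignored.

```python
import re

NAME = re.compile(r'^\s*name = "(.*)"\s*$')
INTERVAL = re.compile(r'^\s*intervals \[\d+\]:')
BOUND = re.compile(r'^\s*(xmin|xmax) = ([\d.]+)\s*$')
TEXT = re.compile(r'^\s*text = "(.*)"\s*$')


def parse_textgrid(lines):
    """Return {tier name: [(xmin, xmax, text), ...]} for interval tiers."""
    tiers, current, interval = {}, None, None
    for line in lines:
        if m := NAME.match(line):
            current = tiers.setdefault(m.group(1), [])
            interval = None
        elif INTERVAL.match(line):
            interval = {}  # a new interval starts; tier-level bounds are skipped
        elif interval is not None and (m := BOUND.match(line)):
            interval[m.group(1)] = float(m.group(2))
        elif interval is not None and (m := TEXT.match(line)):
            text = m.group(1).replace('""', '"')  # Praat doubles inner quotes
            current.append((interval["xmin"], interval["xmax"], text))
            interval = None
    return tiers


def to_plain_text(tiers, order=("transcription", "tagging", "gloss", "lemma")):
    """One tab-separated line per segment with a non-empty transcription."""
    rows = []
    for cells in zip(*(tiers[name] for name in order)):
        if cells[0][2].strip():
            rows.append("\t".join(cell[2] for cell in cells))
    return rows
```

Typical use would be `to_plain_text(parse_textgrid(open(path, encoding="utf-8")))`, where the file encoding is itself an assumption, since Praat can also save TextGrids as UTF-16.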

3. Database
The processed data are then stored in a relational SQL database, using the InnoDB storage engine. The database is populated automatically by a Python script, which reads the preprocessed data and stores them in the appropriate tables.
The database uses the utf8 character set with the `utf8_general_ci` collation, in order to accommodate all the characters of the Griko, Italian and Greek alphabets.


3.1 SQL Tables
The main table of the database is `sentences`. It stores the whole transcription, tagging, gloss, lemma and question id of each segment. In addition, it stores the location and whether the segment was retrieved from other sources. It also stores the name of the .wav file that corresponds to the segment, if one is available.
The table `questions` stores the questionnaire used during the interviews, including each test sentence, its Italian translation, and a list of the syntactic phenomena that the question is designed to examine. The table `location` stores a list of the interview locations, and the table `keyword` stores a list of all the syntactic variables/keywords examined by the various questions.
Moreover, the table `questionkeywords` provides the many-to-many (M-to-N) mapping between questions and syntactic keywords. This is needed in order to efficiently retrieve the question ids and the sentences when performing queries based on syntactic keywords.
In addition, the tables `tokens`, `itgloss`, `lemma` and `tags` store each individual transcription word, Italian gloss word, lemma and tag for all sentences, together with its (incremental) position in the sentence, as it would result from splitting the sentence with a split() function.
Although this may seem redundant, these tables enable faster searches on their individual contents and also ensure that each transcription token is matched to its tag, gloss and lemma through the position variable.
Finally, the remaining tables store the tag information. Each part of speech has its own table, which stores its particular features. This is needed because the different part-of-speech tags employ different features (for more information, see the Tagging Protocol).

3.2 Tags' features
The features of each tag are always stored as integer (or Boolean) values, in order to ensure consistency, avoid issues with string comparisons and speed up query execution. The features are stored as follows:

• If the feature has only two values, it is represented with a Boolean variable ("0" or "1"). An example of such a feature is Italian, which denotes whether or not the word is Italian.

• If the feature can take multiple values, each value is mapped to an integer (incrementally, in most cases starting from "0"). The mapping of these values to integers is given in the relevant tables in the Appendix.

Note: the feature `case` (for nouns, adjectives, pronouns and determiners) is called `casse` in the database, because CASE is a reserved word in SQL.
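The storage scheme can be illustrated with a small Python sketch. It uses SQLite (Python's built-in `sqlite3`) as a stand-in for the project's InnoDB database, and the table and column names `tokens`, `nountags`, `position`, `gender`, `number`, `casse` and `proper` are simplified assumptions, not the project's actual schema. Feature values are mapped to the integer codes of the Appendix tables, the two-valued Proper feature becomes a Boolean, and each token is stored with its split() position so that it can be matched to its tag. Mapping the protocol's "U" (unspecified) to the Appendix code "Unknown" is also an assumption.

```python
import sqlite3

# Integer codes taken from the Appendix tables (Gender, Number, Case).
GENDER = {"Masc": 0, "Fem": 1, "Neu": 2, "U": 3}
NUMBER = {"S": 0, "Pl": 1, "U": 2}
CASE = {"Nom": 0, "Gen": 1, "Acc": 2, "Voc": 3, "U": 4}

# Hypothetical, simplified tables; SQLite stands in for MySQL/InnoDB.
SCHEMA = """
CREATE TABLE tokens   (sentence_id INTEGER, position INTEGER, token TEXT);
CREATE TABLE nountags (sentence_id INTEGER, position INTEGER,
                       gender INTEGER, number INTEGER,
                       casse INTEGER,   -- CASE is an SQL keyword
                       proper INTEGER); -- Boolean: 1 = proper name
"""


def encode_noun_tag(tag):
    """Map a noun tag such as N+Fem+S+Nom+Prop to integer/Boolean codes."""
    category, gender, number, case, *optional = tag.split("+")
    if category != "N":
        raise ValueError(f"not a noun tag: {tag}")
    return (GENDER[gender], NUMBER[number], CASE[case], int("Prop" in optional))


def store_segment(db, sentence_id, text_tier, tag_tier):
    """Store tokens and noun tags, keyed by their split() position."""
    words, tags = text_tier.split(), tag_tier.split()
    if len(words) != len(tags):
        raise ValueError("text and tag tiers are not aligned")
    for position, (word, tag) in enumerate(zip(words, tags)):
        db.execute("INSERT INTO tokens VALUES (?, ?, ?)",
                   (sentence_id, position, word))
        if tag.startswith("N+"):
            db.execute("INSERT INTO nountags VALUES (?, ?, ?, ?, ?, ?)",
                       (sentence_id, position, *encode_noun_tag(tag)))
```

With `db = sqlite3.connect(":memory:")` and `db.executescript(SCHEMA)`, calling `store_segment(db, 1, "Maria", "N+Fem+S+Nom+Prop")` stores the row `(1, 0, 1, 0, 0, 1)` in `nountags`: Feminine, Singular, Nominative, proper name.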


4. Website
The website consists of simple HTML pages, using JavaScript and PHP for specific functionalities.
JavaScript is used to create the adaptive forms for querying the database. This makes the forms user-friendly and also ensures that no meaningless queries are executed. For example, for each tag selected as a search criterion, only the corresponding features are shown.
JavaScript is also used for the image functionalities. The lightbox functionality uses the lightbox-2.6.min.js package.
PHP is used for connecting to and querying the database, and for displaying the query results.

4.1 Database Search UI
The website accesses the database through the mysqli PHP extension (its main advantage being that it supports Unicode character encoding), so the interface for querying the database also uses PHP and mysqli.

4.1.1 SEARCH FORM
The form is organised into tabs, enabling queries with the various parameters.
The tabs Word, Lemma and Gloss (Italian) receive a term as input from the user and search for it in the appropriate tables. Note that in the current version the input term has to be spelled consistently with the orthographic conventions adopted for Griko. As a result, a Griko term also has to include the appropriate accent diacritic (e.g. "tròo" instead of "troo"); otherwise, the search will not return any results. This means that an Italian (or other Unicode-capable) keyboard is required. The next version will hopefully enable search without the accent.
The Test Sentence and Keyword tabs receive input only from check-boxes and return the corresponding sentences.
The form is constructed by retrieving the test sentences and the keywords from the `questions` and `keyword` tables of the database.
The form in the Location tab is constructed in the same way, except that in this case the input is requested via a drop-down list.
Special attention is given to the Tag tab, which enables search by PoS tags. Since each part of speech has different features, by which the user must be able to search the database, the form is constructed dynamically: the user first selects the PoS tag, and the relevant features are then presented for selection. This is accomplished with the Pane() and init() JavaScript functions.
The function Pane() determines the properties of each "pane" (every form is defined as a different pane) and implements the functionality that makes forms appear only when they are needed. The syntax of the function can be interpreted as:

Pane(X,Y,Z) → show pane Y if X takes the value Z

The function init() defines the various dependencies of the form panes on the user input, initializing the page.
The values for each feature option correspond to the values stored in the database and can be found in the Appendix.
The current version does not support queries involving more than one PoS tag, but a new version with this feature is already planned.

4.1.2 QUERIES
In general, queries are SQL JOINs between the tables `sentences`, `location` and a third table, which depends on the user input.
The third table is determined by the search tab that the user has selected (for example, if the user is querying through the Lemma tab, the third table is the `lemma` table) and, in the case of PoS tags, by the type of PoS (e.g. if the selected PoS tag is Verb, the third table is the `verbtags` table). The rest of the user input is used to construct the conditions of the JOIN.
The searchtags.php script constructs and executes the query by iterating over the possible user inputs.
It is important to note that input for different features is combined conjunctively when constructing the conditional query. For example, when querying the database for all Adjective tags with the Masculine and Singular options selected (for the Gender and Number features), the result contains all adjectives that are both masculine AND singular.
However, additional input for the same feature is combined differently: the conditions for that feature are joined with OR.
For example, when querying the database for all Adjective tags with the Masculine and Feminine options selected (for the Gender feature), the result contains all adjectives that are either masculine OR feminine. The two rules can, of course, be combined to construct more complex queries.
If no option is selected for a feature, that feature does not form part of the query. As a result, if no option is selected at all, the result consists of all the sentences in the database.
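The combination rule described above (OR within a feature, AND across features, unselected features left out) can be sketched in Python. This is not the searchtags.php script: `adjtags` and its column names are hypothetical, modeled on the `verbtags` table and the `casse` naming, and SQLite is used only to make the example runnable. Table and column names are checked against a whitelist because they cannot be passed as SQL parameters, while the selected codes are bound as `?` parameters.

```python
import sqlite3

# Hypothetical whitelist: PoS table -> feature columns that may be queried.
ALLOWED = {"adjtags": {"gender", "number", "casse", "position", "degree"}}


def build_tag_query(table, selections):
    """Build a parameterized JOIN query from the ticked form options.

    `selections` maps a feature column to the integer codes ticked for it.
    Codes for the same feature are OR-ed, different features are AND-ed,
    and a feature with nothing ticked is left out of the query.
    """
    if table not in ALLOWED:
        raise ValueError(f"unknown tag table: {table}")
    clauses, params = [], []
    for column, codes in selections.items():
        if column not in ALLOWED[table]:
            raise ValueError(f"unknown feature: {column}")
        if not codes:
            continue
        clauses.append("(" + " OR ".join(f"t.{column} = ?" for _ in codes) + ")")
        params.extend(codes)
    sql = ("SELECT DISTINCT s.id FROM sentences AS s "
           "JOIN location AS l ON l.id = s.location_id "
           f"JOIN {table} AS t ON t.sentence_id = s.id")
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY s.id", params


def toy_db():
    """Tiny in-memory database with made-up rows, for illustration only."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE location (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE sentences (id INTEGER PRIMARY KEY, location_id INTEGER);
        CREATE TABLE adjtags (sentence_id INTEGER, gender INTEGER,
                              number INTEGER, casse INTEGER,
                              position INTEGER, degree INTEGER);
        INSERT INTO location VALUES (1, 'Calimera');
        INSERT INTO sentences VALUES (1, 1), (2, 1), (3, 1);
        -- gender: 0 Masculine, 1 Feminine; number: 0 Singular, 1 Plural
        INSERT INTO adjtags VALUES (1, 0, 0, 0, 0, 0), (2, 1, 0, 0, 0, 0),
                                   (3, 0, 1, 0, 0, 0);
    """)
    return db
```

On the toy data, selecting Masculine and Singular returns only sentence 1, selecting Masculine or Feminine together with Singular returns sentences 1 and 2, and selecting nothing returns all three sentences.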


4.1.3 RESULTS
The results of the query are presented in a table, which is also constructed by the searchtags.php script. By default, the question id, transcription, location and audio file (if available) of each sentence are included in the results. Depending on the options selected by the user, the Italian gloss, tags and lemmas may also be shown.
Where audio data are available, the corresponding image is shown; when clicked, it opens a new browser tab or window (depending on user preferences) with a simple player for the .wav file. The audio files are accessed through a simple interface, which automatically selects the player that each browser supports, in order to avoid compatibility issues.

5. Conclusion
For any enquiries, suggestions or further information on the implementation and technical details of the database, the website or the project as a whole, please contact:

• Antonis Anastasopoulos (for technical information or for help using the search engine) at [email protected]

• Marika   Lekakou   (for   enquiries   about   the   project   or   the   corpus)   at  [email protected].  

   


Appendix
The following tables present the one-to-one mapping between the feature values and the codes used in the database and the online form.

Location: 1 = Calimera, 2 = Corigliano, 3 = Martano, 4 = Sternatia, 5 = Other

Gender: 0 = Masculine, 1 = Feminine, 2 = Neuter, 3 = Unknown

Degree: 0 = Positive, 1 = Comparative, 2 = Superlative

Number: 0 = Singular, 1 = Plural, 2 = Unknown

Case: 0 = Nominative, 1 = Genitive, 2 = Accusative, 3 = Vocative, 4 = Unknown, 5 = Undefined

Position: 0 = Preposed, 1 = Postposed

Verb Finiteness: 0 = Non Finite, 1 = Finite

Verb Type: 0 = Main, 1 = Auxiliary

Auxiliary Verb Type: 0 = Modal, 1 = Perfect, 3 = Passive, 4 = Aspectual

Verb Voice: 0 = Active, 1 = Non Active

Verb Tense: 1 = Past, 2 = Non Past

Verb Aspect: 1 = Perfective, 2 = Imperfective

Verb Mood: 1 = Indicative, 2 = Imperative, 3 = Subjunctive

Person: 0 = First, 1 = Second, 2 = Third

Non Finite Verb Subtype: 0 = Not Applicable (finite verb), 1 = Infinitive, 2 = Participle

Adverb Type: 0 = Temporal, 1 = Locative, 2 = Interrogative, 3 = Aspectual, 4 = Epistemic, 5 = Quantitative, 6 = Quantitative (Negative), 7 = Other, 8 = Manner

Subordinating Complementizer Subtype: 1 = Temporal, 2 = Default, 3 = Declarative, 4 = Interrogative, 5 = Relative, 6 = Conditional, 7 = Subjunctive, 8 = Causal

Pronoun Type: 0 = Personal, 3 = Demonstrative, 5 = Interrogative, 6 = Possessive, 7 = Quantificational

Pronoun Strength: 0 = Weak, 1 = Strong, 2 = Non Applicable

Complementizer Type: 0 = Subordinating, 1 = Coordinating
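Several of these code tables do not start at 0 (Location, Verb Tense) or skip values (Auxiliary Verb Type has no code 2; Pronoun Type has no 1, 2 or 4), so a client translating between database codes and form labels is safer with an explicit dictionary than with list indexing. The Python sketch below transcribes a few of the tables above; the names `CODES`, `decode` and `encode` are hypothetical and not part of the website.

```python
# A few of the Appendix code tables, transcribed as code -> label dictionaries.
CODES = {
    "Location": {1: "Calimera", 2: "Corigliano", 3: "Martano", 4: "Sternatia", 5: "Other"},
    "Number": {0: "Singular", 1: "Plural", 2: "Unknown"},
    "Verb Tense": {1: "Past", 2: "Non Past"},
    "Auxiliary Verb Type": {0: "Modal", 1: "Perfect", 3: "Passive", 4: "Aspectual"},
    "Pronoun Type": {0: "Personal", 3: "Demonstrative", 5: "Interrogative",
                     6: "Possessive", 7: "Quantificational"},
}


def decode(feature: str, code: int) -> str:
    """Database code -> form label; raises KeyError for an unused code."""
    return CODES[feature][code]


def encode(feature: str, label: str) -> int:
    """Form label -> database code (inverse lookup)."""
    for code, name in CODES[feature].items():
        if name == label:
            return code
    raise KeyError(f"{feature!r} has no option {label!r}")
```

With a dictionary, an unused code such as Auxiliary Verb Type 2 raises a `KeyError` instead of silently returning the wrong label, which is what a positional list would do.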


The remaining features are modeled as Boolean values:

• Proper (nouns)
• Nominalised (adjectives)
• Fused (adpositions)
• Enclisis (pronouns)
• Participation in Clitic Doubling (pronouns)
• Italian (all words)

Their default value is 0 (false); if the specification applies to the particular word/tag, the value is 1 (true).

Particle Subtype: 0 = Indicative, 1 = Non Indicative, 2 = Sentential, 3 = Unknown

Particle Type: 0 = Negative, 1 = Other
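The Boolean encoding described above (0 by default, 1 when the specification applies) maps naturally onto a record whose fields default to false. The following is a minimal Python sketch, assuming hypothetical field names for the six features; the database itself only stores the integers 0 and 1.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class BooleanFeatures:
    # Hypothetical field names for the six Boolean features; all default to false (0).
    proper: bool = False           # nouns
    nominalised: bool = False      # adjectives
    fused: bool = False            # adpositions
    enclisis: bool = False         # pronouns
    clitic_doubling: bool = False  # pronouns: participation in clitic doubling
    italian: bool = False          # all words

    def to_row(self) -> dict:
        """Convert to the 0/1 integers stored in the database."""
        return {name: int(value) for name, value in asdict(self).items()}

    @classmethod
    def from_row(cls, row: dict) -> "BooleanFeatures":
        """Read 0/1 values back; a missing column falls back to the default 0."""
        return cls(**{f.name: bool(row.get(f.name, 0)) for f in fields(cls)})
```

For example, a pronoun in enclisis would be stored as `BooleanFeatures(enclisis=True).to_row()`, which has 1 in the `enclisis` column and 0 everywhere else.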