Differences

Shows the differences between two versions of the page.

--- materias:pln:2019:practico1 [2019/03/14 14:59] – francolq
+++ materias:pln:2019:practico1 [2025/11/15 13:47] (current) – external editor 127.0.0.1
@@ Line 7: / Line 7: @@
   * Repository: https://github.com/PLN-FaMAF/PLN-2019.
-  * Due date: 28/3 at 23:59.
+  * Due date: <del>28/3</del> 4/4 at 23:59.
@@ Line 34: / Line 34: @@
   * Inspect the tokenization and sentence segmentation by eye. If the quality is poor, try other tokenization/segmentation approaches.
   * Modify the ''train.py'' script to use the new corpus.
+Interface of ''train.py'':
+<code>
+$ python languagemodeling/scripts/train.py  --help
+Train an n-gram model.
+Usage:
+  train.py [-m <model>] -n <n> -o <file>
+  train.py -h | --help
+Options:
+  -n <n>        Order of the model.
+  -m <model>    Model to use [default: ngram]:
+                  ngram: Unsmoothed n-grams.
+                  addone: N-grams with add-one smoothing.
+                  inter: N-grams with interpolation smoothing.
+  -o <file>     Output model file.
+  -h --help     Show this screen.
+</code>
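The interface above can be approximated with the standard library's ''argparse''. This is only a minimal sketch: the usage text looks like it was written for a docopt-style parser, so the actual ''train.py'' may be wired differently, and ''build_parser'' and ''MODELS'' are hypothetical names introduced here for illustration.

```python
import argparse

# Model names taken from the usage text; the tuple name is hypothetical.
MODELS = ("ngram", "addone", "inter")


def build_parser():
    """Build a parser that mirrors the documented train.py options."""
    parser = argparse.ArgumentParser(
        prog="train.py",
        description="Train an n-gram model.",
    )
    # -n <n>: required order of the model.
    parser.add_argument("-n", type=int, required=True, metavar="<n>",
                        help="Order of the model.")
    # -m <model>: optional, falls back to the unsmoothed n-gram model.
    parser.add_argument("-m", choices=MODELS, default="ngram",
                        metavar="<model>",
                        help="Model to use [default: ngram].")
    # -o <file>: required output model file.
    parser.add_argument("-o", required=True, metavar="<file>",
                        help="Output model file.")
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["-n", "3", "-m", "addone", "-o", "model.pkl"])
print(args.n, args.m, args.o)
```

Passing an explicit list to ''parse_args'' makes the option handling easy to check without touching the command line; omitting ''-m'' yields the default ''ngram''.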
 Documentation:
   * [[https://groups.google.com/forum/#!topic/pln-famaf-2015/KAa15XcqsXw|Corpus ideas]]
+  * [[https://github.com/crscardellino/sbwce/blob/master/unlabeled_corpora.tsv|Resources from the Spanish Billion Words Corpus (Cristian Cardellino)]]
   * [[http://www.nltk.org/book/ch03.html#regular-expressions-for-tokenizing-text|NLTK: Regular Expressions for Tokenizing Text]]
   * [[https://github.com/PLN-FaMAF/PLN-2019/blob/master/notebooks/01%20Procesamiento%20B%C3%A1sico%20de%20Texto.ipynb|Jupyter notebook: Tokenization (example seen in class)]]
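For the step of inspecting tokenization and sentence segmentation by eye, a small regular-expression tokenizer is often enough to get started. The sketch below uses only the standard library ''re'' module, in the spirit of the NLTK chapter linked above. The pattern, ''tokenize'' and ''split_sentences'' are assumptions made for illustration, not the course's reference solution.

```python
import re

# Alternatives are tried left to right, so more specific patterns come first.
TOKEN_PATTERN = re.compile(r"""
    (?:[A-Z]\.)+            # abbreviations such as U.S.A.
  | \$?\d+(?:\.\d+)?%?      # numbers, currency amounts and percentages
  | \w+(?:-\w+)*            # words, with optional internal hyphens
  | \.\.\.                  # ellipsis
  | [^\w\s]                 # any other single non-space symbol
""", re.VERBOSE)

SENTENCE_END = {".", "?", "!"}


def tokenize(text):
    """Return the list of tokens matched by TOKEN_PATTERN."""
    return TOKEN_PATTERN.findall(text)


def split_sentences(tokens):
    """Naively group tokens into sentences, closing one at each end mark."""
    sentences, current = [], []
    for token in tokens:
        current.append(token)
        if token in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # keep trailing tokens that lack a final end mark
        sentences.append(current)
    return sentences


tokens = tokenize("Hola, mundo. Cuesta $12.50 hoy.")
for sentence in split_sentences(tokens):
    print(sentence)
```

Printing one sentence per line makes bad splits easy to spot. If the output looks poor on the chosen corpus, the pattern is the first thing to adjust.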
@@ Line 200: / Line 223: @@
   -h --help     Show this screen.
 </code>
+Documentation:
+  * {{ :materias:pln:2019:lm-notas.pdf |Language Modeling: Supplementary Notes}}
+  * Emails:
+    * [[https://groups.google.com/forum/#!topic/pln-famaf-2015/29EwJKp5nrY|Typical perplexity results for all models]]
@@ Line 239: / Line 268: @@
 (Course notes for NLP by Michael Collins, Columbia University)]]
     * **Especially** [[https://cs.famaf.unc.edu.ar/~francolq/Ejercicio%204.png|this part]] (last part of section 1.4.1).
-  * [[https://cs.famaf.unc.edu.ar/~francolq/lm-notas.pdf|Language Modeling: Supplementary Notes]]
+  * {{ :materias:pln:2019:lm-notas.pdf |Language Modeling: Supplementary Notes}}
   * [[https://www.youtube.com/watch?v=-aMYz1tMfPg&list=PL6397E4B26D00A269&index=17|4 - 6 - Interpolation - Stanford NLP - Professor Dan Jurafsky & Chris Manning]]
   * [[http://nbviewer.jupyter.org/url/cs.famaf.unc.edu.ar/~francolq/Modelado%20de%20Lenguaje%20Parte%202.ipynb#Suavizado-por-Interpolación|Jupyter notebook: Interpolation Smoothing (example seen in class)]]
@@ Line 303: / Line 332: @@
   * [[http://www.cs.columbia.edu/~mcollins/lm-spring2013.pdf|Language Modeling
 (Course notes for NLP by Michael Collins, Columbia University)]]
-  * [[https://cs.famaf.unc.edu.ar/~francolq/lm-notas.pdf|Language Modeling: Supplementary Notes]]
+  * {{ :materias:pln:2019:lm-notas.pdf |Language Modeling: Supplementary Notes}}
   * [[https://www.youtube.com/watch?v=hsHw9F3UuAQ&index=3&list=PLO9y7hOkmmSHE2v_oEUjULGg20gyb-v1u|Discounting Methods - Part I]] (Michael Collins, Columbia University)
   * [[https://www.youtube.com/watch?v=FedWcgXcp8w&index=4&list=PLO9y7hOkmmSHE2v_oEUjULGg20gyb-v1u|Discounting Methods - Part II]] (Michael Collins, Columbia University)