materias:pln:2019:practico3_draft
  * [[https://
  * [[https://
  * [[https://github.com/PLN-FaMAF/PLN-2019/
===== Exercise 4: Hidden Markov Models and the Viterbi Algorithm =====

  * Implement a Hidden Markov Model whose parameters are the transition probabilities between states (the tags) and the emission probabilities of symbols (the words).
  * Implement the Viterbi algorithm, which computes the most probable tagging of a sentence (an illustrative sketch is included at the end of this exercise).

Interface of ''HMM'' and ''ViterbiTagger'':

<code python>
class HMM:

    def __init__(self, n, tagset, trans, out):
        """
        n -- n-gram size.
        tagset -- set of tags.
        trans -- transition probabilities dictionary.
        out -- output probabilities dictionary.
        """

    def tagset(self):
        """Returns the set of tags.
        """

    def trans_prob(self, tag, prev_tags):
        """Probability of a tag given the previous n-1 tags.

        tag -- the tag.
        prev_tags -- tuple with the previous n-1 tags (optional only if n = 1).
        """

    def out_prob(self, word, tag):
        """Probability of a word given a tag.

        word -- the word.
        tag -- the tag.
        """

    def tag_prob(self, y):
        """
        Probability of a tagging.
        Warning: subject to underflow problems.

        y -- tagging.
        """

    def prob(self, x, y):
        """
        Joint probability of a sentence and its tagging.
        Warning: subject to underflow problems.

        x -- sentence.
        y -- tagging.
        """

    def tag_log_prob(self, y):
        """
        Log-probability of a tagging.

        y -- tagging.
        """

    def log_prob(self, x, y):
        """
        Joint log-probability of a sentence and its tagging.

        x -- sentence.
        y -- tagging.
        """

    def tag(self, sent):
        """Returns the most probable tagging for a sentence.

        sent -- the sentence.
        """


class ViterbiTagger:

    def __init__(self, hmm):
        """
        hmm -- the HMM.
        """

    def tag(self, sent):
        """Returns the most probable tagging for a sentence.

        sent -- the sentence.
        """
</code>

Tests:

  $ nosetests tagging/
  $ nosetests tagging/

Documentation:

  * [[http://

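The sketch below illustrates the Viterbi recursion for the bigram case (''n = 2'') only; it is not the required implementation. The function name ''viterbi_bigram'', the ''<s>''/''</s>'' sentence markers and the shape of the ''trans''/''out'' dictionaries are assumptions made for this example, while the real ''ViterbiTagger'' must work for arbitrary ''n'' on top of an ''HMM'' instance.

<code python>
# Illustrative sketch only: Viterbi for the bigram case (n = 2), working in
# log-space to avoid underflow. The '<s>'/'</s>' markers and the dictionary
# key shapes are assumptions of this example, not part of the exercise.
from math import log2


def viterbi_bigram(sent, tagset, trans, out):
    """Most probable tag sequence for sent under a bigram HMM.

    sent -- list of words.
    tagset -- set of tags.
    trans -- dict mapping (prev_tag, tag) to a transition probability.
    out -- dict mapping (tag, word) to an emission probability.
    """
    # pi[k][tag] = (log prob of the best tagging of the first k words that
    #               ends in tag, backpointer to the tag at position k - 1)
    pi = [{'<s>': (0.0, None)}]
    for k, word in enumerate(sent, start=1):
        pi.append({})
        for tag in tagset:
            best = None
            for prev, (lp, _) in pi[k - 1].items():
                p = trans.get((prev, tag), 0.0) * out.get((tag, word), 0.0)
                if p > 0.0 and (best is None or lp + log2(p) > best[0]):
                    best = (lp + log2(p), prev)
            if best is not None:
                pi[k][tag] = best

    # close the path with the end-of-sentence marker
    # (assumes at least one tagging has non-zero probability)
    last, last_lp = None, float('-inf')
    for tag, (lp, _) in pi[len(sent)].items():
        p = trans.get((tag, '</s>'), 0.0)
        if p > 0.0 and lp + log2(p) > last_lp:
            last, last_lp = tag, lp + log2(p)

    # follow the backpointers to recover the tag sequence
    tags = [last]
    for k in range(len(sent), 1, -1):
        tags.append(pi[k][tags[-1]][1])
    return list(reversed(tags))
</code>

For higher-order models, the dynamic programming state is the tuple of the previous ''n - 1'' tags instead of a single tag.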
===== Exercise 5: HMM POS Tagger =====

  * Implement, in a class ''MLHMM'', a Hidden Markov Model whose parameters are estimated by maximum likelihood from a tagged training corpus.
  * The class must have **the same interface as ''HMM''**.
  * Add to the training script (train.py) a command-line option that allows using the MLHMM with different values of ''n'' (a hypothetical sketch of such an option is given at the end of this exercise).
  * Train and evaluate the tagger for several values of ''n''.

Interface of ''MLHMM'':

<code python>
class MLHMM:

    def __init__(self, n, tagged_sents, addone=True):
        """
        n -- order of the model.
        tagged_sents -- training sentences, each one being a list of pairs.
        addone -- whether to use addone smoothing (default: True).
        """

    def tcount(self, tokens):
        """Count for an n-gram or (n-1)-gram of tags.

        tokens -- the n-gram or (n-1)-gram tuple of tags.
        """

    def unknown(self, w):
        """Check if a word is unknown for the model.

        w -- the word.
        """

    """
    All the methods of HMM.
    """
</code>
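As a reference for the constructor, the sketch below shows one way the parameters could be estimated from ''tagged_sents'' in the bigram case. The function name ''estimate_bigram_params'', the ''<s>''/''</s>'' markers and the choice of applying add-one smoothing only to the transitions are assumptions of this example, not requirements of the exercise.

<code python>
# Illustrative sketch only: maximum likelihood estimation of the transition
# and output probabilities for the bigram case. The '<s>'/'</s>' markers and
# the decision to smooth only the transitions are assumptions of the example.
from collections import defaultdict


def estimate_bigram_params(tagged_sents, addone=True):
    """Returns (trans, out) dictionaries of estimated probabilities.

    tagged_sents -- training sentences, each one being a list of pairs.
    addone -- whether to apply add-one smoothing to the transitions.
    """
    tcount = defaultdict(int)  # counts of tag unigrams and bigrams
    ocount = defaultdict(int)  # counts of (tag, word) pairs
    tagset = set()
    for sent in tagged_sents:
        tags = ['<s>'] + [t for _, t in sent] + ['</s>']
        for prev, tag in zip(tags, tags[1:]):
            tcount[(prev,)] += 1
            tcount[(prev, tag)] += 1
        for word, tag in sent:
            ocount[(tag, word)] += 1
            tagset.add(tag)

    trans, out = {}, {}
    num_tags = len(tagset) + 1  # possible next tags: the tagset plus '</s>'
    for ngram, c in tcount.items():
        if len(ngram) != 2:
            continue
        prev, tag = ngram
        if addone:
            # unseen transitions would get 1 / (tcount[(prev,)] + num_tags);
            # they are simply left out of the dictionary here
            trans[(prev, tag)] = (c + 1) / (tcount[(prev,)] + num_tags)
        else:
            trans[(prev, tag)] = c / tcount[(prev,)]
    for (tag, word), c in ocount.items():
        out[(tag, word)] = c / tcount[(tag,)]
    return trans, out
</code>

In the actual ''MLHMM'' the counts themselves would typically be stored, so that ''tcount()'' and the inherited ''HMM'' probability methods can be answered on demand, and ''unknown()'' words can be handled separately (e.g. with a uniform output probability).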

Tests:

  $ nosetests tagging/

Documentation:

  * [[http://

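The command-line option for ''n'' is not specified further in this draft. The snippet below is only a hypothetical sketch using ''argparse''; the actual ''train.py'' in the course repository may use a different option parser, option names and model choices.

<code python>
# Hypothetical sketch of the command-line handling for train.py; the option
# names, the parser library and the model choices are assumptions.
import argparse


def parse_args():
    parser = argparse.ArgumentParser(description='Train a POS tagger.')
    parser.add_argument('-m', '--model', choices=['base', 'mlhmm'],
                        default='base', help='model type to train')
    parser.add_argument('-n', type=int, default=3,
                        help='order of the MLHMM (size of the tag n-grams)')
    parser.add_argument('-o', '--output', required=True,
                        help='file where the trained model is saved')
    return parser.parse_args()
</code>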
===== Exercise 6: "Three Words" Classifier =====

**TBA**

===== Exercise 7: Classifier with Embeddings =====

**TBA**

===== Exercise 8: Error Analysis and New Features =====

**TBA**

===== Exercise 9: Recurrent Neural Network =====

**TBA**

/*
===== Exercise
===== Exercise
  * https://
  * https://
*/