Friday, February 27, 2009
Read
by Makoto Nagao, Shinsuke Mori
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.57.8416
Class-Based n-Gram Models of Natural Language (1992)
by Peter F. Brown, Peter V. Desouza, Robert L. Mercer, Jenifer C. Lai
Computational Linguistics
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.13.9919
Beyond word n-grams (1995)
by Fernando C. Pereira, Yoram Singer, Naftali Tishby
In Proceedings of the Third Workshop on Very Large Corpora
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.33.7
Friday, February 20, 2009
next in line
N-GRAM AND LOCAL CONTEXT ANALYSIS FOR PERSIAN TEXT RETRIEVAL
by Farhad Oroumchian, Abolfazl Aleahmad, Parsia Hakimian, Farzad Mahdikhani
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.131.8268
Proceedings of ICDAR
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.47.2338
Discovering characteristic expressions from literary works: A new text analysis method beyond n-gram and kwic (2001)
by Masayuki Takeda, Tetsuya Matsumoto, Tomoko Fukuda, Ichirō Nanri
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.129.5337
currently reading
Information Retrieval
http://www.cc.gatech.edu/~isbell/tutorials/TextRetrieval.fm.pdf
Understanding Inverse Document Frequency: On theoretical arguments for IDF
Stephen Robertson, Microsoft Research, 7 JJ Thomson Avenue, Cambridge CB3 0FB, UK (and City University, London, UK)
http://www.soi.city.ac.uk/~ser/idfpapers/Robertson_idf_JDoc.pdf#search=
Using TF-IDF to Determine Word Relevance in Document Queries
Juan Ramos Department of Computer Science, Rutgers University, 23515 BPO Way, Piscataway, NJ, 08855
http://www.cs.rutgers.edu/~mlittman/courses/ml03/iCML03/papers/ramos.pdf#search=
TEXT MINING
Ian H. Witten, Computer Science, University of Waikato, Hamilton, New Zealand