November 19th, 2012

Grouping Customer Opinions Written in Natural Language Using Unsupervised Machine Learning

Dear colleagues,
This coming Saturday, November 24, we will hold our next seminar on automatic natural language processing. Our guest is Jan Žižka, a professor from the university in Brno. Jan will give a talk titled "Grouping Customer Opinions Written in Natural Language Using Unsupervised Machine Learning".
Please note: the seminar will be held in ENGLISH.
The first part of the talk deals with the automatic clustering of unstructured textual documents. This well-known problem is investigated empirically, with a focus on very large real-world data: reviews written by customers of hotel accommodation booked online. The data come from a popular online booking service. Using the largest selection (almost 2,000,000 freely written reviews in English), the talk addresses which clustering method should be used, which parameters of the selected algorithm are optimal, and how to estimate the correctness of the clustering.
The second part of the talk turns to another problem that arose during the clustering task and played a specific role in it: how to process very large volumes of (textual) data when our computers cannot handle them because RAM is never sufficient. Side experiments with (pseudo)parallel processing revealed some interesting things about how representative randomly selected subsets of the large original set need to be. When do the subsets stop being representative because the selection omits some relevant words? Is it better to process many smaller subsets quickly or fewer larger ones slowly?
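The question of when a random subset stops being representative can be made concrete with a small sketch. The Python code below is only an illustration under stated assumptions, not the speaker's method: it measures what fraction of the full vocabulary survives in one random subset of a given size. The function names, the whitespace tokenization, and the toy reviews are all hypothetical.

```python
import random


def vocabulary(docs):
    """Return the set of lowercase, whitespace-separated tokens in docs."""
    words = set()
    for doc in docs:
        words.update(doc.lower().split())
    return words


def vocabulary_coverage(docs, subset_size, seed=0):
    """Fraction of the full vocabulary that appears in one random subset."""
    rng = random.Random(seed)            # fixed seed keeps runs repeatable
    subset = rng.sample(docs, subset_size)
    full = vocabulary(docs)
    return len(vocabulary(subset) & full) / len(full)


# Toy reviews invented for illustration only; not the data from the talk.
reviews = [
    "clean room and friendly staff",
    "noisy street but great breakfast",
    "friendly staff and great location",
    "small room and slow wifi",
    "great breakfast and clean pool",
    "slow check-in but friendly staff",
]

for size in (1, 2, 4, len(reviews)):
    print(size, round(vocabulary_coverage(reviews, size), 2))
```

A subset that drops many words scores well below 1.0, which is one simple signal that it may no longer stand in for the whole collection. Repeating the measurement over several seeds would show how much the coverage varies from one subset to the next.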
The seminar will take place at: 10th Line of Vasilyevsky Island (V.O.), building 49, room 308. It starts at 17:00.
Password for the security desk: "Я на семинар" ("I'm here for the seminar").