IndoSum: A New Benchmark Dataset for Indonesian Text Summarization

Kemal Kurniawan (Kata Research Team, Kata.ai, Jakarta, Indonesia) [email protected]
Samuel Louvan (Fondazione Bruno Kessler / University of Trento, Trento, Italy) [email protected]

Abstract—Automatic text summarization is generally considered a challenging task in the NLP community. One of the challenges is that a large, publicly available dataset is relatively rare and difficult to construct. The problem is even worse for low-resource languages such as Indonesian. In this paper, we present IndoSum, a new benchmark dataset for Indonesian text summarization. The dataset consists of news articles and manually constructed summaries. Notably, the dataset is almost 200x larger than the previous Indonesian summarization dataset of the same domain. We evaluated various extractive summarization approaches and obtained encouraging results which demonstrate the usefulness of the dataset and provide baselines for future research. The code and the dataset are available online under permissive licenses.

Keywords—extractive summarization; dataset; Indonesian

I. INTRODUCTION

The goal of the text summarization task is to produce a summary from a set of documents. The summary should retain important information and be reasonably shorter than the original documents. When the set of documents contains only a single document, the task is usually referred to as single-document summarization. There are two kinds of summarization, characterized by how the summary is produced: extractive and abstractive. Extractive summarization attempts to extract a few important sentences verbatim from the original document. In contrast, abstractive summarization tries to produce an abstract which may contain sentences that do not exist in, or are paraphrased from, the original document.

Despite quite a few studies on Indonesian text summarization, none of them were trained or evaluated on a large, publicly available dataset. Also, although ROUGE is the standard intrinsic evaluation metric for English text summarization, for Indonesian it does not seem so: previous works rarely state explicitly that their evaluation was performed with ROUGE. The lack of a benchmark dataset and the different evaluation metrics make comparison among Indonesian text summarization research difficult.

In this work, we introduce IndoSum, a new benchmark dataset for Indonesian text summarization, and evaluate several well-known extractive single-document summarization methods on the dataset. The dataset consists of online news articles and has almost 200 times more documents than the next largest one of the same domain. To encourage further research in this area, we make our dataset publicly available. In short, the contribution of this work is two-fold:

1) IndoSum, a large dataset for text summarization in Indonesian that is compiled from online news articles and publicly available.
2) Evaluation of state-of-the-art extractive summarization methods on the dataset using ROUGE as the standard metric for text summarization.

The state-of-the-art result on the dataset, although impressive, is still significantly lower than the maximum possible ROUGE score. This result suggests that the dataset is sufficiently challenging to be used as an evaluation benchmark for future research on Indonesian text summarization.

II. RELATED WORK

Fachrurrozi et al. proposed some scoring methods and used them with TF-IDF to rank and summarize news articles. Another work used latent Dirichlet allocation coupled with a genetic algorithm to produce summaries for online news articles. Simple methods like naive Bayes have also been used for Indonesian news summarization, although for English, naive Bayes had been used almost two decades earlier. A more recent work employed a summarization algorithm called TextTeaser with some predefined features for news articles as well. Slamet et al. used TF-IDF to convert sentences into vectors, and their similarities are then computed against another vector obtained from some keywords. They used these similarity scores to extract important sentences as the summary. Unfortunately, none of these works seem to have been evaluated using ROUGE, despite it being the standard metric for text summarization research.

An example of Indonesian text summarization research which used ROUGE employed the best method from the TAC 2011 competition on a news dataset and achieved ROUGE-2 scores that are close to those of humans. However, their dataset consists of only 56 articles, which is very small, and the dataset is not publicly available.

An attempt to make a public summarization dataset has been made with a compiled chat dataset and its summary, which has both extractive and abstractive versions. This work is a good step toward standardizing summarization research for Indonesian. However, to the best of our knowledge, for news there has not been a publicly available dataset, let alone a standard one.

978-1-7281-1175-9/18/$31.00 © 2018 IEEE

[Figure 1, sample article (Indonesian):]
Suara.com - Cerita sekuel terbaru James Bond bocor. Menurut sumber yang terlibat dalam produksi film ini, agen rahasia 007 berhenti menjadi mata-mata Inggris demi menikah dengan perempuan yang dicintainya. "Bond berhenti menjadi agen rahasia karena jatuh cinta dan menikah dengan perempuan yang dicintai," tutur seorang sumber yang dekat dengan produksi seperti dikutip laman PageSix.com.
Dalam film tersebut, Bond diduga menikahi Madeleine Swann yang diperankan oleh Lea Seydoux. Lea diketahui bermain sebagai gadis Bond di sekuel Spectre pada 2015 silam. Jika benar, ini merupakan satu-satunya sekuel yang bercerita pernikahan James Bond sejak 1969. Sebelumnya, di sekuel On Her Majesty, James Bond menikahi Tracy Draco yang diperankan Diana Rigg. Namun, di film itu Draco terbunuh. Plot sekuel film James Bond ke-25 bocor tak lama setelah Daniel Craig mengumumkan bakal kembali memerankan tokoh agen 007.

[Abstractive summary (Indonesian):]
Cerita sekuel terbaru James Bond bocor. Menurut sumber yang terlibat dalam produksi film ini, agen rahasia 007 berhenti menjadi mata-mata Inggris demi menikah dengan perempuan yang dicintainya. Jika benar, ini merupakan satu-satunya sekuel yang bercerita pernikahan James Bond sejak 1969. Sebelumnya, di sekuel On Her Majesty, James Bond menikahi Tracy Draco. Namun, di film itu Draco terbunuh.

[English translation of the article:]
Suara.com - Newest James Bond sequel's story was leaked. According to a source involved in the movie production, the secret agent 007 stopped being an English spy to marry a woman whom he loved. "Bond stopped being a spy because he fell in love and married a woman that he loved," said a source who is close to the production as reported by PageSix.com. In the movie, Bond was suspected to marry Madeleine Swann who is played by Lea Seydoux. Lea is known to play as a Bond girl in the sequel Spectre in 2015. If true, this would be the only sequel that tells about James Bond's marriage since 1969. Previously, in the sequel On Her Majesty, James Bond married Tracy Draco who was played by Diana Rigg.
However, in the movie Draco was killed. The plot of the 25th James Bond sequel movie was leaked not long after Daniel Craig announced that he would play the agent 007 character again.

[English translation of the abstractive summary:]
Newest James Bond sequel's story was leaked. According to a source involved in the movie production, the secret agent 007 stopped being an English spy to marry a woman whom he loved. If true, this would be the only sequel that tells about James Bond's marriage since 1969. Previously, in the sequel On Her Majesty, James Bond marries Tracy Draco. However, in the movie Draco was killed.

Figure 1. A sample article, its abstractive summary, and their English translations. Underlined sentences are the extractive summary obtained by following the greedy algorithm of Nallapati et al.

III. METHODOLOGY

A. IndoSum: a new benchmark dataset

We used a dataset provided by Shortir,¹ an Indonesian news aggregator and summarizer company. The dataset contains roughly 20K news articles. Each article has a title, category, source (e.g., CNN Indonesia, Kumparan), a URL to the original article, and an abstractive summary which was created manually by a total of 2 native speakers of Indonesian. There are 6 categories in total: Entertainment, Inspiration, Sport, Showbiz, Headline, and Tech. A sample article-summary pair is shown in Fig. 1.

Note that 20K articles is actually quite small compared to the English CNN/DailyMail dataset, which has 200K articles. Therefore, we used 5-fold cross-validation to split the dataset into 5 folds of training, development, and testing sets. We preprocessed the dataset by tokenizing, lowercasing, removing punctuation, and replacing digits with zeros. We used NLTK and spaCy² for sentence and word tokenization respectively.

In our exploratory analysis, we discovered that some articles have a very long text and some summaries have too many sentences. Articles with a long text are mostly articles containing a list, e.g., a list of songs played in a concert, a list of award nominations, and so on. Since such a list is never included in the summary, we truncated such articles so that the number of paragraphs is at most two standard deviations away from the mean.³ For each fold, the mean and standard deviation were estimated from the training set. We discarded articles whose summary is too long since we do not want lengthy summaries anyway. The cutoff length is defined by the upper limit of Tukey's boxplot, where for each fold, the quartiles were estimated from the training set. After removing such articles, we ended up with roughly 19K articles in total. The complete statistics of the corpus are shown in Table I.

Since the gold summaries provided by Shortir are abstractive, we needed to label the sentences in the articles for training the supervised extractive summarizers. We followed Nallapati et al. to make these labeled sentences (called oracles hereinafter) using their greedy algorithm. The idea is to maximize the ROUGE score between the labeled sentences and the abstractive gold summary.

Although the provided gold summaries are abstractive, in this work we focused on extractive summarization because we think research in this area is more mature, especially for Indonesian, and thus starting with extractive summarization is a logical first step toward standardizing Indonesian text summarization research.

¹ http://shortir.com
² https://spacy.io
³ We assume the number of paragraphs exhibits a Gaussian distribution.

2018 International Conference on Asian Language Processing (IALP)
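The oracle construction described above (the greedy algorithm of Nallapati et al.) can be sketched as follows. This is a minimal illustration, not the authors' code: it greedily adds the article sentence that most improves the score against the abstractive gold summary, and it substitutes a simple unigram F1 for the full ROUGE package.

```python
from collections import Counter

def rouge1_f1(candidate_tokens, reference_tokens):
    """Unigram-overlap F1, a simplified stand-in for ROUGE-1."""
    if not candidate_tokens or not reference_tokens:
        return 0.0
    overlap = sum((Counter(candidate_tokens) & Counter(reference_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)

def greedy_oracle(article_sents, summary_tokens):
    """Greedily select sentences that maximize the score against the
    abstractive gold summary; returns 0/1 labels over the sentences."""
    selected, best_score = [], 0.0
    while True:
        best_gain, best_idx = 0.0, None
        for i in range(len(article_sents)):
            if i in selected:
                continue
            # Candidate summary: already-selected sentences plus sentence i.
            tokens = [t for j in sorted(selected + [i]) for t in article_sents[j]]
            gain = rouge1_f1(tokens, summary_tokens) - best_score
            if gain > best_gain:
                best_gain, best_idx = gain, i
        if best_idx is None:  # no remaining sentence improves the score
            break
        selected.append(best_idx)
        best_score += best_gain
    return [1 if i in selected else 0 for i in range(len(article_sents))]
```

In the actual pipeline the scoring function would be the real ROUGE implementation and the sentences would come from the tokenized, lowercased articles; the stopping rule (stop when no sentence improves the score) is the essential part of the greedy scheme.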
Table I
CORPUS STATISTICS (each cell: train / dev / test)

Statistic                     Fold 1                  Fold 2                  Fold 3                  Fold 4                  Fold 5
# of articles                 14262 / 750 / 3762      14263 / 749 / 3762      14290 / 747 / 3737      14272 / 750 / 3752      14266 / 747 / 3761
avg # of paras / article      10.54 / 10.42 / 10.39   10.49 / 10.83 / 10.47   10.47 / 10.57 / 10.61   10.52 / 10.37 / 10.49   10.49 / 10.23 / 10.54
avg # of sents / para         1.75 / 1.74 / 1.75      1.75 / 1.75 / 1.75      1.75 / 1.74 / 1.73      1.74 / 1.73 / 1.77      1.75 / 1.79 / 1.74
avg # of words / sent         18.86 / 19.26 / 18.91   18.87 / 18.71 / 19.00   18.89 / 18.95 / 18.90   18.88 / 19.27 / 18.82   18.92 / 18.81 / 18.82
avg # of sents / summ         3.48 / 3.42 / 3.47      3.47 / 3.50 / 3.47      3.48 / 3.44 / 3.46      3.48 / 3.40 / 3.48      3.47 / 3.54 / 3.48
avg # of words / summ sent    19.58 / 19.91 / 19.59   19.60 / 19.54 / 19.58   19.57 / 19.77 / 19.65   19.58 / 19.92 / 19.60   19.63 / 19.05 / 19.57

Since there can be many valid summaries for a given article, having only a single abstractive summary per article is a limitation of our dataset which we acknowledge. Nevertheless, we feel that the existence of such a dataset is a crucial step toward a fair benchmark for Indonesian text summarization research. Therefore, we make the dataset publicly available for others to use.⁴

B. Evaluation

For evaluation, we used ROUGE, a standard metric for text summarization. We used the implementation provided by pythonrouge.⁵ Following previous work, we report the F1 score of R-1, R-2, and R-L. Intuitively, R-1 and R-2 measure informativeness and R-L measures fluency. We report the F1 score instead of just the recall score because although we extract a fixed number of sentences as the summary, the number of words is not limited. So, reporting only recall benefits models which extract long sentences.

C. Compared methods

We compared several summarization methods which can be categorized into three groups: unsupervised, non-neural supervised, and neural supervised methods. For the unsupervised methods, we tested:

1) SumBasic, which uses word frequency to rank sentences and selects the top sentences as the summary.
2) LSA, which uses latent semantic analysis (LSA) to decompose the term-by-sentence matrix of a document and extracts sentences based on the result. We experimented with the two approaches proposed in prior work.
3) LexRank, which constructs a graph representation of a document, where nodes are sentences and edges represent similarity between two sentences, runs the PageRank algorithm on that graph, and extracts sentences based on the resulting PageRank values. In the original implementation, sentences shorter than a certain threshold are removed. Our implementation does not do this removal, to reduce the number of tunable hyperparameters. Also, it originally uses cross-sentence informational subsumption (CSIS) during the sentence selection stage, but the paper does not explain it well. Instead, we used an approximation to CSIS called cross-sentence word overlap, described by the same authors.
4) TextRank, which is very similar to LexRank but computes sentence similarity based on the number of common tokens.

For the non-neural supervised methods, we compared:

1) Bayes, which represents each sentence as a feature vector and uses naive Bayes to classify them. Four features are used: whether the sentence has fewer than 5 words, whether the sentence contains signature words, its position in the document, and its position in the paragraph. To obtain the signature words, TF-IDF is used. The original paper computes the TF-IDF score on multi-word tokens that are identified automatically using mutual information. We did not do this identification, so our TF-IDF computation operates on word tokens.
2) HMM, which uses a hidden Markov model where states correspond to whether the sentence should be extracted. A Gaussian distribution is used as the emission probability distribution, where each sentence is represented as a feature vector. Four features are used: its position in the paragraph, the number of terms, the sum of probabilities of its terms in the document, and the sum of probabilities of its terms in a baseline document. We used a precomputed TF table for the last feature. The original work uses QR decomposition for sentence selection, but our implementation does not. We simply ranked the sentences by their scores and picked the top 3 as the summary.
3) MaxEnt, which represents each sentence as a feature vector and leverages a maximum entropy model to compute the probability that a sentence should be extracted. Several features are used: word pairs, sentence length, previous sentence length, sentence position, and whether the sentence is at the start of a paragraph. The original approach puts a prior distribution over the labels, but we put the prior on the weights instead. Our implementation still agrees with the original because we employed a bias feature which should be able to learn the prior label distribution.

As for the neural supervised method, we evaluated NeuralSum using the original implementation by the authors.⁶ We modified their implementation slightly to allow for evaluating the model with ROUGE. Note that all the methods are extractive. Our implementation code for all the methods above is available online.⁷

⁴ https://github.com/kata-ai/indosum
⁵ https://github.com/tagucci/pythonrouge
⁶ https://github.com/cheng6076/NeuralSum

As a baseline, we used Lead-N, which selects the N leading sentences as the summary. For all methods, we
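The LexRank-style ranking in item 3 can be sketched as follows. This is a minimal illustration under simplifications of our own (raw term-count cosine similarity, plain power iteration, no CSIS approximation), not the implementation evaluated in the paper:

```python
import math
from collections import Counter

def cosine_sim(a, b):
    """Cosine similarity between two token lists via term-count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def lexrank(sentences, damping=0.85, iters=50):
    """Score tokenized sentences by PageRank over a similarity graph."""
    n = len(sentences)
    sim = [[cosine_sim(s, t) for t in sentences] for s in sentences]
    # Row-normalize similarities into transition probabilities.
    for i in range(n):
        row_sum = sum(sim[i])
        sim[i] = [w / row_sum if row_sum else 1.0 / n for w in sim[i]]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n + damping * sum(scores[j] * sim[j][i] for j in range(n))
            for i in range(n)
        ]
    return scores

def top_k(sentences, k=3):
    """Extract the k highest-scoring sentence indices as the summary."""
    scores = lexrank(sentences)
    return sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
```

TextRank would differ only in the similarity function (the number of common tokens instead of cosine similarity), which is why the two methods are described as very similar.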
extracted 3 sentences as the summary, since this is the median number of sentences in the gold summaries that we found in our exploratory analysis.

D. Experiment setup

Some of these approaches optionally require a precomputed term frequency (TF) or inverse document frequency (IDF) table and a stopword list. We precomputed the TF and IDF tables from an Indonesian Wikipedia dump and used a previously published stopword list. Hyperparameters were tuned on the development set of each fold, optimizing for R-1 as it correlates best with human judgment.

For NeuralSum, we tried several scenarios:
1) tuning the dropout rate while keeping other hyperparameters fixed,
2) increasing the word embedding size from the default 50 to 300,
3) initializing the word embedding with FastText pre-trained embeddings.

Scenario 2 is necessary to determine whether any improvement in scenario 3 is due to the larger embedding size or to the pre-trained embedding. In scenarios 2 and 3, we used the default hyperparameter settings from the authors' implementation. In addition, for every scenario, we picked the model saved at the epoch that yields the best R-1 score on the development set.

IV. RESULTS AND DISCUSSION

A. Overall results

Table II shows the test F1 scores of ROUGE-1, ROUGE-2, and ROUGE-L of all the tested models described previously. The mean and standard deviation (bracketed) of the scores are computed over the 5 folds. We put the score obtained by an oracle summarizer as Oracle. Its summaries are obtained by using the true labels. This oracle summarizer acts as the upper bound of an extractive summarizer on our dataset. As we can see, in general, every scenario of NeuralSum consistently outperforms the other models significantly. The best scenario is NeuralSum with a word embedding size of 300, although its ROUGE scores are still within one standard deviation of NeuralSum with the default word embedding size. The Lead-3 baseline performs really well and outperforms almost all the other models, which is not surprising and is even consistent with other work showing that for news summarization, the Lead-N baseline is surprisingly hard to beat. Slightly lower than Lead-3 are LexRank and Bayes, but their scores are still within one standard deviation of each other, so their performance is on par. This result suggests that a non-neural supervised summarizer is not better than an unsupervised one, and thus if labeled data are available, it might be best to opt for a neural summarizer right away. We also want to note that despite its high ROUGE, every NeuralSum scenario still scores considerably lower than Oracle, hinting that it can be improved further. Moreover, initializing with FastText pre-trained embeddings slightly lowers the scores, although they are still within one standard deviation. This finding suggests that the effect of the FastText pre-trained embeddings is unclear for our case.

B. Out-of-domain results

Since Indonesian is a low-resource language, collecting an in-domain dataset for any task (including summarization) can be difficult. Therefore, we experimented with an out-of-domain scenario to see if NeuralSum can be used easily for a new use case for which the dataset is scarce or non-existent. Concretely, we trained the best NeuralSum (with word embedding size of 300) on articles belonging to category c1 and evaluated its performance on articles belonging to category c2, for all categories c1 and c2. As we have a total of 6 categories, we have 36 domain pairs to experiment on. To reduce computational cost, we used only the articles from the first fold and did not tune any hyperparameters. We note that this decision might undermine the generalizability of conclusions drawn from these out-of-domain experiments. Nonetheless, we feel that the results can still be useful guidance for future work. As comparisons, we also evaluated Lead-3, Oracle, and the best unsupervised method, LexRank. For LexRank, we used the best hyperparameters that we found in the previous experiment for the first fold. We only report the ROUGE-1 scores. Table III shows the result of this experiment.

We see that almost all the results outperform the Lead-3 baseline, which means that for out-of-domain cases, NeuralSum can summarize not just by selecting some leading sentences from the original text. Almost all NeuralSum results also outperform LexRank, suggesting that when there is no in-domain training data, training NeuralSum on out-of-domain data may yield better performance than using an unsupervised model like LexRank. Looking at the best results, we observe that they are all out-of-domain cases. In other words, training on out-of-domain data is surprisingly better than training on in-domain data. For example, for Sport as the target domain, the best model is trained on Headline as the source domain. In fact, using Headline as the source domain yields the best result in 3 out of 6 target domains. We suspect that this phenomenon is because of the similarity between the corpora of the two domains. Specifically, training on Headline yields the best result most of the time because news from any domain can be a headline. Further investigation of this issue might leverage domain similarity metrics proposed in prior work. Next, comparing the best NeuralSum performance on each target domain to Oracle, we still see quite a large gap. This gap hints that NeuralSum can still be improved further, probably by lifting the limitations of our experiment setup (e.g., tuning the hyperparameters for each domain pair).

⁷ https://github.com/kata-ai/indosum
Table II
TEST F1 SCORE OF ROUGE-1, ROUGE-2, AND ROUGE-L, AVERAGED OVER 5 FOLDS (std. dev. in brackets)

Group                   Method                      R-1            R-2            R-L
Oracle/Baseline         Oracle                      79.27 (0.25)   72.52 (0.35)   78.82 (0.28)
                        Lead-3                      62.86 (0.34)   54.50 (0.41)   62.10 (0.37)
Unsupervised            SumBasic                    35.96 (0.18)   20.19 (0.31)   33.77 (0.18)
                        LSA                         41.37 (0.19)   28.43 (0.25)   39.64 (0.19)
                        LexRank                     62.86 (0.35)   54.44 (0.44)   62.10 (0.37)
                        TextRank                    42.87 (0.29)   29.02 (0.35)   41.01 (0.31)
Non-neural supervised   Bayes                       62.70 (0.39)   54.32 (0.46)   61.93 (0.41)
                        HMM                         17.62 (0.11)    4.70 (0.11)   15.89 (0.11)
                        MaxEnt                      50.94 (0.42)   44.33 (0.50)   50.26 (0.44)
Neural supervised       NeuralSum                   67.60 (1.25)   61.16 (1.53)   66.86 (1.30)
                        NeuralSum (300 emb. size)   67.96 (0.46)   61.65 (0.48)   67.24 (0.47)
                        NeuralSum + FastText        67.78 (0.69)   61.37 (0.93)   67.05 (0.72)

Table III
TEST F1 SCORE OF ROUGE-1 FOR THE OUT-OF-DOMAIN EXPERIMENT (columns: target domain)

Method (source domain)      Entertainment  Inspiration  Sport   Showbiz  Headline  Tech
Oracle                      75.59          81.19        77.65   78.33    80.52     80.09
Lead-3                      51.27          52.12        67.56   65.05    65.21     50.01
LexRank                     51.41          50.78        67.52   65.01    65.19     50.01
NeuralSum (Entertainment)   52.51          53.15        72.51   67.01    67.63     51.81
NeuralSum (Inspiration)     52.51          52.71        72.51   67.01    68.02     51.67
NeuralSum (Sport)           52.41          53.85        72.51   66.62    68.48     50.89
NeuralSum (Showbiz)         53.65          49.86        72.51   67.81    70.88     51.22
NeuralSum (Headline)        52.80          55.07        72.53   67.17    71.59     50.92
NeuralSum (Tech)            50.39          47.93        62.43   56.93    63.44     48.00

V. CONCLUSION AND FUTURE WORK

We present IndoSum, a new benchmark dataset for Indonesian text summarization, and evaluated state-of-the-art extractive summarization methods on the dataset. We tested unsupervised, non-neural supervised, and neural supervised summarization methods. We used ROUGE as the evaluation metric because it is the standard intrinsic evaluation metric for text summarization. Our results show that neural models outperform non-neural ones, and in the absence of an in-domain corpus, training on an out-of-domain one seems to yield better performance than using an unsupervised summarizer. Also, we found that the best performing model achieves ROUGE scores that are still significantly lower than the maximum possible scores, which suggests that the dataset is sufficiently challenging for future work. The dataset, which consists of 19K article-summary pairs, is publicly available. We hope that the dataset and the evaluation results can serve as a benchmark for future research on Indonesian text summarization.

Future work in this area may focus on improving summarizer performance by employing newer neural models such as SummaRuNNer or by incorporating side information. Since the gold summaries are abstractive, abstractive summarization techniques such as attention-based neural models, seq2seq models, pointer networks, or reinforcement learning-based approaches can also be interesting avenues for future work. Other tasks such as further investigation of the out-of-domain issue, human evaluation, or even extending the corpus to include more than one summary per article are worth exploring as well.

ACKNOWLEDGMENT

We thank the anonymous reviewers for their helpful feedback. We acknowledge the support from Shortir and Tempo. Lastly, we also thank Muhammad Pratikto and Ahmad Rizqi Meydiarso for their relentless support.

REFERENCES

D. Das and A. F. Martins, "A survey on automatic text summarization," Literature Survey for the Language and Statistics II course at CMU, vol. 4, pp. 192–195, 2007.
C.-Y. Lin, "ROUGE: A package for automatic evaluation of summaries," in Text Summarization Branches Out: Proceedings of the ACL-04 Workshop, vol. 8, Barcelona, Spain, 2004.
A. Najibullah, "Indonesian Text Summarization based on Naïve Bayes Method," in Proceedings of the International Seminar and Conference 2015: The Golden Triangle (Indonesia-India-Tiongkok) Interrelations in Religion, Science, Culture, and Economic, Semarang, Indonesia, 2015, p. 12.
M. Fachrurrozi, N. Yusliani, and R. U. Yoanita, "Frequent Term based Text Summarization for Bahasa Indonesia," in Proceedings of the International Conference on Innovations in Engineering and Technology, Bangkok, Thailand, 2013, p. 3.
Silvia, P. Rukmana, V. Aprilia, D. Suhartono, R. Wongso, and Meiliana, "Summarizing Text for Indonesian Language by Using Latent Dirichlet Allocation and Genetic Algorithm," in Proceeding of International Conference on Electrical Engineering, Computer Science and Informatics (EECSI 2014), Yogyakarta, Indonesia, 2014, p. 6.
C. Aone, M. E. Okurowski, and J. Gorlinsky, "Trainable, scalable summarization using robust NLP and machine learning," in Proceedings of the 17th International Conference on Computational Linguistics, Volume 1, Association for Computational Linguistics, 1998, pp. 62–66.
D. Gunawan, A. Pasaribu, R. F. Rahmat, and R. Budiarto, "Automatic Text Summarization for Indonesian Language Using TextTeaser," IOP Conference Series: Materials Science and Engineering, vol. 190, no. 1, p. 012048, 2017.
C. Slamet, A. R. Atmadja, D. S. Maylawati, R. S. Lestari, W. Darmalaksana, and M. A. Ramdhani, "Automated Text Summarization for Indonesian Article Using Vector Space Model," IOP Conference Series: Materials Science and Engineering, vol. 288, p. 012037, Jan. 2018.
D. T. Massandy and M. L. Khodra, "Guided summarization for Indonesian news articles," in 2014 International Conference of Advanced Informatics: Concept, Theory and Application (ICAICTA), Aug. 2014, pp. 140–145.
F. Koto, "A Publicly Available Indonesian Corpora for Automatic Abstractive and Extractive Chat Summarization," in Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia: European Language Resources Association (ELRA), 2016, p. 5.
R. Nallapati, F. Zhai, and B. Zhou, "SummaRuNNer: A Recurrent Neural Network Based Sequence Model for Extractive Summarization of Documents," in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, California, USA, 2017, pp. 3075–3081.
J. Cheng and M. Lapata, "Neural summarization by extracting sentences and words," in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany: Association for Computational Linguistics, Aug. 2016, pp. 484–494.
S. Bird, E. Loper, and E. Klein, Natural Language Processing with Python, O'Reilly Media Inc., 2009.
A. Nenkova and L. Vanderwende, "The impact of frequency on summarization," Microsoft Research, Redmond, Washington, Tech. Rep. MSR-TR-2005, vol. 101, 2005.
L. Vanderwende, H. Suzuki, C. Brockett, and A. Nenkova, "Beyond SumBasic: Task-focused summarization with sentence simplification and lexical expansion," Information Processing & Management, vol. 43, no. 6, pp. 1606–1618, 2007.
Y. Gong and X. Liu, "Generic text summarization using relevance measure and latent semantic analysis," in Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2001, pp. 19–25.
J. Steinberger and K. Jezek, "Using latent semantic analysis in text summarization and summary evaluation," in Proc. ISIM'04, 2004, pp. 93–100.
G. Erkan and D. R. Radev, "LexRank: Graph-based lexical centrality as salience in text summarization," Journal of Artificial Intelligence Research, vol. 22, pp. 457–479, 2004.
D. R. Radev, H. Jing, and M. Budzikowska, "Centroid-based summarization of multiple documents: Sentence extraction, utility-based evaluation, and user studies," in Proceedings of the 2000 NAACL-ANLP Workshop on Automatic Summarization, Association for Computational Linguistics, 2000, pp. 21–30.
R. Mihalcea and P. Tarau, "TextRank: Bringing order into text," in Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, 2004.
J. Conroy and D. O'Leary, "Text summarization via hidden Markov model and pivoted QR matrix decomposition," 2001.
M. Osborne, "Using maximum entropy for sentence extraction," in Proceedings of the Workshop on Automatic Summarization (Including DUC 2002), Philadelphia: Association for Computational Linguistics, Jul. 2002.
F. Tala, J. Kamps, K. E. Müller, and R. de M, "The impact of stemming on information retrieval in Bahasa Indonesia," Studia Logica: An International Journal for Symbolic Logic, Jan. 2003.
C.-Y. Lin and E. Hovy, "Automatic evaluation of summaries using n-gram co-occurrence statistics," in Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, Volume 1, Association for Computational Linguistics, 2003, pp. 71–78.
P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov, "Enriching word vectors with subword information," arXiv preprint arXiv:1607.04606, 2016.
S. Ruder and B. Plank, "Learning to select data for transfer learning with Bayesian Optimization," in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark: Association for Computational Linguistics, Jul. 2017, pp. 372–382.
S. Narayan, N. Papasarantopoulos, M. Lapata, and S. B. Cohen, "Neural Extractive Summarization with Side Information," CoRR, vol. abs/1704.04530, 2017.
A. M. Rush, S. Chopra, and J. Weston, "A neural attention model for abstractive sentence summarization," in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal: Association for Computational Linguistics, 2015, pp. 379–389.
R. Nallapati, B. Zhou, C. dos Santos, C. Gulcehre, and B. Xiang, "Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond," in Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, Berlin, Germany: SIGNLL, 2016.
A. See, P. J. Liu, and C. D. Manning, "Get to the point: Summarization with pointer-generator networks," in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, Canada: Association for Computational Linguistics, 2017, pp. 1073–1083.
R. Paulus, C. Xiong, and R. Socher, "A Deep Reinforced Model for Abstractive Summarization," arXiv:1705.04304 [cs], May 2017.