A Subword Normalized Cut Approach to Automatic Story Segmentation of Chinese Broadcast News

DOI: 10.1007/978-3-642-04769-5_12
Authors: Jin Zhang, Lei Xie, Wei Feng, Ya

Citations: 3
This paper presents a subword normalized cut (N-cut) approach to automatic story segmentation of Chinese broadcast news (BN). We represent a speech recognition transcript as a weighted undirected graph in which the nodes correspond to sentences and the edge weights describe inter-sentence similarities. Story segmentation is formalized as a graph-partitioning problem under the N-cut criterion, which simultaneously minimizes the similarity across different partitions and maximizes the similarity within each partition. We measure inter-sentence similarities and perform N-cut segmentation on overlapping n-gram sequences of characters or syllables (i.e., subword units). Our method works at the subword level because subword matching is robust to speech recognition errors and out-of-vocabulary words. Experiments on the TDT2 Mandarin BN corpus show that syllable-bigram-based N-cut achieves the best F1-measure of 0.6911, a relative improvement of 11.52% over the previous word-based N-cut, which has an F1-measure of 0.6197. N-cut at the subword level is more effective than at the word level for story segmentation of noisy Chinese BN transcripts.
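The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine similarity over n-gram counts, the zero diagonal, the single two-way contiguous split, the toy sentences, and all function names (`char_ngrams`, `cosine`, `similarity_matrix`, `ncut_value`, `best_boundary`) are assumptions made for illustration, since the abstract does not specify these details.

```python
import math
from collections import Counter


def char_ngrams(sentence, n=2):
    """Count overlapping character n-grams of a sentence (whitespace removed)."""
    chars = "".join(sentence.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors (assumed measure)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(sentences, n=2):
    """Edge weights of the sentence graph; self-similarity is set to 0 here."""
    vecs = [char_ngrams(s, n) for s in sentences]
    size = len(vecs)
    return [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(size)]
            for i in range(size)]


def ncut_value(w, boundary):
    """Two-way N-cut value for the split A = [0, boundary), B = [boundary, len(w))."""
    nodes = range(len(w))
    part_a = [i for i in nodes if i < boundary]
    part_b = [i for i in nodes if i >= boundary]
    cut = sum(w[i][j] for i in part_a for j in part_b)
    assoc_a = sum(w[i][j] for i in part_a for j in nodes)
    assoc_b = sum(w[i][j] for i in part_b for j in nodes)
    if assoc_a == 0 or assoc_b == 0:
        return float("inf")  # a partition with no edges is treated as invalid
    return cut / assoc_a + cut / assoc_b


def best_boundary(w):
    """Sentence index at which a single story boundary minimizes the N-cut value."""
    return min(range(1, len(w)), key=lambda k: ncut_value(w, k))


# Made-up toy transcript: two sentences on the stock market, two on a typhoon.
sentences = ["股市今天上涨", "股市上涨幅度很大", "台风今天登陆", "台风登陆造成损失"]
weights = similarity_matrix(sentences, n=2)
boundary = best_boundary(weights)
print(boundary)
```

A full segmenter would search over multiple boundaries rather than one; the sketch only shows how the N-cut value trades off similarity across the boundary against the total connectivity of each side.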
Conference: Asia Information Retrieval Symposium (AIRS), pp. 136-148, 2009
    • ...Lexical Chaining [6]: 0.5168; NCuts [11]: 0.6197; Multi-feature Integration: 0.7722...
    • ...Finally, we compared our method with several state-of-the-art topic segmentation approaches on the TDT2 corpus and the results are shown in Table 3. We can see that the proposed multi-feature integration approach significantly outperforms the TextTiling approach [5], the lexical chaining approach [6] and the NCuts approach [11]...

    Lei Xie et al. Integrating acoustic and lexical features in topic segmentation of Chi...

    • ...Story boundary detection approaches can be categorized as detection-based [2]–[7] and model-based [8]–[10]...
    • ...We experiment with the CCTV broadcast news corpus [10], which contains 71 Mandarin broadcast news videos (over 30 hours in duration) from China Central Television...
    • ...Note that all the lexical features were calculated on character unigram sequences for the Mandarin LVCSR transcripts due to the robustness of sub-word to speech recognition errors [10]...

    Mi-Mi Lu et al. Multi-modal feature integration for story boundary detection in broadc...

    • ...Recently, the partial-matching merit of subword lexical units [3] has been successfully combined with the NCut method for story segmentation of Chinese broadcast news [4]...
    • ...Another reason, given by Zhang [4], is that the same news topic may be rebroadcast many times during one program period, which may implicitly increase the similarity between the recurring topics...
    • ...Motivated by the merits of subwords in lexical matching in Chinese broadcast news transcripts, Zhang [4] proposed a subword Ncut approach for the story segmentation task...
    • ...Fig. 2. Fully connected graph model (a) and discarding edges between sentences whose distance exceeds two (b) [4]...
    • ...Fig. 4. Experimental results on TDT2 test set. Ncut method [4]...
    • ...Results indicate that the syllable-trigram-based automatic Ncut achieves the best F1-measure of 0.7118 with relative improvement of 3% over Zhang’s syllable-bigram-based Ncut (0.6911) [4], the previous best method...
    • ...Our syllable-bigram-based automatic Ncut achieves the best F1-measure of 0.7118 with relative improvement of 3% over Zhang’s syllable-bigram-based Ncut that has an F1-measure of 0.6911 [4]...

    YuanYuan Jin et al. An automatic normalized cut topic segmentation approach
