Year : 2020  |  Volume : 6  |  Issue : 1  |  Page : 67-73

Text mining and analysis of Treatise on Febrile Diseases based on natural language processing

1 School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, Beijing 100029, China
2 School of Life Science, Beijing University of Chinese Medicine, Beijing 100029, China
3 School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, China

Correspondence Address:
Prof. Xiao-Ying Xu
School of Traditional Chinese Medicine, Beijing University of Chinese Medicine, 11 Beisanhuan Donglu, Chaoyang, Beijing 100029

Source of Support: None, Conflict of Interest: None

DOI: 10.4103/wjtcm.wjtcm_28_19


Objective: This paper uses natural language processing (NLP) technology to analyze and process the text of “Treatise on Febrile Diseases (TFDs)” in order to find important information, and in doing so attempts to apply NLP to text mining of traditional Chinese medicine (TCM) literature. Materials and Methods: The experiment was written in Python and called the Jieba, nltk, gensim, and sklearn libraries, in combination with Excel and Word software. The text of “TFDs” was cleaned, segmented, and stripped of stop words, in that order. Word frequency statistics and analysis, keyword extraction, named entity recognition (NER), and other operations were then carried out, and text similarity was calculated last. Results: Jieba accurately identified the herbal names in “TFDs.” Word frequency statistics based on the word segmentation showed that “warm therapy” is an important treatment in “TFDs,” that Guizhi decoction is the main prescription, and that five core decoctions can be identified. Keyword extraction based on the term frequency-inverse document frequency (TF-IDF) algorithm performed well. The accuracy of NER on “TFDs” was about 86%. When similarity was calculated with a latent semantic indexing (LSI) model, “Understanding of Synopsis of Golden Chamber (SGC)” was much more similar to “SGC” than to “TFDs.” These results met expectations. Conclusions: This work lays a research foundation for applying NLP to text mining of unstructured TCM literature. Combined with deep learning technology, NLP, as an important branch of artificial intelligence, will have broader application prospects in text mining of TCM literature, construction of TCM knowledge graphs, and TCM knowledge services.
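
As a rough illustration of the stop-word removal, word frequency, and TF-IDF keyword steps named in the abstract, the following is a minimal standard-library sketch. It is not the authors' code: it does not use Jieba, gensim, or sklearn, the stop-word list and the toy English tokens are hypothetical, and the TF-IDF weighting shown (relative term frequency times the natural log of inverse document frequency) is only one common variant. In practice the token lists would come from a segmenter such as `jieba.lcut` applied to the cleaned text.

```python
import math
from collections import Counter

# Hypothetical stop-word list; a real run would load a Chinese stop-word file.
STOP_WORDS = {"the", "of", "and", "is", "for"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


def word_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def tf_idf(docs):
    """Score every term in every document.

    docs is a list of token lists. The result is a list of dicts mapping
    term -> (count / document length) * ln(number of docs / docs containing term).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # each document counts a term once

    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(score, k=3):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy, already-segmented "clauses" (hypothetical data, not from the treatise).
toy_docs = [
    ["guizhi", "decoction", "warm", "guizhi"],
    ["mahuang", "decoction", "sweat"],
    ["guizhi", "decoction", "cold"],
]
toy_scores = tf_idf(toy_docs)
```

A term that occurs in every document, such as "decoction" in the toy data, receives a score of zero under this weighting, which is why TF-IDF tends to surface terms that distinguish one passage from the others rather than terms that are merely frequent.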
