Author Identification of Online Social Messages Based on Chinese Text Analysis (以中文文本分析為主的線上社交訊息作者辨識)

Guan-Ting Ke; Kuo-Hui Yeh; Li-Hsuan Lo

Abstract

This study investigates identity authentication based on the text of social chat messages. In recent years, scams on social media have become frequent, and most cases involve personal accounts taken over through social engineering. Motivated by this, we aim to build an efficient authentication system that determines whether the user behind a text message is authentic and legitimate. The proposed system is based on the stylometry of users' instant messages: we collect each user's social text messages as the authentication data source and apply a semantic analysis model (Latent Semantic Analysis, LSA), a Multilayer Perceptron (MLP), and a Support Vector Machine (SVM) as the main analysis algorithms to generate user authentication tokens and to measure authentication accuracy. The results show that, in the experiments using the LSA model alone, 65% of test cases had a similarity below 70%, while the MLP and SVM analyses reached authentication accuracies of 80% and 88%, respectively.

Keywords

Authentication; Social Media; Semantic Analysis Model; Support Vector Machine; Multilayer Perceptron

Citation Format:
Guan-Ting Ke, Kuo-Hui Yeh, Li-Hsuan Lo, "以中文文本分析為主的線上社交訊息作者辨識 [Author Identification of Online Social Messages Based on Chinese Text Analysis]," Communications of the CCISA, vol. 24, no. 4, pp. 15-30, Oct. 2018.

Published by Chinese Cryptology and Information Security Association (CCISA), Taiwan, R.O.C.
CCISA Editorial Office
E-mail: ccisa.editor@gmail.com
