This is a .csv file of the normalized data. It contains 22 attributes plus the class label:

Id, Name, IsVerified, ProfileImageUrl, FollowersCount, FriendsCount, FavouritesCount, StatusesCount, Description, Location, TimeZone, CreatedDate, Status, Url, Mentions, Number of Mentions, HashTags, Number of HashTags, RetweetCount, TwittCreatedDate, MessageText, MessageImage, Class

The normalization rules for each attribute (normalized value: condition) are as follows.

Id
- 0: None
- 1: Id has 9-10 digits
- 2: Id has 18 digits

Name, Description
- 0: None
- 1: All Thai
- 2: All English
- 3: Mix of Thai, English, or numbers
- 4: Only symbols
- 5: Otherwise

IsVerified
- 0: Null
- 1: True
- 2: False

ProfileImageUrl
- 0: None
- 1: .jpg image
- 2: .png image
- 3: Otherwise

FollowersCount, FriendsCount, FavouritesCount, StatusesCount, RetweetCount
- 0: None
- 1: 1-9
- 2: 10-99
- 3: 100-999
- 4: 1,000-9,999
- 5: 10,000-99,999
- 6: 100,000-999,999
- 7: 1,000,000-9,999,999

Location, TimeZone
- 0: None
- 1: Thailand
- 2: South East Asia (excluding Thailand)
- 3: Asia (excluding South East Asia)
- 4: Australia/New Zealand
- 5: Europe/Russia
- 6: US/Canada/Alaska/Hawaii
- 7: Africa
- 8: Otherwise

CreatedDate
- 0: None
- 1: Less than 0.5 years
- 2: Between 0.5 and 1 year
- 3: Between 1 and 1.5 years
- …
- 24: Maximum value

Status
- 0: None
- 1: Valuable

Url, MessageImage, Mentions, HashTags, Number of Mentions, Number of HashTags
- 0: None
- 1: One link, one @ (account), or one # (topic)
- 2: Two links / accounts / topics
- 3: Three links / accounts / topics
- 4: More than three links / accounts / topics

TwittCreatedDate
- 0: None
- 1: Between 06.01 and 12.00
- 2: Between 12.01 and 18.00
- 3: Between 18.01 and 24.00
- 4: Between 00.01 and 06.00

MessageText
- 1: Own message
- 2: Retweeted message
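For readers who want to re-apply the Id, IsVerified, ProfileImageUrl, count, and link/mention/hashtag rules to raw Twitter fields, here is a minimal Python sketch. The function names are hypothetical, and several choices are assumptions not stated in the rules: a missing value or a count of 0 maps to code 0 ("None"), the image type is read only from the URL path suffix, and values outside the documented ranges raise an error rather than being assigned a code.

```python
from typing import Optional
from urllib.parse import urlparse


def id_code(user_id: Optional[int]) -> int:
    """Id: 0 None, 1 = 9-10 digits, 2 = 18 digits."""
    if user_id is None:
        return 0
    digits = len(str(user_id))
    if digits in (9, 10):
        return 1
    if digits == 18:
        return 2
    # Assumption: other lengths are not covered by the rules, so reject them.
    raise ValueError(f"Id {user_id} has {digits} digits; no rule covers it")


def is_verified_code(flag: Optional[bool]) -> int:
    """IsVerified: 0 Null, 1 True, 2 False."""
    if flag is None:
        return 0
    return 1 if flag else 2


def profile_image_code(url: Optional[str]) -> int:
    """ProfileImageUrl: 0 None, 1 .jpg, 2 .png, 3 otherwise."""
    if not url:
        return 0
    # Assumption: the image type is taken from the URL path suffix only.
    path = urlparse(url).path.lower()
    if path.endswith(".jpg"):
        return 1
    if path.endswith(".png"):
        return 2
    return 3


def count_code(n: Optional[int]) -> int:
    """FollowersCount, FriendsCount, FavouritesCount, StatusesCount, RetweetCount.

    The code equals the number of decimal digits: 1-9 -> 1, 10-99 -> 2, ...,
    1,000,000-9,999,999 -> 7. Assumption: missing or 0 -> 0 ("None").
    """
    if n is None or n <= 0:
        return 0
    digits = len(str(n))
    if digits > 7:
        raise ValueError(f"count {n} exceeds the documented maximum of 9,999,999")
    return digits


def item_count_code(k: Optional[int]) -> int:
    """Url, MessageImage, Mentions, HashTags, Number of Mentions/HashTags:
    0 none, 1-3 the exact count, 4 more than three."""
    if k is None or k <= 0:
        return 0
    return min(k, 4)


print(count_code(2500))    # 4  (1,000-9,999)
print(item_count_code(5))  # 4  (more than three)
```

Counting digits with `len(str(n))` keeps the count buckets in exact integer arithmetic, which sidesteps any floating-point edge cases of a `math.log10`-based version.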
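The two time-based rules (the tweet's time of day in TwittCreatedDate, and the account age in CreatedDate) can be encoded as in the Python sketch below. It is an illustration only: the function names and the `reference` date parameter are hypothetical, and the rules do not state which time zone the tweet clock uses (the sketch assumes the timestamp is already in that zone). Seconds are ignored, midnight (00.00) is read as 24.00, each CreatedDate code is assumed to cover a half-year interval with an inclusive lower bound, and code 24 is assumed to act as a cap.

```python
from datetime import date, time
from typing import Optional


def tweet_time_code(t: Optional[time]) -> int:
    """Time of day: 1 = 06.01-12.00, 2 = 12.01-18.00,
    3 = 18.01-24.00, 4 = 00.01-06.00, 0 = None."""
    if t is None:
        return 0
    minute = t.hour * 60 + t.minute  # seconds ignored (assumption)
    if minute == 0:          # 00.00 is read as 24.00
        return 3
    if minute <= 6 * 60:     # 00.01-06.00
        return 4
    if minute <= 12 * 60:    # 06.01-12.00
        return 1
    if minute <= 18 * 60:    # 12.01-18.00
        return 2
    return 3                 # 18.01-23.59


HALF_YEAR_DAYS = 365.25 / 2


def account_age_code(created: Optional[date], reference: date) -> int:
    """CreatedDate: 1 = under 0.5 years, 2 = 0.5-1 year, 3 = 1-1.5 years, ...

    `reference` is a hypothetical snapshot date; the rules do not say which
    date the account age was measured against.
    """
    if created is None:
        return 0
    if created > reference:
        raise ValueError("account created after the reference date")
    half_years = (reference - created).days / HALF_YEAR_DAYS
    return min(int(half_years) + 1, 24)  # assumption: 24 is the cap


print(tweet_time_code(time(6, 0)))  # 4
print(tweet_time_code(time(6, 1)))  # 1
print(account_age_code(date(2016, 9, 1), date(2018, 1, 1)))  # 3
```

In the last call the account is 487 days old, about 1.33 years, which falls in the "between 1 and 1.5 years" interval.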
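The Name and Description rules do not spell out how spaces, punctuation, digits on their own, or other scripts are handled. The Python sketch below shows one possible reading, using a hypothetical `script_code` helper. Whitespace is ignored. Characters in the Thai Unicode block (U+0E00-U+0E7F) count as Thai, ASCII letters as English, and ASCII digits as numbers. Punctuation and symbol characters (Unicode general categories P* and S*, which include most emoji) are ignored unless they are the only content. Any other letter or script falls under "Otherwise". Digits alone are assumed to belong to the mixed category (3). All of these are assumptions, not confirmed by the dataset description.

```python
import unicodedata
from typing import Optional


def script_code(text: Optional[str]) -> int:
    """Name/Description: 0 None, 1 all Thai, 2 all English,
    3 mix of Thai/English/numbers, 4 only symbols, 5 otherwise."""
    if text is None or not text.strip():
        return 0
    has_thai = has_english = has_digit = has_symbol = has_other = False
    for ch in text:
        if ch.isspace():
            continue                      # whitespace is ignored
        if "\u0e00" <= ch <= "\u0e7f":
            has_thai = True               # Thai block
        elif ch.isascii() and ch.isalpha():
            has_english = True            # A-Z, a-z
        elif ch.isascii() and ch.isdigit():
            has_digit = True              # 0-9
        elif unicodedata.category(ch)[0] in "PS":
            has_symbol = True             # punctuation or symbol (incl. most emoji)
        else:
            has_other = True              # any other script or character
    if has_other:
        return 5
    if not (has_thai or has_english or has_digit):
        return 4                          # only symbols remain
    if has_thai and not (has_english or has_digit):
        return 1
    if has_english and not (has_thai or has_digit):
        return 2
    return 3                              # mixed, or digits only (assumption)


print(script_code("Hello สวัสดี 2018"))  # 3
```

Whether, for example, a Thai name followed by an exclamation mark should count as "All Thai" or "Otherwise" is exactly the kind of detail this sketch decides by assumption; the original normalization may differ.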