This dataset on user emotions was obtained from LiveJournal (http://www.livejournal.com/) and created by crawling users within three hops of a starting user.
If you use this dataset, please cite the following paper:
Jin, Shengmin, and Reza Zafarani. “Emotions in Social Networks: Distributions, Patterns, and Models.” In Proceedings of the 26th ACM International on Conference on Information and Knowledge Management, ACM, 2017.
@inproceedings{jin2017,
title={Emotions in Social Networks: Distributions, Patterns, and Models},
author={Jin, Shengmin and Zafarani, Reza},
booktitle={Proceedings of the 26th ACM International on Conference on Information and Knowledge Management},
pages={1907--1916},
year={2017},
organization={ACM}
}
This is a pre-processed version of the dataset used in our paper ‘Emotions in Social Networks: Distributions, Patterns, and Models’ and can be used to reproduce all results in the paper. As discussed in the paper, we perform a set of preprocessing steps on the dataset. In particular, for consistency in sentiment analysis and to remove meme-type moods, we retain only the posts whose moods were selected from the predefined list provided by LiveJournal. In addition, we retain only users that have 10 or more posts, which excludes occasionally active or inactive users. Finally, we manually convert each mood in our dataset to its polarity. After this step, all moods in our dataset are either positive (1), negative (-1), or neutral (0).
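The three preprocessing steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `preprocess`, the input shape (pairs of user id and mood string), and the three-entry `MOOD_POLARITY` table are hypothetical. The actual mood-to-polarity table was built manually by the authors and is not reproduced here.

```python
from collections import Counter

# Hypothetical excerpt of a mood-to-polarity table (illustrative values only;
# the real table covers LiveJournal's full predefined mood list).
MOOD_POLARITY = {"happy": 1, "sad": -1, "okay": 0}


def preprocess(posts, min_posts=10):
    """Apply the three steps to an iterable of (user_id, mood) pairs."""
    # Step 1: keep only posts whose mood comes from the predefined list.
    kept = [(user, mood) for user, mood in posts if mood in MOOD_POLARITY]
    # Step 2: keep only users with at least `min_posts` remaining posts.
    counts = Counter(user for user, _ in kept)
    kept = [(user, mood) for user, mood in kept if counts[user] >= min_posts]
    # Step 3: replace each mood with its polarity (1, -1, or 0).
    return [(user, MOOD_POLARITY[mood]) for user, mood in kept]
```

The files distributed here already have these steps applied, so the sketch is needed only when working from raw posts.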
For protecting user privacy, we assign each user with a unique user id, and each community with a unique community id.
This dataset can be downloaded from here and includes:
Friendship.txt
Format: userA,userB
Description: Friendship.txt includes a listing of the friend relationships of the users up to 2 hops away from the starting user. The format of the listings is the user id of a given user followed by a friend of that user, separated by a comma.
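A minimal sketch of reading the userA,userB format with Python's standard csv module is shown below. The helper name `read_pairs` and the in-memory sample are hypothetical, and user ids are kept as strings because their exact form is an assumption. Each pair is stored in the direction it is listed, since the description does not say whether every friendship appears in both directions.

```python
import csv
import io


def read_pairs(lines):
    """Parse 'userA,userB' lines into a list of (userA, userB) tuples."""
    pairs = []
    for row in csv.reader(lines):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


# Made-up sample standing in for the contents of Friendship.txt.
sample = io.StringIO("1,2\n1,3\n2,3\n")
edges = read_pairs(sample)

# Map each user to the set of friends listed for that user.
friends_of = {}
for user_a, user_b in edges:
    friends_of.setdefault(user_a, set()).add(user_b)
```

For the real file, an open file handle can be passed in place of the sample, for example `read_pairs(open("Friendship.txt", newline=""))`.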
Followers.txt
Format: Follower,Followee
Description: Followers.txt includes a listing of the followers of a given user. Each line contains the user id of the following user followed by the user id of the user being followed, separated by a comma.
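Because each line of Followers.txt is directed (Follower,Followee), a common first step is counting how many followers each user has. The sketch below assumes one pair per line; the function name `follower_counts` is hypothetical.

```python
from collections import Counter


def follower_counts(lines):
    """Count followers per followee from 'Follower,Followee' lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        follower, followee = line.split(",", 1)
        counts[followee] += 1
    return counts
```

Calling `follower_counts(open("Followers.txt"))` would return a Counter keyed by followee id.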
Community.txt
Format: user_id,community_id
Description: Community.txt includes a listing of the communities each user was a member of. Each line contains the user id we assigned followed by the community id of the community the user was a member of, separated by a comma.
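To work with communities as groups, the user_id,community_id lines can be inverted into a mapping from community to members. This is a minimal sketch; the function name `group_members` is hypothetical and ids are treated as strings.

```python
from collections import defaultdict


def group_members(lines):
    """Map each community id to the set of user ids that belong to it."""
    members = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id, community_id = line.split(",", 1)
        members[community_id].add(user_id)
    return members
```

A user who belongs to several communities appears on several lines and is added to each corresponding set.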
Posts.txt
Format: user_id,emotion,date
Description: Posts.txt includes a listing of the user-provided moods that accompany the posts of a given user. Each line contains the user id of the user, the emotion of the post (+1,-1,0), and the date (YYYY-MM-DD HH:MM:SS) of the post accompanying the mood. All three fields are separated by commas. If the retrieved time of a post did not contain seconds, a value of 00 is assumed for the seconds.
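A line of Posts.txt can be parsed with the standard datetime module, as sketched below. The function name `parse_post` and the sample line are hypothetical. `int()` accepts both "+1" and "1", so the sketch works whichever way positive emotions are written in the file.

```python
from datetime import datetime


def parse_post(line):
    """Parse one 'user_id,emotion,date' line into (str, int, datetime)."""
    user_id, emotion, timestamp = line.strip().split(",", 2)
    posted_at = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return user_id, int(emotion), posted_at


# Made-up sample line in the documented format.
record = parse_post("42,-1,2009-03-14 08:30:00")
```

The timestamp is the last field and is split off with a maximum of two splits, so the space inside the date is left untouched.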