New Social Networking Project
I’ve decided to kick off a new personal software project to explore the convergence of social networking technologies, software as a service (SaaS), and graph-based data mining (my doctoral dissertation topic). Specifically, I’m looking at ways in which meaningful patterns can be discovered by combining the unstructured text data (e.g. tweets and status updates) with the structured data (relationships to people, events, places, etc.) created in social networking technologies like Facebook and Twitter.
At present, it seems that Facebook offers a richer source of structured data because of the way it organizes data around its social graph. However, I have observed that Twitter appears to spawn social networks organized around topics (e.g. political ideology) and events, whereas Facebook seems to be primarily organized around social networks that already exist in meatspace. With Facebook, it is generally bad form to attempt to add someone one does not already know to one’s social network, because expanding one’s social network there is a bidirectional action. With Twitter, adding someone to one’s social network is a unidirectional action (i.e. following), and following complete strangers is the accepted norm, with no requirement or expectation of reciprocity. I know and have met (in meatspace) everyone in my Facebook social network. I haven’t met anyone in my Twitter network. I find that aspect of Twitter very appealing.
Initially, my project will focus on the basics:
- Utilizing the APIs for Facebook and Twitter to collect samples of both structured and unstructured data.
- Developing feature extraction approaches for the unstructured data. For the free text data, I expect to start with basic NLP algorithms such as phrase extractors and stemming.
- Normalizing the structured and unstructured data into a common graph representation, most likely using an XML or JSON representation.
- Evaluating pattern mining results from various graph-based data mining algorithms.
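To make the feature extraction and normalization steps a little more concrete, here is a minimal Python sketch of what they might look like. Everything in it is an assumption for illustration: the sample post and relationship are made up, the node and edge field names (`id`, `type`, `source`, `label`, `target`) are hypothetical, and `naive_stem` is a toy suffix stripper standing in for a real stemmer. It does not call the Facebook or Twitter APIs.

```python
import json
import re

# Tiny illustrative stopword list; a real project would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "on", "is", "at", "for"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def naive_stem(token):
    """Toy suffix stripper (NOT a real stemming algorithm such as Porter's)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least four characters would remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def extract_features(text):
    """Return stemmed non-stopword terms and adjacent-term phrases (bigrams)."""
    terms = [naive_stem(t) for t in tokenize(text) if t not in STOPWORDS]
    phrases = [" ".join(pair) for pair in zip(terms, terms[1:])]
    return terms, phrases


def build_graph(posts, relationships):
    """Merge structured relationships and text features into one graph dict.

    Assumes every relationship is a (source, label, target) triple between people.
    """
    nodes = {}
    edges = []

    def add_node(node_id, node_type):
        nodes.setdefault(node_id, {"id": node_id, "type": node_type})

    # Structured data: one directed, labeled edge per relationship.
    for source, label, target in relationships:
        add_node(source, "person")
        add_node(target, "person")
        edges.append({"source": source, "label": label, "target": target})

    # Unstructured data: link each post to its author and to its extracted terms.
    for post in posts:
        add_node(post["author"], "person")
        add_node(post["id"], "post")
        edges.append({"source": post["author"], "label": "wrote", "target": post["id"]})
        terms, _phrases = extract_features(post["text"])
        for term in sorted(set(terms)):
            term_id = "term:" + term
            add_node(term_id, "term")
            edges.append({"source": post["id"], "label": "mentions", "target": term_id})

    return {"nodes": list(nodes.values()), "edges": edges}


# Made-up sample data, standing in for what the APIs would return.
sample_posts = [
    {"id": "post:1", "author": "alice", "text": "Following the graph mining talks"}
]
sample_relationships = [("alice", "follows", "bob")]

graph = build_graph(sample_posts, sample_relationships)
print(json.dumps(graph, indent=2))
```

Representing terms as nodes rather than as attributes on the post is just one possible choice; it lets a graph mining algorithm treat shared vocabulary and shared relationships uniformly, at the cost of a larger graph.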
In my previous research experience, pushing data through the mining algorithms offers a lot of insight into the ways in which the data should be structured to optimize pattern mining results. Analyzing the resultant patterns also offers a great deal of insight into how those patterns could be utilized. In the coming days I’ll be thinking about the utility of this type of pattern mining to the social networking user community and to the corporations that run the sites and collect the data (e.g. for targeted advertising, recommendations, etc.). I will also be working to define an architecture for deploying the data management and data mining software in a SaaS configuration. Expect more posts describing my progress over the coming days, weeks and months.


