The availability of huge masses of data, and of the various ways to exploit them, is changing the way science is practiced in numerous fields of study (Halevy et al., 2009; Cristianini, 2010). For instance, according to Michel et al. (2011) and Lazer et al. (2009), new data-driven approaches are being deployed in the social sciences, and Holderness (2014) presents geosocial intelligence as one approach that can be applied to the study of water sanitation. Specifically, Holderness (2014) asserts that geosocial media, which combines social media with Geographical Information Systems (GIS), is an essential tool for dealing with water sanitation problems, especially in developing nations. Big Data use in the social sciences is still in its early phases; as Watts (2007) wrote, the delay was caused by the complexity of social interaction and the unavailability of digital data. However, such data are now widely used, for example, in studying human interactions via mobile phones (Gonzalez et al., 2008; Onnela et al., 2007) and via online environments and games (Szell et al., 2010), as well as in text annotation efforts in political science research (Cardie & Wilkerson, 2008).
For this reason, Big Data has gained significant influence and relevance in framing both institutional and social domains. However, various challenges, both technological and ethical, arise when Big Data is applied to social science. This paper aims to define Big Data and to discuss the main problems of employing it in social research. As a case study, it assesses the use of geosocial intelligence in addressing water sanitation problems.
Big Data is a broadly defined term; according to Ward and Barker (2013), it has become ubiquitous owing to its shared origin in the media, academia, and industry. As such, different stakeholders have provided differing definitions. Big Data consists of datasets whose size exceeds the capability of conventional database and software tools to capture, store, manage, and analyze them (Demchenko et al., 2013). For this reason, today's Big Data may not be tomorrow's, which shows that the threshold for Big Data is ever growing (Franks, 2013). As Lohr (2013) notes, interest in Big Data has increased exponentially since 2011. Big Data is becoming related to all aspects of human activity, from simple event recording to design, research, digital services, production, and product delivery from manufacturers to the final consumer (Demchenko et al., n.d.). Current technologies, including GIS and cloud computing, together with ubiquitous network connectivity, provide an Internet-supported platform that has allowed the automation of all data handling processes, including data collection, storage, processing, and visualization (Grimmer, 2015).
Big Data can be used in social research because aggregating information over data networks, such as social media, yields huge amounts of information and thereby enables the study of human behavior and activities. With Big Data technologies, it is becoming increasingly easy to mine data for desired information; in social research, this can yield information that helps to learn and predict the behavior of both individuals and groups (Demchenko et al., 2013). As such, Big Data has played a major role in creating a radical shift in the way we conduct research. For instance, as Boyd and Crawford (2012) assert, it brings a profound change at the levels of ethics and epistemology. In addition, it enables researchers to reframe key questions about how knowledge is generated from people, streamlines the research process, and provides mechanisms for engaging with information to make it more useful.
Using Big Data for social research has become inevitable, a development facilitated by the growth of the Internet. Its advantage is that researchers use existing data, which are easily retrievable via the web, mainly from social media sites such as Twitter and Facebook. Rather than collecting new data sets, social scientists leverage existing data; for example, they analyze existing tweets to mine data relevant to the subject being addressed (Demchenko et al., 2013). Using Big Data for social research is therefore cheaper than collecting fresh data. However, without reliable computational tools, analyzing Big Data would be quite difficult, because these tools are vital in processing the massive amounts of data. With them, social scientists do not have to choose between data depth and data size; rather, they can study trajectories and patterns formed by billions of experiences, links, cultural expressions, and texts accumulated via social media sites and mobile phone companies (Manovich, 2011). It is worth noting that Big Data mining, computation, and correlation operate in a manner similar to Google Search. When an individual searches for an article via the platform, Google's algorithms analyze billions of web pages, as well as PDF, Excel, and Word documents, plain text files, Flash files, and, since 2009, Twitter and Facebook content (Manovich, 2011).
Despite its numerous advantages, the use of Big Data for social research is accompanied by various challenges that do not arise in the same way with traditional research methods. Its use carries limitations and risks, especially regarding reliability, validity, and ethics. For instance, according to Boyd and Crawford (2012), Big Data has limitations of accuracy and objectivity, primarily because working with Big Data remains subjective, and quantification does not necessarily support a claim to objective truth, especially when social media messages are used. It is also vital that social scientists treat Big Data as raw information that is not self-explanatory and needs a lot of interpretation, and that they remove any existing biases.
In addition, as Boyd and Crawford (2012) argue, bigger data is not always better data because it does not always represent the entire population, even when studies claim that millions of users were included (Wang, 2011). For example, even though Twitter has millions of users, some accounts are just bots, and, as Crawford (2009) notes, while some users post content frequently on the site, others are just listeners, meaning that they do not actively post new texts and ideas. In addition, Twitter reported that 40% of active users sign in solely to listen (Boyd & Crawford, 2012). Also, as Boyd and Crawford (2012) assert, Big Data taken out of context may be meaningless. Essentially, relations established via social media are not equivalent to the kinship networks and sociograms studied by sociologists (Freeman, 2006).
As such, there has to be an underlying study purpose for meaning to be established, and the absence of a connection does not necessarily indicate that a relationship should be made. According to Boyd and Crawford (2012), retaining context is critical in social research, but context is hard to interpret and even harder to maintain when data are reduced to fit a certain model. Ethical issues also arise, especially in instances where privacy is compromised (Zimmer, 2008). For this reason, accountability, which is a broader concern than privacy (Troshynski et al., 2008; Chaudhuri, 2012), is essential, and researchers are accountable to colleagues, superiors, participants, and the public (Dourish & Bell, 2011). Big Data also creates a digital divide because it is not readily available to everybody; instead, only social media companies have access to the large datasets (Manovich, 2011). However, the use of Big Data has benefits, including reductions in the costs, resources, and effort needed to conduct social research.
The UN Millennium Campaign (UNMC), in collaboration with the Water Supply and Sanitation Collaborative Council (WSSCC), conducted social research aiming to understand levels of public engagement with water sanitation (UN Global Pulse, 2014). According to UN Global Pulse, UNMC and WSSCC filtered Twitter for relevant words, such as "sewage", and categorized the resulting tweets into health, human rights, gender, policy and governance, and general interest. Using the social data analytics platform Crimson Hexagon ForSight, the partners analyzed trends, correlations, influencers, and hashtags in the tweets. Of the 260,000 tweets analyzed from January 2011 to December 2013, 33% were related to cholera, which falls in the health category. The remaining categories had low tweet volumes. As such, this study demonstrated that Twitter can provide useful insight into public perceptions, as well as into how public discourse changes over time (UN Global Pulse, 2014).
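The filtering and categorization steps described in this case study can be illustrated with a minimal Python sketch. It is not the method used by Crimson Hexagon ForSight, whose internals are not described here. Apart from "sewage" and the category names, every keyword, function name, and sample tweet below is a hypothetical assumption introduced only for illustration.

```python
from collections import Counter

# Hypothetical keyword lists: the study's actual filter terms are not given,
# apart from "sewage" as one example of a relevant word.
RELEVANT_TERMS = ["sewage", "sanitation", "toilet", "cholera"]
CATEGORY_KEYWORDS = {
    "health": ["cholera", "diarrhea", "disease"],
    "human rights": ["right to water", "human right"],
    "gender": ["women", "girls"],
    "policy and governance": ["policy", "government", "minister"],
}


def is_relevant(text):
    """Return True if the tweet mentions any sanitation-related term."""
    lowered = text.lower()
    return any(term in lowered for term in RELEVANT_TERMS)


def categorize(text):
    """Assign the first category whose keyword appears, else 'general interest'."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "general interest"


def category_shares(tweets):
    """Return each category's share of the relevant tweets."""
    relevant = [t for t in tweets if is_relevant(t)]
    if not relevant:
        return {}
    counts = Counter(categorize(t) for t in relevant)
    return {category: n / len(relevant) for category, n in counts.items()}


# Invented sample tweets, for illustration only.
SAMPLE_TWEETS = [
    "Cholera outbreak reported after sewage leak",
    "Government announces new sanitation policy",
    "Great football match tonight",
    "Open sewage near our school again",
]

shares = category_shares(SAMPLE_TWEETS)
for category, share in sorted(shares.items()):
    print(f"{category}: {share:.0%}")
```

Plain substring matching of this kind ignores context, so a sarcastic or off-topic tweet that happens to contain a keyword would still be counted, and a keyword such as "minister" would also match inside longer words.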
As such, social media analysis is a convenient technique for examining aspects of water sanitation such as public dissatisfaction and the reduction of resources. However, data quality and context were major problems in the research. Various ethical issues also surfaced, including privacy, confidentiality, and consent. Accountability was nonetheless upheld, as no accounts were named and only tweets were used to study water sanitation trends. In addition, in a bid to reduce ethical and privacy issues, the findings were limited in that no results were visualized. Not all populations may have been represented because not all nations speak English, and the information is therefore biased. Also, because Twitter is used mostly by the young, older people were largely left out, providing more evidence that the information was biased. In addition, since many users write sarcastically, tweets that appear to be about water sanitation may refer to something else, and some may express positivity where negativity is meant.
In conclusion, Big Data is valuable in conducting social research because, compared with traditional research methods, it is cheap in terms of the resources used, owing to readily available data. Social media, such as Twitter, can be an important source for mining Big Data. However, various limitations exist. These include risks to confidentiality, from which ethical and privacy issues emerge. Context is also important in acquiring the right data. These issues need careful consideration when conducting social research.
References
Boyd, D., & Crawford, K. (2012). Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15(5), 662-679.
Cardie, C., & Wilkerson, J. (2008). Text annotation for political science research. Journal of Information Technology & Politics, 5(1), 1-6.
Chaudhuri, S. (2012, May). What next? A half-dozen data management research goals for big data and the cloud. In Proceedings of the 31st Symposium on Principles of Database Systems (pp. 1-4).
Crawford, K. (2009). Following you: Disciplines of listening in social media. Continuum: Journal of Media & Cultural Studies, 23(4), 532-533.
Cristianini, N. (2010). Are we there yet? Neural Networks, 23(4), 466-470.
Dourish, P., & Bell, G. (2011). Divining a Digital Future: Mess and Mythology in Ubiquitous Computing. Cambridge, MA: MIT Press.
Demchenko, Y., Grosso, P., De Laat, C., & Membrey, P. (2013, May). Addressing big data issues in scientific data infrastructure. In Collaboration Technologies and Systems (CTS), 2013 International Co...