What is Data?

To be honest, data has always been one of the words that confuses me, especially when writing papers. Its meaning can change depending on the context of the sentence, and it’s also one of those sneaky words that is plural but doesn’t appear to be. So it’s fair to say that data hasn’t been one of my favorite words to use. However, after four lab courses at Colby, all in the natural sciences, I have become more accustomed to working with raw data.

Although I expected Professor Aaron Hanlon’s lecture on Revolutions in Big Data to be boring, since I was not very keen on the subject, I was surprised to find myself intrigued and fascinated by his presentation. Hanlon’s lecture looked at the evolution of data in several different ways, including its meaning, its interpretation, and its frequency of use.

According to Hanlon, the first recorded use of the word data was in the early 17th century, as “a heap of data” describing the word of God. This use of the word in a religious or spiritual context makes it sound as if data is synonymous with truth, but that is not quite right. Data isn’t fact; it is the first step in formulating ideas, the material from which fact and truth are built. Data has come a long way since then and has expanded to mean multiple things.

One example Hanlon used was Hooke’s book Micrographia, which showed small insects and organic material, such as fleas and leaves, enlarged in drawings to reveal very fine detail. This was itself a small revolution: this new form of data changed the way people thought, since they had never before been able to see creatures in such detail.

The lens and context through which data is presented are also very important. One of the main concerns Hanlon expressed is that in this age, when data is abundant and constantly changing, it is easy to misconstrue the meaning of data if you don’t have the context. For example, imagine looking at a medical chart for a patient that shows concerning vital signs. If a doctor were to look at this chart without any previous knowledge of the patient, they could easily conclude that the patient’s health was declining. However, what if the patient’s vital signs had been significantly worse an hour ago, and they were actually showing signs of improvement? This shows the danger of taking raw data at face value without understanding the context of the situation.

One of the most interesting parts of the lecture, though, was when he showed us how the frequency of words varied over the years using the Google Ngram Viewer. Not only was I amazed to learn about new software that I could play with, but I was also surprised to see how much fluctuation there was in the words “fact,” “truth,” and “data.” Around the 1850s, use of the word “fact” increases and use of the word “truth” decreases. This suggests that as the meanings of words change, their popularity changes too, and also that authors were becoming less concerned with feeling and more concerned with fact. However, the word “data” surpassed the usage of both of these words, suggesting that data is more all-encompassing and is the building block of both fact and truth.

The Necessity of Data

In this evening’s lecture, Professor Aaron Hanlon spoke about data and its evolution into what we see today. Apparently, much of what we know about data and the scientific method comes from the Royal Society and Francis Bacon. Data became a necessity for understanding more about society than the generalizations of individual scholars could offer. Data came about because there was such a need for real proof of different theories and sufficient evidence to back these ideas.

It is so interesting for me to try to think of a world without data. As a science major, most of what I know has been thoroughly supported by data set after data set. When we are young, we are taught the scientific method in grade school, which helps us understand how we know what we know about the world. The key step of the scientific method is data collection. It reminds me of my first science fair experiment, where I looked at a series of different mixtures that caused either endothermic or exothermic reactions. I used a thermometer to test whether the temperature dropped or rose while I mixed together different things in our kitchen. I got very excited about this experiment and about collecting data as a whole; I mixed together almost everything in our kitchen to see if there was a reaction. I think this is one of the reasons I am a science major. I learned to like the scientific process, and with it the process of collecting data, and to this day I love experimenting.

I don’t think I had any idea how hard it is to collect unbiased data until this semester. I am currently taking a statistics class to complement my geology major, and one of our assignments is a huge project we work on throughout the semester. For this project we could either find a data set online or collect our own data. My group decided to collect our own, surveying the student population about what extracurriculars they do and how happy they are at Colby. We are also collecting data on variables such as how many hours of extracurriculars people do each week, each person’s gender, and their class year. One of the problems we are running into is how to hand out the surveys without bias, since the data can be biased by location. The people who go through the Spa, in our student center, are not an accurate representation of the population, because not every student walks through the student center. As a sophomore, I actually never walked through the Spa, because I lived on the other side of campus and never needed to. We are still working to get past this bias.
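The location bias described above can be illustrated with a small deterministic simulation. This is a minimal sketch with entirely hypothetical numbers: the campus size, the split between students who do and don’t pass through the Spa, and their weekly extracurricular hours are invented, not taken from the group’s survey. It compares the true average with a Spa-only convenience sample and with a proportional stratified sample that draws from each part of campus in proportion to its size.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Student:
    weekly_extracurricular_hours: float
    walks_through_spa: bool


def build_population() -> list[Student]:
    # Hypothetical campus: 600 students pass through the Spa, 400 never do.
    # The hours are invented so that the two groups differ.
    near = [Student(8.0, True) for _ in range(600)]
    far = [Student(3.0, False) for _ in range(400)]
    return near + far


def spa_only_sample(population: list[Student]) -> list[Student]:
    # Convenience sample: only students who walk past a survey table at the Spa.
    return [s for s in population if s.walks_through_spa]


def proportional_sample(population: list[Student], step: int = 10) -> list[Student]:
    # Proportional stratified sample: take every `step`-th student from each
    # location group, so each group keeps its share of the population.
    near = [s for s in population if s.walks_through_spa]
    far = [s for s in population if not s.walks_through_spa]
    return near[::step] + far[::step]


def average_hours(group: list[Student]) -> float:
    return mean(s.weekly_extracurricular_hours for s in group)


population = build_population()
true_mean = average_hours(population)
spa_mean = average_hours(spa_only_sample(population))
stratified_mean = average_hours(proportional_sample(population))

print(f"whole student body:  {true_mean:.1f} hours/week")
print(f"Spa-only survey:     {spa_mean:.1f} hours/week")
print(f"proportional survey: {stratified_mean:.1f} hours/week")
```

Run as written, the Spa-only survey reports 8.0 hours a week against a true average of 6.0, while the proportional sample recovers 6.0. Stratifying like this assumes you already know roughly what share of students falls into each group, which a real survey would itself have to estimate.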

The Data Dilemma

Four out of five dentists recommend Crest toothpaste. Four out of five dentists also recommend Colgate toothpaste. How is this possible? Data is extremely easy to manipulate, alter, and take out of context to paint two completely different pictures. While data can convey meaning without words, it can also convey several different meanings depending on how it is interpreted. In Professor Aaron Hanlon’s lecture on the revolution of data, he stressed the importance of understanding how to interpret data properly, since what counts as “properly” depends entirely on context. Data is, of course, enormously important: it is used in nearly every industry imaginable, allowing individuals to make decisions based on gathered data in whatever direction it points. However, the value of big data does not necessarily lie in “what is being done” or “how many of x products are being purchased,” but rather in why certain things are being done or purchased; big data offers a way into the mind of the individual and the consumer. Large quantities of data are not a solution to every problem, however, as data on its own does not accomplish anything.
We owe a great deal to Robert Hooke, whose Micrographia catalogued images of what he had seen under a microscope, along with descriptions, to share his findings with the world. The images worked far more effectively than words alone in helping colleagues, friends, peers, and even people entirely outside the sciences understand his research. Detailed images of insects showed the world, for the first time, a new representation of the very world in which people lived, in an easily visualized form. Data was born as a way of allowing scientists to recognize patterns and trends. With each of humanity’s revolutions, data has become more present, generally serving as a visual representation of information that helps people better understand the world around us.
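The Crest and Colgate puzzle at the start of this post has a simple arithmetic resolution, and it shows how a survey’s design shapes its headline. In this hypothetical sketch, each of five dentists may name every brand they are willing to recommend; the responses are invented for illustration and are not drawn from any real survey.

```python
# Hypothetical survey responses (invented for illustration): each dentist
# may name every brand they are willing to recommend, not just a favorite.
responses = [
    {"Crest", "Colgate"},
    {"Crest", "Colgate"},
    {"Crest", "Colgate"},
    {"Crest"},
    {"Colgate"},
]


def share_recommending(brand: str, answers: list[set[str]]) -> float:
    """Fraction of dentists whose answer includes `brand`."""
    return sum(brand in answer for answer in answers) / len(answers)


for brand in ("Crest", "Colgate"):
    share = share_recommending(brand, responses)
    print(f"{brand}: {share:.0%} of dentists recommend it")
```

Both lines print 80%. Because the question allowed several answers, each brand can truthfully claim four out of five dentists; the figure is accurate, but it says nothing about which brand dentists actually prefer.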
Data, while extremely valuable, becomes dangerous when it does not reflect what it is perceived to show. In this year’s election, polling data consistently showed President-elect Trump losing, no matter how the various factors changed. However, this data proved inaccurate: millions of voters who were not polled, or who told pollsters something different from how they ultimately voted, voted for Trump, producing an election result that shocked the world. There were enormous amounts of data, with polls of many different populations taken at many different times, all pointing to the same conclusion, one that proved wildly incorrect. Data is undeniably vital to our understanding of many industries and of humans as a whole; however, it is dangerous to take data as truth. As we witnessed, data paints only one picture, and that picture may be inaccurate no matter how much data we have.
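The point that polls can be wrong even with lots of data comes down to the difference between random error and systematic bias. A minimal sketch with invented numbers (not the actual 2016 polling figures): assume a candidate’s true support is 48%, but the people who answer the poll lean the other way, so the poll measures 52% on average. A bigger sample shrinks the margin of error, computed here with the standard normal-approximation formula for a proportion, around the wrong number instead of correcting it.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    # Approximate 95% margin of error for a sample proportion
    # (normal approximation: z * sqrt(p * (1 - p) / n)).
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical numbers, chosen only for illustration.
TRUE_SUPPORT = 0.48      # what the whole electorate actually thinks
BIASED_ESTIMATE = 0.52   # what a poll of skewed respondents measures

for n in (500, 5_000, 50_000):
    moe = margin_of_error(BIASED_ESTIMATE, n)
    low, high = BIASED_ESTIMATE - moe, BIASED_ESTIMATE + moe
    covers_truth = low <= TRUE_SUPPORT <= high
    print(f"n={n:>6}: {BIASED_ESTIMATE:.0%} +/- {moe:.1%}, covers true value? {covers_truth}")
```

With 500 respondents the interval (about 52% plus or minus 4.4 points) still reaches the true 48%; at 5,000 or 50,000 respondents it narrows around 52% and excludes it. Collecting more data from the same skewed respondents only makes the wrong answer look more certain.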

More Data, More Problems?

The increase in the popularity and use of data reflects people’s need for visible evidence, rather than simply being told that something is the way it is. Data is just an illustration of words: it is words put into a simpler context that is much easier to understand. However, data completely replacing words does not seem like the positive route it has been laid out to be. When we replace words with data, we lose the capacity for contextual meaning. It is true that words can be misleading, but so can data.

“Data” is everywhere today. Almost every field of study deals with data, and if there is no data, the study is not considered reliable. It is interesting to think that in the past, data was not necessary to prove a point. One could simply say that something was the way it was, and people would believe it. Obviously there were limits to this power, but it is much harder to pull off today. People need data, or evidence, to believe a claim and accept it as fact. We have become so dependent on data to separate truth from fiction that we may have become blind to its flaws.

It is clear from Professor Aaron Hanlon’s lecture that words can be misleading. If a picture is worth a thousand words, then it can be a thousand misleading words. Just because there is a picture does not mean that what it portrays is accurate. In art, and especially in photography, framing is extremely important for context. We think of photographs as concrete, unbiased proof, but it is very easy to manipulate a photograph to portray what the photographer wants. This is true of illustrations, and it is true of data.

It is important to keep in mind that data will be biased by whoever presents it. There is a reason behind each particular use of data, and data can be manipulated to reflect the presenter’s intention. Another point from Professor Hanlon’s lecture was that meaning requires context. There are multiple ways to interpret a statement, so one needs some background to infer the correct meaning. The same applies to data: it needs context for those interpreting it to make the right connections. In many cases, words are required to provide this background and form the context.

One conclusion of the lecture was that when data becomes the main form of evidence, that will be revolutionary. But are there questions that cannot be answered with data? I would argue that yes, not everything can be proven with data. There are other ways to present information and prove findings that are better than data in certain contexts. Relying solely on data as evidence is irresponsible. I am not saying data is bad. Data is great, but it should not be the only form of evidence we use. Yes, it would be revolutionary for data to become our main form of evidence, but that does not make it the best option.

Data Used In Politicized Topics

In his talk about Big Data, Aaron Hanlon’s most interesting points centered on Google Trends and another Google tool that analyzes how often digitized books mention certain words. Hanlon noted that the word “data” has been used much more often in books published in recent years, while words like “truth” and “fact” have been trending downward. These are only words in books that have been digitized into Google Books, but the trend raises an interesting point. He discussed how truths were known on a theoretical basis in the early literary years (the 1600s and before). “Truth” and “fact” were once used interchangeably with the word “evidence,” but in recent years “data” seems to be becoming the new word used interchangeably with evidence. To paraphrase Hanlon, “When data become the main form of evidence, that’s revolutionary.” However, treating data as the main term associated with evidence may be problematic. Depending on the subject matter, almost any data can be collected with a certain bias to create and back up an argument. For example, most surveys have inherent biases depending on whether they are taken online or in person, who answers them, and where they are taken, among other factors. While not all data is based on human responses, where data is collected can still be biased to “prove” an argument.

I cannot help but think about the 2016 election when I think about this talk on Big Data and about how data is used to argue the many issues that presidential, senatorial, and congressional candidates stand for and against. Beyond arguing issues, the reliance on “Big Data” to predict the president-elect in 2016 could be, and was, disillusioning. Donald Trump is going to be the President of the United States, and almost no political analysts or pollsters saw it coming. Hillary Clinton was expected to win (some said by a landslide) in almost every “legitimate” poll released, and many millions of Americans were disillusioned when the election went the opposite way. While it is hard to think of a different method for figuring out who will win elections, overreliance on data and its inherent biases can be extremely misleading.

During this election season I have noticed that on social media, most prominently Facebook, users bicker back and forth about politics, using data to argue points about race, violence, the environment, and many other prominent issues. There is an overabundance of online information sources today, which lets people pick and choose which sources to follow on their Facebook “feeds,” instilling certain ideas and values depending on where the user falls on the political spectrum. Most of the political posts I see shared come from biased sources, whether on the right or the left. While I definitely fall on the left side of the political spectrum, it can be annoying and concerning to see fellow “liberals” share posts that use biased data to make blatantly wrong claims about some of these issues. However, it can be even more frustrating when people on the other side of the “aisle” share very biased sources about things like “black-on-black violence in inner cities” to argue that police officers are not abusing their power in certain parts of the country. Either way, data needs to be examined carefully when it is used to make arguments.

Revolutionising Data

When the last global soccer season ended, back in May of this year, something incredibly crazy had taken place: something no one could have accounted for, something statistics could not justify, something no data in the world could explain. Leicester City, a soccer team barely anyone in global soccer circles had even heard of, had won the English Premier League, finishing ahead of the mighty and globally recognised teams of Manchester United, Manchester City, Arsenal and Liverpool. When its coach was asked for the biggest reason behind their success, he attributed it to their ‘firm belief’. This whole example teaches us one thing: even though the world has undergone a data revolution, there are some things data can never fully account for.

Last Tuesday, Professor Hanlon discussed how data affects every aspect of our lives, while also reminding us of its potential risks. Data sees everything in black and white and leaves no room for abstract qualities. Data is nothing but strings of binary code, transforming input into quantifiable output. Therefore, it is essential that we remain aware of how to utilise data. Let us not forget that data is one of the things which distinguish humans from animals. It makes our lives extremely convenient. However, if we fail to keep control over data and are instead governed by it, then we run the risk of overlooking many other things in life.

Since the data revolution began, the whole world has come a long way, and by no means have we reached the end. It is safe to say that the revolution in data is not yet over. There are more discoveries to be made and more groundbreaking research to be conducted. Our knowledge of the realm of data will only expand. However, that expansion also requires the human element.

The Stats to Back it Up: Living in the New World of Data

For almost a year and a half now, every morning for me has consisted of a somewhat odd routine. Odd in that it includes at least one activity that Americans could not have done ten years ago. Each morning I wake up, shower, get dressed, eat breakfast, and then check my phone for one thing: the polls. Perhaps a sign of insanity, perhaps a simple act of intrigue, my mornings for much of that time have included looking at fivethirtyeight, the statistical news site. My main focus: the U.S. election. Primaries, Democrats, Republicans, Clinton, Sanders, Cruz, and of course Mr. Trump. Specifically, I look for signs that the last of these won’t actually be elected, and I am frequently reassured by the numbers that pop up on the screen.

The fact that these poll numbers have lately been a comfort in this political climate is a phenomenon unique to our modern age. It’s data: hard evidence processed not by humans but by machines, which couldn’t have any subjectivity. It’s a new language, as a chart of the term’s rise in written English shows. That is what Professor Hanlon explored in his lecture, in what was a compelling argument that humans have recently come to look for proof of things and ideas in very different places than we used to.

The main shift Hanlon points out is the historical change from scientific proof being displayed to the public on a purely qualitative platform, as in Micrographia, to a more empirical, quantitative manner. As science advanced, this sort of proof, “data,” became the primary way we as humans chose to back up our ideas, even outside science. Hanlon argues that the increased usage of the word data is evidence of this, and that it has had consequences for how we live our lives as individuals and societies, right up to how we elect a president. And so today we live in a world where evidence is synonymous with data, and statistics are given precedence over other forms of rhetoric. We therefore live in a world where logos is valued over pathos and ethos, something that is new and presents itself in many forms.

The most evident of these is Nate Silver’s media giant, fivethirtyeight. In the current political climate, it is given precedence over the rest of the media and the objectivity of a newscaster, because in a world where computers can completely eliminate subjectivity, a newscaster’s objectivity is now in question. Hence we have found comfort in data as a means of supporting our ideas, which is a new idea in itself. We live, then, in a world undergoing a revolution of data.

The Modern Data Dilemma

The revolutions explored so far this semester have mostly addressed events in the past; the periods of adjustment following them have ended, and their effects have been implemented in society. Khalid Albaih’s lecture addressed an ongoing political revolution in Egypt. The societal implications of political revolutions can be overtly demonstrated to the rest of the world through images, publications, and art. Other, more abstract revolutions can be difficult to detect, and their effects on society sometimes go unacknowledged. The world is currently going through one of these more abstract revolutions. With the development of technology, humans have gained the ability to gather massive amounts of data, so much data, in fact, that we cannot review and process it all.
What is data? In Latin, data is the plural of datum, “a thing given.” Facts and evidence are the results of sets of data, which demonstrates the significance of data to society. The use of data arose as a tool in science around the 17th century. During the Enlightenment and the Scientific Revolution, people could no longer rely on their assumptions to explain phenomena of the natural world; they needed quantifiable proof.
Francis Bacon emphasized the use of images and visual representations of matter to describe natural phenomena. Using words to explain processes was considered less scientific than using visual representations. Robert Hooke’s Micrographia was a perfect example of how images worked better than words to convey scientific knowledge. His book contained images of flies under a microscope, which allowed ordinary people to see what they looked like up close instead of relying on scientists’ words combined with their own imaginations. Visual representations were seen as more scientifically accurate since they removed the subjective imagination that accompanies literary descriptions.
Today, illustrations or pictures like Hooke’s are not considered the most effective way of demonstrating scientific knowledge to a community. To further purge subjectivity from scientific processes, people have started to show information in purely quantitative and measurable terms: numbers. Now, graphs and charts are used to explain processes. For example, in the 19th and early 20th centuries, scientists looked at images of embryos at different stages under a microscope to visualize patterns of allometric growth. Comparing these images was sufficient evidence of allometric growth patterns. Today, these images would need supplemental graphs comparing precise measurements of allometric structures at different stages of growth. As data has become more complex, the modes of portraying its meaning have also evolved.
The development of complex software and computational tools has changed the way we handle data. In the 17th century, scientists dealt with the question of whether a set of data had been obtained through subjective procedures. Today, people rely on technology, which is assumed to be objective, to gather sets of data. The questions now are: what do we do with it all, how do we decide which data sets are worth exploring, and how can we portray this “big data” to society? Storing data is also a modern dilemma. The massive amounts of data produced with modern technology may be overwhelming, and most of it probably serves no use for society at large. However, if people can create programs that effectively collate and organize “big data,” more of it could be useful to society.

Our Visual World

Thinking back to Khalid Albaih’s talk about social media activism, Hanlon’s talk made me think about the saturation of information in today’s society: we live in a world where we hold supercomputers in our palms, are constantly involved in the production and exchange of information, and are surrounded by visual information, yet we do not know what to accept as objective. How did we get to this point? Surely the technological revolution and the dot-com rush ushered in our current era, but how did we get to the point where we accept most of this information, in all of its messiness, without much evidence? Within this issue, why the insistence on visual data? Why do academics, scientists, peers, students, professors, and others use visual models as a means of expressing data?
Hanlon’s lecture at its core expressed the ways that data in particular is a product of rhetorical, theoretical, and epistemological context. That is, all data is a product of a particular historical moment and motive. So what exactly is data? Data refers to information or knowledge that is represented, whether by visual codes or by other characters. Historically, data arose from an empirical tradition that developed into an epistemological evolution in the ways scientists and scriptural scholars made their arguments. Specifically, around the 17th century intellectuals began to rely more and more on scientific studies that valued images and visualization.
So when did data become revolutionary? Hanlon argues that it became revolutionary once the technological and epistemological methods of codifying data grew large. That is, once information around the world became datafied, all of that information, and our understanding of it, became large and heavily codified. So if data has always been visual and is now the main form of evidence, what does that mean for our future?
In a heavily datafied societal context, technology has produced a saturation of visual information. In this context, when will there be a critical reflection on the ways we continue to declare our information visually? The implicit objectivity that accompanies visual representation suggests there is no need to uncover the complexities. But isn’t it important to uncover the complexities of our world? Surely we should not accept all information as it is handed to us. We must critically deconstruct and analyze the broad assertions that accompany the saturation of data. We must not look at websites such as Facebook and willingly accept all of the supposed news or supposed scientific education that comes in images, graphs, and pie charts.

Is Data Always Bad?

In my mind, data should always be used. I may be slightly biased in my assessment of data compared with last night’s lecture. As an economics major at Colby, I see data as the backbone that drives all of our studies. While theories are useful for teaching concepts, it is experimental data that proves or disproves them. For example, in one experiment the theory might suggest one outcome while the real outcome is much more drastic because of behavior we cannot predict. This is why data is so crucial to making progress in the field of economics.

Along with economics, my other passion is sports. In the last ten years there has been a major push toward one idea in particular: sabermetrics. By “the last ten years” I mean that this is when the idea reached a much larger audience. Sabermetrics was the idea behind “Moneyball,” which led the A’s to the playoffs with one of the lowest-paid rosters in all of Major League Baseball. In baseball it changed which numbers matter for a batter, favoring on-base percentage over batting average, but it went much deeper. It began to track which pitches a pitcher threw, since a pitcher who didn’t vary them enough could give the batter an advantage. In basketball, the same kind of analysis studied how players were being used and whether they could be used better. For example, many players are taken out early because of foul trouble, but the team would most likely perform better if it kept them in the game longer and risked their fouling out. In both cases, big data has been used for the benefit of teams and individuals. Is there an argument for not using big data?
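On-base percentage rewards something batting average ignores. Using the standard official definitions (H = hits, AB = at-bats, BB = walks, HBP = times hit by a pitch, SF = sacrifice flies), the two statistics compare as follows; the season line in the second half is hypothetical, chosen only to show the arithmetic.

```latex
\mathrm{AVG} = \frac{H}{AB},
\qquad
\mathrm{OBP} = \frac{H + BB + HBP}{AB + BB + HBP + SF}

% Hypothetical season: H = 150, AB = 550, BB = 80, HBP = 5, SF = 5
\mathrm{AVG} = \frac{150}{550} \approx 0.273,
\qquad
\mathrm{OBP} = \frac{150 + 80 + 5}{550 + 80 + 5 + 5} = \frac{235}{640} \approx 0.367
```

The 80 walks do not move batting average at all, yet they leave this batter’s on-base percentage almost 100 points higher than his batting average, which is the kind of undervalued contribution the Moneyball approach is known for seeking out.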

As I said earlier, because of my studies at Colby and my preference for statistical studies that back up theories, I am biased toward thinking data is always a benefit to society. I think studying data is always very useful, but could the release of data allow people to support their own flawed ideas? For instance, black men are more likely to be incarcerated than white men. This statistic could allow someone to support the idea that black men are more dangerous than white men, an obviously racist idea. Instead, it should more likely prompt the question of how the education and justice systems have failed black men, such that going to jail is so common.

Data has its role in helping us see the world. It has shed light on many areas where we can improve, whether that means a sports team trying to be more competitive or a possible flaw in our correctional system. Today we will always be able to find data about a certain idea or about how to influence a result. How we go about using data, and making sure it is understood, is the most important part.