Category: November 1

What is Data?

To be honest, data has always been one of the words that confuses me, especially when writing papers. Its meaning can change depending on the context of the sentence, and it’s also one of those sneaky words that is plural but doesn’t outright appear to be. So, it’s fair to say that data hasn’t been one of my favorite words to use. However, after four lab courses at Colby, all in the natural sciences, I have become more accustomed to working with raw data.

While Professor Aaron Hanlon’s lecture on Revolutions in Big Data initially seemed boring, as I was not very keen on the subject, I was very surprised to be intrigued and fascinated by his presentation. Hanlon’s lecture looked at the evolution of data in several different ways, including meaning, interpretation, and frequency of use.

According to Hanlon, the first recorded use of the word data was in the early 17th century, as “a heap of data” describing the word of God. This use of the word in a religious or spiritual context makes it sound as if the word is synonymous with truth, but this is not right. Data isn’t fact; it is the first step in formulating ideas, and it builds toward fact and truth. Data has come a long way since then, and has expanded to mean multiple things.

An example Hanlon used was Hooke’s book Micrographia, which showed small insects and organic material, such as fleas and leaves, blown up in drawings to show very fine detail. This in itself was a small revolution: this new form of data changed the way people thought, as they’d never been able to see creatures in such detail before.

The lens and context in which data is presented are also very important. One of the main concerns Hanlon expressed is that in this day and age, when data is abundant and constantly changing, it is easy to misconstrue the meaning of data if you don’t have the context. For example, imagine looking at a medical chart for a patient that shows concerning vital signs. If a doctor were to look at this chart without any previous knowledge of the patient, they could easily think that the patient was in a declining state of health. However, what if the patient’s vital signs had been significantly worse an hour ago and they were actually showing signs of improvement? This shows the danger of taking raw data at face value without understanding the context of the situation.

One of the most interesting parts of the lecture, though, was when he showed us how the frequency of words varied over the years using the Google Ngram Viewer. Not only was I amazed to learn about new software that I could play with, but I was surprised to see how much fluctuation there was in the use of the words fact, truth and data. Around the 1850s, the use of the word fact increases and the use of the word truth decreases. This suggests that as the meanings of words change, their popularity changes, but also that authors were becoming less concerned with feeling and more concerned with fact. However, the word data surpassed the usage of both of these words, showing that data is more all-encompassing and is the building block of both fact and truth.

Data and Decision Making

Despite my apprehension about going to a talk on what I perceive to be numbers in raw form, I really enjoyed the conversation about the idea of data. Professor Aaron Hanlon made some truly interesting points in his discussion about fact and truth, about the origins of the word data as the word of God, and about the evolution of the use of data. In thinking about how I use data, and how I will use data, it will not be as fact or as truth, but as the building blocks of cases for a point in a thesis, investment ideas, and marketing pitches. Data is not fact; rather, it is the first point in thinking about how to formulate ideas and opinions. Data is something that can be manipulated, in that it is something that can be interpreted. Just as important as the numbers that are included are the numbers or series of numbers that are not included. In thinking about data as fact, the point is missed that facts are proven while numbers are not.
I really liked the historical discussion surrounding data, and the trends surrounding the ideas of fact and truth. I think that data does come in all sizes, and can be both quantitative and, more troublingly, qualitative. I think that qualitative data is the epitome of the argument that data is interpretive, even in number form. Qualitative data is something that is not objective.
Something that I have really thought about is the methods of gathering data, and how revolutionary the idea of data mining and gathering has been over the last five to ten years. Recently, Pokémon GO, from the Google spin-off Niantic, has become a sensation that has captured the imaginations of many who were kids any time from 1980 through the present. The data that they are able to collect is truly unbelievable. Using your smartphone’s camera as a guide, they are not only able to advertise through the application, as one might expect, but they are able to collect data on where you are walking, how far you walk, what stores you go into, what food you eat, the clothing you look at, what you are wearing at any given time, whether or not you exercise and how often, and even access your emails, all due to an agreement you sign when you open up the application. In thinking about how this can be extracted, we can see that no single piece of this data tells any given individual any singular truth. However, in aggregate, the collection of these numbers actually tells us a lot about behaviors, which again is something that is to be interpreted. I really liked the way Professor Hanlon ended the lecture with his four points about data. I think that big data truly has been revolutionary in the way he pointed out. How we deal with the globalization of the world and the sheer mass of numbers and points thrown at us helps emphasize a point that I made in an earlier post: we are revolutionary in finding new ways to deal with new problems and issues.

“What does this data mean?” “It’s right there sir, see for yourself.”

Professor Aaron Hanlon delivered a talk on the introduction of the word “data” into the English language in the 17th century. From there, he explored the ways in which it was used and how that has shaped the use of the word today. What began as meaning, literally, “the word of God” has now become a ubiquitous term in everyday language. Professor Hanlon showed the Google Ngram Viewer results for the usage of the words data, fact and truth. A trend can be noted: the usage of “fact” and “truth” diverged near 1850, with “fact” increasing and “truth” decreasing. “Data” also began to increase rapidly in usage throughout the 20th century, while the usage of “truth” plateaued, falling well below the usage of “data” and “fact” by 2000. What can these trends say about how the meanings of these words have changed?

The general trend would tell an observer that the transition has been from a word more related to belief to more neutral and scientific words. The word “truth” implies intimacy, a type of interaction with the information where it is the truth, but it is also taken as the truth. It means the same as fact, but facts imply a separation from the interpreter. Facts are simply what is presented. Data is what builds these facts and truths. The analysis of data can lead to conclusions. Today, data has instead become synonymous with evidence: not merely support for a claim, but something closer to facts and truths themselves. This is problematic because data is nothing without proper context. Data can be framed and manipulated to be evidence for anything.

I’m not sure how revolutionary this idea is, but it is alarming that we are moving towards a reality where data functions alone. Instead of relying on the context, people simply look at the raw data and set aside why or how it was collected in the first place. Data should not reveal anything without some context. This is not a framing of the information, but rather an actual description. In a time with excessive information, it is important to remember this. All of this random information must be supported. And even then, the supported information relies on how and why it was collected and analyzed. This may shape the interpretation and may also lead to faults in the method of collection. There is a lot more to data than simply interpreting the magnitude of numbers compared to one another. Data is extremely important, but it must be appreciated for what it is.

What Professor Hanlon was getting at is that data has always been visual but in current times, it is becoming evidence for people’s claims. What is surrounding the specific data has been losing importance over the years. The interpretation of data cannot be generalized. Depending on a critical lens, not only can data have varying importance, but it can lead to completely different conclusions. When a person looks at data, they need to take context into account and also look at the reason for this data in the first place. Evidence, facts and truths can come from data, but it is wrong to assume that this is always true.

Not Modern Data

The word data first made its appearance in 1646, when it was “coined” by Henry Hammond; at the time, however, the definition of the word was ambiguous. Throughout history, the word data and the materials that constitute data have changed drastically. While most define data today as numerical figures that are quantifiable and comparable to other figures, historically the word data had a much broader meaning. Robert Hooke, a famous scientist of the 1600s, considered data to be anything that reveals something, a greater truth. Studying small organisms using a microscope, Hooke drew images of beetles and other tiny creatures, distributing these images amongst the public and creating a portfolio of his work. In addition, Hooke included lengthy descriptions of these creatures. Almost comically, Hooke personified these species using language such as “the little enraged creature”. As unorthodox as it may have seemed at the time, these beautifully drawn pictures, as well as their lengthy descriptions, were considered to be data. But can this truly be data? Although Hooke was a fantastic scientist and contributed greatly to the scientific community, it is hard to consider hand-drawn pictures accurate data. Lacking the scientific resources we have available now, Hooke produced work that, although telling, was not completely accurate. Although many scientists argue that Hooke in fact did contribute, Hooke’s lengthy descriptions reveal the great difference between today’s concept of data and Hooke’s concept of data. Whatever else the descriptions Hooke published with his works contained, it is imperative to note that these descriptions were opinions, Hooke’s own thoughts in relation to these discoveries. While these may be helpful, opinions are inherently biased; how one person writes about and interprets something that they see is not necessarily how everyone else might perceive that same object or organism.
In this sense, Hooke’s descriptions and even his drawings are not primary sources: because these findings are disputable, they are secondary sources.


So what is modern data and how can it be obtained? Modern data is something that is both quantifiable and not disputable. In contrast with Hooke, modern data must express no bias and be as raw as possible. Such data would include spreadsheets of numbers, graphs or charts filled with indisputable numerical data, but could not include lengthy descriptions filled with superfluous adjectives lest the meaning of these findings be skewed. While many argue that data is in fact conceptual and should not be limited to simply numbers and graphs, the distinction between data and portrayal of truth must be discussed. There is little debate that Hooke’s drawings of beetles and other organisms educated many people, portraying a micro world that hadn’t been previously available to common people. However, as stated above, Hooke fails to present concrete evidence that can’t be argued against because his drawings are filled with opinions. In addition to the descriptions, Hooke may have favored certain parts of the organism’s body unconsciously and portrayed them with greater size and detail. While Hooke heavily contributed to the scientific community, his drawings, along with his lengthy descriptions, can’t be considered “data” in the modern sense.

Data is Not Fact or Truth

Are there ways to fact check data? Aaron R. Hanlon, an English professor at Colby, raised that question in his talk on the Revolutions in Data, Big, and Little, and I bet many have not thought to ask it. Hanlon opened his talk with how we take data for granted. Data comes from the British tradition, entering the English language in the 1600s from Latin, where it was seen as fact and truth, the way we view Google data nowadays. Hanlon discusses the way we misinterpret “data,” a conversation that would truly benefit people today, as we are in the age of technology where “data” is attainable at the click of a button.


Post-Truth/Big Data

Now that Oxford Dictionaries has named “post-truth” its word of the year for 2016, perhaps it is the perfect time to study the origins of data. As Professor Aaron Hanlon said, after all, “data is a big deal.” If big data is about how we ‘see’ information, though, I wonder what it says about our current understanding of big data. Coming off this election, many argue that America is anti-intellectual. Many Americans would rather hear comments during Presidential Debates with words they can understand than try to parse through the educated political arguments former candidates brought to the podium. Policy? Foreign Affairs? None of that seemed to matter this election, and neither did any ounce of fact checking.

Professor Hanlon showed us the Ngram Viewer and its results for “data,” “fact,” and “truth.” The graph clearly demonstrates there has been a sharp decline in “truth” over the years, while “data” has seemingly surged from the abyss. But what does this mean in a post-truth year? If data is “a thing given” and anyone can label anything as data nowadays without real evidence backing it up (aside from fancy credentials or buckets of corporate money behind them), then anything can be that which is given. Data, perhaps, has become exclusive from truth. After all, Professor Hanlon explained how data as we know it now was born from Scriptural Data. These givens were practiced by Bernard in Faithful Shepherd back in 1607; indeed, he claimed that the church needed to be more plain and use less verbose language for the “truths” of scripture. The image became the trusted source because words “can mislead us.” I wonder, though, what this means in this day and age of Photoshop. Can we trust images? Can we trust words?

What is more, in this age of post-truth and anti-intellectualism, does anyone really read the data? Or is it something to attach to a tweet or a HuffPost blog post that will be taken as fact without a second glance? Professor Hanlon ended his lecture with four main points: 1. Data has always been visual 2. “Big Data” was a conceptual revolution as much as a technological one 3. When data becomes the main form of evidence, that’s revolutionary 4. All data is rhetorical and theory-laden. In response to his third point, he commented that we must be more discerning between which questions deserve which responses. But what implicit value judgments are made in “data?” How does one attempt objectivity? What else is implied in ‘just observation’? And if it is true that “context shapes quality,” who today is truly discerning the quality of the context of data? While there are plenty of websites fact-checking President-Elect Trump’s comments (even real-time tweet fact-checkers), I think it is pretty clear that no one cares if he is lying. No one cares if he is racist. No one cares if he brags about sexually assaulting women. No one cares. We are in the age of post-truth. Is this actually a revolution? Are we experiencing a revolutionary moment?



Data’s Power

In this lecture, Professor Hanlon discussed an interesting topic: the data revolution. He argues that, for much of human history, the use of data as visual proof has been a revolutionary approach to illustrating one’s point. His main argument is that since data is visual and seeing is believing, data is a powerful tool for delivering messages or arguments.


The Malleability in the Social Media Age

Listening to Professor Hanlon, Assistant Professor of English here at Colby, speak about the dangers of misrepresenting data and the need to prioritize collecting valid data and representing it as such, I found myself thinking immediately of the recent political events in the country. Furthermore, Professor Hanlon’s discussion of data sourcing, and of the lack of focus on disseminating data from reliable sources, resonated with me as I thought about the cost of false media in the 2016 election. Growing up in an urban area, especially as a student in the 21st century, I have been exposed to the pinnacle of technology, interaction, and conveyance of information about events in our country. On a daily basis, I hear discussion amongst my peers that mentions Twitter articles they saw retweeted hundreds of thousands of times, and I find myself wondering how on Earth one would know if President Obama had made a final checklist of dogs to adopt as his family’s second pet.

Professor Hanlon makes it a point to stress the weight and influence that data can have on any person’s opinion or stance on a person, topic, or event. Take the fact that millions of Americans watch Fox News every night: following the shooting in Orlando, I remember seeing a stat, which later disappeared (for inaccuracy), suggesting that 13% of Muslims supported radical Islam. After the lecture, I found myself wondering how many Americans had believed this about Muslims, as millions of viewers could have been swayed by that completely outlandish, incorrect stat shown one night on Fox. I also think about more recent, concrete events that have consumed not just my generation or the Colby College campus, but essentially the entire country. More specifically, I remember reading a BuzzFeed piece (not exactly a reliable journal, but with citations) that showed Facebook had featured fake news articles that reached over 10 million people in the worst cases. Days before the election I read an article stating that Hillary Clinton was finally being put on trial, and that she would not be participating in the election; this is simply the epitome of what Professor Hanlon warns against. Although at age 19 I know such an outlandish article is fake, I created my Facebook account when I was in 7th grade, and a younger audience could certainly take this as truth.

Having not taken a political science class at Colby, and not being particularly interested in politics in my free time, I find myself more susceptible to this political dogma. However, after Professor Hanlon’s talk, and after reconciling his words with past events, I find myself understanding and appreciating the importance of good data more than I have in the past. As I continue my academics, but more importantly continue as a millennial, I think making an effort to consume, and more importantly promulgate, data that is valid, well-supported, and influential is important for me and my peers.

Data and Context

The term “data” is not unique to our generation. We were raised in a society dominated by technology and are privileged enough to store and remember most, if not all, moments of our lives. Data is part of our daily jargon: we have data plans on our phones, we store our files in databases on our computers, and we talk about data when backing up claims in an article…

Although the concept may seem contemporary, the term data originated in the 17th century, when it meant scriptural givens. A “heap of data” was understood to mean the word of God, something that is undebatable. Over the centuries, data changed in meaning to the result of experimentation. Now, data has started to stand in for the term “evidence,” and has in many cases come to replace it.

To understand data, we need context. Without it, data is just information with no rules, boundaries, or meaning. The contexts which define the interpretation can make the same data set mean two different, and even opposing, things. For example, in the 16th century, Tycho Brahe recorded and documented extensive astronomical data. Tycho used it to support his own cosmological system, the Tychonic system, in which the Earth remained stationary in the center while the sun and moon circled around it, and the other planets circled the sun. However, after his death, his assistant Johannes Kepler used the same data to support the Copernican system, and eventually his own elliptical orbits. It turns out that the Tychonic and Copernican systems were mathematically equivalent, so Tycho’s same data could be used to support either interpretation.

We do not collect data for nothing. Data collection has a purpose: whether it serves as the proof of a theorem or the result of an experiment, it is done with purpose. This is why the context cannot be ignored, because if we take something that was done for the sake of something else and consider it without its purpose, it becomes meaningless. Data has no truth. It is information as an abstraction with no content.

However, we are shifting into a mode of operation in which nothing needs to be explained anymore. Information is turning into a standalone unit, and we no longer need to describe or explain it. This is the moment where data visualization starts being more and more important, and where Professor Hanlon’s talk is particularly relevant. Data has always been visual and is starting to become the main form of evidence. The context matters less and less, and a single interpretation of data is assumed. How do we implicitly agree on one interpretation? Where does the consensus come from? These are a few of the critical questions we must ask ourselves before we commit to an interpretation with no way back.

Revolutions Among Revolutions

Data is essential to the acquisition of new knowledge and the accurate transfer of this knowledge to wider audiences. In his lecture, Hanlon defined data as “a thing given.” This is accurate, because data is so often taken for granted. The research and intent behind data collection are forgotten as the numbers, percentages, and predictions are blindly accepted. It is easy to forget that before data was widely used, illustrations and descriptions were the main form of evidence, making knowledge much more difficult to convey with accuracy. The more unsavory aspect of data revolutions, as Hanlon mentioned, is the accompanying decline of the meaning associated with the numbers.

Hanlon cited Robert Hooke’s Micrographia as an example of the pre-data depiction of research. Hooke used images from a microscope to explain his findings, and directed readers to the image when words did not suffice. This combination of images and written explanation requires readers to think more deeply about the information presented, whereas facts and statistics can be glossed over and easily forgotten. Since the information is more difficult to thoroughly present this way, more thorough thought is required to understand it. If a researcher has to interpret a picture or long explanation, the information is much more powerful.

Today, however, information is conveyed much differently. According to Hanlon, the use of the words “data” and “information” increased significantly around the year 1950, and there was no accompanying rise in the use of the word “meaning.” This suggests that data and information are becoming more prevalent, which implies progress and an increase in research and applicable science. However, meaning is becoming detached from these words. Just like the graph Hanlon presented, where the line representing “data” sloped upward and the line representing “meaning” remained stable, data and its meaning are diverging.

The trajectory of this graph can perhaps be attributed to trends occurring in contemporary society. The data revolution, it seems, can be connected to the technological revolution. This is plausible because the use of the word “data” increased around the year 1950, which is about the same time as the invention of the computer. These revolutions may have built on each other, and together contributed to a rising impersonality of information. The computer, for example, made immeasurable amounts of data available at the click of a button; it became easy to find the information one sought. Similarly, as data evolved from words and images to numbers and statistics, it became easier to digest: researchers did not have to read deeply as frequently to seek out the purpose of the information. This detaches researchers from the process behind the acquisition of knowledge, and from the connections between different pieces of information, because bits of data are so readily available.

The data revolution encompasses the progression from image and written observation-based information to graphs and numerical data. This is scientific progress and shows a growing understanding of information, at least on the part of those who actually collect it. Those who digest it, however, are discouraged from thinking deeply about it because they do not have to.
