The REU program provides something that no other internship or program can. Whether you plan to go to graduate school or straight to work after graduation, REU is still the right choice. Why? Because the experience you gain from the program pushes you forward in both directions.
When I started, I wanted to see whether research was something I wanted to pursue, and it indeed is. However, even if I had decided that graduate school wasn’t for me, the experience I gained would still carry over to the workplace.
Is REU the right path for an undergraduate student? It absolutely is, no matter how you look at it.
During this week, I added the last section to my extension. This section shows real-time data and visualizations of the logs in the database. I worked out some little details and wrapped up the project. For now.
I learned a lot and did a lot during this summer. However, my favorite was creating the Chrome Extension. For me, it is very fascinating to see creations come to life. It is amazing to watch the work you have done come together and become something useful.
The hardest thing I did this summer became my most favorite thing.
I have combined the past two weeks together because I ended up having a short week during the first week of July. I became a US citizen last Friday.
I have fully devoted my time to developing my project, the Chrome extension for tracking, and I have made tremendous progress. As of right now, the extension can track click events on selected elements and store them in the database. There is also a filtering function that narrows down the list of elements on the page, so you can find the elements you want to track more easily.
The data is stored as shown below to make it easier to search through the logs and organize them. Each event record includes a user ID identifying the user, the event type, a timestamp, and the view.
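To make the shape of a log record concrete, here is a minimal sketch of how one of these event documents might be built before being stored — the exact field names and values are my assumption, not taken from the extension’s actual code:

```python
from datetime import datetime, timezone

def make_event(user_id, event_type, view):
    """Build one log document with the four fields described above.
    Field names ("userid", "type", "timestamp", "view") are assumed."""
    return {
        "userid": user_id,                                  # identifies the user
        "type": event_type,                                 # e.g. "click"
        "timestamp": datetime.now(timezone.utc).isoformat(),  # when it happened
        "view": view,                                       # which view it occurred in
    }

event = make_event("user-42", "click", "dashboard")
```

A document like this can then be inserted into the database collection as-is.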
I plan on adding more trackable events, such as drag, zoom, and entering and leaving views, to capture extra information about user interactions. The user interface will also be updated to improve accessibility.
I finally finished cleaning up the entities in the database. I wrote a few more scripts to make the process smoother and cut down on manual work — three scripts, to be exact, one for each entity type. These scripts allow changes to be made at any time, so if an entity needs to be fixed, it’s just a matter of loading the corresponding script and running it. New backups were made of the database (hopefully we won’t need them).
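The cleanup scripts themselves aren’t shown, but the core of one might look something like this sketch: a hand-maintained corrections map applied over the entities, dropping known false positives and fixing known misdetections. The entity values and corrections here are made up for illustration:

```python
# Hypothetical corrections map: raw entity -> cleaned entity.
# A value of None marks a known false positive to be dropped.
CORRECTIONS = {
    "new york times": "The New York Times",
    "the the": None,  # false positive
}

def clean_entities(entities):
    """Apply the corrections map; drop false positives, keep unknown entities as-is."""
    cleaned = []
    for entity in entities:
        fixed = CORRECTIONS.get(entity, entity)
        if fixed is not None:
            cleaned.append(fixed)
    return cleaned
```

Because the fixes live in a plain map, rerunning the script after editing the map is all it takes to re-clean an entity category.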
I also started working on the other project assigned to me: a Chrome extension that tracks user interactions within the browser. The goal is to create a universal tool that can track user interactions across different interfaces and store them in a database for further analysis. I might add a visualization tool as well, so there is some real-time information available for the people running the study.
We can never avoid mistakes; we can only make them. Whether it’s a missing } or ; in our code or deleting a database by accident, the only thing that matters in the end is to learn from those mistakes (hence why I keep making backups, maybe too many).
I spent this week managing the database. I made new backups and updated the data.
One of the issues we encountered when developing the interface was that the tweets’ text was encoded improperly, so there were a lot of characters in the database that were not being recognized correctly. After trying many different solutions, one line of code managed to do the job (I don’t really know how it works, but it does).
The links were also removed from the tweets using a regex, since they were not going to be used.
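The exact one-liner isn’t shown above, but a common repair for this class of garbling — UTF-8 bytes that were mis-decoded as Windows-1252, producing sequences like "â€™" where an apostrophe should be — is to reverse the mis-decode, and the link removal is a one-line regex substitution. A sketch of both, under the assumption that this was indeed the kind of encoding problem involved:

```python
import re

def fix_mojibake(text):
    # Undo UTF-8 bytes mis-decoded as Windows-1252 (one common cause of
    # unrecognized characters in scraped tweet text).
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # already clean, or garbled some other way

def strip_links(text):
    # Drop URLs, since they are not used in the analysis.
    return re.sub(r"https?://\S+", "", text).strip()

cleaned = strip_links(fix_mojibake("Donâ€™t miss this https://t.co/abc123"))
```

The try/except makes the repair safe to run over already-clean text, which simply round-trips unchanged.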
All that is left to do is clean up the entities. There are three categories we are using; however, each category has false positives as well as entities that go undetected. This is not something a computer can easily fix. Writing code to clean up the entities helps, but some manual labor will always be involved.
This week has been a little slower compared to last week, but still very insightful and full of new learning opportunities and experiences.
With the data finalized, I worked on setting up a MongoDB database so the data can be stored in the cloud and all the team members can access it from anywhere. Besides the tweets, some entities and geolocations were also stored there. All of this data is being used for visualization purposes, such as mapping the tweets, building a social network of the accounts, or generating a word cloud.
Yesterday I ran into a problem: the database containing all the tweets got deleted by accident. Thankfully, we had the data stored locally, so we could restore it, but we had to update it again to match what we wanted. It took a couple of hours, but in the end it was a lesson learned. I ended up making a backup database and copying all the data to it.
I have also been asked to read previous research papers for a literature review and take notes on them. It is interesting to find information related to the current research and look at the methods used in the past to get similar results, or results that can help with current projects.
I have also been learning some web development tools such as Node.js. Setting up new programs and platforms and getting everything to work properly is always a challenge. No matter how many years of experience you have, you’ll always run into technical issues that make you feel like you’re having a midlife crisis.
We finally managed to compile into a spreadsheet a list of all the Twitter accounts that post potentially fake or biased news. This list will be used later to filter the data collected from Twitter using the Twitter API. Right now we have more than 100,000 tweets.
After getting Anaconda installed, I began learning Python. I must say it was fairly easy: having programmed in C++ and Java for a while, the basic concepts were already there, so learning Python was just a matter of learning how those concepts are written in it. It is a very handy programming language and very easy to use.
I created a script that filters the tweets based on a given list of account names, so the accounts we don’t want won’t be included in our data. I did some exploring on my own and created a word cloud based on the content of the tweets; hopefully I’ll get to share that during my poster presentation. I also learned a bit about MongoDB and how it can be used to extract data from a database and analyze it with Python.
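The filtering step is essentially a membership test against the blacklist from the spreadsheet. A minimal sketch, assuming each tweet carries the posting account’s screen name in a field called `screen_name` (the field name and the account names below are made up for illustration):

```python
# Hypothetical blacklist drawn from the spreadsheet of unwanted accounts.
BLACKLIST = {"fake_news_acct", "biased_daily"}

def filter_tweets(tweets, blacklist):
    """Keep only tweets whose posting account is not on the blacklist."""
    return [t for t in tweets if t["screen_name"].lower() not in blacklist]

tweets = [
    {"screen_name": "fake_news_acct", "text": "..."},
    {"screen_name": "real_reporter", "text": "..."},
]
filtered = filter_tweets(tweets, BLACKLIST)
```

Using a set for the blacklist keeps each lookup constant-time, which matters when the tweet collection runs into the hundreds of thousands.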
Today I set up a database with all the tweet data so everyone working on this project can access it from the cloud instead of storing it locally. Even though this data is public and available to anyone (it contains no personal information), it is good, safe practice to keep the data securely in the cloud rather than only on local machines.
I started my first day as an REU intern by getting my IRB certificate so I can study human subjects ethically.
After meeting my mentors, I was told that I will be working on a project involving fake news, Twitter, and confirmation bias.
For the first week, my partner and I looked at a dataset of Twitter accounts, some posting real news and some posting fake news, and determined whether those accounts were reliable sources for extracting tweets and other data.
As of today, Friday 6/2/17, we have finished creating a list of accounts that will be used to extract tweets from for this particular study.
Overall, I want my REU experience to be a place where I get to apply what I’ve learned so far and pick up skills that will help me in a future workplace or in graduate studies. For me, REU will determine whether I pursue a graduate degree. Beyond that, I want to learn more about data visualization and data science and how they can be used to discover new things or improve what we already have.
By the way, Anaconda is very complicated to install on Linux.