As a child, I was fascinated by water. I would sit and watch the winter snow thaw, droplets collecting on the tips of icicles as it melted and falling into little puddles on the ground. I’d watch those puddles soften and saturate the earth. We had a stream running through the back of our property, and it became the center of my attention on many spring and summer days. There was little else to do in the vast woods of upstate New York but watch a stream flow. Always different. Always the same.
For these reasons, I became enamored with flows, streams, poolings, and lakes.
Little wonder then that as an adult I spend my time thinking about flows, streams, pools, and lakes — but not of water — of information. Data.
This era we are living through, right now, is the era of information. And so spending your energy thinking about data is a worthwhile enterprise. Data has become the currency of many businesses: to have data is to have power, insight, the ability to predict. To be dataless is to be ignorant, striking at ghosts in the dark.
This is the information age because of the volume and detail of all the data generated and consumed and because of the things we are able to do with data.
If you’re a well-functioning member of society, then you generate a good amount of it each day. You generate it while sleeping, when your Fitbit tracks your sleeping hours. You generate it in the morning when you tweet about your dreams, Instagram your breakfast, and Facebook about the jerk who cut you off on your drive to work.
Aside: I just used Instagram and Facebook as verbs in that last sentence and you know what I meant by it.
All day on the internet you create data. It flows from your browser into massive data centers, where it’s stored cheaply in databases optimized to retrieve it quickly and process it effectively. That data, your data, can be accessed by hundreds (thousands?) of people you don’t know, people who might sit at the far end of your six degrees of separation. They are pushing your data, the comments you left on a cat video, through pipelines custom-built to handle this kind of information.
Algorithms trained to understand aspects of who you are and how you act feed on your data. Many of them were built to persuade, influence, and engage. Their purpose might be to better target cat food ads in your direction, or they might be looking to influence you with personalized Right/Left political videos. Surely, some of these models were trained for nobler causes. They might identify the early stages of a flu outbreak, check up on you when you’ve been acting strangely, or help us understand humanity more broadly.
After all, it’s not the technology that’s to blame: it’s how it’s used by those with the power to use it.
If what I have described thus far has caught your attention (and I suppose it has, because you’re still with me here), then maybe you too want to become a data engineer.
I say too because I’m also learning. Sure, I’ve worked with, thought about, and talked data for almost all my working life, but the field still feels young. Our tools are evolving to cope with the cavalcade of information streaming into the warehouses. Every second of every hour of every day.
And it’s exciting to see how the field and society will change in the presence of all this data and all the things we can do with it.
I won’t presume to tell you how to become a data engineer. But I will tell you my plan. My plan is to look at and think about data as much as possible. That’s precisely why I’ve started a video series working with data pulled from giants like Reddit and YouTube.
And it is why I’m signing up for the data engineering nanodegree from Udacity. The program promises hands-on work.
That’s what I think it takes to become a data engineer. Maybe it’s what’s required to become any kind of engineer or any kind of anything really. Hands-on work. Lots of it.
So I am asking you to follow this journey on my YouTube channel and here on this blog. I’ll make every attempt to document my progress and let you know what I honestly think of it.
But what about you? Are you a data engineer or data scientist? If not, do you want to become one? What do you think a blog like this one could do to help? What are you struggling with, what don’t you understand or wish you understood more deeply? What have you read that you think should become a staple in the field?
You can catch me here on this blog and on YouTube, getting my hands dirty.
I hope you’ll do the same.