What is the current state of data products and what’s their future?
Data products are applications or services whose core business rests on aggregation, insight, prediction and/or display of information. You don’t hear about these kinds of businesses much in the news, but you use them a lot. Google search is the preeminent example, a product that millions of people use multiple times a day. And there are other examples too, profitable ones that we’ll walk through in this post. The prevalence and scope of these products will likely increase in the near term as businesses collect more, and more diverse, kinds of data on and about individuals. And once you have data about a network of individuals, you can start deriving insights about individuals based on how that network operates.
Being able to work with data is a valuable skill not only for employers but for yourself. There are data products that you could be building, today.
What are some examples of data products?
I asked this question to the Reddit data science community. I’m grateful to those who responded, but overall participation wasn’t enthusiastic.
Having not heard of Google ngrams previously I decided to take them for a spin.
Interesting use of data but not necessarily a product.
The second commenter mentioned Spotify and their song recommendation engine. Again, it’s a neat use of data that could be turned into a product, but that recommendation engine is not the product. The product for Spotify is a subscription for advertisement-free listening to music. The product for Google is you: your searches and your attention on the ads you see every time you search.
The data products that Google has put forward interest me in the way that the planet Mars interests me: Exciting to think about but I don’t expect to get there anytime soon. Instead, I want to focus on data products that you could conceivably build yourself — with a bit of effort and luck. Let’s think about products that already exist or will soon exist.
So here’s what I put together as examples of data products that individuals or small teams of individuals could conceivably create themselves.
Information aggregators: SEO and marketing analytics (e.g. Ahrefs, SEMRush)
This data product relies on gathering a lot of information. Let’s think about a site that displays rankings for Google search terms. This information is valuable because it allows businesses to plan out how to market themselves and teams to situate their products for selling to a wider audience. How profitable is such a business? Take a look at the pricing options for just one of these websites:
If you want to be a major player you’ll need to scrape stats on every website you can find and do it with some regularity. This is why the services provided by sites like www.ahrefs.com (pictured above) are so valuable. The team behind this site knows its value and charges accordingly.
It would take some time and a lot of engineering to compete with a site like Ahrefs. But you could build a product that covers a specific niche.
Here’s how you’d build a data aggregation service
- Build a system that googles keywords;
- For each link that is returned for that keyword, follow it to the site;
- Follow the links on that site to the other places they go;
- Grab and store this content, strip out the keywords in the content;
- Store all of this information in a database;
- Update that database regularly;
- Find a way to investigate the data from different angles and present it to customers.
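To make the middle of that list a little more concrete, here is a minimal sketch of the "grab, strip out keywords, store, and query" steps using only the Python standard library. It assumes you have already fetched the HTML somehow (the fetching and Google-querying steps are left out on purpose), and every name in it (`PageParser`, `store_page`, `top_keywords`, the `keyword_counts` table, the tiny stopword list) is my own invention for illustration, not how any real SEO service does it.

```python
import re
import sqlite3
from collections import Counter
from html.parser import HTMLParser

# Tiny illustrative stopword list (an assumption; a real one would be longer).
STOPWORDS = {"the", "and", "for", "that", "with", "this", "are", "you"}


class PageParser(HTMLParser):
    """Collects outbound links and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip = 0  # depth inside <script>/<style>, whose text we ignore

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text_parts.append(data)


def extract(html):
    """Return (links, keyword counts) for one page of HTML."""
    parser = PageParser()
    parser.feed(html)
    words = re.findall(r"[a-z]{3,}", " ".join(parser.text_parts).lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return parser.links, counts


def init_db(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS keyword_counts ("
        "url TEXT, keyword TEXT, count INTEGER, "
        "fetched_at TEXT DEFAULT CURRENT_TIMESTAMP, "
        "PRIMARY KEY (url, keyword))"
    )


def store_page(conn, url, html):
    """Store keyword counts for a page; return its links to crawl next."""
    links, counts = extract(html)
    conn.executemany(
        "INSERT OR REPLACE INTO keyword_counts (url, keyword, count) "
        "VALUES (?, ?, ?)",
        [(url, keyword, n) for keyword, n in counts.items()],
    )
    conn.commit()
    return links


def top_keywords(conn, limit=5):
    """One simple 'angle' on the data: the most frequent keywords overall."""
    return conn.execute(
        "SELECT keyword, SUM(count) AS total FROM keyword_counts "
        "GROUP BY keyword ORDER BY total DESC, keyword LIMIT ?",
        (limit,),
    ).fetchall()
```

Re-running `store_page` on the same URL overwrites the old counts, which is one simple way to handle the "update regularly" step. The links it returns are what you would feed back into your crawler.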
In this way, your customers will be kept up-to-date on the latest trends, articles, and content being produced in their niche. And they will pay for access to this information. You could charge customers a fee for every keyword they search but what appears normal in this category of data product is the monthly subscription.
Dashboards, analytics, and insight (e.g. Datadog)
Once companies grow to a certain size they start collecting data internally on how well their operations (and applications) are performing. They typically ask for a dashboard to monitor this information. It’s hard to tell how most businesses across America handle aggregating and displaying this information. Do they use Google Sheets or are they relying on something more advanced or more primitive? Are they even tracking their own data at all? The businesses that are going to survive through adolescence are tracking their own data; the question is how.
Services like Datadog provide dashboards and information about what is happening in the business application in real time. It’s money to the business and to the dashboard vendor, in real time. Other services like Tableau can easily and convincingly produce reports for the company to act upon.
Here’s how you’d build a dashboard service
- Pick a niche with very particular and predictable data, a lot of it. It also helps if this niche has real trouble managing their data. It doesn’t help you to build a product no one needs;
- Build out vibrant, real-time dashboards using dummy data (shiny sells);
- Pass that data through existing libraries for cleaning/manipulation;
- Use open source tools to visualize and share that information with the team internally.
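Here is a rough sketch of what the dummy-data, cleaning, and display steps might look like at their very smallest. It uses only the Python standard library and draws the "dashboard" as text bars, which is obviously a stand-in for a real charting tool. The metric (`latency_ms`), the five-readings-per-minute layout, and all function names are assumptions I made up for the example.

```python
import random
from statistics import mean


def make_dummy_readings(n, seed=0):
    """Generate fake latency readings, five per minute, with some gaps."""
    rng = random.Random(seed)
    readings = []
    for i in range(n):
        value = rng.uniform(100, 300)
        if i % 10 == 9:
            value = None  # simulate a dropped sample
        readings.append({"minute": i // 5, "latency_ms": value})
    return readings


def clean(readings):
    """Drop missing or impossible (negative) readings."""
    return [
        r for r in readings
        if r["latency_ms"] is not None and r["latency_ms"] >= 0
    ]


def summarize(readings):
    """Roll readings up into per-minute count, mean, and max."""
    by_minute = {}
    for r in readings:
        by_minute.setdefault(r["minute"], []).append(r["latency_ms"])
    return {
        minute: {"count": len(vals), "mean": mean(vals), "max": max(vals)}
        for minute, vals in sorted(by_minute.items())
    }


def render(summary, width=40):
    """Draw one text bar per minute, scaled to the highest mean."""
    peak = max(s["mean"] for s in summary.values())
    lines = []
    for minute, s in summary.items():
        bar = "#" * max(1, round(width * s["mean"] / peak))
        lines.append(f"min {minute:>2} | {bar} {s['mean']:.0f} ms")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(summarize(clean(make_dummy_readings(60)))))
```

Swapping `make_dummy_readings` for a function that reads a customer's real data is the point where a demo like this starts turning into a product.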
There are two kinds of businesses — ones that sell vitamins and ones that sell pain relievers. You want to be in the pain relief business. Grabbing and visualizing data for individual companies, especially ones for whom this kind of manipulation doesn’t come naturally, is a pain-relieving proposition.
Content curation and generation (e.g. Article Forge, Botnik)
These services are in the league of automated news reports. Think about the last time you read about the day’s weather, how a particular stock was performing or even a listing for an apartment. How many of those articles, do you think, were written by a reporter? Fewer and fewer. And once these content generation sites fine-tune their pipelines there will be few ways for humans to compete with the machines. Luckily, if you’re reading this blog then you are probably familiar with the tools and techniques needed to get you started.
Automated content generation is a way to extend your influence and multiply your output.
One example of automated content generation is Article Forge, which can’t wait to start charging you for its service.
I listed Botnik as another example of this kind of service, although they are a different thing entirely and it’s not even very clear how they make their money. Botnik is a community of artists, poets, PhDs, and creatives who are creating what is essentially art with the latest data science techniques. Anyone can join their community and start creating alongside their users. It’s an exciting way to create content…collaboratively.
In the near future, I think we’ll see the rise of a hybrid approach in which businesses can approach a company to source articles and videos, potentially ones generated by machine learning models rather than written by freelance writers.
Here’s how you’d build a content generation service
- Gather a lot of data, again, preferably about a particular niche;
- Build some kind of deep neural network pipeline that can generate new variations of that content.
Notice how I didn’t just say text. In the next few years, I think we’ll see a lot of generative images and videos. This technology is already available, it just hasn’t been put to great use thus far. Someone will figure out how to use it profitably. It might as well be you.
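A deep neural network is too much to show in a few lines, so as a stand-in here is a word-level Markov chain, a much simpler technique that still follows the same "gather a niche corpus, then generate new variations" shape. To be clear, this is not a neural network and it only handles text. The function names and the `order` parameter are my own choices for the sketch.

```python
import random
from collections import defaultdict


def build_chain(corpus, order=2):
    """Map each run of `order` words to the words seen following it."""
    chain = defaultdict(list)
    for doc in corpus:
        words = doc.split()
        for i in range(len(words) - order):
            chain[tuple(words[i:i + order])].append(words[i + order])
    return dict(chain)


def generate(chain, length=20, seed=None):
    """Random-walk the chain to produce up to `length` words of text."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # sorted so a seed is reproducible
    out = list(state)
    while len(out) < length:
        followers = chain.get(tuple(out[-len(state):]))
        if not followers:
            break  # dead end: this word run only appeared at the end of a doc
        out.append(rng.choice(followers))
    return " ".join(out)


if __name__ == "__main__":
    listings = [
        "sunny two bedroom apartment near the park with a large kitchen",
        "cozy one bedroom apartment near the station with a small balcony",
        "bright studio apartment near the river with a large balcony",
    ]
    print(generate(build_chain(listings), length=12, seed=1))
```

With a corpus this small the output mostly stitches the input sentences together, which is exactly why the first step in the list (gather a lot of data) matters so much.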
Machine learning model as a service (e.g. DeepArt, Lyrebird)
This is a new class of products composed of trained machine learning models that are experts at doing something. The something here could be classification, pricing, voice synthesis or manipulating pieces of art. The businesses we’ve reviewed thus far certainly have some machine learning models backing them but the product isn’t strictly the output of the model.
Here’s how you’d train a machine learning model as a service
- Gather a lot of data about a particular niche (this was true for all previous products but especially true here);
- Build machine learning pipelines to generate models;
- Rent those models out to consumers on the web. It seems fairly standard now to charge a fee per prediction.
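The last two steps can be sketched in miniature like this: train a model, then wrap it so that every prediction is counted against the caller and billed. The "model" here is a one-feature least-squares line, chosen only because it fits in a few lines, and the `MeteredModel` class, the API key strings, and the fee are all hypothetical. A real service would sit behind a web endpoint and a payment provider, neither of which is shown.

```python
from collections import Counter


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


class MeteredModel:
    """Wraps a prediction function and counts calls per API key."""

    def __init__(self, predict_fn, fee_cents):
        self.predict_fn = predict_fn
        self.fee_cents = fee_cents
        self.usage = Counter()

    def predict(self, api_key, features):
        result = self.predict_fn(features)
        self.usage[api_key] += 1  # only bill predictions that succeeded
        return result

    def invoice_cents(self, api_key):
        return self.usage[api_key] * self.fee_cents


if __name__ == "__main__":
    # Made-up training data: apartment size -> monthly rent.
    slope, intercept = fit_line([30, 50, 70, 90], [900, 1300, 1700, 2100])
    service = MeteredModel(lambda size: slope * size + intercept, fee_cents=5)
    print(service.predict("customer-a", 60))
    print(service.invoice_cents("customer-a"))
```

Keeping the metering in a wrapper rather than in the model itself means you can swap in a better model later without touching the billing logic.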
Where are we going from here?
This is what I have thus far. I’m interested in hearing what you think so let’s start talking in the comments below. Maybe we can get a conversation flowing about different kinds of data products — what are you seeing out there and what gets you excited?
Over the next few weeks, I’ll start putting out coded demonstrations of each of these products. They won’t be ready to sell to anyone, but perhaps one of them will spark an idea for you and someone will take it further.
Get in touch and let’s collaborate
And, you guessed it, I’m interested in building data products too. If you have a passion for data and software engineering, contact me at firstname.lastname@example.org with some ideas for what you’d like to build.
Maybe we can collaborate and build the next great data product. 🙂