Project Retro: Tech Stack, Successes, Challenges of OnlyNews

OnlyNews was a personal project I created in the summer of 2022. Its purpose was to take your Twitter timeline, extract the links shared by users you follow, and create a website similar to the one below that serves you ‘custom news’.

The motivation for the project was to create an experience with personalized, relevant news articles from different sources. While Twitter is (was?) great for the banter, it’s also usually my go-to for news. My issue was that every time I logged onto Twitter, I would end up looking at tweets rather than articles. I wanted to create an experience that put news first.

The site still works as of today (3/8/23), but will be deprecated as soon as Twitter fulfills their promise to scrape their proverbial couch for pennies and end free access to their APIs.

Tech Stack

Django

One of the motivations for creating the site was to learn Django, a Python-based web framework. I used the code presented in this piece as my general template for using Django to store Twitter auth credentials (Django lets you set up a user authentication system, but this approach lets you use Twitter credentials as a proxy for Django credentials). Every user who authenticates via Twitter is stored as a user in Django, and the articles from their timeline are stored in the Django database, so that future web sessions pull the stored articles rather than taking the time to hit the Twitter API live.
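To make the store-then-reuse idea concrete, here is a minimal sketch rather than the actual OnlyNews code. It assumes tweets arrive in the Twitter API v1.1 JSON shape, where links sit under entities.urls[].expanded_url, and it uses a plain dict as a stand-in for the Django database. The names ARTICLE_STORE, extract_links and articles_for are all hypothetical.

```python
from typing import Callable

# Hypothetical stand-in for the Django table of stored articles, keyed by
# user id. The real site would use a Django model here instead of a dict.
ARTICLE_STORE: dict[str, list[str]] = {}


def extract_links(tweets: list[dict]) -> list[str]:
    """Collect expanded URLs from tweets, dropping duplicates and keeping order.

    Assumes the v1.1 tweet shape: tweet["entities"]["urls"][i]["expanded_url"].
    """
    seen: set[str] = set()
    links: list[str] = []
    for tweet in tweets:
        for url in tweet.get("entities", {}).get("urls", []):
            expanded = url.get("expanded_url")
            if expanded and expanded not in seen:
                seen.add(expanded)
                links.append(expanded)
    return links


def articles_for(user_id: str, fetch_timeline: Callable[[str], list[dict]]) -> list[str]:
    """Return stored links if present; otherwise fetch, store, and return them."""
    if user_id not in ARTICLE_STORE:
        # Only hit the (slow, rate-limited) API when nothing is stored yet.
        ARTICLE_STORE[user_id] = extract_links(fetch_timeline(user_id))
    return ARTICLE_STORE[user_id]
```

The second call to articles_for for the same user never touches fetch_timeline, which is the whole point of storing articles: a page load becomes a database read instead of an API round trip.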

Building from a template rather than creating code from scratch was definitely a shortcut, for better and worse. I discuss that tradeoff below. But overall I really liked getting exposure to Django and definitely see myself using it again for future projects.

HTML/CSS

The frontend of the site was built via HTML/CSS. I chose to keep it simple and not include any JavaScript components, as this was the first time I built something substantial via HTML. Overall I was pretty happy with the result; while there are still parts of HTML structure that are unintuitive to me, there are a ton of resources online for learning it and it’s really easy to play around and test changes inside the browser.

Celery/CloudAMQP

I used Celery, with CloudAMQP as the message broker, to create a regular task that pulls relevant articles for all users every half-hour, so that rather than hitting the Twitter API upon opening the page, users would see the most recent articles stored in the database. Setting this up felt messier than I expected. While it does work, I think I have a ways to go before I could stand this up on a real app (monitoring? logging? debugging?).
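A rough sketch of what the half-hourly refresh could look like, written as plain Python so it runs without a broker. This is an illustration under assumptions, not the code running on the site: the task path news.tasks.refresh_all_users, the fetch_links callable and the store mapping are all hypothetical. The schedule dict is shaped like a Celery beat schedule entry.

```python
import logging
from datetime import timedelta

logger = logging.getLogger(__name__)

# Shaped like a Celery beat schedule entry. The task path is hypothetical,
# not the real OnlyNews module.
BEAT_SCHEDULE = {
    "refresh-articles-every-30-minutes": {
        "task": "news.tasks.refresh_all_users",
        "schedule": timedelta(minutes=30),
    },
}


def refresh_all_users(user_ids, fetch_links, store) -> list:
    """Refresh stored links for every user and return the ids that failed.

    One failing user (rate limit, revoked token) should not stop the rest,
    so each failure is logged and the loop carries on.
    """
    failed = []
    for user_id in user_ids:
        try:
            store[user_id] = fetch_links(user_id)
        except Exception:
            logger.exception("refresh failed for user %s", user_id)
            failed.append(user_id)
    return failed
```

In a Celery app, the function would be registered with @app.task and the dict assigned to app.conf.beat_schedule. Returning the failed ids and logging each exception is one small step toward the monitoring and debugging questions above.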

Successes

It works!

The site has worked for six months now, looks good to me on desktop and mobile, is quick, and ultimately serves the purpose that I intended it to.

Challenges

Fumbling at the 5-Yard Line

So I spent around six weeks working on this project, learning Django, learning how to make the page look good and learning how to automate the task of pulling articles. But there were some actual issues in the deployment of the page.

There is no splash page for the homepage, so when someone goes to the page for the first time, they are greeted with the page below, rather than seeing any info about what the site is or what it offers.

Furthermore, after logging in, the site redirects to the Heroku URL rather than the original URL, so even though the content is the same, the URL bar shows https://stormy-reaches-76674.herokuapp.com/ rather than https://onlynews.me, which is confusing.
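I haven’t confirmed the cause, but one plausible culprit is a login callback URL that points at the Heroku domain. As a minimal sketch of one possible fix, assuming the app can rewrite the post-login redirect target before sending it, the helper below forces the custom domain while keeping the path and query. canonical_redirect is a hypothetical helper, not part of Django or of the template I used.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "onlynews.me"  # the custom domain mentioned above


def canonical_redirect(url: str, canonical_host: str = CANONICAL_HOST) -> str:
    """Rewrite a URL to use https and the canonical host.

    The path, query string and fragment are kept as they are, so the user
    lands on the same page under the custom domain.
    """
    parts = urlsplit(url)
    return urlunsplit(
        ("https", canonical_host, parts.path or "/", parts.query, parts.fragment)
    )
```

For example, canonical_redirect("https://stormy-reaches-76674.herokuapp.com/") gives "https://onlynews.me/". If the real cause is the callback URL registered with Twitter, changing that setting would be the cleaner fix and no rewriting would be needed.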

Neither of these feels like an insurmountable problem, but I lost the motivation to continue improving the project given that these problems really stunted initial interest and growth.

Burned by Shortcuts

Building from a template is a microcosm of a bigger issue for myself and the majority of data scientists who have developed over the past decade - we are Type 2 engineers who learn skills and technologies rather than Type 1 engineers who learn the underlying foundations of these technologies. We are either self-taught or have learned through bootcamps, tutorials or other hacks. This isn’t entirely our fault - the incentives for doing so are clear and in many analytics roles the cost of doing so isn’t really a burden. Why would an analytics role need to know the underlying code behind Pandas any more than a finance role would need to know the underlying code behind Excel?

But in building an app this approach caught up to me - it was too difficult to debug these issues as there were components of the app I didn’t fully understand. Next time I would just create the app from scratch. It would take a few more weeks to stand up, but that tradeoff would pay off in the long haul because I could fully debug my own code.

Twitter’s Free API is Being Deprecated

It is what it is.

Next Steps

Honestly, I’m not really motivated to do anything on this project until Twitter’s API issue is resolved. I will not pay for the Twitter API. If there is a reliable third-party service to fetch tweets, I’d entertain that. I would definitely not build a scraper or something custom - the tradeoff of time does not seem worth it for this. If that is resolved, and there is a way to reliably pull tweets going forward, I’d probably work through refactoring the Django code so I could address the above issues and help grow the site.