How I automated my Instagram account using Machine Learning and Python
https://www.linkedin.com/pulse/how-i-am-earning-500-followers-weekly-instagram-using-fernandes/
The way to do hard work without effort is automation. So I created a bot that does all the tedious work while I sit back doing nothing.
I am a guy who likes to learn new stuff, and for some time now I have wanted to find out if all the fuss about Machine Learning (ML) was worth it.
At university I learned a lot about Machine Learning and Artificial Intelligence, but unfortunately I never had the chance to work on a real-life project where I could use it.
Challenge accepted! It's time to change that, so I decided to take a course on Machine Learning to update myself, and at the same time learn Python.
The best method for me to learn something fast is just to do something practical, and after thinking for a bit I decided to write a bot to automate an Instagram (IG) account by using Python and ML.
I created a new IG account, handed it to my bot and sat back watching it work. I was impressed that it quickly started earning followers and growing in popularity. In a couple of months it earned more than 2000 followers, and currently the growth rate is about 500 new followers per week and still accelerating.
These numbers might not seem impressive for you IG marketeers, given that there are several tools out there that do the exact same thing and probably perform much better, but for me it is a very good outcome, given that this bot was entirely written by me in Python from scratch and as a result of a learning experience. Oh, and did I mention that now that the bot is alive I don't have to do any actual work?
In this article I will explain in detail what I did and describe, as best I can, all the tech I used, but I will not publish the bot's source code.
What is it?
Nowadays all we hear about is IG influencers, how many followers they have and how much money they make with sponsored posts. Having had a personal IG account for some years that never got anywhere (I only have a couple hundred followers), I wanted to know how hard it is to get a lot of followers and therefore to have "success" on IG.
I created a new IG account and then automated the whole thing: it chooses the content to post, it posts content 3 times per day, it writes the captions, chooses the tags, it credits the original author, it chooses who to follow and who to unfollow. It basically automates the whole work of a real human in a way that you can't tell the difference. I basically just sit back and watch it work.
I don't even pay for a server for my bot to live in. It runs on a small old Raspberry Pi I had gathering dust on a shelf. I just pay for the electricity it consumes.
I got to learn some Python, I learned some Machine Learning and I proved I could start a successful project from scratch using technologies I knew almost nothing about before I started. It seems to me a clear win-win scenario. Oh, and I had some fun too ;-)
How it's made
My bot does these actions:
It gathers new pictures from IG (scraping)
Among those pictures it selects only the best ones (cleaning)
Three times per day it posts a new picture. It writes its caption and gives credit to the original poster (posting)
It gathers new users to follow (promoting)
It follows new users regularly (follow)
It unfollows users regularly (unfollow)
Here are the details:
Scraping
First I chose a theme for the IG account. I am not going to reveal the real theme because I want the account to remain unknown. But let's imagine that I decided to create an account about cats.
I gave my bot a list of tags and accounts so that it knows what kind of content I want it to post. To build it, I researched the best tags and the accounts that post the best pictures of cats and put them all in a list.
It's important to know that the bot does not generate any content of its own; it only reposts content from other users, giving them full credit. IG is special in this respect because most users are happy to see their content reposted, since it means more exposure for them. In fact, many of them write a comment thanking my bot for selecting it. Of course, if anyone complained I would delete the post immediately, but so far that has never happened.
A user thanking me for sharing his picture
Once in a while the bot checks whether there is enough content available to post. If not, it visits some of the tags and/or accounts, selects a few pictures and records all the related information and metadata.
Automatic cleaning
Deciding what's good content and what's bad is not a simple task. Many accounts publish pictures with ads and of course my bot doesn't want to post this kind of content. Therefore it must carefully select what's good and what's not.
My bot uses a combination of heuristics and, as a last step, a Machine Learning model to select only the best cats. If a picture passes every test it is kept; if not, it is trashed. In a medium with unlimited content, my bot can afford to reject anything it even slightly suspects might not be right. When in doubt, just reject it...
Here are the selection criteria my bot is using in more detail:
Image similarity
by excelAnt is licensed under CC PDM 1.0
When an image is really popular it is normal for it to be reposted by multiple IG accounts, so my bot could end up posting the same image twice after finding it in different accounts.
To prevent this it uses a Perceptual Hash Algorithm that calculates a hash per image. Perceptual hash algorithms describe a class of comparable hash functions in which features in the image are used to generate a distinct fingerprint, and these fingerprints can be compared to find similar images.
In this way, when there is a new image, the bot can compare it to the existing ones to spot duplicates. This algorithm spots not only identical images but also similar ones, meaning an image does not have to be exactly the same to be marked as a duplicate.
In this way my bot never posts the same image twice, even if the image is found in different accounts.
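The article does not say which perceptual hash the bot uses. As an illustration only, here is a minimal sketch of one simple member of that family, the average hash, working on a grid of grayscale values (decoding the image file into that grid is assumed to happen elsewhere). The function names, the 8x8 size and the `max_distance` threshold are all assumptions.

```python
from typing import Iterable, List

Gray = List[List[int]]  # rows of grayscale pixel values in 0..255


def shrink(pixels: Gray, size: int = 8) -> List[float]:
    """Downsample to size x size cells by averaging equal blocks.

    Assumes the image is at least size x size pixels.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for gy in range(size):
        y0, y1 = gy * height // size, (gy + 1) * height // size
        for gx in range(size):
            x0, x1 = gx * width // size, (gx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    return cells


def average_hash(pixels: Gray, size: int = 8) -> int:
    """One bit per cell: 1 if the cell is brighter than the mean cell."""
    cells = shrink(pixels, size)
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def is_duplicate(new_hash: int, known_hashes: Iterable[int], max_distance: int = 5) -> bool:
    """Treat hashes within max_distance bits (an assumed threshold) as the same image."""
    return any(hamming_distance(new_hash, h) <= max_distance for h in known_hashes)


# Tiny synthetic images: a gradient, a slightly brighter copy, and its negative.
original = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[min(255, v + 3) for v in row] for row in original]
inverted = [[255 - v for v in row] for row in original]
```

Because similar images differ in only a few bits, the comparison is a Hamming distance rather than an equality check, which is what lets near-duplicates be caught.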
Check bad words in caption
My bot scans the caption for suspicious words or expressions such as "buy now" or "on sale now". If it finds any of them, it immediately discards the image, because they may indicate that the post is an ad.
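A check like this could be sketched as follows. The two phrases come from the text above; the function name and the whitespace/case normalization are assumptions, not the bot's actual code.

```python
import re

# The two example phrases from the article; a real list would be longer.
SUSPICIOUS_PHRASES = ("buy now", "on sale now")


def looks_like_ad(caption: str, phrases=SUSPICIOUS_PHRASES) -> bool:
    """Return True if the caption contains any suspicious phrase."""
    # Lowercase and collapse whitespace so "ON  SALE now" still matches.
    normalized = re.sub(r"\s+", " ", caption.lower())
    return any(
        re.search(r"\b" + re.escape(phrase) + r"\b", normalized)
        for phrase in phrases
    )
```

The word boundaries keep a phrase like "buy now" from matching inside a longer word such as "nowhere".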
Check post properties
If the original poster doesn't want people to repost their pictures, the bot respects that by examining the original poster's properties. There is an IG property indicating that.
Another indication that the post may be commercial is if the comments are disabled. The bot avoids these posts also.
Check number of user tags
Many user tags in a single post normally indicate that it is commercial. My bot discards these posts too.
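The property and user-tag checks from the last two subsections can be sketched together. The `Post` class, its field names and the limit of 3 user tags are hypothetical; the article names neither the IG property nor the threshold.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Post:
    # Hypothetical field names; the real IG properties are not named in the article.
    reposting_allowed: bool = True
    comments_disabled: bool = False
    tagged_users: List[str] = field(default_factory=list)


MAX_USER_TAGS = 3  # assumed threshold for "many user tags"


def passes_property_checks(post: Post, max_user_tags: int = MAX_USER_TAGS) -> bool:
    """Reject posts whose owner opted out, that disable comments, or that tag many users."""
    if not post.reposting_allowed:
        return False
    if post.comments_disabled:
        return False
    return len(post.tagged_users) <= max_user_tags
```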
Check like and comment ratios
A good indicator of post quality is the number of likes and comments. Typically, better content generates more interaction, which translates into more likes and comments. But an account with more followers naturally gets more interaction than an account with fewer followers. The way to compare content quality regardless of the number of followers is to use a ratio instead of an absolute number.
For this reason my bot checks the likes/followers and comments/followers ratios before deciding whether the content is worth it.
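As a small worked example of why ratios matter, here is a sketch of such a check. The two minimum ratios are made-up values for illustration; the article does not give the thresholds the bot uses.

```python
def engagement_ratios(likes: int, comments: int, followers: int):
    """Return (likes/followers, comments/followers) for one post."""
    if followers <= 0:
        raise ValueError("followers must be positive")
    return likes / followers, comments / followers


def has_enough_engagement(
    likes: int,
    comments: int,
    followers: int,
    min_like_ratio: float = 0.05,      # assumed threshold
    min_comment_ratio: float = 0.001,  # assumed threshold
) -> bool:
    like_ratio, comment_ratio = engagement_ratios(likes, comments, followers)
    return like_ratio >= min_like_ratio and comment_ratio >= min_comment_ratio
```

With these values, a post with 500 likes from an account with 5,000 followers passes, while a post with 5,000 likes from an account with 1,000,000 followers does not, even though it has ten times more likes in absolute terms.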
Choosing the best content by using Machine Learning
I have a sense of what is good content and what's not, but explaining it with an algorithm is no easy task. This seems to be the perfect fit for an ML algorithm where I have lots of data and I classify it as good or not good. In fact, this is a typical classification problem that can be solved by several well-known ML algorithms.
As I described previously, I scraped a bunch of IG images and created a spreadsheet with them. All I have to do is go through all the pictures and mark them as good (1) or not good (0).
While scraping I record all sorts of other data, like the number of likes and comments, the number of followers, the number of tags, the full caption and many other types of data.
The goal is to train a model that, given a picture and all its associated data, is able to tell me if the picture is good or not.
After several experiments I ended up using the Support Vector Machines (SVM) algorithm with a Radial Basis Function kernel (rbf) because the relation between the input and output may not be linear.
I classified only a couple of thousand images, because it is tedious work. Yes, I know this is not enough data, but even so I managed to come up with a model with an accuracy of more than 75%.
Because the model gives me a probability, I only approve a picture if the probability of it being a "good" image is greater than 75%.
I used the excellent scikit-learn Python library so I would not have to reinvent the wheel.
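The setup described above (an SVM with an RBF kernel, probability output, and a 75% approval threshold) might look roughly like this in scikit-learn. This is a sketch on synthetic data, not the bot's model: the three feature columns, their distributions and the `approve` helper are assumptions standing in for the hand-labelled spreadsheet.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the labelled spreadsheet. Hypothetical columns:
# likes/followers ratio, comments/followers ratio, number of user tags.
good = rng.normal(loc=[0.10, 0.010, 1.0], scale=[0.02, 0.003, 1.0], size=(100, 3))
bad = rng.normal(loc=[0.02, 0.002, 6.0], scale=[0.02, 0.003, 1.0], size=(100, 3))
X = np.vstack([good, bad])
y = np.array([1] * 100 + [0] * 100)  # 1 = good, 0 = not good

# RBF-kernel SVM; probability=True enables predict_proba.
# Scaling first is a common choice for SVMs, assumed here.
model = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", probability=True, random_state=0),
)
model.fit(X, y)


def approve(features, threshold: float = 0.75) -> bool:
    """Approve only if P(good) exceeds the threshold from the article."""
    good_column = list(model.classes_).index(1)
    proba_good = model.predict_proba([features])[0][good_column]
    return bool(proba_good > threshold)
```

Thresholding the probability instead of using the plain class prediction is what lets the bot reject borderline pictures, in line with the "when in doubt, reject" rule.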
Posting
Posting regularly is very important for IG popularity. It is also very tedious work, and most people give up on it after a few weeks. Not my bot: it never gets tired or bored of posting. It posts 3 new photos per day, never rests and never takes a day off. Technically it is just a cron job that is fired 3 times a day.
But before posting, my bot has to create a caption and give credit to the original owner.
Automatic captioning
Getting good captions is not easy, so I am using a mix of several methods:
- I get quotes and phrases that basically can be applied to any photo. For instance, the following quotes apply to any photo about cats:
"Time spent with cats is never wasted"
"Cats choose us, we don't own them"
So I scrape them from the internet and make a list of them. But these sources quickly run out, and I want an almost infinite source of captions so my bot never repeats itself.
- I generate a bunch of phrases based on an automatic algorithm.
I used a Markov chain to generate a bunch of sentences, which are created statistically from a larger text. In the end I had to pick the best ones from the results, but that is less work than writing every single one of them.
- I also used OpenAI's GPT-2 model, a large-scale unsupervised language model which generates coherent paragraphs of text. It generated automatic captions with incredible results. I used the 345M model as it produces better results. I can configure the starting words and the length of the produced text, so I just gave it some general starting sentences and watched it generate incredibly realistic and credible text. Some texts included inaccurate facts, so after a quick read I removed the bad ones. But overall the quality is incredible.
Without a doubt, GPT-2 is my preferred method for the future if I ever need to generate more captions. I was amazed by how well it works!
With these methods I wrote a huge text file with a caption in each line. At the time of picking a new caption my bot simply chooses randomly one of these sentences. Because I have so many sentences, there is little danger of it repeating itself.
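To make the Markov approach from the list above concrete, here is a minimal word-level Markov chain generator. It is a sketch under assumptions: the corpus is a toy string, and the function names, the `order` parameter and the `max_words` cut-off are illustrative choices, not the bot's code.

```python
import random
from collections import defaultdict


def build_chain(text: str, order: int = 1) -> dict:
    """Map each run of `order` words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain: dict, max_words: int = 12, rng=None) -> str:
    """Walk the chain from a random state, picking observed successors."""
    rng = rng or random.Random()
    state = rng.choice(sorted(chain))
    output = list(state)
    # Stop at max_words or when the current state has no known successor.
    while len(output) < max_words and state in chain:
        output.append(rng.choice(chain[state]))
        state = tuple(output[-len(state):])
    return " ".join(output)


# Toy corpus; a real one would be a much larger text about the theme.
corpus = (
    "cats choose us we do not own them "
    "cats sleep all day and cats choose their people"
)
chain = build_chain(corpus, order=1)
sentence = generate(chain, max_words=8, rng=random.Random(1))
```

Every pair of adjacent words in the output also appears in the source text, which is why the results sound plausible locally but still need a human to pick the good ones.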
Tagging
It's important to choose a good set of tags for good post visibility. I have a file with a set of tags that are relevant for this type of content and I let my bot choose a random set from it. After a while I can choose the best tags that produce the best results.
Automatic credits
Giving credit is very important because I don't want to use content without authorization. Almost everyone is happy to have their content re-shared because it means more exposure. In fact, most people thank me personally in the form of a comment, a private message or even an IG story.
My bot parses the caption of the original photo and tries to find the author by searching for phrases like "photo by" or "author". It uses a regular expression to simplify this task. If it does not find the author by this method, it gives credit to the IG account it scraped the photo from.
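The article does not show the regular expression, so the pattern below is only a guess at what such a search could look like: it accepts "photo by" or "author", an optional colon, and then an @username. The names `CREDIT_PATTERN` and `find_credit` are hypothetical.

```python
import re

# Assumed pattern: "photo by @user", "Author: @user", etc. (case-insensitive).
CREDIT_PATTERN = re.compile(
    r"(?:photo\s+by|author)\s*:?\s*@([A-Za-z0-9._]+)",
    re.IGNORECASE,
)


def find_credit(caption: str, source_account: str) -> str:
    """Return the credited @username, falling back to the scraped account."""
    match = CREDIT_PATTERN.search(caption)
    # Strip a trailing full stop that belongs to the sentence, not the name.
    username = match.group(1).rstrip(".") if match else ""
    return "@" + (username or source_account)
```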
A template to wrap it all together
After my bot calculates the caption, tags and credits it uses a template to assemble the final caption. It is simpler in case I want to change it later. With all this calculated a typical caption may look like this:
Cats choose us, we don't own them. Credits to @cats_of_instagram
#cats #catsofinstagram #cats_of_world #cats_of_day
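Putting the random caption, the random tag set and the credit together through a template could be sketched like this. The template string mirrors the example caption above; the function name, the number of tags and the use of `string.Template` are assumptions.

```python
import random
from string import Template

# Layout copied from the example caption in the article.
CAPTION_TEMPLATE = Template("$caption Credits to $credit\n$tags")


def build_caption(captions, tag_pool, credit, n_tags: int = 4, rng=None) -> str:
    """Pick a random caption and a random set of tags, then fill the template."""
    rng = rng or random.Random()
    caption = rng.choice(captions)
    tags = rng.sample(tag_pool, min(n_tags, len(tag_pool)))
    return CAPTION_TEMPLATE.substitute(
        caption=caption,
        credit=credit,
        tags=" ".join("#" + tag for tag in tags),
    )


text = build_caption(
    captions=["Cats choose us, we don't own them."],
    tag_pool=["cats", "catsofinstagram", "cats_of_world", "cats_of_day"],
    credit="@cats_of_instagram",
    rng=random.Random(0),
)
```

Keeping the layout in one template string means the final format can be changed later without touching the selection logic.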
Growing the number of followers
An important aspect of IG is the promotion of my account because it is through promotion that it gathers more cat followers. From what I've learned the best types of promotions are liking other photos, commenting on other photos and following other users.
To start I decided my bot would only follow/unfollow other users. Currently it is not liking or commenting on other content.
User scraping
For best results the users should come from accounts with the same interests as my bot's account, therefore I searched for accounts related to cats.
My bot then looks at these accounts and extracts their users. It follows these users, hoping they will follow it back.
Follow and unfollow
You may be asking: why do I need to unfollow? Well, it turns out you can only follow a limited number of users. After that, if you want to keep following, you have to unfollow some. For this reason my bot follows a group of users, waits a couple of days, and then unfollows them.
The activity of following/unfollowing becomes less important as the account grows because then the account has more exposure and it is able to grow organically.
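The follow, wait, unfollow cycle can be sketched with a simple record of when each user was followed. The two-day wait is one reading of "a couple of days"; the function names and the batch size are assumptions, and the actual calls to IG are left out.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def next_to_follow(candidates: List[str], followed_at: Dict[str, datetime],
                   batch_size: int = 10) -> List[str]:
    """Pick the next batch of scraped users that were not followed before."""
    fresh = [user for user in candidates if user not in followed_at]
    return fresh[:batch_size]


def users_to_unfollow(followed_at: Dict[str, datetime], now: datetime,
                      wait: timedelta = timedelta(days=2)) -> List[str]:
    """Return users followed at least `wait` ago (assumed to be two days)."""
    return sorted(user for user, when in followed_at.items() if now - when >= wait)


now = datetime(2019, 7, 8, 12, 0)
followed_at = {
    "cat_lover_1": datetime(2019, 7, 5, 9, 0),   # three days ago
    "cat_lover_2": datetime(2019, 7, 8, 9, 0),   # three hours ago
}
```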
Where does the bot live?
At first I thought it would be better to have a server somewhere where my bot could live, but the Raspberry Pi is a pretty good solution because I don't have to pay for any hosting and the internet it uses is paid anyway. I have it connected on a shelf without any screen, keyboard or mouse.
When I need to access the Pi for updates I use ssh. If I am away from home then I use ngrok to access it from anywhere in the world.
I set up a cron job on my Raspberry Pi to post a new photo 3 times a day. It looks like the best hours to post are early in the morning, at lunch time and at the end of the afternoon.
I have a separate cron job for follow/unfollow that runs several times per day to spread the load evenly through the day.
"Raspberry Pi 3 B+" by ghalfacree is licensed under CC BY-SA 2.0
Future work
Regarding ML, there are some tweaks that may improve the overall quality of the posted images:
- Use the number of likes and comments on the images published on the bot's account to judge their quality. That's right, instead of manually classifying a picture as good or not, we can use the number of likes and comments as an indicator of post quality.
For example, if after two days, we get more than a certain number of likes in a post, then we can assume the quality of the post is good, otherwise we mark it as no good.
After training a model with this information then we can try to predict the number of likes and comments of a picture before publishing it.
- Use online learning
Using the previous technique we can make our model learn continuously by giving it continuous feedback. That is, after we learn if a picture is good or not, we can feed this information into our existing model making it adapt continuously. This technique is called "online learning".
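Both ideas can be sketched together: label each published post from its likes, then feed that label to a model one example at a time. As far as I know, scikit-learn's SVC cannot be updated incrementally, so this illustration swaps in a tiny hand-written logistic regression trained by stochastic gradient descent. The class, the like threshold of 100 and the learning rate are all assumptions.

```python
import math


def label_from_likes(likes: int, min_likes: int = 100) -> int:
    """Label a published post as good (1) if it passed an assumed like threshold."""
    return 1 if likes > min_likes else 0


class OnlineLogisticRegression:
    """Minimal logistic regression updated one example at a time."""

    def __init__(self, n_features: int, learning_rate: float = 0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, features) -> float:
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        z = max(-60.0, min(60.0, z))  # keep math.exp in a safe range
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, label: int) -> None:
        """One gradient step on the log loss for a single example."""
        error = self.predict_proba(features) - label
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error


# Toy feedback stream: one made-up feature, positive for well-liked posts.
model = OnlineLogisticRegression(n_features=1)
for _ in range(200):
    model.update([1.0], label_from_likes(150))
    model.update([-1.0], label_from_likes(20))
```

Each call to `update` nudges the weights using only the newest example, so the model keeps adapting as feedback arrives without retraining on the full history.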
The results
I have lots of fun just watching how my bot behaves and seeing what pictures it "chooses" to post. I know I created it, but even so, it feels like it has a life of its own. Strange feeling...
It's also fun to consider that if I had to do all this work manually, I would not have time to do anything else, because it is a lot of work, and my bot does it all automatically. Maybe I should try to automate other stuff in my life too...
by steve p2008 is licensed under CC BY 2.0
Manuel Fernandes
Tech Lead / Senior Full Stack Web Developer at Isobar Switzerland