How We Trained an Algorithm to See What’s Beautiful

By Appu - 5 min read

Our Head of R&D Appu Shaji explains how his team developed a deep learning technology that understands aesthetic taste – then applies it to your photos.

It’s hard enough for us humans to define exactly what makes a good photograph, so how can we teach a computer to do exactly that? First, we need to analyze the human mental process, define each step and then enable the computer to recreate it.

I’ve explained the process and mathematical formulas behind this in more detail in another article, but in a nutshell you can describe it as follows:

Imagine a scenario in which an art professor wants to teach her class how to understand what makes a good photo. One methodology she can use is to show her students examples of how she might curate a set of photographs. She groups the good photos together and sets the not-so-good photos apart. After repeating this process multiple times, she asks her students to do it themselves. The only feedback she gives is whether the student’s choices were successful.

If the student succeeds, he/she continues with the exercise. Otherwise, the student has to think over the error, learn from it and move on to the next set of photos. This is the basic process machine learning systems go through as well.

The advantage is that a machine can do this on a large scale, learning from a practically unlimited volume of data.
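The curate-and-correct loop described above can be sketched as a minimal supervised learner. This is only an illustrative sketch, not EyeEm’s actual model: it uses a tiny logistic classifier trained by stochastic gradient descent, and the two-value feature vectors and their names are hypothetical stand-ins for real image features.

```python
from math import exp

# The "professor" supplies labeled examples (1 = good photo, 0 = not so good).
# The "student" is a logistic model that nudges its weights whenever its
# guess is wrong -- it learns from the error and moves on to the next example.

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def train(examples, epochs=200, lr=0.5):
    """examples: list of (feature_vector, label) pairs."""
    n = len(examples[0][0])
    weights = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            pred = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = label - pred  # feedback: how wrong was the guess?
            weights = [w + lr * error * x for w, x in zip(weights, features)]
            bias += lr * error
    return weights, bias

def predict(weights, bias, features):
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias) >= 0.5

# Hypothetical two-feature examples: (sharpness, composition_score)
curated = [
    ([0.9, 0.8], 1), ([0.8, 0.9], 1),
    ([0.2, 0.1], 0), ([0.1, 0.3], 0),
]
weights, bias = train(curated)
```

In a real aesthetics system the hand-picked feature vectors would be replaced by representations learned by a deep network, but the feedback loop (predict, compare against the curator’s label, correct) is the same.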

Developing our own aesthetic criteria

To make a computer understand aesthetics in photographs, we train it with a dataset. Understanding and appreciating aesthetics is quite an expert-level task. For this reason, our researchers and photo curators closely collaborated to develop our training data. When collecting the samples of “good” photographs for our training set, we set very high standards. The photo curators selected only pictures that communicate strong stories with good composition, and that were shot with technical mastery.

“If it tells a strong story, it shouldn’t be dismissed.”


The above picture was one of EyeEm’s recent Mission winners. It’s a photo that does not adhere to the traditional rules of photography; for example, the colors are desaturated and the image has an unusual composition. In technical terms, this image may not be regarded as a good photograph. But it tells a strong story and shouldn’t be dismissed.

“We encouraged our photo curators to use their innate visual sense and judgement.”

In an artistic medium like photography, photographers constantly explore and innovate. Images that deviate from the established rules are often the ones that evoke strong aesthetics. For this reason, we purposely dissuaded the photo curators from deconstructing the technical aspects and encouraged them to use their innate visual sense and judgement. We have thus developed our own aesthetic criteria with which to curate our training dataset.

The goal behind this approach is to leverage expert opinion at a much larger scale, using more readily available data for tasks like visual aesthetics, which require an enormous amount of human intellectual and emotional judgement. Our hope is to facilitate a conversation among art, humanity and technology.

“Technology can never replace your personal taste.”

Of course, technology can never replace your personal taste and judgments. But we sincerely believe that we are entering a fascinating stage in which technology can power curation, enabling human stories to be discovered within the firehose of photographic data — starting with your very own camera roll.

This article is an abbreviated version of How We Trained an Algorithm to Predict What Makes a Beautiful Photo on Medium and Understanding Aesthetics with Deep Learning, first published by Nvidia in February 2016.