Google programmers set out to instill artistic taste in a machine and began with aesthetically pleasing photo processing. The peculiarity of the concept of beauty is that it has no clear criteria, so the machine had to create these criteria itself. The training was based on a popular photo site and a database of images with contextual metadata, and the material to be processed was panoramic spherical images from Google Street View. The resulting set of algorithms is called Creatism - a deep learning system for creating artistic content.
The authors of Creatism, Hui Fang and Meng Zhang, believe that they have developed a scale of beauty ratings that photographers can use in the future for objective comparisons. With the results in hand, they conducted what they called a Turing test for photographers. The researchers asked experts to evaluate a mixture of the best shots taken by people and those created by Creatism, without mentioning that some of the pictures in the set were made by a machine. 40% of the artificial intelligence's works were rated as "good pictures with artistic taste." The developers aim to help any amateur photographer turn a photo into a beautiful image without filters or settings, by pressing a single button that launches Creatism. While trying to render light as naturally and "deeply" as possible, the algorithm sometimes made small panorama-stitching errors, which can be seen in this photo.
Problems of rendering light in photos
The sensor of a digital camera cannot simultaneously capture information in the dark areas of an image, which need more exposure, and in the light areas, which need less. Dynamic range is the difference, in exposure steps (stops), between the darkest and lightest parts of an image that can be reproduced without loss of information. In completely black (underexposed) areas, as well as in blown-out (overexposed) areas, the information cannot be recovered. Dark areas of the image can be brightened, but with distortion. With the HDR method, several pictures taken with different exposures are combined into one 32-bit file.
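The idea of combining several exposures can be illustrated with a minimal sketch. It assumes a linear sensor response (pixel value = scene radiance x exposure time, clipped at 1.0) and uses a simple weighted average that trusts mid-tones and ignores clipped pixels. This is a generic textbook-style merge, not the method used by Creatism; the function names and the sample scene values are hypothetical.

```python
def simulate_exposure(scene, exposure_time):
    """Simulate a linear sensor: values above 1.0 are clipped (blown out)."""
    return [min(1.0, radiance * exposure_time) for radiance in scene]


def hat_weight(value):
    """Trust mid-tones most; fully black (0.0) or clipped (1.0) pixels get weight 0."""
    return 1.0 - abs(2.0 * value - 1.0)


def merge_exposures(exposures, times):
    """Merge bracketed exposures into one floating-point radiance map.

    exposures: list of images, each a flat list of pixel values in [0, 1]
    times:     exposure time of each image
    """
    shortest = times.index(min(times))
    merged = []
    for p in range(len(exposures[0])):
        weighted_sum = 0.0
        weight_total = 0.0
        for image, t in zip(exposures, times):
            w = hat_weight(image[p])
            weighted_sum += w * image[p] / t  # per-image radiance estimate
            weight_total += w
        if weight_total > 0:
            merged.append(weighted_sum / weight_total)
        else:
            # Every exposure is clipped: fall back to the shortest exposure.
            merged.append(exposures[shortest][p] / times[shortest])
    return merged


# Hypothetical scene: a shadow, a mid-tone and a highlight.
scene = [0.5, 8.0, 40.0]
times = [1.0, 0.1, 0.01]
shots = [simulate_exposure(scene, t) for t in times]
merged = merge_exposures(shots, times)
print(merged)
```

No single shot in this example holds all three values unclipped and well exposed, but the merged floating-point result recovers the whole range, which is why HDR files store more bits per channel than an ordinary image.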
Human vision can take in a scene with a difference of 10-14 exposure steps (stops) under bright sunlight (in the sun the pupil has no room to adapt to different light levels) and up to 24 stops in the dim light of the stars (where it can adapt to differences in light). We can see this range, but capturing even part of it in a photo can be difficult. The dynamic range of ordinary negative film is about 9-11 stops, of slide film 5-6 stops, and of the sensor in most digital cameras 8 to 11 stops. Special cameras provide 17 or more stops. Reproducing the real dynamic range is not easy either: photo paper, for example, can reproduce only 7-8 stops.
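To get a feel for these figures, recall that one exposure step (stop) corresponds to a doubling of the amount of light, so a range of n stops is a contrast ratio of 2^n to 1. The short sketch below converts the upper figures quoted above into contrast ratios; the function names are illustrative only.

```python
import math


def stops_to_contrast_ratio(stops):
    """Each stop doubles the light, so n stops span a ratio of 2**n : 1."""
    return 2.0 ** stops


def contrast_ratio_to_stops(brightest, darkest):
    """Inverse conversion: dynamic range in stops from two luminance values."""
    return math.log2(brightest / darkest)


# Upper figures quoted in the text, in stops.
media = {
    "slide film": 6,
    "photo paper": 8,
    "digital sensor": 11,
    "negative film": 11,
    "human eye, bright sun": 14,
    "human eye, starlight": 24,
}

for name, stops in media.items():
    print(f"{name}: {stops} stops = {stops_to_contrast_ratio(stops):,.0f}:1")
```

The gap between what the eye sees in sunlight (14 stops) and what photo paper reproduces (8 stops) is 6 stops, that is, a factor of 64 in contrast.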
For the experiment, 15,000 professional photos with a resolution of at least 299 x 299 pixels were taken from 500px.com. With their help, the developers taught Creatism to pick out the most interesting parts of a landscape. Then, using 40,000 panoramic spherical images of landscapes in national parks of the US, Canada and Europe, the algorithm was taught to work with color and light.
Then the dynamic range of each frame was enhanced and a dedicated “expressiveness filter” was applied, which improves shadows, lighting, and color. For this, as is customary in image processing, the researchers used a generative adversarial network (GAN) - a model in which one part of the program degrades the quality of the original while another tries to restore it and learns what not to do.
To create the final rating scale, the researchers took the AVA database, which contains 250,000 images and, most importantly, a variety of metadata: a large number of aesthetic ratings for each image, semantic tags in 60 categories, and labels for photographic style used for professional sorting.
After all processing steps, 400 photographs from the experiment were mixed with 800 AVA photographs and given to experts.
Photo experts with professional training and at least 2 years of experience were asked to decide which of the following categories each picture belongs to and to score it accordingly:
- Taken with a "soap box" (point-and-shoot) camera: whatever came out, came out. No settings, no focus.
- A good photo by an inexperienced novice, but the artistic value is minimal.
- Semi-professional snapshot. Clearly visible artistic taste.
- Made by a professional.
Of the pictures processed by Creatism, 40% fell into categories 3 to 4, that is, they were recognized as at least semi-professional. The average score was below 3.
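These two figures are compatible: a sizeable share of pictures can score 3 or higher while the overall average stays below 3. The sketch below shows the arithmetic on a made-up set of ten per-picture scores on the 1 to 4 scale; the numbers are hypothetical and are not data from the experiment.

```python
from statistics import mean


def share_at_least(scores, threshold):
    """Fraction of pictures whose score is at or above the threshold."""
    return sum(1 for s in scores if s >= threshold) / len(scores)


# Hypothetical per-picture scores (1 = point-and-shoot, 4 = professional).
scores = [2.0, 2.5, 3.0, 1.5, 3.5, 2.0, 3.0, 2.5, 4.0, 2.0]

semi_pro_share = share_at_least(scores, 3)
average_score = mean(scores)

print(f"share scored 3 or higher: {semi_pro_share:.0%}")
print(f"average score: {average_score:.1f}")
```

Here four of the ten pictures reach the semi-professional level, yet the six weaker ones pull the average down to 2.6.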
Researchers have published the best images processed by Creatism. Under each one, for comparison, the full panorama from which it was cropped is shown.