Multimedia Computing and Computer Vision Lab

















From Multimedia Computing Lab - University of Augsburg


We have created a new publicly available dataset called "Flickr-10M" to evaluate the proposed retrieval methodology on a large real-world image database.

This dataset consists of 10 million images downloaded from Flickr. We aimed to make it as diverse as possible to allow the evaluation of widely varying retrieval approaches. We therefore collected images annotated with specific tags that indicate a variety of landmarks, scenes, cities, stars, and objects. Geotags were explicitly not used to select images, for two reasons. First, the number of images that have actually been geotagged is in most cases very small, even for popular landmarks. Second, many landmarks are photographed from a great distance, in which case the geotagged location may be far from the position of the landmark itself. In addition, for many categories such as cities or national parks, geotags carry little meaning beyond narrowing down the number of available images.

Therefore we focused on tags and image descriptions. Where a category did not yield a sufficient number of images (e.g. several thousand), we performed a full-text search for the query term in the image descriptions to select the images to download.
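The selection strategy described above can be sketched roughly as follows. This is a minimal illustration and not the code used to build the dataset: the two search callables, the function name `collect_photo_ids`, and the threshold value are hypothetical, and the sketch assumes that full-text results supplement the tag results instead of replacing them.

```python
from typing import Callable, Iterable, List

# Hypothetical threshold: the text only says "several thousand".
MIN_IMAGES_PER_CATEGORY = 3000

# A search function maps a query term to an iterable of photo IDs.
SearchFn = Callable[[str], Iterable[str]]


def collect_photo_ids(
    term: str,
    search_by_tag: SearchFn,
    search_description: SearchFn,
    min_images: int = MIN_IMAGES_PER_CATEGORY,
) -> List[str]:
    """Collect photo IDs for one category, tag search first."""
    seen = set()
    ordered: List[str] = []

    def add_all(photo_ids: Iterable[str]) -> None:
        # Keep first-seen order and skip duplicates.
        for photo_id in photo_ids:
            if photo_id not in seen:
                seen.add(photo_id)
                ordered.append(photo_id)

    add_all(search_by_tag(term))

    # Fall back to a full-text search of the descriptions only when
    # the tag search alone did not yield enough images.
    if len(ordered) < min_images:
        add_all(search_description(term))

    return ordered
```

In practice the two callables would wrap whatever photo search interface is available; they are passed in as parameters so that the sketch does not assume any particular API.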

The size of this dataset exceeds that of most datasets targeting a specific domain, such as scenes (e.g. SUN database), objects (e.g. PASCAL VOC), or landmarks (e.g. Oxbuild). It is comparable in size to ImageNet and orders of magnitude larger than datasets previously used for image retrieval evaluations, such as Oxbuild or Corel.

The dataset consists of JPEG images with their associated metadata. This includes tags, titles, descriptions, and other user-generated content, as well as other information stored with the photos (e.g. EXIF data, if available). A total of 852,697 different Flickr users contributed at least one photo to the dataset. In total there are more than 300 different categories, yielding 10,080,251 images.
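To put these totals in perspective, a quick back-of-the-envelope calculation gives the average number of images per contributing user and an upper bound on the average category size. Only the three numbers below come from the text; the variable names are illustrative, and the per-category figure is a bound only because the text states "more than 300" categories without an exact count.

```python
# Figures stated in the dataset description.
TOTAL_IMAGES = 10_080_251
TOTAL_USERS = 852_697
MIN_CATEGORIES = 300  # "more than 300", so this is a lower bound

# Average number of images contributed per user.
images_per_user = TOTAL_IMAGES / TOTAL_USERS

# With more than 300 categories, the mean category size
# must lie below this value.
max_mean_images_per_category = TOTAL_IMAGES / MIN_CATEGORIES

print(f"images per user: {images_per_user:.1f}")
print(f"mean images per category: < {max_mean_images_per_category:.0f}")
```

On average, each user contributes roughly a dozen images, and a category averages fewer than about 34,000 images.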

The database has not been cleaned or post-processed. It therefore includes all kinds of content, from high-quality to low-quality photographs, with and without annotations, in many languages. In short, we believe this database is a representative sample of the real data that is uploaded and shared on community websites and social networks on a daily basis.

Categories / Tags used for downloading


arm wrestling, ax throwing, badminton, ballett, baseball, basketball, belly dance, billards, bmx, bowling, Boxing, Breaststroke, bullfighting, Bunjee Jumping, Caber Tossing, Canoe Racing, Cheerleading, Croquet, Discgolf, Dodgeball, Fingerboarding, Footbag, Golfing, Handball, high jump, Hockey, Horse Riding, Ice Speed Skating, Inline Hockey, Inline Skating, Judo, Karate, Kendo, Kitesurfing, Lacrosse, long jump, Marathon, Nordic Walking, Paintball, Parachuting, Polo, Playing Cello, Playing Drums, Playing Guitar, Playing Piano, Poker, pole vault, rafting, rock climbing, Rodeo, rowing, rugby, shot put, Skateboarding, Ski Jumping, skiing, Skipping, Skydiving, Slackline, Snorkeling, table tennis, tennis, volleyball, wakeboarding, waterski, yoga, aikido, archery


Abu Simbel, Allianz Arena, Angel Falls, Angkor Wat, Arc de Triomphe, Auferstehungskirche St Petersburg, Ayers Rock, Banaue Rice Terraces, Basilica de Notre Dame (Montreal), Baalbek, Berliner Mauer, Berliner Philharmonie, Big Ben, Bilbao Museum, Biosphere Montreal, Borodur, Brandenburger Tor, Brooklyn Bridge, Bundestag, Burj Al Arab, Burj Khalifa, Canals Venice, Capitol Washington, Carlsbad Caverns, Cathedrale de notre dame, Cathedral of Ely, Chain Bridge, charles bridge, Chichen Itza, Chinesische Mauer, Christo Redentor, Chrysler Building, CN Tower, Colloseum Rom, Columbia Icefield, Diaolou, Dom Helsinki, Donauturm, Dreischluchten Damm, Easter Island, Eiffel Tower, Empire State Building, Fernsehturm Berlin, Festung Hohensalzburg, Frauenkirche Dresden, Frauenkirche Munich, Forbidden City, Fuji Sama Berg, Gasometer in Oberhausen, Golden Gate Bridge, Golden Pavillon, Golden Roof, Golden Temple (India), Grand Canyon, Ground Zero, Guggenheim Museum, Harbour Bridge Sydney, Hollywood sign, Iguazu Falls, Itsukushima, Kreml Moskau, Liberty Bell, Lincoln Memorial, Louvre, Louvre Paris, Marks Basilica, Markusplatz (Venedig), Matterhorn, Meenakshi, Michaelis Church, Mosque Cordoba, Nationalstadium Beijing, Neuschwanstein, Ngorongoro Crater, Niagara Falls, Notre Dame Ronchamp, Opera House Oslo, Opera Sydney, Oratoire St Joseph, Palast Versailles, Parliamt Hill Ottawa, Parthenon, Petronas Tower, Portala Palace, pyramid,cheops, pyramids,gizeh, Qutb Minar, Rideau Canal, Rila Monastery, Rockefeller Center, Royal Ontario Museum, Schiefer Turm Pisa, Schloss Neuschwanstein, Schloss Sans Soucci, Schloss Schoenbrunn, Seagram Building, Sears Tower, Shwedagon Pagoda, Sixtinische Kapelle, Sky tower Auckland, Spanische Treppe Rom, Sphinx, St Basil Cathedral, Statue of Liberty, Stonehenge, Sultan Ahmed Mosque, Taj Mahal, Temple Heaven, Teotihuacan, Tianmen Square, Tokyo Tower, Tower Bridge London, Tower of London, Uffizi Gallery, Umayyad Mosque, Valley of Kings, Victoria Falls, Wat Phra Kaew, Wattenmeer, Wenzelplatz, World Trade Center, xochimilco, Zwinger Dresden, Zytglogge


Hanoi, Havanna, Helsinki, Hiroshima, Istanbul, Jerusalem, Kiew, Kuala Lumpur, Lagos, Leipzig, Leningrad, Lissabon, London, Los Angeles, Luxemburg, Madrid, Mailand, Melbourne, Minsk, Montreal, Mumbai, Munich, Nagoya, New York City, Old City of Jerusalem, Oslo, Ottawa, Oxford, Paris, Philadelphia, Phnom Penh, Prag, Quebec City, Redmond, Rom, Saigon, Salzburg, San Francisco, Santa Clara, Santiago de chile, Sao Paolo, Seoul, Shanghai, St Tropez, Stockholm, Taipei, Tallin, Tirana, Toronto, Tunis, Vancouver, Venedig, Warschau, Washington, Wien, Zuerich, Alexandria, Amsterdam, Ankara, Augsburg, Barcelona, Beirut, Berlin, Bratislava, Budapest, Buenos Aires, Bukarest, Casablanca, Chicago, Damaskus, Delhi (India), Den Haag, Dresden, Dubai, Hamburg, Bangkok, Beijing, Las Vegas, Moscow, Rio de Janeiro, Singapore, Sydney, Zagreb


Beach, Carnival, Christmas, City, Desert, Forest, Portrait, Street(s), Sunset, Wedding


aircraft, bicycle, bird, birds, boat, bottle, bottles, building, bus, buses, butterfly, car, cat, cats, chair, chairs, cow, cows, dog, dogs, fish, flower, flowers, horse, horses, hot air ballon, monitor, motorcycle, motorcycles, mountain, people, sailing boat, sheep, ship, sofa, tractor, train, trains, tree, tv monitor


Abel Tasman, Acadia, Addo Elfephant, Algonquin Park, Arches, Ayuittuq, Bandhavgarh, Banff Park, Bromo Tenger Park, Cuc Phuong, Gran Paradiso, Great Smoky Mountains, Guyana, Kakadu Park, Krüger NP, Payette, White Sands Park, Yellowstone, Yosemite


Drew Barrymore, Eminem, George Bush, George Clooney, Gisele Bündchen, Gwen Stefani, Halle Berry, Harrison Ford, Hayden Panettiere, Heidi Klum, Hillary Duff, Hunter Burgan, Jack Nicholson, Jake Gyllenhaal, Janet Jackson, Jennifer Aniston, Jennifer Lopez, Jessica Biel, Jessica Simpsons, John Travolta, Johnny Depp, Jude Law, Julia Roberts, Justin Bieber, Justin Timberlake, Kate Winslet, Katie Holmes, Katy Perry, Keira Knightley, Kirsten Dunst, Kylie Minogue, Lady Gaga, Leonardo DiCaprio, Lindsay Lohan, Mariah Carey, Matt Damon, Meryl Streep, Michael Jackson, Miley Cyrus, Morgan Freeman, Natalie Portman, Nicole Kidman, Nicole Richie, Orlando Bloom, Paris Hilton, Paul McCartney, Penelope Cruz, Phil Collins, Rihanna, Robert Pattinson, Rod Steward, Salma Hayek, Samuel Jackson, Sarah Jessica Parker, Sarah Palin, Scarlett Johansson, Sean Penn, Steve Jobs, Tiger Woods, Tina Turner, Tom Cruise, Tom Hanks, Tyra Banks, Victoria Beckham, Will Smith, Woody Allen, Zac Efron, Alice Cooper, Angelina Jolie, Ashley Olsen, Audrey Hepburn, Barack Obama, Ben Stiller, Bill Clinton, Bill Gates, Bono, Brad Pitt, Britney Spears, Bruce Willis, Bryan Adams, Cameron Diaz, Carrie Underwood, Charlize Theron, Cher, Chris Brown, Christina Aguilera, Daniel Radcliff, David Beckham, David Bowie


This dataset is available upon request and is shipped on hard disks by postal mail. Please contact Prof. Dr. Rainer Lienhart for more information.


  • Stefan Romberg, Rainer Lienhart, Eva Hörster. "Multimodal Image Retrieval." International Journal of Multimedia Information Retrieval, Springer, February 2012.