Looters (instalooter.looters)

Instagram looters implementations.

class instalooter.looters.HashtagLooter(hashtag, **kwargs)[source]

Bases: instalooter.looters.InstaLooter

A looter targeting medias tagged with a hashtag.

Create a new hashtag looter.

Parameters

hashtag (str) – the hashtag to search for.

See InstaLooter.__init__ for more details about accepted keyword arguments.

download(destination, condition=None, media_count=None, timeframe=None, new_only=False, pgpbar_cls=None, dlpbar_cls=None)

Download all medias passing condition to destination.

Parameters
  • destination (FS or str) – the filesystem where to store the downloaded files, as a filesystem instance or FS URL.

  • condition (function) – the condition to filter the medias with. If None is given, a function is created using the get_videos and videos_only passed at object initialisation.

  • media_count (int or None) – the maximum number of medias to download. Leave to None to download everything from the target. Note that more files can be downloaded, since a post with multiple images/videos is considered to be a single media.

  • timeframe (tuple or None) – a tuple of two datetime objects to enforce a time frame (the first item must be more recent). Leave to None to ignore times.

  • new_only (bool) – stop media discovery when already downloaded medias are encountered.

  • pgpbar_cls (type or None) – an optional ProgressBar subclass to use to display page scraping progress.

  • dlpbar_cls (type or None) – an optional ProgressBar subclass to use to display file download progress.

Returns

the number of queued medias.

May not be equal to the number of downloaded medias if some errors occurred during background download.

Return type

int
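
As a rough illustration of the condition parameter, the sketch below builds two predicates comparable to the picture-only and video-only filters. It assumes that media dictionaries carry a boolean is_video key, as Instagram's own format does; the helper names and the sample dictionaries are made up for the example.

```python
def only_pictures(media):
    """Accept medias that are not videos (assumes an 'is_video' key)."""
    return not media.get("is_video", False)


def only_videos(media):
    """Accept medias flagged as videos (assumes an 'is_video' key)."""
    return bool(media.get("is_video", False))


# Hypothetical media dictionaries, reduced to the fields used here.
sample_medias = [
    {"shortcode": "aaa", "is_video": False},
    {"shortcode": "bbb", "is_video": True},
    {"shortcode": "ccc", "is_video": False},
]

picture_codes = [m["shortcode"] for m in sample_medias if only_pictures(m)]
video_codes = [m["shortcode"] for m in sample_medias if only_videos(m)]
```

A function of this shape could then be passed as the condition argument of download.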

download_pictures(destination, media_count=None, timeframe=None, new_only=False, pgpbar_cls=None, dlpbar_cls=None)

Download all the pictures to the provided destination.

Actually a shortcut for download with condition set to accept only images.

download_videos(destination, media_count=None, timeframe=None, new_only=False, pgpbar_cls=None, dlpbar_cls=None)

Download all videos to the provided destination.

Actually a shortcut for download with condition set to accept only videos.

get_post_info(code)

Get media information from a given post code.

Parameters

code (str) – the code of the post (can be obtained either from the shortcode attribute of media dictionaries, or from a post URL: https://www.instagram.com/p/<code>/)

Returns

a media dictionary, in the format used by Instagram.

Return type

dict
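
When only a post URL is at hand, the code can be pulled out of it before calling get_post_info. The helper below is a minimal sketch and not part of the library; the name code_from_url and the sample code in the URL are placeholders.

```python
import re

# Matches the <code> part of https://www.instagram.com/p/<code>/
_POST_URL = re.compile(r"^https?://(?:www\.)?instagram\.com/p/([^/?#]+)/?")


def code_from_url(url):
    """Return the post code embedded in a post URL, or None if absent."""
    match = _POST_URL.match(url)
    return match.group(1) if match else None


code = code_from_url("https://www.instagram.com/p/BFB6znLg5s1/")
```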

logged_in()

Check if there’s an open Instagram session.

login(username, password)

Log the instance in using the given credentials.

Parameters
  • username (str) – the username to log in with.

  • password (str) – the password to log in with.

logout()

Log the instance out from the currently opened session.

medias(timeframe=None)

Obtain an iterator over the Instagram medias.

Wraps the iterator returned by InstaLooter.pages to seamlessly iterate over the medias of all the pages.

Returns

an iterator over the medias of every page.

Return type

MediasIterator

pages()[source]

Obtain an iterator over Instagram post pages.

Returns

an iterator over the Instagram post pages.

Return type

PageIterator

class instalooter.looters.InstaLooter(add_metadata=False, get_videos=False, videos_only=False, jobs=16, template='{id}', dump_json=False, dump_only=False, extended_dump=False, session=None)[source]

Bases: object

A brutal Instagram looter that raids without API tokens.

Create a new looter instance.

Parameters
  • add_metadata (bool) – Add date and comment metadata to the downloaded pictures.

  • get_videos (bool) – Also get the videos from the given target.

  • videos_only (bool) – Only download videos (implies get_videos=True).

  • jobs (int) – the number of parallel threads to use to download media (12 or more is advised to have a true parallel download of media files).

  • template (str) – a filename format, in Python new-style-formatting format. See the Template page of the documentation for available keys.

  • dump_json (bool) – Save each resource metadata to a JSON file next to the actual image/video.

  • dump_only (bool) – Only save metadata and discard the actual resource.

  • extended_dump (bool) – Attempt to fetch as much metadata as possible, at the cost of more time. Set to True if, for instance, you always want the top comments to be downloaded in the dump.

  • session (Session or None) – a requests session, or None to create a new one.
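
The template parameter is an ordinary new-style format string, so its behaviour can be previewed with str.format. This is only a sketch: the field value is invented, and id is the only key confirmed here (it appears in the default template); the Template page lists the keys that are really available.

```python
# Hypothetical field values; only the "id" key is confirmed by the default
# template shown above. See the Template page for the real set of keys.
fields = {"id": "1234567890"}

default_template = "{id}"
filename = default_template.format(**fields)

# Literal text can be mixed with replacement fields.
prefixed = "loot-{id}".format(**fields)
```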

download(destination, condition=None, media_count=None, timeframe=None, new_only=False, pgpbar_cls=None, dlpbar_cls=None)[source]

Download all medias passing condition to destination.

Parameters
  • destination (FS or str) – the filesystem where to store the downloaded files, as a filesystem instance or FS URL.

  • condition (function) – the condition to filter the medias with. If None is given, a function is created using the get_videos and videos_only passed at object initialisation.

  • media_count (int or None) – the maximum number of medias to download. Leave to None to download everything from the target. Note that more files can be downloaded, since a post with multiple images/videos is considered to be a single media.

  • timeframe (tuple or None) – a tuple of two datetime objects to enforce a time frame (the first item must be more recent). Leave to None to ignore times.

  • new_only (bool) – stop media discovery when already downloaded medias are encountered.

  • pgpbar_cls (type or None) – an optional ProgressBar subclass to use to display page scraping progress.

  • dlpbar_cls (type or None) – an optional ProgressBar subclass to use to display file download progress.

Returns

the number of queued medias.

May not be equal to the number of downloaded medias if some errors occurred during background download.

Return type

int
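
Because the first item of timeframe must be the more recent one, it can help to build the tuple through a small helper. The sketch below is an assumption-laden example: make_timeframe and in_timeframe are hypothetical names, and treating both bounds as inclusive is a guess rather than documented behaviour.

```python
from datetime import datetime


def make_timeframe(start, end):
    """Return (most_recent, oldest), whatever order the arguments come in."""
    return (max(start, end), min(start, end))


def in_timeframe(moment, timeframe):
    """Check that moment lies between the two bounds (assumed inclusive)."""
    most_recent, oldest = timeframe
    return oldest <= moment <= most_recent


timeframe = make_timeframe(datetime(2018, 1, 1), datetime(2018, 6, 30))
```

The resulting tuple has the shape expected by the timeframe argument of download and medias.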

download_pictures(destination, media_count=None, timeframe=None, new_only=False, pgpbar_cls=None, dlpbar_cls=None)[source]

Download all the pictures to the provided destination.

Actually a shortcut for download with condition set to accept only images.

download_videos(destination, media_count=None, timeframe=None, new_only=False, pgpbar_cls=None, dlpbar_cls=None)[source]

Download all videos to the provided destination.

Actually a shortcut for download with condition set to accept only videos.

get_post_info(code)[source]

Get media information from a given post code.

Parameters

code (str) – the code of the post (can be obtained either from the shortcode attribute of media dictionaries, or from a post URL: https://www.instagram.com/p/<code>/)

Returns

a media dictionary, in the format used by Instagram.

Return type

dict

logged_in()[source]

Check if there’s an open Instagram session.

login(username, password)[source]

Log the instance in using the given credentials.

Parameters
  • username (str) – the username to log in with.

  • password (str) – the password to log in with.

logout()[source]

Log the instance out from the currently opened session.
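
One way to make sure logout always follows login is to wrap the pair in a context manager. The sketch below is not provided by the library: logged_session is a hypothetical helper, FakeLooter is a stand-in that only mimics the three documented methods, and logged_in is assumed to return a boolean.

```python
from contextlib import contextmanager


@contextmanager
def logged_session(looter, username, password):
    """Log in on entry and always log out on exit (hypothetical helper)."""
    looter.login(username, password)
    try:
        yield looter
    finally:
        looter.logout()


class FakeLooter:
    """Stand-in exposing the three documented methods, for demonstration."""

    def __init__(self):
        self._open = False

    def login(self, username, password):
        self._open = True

    def logout(self):
        self._open = False

    def logged_in(self):
        return self._open


fake = FakeLooter()
with logged_session(fake, "user", "secret") as looter:
    state_inside = looter.logged_in()
state_after = fake.logged_in()
```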

medias(timeframe=None)[source]

Obtain an iterator over the Instagram medias.

Wraps the iterator returned by InstaLooter.pages to seamlessly iterate over the medias of all the pages.

Returns

an iterator over the medias of every page.

Return type

MediasIterator
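
The idea of iterating over medias without seeing page boundaries can be pictured with a plain generator. This is a conceptual sketch only, not the MediasIterator implementation: the "medias" key and the sample pages are invented, since the real page layout is whatever Instagram returns.

```python
def iter_medias(pages):
    """Yield every media from every page, hiding the page boundaries."""
    for page in pages:
        # "medias" is a made-up key: the real page layout is Instagram's.
        for media in page["medias"]:
            yield media


# Hypothetical pages, each holding a few hypothetical medias.
pages = [
    {"medias": [{"shortcode": "aaa"}, {"shortcode": "bbb"}]},
    {"medias": [{"shortcode": "ccc"}]},
]
codes = [media["shortcode"] for media in iter_medias(pages)]
```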

abstract pages()[source]

Obtain an iterator over Instagram post pages.

Returns

an iterator over the Instagram post pages.

Return type

PageIterator

class instalooter.looters.PostLooter(code, **kwargs)[source]

Bases: instalooter.looters.InstaLooter

A looter targeting a specific post.

Create a new post looter.

Parameters

code (str) – the code of the post to get.

See InstaLooter.__init__ for more details about accepted keyword arguments.

download(destination, condition=None, media_count=None, timeframe=None, new_only=False, pgpbar_cls=None, dlpbar_cls=None)[source]

Download the referred post to the destination.

See InstaLooter.download for argument reference.

Note

Unlike other looter implementations, this function does not spawn new threads: it simply uses the main thread to download the files.

Since each worker downloads one media at a time (and not one file), there would be no point in spawning more.
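
The reasoning in this note can be illustrated with a generic thread pool. The sketch below does not reproduce the library's workers; fetch_media and the sample post are hypothetical, and they only show that a single media yields a single task regardless of how many files it holds or how many threads are available.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_media(media):
    """Handle every file of one media inside a single task."""
    return [f"fetched {name}" for name in media["files"]]


# One hypothetical post holding three files: still a single unit of work.
medias = [{"files": ["a.jpg", "b.jpg", "c.mp4"]}]

with ThreadPoolExecutor(max_workers=16) as pool:
    results = list(pool.map(fetch_media, medias))

# Only one task was ever scheduled, so 15 of the 16 threads had nothing to do.
tasks_submitted = len(results)
```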

download_pictures(destination, media_count=None, timeframe=None, new_only=False, pgpbar_cls=None, dlpbar_cls=None)

Download all the pictures to the provided destination.

Actually a shortcut for download with condition set to accept only images.

download_videos(destination, media_count=None, timeframe=None, new_only=False, pgpbar_cls=None, dlpbar_cls=None)

Download all videos to the provided destination.

Actually a shortcut for download with condition set to accept only videos.

get_post_info(code)

Get media information from a given post code.

Parameters

code (str) – the code of the post (can be obtained either from the shortcode attribute of media dictionaries, or from a post URL: https://www.instagram.com/p/<code>/)

Returns

a media dictionary, in the format used by Instagram.

Return type

dict

logged_in()

Check if there’s an open Instagram session.

login(username, password)

Log the instance in using the given credentials.

Parameters
  • username (str) – the username to log in with.

  • password (str) – the password to log in with.

logout()

Log the instance out from the currently opened session.

medias(timeframe=None)[source]

Return a generator that yields only the referred post.

Yields

dict – a media dictionary obtained from the given post.

Raises

StopIteration – if the post does not fit the timeframe.

pages()[source]

Return a generator that yields a page with only the referred post.

Yields

dict – a page dictionary with only a single media.

class instalooter.looters.ProfileLooter(username, **kwargs)[source]

Bases: instalooter.looters.InstaLooter

A looter targeting medias on a user profile.

Create a new profile looter.

Parameters

username (str) – the username of the profile.

See InstaLooter.__init__ for more details about accepted keyword arguments.

download(destination, condition=None, media_count=None, timeframe=None, new_only=False, pgpbar_cls=None, dlpbar_cls=None)

Download all medias passing condition to destination.

Parameters
  • destination (FS or str) – the filesystem where to store the downloaded files, as a filesystem instance or FS URL.

  • condition (function) – the condition to filter the medias with. If None is given, a function is created using the get_videos and videos_only passed at object initialisation.

  • media_count (int or None) – the maximum number of medias to download. Leave to None to download everything from the target. Note that more files can be downloaded, since a post with multiple images/videos is considered to be a single media.

  • timeframe (tuple or None) – a tuple of two datetime objects to enforce a time frame (the first item must be more recent). Leave to None to ignore times.

  • new_only (bool) – stop media discovery when already downloaded medias are encountered.

  • pgpbar_cls (type or None) – an optional ProgressBar subclass to use to display page scraping progress.

  • dlpbar_cls (type or None) – an optional ProgressBar subclass to use to display file download progress.

Returns

the number of queued medias.

May not be equal to the number of downloaded medias if some errors occurred during background download.

Return type

int
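
The effect of new_only can be pictured as cutting discovery short at the first media that is already present. The sketch below is a simplified model, assuming medias arrive most recent first; how the library actually detects an already downloaded media is not described here, and the codes are invented.

```python
from itertools import takewhile

# Hypothetical codes of medias already present in the destination.
already_downloaded = {"ccc", "ddd"}

# Medias are assumed to arrive most recent first.
discovered = ["aaa", "bbb", "ccc", "ddd"]

# Stop at the first known media instead of scanning the whole target.
new_medias = list(
    takewhile(lambda code: code not in already_downloaded, discovered)
)
```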

download_pictures(destination, media_count=None, timeframe=None, new_only=False, pgpbar_cls=None, dlpbar_cls=None)

Download all the pictures to the provided destination.

Actually a shortcut for download with condition set to accept only images.

download_videos(destination, media_count=None, timeframe=None, new_only=False, pgpbar_cls=None, dlpbar_cls=None)

Download all videos to the provided destination.

Actually a shortcut for download with condition set to accept only videos.

get_post_info(code)

Get media information from a given post code.

Parameters

code (str) – the code of the post (can be obtained either from the shortcode attribute of media dictionaries, or from a post URL: https://www.instagram.com/p/<code>/)

Returns

a media dictionary, in the format used by Instagram.

Return type

dict

logged_in()

Check if there’s an open Instagram session.

login(username, password)

Log the instance in using the given credentials.

Parameters
  • username (str) – the username to log in with.

  • password (str) – the password to log in with.

logout()

Log the instance out from the currently opened session.

medias(timeframe=None)

Obtain an iterator over the Instagram medias.

Wraps the iterator returned by InstaLooter.pages to seamlessly iterate over the medias of all the pages.

Returns

an iterator over the medias of every page.

Return type

MediasIterator

pages()[source]

Obtain an iterator over Instagram post pages.

Returns

an iterator over the Instagram post pages.

Return type

PageIterator

Raises
  • ValueError – when the requested user does not exist.

  • RuntimeError – when the user is a private account and there is no logged user (or the logged user does not follow that account).
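
A caller may want to tell these two failures apart. The sketch below shows one possible way to do so; describe_pages_error is a hypothetical helper, and the three small classes are stand-ins that mimic the documented behaviour of pages rather than real looters. Since it is not stated whether the exceptions are raised by the call or during iteration, the helper covers both cases.

```python
def describe_pages_error(looter):
    """Return a short message if pages cannot be read, else None."""
    try:
        # Pull one page so errors raised lazily also surface here.
        next(iter(looter.pages()), None)
    except ValueError:
        return "user does not exist"
    except RuntimeError:
        return "private account: log in with a user following it"
    return None


class MissingUser:
    """Stand-in whose pages() fails like an unknown profile would."""

    def pages(self):
        raise ValueError("user not found")


class PrivateUser:
    """Stand-in whose pages() fails like a private profile would."""

    def pages(self):
        raise RuntimeError("private account")


class EmptyProfile:
    """Stand-in with a readable profile that has no pages."""

    def pages(self):
        return iter(())


messages = [
    describe_pages_error(MissingUser()),
    describe_pages_error(PrivateUser()),
    describe_pages_error(EmptyProfile()),
]
```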