Vibe coded a tool for taking an instagram archive zip and transforming it into bepress ingestible files with metadata in colab

e3la/i2dc

instagram2digitalcommons

instagram2digitalcommons is a collection of vibe-coded tools for taking an Instagram archive zip and transforming it into something closer to bepress-ingestible files with metadata. The avant-garde technique of vibe coding is being put to use in part because I know enough Python to be dangerous and I'm curious whether I can make useful things from AI prompts and plenty of testing.

📺 Watch the Introduction to i2dc on youtube

📸 instagram2digitalcommons

instagram2digitalcommons is a collection of hopefully librarian-friendly tools for transforming an Instagram archive .zip into structured media files and metadata packages that you can upload to your institutional repository with the Digital Commons batch upload. It all runs online with no local installation required.


🚀 Features

  • ✅ Upload and parse Instagram .zip archives
  • ✅ Extract images, videos, captions, dates, and metadata
  • ✅ Categorize content into Reels, Posts, and Stories
  • ✅ Download a set of zip files with:
    • Clean, well-labeled media files
    • Metadata spreadsheets (.xlsx)
    • README files for each set
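
To give a feel for the categorizing step, here is a minimal sketch that sorts the media paths inside an archive into Reels, Posts, and Stories. It is not the notebook's actual code. The folder names (`media/posts/`, `media/reels/`, `media/stories/`), the extension list, and the helper names `categorize_media` and `_demo_archive` are all assumptions made for illustration, and your own export may be laid out differently.

```python
import io
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption: media sits under folders named posts, reels, and stories.
CATEGORY_FOLDERS = {"posts": "Posts", "reels": "Reels", "stories": "Stories"}
# Assumption: these extensions cover the images and videos in the export.
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".heic", ".mp4", ".mov"}


def categorize_media(archive):
    """Group media file paths in an archive zip by category name."""
    groups = defaultdict(list)
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            path = PurePosixPath(name)
            if path.suffix.lower() not in MEDIA_EXTENSIONS:
                continue  # skip JSON, HTML, and folder entries
            # Use the first parent folder that matches a known category.
            for part in path.parts[:-1]:
                if part.lower() in CATEGORY_FOLDERS:
                    groups[CATEGORY_FOLDERS[part.lower()]].append(name)
                    break
    return dict(groups)


def _demo_archive():
    """Build a tiny fake archive in memory so the sketch runs anywhere."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("media/posts/202401/photo1.jpg", b"fake image")
        zf.writestr("media/reels/202402/clip1.mp4", b"fake video")
        zf.writestr("media/stories/202403/story1.jpg", b"fake image")
        zf.writestr("your_instagram_activity/content/posts_1.json", "[]")
    buffer.seek(0)
    return buffer


demo_groups = categorize_media(_demo_archive())
print(demo_groups)
```

Matching on folder names keeps the sketch simple. Files that sit outside the three known folders are ignored rather than guessed at.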

🚀 Bonus Features

  • ✅ Upload your batch zips for Reels and Posts
  • ✅ Examine images and videos
  • ✅ Add subtitles and alt text generated by AI
  • ✅ Download a new set of zip files with:
    • Clean, well-labeled media files
    • Metadata spreadsheets (.xlsx)

🔍 Use Case

Collect and archive social media content for preservation and research. Instagram archives include valuable materials (e.g. event photos, exhibition documentation, institutional campaigns). This tool supports:

  • Special Collections digitization projects
  • Campus or departmental documentation
  • Faculty digital scholarship
  • Community history initiatives

📁 Input Format

Upload the official Instagram archive .zip file that you receive when you request your data from Instagram. The Instagram account admin can request the archive by visiting https://www.instagram.com/download/request.
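
If you are curious what the caption and date data inside the archive might look like, here is a small sketch that turns a posts JSON file into rows of file, date, and caption. Everything about the layout is an assumption on my part: the field names `media`, `uri`, `creation_timestamp`, and `title` are what I would expect to see, not a documented format, so check them against your own export. The `fix_text` helper is also hypothetical. It tries to undo the garbled accents that can appear when UTF-8 text is stored as Latin-1 characters.

```python
import json
from datetime import datetime, timezone


def fix_text(text):
    """Try to repair UTF-8 text that was stored as Latin-1 characters."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not garbled in that way, so leave it alone


def rows_from_posts(posts):
    """Flatten a list of post records into one row per media file."""
    rows = []
    for post in posts:
        post_caption = post.get("title", "")
        for item in post.get("media", []):
            stamp = item.get("creation_timestamp")
            date = ""
            if stamp is not None:
                # Assumption: timestamps are seconds since 1970 in UTC.
                moment = datetime.fromtimestamp(stamp, tz=timezone.utc)
                date = moment.strftime("%Y-%m-%d")
            rows.append({
                "file": item.get("uri", ""),
                "date": date,
                # Fall back to the post-level caption if the item has none.
                "caption": fix_text(item.get("title") or post_caption),
            })
    return rows


# A made-up sample record, not copied from a real export.
sample_json = r"""
[
  {
    "media": [
      {
        "uri": "media/posts/202401/photo1.jpg",
        "creation_timestamp": 1704067200,
        "title": "Caf\u00c3\u00a9 opening night"
      }
    ]
  }
]
"""

demo_rows = rows_from_posts(json.loads(sample_json))
print(demo_rows)
```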


📦 Output

Each category (Reels, Posts, Stories) will be exported as zip files.
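
As a rough picture of what one exported package could contain, the sketch below bundles media files, a metadata table, and a README into a single zip. It is an illustration under assumptions, not the notebook's code: the real tool writes .xlsx spreadsheets, and I have swapped in a CSV here so the example needs nothing beyond the Python standard library. The function name `build_category_zip` and the column names `filename`, `title`, and `publication_date` are hypothetical and are not the Digital Commons batch schema.

```python
import csv
import io
import zipfile

# Hypothetical column names, chosen only for this illustration.
COLUMNS = ["filename", "title", "publication_date"]


def build_category_zip(category, files, rows):
    """Return the bytes of a zip with media, metadata.csv, and README.txt.

    files maps a file name to its bytes; rows is a list of dicts
    whose keys match COLUMNS.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(f"{category}/{name}", data)

        # Write the metadata table (CSV here, .xlsx in the real tool).
        table = io.StringIO()
        writer = csv.DictWriter(table, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
        zf.writestr(f"{category}/metadata.csv", table.getvalue())

        readme = f"{category} export: {len(files)} media file(s).\n"
        zf.writestr(f"{category}/README.txt", readme)
    return buffer.getvalue()


demo_zip = build_category_zip(
    "Posts",
    {"2024-01-01_post_001.jpg": b"fake image bytes"},
    [{
        "filename": "2024-01-01_post_001.jpg",
        "title": "Opening night",
        "publication_date": "2024-01-01",
    }],
)
with zipfile.ZipFile(io.BytesIO(demo_zip)) as zf:
    demo_names = sorted(zf.namelist())
print(demo_names)
```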


🧑‍💻 How to Run i2dc (in Google Colab)

You will need a Google account to run this from the link, but if you have another way to run a Jupyter notebook, feel free to do that.

Click the button below to launch the notebook:

First Pass - Open in Colab

  1. Upload your Instagram archive .zip when prompted (either directly or from a Google Drive folder named /i2dc)
  2. Configure your preferences and metadata
  3. Download your zips directly or into your Google Drive /i2dc folder
  4. Extract your zips and explore your media and metadata in a new, more familiar format!
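
For anyone running the notebook somewhere other than Colab, the Drive folder step can be pictured as a simple folder check. This is a sketch under assumptions, not the notebook's code. The path `/content/drive/MyDrive/i2dc` assumes Drive has been mounted at `/content/drive` in Colab, and `find_archives` is a hypothetical helper name.

```python
import tempfile
from pathlib import Path

# Assumption: Drive is mounted at /content/drive in Colab, so the
# /i2dc folder in My Drive would show up at this path.
DRIVE_FOLDER = Path("/content/drive/MyDrive/i2dc")


def find_archives(folder=DRIVE_FOLDER):
    """Return the .zip files in a folder, sorted by name.

    An empty list means the folder is missing or holds no zips,
    which is the cue to fall back to a direct upload instead.
    """
    folder = Path(folder)
    if not folder.is_dir():
        return []
    zips = [
        path for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() == ".zip"
    ]
    return sorted(zips, key=lambda path: path.name.lower())


# Demo with a temporary folder so the sketch runs outside Colab too.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "instagram-archive.zip").write_bytes(b"")
    (Path(tmp) / "notes.txt").write_text("not an archive")
    demo_found = [path.name for path in find_archives(tmp)]
print(demo_found)
```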

Advanced Review Tools

For Reels: Reels Metadata review - Open in Colab. Upload the Reels zips you made in the core i2dc tool, review them and their metadata (including subtitles), and generate new subtitles with Whisper AI. If you run out of GPU, there is a slower CPU-only backup version.
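
Here is a minimal sketch of how Whisper output can become a subtitle file, in case you want to see the idea outside the notebook. It is not the review tool's actual code. It assumes the open-source `openai-whisper` package, whose `transcribe` result includes a list of segments with `start`, `end`, and `text`. The helper names `format_timestamp`, `segments_to_srt`, and `transcribe_to_srt` are hypothetical. The Whisper import is kept inside a function so the rest of the sketch runs without it installed.

```python
def format_timestamp(seconds):
    """Turn seconds into the HH:MM:SS,mmm form used by .srt files."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Build SRT text from segments that have start, end, and text."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        text = segment["text"].strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


def transcribe_to_srt(video_path, model_name="base"):
    """Transcribe one video with Whisper and return SRT text.

    Assumption: openai-whisper and torch are installed. The GPU is
    used when available, with the slower CPU as the fallback.
    """
    import torch
    import whisper

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = whisper.load_model(model_name, device=device)
    result = model.transcribe(str(video_path))
    return segments_to_srt(result["segments"])


# Made-up segments so the formatting can be tried without a video.
demo_segments = [
    {"start": 0.0, "end": 2.5, "text": " Welcome to the library."},
    {"start": 2.5, "end": 5.0, "text": " Today we tour the archive."},
]
demo_srt = segments_to_srt(demo_segments)
print(demo_srt)
```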

For Posts: A Google AI Studio app for generating alt text for your posts. Upload the Posts zips you made in the First Pass (this is why I recommend 20 at a time), review them, and add AI descriptions.
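
The "20 at a time" advice amounts to splitting your posts into small batches. A minimal sketch of that idea, with the hypothetical helper name `batches` and made-up file names, might look like this:

```python
def batches(items, size=20):
    """Yield consecutive slices of items, each at most size long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Made-up file names standing in for 45 exported posts.
demo_posts = [f"post_{number:03d}.jpg" for number in range(1, 46)]
demo_sizes = [len(batch) for batch in batches(demo_posts)]
print(demo_sizes)
```

With 45 posts this gives two full batches and one smaller leftover batch.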


📚 Dependencies

You'll need a Google account to run both Google Colab and Google AI Studio. You should be able to run any of the Colab .ipynb files without connecting them to your Google Drive, but it is all faster if you do. This tool was designed to run online because so many librarians can't install this sort of Python toolkit on their computers.


✨ Future Ideas

  • Custom field mapping for specific IR schemas
  • Alt text generation using AI
  • OCR of embedded text in images

👩‍🏫 Author

Created by Helena Marvin
AI-assisted by ChatGPT & Gemini


📚 Further Resources and Many Thanks to These Learning Resources

I watched the first half of this video about vibe coding and it helped me better conceptualize how to get what I wanted done: https://www.youtube.com/watch?v=iLCDSY2XX7E by Tina Huang. I also enjoyed her 20-minute summary of Google's 9-hour AI prompt engineering course: https://www.youtube.com/watch?v=p09yRj47kNM.

I have learned a lot from https://www.freecodecamp.org/ and would have struggled a lot more with this project if I hadn't done a data analysis bootcamp at LaunchCode (https://www.launchcode.org/) in St. Louis, Missouri.

If you have access to the O'Reilly database it is a treasure of tech information that can be useful to you.

I haven't used it, but if you want Whisper transcripts and you're on Windows, try out https://nolongerset.com/whisper-desktop/.


My Workflow 💡➡️🤖➡️🧪➡️🔄

My most effective workflow began with figuring out the high-level vision for this project, a process I learned from Tina Huang's "vibe coding" video. I then used that vision as a prompt to Gemini in Google AI Studio, with the specific request to generate the individual cells of a Colab notebook that would carry it out. I copied the cells from the Gemini chat in AI Studio and tested them directly in Colab. For any minor issues or small bugs, I used the integrated Gemini assistant inside Colab for quick fixes.

After those fixes, when I want to make more substantial changes, such as adding new features or changing the workflow, I download the entire Colab notebook as a .py file. I then start a new chat session with Gemini and upload that file. This gives Gemini the full context of the code, so I can ask specific questions like, "Which cell or cells should I modify to add this new functionality?" This iterative process of downloading, uploading, and making targeted changes has proven to be a somewhat efficient way to build and refine this tool.

I wrote the above by tossing this word vomit into a chat with Gemini: "The workflow that I found most useful to me was, get a high level overview of what I wanted (as described in Tina Huang's vibe coding video) and ask gemini in google ai studio to write cells of a colab that would do that. Test the colab, fix tiny errors with gemini inside the colab, and download the entire colab and go upload it back into a new chat in google and ask it which cell I needed to change to add or fix any part of the tool." I don't always word so good, so being able to edit a prompt after I get an output is really helpful. That is part of what I enjoy about the AI Studio interface, along with being able to see my token count so I know when to shift to a new chat.


📄 License

CC0, because I believe AI-generated code should be in the public domain.
