This is a collection of vibe-coded tools for taking an Instagram archive .zip and transforming it into something closer to bepress-ingestible files with metadata. The avant-garde technique of vibe coding is being put to use in part because I know enough Python to be dangerous, and I'm curious whether I can make useful things from AI prompts and plenty of testing.
instagram2digitalcommons is a collection of hopefully librarian-friendly tools for transforming an Instagram archive .zip into structured media files plus metadata packages for upload to your institutional repository (via the Digital Commons batch upload). It all runs online, with no local installation required.
- ✅ Upload and parse Instagram .zip archives
- ✅ Extract images, videos, captions, dates, and metadata
- ✅ Categorize content into Reels, Posts, and Stories
- ✅ Download a set of zip files containing:
- Clean, well-labeled media files
- Metadata spreadsheets (.xlsx)
- README files for each set
- ✅ Upload your batch zips for Reels and Posts
- ✅ Examine images and videos
- ✅ Add subtitles and alt text generated by AI
- ✅ Download a new set of zip files containing:
- Clean, well-labeled media files
- Metadata spreadsheets (.xlsx)
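Under the hood, parsing an Instagram archive mostly means reading the JSON files Meta ships inside the .zip. Here is a minimal sketch of that step; the path `your_instagram_activity/content/posts_1.json` and the field names are assumptions based on recent archive exports, not guarantees, so check your own archive's layout:

```python
import io
import json
import zipfile
from datetime import datetime, timezone

def extract_posts(archive_bytes):
    """Pull captions, dates, and file paths from a posts JSON
    inside an Instagram archive.

    Assumes the export layout used by recent Meta archives; the
    JSON path and field names may differ between export versions.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        # Hypothetical path -- inspect your own archive to confirm.
        with zf.open("your_instagram_activity/content/posts_1.json") as f:
            data = json.load(f)
    posts = []
    for post in data:
        media = post.get("media", [])
        first = media[0] if media else {}
        posts.append({
            "caption": post.get("title") or first.get("title", ""),
            "date": datetime.fromtimestamp(
                first.get("creation_timestamp", 0), tz=timezone.utc
            ).isoformat(),
            "files": [m.get("uri", "") for m in media],
        })
    return posts
```

Captions in real exports are often mojibake-encoded (UTF-8 bytes read as Latin-1), so a production version would also need an encoding fix-up pass.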
Collect and archive social media content for preservation and research. Instagram archives include valuable materials (e.g. event photos, exhibition documentation, institutional campaigns). This tool supports:
- Special Collections digitization projects
- Campus or departmental documentation
- Faculty digital scholarship
- Community history initiatives
Upload an official Instagram archive .zip file that you receive when you request your data from Instagram.
The Instagram account admin can request their archive by visiting https://www.instagram.com/download/request.
Each category (Reels, Posts, Stories) will be exported as separate zip files.
You will need a Google account to run this from the link, but if you have another way to run a Jupyter notebook, feel free to do that.
Click the button below to launch the notebook:
- Upload your Instagram archive .zip when prompted (either directly or via Google Drive, saved to a folder named /i2dc)
- Configure your preferences and metadata
- Download your zips directly or into your Google Drive /i2dc folder
- Extract your zips and explore your media and metadata in a new, more familiar format!
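Each downloaded set zip follows the same simple pattern: cleanly named media files, a metadata sheet, and a README. A rough stdlib-only sketch of that bundling step (file names are illustrative, and the CSV here is a stand-in for the .xlsx spreadsheet the actual tool writes):

```python
import csv
import io
import zipfile

def build_set_zip(media, metadata_rows, readme_text):
    """Bundle media bytes, a metadata table, and a README into one zip.

    media: dict mapping clean file names to raw bytes.
    metadata_rows: list of dicts sharing the same keys, one per item.
    The table is written as CSV for simplicity; the real tool
    produces an .xlsx spreadsheet instead.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in media.items():
            zf.writestr(name, data)
        sheet = io.StringIO()
        writer = csv.DictWriter(sheet, fieldnames=list(metadata_rows[0]))
        writer.writeheader()
        writer.writerows(metadata_rows)
        zf.writestr("metadata.csv", sheet.getvalue())
        zf.writestr("README.txt", readme_text)
    return buf.getvalue()
```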
For Reels:
Upload the Reels zips you made in the core i2dc tool, review them and their metadata (including subtitles), and generate new subtitles with Whisper AI using this tool. If you run out of GPU, there is a slower, CPU-only backup version.
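Whisper returns transcript segments with start and end times, and turning those into an .srt subtitle file is a small formatting step. A sketch assuming the segment shape the openai-whisper Python package returns (a list of dicts with `start`, `end`, and `text` keys):

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments):
    """Convert Whisper-style transcript segments into SRT subtitle text."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> "
            f"{srt_timestamp(seg['end'])}\n{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

With the real library this would be fed from `whisper.load_model(...).transcribe(path)["segments"]`; the helper itself is pure Python, so it works the same on the GPU and CPU-only versions.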
For Posts: A Google AI Studio app for generating alt text for your posts. Upload the Posts zips you made in the First Pass (this is why I recommend 20 at a time), review them, and add AI descriptions.
You'll need a Google account to run both Google Colab and Google AI Studio. You should be able to run any of the Colab .ipynb files without connecting to your Google Drive, but everything is faster if you do. This tool was designed because so many librarians can't install this sort of Python toolkit on their computers.
- Custom field mapping for specific IR schemas
- Alt text generation using AI
- OCR of embedded text in images
Created by Helena Marvin
AI-assisted by ChatGPT & Gemini
- Prompt and conversation for developing this README with ChatGPT: https://chatgpt.com/share/6830b6b7-a65c-800f-9391-53d0c394587e
- Prompts for Gemini in the prompts zip: https://github.com/e3la/i2dc/blob/main/docs/prompts-07112025-google-ai-studio-smaller.zip
I watched the first half of this video about vibe coding by Tina Huang, and it helped me better conceptualize how to get what I wanted done: https://www.youtube.com/watch?v=iLCDSY2XX7E. I also enjoyed her 20-minute summary of Google's 9-hour AI prompt engineering course: https://www.youtube.com/watch?v=p09yRj47kNM.
I have learned a lot from https://www.freecodecamp.org/ and would have struggled a lot more with this project if I hadn't done a data analysis bootcamp at LaunchCode (https://www.launchcode.org/) in St. Louis, Missouri.
If you have access to the O'Reilly database, it is a treasure trove of tech information that can be useful to you.
I haven't used it, but if you want Whisper transcripts and you're on Windows, try out https://nolongerset.com/whisper-desktop/.
My most effective workflow began with figuring out the high-level vision for this project, a process I learned from Tina Huang's "vibe coding" video. I then used that as a prompt to Gemini in Google AI Studio, with the specific request to generate the individual cells of a Colab notebook to execute that vision. I copied code from the chat with Gemini in AI Studio and tested it directly in Colab. For any minor issues or small bugs, I used the integrated Gemini assistant inside Colab for quick fixes.
To make more substantial changes after those fixes, such as adding new features or changing the workflow, I download the entire Colab notebook as a .py file. I then start a new chat session with Gemini and upload that file. This gives Gemini the full context of the code, allowing me to ask specific questions like, "Which cell or cells should I modify to add this new functionality?" This iterative process of downloading, uploading, and making targeted changes has proven to be a somewhat efficient way to build and refine this tool.
I wrote the above by tossing this word vomit into a chat with Gemini: "The workflow that I found most useful to me was, get a high level overview of what I wanted (as described in Tina Huang's vibe coding video) and ask gemini in google ai studio to write cells of a colab that would do that. Test the colab, fix tiny errors with gemini inside the colab, and download the entire colab and go upload it back into a new chat in google and ask it which cell I needed to change to add or fix any part of the tool." I don't always word so good, and that means being able to edit a prompt after I get an output is really helpful and part of what I enjoy about the AI Studio interface: that, and being able to see my token count so I know when to shift to a new chat.
CC0, because I believe AI-generated code should be in the public domain.
