Speech2Model

Welcome to the Speech2Model project! This program converts spoken descriptions into detailed 3D models using the Meshy.ai API and a custom speech-to-text pipeline.

YouTube Video Demo

Overview

The application enables you to:

Transcribe your speech using a microphone.
Generate detailed 3D modeling prompts with the help of a language model.
Create and download 3D models directly from Meshy.ai.
View the generated models immediately.

Features

Live Transcription: Use your microphone to describe the 3D model.
Intelligent Prompt Generation: Automatically enhance vague descriptions into detailed prompts for 3D modeling.
Real-time Feedback: Track model generation as it runs and download the finished 3D model as soon as Meshy.ai reports it ready.

Installation

Prerequisites

Ensure you have the following installed:

Ollama with an installed LLM (used to sanitize and improve the 3D modeling prompt)
Python 3.8 or later
Required Python libraries (install via pip):
```
pip install speechrecognition requests
```

Clone the Repository

git clone https://github.com/jsammarco/Speech2Model.git
cd Speech2Model

Setup

Obtain your Meshy.ai API key.
Replace YOUR MESHY.AI KEY in the code with your actual API key.

Usage

Running the Application

Run the script to start the live transcription:

python speech2model.py

How It Works

Describe Your Model: Speak into your microphone to describe the 3D model.
Initiate Model Creation: Say "Create Model" when you're done describing.
Prompt Generation: The program generates a detailed modeling prompt.
Model Download: The 3D model is created and saved as a .glb file.
View the Model: The model opens automatically with your system's default viewer.
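The prompt generation step can be sketched as a single request to a local Ollama server. This is a minimal illustration, not the code in speech2model.py: the model name `llama3`, the instruction wording, and the helper names `build_ollama_payload` and `enhance_prompt` are assumptions, and the standard library is used in place of `requests` only to keep the sketch self-contained.

```python
import json
import urllib.request

# Assumed defaults: Ollama's local REST endpoint and a placeholder model name.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_NAME = "llama3"  # hypothetical; use whichever model you have pulled


def build_ollama_payload(description, model=MODEL_NAME):
    """Wrap a spoken description in an instruction asking for a 3D modeling prompt."""
    instruction = (
        "Rewrite the following spoken description as one concise, detailed "
        "prompt for a text-to-3D generator. Reply with the prompt only.\n\n"
        f"Description: {description.strip()}"
    )
    # stream=False asks Ollama for one JSON object instead of a token stream.
    return {"model": model, "prompt": instruction, "stream": False}


def enhance_prompt(description, url=OLLAMA_URL, timeout=60):
    """Send the description to Ollama and return the improved prompt text."""
    data = json.dumps(build_ollama_payload(description)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # Non-streaming replies carry the generated text in the "response" field.
    return body["response"].strip()
```

Calling `enhance_prompt("a futuristic car with neon lights")` requires Ollama to be running locally with the chosen model already pulled.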

Example Workflow

Say: "A futuristic car with sleek design and neon lights."
Say: "Create Model."
The program processes your input and generates the model.
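The workflow above implies a loop that keeps collecting transcribed phrases until the trigger phrase is heard. A rough sketch of that logic, independent of any particular speech-to-text engine, might look like the following; `normalize` and `collect_description` are hypothetical names, and how the project actually matches the trigger is not stated in this README.

```python
import re

TRIGGER = "create model"  # the spoken command from the workflow above


def normalize(text):
    """Lowercase, drop punctuation, and collapse spaces: 'Create Model.' -> 'create model'."""
    return " ".join(re.sub(r"[^a-z0-9\s]", "", text.lower()).split())


def collect_description(phrases, trigger=TRIGGER):
    """Join transcribed phrases until the trigger is heard.

    Returns (description, triggered), where triggered is False if the
    phrases ran out before the trigger was spoken.
    """
    parts = []
    for phrase in phrases:
        if trigger in normalize(phrase):
            return " ".join(parts), True
        if phrase.strip():
            parts.append(phrase.strip())
    return " ".join(parts), False
```

Because `phrases` can be any iterable, it could be fed from a generator that yields each result from the microphone transcriber as it arrives.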

File Structure

speech2model.py: Main program file.
*.glb: Generated 3D model files, saved to the path set in output_file.
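Once a .glb file is on disk, opening it with the system's default viewer can be sketched as below. This is one common cross-platform approach and an assumption about how the step might be done, not a description of speech2model.py; `viewer_command` and `open_model` are hypothetical helper names.

```python
import os
import subprocess
import sys


def viewer_command(path, platform=sys.platform):
    """Return the command that opens `path` with the default app, or None on Windows."""
    if platform.startswith("win"):
        return None  # Windows uses os.startfile instead of a command
    if platform == "darwin":
        return ["open", path]  # macOS
    return ["xdg-open", path]  # most Linux desktops


def open_model(path):
    """Open a generated model file with whatever viewer the OS associates with it."""
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    command = viewer_command(path)
    if command is None:
        os.startfile(path)  # available on Windows only
    else:
        subprocess.run(command, check=False)
```

If no application is registered for .glb files, the operating system decides what happens, so installing a 3D viewer beforehand may be necessary.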

Configuration

Modify the following parameters as needed:

Meshy.ai API key: Set the api_key variable to your own key.
Polling Interval: Adjust poll_interval for checking model status.
Default Output File: Change output_file to specify where models are saved.
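To show what poll_interval controls, here is a minimal polling loop under stated assumptions. The `fetch_status` callable is hypothetical (in practice it would wrap a status request to Meshy.ai), and the status names `SUCCEEDED`, `FAILED`, and `CANCELED` are assumptions that should be checked against the Meshy.ai API documentation.

```python
import time


def wait_for_model(fetch_status, poll_interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call fetch_status() every poll_interval seconds until the task finishes.

    fetch_status is a hypothetical callable returning a dict with a "status"
    key; the status names checked below are assumptions.
    """
    waited = 0.0
    while True:
        task = fetch_status()
        status = task.get("status")
        if status == "SUCCEEDED":
            return task  # final payload, expected to include the download link
        if status in ("FAILED", "CANCELED"):
            raise RuntimeError(f"Model generation ended with status {status}")
        if waited >= timeout:
            raise TimeoutError(f"No result after {waited:.0f} seconds")
        sleep(poll_interval)
        waited += poll_interval
```

A shorter poll_interval notices a finished model sooner but sends more requests; a longer one is gentler on the API at the cost of some delay.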

Troubleshooting

Ensure your microphone is working and permissions are granted.
Verify your Meshy.ai API key is valid.
Check your internet connection for API requests.

Contributing

Feel free to fork this repository and submit pull requests to contribute improvements.

License

This project is licensed under the MIT License.

Contact

For questions or issues, please reach out via the GitHub repository: https://github.com/jsammarco/Speech2Model
