Images are not recognized #34

Hi, and thanks for your work on this project.

In practice I want to send images, but they are not recognized in either Open WebUI or Python code (I tried both URL and base64). The demo file is below:
```
"""
Caption an image by calling a vision model through the OpenAI protocol.
Supports OpenAI GPT-4V or any model compatible with the OpenAI protocol (e.g. Qwen-VL, LLaVA).
"""

import base64
from typing import Optional

import requests
from openai import OpenAI


def encode_image_from_url(image_url: str) -> str:
    """Download an image from a URL and encode it as base64."""
    response = requests.get(image_url, timeout=30)
    response.raise_for_status()
    return base64.b64encode(response.content).decode("utf-8")


def encode_image_from_file(image_path: str) -> str:
    """Read an image from a local file and encode it as base64."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


def caption_image(
    image_source: str,
    prompt: str = "Please describe the content of this image in detail.",
    model: str = "gpt-4o",
    base_url: Optional[str] = None,
    api_key: Optional[str] = None,
    max_tokens: int = 1024,
) -> str:
    """
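    Caption an image using a vision model.
    
    Args:
        image_source: Image source, either a URL or a local file path
        prompt: Prompt that guides the model's description
        model: Model name, defaults to gpt-4o
        base_url: API base URL, for other services compatible with the OpenAI protocol
        api_key: API key; if None, it is read from the OPENAI_API_KEY environment variable
        max_tokens: Maximum number of tokens to generate
        
    Returns:
        The image description generated by the model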
    """
    # Initialize the client
    client = OpenAI(
        api_key=api_key,
        base_url=base_url,
    )
    
    # Determine the image source type and handle it accordingly
    if image_source.startswith(("http://", "https://")):
        # Download from the URL and encode as base64
        base64_image = encode_image_from_url(image_source)
        # Infer the MIME type from the URL
        url_lower = image_source.lower()
        if url_lower.endswith(".png"):
            mime_type = "image/png"
        elif url_lower.endswith(".gif"):
            mime_type = "image/gif"
        elif url_lower.endswith(".webp"):
            mime_type = "image/webp"
        else:
            mime_type = "image/jpeg"
    else:
        # Local file: encode as base64
        base64_image = encode_image_from_file(image_source)
        # Infer the MIME type from the file extension
        if image_source.lower().endswith(".png"):
            mime_type = "image/png"
        elif image_source.lower().endswith(".gif"):
            mime_type = "image/gif"
        elif image_source.lower().endswith(".webp"):
            mime_type = "image/webp"
        else:
            mime_type = "image/jpeg"
    
    # Build the messages, using the base64-encoded image
    messages = [
        {
            "role": "system",
            "content": "You are a helpful assistant with vision capabilities. You can see and analyze images.",
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:{mime_type};base64,{base64_image}",
                    },
                },
                {
                    "type": "text",
                    "text": prompt,
                },
            ],
        }
    ]
    
    # Call the API
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        max_tokens=max_tokens,
        stream=False,
    )
    
    return response.choices[0].message.content


def main():
    """Main function: demonstrate image captioning."""
    # Example image URL
    demo_image_url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
    
    print("=" * 60)
    print("Image captioning example")
    print("=" * 60)
    print(f"Image URL: {demo_image_url}")
    print("-" * 60)
    
    try:
        caption = caption_image(
            image_source=demo_image_url,
            prompt="Please describe the content of this image in detail, including the scene, people, objects, and atmosphere.",
            model="gemini-3-pro-preview",
            base_url="http://192.168.8.184:3000/v1",
            api_key="api_key",
        )
        
        print("Image description:")
        print("-" * 60)
        print(caption)
        print("=" * 60)
        
    except Exception as e:
        print(f"Error: {e}")
        print("\nHints:")
        print("1. Make sure the OPENAI_API_KEY environment variable is set")
        print("2. Or pass the api_key parameter directly in code")
        print("3. If you are using a local service, set the correct base_url")


if __name__ == "__main__":
    main()

```
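The demo infers the MIME type from the file extension, which falls back to `image/jpeg` for URLs that carry a query string or have no extension. One thing that could be ruled out is a mismatched MIME type in the data URL. A minimal sketch, assuming the downloaded bytes are available; `sniff_image_mime` is a hypothetical helper and is not part of the demo or the project:

```python
def sniff_image_mime(data: bytes, default: str = "image/jpeg") -> str:
    """Guess the image MIME type from the leading magic bytes.

    Hypothetical helper: covers only the four formats the demo handles
    and returns `default` for anything else.
    """
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    # WebP is a RIFF container with "WEBP" at byte offset 8
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    return default
```

This only removes one possible source of error. It does not by itself explain why the model reports that it cannot see the image.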

Vision is fully enabled in the config (`vision_ability = "all"`), but the model recognizes neither URL nor base64 images. The reply is:

```
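I'm sorry, but I can't directly see the image you mentioned. As an AI assistant, I can only process text and code, and I cannot browse or analyze image files.

If you would like my help with a task related to this image (for example, writing the corresponding web page layout, analyzing a data chart, or implementing a visual effect), please try the following:

1.  **Describe the image in detail**: tell me in words what the image contains (for example: "This is a login page with a logo at the top, username and password fields in the middle, and a blue login button at the bottom").
2.  **State your goal**: tell me what functionality or code you want to achieve with this image.

Once you provide a description, I will be happy to write the corresponding code or offer suggestions!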
```
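Since both URL and base64 inputs were tried, a small helper that builds both message variants makes it easier to compare the two requests side by side. A minimal sketch, assuming the same OpenAI chat-completions `image_url` content-part format the demo uses; `build_vision_messages` is a hypothetical name:

```python
import base64
from typing import Optional


def build_vision_messages(
    prompt: str,
    *,
    url: Optional[str] = None,
    image_bytes: Optional[bytes] = None,
    mime_type: str = "image/jpeg",
) -> list:
    """Build chat messages carrying one image, by HTTP URL or as a base64 data URL."""
    # Exactly one of the two image inputs must be given
    if (url is None) == (image_bytes is None):
        raise ValueError("pass exactly one of url or image_bytes")
    if url is None:
        encoded = base64.b64encode(image_bytes).decode("ascii")
        url = f"data:{mime_type};base64,{encoded}"
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": url}},
                {"type": "text", "text": prompt},
            ],
        }
    ]
```

Either result can be passed as `messages` to `client.chat.completions.create(...)`, so the only difference between the two runs is how the image is delivered.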

The config is as follows:
```
# Auth token for sharing; grants Chat endpoint access only (polling is not synchronized with AUTH_TOKEN), no other permissions
share_token = ""

# Whether to enable the slow pool (true/false) (no longer in effect)
slow_pool_enabled = false

# Whether to enable long-context mode (true/false)
long_context_enabled = true

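# Image handling capability
# Allowed values:
# - none or disabled: disable image support
# - base64 or base64-only: support base64-encoded images only
# - all or base64-http: support both base64 and HTTP images
#   Note: enabling HTTP support may expose the server IP (uses the general proxy)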
vision_ability = "all"

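# Usage (quota) check configuration
# Allowed values:
# - none or disabled: disable usage checks
# - default: see the README
# - all or everything: always check usage
# - a comma-separated list of models; defaults are used when empty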
model_usage_checks = "none"

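# Secret used for security verification of the dynamic key feature
# Purpose: verify that key configuration requests come from a legitimate source (prevents malicious additions/modifications)
# Leave empty to disable the dynamic key feature automatically
# Format:
#   1. Raw key mode (recommended):
#      Starts with "hex:", followed by 64 to 128 hexadecimal characters.
#      - Length must be >= 64 characters (i.e. at least 32 bytes of entropy).
#      - 128 characters are recommended to fill the 64-byte key space.
#      Example: dynamic_key_secret="hex:a1b2..."
#   2. Password hash mode:
#      Without the prefix, or with fewer than 64 characters, the SHA256 hash of the string is used as the key.
# Generate with: openssl rand -hex 64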
dynamic_key_secret = ""

# Include web references
web_references_included = true

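# Model data fetch mode
# Allowed values:
# - truncate        - overwrite mode (default): use the newly fetched model list as-is, replacing all existing models
# - append:truncate - smart merge mode:         keep existing models that are not in the new list, and add or update new models
# - append          - pure append mode:         only add new models that do not exist yet; existing models stay unchanged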
raw_model_fetch_mode = "truncate"

# Emulated platform (default {DEFAULT_PLATFORM})
# Allowed values:
# - Windows
# - macOS
# - Linux
emulated_platform = "macOS"

# Cursor client version
cursor_client_version = "2.4.19"
```
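The allowed values of `vision_ability` listed in the config comments can be summarized as a small lookup. This is a minimal sketch of how I read those comments, not the project's actual implementation: `VISION_MODES`, `normalize_vision_ability`, and `image_allowed` are hypothetical names, and treating any `data:` URL as base64 input is an assumption.

```python
# Aliases taken from the config comments; the canonical names are my own choice.
VISION_MODES = {
    "none": "none",
    "disabled": "none",
    "base64": "base64",
    "base64-only": "base64",
    "all": "all",
    "base64-http": "all",
}


def normalize_vision_ability(value: str) -> str:
    """Map a vision_ability value to one of: none, base64, all."""
    if value not in VISION_MODES:
        raise ValueError(f"unknown vision_ability value: {value!r}")
    return VISION_MODES[value]


def image_allowed(vision_ability: str, image_url: str) -> bool:
    """Return True if this kind of image input should be accepted under the setting."""
    mode = normalize_vision_ability(vision_ability)
    if mode == "none":
        return False
    if image_url.startswith("data:"):
        # Assumption: base64 images arrive as data URLs
        return True
    # Plain HTTP(S) URLs need the "all" mode
    return mode == "all"
```

Under this reading, `vision_ability = "all"` should accept both the base64 data URL sent by the demo and a plain HTTP URL, which is why the reply above is unexpected.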
