Ranger-mini: An Open Model for Agentic Tool-Use Evaluation


ranger-mini is a fine-tuned sequence-classification model that inspects tool calls and returns a single, precise label either describing the error or confirming the call is valid. It is optimized for low latency and high accuracy.



What Ranger-mini does

ranger-mini evaluates whether a proposed function call:

  • Chooses the correct tool for the user intent,
  • Uses the exact parameter names required by the tool schema,
  • Supplies correctly formatted and accurate parameter values.

It returns one of four labels:

  • VALID_CALL: The tool name, parameters, and values are correct, or no tool is needed.
  • TOOL_ERROR: The tool name is missing or does not match the user intent.
  • PARAM_NAME_ERROR: Parameter names are missing, extra, or mismatched.
  • PARAM_VALUE_ERROR: Parameter names match, but values are incorrect or malformed.
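To make the taxonomy concrete, here is a minimal sketch of what the first three labels mean as a plain schema check. This is not the model: `classify_call` and its inputs are illustrative, and PARAM_VALUE_ERROR is deliberately out of scope here because judging whether a value is *accurate* (not just well-formed) requires the kind of semantic judgment ranger-mini is trained for.

```python
# Illustrative schema check -- NOT ranger-mini. It only shows what the
# TOOL_ERROR / PARAM_NAME_ERROR / VALID_CALL distinctions refer to.

def classify_call(call: dict, tools: dict) -> str:
    schema = tools.get(call.get("name"))
    if schema is None:
        return "TOOL_ERROR"        # tool name missing or unknown
    props = set(schema["properties"])
    required = set(schema.get("required", []))
    args = set(call.get("arguments", {}))
    if (args - props) or (required - args):
        return "PARAM_NAME_ERROR"  # extra or missing parameter names
    return "VALID_CALL"            # names line up; values not checked here

tools = {
    "send-email": {
        "properties": {"to": {}, "subject": {}, "content": {}, "scheduledAt": {}},
        "required": ["to", "subject", "content"],
    }
}

ok_call = {"name": "send-email",
           "arguments": {"to": "a@b.c", "subject": "Hi", "content": "..."}}
bad_call = {"name": "send-email",
            "arguments": {"subject": "Hi", "content": "..."}}

print(classify_call(ok_call, tools))   # VALID_CALL
print(classify_call(bad_call, tools))  # PARAM_NAME_ERROR (required "to" missing)
```

Note that a static check like this cannot tell a well-formed but wrong value from a correct one; that semantic gap is exactly what the PARAM_VALUE_ERROR label covers.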

Benchmarks

| Model | #Params | Avg. Latency | Avg. Binary Accuracy | Qualifire TSQ Benchmark Binary Accuracy | Limbic Benchmark Binary Accuracy |
|---|---|---|---|---|---|
| qualifire/mcp-tool-use-quality-ranger-4b (private) | 4B | 0.30 s | 0.978 | 0.997 | 0.960 |
| qualifire/mcp-tool-use-quality-ranger-0.6b | 0.6B | 0.09 s | 0.958 | 0.993 | 0.924 |
| gemini-2.5-flash | – | 4.87 s | 0.890 | 0.936 | 0.845 |
| quotientai/limbic-tool-use-0.5B-32K | 0.5B | 0.79 s | 0.807 | 0.752 | 0.862 |

Key Insight: Ranger-mini delivers near state-of-the-art tool-use accuracy at a fraction of the latency and model size, making it practical for production guardrails.



Usage

from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
import torch
from huggingface_hub import hf_hub_download

# Model name
model_name = "qualifire/mcp-tool-use-quality-ranger-0.6b"

# Map raw labels to human-readable labels
map_id_to_label = {
    'LABEL_0': 'VALID_CALL',
    'LABEL_1': 'TOOL_ERROR',
    'LABEL_2': 'PARAM_NAME_ERROR',
    'LABEL_3': 'PARAM_VALUE_ERROR'
}

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map='auto',
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create pipeline
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Load prompt template
file_path = hf_hub_download(repo_id=model_name, filename="tsq_prompt_template.txt")
with open(file_path, encoding="utf-8") as f:
    PROMPT_TEMPLATE = f.read()

# Example inputs
example_tools_list = '''[
  {
    "type": "function",
    "function": {
      "name": "send-email",
      "description": "Send an email using Resend",
      "parameters": {
        "properties": {
          "to": {
            "type": "string",
            "format": "email",
            "description": "Recipient email address"
          },
          "content": {
            "type": "string",
            "description": "Plain text email content"
          },
          "subject": {
            "type": "string",
            "description": "Email subject line"
          },
          "scheduledAt": {
            "type": "string",
            "description": "Optional parameter to schedule the email. This uses natural language. Examples would be 'tomorrow at 10am' or 'in 2 hours' or 'next day at 9am PST' or 'Friday at 3pm ET'."
          }
        },
        "required": ["to", "subject", "content"]
      }
    }
  }
]'''

example_message_history = '''[
  {
    "role": "user",
    "content": "Please send an email to 'jane.doe@example.com' with the subject 'Meeting Follow-Up'. The content should be 'Hi Jane, just following up on our meeting from yesterday. Please find the attached notes.' and schedule it for tomorrow at 10am."
  },
  {
    "completion_message": {
      "content": {
        "type": "text",
        "text": ""
      },
      "role": "assistant",
      "stop_reason": "tool_calls",
      "tool_calls": [
        {
          "id": "call_le25efmhltxx9o7n4rfe",
          "function": {
            "name": "send-email",
            "arguments": {
              "subject": "Meeting Follow-Up",
              "content": "Hi Jane, just following up on our meeting from yesterday. Please find the attached notes.",
              "scheduledAt": "tomorrow at 10am"
            }
          }
        }
      ]
    }
  }
]'''

# Format input
example_input = PROMPT_TEMPLATE.format(
    message_history=example_message_history,
    available_tools=example_tools_list
)

# Get prediction
result = pipe(example_input)[0]
result['label'] = map_id_to_label[result['label']]
print(result)
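In a production guardrail, the classifier's label and score are typically turned into an allow/block decision before the tool call is executed. A minimal sketch of such a policy follows; `guard_tool_call`, `BLOCKING_LABELS`, and the 0.9 threshold are illustrative choices, not part of the model or any library.

```python
# Hypothetical guardrail policy on top of the classifier's output.
# The threshold value and escalation behavior are assumptions.

BLOCKING_LABELS = {"TOOL_ERROR", "PARAM_NAME_ERROR", "PARAM_VALUE_ERROR"}

def guard_tool_call(result: dict, threshold: float = 0.9) -> bool:
    """Return True if the tool call may proceed, False if it should be blocked."""
    label, score = result["label"], result["score"]
    if label == "VALID_CALL":
        return True
    # Block confident error predictions; a low-confidence error prediction
    # could instead be escalated to a human reviewer or a retry loop.
    return score < threshold

print(guard_tool_call({"label": "VALID_CALL", "score": 0.99}))         # True
print(guard_tool_call({"label": "PARAM_NAME_ERROR", "score": 0.9999})) # False
```

Because the model's average latency is well under a tenth of a second, a check like this can sit inline on every tool call without noticeably slowing the agent loop.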





✨ Example Output

{'label': 'PARAM_NAME_ERROR', 'score': 0.9999843835830688}

Here the assistant's tool call omits the required "to" parameter from the send-email schema, so ranger-mini correctly flags a parameter-name error.


