Testing Ollama Web Search with gpt-oss
Ollama Web Search
The recent addition of web search capabilities represents a significant expansion of Ollama’s platform, moving it beyond a tool purely for running static local Large Language Models (LLMs) to a more dynamic, agent-enabled system.
This functionality addresses the primary limitation of local LLMs: their knowledge is fixed at their training cutoff date. By integrating web search as a core API feature, Ollama allows models to access real-time information from the internet, which dramatically reduces the risk of hallucinations and improves the accuracy and relevance of responses to queries about current events or highly specific data.
Key Web Search Components
Ollama’s web search is not just a simple query; it’s a suite of tools designed for effective information retrieval and integration:
- Web Search API: This is a dedicated REST API endpoint that performs a search for a given query and returns structured results. Each result includes a title, URL, and a content snippet, making it easy for an LLM or a developer to consume and process the information. Ollama provides a generous free tier for individuals to use this service (a minimal usage sketch follows this list).
- Web Fetch API: Complementing the search function, the fetch API allows a model to retrieve the full content of a specific URL. This is crucial for agents that need to dig deeper into a single, relevant search result instead of relying only on a snippet.
- Tool Integration: The web search and fetch functions are fully integrated as tools that modern, instruction-following models (like Llama 3.1, Qwen 3, and select cloud models) can intelligently call themselves via the Function Calling mechanism. This allows developers to build sophisticated search agents that autonomously decide when and how to use the internet to answer a user’s question.
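To make the first two components concrete, here is a minimal sketch of calling the search and fetch endpoints directly from the Python client. It assumes the ollama package is installed and OLLAMA_API_KEY is exported (both are set up in the Test section below); the query string is just an example, and the field names mirror the ones used in the helper code later in this post ⬇️
# web_search_quickcheck.py (illustrative sketch, not part of the original example)
from ollama import Client

client = Client()

# Web Search API: structured results with a title, URL, and content snippet
search = client.web_search("what is Docling", max_results=3)
for item in search.results:
    print(item.title, '-', item.url)
    print((item.content or '')[:120], '…')

# Web Fetch API: pull the full content of the most promising result
page = client.web_fetch(search.results[0].url)
print(page.title)
print((page.content or '')[:300], '…')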
Thinking / Reasoning Models
Models with “thinking” capabilities, often called reasoning models or models with an explicit Chain-of-Thought (CoT), represent a significant advancement in large language model (LLM) agent frameworks. Instead of providing a direct, instant answer, these models are prompted to articulate their internal reasoning process — a sequence of logical steps, analysis, and reflection — before generating the final output. This “thinking” mode, which can be explicitly enabled on platforms like Ollama for models such as DeepSeek R1 and Qwen3, or featured in models like GPT-OSS or IBM Granite 3.2, offers several benefits: it dramatically improves accuracy on complex, multi-step tasks by allowing the model to self-correct and avoid premature conclusions; it enhances transparency by letting users audit the logic behind an answer; and it is a key component in sophisticated AI agents that use frameworks like ReAct (Reason+Act) or Tree-of-Thoughts (ToT) to break down goals and utilize external tools. This explicit reasoning is crucial for applications demanding reliability, such as automated coding, enterprise workflows, and complex problem-solving where a black-box answer is insufficient.
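On Ollama, enabling this mode is a single flag on the chat call. A small, hedged sketch follows; the model name and question are placeholders, while the think flag and the message fields mirror those used in the full script later in this post ⬇️
# thinking_demo.py (illustrative sketch)
from ollama import Client

client = Client()
resp = client.chat(
    model='gpt-oss',
    messages=[{'role': 'user', 'content': 'Is 1017 a prime number? Explain briefly.'}],
    think=True,  # ask the model to return its reasoning separately from the answer
)
print('Thinking:\n', resp.message.thinking)  # the model's visible chain of thought
print('Answer:\n', resp.message.content)     # the final response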
Putting Search and Reasoning Together
Combining external web search with a model’s reasoning (or “thinking”) capacity creates a powerful synergy that yields accurate, well-informed, and up-to-date responses, overcoming the two main limitations of a standalone LLM. The first benefit is Grounding and Recency: search supplies the model with current, factual information that was not included in its original training data, allowing it to answer questions about recent events, real-time data, or specific proprietary information, and effectively eliminating knowledge cutoffs and hallucinations based on outdated facts. The model’s reasoning capacity then processes the retrieved information with structured techniques such as Chain-of-Thought, to logically evaluate and synthesize multiple search results, or ReAct, to decide when to search, which tools to call, and how to combine the observations. The result is a much higher-quality answer that is not only factually correct but also logically derived and fully explained.
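The script in the Test section below implements exactly this search-and-reason loop; as a rough preview (variable names loosely follow that listing), the reason, act, and observe cycle boils down to ⬇️
# sketch of the core loop (see the full listing below for the real version)
while True:
    resp = client.chat(model='gpt-oss', messages=messages, tools=tool_schemas, think=True)
    messages.append(resp.message)        # keep the thinking and answer in context
    if not resp.message.tool_calls:      # reasoning concluded no more searching is needed
        break
    for tc in resp.message.tool_calls:   # act: run each requested browser tool
        result = available_tools[tc.function.name](**(tc.function.arguments or {}))
        messages.append({'role': 'tool', 'content': result, 'tool_name': tc.function.name})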
Test
OK, on to testing… it was late, the debugging lights were flickering, and I was on the verge of declaring my code a lost cause when I stumbled upon the golden repository. I stared at the clean, perfect example, a stark contrast to my tangled mess, and realized my folly: sometimes you just need to surrender to the experts. The moment of pure, unadulterated programmer bliss arrived right after I took the following code as-is from the Ollama GitHub examples and made it run with only very slight changes. The script executed flawlessly, a magnificent monument to successful, strategic borrowing, and I proudly declared victory, knowing that the greatest coding skill of all is the art of the humble, yet effective, copy-paste.
- Environment preparation…
python3 -m venv venv
source venv/bin/activate
source .env
pip install --upgrade pip
pip install ollama
pip install gpt-oss
- You can create a “.env” file for the Ollama API key ⬇️
# .env
export OLLAMA_API_KEY="exyz"
source .env
- Web Search Helper for gpt-oss ⬇️
# web_search_gpt_oss_helper.py
from __future__ import annotations

import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional, Protocol, Tuple
from urllib.parse import urlparse

from ollama import Client


@dataclass
class Page:
    url: str
    title: str
    text: str
    lines: List[str]
    links: Dict[int, str]
    fetched_at: datetime


@dataclass
class BrowserStateData:
    page_stack: List[str] = field(default_factory=list)
    view_tokens: int = 1024
    url_to_page: Dict[str, Page] = field(default_factory=dict)


@dataclass
class WebSearchResult:
    title: str
    url: str
    content: Dict[str, str]


class SearchClient(Protocol):
    def search(self, queries: List[str], max_results: Optional[int] = None): ...


class CrawlClient(Protocol):
    def crawl(self, urls: List[str]): ...
# ---- Constants ---------------------------------------------------------------
DEFAULT_VIEW_TOKENS = 1024
CAPPED_TOOL_CONTENT_LEN = 8000


# ---- Helpers ----------------------------------------------------------------
def cap_tool_content(text: str) -> str:
    if not text:
        return text
    if len(text) <= CAPPED_TOOL_CONTENT_LEN:
        return text
    if CAPPED_TOOL_CONTENT_LEN <= 1:
        return text[:CAPPED_TOOL_CONTENT_LEN]
    return text[: CAPPED_TOOL_CONTENT_LEN - 1] + '…'


def _safe_domain(u: str) -> str:
    try:
        parsed = urlparse(u)
        host = parsed.netloc or u
        return host.replace('www.', '') if host else u
    except Exception:
        return u
# ---- BrowserState ------------------------------------------------------------
class BrowserState:
    def __init__(self, initial_state: Optional[BrowserStateData] = None):
        self._data = initial_state or BrowserStateData(view_tokens=DEFAULT_VIEW_TOKENS)

    def get_data(self) -> BrowserStateData:
        return self._data

    def set_data(self, data: BrowserStateData) -> None:
        self._data = data


# ---- Browser ----------------------------------------------------------------
class Browser:
    def __init__(
        self,
        initial_state: Optional[BrowserStateData] = None,
        client: Optional[Client] = None,
    ):
        self.state = BrowserState(initial_state)
        self._client: Optional[Client] = client

    def set_client(self, client: Client) -> None:
        self._client = client

    def get_state(self) -> BrowserStateData:
        return self.state.get_data()

    # ---- internal utils ----
    def _save_page(self, page: Page) -> None:
        data = self.state.get_data()
        data.url_to_page[page.url] = page
        data.page_stack.append(page.url)
        self.state.set_data(data)

    def _page_from_stack(self, url: str) -> Page:
        data = self.state.get_data()
        page = data.url_to_page.get(url)
        if not page:
            raise ValueError(f'Page not found for url {url}')
        return page

    def _join_lines_with_numbers(self, lines: List[str]) -> str:
        result = []
        for i, line in enumerate(lines):
            result.append(f'L{i}: {line}')
        return '\n'.join(result)
    def _wrap_lines(self, text: str, width: int = 80) -> List[str]:
        if width <= 0:
            width = 80
        src_lines = text.split('\n')
        wrapped: List[str] = []
        for line in src_lines:
            if line == '':
                wrapped.append('')
            elif len(line) <= width:
                wrapped.append(line)
            else:
                words = re.split(r'\s+', line)
                if not words:
                    wrapped.append(line)
                    continue
                curr = ''
                for w in words:
                    test = (curr + ' ' + w) if curr else w
                    if len(test) > width and curr:
                        wrapped.append(curr)
                        curr = w
                    else:
                        curr = test
                if curr:
                    wrapped.append(curr)
        return wrapped

    def _process_markdown_links(self, text: str) -> Tuple[str, Dict[int, str]]:
        links: Dict[int, str] = {}
        link_id = 0
        multiline_pattern = re.compile(r'\[([^\]]+)\]\s*\n\s*\(([^)]+)\)')
        text = multiline_pattern.sub(lambda m: f'[{m.group(1)}]({m.group(2)})', text)
        text = re.sub(r'\s+', ' ', text)
        link_pattern = re.compile(r'\[([^\]]+)\]\(([^)]+)\)')

        def _repl(m: re.Match) -> str:
            nonlocal link_id
            link_text = m.group(1).strip()
            link_url = m.group(2).strip()
            domain = _safe_domain(link_url)
            formatted = f'【{link_id}†{link_text}†{domain}】'
            links[link_id] = link_url
            link_id += 1
            return formatted

        processed = link_pattern.sub(_repl, text)
        return processed, links
    def _get_end_loc(self, loc: int, num_lines: int, total_lines: int, lines: List[str]) -> int:
        if num_lines <= 0:
            txt = self._join_lines_with_numbers(lines[loc:])
            data = self.state.get_data()
            chars_per_token = 4
            max_chars = min(data.view_tokens * chars_per_token, len(txt))
            num_lines = txt[:max_chars].count('\n') + 1
        return min(loc + num_lines, total_lines)

    def _display_page(self, page: Page, cursor: int, loc: int, num_lines: int) -> str:
        total_lines = len(page.lines) or 0
        if total_lines == 0:
            page.lines = ['']
            total_lines = 1
        if loc != loc or loc < 0:
            loc = 0
        elif loc >= total_lines:
            loc = max(0, total_lines - 1)
        end_loc = self._get_end_loc(loc, num_lines, total_lines, page.lines)
        header = f'[{cursor}] {page.title}'
        header += f'({page.url})\n' if page.url else '\n'
        header += f'**viewing lines [{loc} - {end_loc - 1}] of {total_lines - 1}**\n\n'
        body_lines = []
        for i in range(loc, end_loc):
            body_lines.append(f'L{i}: {page.lines[i]}')
        return header + '\n'.join(body_lines)
    # ---- page builders ----
    def _build_search_results_page_collection(self, query: str, results: Dict[str, Any]) -> Page:
        page = Page(
            url=f'search_results_{query}',
            title=query,
            text='',
            lines=[],
            links={},
            fetched_at=datetime.utcnow(),
        )
        tb = []
        tb.append('')
        tb.append('# Search Results')
        tb.append('')
        link_idx = 0
        for query_results in results.get('results', {}).values():
            for result in query_results:
                domain = _safe_domain(result.get('url', ''))
                link_fmt = f'* 【{link_idx}†{result.get("title", "")}†{domain}】'
                tb.append(link_fmt)
                raw_snip = result.get('content') or ''
                capped = (raw_snip[:400] + '…') if len(raw_snip) > 400 else raw_snip
                cleaned = re.sub(r'\d{40,}', lambda m: m.group(0)[:40] + '…', capped)
                cleaned = re.sub(r'\s{3,}', ' ', cleaned)
                tb.append(cleaned)
                page.links[link_idx] = result.get('url', '')
                link_idx += 1
        page.text = '\n'.join(tb)
        page.lines = self._wrap_lines(page.text, 80)
        return page

    def _build_search_result_page(self, result: WebSearchResult, link_idx: int) -> Page:
        page = Page(
            url=result.url,
            title=result.title,
            text='',
            lines=[],
            links={},
            fetched_at=datetime.utcnow(),
        )
        link_fmt = f'【{link_idx}†{result.title}】\n'
        preview = link_fmt + f'URL: {result.url}\n'
        full_text = result.content.get('fullText', '') if result.content else ''
        preview += full_text[:300] + '\n\n'
        if not full_text:
            page.links[link_idx] = result.url
        if full_text:
            raw = f'URL: {result.url}\n{full_text}'
            processed, links = self._process_markdown_links(raw)
            page.text = processed
            page.links = links
        else:
            page.text = preview
        page.lines = self._wrap_lines(page.text, 80)
        return page
    def _build_page_from_fetch(self, requested_url: str, fetch_response: Dict[str, Any]) -> Page:
        page = Page(
            url=requested_url,
            title=requested_url,
            text='',
            lines=[],
            links={},
            fetched_at=datetime.utcnow(),
        )
        for url, url_results in fetch_response.get('results', {}).items():
            if url_results:
                r0 = url_results[0]
                if r0.get('content'):
                    page.text = r0['content']
                if r0.get('title'):
                    page.title = r0['title']
                page.url = url
                break
        if not page.text:
            page.text = 'No content could be extracted from this page.'
        else:
            page.text = f'URL: {page.url}\n{page.text}'
        processed, links = self._process_markdown_links(page.text)
        page.text = processed
        page.links = links
        page.lines = self._wrap_lines(page.text, 80)
        return page

    def _build_find_results_page(self, pattern: str, page: Page) -> Page:
        find_page = Page(
            url=f'find_results_{pattern}',
            title=f'Find results for text: `{pattern}` in `{page.title}`',
            text='',
            lines=[],
            links={},
            fetched_at=datetime.utcnow(),
        )
        max_results = 50
        num_show_lines = 4
        pattern_lower = pattern.lower()
        result_chunks: List[str] = []
        line_idx = 0
        while line_idx < len(page.lines):
            line = page.lines[line_idx]
            if pattern_lower not in line.lower():
                line_idx += 1
                continue
            end_line = min(line_idx + num_show_lines, len(page.lines))
            snippet = '\n'.join(page.lines[line_idx:end_line])
            link_fmt = f'【{len(result_chunks)}†match at L{line_idx}】'
            result_chunks.append(f'{link_fmt}\n{snippet}')
            if len(result_chunks) >= max_results:
                break
            line_idx += num_show_lines
        if not result_chunks:
            find_page.text = f'No `find` results for pattern: `{pattern}`'
        else:
            find_page.text = '\n\n'.join(result_chunks)
        find_page.lines = self._wrap_lines(find_page.text, 80)
        return find_page
    # ---- public API: search / open / find ------------------------------------
    def search(self, *, query: str, topn: int = 5) -> Dict[str, Any]:
        if not self._client:
            raise RuntimeError('Client not provided')
        resp = self._client.web_search(query, max_results=topn)
        normalized: Dict[str, Any] = {'results': {}}
        rows: List[Dict[str, str]] = []
        for item in resp.results:
            content = item.content or ''
            rows.append(
                {
                    'title': item.title,
                    'url': item.url,
                    'content': content,
                }
            )
        normalized['results'][query] = rows
        search_page = self._build_search_results_page_collection(query, normalized)
        self._save_page(search_page)
        cursor = len(self.get_state().page_stack) - 1
        for query_results in normalized.get('results', {}).values():
            for i, r in enumerate(query_results):
                ws = WebSearchResult(
                    title=r.get('title', ''),
                    url=r.get('url', ''),
                    content={'fullText': r.get('content', '') or ''},
                )
                result_page = self._build_search_result_page(ws, i + 1)
                data = self.get_state()
                data.url_to_page[result_page.url] = result_page
                self.state.set_data(data)
        page_text = self._display_page(search_page, cursor, loc=0, num_lines=-1)
        return {'state': self.get_state(), 'pageText': cap_tool_content(page_text)}
    def open(
        self,
        *,
        id: Optional[str | int] = None,
        cursor: int = -1,
        loc: int = 0,
        num_lines: int = -1,
    ) -> Dict[str, Any]:
        if not self._client:
            raise RuntimeError('Client not provided')
        state = self.get_state()
        if isinstance(id, str):
            url = id
            if url in state.url_to_page:
                self._save_page(state.url_to_page[url])
                cursor = len(self.get_state().page_stack) - 1
                page_text = self._display_page(state.url_to_page[url], cursor, loc, num_lines)
                return {'state': self.get_state(), 'pageText': cap_tool_content(page_text)}
            fetch_response = self._client.web_fetch(url)
            normalized: Dict[str, Any] = {
                'results': {
                    url: [
                        {
                            'title': fetch_response.title or url,
                            'url': url,
                            'content': fetch_response.content or '',
                        }
                    ]
                }
            }
            new_page = self._build_page_from_fetch(url, normalized)
            self._save_page(new_page)
            cursor = len(self.get_state().page_stack) - 1
            page_text = self._display_page(new_page, cursor, loc, num_lines)
            return {'state': self.get_state(), 'pageText': cap_tool_content(page_text)}

        # Resolve current page from stack only if needed (int id or no id)
        page: Optional[Page] = None
        if cursor >= 0:
            if state.page_stack:
                if cursor >= len(state.page_stack):
                    cursor = max(0, len(state.page_stack) - 1)
                page = self._page_from_stack(state.page_stack[cursor])
            else:
                page = None
        else:
            if state.page_stack:
                page = self._page_from_stack(state.page_stack[-1])

        if isinstance(id, int):
            if not page:
                raise RuntimeError('No current page to resolve link from')
            link_url = page.links.get(id)
            if not link_url:
                err = Page(
                    url=f'invalid_link_{id}',
                    title=f'No link with id {id} on `{page.title}`',
                    text='',
                    lines=[],
                    links={},
                    fetched_at=datetime.utcnow(),
                )
                available = sorted(page.links.keys())
                available_list = ', '.join(map(str, available)) if available else '(none)'
                err.text = '\n'.join(
                    [
                        f'Requested link id: {id}',
                        f'Current page: {page.title}',
                        f'Available link ids on this page: {available_list}',
                        '',
                        'Tips:',
                        '- To scroll this page, call browser_open with { loc, num_lines } (no id).',
                        '- To open a result from a search results page, pass the correct { cursor, id }.',
                    ]
                )
                err.lines = self._wrap_lines(err.text, 80)
                self._save_page(err)
                cursor = len(self.get_state().page_stack) - 1
                page_text = self._display_page(err, cursor, 0, -1)
                return {'state': self.get_state(), 'pageText': cap_tool_content(page_text)}
            new_page = state.url_to_page.get(link_url)
            if not new_page:
                fetch_response = self._client.web_fetch(link_url)
                normalized: Dict[str, Any] = {
                    'results': {
                        link_url: [
                            {
                                'title': fetch_response.title or link_url,
                                'url': link_url,
                                'content': fetch_response.content or '',
                            }
                        ]
                    }
                }
                new_page = self._build_page_from_fetch(link_url, normalized)
            self._save_page(new_page)
            cursor = len(self.get_state().page_stack) - 1
            page_text = self._display_page(new_page, cursor, loc, num_lines)
            return {'state': self.get_state(), 'pageText': cap_tool_content(page_text)}

        if not page:
            raise RuntimeError('No current page to display')
        cur = self.get_state()
        cur.page_stack.append(page.url)
        self.state.set_data(cur)
        cursor = len(cur.page_stack) - 1
        page_text = self._display_page(page, cursor, loc, num_lines)
        return {'state': self.get_state(), 'pageText': cap_tool_content(page_text)}
    def find(self, *, pattern: str, cursor: int = -1) -> Dict[str, Any]:
        state = self.get_state()
        if cursor == -1:
            if not state.page_stack:
                raise RuntimeError('No pages to search in')
            page = self._page_from_stack(state.page_stack[-1])
            cursor = len(state.page_stack) - 1
        else:
            if cursor < 0 or cursor >= len(state.page_stack):
                cursor = max(0, min(cursor, len(state.page_stack) - 1))
            page = self._page_from_stack(state.page_stack[cursor])
        find_page = self._build_find_results_page(pattern, page)
        self._save_page(find_page)
        new_cursor = len(self.get_state().page_stack) - 1
        page_text = self._display_page(find_page, new_cursor, 0, -1)
        return {'state': self.get_state(), 'pageText': cap_tool_content(page_text)}
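Before wiring the helper into the model loop, it can be exercised on its own; a quick sketch of the three public calls (the query text is arbitrary) ⬇️
# standalone check of the Browser helper (no LLM involved)
from ollama import Client
from web_search_gpt_oss_helper import Browser

browser = Browser(initial_state=None, client=Client())
print(browser.search(query='Docling', topn=3)['pageText'])  # numbered search results
print(browser.open(cursor=0, id=1)['pageText'])             # open link id 1 from that results page
print(browser.find(pattern='PDF')['pageText'])              # find matches on the opened page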
- The main part of the code 👇
# web-search-gpt-oss.py
# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "ollama",
# ]
# ///
from typing import Any, Dict, List

from web_search_gpt_oss_helper import Browser
from ollama import Client


def main() -> None:
    client = Client()
    browser = Browser(initial_state=None, client=client)

    def browser_search(query: str, topn: int = 10) -> str:
        return browser.search(query=query, topn=topn)['pageText']

    def browser_open(id: int | str | None = None, cursor: int = -1, loc: int = -1, num_lines: int = -1) -> str:
        return browser.open(id=id, cursor=cursor, loc=loc, num_lines=num_lines)['pageText']

    def browser_find(pattern: str, cursor: int = -1, **_: Any) -> str:
        return browser.find(pattern=pattern, cursor=cursor)['pageText']

    browser_search_schema = {
        'type': 'function',
        'function': {
            'name': 'browser.search',
        },
    }
    browser_open_schema = {
        'type': 'function',
        'function': {
            'name': 'browser.open',
        },
    }
    browser_find_schema = {
        'type': 'function',
        'function': {
            'name': 'browser.find',
        },
    }
    available_tools = {
        'browser.search': browser_search,
        'browser.open': browser_open,
        'browser.find': browser_find,
    }

    query = "what is Docling and what does it do"
    print('Prompt:', query, '\n')
    messages: List[Dict[str, Any]] = [{'role': 'user', 'content': query}]
    while True:
        resp = client.chat(
            # model="gpt-oss:120b-cloud",
            # changed to the locally downloaded gpt-oss
            model='gpt-oss',
            messages=messages,
            tools=[browser_search_schema, browser_open_schema, browser_find_schema],
            #####
            think=True,
            #####
        )
        if resp.message.thinking:
            print('Thinking:\n========\n')
            print(resp.message.thinking + '\n')
        if resp.message.content:
            print('Response:\n========\n')
            print(resp.message.content + '\n')
        messages.append(resp.message)
        if not resp.message.tool_calls:
            break
        for tc in resp.message.tool_calls:
            tool_name = tc.function.name
            args = tc.function.arguments or {}
            print(f'Tool name: {tool_name}, args: {args}')
            fn = available_tools.get(tool_name)
            if not fn:
                messages.append({'role': 'tool', 'content': f'Tool {tool_name} not found', 'tool_name': tool_name})
                continue
            try:
                result_text = fn(**args)
                print('Result: ', result_text[:200] + '...')
            except Exception as e:
                result_text = f'Error from {tool_name}: {e}'
            messages.append({'role': 'tool', 'content': result_text, 'tool_name': tool_name})


if __name__ == '__main__':
    main()
- The execution and output result ⬇️
python web-search-gpt-oss.py
Prompt: what is Docling and what does it do
Thinking:
========
We need to answer: what is Docling and what does it do. Likely Docling is a documentation generator? Might be a tool. We need to browse.
Tool name: browser.search, args: {'query': 'Docling documentation tool', 'source': 'news', 'topn': 10}
Thinking:
========
The source param is optional but maybe not supported. We'll just search.
Tool name: browser.search, args: {'query': 'Docling documentation generator', 'topn': 10}
Result: [0] Docling documentation generator(search_results_Docling documentation generator)
**viewing lines [0 - 105] of 138**
L0:
L1: # Search Results
L2:
L3: * 【0†docling-project/docling: Get your docume...
Thinking:
========
Open the docling project page.
Tool name: browser.open, args: {'cursor': 0, 'id': 1}
Result: [1] Docling - GitHub Pages(https://docling-project.github.io/docling/)
**viewing lines [0 - 31] of 31**
L0: URL: https://docling-project.github.io/docling/ 【0†Skip to
L1: content†docling-project.gith...
Thinking:
========
This page gives a description. We need to explain what Docling is and what it does. Provide answer: open-source document processing for AI, supports many formats, advanced PDF understanding, provides unified representation, export to Markdown, HTML, JSON, integration with LangChain, LlamaIndex, etc. It can be used for RAG, AI assistants. Provide explanation. Use citations. Also maybe mention the DoclingDocument representation, MCP server, CLI. Provide summary. Let's gather a few lines to cite. Lines 1-4, 5-9, 10-14, etc.
Tool name: browser.open, args: {'cursor': 1, 'loc': 0}
Result: [2] Docling - GitHub Pages(https://docling-project.github.io/docling/)
**viewing lines [0 - 31] of 31**
L0: URL: https://docling-project.github.io/docling/ 【0†Skip to
L1: content†docling-project.gith...
Thinking:
========
Now, maybe also open docling.ai page.
Tool name: browser.open, args: {'cursor': 0, 'id': 2}
Result: [3] Docling - Open Source Document Processing for AI(https://www.docling.ai/)
**viewing lines [0 - 24] of 24**
L0: URL: https://www.docling.ai/ Docling Preps Your Files for GenAI, RAG, and Beyond
L1:...
Thinking:
========
Now we can summarize. Provide answer: Docling is an open-source tool/library that simplifies document processing and parsing for AI applications, handling many formats, advanced PDF understanding, OCR, audio, etc. It outputs a unified DoclingDocument representation, can export to Markdown, HTML, JSON, integrate with LangChain, LlamaIndex, etc., has a CLI and MCP server for agentic AI, can run locally.
We need to answer "what is Docling and what does it do". So answer accordingly. Provide citations from lines. Let's cite lines 1-5, 6-9, 10-14 etc. We'll use lines 2-5 for description, 6-9 for features. Provide answer.
Response:
========
**Docling** is an open‑source, AI‑ready document‑processing system that turns messy files into clean, structured data you can feed straight into generative‑AI or retrieval‑augmented‑generation (RAG) pipelines.
It was started by IBM Research and is now maintained under the LF AI & Data Foundation.
What it does
- **Parses a wide range of formats** – PDF, DOCX, PPTX, XLSX, HTML, images, audio, video subtitles, etc.
- **Deep PDF understanding** – extracts page layout, reading order, tables, code snippets, mathematical formulas, image classification, and more.
- **Unified representation** – produces a `DoclingDocument` object that captures the hierarchy and semantics of the source file.
- **Multiple export options** – Markdown, HTML, lossless JSON, DocTags, or any custom format.
- **Ready‑for‑AI integration** – plug‑and‑play connectors for LangChain, LlamaIndex, Haystack, Crew‑AI, Langflow, and a lightweight MCP (model‑control‑proxy) server for agentic workflows.
- **OCR and audio support** – works with scanned PDFs, images, and audio files via ASR.
- **Local, air‑gap execution** – all processing can be done on‑premises, which is useful for sensitive data.
- **CLI and API** – a simple command‑line tool and a programmable interface make it easy to batch‑process or embed in larger pipelines.
In short, Docling is a comprehensive, AI‑oriented document‑to‑structure converter that takes anything from a corporate slide deck to a scanned research paper and delivers it in a format that large language models and other AI services can ingest and understand.
Sources: Docling homepage【2†L1-L9】【2†L10-L15】.
Et voilà 💯
Conclusion
In conclusion, the convergence of models with “thinking” capacities and web search tools marks a paradigm shift in the utility and intelligence of large language models. The evolution of LLMs into sophisticated AI agents, powered by explicit reasoning frameworks like Chain-of-Thought (CoT) and ReAct, enables them to break down complex problems, self-critique their logic, and develop strategic plans. This inherent reasoning, now made transparent and controllable through Ollama’s “thinking” mode in models such as GPT-OSS, DeepSeek R1, and IBM Granite 3.2, becomes an indispensable asset when paired with real-time data access. By using web search, the model is no longer limited by its training data; it achieves Grounding and Recency, ensuring its detailed, logical conclusions are built upon the most current and factual information. Ultimately, whether leveraging strategic borrowing from public examples — as I playfully demonstrated by adapting the working Ollama code — or orchestrating a search-and-reason loop, this synthesis is what delivers answers that are not just coherent, but truly comprehensive, accurate, and reliably informed.
Links