Commit 8cda451

Merge branch 'pipecat-ai:main' into main
2 parents fc90bdc + 99d3227 commit 8cda451

233 files changed, +9,736 −2,283 lines


.github/workflows/tests.yaml

Lines changed: 2 additions & 2 deletions

@@ -1,4 +1,4 @@
-name: test
+name: tests

 on:
   workflow_dispatch:

@@ -49,4 +49,4 @@ jobs:
   - name: Test with pytest
     run: |
       source .venv/bin/activate
-      pytest --ignore-glob="*to_be_updated*" --ignore-glob=*pipeline_source* src tests
+      pytest

CHANGELOG.md

Lines changed: 198 additions & 2 deletions

@@ -5,12 +5,205 @@ All notable changes to **Pipecat** will be documented in this file.

 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [0.0.55] - 2025-02-05

### Added

- Added a new `start_metadata` field to `PipelineParams`. The provided metadata
  will be set on the initial `StartFrame` being pushed from the `PipelineTask`.

- Added new fields to `PipelineParams` to control audio input and output sample
  rates for the whole pipeline. This allows controlling sample rates from a
  single place instead of having to specify sample rates in each service.
  Setting a sample rate on a service is still possible and will override the
  value from `PipelineParams`.

- Introduced audio resamplers (`BaseAudioResampler`). This is just a base class
  for implementing audio resamplers. Currently, two implementations are
  provided: `SOXRAudioResampler` and `ResampyResampler`. A new
  `create_default_resampler()` has been added (replacing the now deprecated
  `resample_audio()`).

- It is now possible to specify the asyncio event loop that a `PipelineTask` and
  all the processors should run on by passing it as a new argument to the
  `PipelineRunner`. This allows running pipelines in multiple threads, each one
  with its own event loop.

- Added a new `utils.TaskManager`. Instead of a global task manager we now have
  a task manager per `PipelineTask`. In the previous version the task manager
  was global, so running multiple simultaneous `PipelineTask`s could result in
  dangling task warnings that were not actually true. In order for all the
  processors to know about the task manager, we pass it through the
  `StartFrame`. This means that processors should create tasks when they receive
  a `StartFrame`, but not before (because they don't have a task manager yet).

- Added `TelnyxFrameSerializer` to support Telnyx calls. A full running example
  has also been added to `examples/telnyx-chatbot`.

- Allow pushing silence audio frames before `TTSStoppedFrame`. This might be
  useful for testing purposes, for example, passing bot audio to an STT service,
  which usually needs additional audio data to detect that the utterance
  stopped.

- `TwilioSerializer` now supports transport message frames. With this we can
  create Twilio emulators.

- Added a new transport: `WebsocketClientTransport`.

- Added a `metadata` field to `Frame` which makes it possible to pass custom
  data to all frames.

- Added `test/utils.py` inside of the pipecat package.
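The thread-per-pipeline idea in the event loop entry above can be sketched with plain asyncio, no Pipecat imports. Here `fake_pipeline` is a hypothetical stand-in for running a `PipelineRunner`; the point is only that each thread owns its own event loop:

```python
import asyncio
import threading


def run_in_thread(name: str, results: dict) -> threading.Thread:
    """Run a (stand-in) pipeline on a dedicated event loop in its own thread."""

    async def fake_pipeline() -> str:
        # Hypothetical stand-in for `await PipelineRunner(...).run(task)`.
        await asyncio.sleep(0.01)
        return f"{name} done"

    def worker() -> None:
        loop = asyncio.new_event_loop()  # one event loop per thread
        asyncio.set_event_loop(loop)
        try:
            results[name] = loop.run_until_complete(fake_pipeline())
        finally:
            loop.close()

    thread = threading.Thread(target=worker, name=name)
    thread.start()
    return thread


results: dict = {}
threads = [run_in_thread(f"pipeline-{i}", results) for i in range(2)]
for t in threads:
    t.join()
```

Each worker thread completes independently; with the real API you would presumably pass the thread's loop to the `PipelineRunner` argument mentioned above.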
### Changed

- `GatedOpenAILLMContextAggregator` now requires keyword arguments. Also, a new
  `start_open` argument has been added to set the initial state of the gate.

- Added `organization` and `project` level authentication to
  `OpenAILLMService`.

- Improved the language checking logic in `ElevenLabsTTSService` and
  `ElevenLabsHttpTTSService` to properly handle language codes based on model
  compatibility, with appropriate warnings when language codes cannot be
  applied.

- Updated `GoogleLLMContext` to support pushing `LLMMessagesUpdateFrame`s that
  contain a combination of function calls, function call responses, system
  messages, or just messages.

- `InputDTMFFrame` is now based on `DTMFFrame`. There's also a new
  `OutputDTMFFrame` frame.

### Deprecated

- `resample_audio()` is now deprecated; use `create_default_resampler()`
  instead.

### Removed

- `AudioBufferProcessor.reset_audio_buffers()` has been removed; use
  `AudioBufferProcessor.start_recording()` and
  `AudioBufferProcessor.stop_recording()` instead.
### Fixed

- Fixed an `AudioBufferProcessor` issue that would cause crackling in some
  recordings.

- Fixed an issue in `AudioBufferProcessor` where the user callback would not be
  called on task cancellation.

- Fixed an issue in `AudioBufferProcessor` that would cause wrong silence
  padding in some cases.

- Fixed an issue where `ElevenLabsTTSService` messages would return a 1009
  websocket error by increasing the max message size limit to 16 MB.

- Fixed a `DailyTransport` issue that would cause events to be triggered before
  join finished.

- Fixed a `PipelineTask` issue that was preventing processors from being cleaned
  up after cancelling the task.

- Fixed an issue where queuing a `CancelFrame` to a pipeline task would not
  cause the task to finish. However, using `PipelineTask.cancel()` is still the
  recommended way to cancel a task.
### Other

- Improved unit test `run_test()` to use `PipelineTask` and
  `PipelineRunner`. There's now also some control around `StartFrame` and
  `EndFrame`. The `EndTaskFrame` has been removed since it doesn't seem
  necessary with this new approach.

- Updated `twilio-chatbot` with a few new features: use an 8000 sample rate and
  avoid resampling, and a new client useful for stress testing and for testing
  locally without the need to make phone calls. Also, added audio recording on
  both the client and the server to make sure the audio sounds good.

- Updated examples to use `task.cancel()` to immediately exit the example when a
  participant leaves or disconnects, instead of pushing an `EndFrame`. Pushing
  an `EndFrame` causes the bot to run through everything that is internally
  queued (which could take some seconds). Note that using `task.cancel()` might
  not always be the best option, and pushing an `EndFrame` could still be
  desirable to make sure the whole pipeline is flushed.
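The `task.cancel()` vs `EndFrame` trade-off described above can be illustrated with a plain asyncio queue (no Pipecat imports; `END` is a hypothetical sentinel standing in for an `EndFrame`): a sentinel lets the consumer drain everything already queued, while cancellation stops it immediately.

```python
import asyncio

END = object()  # hypothetical stand-in for an EndFrame-style sentinel


async def consumer(queue: asyncio.Queue, processed: list) -> None:
    # Process every queued item until the sentinel is seen.
    while (item := await queue.get()) is not END:
        processed.append(item)


async def main() -> tuple[int, int]:
    # Ending via sentinel: everything already queued still gets processed.
    q1, drained = asyncio.Queue(), []
    for i in range(5):
        q1.put_nowait(i)
    q1.put_nowait(END)
    await consumer(q1, drained)

    # Cancelling: the consumer stops right away and queued items are dropped.
    q2, dropped = asyncio.Queue(), []
    task = asyncio.create_task(consumer(q2, dropped))
    for i in range(5):
        q2.put_nowait(i)
    task.cancel()  # cancelled before the task ever runs
    try:
        await task
    except asyncio.CancelledError:
        pass
    return len(drained), len(dropped)


drained_count, dropped_count = asyncio.run(main())
print(drained_count, dropped_count)  # 5 items drained via sentinel, 0 after cancel
```

This mirrors why pushing an `EndFrame` can take some seconds (the queue is flushed) while `task.cancel()` exits immediately.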
## [0.0.54] - 2025-01-27

### Added

- To create tasks in Pipecat frame processors it is now recommended to use
  `FrameProcessor.create_task()` (which uses the new
  `utils.asyncio.create_task()`). It takes care of uncaught exceptions, task
  cancellation handling, and task management. To cancel or wait for a task there
  are `FrameProcessor.cancel_task()` and `FrameProcessor.wait_for_task()`. All
  of Pipecat's processors have been updated accordingly. Also, when a pipeline
  runner finishes, a warning about dangling tasks might appear, which indicates
  that some of the created tasks were never cancelled or awaited (using these
  new functions).

- It is now possible to specify the period of the `PipelineTask` heartbeat
  frames with `heartbeats_period_secs`.

- Added `DailyMeetingTokenProperties` and `DailyMeetingTokenParams` Pydantic
  models for meeting token creation in the `get_token` method of
  `DailyRESTHelper`.

- Added `enable_recording` and `geo` parameters to `DailyRoomProperties`.

- Added `RecordingsBucketConfig` to `DailyRoomProperties` to upload recordings
  to a custom AWS bucket.
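The supervised-task idea behind `FrameProcessor.create_task()` can be approximated with stdlib asyncio. This is a sketch of the pattern only; `create_supervised_task` is a hypothetical name, not Pipecat's implementation:

```python
import asyncio


def create_supervised_task(coro, *, registry: set, on_error) -> asyncio.Task:
    """Create a task that tracks itself and reports uncaught exceptions."""
    task = asyncio.create_task(coro)
    registry.add(task)  # anything left here at shutdown is a dangling task

    def _done(t: asyncio.Task) -> None:
        registry.discard(t)
        if t.cancelled():
            return  # cancellation is a normal way to stop a task
        if (exc := t.exception()) is not None:
            on_error(exc)  # a real task manager would log this


    task.add_done_callback(_done)
    return task


async def main() -> tuple[int, list]:
    errors: list = []
    registry: set = set()

    async def boom() -> None:
        raise RuntimeError("uncaught")

    task = create_supervised_task(boom(), registry=registry, on_error=errors.append)
    await asyncio.gather(task, return_exceptions=True)
    await asyncio.sleep(0)  # let done callbacks run
    return len(registry), [type(e).__name__ for e in errors]


dangling, error_names = asyncio.run(main())
```

A non-empty `registry` at runner shutdown is exactly the kind of signal behind the dangling-task warning mentioned above.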
### Changed

- Enhanced `UserIdleProcessor` with retry functionality and control over idle
  monitoring via the new callback signature `(processor, retry_count) -> bool`.
  Updated `17-detect-user-idle.py` to show how to use the `retry_count`.

- Added defensive error handling for `OpenAIRealtimeBetaLLMService`'s audio
  truncation. Audio truncation errors during interruptions now log a warning
  and allow the session to continue instead of throwing an exception.

- Modified `TranscriptProcessor` to use TTS text frames for more accurate
  assistant transcripts. Assistant messages are now aggregated based on bot
  speaking boundaries rather than LLM context, providing better handling of
  interruptions and partial utterances.

- Updated foundational examples `28a-transcription-processor-openai.py`,
  `28b-transcript-processor-anthropic.py`, and
  `28c-transcription-processor-gemini.py` to use the updated
  `TranscriptProcessor`.
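A callback with the `(processor, retry_count) -> bool` shape described for `UserIdleProcessor` might be driven like this. The `run_idle_monitor` loop is purely illustrative (not `UserIdleProcessor` internals); returning `False` stops idle monitoring:

```python
import asyncio


async def idle_callback(processor: object, retry_count: int) -> bool:
    # Returning False tells the processor to stop idle monitoring.
    print(f"user idle, retry {retry_count}")
    return retry_count < 3  # give up after three idle prompts


async def run_idle_monitor(callback, timeout: float = 0.01) -> int:
    """Illustrative driver: fire the callback on each idle timeout until it returns False."""
    retry_count = 0
    keep_going = True
    while keep_going:
        await asyncio.sleep(timeout)  # a real processor would wait for user activity here
        retry_count += 1
        keep_going = await callback(None, retry_count)
    return retry_count


retries = asyncio.run(run_idle_monitor(idle_callback))
```

In a real bot the callback would re-prompt the user (or escalate) on each retry instead of printing.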
### Fixed

- Fixed a `GeminiMultimodalLiveLLMService` issue that was preventing the user
  from pushing initial LLM assistant messages (using `LLMMessagesAppendFrame`).

- Added missing `FrameProcessor.cleanup()` calls to `Pipeline`,
  `ParallelPipeline`, and `UserIdleProcessor`.

- Fixed a type error when using `voice_settings` in `ElevenLabsHttpTTSService`.

- Fixed an issue where `OpenAIRealtimeBetaLLMService` function calling resulted
  in an error.

- Fixed an issue in `AudioBufferProcessor` where the last audio buffer was not
  being processed in cases where the `_user_audio_buffer` was smaller than the
  buffer size.

### Performance

- Replaced the audio resampling library `resampy` with `soxr`. Resampling a
  2:21 min audio file from 24 kHz to 16 kHz took 1.41 s with `resampy` and
  0.031 s with `soxr`, with similar audio quality.

### Other

- Added initial unit test infrastructure.
## [0.0.53] - 2025-01-18

### Added

-- Added `ElevenLabsHttpTTSService` and the
-  `07d-interruptible-elevenlabs-http.py` foundational example.
+- Added `ElevenLabsHttpTTSService`, which uses ElevenLabs' HTTP API instead of
+  the websocket one.

 - Introduced pipeline frame observers. Observers can view all the frames that go
   through the pipeline without the need to inject processors in the

@@ -1381,6 +1574,9 @@ async def on_connected(processor):

 ### Changed

+- `FrameSerializer.serialize()` and `FrameSerializer.deserialize()` are now
+  `async`.

 - `Filter` has been renamed to `FrameFilter` and it's now under
   `processors/filters`.

README.md

Lines changed: 5 additions & 11 deletions

@@ -2,7 +2,7 @@
   <img alt="pipecat" width="300px" height="auto" src="https://raw.githubusercontent.com/pipecat-ai/pipecat/main/pipecat.png">
 </div></h1>

-[![PyPI](https://img.shields.io/pypi/v/pipecat-ai)](https://pypi.org/project/pipecat-ai) [![Docs](https://img.shields.io/badge/Documentation-blue)](https://docs.pipecat.ai) [![Discord](https://img.shields.io/discord/1239284677165056021)](https://discord.gg/pipecat) <a href="https://app.commanddash.io/agent/github_pipecat-ai_pipecat"><img src="https://img.shields.io/badge/AI-Code%20Agent-EB9FDA"></a>
+[![PyPI](https://img.shields.io/pypi/v/pipecat-ai)](https://pypi.org/project/pipecat-ai) ![Tests](https://github.com/pipecat-ai/pipecat/actions/workflows/tests.yaml/badge.svg) [![Docs](https://img.shields.io/badge/Documentation-blue)](https://docs.pipecat.ai) [![Discord](https://img.shields.io/discord/1239284677165056021)](https://discord.gg/pipecat) <a href="https://app.commanddash.io/agent/github_pipecat-ai_pipecat"><img src="https://img.shields.io/badge/AI-Code%20Agent-EB9FDA"></a>

 Pipecat is an open source Python framework for building voice and multimodal conversational agents. It handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions, letting you focus on creating engaging experiences.

@@ -53,13 +53,7 @@ To keep things lightweight, only the core framework is included by default. If y
 pip install "pipecat-ai[option,...]"
 ```

-Or you can install all of them with:
-
-```shell
-pip install "pipecat-ai[all]"
-```
-
-Available options include:
+### Available services

 | Category | Services | Install Command Example |
 | ------------------- | -------- | ----------------------- |

@@ -87,7 +81,7 @@ Here is a very basic Pipecat bot that greets a user when they join a real-time s
 ```python
 import asyncio

-from pipecat.frames.frames import EndFrame, TextFrame
+from pipecat.frames.frames import TextFrame
 from pipecat.pipeline.pipeline import Pipeline
 from pipecat.pipeline.task import PipelineTask
 from pipecat.pipeline.runner import PipelineRunner

@@ -128,7 +122,7 @@ async def main():
     # Register an event handler to exit the application when the user leaves.
     @transport.event_handler("on_participant_left")
     async def on_participant_left(transport, participant, reason):
-        await task.queue_frame(EndFrame())
+        await task.cancel()

     # Run the pipeline task
     await runner.run(task)

@@ -195,7 +189,7 @@ pip install "path_to_this_repo[option,...]"
 From the root directory, run:

 ```shell
-pytest --doctest-modules --ignore-glob="*to_be_updated*" --ignore-glob=*pipeline_source* src tests
+pytest
 ```

 ## Setting up your editor

dev-requirements.txt

Lines changed: 3 additions & 2 deletions

@@ -1,10 +1,11 @@
 build~=1.2.2
-grpcio-tools~=1.69.0
+grpcio-tools~=1.67.1
 pip-tools~=7.4.1
 pre-commit~=4.0.1
 pyright~=1.1.392
 pytest~=8.3.4
+pytest-asyncio~=0.25.2
 ruff~=0.9.1
-setuptools~=75.8.0
+setuptools~=70.0.0
 setuptools_scm~=8.1.0
 python-dotenv~=1.0.1

examples/README.md

Lines changed: 2 additions & 2 deletions

@@ -39,10 +39,10 @@ Next, follow the steps in the README for each demo.
 | [Translation Chatbot](translation-chatbot) | Listens for user speech, then translates that speech to Spanish and speaks the translation back. Demonstrates multi-participant use-cases. | Deepgram, Azure, OpenAI, Daily, Daily Prebuilt UI |
 | [Moondream Chatbot](moondream-chatbot) | Demonstrates how to add vision capabilities to GPT4. **Note: works best with a GPU** | Deepgram, ElevenLabs, OpenAI, Moondream, Daily, Daily Prebuilt UI |
 | [Patient intake](patient-intake) | A chatbot that can call functions in response to user input. | Deepgram, ElevenLabs, OpenAI, Daily, Daily Prebuilt UI |
-| [Dialin Chatbot](dialin-chatbot) | A chatbot that connects to an incoming phone call from Daily or Twilio. | Deepgram, ElevenLabs, OpenAI, Daily, Twilio |
+| [Phone Chatbot](phone-chatbot) | A chatbot that connects to PSTN/SIP phone calls, powered by Daily or Twilio. | Deepgram, ElevenLabs, OpenAI, Daily, Twilio |
 | [Twilio Chatbot](twilio-chatbot) | A chatbot that connects to an incoming phone call from Twilio. | Deepgram, ElevenLabs, OpenAI, Daily, Twilio |
 | [studypal](studypal) | A chatbot to have a conversation about any article on the web | |
-| [WebSocket Chatbot Server](websocket-server) | A real-time websocket server that handles audio streaming and bot interactions with speech-to-text and text-to-speech capabilities | `python-websockets`, `openai`, `deepgram`, `silero-tts`, `numpy` |
+| [WebSocket Chatbot Server](websocket-server) | A real-time websocket server that handles audio streaming and bot interactions with speech-to-text and text-to-speech capabilities. | Cartesia, Deepgram, OpenAI, Websockets |

 > [!IMPORTANT]
 > These example projects use Daily as a WebRTC transport and can be joined using their hosted Prebuilt UI.
Lines changed: 45 additions & 0 deletions (new file)

@@ -0,0 +1,45 @@
# Bot ready signaling

A simple Pipecat example demonstrating how to handle signaling between the
client and the bot, ensuring that the bot starts sending audio only when the
client is available, thereby avoiding the risk of cutting off the beginning of
the audio.

## Quick Start

### First, start the bot server:

1. Navigate to the server directory:
   ```bash
   cd server
   ```
2. Create and activate a virtual environment:
   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```
3. Install requirements:
   ```bash
   pip install -r requirements.txt
   ```
4. Copy env.example to .env and configure:
   - Add your API keys
5. Start the server:
   ```bash
   python server.py
   ```

### Next, connect using the client app:

For client-side setup, refer to the [JavaScript Guide](client/javascript/README.md).

## Important Note

Ensure the bot server is running before using any client implementations.

## Requirements

- Python 3.10+
- Node.js 16+ (for JavaScript)
- Daily API key
- Cartesia API key
- Modern web browser with WebRTC support
Lines changed: 27 additions & 0 deletions (new file)

@@ -0,0 +1,27 @@
# JavaScript Implementation

Basic implementation using the [Pipecat JavaScript SDK](https://docs.pipecat.ai/client/js/introduction).

## Setup

1. Run the bot server. See the [server README](../../README).

2. Navigate to the `client/javascript` directory:

   ```bash
   cd client/javascript
   ```

3. Install dependencies:

   ```bash
   npm install
   ```

4. Run the client app:

   ```bash
   npm run dev
   ```

5. Visit http://localhost:5173 in your browser.
