Releases: sgl-project/rbg

v0.5.0

03 Dec 13:15
74e94ed

🌟 Highlights

  • InstanceSet Workload: Introduced native support for InstanceSet, enabling fine-grained management for stateful workloads.
  • Coordinated Rollout: Implemented synchronized updates across interdependent roles to maintain system stability during upgrades.
  • In-Place Updates: Enabled in-place updates for InstanceSet workloads, so changes can be applied without recreating pods.
  • Revision Management: Added full ControllerRevision support for robust version tracking and rollback capabilities.

🚀 Features

  • InstanceSet Support
    • Introduced InstanceSet API and Controller.
    • Added support for using InstanceSet as a Role Workload in RoleBasedGroup (RBG).
  • Controller Revision & Versioning
    • Added ControllerRevision support to RoleBasedGroup for better version control.
    • Updated rbgctl to support RBG revision operations.
    • Enabled updating roles by ControllerRevision hash.
  • Orchestration & Scheduling
    • Implemented rollout coordination for roles in RBG.
    • Supported parallel execution for roles sharing the same dependencies.
    • Added support for Pod Group Policy.
  • Runtime & Integrations
    • Added Mooncake integration.
    • Added Engine Runtime support.
    • Supported sgl-router PD (Prefill-Decode) disaggregation with engine runtime.
  • Miscellaneous
    • Added support for role-level metadata.
    • Migrated Template, leaderWorkerSet, and RollingUpdate fields from value to pointer types.
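
As an illustration of "parallel execution for roles sharing the same dependencies", the sketch below groups roles into waves, where every role in a wave depends only on roles from earlier waves and can therefore start alongside its peers. This is a minimal sketch of the general idea, not RBG's actual controller logic; the function name `rollout_waves` and the role names are hypothetical.

```python
from typing import Dict, List, Set


def rollout_waves(dependencies: Dict[str, Set[str]]) -> List[List[str]]:
    """Group roles into waves; each role's dependencies all sit in earlier waves."""
    remaining = {role: set(deps) for role, deps in dependencies.items()}
    done: Set[str] = set()
    waves: List[List[str]] = []
    while remaining:
        # Roles whose dependencies are all satisfied can run in parallel.
        ready = sorted(role for role, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError(
                "dependency cycle or unknown role among: "
                + ", ".join(sorted(remaining))
            )
        waves.append(ready)
        done.update(ready)
        for role in ready:
            del remaining[role]
    return waves


# Hypothetical roles: "prefill" and "decode" share the same dependency,
# so they land in the same wave.
roles = {
    "router": set(),
    "prefill": {"router"},
    "decode": {"router"},
    "gateway": {"prefill", "decode"},
}
print(rollout_waves(roles))
# → [['router'], ['decode', 'prefill'], ['gateway']]
```

Roles in the same inner list have no ordering constraint between them, which is what allows them to be processed concurrently.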

🐛 Bug Fixes

  • StatefulSet & Reconciler
    • Fixed sts_reconciler failure when retrieving historical StatefulSet revisions.
    • Updated StatefulSet service naming to meet Kubernetes requirements (compatible headless service name).
  • Workload & Updates
    • Fixed an issue where updating a role did not trigger a rolling update.
    • Fixed missing environment variables on leader pods of InstanceSet workloads.
    • Added max length checks for workloadName and serviceName.
  • Cleanup & Resources
    • Fixed an issue where the corresponding PodGroups created by RBG were not deleted during gang scheduling.
    • Fixed incorrect apiVersion in auto-generated applyconfigurations.
    • Fixed InstanceSet RBAC and LWS environment build issues.
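
For context on the max length checks for workloadName and serviceName: Kubernetes limits DNS label names to 63 characters, so generated names that exceed this are rejected by the API server. The sketch below shows that kind of check, assuming the RFC 1123 label rule; the exact limits and validation rules RBG applies are not stated in these notes, and `name_errors` is a hypothetical helper.

```python
import re
from typing import List

# Assumption: names must be valid RFC 1123 DNS labels (at most 63 characters).
MAX_DNS_LABEL_LENGTH = 63
DNS_LABEL_PATTERN = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def name_errors(name: str, max_length: int = MAX_DNS_LABEL_LENGTH) -> List[str]:
    """Return a list of problems with the name; an empty list means it passed."""
    errors: List[str] = []
    if len(name) > max_length:
        errors.append(
            f"{name!r} is {len(name)} characters long; the limit is {max_length}"
        )
    if not DNS_LABEL_PATTERN.fullmatch(name):
        errors.append(f"{name!r} is not a lowercase RFC 1123 label")
    return errors


print(name_errors("prefill-worker"))  # valid name, no errors
print(name_errors("x" * 64))          # one error: too long
```

Checking the length before creating the workload surfaces the problem at validation time rather than as a failed create call later.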

📖 Documentation

  • Added missing model examples.
  • Updated Gang Scheduling documentation.
  • Updated sglang PD disaggregation example with sgl-router.
  • Fixed link errors in README and added CI checks for markdown quality.

What's Changed

  • fix build image error in Makefile by @gujingit in #41
  • feat: Support parallel execution for roles with same dependencies by @tlipoca9 in #42
  • Build: update rbg helm chart by @cheyang in #44
  • feat: Supports using controllerrevision hash to update role by @bcfre in #34
  • feat: add instanceset api by @veophi in #50
  • doc: add missing model examples by @bcfre in #49
  • feat: use code-generator to generate applyconfiguration code by @liubing0427 in #48
  • Build docker image for supporting controller revision by @cheyang in #51
  • [WIP]: Add in-place update api and core codes for InstanceSet by @veophi in #52
  • fix: change stateful set service name to meet k8s requirements by @liubing0427 in #53
  • feat: add engine runtime by @gujingit in #55
  • bugfix: add max len check for workloadName & serviceName by @gujingit in #58
  • fix: delete corresponding podgroup created by rbg when gang-schedulin… by @ShirleyDing in #57
  • Update Helm chart 0.5.0-alpha.1 by @cheyang in #60
  • fix: address review comments in pr-57 by @bcfre in #67
  • doc: update gang scheduling by @bcfre in #62
  • feat: The ControllerRevision not store the replicas for RBG Roles by @bcfre in #61
  • CI:add release script by @cheyang in #71
  • feat: rbgctl supports rbg revision operations by @bcfre in #54
  • KEP-8: Reduce YAML Duplication via RoleTemplates by @LikiosSedo in #70
  • Release 0.5.0-alpha.2 by @cheyang in #73
  • feat: add instance controller by @yangsoon in #66
  • fix: sts_reconciler failure to retrieve historical StatefulSet revisions by @bcfre in #77
  • fix(sts-reconciler): use compatible headless service name for statefu… by @TrafalgarZZZ in #81
  • [KEP-31]: Adding ControllerRevision support to the RoleBasedGroup by @bcfre in #27
  • Release 0.5.0-alpha.3 by @cheyang in #78
  • feat(engine-runtime): support sgl-router pd disaggregation with engine runtime by @TrafalgarZZZ in #82
  • KEP-30: add role coordination kep by @gujingit in #59
  • feat: add instanceset controller codes by @veophi in #83
  • KEP-8: Refine API naming and preview/diff design based on community feedback by @LikiosSedo in #79
  • [KEP-30]: Introduce InstanceSet Workload Support in RoleBasedGroup for Improved LLM Orchestration by @veophi in #26
  • Fix(rbg): update role not trigger rolling update by @Syspretor in #89
  • doc: Update sglang pd disaggregation example with sgl-router by @TrafalgarZZZ in #86
  • KEP 74: Mooncake integration by @Syspretor in #75
  • chore(codegen): Generate instanceSet related go-client codes by @Syspretor in #93
  • chore(hack): add script to generate manifests.yaml by @Syspretor in #95
  • feat(rbg): add rbac with resource instancesets/instances by @Syspretor in #97
  • Build: update rbg helm chart 0.5.0-alpha.4 by @cheyang in #98
  • fix(rbg): fix incorrect apiVersion in auto-generated applyconfigurations by @Syspretor in #101
  • docs: fix link error in README; fix lint errors in markdown files; add a CI to automatically check markdown quality by @Phil-Fan in #100
  • feat: impl rollout coordination for roles in rbg by @veophi in #91
  • fix: add explicit permissions to the docs-check workflow by @Phil-Fan in #105
  • rbgs support pod group policy by @nightmeng in #107
  • refactor: migrate Template field from value to pointer type by @LikiosSedo in #102
  • chore(rbg): migrate role.leaderWorkerSet field type from value to poi… by @Syspretor in #108
  • feat(rbg): support role-level metadata by @Syspretor in #113
  • feature(rbg): support to use InstanceSet as role workload by @Syspretor in #110
  • chore(rbg): migrate RollingUpdate field type by @Syspretor in #114
  • Update readme by @cheyang in #111
  • fix(rbg): fix instanceset rbac and lws env build by @Syspretor in #116
  • fix(rbg): fix leader pod with instanceset workload lack envs by @Syspretor in #122
  • Build: update rbg helm chart 0.5.0 by @cheyang in #123

Full Changelog: v0.4.0...v0.5.0

v0.5.0-alpha.4

12 Nov 05:55
5210dde

Pre-release

What's Changed

  • feat(engine-runtime): support sgl-router pd disaggregation with engine runtime by @TrafalgarZZZ in #82
  • KEP-30: add role coordination kep by @gujingit in #59
  • feat: add instanceset controller codes by @veophi in #83
  • KEP-8: Refine API naming and preview/diff design based on community feedback by @LikiosSedo in #79
  • [KEP-30]: Introduce InstanceSet Workload Support in RoleBasedGroup for Improved LLM Orchestration by @veophi in #26
  • Fix(rbg): update role not trigger rolling update by @Syspretor in #89
  • doc: Update sglang pd disaggregation example with sgl-router by @TrafalgarZZZ in #86
  • KEP 74: Mooncake integration by @Syspretor in #75
  • chore(codegen): Generate instanceSet related go-client codes by @Syspretor in #93
  • chore(hack): add script to generate manifests.yaml by @Syspretor in #95
  • feat(rbg): add rbac with resource instancesets/instances by @Syspretor in #97
  • Build: update rbg helm chart 0.5.0-alpha.4 by @cheyang in #98

Full Changelog: v0.5.0-alpha.3...v0.5.0-alpha.4

v0.5.0-alpha.3

02 Nov 09:52
29cc2a5

Pre-release

What's Changed

  • feat: add instance controller by @yangsoon in #66
  • fix: sts_reconciler failure to retrieve historical StatefulSet revisions by @bcfre in #77
  • fix(sts-reconciler): use compatible headless service name for statefu… by @TrafalgarZZZ in #81
  • [KEP-31]: Adding ControllerRevision support to the RoleBasedGroup by @bcfre in #27
  • Release 0.5.0-alpha.3 by @cheyang in #78

Full Changelog: v0.5.0-alpha.2...v0.5.0-alpha.3

v0.5.0-alpha.2

28 Oct 02:13
b475502

Pre-release

What's Changed

  • fix: address review comments in pr-57 by @bcfre in #67
  • doc: update gang scheduling by @bcfre in #62
  • feat: The ControllerRevision not store the replicas for RBG Roles by @bcfre in #61
  • CI:add release script by @cheyang in #71
  • feat: rbgctl supports rbg revision operations by @bcfre in #54
  • KEP-8: Reduce YAML Duplication via RoleTemplates by @LikiosSedo in #70
  • Release 0.5.0-alpha.2 by @cheyang in #73

Full Changelog: v0.5.0-alpha.1...v0.5.0-alpha.2

v0.5.0-alpha.1

26 Oct 08:25
d875e76

Pre-release

What's Changed

  • fix build image error in Makefile by @gujingit in #41
  • feat: Support parallel execution for roles with same dependencies by @tlipoca9 in #42
  • Build: update rbg helm chart by @cheyang in #44
  • feat: Supports using controllerrevision hash to update role by @bcfre in #34
  • feat: add instanceset api by @veophi in #50
  • doc: add missing model examples by @bcfre in #49
  • feat: use code-generator to generate applyconfiguration code by @liubing0427 in #48
  • Build docker image for supporting controller revision by @cheyang in #51
  • [WIP]: Add in-place update api and core codes for InstanceSet by @veophi in #52
  • fix: change stateful set service name to meet k8s requirements by @liubing0427 in #53
  • feat: add engine runtime by @gujingit in #55
  • bugfix: add max len check for workloadName & serviceName by @gujingit in #58
  • fix: delete corresponding podgroup created by rbg when gang-schedulin… by @ShirleyDing in #57
  • Update Helm chart 0.5.0-alpha.1 by @cheyang in #60

Full Changelog: v0.4.0...v0.5.0-alpha.1

v0.4.0

23 Sep 03:49
849757e

What's Changed

Features

  • feat: support rbgs scaling by @gujingit in #1
  • add workload status update event by @gujingit in #6
  • refactor: update dynamo demo; remove Chinese comments by @gujingit in #13
  • feat: Add pull request template by @bcfre in #11
  • add status check when diff workload by @gujingit in #15
  • feat: Format action templates to match sglang's pattern by @Pikabooboo in #19
  • add unit-tests by @gujingit in #28
  • support partition in rollingupdate by @gujingit in #30
  • feat: Add support for 1:1 rbg per topology assignment by @gujingit in #32
  • perf: reduce api-server load caused by exclusive-topology by @cheyang in #33
  • feature: support volcano podgroup by @ZYecho11 in #14

Bug Fixes

  • bugfix: Fix the permission issue that rbgs controller cannot create rbg by @bcfre in #12
  • bugfix: Added consistency check for probes by @bcfre in #29

Docs

  • doc: Using SGLang as the default inference engine by @gujingit in #4
  • doc: Add CONTRIBUTING.md, development guide, updated image building logic by @bcfre in #9

Full Changelog: v0.3.0...v0.4.0

v0.3.0

28 Aug 06:54
77b103e

Full Changelog: AliyunContainerService/rolebasedgroup@v0.2.0...v0.3.0

Full Changelog: v0.3.0...v0.3.0