Releases: sgl-project/rbg
v0.5.0
🌟 Highlights
- InstanceSet Workload: Introduced native support for `InstanceSet`, enabling fine-grained management for stateful workloads.
- Coordinated Rollout: Implemented synchronized updates across interdependent roles to maintain system stability during upgrades.
- In-Place Updates: Enabled efficient updates by using `InstanceSet` workloads without requiring pod recreation.
- Revision Management: Added full `ControllerRevision` support for robust version tracking and rollback capabilities.
🚀 Features
- InstanceSet Support
  - Introduced `InstanceSet` API and Controller.
  - Added support for using `InstanceSet` as a Role Workload in RoleBasedGroup (RBG).
- Controller Revision & Versioning
  - Added `ControllerRevision` support to RoleBasedGroup for better version control.
  - Updated `rbgctl` to support RBG revision operations.
  - Enabled updating roles using the `controllerrevision` hash.
- Orchestration & Scheduling
  - Implemented rollout coordination for roles in RBG.
  - Supported parallel execution for roles sharing the same dependencies.
  - Added support for Pod Group Policy.
- Runtime & Integrations
  - Added Mooncake integration.
  - Added Engine Runtime support.
  - Supported `sgl-router` PD (prefill-decode) disaggregation with engine runtime.
- Miscellaneous
  - Added support for role-level metadata.
  - Migrated `Template`, `leaderWorkerSet`, and `RollingUpdate` fields from value to pointer types.
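The value-to-pointer migration in the last item matters to API consumers mainly because a pointer field can distinguish "not set" from "set to the zero value". The following is a minimal Go sketch of that distinction. `RoleSpec`, `RollingUpdate`, `Partition`, and `partitionOrDefault` are hypothetical stand-ins for illustration, not the actual RBG type definitions.

```go
package main

import "fmt"

// RollingUpdate is a hypothetical stand-in, not the real RBG type.
type RollingUpdate struct {
	Partition int32
}

// RoleSpec is a hypothetical stand-in for a role definition.
type RoleSpec struct {
	Name string
	// With a pointer, nil means "not specified". A value field could not
	// tell that apart from RollingUpdate{Partition: 0}.
	RollingUpdate *RollingUpdate
}

// partitionOrDefault returns the configured partition, or def when the
// RollingUpdate field was left unset.
func partitionOrDefault(r RoleSpec, def int32) int32 {
	if r.RollingUpdate == nil {
		return def
	}
	return r.RollingUpdate.Partition
}

func main() {
	unset := RoleSpec{Name: "prefill"}
	zero := RoleSpec{Name: "decode", RollingUpdate: &RollingUpdate{Partition: 0}}

	fmt.Println(partitionOrDefault(unset, 5)) // unset field falls back to the default
	fmt.Println(partitionOrDefault(zero, 5))  // an explicit zero is preserved
}
```

Code that previously read such fields as plain values would need a nil check like the one above after upgrading.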
🐛 Bug Fixes
- StatefulSet & Reconciler
  - Fixed `sts_reconciler` failure when retrieving historical StatefulSet revisions.
  - Updated StatefulSet service naming to meet Kubernetes requirements (compatible headless service name).
- Workload & Updates
  - Fixed an issue where updating a role did not trigger a rolling update.
  - Fixed leader pods with `InstanceSet` workload lacking environment variables.
  - Added max length checks for `workloadName` and `serviceName`.
- Cleanup & Resources
  - Fixed issue where corresponding PodGroups created by RBG were not deleted during gang scheduling.
  - Fixed incorrect `apiVersion` in auto-generated applyconfigurations.
  - Fixed `InstanceSet` RBAC and LWS environment build issues.
📖 Documentation
- Added missing model examples.
- Updated Gang Scheduling documentation.
- Updated `sglang` PD disaggregation example with `sgl-router`.
- Fixed link errors in README and added CI checks for markdown quality.
What's Changed
- fix build image error in Makefile by @gujingit in #41
- feat: Support parallel execution for roles with same dependencies by @tlipoca9 in #42
- Build: update rbg helm chart by @cheyang in #44
- feat: Supports using controllerrevision hash to update role by @bcfre in #34
- feat: add instanceset api by @veophi in #50
- doc: add missing model examples by @bcfre in #49
- feat: use code-generator to generate applyconfiguration code by @liubing0427 in #48
- Build docker image for supporting controller revision by @cheyang in #51
- [WIP]: Add in-place update api and core codes for InstanceSet by @veophi in #52
- fix: change stateful set service name to meet k8s requirements by @liubing0427 in #53
- feat: add engine runtime by @gujingit in #55
- bugfix: add max len check for workloadName & serviceName by @gujingit in #58
- fix: delete corresponding podgroup created by rbg when gang-schedulin… by @ShirleyDing in #57
- Update Helm chart 0.5.0-alpha.1 by @cheyang in #60
- fix: address review comments in pr-57 by @bcfre in #67
- doc: update gang scheduling by @bcfre in #62
- feat: The ControllerRevision not store the replicas for RBG Roles by @bcfre in #61
- CI:add release script by @cheyang in #71
- feat: rbgctl supports rbg revision operations by @bcfre in #54
- KEP-8: Reduce YAML Duplication via RoleTemplates by @LikiosSedo in #70
- Release 0.5.0-alpha.2 by @cheyang in #73
- feat: add instance controller by @yangsoon in #66
- fix: sts_reconciler failure to retrieve historical StatefulSet revisions by @bcfre in #77
- fix(sts-reconciler): use compatible headless service name for statefu… by @TrafalgarZZZ in #81
- [KEP-31]: Adding ControllerRevision support to the RoleBasedGroup by @bcfre in #27
- Release 0.5.0-alpha.3 by @cheyang in #78
- feat(engine-runtime): support sgl-router pd disaggregation with engine runtime by @TrafalgarZZZ in #82
- KEP-30: add role coordination kep by @gujingit in #59
- feat: add instanceset controller codes by @veophi in #83
- KEP-8: Refine API naming and preview/diff design based on community feedback by @LikiosSedo in #79
- [KEP-30]: Introduce InstanceSet Workload Support in RoleBasedGroup for Improved LLM Orchestration by @veophi in #26
- Fix(rbg): update role not trigger rolling update by @Syspretor in #89
- doc: Update sglang pd disaggregation example with sgl-router by @TrafalgarZZZ in #86
- KEP 74: Mooncake integration by @Syspretor in #75
- chore(codegen): Generate instanceSet related go-client codes by @Syspretor in #93
- chore(hack): add script to generate manifests.yaml by @Syspretor in #95
- feat(rbg): add rbac with resource instancesets/instances by @Syspretor in #97
- Build: update rbg helm chart 0.5.0-alpha.4 by @cheyang in #98
- fix(rbg): fix incorrect apiVersion in auto-generated applyconfigurations by @Syspretor in #101
- docs: fix link error in README; fix lint errors in markdown files; add a CI to automatically check markdown quality by @Phil-Fan in #100
- feat: impl rollout coordination for roles in rbg by @veophi in #91
- fix: add explicit permissions to the docs-check workflow by @Phil-Fan in #105
- rbgs support pod group policy by @nightmeng in #107
- refactor: migrate Template field from value to pointer type by @LikiosSedo in #102
- chore(rbg): migrate role.leaderWorkerSet field type from value to poi… by @Syspretor in #108
- feat(rbg): support role-level metadata by @Syspretor in #113
- feature(rbg): support to use InstanceSet as role workload by @Syspretor in #110
- chore(rbg): migrate RollingUpdate field type by @Syspretor in #114
- Update readme by @cheyang in #111
- fix(rbg): fix instanceset rbac and lws env build by @Syspretor in #116
- fix(rbg): fix leader pod with instanceset workload lack envs by @Syspretor in #122
- Build: update rbg helm chart 0.5.0 by @cheyang in #123
New Contributors
- @tlipoca9 made their first contribution in #42
- @veophi made their first contribution in #50
- @liubing0427 made their first contribution in #48
- @ShirleyDing made their first contribution in #57
- @LikiosSedo made their first contribution in #70
- @yangsoon made their first contribution in #66
- @TrafalgarZZZ made their first contribution in #81
- @Syspretor made their first contribution in #89
- @Phil-Fan made their first contribution in #100
- @nightmeng made their first contribution in #107
Full Changelog: v0.4.0...v0.5.0
v0.5.0-alpha.4
What's Changed
- feat(engine-runtime): support sgl-router pd disaggregation with engine runtime by @TrafalgarZZZ in #82
- KEP-30: add role coordination kep by @gujingit in #59
- feat: add instanceset controller codes by @veophi in #83
- KEP-8: Refine API naming and preview/diff design based on community feedback by @LikiosSedo in #79
- [KEP-30]: Introduce InstanceSet Workload Support in RoleBasedGroup for Improved LLM Orchestration by @veophi in #26
- Fix(rbg): update role not trigger rolling update by @Syspretor in #89
- doc: Update sglang pd disaggregation example with sgl-router by @TrafalgarZZZ in #86
- KEP 74: Mooncake integration by @Syspretor in #75
- chore(codegen): Generate instanceSet related go-client codes by @Syspretor in #93
- chore(hack): add script to generate manifests.yaml by @Syspretor in #95
- feat(rbg): add rbac with resource instancesets/instances by @Syspretor in #97
- Build: update rbg helm chart 0.5.0-alpha.4 by @cheyang in #98
New Contributors
- @Syspretor made their first contribution in #89
Full Changelog: v0.5.0-alpha.3...v0.5.0-alpha.4
v0.5.0-alpha.3
What's Changed
- feat: add instance controller by @yangsoon in #66
- fix: sts_reconciler failure to retrieve historical StatefulSet revisions by @bcfre in #77
- fix(sts-reconciler): use compatible headless service name for statefu… by @TrafalgarZZZ in #81
- [KEP-31]: Adding ControllerRevision support to the RoleBasedGroup by @bcfre in #27
- Release 0.5.0-alpha.3 by @cheyang in #78
New Contributors
- @yangsoon made their first contribution in #66
- @TrafalgarZZZ made their first contribution in #81
Full Changelog: v0.5.0-alpha.2...v0.5.0-alpha.3
v0.5.0-alpha.2
What's Changed
- fix: address review comments in pr-57 by @bcfre in #67
- doc: update gang scheduling by @bcfre in #62
- feat: The ControllerRevision not store the replicas for RBG Roles by @bcfre in #61
- CI:add release script by @cheyang in #71
- feat: rbgctl supports rbg revision operations by @bcfre in #54
- KEP-8: Reduce YAML Duplication via RoleTemplates by @LikiosSedo in #70
- Release 0.5.0-alpha.2 by @cheyang in #73
New Contributors
- @LikiosSedo made their first contribution in #70
Full Changelog: v0.5.0-alpha.1...v0.5.0-alpha.2
v0.5.0-alpha.1
What's Changed
- fix build image error in Makefile by @gujingit in #41
- feat: Support parallel execution for roles with same dependencies by @tlipoca9 in #42
- Build: update rbg helm chart by @cheyang in #44
- feat: Supports using controllerrevision hash to update role by @bcfre in #34
- feat: add instanceset api by @veophi in #50
- doc: add missing model examples by @bcfre in #49
- feat: use code-generator to generate applyconfiguration code by @liubing0427 in #48
- Build docker image for supporting controller revision by @cheyang in #51
- [WIP]: Add in-place update api and core codes for InstanceSet by @veophi in #52
- fix: change stateful set service name to meet k8s requirements by @liubing0427 in #53
- feat: add engine runtime by @gujingit in #55
- bugfix: add max len check for workloadName & serviceName by @gujingit in #58
- fix: delete corresponding podgroup created by rbg when gang-schedulin… by @ShirleyDing in #57
- Update Helm chart 0.5.0-alpha.1 by @cheyang in #60
New Contributors
- @tlipoca9 made their first contribution in #42
- @veophi made their first contribution in #50
- @liubing0427 made their first contribution in #48
- @ShirleyDing made their first contribution in #57
Full Changelog: v0.4.0...v0.5.0-alpha.1
v0.4.0
What's Changed
Features
- feat: support rbgs scaling by @gujingit in #1
- add workload status update event by @gujingit in #6
- refactor: update dynamo demo; remove Chinese comments by @gujingit in #13
- feat: Add pull request template by @bcfre in #11
- add status check when diff workload by @gujingit in #15
- feat: Format action templates to match sglang's pattern by @Pikabooboo in #19
- add unit-tests by @gujingit in #28
- support partition in rollingupdate by @gujingit in #30
- feat: Add support for 1:1 rbg per topology assignment by @gujingit in #32
- perf: reduce api-server load caused by exclusive-topology by @cheyang in #33
- feature: support volcano podgroup by @ZYecho11 in #14
Bug Fixes
- bugfix: Fix the permission issue that rbgs controller cannot create rbg by @bcfre in #12
- bugfix: Added consistency check for probes by @bcfre in #29
Build & CI
- Add build CI by @gujingit in #17
- CI: disable golint lll check by @gujingit in #37
- Enhance/replace with docker hub by @cheyang in #40
- Build: add vendor by @gujingit in #39
Docs
- doc: Using SGLang as the default inference engine by @gujingit in #4
- doc: Add CONTRIBUTING.md, development guide, updated image building logic by @bcfre in #9
New Contributors
- @bcfre made their first contribution in #12
- @Pikabooboo made their first contribution in #19
- @ZYecho11 made their first contribution in #14
Full Changelog: v0.3.0...v0.4.0
v0.3.0
What's Changed
- add lws crd exist check by @gujingit in AliyunContainerService/rolebasedgroup#8
- Support rolling update by @gujingit in AliyunContainerService/rolebasedgroup#7
- support restart policy for role; support rollingupdate policy nil by @gujingit in AliyunContainerService/rolebasedgroup#10
- add e2e tests by @gujingit in AliyunContainerService/rolebasedgroup#13
- Feature: add pods read related rules for rbg cluster role by @Syspretor in AliyunContainerService/rolebasedgroup#16
- Support for dynamic watch lws CRD by @zmberg in AliyunContainerService/rolebasedgroup#12
- update helm version to v0.3.0 by @gujingit in AliyunContainerService/rolebasedgroup#18
- remove unnecessary pod envs & labels & annotations by @gujingit in AliyunContainerService/rolebasedgroup#20
- Feature: support rbg scaling adapter by @Syspretor in AliyunContainerService/rolebasedgroup#11
- support gang scheduling by @gujingit in AliyunContainerService/rolebasedgroup#19
- fix e2etest error by @gujingit in AliyunContainerService/rolebasedgroup#21
- change rollingStrategy to ptr by @gujingit in AliyunContainerService/rolebasedgroup#22
- Fix: ensure a fallback reconcile logic of scalingAdapters by @Syspretor in AliyunContainerService/rolebasedgroup#23
- dymaic watch podGroup CRD by @zmberg in AliyunContainerService/rolebasedgroup#24
- add upgrade crd tools by @gujingit in AliyunContainerService/rolebasedgroup#25
- remove default value for restart policy in crd by @gujingit in AliyunContainerService/rolebasedgroup#26
New Contributors
- @Syspretor made their first contribution in AliyunContainerService/rolebasedgroup#16
- @zmberg made their first contribution in AliyunContainerService/rolebasedgroup#12
Full Changelog: AliyunContainerService/rolebasedgroup@v0.2.0...v0.3.0
Full Changelog: v0.3.0...v0.3.0