Spark for Python Developers

Spark for Python Developers aims to combine the elegance and flexibility of Python with the power and versatility of Apache Spark. Spark is written in Scala and runs on the Java virtual machine. It is nevertheless polyglot and offers bindings and APIs for Java, Scala, Python, and R. Python is a well-designed language with an extensive set of specialized libraries. This book looks at PySpark within the PyData ecosystem. Some of the prominent PyData libraries include Pandas, Blaze, Scikit-Learn, Matplotlib, Seaborn, and Bokeh. These libraries are open source. They are developed, used, and maintained by the data scientist and Python developers community. PySpark integrates well with the PyData ecosystem, as endorsed by the Anaconda Python distribution. The book puts forward a journey to build data-intensive apps along with an architectural blueprint that covers the following steps: frst, set up the base infrastructure with Spark. Second, acquire, collect, process, and store the data. Third, gain insights from the collected data. Fourth, stream live data and process it in real time. Finally, visualize the information.

























剩余205页未读,继续阅读

- 尹成2017-04-19很好很强大

- 粉丝: 14
我的内容管理 展开
我的资源 快来上传第一个资源
我的收益
登录查看自己的收益我的积分 登录查看自己的积分
我的C币 登录后查看C币余额
我的收藏
我的下载
下载帮助


最新资源
- 基于单片机的交流电机转动控制系统方案设计书.doc
- 《项目管理决策分析与评价》摸底评测.doc
- 综合布线设计方案.docx
- 区块链技术在金融领域应用的风险管理策略研究.docx
- 数据库应用技术知识点.doc
- ATS单片机停车场车位设计.doc
- 2018年度四川省大数据时代的互联网信息安全试题及答案1.doc
- 数据库设计报告1111111111111.doc
- 项目管理在农用飞机维修工程中的应用.docx
- 基于物联网的智能家居系统的设计与应用.docx
- kubernetes系列03—kubeadm安装部署K8S集群.docx
- 基于服务器虚拟化的政务云平台设计.docx
- C语言程序设计工业和信息化普通高等教育“十二五”规划教材立项项目-赵山林-高媛.doc
- matlab电炉温度控制算法比较及仿真研究分析.doc
- 电力调度自动化系统的网络安全问题与对策分析.docx
- 大数据时代人力资源管理创新策略初探.docx


